[13/15] madlib-site git commit: jupyter notebooks for 1.14 release

2018-04-23 Thread fmcquillan
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/418f361c/community-artifacts/Decision-trees-v1.ipynb
--
diff --git a/community-artifacts/Decision-trees-v1.ipynb b/community-artifacts/Decision-trees-v1.ipynb
new file mode 100644
index 000..e97b943
--- /dev/null
+++ b/community-artifacts/Decision-trees-v1.ipynb
@@ -0,0 +1,1590 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+"# Decision trees\n",
+"\n",
+"A decision tree is a supervised learning method that can be used for classification and regression. It consists of a structure in which internal nodes represent tests on attributes, and the branches from nodes represent the results of those tests. Each leaf node holds a class label (or, for regression, a predicted value), and the paths from root to leaf nodes define the set of classification or regression rules."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "metadata": {},
+   "outputs": [
+{
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+  "The sql extension is already loaded. To reload it, use:\n",
+  "  %reload_ext sql\n"
+ ]
+}
+   ],
+   "source": [
+"%load_ext sql"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {},
+   "outputs": [
+{
+ "data": {
+  "text/plain": [
+   "u'Connected: fmcquillan@madlib'"
+  ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+}
+   ],
+   "source": [
+"# Greenplum Database 5.4.0 on GCP (demo machine)\n",
+"#%sql postgresql://gpadmin@35.184.253.255:5432/madlib\n",
+"\n",
+"# PostgreSQL local\n",
+"%sql postgresql://fmcquillan@localhost:5432/madlib\n",
+"\n",
+"# Greenplum Database 4.3.10.0\n",
+"#%sql postgresql://gpdbchina@10.194.10.68:61000/madlib"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+"%sql select madlib.version();\n",
+"#%sql select version();"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+"# Decision tree classification examples"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+"# 1. Load data\n",
+"Data set recording weather conditions and whether golf was played."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {},
+   "outputs": [
+{
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+  "Done.\n",
+  "Done.\n",
+  "14 rows affected.\n",
+  "14 rows affected.\n"
+ ]
+},
+{
+ "data": {
+  "text/html": [
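+   "<table>\n",
+   "<tr>\n",
+   "<th>id</th>\n",
+   "<th>OUTLOOK</th>\n",
+   "<th>temperature</th>\n",
+   "<th>humidity</th>\n",
+   "<th>Temp_Humidity</th>\n",
+   "<th>clouds_airquality</th>\n",
+   "<th>windy</th>\n",
+   "<th>class</th>\n",
+   "<th>observation_weight</th>\n",
+   "</tr>\n",
+   "<tr>\n",
+   "<td>1</td>\n",
+   "<td>sunny</td>\n",
+   "<td>85.0</td>\n",
+   "<td>85.0</td>\n",
+   "<td>[85.0, 85.0]</td>\n",
+   "<td>[u'none', u'unhealthy']</td>\n",
+   "<td>False</td>\n",
+   "<td>Don't Play</td>\n",
+   "<td>5.0</td>\n",
+   "</tr>\n",
+   "<tr>\n",
+   "<td>2</td>\n",
+   "<td>sunny</td>\n",
+   "<td>80.0</td>\n",
+   "<td>90.0</td>\n",
+   "<td>[80.0, 90.0]</td>\n",
+   "<td>[u'none', u'moderate']</td>\n",
+   "<td>True</td>\n",
+   "<td>Don't Play</td>\n",
+   "<td>5.0</td>\n",
+   "</tr>\n",
+   "<tr>\n",
+   "<td>3</td>\n",
+   "<td>overcast</td>\n",
+   "<td>83.0</td>\n",
+   "<td>78.0</td>\n",
+   "<td>[83.0, 78.0]</td>\n",
+   "<td>[u'low', u'moderate']</td>\n",
+   "<td>False</td>\n",
+   "<td>Play</td>\n",
+   "<td>1.5</td>\n",
+   "</tr>\n",
+   "<tr>\n",
+   "<td>4</td>\n",
+   "<td>rain</td>\n",
+   "<td>70.0</td>\n",
+   "<td>96.0</td>\n",
+   "<td>[70.0, 96.0]</td>\n",
+   "<td>[u'low', u'moderate']</td>\n",
+   "<td>False</td>\n",
+   "<td>Play</td>\n",
+   "<td>1.0</td>\n",
+   "</tr>\n",
+   "<tr>\n",
+   "<td>5</td>\n",
+   "<td>rain</td>\n",
+   "<td>68.0</td>\n",
+   "<td>80.0</td>\n",
+   "<td>[68.0, 80.0]</td>\n",
+   "<td>[u'medium', u'good']</td>\n",
+   "<td>False</td>\n",
+   "<td>Play</td>\n",
+   "<td>1.0</td>\n",
+   "</tr>\n",
+   "<tr>\n",
+   "<td>6</td>\n",
+   "<td>rain</td>\n",
+   "<td>65.0</td>\n",
+   "<td>70.0</td>\n",
+   "<td>[65.0, 70.0]</td>\n",
+   "<td>[u'low', u'unhealthy']</td>\n",
+   "<td>True</td>\n",
+   "<td>Don't Play</td>\n",
+   "<td>1.0</td>\n",
+   "</tr>\n",
+   "<tr>\n",
+   "<td>7</td>\n",
+   "<td>overcast</td>\n",
+   "<td>64.0</td>\n",
+   "<td>65.0</td>\n",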
+  
