[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-04-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/250


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178167691
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, 
dependent_varname,
 if is_classification:
 # Currently, classification doesn't accept an
--- End diff --

Yes, that's correct.


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178168389
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -856,8 +868,16 @@ def mlp_predict(schema_madlib, model_table, 
data_table, id_col_name,
 activation = _get_activation_index(summary['activation'])
 layer_sizes = PY2SQL(
 summary['layer_sizes'], array_type="DOUBLE PRECISION")
-is_classification = int(summary["is_classification"])
 is_response = int(pred_type == 'response')
+is_classification = int(summary["is_classification"])
+classes = summary['classes']
+# Set a flag to indicate that it is a classification model, with an 
array
+# as the dependent var. The only scenario where classification allows 
for
+# an array dep var is when the user has provided a one-hot encoded dep 
var
+# during training, and mlp_classification does not one-hot encode
+# (and hence classes column in model's summary table is NULL).
+array_dep_var_for_classification = int(is_classification and not 
classes)
--- End diff --

Will change it to `is_dep_var_an_array_for_classification`


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178168247
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, 
dependent_varname,
 if is_classification:
 # Currently, classification doesn't accept an
 # array for dep type in IGD
-_assert("[]" not in expr_type and expr_type in 
classification_types,
+_assert(("[]" in expr_type and expr_type[:-2] in int_types) \
--- End diff --

- One-hot encoding is normally supposed to be an integer array, hence was 
checking only for int type. But, that assumption might be too strict, so will 
make changes to allow for other numeric types too.
- Good catch. It might be a good idea to disallow anything greater than a 
1-D array. Will make an error check for the same.


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178168263
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, 
dependent_varname,
 if is_classification:
 # Currently, classification doesn't accept an
 # array for dep type in IGD
-_assert("[]" not in expr_type and expr_type in 
classification_types,
+_assert(("[]" in expr_type and expr_type[:-2] in int_types) \
+or expr_type in classification_types,
 "Dependent variable column should be of type: "
--- End diff --

Will change it to cover both the cases.


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-29 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r177912997
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, 
dependent_varname,
 if is_classification:
 # Currently, classification doesn't accept an
--- End diff --

This comment is not relevant anymore. 


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-29 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r177912480
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, 
dependent_varname,
 if is_classification:
 # Currently, classification doesn't accept an
 # array for dep type in IGD
-_assert("[]" not in expr_type and expr_type in 
classification_types,
+_assert(("[]" in expr_type and expr_type[:-2] in int_types) \
--- End diff --

why are we checking for only int types and not all numeric types  ? 


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-29 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r177913869
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -856,8 +868,16 @@ def mlp_predict(schema_madlib, model_table, 
data_table, id_col_name,
 activation = _get_activation_index(summary['activation'])
 layer_sizes = PY2SQL(
 summary['layer_sizes'], array_type="DOUBLE PRECISION")
-is_classification = int(summary["is_classification"])
 is_response = int(pred_type == 'response')
+is_classification = int(summary["is_classification"])
+classes = summary['classes']
+# Set a flag to indicate that it is a classification model, with an 
array
+# as the dependent var. The only scenario where classification allows 
for
+# an array dep var is when the user has provided a one-hot encoded dep 
var
+# during training, and mlp_classification does not one-hot encode
+# (and hence classes column in model's summary table is NULL).
+array_dep_var_for_classification = int(is_classification and not 
classes)
--- End diff --

can we change the name of `array_dep_var_for_classification` to something 
like`is_dep_var_an_array` so that it's clear that it's a flag.


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for class...

2018-03-26 Thread njayaram2
GitHub user njayaram2 opened a pull request:

https://github.com/apache/madlib/pull/250

MLP: Allow one-hot encoded dependent var for classification

JIRA:MADLIB-1222

MLP currently automatically encodes categorical variables for
classification but does not allow already encoded arrays for dependent
variables in mlp_classification. This commit lets users have an already
encoded array for the dependent variable and train a model.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib 
feature/mlp/support-encoded-dep-var

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #250


commit f5a87dee6bc8f27c1a13a4921ea726b391b1813d
Author: Nandish Jayaram 
Date:   2018-03-20T22:43:25Z

MLP: Allow one-hot encoded dependent var for classification

JIRA:MADLIB-1222

MLP currently automatically encodes categorical variables for
classification but does not allow already encoded arrays for dependent
variables in mlp_classification. This commit lets users have an already
encoded array for the dependent variable and train a model.




---