[GitHub] madlib pull request #253: MLP: Add install check tests for minibatch with grouping

2018-03-29 Thread kaknikhil
Github user kaknikhil closed the pull request at:

https://github.com/apache/madlib/pull/253


---


[GitHub] madlib issue #253: MLP: Add install check tests for minibatch with grouping

2018-03-29 Thread kaknikhil
Github user kaknikhil commented on the issue:

https://github.com/apache/madlib/pull/253
  
Closed by ab7166ff4fc55311ec29bb8b54d17becd9bb1750


---


[GitHub] madlib pull request #251: MLP: Simplify initialization of model coefficients

2018-03-29 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/251#discussion_r178204767
  
--- Diff: src/modules/convex/mlp_igd.cpp ---
@@ -98,23 +98,28 @@ mlp_igd_transition::run(AnyType &args) {
 
         double is_classification_double = (double) is_classification;
         double activation_double = (double) activation;
-        MappedColumnVector coeff = args[10].getAs<MappedColumnVector>();
-        state.task.model.rebind(&is_classification_double,&activation_double,
-                                &coeff.data()[0], numberOfStages,
-                                &numbersOfUnits[0]);
-
-        // state.task.model.is_classification =
-        //     static_cast<double>(is_classification);
-        // state.task.model.activation = static_cast<double>(activation);
-        // MappedColumnVector initial_coeff = args[10].getAs<MappedColumnVector>();
-        // // copy initial_coeff into the model
-        // Index fan_in, fan_out, layer_start = 0;
-        // for (size_t k = 0; k < numberOfStages; ++k){
-        //     fan_in = numbersOfUnits[k];
-        //     fan_out = numbersOfUnits[k+1];
-        //     state.task.model.u[k] << initial_coeff.segment(layer_start, (fan_in+1)*fan_out);
-        //     layer_start = (fan_in + 1) * fan_out;
-        // }
+        if (!args[10].isNull()){
+            // initial coefficients are provided
+            MappedColumnVector warm_start_coeff =
+                args[10].getAs<MappedColumnVector>();
+
+            // copy warm start into the task model
+            // state.reset() ensures algo.incrModel is copied from task.model
+            Index layer_start = 0;
+            for (size_t k = 0; k < numberOfStages; ++k){
+                for (size_t j = 0; j < state.task.model.u[k].cols(); ++j){
+                    for (size_t i = 0; i < state.task.model.u[k].rows(); ++i){
+                        state.task.model.u[k](i, j) = warm_start_coeff(
+                            layer_start + j * state.task.model.u[k].rows() + i);
+                    }
+                }
+                layer_start = state.task.model.u[k].rows() *
+                              state.task.model.u[k].cols();
+            }
+        } else {
+            // initialize the model with appropriate coefficients
+            state.task.model.initialize(
--- End diff --

The comment in line 106 is actually for both the warm start copy and the
initialize. Basically, `state.reset()` in line 125 is copying `task.model`
to `algo.incrModel`.
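
A rough Python/NumPy analogy of the flow being described (hypothetical
names, not MADlib source): task.model is filled exactly once, either from
the flat warm-start vector or by initialize(), and reset() then copies it
into the incremental model that MLPIGDAlgorithm::transition updates.

    import numpy as np

    class TaskModel:
        def __init__(self, layer_sizes):
            # one (fan_in + 1) x fan_out matrix per layer (+1 is the bias
            # row), mirroring state.task.model.u[k] in mlp_igd.cpp
            self.u = [np.empty((m + 1, n))
                      for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

        def load_warm_start(self, coeff):
            # unpack the flat vector column-major, matching the
            # layer_start + j * rows + i indexing in the C++ loop
            start = 0
            for k, u in enumerate(self.u):
                n = u.size
                self.u[k] = coeff[start:start + n].reshape(u.shape, order='F')
                start += n

    class State:
        def __init__(self, task_model):
            self.task_model = task_model
            self.incr_model = None  # the model the IGD transition updates

        def reset(self):
            # copy, not alias: task.model remains the starting point
            self.incr_model = [u.copy() for u in self.task_model.u]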


---


[GitHub] madlib pull request #251: MLP: Simplify initialization of model coefficients

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/251#discussion_r178204067
  
--- Diff: src/modules/convex/mlp_igd.cpp ---
@@ -98,23 +98,28 @@ mlp_igd_transition::run(AnyType &args) {
 
         double is_classification_double = (double) is_classification;
         double activation_double = (double) activation;
-        MappedColumnVector coeff = args[10].getAs<MappedColumnVector>();
-        state.task.model.rebind(&is_classification_double,&activation_double,
-                                &coeff.data()[0], numberOfStages,
-                                &numbersOfUnits[0]);
-
-        // state.task.model.is_classification =
-        //     static_cast<double>(is_classification);
-        // state.task.model.activation = static_cast<double>(activation);
-        // MappedColumnVector initial_coeff = args[10].getAs<MappedColumnVector>();
-        // // copy initial_coeff into the model
-        // Index fan_in, fan_out, layer_start = 0;
-        // for (size_t k = 0; k < numberOfStages; ++k){
-        //     fan_in = numbersOfUnits[k];
-        //     fan_out = numbersOfUnits[k+1];
-        //     state.task.model.u[k] << initial_coeff.segment(layer_start, (fan_in+1)*fan_out);
-        //     layer_start = (fan_in + 1) * fan_out;
-        // }
+        if (!args[10].isNull()){
+            // initial coefficients are provided
+            MappedColumnVector warm_start_coeff =
+                args[10].getAs<MappedColumnVector>();
+
+            // copy warm start into the task model
+            // state.reset() ensures algo.incrModel is copied from task.model
+            Index layer_start = 0;
+            for (size_t k = 0; k < numberOfStages; ++k){
+                for (size_t j = 0; j < state.task.model.u[k].cols(); ++j){
+                    for (size_t i = 0; i < state.task.model.u[k].rows(); ++i){
+                        state.task.model.u[k](i, j) = warm_start_coeff(
+                            layer_start + j * state.task.model.u[k].rows() + i);
+                    }
+                }
+                layer_start = state.task.model.u[k].rows() *
+                              state.task.model.u[k].cols();
+            }
+        } else {
+            // initialize the model with appropriate coefficients
+            state.task.model.initialize(
--- End diff --

This is more of a doubt than a comment, I guess. Should this be
`state.algo.incrModel` instead?
The reason I thought so was that `MLPIGDAlgorithm::transition(state,
tuple);` uses the model in `algo` instead of the one in `task`.


---


[GitHub] madlib issue #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/250
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/416/



---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178167691
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname,
     if is_classification:
         # Currently, classification doesn't accept an
--- End diff --

Yes, that's correct.


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178168389
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -856,8 +868,16 @@ def mlp_predict(schema_madlib, model_table, data_table, id_col_name,
     activation = _get_activation_index(summary['activation'])
     layer_sizes = PY2SQL(
         summary['layer_sizes'], array_type="DOUBLE PRECISION")
-    is_classification = int(summary["is_classification"])
     is_response = int(pred_type == 'response')
+    is_classification = int(summary["is_classification"])
+    classes = summary['classes']
+    # Set a flag to indicate that it is a classification model, with an array
+    # as the dependent var. The only scenario where classification allows for
+    # an array dep var is when the user has provided a one-hot encoded dep var
+    # during training, and mlp_classification does not one-hot encode
+    # (and hence classes column in model's summary table is NULL).
+    array_dep_var_for_classification = int(is_classification and not classes)
--- End diff --

Will change it to `is_dep_var_an_array_for_classification`
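
For reference, a minimal sketch of the flag computation with the new name
(hypothetical wrapper function; not the actual mlp_igd.py_in code):

    def get_prediction_flags(summary, pred_type):
        # sketch only; mirrors the diff above, with the renamed flag
        is_response = int(pred_type == 'response')
        is_classification = int(summary["is_classification"])
        # 'classes' is NULL only when the user trained with an already
        # one-hot encoded dependent var, so training skipped its own encoding
        is_dep_var_an_array_for_classification = int(
            is_classification and not summary['classes'])
        return (is_response, is_classification,
                is_dep_var_an_array_for_classification)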


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178168247
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname,
     if is_classification:
         # Currently, classification doesn't accept an
         # array for dep type in IGD
-        _assert("[]" not in expr_type and expr_type in classification_types,
+        _assert(("[]" in expr_type and expr_type[:-2] in int_types) \
--- End diff --

- One-hot encoding is normally expected to be an integer array, hence the
check for only int types. But that assumption might be too strict, so I
will make changes to allow other numeric types too.
- Good catch. It might be a good idea to disallow anything greater than a
1-D array; I will add an error check for that. A rough sketch of the
combined check follows.
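
A minimal sketch of what the relaxed check could look like (hypothetical
names such as numeric_types; not the actual patch):

    numeric_types = ['smallint', 'integer', 'bigint',
                     'real', 'double precision', 'numeric']  # assumed list

    def is_valid_classification_dep_type(expr_type, classification_types):
        if expr_type.endswith('[]'):
            # allow any numeric array as a one-hot encoded dep var; note
            # that the PostgreSQL type string does not encode dimensionality,
            # so rejecting arrays of more than one dimension needs a separate
            # check against the actual column values
            return expr_type[:-2] in numeric_types
        return expr_type in classification_types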


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r178168263
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname,
     if is_classification:
         # Currently, classification doesn't accept an
         # array for dep type in IGD
-        _assert("[]" not in expr_type and expr_type in classification_types,
+        _assert(("[]" in expr_type and expr_type[:-2] in int_types) \
+                or expr_type in classification_types,
             "Dependent variable column should be of type: "
--- End diff --

Will change it to cover both the cases.


---


[GitHub] madlib issue #251: MLP: Simplify initialization of model coefficients

2018-03-29 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/251
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/415/



---


[GitHub] madlib issue #251: MLP: Simplify initialization of model coefficients

2018-03-29 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/251
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/414/



---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r177912997
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname,
     if is_classification:
         # Currently, classification doesn't accept an
--- End diff --

This comment is not relevant anymore. 


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r177912480
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -667,7 +678,8 @@ def _validate_dependent_var(source_table, dependent_varname,
     if is_classification:
         # Currently, classification doesn't accept an
         # array for dep type in IGD
-        _assert("[]" not in expr_type and expr_type in classification_types,
+        _assert(("[]" in expr_type and expr_type[:-2] in int_types) \
--- End diff --

Why are we checking only for int types and not all numeric types?


---


[GitHub] madlib pull request #250: MLP: Allow one-hot encoded dependent var for classification

2018-03-29 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/250#discussion_r177913869
  
--- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in ---
@@ -856,8 +868,16 @@ def mlp_predict(schema_madlib, model_table, data_table, id_col_name,
     activation = _get_activation_index(summary['activation'])
     layer_sizes = PY2SQL(
         summary['layer_sizes'], array_type="DOUBLE PRECISION")
-    is_classification = int(summary["is_classification"])
     is_response = int(pred_type == 'response')
+    is_classification = int(summary["is_classification"])
+    classes = summary['classes']
+    # Set a flag to indicate that it is a classification model, with an array
+    # as the dependent var. The only scenario where classification allows for
+    # an array dep var is when the user has provided a one-hot encoded dep var
+    # during training, and mlp_classification does not one-hot encode
+    # (and hence classes column in model's summary table is NULL).
+    array_dep_var_for_classification = int(is_classification and not classes)
--- End diff --

Can we change the name of `array_dep_var_for_classification` to something
like `is_dep_var_an_array`, so that it's clear that it's a flag?


---


[GitHub] madlib issue #253: MLP: Add install check tests for minibatch with grouping

2018-03-29 Thread iyerr3
Github user iyerr3 commented on the issue:

https://github.com/apache/madlib/pull/253
  
LGTM


---