[GitHub] madlib issue #241: MiniBatch Pre-Processor: Add new module minibatch_preproc...

2018-03-21 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/241
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/393/



---


[GitHub] madlib issue #243: MLP: Add minibatch gradient descent solver

2018-03-21 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/243
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/392/



---


[GitHub] madlib pull request #242: PCA: Fix issue with text grouping col input

2018-03-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/242


---


[GitHub] madlib issue #242: PCA: Fix issue with text grouping col input

2018-03-21 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/242
  
LGTM, this can be merged



---


[GitHub] madlib issue #241: MiniBatch Pre-Processor: Add new module minibatch_preproc...

2018-03-21 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/241
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/390/



---


[GitHub] madlib pull request #241: MiniBatch Pre-Processor: Add new module minibatch_...

2018-03-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/241


---


[GitHub] madlib pull request #243: MLP: Add minibatch gradient descent solver

2018-03-21 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/243#discussion_r176262391
  
--- Diff: src/ports/postgres/modules/convex/test/mlp.sql_in ---
@@ -340,6 +181,51 @@ INSERT INTO iris_data VALUES
 (149,ARRAY[6.2,3.4,5.4,2.3],'Iris-virginica',3,2),
 (150,ARRAY[5.9,3.0,5.1,1.8],'Iris-virginica',3,2);
 
+-- NOTE that the batch specific tables were created using:
+-- madlib.minibatch_preprocessor(), with the regular source tables used in
+-- this file.
+
+-- Create preprocessed data that can be used with minibatch MLP:
+DROP TABLE IF EXISTS iris_data_batch, iris_data_batch_summary, iris_data_batch_standardization;
+CREATE TABLE iris_data_batch(
+__id__ integer,
+dependent_varname double precision[],
+independent_varname double precision[]
+);
+COPY iris_data_batch (__id__, dependent_varname, independent_varname) FROM STDIN NULL '?' DELIMITER '|';
+0 | {{0,1,0},{0,1,0},{0,0,1},{1,0,0},{0,1,0},{0,1,0},{0,0,1},{1,0,0},{1,0,0},{0,1,0},{1,0,0},{0,0,1},{0,0,1},{0,0,1},{1,0,0},{0,0,1},{0,0,1},{1,0,0},{1,0,0},{0,0,1},{0,1,0},{0,0,1},{0,0,1},{0,0,1},{0,0,1},{1,0,0},{0,1,0},{0,0,1},{0,0,1},{1,0,0}} | {{0.828881825720994,-0.314980522532101,0.363710790466334,0.159758615207397},{-1.08079689039279,-1.57669227467446,-0.229158821743702,-0.240110581430527},{-1.08079689039279,-1.32434992424599,0.482284712908341,0.692917544057962},{-1.46273263361555,0.442046528753317,-1.35561108494277,-1.30642843913166},{-0.0623015751321059,-0.567322872960574,0.245136868024327,0.159758615207397},{-0.189613489539692,-0.819665223389045,0.304423829245331,0.159758615207397},{0.701569911313408,-1.32434992424599,0.778719519013359,0.959497008483245},{-1.20810880480038,-0.0626381721036282,-1.35561108494277,-1.4397181713443},{-0.698861147170034,0.946731229610261,-1.35561108494277,-1.30642843913166},{-0.82617306157762,-1.32434992424599,-0.407019705406713,-0.106820849217886},{-0.698861147170034,2.71312768260957,-1.29632412372177,-1.4397181713443},{1.33812948335134,0.442046528753317,1.31230217000239,1.49265593733381},{0.319634168090651,-0.0626381721036282,0.660145596571352,0.826207276270604},{0.701569911313408,-1.32434992424599,0.778719519013359,0.959497008483245},{-0.698861147170034,1.19907358003873,-1.29632412372177,-1.30642843913166},{1.46544139775892,0.189704178324845,0.838006480234363,1.49265593733381},{1.21081756894375,-0.0626381721036282,0.897293441455367,1.49265593733381},{-0.444237318354863,1.70375828089568,-1.29632412372177,-1.30642843913166},{-0.82617306157762,1.95610063132415,-1.05917627883775,-1.03984897470638},{0.828881825720994,-0.819665223389045,0.95658040267637,0.959497008483245},{0.956193740128579,-0.567322872960574,0.541571674129345,0.42633807963268},{1.33812948335134,0.442046528753317,1.31230217000239,1.49265593733381},{0.574257996905822,0.946731229610261,1.01586736389737,1.49265593733381},{0.0650103392754793,-0.819665223389045,0.838006480234363,0.959497008483245},{0.0650103392754793,-0.819665223389045,0.838006480234363,0.959497008483245},{-1.46273263361555,0.442046528753317,-1.35561108494277,-1.30642843913166},{0.574257996905822,-2.08137697553141,0.482284712908341,0.42633807963268},{1.21081756894375,0.189704178324845,1.13444128633938,1.62594566954645},{1.97468905538926,-0.314980522532101,1.54945001488641,0.826207276270604},{-1.08079689039279,0.189704178324845,-1.29632412372177,-1.4397181713443}}
+1 | {{0,1,0},{1,0,0},{0,1,0},{1,0,0},{1,0,0},{1,0,0},{1,0,0},{0,1,0},{0,0,1},{0,0,1},{1,0,0},{0,0,1},{1,0,0},{0,0,1},{0,1,0},{0,1,0},{0,1,0},{1,0,0},{1,0,0},{0,0,1},{0,1,0},{0,1,0},{0,0,1},{1,0,0},{1,0,0},{0,1,0},{1,0,0},{0,0,1},{0,1,0},{0,1,0}} | {{-0.0623015751321059,-0.0626381721036282,0.304423829245331,0.0264688829947554},{-0.316925403947277,2.96547003303804,-1.35561108494277,-1.30642843913166},{0.319634168090651,-0.819665223389045,0.838006480234363,0.559627811845321},{-0.953484975985206,1.19907358003873,-1.41489804616377,-1.17313870691902},{-0.953484975985206,0.442046528753317,-1.47418500738478,-1.30642843913166},{-1.33542071920796,0.442046528753317,-1.41489804616377,-1.30642843913166},{-1.71735646243072,-0.0626381721036282,-1.41489804616377,-1.30642843913166},{0.446946082498236,-0.0626381721036282,0.541571674129345,0.293048347420038},{1.21081756894375,-1.32434992424599,1.25301520878139,0.826207276270604},{0.701569911313408,0.694388879181789,1.3715891312234,1.75923540175909
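
The batch rows in the diff above each pack a matrix of one-hot labels (dependent_varname) next to a matrix of what appear to be standardized features (independent_varname). As a rough illustration of how such rows could be produced, here is a minimal Python sketch. It is not the madlib.minibatch_preprocessor() implementation: the function name, the use of the population standard deviation, the shuffle, and the seed parameter are all assumptions made for the example.

```python
import math
import random


def minibatch_preprocess(rows, classes, batch_size, seed=0):
    """Hypothetical helper: pack (features, label) rows into batches.

    Returns a list of (batch_id, one_hot_labels, standardized_features).
    """
    n = len(rows)
    dim = len(rows[0][0])
    # Per-feature mean and population standard deviation (an assumption).
    means = [sum(x[j] for x, _ in rows) / n for j in range(dim)]
    stds = [
        math.sqrt(sum((x[j] - means[j]) ** 2 for x, _ in rows) / n) or 1.0
        for j in range(dim)
    ]
    # Shuffle a copy so each batch mixes the classes.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    batches = []
    for start in range(0, n, batch_size):
        chunk = shuffled[start:start + batch_size]
        # One-hot encode the label against the fixed class order.
        dep = [[1 if label == c else 0 for c in classes] for _, label in chunk]
        # Standardize each feature; constant features map to 0.0.
        indep = [
            [(x[j] - means[j]) / stds[j] for j in range(dim)] for x, _ in chunk
        ]
        batches.append((len(batches), dep, indep))
    return batches
```

Called on 150 rows with batch_size=30, this sketch would yield five batches with ids 0 through 4, which is the same shape as the rows being loaded by the COPY statement.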
 

[GitHub] madlib issue #241: MiniBatch Pre-Processor: Add new module minibatch_preproc...

2018-03-21 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/241
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/388/



---


[GitHub] madlib pull request #243: MLP: Add minibatch gradient descent solver

2018-03-21 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/243#discussion_r176218740
  
--- Diff: src/modules/convex/mlp_igd.cpp ---
@@ -130,6 +145,90 @@ mlp_igd_transition::run(AnyType &args) {
 
 return state;
 }
+/**
+ * @brief Perform the multilayer perceptron minibatch transition step
+ *
+ * Called for each tuple.
+ */
+AnyType
+mlp_minibatch_transition::run(AnyType &args) {
+// For the first tuple: args[0] is nothing more than a marker that
+// indicates that we should do some initial operations.
+// For other tuples: args[0] holds the computation state until last tuple
+MLPMiniBatchState state = args[0];
+
+// initialize the state if first tuple
+if (state.algo.numRows == 0) {
+if (!args[3].isNull()) {
+MLPMiniBatchState previousState = args[3];
--- End diff --

Tried it, it was cleaner this way.
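
For readers unfamiliar with the pattern in the diff above, the following is a minimal Python sketch of an aggregate transition step: the first tuple sees an empty state and seeds it (optionally from a previous iteration's state), and every later tuple updates the carried state. The dictionary layout, the names coef and num_rows, and the toy one-parameter gradient step are assumptions for illustration only; they are not the MLPMiniBatchState API.

```python
def minibatch_transition(state, x, y, previous_state=None, step=0.1):
    """Hypothetical transition step for a toy model y ~ coef * x."""
    # First tuple: the incoming state is only a marker with no rows seen,
    # so seed the coefficient, warm-starting from the previous state if given.
    if state is None or state["num_rows"] == 0:
        start = previous_state["coef"] if previous_state is not None else 0.0
        state = {"num_rows": 0, "coef": start}
    # Every tuple: take one gradient step on the squared error.
    error = state["coef"] * x - y
    state["coef"] -= step * error * x
    state["num_rows"] += 1
    return state
```

A caller would start with state = None and feed the returned state back in for each tuple, passing the same previous_state every time; only the first call reads it.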


---


[GitHub] madlib pull request #246: DT user doc updates

2018-03-21 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/246#discussion_r176152043
  
--- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
@@ -418,7 +468,10 @@ tree_predict(tree_model,
   new_data_table
   TEXT. Name of the table containing prediction data. This table is
   expected to contain the same features that were used during training. The table
-  should also contain id_col_name used for identifying each row.
+  should also contain id_col_name used for identifying each row.
+
+  If the new_data_table contains categorical variables
--- End diff --

Ok, I will remove this line.


---


[GitHub] madlib pull request #246: DT user doc updates

2018-03-21 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/246#discussion_r176150844
  
--- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in ---
@@ -127,7 +132,11 @@ tree_train(
 
   weights (optional)
   TEXT. Column name containing numerical weights for each observation.
+  Can be any value greater than 0 (does not need to be
+  an integer).  
   This can be used to handle the case of unbalanced data sets.
+  For classification the row's vote is multiplied by the weight, 
--- End diff --

ok
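
To make the weights wording in the diff above concrete, here is a small Python sketch of a weighted class vote, where each row's vote counts in proportion to its weight. The function name and the check that rejects non-positive weights are assumptions based on the doc text; this is not how tree_train computes its splits or predictions.

```python
from collections import defaultdict


def weighted_majority(labels, weights):
    """Hypothetical helper: return the label with the largest total weight."""
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        # The doc text says weights can be any value greater than 0.
        if weight <= 0:
            raise ValueError("weights must be greater than 0")
        totals[label] += weight
    return max(totals, key=totals.get)
```

With labels ['a', 'a', 'b'], equal weights pick 'a', while weights [1.0, 1.0, 2.5] pick 'b'. This is how up-weighting a rare class can offset an unbalanced data set.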


---