[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/334


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051398
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
 
 SUMMARY
 
-MiniBatch Preprocessor is a utility function to pre process the input
-data for use with models that support mini-batching as an optimization
+The mini-batch preprocessor is a utility that prepares input data for
+use by models that support mini-batch as an optimization option. (This
+is currently only the case for Neural Networks.) It is effectively a
--- End diff --

/s/Neural Networks/Neural Network


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051252
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
 
 SUMMARY
 
-MiniBatch Preprocessor is a utility function to pre process the input
-data for use with models that support mini-batching as an optimization
+The mini-batch preprocessor is a utility that prepares input data for
+use by models that support mini-batch as an optimization option. (This
+is currently only the case for Neural Networks.) It is effectively a
+packing operation that builds arrays of dependent and independent
+variables from the source data table.
 
-#TODO add more here
+The advantage of using mini-batching is that it can perform better than
+stochastic gradient descent (default MADlib optimizer) because it uses
+more than one training example at a time, typically resulting faster
--- End diff --

Missing the word `in`: this should read `resulting in faster ...`.
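For context on the sentence under review (SGD updates on one training example at a time, mini-batch on several), here is a toy one-parameter least-squares fit. It is a hypothetical illustration, not MADlib's optimizer: the function name, learning rate, epoch count and data are all made up for the example. With `batch_size=1` it reduces to plain SGD; larger values average the gradient over several rows before each update.

```python
from typing import Sequence


def minibatch_gd(xs: Sequence[float], ys: Sequence[float], batch_size: int,
                 lr: float = 0.05, epochs: int = 200) -> float:
    """Fit y ~ w * x by mini-batch gradient descent on squared error.

    Hypothetical example: batch_size=1 is ordinary stochastic gradient
    descent; batch_size=len(xs) is full-batch gradient descent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Walk the data in fixed order, one batch per parameter update.
        for start in range(0, n, batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Average gradient of 0.5 * (w*x - y)^2 over the rows in this batch.
            grad = sum((w * x - y) * x for x, y in zip(bx, by)) / len(bx)
            w -= lr * grad
    return w
```

On noise-free data such as `xs = [1, 2, 3, 4]`, `ys = [2, 4, 6, 8]`, every batch size converges to `w = 2`; the difference is only in how many rows contribute to each update.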


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051503
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
 
 SUMMARY
 
-MiniBatch Preprocessor is a utility function to pre process the input
-data for use with models that support mini-batching as an optimization
+The mini-batch preprocessor is a utility that prepares input data for
+use by models that support mini-batch as an optimization option. (This
+is currently only the case for Neural Networks.) It is effectively a
+packing operation that builds arrays of dependent and independent
--- End diff --

should we instead say `build matrix of independent variable(s) and arrays of dependent variable`?
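The wording suggested here (per buffer, a matrix of independent variables plus an array of dependent values) can be illustrated with a minimal pure-Python sketch. `pack_rows` is a hypothetical helper, not MADlib's implementation. It assumes the rows are already in memory in a fixed order and only shows the shape of the packed output.

```python
from typing import List, Sequence, Tuple

# One packed buffer: (matrix of independent vars, array of dependent values).
Buffer = Tuple[List[List[float]], List[float]]


def pack_rows(rows: Sequence[Tuple[Sequence[float], float]],
              buffer_size: int) -> List[Buffer]:
    """Pack (independent_vars, dependent_value) rows into buffers.

    Hypothetical sketch: each buffer holds at most buffer_size source rows;
    the last buffer may be smaller.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be positive")
    buffers: List[Buffer] = []
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        x_matrix = [list(x) for x, _ in chunk]  # one matrix row per source row
        y_array = [y for _, y in chunk]         # parallel dependent array
        buffers.append((x_matrix, y_array))
    return buffers
```

For example, packing three rows with `buffer_size=2` yields one full buffer with a 2x2 matrix and a second buffer holding the remaining single row.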


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051175
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -508,8 +514,13 @@ class MiniBatchDocumentation:
 dependent_varname,     -- TEXT. Name of the dependent variable column
 independent_varname,   -- TEXT. Name of the independent variable column
-buffer_size            -- INTEGER. Number of source input rows to
-                          pack into batch
+grouping_col           -- TEXT. Default NULL. An expression list used
+                          to group the input dataset into discrete groups
+buffer_size            -- INTEGER. Default computed automatically.
+                          Number of source input rows to pack into batch
--- End diff --

/s/batch/buffer
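The two parameters in this hunk interact: with `grouping_col` set, one would expect buffers to be formed within each group, so that no buffer mixes rows from different groups. The sketch below assumes that behavior and assumes rows are packed in input order. `buffer_size` is passed explicitly because the automatic default mentioned in the diff is not modeled. `pack_by_group` is a hypothetical helper, not MADlib code.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Input row: (group key, independent vars, dependent value).
Row = Tuple[Hashable, Sequence[float], float]
Buffer = Tuple[List[List[float]], List[float]]


def pack_by_group(rows: Sequence[Row], buffer_size: int) -> Dict[Hashable, List[Buffer]]:
    """Pack rows into buffers of at most buffer_size rows, separately per group.

    Hypothetical sketch: groups keep their first-seen order (dicts are
    insertion-ordered in Python 3.7+), and rows keep input order within a group.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be positive")
    groups: Dict[Hashable, List[Tuple[List[float], float]]] = {}
    for key, x, y in rows:
        groups.setdefault(key, []).append((list(x), y))
    packed: Dict[Hashable, List[Buffer]] = {}
    for key, members in groups.items():
        buffers: List[Buffer] = []
        for start in range(0, len(members), buffer_size):
            chunk = members[start:start + buffer_size]
            buffers.append(([x for x, _ in chunk], [y for _, y in chunk]))
        packed[key] = buffers
    return packed
```

Under this assumption, a group with fewer rows than `buffer_size` still gets its own, smaller buffer rather than being merged with another group.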


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-10-23 Thread njayaram2
GitHub user njayaram2 opened a pull request:

https://github.com/apache/madlib/pull/334

Minibatch Preprocessor: Update online doc

The online doc is outdated. This commit adds two new parameters that
have been introduced since the last time the doc was edited.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib doc/minibatch-preprocessor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #334


commit 7e95fc7d936f25e74ceceb74dfa7473c4eda45c8
Author: Nandish Jayaram 
Date:   2018-10-23T17:35:02Z

Minibatch Preprocessor: Update online doc

The online doc is outdated. This commit adds two new parameters that
have been introduced since the last time the doc was edited.




---