[GitHub] madlib issue #338: Install/Dev check: Add new test cases for some modules

2018-11-15 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/338
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/711/



---


[GitHub] madlib issue #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/334
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/710/



---


[GitHub] madlib pull request #338: Install/Dev check: Add new test cases for some mod...

2018-11-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/338


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/334


---


[GitHub] madlib pull request #338: Install/Dev check: Add new test cases for some mod...

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/338#discussion_r234052747
  
--- Diff: src/ports/postgres/modules/pmml/test/pmml.ic.sql_in ---
@@ -0,0 +1,119 @@
+/* --- *//**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ *//* --- */
+DROP TABLE IF EXISTS abalone CASCADE;
+
+CREATE TABLE abalone (
+id integer,
+sex text,
+length double precision,
+diameter double precision,
+height double precision,
+whole double precision,
+shucked double precision,
+viscera double precision,
+shell double precision,
+rings integer
+);
+
+INSERT INTO abalone VALUES
+(3151, 'F', 0.655027, 0.505004, 0.165008, 1.36699, 0.583519, 0.351479, 0.396019, 10),
--- End diff --

Since this is an install check test, can we cut down on the dataset size
and call the `glm` and `pmml` functions only once?


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051398
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
 
 SUMMARY
 
-MiniBatch Preprocessor is a utility function to pre process the input
-data for use with models that support mini-batching as an optimization
+The mini-batch preprocessor is a utility that prepares input data for
+use by models that support mini-batch as an optimization option. (This
+is currently only the case for Neural Networks.) It is effectively a
--- End diff --

s/Neural Networks/Neural Network/


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051252
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
 
 SUMMARY
 
-MiniBatch Preprocessor is a utility function to pre process the input
-data for use with models that support mini-batching as an optimization
+The mini-batch preprocessor is a utility that prepares input data for
+use by models that support mini-batch as an optimization option. (This
+is currently only the case for Neural Networks.) It is effectively a
+packing operation that builds arrays of dependent and independent
+variables from the source data table.
 
-#TODO add more here
+The advantage of using mini-batching is that it can perform better than
+stochastic gradient descent (default MADlib optimizer) because it uses
+more than one training example at a time, typically resulting faster
--- End diff --

Missing the word `in`; this should read `resulting in faster`.
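
For readers unfamiliar with the distinction the docstring draws, here is a minimal sketch of a single mini-batch gradient step. It is not MADlib code: the function name `minibatch_step`, the least-squares linear model, and the sample data are all assumptions chosen for illustration. The point is only that the gradient is averaged over several training examples before the weights are updated, instead of updating after every single example.

```python
def minibatch_step(weights, batch, learning_rate):
    """One gradient step for a least-squares linear model (hypothetical helper).

    `batch` is a list of (features, target) pairs. The gradient is summed
    over every example in the batch and averaged before the update.
    """
    grad = [0.0] * len(weights)
    for features, target in batch:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        for j, x in enumerate(features):
            grad[j] += error * x
    n = len(batch)
    return [w - learning_rate * g / n for w, g in zip(weights, grad)]


# Two made-up training examples processed together as one mini-batch.
batch = [((1.0, 0.0), 1.0), ((0.0, 1.0), 2.0)]
new_weights = minibatch_step([0.0, 0.0], batch, learning_rate=0.5)
```

With a batch of size one, the same function reduces to a plain per-example update.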


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051503
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
 
 SUMMARY
 
-MiniBatch Preprocessor is a utility function to pre process the input
-data for use with models that support mini-batching as an optimization
+The mini-batch preprocessor is a utility that prepares input data for
+use by models that support mini-batch as an optimization option. (This
+is currently only the case for Neural Networks.) It is effectively a
+packing operation that builds arrays of dependent and independent
--- End diff --

Should we instead say `build matrix of independent variable(s) and arrays
of dependent variable`?
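
To make the wording question concrete, here is a rough sketch of the kind of packing being described. It is a plain-Python illustration, not the preprocessor's implementation: `pack_minibatches`, the row layout, and the dictionary keys are assumptions, and the sketch leaves out anything else the real utility may do to the data. Each packed buffer ends up holding a matrix (list of rows) of independent variables and a flat array of dependent values.

```python
def pack_minibatches(rows, buffer_size):
    """Pack (independent_values, dependent_value) rows into buffers.

    Hypothetical helper: every buffer holds a matrix of independent
    variables (one inner list per source row) and an array of the
    matching dependent values.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffers = []
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        buffers.append({
            "independent": [list(x) for x, _ in chunk],
            "dependent": [y for _, y in chunk],
        })
    return buffers


# Three made-up source rows packed two at a time; the last buffer is short.
rows = [((0.1, 0.2), 1), ((0.3, 0.4), 0), ((0.5, 0.6), 1)]
packed = pack_minibatches(rows, buffer_size=2)
```

Seen this way, "matrix of independent variables and array of dependent variable" describes the shape of one buffer more precisely than "arrays" for both.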


---


[GitHub] madlib pull request #334: Minibatch Preprocessor: Update online doc

2018-11-15 Thread kaknikhil
Github user kaknikhil commented on a diff in the pull request:

https://github.com/apache/madlib/pull/334#discussion_r234051175
  
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -508,8 +514,13 @@ class MiniBatchDocumentation:
 dependent_varname,     -- TEXT. Name of the dependent variable column
 independent_varname,   -- TEXT. Name of the independent variable
                           column
-buffer_size            -- INTEGER. Number of source input rows to
-                          pack into batch
+grouping_col           -- TEXT. Default NULL. An expression list used
+                          to group the input dataset into discrete groups
+buffer_size            -- INTEGER. Default computed automatically.
+                          Number of source input rows to pack into batch
--- End diff --

s/batch/buffer/
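
As an illustration of how the two documented parameters could interact, the sketch below groups rows first and then packs each group into buffers of at most `buffer_size` source rows. This is an assumption about the intended semantics based only on the docstring text in the diff; `pack_by_group` and the sample rows are hypothetical, and the sketch takes an explicit `buffer_size` because the docstring does not say how the default is computed.

```python
from collections import defaultdict


def pack_by_group(rows, buffer_size):
    """Group (group_key, features, target) rows, then pack each group.

    Hypothetical helper: returns a dict mapping each group key to a list
    of buffers, where a buffer is a list of at most `buffer_size` rows.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    grouped = defaultdict(list)
    for group_key, features, target in rows:
        grouped[group_key].append((features, target))
    return {
        key: [members[i:i + buffer_size]
              for i in range(0, len(members), buffer_size)]
        for key, members in grouped.items()
    }


# Made-up rows: three in group "F", one in group "M".
rows = [
    ("F", (0.65, 0.50), 10),
    ("M", (0.42, 0.33), 7),
    ("F", (0.58, 0.45), 9),
    ("F", (0.61, 0.48), 11),
]
buffers = pack_by_group(rows, buffer_size=2)
```

Under this reading, `buffer_size` counts source rows per buffer, which is why "buffer" fits the description better than "batch".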


---


[GitHub] madlib issue #338: Install/Dev check: Add new test cases for some modules

2018-11-15 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/338
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/709/



---


[GitHub] madlib pull request #338: Install/Dev check: Add new test cases for some mod...

2018-11-15 Thread njayaram2
GitHub user njayaram2 opened a pull request:

https://github.com/apache/madlib/pull/338

Install/Dev check: Add new test cases for some modules

Some modules such as array_ops and pmml did not have any install check
files, while stemmer did not have any test files. This commit adds some
basic test cases for these modules.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib ic-pmml-stemmer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/338.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #338


commit c351f176b305fb44bd87bc6a4f79c099a3d6fbe3
Author: Nandish Jayaram 
Date:   2018-09-29T00:15:40Z

Install/Dev check: Add new test cases for some modules

Some modules such as array_ops and pmml did not have any install check
files, while stemmer did not have any test files. This commit adds some
basic test cases for these modules.




---