[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread iyerr3
Github user iyerr3 commented on the issue:

https://github.com/apache/madlib/pull/282
  
That's just the way it has to be if it needs to be a link (bold) like the
other functions. There are some other functions in that list that have
parameters and you can get their info by clicking into them. Similarly you
can click drop_cols() to get the parameters info. It's either this or have
the parameters in that same page and make it inconsistent with the other
functions.

On Wed, Jul 11, 2018 at 5:56 PM Frank McQuillan 
wrote:

> looks like user docs lost the params description for dropcols()
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---


[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/282
  
looks like user docs lost the params description for dropcols()


---


[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/282
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/550/



---


[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-11 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/291#discussion_r201876969
  
--- Diff: 
src/ports/postgres/modules/internal/test/unit_tests/test_db_utils.py_in ---
@@ -0,0 +1,60 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import sys
+from os import path
+
+# Add modules module to the pythonpath.

+sys.path.append(path.dirname(path.dirname(path.dirname(path.dirname(path.abspath(__file__))
+
+import unittest
+from mock import *
+import sys
+import plpy_mock as plpy
+
+m4_changequote(`')
+class Vec2ColsTestCase(unittest.TestCase):
--- End diff --

There is another `Vec2ColsTestCase` in `test_vec2cols.py_in`. Is this one 
redundant? 


---


[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-11 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/291#discussion_r201877031
  
--- Diff: 
src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in ---
@@ -243,6 +243,9 @@ class UtilitiesTestCase(unittest.TestCase):
 self.assertFalse(s.is_valid_psql_type('boolean[]', s.INCLUDE_ARRAY 
| s.ONLY_ARRAY))
 self.assertFalse(s.is_valid_psql_type('boolean', s.ONLY_ARRAY))
 self.assertFalse(s.is_valid_psql_type('boolean[]', s.ONLY_ARRAY))
+self.assertTrue(s.is_valid_psql_type('boolean[]', s.ANY_ARRAY))
--- End diff --

Can we move this and corresponding code changes to another commit and PR? 


---


[GitHub] madlib pull request #290: madpack: Add madpack option to run unit tests.

2018-07-11 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/290#discussion_r201876197
  
--- Diff: 
src/ports/postgres/modules/convex/test/unit_tests/test_mlp_igd.py_in ---
@@ -126,10 +126,11 @@ class MLPMiniBatchTestCase(unittest.TestCase):
 'dependent_varname': 
'value',
 'class_values': 
'regression',
 'grouping_cols': 'value',
+'dependent_vartype': 
'value',
 'foo': 'bar'}]
 self.module = self.subject.MLPMinibatchPreProcessor("input")
 self.assertTrue(self.module.preprocessed_summary_dict)
-self.assertEqual(5, len(self.module.preprocessed_summary_dict))
+self.assertEqual(6, len(self.module.preprocessed_summary_dict))
--- End diff --

Let's put this as a separate commit (can be pushed to master already) since 
it's unrelated to this PR. 


---


[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread iyerr3
Github user iyerr3 commented on the issue:

https://github.com/apache/madlib/pull/282
  
@fmcquillan99 Suggested changes have been made in latest commit. 


---


[GitHub] madlib issue #290: madpack: Add madpack option to run unit tests.

2018-07-11 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/290
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/548/



---


[GitHub] madlib issue #290: madpack: Add madpack option to run unit tests.

2018-07-11 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/290
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/547/



---


[GitHub] madlib issue #290: madpack: Add madpack option to run unit tests.

2018-07-11 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/290
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/546/



---


[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-11 Thread ArvindSridhar
GitHub user ArvindSridhar opened a pull request:

https://github.com/apache/madlib/pull/291

Feature: Vector to Columns

JIRA: MADLIB-1240
The vec2cols function enables users to split up a single column into 
multiple columns, given that the input column contains array entries. For 
example, if the input column contained ARRAY[1, 2, 3] in one of its rows, the 
output table will contain 3 different columns, one for each element of the 
array.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib feature/vector-to-columns

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #291


commit 3237acb45c553fc4fc2c20b6e7c9a0b6bec2ffe8
Author: Arvind Sridhar 
Date:   2018-07-06T00:18:55Z

Utilities: Create new function to convert vector to columns

JIRA: MADLIB-1240

The vec2cols function enables users to split up a single column
into multiple columns, given that the input column contains array
entries. For example, if the input column contained ARRAY[1, 2, 3]
in one of its rows, the output table will contain 3 different
columns, one for each element of the array.

Co-authored-by: Nikhil Kak 
Co-authored-by: Nandish Jayaram 

commit 0100da24333fda01fe2b0428d80da2ba5ab9
Author: Arvind Sridhar 
Date:   2018-07-09T23:12:28Z

Internal: Add function to check column type for 1D array

Co-authored-by: Nikhil Kak 

commit 1e8bc328824ea57a0d253834f36e7ca4b0eff26a
Author: Arvind Sridhar 
Date:   2018-07-09T23:14:48Z

Utilities: Add check for whether type is of any array variant

Co-authored-by: Nikhil Kak 




---


[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/282
  
There is a bit of inconsistency related to the last param `cols_to_drop`

```
SELECT madlib.dropcols(
'houses',
'houses_out',
NULL
);
```
results in `houses_out` = 'houses' , i.e., no-op.

But

```
SELECT madlib.dropcols(
'houses',
'houses_out'
);
```
results in an error:

```
ERROR:  function madlib.dropcols(unknown, unknown) does not exist
LINE 1: SELECT madlib.dropcols(
   ^
HINT:  No function matches the given name and argument types. You might 
need to add explicit type casts.
```

I suggest we either make the param `cols_to_drop` mandatory (preferred) or 
make the 2 cases above consistent.

Also in the user docs:

1) please make the `dropcols` function name bold like the other utility 
functions on 
 http://madlib.apache.org/docs/latest/group__grp__utilities.html

2) In the left panel, I would suggest we rename
`Developer Database Functions` to `Utilities`
and rename the top level
`Utility Functions` to `Other Functions`

What to you think?





---


[GitHub] madlib issue #290: madpack: Add madpack option to run unit tests.

2018-07-11 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/290
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/544/



---


[GitHub] madlib pull request #290: madpack: Add madpack option to run unit tests.

2018-07-11 Thread njayaram2
GitHub user njayaram2 opened a pull request:

https://github.com/apache/madlib/pull/290

madpack: Add madpack option to run unit tests.

JIRA: MADLIB-1252

Unit tests in MADlib are written in python files, that are located in
the .../test/unit_tests/ folders, whose names begin with
the prefix "test_". This commit adds a new madpack option to run unit
tests similar to how we run install and dev checks.

- The new option added is: `unit-test`.
- Sample usage (on a postgres database, with MADlib installed on
  database `madlib`):
  * Run unit tests on all modules that have it defined:
  src/bin/madpack -p postgres -c /madlib unit-test
  * Run unit tests only for the `convex` module:
  src/bin/madpack -p postgres -c /madlib unit-test -t convex
  * Run unit tests only for the `convex` and decision trees module:
  src/bin/madpack -p postgres -c /madlib unit-test -t
  convex,recursive_partitioning/decision_tree
- Add command to run all unit tests in Jenkins build script.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib madpack/unit-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #290


commit 5f4395bc8548a5f74ba9a8fcb716f2a39ec51162
Author: Nandish Jayaram 
Date:   2018-07-11T00:32:24Z

madpack: Add madpack option to run unit tests.

JIRA: MADLIB-1252

Unit tests in MADlib are written in python files, that are located in
the .../test/unit_tests/ folders, whose names begin with
the prefix "test_". This commit adds a new madpack option to run unit
tests similar to how we run install and dev checks.

- The new option added is: `unit-test`.
- Sample usage (on a postgres database, with MADlib installed on
  database `madlib`):
  * Run unit tests on all modules that have it defined:
  src/bin/madpack -p postgres -c /madlib unit-test
  * Run unit tests only for the `convex` module:
  src/bin/madpack -p postgres -c /madlib unit-test -t convex
  * Run unit tests only for the `convex` and decision trees module:
  src/bin/madpack -p postgres -c /madlib unit-test -t
  convex,recursive_partitioning/decision_tree
- Add command to run all unit tests in Jenkins build script.




---


[GitHub] madlib issue #287: Fix incorrect dict expansion in table header

2018-07-11 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/287
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/543/



---


[GitHub] madlib issue #287: Fix incorrect dict expansion in table header

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/287
  
This latest commit makes the following changes to use docs:

1) clarify cv for SVM and add user examples
2) clarify cv for elastic net and fix user examples
3) correct rmse calc in MLP

@rahiyer kindly review and correct and merge.

As for testing the CV output:  LGTM

```
SELECT * FROM abalone_svm_gaussian_regression_cv ORDER BY mean_score DESC;
 init_stepsize | lambda |   mean_score   | std_dev_score  
---+++
   1.0 |   0.01 | -4.06711568585 | 0.435966381366
   1.0 |0.1 | -4.08068428345 |  0.44660797513
   1.0 |0.5 | -4.52576046087 |  0.20597876382
  0.01 |   0.01 | -11.0231044189 | 0.739956548721
  0.01 |0.1 | -11.0244799274 | 0.740029346709
  0.01 |0.5 | -11.0305445077 | 0.740350338532
(6 rows)
```


---


[GitHub] madlib pull request #289: RF: Add impurity variable importance

2018-07-11 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/289#discussion_r201728550
  
--- Diff: src/modules/recursive_partitioning/DT_impl.hpp ---
@@ -1512,6 +1512,9 @@ DecisionTree::computeVariableImportance(
 uint64_t maj_count = getMajorityCount(node_index);
 uint64_t min_count =  node_count - maj_count;
 
+if (min_count == 0) {
--- End diff --

I've pushed a commit to master that fixes the need to do this hack. Please 
rebase on master and remove these lines. 


---