[jira] [Resolved] (MADLIB-1089) Install check errors on HAWQ 2.2 when install MADlib on non-default schema

2017-08-09 Thread Frank McQuillan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan resolved MADLIB-1089.
-
Resolution: Fixed

> Install check errors on HAWQ 2.2 when install MADlib on non-default schema
> --
>
> Key: MADLIB-1089
> URL: https://issues.apache.org/jira/browse/MADLIB-1089
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: All Modules
>Reporter: Frank McQuillan
>Priority: Minor
> Fix For: v1.12
>
> Attachments: k-means-IC-fail-on-hawq-2dot2, 
> linalg-IC-fail-on-hawq-2dot2
>
>
> Running install-check on a non-default schema in HAWQ 2.2 results in errors
> for linalg and kmeans.
> {code}
> MADlib version: 1.10.0, git revision: rel/v1.9.1-58-ga3863b6, cmake configuration time: Wed Mar  8 19:49:45 UTC 2017, build type: Release, build system: Linux-2.6.18-238.27.1.el5.hotfix.bz516490, C compiler: gcc 4.4.0, C++ compiler: g++ 4.4.0
>  PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 2.2.0.0 build 4141) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11) compiled on Mar 30 2017 21:45:26
> {code}
> See attached log files and summaries below:
> linalg.sql_in
> {code}
> psql:/tmp/madlib.sGu72l/linalg/test/linalg.sql_in.tmp:165: ERROR:  Function "closest_column(double precision[],double precision[],text)": Invalid distance metric provided: madlib1.squared_dist_norm2. Currently only madlib provided distance functions are supported.
> {code}
> kmeans.sql_in
> {code}
> psql:/tmp/madlib.sGu72l/kmeans/test/kmeans.sql_in.tmp:117: ERROR:  plpy.SPIError: Function "closest_column(double precision[],double precision[],text)": Invalid distance metric provided: madlib1.squared_dist_norm2. Currently only madlib provided distance functions are supported.  (seg1 ip-10-32-127-188.ore6.vpc.pivotal.io:4 pid=483012) (plpython.c:4663)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "internal_compute_kmeanspp_seeding", line 22, in
>     return kmeans.compute_kmeanspp_seeding(**globals())
>   PL/Python function "internal_compute_kmeanspp_seeding", line 154, in compute_kmeanspp_seeding
>   PL/Python function "internal_compute_kmeanspp_seeding", line 415, in update
> PL/Python function "internal_compute_kmeanspp_seeding"
> SQL statement "SELECT  ( SELECT madlib1.internal_compute_kmeanspp_seeding( '_madlib_kmeanspp_args', '_madlib_kmeanspp_state', textin(regclassout( $1 )),  $2 ) )"
> PL/pgSQL function "kmeanspp_seeding" line 83 at assignment
> SQL statement "SELECT  madlib1.kmeans(  $1 ,  $2 , madlib1.kmeanspp_seeding( $1 ,  $2 ,  $3 ,  $4 , NULL,  $5 ),  $4 ,  $6 ,  $7 ,  $8 )"
> PL/pgSQL function "kmeanspp" line 4 at assignment
> SQL statement "SELECT  madlib1.kmeanspp( $1 ,  $2 ,  $3 , 'madlib1.squared_dist_norm2'::VARCHAR, 'madlib1.avg'::VARCHAR, 20::INTEGER, 0.001::DOUBLE PRECISION, 1.0::DOUBLE PRECISION)"
> PL/pgSQL function "kmeanspp" line 4 at assignment
> {code}
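Both logs reject `madlib1.squared_dist_norm2`, which suggests the validation compares the supplied name against the default `madlib` schema rather than the schema MADlib was actually installed into. The sketch below is hypothetical: it does not reproduce MADlib's actual check or its fix, and the function names and the set of supported distance names are illustrative assumptions. It only shows why a hardcoded schema prefix rejects a non-default install and how a schema-aware comparison accepts it.

```python
# Illustrative subset of distance function names (assumed, not MADlib's full list).
SUPPORTED_DISTANCES = {"dist_norm1", "dist_norm2", "squared_dist_norm2"}


def hardcoded_schema_check(fn_name):
    """Hypothetical buggy check: only accepts names in the default 'madlib' schema."""
    return fn_name in {"madlib." + name for name in SUPPORTED_DISTANCES}


def schema_aware_check(fn_name, install_schema):
    """Hypothetical fix: accept names qualified with the schema MADlib was installed into."""
    schema, _, bare_name = fn_name.rpartition(".")
    return schema == install_schema and bare_name in SUPPORTED_DISTANCES


if __name__ == "__main__":
    name = "madlib1.squared_dist_norm2"
    print(hardcoded_schema_check(name))         # False: rejected, as in the logs above
    print(schema_aware_check(name, "madlib1"))  # True: accepted for a 'madlib1' install
```

Comparing against the install schema keeps the "only MADlib-provided functions" restriction while no longer assuming the schema is named `madlib`.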



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MADLIB-1103) Remove pyxb GPL workaround

2017-08-09 Thread Ed Espino (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119507#comment-16119507
 ] 

Ed Espino commented on MADLIB-1103:
---

The fix will be included in the PyXB 1.2.6 release, but it is not clear when
that release will be available. I suggest we push this to the next release.

> Remove pyxb GPL workaround
> --
>
> Key: MADLIB-1103
> URL: https://issues.apache.org/jira/browse/MADLIB-1103
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Roman Shaposhnik
>Priority: Minor
> Fix For: v1.12
>
>
> Upstream PyXB has done the right thing and removed the GPL code:
> https://github.com/pabigot/pyxb/issues/77
> It would be great to remove the workaround from MADlib.





[jira] [Assigned] (MADLIB-1118) Reduce size of elastic net install check table

2017-08-09 Thread Ed Espino (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ed Espino reassigned MADLIB-1118:
-

Assignee: Ed Espino

> Reduce size of elastic net install check table
> --
>
> Key: MADLIB-1118
> URL: https://issues.apache.org/jira/browse/MADLIB-1118
> Project: Apache MADlib
>  Issue Type: Task
>  Components: Module: Regularized Regression
>Reporter: Frank McQuillan
>Assignee: Ed Espino
>Priority: Minor
> Fix For: v1.12
>
>
> IC is taking too long for elastic net.  I would suggest we reduce the size of 
> the input data table.





[jira] [Commented] (MADLIB-1118) Reduce size of elastic net install check table

2017-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119491#comment-16119491
 ] 

ASF GitHub Bot commented on MADLIB-1118:


GitHub user edespino opened a pull request:

https://github.com/apache/incubator-madlib/pull/163

MADLIB-1118. Change tolerance to 1e-2 (from 1e-6)

This reduces the elapsed execution time from 10171 milliseconds to
2252 milliseconds on a Mac with PostgreSQL 9.6.
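Loosening the tolerance shortens the run because an iterative solver stops as soon as its update falls below the threshold; the measured speedup here, 10171 / 2252, is roughly 4.5x. The sketch below is not elastic_net's optimizer. It assumes a hypothetical, linearly convergent fixed-point iteration (`x <- 0.5 * x + 1`) purely to show how the iteration count grows with log(1/tol); the function name is illustrative.

```python
def iterations_to_converge(tol, rate=0.5, max_iter=10_000):
    """Hypothetical solver: iterate x <- rate * x + 1 from x = 0 and stop
    once the step |x_new - x| drops below tol. Returns the iteration count."""
    x = 0.0
    for k in range(1, max_iter + 1):
        x_new = rate * x + 1.0
        if abs(x_new - x) < tol:
            return k
        x = x_new
    return max_iter


if __name__ == "__main__":
    for tol in (1e-6, 1e-2):
        print(f"tol={tol:g}: {iterations_to_converge(tol)} iterations")
```

With a step that halves every iteration, tightening `tol` from 1e-2 to 1e-6 raises the count from 8 to 21: each factor of 10 in the tolerance costs about log2(10), or roughly 3.3, extra iterations. Real solvers converge at different rates, so the actual saving has to be measured, as was done above.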

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edespino/incubator-madlib MADLIB-1138

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #163


commit 8c3bc61047f8a4cab61dd239502d08ede415316f
Author: Ed Espino 
Date:   2017-08-09T06:40:13Z

MADLIB-1118. Change tolerance to 1e-2 (from 1e-6)

This changes the execution elapsed time to 2252 milliseconds from
10171 milliseconds on mac with Postgre 9.6






[jira] [Commented] (MADLIB-1118) Reduce size of elastic net install check table

2017-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120552#comment-16120552
 ] 

ASF GitHub Bot commented on MADLIB-1118:


Github user edespino commented on the issue:

https://github.com/apache/incubator-madlib/pull/163
  
For future reference, this is how I reviewed the elastic_net install-check 
execution:

* Updated the following file: `/src/ports/postgres/modules/elastic_net/test/elastic_net_install_check.sql_in`
  * Added `\timing` to the top of the file.
  * Added `SELECT ASSERT (FALSE, 'Deliberately forced failure');` to the bottom of the file to force a failure condition. This allowed me to review the timing information in the log files from the test execution.
* From the build directory, ran `make install` to push the updated install-check file to the installation directory.
* Ran only the elastic_net test suite (using Postgres): `/usr/local/madlib/bin/madpack -s madlib -p postgres install-check -t elastic_net`

I set the elastic_net_train tolerance to a range of values and reran the
scenario for each one, comparing the recorded `Time:` values.
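Comparing `Time:` values by hand across several tolerance settings gets tedious, so a small script can collect them instead. This is a sketch under stated assumptions: it assumes the log contains psql's `\timing` lines of the form `Time: 1234.567 ms` (newer psql versions may append a `(mm:ss.fff)` suffix, which the pattern ignores), and the script name, function name, and log path argument are hypothetical.

```python
import re
import sys

# Matches psql \timing lines such as "Time: 2252.123 ms" or
# "Time: 10171.456 ms (00:10.171)"; any trailing suffix is ignored.
TIME_RE = re.compile(r"^[ \t]*Time: ([0-9]+(?:\.[0-9]+)?) ms", re.MULTILINE)


def timing_values_ms(log_text):
    """Return every psql timing value (in milliseconds) found in log_text."""
    return [float(value) for value in TIME_RE.findall(log_text)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sum_timing.py <install-check log file>
    with open(sys.argv[1]) as fh:
        values = timing_values_ms(fh.read())
    print(f"{len(values)} timed statements, total {sum(values):.1f} ms")
```

Running it once per tolerance setting gives a single total per run, which is easier to compare than scanning individual `Time:` lines.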




[jira] [Commented] (MADLIB-1134) Neural Networks - MLP - Phase 2

2017-08-09 Thread Cooper Sloan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120655#comment-16120655
 ] 

Cooper Sloan commented on MADLIB-1134:
--

Very good article on regularization for NN:

https://piazza-resources.s3.amazonaws.com/ieytirtomz425i/ifyrgs0anxs3d5/Oct19Lecture.pdf?AWSAccessKeyId=AKIAIEDNRLJ4AZKBW6HA=1502245062=6FGb4J8zhez0uQWsu4xtefaKlFU%3D

> Neural Networks - MLP - Phase 2
> ---
>
> Key: MADLIB-1134
> URL: https://issues.apache.org/jira/browse/MADLIB-1134
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Module: Neural Networks
>Reporter: Frank McQuillan
>Assignee: Cooper Sloan
> Fix For: v1.12
>
>
> Follow on from https://issues.apache.org/jira/browse/MADLIB-413
> Story
> As a MADlib developer, I want to get the 2nd phase implementation of NN going
> with training and prediction functions, so that I can use this to build toward an
> MVP version for GA.
> Features to add:
> * weights for inputs
> * logic for n_tries
> * normalize inputs
> * L2 regularization
> * learning rate policy
> * warm start
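For reference, "L2 regularization" conventionally means adding a penalty (λ/2)‖w‖² to the training loss, which turns each gradient step into w ← w − η(∇L + λw) and shrinks every weight toward zero. The snippet is a generic sketch of that update on plain Python lists; the function name and parameters are hypothetical and do not describe MADlib's MLP interface.

```python
def l2_regularized_step(weights, grads, learning_rate, lam):
    """One gradient-descent step on loss + (lam / 2) * ||w||^2.

    The penalty's gradient is lam * w, so each weight is updated as
    w <- w - learning_rate * (grad + lam * w).
    """
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have the same length")
    return [w - learning_rate * (g + lam * w) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    # With a zero data gradient, the step only decays each weight by (1 - learning_rate * lam).
    print(l2_regularized_step([1.0, -2.0], [0.0, 0.0], learning_rate=0.1, lam=0.5))
```

Setting `lam=0` recovers the plain gradient step, so the penalty strength can be tuned independently of the learning rate policy.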





[jira] [Comment Edited] (MADLIB-1134) Neural Networks - MLP - Phase 2

2017-08-09 Thread Cooper Sloan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120655#comment-16120655
 ] 

Cooper Sloan edited comment on MADLIB-1134 at 8/9/17 9:12 PM:
--

Very good article on regularization for NN:

https://piazza.com/class_profile/get_resource/ieytirtomz425i/ifyrgs0anxs3d5


was (Author: coopersloan):
Very good article on regularization for NN:

https://piazza-resources.s3.amazonaws.com/ieytirtomz425i/ifyrgs0anxs3d5/Oct19Lecture.pdf?AWSAccessKeyId=AKIAIEDNRLJ4AZKBW6HA=1502245062=6FGb4J8zhez0uQWsu4xtefaKlFU%3D



[jira] [Resolved] (MADLIB-1005) Cannot compile for greenplum (arch linux) - AggCheckCallContext issue

2017-08-09 Thread Ed Espino (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ed Espino resolved MADLIB-1005.
---
Resolution: Fixed

This was fixed back in January 2017 with the following commit:

{code}
git show -s 3cf3f6771ab51dd26605ce4d70cd70aee5d896dd
commit 3cf3f6771ab51dd26605ce4d70cd70aee5d896dd
Author: Dave Cramer 
Date:   Wed Jan 11 15:17:01 2017 -0800

Build: Exclude AggCheckCallContext for GPDB5

- Adds build files to compile MADlib with GPDB5
- GPDB5 cherrypicked AggCheckCallContext, we have to exclude it for GPDB5 builds

Closes #83
{code}

I have verified this with GPDB 5 (7789b1a5fd18338b454396d5281a6127c9a9ee8a - 
{{configure --disable-orca --with-python}}) and MADlib 
(4e8616b7a9c0a21326b906ff534d341fab8a5fa4) on CentOS Linux release 7.3.1611 
(Core).

{code}
$ /usr/local/madlib/bin/madpack -s madlib -p greenplum install-check

madpack.py : INFO : Detected Greenplum DB version 5.0.0.
TEST CASE RESULT|Module: array_ops|array_ops.sql_in|PASS|Time: 582 milliseconds
TEST CASE RESULT|Module: bayes|gaussian_naive_bayes.sql_in|PASS|Time: 806 milliseconds
TEST CASE RESULT|Module: bayes|bayes.sql_in|PASS|Time: 2032 milliseconds
TEST CASE RESULT|Module: crf|crf_train_small.sql_in|PASS|Time: 980 milliseconds
TEST CASE RESULT|Module: crf|crf_train_large.sql_in|PASS|Time: 1389 milliseconds
TEST CASE RESULT|Module: crf|crf_test_small.sql_in|PASS|Time: 852 milliseconds
TEST CASE RESULT|Module: crf|crf_test_large.sql_in|PASS|Time: 1019 milliseconds
TEST CASE RESULT|Module: elastic_net|elastic_net_install_check.sql_in|PASS|Time: 79703 milliseconds
TEST CASE RESULT|Module: linalg|svd.sql_in|PASS|Time: 7446 milliseconds
TEST CASE RESULT|Module: linalg|matrix_ops.sql_in|PASS|Time: 6264 milliseconds
TEST CASE RESULT|Module: linalg|linalg.sql_in|PASS|Time: 341 milliseconds
TEST CASE RESULT|Module: prob|prob.sql_in|PASS|Time: 1213 milliseconds
TEST CASE RESULT|Module: sketch|support.sql_in|PASS|Time: 49 milliseconds
TEST CASE RESULT|Module: sketch|mfv.sql_in|PASS|Time: 263 milliseconds
TEST CASE RESULT|Module: sketch|fm.sql_in|PASS|Time: 1782 milliseconds
TEST CASE RESULT|Module: sketch|cm.sql_in|PASS|Time: 6164 milliseconds
TEST CASE RESULT|Module: svm|svm.sql_in|PASS|Time: 13794 milliseconds
TEST CASE RESULT|Module: tsa|arima_train.sql_in|PASS|Time: 3856 milliseconds
TEST CASE RESULT|Module: tsa|arima.sql_in|PASS|Time: 3622 milliseconds
TEST CASE RESULT|Module: conjugate_gradient|conj_grad.sql_in|PASS|Time: 347 milliseconds
TEST CASE RESULT|Module: knn|knn.sql_in|PASS|Time: 483 milliseconds
TEST CASE RESULT|Module: lda|lda.sql_in|PASS|Time: 3117 milliseconds
TEST CASE RESULT|Module: stats|wsr_test.sql_in|PASS|Time: 171 milliseconds
TEST CASE RESULT|Module: stats|t_test.sql_in|PASS|Time: 259 milliseconds
TEST CASE RESULT|Module: stats|robust_and_clustered_variance_coxph.sql_in|PASS|Time: 1125 milliseconds
TEST CASE RESULT|Module: stats|pred_metrics.sql_in|PASS|Time: 1015 milliseconds
TEST CASE RESULT|Module: stats|mw_test.sql_in|PASS|Time: 126 milliseconds
TEST CASE RESULT|Module: stats|ks_test.sql_in|PASS|Time: 336 milliseconds
TEST CASE RESULT|Module: stats|f_test.sql_in|PASS|Time: 127 milliseconds
TEST CASE RESULT|Module: stats|cox_prop_hazards.sql_in|PASS|Time: 2430 milliseconds
TEST CASE RESULT|Module: stats|correlation.sql_in|PASS|Time: 1107 milliseconds
TEST CASE RESULT|Module: stats|chi2_test.sql_in|PASS|Time: 378 milliseconds
TEST CASE RESULT|Module: stats|anova_test.sql_in|PASS|Time: 267 milliseconds
TEST CASE RESULT|Module: svec_util|svec_test.sql_in|PASS|Time: 1567 milliseconds
TEST CASE RESULT|Module: svec_util|gp_sfv_sort_order.sql_in|PASS|Time: 126 milliseconds
TEST CASE RESULT|Module: utilities|text_utilities.sql_in|PASS|Time: 288 milliseconds
TEST CASE RESULT|Module: utilities|sessionize.sql_in|PASS|Time: 421 milliseconds
TEST CASE RESULT|Module: utilities|pivot.sql_in|PASS|Time: 1398 milliseconds
TEST CASE RESULT|Module: utilities|path.sql_in|PASS|Time: 439 milliseconds
TEST CASE RESULT|Module: utilities|encode_categorical.sql_in|PASS|Time: 735 milliseconds
TEST CASE RESULT|Module: utilities|drop_madlib_temp.sql_in|PASS|Time: 165 milliseconds
TEST CASE RESULT|Module: assoc_rules|assoc_rules.sql_in|PASS|Time: 1833 milliseconds
TEST CASE RESULT|Module: convex|mlp.sql_in|PASS|Time: 14029 milliseconds
TEST CASE RESULT|Module: convex|lmf.sql_in|PASS|Time: 3226 milliseconds
TEST CASE RESULT|Module: glm|poisson.sql_in|PASS|Time: 1309 milliseconds
TEST CASE RESULT|Module: glm|ordinal.sql_in|PASS|Time: 1002 milliseconds
TEST CASE RESULT|Module: glm|multinom.sql_in|PASS|Time: 1184 milliseconds
TEST CASE RESULT|Module: glm|inverse_gaussian.sql_in|PASS|Time: 1604 milliseconds
TEST CASE RESULT|Module: glm|gaussian.sql_in|PASS|Time: 1349 milliseconds
TEST CASE RESULT|Module: glm|gamma.sql_in|PASS|Time: 6276 milliseconds
TEST CASE RESULT|Module: glm|binomial.sql_in|PASS|Time: 

[jira] [Commented] (MADLIB-1118) Reduce size of elastic net install check table

2017-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120865#comment-16120865
 ] 

ASF GitHub Bot commented on MADLIB-1118:


Github user njayaram2 commented on the issue:

https://github.com/apache/incubator-madlib/pull/163
  
LGTM, since we don't assert on `relative_error` on `log_likelihood`
in elastic_net anyway.

