[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498722#comment-16498722 ]

Frank McQuillan commented on MADLIB-1172:

{code}
DROP TABLE IF EXISTS dummy_logit_gp, dummy_logit_gp_summary;
SELECT madlib.logregr_train('dummy_data_gp',
                            'dummy_logit_gp',
                            'y',
                            'ARRAY[1,x1,x2,x3,x4,x5]',
                            NULL,
                            20,
                            'irls'
                           );
{code}
now produces a proper error message:
{code}
ERROR:  plpy.Error: Logregr error: No model created possibly due to ill-conditioned data. (plpython.c:4960)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "logregr_train", line 23, in <module>
    return logistic.logregr_train(**globals())
  PL/Python function "logregr_train", line 133, in logregr_train
  PL/Python function "logregr_train", line 349, in __logregr_train_compute
PL/Python function "logregr_train"
{code}
LGTM

> Logistic regression produces empty output table but no error message on Greenplum
>
>                 Key: MADLIB-1172
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1172
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Logistic Regression
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Minor
>             Fix For: v1.15
>
>         Attachments: Logistic-regression-empty-output.ipynb, load-data-sep.sql, load-data-singular.sql, load-data.sql
>
>
> Separated and singular data sets may produce an empty model table on Greenplum 4.3.x. On Postgres 9.6 the same example works OK.
> See the attached Jupyter notebook and data sets for details.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497464#comment-16497464 ]

ASF GitHub Bot commented on MADLIB-1172:

Github user asfgit closed the pull request at: https://github.com/apache/madlib/pull/270
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484418#comment-16484418 ]

ASF GitHub Bot commented on MADLIB-1172:

GitHub user hpandeycodeit opened a pull request:

    https://github.com/apache/madlib/pull/270

    Jira 1172

    JIRA MADLIB-1172: When the model cannot be generated because the input data is ill-conditioned, the output table doesn't get populated. In that case, print the error message below:

    "An Output model cannot be created because of ill-conditioned Data"

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hpandeycodeit/incubator-madlib Jira_1172

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #270

commit 1a53f70e9e8c41971489f8b2e825b0f657be334d
Author: hpandeycodeit
Date:   2018-05-22T18:49:11Z

    JIRA MADLIB-1172

    When the model cannot be generated due to the ill-conditioned input data, the output table output table doen't get populated. In That case, Print the below error message: "An Output model cannot be created because of ill-conditioned Data"

commit 9122dd2bd13e8d4ae9c3e0379468d5caa155a3a6
Author: hpandeycodeit
Date:   2018-05-22T18:54:20Z

    Added an error message when the output table is Empty or model cannot be generated.
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481315#comment-16481315 ]

Himanshu Pandey commented on MADLIB-1172:

The following run, which produces the warning in the logs, does not use grouping:
{code}
select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
{code}
I have tested the same data with grouping, and it returns about 300 records:
{code}
select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]','id',20,'irls');
{code}
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479899#comment-16479899 ]

Frank McQuillan commented on MADLIB-1172:

Do you think this sort of logic makes sense?

case 1:
{code}
If no grouping and underflow/overflow
OR
If grouping and underflow/overflow for all groups
THEN
Error out with appropriate message
{code}

case 2:
{code}
If grouping and at least one valid solution (i.e., at least one group OK)
THEN
Continue but print warning message showing the invalid groups.
{code}

Looks like case 2 is what happens today, more or less. Can you also check what happens in predict with case 2 to make sure it is handled gracefully?
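The two cases above can be sketched as a small decision function. This is only an illustration under stated assumptions: `GroupResult`, `check_group_results`, and the message texts are hypothetical names chosen for this sketch, not MADlib's actual implementation.

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupResult:
    """Hypothetical per-group training outcome (not a MADlib type)."""
    group: Optional[str]  # None when no grouping column was used
    converged: bool       # False if over-/underflow stopped the solver


def check_group_results(results: List[GroupResult]) -> List[GroupResult]:
    """Apply case 1 / case 2 and return the groups that have a valid model."""
    valid = [r for r in results if r.converged]
    failed = [r for r in results if not r.converged]
    if not valid:
        # Case 1: no grouping and the run failed, or every group failed.
        raise ValueError(
            "No model created, possibly due to ill-conditioned data.")
    if failed:
        # Case 2: at least one group is OK, so continue but name the bad ones.
        names = ", ".join(str(r.group) for r in failed)
        warnings.warn("No model created for groups: " + names)
    return valid
```

A run without grouping is represented here as a single `GroupResult` whose `group` is `None`, so both branches of case 1 reduce to "no valid result at all".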
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479460#comment-16479460 ]

Himanshu Pandey commented on MADLIB-1172:

Yes, we can print something like this as a warning: "Please Note: Input data is not correct numerically and may generate improper results."
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478144#comment-16478144 ]

Frank McQuillan commented on MADLIB-1172:

Nice sleuthing. Can we trap this error and give the user a nice message back so the user knows what is going on?
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478113#comment-16478113 ]

Himanshu Pandey commented on MADLIB-1172:

Hi [~fmcquillan],

This looks like a data issue. The following warning is captured in the segment logs:
{code}
2018-05-16 18:51:58.499974 UTC,"gpadmin","gpadmin",p2130,th2056992640,"127.0.0.1","55898",2018-05-16 18:51:58 UTC,5644,con141,cmd40,seg1,slice3,dx2066,x5644,sx1,"WARNING","01000","Over- or underflow in Newton step, while updating coefficients.Input data is likely of poor numerical condition.",,"
{code}
The above warning is inside the logregr_irls_step_final function of the logistic.cpp file:
{code}
if (!state.X_transp_AX.is_finite() || !state.X_transp_Az.is_finite()) {
    //throw NoSolutionFoundException(
    //    "Over- or underflow in intermediate calulation. Input data is "
    //    "likely of poor numerical condition.");
    warning("Over- or underflow in intermediate calulation. Input data is "
            "likely of poor numerical condition.");
    state.status = TERMINATED;
    return state;
}
{code}
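The guard in the C++ snippet above stops the iteration as soon as an intermediate matrix or vector contains a non-finite value. A minimal Python sketch of the same idea follows; `all_finite`, `final_step_status`, and the status strings are hypothetical names for illustration and do not correspond to MADlib's API.

```python
import math
from typing import Iterable, Sequence


def all_finite(values: Iterable[float]) -> bool:
    """Return True only if no value is NaN or +/-infinity."""
    return all(math.isfinite(v) for v in values)


def final_step_status(x_transp_ax: Sequence[Sequence[float]],
                      x_transp_az: Sequence[float]) -> str:
    """Mirror the guard: terminate when an intermediate result is not finite.

    x_transp_ax is a matrix given as a list of rows and x_transp_az is a
    vector; both stand in for the state fields checked in the C++ code.
    """
    flat = [v for row in x_transp_ax for v in row]
    if not all_finite(flat) or not all_finite(x_transp_az):
        return "TERMINATED"
    return "CONTINUE"
```

Because the C++ code only issues a warning and returns a terminated state, a caller has to check that status itself; otherwise the failure stays silent, which is consistent with the empty model table reported in this issue.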
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471207#comment-16471207 ]

Himanshu Pandey commented on MADLIB-1172:

Hi [~fmcquillan],

Here is what I have found so far. The last record (id = 300) is causing this issue, specifically its value in column x5. The value in the data set is 108; when I changed it to 107 or 110, training worked. For example:
{code}
gpadmin=# update dummy_data set x5 = 107 where id = 300;
UPDATE 1
gpadmin=# select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
 logregr_train
---------------

(1 row)

gpadmin=# select * from dummy_logit_gp;
{-60.6963399562406,83.2481369307379,-41.5757740167708,41.6723539072261,-220.233572947378,59.1513318881784} | -0.000188250058233221 | {6379.02591630875,33122.6359459149,16561.3303753767,16561.3157515072,1895.85601499328,25482.6866481577} | {-0.00951498563457205,0.00251333067412484,-0.00251041269477877,0.00251624656715053,-0.116165769555109,0.00232123608883504} | {0.99240825441914,0.997994654370162,0.997996982573486,0.997992327831493,0.907521164983078,0.998147923225947} | {4.36429888587852e-27,1.4262856007256e+36,8.78760978274153e-19,1.25335284111666e+18,2.2582631046692e-96,4.88761553216779e+25} | Infinity | 300 | 0 | 20 | {{40691971.6409387,-211081401.140304,105540486.26844,-105540866.77075,-7621488.81098509,162096329.331243},{-211081401.140304,1097109012.00561,-548554623.142637,548554138.7612,41701724.8827007,-843656994.586568},{105540486.26844,-548554623.142637,274277663.802374,-274276834.289312,-20852051.0896696,421829046.690131},{-105540866.77075,548554138.7612,-274276834.289312,274277179.421121,20849664.1968923,-421827755.523287},{-7621488.81098509,41701724.8827007,-20852051.0896696,20849664.1968923,3594270.02958621,-33175131.4666084},{162096329.331243,-843656994.586568,421829046.690132,-421827755.523287,-33175131.4666084,649367318.808194}}
(1 row)

gpadmin=# update dummy_data set x5 = 110 where id = 300;
UPDATE 1
gpadmin=# select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
 logregr_train
---------------

(1 row)

gpadmin=# select * from dummy_logit_gp;
{-82.8680682455691,198.593133300487,-99.2473096206599,99.3458092682156,-215.747249093631,-29.6039160744594} | -0.000188751550486611 | {3095.02103385595,15902.436542112,7951.16609447363,7951.30599917857,1493.94792706878,12191.3941605613} | {-0.026774638149172,0.0124882204544305,-0.0124821074596392,0.0124942756923804,-0.144414169452974,-0.00242826338682633} | {0.978639481789339,0.990036100695913,0.99004099944,0.990031269691949,0.885173428489227,0.998062528038127} | {1.02531009876533e-36,1.76970931272333e+86,7.89661724612934e-44,1.39745156827503e+43,2.0052117132129e-94,1.39053718214918e-13} | Infinity | 300 | 0 | 20 | {{9579155.20001074,-49013448.3206615,24506250.6929641,-24507193.0554383,-1117659.32579782,37279704.1887547},{-49013448.3206615,252887487.9759,-126442619.757955,126444844.588794,7820179.2181533,-193478250.287115},{24506250.6929641,-126442619.757955,63221042.2619071,-63221565.6815607,-3911332.71616586,96738775.4366319},{-24507193.0554383,126444844.588794,-63221565.6815607,63223267.0925732,3908845.73413929,-96739456.7519012},{-1117659.32579782,7820179.2181533,-3911332.71616586,3908845.73413929,2231880.40879312,-7079677.90789804},{37279704.1887547,-193478250.287115,96738775.4366319,-96739456.7519012,-7079677.90789804,148630091.578168}}
(1 row)
{code}
I will continue investigating and will update with my findings.
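One general reason that fits of this kind are numerically fragile: in textbook IRLS, each row is weighted by p(1 - p), where p is the fitted probability. When the data are (nearly) separated, the linear predictor grows in magnitude from one iteration to the next and these weights collapse toward zero. The sketch below only illustrates that effect with made-up predictor values; it assumes the textbook weight formula and is not taken from MADlib's code, nor is it a diagnosis of this particular data set.

```python
import math


def sigmoid(t: float) -> float:
    """Logistic function, written to avoid overflow in math.exp."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def irls_weight(t: float) -> float:
    """Textbook IRLS weight p * (1 - p) for linear predictor t."""
    p = sigmoid(t)
    return p * (1.0 - p)


# Hypothetical linear-predictor values, from moderate to extreme.
for t in (0.0, 1.0, 10.0, 40.0, 800.0):
    print(t, irls_weight(t))
```

With this implementation the weight for large positive predictors rounds to exactly 0.0 in double precision, so the weighted normal equations lose information from those rows and can become ill-conditioned.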
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464386#comment-16464386 ]

Frank McQuillan commented on MADLIB-1172:

[~hpandey]

Correct, the singular dataset works fine on gp and the separable dataset works fine on gp. But when the dataset is both *singular and separable*, it does not work on gp, i.e., it produces an empty output with no error message or information for the user. This is the data set called "load-data.sql". The notebook shows that "load-data.sql" does, however, work with postgres.

The goal of this JIRA is to investigate the problem with the "load-data.sql" dataset.

The issue is with the data set th
[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum
[ https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461501#comment-16461501 ]

Himanshu Pandey commented on MADLIB-1172:

Hi [~fmcquillan],

I have tested this on 4.3.25; the results are below. Both the singular and the separated datasets work fine and return output, but the regular dataset, load-data.sql, returns an empty model table, which is different from the initial issue.
{code:java}
[gpadmin@gpdb ~]$ psql -f load-data.sql
psql:load-data.sql:1: NOTICE:  table "dummy_data" does not exist, skipping
DROP TABLE
psql:load-data.sql:2: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
[gpadmin@gpdb ~]$ psql
psql (8.2.15)
Type "help" for help.

gpadmin=# \dt
            List of relations
 Schema |    Name    | Type  |  Owner  | Storage
--------+------------+-------+---------+---------
 public | dummy_data | table | gpadmin | heap
(1 row)

gpadmin=# select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
 logregr_train
---------------

(1 row)

gpadmin=# select * from dummy_logit_gp;
 coef | log_likelihood | std_err | z_stats | p_values | odds_ratios | condition_no | num_rows_processed | num_missing_rows_skipped | num_iterations | variance_covariance
------+----------------+---------+---------+----------+-------------+--------------+--------------------+--------------------------+----------------+---------------------
      |                |         |         |          |             |              |                    |                          |              4 |
(1 row)

gpadmin=#
{code}
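The output above is what an "empty" model looks like in practice: a row exists, but every column except num_iterations is NULL. A client-side check for that condition could look like the sketch below. The helper name `model_is_empty` and the dict representation of a row are assumptions made for illustration; only the column names come from the output above.

```python
from typing import Any, Mapping, Optional


def model_is_empty(row: Optional[Mapping[str, Any]]) -> bool:
    """True if the model table has no row or the row has NULL coefficients."""
    return row is None or row.get("coef") is None


# A row shaped like the output above: only num_iterations is populated.
empty_row = {"coef": None, "log_likelihood": None, "num_iterations": 4}
print(model_is_empty(empty_row))
```

Checking `coef` rather than the row count matters here, because `select * from dummy_logit_gp` still reports "(1 row)" for the failed run.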