[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-06-01 Thread Frank McQuillan (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498722#comment-16498722
 ] 

Frank McQuillan commented on MADLIB-1172:
-


{code}
DROP TABLE IF EXISTS dummy_logit_gp, dummy_logit_gp_summary;
SELECT madlib.logregr_train('dummy_data_gp'
  , 'dummy_logit_gp'
  , 'y'
  , 'ARRAY[1,x1,x2,x3,x4,x5]'
  , NULL
  , 20
  , 'irls'
  );
{code}
now produces a proper error message:
{code}
ERROR:  plpy.Error: Logregr error: No model created possibly due to 
ill-conditioned data. (plpython.c:4960)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "logregr_train", line 23, in 
return logistic.logregr_train(**globals())
  PL/Python function "logregr_train", line 133, in logregr_train
  PL/Python function "logregr_train", line 349, in __logregr_train_compute
PL/Python function "logregr_train"
{code}

LGTM

> Logistic regression produces empty output table but no error message on 
> Greenplum
> -
>
> Key: MADLIB-1172
> URL: https://issues.apache.org/jira/browse/MADLIB-1172
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Logistic Regression
>Reporter: Frank McQuillan
>Assignee: Himanshu Pandey
>Priority: Minor
> Fix For: v1.15
>
> Attachments: Logistic-regression-empty-output.ipynb, 
> load-data-sep.sql, load-data-singular.sql, load-data.sql
>
>
> Separated and singular data sets may produce an empty model table on 
> Greenplum 4.3.x.  On Postgres 9.6 the same example works OK. 
> See the attached Jupyter notebook and data sets for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497464#comment-16497464
 ] 

ASF GitHub Bot commented on MADLIB-1172:


Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/270




[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484418#comment-16484418
 ] 

ASF GitHub Bot commented on MADLIB-1172:


GitHub user hpandeycodeit opened a pull request:

https://github.com/apache/madlib/pull/270

Jira 1172

JIRA MADLIB-1172:

When the model cannot be generated because the input data is
ill-conditioned, the output table is not populated.
In that case, print the following error message:
 
"An Output model cannot be created because of ill-conditioned Data"

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hpandeycodeit/incubator-madlib Jira_1172

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #270


commit 1a53f70e9e8c41971489f8b2e825b0f657be334d
Author: hpandeycodeit 
Date:   2018-05-22T18:49:11Z

  JIRA MADLIB-1172

  When the model cannot be generated because the input data is
  ill-conditioned, the output table is not populated.
  In that case, print the following error message:

  "An Output model cannot be created because of ill-conditioned Data"

commit 9122dd2bd13e8d4ae9c3e0379468d5caa155a3a6
Author: hpandeycodeit 
Date:   2018-05-22T18:54:20Z

  Added an error message when the output table is empty or the model cannot be generated.






[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-18 Thread Himanshu Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481315#comment-16481315
 ] 

Himanshu Pandey commented on MADLIB-1172:
-

The following run, which produces the warning in the logs, does not involve 
grouping: 

{code}
select 
madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
{code}

I tested the same data with grouping on id, and it returns about 300 records: 

{code}
select 
madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]','id',20,'irls');
{code}




[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-17 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479899#comment-16479899
 ] 

Frank McQuillan commented on MADLIB-1172:
-

Do you think this sort of logic makes sense?

case 1: 
{code}
If no grouping and underflow/overflow
OR
If grouping and underflow/overflow for all groups
THEN
Error out with appropriate message
{code}


case 2:
{code}
If grouping and at least one valid solution (i.e., at least one group OK)
THEN
Continue but print warning message showing the invalid groups.
{code}

Looks like case 2 is what happens today, more or less.
Can you also check what happens in predict with case 2 to make sure it is
handled gracefully?
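
The two cases above can be sketched in Python as follows. This is only an illustration of the proposed decision rule, not MADlib's implementation: the function name `check_group_results`, the `GroupResult` type, and the message wording are hypothetical.

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupResult:
    # Hypothetical per-group outcome; `group` is None when no grouping is used.
    group: Optional[str]
    has_model: bool  # False if training hit an over-/underflow for this group


def check_group_results(results: List[GroupResult]) -> List[GroupResult]:
    """Apply the proposed rule and return the groups that have a valid model."""
    valid = [r for r in results if r.has_model]
    invalid = [r for r in results if not r.has_model]

    # Case 1: no grouping (a single failed result) or every group failed.
    if not valid:
        raise RuntimeError(
            "No model created, possibly due to ill-conditioned data.")

    # Case 2: at least one group succeeded; continue but name the failed groups.
    if invalid:
        names = ", ".join(str(r.group) for r in invalid)
        warnings.warn("No model created for groups: " + names)
    return valid
```

With this rule, a predict step would only ever see the groups returned in `valid`, which is one way to keep case 2 graceful.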



[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-17 Thread Himanshu Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479460#comment-16479460
 ] 

Himanshu Pandey commented on MADLIB-1172:
-

Yes, we can print something like this as a warning:

"Please Note: Input data is not correct numerically and may generate improper 
results."  





[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-16 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478144#comment-16478144
 ] 

Frank McQuillan commented on MADLIB-1172:
-

Nice sleuthing.

Can we trap this error and give the user a nice message back so the user knows 
what is going on?



[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-16 Thread Himanshu Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478113#comment-16478113
 ] 

Himanshu Pandey commented on MADLIB-1172:
-

Hi [~fmcquillan], 

This looks like a data issue. The following warning is captured in the segment logs:

{code}
2018-05-16 18:51:58.499974 
UTC,"gpadmin","gpadmin",p2130,th2056992640,"127.0.0.1","55898",2018-05-16 
18:51:58 
UTC,5644,con141,cmd40,seg1,slice3,dx2066,x5644,sx1,"WARNING","01000","Over- or 
underflow in Newton step, while updating coefficients.Input data is likely of 
poor numerical condition.",,"

{code}


The above warning comes from the logregr_irls_step_final function in 
logistic.cpp: 

{code}
if (!state.X_transp_AX.is_finite() || !state.X_transp_Az.is_finite()) {
    // throw NoSolutionFoundException(
    //     "Over- or underflow in intermediate calulation. Input data is "
    //     "likely of poor numerical condition.");
    warning("Over- or underflow in intermediate calulation. Input data is "
            "likely of poor numerical condition.");
    state.status = TERMINATED;
    return state;
}
{code}
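
To see why a warning alone can leave the caller with an empty result, here is a minimal Python sketch of the same guard. It is not the MADlib code: `irls_step_final`, `NoSolutionFoundError`, the `strict` flag, and the dictionary-based state are hypothetical stand-ins for the C++ names above.

```python
import math
import warnings


class NoSolutionFoundError(Exception):
    """Hypothetical counterpart of the commented-out C++ exception."""


def irls_step_final(state, strict=False):
    """Check intermediate values; warn and terminate, or raise when strict."""
    values = state["X_transp_AX"] + state["X_transp_Az"]
    if not all(math.isfinite(v) for v in values):
        msg = ("Over- or underflow in intermediate calculation. Input data is "
               "likely of poor numerical condition.")
        if strict:
            raise NoSolutionFoundError(msg)
        # Non-strict path: the caller only gets a status flag, no coefficients.
        warnings.warn(msg)
        state["status"] = "TERMINATED"
        return state
    state["status"] = "IN_PROCESS"
    return state
```

In the non-strict path the message goes wherever warnings are routed (the segment logs in the case above), so the client sees nothing unless the caller inspects the status.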





[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-10 Thread Himanshu Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471207#comment-16471207
 ] 

Himanshu Pandey commented on MADLIB-1172:
-

Hi [~fmcquillan],

Here is what I have found so far. The last record (id = 300) is causing this 
issue; more specifically, its value for column x5. 

The value in the data set is 108. When I changed it to 107 or 110, training worked. 

For example: 

{code}
gpadmin=# update dummy_data set x5 = 107 where id = 300;
UPDATE 1

gpadmin=# select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
 logregr_train 
---
 
(1 row)

gpadmin=# select * from dummy_logit_gp;
 {-60.6963399562406,83.2481369307379,-41.5757740167708,41.6723539072261,-220.233572947378,59.1513318881784} | -0.000188250058233221 | {6379.02591630875,33122.6359459149,16561.3303753767,16561.3157515072,1895.85601499328,25482.6866481577} | {-0.00951498563457205,0.00251333067412484,-0.00251041269477877,0.00251624656715053,-0.116165769555109,0.00232123608883504} | {0.99240825441914,0.997994654370162,0.997996982573486,0.997992327831493,0.907521164983078,0.998147923225947} | {4.36429888587852e-27,1.4262856007256e+36,8.78760978274153e-19,1.25335284111666e+18,2.2582631046692e-96,4.88761553216779e+25} | Infinity | 300 | 0 | 20 | {{40691971.6409387,-211081401.140304,105540486.26844,-105540866.77075,-7621488.81098509,162096329.331243},{-211081401.140304,1097109012.00561,-548554623.142637,548554138.7612,41701724.8827007,-843656994.586568},{105540486.26844,-548554623.142637,274277663.802374,-274276834.289312,-20852051.0896696,421829046.690131},{-105540866.77075,548554138.7612,-274276834.289312,274277179.421121,20849664.1968923,-421827755.523287},{-7621488.81098509,41701724.8827007,-20852051.0896696,20849664.1968923,3594270.02958621,-33175131.4666084},{162096329.331243,-843656994.586568,421829046.690132,-421827755.523287,-33175131.4666084,649367318.808194}}
(1 row)

gpadmin=# update dummy_data set x5 = 110 where id = 300;
UPDATE 1

gpadmin=# select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
 logregr_train 
---
 
(1 row)

gpadmin=# select * from dummy_logit_gp;
 {-82.8680682455691,198.593133300487,-99.2473096206599,99.3458092682156,-215.747249093631,-29.6039160744594} | -0.000188751550486611 | {3095.02103385595,15902.436542112,7951.16609447363,7951.30599917857,1493.94792706878,12191.3941605613} | {-0.026774638149172,0.0124882204544305,-0.0124821074596392,0.0124942756923804,-0.144414169452974,-0.00242826338682633} | {0.978639481789339,0.990036100695913,0.99004099944,0.990031269691949,0.885173428489227,0.998062528038127} | {1.02531009876533e-36,1.76970931272333e+86,7.89661724612934e-44,1.39745156827503e+43,2.0052117132129e-94,1.39053718214918e-13} | Infinity | 300 | 0 | 20 | {{9579155.20001074,-49013448.3206615,24506250.6929641,-24507193.0554383,-1117659.32579782,37279704.1887547},{-49013448.3206615,252887487.9759,-126442619.757955,126444844.588794,7820179.2181533,-193478250.287115},{24506250.6929641,-126442619.757955,63221042.2619071,-63221565.6815607,-3911332.71616586,96738775.4366319},{-24507193.0554383,126444844.588794,-63221565.6815607,63223267.0925732,3908845.73413929,-96739456.7519012},{-1117659.32579782,7820179.2181533,-3911332.71616586,3908845.73413929,2231880.40879312,-7079677.90789804},{37279704.1887547,-193478250.287115,96738775.4366319,-96739456.7519012,-7079677.90789804,148630091.578168}}
(1 row)
{code}

 I will continue investigating and will update with my findings. 

 



[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-04 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464386#comment-16464386
 ] 

Frank McQuillan commented on MADLIB-1172:
-

[~hpandey]
Correct, the singular data set works fine on Greenplum, and the separable 
data set works fine on Greenplum.

But when the data set is both *singular and separable*, it does not work on 
Greenplum, i.e., it produces an empty output with no error message or 
information for the user. This is the data set called "load-data.sql".

The notebook shows that "load-data.sql" does however work with postgres.

The goal of this JIRA is to investigate the problem with the "load-data.sql" 
dataset.
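
As background on why separable data is numerically awkward, the sketch below runs plain Newton steps on a one-feature logistic model in pure Python. It is an assumption-laden toy, not MADlib's IRLS and not the attached data: the function `newton_logit_1d`, the four-point data sets, and the tolerance are made up for illustration, and it shows separation only, not singularity.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def newton_logit_1d(xs, ys, max_iter=20, tol=1e-8):
    """Fit y ~ sigmoid(beta * x) with Newton steps; one feature, no intercept."""
    beta, path = 0.0, []
    for _ in range(max_iter):
        ps = [sigmoid(beta * x) for x in xs]
        grad = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
        hess = sum(x * x * p * (1.0 - p) for x, p in zip(xs, ps))
        if hess == 0.0 or not math.isfinite(hess):
            break  # weights underflowed: no usable Newton step
        step = grad / hess
        beta += step
        path.append(beta)
        if abs(step) < tol:
            return {"converged": True, "beta": beta, "path": path}
    return {"converged": False, "beta": beta, "path": path}


separable = newton_logit_1d([-2, -1, 1, 2], [0, 0, 1, 1])    # sign of x predicts y
overlapping = newton_logit_1d([-2, -1, 1, 2], [0, 1, 0, 1])  # classes overlap
```

On the separable set the coefficient keeps growing at every step and the iteration limit is reached without convergence, while the overlapping set converges in a few steps. With more features and larger magnitudes, the same runaway growth is one plausible route to the over- or underflow discussed in this issue.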


The issue is with the data set th



[jira] [Commented] (MADLIB-1172) Logistic regression produces empty output table but no error message on Greenplum

2018-05-02 Thread Himanshu Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461501#comment-16461501
 ] 

Himanshu Pandey commented on MADLIB-1172:
-

Hi [~fmcquillan] ,

I have tested this on Greenplum 4.3.25; the results are below. 

Both the singular and the separated data sets work fine and return an output, 
but the regular data set load-data.sql returns an empty model table, which is 
different from the initial issue. 

 
{code:java}
[gpadmin@gpdb ~]$ psql -f load-data.sql 
psql:load-data.sql:1: NOTICE: table "dummy_data" does not exist, skipping
DROP TABLE
psql:load-data.sql:2: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- 
Using column named 'id' as the Greenplum Database data distribution key for 
this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make 
sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
[gpadmin@gpdb ~]$ psql
psql (8.2.15)
Type "help" for help.

gpadmin=# \dt
List of relations
Schema | Name | Type | Owner | Storage 
++---+-+-
public | dummy_data | table | gpadmin | heap
(1 row)

gpadmin=# select 
madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
logregr_train 
---
 
(1 row)

gpadmin=# select * from dummy_logit_gp;
coef | log_likelihood | std_err | z_stats | p_values | odds_ratios | condition_no | num_rows_processed | num_missing_rows_skipped | num_iterations | variance_covariance 
--++-+-+--+-+--++--++-
| | | | | | | | | 4 | 
(1 row)

gpadmin=#

{code}
