[jira] [Commented] (MADLIB-1129) Additional output information for k-NN

2017-08-24 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140920#comment-16140920
 ] 

Orhan Kislal commented on MADLIB-1129:
--

I like the options, but I think we should not mix optional and mandatory 
parameters. Maybe we can rename the two id columns and move them after 
output_neighbors (since the id columns depend on output_neighbors).

> Additional output information for k-NN
> --
>
> Key: MADLIB-1129
> URL: https://issues.apache.org/jira/browse/MADLIB-1129
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: k-NN
>Reporter: Frank McQuillan
>Assignee: Himanshu Pandey
>Priority: Minor
>  Labels: starter
> Fix For: v2.0
>
>
> Follow on to
> https://issues.apache.org/jira/browse/MADLIB-927
> List the k-nearest neighbors that were used in the voting/averaging, sorted 
> in ASC order according to the distance function used.  This could be added to 
> the current output table.
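The requested output (the neighbors used in the vote, nearest first) can be illustrated with a minimal Python sketch. It assumes Euclidean distance and majority voting; `knn_with_neighbors` and its return shape are hypothetical and are not the existing MADlib `knn()` interface.

```python
import math
from collections import Counter


def knn_with_neighbors(train, query, k):
    """Classify `query` by majority vote and also return the neighbor ids used.

    `train` is a list of (id, point, label) tuples. The neighbor ids come back
    sorted in ascending order of Euclidean distance, i.e. nearest first.
    """
    by_distance = sorted(train, key=lambda row: math.dist(row[1], query))
    neighbors = by_distance[:k]
    votes = Counter(label for _, _, label in neighbors)
    prediction = votes.most_common(1)[0][0]
    neighbor_ids = [row_id for row_id, _, _ in neighbors]
    return prediction, neighbor_ids
```

In a SQL implementation the same list could be stored as an array column in the existing output table, which is what the issue suggests.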



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MADLIB-1129) Additional output information for k-NN

2017-08-25 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141988#comment-16141988
 ] 

Orhan Kislal commented on MADLIB-1129:
--

I think this looks good. Thanks [~fmcquillan]



[jira] [Created] (MADLIB-1170) LMF performance with GPDB Optimizer

2017-11-01 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1170:


 Summary: LMF performance with GPDB Optimizer
 Key: MADLIB-1170
 URL: https://issues.apache.org/jira/browse/MADLIB-1170
 Project: Apache MADlib
  Issue Type: Improvement
  Components: Module: Matrix Factorisation
Reporter: Orhan Kislal


LMF performance is reduced when the GPDB optimizer (Orca) is enabled. It seems 
LMF creates a corner case where Orca generates a SubPlan for an uncorrelated 
query in the project list.

Once that problem is solved, we should reevaluate LMF performance.





[jira] [Created] (MADLIB-1174) Improve madpack user experience

2017-11-13 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1174:


 Summary: Improve madpack user experience
 Key: MADLIB-1174
 URL: https://issues.apache.org/jira/browse/MADLIB-1174
 Project: Apache MADlib
  Issue Type: Improvement
  Components: Madpack
Reporter: Orhan Kislal
 Fix For: v1.13


1. madpack should behave like psql for connection options, e.g.:
- No host option provided: connect via a Unix-domain socket to a server 
on the local host, or via TCP/IP to localhost on machines that don't have 
Unix-domain sockets.
- -h /foo: Unix-domain socket connection using the socket in /foo.
- -h foo: TCP/IP connection to host foo.
- The default user name is your operating-system user name, as is the 
default database name.
- Environment variables to use: PGDATABASE, PGHOST, PGPORT, PGUSER.

There are two other JIRAs about the password requirements that might be 
related to this improvement: MADLIB-346, MADLIB-780.
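The psql-style rules listed above could be resolved roughly as in the following Python sketch. The function name and the returned dictionary are hypothetical (this is not madpack's code); port 5432 is PostgreSQL's standard default, and, as in libpq, the database name defaults to the resolved user name.

```python
import getpass
import os


def resolve_connection(host=None, port=None, user=None, database=None,
                       env=None):
    """Hypothetical sketch of psql-style defaults for madpack options.

    Precedence: explicit argument, then PGHOST/PGPORT/PGUSER/PGDATABASE,
    then built-in defaults (OS user name, port 5432).
    """
    env = os.environ if env is None else env
    host = host or env.get("PGHOST")
    port = int(port or env.get("PGPORT") or 5432)
    user = user or env.get("PGUSER") or getpass.getuser()
    # Like libpq, the database name defaults to the resolved user name.
    database = database or env.get("PGDATABASE") or user

    if host is None:
        # No host: Unix-domain socket in the default socket directory. A real
        # client would fall back to TCP/IP localhost where sockets don't exist.
        transport, target = "unix", None
    elif host.startswith("/"):
        # A leading slash means a directory containing the socket file.
        transport, target = "unix", host
    else:
        transport, target = "tcp", host
    return {"transport": transport, "host": target, "port": port,
            "user": user, "database": database}
```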





[jira] [Updated] (MADLIB-1174) Improve madpack user experience

2017-11-13 Thread Orhan Kislal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1174:
-
Description: 
1. madpack should behave like psql for connection options, e.g.:
- No host option provided: connect via a Unix-domain socket to a server 
on the local host, or via TCP/IP to localhost on machines that don't have 
Unix-domain sockets.
- -h /foo: Unix-domain socket connection using the socket in /foo.
- -h foo: TCP/IP connection to host foo.
- The default user name is your operating-system user name, as is the 
default database name.
- Environment variables to use: PGDATABASE, PGHOST, PGPORT, PGUSER.

There are two other JIRAs about the password requirements that might be 
related to this improvement: MADLIB-346, MADLIB-780.

2. madpack should be able to install a subset of modules using a flag similar 
to the -t option in install-check.



[jira] [Created] (MADLIB-1175) Uninstall should cleanup symlinks

2017-11-13 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1175:


 Summary: Uninstall should cleanup symlinks
 Key: MADLIB-1175
 URL: https://issues.apache.org/jira/browse/MADLIB-1175
 Project: Apache MADlib
  Issue Type: Improvement
  Components: Build System
Reporter: Orhan Kislal
 Fix For: v1.13


Upgrade/install adds a few symlinks to reflect the new folder structure. We 
need an uninstall script that removes these symlinks when MADlib is uninstalled 
using the rpm or gppkg. Currently, rpm uninstall only removes the files it 
explicitly installs (i.e., all MADlib files except the symlinks).
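A minimal sketch of such an uninstall step in Python, assuming it runs as a post-uninstall hook over the MADlib install prefix. The function names and passing the prefix as `root` are hypothetical; this is not part of the existing rpm or gppkg scripts.

```python
import os


def find_symlinks(root):
    """Return every symbolic link under `root` (links to files and to dirs)."""
    links = []
    # followlinks=False (the default): symlinked directories are listed in
    # `dirnames`, but os.walk does not descend into them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append(path)
    return links


def remove_symlinks(root):
    """Unlink every symlink under `root`; regular files are left alone."""
    links = find_symlinks(root)  # collect first, delete after the walk
    for path in links:
        os.unlink(path)  # removes the link itself, never its target
    return links
```

Collecting the links before deleting them avoids mutating the tree while `os.walk` is still traversing it.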





[jira] [Created] (MADLIB-1176) Consolidate Version Numbers

2017-11-13 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1176:


 Summary: Consolidate Version Numbers
 Key: MADLIB-1176
 URL: https://issues.apache.org/jira/browse/MADLIB-1176
 Project: Apache MADlib
  Issue Type: Improvement
  Components: Build System
Reporter: Orhan Kislal
 Fix For: v1.13


In addition to the regular version number, MADlib uses an additional value, 
MADLIB_GPPKG_VERSION, in deploy/gppkg/CMakeLists.txt. If possible, this 
additional version number should be removed; if not, its use should be 
documented.





[jira] [Comment Edited] (MADLIB-1168) Balance datasets

2017-12-19 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297329#comment-16297329
 ] 

Orhan Kislal edited comment on MADLIB-1168 at 12/19/17 7:55 PM:


The PR implements the aforementioned idea for undersampling without replacement. It 
seems having both row numbers and ordering slows down the process quite a 
bit. An alternative approach would be to handle each class separately. We can 
create a view for a given class (e.g., view_cl1) and use a query like:
{code}
select * from view_cl1 order by random() limit min_count;
{code}
and then return a union of these subqueries. I am not sure this will 
actually improve performance, since we will have multiple queries instead of a 
single one, but it might be worth exploring. [~riyer] any thoughts?
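The per-class alternative can be sketched in Python on in-memory rows. This only illustrates the sampling logic (one random sample of min_count rows per class, then a union); it is not the PR's SQL implementation, and the function name and the `class_of` key function are hypothetical.

```python
import random
from collections import defaultdict


def undersample_without_replacement(rows, class_of, seed=None):
    """Balance `rows` by drawing min_count rows from every class.

    Mirrors "ORDER BY random() LIMIT min_count" on one view per class,
    followed by a union of the per-class results.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)
    if not by_class:
        return []
    # The smallest class size becomes the target size for every class.
    min_count = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        # random.sample draws distinct elements, i.e. without replacement.
        balanced.extend(rng.sample(members, min_count))
    return balanced
```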


was (Author: okislal):
The PR above implements the idea above for undersampling without replacement. It 
seems having both row numbers and ordering will slow down the process quite a 
bit. An alternative approach would be to handle each class separately. We can 
create a view for a given class (e.g., view_cl1) and use a query like:
{code}
select * from view_cl1 order by random() limit min_count;
{code}
and then return a union of these subqueries. I am not sure this will 
actually improve performance, since we will have multiple queries instead of a 
single one, but it might be worth exploring. [~riyer] any thoughts?

> Balance datasets
> 
>
> Key: MADLIB-1168
> URL: https://issues.apache.org/jira/browse/MADLIB-1168
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Sampling
>Reporter: Frank McQuillan
> Fix For: v2.0
>
> Attachments: MADlib Balance Datasets Requirements.pdf, 
> MADlib_Balance_Datasets_Requirements_v2.pdf
>
>
> From [1] here is the motivation behind balancing datasets:
> “Most classification algorithms will only perform optimally when the number 
> of samples of each class is roughly the same. Highly skewed datasets, where 
> the minority is heavily outnumbered by one or more classes, have proven to be 
> a challenge while at the same time becoming more and more common.
> One way of addressing this issue is by re-sampling the dataset as to offset 
> this imbalance with the hope of arriving at a more robust and fair decision 
> boundary than you would otherwise.
> Re-sampling techniques can be divided in these categories:
> * Under-sampling the majority class(es).
> * Over-sampling the minority class.
> * Combining over- and under-sampling.
> * Create ensemble balanced sets.”
> There is an extensive literature on balancing datasets.  The plan for MADlib 
> in the initial phase is to offer basic functionality that can be extended in 
> later phases based on feedback from users.  
> Please see attached document for proposed scope of this story.
> References
> [1] imbalanced-learn Python project
> http://contrib.scikit-learn.org/imbalanced-learn/stable/index.html
> https://github.com/scikit-learn-contrib/imbalanced-learn





[jira] [Commented] (MADLIB-1168) Balance datasets

2017-12-19 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297329#comment-16297329
 ] 

Orhan Kislal commented on MADLIB-1168:
--

The PR above implements the idea above for undersampling without replacement. It 
seems having both row numbers and ordering will slow down the process quite a 
bit. An alternative approach would be to handle each class separately. We can 
create a view for a given class (e.g., view_cl1) and use a query like:
{code}
select * from view_cl1 order by random() limit min_count;
{code}
and then return a union of these subqueries. I am not sure this will 
actually improve performance, since we will have multiple queries instead of a 
single one, but it might be worth exploring. [~riyer] any thoughts?



[jira] [Commented] (MADLIB-1168) Balance datasets

2018-01-05 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314151#comment-16314151
 ] 

Orhan Kislal commented on MADLIB-1168:
--

[~fmcquillan] How should we handle negative values for output_table_size? I 
think raising an error should be fine, but I wanted to make sure that is the 
expected behavior in other similar modules (MADlib or other open source 
projects).
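The "raise an error" option could look like the following Python sketch. The function name is hypothetical, and treating NULL (`None`) as "use the default size" is an assumption, not something decided in this issue.

```python
def validate_output_table_size(output_table_size):
    """Hypothetical check: reject negative sizes; None means use the default."""
    if output_table_size is None:
        return None
    if output_table_size < 0:
        raise ValueError(
            "output_table_size must be non-negative, got %d" % output_table_size)
    return output_table_size
```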

> Balance datasets
> 
>
> Key: MADLIB-1168
> URL: https://issues.apache.org/jira/browse/MADLIB-1168
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Sampling
>Reporter: Frank McQuillan
>Assignee: ssoni
> Fix For: v1.14
>
> Attachments: MADlib Balance Datasets Requirements.pdf, 
> MADlib_Balance_Datasets_Requirements_v2.pdf
>
>
> From [1] here is the motivation behind balancing datasets:
> “Most classification algorithms will only perform optimally when the number 
> of samples of each class is roughly the same. Highly skewed datasets, where 
> the minority is heavily outnumbered by one or more classes, have proven to be 
> a challenge while at the same time becoming more and more common.
> One way of addressing this issue is by re-sampling the dataset as to offset 
> this imbalance with the hope of arriving at a more robust and fair decision 
> boundary than you would otherwise.
> Re-sampling techniques can be divided in these categories:
> * Under-sampling the majority class(es).
> * Over-sampling the minority class.
> * Combining over- and under-sampling.
> * Create ensemble balanced sets.”
> There is an extensive literature on balancing datasets.  The plan for MADlib 
> in the initial phase is to offer basic functionality that can be extended in 
> later phases based on feedback from users.  
> Please see attached document for proposed scope of this story.
> References
> [1] imbalanced-learn Python project
> http://contrib.scikit-learn.org/imbalanced-learn/stable/index.html
> https://github.com/scikit-learn-contrib/imbalanced-learn





[jira] [Created] (MADLIB-1197) Upgrade fails for 1.13

2018-01-11 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1197:


 Summary: Upgrade fails for 1.13
 Key: MADLIB-1197
 URL: https://issues.apache.org/jira/browse/MADLIB-1197
 Project: Apache MADlib
  Issue Type: Bug
  Components: Upgrade
Reporter: Orhan Kislal
 Fix For: v1.14


Upgrading to 1.13 fails with the following error:
{code}
madpack.py: INFO : > - knn
madpack.py: ERROR : TEST CASE RESULTed executing /tmp/madlib.xOnniK/knn/knn.sql_in.tmp
madpack.py: ERROR : Check the log at /tmp/madlib.xOnniK/knn/knn.sql_in.log
Exception:
(, Exception(), )
  File "/usr/local/greenplum-db-4.3.14.0/madlib/Versions/1.13/bin/../madpack/madpack.py", line 1171, in main
    _db_upgrade(schema, dbrev)
  File "/usr/local/greenplum-db-4.3.14.0/madlib/Versions/1.13/bin/../madpack/madpack.py", line 571, in _db_upgrade
    _db_create_objects(schema, None, True, sc)
  File "/usr/local/greenplum-db-4.3.14.0/madlib/Versions/1.13/bin/../madpack/madpack.py", line 734, in _db_create_objects
    raise Exception
madpack.py: ERROR : MADlib upgrade failed.
{code}





[jira] [Commented] (MADLIB-1197) Upgrade fails for 1.13

2018-01-11 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323342#comment-16323342
 ] 

Orhan Kislal commented on MADLIB-1197:
--

The issue seems to be a minor mistake in the k-NN help functions. Here is the 
workaround: 
https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide#InstallationGuide-01/11/18-UpgradingMADlibto1.13



[jira] [Commented] (MADLIB-1200) Pre-processing helper function for mini-batching

2018-02-07 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356188#comment-16356188
 ] 

Orhan Kislal commented on MADLIB-1200:
--

Does it make sense to have a parameter called `standardize`? Algorithms such as 
elastic_net and MLP standardize the independent and dependent variables 
internally. It might be harder to standardize the processed table that is the 
output of minibatch_preprocessor() in those modules.

> Pre-processing helper function for mini-batching 
> -
>
> Key: MADLIB-1200
> URL: https://issues.apache.org/jira/browse/MADLIB-1200
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Utilities
>Reporter: Frank McQuillan
>Priority: Major
> Fix For: v1.14
>
>
> Related to
>  https://issues.apache.org/jira/browse/MADLIB-1037
>  https://issues.apache.org/jira/browse/MADLIB-1048
> Story
> {{As a}}
>  data scientist
>  {{I want to}}
>  pre-process input files for use with mini-batching
>  {{so that}}
>  the optimization part of MLP, SVM, etc. runs faster when I do multiple runs, 
> perhaps because I am tuning parameters (i.e., pre-processing is an occasional 
> operation that I don't want to re-do every time that I train a model)
> Interface
> {code:java}
> minibatch_preprocessor (
> source_table,          -- Name of the table containing the input data.
> output_table,          -- Name of the table suitable for mini-batching.
> dependent_varname,     -- Name of the dependent variable column.
> independent_varname,   -- Expression list to evaluate for the independent variables.
> buffer_size            -- ???
> ){code}
>  
> The main purpose of the function is to prepare the training data for 
> minibatching algorithms. This will be achieved in two stages:
>  # Based on the batch size, group all the dependent and independent variables 
> in a single tuple representative of the batch.
>  # If the independent variables are boolean or text, perform one-hot 
> encoding (N/A for integers and floats). Note that if the integer variables are 
> actually categorical, they must be cast to ::TEXT so that they get encoded.
> Notes
> 1) Random shuffle needed for mini-batch.
>  2) Naive approach may be OK to start, not worth big investment to make run 
> 10% or 20% faster.
> Acceptance
> 1) Convert from standard to special format for mini-batching
>  2) Some scale testing OK (does not need to be comprehensive)
>  3) Document as a helper function user docs
>  4) IC
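The two stages (plus the random shuffle from the notes) can be sketched in plain Python on in-memory rows. This is only an illustration under stated assumptions: rows are dicts, `batch_size` stands in for the still-undecided `buffer_size`, and `minibatch_preprocess` is a hypothetical name, not the MADlib implementation, which would run in SQL.

```python
import random


def one_hot(value, categories):
    """Encode `value` as a 0/1 list over the ordered `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]


def minibatch_preprocess(rows, dependent, independent, batch_size, seed=None):
    """Shuffle rows, one-hot encode bool/text features, then pack each run of
    `batch_size` rows into a single (X, y) tuple.

    `rows` is a list of dicts, `dependent` a column name, and `independent`
    a list of column names. Numeric columns are used as-is.
    """
    rng = random.Random(seed)
    rows = list(rows)
    rng.shuffle(rows)  # random shuffle is needed for mini-batch training

    # Collect the sorted category set of every bool/text column.
    categories = {}
    for col in independent:
        values = {r[col] for r in rows}
        if any(isinstance(v, (bool, str)) for v in values):
            categories[col] = sorted(values, key=str)

    def encode(row):
        features = []
        for col in independent:
            if col in categories:
                features.extend(one_hot(row[col], categories[col]))
            else:
                features.append(float(row[col]))
        return features

    batches = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        X = [encode(r) for r in chunk]
        y = [r[dependent] for r in chunk]
        batches.append((X, y))
    return batches
```

Encoding categories once over the whole input (rather than per batch) keeps the feature layout identical across batches.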



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1025) Support modern versions of gcc

2018-05-10 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470904#comment-16470904
 ] 

Orhan Kislal commented on MADLIB-1025:
--

The string issue can be fixed with the -D_GLIBCXX_USE_CXX11_ABI=0 flag. Thanks to 
[~nikhilkak] and [~riyer] for this fix.

After applying this fix, the IC passes on default builds but not on Release 
builds. It seems the cause of these failures is the compiler optimization 
level: CMake defaults to -O3 for Release builds. If we add -O2 via CMake's 
add_compile_options function, we can eliminate some of these errors. 
Since -O3 optimizes quite aggressively, I believe this change is acceptable. 
We should continue to provide convenience binaries compiled with gcc 4 
to make sure existing users can easily upgrade.

The default build type (RelWithDebInfo) already uses -O2 and should not be 
affected by this issue.

> Support modern versions of gcc
> --
>
> Key: MADLIB-1025
> URL: https://issues.apache.org/jira/browse/MADLIB-1025
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Build System
>Reporter: Rahul Iyer
>Priority: Major
> Fix For: v1.15
>
>
> Compiling with gcc 6.2.0 gives the below error.
> {code}
> [ 84%] Building CXX object 
> src/ports/postgres/9.5/CMakeFiles/madlib_postgresql_9_5.dir/__/__/__/modules/elastic_net/elastic_net_gaussian_fista.cpp.o
> In file included from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/modules/elastic_net/elastic_net_binomial_igd.cpp:5:0:
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/modules/elastic_net/elastic_net_optimizer_igd.hpp:
>  In static member function 'static madlib::dbconnector::postgres::AnyType 
> madlib::modules::elastic_net::Igd<
> Model>::igd_transition(madlib::dbconnector::postgres::AnyType&, const 
> madlib::dbconnector::postgres::Allocator&)':
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/modules/elastic_net/elastic_net_optimizer_igd.hpp:69:46:
>  error: call of overloaded 
> 'log(madlib::modules::HandleTraits rayHandle >::ReferenceToUInt32&)' is ambiguous
>  state.p = 2 * log(state.dimension);
>   ^
> In file included from 
> /usr/local/Cellar/gcc/6.2.0/include/c++/6.2.0/cmath:45:0,
>  from /usr/local/Cellar/gcc/6.2.0/include/c++/6.2.0/math.h:36,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/ports/postgres/dbconnector/../../../../methods/svec/src/pg_gp/SparseData.h:24,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/ports/postgres/dbconnector/../../../../methods/svec/src/pg_gp/sparse_vector.h:10,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/ports/postgres/dbconnector/dbconnector.hpp:39,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/modules/elastic_net/elastic_net_binomial_igd.cpp:2:
> /usr/local/Cellar/gcc/6.2.0/lib/gcc/6/gcc/x86_64-apple-darwin15.6.0/6.2.0/include-fixed/math.h:402:15:
>  note: candidate: double log(double)
>  extern double log(double);
>^~~
> In file included from 
> /usr/local/Cellar/gcc/6.2.0/include/c++/6.2.0/math.h:36:0,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/ports/postgres/dbconnector/../../../../methods/svec/src/pg_gp/SparseData.h:24,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/ports/postgres/dbconnector/../../../../methods/svec/src/pg_gp/sparse_vector.h:10,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/ports/postgres/dbconnector/dbconnector.hpp:39,
>  from 
> /var/folders/rm/g9tb1s_53wb86s5_nrsdbxphgn/T/tmp8WXq3S/madlib-1.9.1/src/modules/elastic_net/elastic_net_binomial_igd.cpp:2:
> /usr/local/Cellar/gcc/6.2.0/include/c++/6.2.0/cmath:365:3: note: candidate: 
> long double std::log(long double)
>log(long double __x)
>^~~
> /usr/local/Cellar/gcc/6.2.0/include/c++/6.2.0/cmath:361:3: note: candidate: 
> float std::log(float)
>log(float __x)
>^~~
> make[3]: *** 
> [src/ports/postgres/9.5/CMakeFiles/madlib_postgresql_9_5.dir/__/__/__/modules/elastic_net/elastic_net_binomial_igd.cpp.o]
>  Error 1
> make[3]: *** Waiting for unfinished jobs
> make[2]: *** 
> [src/ports/postgres/9.5/CMakeFiles/madlib_postgresql_9_5.dir/all] Error 2
> make[1]: *** [all] Error 2
> {code}





[jira] [Commented] (MADLIB-1240) Vector to Columns

2018-05-24 Thread Orhan Kislal (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489436#comment-16489436
 ] 

Orhan Kislal commented on MADLIB-1240:
--

w/ [~njayaram]: It is important to note that PostgreSQL limits the number 
of columns a table can have (1600 at most, fewer for wide column types). The 
function should raise an appropriate error if the vector contains more elements than that.

> Vector to Columns
> -
>
> Key: MADLIB-1240
> URL: https://issues.apache.org/jira/browse/MADLIB-1240
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Utilities
>Reporter: Frank McQuillan
>Priority: Major
> Fix For: v1.15
>
>
> related to https://issues.apache.org/jira/browse/MADLIB-1239
> Vector to Columns
> Converts a feature array in a single column of an output table into multiple 
> columns.  This process can be used to reverse the function cols2vec.
> {code}
> vec2cols(
> source_table,
> out_table,
> vector_col,
> dictionary,
> cols_to_output
> )
> source_table
> TEXT. Name of the table containing the source data.
> out_table
> TEXT. Name of the generated table containing the output. If a table with the 
> same name already exists, an error will be returned. 
> vector_col
> TEXT.  Name of the column containing the feature array.  Must be a 
> one-dimensional array.
> dictionary (optional)
> TEXT. Name of the table containing the array of names associated with the 
> feature array.  This table is created by the function 'cols2vec'.  If the 
> dictionary table is not specified, column names will be automatically 
> generated of the form 'feature_1, feature_2, ...feature_n'
> cols_to_output (optional)
> TEXT, default NULL. Comma-separated string of column names from the source 
> table to keep in the output table, in addition to the feature columns.  To 
> keep all columns from the source table, use '*'.
> Output
> The output table produced by the vec2cols function contains the following 
> columns:
> <...>
> Columns from source table, depending on which ones are kept (if any).
> feature columns
> Columns for each of the features in 'vector_col'.  Column type will depend on 
> the feature array type in the source table.  Column naming will depend on 
> whether the parameter 'dictionary' is used.
> {code}
> Notes
> (1)
> The function
> http://pivotalsoftware.github.io/PDLTools/group__ArrayUtilities.html
> is similar but the proposed MADlib one has more options.  To do the 
> equivalent of the PDL Tools one in MADlib, you would do:
> {code}
> vec2cols(
> table_name,
> output_table,
> vector_column,
> NULL,
> '*'
> )
> {code}
> (2)
> Please put the generated feature columns on the right side of the output 
> table, i.e., they will be the last columns on the right. Maintain the order 
> of the array.
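A minimal Python sketch of how the output query could be generated, including the column-count check raised in the comment on the 1600-column limit. The function name and the `kept_columns` parameter (an already-expanded list standing in for `cols_to_output`) are hypothetical; identifiers are not escaped, so this is an illustration only.

```python
PG_MAX_COLUMNS = 1600  # PostgreSQL's hard upper bound on columns per table


def build_vec2cols_query(source_table, out_table, vector_col, n_features,
                         feature_names=None, kept_columns=()):
    """Hypothetical sketch: CREATE TABLE ... AS SELECT that expands
    `vector_col` into one column per array element.

    Kept source columns come first; feature columns are appended on the
    right in array order. PostgreSQL arrays are 1-indexed.
    """
    if feature_names is None:
        # No dictionary table: generate feature_1 ... feature_n.
        feature_names = ["feature_%d" % i for i in range(1, n_features + 1)]
    if len(feature_names) != n_features:
        raise ValueError("dictionary has %d names but the array has %d elements"
                         % (len(feature_names), n_features))
    total = len(kept_columns) + n_features
    if total > PG_MAX_COLUMNS:
        raise ValueError("output would have %d columns, but PostgreSQL allows "
                         "at most %d per table" % (total, PG_MAX_COLUMNS))
    select_list = list(kept_columns) + [
        '%s[%d] AS "%s"' % (vector_col, i, name)
        for i, name in enumerate(feature_names, start=1)
    ]
    return "CREATE TABLE %s AS SELECT %s FROM %s" % (
        out_table, ", ".join(select_list), source_table)
```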





[jira] [Created] (MADLIB-1247) Separate install-check for better testing

2018-06-19 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1247:


 Summary: Separate install-check for better testing
 Key: MADLIB-1247
 URL: https://issues.apache.org/jira/browse/MADLIB-1247
 Project: Apache MADlib
  Issue Type: Improvement
Reporter: Orhan Kislal


The install-check (IC) is a valuable tool for both developers and users to validate 
the MADlib installation. In most cases, IC uses a small dataset to run the 
MADlib functions and validate the results. While this works well for 
developers, users on large clusters encounter failures because of these small 
dataset sizes (e.g., a table with 10 rows on a cluster with 64 segments). The 
functions should still run, but testing their accuracy on such small data does 
not seem useful to me.

To solve this issue, we can separate the dual functionality of IC: move the 
existing tests to "dev-check" and create a very lightweight install-check that 
calls the functions but does not assert on the results.





[jira] [Created] (MADLIB-1248) Reinstalling on a db that does not have MADlib

2018-06-26 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1248:


 Summary: Reinstalling on a db that does not have MADlib
 Key: MADLIB-1248
 URL: https://issues.apache.org/jira/browse/MADLIB-1248
 Project: Apache MADlib
  Issue Type: Bug
  Components: Madpack
Reporter: Orhan Kislal


Reinstalling on a database that does not already have MADlib installed fails, as 
expected. However, the info messages do not mention this failure. It seems the 
proper error message was lost in the changes for MADLIB-1242 (making madpack 
operations atomic).





[jira] [Created] (MADLIB-1251) IC fails when -t argument is invalid

2018-07-03 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1251:


 Summary: IC fails when -t argument is invalid
 Key: MADLIB-1251
 URL: https://issues.apache.org/jira/browse/MADLIB-1251
 Project: Apache MADlib
  Issue Type: Bug
  Components: Madpack
Reporter: Orhan Kislal


IC fails with an unhandled exception if we pass a nonexistent module name to -t. 
madpack should give a proper error message instead.

Example (passing recursive-partitioning instead of recursive_partitioning):
{code:java}
./src/bin/madpack -p postgres -c /madlib install-check -t recursive-partitioning
madpack.py: INFO : Detected PostgreSQL version 10.3.
Traceback (most recent call last):
  File "/Users/pivotal/workspace/dev-madlib/build-pg10/src/bin/../madpack/madpack.py", line 1318, in
    main(sys.argv[1:])
  File "/Users/pivotal/workspace/dev-madlib/build-pg10/src/bin/../madpack/madpack.py", line 1279, in main
    run_install_check(locals(), args.testcase, args.command[0])
  File "/Users/pivotal/workspace/dev-madlib/build-pg10/src/bin/../madpack/madpack.py", line 967, in run_install_check
    _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
UnboundLocalError: local variable 'test_schema' referenced before assignment
{code}
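One way to get a proper message is to validate the -t value before any schema work starts, so that the DROP SCHEMA step never runs with `test_schema` unassigned (which appears to be what the traceback shows). The Python sketch below is hypothetical, not madpack's code: it assumes -t takes a comma-separated list and that `available_modules` is the set of module names madpack already knows about.

```python
def resolve_testcases(requested, available_modules):
    """Validate a comma-separated -t value up front and return the names.

    Raises ValueError listing unknown names, with a hint when swapping
    '-' for '_' yields a known module (e.g. recursive-partitioning).
    """
    names = [n.strip() for n in requested.split(",") if n.strip()]
    unknown = [n for n in names if n not in available_modules]
    if unknown:
        hints = []
        for name in unknown:
            guess = name.replace("-", "_")
            if guess in available_modules:
                hints.append("%s (did you mean %s?)" % (name, guess))
            else:
                hints.append(name)
        raise ValueError("Unknown module(s) for -t: " + ", ".join(hints))
    return names
```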





[jira] [Created] (MADLIB-1253) Pagerank grouping does not give any output on complete graphs

2018-07-13 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1253:


 Summary: Pagerank grouping does not give any output on complete graphs
 Key: MADLIB-1253
 URL: https://issues.apache.org/jira/browse/MADLIB-1253
 Project: Apache MADlib
  Issue Type: Bug
  Components: Module: Graph
Reporter: Orhan Kislal


If we run pagerank with grouping on a complete (fully connected) graph, there is 
no output. Non-grouping calls do not have this issue.
{code:java}
CREATE TABLE vertex(
id INTEGER
);
CREATE TABLE edge_full(
src INTEGER,
dest INTEGER,
user_id INTEGER
);
INSERT INTO vertex VALUES
(0),
(1),
(2);
INSERT INTO edge_full VALUES
(0, 1, 1),
(0, 2, 1),
(1, 2, 1),
(2, 1, 1),
(1, 0, 1),
(2, 0, 1);

DROP TABLE IF EXISTS pagerank_grp_out_summary, pagerank_grp_out;
SELECT pagerank(
'vertex', -- Vertex table
'id', -- Vertex id column
'edge_full', -- Edge table
'src=src, dest=dest', -- Edge args
'pagerank_grp_out', -- Output table of PageRank
NULL, -- Default damping factor (0.85)
NULL, -- Default max iters (100)
NULL, -- Default threshold
'user_id'); -- Grouping column
{code}
{code:java}
select * from pagerank_grp_out;
 user_id | id | pagerank
---------+----+----------
(0 rows)
{code}
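For comparison, a plain power-iteration PageRank run per group over the same edges shows what a non-empty result should look like. This is a hypothetical reference sketch assuming the standard update PR(v) = (1 - d)/N + d * sum(PR(u)/outdeg(u)); it is not MADlib's implementation, whose convergence and normalization details may differ. On this symmetric 3-vertex complete graph, every vertex should converge to 1/3.

```python
from collections import defaultdict


def grouped_pagerank(vertices, edges, damping=0.85, max_iter=100, threshold=1e-6):
    """Return {group: {vertex: score}} for edges given as (src, dest, group).

    No dangling-vertex handling; that is fine for a complete graph.
    """
    by_group = defaultdict(list)
    for src, dest, group in edges:
        by_group[group].append((src, dest))

    result = {}
    n = len(vertices)
    for group, group_edges in by_group.items():
        out_degree = defaultdict(int)
        for src, _ in group_edges:
            out_degree[src] += 1
        scores = {v: 1.0 / n for v in vertices}
        for _ in range(max_iter):
            new = {v: (1.0 - damping) / n for v in vertices}
            for src, dest in group_edges:
                new[dest] += damping * scores[src] / out_degree[src]
            converged = max(abs(new[v] - scores[v]) for v in vertices) < threshold
            scores = new
            if converged:
                break
        result[group] = scores
    return result


edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 1, 1), (1, 0, 1), (2, 0, 1)]
print(grouped_pagerank([0, 1, 2], edges))
```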



[jira] [Created] (MADLIB-1255) MLP: NaN loss for some hyperparam settings

2018-07-17 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1255:


 Summary: MLP: NaN loss for some hyperparam settings
 Key: MADLIB-1255
 URL: https://issues.apache.org/jira/browse/MADLIB-1255
 Project: Apache MADlib
  Issue Type: Bug
  Components: Module: Neural Networks
Reporter: Orhan Kislal


With [~njayaram]: on the Boston dataset (duplicated to test multiple groups), 
the following query produces a NaN loss.
{code:java}
SELECT setseed(0);
DROP TABLE IF EXISTS temp3;
DROP TABLE IF EXISTS temp3_summary;
DROP TABLE IF EXISTS temp3_standardization;
SELECT madlib.mlp_regression(
  'madlibtestdata.boston_grouping'::varchar,  -- source table
  'temp3'::varchar,                           -- output table
  'ARRAY[crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat]'::varchar,  -- independent variables
  'medv'::varchar,                            -- dependent variable
  ARRAY[100]::integer[],                      -- hidden layer sizes
  'learning_rate_init=0.0025, lambda=0.1, learning_rate_policy=step, gamma=0.8, iterations_per_step=250, n_iterations=1500, tolerance=0, momentum=0'::varchar,  -- optimizer params
  'tanh'::varchar,                            -- activation
  NULL,                                       -- weights
  False,                                      -- warm_start
  False,                                      -- verbose
  'grp_col'                                   -- grouping column
  );
SELECT loss FROM temp3 WHERE grp_col=2;
{code}
Dataset: [https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html]



[jira] [Updated] (MADLIB-1255) MLP: NaN loss for some hyperparam settings

2018-07-17 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1255:
-
Description: 
With [~njayaram]: on the Boston dataset (duplicated to test multiple groups), 
the following query produces a NaN loss. If NaN is an acceptable result, 
we should have a more user-friendly way of erroring out during prediction.
{code:java}
SELECT setseed(0);
DROP TABLE IF EXISTS temp3;
DROP TABLE IF EXISTS temp3_summary;
DROP TABLE IF EXISTS temp3_standardization;
SELECT madlib.mlp_regression(
  'madlibtestdata.boston_grouping'::varchar,
  'temp3'::varchar,
  'ARRAY[crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, 
b, lstat]'::varchar,
  'medv'::varchar,
  ARRAY[100]::integer[],
  'learning_rate_init=0.0025, lambda=0.1, 
learning_rate_policy=step, gamma=0.8, iterations_per_step=250, 
n_iterations=1500, tolerance=0, momentum=0'::varchar,
  'tanh'::varchar,
  NULL,
  False,
  False,
  'grp_col'
  );
SELECT loss FROM temp3 WHERE grp_col=2;
{code}
Dataset: [https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html]

  was:
With [~njayaram]: on the Boston dataset (duplicated to test multiple groups), 
the following query produces a NaN loss.
{code:java}
SELECT setseed(0);
DROP TABLE IF EXISTS temp3;
DROP TABLE IF EXISTS temp3_summary;
DROP TABLE IF EXISTS temp3_standardization;
SELECT madlib.mlp_regression(
  'madlibtestdata.boston_grouping'::varchar,
  'temp3'::varchar,
  'ARRAY[crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, 
b, lstat]'::varchar,
  'medv'::varchar,
  ARRAY[100]::integer[],
  'learning_rate_init=0.0025, lambda=0.1, 
learning_rate_policy=step, gamma=0.8, iterations_per_step=250, 
n_iterations=1500, tolerance=0, momentum=0'::varchar,
  'tanh'::varchar,
  NULL,
  False,
  False,
  'grp_col'
  );
SELECT loss FROM temp3 WHERE grp_col=2;
{code}
Dataset: [https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html]


> MLP: NaN loss for some hyperparam settings
> --
>
> Key: MADLIB-1255
> URL: https://issues.apache.org/jira/browse/MADLIB-1255
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Neural Networks
>Reporter: Orhan Kislal
>Priority: Major
>
> With [~njayaram]: on the Boston dataset (duplicated to test multiple groups), 
> the following query produces a NaN loss. If NaN is an acceptable result, 
> we should have a more user-friendly way of erroring out during prediction.
> {code:java}
> SELECT setseed(0);
> DROP TABLE IF EXISTS temp3;
> DROP TABLE IF EXISTS temp3_summary;
> DROP TABLE IF EXISTS temp3_standardization;
> SELECT madlib.mlp_regression(
>   'madlibtestdata.boston_grouping'::varchar,
>   'temp3'::varchar,
>   'ARRAY[crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, 
> b, lstat]'::varchar,
>   'medv'::varchar,
>   ARRAY[100]::integer[],
>   'learning_rate_init=0.0025, lambda=0.1, 
> learning_rate_policy=step, gamma=0.8, iterations_per_step=250, 
> n_iterations=1500, tolerance=0, momentum=0'::varchar,
>   'tanh'::varchar,
>   NULL,
>   False,
>   False,
>   'grp_col'
>   );
> SELECT loss FROM temp3 WHERE grp_col=2;
> {code}
> Dataset: [https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html]
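One way to get the more user-friendly failure asked for above is to check the per-group loss right after training and name the offending groups. A minimal sketch, assuming the loss values have already been fetched into a dict keyed by grouping value; `check_finite_loss` is a hypothetical helper, not part of MADlib.

```python
import math


def check_finite_loss(loss_by_group):
    """Raise a readable error if any group's training loss is missing, NaN or infinite."""
    bad = sorted(
        group for group, loss in loss_by_group.items()
        if loss is None or not math.isfinite(loss)
    )
    if bad:
        raise ValueError(
            "MLP training produced a non-finite loss for group(s) {0}; "
            "check the optimizer parameters before running "
            "prediction.".format(bad))
    return True
```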



[jira] [Commented] (MADLIB-1260) Remove online examples

2018-07-31 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564210#comment-16564210
 ] 

Orhan Kislal commented on MADLIB-1260:
--

With [~njayaram], we have created a SQL script that runs all three online help 
options ({{()}}, {{('usage')}} and {{('example')}}) for every function. Currently, 
most matrix operations fail on master, but our branch should contain the fix 
for them as well. Please see the attached script.
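A minimal sketch of how such a script could be generated, assuming a list of function names is available; the names below are illustrative only, and this is not the contents of the attached test_online_help.sql. It emits one SELECT per function per help option, and running the output through psql would surface any failing help call.

```python
HELP_OPTIONS = ["", "'usage'", "'example'"]  # (), ('usage'), ('example')


def online_help_statements(functions, schema="madlib"):
    """Yield one SELECT per function per online-help option."""
    for func in functions:
        for option in HELP_OPTIONS:
            yield "SELECT {0}.{1}({2});".format(schema, func, option)


# Hypothetical function list, for illustration only.
for stmt in online_help_statements(["svm_classification", "matrix_mult"]):
    print(stmt)
```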

> Remove online examples
> --
>
> Key: MADLIB-1260
> URL: https://issues.apache.org/jira/browse/MADLIB-1260
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: All Modules
>Reporter: Jingyi Mei
>Priority: Major
> Fix For: v1.15.1
>
>
> For a madlib module, we can call 
> {code:java}
> select madlib_schema.module_name('example');{code}
> to print out examples of this module. They are hard to maintain and not that 
> useful since we already have examples in our user documentation 
> [http://madlib.apache.org/docs/latest/index.html/.|http://madlib.apache.org/docs/latest/index.html/]
>  
> We are going to remove those examples from every module that has them, and 
> make sure MADlib throws a proper error message when a user calls it.



[jira] [Comment Edited] (MADLIB-1260) Remove online examples

2018-07-31 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564210#comment-16564210
 ] 

Orhan Kislal edited comment on MADLIB-1260 at 7/31/18 7:01 PM:
---

With [~njayaram], we have created a SQL script that runs all three online help 
options ({{()}}, {{('usage')}} and {{('example')}}) for every function. Currently, 
most matrix operations (online help) fail on master, but our branch should 
contain the fix for them as well. Please see the attached script.


was (Author: okislal):
w/ [~njayaram] We have created a sql script to run all 3 options of online help 
({{()}}, {{('usage')}} and {{('example')}}) for every function. Currently most 
matrix operations are failing on master but our branch should contain the fix 
for them as well. Please see the attachments for the script.

> Remove online examples
> --
>
> Key: MADLIB-1260
> URL: https://issues.apache.org/jira/browse/MADLIB-1260
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: All Modules
>Reporter: Jingyi Mei
>Priority: Major
> Fix For: v1.15.1
>
> Attachments: test_online_help.sql
>
>
> For a madlib module, we can call 
> {code:java}
> select madlib_schema.module_name('example');{code}
> to print out examples of this module. They are hard to maintain and not that 
> useful since we already have examples in our user documentation 
> [http://madlib.apache.org/docs/latest/index.html/.|http://madlib.apache.org/docs/latest/index.html/]
>  
> We are going to remove those examples from every module that has them, and 
> make sure MADlib throws a proper error message when a user calls it.



[jira] [Updated] (MADLIB-1260) Remove online examples

2018-07-31 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1260:
-
Attachment: test_online_help.sql

[jira] [Comment Edited] (MADLIB-1060) Support expressions for column names in k-NN

2018-08-07 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572475#comment-16572475
 ] 

Orhan Kislal edited comment on MADLIB-1060 at 8/7/18 11:26 PM:
---

Thanks for looking into this JIRA, Himanshu. You might want to check out the 
Path module (utilities) for a similar case. `partition_expr` is used for 
grouping but the basic idea should be similar.


was (Author: okislal):
Thanks for looking into this Jira Himanshu. You might want to check out the 
Path module (utilities) for a similar case. `partition_expr` is used for 
grouping but the basic idea should be similar.

> Support expressions for column names in k-NN
> 
>
> Key: MADLIB-1060
> URL: https://issues.apache.org/jira/browse/MADLIB-1060
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: k-NN
>Reporter: Frank McQuillan
>Assignee: Himanshu Pandey
>Priority: Minor
>  Labels: starter
> Fix For: v2.0
>
>
> Follow on to 
> https://issues.apache.org/jira/browse/MADLIB-927
> {code}
> knn( point_source,
>  point_column_name,
>  label_column_name,
>  test_source,
>  test_column_name,
>  id_column_name,
>  output_table,
>  operation,
>  k
>)
> {code}
> Possible improvements:
> 1) The parameters 'point_column_name' and 'test_column_name' should support 
> regular PostgreSQL expressions.
> 2) Should we infer 'c' or 'r' from the data types, rather than requiring the 
> user to specify it explicitly?
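For improvement 2, a rough sketch of inferring the operation from the label column's PostgreSQL type name. The type groupings are an assumption for illustration, not MADlib's rule: treating integer labels as classes is a design choice, so an explicit override is kept for numeric labels that should be regressed.

```python
# Assumed mapping from PostgreSQL type names to k-NN operations.
CLASSIFICATION_TYPES = {"boolean", "text", "character varying", "character",
                        "integer", "smallint", "bigint"}
REGRESSION_TYPES = {"double precision", "real", "numeric"}


def infer_knn_operation(label_type, override=None):
    """Return 'c' (classification) or 'r' (regression) for a label column type."""
    if override is not None:
        return override
    t = label_type.lower()
    if t in CLASSIFICATION_TYPES:
        return "c"
    if t in REGRESSION_TYPES:
        return "r"
    raise ValueError("Cannot infer operation for label type '{0}'; "
                     "pass 'c' or 'r' explicitly".format(label_type))
```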



[jira] [Commented] (MADLIB-1060) Support expressions for column names in k-NN

2018-08-07 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572475#comment-16572475
 ] 

Orhan Kislal commented on MADLIB-1060:
--

Thanks for looking into this Jira Himanshu. You might want to check out the 
Path module (utilities) for a similar case. `partition_expr` is used for 
grouping but the basic idea should be similar.

[jira] [Commented] (MADLIB-1060) Support expressions for column names in k-NN

2018-08-09 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575527#comment-16575527
 ] 

Orhan Kislal commented on MADLIB-1060:
--

I think you are on the right track. The dev-check for path has a similar 
partition clause (user_id, age_group > 1, income_group > 1).
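To illustrate the general idea, a minimal sketch of accepting an expression for point_column_name or test_column_name: project the expression under a fixed alias in a subquery, so the rest of the generated SQL only refers to the alias. The alias, helper name and sample table are hypothetical, and real code would still need to validate the expression.

```python
def point_source_subquery(table, id_col, point_expr, alias="__knn_point__"):
    """Wrap an arbitrary point expression so downstream SQL can use one alias."""
    return "(SELECT {id}, ({expr}) AS {alias} FROM {table}) AS src".format(
        id=id_col, expr=point_expr, alias=alias, table=table)


print(point_source_subquery("knn_train_data", "id", "ARRAY[data[1], data[2] * 2]"))
```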

[jira] [Created] (MADLIB-1276) Marginal Regression summary tables are lost after the end of session.

2018-09-24 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1276:


 Summary: Marginal Regression summary tables are lost after the end of session.
 Key: MADLIB-1276
 URL: https://issues.apache.org/jira/browse/MADLIB-1276
 Project: Apache MADlib
  Issue Type: Bug
  Components: Module: Logistic Regression
Reporter: Orhan Kislal


The margins_logregr function creates two output tables. The model table is 
created in the correct schema, but the summary table is left in the pg_temp 
schema, which means the summary table will be lost at the end of the session. 
Passing an output table name that includes a schema does not work either; it 
throws the following error:
{code:java}
ERROR:  spiexceptions.FeatureNotSupported: cannot move objects into or out of temporary schemas
{code}

This is caused by the way tables are created in margins_logregr. The function 
creates both the model and the summary tables in a temp schema. The model table 
is then used to create the output table. The summary table, on the other hand, 
needs little processing and is simply renamed. However, the generic rename_table 
function does not change the schema unless the new name includes one.

Copying the contents instead of renaming the summary table seems like an easy 
solution. Since this is only a summary table, its contents should be small, so 
this change should not noticeably affect performance.
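A minimal sketch of the copy-then-drop approach, using Python's built-in sqlite3 as a stand-in for PostgreSQL because SQLite likewise keeps temporary tables in a separate `temp` schema and cannot rename a table into another schema. The table and column names are hypothetical; in margins_logregr the equivalent would be a CREATE TABLE ... AS SELECT into the user-supplied output name followed by dropping the temp table.

```python
import sqlite3


def persist_summary(conn, tmp_table, out_table):
    """Copy a temporary summary table into a permanent one, then drop the temp.

    Renaming would leave the table in the temp schema; copying the (small)
    summary rows lets the new table live wherever out_table points.
    """
    conn.execute("CREATE TABLE {0} AS SELECT * FROM {1}".format(out_table, tmp_table))
    conn.execute("DROP TABLE {0}".format(tmp_table))


conn = sqlite3.connect(":memory:")
# Hypothetical summary columns, only for illustration.
conn.execute("CREATE TEMP TABLE summary_tmp (method TEXT, num_rows_processed INTEGER)")
conn.execute("INSERT INTO summary_tmp VALUES ('margins_logregr', 100)")
persist_summary(conn, "temp.summary_tmp", "main.margins_out_summary")
print(conn.execute("SELECT * FROM main.margins_out_summary").fetchall())
```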



[jira] [Created] (MADLIB-1278) RPM: Upgrade does not work from 1.15 to 1.15.1

2018-10-01 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1278:


 Summary: RPM: Upgrade does not work from 1.15 to 1.15.1
 Key: MADLIB-1278
 URL: https://issues.apache.org/jira/browse/MADLIB-1278
 Project: Apache MADlib
  Issue Type: Bug
  Components: Build System
Reporter: Orhan Kislal


There seems to be an issue with upgrading the MADlib 1.15 RPM. The RPM upgrade 
process works as follows:
{code}
Run the %pre section of the RPM being installed.
Install the files that the RPM provides.
Run the %post section of the RPM.
Run the %preun of the old package.
Delete any old files not overwritten by the newer version. ...
Run the %postun hook of the old package.
{code}
In version 1.15, we added a post-uninstall script that removes the links for the 
Current, bin and doc folders (and also cleans up the Versions folder if it is 
empty). While this makes sense for uninstallation, the upgrade process runs this 
script at the end and removes the newly updated links.

We should provide a script to recreate these links with the release of 1.15.1 
and any other release that supports upgrading from 1.15.
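A rough sketch of what such a relink script could do, assuming the layout implied above: a Versions/<version> directory plus Current, bin and doc symlinks under the install prefix. The exact link targets are assumptions. A related option is for the %postun scriptlet to check its first argument, which RPM sets to 0 on a full erase and to 1 or more on upgrade, and to skip link removal during an upgrade.

```python
import os


def recreate_madlib_links(prefix, version):
    """(Re)create Current -> Versions/<version>, bin -> Current/bin, doc -> Current/doc.

    Existing symlinks (including dangling ones) are replaced; a real
    directory in the way makes os.symlink raise FileExistsError instead
    of being deleted.
    """
    links = [
        ("Current", os.path.join("Versions", version)),
        ("bin", os.path.join("Current", "bin")),
        ("doc", os.path.join("Current", "doc")),
    ]
    for name, target in links:
        path = os.path.join(prefix, name)
        if os.path.islink(path):
            os.remove(path)  # drop a stale or dangling link
        os.symlink(target, path)  # relative link, resolved against prefix
    return links
```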



[jira] [Assigned] (MADLIB-1257) PostgreSQL crashed during random forest training

2018-10-17 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal reassigned MADLIB-1257:


Assignee: Orhan Kislal

> PostgreSQL crashed during random forest training
> 
>
> Key: MADLIB-1257
> URL: https://issues.apache.org/jira/browse/MADLIB-1257
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Random Forest
>Reporter: Rahul Iyer
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v2.0
>
> Attachments: train_data.gz
>
>
> User-reported bug:
> I ran into a problem when training grouped data with random forest (300 
> features). Small data was fine (e.g., 56K instances in 56 groups), but training 
> failed for 240K instances in 250 groups. Postgres forcibly disconnected the 
> session after showing the message below in verbose mode:
> {code:sql}
> NOTICE:  view "__madlib_temp_60124179_1532371657_7130296__" will be a 
> temporary view
> NOTICE:  sql_create_empty_result_table:
> CREATE TABLE analysis.dx_rf_train_output_1 (
> gid integer,
> sample_id   integer,
> tree madlib.bytea8);
> NOTICE:  sql_refresh_training_pois_cnt:
> TRUNCATE TABLE 
> __madlib_temp_91155016_1532371657_5660955__ CASCADE;
> INSERT INTO 
> __madlib_temp_91155016_1532371657_5660955__
> SELECT
> *,
> madlib.poisson_random(1) AS poisson_count
> FROM
> (
> SELECT
> *,
> 0.::double precision AS 
> __madlib_temp_14328459_1532371657_7318497__
> FROM analysis.dxpredict_svec
> ) subq
> WHERE __madlib_temp_14328459_1532371657_7318497__ 
> < 1
> NOTICE:
> src_cnt: 158360,
> oob_cnt: 92418,
> dup_cnt: 250617.
> NOTICE:  Started tree building for all groups
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> PostgreSQL did not capture a detailed log even after I set log_statement 
> to "all"
> 2018-07-23 14:47:50.229 EDT [1090] LOG:  server process (PID 1980) was 
> terminated by signal 11: Segmentation fault
> 2018-07-23 14:47:50.229 EDT [1090] DETAIL:  Failed process was running: 
> SELECT madlib.forest_train('analysis.dxpredict_svec',
>'analysis.dx_rf_train_output_1',
>'rowid',
>'positive',
>'*',
>'rowid,positive,case_icd',
>'case_icd',
>30::integer,
>30::integer,
>TRUE::boolean,
>1::integer,
>10::integer,
>3::integer,
>1::integer,
>10::integer,
>NULL,
>TRUE
>);
> 2018-07-23 14:47:50.229 EDT [1090] LOG:  terminating any other active server 
> processes
> 2018-07-23 14:47:50.229 EDT [1401] WARNING:  terminating connection because 
> of crash of another server process
> {code}
> Another observation: it crashed with 84 groups and 73K instances. In this 
> scenario, I should have more than enough memory and disk. 
> It also seems that as the number of groups increases, it uses a lot of 
> temporary disk space once the data passes a certain number of groups.



[jira] [Commented] (MADLIB-1257) PostgreSQL crashed during random forest training

2018-10-18 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655449#comment-16655449
 ] 

Orhan Kislal commented on MADLIB-1257:
--

I ran the exact same query on OSX 10.13, PG 10.5 and MADlib 1.15.1, and the run 
completed without error. Memory usage did increase over time, up to 4-5 GB, but 
that was not high enough to crash my system (16 GB RAM).

[jira] [Commented] (MADLIB-1257) PostgreSQL crashed during random forest training

2018-10-18 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655487#comment-16655487
 ] 

Orhan Kislal commented on MADLIB-1257:
--

The code itself (random_forest.py_in) has the following line and comment:
{code:java}
with HashaggControl(False):
    # we disable hashagg since large number of groups could
    # result in excessive memory usage.
{code}
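A minimal sketch of what a context manager like HashaggControl(False) does, assuming it turns PostgreSQL's `enable_hashagg` planner setting off for the enclosed queries and restores the previous value afterwards. `hashagg_control` and `FakeSession` are hypothetical stand-ins, not MADlib's implementation. The rationale in the comment matches PostgreSQL 10 behavior: HashAggregate keeps every group in memory and can exceed work_mem when the planner underestimates the number of groups, while the sort-based GroupAggregate spills to disk.

```python
from contextlib import contextmanager


@contextmanager
def hashagg_control(execute, enabled):
    """Temporarily set enable_hashagg, restoring the previous value on exit.

    execute: callable that runs one SQL statement and returns a scalar result.
    """
    previous = execute("SHOW enable_hashagg")  # 'on' or 'off'
    execute("SET enable_hashagg = {0}".format("on" if enabled else "off"))
    try:
        yield
    finally:
        execute("SET enable_hashagg = {0}".format(previous))


class FakeSession(object):
    """Stand-in for a database connection that only tracks settings."""

    def __init__(self):
        self.settings = {"enable_hashagg": "on"}

    def execute(self, sql):
        words = sql.split()
        if words[0] == "SHOW":
            return self.settings[words[1]]
        if words[0] == "SET":
            self.settings[words[1]] = words[3]
        return None


session = FakeSession()
with hashagg_control(session.execute, False):
    print(session.settings["enable_hashagg"])  # off
print(session.settings["enable_hashagg"])  # on
```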

[jira] [Created] (MADLIB-1281) Create SQL scripts to get lists of changed UDOs and UDOCs

2018-10-26 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1281:


 Summary: Create SQL scripts to get lists of changed UDOs and UDOCs
 Key: MADLIB-1281
 URL: https://issues.apache.org/jira/browse/MADLIB-1281
 Project: Apache MADlib
  Issue Type: Bug
  Components: Madpack
Reporter: Orhan Kislal


Currently while upgrading, we use custom scripts to get the list of changed 
UDFs, UDAs and UDTs (diff_udf.sql and diff_udt.sql). We need to create sql 
files that will give us the list of changed UDOs and UDOCs in the same output 
format as UDFs, UDAs and UDTs.

We should also integrate these scripts into our create_changelist.py tool and 
make sure the upgrade tools work correctly with changed UDOs and UDOCs.
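A minimal sketch of the comparison such a script would perform, assuming each version's user-defined operators have been exported as a map from (operator name, left type, right type) to the implementing function, for example from pg_operator joined with pg_type. The data shape, helper name and sample operators are hypothetical; the real scripts would be SQL in the style of diff_udf.sql.

```python
def diff_operators(old_ops, new_ops):
    """Compare {(name, left_type, right_type): function} maps between versions.

    Returns (dropped, added, changed) as sorted lists of operator signatures.
    """
    old_keys, new_keys = set(old_ops), set(new_ops)
    dropped = sorted(old_keys - new_keys)
    added = sorted(new_keys - old_keys)
    changed = sorted(k for k in old_keys & new_keys if old_ops[k] != new_ops[k])
    return dropped, added, changed


# Hypothetical operator catalogs for two versions.
old = {("+", "madlib.svec", "madlib.svec"): "svec_plus",
       ("*", "madlib.svec", "madlib.svec"): "svec_mult"}
new = {("+", "madlib.svec", "madlib.svec"): "svec_plus",
       ("*", "madlib.svec", "madlib.svec"): "svec_mult_v2",
       ("%*%", "madlib.svec", "madlib.svec"): "svec_dot"}
print(diff_operators(old, new))
```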



[jira] [Updated] (MADLIB-1281) Create SQL scripts to get lists of changed UDOs and UDOCs

2018-10-26 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1281:
-
Issue Type: Improvement  (was: Bug)

> Create SQL scripts to get lists of changed UDOs and UDOCs
> -
>
> Key: MADLIB-1281
> URL: https://issues.apache.org/jira/browse/MADLIB-1281
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Madpack
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.16
>
>
> Currently while upgrading, we use custom scripts to get the list of changed 
> UDFs, UDAs and UDTs (diff_udf.sql and diff_udt.sql). We need to create sql 
> files that will give us the list of changed UDOs and UDOCs in the same output 
> format as UDFs, UDAs and UDTs.
> We should also integrate these scripts into our create_changelist.py tool and 
> make sure the upgrade tools work correctly with changed UDOs and UDOCs.



[jira] [Updated] (MADLIB-1281) Create SQL scripts to get lists of changed UDOs and UDOCs

2018-10-26 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1281:
-
Priority: Minor  (was: Major)

> Create SQL scripts to get lists of changed UDOs and UDOCs
> -
>
> Key: MADLIB-1281
> URL: https://issues.apache.org/jira/browse/MADLIB-1281
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Madpack
>Reporter: Orhan Kislal
>Priority: Minor
> Fix For: v1.16
>
>
> Currently while upgrading, we use custom scripts to get the list of changed 
> UDFs, UDAs and UDTs (diff_udf.sql and diff_udt.sql). We need to create sql 
> files that will give us the list of changed UDOs and UDOCs in the same output 
> format as UDFs, UDAs and UDTs.
> We should also integrate these scripts into our create_changelist.py tool and 
> make sure the upgrade tools work correctly with changed UDOs and UDOCs.



[jira] [Updated] (MADLIB-1281) Create SQL scripts to get lists of changed UDOs and UDOCs

2018-10-26 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1281:
-
Fix Version/s: v1.16

[jira] [Commented] (MADLIB-1281) Create SQL scripts to get lists of changed UDOs and UDOCs

2018-10-26 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664918#comment-16664918
 ] 

Orhan Kislal commented on MADLIB-1281:
--

I considered adding checks for UDCs (user-defined casts) as well, but I haven't 
found a way to create a dependency on a cast to test the process. It seems 
PostgreSQL just replaces the cast with its underlying function, even in views.


{code:sql}
madlib=# \dC+ madlib.*
                                   List of casts
    Source type     |    Target type     |      Function       | Implicit? | Description
--------------------+--------------------+---------------------+-----------+-------------
 bigint             | madlib.svec        | svec_cast_int8      | no        |
 double precision   | madlib.svec        | svec_cast_float8    | no        |
 double precision[] | madlib.svec        | svec_cast_float8arr | no        |
 integer            | madlib.svec        | svec_cast_int4      | no        |
 madlib.svec        | double precision[] | svec_return_array   | no        |
 numeric            | madlib.svec        | svec_cast_numeric   | no        |
 real               | madlib.svec        | svec_cast_float4    | no        |
 smallint           | madlib.svec        | svec_cast_int2      | no        |
(8 rows)

madlib=# create view v6 as select '{1}:{1}'::madlib.svec::double precision[];
CREATE VIEW

madlib=# drop cast if exists(madlib.svec as double precision[]);
DROP CAST

madlib=# drop function madlib.svec_return_array(madlib.svec);
ERROR:  cannot drop function madlib.svec_return_array(madlib.svec) because other objects depend on it
DETAIL:  view v6 depends on function madlib.svec_return_array(madlib.svec)
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
{code}

Unless we find a way to make objects depend on a cast, we should update 
upgrade_util.py so that it does NOT remove the drop/create statements for casts. 
Otherwise, if a new version changes the function behind a cast, the change will 
not be reflected in the database.

> Create SQL scripts to get lists of changed UDOs and UDOCs
> -
>
> Key: MADLIB-1281
> URL: https://issues.apache.org/jira/browse/MADLIB-1281
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Madpack
>Reporter: Orhan Kislal
>Priority: Minor
> Fix For: v1.16
>
>
> Currently, while upgrading, we use custom scripts to get the list of changed 
> UDFs, UDAs and UDTs (diff_udf.sql and diff_udt.sql). We need to create SQL 
> files that will give us the list of changed UDOs and UDOCs in the same output 
> format as UDFs, UDAs and UDTs.
> We should also integrate these scripts into our create_changelist.py tool and 
> make sure the upgrade tools work correctly with changed UDOs and UDOCs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MADLIB-1283) Postgres 11 support

2018-11-21 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694384#comment-16694384
 ] 

Orhan Kislal edited comment on MADLIB-1283 at 11/21/18 8:29 AM:


There are a few fixes required for PG11 support (in addition to the build-related 
changes). 

PostgreSQL no longer defines TRUE and FALSE, so we will change them to true and 
false.
We have to use TupleDescAttr(tupdesc, inID) instead of accessing this field 
directly (tupdesc->attrs[inID]).

In addition, we need to change how we access the pg_proc table.
In PG11, the proisagg and proiswindow columns of pg_proc were combined into a 
single column: prokind. These columns are used in two specific places:

knn: easy to add a simple PG version check, since the code is called from Python.
kmeans: messy, since the code is SQL (PL/pgSQL) and we have to parse the version 
string. We might want to move the kmeans code to Python as well.
Finally, we use these columns in the create_changelist.py scripts. When we drop 
support for PG10, we should update them as well.


was (Author: okislal):
There are a few fixes required for PG11 support (in addition to the build-related 
changes). 

PostgreSQL no longer defines TRUE and FALSE, so we will change them to true and 
false.
We have to use TupleDescAttr(tupdesc, inID) instead of accessing this field 
directly (tupdesc->attrs[inID]).

In addition, we need to change how we access the pg_proc table.
In PG11, the proisagg and proiswindow columns of pg_proc were combined into a 
single column: prokind. These columns are used in two specific places:

knn: easy to add a simple PG version check, since the code is called from Python.
kmeans: messy, since the code is SQL (PL/pgSQL) and we have to parse the version 
string. We might want to move the kmeans code to Python as well.
Finally, we use these columns in the create_changelist.py scripts. When we drop 
support for PG10, we should update them as well.

> Postgres 11 support
> ---
>
> Key: MADLIB-1283
> URL: https://issues.apache.org/jira/browse/MADLIB-1283
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Darafei Praliaskouski
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.16
>
>
> Postgres 11.1 just got released. MADlib doesn't support it yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1283) Postgres 11 support

2018-11-21 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694384#comment-16694384
 ] 

Orhan Kislal commented on MADLIB-1283:
--

There are a few fixes required for PG11 support (in addition to the build-related 
changes). 

PostgreSQL no longer defines TRUE and FALSE, so we will change them to true and 
false.
We have to use TupleDescAttr(tupdesc, inID) instead of accessing this field 
directly (tupdesc->attrs[inID]).

In addition, we need to change how we access the pg_proc table.
In PG11, the proisagg and proiswindow columns of pg_proc were combined into a 
single column: prokind. These columns are used in two specific places:

knn: easy to add a simple PG version check, since the code is called from Python.
kmeans: messy, since the code is SQL (PL/pgSQL) and we have to parse the version 
string. We might want to move the kmeans code to Python as well.
Finally, we use these columns in the create_changelist.py scripts. When we drop 
support for PG10, we should update them as well.
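For the knn case, the version check could look like the following sketch. It assumes the caller already has the server's integer version (for example from `SHOW server_version_num`, which reports 110001 for 11.1) and only builds the catalog predicate. The helper names are hypothetical, not MADlib's. In PostgreSQL 11, `prokind` is `'a'` for aggregates and `'w'` for window functions, replacing the boolean `proisagg` and `proiswindow` columns.

```python
PG11_VERSION_NUM = 110000  # server_version_num of PostgreSQL 11.0


def aggregate_predicate(server_version_num, alias="p"):
    """pg_proc predicate that matches aggregate functions on this server version."""
    if server_version_num >= PG11_VERSION_NUM:
        return "{0}.prokind = 'a'".format(alias)
    return "{0}.proisagg".format(alias)


def window_predicate(server_version_num, alias="p"):
    """pg_proc predicate that matches window functions on this server version."""
    if server_version_num >= PG11_VERSION_NUM:
        return "{0}.prokind = 'w'".format(alias)
    return "{0}.proiswindow".format(alias)


def list_aggregates_query(server_version_num):
    """Query listing aggregates per schema, valid both before and after PG11."""
    return (
        "SELECT n.nspname, p.proname "
        "FROM pg_catalog.pg_proc p "
        "JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace "
        "WHERE " + aggregate_predicate(server_version_num)
    )
```

For the PL/pgSQL kmeans code, comparing `current_setting('server_version_num')::int` against 110000 might avoid parsing the full version string; that is a suggestion, not what this ticket settled on.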

> Postgres 11 support
> ---
>
> Key: MADLIB-1283
> URL: https://issues.apache.org/jira/browse/MADLIB-1283
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Darafei Praliaskouski
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.16
>
>
> Postgres 11.1 just got released. MADlib doesn't support it yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1283) Postgres 11 support

2018-11-21 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695034#comment-16695034
 ] 

Orhan Kislal commented on MADLIB-1283:
--

It seems the initial design of kmeans deliberately eschewed the Python option in 
favor of PL/pgSQL. Here is the relevant comment from kmeans.sql_in, lines 
801-803:

{code:sql}
-- We first setup the argument table. Rationale: We want to avoid all data
-- conversion between native types and Python code. Instead, we use Python
-- as a pure driver layer.
{code}


> Postgres 11 support
> ---
>
> Key: MADLIB-1283
> URL: https://issues.apache.org/jira/browse/MADLIB-1283
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Darafei Praliaskouski
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.16
>
>
> Postgres 11.1 just got released. MADlib doesn't support it yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (MADLIB-1283) Postgres 11 support

2018-12-14 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal resolved MADLIB-1283.
--
Resolution: Fixed

> Postgres 11 support
> ---
>
> Key: MADLIB-1283
> URL: https://issues.apache.org/jira/browse/MADLIB-1283
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Darafei Praliaskouski
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.16
>
>
> Postgres 11.1 just got released. MADlib doesn't support it yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MADLIB-1061) Additional computation methods for k-NN - kd tree

2019-02-01 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1061:
-
Attachment: KNN-raw.pdf
KNN-charts.pdf
KNN-chart-data.pdf

> Additional computation methods for k-NN - kd tree
> -
>
> Key: MADLIB-1061
> URL: https://issues.apache.org/jira/browse/MADLIB-1061
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: k-NN
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Major
>  Labels: starter
> Fix For: v1.16
>
> Attachments: KNN-chart-data.pdf, KNN-charts.pdf, KNN-raw.pdf, 
> KNN-w-KD-tree-leaf-node-only.pdf, Sheet1-KNN-perf-num-features.pdf, 
> Sheet2-KNN-tree-construction.pdf, Sheet3-KNN-tree-depth.pdf
>
>
> Follow on to
> https://issues.apache.org/jira/browse/MADLIB-927
> which uses brute force.
> Determine other k-NN algos to implement.  From 
> http://scikit-learn.org/stable/modules/neighbors.html
> candidates are:
> * K-D Tree
> * Ball Tree
> * Other?
> This JIRA is to implement K-D tree.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1061) Additional computation methods for k-NN - kd tree

2019-02-01 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758288#comment-16758288
 ] 

Orhan Kislal commented on MADLIB-1061:
--

I have attached some updated charts (and data) for the kd-tree-enabled knn. They 
show significant performance improvements with acceptable losses in accuracy. 
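The trade-off described above (faster, slightly less accurate) is what one would expect if the search is restricted to the kd-tree region containing the query, as the attachment name KNN-w-KD-tree-leaf-node-only hints; that reading is an assumption. The Python sketch below is a generic kd-tree, not MADlib's implementation: it splits on alternating axes at the median, routes a query to its leaf, and brute-forces only that leaf, alongside an exact brute-force search for comparison. Names such as `leaf_size` are illustrative.

```python
import heapq
import math


def build_kdtree(points, depth=0, leaf_size=2):
    """Recursively split points at the median of one axis, cycling through axes."""
    if len(points) <= leaf_size:
        return {"leaf": True, "points": points}
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": ordered[mid][axis],
        "left": build_kdtree(ordered[:mid], depth + 1, leaf_size),
        "right": build_kdtree(ordered[mid:], depth + 1, leaf_size),
    }


def leaf_points(tree, query):
    """Walk from the root to the leaf whose region contains the query."""
    node = tree
    while not node["leaf"]:
        side = "left" if query[node["axis"]] < node["split"] else "right"
        node = node[side]
    return node["points"]


def knn_leaf_only(tree, query, k):
    """Approximate k-NN: brute force restricted to the query's leaf (may miss neighbors)."""
    return heapq.nsmallest(k, leaf_points(tree, query), key=lambda p: math.dist(p, query))


def knn_brute_force(points, query, k):
    """Exact k-NN over all points, for comparison."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


if __name__ == "__main__":
    pts = [(0, 0), (1, 1), (2, 2), (8, 8), (9, 9), (10, 10)]
    tree = build_kdtree(pts)
    q = (0.2, 0.1)
    print(knn_brute_force(pts, q, 2))  # exact: the two nearest points
    print(knn_leaf_only(tree, q, 2))   # the leaf holds only (0, 0), so one neighbor is missed
```

A query near a split boundary can miss true neighbors in the sibling leaf, which is where the accuracy loss comes from. An exact kd-tree search would also backtrack into sibling regions that could still contain a closer point.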

> Additional computation methods for k-NN - kd tree
> -
>
> Key: MADLIB-1061
> URL: https://issues.apache.org/jira/browse/MADLIB-1061
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: k-NN
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Major
>  Labels: starter
> Fix For: v1.16
>
> Attachments: KNN-chart-data.pdf, KNN-charts.pdf, KNN-raw.pdf, 
> KNN-w-KD-tree-leaf-node-only.pdf, Sheet1-KNN-perf-num-features.pdf, 
> Sheet2-KNN-tree-construction.pdf, Sheet3-KNN-tree-depth.pdf
>
>
> Follow on to
> https://issues.apache.org/jira/browse/MADLIB-927
> which uses brute force.
> Determine other k-NN algos to implement.  From 
> http://scikit-learn.org/stable/modules/neighbors.html
> candidates are:
> * K-D Tree
> * Ball Tree
> * Other?
> This JIRA is to implement K-D tree.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1317) Multinomial results not matching with R method

2019-04-04 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810423#comment-16810423
 ] 

Orhan Kislal commented on MADLIB-1317:
--

w/ [~khannaekta]
Hi Pratik,

We checked the output of multinom with a few sample datasets and they seemed to 
match the R output. 

The ref_category is an optional parameter. If you leave it as NULL, MADlib 
should pick the first category (as the R function does). 

Could you try it without the ref_category and give us a sample dataset that 
exhibits this behavior?

> Multinomial results not matching with R method
> --
>
> Key: MADLIB-1317
> URL: https://issues.apache.org/jira/browse/MADLIB-1317
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Multinomial Logistic Regression
>Reporter: Pratik
>Priority: Major
>
> Hi team,
> I am using the MADlib multinomial method on my dataset with a categorical 
> independent variable (one-hot encoded), as below. 
>  
> {code:java}
> SELECT
>     CASE WHEN multinom IS NOT NULL THEN TRUE ELSE FALSE END
> FROM
>  madlib.multinom(
> 'TEMP_TEST_1',
> 'TEMP_TEST_1_OP',
>     'dep_var_col',
>     'ARRAY[ 1,hot_encoded_GENDER_col_val1, hot_encoded_GENDER_col_val2]',
>     '1',--REF CATEGORY 
>     'logit',
>     NULL,
>     'max_iter=100,optimizer=irls,tolerance=0.0001',
>     TRUE
>  );{code}
> Since gender is a categorical column, I am one-hot encoding it into 2 columns (0|1). 
> When comparing the results with R's method, the coefficients match, but the StdErr 
> and p-values are way off.
> R method -
> {code:java}
> nnet::multinom
> {code}
>  
> Is there anything special I need to do for multinom, or is it a bug? 
> Or is there a particular way I need to use R to compare results with multinom?
> *UPDATE:*
> Is it mandatory to have a ref_category-like column for a categorical 
> independent variable? 
> hot encoded GENDER_col_val1 from list of independent variables and the results 
> are matching with R's output.
>  
> Is there any documentation or reference to confirm this? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1317) Multinomial results not matching with R method

2019-04-15 Thread Orhan Kislal (JIRA)


[ 
https://issues.apache.org/jira/browse/MADLIB-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818226#comment-16818226
 ] 

Orhan Kislal commented on MADLIB-1317:
--

w/ [~khannaekta]

Hi Pratik,

One-hot encoding is the preferred way to handle categorical independent 
variables for multinomial regression. Closing this JIRA per your confirmation. 
Let us know if you have any other questions.
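The user's update (results match R after dropping one of the two gender indicator columns) is consistent with the well-known dummy-variable trap. With an intercept column of 1s, the indicators for every level of a category sum to that intercept, so the design matrix is rank-deficient and the standard errors become unreliable. That explanation is an inference from the update, not a confirmed root cause. The following minimal sketch encodes a column with one level dropped as the reference; the function names are hypothetical.

```python
def one_hot_with_reference(values, reference=None):
    """Encode a categorical column as an intercept plus k-1 indicator columns.

    The reference level (default: the first level in sorted order) gets no column,
    so the indicators can never sum to the intercept column.
    """
    levels = sorted(set(values))
    ref = levels[0] if reference is None else reference
    if ref not in levels:
        raise ValueError("reference level %r not found" % (ref,))
    kept = [lvl for lvl in levels if lvl != ref]
    # The leading 1 is the intercept, mirroring ARRAY[1, ...] in the multinom call.
    rows = [[1] + [1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


def intercept_is_sum_of_indicators(rows):
    """True when the non-intercept columns add up to the intercept in every row."""
    return all(row[0] == sum(row[1:]) for row in rows)


if __name__ == "__main__":
    kept, rows = one_hot_with_reference(["F", "M", "M", "F"])
    print(kept, rows)
    print(intercept_is_sum_of_indicators(rows))  # the collinearity check on the dropped-reference encoding
```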


> Multinomial results not matching with R method
> --
>
> Key: MADLIB-1317
> URL: https://issues.apache.org/jira/browse/MADLIB-1317
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Multinomial Logistic Regression
>Reporter: Pratik
>Priority: Major
>
> Hi team,
> I am using the MADlib multinomial method on my dataset with a categorical 
> independent variable (one-hot encoded), as below. 
>  
> {code:java}
> SELECT
>     CASE WHEN multinom IS NOT NULL THEN TRUE ELSE FALSE END
> FROM
>  madlib.multinom(
> 'TEMP_TEST_1',
> 'TEMP_TEST_1_OP',
>     'dep_var_col',
>     'ARRAY[ 1,hot_encoded_GENDER_col_val1, hot_encoded_GENDER_col_val2]',
>     '1',--REF CATEGORY 
>     'logit',
>     NULL,
>     'max_iter=100,optimizer=irls,tolerance=0.0001',
>     TRUE
>  );{code}
> Since gender is a categorical column, I am one-hot encoding it into 2 columns (0|1). 
> When comparing the results with R's method, the coefficients match, but the StdErr 
> and p-values are way off.
> R method -
> {code:java}
> nnet::multinom
> {code}
>  
> Is there anything special I need to do for multinom, or is it a bug? 
> Or is there a particular way I need to use R to compare results with multinom?
> *UPDATE:*
> Is it mandatory to have a ref_category-like column for a categorical 
> independent variable? 
> hot encoded GENDER_col_val1 from list of independent variables and the results 
> are matching with R's output.
>  
> Is there any documentation or reference to confirm this? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (MADLIB-1317) Multinomial results not matching with R method

2019-04-15 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1317.

Resolution: Cannot Reproduce
  Assignee: Orhan Kislal

> Multinomial results not matching with R method
> --
>
> Key: MADLIB-1317
> URL: https://issues.apache.org/jira/browse/MADLIB-1317
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Multinomial Logistic Regression
>Reporter: Pratik
>Assignee: Orhan Kislal
>Priority: Major
>
> Hi team,
> I am using the MADlib multinomial method on my dataset with a categorical 
> independent variable (one-hot encoded), as below. 
>  
> {code:java}
> SELECT
>     CASE WHEN multinom IS NOT NULL THEN TRUE ELSE FALSE END
> FROM
>  madlib.multinom(
> 'TEMP_TEST_1',
> 'TEMP_TEST_1_OP',
>     'dep_var_col',
>     'ARRAY[ 1,hot_encoded_GENDER_col_val1, hot_encoded_GENDER_col_val2]',
>     '1',--REF CATEGORY 
>     'logit',
>     NULL,
>     'max_iter=100,optimizer=irls,tolerance=0.0001',
>     TRUE
>  );{code}
> Since gender is a categorical column, I am one-hot encoding it into 2 columns (0|1). 
> When comparing the results with R's method, the coefficients match, but the StdErr 
> and p-values are way off.
> R method -
> {code:java}
> nnet::multinom
> {code}
>  
> Is there anything special I need to do for multinom, or is it a bug? 
> Or is there a particular way I need to use R to compare results with multinom?
> *UPDATE:*
> Is it mandatory to have a ref_category-like column for a categorical 
> independent variable? 
> hot encoded GENDER_col_val1 from list of independent variables and the results 
> are matching with R's output.
>  
> Is there any documentation or reference to confirm this? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MADLIB-1338) DL: Add support for reporting various metrics in fit/evaluate

2019-05-06 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1338:
-
Description: 
The current {{madlib_keras.fit()}} code reports accuracy as the only metric, 
along with the loss value. However, we could ask for different metrics in the compile 
params ({{mae, binary_accuracy}} etc.), in which case {{Keras.evaluate()}} would 
return {{loss}} (by default) and {{mean_absolute_error}} or {{binary_accuracy}} 
(metrics).
This JIRA requests support for reporting all of these metrics in the output table.
Other requirements:

The output summary table must have the metrics' labels (instead of just accuracy).
Remove the loss/accuracy computation from fit_transition.



  was:
The current {{madlib_keras.fit()}} code reports accuracy as the only metric, 
along with the loss value. However, we could ask for multiple metrics in the compile 
params (e.g., {{metrics=['mae','accuracy']}}), in which case {{Keras.evaluate()}} 
would return {{loss}} (by default), {{mean_absolute_error}} and {{accuracy}} 
(metrics).
This JIRA requests support for reporting all of these metrics in the output table.
Other requirements:
1. The output summary table must have a 2-D array to report {{metrics}}. The inner 
dimension corresponds to all metric values for the iteration at which they are 
computed.
2. The output summary table must have the metrics' labels (e.g., 
[mean_absolute_error, accuracy]).

Summary: DL: Add support for reporting various metrics in fit/evaluate  
(was: DL: Add support for reporting multiple metrics in fit/evaluate)

> DL: Add support for reporting various metrics in fit/evaluate
> -
>
> Key: MADLIB-1338
> URL: https://issues.apache.org/jira/browse/MADLIB-1338
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Deep Learning
>Reporter: Nandish Jayaram
>Priority: Major
> Fix For: v1.16
>
>
> The current {{madlib_keras.fit()}} code reports accuracy as the only metric, 
> along with the loss value. However, we could ask for different metrics in the 
> compile params ({{mae, binary_accuracy}} etc.), in which case {{Keras.evaluate()}} 
> would return {{loss}} (by default) and {{mean_absolute_error}} or 
> {{binary_accuracy}} (metrics).
> This JIRA requests support for reporting all of these metrics in the output table.
> Other requirements:
> The output summary table must have the metrics' labels (instead of just accuracy).
> Remove the loss/accuracy computation from fit_transition.
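In Keras, a compiled model's `metrics_names` lists labels in the same order as the values returned by `evaluate()`: the loss first, then each requested metric. The pure-Python sketch below assumes those two pieces are available for every iteration and splits them into a labels array, a per-iteration loss array, and the 2-D per-iteration metrics array described in this ticket. The output keys are hypothetical column names, not necessarily MADlib's actual summary-table schema.

```python
def split_keras_metrics(metrics_names, evaluate_outputs):
    """Turn per-iteration Keras evaluate() outputs into summary-table arrays.

    metrics_names: labels from model.metrics_names, e.g. ['loss', 'mean_absolute_error'].
    evaluate_outputs: one evaluate() result per iteration; a bare float when the
    model was compiled without metrics, otherwise a list aligned with metrics_names.
    """
    loss_idx = metrics_names.index("loss")
    labels = [name for name in metrics_names if name != "loss"]
    losses, metrics = [], []
    for out in evaluate_outputs:
        values = list(out) if isinstance(out, (list, tuple)) else [out]
        if len(values) != len(metrics_names):
            raise ValueError("evaluate() output does not match metrics_names")
        losses.append(values[loss_idx])
        # Inner dimension: all metric values computed at this iteration.
        metrics.append([v for name, v in zip(metrics_names, values) if name != "loss"])
    # Hypothetical column names for the output summary table.
    return {"metrics_type": labels, "training_loss": losses, "training_metrics": metrics}
```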



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MADLIB-1361) Pivot: Fix array_agg + distinct scaling issue on gpdb

2019-06-14 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1361:


 Summary: Pivot: Fix array_agg + distinct scaling issue on gpdb 
 Key: MADLIB-1361
 URL: https://issues.apache.org/jira/browse/MADLIB-1361
 Project: Apache MADlib
  Issue Type: Bug
  Components: Module: Utilities
Reporter: Orhan Kislal


With large datasets, pivot fails because of the array_agg(distinct) query: 
array_agg collects all of the values first and only filters out the duplicates 
afterwards, which causes array_agg to run out of memory.
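One common way around this, assumed here rather than taken from the eventual fix, is to deduplicate before aggregating. A `SELECT DISTINCT` subquery lets the planner remove duplicates first, so `array_agg` only ever sees the distinct values. The following sketch builds both query shapes for a table and column name that are assumed to be trusted and already quoted if necessary. The helper names are hypothetical.

```python
def pivot_values_query_naive(table, column):
    """Shape that fails at scale: array_agg buffers every row before DISTINCT is applied."""
    return "SELECT array_agg(DISTINCT {c}) FROM {t}".format(c=column, t=table)


def pivot_values_query_dedup_first(table, column):
    """Deduplicate in a subquery first, so array_agg only sees the distinct values."""
    return ("SELECT array_agg({c}) FROM "
            "(SELECT DISTINCT {c} FROM {t}) AS distinct_vals").format(c=column, t=table)
```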



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MADLIB-1361) Pivot: Fix array_agg + distinct scaling issue on gpdb

2019-06-14 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1361:
-
Fix Version/s: v1.16

> Pivot: Fix array_agg + distinct scaling issue on gpdb 
> --
>
> Key: MADLIB-1361
> URL: https://issues.apache.org/jira/browse/MADLIB-1361
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Utilities
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.16
>
>
> With large datasets, pivot fails because of the array_agg(distinct) query: 
> array_agg collects all of the values first and only filters out the duplicates 
> afterwards, which causes array_agg to run out of memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (MADLIB-1248) Reinstalling on a db that does not have MADlib

2019-07-10 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal resolved MADLIB-1248.
--
Resolution: Fixed

> Reinstalling on a db that does not have MADlib
> --
>
> Key: MADLIB-1248
> URL: https://issues.apache.org/jira/browse/MADLIB-1248
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Madpack
>Reporter: Orhan Kislal
>Assignee: Jingyi Mei
>Priority: Major
>
> Reinstalling on a database that does not already have MADlib installed fails 
> as expected. However, the info messages do not mention this failure. It seems 
> the proper error message was lost during the `making madpack operations 
> atomic` Jira (MADLIB-1242). 
> The same issue occurs when uninstalling from a database that never had MADlib installed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (MADLIB-1248) Reinstalling on a db that does not have MADlib

2019-07-10 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1248.


> Reinstalling on a db that does not have MADlib
> --
>
> Key: MADLIB-1248
> URL: https://issues.apache.org/jira/browse/MADLIB-1248
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Madpack
>Reporter: Orhan Kislal
>Assignee: Jingyi Mei
>Priority: Major
>
> Reinstalling on a database that does not already have MADlib installed fails 
> as expected. However, the info messages do not mention this failure. It seems 
> the proper error message was lost during the `making madpack operations 
> atomic` Jira (MADLIB-1242). 
> The same issue occurs when uninstalling from a database that never had MADlib installed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MADLIB-1372) MADlib Keras operations create too many threads

2019-07-25 Thread Orhan Kislal (JIRA)
Orhan Kislal created MADLIB-1372:


 Summary: MADlib Keras operations create too many threads
 Key: MADLIB-1372
 URL: https://issues.apache.org/jira/browse/MADLIB-1372
 Project: Apache MADlib
  Issue Type: Bug
  Components: Deep Learning
Reporter: Orhan Kislal
 Fix For: v1.17


We noticed that MADlib Keras operations that call Keras functions create a 
number of threads every single time they are called, and some of these 
threads are not cleaned up at the end of the function. If the number of 
iterations is very high, the thread count keeps increasing until it hits 
the system default limit (1024).

Here is the error message from the log:
{code}
what(): Resource temporarily unavailable""SysLoggerMain","syslogger.c",618,
{code}
We tried keeping the session alive at the end of the function (saving the session 
info in SD) and reusing it for the next iteration, but that didn't help with this issue.

It is possible to increase this limit by editing /etc/security/limits.conf and 
the files in /etc/security/limits.d/. For the change to take effect, both the 
system and the database must be restarted. 
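To see how close a process is to that limit, the following Python sketch reports the current process's OS thread count next to the soft and hard RLIMIT_NPROC values. It assumes Linux, where the nproc entries in limits.conf set RLIMIT_NPROC and /proc/self/task lists every thread of the process, including native threads that `threading.active_count()` cannot see. RLIMIT_NPROC caps all processes and threads of the user, so a single backend's count is only part of the total. The function name is hypothetical.

```python
import os
import resource
import threading


def thread_usage():
    """Return (threads_in_this_process, soft_nproc_limit, hard_nproc_limit).

    On Linux, /proc/self/task has one entry per OS thread, including native
    threads started by libraries such as TensorFlow. Elsewhere this falls back
    to threading.active_count(), which only sees Python-level threads.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    try:
        n_threads = len(os.listdir("/proc/self/task"))
    except OSError:
        n_threads = threading.active_count()
    return n_threads, soft, hard


if __name__ == "__main__":
    n, soft, hard = thread_usage()
    # RLIM_INFINITY means the soft limit is not set.
    limit = "unlimited" if soft == resource.RLIM_INFINITY else soft
    print("threads in this process: %d, soft nproc limit: %s" % (n, limit))
```

Logging this once per iteration would show whether the count grows steadily, which is the leak pattern described above.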

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (MADLIB-1370) Knn - add zero check and output distance array

2019-08-14 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal resolved MADLIB-1370.
--
Resolution: Fixed
  Assignee: Orhan Kislal

> Knn - add zero check and output distance array
> --
>
> Key: MADLIB-1370
> URL: https://issues.apache.org/jira/browse/MADLIB-1370
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: k-NN
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Minor
> Fix For: v1.17
>
>
> In the unsupervised mode of knn 
> http://madlib.apache.org/docs/latest/group__grp__knn.html
> when `point_source` and `test_source` are the same data set, nearest 
> neighbors does not reliably return the zero-distance point as a nearest 
> neighbor.
> Could there be a small negative-value issue here, for a distance that is 
> effectively 0 but shows up as a negative epsilon?
> Also, please assess whether we can add a vector of distances to the output table:
> {code}
> Output Format
> The output of the KNN module is a table with the following columns:
> id                    INTEGER. The ids of test data points.
> test_column_name      DOUBLE PRECISION[]. The test data points.
> prediction            INTEGER. Label in case of classification, average value in case of regression.
> k_nearest_neighbours  INTEGER[]. List of nearest neighbors, sorted closest to furthest from the corresponding test point.
> distance              DOUBLE PRECISION[]. Distance sorted in the same order as the 'k_nearest_neighbours' array.
> {code}
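One way a distance that should be exactly 0 can come out as a tiny negative number is computing the squared Euclidean distance through the expansion ||a||^2 + ||b||^2 - 2 a·b, where floating-point cancellation is not guaranteed to stay non-negative. Whether MADlib's distance functions compute it this way is an assumption. The sketch clamps such values to 0 (the zero check) and returns neighbor ids together with a distance array in the same order, mirroring the proposed k_nearest_neighbours and distance columns. The function names are hypothetical.

```python
import math


def sq_dist_expanded(a, b):
    """Squared Euclidean distance via ||a||^2 + ||b||^2 - 2 a.b.

    Algebraically >= 0, but cancellation can leave a tiny negative result.
    """
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    return aa + bb - 2.0 * ab


def knn_with_distances(test_point, train, k):
    """train: list of (id, vector). Returns (ids, distances), sorted closest first."""
    scored = []
    for pid, vec in train:
        # Zero check: clamp a negative epsilon to 0 before taking the square root,
        # which would otherwise raise ValueError.
        d2 = max(sq_dist_expanded(test_point, vec), 0.0)
        scored.append((math.sqrt(d2), pid))
    scored.sort()  # ties on distance are broken by id
    top = scored[:k]
    return [pid for _, pid in top], [d for d, _ in top]
```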



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (MADLIB-1370) Knn - add zero check and output distance array

2019-08-14 Thread Orhan Kislal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1370.


> Knn - add zero check and output distance array
> --
>
> Key: MADLIB-1370
> URL: https://issues.apache.org/jira/browse/MADLIB-1370
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: k-NN
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Minor
> Fix For: v1.17
>
>
> In the unsupervised mode of knn 
> http://madlib.apache.org/docs/latest/group__grp__knn.html
> when `point_source` and `test_source` are the same data set, nearest 
> neighbors does not reliably return the zero-distance point as a nearest 
> neighbor.
> Could there be a small negative-value issue here, for a distance that is 
> effectively 0 but shows up as a negative epsilon?
> Also, please assess whether we can add a vector of distances to the output table:
> {code}
> Output Format
> The output of the KNN module is a table with the following columns:
> id                    INTEGER. The ids of test data points.
> test_column_name      DOUBLE PRECISION[]. The test data points.
> prediction            INTEGER. Label in case of classification, average value in case of regression.
> k_nearest_neighbours  INTEGER[]. List of nearest neighbors, sorted closest to furthest from the corresponding test point.
> distance              DOUBLE PRECISION[]. Distance sorted in the same order as the 'k_nearest_neighbours' array.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (MADLIB-1382) Add simple silhouette score for every point

2019-09-13 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1382:


 Summary: Add simple silhouette score for every point
 Key: MADLIB-1382
 URL: https://issues.apache.org/jira/browse/MADLIB-1382
 Project: Apache MADlib
  Issue Type: New Feature
  Components: Module: k-Means Clustering
Reporter: Orhan Kislal
 Fix For: v1.17


A function to calculate the simple silhouette score for each individual point 
would be useful for creating histograms and analyzing cluster quality.
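The simple (simplified) silhouette replaces the average distances of the full silhouette with distances to centroids. For a point at distance a from its own cluster's centroid and distance b from the nearest other centroid, s = (b - a) / max(a, b). The following minimal sketch assumes that the centroids and point assignments from a finished k-means run are available in memory. The function names are hypothetical.

```python
import math


def simple_silhouette(point, centroids, assigned):
    """Simplified silhouette of one point from centroid distances.

    a = distance to the centroid of the point's own cluster,
    b = distance to the nearest other centroid, s = (b - a) / max(a, b).
    Requires at least two centroids.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two centroids")
    a = math.dist(point, centroids[assigned])
    b = min(math.dist(point, c) for i, c in enumerate(centroids) if i != assigned)
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


def silhouette_per_point(points, centroids, assignments):
    """One score per point, e.g. as input to a histogram of cluster quality."""
    return [simple_silhouette(p, centroids, c) for p, c in zip(points, assignments)]
```

Scores near 1 indicate points close to their own centroid. Scores near 0 indicate points sitting between two clusters.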



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (MADLIB-1388) Deep Learning module does not work with tables in non-public schemas

2019-10-15 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1388:


 Summary: Deep Learning module does not work with tables in non-public schemas
 Key: MADLIB-1388
 URL: https://issues.apache.org/jira/browse/MADLIB-1388
 Project: Apache MADlib
  Issue Type: Bug
  Components: Deep Learning
Reporter: Orhan Kislal


I have tried putting both the source table and the model arch table in non-public 
schemas to replicate the issue. It seems quote_ident is tripping up the validation 
code, which causes the following error.
{code:sql}
madlib=# SELECT madlib_keras_fit(
             's1.t2',
             'keras_saved_out',
             'model_arch',
             1,
             $$ optimizer=SGD(lr=0.01, decay=1e-6, nesterov=True), loss='categorical_crossentropy', metrics=['accuracy']$$::text,
             $$ batch_size=2, epochs=1, verbose=0 $$::text,
             3);
ERROR:  plpy.Error: madlib_keras_fit error: Input table '"s1.t2"' does not exist. (plpy_elog.c:121)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "madlib_keras_fit", line 21, in <module>
    madlib_keras.fit(**globals())
  PL/Python function "madlib_keras_fit", line 42, in wrapper
  PL/Python function "madlib_keras_fit", line 99, in fit
  PL/Python function "madlib_keras_fit", line 370, in __init__
  PL/Python function "madlib_keras_fit", line 260, in __init__
  PL/Python function "madlib_keras_fit", line 268, in _validate_common_args
  PL/Python function "madlib_keras_fit", line 674, in input_tbl_valid
PL/Python function "madlib_keras_fit"
{code}
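The error shows the schema-qualified name being quoted as a single identifier: '"s1.t2"' names a table literally called s1.t2, not table t2 in schema s1. The sketch below quotes each dot-separated part separately, which is one way such validation code could build the name. It assumes identifiers contain no embedded dots, and the helper name is hypothetical.

```python
def quote_qualified_name(name):
    """Quote each part of a possibly schema-qualified name, e.g. s1.t2 -> "s1"."t2".

    Unquoted parts are lower-cased, matching PostgreSQL's folding of unquoted
    identifiers; parts already wrapped in double quotes are kept verbatim.
    Naive: splits on every '.', so dots inside quoted identifiers are not supported.
    """
    quoted = []
    for part in name.split("."):
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            quoted.append(part)
        else:
            # Escape embedded double quotes by doubling them.
            quoted.append('"' + part.lower().replace('"', '""') + '"')
    return ".".join(quoted)
```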
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (MADLIB-1388) Deep Learning module does not work with tables in non-public schemas

2019-10-15 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal reassigned MADLIB-1388:


Fix Version/s: v1.17
 Assignee: Orhan Kislal

> Deep Learning module does not work with tables in non-public schemas
> 
>
> Key: MADLIB-1388
> URL: https://issues.apache.org/jira/browse/MADLIB-1388
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.17
>
>
> I have tried putting both the source table and the model arch table in non-public 
> schemas to replicate the issue. It seems quote_ident is tripping up the validation 
> code, which causes the following error.
> {code:sql}
> madlib=# SELECT madlib_keras_fit(
>              's1.t2',
>              'keras_saved_out',
>              'model_arch',
>              1,
>              $$ optimizer=SGD(lr=0.01, decay=1e-6, nesterov=True), loss='categorical_crossentropy', metrics=['accuracy']$$::text,
>              $$ batch_size=2, epochs=1, verbose=0 $$::text,
>              3);
> ERROR:  plpy.Error: madlib_keras_fit error: Input table '"s1.t2"' does not exist. (plpy_elog.c:121)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit", line 21, in <module>
>     madlib_keras.fit(**globals())
>   PL/Python function "madlib_keras_fit", line 42, in wrapper
>   PL/Python function "madlib_keras_fit", line 99, in fit
>   PL/Python function "madlib_keras_fit", line 370, in __init__
>   PL/Python function "madlib_keras_fit", line 260, in __init__
>   PL/Python function "madlib_keras_fit", line 268, in _validate_common_args
>   PL/Python function "madlib_keras_fit", line 674, in input_tbl_valid
> PL/Python function "madlib_keras_fit"
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (MADLIB-1398) Fit Multiple Model does not utilize GPUs

2019-11-26 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1398:


 Summary: Fit Multiple Model does not utilize GPUs
 Key: MADLIB-1398
 URL: https://issues.apache.org/jira/browse/MADLIB-1398
 Project: Apache MADlib
  Issue Type: Bug
  Components: Deep Learning
Reporter: Orhan Kislal
 Fix For: v1.17


It seems the fit multiple model function is not utilizing the GPU properly. 

 

I see that sessions are created and memory is allocated, but GPU 
utilization is almost always at 0%. The running times are also very long compared 
to the regular fit function. In the same setup, a single fit iteration 
takes ~300 sec, while multifit with 3 models takes ~7000 sec. 





[jira] [Closed] (MADLIB-1398) Fit Multiple Model does not utilize GPUs

2019-12-04 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1398.

Resolution: Not A Bug

Closer inspection showed that this is not an issue. The GPUs are being used; the 
non-GPU parts of the execution just take significantly longer.

> Fit Multiple Model does not utilize GPUs
> 
>
> Key: MADLIB-1398
> URL: https://issues.apache.org/jira/browse/MADLIB-1398
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.17
>
>
> It seems the fit multiple model function is not utilizing the GPU properly. 
>  
> I see that sessions are created and memory is allocated, but GPU utilization 
> is almost always at 0%. The running times are also very long compared to the 
> regular fit function: in the same setup, a single fit iteration takes ~300 
> sec, while Multifit with 3 models takes ~7000 sec. 





[jira] [Resolved] (MADLIB-1400) Modify warm start logic for DL to handle case of missing weight

2020-01-09 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal resolved MADLIB-1400.
--
  Assignee: Orhan Kislal
Resolution: Fixed

> Modify warm start logic for DL to handle case of missing weight
> ---
>
> Key: MADLIB-1400
> URL: https://issues.apache.org/jira/browse/MADLIB-1400
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Deep Learning
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.17
>
> Attachments: 20191219_163748.jpg
>
>
> I was trying to implement an autoML algorithm on top of the new multi-model 
> fit and ran into an issue with warm start.  I would suggest a slight change 
> in logic:
> Currently, if the model table does not contain models+weights for every 
> single MST key in the MST table, we error out when warm start = TRUE.
> I suggest that if there is no entry (model+weights) in the model table for an 
> MST key in the MST table, we randomly initialize the weights (or use whatever 
> the Keras default is) and do not error out. Of course, if there are existing 
> models+weights in the model table for an MST key in the MST table, then use 
> them (which is what we currently do). 
> Again, this all applies to warm start = TRUE only.
> Also, if warm start = TRUE but the model table does not exist at all, we 
> should error out like we do today.
> Logic for warm start = FALSE as implemented seems OK to me.
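The proposed rule can be written down as a small decision function. This is a sketch of the behaviour suggested in this ticket, not MADlib's implementation; plan_warm_start and its arguments are hypothetical, and the warm start = FALSE branch simply assumes every model starts from fresh weights.

```python
def plan_warm_start(warm_start, model_table_exists, mst_keys, saved_keys):
    """Decide, per MST key, whether to reuse saved weights or initialize.

    Returns a dict mapping each MST key to 'reuse' or 'init'.
    """
    if not warm_start:
        # Assumed: without warm start every model gets fresh weights
        return {key: 'init' for key in mst_keys}
    if not model_table_exists:
        # Warm start requested but nothing to start from: keep erroring out
        raise ValueError('warm_start is TRUE but the model table does not exist')
    saved = set(saved_keys)
    # Proposed change: a missing entry falls back to the default (e.g. Keras)
    # initialization instead of raising an error
    return {key: ('reuse' if key in saved else 'init') for key in mst_keys}


print(plan_warm_start(True, True, [1, 2, 3], [1, 3]))
# {1: 'reuse', 2: 'init', 3: 'reuse'}
```

Only the "model table missing" case still raises, which matches the suggestion to keep today's error for that situation.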





[jira] [Closed] (MADLIB-1391) PostgreSQL 12 support

2020-01-21 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1391.

Resolution: Fixed

> PostgreSQL 12 support
> -
>
> Key: MADLIB-1391
> URL: https://issues.apache.org/jira/browse/MADLIB-1391
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: All Modules
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.17
>
>
> https://www.postgresql.org/about/news/1976/





[jira] [Closed] (MADLIB-1372) MADlib Keras operations create too many threads

2020-01-21 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1372.

Resolution: Fixed

> MADlib Keras operations create too many threads
> ---
>
> Key: MADLIB-1372
> URL: https://issues.apache.org/jira/browse/MADLIB-1372
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.17
>
>
> We noticed that MADlib Keras operations that call Keras functions create a 
> number of threads every time they are called. However, some of these threads 
> are not cleaned up at the end of the function. If the number of iterations is 
> very high, the thread count keeps increasing and eventually hits the system 
> default limit (1024).
> Here is the error message from the log:
> {code}
> what(): Resource temporarily 
> unavailable""SysLoggerMain","syslogger.c",618,
> {code}
> We tried keeping the session at the end of the function (saving the session 
> info in SD) and reusing it for the next iteration, but that didn't help with 
> this issue.
> It is possible to increase this limit by editing /etc/security/limits.conf 
> and the files in /etc/security/limits.d/. This requires a restart of the 
> system and the database to take effect. 
>  
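To watch for the leak described above, one option is to log the process's OS-level thread count before and after each Keras call. A minimal Linux-oriented Python sketch; it assumes the 1024 ceiling is the per-user nproc limit (on Linux RLIMIT_NPROC counts threads as well as processes), and the helper names are hypothetical.

```python
import os
import threading

try:
    import resource  # Unix only
except ImportError:
    resource = None


def os_thread_count():
    """Count OS-level threads of the current process.

    On Linux every thread has an entry under /proc/self/task, including
    native threads started by libraries, which threading.active_count()
    does not see.
    """
    task_dir = '/proc/self/task'
    if os.path.isdir(task_dir):
        return len(os.listdir(task_dir))
    # Fallback elsewhere: only counts threads created through Python
    return threading.active_count()


def max_user_processes():
    """(soft, hard) RLIMIT_NPROC for this user, or None if unavailable."""
    if resource is None or not hasattr(resource, 'RLIMIT_NPROC'):
        return None
    return resource.getrlimit(resource.RLIMIT_NPROC)


print('threads:', os_thread_count())
print('nproc limit (soft, hard):', max_user_processes())
```

Logging os_thread_count() at the start and end of each iteration would show whether the count grows steadily toward the limit.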





[jira] [Closed] (MADLIB-1392) DL: Preprocessor support for asymmetric segment distribution

2020-01-21 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1392.

Resolution: Fixed

> DL: Preprocessor support for asymmetric segment distribution
> 
>
> Key: MADLIB-1392
> URL: https://issues.apache.org/jira/browse/MADLIB-1392
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Deep Learning
>Reporter: Ekta Khanna
>Priority: Major
> Fix For: v1.17
>
>
> Add asymmetric segment redistribution support to the deep learning 
> preprocessor. Applies to {{training_preprocessor_dl()}} and 
> {{validation_preprocessor_dl()}}
> {code:java}
> training_preprocessor_dl(source_table,
>  output_table,
>  dependent_varname,
>  independent_varname,
>  buffer_size,
>  normalizing_const,
>  num_classes,
>  distribution_rules -- new optional param
> )
> {code}
> The following are the possible values for the new optional parameter 
> {{distribution_rules}}:
>  # TEXT, *default*: {{all_segments}}. Specifies how to distribute the 
> {{output_table}}. This is important for how the fit function will use 
> resources on the cluster. The default {{all_segments}} means the 
> {{output_table}} will be distributed to all segments in the database cluster.
>  # If you specify {{gpu_segments}} then the {{output_table}} will be 
> distributed to all segments that are on hosts that have GPUs attached. This 
> will make maximum use of GPU resources.
>  # You can also specify the name of a resources table containing the segments 
> to use for training. This table is typically created and maintained by the 
> database administrator. Must contain a column called {{dbid}} that specifies 
> the segment id from the {{gp_segment_configuration}} table.
> Sample {{segments_to_use}} table:
> {code:java}
>  dbid | notes
>  -----+--------------
> 2 | comment here
> 3 | comment here
> 4 | comment here
> 5 | comment here
> {code}
> Same deal as above ^^^ for validation preprocessor.
> This change adds a new column, {{gpu_config}}, to the output summary table. 
> It contains the following values:
> # if {{distribution_rules}} = {{all_segments}}, then {{all_segments}}
> # if {{distribution_rules}} = {{gpu_segments}}, then an array of the segment 
> ids of all segments that are on hosts that have GPUs attached
> # if {{distribution_rules}} = {{segments_to_use_table}}, then an array of 
> segment ids; for the above sample {{segments_to_use}} table -> [2,3,4,5]
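The mapping from distribution_rules to the gpu_config value can be sketched as below. This is an illustration of the rules listed above, not the preprocessor's code; gpu_config is a hypothetical helper, the host names are made up, and it assumes the caller already knows which hosts have GPUs and passes only primary segments.

```python
def gpu_config(distribution_rules, segments, gpu_hosts, table_dbids=None):
    """Sketch of the gpu_config summary value (hypothetical helper).

    distribution_rules: 'all_segments', 'gpu_segments', or the name of a
        resources table such as segments_to_use.
    segments: (dbid, hostname) pairs for the primary segments, e.g. as
        read from gp_segment_configuration.
    gpu_hosts: hostnames known to have GPUs attached (assumed input).
    table_dbids: dbid values read from the resources table, if one is used.
    """
    if distribution_rules == 'all_segments':
        return 'all_segments'
    if distribution_rules == 'gpu_segments':
        return sorted(dbid for dbid, host in segments if host in gpu_hosts)
    # Any other value is treated as the name of a resources table
    if table_dbids is None:
        raise ValueError('no dbids read from table %r' % distribution_rules)
    return sorted(table_dbids)


# Hypothetical cluster: two hosts, only sdw2 has GPUs
segments = [(2, 'sdw1'), (3, 'sdw1'), (4, 'sdw2'), (5, 'sdw2')]
print(gpu_config('all_segments', segments, {'sdw2'}))      # all_segments
print(gpu_config('gpu_segments', segments, {'sdw2'}))      # [4, 5]
print(gpu_config('segments_to_use', segments, {'sdw2'},
                 table_dbids=[2, 3, 4, 5]))                 # [2, 3, 4, 5]
```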





[jira] [Created] (MADLIB-1407) APSP fails if both vertex id column and edge src column has the same name

2020-02-07 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1407:


 Summary: APSP fails if both vertex id column and edge src column 
has the same name
 Key: MADLIB-1407
 URL: https://issues.apache.org/jira/browse/MADLIB-1407
 Project: Apache MADlib
  Issue Type: Bug
  Components: Module: Graph
Reporter: Orhan Kislal
 Fix For: v1.17








[jira] [Created] (MADLIB-1408) ASPS Path Function fails if src or dest column type is bigint

2020-02-07 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1408:


 Summary: ASPS Path Function fails if src or dest column type is 
bigint
 Key: MADLIB-1408
 URL: https://issues.apache.org/jira/browse/MADLIB-1408
 Project: Apache MADlib
  Issue Type: Bug
  Components: Module: Graph
Reporter: Orhan Kislal
 Fix For: v1.17








[jira] [Created] (MADLIB-1411) Graph/wcc fails if the user specifies a schema for the output table

2020-02-12 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1411:


 Summary: Graph/wcc fails if the user specifies a schema for the 
output table
 Key: MADLIB-1411
 URL: https://issues.apache.org/jira/browse/MADLIB-1411
 Project: Apache MADlib
  Issue Type: Bug
  Components: Module: Graph
Reporter: Orhan Kislal
 Fix For: v1.17


The weakly connected components function fails with the following query:

select madlib.weakly_connected_components(
'a.vertex',
'id',
'a.edge',
'src=src,dest=dest',
'a.wcc'
);

with the following error:

ERROR: plpy.SPIError: syntax error at or near "."(Where Traceback (most recent 
call last):
PL/Python function "weakly_connected_components", line 21, in 
return wcc.wcc(**globals())
PL/Python function "weakly_connected_components", line 296, in wcc
PL/Python function "weakly_connected_components"; Position 87; qALTER TABLE 
_madlib_temp_newupdate18934479_1579437424_56662348_ RENAME TO a.wcc; File 
plpython.c; Line 5038; Routine PLy_elog; )
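The failing statement is "ALTER TABLE ... RENAME TO a.wcc": in PostgreSQL, RENAME TO only accepts an unqualified new name. One way around it (an assumption, not necessarily the fix MADlib adopted) is to move the table with ALTER TABLE ... SET SCHEMA first and then rename it inside the target schema. A Python sketch that builds those statements; rename_statements and the temp table name are hypothetical.

```python
def rename_statements(temp_table, output_table):
    """Build SQL that turns temp_table into output_table (hypothetical helper).

    ALTER TABLE ... RENAME TO takes an unqualified new name, so
    'RENAME TO a.wcc' is a syntax error. For a schema-qualified output,
    move the table with SET SCHEMA first, then rename it in place.
    Assumes plain names with no embedded dots or quotes.
    """
    temp_name = temp_table.split('.')[-1]
    if '.' not in output_table:
        return ['ALTER TABLE {0} RENAME TO {1}'.format(temp_table, output_table)]
    schema, table = output_table.split('.', 1)
    return [
        'ALTER TABLE {0} SET SCHEMA {1}'.format(temp_table, schema),
        'ALTER TABLE {0}.{1} RENAME TO {2}'.format(schema, temp_name, table),
    ]


for stmt in rename_statements('tmp_wcc', 'a.wcc'):
    print(stmt)
# ALTER TABLE tmp_wcc SET SCHEMA a
# ALTER TABLE a.tmp_wcc RENAME TO wcc
```

Moving first and renaming second means the rename happens in the output schema, so it cannot collide with an unrelated wcc table in the temp table's original schema.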

 





[jira] [Updated] (MADLIB-1411) Graph/wcc fails if the user specifies a schema for the output table

2020-02-12 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1411:
-
Priority: Minor  (was: Major)

> Graph/wcc fails if the user specifies a schema for the output table
> ---
>
> Key: MADLIB-1411
> URL: https://issues.apache.org/jira/browse/MADLIB-1411
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Module: Graph
>Reporter: Orhan Kislal
>Priority: Minor
> Fix For: v1.17
>
>
> The weakly connected components function fails with the following query:
> select madlib.weakly_connected_components(
> 'a.vertex',
> 'id',
> 'a.edge',
> 'src=src,dest=dest',
> 'a.wcc'
> );
> with the following error:
> ERROR: plpy.SPIError: syntax error at or near "."(Where Traceback (most 
> recent call last):
> PL/Python function "weakly_connected_components", line 21, in 
> return wcc.wcc(**globals())
> PL/Python function "weakly_connected_components", line 296, in wcc
> PL/Python function "weakly_connected_components"; Position 87; qALTER TABLE 
> _madlib_temp_newupdate18934479_1579437424_56662348_ RENAME TO a.wcc; File 
> plpython.c; Line 5038; Routine PLy_elog; )
>  





[jira] [Commented] (MADLIB-1438) Failed to install official 1.17.0 RPM on CentOS 7.8

2020-06-24 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143894#comment-17143894
 ] 

Orhan Kislal commented on MADLIB-1438:
--

The 1.17.0 RPM is compiled with gcc 6.2 on CentOS. It is still possible to 
compile MADlib manually using gcc 4.8 (the default version for CentOS 7 IIRC).

> Failed to install official 1.17.0 RPM on CentOS 7.8
> ---
>
> Key: MADLIB-1438
> URL: https://issues.apache.org/jira/browse/MADLIB-1438
> Project: Apache MADlib
>  Issue Type: Improvement
>Reporter: cchsu
>Priority: Major
> Fix For: v1.17
>
>
> Hi,
>  
> I tried to install MADlib 1.17.0 on CentOS 7.8 but failed.
> {code:java}
> [root@gpdb ~]# yum install -y 
> https://dist.apache.org/repos/dist/release/madlib/1.17.0/apache-madlib-1.17.0-bin-Linux.rpm
> Loaded plugins: fastestmirror
> apache-madlib-1.17.0-bin-Linux.rpm   | 9.4 MB 
>  00:00:00 
> Examining /var/tmp/yum-root-N2Vxio/apache-madlib-1.17.0-bin-Linux.rpm: 
> madlib-1.17.0-1.x86_64
> Marking /var/tmp/yum-root-N2Vxio/apache-madlib-1.17.0-bin-Linux.rpm to be 
> installed
> Resolving Dependencies
> --> Running transaction check
> ---> Package madlib.x86_64 0:1.17.0-1 will be installed
> --> Processing Dependency: m4 >= 1.4 for package: madlib-1.17.0-1.x86_64
> Loading mirror speeds from cached hostfile
>  * base: centos4.zswap.net
>  * epel: d2lzkl7pfhq30w.cloudfront.net
>  * extras: yum.tamu.edu
>  * updates: ftp.ussg.iu.edu
> --> Processing Dependency: libstdc++.so.6(GLIBCXX_3.4.20)(64bit) for package: 
> madlib-1.17.0-1.x86_64
> --> Running transaction check
> ---> Package m4.x86_64 0:1.4.16-10.el7 will be installed
> ---> Package madlib.x86_64 0:1.17.0-1 will be installed
> --> Processing Dependency: libstdc++.so.6(GLIBCXX_3.4.20)(64bit) for package: 
> madlib-1.17.0-1.x86_64
> --> Finished Dependency Resolution
> Error: Package: madlib-1.17.0-1.x86_64 (/apache-madlib-1.17.0-bin-Linux)
>Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
>  You could try using --skip-broken to work around the problem
>  You could try running: rpm -Va --nofiles --nodigest
> [root@gpdb ~]# 
> [root@gpdb ~]# cat /etc/*release
> CentOS Linux release 7.8.2003 (Core)
> NAME="CentOS Linux"
> VERSION="7 (Core)"
> ID="centos"
> ID_LIKE="rhel fedora"
> VERSION_ID="7"
> PRETTY_NAME="CentOS Linux 7 (Core)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:centos:centos:7"
> HOME_URL="https://www.centos.org/"
> BUG_REPORT_URL="https://bugs.centos.org/"
> CENTOS_MANTISBT_PROJECT="CentOS-7"
> CENTOS_MANTISBT_PROJECT_VERSION="7"
> REDHAT_SUPPORT_PRODUCT="centos"
> REDHAT_SUPPORT_PRODUCT_VERSION="7"
> CentOS Linux release 7.8.2003 (Core)
> CentOS Linux release 7.8.2003 (Core)
> [root@gpdb ~]# 
> {code}
> It seems that even the latest CentOS 7.8 provides a slightly older version:
> {code:java}
> [root@gpdb ~]# strings /usr/lib64/libstdc++.so.6 | grep LIBCXX
> GLIBCXX_3.4
> GLIBCXX_3.4.1
> GLIBCXX_3.4.2
> GLIBCXX_3.4.3
> GLIBCXX_3.4.4
> GLIBCXX_3.4.5
> GLIBCXX_3.4.6
> GLIBCXX_3.4.7
> GLIBCXX_3.4.8
> GLIBCXX_3.4.9
> GLIBCXX_3.4.10
> GLIBCXX_3.4.11
> GLIBCXX_3.4.12
> GLIBCXX_3.4.13
> GLIBCXX_3.4.14
> GLIBCXX_3.4.15
> GLIBCXX_3.4.16
> GLIBCXX_3.4.17
> GLIBCXX_3.4.18
> GLIBCXX_3.4.19
> GLIBCXX_DEBUG_MESSAGE_LENGTH
> [root@gpdb ~]# 
> {code}
> Is it possible to fix this issue?
>  
> Thanks. :)
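The yum error comes down to a symbol-version check: the RPM requires GLIBCXX_3.4.20, while the libstdc++ listed above stops at GLIBCXX_3.4.19. A small Python sketch of that check, fed with the tag list from this report; glibcxx_versions and provides are hypothetical helpers (in practice rpm and the dynamic linker do this themselves).

```python
import re

_TAG = re.compile(r'^GLIBCXX_(\d+(?:\.\d+)*)$')


def glibcxx_versions(lines):
    """Parse GLIBCXX_x.y.z tags (as printed by `strings libstdc++.so.6`).

    Non-version entries such as GLIBCXX_DEBUG_MESSAGE_LENGTH are skipped.
    """
    versions = []
    for line in lines:
        m = _TAG.match(line.strip())
        if m:
            versions.append(tuple(int(x) for x in m.group(1).split('.')))
    return versions


def provides(lines, required):
    """True if the library exports the exact symbol-version tag required."""
    wanted = tuple(int(x) for x in required.split('.'))
    return wanted in glibcxx_versions(lines)


# The tags listed above for CentOS 7.8: GLIBCXX_3.4 .. GLIBCXX_3.4.19
centos_78 = (['GLIBCXX_3.4'] + ['GLIBCXX_3.4.%d' % i for i in range(1, 20)]
             + ['GLIBCXX_DEBUG_MESSAGE_LENGTH'])
print(max(glibcxx_versions(centos_78)))  # (3, 4, 19)
print(provides(centos_78, '3.4.20'))     # False: the RPM's requirement
```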





[jira] [Commented] (MADLIB-1438) Failed to install official 1.17.0 RPM on CentOS 7.8

2020-06-24 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144592#comment-17144592
 ] 

Orhan Kislal commented on MADLIB-1438:
--

Hi [~cchsu],

IIRC, the docker image we used for compiling this version had a manually 
compiled gcc 6.2. I believe you can get a recent enough version of gcc via 
Software Collections; did you try devtoolset-6 or devtoolset-7? Here is the link 
for instructions: 
[https://www.softwarecollections.org/en/scls/rhscl/devtoolset-6/]

Hope it helps,

Best

> Failed to install official 1.17.0 RPM on CentOS 7.8
> ---
>
> Key: MADLIB-1438
> URL: https://issues.apache.org/jira/browse/MADLIB-1438
> Project: Apache MADlib
>  Issue Type: Improvement
>Reporter: cchsu
>Priority: Major
> Fix For: v1.17
>
>





[jira] [Created] (MADLIB-1442) Do not try to drop output tables

2020-07-09 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1442:


 Summary: Do not try to drop output tables
 Key: MADLIB-1442
 URL: https://issues.apache.org/jira/browse/MADLIB-1442
 Project: Apache MADlib
  Issue Type: Improvement
  Components: All Modules
Reporter: Orhan Kislal
 Fix For: v1.18.0


It seems some MADlib modules drop the output table before creating a new one. 
We should not do this; instead, the user should be warned that the output table 
already exists.

In some cases, we first check whether the output table exists and then try to 
drop the non-existent table anyway. We should remove these queries as well.

The list of modules that might have this issue:

Summary, assoc_rules, glm, pca_project, logistic regression
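The proposed behaviour (check for the output table and report the conflict, never drop it) can be sketched as follows. This uses Python's built-in sqlite3 as a self-contained stand-in for PostgreSQL, where the check would query the catalog instead; the helper names and the exception class are hypothetical, and the table name is assumed to be validated already because it is interpolated into the SQL.

```python
import sqlite3


class OutputTableExists(Exception):
    """Raised instead of silently dropping a user's table."""


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,)).fetchone()
    return row is not None


def create_output_table(conn, name, select_sql):
    """Create `name` from `select_sql`, refusing to overwrite it.

    No DROP TABLE [IF EXISTS] is issued at all: an existing table is
    reported to the user, and a missing one is simply created.
    """
    if table_exists(conn, name):
        raise OutputTableExists("Output table '%s' already exists" % name)
    conn.execute('CREATE TABLE %s AS %s' % (name, select_sql))


conn = sqlite3.connect(':memory:')
create_output_table(conn, 'out1', 'SELECT 1 AS a')
try:
    create_output_table(conn, 'out1', 'SELECT 2 AS a')
except OutputTableExists as err:
    print(err)  # Output table 'out1' already exists
```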





[jira] [Created] (MADLIB-1457) DL: Add Multiple input/output support to load, fit, and evaulate

2020-10-21 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1457:


 Summary: DL: Add Multiple input/output support to load, fit, and 
evaulate
 Key: MADLIB-1457
 URL: https://issues.apache.org/jira/browse/MADLIB-1457
 Project: Apache MADlib
  Issue Type: New Feature
  Components: Deep Learning
Reporter: Orhan Kislal
 Fix For: v1.18.0


We should add support for multiple inputs and outputs to increase the usability 
of the deep learning module. This is essential for YOLO v3 and v4 models.

 

This JIRA should track the changes in the load, fit, and evaluate functions. We 
should start with a simple use case and consider more advanced features in 
later JIRAs.





[jira] [Updated] (MADLIB-1457) DL: Add Multiple input/output support to load, fit, and evaluate

2020-10-21 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal updated MADLIB-1457:
-
Summary: DL: Add Multiple input/output support to load, fit, and evaluate  
(was: DL: Add Multiple input/output support to load, fit, and evaulate)

> DL: Add Multiple input/output support to load, fit, and evaluate
> 
>
> Key: MADLIB-1457
> URL: https://issues.apache.org/jira/browse/MADLIB-1457
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.18.0
>
>
> We should add support for multiple inputs and outputs to increase the 
> usability of the deep learning module. This is essential for YOLO v3 and v4 models.
>  
> This JIRA should track the changes in the load, fit, and evaluate functions. 
> We should start with a simple use case and consider more advanced features in 
> later JIRAs.





[jira] [Created] (MADLIB-1458) DL: Add multiple input/output support on advanced features

2020-10-21 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1458:


 Summary: DL: Add multiple input/output support on advanced features
 Key: MADLIB-1458
 URL: https://issues.apache.org/jira/browse/MADLIB-1458
 Project: Apache MADlib
  Issue Type: New Feature
  Components: Deep Learning
Reporter: Orhan Kislal
 Fix For: v1.18.0


This is a follow-up JIRA to MADLIB-1457. 

We should test the advanced features of the deep learning module with multiple 
inputs and outputs and make necessary changes to ensure that they work as 
expected.
 # predict
 # transfer learning
 # warm start 
 # custom loss and metrics





[jira] [Created] (MADLIB-1459) DL: Add Multiple input/output support to multi-model functions

2020-10-21 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1459:


 Summary: DL: Add Multiple input/output support to multi-model 
functions
 Key: MADLIB-1459
 URL: https://issues.apache.org/jira/browse/MADLIB-1459
 Project: Apache MADlib
  Issue Type: New Feature
  Components: Deep Learning
Reporter: Orhan Kislal
 Fix For: v1.18.0


This is a follow-up to MADLIB-1457

We should add support for multiple inputs and outputs to the multi-model methods:
 # fit multiple
 # automl
 # hyperband

 





[jira] [Created] (MADLIB-1465) DL: Iris predict accuracy has regressed

2021-01-29 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1465:


 Summary: DL: Iris predict accuracy has regressed
 Key: MADLIB-1465
 URL: https://issues.apache.org/jira/browse/MADLIB-1465
 Project: Apache MADlib
  Issue Type: Bug
  Components: Deep Learning
Reporter: Orhan Kislal
 Fix For: v1.18.0


The docs example reports 6 misclassifications out of 30 records for the iris 
prediction. The current master branch has 23 misclassifications; it seems the 
class labels are swapped around. Note: BYOM doesn't have this issue.
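For scale, the two figures correspond to 80% versus roughly 23% accuracy. The Python sketch below computes that and, with hypothetical toy data, shows how the suspected (and later unreproduced) label swap would turn identical network outputs into misclassifications. The helper names are hypothetical and the toy arrays are not MADlib output.

```python
def accuracy(misclassified, total):
    """Fraction of records classified correctly."""
    return 1.0 - float(misclassified) / total


def count_misclassified(predicted, actual):
    return sum(1 for p, a in zip(predicted, actual) if p != a)


def decode(indices, class_values):
    """Map argmax indices from the network output to class labels."""
    return [class_values[i] for i in indices]


# Figures from this ticket (30 test records)
print(round(accuracy(6, 30), 3))   # 0.8   docs example
print(round(accuracy(23, 30), 3))  # 0.233 master branch at the time

# Hypothetical toy data: identical network outputs, two label orderings
indices = [0, 1, 2, 2, 1]
actual = ['setosa', 'versicolor', 'virginica', 'virginica', 'versicolor']
correct_order = ['setosa', 'versicolor', 'virginica']
swapped_order = ['setosa', 'virginica', 'versicolor']
print(count_misclassified(decode(indices, correct_order), actual))  # 0
print(count_misclassified(decode(indices, swapped_order), actual))  # 4
```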





[jira] [Commented] (MADLIB-1465) DL: Iris predict accuracy has regressed

2021-02-01 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276396#comment-17276396
 ] 

Orhan Kislal commented on MADLIB-1465:
--

I tried to recreate this issue on the exact same system but couldn't. It might 
have been a false alarm due to an overwritten summary table or something 
similar. It would be great if someone else could double-check the documentation 
example.

> DL: Iris predict accuracy has regressed
> ---
>
> Key: MADLIB-1465
> URL: https://issues.apache.org/jira/browse/MADLIB-1465
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.18.0
>
>
> The docs example reports 6 misclassifications out of 30 records for the iris 
> prediction. The current master branch has 23 misclassifications; it seems the 
> class labels are swapped around. Note: BYOM doesn't have this issue.





[jira] [Assigned] (MADLIB-1465) DL: Iris predict accuracy has regressed

2021-02-02 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal reassigned MADLIB-1465:


Assignee: Orhan Kislal

> DL: Iris predict accuracy has regressed
> ---
>
> Key: MADLIB-1465
> URL: https://issues.apache.org/jira/browse/MADLIB-1465
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.18.0
>
>
> The docs example reports 6 misclassifications out of 30 records for the iris 
> prediction. The current master branch has 23 misclassifications; it seems the 
> class labels are swapped around. Note: BYOM doesn't have this issue.





[jira] [Commented] (MADLIB-1465) DL: Iris predict accuracy has regressed

2021-02-02 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277344#comment-17277344
 ] 

Orhan Kislal commented on MADLIB-1465:
--

We tried a number of times to replicate the issue, to no avail. It seems like a 
no-op.

> DL: Iris predict accuracy has regressed
> ---
>
> Key: MADLIB-1465
> URL: https://issues.apache.org/jira/browse/MADLIB-1465
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.18.0
>
>
> The docs example reports 6 misclassifications out of 30 records for the iris 
> prediction. The current master branch has 23 misclassifications; it seems the 
> class labels are swapped around. Note: BYOM doesn't have this issue.





[jira] [Closed] (MADLIB-1465) DL: Iris predict accuracy has regressed

2021-02-02 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1465.

Resolution: Cannot Reproduce

> DL: Iris predict accuracy has regressed
> ---
>
> Key: MADLIB-1465
> URL: https://issues.apache.org/jira/browse/MADLIB-1465
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Orhan Kislal
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.18.0
>
>
> The docs example reports 6 misclassifications out of 30 records for the iris 
> prediction. The current master branch has 23 misclassifications; it seems the 
> class labels are swapped around. Note: BYOM doesn't have this issue.





[jira] [Created] (MADLIB-1472) DL: BYOM fails at get_num_classes

2021-03-08 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1472:


 Summary: DL: BYOM fails at get_num_classes
 Key: MADLIB-1472
 URL: https://issues.apache.org/jira/browse/MADLIB-1472
 Project: Apache MADlib
  Issue Type: Bug
  Components: Deep Learning
Reporter: Orhan Kislal
 Fix For: v1.18.0


Error:

CONTEXT: PL/Python function "madlib_keras_predict_byom"
ERROR: plpy.Error: Unable to get number of classes from model architecture. 
(plpy_elog.c:121)
CONTEXT: Traceback (most recent call last):
 PL/Python function "madlib_keras_predict_byom", line 23, in 
 madlib_keras_predict.PredictBYOM(**globals())
 PL/Python function "madlib_keras_predict_byom", line 42, in wrapper
 PL/Python function "madlib_keras_predict_byom", line 314, in __init__
 PL/Python function "madlib_keras_predict_byom", line 326, in 
validate_and_set_defaults
 PL/Python function "madlib_keras_predict_byom", line 207, in 
set_default_class_values
 PL/Python function "madlib_keras_predict_byom", line 78, in get_num_classes
PL/Python function "madlib_keras_predict_byom"





[jira] [Assigned] (MADLIB-1434) MLP - add RMSprop solver

2021-04-07 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal reassigned MADLIB-1434:


Assignee: Orhan Kislal

> MLP - add RMSprop solver
> 
>
> Key: MADLIB-1434
> URL: https://issues.apache.org/jira/browse/MADLIB-1434
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Neural Networks
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.19.0
>
>






[jira] [Assigned] (MADLIB-1435) MLP - add Adam solver

2021-04-07 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal reassigned MADLIB-1435:


Assignee: Orhan Kislal

> MLP - add Adam solver
> -
>
> Key: MADLIB-1435
> URL: https://issues.apache.org/jira/browse/MADLIB-1435
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Neural Networks
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Major
> Fix For: v1.19.0
>
>






[jira] [Created] (MADLIB-1493) Build: Add OSX tarball for release

2022-01-06 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1493:


 Summary: Build: Add OSX tarball for release
 Key: MADLIB-1493
 URL: https://issues.apache.org/jira/browse/MADLIB-1493
 Project: Apache MADlib
  Issue Type: New Feature
  Components: Build System
Reporter: Orhan Kislal
 Fix For: v1.19.0


Since the usual OSX package is not supported anymore, we should at least create 
a compressed file that contains the build folder. 

The file should be portable: the user might unpack it in any location, not just 
/usr/local/. In addition, there should be a way to update the symlinks Current, 
bin, and doc.
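A relocatable tree mostly needs relative symlinks plus a way to re-point them. A minimal Python sketch, assuming an install layout of Versions/<version>/{bin,doc} with Current, bin and doc links at the top level (an assumption about the layout, not taken from this ticket); relink is a hypothetical helper and the path in the usage comment is made up.

```python
import os


def relink(install_root, version):
    """Point the Current, bin and doc symlinks at Versions/<version>.

    Assumes the layout
        <root>/Versions/<version>/{bin,doc}
        <root>/Current -> Versions/<version>
        <root>/bin     -> Current/bin
        <root>/doc     -> Current/doc
    Relative targets keep the tree usable wherever it is unpacked.
    """
    links = [
        ('Current', os.path.join('Versions', version)),
        ('bin', os.path.join('Current', 'bin')),
        ('doc', os.path.join('Current', 'doc')),
    ]
    for name, target in links:
        path = os.path.join(install_root, name)
        if os.path.islink(path):
            os.remove(path)  # replace an existing link, never a real directory
        os.symlink(target, path)


# Example (hypothetical location):
# relink('/Users/me/madlib', '1.19.0')
```

Because every target is relative, moving the whole unpacked folder elsewhere keeps the links valid without rerunning the helper.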





[jira] [Commented] (MADLIB-1498) dbconnector interface

2022-04-27 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528845#comment-17528845
 ] 

Orhan Kislal commented on MADLIB-1498:
--

Hi Sabina,

Thanks for your interest in MADlib. I assume you know of our wiki and have seen 
the linked papers etc. at [https://cwiki.apache.org/confluence/display/MADLIB]

The next point of reference is the design doc at 
[https://madlib.apache.org/design.pdf]

This document should give you some idea of the design of the abstraction layer 
as well as the individual modules.

Good luck! Please keep us up to date on your progress, and let us know if we can 
help in any way.

Orhan

> dbconnector interface
> -
>
> Key: MADLIB-1498
> URL: https://issues.apache.org/jira/browse/MADLIB-1498
> Project: Apache MADlib
>  Issue Type: Question
>  Components: DB Abstraction Layer, Documentation
>Reporter: Sabina Dayanova
>Priority: Major
>
> Dear MADlib contributors!
> My name is Sabina, and I am currently working on integrating the MADlib 
> library into another DBMS, [ClickHouse|https://clickhouse.com/]. I am trying 
> to understand the dbconnector interface so that I can adapt it to the DBMS 
> that I am working with. Unfortunately, I am having a hard time doing that, 
> since there is no explanation of what exactly the dbconnector.hpp file should 
> do. 
> I was wondering if you could give me some helpful information about it.
> Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Closed] (MADLIB-1493) Build: Add OSX tarball for release

2022-04-29 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1493.

Resolution: Won't Do

> Build: Add OSX tarball for release
> --
>
> Key: MADLIB-1493
> URL: https://issues.apache.org/jira/browse/MADLIB-1493
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Build System
>Reporter: Orhan Kislal
>Priority: Major
> Fix For: v1.19.0
>
>
> Since the usual OSX package is no longer supported, we should at least 
> create a compressed file that contains the build folder.
> The file should be portable: the user might unpack it in any location, not 
> just /usr/local/. In addition, there should be a way to update the symlinks 
> Current, bin, and doc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (MADLIB-1486) Madlib support python3.0

2022-04-29 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530184#comment-17530184
 ] 

Orhan Kislal commented on MADLIB-1486:
--

Hi [~soar],

I have tried to use your branch 
([https://github.com/soarpenguin/madlib/tree/madlib_py3]) for testing on 
Greenplum 7, but I hit a compilation error very early on.

I wanted to test PG 13 as well, but brew does not have plpython support in its 
PostgreSQL formulas, and the PostgreSQL Docker images are weirdly unintuitive. 
Which OS are you working with?

Thanks

> Madlib support python3.0
> 
>
> Key: MADLIB-1486
> URL: https://issues.apache.org/jira/browse/MADLIB-1486
> Project: Apache MADlib
>  Issue Type: Wish
>Reporter: penguin
>Priority: Critical
>
> I tried to run MADlib on PostgreSQL 13, but it failed. I then found that 
> MADlib only supports Python 2.7, not Python 3. Do you plan to support 
> Python 3?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (MADLIB-1499) madpack install error: libmadlib.so : No such file or directory

2022-05-24 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541746#comment-17541746
 ] 

Orhan Kislal commented on MADLIB-1499:
--

Hello,

Could you give us your cmake command and output as well as a bit more info 
about your system, OS and DB versions?

Thanks

> madpack install error: libmadlib.so : No such file or directory
> ---
>
> Key: MADLIB-1499
> URL: https://issues.apache.org/jira/browse/MADLIB-1499
> Project: Apache MADlib
>  Issue Type: Question
>  Components: Build System, Madpack
>Reporter: seth.qiang
>Priority: Major
>
> Hello everyone, I followed the documentation to compile MADlib, but an 
> error occurs when installing it on Greenplum. How should I solve it? 
> The libmadlib.so file exists.
>  
> {code:java}
> CREATE SCHEMA madlib;
> CREATE SCHEMA
> DROP TABLE IF EXISTS madlib.migrationhistory;
> psql:/tmp/madlib.70V1nQ/madlib_install.sql:2: NOTICE:  table 
> "migrationhistory" does not exist, skipping
> DROP TABLE
> CREATE TABLE madlib.migrationhistory
>                    (id serial, version varchar(255),
>                     applied timestamp default current_timestamp);
> psql:/tmp/madlib.70V1nQ/madlib_install.sql:5: NOTICE:  Table doesn't have 
> 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database 
> data distribution key for this table.
> HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make 
> sure column(s) chosen are the optimal data distribution key to minimize skew.
> CREATE TABLE
> INSERT INTO madlib.migrationhistory(version)
>                             VALUES('1.18.0');
> INSERT 0 1
> CREATE OR REPLACE FUNCTION madlib.array_add(x anyarray, y anyarray)
> RETURNS anyarray
> AS '/apache-madlib-1.18.0-src/build/src/ports/greenplum/6/lib/libmadlib.so', 
> 'array_add'
> LANGUAGE C IMMUTABLE
> NO SQL;
> psql:/tmp/madlib.70V1nQ/madlib_install.sql:23: ERROR:  could not access file 
> "/apache-madlib-1.18.0-src/build/src/ports/greenplum/6/lib/libmadlib.so": No 
> such file or directory {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (MADLIB-1499) madpack install error: libmadlib.so : No such file or directory

2022-05-26 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542595#comment-17542595
 ] 

Orhan Kislal commented on MADLIB-1499:
--

Thanks for the update; feel free to close the issue.

> madpack install error: libmadlib.so : No such file or directory
> ---
>
> Key: MADLIB-1499
> URL: https://issues.apache.org/jira/browse/MADLIB-1499
> Project: Apache MADlib
>  Issue Type: Question
>  Components: Build System, Madpack
>Reporter: seth.qiang
>Priority: Major
>
> Hello everyone, I followed the documentation to compile MADlib, but an 
> error occurs when installing it on Greenplum. How should I solve it? 
> The libmadlib.so file exists.
>  
> {code:java}
> CREATE SCHEMA madlib;
> CREATE SCHEMA
> DROP TABLE IF EXISTS madlib.migrationhistory;
> psql:/tmp/madlib.70V1nQ/madlib_install.sql:2: NOTICE:  table 
> "migrationhistory" does not exist, skipping
> DROP TABLE
> CREATE TABLE madlib.migrationhistory
>                    (id serial, version varchar(255),
>                     applied timestamp default current_timestamp);
> psql:/tmp/madlib.70V1nQ/madlib_install.sql:5: NOTICE:  Table doesn't have 
> 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database 
> data distribution key for this table.
> HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make 
> sure column(s) chosen are the optimal data distribution key to minimize skew.
> CREATE TABLE
> INSERT INTO madlib.migrationhistory(version)
>                             VALUES('1.18.0');
> INSERT 0 1
> CREATE OR REPLACE FUNCTION madlib.array_add(x anyarray, y anyarray)
> RETURNS anyarray
> AS '/apache-madlib-1.18.0-src/build/src/ports/greenplum/6/lib/libmadlib.so', 
> 'array_add'
> LANGUAGE C IMMUTABLE
> NO SQL;
> psql:/tmp/madlib.70V1nQ/madlib_install.sql:23: ERROR:  could not access file 
> "/apache-madlib-1.18.0-src/build/src/ports/greenplum/6/lib/libmadlib.so": No 
> such file or directory {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (MADLIB-1500) install madlib on greenplum could not access file "$libdir/plpython2"

2022-06-08 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551771#comment-17551771
 ] 

Orhan Kislal edited comment on MADLIB-1500 at 6/8/22 8:14 PM:
--

Hi [~hiseth] ,

Could you check the following folder: `$GPHOME/lib/postgresql`. If your gpdb 
compilation worked as expected, you should see the `plpython2.so` file in there 
with the other .so files. If not, I would advise you to check the configure 
output for your gpdb compilation steps. 

You can also check Python availability by creating a new database and running 
`create language plpythonu;`.
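The first check can also be scripted. Below is a small Python sketch that assumes `GPHOME` is set in the environment, as it usually is on a Greenplum host. `has_plpython2` is a hypothetical helper and is not a MADlib or Greenplum tool. It only confirms that the shared library exists on disk. Running `create language plpythonu;` in a fresh database is still the definitive test.

```python
import os


def has_plpython2(gphome):
    """Return True if plpython2.so is present in <gphome>/lib/postgresql."""
    lib_dir = os.path.join(gphome, "lib", "postgresql")
    return os.path.isfile(os.path.join(lib_dir, "plpython2.so"))


if __name__ == "__main__":
    gphome = os.environ.get("GPHOME")
    if gphome is None:
        print("GPHOME is not set")
    elif has_plpython2(gphome):
        print("plpython2.so found; next, try: create language plpythonu;")
    else:
        print("plpython2.so missing; check the configure output of the GPDB build")
```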

Thanks


was (Author: okislal):
Hi [~hiseth] ,

Could you check the following folder: `$GPHOME/lib/postgresql`. If your gpdb 
compilation worked as expected, you should see `plpython2.so` file in there 
with the other .so files. If not, I would advise you to check the configure 
output for your gpdb compilation steps. 

You can check the python availability by creating a new database and running 
`create language plpythonu;` as well.

Thanks

> install madlib on greenplum could not access file "$libdir/plpython2"
> -
>
> Key: MADLIB-1500
> URL: https://issues.apache.org/jira/browse/MADLIB-1500
> Project: Apache MADlib
>  Issue Type: Question
>Reporter: seth.qiang
>Priority: Major
>
> Hi everyone, I am trying to install MADlib on Greenplum. It works on CentOS 7, 
> but it fails with this setup:
> system: ubuntu20.04
> madlib: 1.18.0
> database: greenplum 6.18.2
> {code:java}
> madpack.py: INFO : Detected Greenplum DB version 6.18.2.
> madpack.py: INFO : *** Installing MADlib ***
> madpack.py: INFO : MADlib tools version    = 1.18.0 
> (/home/test/Desktop/gpdb-6.18.2.1-linux/gpdb/src/bin/../madpack/madpack.py)
> madpack.py: INFO : MADlib database version = None (host=localhost:15432, 
> db=test, schema=madlib)
> madpack.py: INFO : Testing PL/Python environment...
> madpack.py: INFO : > Creating language PL/Python...
> SQL command failed: 
> SQL: CREATE LANGUAGE plpythonu; 
> ERROR:  could not access file "$libdir/plpython2": No such file or directory
> : ERROR : False
> madpack.py: ERROR : Cannot create language plpythonu. Please check if you
>                 have configured and installed portid (your platform) with
>                 `--with-python` option. Stopping installation...
> Traceback (most recent call last): {code}
> I found this article: 
> [https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide]
> My GPDB build did use the --python option.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (MADLIB-1500) install madlib on greenplum could not access file "$libdir/plpython2"

2022-06-08 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551771#comment-17551771
 ] 

Orhan Kislal commented on MADLIB-1500:
--

Hi [~hiseth] ,

Could you check the following folder: `$GPHOME/lib/postgresql`. If your gpdb 
compilation worked as expected, you should see the `plpython2.so` file in there 
with the other .so files. If not, I would advise you to check the configure 
output for your gpdb compilation steps. 

You can also check Python availability by creating a new database and running 
`create language plpythonu;`.

Thanks

> install madlib on greenplum could not access file "$libdir/plpython2"
> -
>
> Key: MADLIB-1500
> URL: https://issues.apache.org/jira/browse/MADLIB-1500
> Project: Apache MADlib
>  Issue Type: Question
>Reporter: seth.qiang
>Priority: Major
>
> Hi everyone, I am trying to install MADlib on Greenplum. It works on CentOS 7, 
> but it fails with this setup:
> system: ubuntu20.04
> madlib: 1.18.0
> database: greenplum 6.18.2
> {code:java}
> madpack.py: INFO : Detected Greenplum DB version 6.18.2.
> madpack.py: INFO : *** Installing MADlib ***
> madpack.py: INFO : MADlib tools version    = 1.18.0 
> (/home/test/Desktop/gpdb-6.18.2.1-linux/gpdb/src/bin/../madpack/madpack.py)
> madpack.py: INFO : MADlib database version = None (host=localhost:15432, 
> db=test, schema=madlib)
> madpack.py: INFO : Testing PL/Python environment...
> madpack.py: INFO : > Creating language PL/Python...
> SQL command failed: 
> SQL: CREATE LANGUAGE plpythonu; 
> ERROR:  could not access file "$libdir/plpython2": No such file or directory
> : ERROR : False
> madpack.py: ERROR : Cannot create language plpythonu. Please check if you
>                 have configured and installed portid (your platform) with
>                 `--with-python` option. Stopping installation...
> Traceback (most recent call last): {code}
> I found this article: 
> [https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide]
> My GPDB build did use the --python option.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (MADLIB-1486) Madlib support python3.0

2022-06-13 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553742#comment-17553742
 ] 

Orhan Kislal commented on MADLIB-1486:
--

Unfortunately, I cannot even get past the cmake step on PostgreSQL 13 (on macOS). 
I'll try to get an Ubuntu system to test it once I get the chance. In the 
meantime, could you paste the error you get here?

> Madlib support python3.0
> 
>
> Key: MADLIB-1486
> URL: https://issues.apache.org/jira/browse/MADLIB-1486
> Project: Apache MADlib
>  Issue Type: Wish
>Reporter: penguin
>Priority: Critical
>
> I tried to run MADlib on PostgreSQL 13, but it failed. I then found that 
> MADlib only supports Python 2.7, not Python 3. Do you plan to support 
> Python 3?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (MADLIB-1501) Can not train model larger than 1GB.

2022-06-23 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558172#comment-17558172
 ] 

Orhan Kislal commented on MADLIB-1501:
--

Hi [~Blairruc-pku],

Unfortunately, PostgreSQL has a hard limit of 1GB on the field size: 
[https://www.postgresql.org/docs/current/limits.html].

Since MADlib models have to be stored in a field so that we can iterate over 
them, we cannot work around this issue.

Please note that MADlib is designed to work with both PostgreSQL and Greenplum, 
so we have to consider building partial models and merging them for the 
multiple segments of GPDB. We will continue exploring ideas to overcome this 
problem, but for now, we must abide by this limitation.
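Some context for the numbers in the report: PostgreSQL's allocator rejects any single request larger than `MaxAllocSize`, which is defined as `0x3fffffff` bytes (1 GB minus one byte). That check produces the "invalid memory alloc request size" error. The sketch below uses a hypothetical helper, `fits_in_field`, to show that the 1100478264-byte request in the report is over that limit. Treat it as a rough pre-check only, because the size of a serialized model also depends on how it is encoded.

```python
# PostgreSQL's MaxAllocSize (src/include/utils/memutils.h): 1 GB minus 1 byte.
MAX_ALLOC_SIZE = 0x3FFFFFFF


def fits_in_field(nbytes):
    """Return True if a single value of nbytes bytes is within the limit."""
    return nbytes <= MAX_ALLOC_SIZE


if __name__ == "__main__":
    request = 1100478264  # request size taken from the error in this report
    print(request, "bytes fits:", fits_in_field(request))
    print("over the limit by", request - MAX_ALLOC_SIZE, "bytes")
```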

> Can not train model larger than 1GB.
> 
>
> Key: MADLIB-1501
> URL: https://issues.apache.org/jira/browse/MADLIB-1501
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Xinyi Zhang
>Priority: Major
> Fix For: v1.19.0
>
>
> When I try to train a model larger than 1GB on Greenplum, I get 
> the error below:
> CONTEXT: PL/Python function "madlib_keras_fit"
> ERROR: spiexceptions.InternalError: invalid memory alloc request size 
> 1100478264 (plpy_elog.c:121) .
>  
> But if I use a smaller model, it runs successfully.
> It seems that "SELECT {schema_madlib}.fit_step()" cannot execute when the 
> model is larger than 1GB.
> I set my shared_buffers to 32GB, and the instance has 290GB of memory 
> available. So something may be going wrong with memory allocation in MADlib.
> I did not find any parameters to solve the problem. But since large models 
> are quite common, I think there should be a solution for training models 
> larger than 1GB in MADlib.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (MADLIB-1503) Add multi column support for PageRank

2022-06-23 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1503:


 Summary: Add multi column support for PageRank
 Key: MADLIB-1503
 URL: https://issues.apache.org/jira/browse/MADLIB-1503
 Project: Apache MADlib
  Issue Type: New Feature
Reporter: Orhan Kislal


PageRank should support multiple columns as vertex identifiers
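Here is a minimal sketch of the idea in plain Python. It is not MADlib's SQL implementation. If each vertex id is a tuple of column values, the PageRank algorithm does not change, because it only needs ids that can be hashed and compared. The function name, the damping factor of 0.85, and the even redistribution of rank held by vertices with no out-edges are all assumptions of this sketch.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over (src, dst) edges, where each vertex id
    may be a tuple of column values (a composite key)."""
    vertices = set()
    out_links = {}
    for src, dst in edges:
        vertices.update((src, dst))
        out_links.setdefault(src, []).append(dst)
    if not vertices:
        return {}
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-edges is spread evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if v not in out_links)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in vertices}
        for src, dsts in out_links.items():
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Vertex ids are (region, id) pairs, i.e. two identifier columns.
    edges = [(("us", 1), ("us", 2)), (("us", 2), ("eu", 1)), (("eu", 1), ("us", 1))]
    for vertex, score in sorted(pagerank(edges).items()):
        print(vertex, round(score, 4))
```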



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (MADLIB-1502) Add multi column support for wcc

2022-06-23 Thread Orhan Kislal (Jira)
Orhan Kislal created MADLIB-1502:


 Summary: Add multi column support for wcc
 Key: MADLIB-1502
 URL: https://issues.apache.org/jira/browse/MADLIB-1502
 Project: Apache MADlib
  Issue Type: New Feature
Reporter: Orhan Kislal


WCC should support multiple columns as vertex identifiers
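Weakly connected components work the same way. Below is a hypothetical union-find sketch in plain Python, not MADlib's implementation. Vertex ids are tuples of column values, and edge direction is ignored. Each vertex is labeled with a representative vertex of its component.

```python
def weakly_connected_components(edges):
    """Map each vertex to a representative vertex of its component,
    ignoring edge direction. Vertex ids may be tuples (composite keys)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src
    return {v: find(v) for v in parent}


if __name__ == "__main__":
    edges = [(("a", 1), ("a", 2)), (("a", 3), ("a", 2)), (("b", 1), ("b", 2))]
    print(weakly_connected_components(edges))
```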



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (MADLIB-1438) Failed to install official 1.17.0 RPM on CentOS 7.8

2022-06-23 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal resolved MADLIB-1438.
--
Fix Version/s: v1.19.0
   (was: v1.17)
   Resolution: Fixed

The latest release fixes this issue.

> Failed to install official 1.17.0 RPM on CentOS 7.8
> ---
>
> Key: MADLIB-1438
> URL: https://issues.apache.org/jira/browse/MADLIB-1438
> Project: Apache MADlib
>  Issue Type: Improvement
>Reporter: cchsu
>Priority: Major
> Fix For: v1.19.0
>
>
> Hi,
>  
> I tried to install MADlib 1.17.0 on CentOS 7.8, but it failed.
> {code:java}
> [root@gpdb ~]# yum install -y 
> https://dist.apache.org/repos/dist/release/madlib/1.17.0/apache-madlib-1.17.0-bin-Linux.rpm
> Loaded plugins: fastestmirror
> apache-madlib-1.17.0-bin-Linux.rpm   | 9.4 MB 
>  00:00:00 
> Examining /var/tmp/yum-root-N2Vxio/apache-madlib-1.17.0-bin-Linux.rpm: 
> madlib-1.17.0-1.x86_64
> Marking /var/tmp/yum-root-N2Vxio/apache-madlib-1.17.0-bin-Linux.rpm to be 
> installed
> Resolving Dependencies
> --> Running transaction check
> ---> Package madlib.x86_64 0:1.17.0-1 will be installed
> --> Processing Dependency: m4 >= 1.4 for package: madlib-1.17.0-1.x86_64
> Loading mirror speeds from cached hostfile
>  * base: centos4.zswap.net
>  * epel: d2lzkl7pfhq30w.cloudfront.net
>  * extras: yum.tamu.edu
>  * updates: ftp.ussg.iu.edu
> --> Processing Dependency: libstdc++.so.6(GLIBCXX_3.4.20)(64bit) for package: 
> madlib-1.17.0-1.x86_64
> --> Running transaction check
> ---> Package m4.x86_64 0:1.4.16-10.el7 will be installed
> ---> Package madlib.x86_64 0:1.17.0-1 will be installed
> --> Processing Dependency: libstdc++.so.6(GLIBCXX_3.4.20)(64bit) for package: 
> madlib-1.17.0-1.x86_64
> --> Finished Dependency Resolution
> Error: Package: madlib-1.17.0-1.x86_64 (/apache-madlib-1.17.0-bin-Linux)
>Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
>  You could try using --skip-broken to work around the problem
>  You could try running: rpm -Va --nofiles --nodigest
> [root@gpdb ~]# 
> [root@gpdb ~]# cat /etc/*release
> CentOS Linux release 7.8.2003 (Core)
> NAME="CentOS Linux"
> VERSION="7 (Core)"
> ID="centos"
> ID_LIKE="rhel fedora"
> VERSION_ID="7"
> PRETTY_NAME="CentOS Linux 7 (Core)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:centos:centos:7"
> HOME_URL="https://www.centos.org/"
> BUG_REPORT_URL="https://bugs.centos.org/"
> CENTOS_MANTISBT_PROJECT="CentOS-7"
> CENTOS_MANTISBT_PROJECT_VERSION="7"
> REDHAT_SUPPORT_PRODUCT="centos"
> REDHAT_SUPPORT_PRODUCT_VERSION="7"
> CentOS Linux release 7.8.2003 (Core)
> CentOS Linux release 7.8.2003 (Core)
> [root@gpdb ~]# 
> {code}
> It seems that the latest CentOS 7.8 provides a slightly older version of libstdc++.
> {code:java}
> [root@gpdb ~]# strings /usr/lib64/libstdc++.so.6 | grep LIBCXX
> GLIBCXX_3.4
> GLIBCXX_3.4.1
> GLIBCXX_3.4.2
> GLIBCXX_3.4.3
> GLIBCXX_3.4.4
> GLIBCXX_3.4.5
> GLIBCXX_3.4.6
> GLIBCXX_3.4.7
> GLIBCXX_3.4.8
> GLIBCXX_3.4.9
> GLIBCXX_3.4.10
> GLIBCXX_3.4.11
> GLIBCXX_3.4.12
> GLIBCXX_3.4.13
> GLIBCXX_3.4.14
> GLIBCXX_3.4.15
> GLIBCXX_3.4.16
> GLIBCXX_3.4.17
> GLIBCXX_3.4.18
> GLIBCXX_3.4.19
> GLIBCXX_DEBUG_MESSAGE_LENGTH
> [root@gpdb ~]# 
> {code}
> Is it possible to fix this issue?
>  
> Thanks. :)
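The failure above comes down to one missing symbol version. The RPM requires `GLIBCXX_3.4.20`, but the system libstdc++ only lists versions up to `GLIBCXX_3.4.19`. Below is a hypothetical Python helper that parses the tag list from the `strings` output and checks for a required version. It only compares the listed tags and does not inspect the library itself.

```python
PREFIX = "GLIBCXX_"


def glibcxx_versions(lines):
    """Parse lines like 'GLIBCXX_3.4.19' into version tuples, skipping
    non-version entries such as GLIBCXX_DEBUG_MESSAGE_LENGTH."""
    versions = []
    for line in lines:
        tag = line.strip()
        if not tag.startswith(PREFIX):
            continue
        parts = tag[len(PREFIX):].split(".")
        if all(part.isdigit() for part in parts):
            versions.append(tuple(int(part) for part in parts))
    return versions


def provides(lines, required):
    """True if the listed tags include the required version, e.g. '3.4.20'."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(lines)


if __name__ == "__main__":
    # Abbreviated from the CentOS 7.8 `strings /usr/lib64/libstdc++.so.6` output.
    centos78 = ["GLIBCXX_3.4", "GLIBCXX_3.4.18", "GLIBCXX_3.4.19",
                "GLIBCXX_DEBUG_MESSAGE_LENGTH"]
    print("highest:", max(glibcxx_versions(centos78)))
    print("provides 3.4.20:", provides(centos78, "3.4.20"))
```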



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Closed] (MADLIB-1501) Can not train model larger than 1GB.

2022-07-07 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal closed MADLIB-1501.

Fix Version/s: (was: v1.19.0)
   Resolution: Invalid

> Can not train model larger than 1GB.
> 
>
> Key: MADLIB-1501
> URL: https://issues.apache.org/jira/browse/MADLIB-1501
> Project: Apache MADlib
>  Issue Type: Bug
>  Components: Deep Learning
>Reporter: Xinyi Zhang
>Priority: Major
>
> When I try to train a model larger than 1GB on Greenplum, I get 
> the error below:
> CONTEXT: PL/Python function "madlib_keras_fit"
> ERROR: spiexceptions.InternalError: invalid memory alloc request size 
> 1100478264 (plpy_elog.c:121) .
>  
> But if I use a smaller model, it runs successfully.
> It seems that "SELECT {schema_madlib}.fit_step()" cannot execute when the 
> model is larger than 1GB.
> I set my shared_buffers to 32GB, and the instance has 290GB of memory 
> available. So something may be going wrong with memory allocation in MADlib.
> I did not find any parameters to solve the problem. But since large models 
> are quite common, I think there should be a solution for training models 
> larger than 1GB in MADlib.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (MADLIB-1502) Add multi column support for wcc

2022-07-07 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal resolved MADLIB-1502.
--
Resolution: Fixed

> Add multi column support for wcc
> 
>
> Key: MADLIB-1502
> URL: https://issues.apache.org/jira/browse/MADLIB-1502
> Project: Apache MADlib
>  Issue Type: New Feature
>Reporter: Orhan Kislal
>Priority: Major
>
> WCC should support multiple columns as vertex identifiers



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (MADLIB-1503) Add multi column support for PageRank

2022-07-07 Thread Orhan Kislal (Jira)


 [ 
https://issues.apache.org/jira/browse/MADLIB-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Orhan Kislal resolved MADLIB-1503.
--
Resolution: Fixed

> Add multi column support for PageRank
> -
>
> Key: MADLIB-1503
> URL: https://issues.apache.org/jira/browse/MADLIB-1503
> Project: Apache MADlib
>  Issue Type: New Feature
>Reporter: Orhan Kislal
>Priority: Major
>
> PageRank should support multiple columns as vertex identifiers



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (MADLIB-1505) Support for Windows Platform

2022-07-18 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568133#comment-17568133
 ] 

Orhan Kislal commented on MADLIB-1505:
--

Hi,

Unfortunately, we don't officially support Windows. You might want to try 
compiling from source, but I don't think it will be an easy task.

Thanks

> Support for Windows Platform
> 
>
> Key: MADLIB-1505
> URL: https://issues.apache.org/jira/browse/MADLIB-1505
> Project: Apache MADlib
>  Issue Type: New Feature
>Reporter: Poorna Chandra Raju
>Priority: Major
>
> *Is MADlib available for the Windows platform, at least for Postgres?*
> If not, is it possible to build the existing code on Windows, or is it already 
> on the roadmap?
> Let me know the possibilities and challenges in achieving this task.
> Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (MADLIB-1504) Does madlib support arm platform?

2022-07-18 Thread Orhan Kislal (Jira)


[ 
https://issues.apache.org/jira/browse/MADLIB-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568136#comment-17568136
 ] 

Orhan Kislal commented on MADLIB-1504:
--

Hi,

We haven't tested on ARM, but as long as you have the necessary dependencies, 
it should be possible to compile. Please feel free to post any errors here for 
us to take a look at. Officially supporting ARM would be great!

Thanks

> Does madlib support arm platform? 
> --
>
> Key: MADLIB-1504
> URL: https://issues.apache.org/jira/browse/MADLIB-1504
> Project: Apache MADlib
>  Issue Type: Question
>Reporter: seth.qiang
>Priority: Major
>
> I looked through the documentation and didn't see anything about the ARM 
> platform. Does anyone know about this?
> https://cwiki.apache.org/confluence/display/MADLIB/Database+and+OS+Support



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

