[GitHub] madlib issue #189: Pivot: Reference "current" PostgreSQL docs instead of 8.2...

2017-10-05 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/189 There is a minor risk in doing this. GPDB does not support all features in the current Postgres and I believe the documentation is trying to redirect both GPDB and Postgres users. ---

[GitHub] madlib issue #189: Pivot: Reference "current" PostgreSQL docs instead of 8.2...

2017-10-06 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/189 Looks great. On Fri, Oct 6, 2017 at 1:03 PM, Ed Espino wrote: > @rahiyer <https://github.com/rahiyer> - Collaborating with Lisa Owen ( > @lisakowen <https://githu

[GitHub] madlib pull request #191: KNN: Fix optional parameters and ordering

2017-10-24 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/191#discussion_r146646897 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -135,13 +135,17 @@ def knn(schema_madlib, point_source, point_column_name, point_id, label_column_n

[GitHub] madlib pull request #191: KNN: Fix optional parameters and ordering

2017-10-24 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/191#discussion_r146646995 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -135,13 +135,17 @@ def knn(schema_madlib, point_source, point_column_name, point_id, label_column_n

[GitHub] madlib pull request #191: KNN: Fix optional parameters and ordering

2017-10-24 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/191#discussion_r146703574 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -215,7 +222,8 @@ def knn(schema_madlib, point_source, point_column_name, point_id, label_column_n

[GitHub] madlib pull request #191: KNN: Fix optional parameters and ordering

2017-10-24 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/191#discussion_r146702725 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -135,13 +135,17 @@ def knn(schema_madlib, point_source, point_column_name, point_id, label_column_n

[GitHub] madlib pull request #191: KNN: Fix optional parameters and ordering

2017-10-24 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/191#discussion_r146702329 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -135,13 +135,17 @@ def knn(schema_madlib, point_source, point_column_name, point_id, label_column_n

[GitHub] madlib pull request #191: KNN: Fix optional parameters and ordering

2017-10-24 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/191#discussion_r146702427 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -135,13 +135,17 @@ def knn(schema_madlib, point_source, point_column_name, point_id, label_column_n

[GitHub] madlib issue #191: KNN: Fix optional parameters and ordering

2017-10-24 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/191 jenkins ok to retest ---

[GitHub] madlib pull request #192: LMF: Disable ORCA to improve the performance

2017-10-27 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/192#discussion_r147543845 --- Diff: src/ports/postgres/modules/convex/lmf_igd.py_in --- @@ -33,40 +34,45 @@ def compute_lmf_igd(schema_madlib, rel_args, rel_state, rel_source

[GitHub] madlib pull request #195: Feature: Add grouping support to HITS

2017-11-13 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/195#discussion_r150637914 --- Diff: src/ports/postgres/modules/utilities/utilities.py_in --- @@ -709,6 +709,17 @@ def _check_groups(tbl1, tbl2, grp_list): return ' AND &

[GitHub] madlib pull request #195: Feature: Add grouping support to HITS

2017-11-13 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/195#discussion_r150638553 --- Diff: src/ports/postgres/modules/utilities/validate_args.py_in --- @@ -262,6 +262,13 @@ def get_first_schema(table_name): return None

[GitHub] madlib issue #195: Feature: Add grouping support to HITS

2017-11-13 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/195 Not sure what the `*_for_centrality_measures` names mean - are those functions only used in centrality measures? ---

[GitHub] madlib pull request #197: Fix madlib version parsing for upgrade

2017-11-13 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/197#discussion_r150640177 --- Diff: src/madpack/upgrade_util.py --- @@ -142,11 +142,11 @@ def _load(self): """ # _mad_dbrev = 1.9.1 -

[GitHub] madlib pull request #197: Fix madlib version parsing for upgrade

2017-11-13 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/197#discussion_r150640686 --- Diff: src/madpack/upgrade_util.py --- @@ -142,11 +142,11 @@ def _load(self): """ # _mad_dbrev = 1.9.1 -

[GitHub] madlib pull request #197: Fix madlib version parsing for upgrade

2017-11-13 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/197#discussion_r150684741 --- Diff: src/madpack/upgrade_util.py --- @@ -142,11 +142,11 @@ def _load(self): """ # _mad_dbrev = 1.9.1 -

[GitHub] madlib pull request #197: Fix madlib version parsing for upgrade

2017-11-13 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/197#discussion_r150697403 --- Diff: src/madpack/upgrade_util.py --- @@ -142,11 +142,11 @@ def _load(self): """ # _mad_dbrev = 1.9.1 -

[GitHub] madlib pull request #200: Madpack: Move unit tests + refactor minor code

2017-11-14 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/200 Madpack: Move unit tests + refactor minor code 1. Unit tests for get_rev_num and is_rev_gte moved to the correct location 2. GPDB 5/6 version extraction made more general 3. Bare except

[GitHub] madlib pull request #201: Allow array feature with more than 1664 entries

2017-11-14 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/201 Allow array feature with more than 1664 entries JIRA: MADLIB-1173 The tree_predict function concatenates cat_feature_str and con_feature_str in summary table to obtain the feature string

[GitHub] madlib pull request #202: Multiple: Add casting to allow compilation in GCC ...

2017-11-15 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/202 Multiple: Add casting to allow compilation in GCC 6+ JIRA: MADLIB-1025 GCC 6+ introduced stricter rules for implicit casting where loss of information is possible. Closes #202

[GitHub] madlib pull request #195: Feature: Add grouping support to HITS

2017-11-16 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/195#discussion_r151526779 --- Diff: src/ports/postgres/modules/graph/hits.py_in --- @@ -95,234 +109,391 @@ def hits(schema_madlib, vertex_table, vertex_id, edge_table, edge_args

[GitHub] madlib pull request #203: Build: Create single binary for all PG10 versions

2017-11-17 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/203 Build: Create single binary for all PG10 versions JIRA: MADLIB-1179 PostgreSQL, starting with 10.0, is switching to semantic versioning (see https://www.postgresql.org/support/versioning
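
The versioning change mentioned in this snippet can be illustrated with a small Python sketch. It is not MADlib's build logic (the PR text is truncated here and the build itself is CMake-based); it only shows the general rule that from PostgreSQL 10 onward the first number alone identifies the major release, while earlier releases used the first two numbers. The function name `pg_major_version` is hypothetical.

```python
import re


def pg_major_version(version):
    """Return the major-release key for a PostgreSQL version string.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    # Keep only the leading dotted numeric part, so "10beta1" becomes "10".
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    parts = [int(p) for p in match.group(0).split(".")]

    # PostgreSQL 10 and later: the first number is the major release.
    if parts[0] >= 10:
        return str(parts[0])

    # Before 10: the first two numbers together form the major release.
    if len(parts) < 2:
        raise ValueError("pre-10 versions need two numbers: %r" % version)
    return "%d.%d" % (parts[0], parts[1])
```

Under this rule, 10.0 and 10.4 share one key while 9.5 and 9.6 do not, which is consistent with the idea of building one binary per major release.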

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-20 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152129858 --- Diff: src/ports/postgres/modules/knn/test/knn.sql_in --- @@ -73,23 +73,23 @@ copy knn_test_data (id, data) from stdin delimiter

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-20 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152123514 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -88,12 +88,28 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-20 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152122999 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -88,12 +88,28 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-20 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152122803 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -88,12 +88,28 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #195: Feature: Add grouping support to HITS

2017-11-20 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/195#discussion_r152136163 --- Diff: src/ports/postgres/modules/utilities/utilities.py_in --- @@ -709,16 +709,35 @@ def _check_groups(tbl1, tbl2, grp_list): return ' AND &

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-21 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152419848 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -89,20 +89,20 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-21 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152420416 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -155,6 +155,9 @@ def knn(schema_madlib, point_source, point_column_name, point_id, label_column_n

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-21 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152420629 --- Diff: src/ports/postgres/modules/knn/test/knn.sql_in --- @@ -73,23 +73,23 @@ copy knn_test_data (id, data) from stdin delimiter

[GitHub] madlib pull request #204: Added additional distance metrics for k-NN: Jira-1...

2017-11-21 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/204#discussion_r152420323 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -89,20 +89,20 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-21 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/205#discussion_r152463529 --- Diff: src/ports/postgres/modules/graph/hits.sql_in --- @@ -103,18 +102,18 @@ It is named by adding the suffix '_summary' to the

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-21 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/205#discussion_r152463581 --- Diff: src/ports/postgres/modules/graph/hits.sql_in --- @@ -103,18 +102,18 @@ It is named by adding the suffix '_summary' to the

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-21 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/205#discussion_r152463751 --- Diff: src/ports/postgres/modules/regress/linear.sql_in --- @@ -183,16 +183,15 @@ FROM ( @par Prediction Function The prediction function is as

[GitHub] madlib issue #205: hits and lin regr doc updates

2017-11-22 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/205 Commit daf67f81b merges this PR. This can be closed. ---

[GitHub] madlib issue #204: Added additional distance metrics for k-NN: Jira-1059

2017-11-30 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/204 Changes look good. I'll merge this. ---

[GitHub] madlib pull request #206: Feature: Allow NULL in rows for computing correlat...

2017-11-30 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/206#discussion_r154242803 --- Diff: src/ports/postgres/modules/stats/correlation.sql_in --- @@ -207,8 +203,9 @@ Result: @par Notes -Current implementation ignores

[GitHub] madlib pull request #206: Feature: Allow NULL in rows for computing correlat...

2017-11-30 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/206#discussion_r154241686 --- Diff: src/ports/postgres/modules/stats/correlation.py_in --- @@ -180,31 +180,29 @@ def _populate_output_table(schema_madlib, source_table, output_table

[GitHub] madlib pull request #206: Feature: Allow NULL in rows for computing correlat...

2017-11-30 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/206#discussion_r154242141 --- Diff: src/ports/postgres/modules/stats/correlation.py_in --- @@ -180,31 +180,29 @@ def _populate_output_table(schema_madlib, source_table, output_table

[GitHub] madlib issue #206: Feature: Allow NULL in rows for computing correlations an...

2017-12-01 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/206 Also please change commit message to better indicate the intention of the commit. Something along the lines of "Correlation: Impute NULL values with mean" ---

[GitHub] madlib pull request #206: Feature: Allow NULL in rows for computing correlat...

2017-12-01 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/206#discussion_r154422395 --- Diff: src/ports/postgres/modules/stats/correlation.py_in --- @@ -165,8 +165,8 @@ def _populate_output_table(schema_madlib, source_table, output_table

[GitHub] madlib pull request #206: Feature: Allow NULL in rows for computing correlat...

2017-12-01 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/206#discussion_r154422562 --- Diff: src/ports/postgres/modules/stats/correlation.sql_in --- @@ -207,8 +204,17 @@ Result: @par Notes -Current implementation ignores

[GitHub] madlib issue #208: correct knn user docs

2017-12-04 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/208 Looks good, will merge this. ---

[GitHub] madlib pull request #211: Change madlib gppkg version string

2017-12-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/211#discussion_r155598556 --- Diff: deploy/CMakeLists.txt --- @@ -82,4 +82,4 @@ cpack_add_component_group(ports file(GLOB PORT_COMPONENTS "${CMAKE_CURRENT_BINARY_DIR}/Compo

[GitHub] madlib pull request #211: Change madlib gppkg version string

2017-12-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/211#discussion_r155598729 --- Diff: deploy/gppkg/CMakeLists.txt --- @@ -2,8 +2,11 @@ # Packaging for Greenplum's

[GitHub] madlib pull request #211: Change madlib gppkg version string

2017-12-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/211#discussion_r155598502 --- Diff: cmake/LinuxUtils.cmake --- @@ -9,3 +9,14 @@ macro(rh_version OUT_VERSION) set(${OUT_VERSION} "${OUT_VERSION}-NOTFOUND")

[GitHub] madlib pull request #211: Change madlib gppkg version string

2017-12-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/211#discussion_r155598453 --- Diff: CMakeLists.txt --- @@ -275,4 +275,3 @@ install(CODE " ${CMAKE_MADLIB_ROOT}/doc ) ") - --

[GitHub] madlib pull request #211: Change madlib gppkg version string

2017-12-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/211#discussion_r155599007 --- Diff: src/ports/greenplum/cmake/GreenplumUtils.cmake --- @@ -17,6 +17,9 @@ function(add_gppkg GPDB_VERSION GPDB_VARIANT GPDB_VARIANT_SHORT

[GitHub] madlib pull request #211: Change madlib gppkg version string

2017-12-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/211#discussion_r155598266 --- Diff: cmake/LinuxUtils.cmake --- @@ -9,3 +9,14 @@ macro(rh_version OUT_VERSION) set(${OUT_VERSION} "${OUT_VERSION}-NOTFOUND")

[GitHub] madlib pull request #211: Change madlib gppkg version string

2017-12-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/211#discussion_r155597507 --- Diff: CMakeLists.txt --- @@ -275,4 +275,3 @@ install(CODE " ${CMAKE_MADLIB_ROOT}/doc ) ") --

[GitHub] madlib pull request #214: Correlation: Fix bug with international characters

2017-12-12 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/214#discussion_r156472963 --- Diff: src/ports/postgres/modules/stats/correlation.py_in --- @@ -179,9 +179,11 @@ def _populate_output_table(schema_madlib, source_table, output_table

[GitHub] madlib issue #215: Modify knn help funtion for easier use : JIRA-1187

2017-12-12 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/215 Is there overlap between this and #213? Maybe consolidate to a single PR? ---

[GitHub] madlib pull request #216: Release: Upgrade to v1.13

2017-12-14 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/216 Release: Upgrade to v1.13 You can merge this pull request into a Git repository by running: $ git pull https://github.com/iyerr3/incubator-madlib feature/upgrade_to_1.13 Alternatively you

[GitHub] madlib issue #216: Release: Upgrade to v1.13

2017-12-17 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/216 Upgrade has been tested with postgres 9.6 and greenplum 4.3, 5.0. Merging this PR. ---

[GitHub] madlib pull request #217: Release: Update RELEASE_NOTES for v1.13

2017-12-17 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/217 Release: Update RELEASE_NOTES for v1.13 JIRA: MADLIB-1189 You can merge this pull request into a Git repository by running: $ git pull https://github.com/iyerr3/incubator-madlib feature

[GitHub] madlib pull request #217: Release: Update RELEASE_NOTES for v1.13

2017-12-18 Thread iyerr3
Github user iyerr3 closed the pull request at: https://github.com/apache/madlib/pull/217 ---

[GitHub] madlib issue #217: Release: Update RELEASE_NOTES for v1.13

2017-12-18 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/217 Closed this in d0ad93d261337661e40312caa9168eb9d6dc761f ---

[GitHub] madlib pull request #219: Multiple: Hard-wire values for construct_array cal...

2017-12-21 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/219 Multiple: Hard-wire values for construct_array calls JIRA: MADLIB-1185 Original investigation and RCA performed by Nikhil Kak and Orhan Kislal Multiple modules called

[GitHub] madlib pull request #221: Multiple: Hard-wire values for construct_array cal...

2017-12-26 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/221 Multiple: Hard-wire values for construct_array calls JIRA: MADLIB-1185 Original investigation and RCA performed by Nikhil Kak and Orhan Kislal Multiple modules called

[GitHub] madlib issue #220: Add more stats to summary function

2018-01-02 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/220 This can be closed since merged by d025bb4609baeb7c7a1d136590780a8fafdee208. ---

[GitHub] madlib issue #222: minor update to summary() user docs

2018-01-03 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/222 +1 ---

[GitHub] madlib pull request #225: Added option for weighted average for both classif...

2018-01-18 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/225#discussion_r162371982 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -211,23 +222,43 @@ def knn(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #225: Added option for weighted average for both classif...

2018-01-18 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/225#discussion_r162369682 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -167,22 +169,31 @@ def knn(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #225: Added option for weighted average for both classif...

2018-01-18 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/225#discussion_r162369645 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -167,22 +169,31 @@ def knn(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #225: Added option for weighted average for both classif...

2018-01-18 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/225#discussion_r162369486 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -167,22 +169,31 @@ def knn(schema_madlib, point_source, point_column_name, point_id

[GitHub] madlib pull request #229: SVM: Add minibatch as a new solver

2018-01-19 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/229 SVM: Add minibatch as a new solver Additional author: Nikhil Kak This work is based on the original work by Xiaocheng Tang in #75. This PR adds two main features: 1. A

[GitHub] madlib pull request #229: SVM: Add minibatch as a new solver

2018-01-22 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/229#discussion_r163022926 --- Diff: src/modules/convex/algo/igd.hpp --- @@ -34,7 +34,10 @@ class IGD { typedef typename Task::model_type model_type; static void

[GitHub] madlib pull request #231: RF: Output non-negative importance values

2018-02-06 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/231 RF: Output non-negative importance values Variable importance is computed in RF as the difference in prediction accuracy between original data and permuted data from out-of-bag samples (OOB

[GitHub] madlib issue #231: RF: Output non-negative importance values

2018-02-06 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/231 This change ensures that all variable importance values are positive. The remaining properties remain as is: i.e. the feature with max value is most important and the values are not normalized

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-15 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/235#discussion_r168523662 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -355,6 +355,19 @@ tree_train( independent_var_types

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-15 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/235#discussion_r168523757 --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in --- @@ -208,13 +208,26 @@ forest_train(training_table_name

[GitHub] madlib pull request #236: DT: Ensure n_folds and null_proxy are set correctl...

2018-02-15 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/236 DT: Ensure n_folds and null_proxy are set correctly The summary table in Decision Tree included two entries: k and null_proxy. The 'k' value is supposed to reflect the 'n_folds

[GitHub] madlib pull request #234: Create lower case column name in encode_categorica...

2018-02-15 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/234#discussion_r168525141 --- Diff: src/ports/postgres/modules/utilities/encode_categorical.py_in --- @@ -317,7 +317,19 @@ class CategoricalEncoder(object): if

[GitHub] madlib pull request #232: Multiple LDA improvements and fixes

2018-02-15 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/232#discussion_r168526426 --- Diff: src/ports/postgres/modules/lda/lda.py_in --- @@ -120,14 +120,22 @@ class LDATrainer: # etime = time.time() # plpy.notice

[GitHub] madlib pull request #:

2018-02-16 Thread iyerr3
Github user iyerr3 commented on the pull request: https://github.com/apache/madlib/commit/b3d528c44c01f507cd18e1676d65698a46366b10#commitcomment-27615498 In src/ports/postgres/modules/utilities/encode_categorical.py_in: In src/ports/postgres/modules/utilities

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-16 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/234 LGTM ---

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-20 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/234 I've pushed a commit (912a4d629) to `madlib/madlib` repo, branch `feature/encode_categorial_column_name_change` to address above issues. That commit can be cherry-picked here to continue with this PR. ---

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-21 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/234 Please note in the above example `height>.10_false` and `height>.10_true` will have to be double quoted when referred to. The lower case `false` and `true` does not eliminate the quotes i
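
The quoting point in this comment can be sketched in Python. This is an illustration of the general PostgreSQL identifier rule, not the code in `encode_categorical.py_in`: a name can be written unquoted only if it starts with a lower-case letter or underscore and contains only lower-case letters, digits, underscores, or dollar signs. The helper names `needs_quoting` and `quote_ident_sketch` are hypothetical, and reserved keywords are deliberately not checked.

```python
import re

# Names matching this pattern can be referenced without double quotes
# (assumption of this sketch: reserved keywords are ignored).
_PLAIN_IDENT = re.compile(r"^[a-z_][a-z0-9_$]*$")


def needs_quoting(name):
    """Return True if the column name must be double quoted in SQL."""
    return _PLAIN_IDENT.match(name) is None


def quote_ident_sketch(name):
    """Double quote a name when required, doubling any embedded quotes."""
    if not needs_quoting(name):
        return name
    return '"' + name.replace('"', '""') + '"'
```

For example, `height>.10_false` still needs quotes because of the `>` and `.` characters, whatever the case of the `false` suffix, whereas a name such as `height_true` does not.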

[GitHub] madlib pull request #238: MLP: Use array_upper to get the last array element

2018-02-21 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/238 MLP: Use array_upper to get the last array element JIRA: MADLIB-1209 PostgreSQL arrays can be indexed in an arbitrary range. Hence, array_length is not necessarily the last element of
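
The distinction between `array_length` and `array_upper` can be modeled with a short Python sketch. It is a toy model of a non-empty one-dimensional PostgreSQL array with a custom lower bound, not MADlib code; the class `PgArray1D` and its methods are hypothetical. In PostgreSQL itself, an array literal such as `'[5:7]={10,20,30}'::int[]` has length 3 but upper bound 7.

```python
class PgArray1D(object):
    """Toy model of a non-empty 1-D PostgreSQL array (illustration only)."""

    def __init__(self, lower, values):
        self.lower = lower          # subscript of the first element
        self.values = list(values)  # element storage, 0-based in Python

    def array_length(self):
        # Number of elements, independent of the subscript range.
        return len(self.values)

    def array_upper(self):
        # Subscript of the last element.
        return self.lower + len(self.values) - 1

    def get(self, subscript):
        # Out-of-range subscripts yield NULL in PostgreSQL; None here.
        if subscript < self.lower or subscript > self.array_upper():
            return None
        return self.values[subscript - self.lower]


arr = PgArray1D(5, [10, 20, 30])
last_by_length = arr.get(arr.array_length())  # subscript 3 is out of range
last_by_upper = arr.get(arr.array_upper())    # subscript 7 is the last one
```

With the default lower bound of 1 the two subscripts coincide, which is why indexing by length appears to work until an array with a shifted range shows up.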

[GitHub] madlib pull request #237: Bugfix: MLP predict using 1.12 model fails on late...

2018-02-22 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/237#discussion_r169846145 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -796,14 +807,34 @@ def mlp_predict(schema_madlib, else: # if not

[GitHub] madlib pull request #237: Bugfix: MLP predict using 1.12 model fails on late...

2018-02-22 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/237#discussion_r169846321 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -796,14 +807,34 @@ def mlp_predict(schema_madlib, else: # if not

[GitHub] madlib pull request #237: Bugfix: MLP predict using 1.12 model fails on late...

2018-02-22 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/237#discussion_r169845958 --- Diff: src/ports/postgres/modules/convex/mlp_igd.py_in --- @@ -749,8 +749,18 @@ def mlp_predict(schema_madlib, summary['layer_

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-06 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/239 Balance Sample: Add support for grouping JIRA: MADLIB-1168 This commit adds grouping support for balanced sampling. Grouping is implemented as a loop over the existing logic, with

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172941914 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25 rows

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172943311 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25 rows

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173080410 --- Diff: src/ports/postgres/modules/sample/balance_sample.py_in --- @@ -58,28 +60,64 @@ NOSAMPLE = 'nosample' NEW_ID_COLUMN = 

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173080726 --- Diff: src/ports/postgres/modules/sample/balance_sample.py_in --- @@ -468,81 +544,107 @@ def balance_sample(schema_madlib, source_table, output_table

[GitHub] madlib pull request #242: PCA: Fix issue with text grouping col input

2018-03-14 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/242 PCA: Fix issue with text grouping col input JIRA: MADLIB-1215 PCA fails when the grouping column is a text column (a common use case). This is because the column is compared to its

[GitHub] madlib pull request #246: DT user doc updates

2018-03-20 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r175924018 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -127,7 +132,11 @@ tree_train( weights (optional

[GitHub] madlib pull request #246: DT user doc updates

2018-03-20 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r175927937 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -418,7 +468,10 @@ tree_predict(tree_model, new_data_table

[GitHub] madlib pull request #247: SVM: Revert minibatch-related work

2018-03-22 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/247 SVM: Revert minibatch-related work This commit is a partial revert of a8bbe08. Minibatch was not found to be useful for SVM and is broken due to recent changes. We're disablin

[GitHub] madlib issue #247: SVM: Revert minibatch-related work

2018-03-22 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/247 @kaknikhil Yes, both those changes are related to MLP and ideally should have been in a different commit. We can still make it that way by reverting them here and then reintroducing these

[GitHub] madlib pull request #248: DT: Ensure proper quoting in grouping coalesce

2018-03-22 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/248 DT: Ensure proper quoting in grouping coalesce JIRA: MADLIB-1217 Grouping column value is coalesced with null_proxy to get the right null identifier when null_as_category is True. The

[GitHub] madlib issue #248: DT: Ensure proper quoting in grouping coalesce

2018-03-23 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/248 I've combined two independent commits into a single PR, since the changes in the commits are close by and would benefit from the same reviewer looking at both changes. ---

[GitHub] madlib issue #248: DT: Ensure proper quoting in grouping coalesce

2018-03-23 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/248 Commits 97064e2 and 2ad0bf7 will eventually be combined in ffd6355 before merging. ---

[GitHub] madlib pull request #246: DT user doc updates

2018-03-23 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r176661267 --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in --- @@ -97,327 +264,220 @@ forest_train(training_table_name

[GitHub] madlib pull request #246: DT user doc updates

2018-03-23 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r176660561 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -127,18 +128,20 @@ tree_train( grouping_cols (optional

[GitHub] madlib pull request #249: RF: Use NULL::integer[] when no continuous feature...

2018-03-25 Thread iyerr3
GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/249 RF: Use NULL::integer[] when no continuous features JIRA: MADLIB-1219 When variable importance is enabled, to compute importance score, distribution of the categorical and continuous

[GitHub] madlib pull request #248: DT: Ensure proper quoting in grouping coalesce

2018-03-27 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request: https://github.com/apache/madlib/pull/248#discussion_r177503935 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in --- @@ -970,16 +970,35 @@ def _get_bins_grps

[GitHub] madlib issue #249: RF: Use NULL::integer[] when no continuous features

2018-03-27 Thread iyerr3
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/249 @kaknikhil Thanks for the suggestion. It makes sense to use that function with integer types excluded (since DT does not treat integer as continuous). I'll add that as a separate commit since
