[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...
Github user njayaram2 commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215464110 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id, if label_column_name and label_column_name.strip(): cols_in_tbl_valid(point_source, [label_column_name], 'kNN') -cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN') -cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN') - -if not is_col_array(point_source, point_column_name): -plpy.error("kNN Error: Feature column '{0}' in train table is not" - " an array.".format(point_column_name)) -if not is_col_array(test_source, test_column_name): -plpy.error("kNN Error: Feature column '{0}' in test table is not" - " an array.".format(test_column_name)) --- End diff -- @fmcquillan99 yes, I was looking at the alternatives. @hpandeycodeit I can take care of this. I will push a commit to this PR (you may have to merge it, since I am not a co-author on your branch). ---
[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215462116 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id, if label_column_name and label_column_name.strip(): cols_in_tbl_valid(point_source, [label_column_name], 'kNN') -cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN') -cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN') - -if not is_col_array(point_source, point_column_name): -plpy.error("kNN Error: Feature column '{0}' in train table is not" - " an array.".format(point_column_name)) -if not is_col_array(test_source, test_column_name): -plpy.error("kNN Error: Feature column '{0}' in test table is not" - " an array.".format(test_column_name)) --- End diff -- Yes I agree that this error message is too vague. So in retrospect I suggest we do check that a valid array type is generated by the expression. Sorry if I mislead earlier @hpandeycodeit @njayaram2 do you have an existing function that we can use for this check? ---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/315 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/670/ ---
[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...
Github user hpandeycodeit commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215149356 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id, if label_column_name and label_column_name.strip(): cols_in_tbl_valid(point_source, [label_column_name], 'kNN') -cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN') -cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN') --- End diff -- Added these functions back. ---
[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...
Github user hpandeycodeit commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215149235 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -264,12 +260,17 @@ def knn(schema_madlib, point_source, point_column_name, point_id, SELECT test.{test_id} AS {test_id_temp}, train.{point_id} as train_id, {fn_dist}( -train.{point_column_name}, -test.{test_column_name}) +p_col_name, +t_col_name) AS dist {label_out} -FROM {point_source} AS train, - {test_source} AS test +FROM +( +SELECT {point_id} , {point_column_name} as p_col_name , {label_column_name} from {point_source} +) train, +( +SELECT {test_id} ,{test_column_name} as t_col_name from {test_source} +) test --- End diff -- Done. ---
[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...
Github user hpandeycodeit commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215148966 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id, if label_column_name and label_column_name.strip(): cols_in_tbl_valid(point_source, [label_column_name], 'kNN') -cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN') -cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN') - -if not is_col_array(point_source, point_column_name): -plpy.error("kNN Error: Feature column '{0}' in train table is not" - " an array.".format(point_column_name)) -if not is_col_array(test_source, test_column_name): -plpy.error("kNN Error: Feature column '{0}' in test table is not" - " an array.".format(test_column_name)) --- End diff -- As per the Jira and our discussion, it is okay to fail at distance function if the point_column_name and test_column_name params are not arrays. ---
[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...
Github user hpandeycodeit commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215148640 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, point_id, if label_column_name and label_column_name.strip(): cols_in_tbl_valid(point_source, [label_column_name], 'kNN') -cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN') -cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN') - -if not is_col_array(point_source, point_column_name): -plpy.error("kNN Error: Feature column '{0}' in train table is not" - " an array.".format(point_column_name)) -if not is_col_array(test_source, test_column_name): -plpy.error("kNN Error: Feature column '{0}' in test table is not" - " an array.".format(test_column_name)) - -if not array_col_has_no_null(point_source, point_column_name): -plpy.error("kNN Error: Feature column '{0}' in train table has some" - " NULL values.".format(point_column_name)) -if not array_col_has_no_null(test_source, test_column_name): -plpy.error("kNN Error: Feature column '{0}' in test table has some" - " NULL values.".format(test_column_name)) --- End diff -- Added it back. Had to do a minor change in array_col_has_no_null() for the expressions. ---