[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r215464110
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
--- End diff --

@fmcquillan99 yes, I was looking at the alternatives. @hpandeycodeit I can 
take care of this. I will push a commit to this PR (you may have to merge it, 
since I am not a co-author on your branch).


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r215462116
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
--- End diff --

Yes I agree that this error message is too vague.  So in retrospect I 
suggest we do check that a valid array type is generated by the expression.   
Sorry if I mislead earlier @hpandeycodeit 

@njayaram2 do you have an existing function that we can use for this check?


---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-05 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/315
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/670/



---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread hpandeycodeit
Github user hpandeycodeit commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r215149356
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
--- End diff --

Added these functions back. 


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread hpandeycodeit
Github user hpandeycodeit commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r215149235
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -264,12 +260,17 @@ def knn(schema_madlib, point_source, 
point_column_name, point_id,
 SELECT test.{test_id} AS {test_id_temp},
 train.{point_id} as train_id,
 {fn_dist}(
-train.{point_column_name},
-test.{test_column_name})
+p_col_name,
+t_col_name)
 AS dist
 {label_out}
-FROM {point_source} AS train,
- {test_source} AS test
+FROM
+(
+SELECT {point_id} , {point_column_name} as 
p_col_name , {label_column_name} from {point_source}
+) train,
+(
+SELECT {test_id} ,{test_column_name} as 
t_col_name from {test_source}
+) test
--- End diff --

Done. 


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread hpandeycodeit
Github user hpandeycodeit commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r215148966
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
--- End diff --

As per the Jira and our discussion, it is okay to fail at distance function 
if the point_column_name and test_column_name params are not arrays. 


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread hpandeycodeit
Github user hpandeycodeit commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r215148640
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
-
-if not array_col_has_no_null(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table has 
some"
-   " NULL values.".format(point_column_name))
-if not array_col_has_no_null(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table has some"
-   " NULL values.".format(test_column_name))
--- End diff --

Added it back. Had to do a minor change in array_col_has_no_null() for the 
expressions. 


---