[GitHub] madlib issue #317: Fixed trailing whitespace in many sql_in files

2018-08-31 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/317
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/667/



---


[GitHub] madlib issue #316: Build: Disable AppendOnly if available

2018-08-31 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/316
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/666/



---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214485090
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
--- End diff --

`point_column_name` and `test_column_name` params must be an array as this 
if check suggests. If it's not an array it fails further down when the distance 
function (such as `squared_dist_norm2`) is called.

I don't think the function `is_var_valid()` checks for these being arrays. 
You may have to check them after the new asserts, using a new helper function 
(`is_col_array()` cannot be used as is for expressions, and `is_var_valid()` 
does not check for an array, but just the validity of the expression)


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214487281
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
-
-if not array_col_has_no_null(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table has 
some"
-   " NULL values.".format(point_column_name))
-if not array_col_has_no_null(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table has some"
-   " NULL values.".format(test_column_name))
+
+_assert(is_var_valid(point_source, point_column_name),
+"kNN error: Invalid point_column_name expression")
--- End diff --

Can we change the error message to indicate invalid column name or 
expression? How about 
`"kNN error: {0} is an invalid column name or expression for 
point_column_name param".format(point_column_name)`?


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214484071
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
--- End diff --

`point_id` and `test_id` params are not validated anymore. This should 
still be done, since the new asserts only check for `point_column_name` and 
`test_column_name` being valid expressions. This validation is required to 
catch invalid values such as `NULL`, `''`, invalid or non-existing column name 
etc. for `point_id` and `test_id` params.


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214485621
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
-
-if not array_col_has_no_null(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table has 
some"
-   " NULL values.".format(point_column_name))
-if not array_col_has_no_null(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table has some"
-   " NULL values.".format(test_column_name))
--- End diff --

I think we will need to keep these asserts too. They seem to be checking 
for a different validation. If the helper function `array_col_has_no_null()` 
does not work for expressions, please go ahead and change that, or create a new 
helper function to handle the scenario (both expressions and column names).


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214487426
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, 
point_column_name, point_id,
 
 if label_column_name and label_column_name.strip():
 cols_in_tbl_valid(point_source, [label_column_name], 'kNN')
-cols_in_tbl_valid(point_source, (point_column_name, point_id), 'kNN')
-cols_in_tbl_valid(test_source, (test_column_name, test_id), 'kNN')
-
-if not is_col_array(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table is not"
-   " an array.".format(point_column_name))
-if not is_col_array(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table is not"
-   " an array.".format(test_column_name))
-
-if not array_col_has_no_null(point_source, point_column_name):
-plpy.error("kNN Error: Feature column '{0}' in train table has 
some"
-   " NULL values.".format(point_column_name))
-if not array_col_has_no_null(test_source, test_column_name):
-plpy.error("kNN Error: Feature column '{0}' in test table has some"
-   " NULL values.".format(test_column_name))
+
+_assert(is_var_valid(point_source, point_column_name),
+"kNN error: Invalid point_column_name expression")
+
+_assert(is_var_valid(test_source, test_column_name),
+"kNN error: Invalid test_column_name expression")
--- End diff --

Similar to the above comment, can we change the error message to:
`"kNN error: {0} is an invalid column name or expression for 
test_column_name param".format(test_column_name)`?


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214482318
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -264,12 +260,17 @@ def knn(schema_madlib, point_source, 
point_column_name, point_id,
 SELECT test.{test_id} AS {test_id_temp},
 train.{point_id} as train_id,
 {fn_dist}(
-train.{point_column_name},
-test.{test_column_name})
+p_col_name,
+t_col_name)
 AS dist
 {label_out}
-FROM {point_source} AS train,
- {test_source} AS test
+FROM
+(
+SELECT {point_id} , {point_column_name} as 
p_col_name , {label_column_name} from {point_source}
+) train,
+(
+SELECT {test_id} ,{test_column_name} as 
t_col_name from {test_source}
+) test
--- End diff --

Can you please use variables with unique strings for `train`, `test`, 
`p_col_name` and `t_col_name`. If the train or test table is named any of 
those, the query would fail I guess.
While you are at it, could you also do the same for other variables in this 
query: `train_id`, `r`, `dist_inverse` and others I may have missed listing out?


---


[GitHub] madlib issue #316: Build: Disable AppendOnly if available

2018-08-31 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/316
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/665/



---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-08-31 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/315
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/664/



---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread hpandeycodeit
Github user hpandeycodeit commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214415305
  
--- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
@@ -122,5 +122,13 @@ select 
knn('knn_train_data_reg','data','id','label','knn_test_data','data','id',
 select assert(array_agg(prediction::numeric order by 
id)='{1,1,0.0408728591876018,1,0,0}', 'Wrong output in regression') from 
madlib_knn_result_regression;
 
 
+drop table if exists madlib_knn_result_classification;
+select 
knn('knn_train_data','data[1:1]||data[2:2]','id','label','knn_test_data','data[1:1]||data[2:2]','id','madlib_knn_result_classification',3,False,'MADLIB_SCHEMA.squared_dist_norm2',
 True);
+select assert(array_agg(prediction::numeric order by id)='{1,1,0,1,0,0}', 
'Wrong output in classification') from madlib_knn_result_classification;
+
+drop table if exists madlib_knn_result_regression;
+select 
knn('knn_train_data_reg','data[1:1]||data[2:2]','id','label','knn_test_data','data[1:1]||data[2:2]','id','madlib_knn_result_regression',3,False,'MADLIB_SCHEMA.squared_dist_norm2',
 True);
+select assert(array_agg(prediction::numeric order by 
id)='{1,1,0.0408728591876018,1,0,0}', 'Wrong output in regression') from 
madlib_knn_result_regression;
+
--- End diff --

Added this test as well along with other minor changes as suggested. 


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread hpandeycodeit
Github user hpandeycodeit commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214415227
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -290,11 +303,11 @@ def knn(schema_madlib, point_source, 
point_column_name, point_id,
 ON knn_temp.{test_id_temp} = knn_test.{test_id}
 {view_join}
 GROUP BY knn_temp.{test_id_temp},
- knn_test.{test_column_name}
+{test_column_name}
  {view_grp_by}
 """
 plpy.execute(sql.format(**locals()))
-plpy.execute("DROP TABLE IF EXISTS {0}".format(interim_table))
+# plpy.execute("DROP TABLE IF EXISTS {0}".format(interim_table))
--- End diff --

I pushed by mistake. I have reverted the changes. 


---


[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-08-31 Thread hpandeycodeit
Github user hpandeycodeit commented on a diff in the pull request:

https://github.com/apache/madlib/pull/315#discussion_r214414914
  
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -264,12 +275,14 @@ def knn(schema_madlib, point_source, 
point_column_name, point_id,
 SELECT test.{test_id} AS {test_id_temp},
 train.{point_id} as train_id,
 {fn_dist}(
-train.{point_column_name},
-test.{test_column_name})
+train.{point_col_name_temp},
+test.{test_col_name_temp})
 AS dist
 {label_out}
-FROM {point_source} AS train,
- {test_source} AS test
+FROM
+ {point_source_temp_table} as train,
+ {test_source_temp_table} as test
--- End diff --

Replaced the temp tables with subqueries. 


---


[GitHub] madlib issue #316: Build: Disable AppendOnly if available

2018-08-31 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/316
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/663/



---