[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315

Re-running the failed test, seems to pass now:
```
SELECT * FROM knn_result_list_neighbors ORDER BY id;
```
produces
```
 id |  data   | k_nearest_neighbours
----+---------+----------------------
  1 | {2,1}   | {1,2,3}
  2 | {2,6}   | {5,4,3}
  3 | {15,40} | {7,6,5}
  4 | {12,1}  | {4,5,3}
  5 | {2,90}  | {9,6,7}
  6 | {50,45} | {6,7,8}
(6 rows)
```
LGTM
---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/315 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/672/ ---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user hpandeycodeit commented on the issue: https://github.com/apache/madlib/pull/315 Thanks for the update @fmcquillan99! I have fixed the error above and added a test case as well. ---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 Actually, the earlier issue above ^^^ (where I said `I'm not sure what this is doing`) is OK: forcing all training data to be a single point means that, for any given test point, the distance to every training point is identical. So an arbitrary nearest neighbor could be picked, which is what seems to be happening. So I think just fix the error above and this should be good to go. ---
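To make the tie argument concrete, here is a minimal Python sketch (not MADlib code). It assumes nine training ids as in the `knn_train_data` table from this thread, replaces every training point with the constant `[99, 99]`, and uses a hand-written `squared_dist_norm2` as a stand-in for `madlib.squared_dist_norm2`. All names in the block are illustrative.

```python
def squared_dist_norm2(a, b):
    """Squared Euclidean distance; a hand-written stand-in for this sketch."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Assumption: nine training rows (ids 1..9), all mapped to one constant point,
# as happens when the training column is an expression such as
# 'array[99.]::int[] || array[99]'.
train_ids = list(range(1, 10))
constant_point = [99, 99]
test_point = [2, 1]  # test row with id 1 in the outputs quoted in this thread

# Distance from the test point to every training row.
distances = {tid: squared_dist_norm2(constant_point, test_point) for tid in train_ids}

# All nine distances are equal, so every training row is tied for "nearest".
distinct_distances = set(distances.values())

# Python's sort is stable, so ties keep insertion order here. A SQL
# row_number() ordered only by distance gives no such guarantee among ties.
nearest_by_stable_sort = sorted(train_ids, key=lambda tid: distances[tid])[0]
```

With every distance tied, which id comes back as the nearest neighbor depends only on how ties happen to be ordered, so an output such as `{8}` or `{1}` is not wrong in itself.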
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315

(1) expression for test data array:
```
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
    'knn_train_data',              -- Table of training data
    'data',                        -- Col name of training data
    'id',                          -- Col name of id in train data
    'label',                       -- Training labels
    'knn_test_data',               -- Table of test data
    '3 || ARRAY[4]',               -- Col name of test data
    'id',                          -- Col name of id in test data
    'knn_result_classification',   -- Output table
    3,                             -- Number of nearest neighbors
    True,                          -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'    -- Distance function
);
SELECT * from knn_result_classification ORDER BY id;
```
produces
```
 id | 3 || ARRAY[4] | prediction | k_nearest_neighbours
----+---------------+------------+----------------------
  1 | {3,4}         |          1 | {3,4,5}
  2 | {3,4}         |          1 | {3,4,5}
  3 | {3,4}         |          1 | {3,4,5}
  4 | {3,4}         |          1 | {4,3,5}
  5 | {3,4}         |          1 | {3,4,5}
  6 | {3,4}         |          1 | {4,3,5}
(6 rows)
```
(2) another expression for test data array:
```
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
    'knn_train_data',                  -- Table of training data
    'data',                            -- Col name of training data
    'id',                              -- Col name of id in train data
    'label',                           -- Training labels
    'knn_test_data',                   -- Table of test data
    'array[3.]::int[] || array[4]',    -- Col name of test data
    'id',                              -- Col name of id in test data
    'knn_result_classification',       -- Output table
    3,                                 -- Number of nearest neighbors
    True,                              -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'        -- Distance function
);
SELECT * from knn_result_classification ORDER BY id;
```
produces
```
 id | array[3.]::int[] || array[4] | prediction | k_nearest_neighbours
----+------------------------------+------------+----------------------
  1 | {3,4}                        |          1 | {3,4,5}
  2 | {3,4}                        |          1 | {3,4,5}
  3 | {3,4}                        |          1 | {4,3,5}
  4 | {3,4}                        |          1 | {3,4,5}
  5 | {3,4}                        |          1 | {4,3,5}
  6 | {3,4}                        |          1 | {4,3,5}
(6 rows)
```
so this bit seems to work
---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315

I'm not sure what this is doing:
```
%%sql
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
    'knn_train_data',                    -- Table of training data
    'array[99.]::int[] || array[99]',    -- Col name of training data
    'id',                                -- Col name of id in train data
    'label',                             -- Training labels
    'knn_test_data',                     -- Table of test data
    'data',                              -- Col name of test data
    'id',                                -- Col name of id in test data
    'knn_result_classification',         -- Output table
    1,                                   -- Number of nearest neighbors
    True,                                -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'          -- Distance function
);
SELECT * from knn_result_classification ORDER BY id;
```
produces
```
 id |  data   | prediction | k_nearest_neighbours
----+---------+------------+----------------------
  1 | {2,1}   |          0 | {8}
  2 | {2,6}   |          0 | {8}
  3 | {15,40} |          0 | {8}
  4 | {12,1}  |          0 | {8}
  5 | {2,90}  |          1 | {1}
  6 | {50,45} |          1 | {1}
(6 rows)
```
I get the same result if I do:
```
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
    'knn_train_data',                  -- Table of training data
    'array[0.]::int[] || array[0]',    -- Col name of training data
    'id',                              -- Col name of id in train data
    'label',                           -- Training labels
    'knn_test_data',                   -- Table of test data
    'data',                            -- Col name of test data
    'id',                              -- Col name of id in test data
    'knn_result_classification',       -- Output table
    1,                                 -- Number of nearest neighbors
    True,                              -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'        -- Distance function
);
SELECT * from knn_result_classification ORDER BY id;
```
gives
```
 id |  data   | prediction | k_nearest_neighbours
----+---------+------------+----------------------
  1 | {2,1}   |          0 | {8}
  2 | {2,6}   |          0 | {8}
  3 | {15,40} |          0 | {8}
  4 | {12,1}  |          0 | {8}
  5 | {2,90}  |          1 | {1}
  6 | {50,45} |          1 | {1}
(6 rows)
```
---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user hpandeycodeit commented on the issue: https://github.com/apache/madlib/pull/315 @fmcquillan99 @njayaram2 The issue is here: `{point_id} , {point_column_name} as {p_col_name} {label_column_name} from {point_source}`. I will handle it and add a test case as well. ---
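The `None` in the failing query quoted in this thread is consistent with a Python `None` label being formatted straight into the SQL text. The sketch below reproduces that effect with plain `str.format` and shows one possible way to avoid it: leave the label column out of the select list when no label is given. The placeholder names are borrowed from the snippet above; the helper `build_train_select` is hypothetical and is not the actual MADlib code or the fix that was committed.

```python
def build_train_select(point_id, point_column_name, p_col_name,
                       label_column_name, point_source):
    """Build the training-side SELECT, omitting the label when it is None.

    Hypothetical helper for illustration only.
    """
    columns = [
        point_id,
        "{0} AS {1}".format(point_column_name, p_col_name),
    ]
    # Only add the label column when the caller actually supplied one.
    if label_column_name is not None:
        columns.append(label_column_name)
    return "SELECT {0} FROM {1}".format(", ".join(columns), point_source)


# Naive formatting: a None label is rendered as the text "None", which the
# database then tries to resolve as a column name.
naive_query = "SELECT {0}, {1} AS {2}, {3} FROM {4}".format(
    "id", "data", "p_col", None, "knn_train_data_reg")

# Guarded formatting: no label column is emitted when the label is None.
neighbors_only_query = build_train_select(
    "id", "data", "p_col", None, "knn_train_data_reg")
classification_query = build_train_select(
    "id", "data", "p_col", "label", "knn_train_data")
```

Building the column list first and joining it with commas also sidesteps stray or missing separators when an optional column is absent.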
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user njayaram2 commented on the issue: https://github.com/apache/madlib/pull/315 @fmcquillan99 thanks for testing this out. I can have a look at this issue. ---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315

Load data:
```
DROP TABLE IF EXISTS knn_train_data;
CREATE TABLE knn_train_data (
    id integer,
    data integer[],
    label integer  -- Integer label means for classification
);
INSERT INTO knn_train_data VALUES
(1, '{1,1}', 1),
(2, '{2,2}', 1),
(3, '{3,3}', 1),
(4, '{4,4}', 1),
(5, '{4,5}', 1),
(6, '{20,50}', 0),
(7, '{10,31}', 0),
(8, '{81,13}', 0),
(9, '{1,111}', 0);
```
Run knn to list nearest neighbors only (without doing classification or regression).
```
DROP TABLE IF EXISTS knn_result_list_neighbors;
SELECT * FROM madlib.knn(
    'knn_train_data_reg',          -- Table of training data
    'data',                        -- Col name of training data
    'id',                          -- Col Name of id in train data
    NULL,                          -- NULL training labels means just list neighbors
    'knn_test_data',               -- Table of test data
    'data',                        -- Col name of test data
    'id',                          -- Col name of id in test data
    'knn_result_list_neighbors',   -- Output table
    3                              -- Number of nearest neighbors
);
```
results in error
```
ERROR:  plpy.SPIError: column "none" does not exist
LINE 22: ...b_temp_p_col_name68224549_1536338162_43630832__ , None from ...
                                                              ^
QUERY:
    CREATE TEMP TABLE __madlib_temp_interim_table80328381_1536338162_13726874__ AS
    SELECT * FROM (
        SELECT row_number() over
                (partition by __madlib_temp_test_id_temp32938499_1536338162_42836514__
                 order by __madlib_temp_dist75319423_1536338162_4342866__)
                AS __madlib_temp_r77948314_1536338162_302550__,
            __madlib_temp_test_id_temp32938499_1536338162_42836514__,
            __madlib_temp_train_id51898468_1536338162_48880352__,
            CASE WHEN __madlib_temp_dist75319423_1536338162_4342866__ = 0.0 THEN 100.0
                 ELSE 1.0 / __madlib_temp_dist75319423_1536338162_4342866__
            END AS __madlib_temp_dist_inverse21927322_1536338162_30562226__
        FROM (
            SELECT __madlib_temp_test58595915_1536338162_24414359__.id
                    AS __madlib_temp_test_id_temp32938499_1536338162_42836514__,
                __madlib_temp_train73645570_1536338162_46994036__.id
                    as __madlib_temp_train_id51898468_1536338162_48880352__,
                madlib.squared_dist_norm2(
                    __madlib_temp_p_col_name68224549_1536338162_43630832__,
                    __madlib_temp_t_col_name86464547_1536338162_93305987__)
                    AS __madlib_temp_dist75319423_1536338162_4342866__
            FROM
                (
                SELECT id , data as __madlib_temp_p_col_name68224549_1536338162_43630832__ , None
                from knn_train_data_reg
                ) __madlib_temp_train73645570_1536338162_46994036__,
                (
                SELECT id ,data as __madlib_temp_t_col_name86464547_1536338162_93305987__
                from knn_test_data
                ) __madlib_temp_test58595915_1536338162_24414359__
        ) __madlib_temp_x_temp_table56760627_1536338162_7182722__
    ) __madlib_temp_y_temp_table96170617_1536338162_37029044__
    WHERE __madlib_temp_y_temp_table96170617_1536338162_37029044__.__madlib_temp_r77948314_1536338162_302550__ <= 3
CONTEXT:  Traceback (most recent call last):
  PL/Python function "knn", line 33, in weighted_avg
  PL/Python function "knn", line 305, in knn
PL/Python function "knn"
SQL statement "SELECT madlib.knn( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 ,TRUE, 'madlib.squared_dist_norm2', FALSE)"
PL/pgSQL function "knn" line 4 at assignment
```
---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/315 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/671/ ---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/315 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/670/ ---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/315 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/664/ ---
[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/315 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/659/ ---