[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/315
  
Re-running the failed test, it seems to pass now:

```
SELECT * FROM knn_result_list_neighbors ORDER BY id;
```
produces
```
 id |  data   | k_nearest_neighbours 
----+---------+----------------------
  1 | {2,1}   | {1,2,3}
  2 | {2,6}   | {5,4,3}
  3 | {15,40} | {7,6,5}
  4 | {12,1}  | {4,5,3}
  5 | {2,90}  | {9,6,7}
  6 | {50,45} | {6,7,8}
(6 rows)
```
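
As a sanity check on the table above, here is a minimal brute-force sketch in plain Python (not MADlib code). It assumes the training points are the ones from the "load data" step in this thread and reads the test points off the `data` column of the result; the helper names `squared_dist` and `k_nearest` are made up for illustration, and breaking ties by id is my assumption, since MADlib does not promise any tie order.

```python
# Training points copied from the "load data" step in this thread.
train = {
    1: (1, 1), 2: (2, 2), 3: (3, 3), 4: (4, 4), 5: (4, 5),
    6: (20, 50), 7: (10, 31), 8: (81, 13), 9: (1, 111),
}
# Test points read off the `data` column of the result table.
test = {
    1: (2, 1), 2: (2, 6), 3: (15, 40),
    4: (12, 1), 5: (2, 90), 6: (50, 45),
}

def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_nearest(point, k=3):
    """Ids of the k closest training points; ties broken by id (assumption)."""
    ranked = sorted(train, key=lambda i: (squared_dist(train[i], point), i))
    return ranked[:k]

neighbors = {tid: k_nearest(p) for tid, p in test.items()}
for tid in sorted(neighbors):
    print(tid, neighbors[tid])
```

Under those assumptions the sketch gives the same neighbor ids as the `k_nearest_neighbours` column for all six rows.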

LGTM


---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/315
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/672/



---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread hpandeycodeit
Github user hpandeycodeit commented on the issue:

https://github.com/apache/madlib/pull/315
  
Thanks for the update @fmcquillan99! I have fixed the error above and added 
a test case as well. 


---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/315
  
Actually the earlier issue above ^^^ is OK, where I said `I'm not sure what 
this is doing`, because forcing all training data to be a single point means 
that every training point is at the same distance from a given test point.  
So an arbitrary nearest neighbor could be picked, which is what seems to be 
happening.  

So I think just fix the error above and this should be good to go.
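
To illustrate the point about identical distances, a small sketch in plain Python (not MADlib code). It assumes nine training rows as in this thread, each replaced by the constant point that `array[99.]::int[] || array[99]` evaluates to, and uses `{2,1}` as a sample test row.

```python
def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

train_ids = range(1, 10)    # nine training rows, ids 1..9
constant_point = (99, 99)   # every training row evaluates to this point
test_point = (2, 1)         # one sample test row

# Distance from the test row to each (now identical) training row.
distances = {i: squared_dist(constant_point, test_point) for i in train_ids}

# All nine distances are equal, so every training id is an equally valid
# "nearest" neighbor and the one returned for k=1 is arbitrary.
all_tied = len(set(distances.values())) == 1
print(all_tied)
```

With every candidate tied, which id comes back depends on row ordering inside the database rather than on the data.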


---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/315
  
(1)
expression for test data array:
```
DROP TABLE IF EXISTS knn_result_classification;

SELECT * FROM madlib.knn(
    'knn_train_data',             -- Table of training data
    'data',                       -- Col name of training data
    'id',                         -- Col name of id in train data
    'label',                      -- Training labels
    'knn_test_data',              -- Table of test data
    '3 || ARRAY[4]',              -- Col name of test data
    'id',                         -- Col name of id in test data
    'knn_result_classification',  -- Output table
    3,                            -- Number of nearest neighbors
    True,                         -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'   -- Distance function
);

SELECT * from knn_result_classification ORDER BY id;
```
produces
```
 id | 3 || ARRAY[4] | prediction | k_nearest_neighbours 
----+---------------+------------+----------------------
  1 | {3,4}         |          1 | {3,4,5}
  2 | {3,4}         |          1 | {3,4,5}
  3 | {3,4}         |          1 | {3,4,5}
  4 | {3,4}         |          1 | {4,3,5}
  5 | {3,4}         |          1 | {3,4,5}
  6 | {3,4}         |          1 | {4,3,5}
(6 rows)
```


(2)
another expression for test data array:
```
DROP TABLE IF EXISTS knn_result_classification;

SELECT * FROM madlib.knn(
    'knn_train_data',                -- Table of training data
    'data',                          -- Col name of training data
    'id',                            -- Col name of id in train data
    'label',                         -- Training labels
    'knn_test_data',                 -- Table of test data
    'array[3.]::int[] || array[4]',  -- Col name of test data
    'id',                            -- Col name of id in test data
    'knn_result_classification',     -- Output table
    3,                               -- Number of nearest neighbors
    True,                            -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'      -- Distance function
);

SELECT * from knn_result_classification ORDER BY id;
```
produces
```
 id | array[3.]::int[] || array[4] | prediction | k_nearest_neighbours 
----+------------------------------+------------+----------------------
  1 | {3,4}                        |          1 | {3,4,5}
  2 | {3,4}                        |          1 | {3,4,5}
  3 | {3,4}                        |          1 | {4,3,5}
  4 | {3,4}                        |          1 | {3,4,5}
  5 | {3,4}                        |          1 | {4,3,5}
  6 | {3,4}                        |          1 | {4,3,5}
(6 rows)
```
so this bit seems to work
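
A quick way to see why both `{3,4,5}` and `{4,3,5}` show up: a sketch in plain Python (not MADlib code), assuming the training points from the "load data" step in this thread and the constant test point `{3,4}` that both expressions evaluate to.

```python
# Training points copied from the "load data" step in this thread.
train = {
    1: (1, 1), 2: (2, 2), 3: (3, 3), 4: (4, 4), 5: (4, 5),
    6: (20, 50), 7: (10, 31), 8: (81, 13), 9: (1, 111),
}
query = (3, 4)  # value of the test expression for every test row

# Squared Euclidean distance from the query point to each training point.
dist = {
    i: sum((x - y) ** 2 for x, y in zip(p, query))
    for i, p in train.items()
}

# Three closest ids; ties are broken by id here purely for determinism.
closest = sorted(dist, key=lambda i: (dist[i], i))[:3]
print(closest, dist[3], dist[4], dist[5])
```

Ids 3 and 4 are equally close to `{3,4}` and id 5 is next, so either ordering of 3 and 4 is a correct answer. All three neighbors carry label 1 in the training table, which is consistent with the prediction column.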



---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/315
  
I'm not sure what this is doing:
```
DROP TABLE IF EXISTS knn_result_classification;

SELECT * FROM madlib.knn(
    'knn_train_data',                  -- Table of training data
    'array[99.]::int[] || array[99]',  -- Col name of training data
    'id',                              -- Col name of id in train data
    'label',                           -- Training labels
    'knn_test_data',                   -- Table of test data
    'data',                            -- Col name of test data
    'id',                              -- Col name of id in test data
    'knn_result_classification',       -- Output table
    1,                                 -- Number of nearest neighbors
    True,                              -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'        -- Distance function
);

SELECT * from knn_result_classification ORDER BY id;
``` 
produces
```
 id |  data   | prediction | k_nearest_neighbours 
----+---------+------------+----------------------
  1 | {2,1}   |          0 | {8}
  2 | {2,6}   |          0 | {8}
  3 | {15,40} |          0 | {8}
  4 | {12,1}  |          0 | {8}
  5 | {2,90}  |          1 | {1}
  6 | {50,45} |          1 | {1}
(6 rows)
```

I get the same result if I do:
```
DROP TABLE IF EXISTS knn_result_classification;

SELECT * FROM madlib.knn(
    'knn_train_data',                -- Table of training data
    'array[0.]::int[] || array[0]',  -- Col name of training data
    'id',                            -- Col name of id in train data
    'label',                         -- Training labels
    'knn_test_data',                 -- Table of test data
    'data',                          -- Col name of test data
    'id',                            -- Col name of id in test data
    'knn_result_classification',     -- Output table
    1,                               -- Number of nearest neighbors
    True,                            -- True to list nearest-neighbors by id
    'madlib.squared_dist_norm2'      -- Distance function
);

SELECT * from knn_result_classification ORDER BY id;
```
gives
```
 id |  data   | prediction | k_nearest_neighbours 
----+---------+------------+----------------------
  1 | {2,1}   |          0 | {8}
  2 | {2,6}   |          0 | {8}
  3 | {15,40} |          0 | {8}
  4 | {12,1}  |          0 | {8}
  5 | {2,90}  |          1 | {1}
  6 | {50,45} |          1 | {1}
(6 rows)
```




---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread hpandeycodeit
Github user hpandeycodeit commented on the issue:

https://github.com/apache/madlib/pull/315
  
@fmcquillan99 @njayaram2 

Issue is here 

`{point_id} , {point_column_name} as {p_col_name} {label_column_name} from 
{point_source}`

I will handle it and add a test case as well. 
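
One possible shape for the fix, as a hedged sketch only: it assumes the problem is that a NULL label column reaches this template as Python `None`, and it is not the code that went into the PR. The function name `train_select` and the guard are hypothetical; the parameter names mirror the placeholders quoted above.

```python
def train_select(point_id, point_column_name, p_col_name,
                 label_column_name, point_source):
    """Build the training sub-select for the distance query (illustrative)."""
    # Hypothetical guard: only emit the label column when one was supplied,
    # so a missing label never shows up in the SQL text as "None".
    if label_column_name is None:
        label_clause = ""
    else:
        label_clause = ", {0}".format(label_column_name)
    return "SELECT {0}, {1} AS {2}{3} FROM {4}".format(
        point_id, point_column_name, p_col_name, label_clause, point_source)

# With a label column (classification or regression).
print(train_select("id", "data", "p_col", "label", "knn_train_data"))
# Without a label column (list neighbors only).
print(train_select("id", "data", "p_col", None, "knn_train_data"))
```

The idea is simply to make the label column optional at the point where the SQL string is assembled.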


---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread njayaram2
Github user njayaram2 commented on the issue:

https://github.com/apache/madlib/pull/315
  
@fmcquillan99 thanks for testing this out. I can have a look at this issue.


---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/315
  

load data:
```
DROP TABLE IF EXISTS knn_train_data;

CREATE TABLE knn_train_data (
id integer, 
data integer[], 
label integer  -- Integer label means classification
);

INSERT INTO knn_train_data VALUES
(1, '{1,1}', 1),
(2, '{2,2}', 1),
(3, '{3,3}', 1),
(4, '{4,4}', 1),
(5, '{4,5}', 1),
(6, '{20,50}', 0),
(7, '{10,31}', 0),
(8, '{81,13}', 0),
(9, '{1,111}', 0);
```

run knn to list nearest neighbors only
(without doing classification or regression). 
```
DROP TABLE IF EXISTS knn_result_list_neighbors;

SELECT * FROM madlib.knn(
    'knn_train_data_reg',         -- Table of training data
    'data',                       -- Col name of training data
    'id',                         -- Col Name of id in train data
    NULL,                         -- NULL training labels means just list neighbors
    'knn_test_data',              -- Table of test data
    'data',                       -- Col name of test data
    'id',                         -- Col name of id in test data
    'knn_result_list_neighbors',  -- Output table
    3                             -- Number of nearest neighbors
);
```
results in error
```
ERROR:  plpy.SPIError: column "none" does not exist
LINE 22: ...b_temp_p_col_name68224549_1536338162_43630832__ , None from ...
  ^
QUERY:  
CREATE TEMP TABLE __madlib_temp_interim_table80328381_1536338162_13726874__ AS
SELECT * FROM (
    SELECT row_number() over
            (partition by __madlib_temp_test_id_temp32938499_1536338162_42836514__
             order by __madlib_temp_dist75319423_1536338162_4342866__)
            AS __madlib_temp_r77948314_1536338162_302550__,
        __madlib_temp_test_id_temp32938499_1536338162_42836514__,
        __madlib_temp_train_id51898468_1536338162_48880352__,
        CASE WHEN __madlib_temp_dist75319423_1536338162_4342866__ = 0.0 THEN 100.0
             ELSE 1.0 / __madlib_temp_dist75319423_1536338162_4342866__
        END AS __madlib_temp_dist_inverse21927322_1536338162_30562226__
    FROM (
        SELECT
            __madlib_temp_test58595915_1536338162_24414359__.id
                AS __madlib_temp_test_id_temp32938499_1536338162_42836514__,
            __madlib_temp_train73645570_1536338162_46994036__.id
                as __madlib_temp_train_id51898468_1536338162_48880352__,
            madlib.squared_dist_norm2(
                __madlib_temp_p_col_name68224549_1536338162_43630832__,
                __madlib_temp_t_col_name86464547_1536338162_93305987__)
                AS __madlib_temp_dist75319423_1536338162_4342866__
        FROM
            (
                SELECT id , data as __madlib_temp_p_col_name68224549_1536338162_43630832__ , None from knn_train_data_reg
            ) __madlib_temp_train73645570_1536338162_46994036__,
            (
                SELECT id ,data as __madlib_temp_t_col_name86464547_1536338162_93305987__ from knn_test_data
            ) __madlib_temp_test58595915_1536338162_24414359__
    ) __madlib_temp_x_temp_table56760627_1536338162_7182722__
) __madlib_temp_y_temp_table96170617_1536338162_37029044__
WHERE __madlib_temp_y_temp_table96170617_1536338162_37029044__.__madlib_temp_r77948314_1536338162_302550__ <= 3

CONTEXT:  Traceback (most recent call last):
  PL/Python function "knn", line 33, in 
weighted_avg
  PL/Python function "knn", line 305, in knn
PL/Python function "knn"
SQL statement "SELECT  madlib.knn( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 ,TRUE, 'madlib.squared_dist_norm2', FALSE)"
PL/pgSQL function "knn" line 4 at assignment
```
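
The `, None from` fragment in the generated query suggests that a Python `None` was interpolated straight into the SQL text. A minimal sketch of that mechanism in plain Python follows; the template string is my reconstruction from the query above, not a copy of the MADlib source, and `p_col` stands in for the generated temp column name.

```python
# Reconstructed (assumed) template for the training sub-select.
template = ("SELECT {point_id} , {point_column_name} as {p_col_name} , "
            "{label_column_name} from {point_source}")

sql = template.format(
    point_id="id",
    point_column_name="data",
    p_col_name="p_col",
    label_column_name=None,   # SQL NULL arrives in PL/Python as None
    point_source="knn_train_data_reg",
)
print(sql)

# str.format renders None as the bare word "None". PostgreSQL folds
# unquoted identifiers to lower case, so it looks for a column "none".
has_bare_none = ", None from" in sql
print(has_bare_none)
```

That matches the reported `column "none" does not exist`: the database never sees a NULL, only an identifier it cannot resolve.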



---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-06 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/315
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/671/



---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-05 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/315
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/670/



---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-08-31 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/315
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/664/



---


[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-08-28 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/315
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/659/



---