[GitHub] [spark] srowen commented on issue #26415: [SPARK-18409][ML] LSH approxNearestNeighbors should use approxQuantile instead of sort

2019-11-20 Thread GitBox
srowen commented on issue #26415: [SPARK-18409][ML] LSH approxNearestNeighbors 
should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/26415#issuecomment-556023514
 
 
   Merged to master


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on issue #26415: [SPARK-18409][ML] LSH approxNearestNeighbors should use approxQuantile instead of sort

2019-11-09 Thread GitBox
srowen commented on issue #26415: [SPARK-18409][ML] LSH approxNearestNeighbors 
should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/26415#issuecomment-552103863
 
 
   I think that's too much complexity for the caller, and it changes the API. How 
about this: start with a quantile that should yield 2x the requested number of 
results, and use a fixed relative error that still achieves a good speedup over 
a sort. While there are not enough results, double the quantile.
   
   I guess we need to check, if we don't already, that there are more items than 
requested nearest neighbors to begin with (e.g. you can't ask for 10 nearest 
neighbors from 8 items). Also, cap the quantile at 1 (in which case all items 
are returned anyway).
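
   A minimal sketch of the loop proposed above, in plain Python rather than against the Spark API. The names `sampled_quantile` and `approx_nearest` are hypothetical, and the sample-based quantile is only an assumed stand-in for `approxQuantile` with a fixed relative error; it is not how the PR implements it.

```python
import math
import random


def sampled_quantile(values, q, sample_size=200, seed=0):
    """Hypothetical stand-in for an approximate quantile.

    Estimates the q-quantile from a fixed-size random sample, so the cost
    does not grow with a full sort of `values`. The sample size plays the
    role of the fixed relative error.
    """
    if q >= 1.0:
        return max(values)  # quantile 1 must admit every item
    rng = random.Random(seed)
    sample = values if len(values) <= sample_size else rng.sample(values, sample_size)
    ordered = sorted(sample)
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]


def approx_nearest(distances, k):
    """Return the k smallest distances using a doubling quantile threshold."""
    n = len(distances)
    if k >= n:
        # Fewer items than requested neighbors: return everything.
        return sorted(distances)
    # Start with a quantile that should yield about 2x the requested results.
    q = min(1.0, 2.0 * k / n)
    while True:
        threshold = sampled_quantile(distances, q)
        candidates = [d for d in distances if d <= threshold]
        if len(candidates) >= k or q >= 1.0:
            # Only the (small) candidate set is sorted, not the whole input.
            return sorted(candidates)[:k]
        # Not enough results: double the quantile, capped at 1.
        q = min(1.0, 2.0 * q)
```

   Calling `approx_nearest(list_of_distances, 10)` filters by the estimated threshold first and sorts only the survivors. Because the quantile is capped at 1, the loop always terminates with at least `k` candidates when there are more than `k` items.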

