srowen commented on issue #26415: [SPARK-18409][ML] LSH approxNearestNeighbors
should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/26415#issuecomment-552103863
I think that's too much complexity for the caller, and it changes the API. How
about this: start with a quantile that should yield 2x the requested number of
results. Use a fixed relative error that still achieves a good speedup over a
sort. While there are not enough results, double the quantile.
I guess we need to check, if we don't already, that there are more items than
requested nearest neighbors to begin with (i.e. you can't ask for 10 nearest
neighbors from 8 items). Also, cap the quantile at 1 (in which case we return
all items anyway).
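
A rough sketch of the loop described above, in plain Python rather than Spark so it can be run standalone. Everything here is an assumption for illustration: `approx_quantile` is a hypothetical stand-in for Spark's `approxQuantile` (it just takes the exact quantile of a strided subsample), and `approx_nearest`, `rel_err=0.05`, and the list-of-distances input are made-up names and values, not what the PR implements.

```python
def approx_quantile(values, q, rel_err):
    """Hypothetical stand-in for an approximate quantile.

    Takes the exact quantile of a strided subsample, so the cost of the
    sort shrinks as rel_err grows. Not Spark's algorithm.
    """
    step = max(1, int(len(values) * rel_err))
    sample = sorted(values[::step])
    idx = min(len(sample) - 1, int(q * len(sample)))
    return sample[idx]


def approx_nearest(distances, k, rel_err=0.05):
    """Return the k smallest distances using a doubling quantile threshold."""
    if k <= 0:
        raise ValueError("k must be positive")
    n = len(distances)
    # Can't ask for 10 nearest neighbors from 8 items: return everything.
    if k >= n:
        return sorted(distances)

    # Start with a quantile that should yield about 2x the requested results.
    q = min(1.0, 2.0 * k / n)
    while True:
        if q >= 1.0:
            # Quantile capped at 1: all items are candidates.
            candidates = list(distances)
        else:
            threshold = approx_quantile(distances, q, rel_err)
            candidates = [d for d in distances if d <= threshold]
        if len(candidates) >= k or q >= 1.0:
            # Only the (hopefully small) candidate set is sorted.
            return sorted(candidates)[:k]
        # Not enough results: double the quantile, capped at 1.
        q = min(1.0, 2.0 * q)
```

Because every item left out of the candidate set is larger than the threshold, the result matches a full sort whenever at least k candidates survive the filter; the approximation only affects how many rounds the loop takes.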