zhengruifeng edited a comment on issue #26415: [SPARK-18409][ML] LSH 
approxNearestNeighbors should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/26415#issuecomment-552077143
 
 
   Maybe we can add a new param like `method` that supports several options:
   1, exact: the existing method
   2, approx: using approxQuantile
   3, stack: also an exact method, using 
`org.apache.spark.util.BoundedPriorityQueue` or 
`org.apache.spark.ml.recommendation.TopByKeyAggregator`. It only supports a 
relatively small `numNearestNeighbors` (maybe <1000; this threshold depends on 
the RAM config) to avoid OOM. 
   `numNearestNeighbors` is usually a small number, so option 3 should be much 
faster than approaches 1 and 2.
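
   To illustrate the idea behind option 3, here is a minimal sketch of 
bounded top-k selection in plain Python. It is not Spark code and does not use 
`BoundedPriorityQueue` or `TopByKeyAggregator`; the function name `k_nearest` 
and the `(row_id, distance)` input shape are assumptions made for this example. 
The point is only that a heap capped at k entries finds the exact k nearest 
rows in one pass, holding O(k) candidates in memory instead of sorting 
everything.

```python
import heapq
from typing import Iterable, List, Tuple


def k_nearest(distances: Iterable[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (row_id, distance) pairs with the smallest distance, nearest first.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    if k <= 0:
        return []
    # Max-heap on distance, emulated by negating it. It never holds more
    # than k entries, so memory use is bounded by k rather than by input size.
    heap: List[Tuple[float, str]] = []
    for row_id, dist in distances:
        if len(heap) < k:
            heapq.heappush(heap, (-dist, row_id))
        elif -dist > heap[0][0]:
            # Closer than the current worst candidate: replace it.
            heapq.heapreplace(heap, (-dist, row_id))
    # Undo the negation and order the survivors from nearest to farthest.
    return sorted(((row_id, -neg) for neg, row_id in heap), key=lambda p: p[1])


if __name__ == "__main__":
    rows = [("a", 3.0), ("b", 1.0), ("c", 2.0), ("d", 0.5)]
    print(k_nearest(rows, 2))
```

   Each row costs O(log k) work, which is why this should only be attractive 
while `numNearestNeighbors` stays small.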

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
