zhengruifeng commented on issue #26415: [SPARK-18409][ML] LSH approxNearestNeighbors should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/26415#issuecomment-552077143

Maybe we can add a new param like `method` that supports several options:
1. exact: the existing method
2. approx: using approxQuantile
3. stack: also an exact method, using `org.apache.spark.util.BoundedPriorityQueue` or `org.apache.spark.ml.recommendation.TopByKeyAggregator`. It only supports a relatively small `numNearestNeighbors` (maybe <1000; this threshold depends on the RAM config) to avoid OOM. However, it should be much faster than approaches 1 and 2.
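
To illustrate the idea behind option 3, here is a minimal sketch of a bounded top-k selection in plain Python. It is not Spark's `BoundedPriorityQueue` or `TopByKeyAggregator`; the function name `top_k_nearest` and the `(distance, row_id)` input shape are assumptions made for this example only. The point is that memory stays proportional to k, not to the number of candidates, which is why k has to stay small.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_nearest(
    distances: Iterable[Tuple[float, int]], k: int
) -> List[Tuple[float, int]]:
    """Return the k (distance, row_id) pairs with the smallest distance.

    Hypothetical helper for illustration; holds at most k items in memory.
    """
    if k <= 0:
        return []
    # heapq is a min-heap, so distances are negated: the root is then the
    # worst (largest-distance) candidate currently kept.
    heap: List[Tuple[float, int]] = []
    for dist, row_id in distances:
        if len(heap) < k:
            heapq.heappush(heap, (-dist, row_id))
        elif -dist > heap[0][0]:
            # Strictly closer than the worst kept candidate: swap it in.
            heapq.heapreplace(heap, (-dist, row_id))
    # Undo the negation and return in ascending distance order.
    return sorted((-neg, row_id) for neg, row_id in heap)


candidates = [(0.9, 1), (0.1, 2), (0.5, 3), (0.3, 4), (0.7, 5)]
nearest = top_k_nearest(candidates, 3)
```

In a distributed setting, one would presumably run this per partition and then merge the partial results with the same function, since the top k of the union is contained in the union of the per-partition top k lists.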