zhengruifeng commented on issue #26415: [SPARK-18409][ML] LSH 
approxNearestNeighbors should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/26415#issuecomment-552077143
 
 
   Maybe we can add a new param like `method` that supports several options:
   1, exact: the existing method
   2, approx: using approxQuantile
   3, stack: also an exact method, using 
`org.apache.spark.util.BoundedPriorityQueue` or 
`org.apache.spark.ml.recommendation.TopByKeyAggregator`. It only supports a 
relatively small `numNearestNeighbors` (maybe <1000; this threshold depends on 
the RAM config) to avoid OOM. However, it should be much faster than approaches 
1 and 2.
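
   To illustrate why option 3 could be cheaper than a full sort, here is a 
minimal sketch in plain Python (not Spark code). The function names 
`top_k_by_sort` and `top_k_bounded_heap` are hypothetical and only stand in for 
the sort-based approach and a bounded-priority-queue approach; the distances are 
random numbers, not LSH output. The approxQuantile option is not shown.

```python
import heapq
import random


def top_k_by_sort(distances, k):
    """Sort-based selection: order all n distances, keep the k smallest."""
    if k <= 0:
        return []
    return sorted(distances)[:k]


def top_k_bounded_heap(distances, k):
    """Bounded-queue selection: never hold more than k candidates in memory."""
    if k <= 0:
        return []
    heap = []  # max-heap of the k smallest seen so far, stored as negated values
    for d in distances:
        if len(heap) < k:
            heapq.heappush(heap, -d)
        elif d < -heap[0]:
            # d beats the current worst candidate, so swap it in
            heapq.heapreplace(heap, -d)
    return sorted(-x for x in heap)


random.seed(0)
dists = [random.random() for _ in range(10_000)]  # illustrative data only
# Both approaches are exact, so they must agree.
assert top_k_by_sort(dists, 5) == top_k_bounded_heap(dists, 5)
```

   The bounded version does O(n log k) work and keeps only k values at a time, 
which is why it would need `numNearestNeighbors` to stay small.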

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services