huaxingao opened a new pull request #26415: [SPARK-18409][ML] LSH approxNearestNeighbors should use approxQuantile instead of sort
URL: https://github.com/apache/spark/pull/26415

### What changes were proposed in this pull request?
`LSHModel.approxNearestNeighbors` sorts the full dataset on the hashDistance in order to find a threshold. This PR uses `approxQuantile` instead.

### Why are the changes needed?
To improve performance.

### Does this PR introduce any user-facing change?
Yes. `LSH` now extends `HasRelativeError`, so `LSH` and `LSHModel` have new APIs `setRelativeError`/`getRelativeError`.

### How was this patch tested?
Existing tests. Also added a couple of doctests in Python to test the newly added `getRelativeError`.
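
For readers who want the idea in miniature, here is a rough pure-Python sketch of the two ways to pick the hashDistance threshold. It is not the code in this PR and does not use Spark: `exact_threshold`, `sampled_threshold`, `candidate_rows`, and the `sample_size` parameter are hypothetical names, and random sampling is swapped in as a simple stand-in for `approxQuantile` (which is controlled by a relative error rather than a sample size).

```python
import random


def exact_threshold(hash_distances, k):
    """k-th smallest hash distance, found by sorting every value."""
    return sorted(hash_distances)[k - 1]


def sampled_threshold(hash_distances, k, sample_size, seed=0):
    """Estimate the k-th smallest hash distance from a random sample.

    Hypothetical stand-in for an approximate quantile: only the sample
    is sorted, so the result may be slightly above or below the exact value.
    """
    total = len(hash_distances)
    n = min(sample_size, total)
    sample = sorted(random.Random(seed).sample(hash_distances, n))
    # Scale rank k in the full data to a rank in the sample (integer ceiling).
    rank = max(1, -(-k * n // total))
    return sample[min(rank, n) - 1]


def candidate_rows(hash_distances, k, sample_size):
    """Keep the values at or below the estimated threshold."""
    if k >= len(hash_distances):
        return list(hash_distances)  # asking for everything: no threshold needed
    threshold = sampled_threshold(hash_distances, k, sample_size)
    return [d for d in hash_distances if d <= threshold]


if __name__ == "__main__":
    # A shuffled-looking permutation of 0..1023 as toy hash distances.
    data = [(i * 37) % 1024 for i in range(1024)]
    print("exact threshold:", exact_threshold(data, 16))
    print("sampled threshold:", sampled_threshold(data, 16, sample_size=128))
    print("candidates kept:", len(candidate_rows(data, 16, sample_size=128)))
```

The trade-off is the same in spirit as the one described above: the estimated threshold can admit a few more or a few fewer candidates than the exact one, in exchange for not sorting the full dataset.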
