zhengruifeng commented on issue #26948: [SPARK-30120][ML] LSH 
approxNearestNeighbors should use BoundedPriorityQueue when numNearestNeighbors 
is small
URL: https://github.com/apache/spark/pull/26948#issuecomment-568354256
 
 
   The overall logic is kinda similar to the `recall` & `ranking` stages used in 
many classification scenarios.
   recall: In the computation of `modelSubset`, more candidates than NN are 
selected. This holds even though the comment before 3.0.0 said `Compute threshold to get 
**exact** k elements.` and the one in current master says `Compute threshold to get 
around k elements.`
   Obtaining exactly K elements was never implemented, since a method based on a 
threshold will select at least K elements.
   
   ranking: Then, to get the final top-K items, the candidates filtered by the 
`hashDist` threshold above are ranked by `keyDist`.
   
   I guess in the first part more candidates than NN are needed, no matter 
which selection method is used.
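
   The two stages above can be sketched in plain Python. This is only a rough illustration, not the Spark code: the function `approx_nearest`, the `(id, hash_dist, key_dist)` tuple layout and the sample rows are all made up for the example, and the threshold is taken as the exact k-th smallest `hashDist` rather than however the PR computes it.

```python
import heapq

def approx_nearest(candidates, k):
    """candidates: list of (item_id, hash_dist, key_dist) tuples (hypothetical layout)."""
    if k <= 0 or not candidates:
        return [], []
    # recall: use the k-th smallest hash distance as the threshold.
    # Rows tied at the threshold are all kept, so the subset has at least k rows.
    hash_dists = sorted(c[1] for c in candidates)
    threshold = hash_dists[min(k, len(hash_dists)) - 1]
    subset = [c for c in candidates if c[1] <= threshold]
    # ranking: keep only the k best rows by the true key distance,
    # similar in spirit to a bounded priority queue of size k.
    top_k = heapq.nsmallest(k, subset, key=lambda c: c[2])
    return subset, top_k

rows = [
    ("a", 0.0, 0.9),
    ("b", 0.0, 0.2),
    ("c", 1.0, 0.1),
    ("d", 1.0, 0.5),
    ("e", 2.0, 0.05),
]
subset, top_k = approx_nearest(rows, 3)
print(len(subset))              # size of the recalled subset
print([c[0] for c in top_k])    # ids of the final top-k
```

   With k = 3 the threshold lands on a tie, so four candidates are recalled before ranking. Row `e` has the smallest `keyDist` but is dropped in the recall stage, which is why the result is only approximate.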
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
