Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
> One way to look at it is that (a) will contain many duplicates in the L
> sets of points, so (b) is more likely to have higher precision and recall.
I think this might be where we are not on the same page. I consider the
output of (a)/(b) to be our "Probing Sequence" (or "Probing buckets"), and in
the next step we pick and return the k nearest keys in those buckets. Do you
agree with this part?
If you agree, then I claim that more duplicates (it's actually redundancy
rather than duplication) give us a better chance of finding the correct k
nearest neighbors, because they enlarge our search range.
If you disagree, then I think we are not discussing the same NN search
implementation (one that differs from the current implementation). I would
like to know how you would return the k nearest neighbors after (b).
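
For concreteness, here is a minimal sketch (in Scala, not the actual code in
this PR) of the two-step search I have in mind; `probedBuckets`, `distance`,
and `kNearest` are hypothetical names for illustration only:

```scala
object ProbingSketch {
  type Point = Vector[Double]

  // Euclidean distance between two points.
  def distance(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Step 1: the probing sequence from (a)/(b) -- a list of buckets, where
  // the same point may appear in several buckets (redundancy, not harmful
  // duplication).
  // Step 2: pick and return the k nearest keys found in those buckets.
  def kNearest(query: Point, probedBuckets: Seq[Seq[Point]], k: Int): Seq[Point] =
    probedBuckets.flatten        // union of all probed buckets
      .distinct                  // redundancy does not affect correctness
      .sortBy(distance(query, _))
      .take(k)                   // more redundancy = larger search range
}
```

Under this scheme, a probing sequence with more redundancy only grows the
candidate set before the `take(k)`, which is why I'd expect recall to improve
rather than degrade.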