Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/15148
  
    I apologize for coming late to this, but I am taking a look at some of the
documentation now. The `RandomProjection` class has two links: one to the
Wikipedia article on locality-sensitive hashing and one to a survey paper. The
Wikipedia link points to the "stable distributions" section, even though the
same article also has a section on random projections, which is supposedly the
algorithm implemented here. The paper has a "Random Projection" section as well,
but neither of the Random Projection methods in these links matches the code
here. I raised this concern before: the approach in the `RandomProjection` class
matches neither the "Random Projection" method nor the "p-stable distribution"
methods that I find in the literature.
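
    For comparison, here is a minimal sketch of the two textbook hash families in plain Python. It is not Spark code, and the helper names (`gaussian_vector`, `sign_projection_hash`, `p_stable_hash`) are hypothetical. It assumes a standard Gaussian as the 2-stable distribution. The seed and the bucket width `w` are arbitrary illustration values.

```python
import math
import random

def gaussian_vector(dim, rng):
    """Draw a vector with i.i.d. standard normal entries (the Gaussian is 2-stable)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def sign_projection_hash(r, v):
    """Random-projection (sign) LSH for cosine distance: one bit, the sign of r . v."""
    return 1 if dot(r, v) >= 0 else 0

def p_stable_hash(a, b, w, v):
    """p-stable LSH for Euclidean distance: floor((a . v + b) / w)."""
    return math.floor((dot(a, v) + b) / w)

rng = random.Random(42)           # arbitrary seed, for reproducibility only
dim, w = 3, 4.0                   # hypothetical dimension and bucket width
r = gaussian_vector(dim, rng)     # projection vector for the sign hash
a = gaussian_vector(dim, rng)     # projection vector for the p-stable hash
b = rng.uniform(0.0, w)           # random offset drawn uniformly from [0, w]

x = [1.0, 2.0, 3.0]
print(sign_projection_hash(r, x), p_stable_hash(a, b, w, x))
```

    The first family targets cosine similarity and keeps only the sign of the projection, so it ignores vector length. The second targets Euclidean distance and quantizes the projection into buckets of width `w` shifted by a random offset `b`.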
    
    I summarized this in a comment near the top of this thread. If this method
is some well-accepted hybrid of the two, fine, but I think the references would
leave users quite confused. We can be fairly confident in the practical
effectiveness of this method, since it has already been deployed in industry,
so my main concern is really just documentation. Right now, we're linking to
sources that describe algorithms distinctly different from the one we have
implemented. Thoughts?
    
    For convenience, some references: 
    * http://cseweb.ucsd.edu/~dasgupta/254-embeddings/lawrence.pdf
    * https://en.wikipedia.org/wiki/Locality-sensitive_hashing#LSH_algorithm_for_nearest_neighbor_search
    * https://people.csail.mit.edu/indyk/p117-andoni.pdf

