[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

Yunni Fri, 24 Feb 2017 15:01:48 -0800

Github user Yunni commented on the issue:

    https://github.com/apache/spark/pull/16965
  
    @merlintang 
    (1) `hashDistance` is only used for multi-probe NN search. The terms 
`numHashTables` and `numHashFunctions` are very hard to interpret in the OR-AND case.
    
    (2) For similarity join, we actually explode first and then join. The 
join key would be of vector type.
    
    (3) Yes. However, in order to get rows using hashes, we would need to 
intersect large sets of rows, whereas in the AND-OR case we take the union of 
small sets of rows, which is more efficient.
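
    To make (2) and (3) concrete, here is a minimal plain-Python sketch of the
    AND-OR scheme. It is not the Spark implementation: the function names
    (`explode`, `build_index`, `and_or_candidates`, `similarity_join`) and the
    toy data are hypothetical, and it assumes each row carries one hash vector
    per hash table. The join key is the pair (table index, whole hash vector),
    so all hash functions in a table must agree (AND), and candidates from the
    different tables are unioned (OR).

```python
from collections import defaultdict


def explode(rows):
    """Yield one (table_index, hash_vector, row_id) record per hash table."""
    for row_id, hash_vectors in rows.items():
        for table_index, hash_vector in enumerate(hash_vectors):
            yield table_index, tuple(hash_vector), row_id


def build_index(rows):
    """Map each (table_index, hash_vector) key to the set of rows in that bucket."""
    index = defaultdict(set)
    for table_index, hash_vector, row_id in explode(rows):
        index[(table_index, hash_vector)].add(row_id)
    return index


def and_or_candidates(index, query_hash_vectors):
    """Union the (small) buckets the query falls into, one bucket per table."""
    candidates = set()
    for table_index, hash_vector in enumerate(query_hash_vectors):
        candidates |= index.get((table_index, tuple(hash_vector)), set())
    return candidates


def similarity_join(left, right):
    """Explode both sides, then equi-join on (table_index, hash_vector)."""
    right_index = build_index(right)
    pairs = set()
    for table_index, hash_vector, left_id in explode(left):
        for right_id in right_index.get((table_index, hash_vector), ()):
            pairs.add((left_id, right_id))
    return pairs


# Toy data (made up): two hash tables, two hash functions per table.
left = {"a": [(1, 2), (5, 5)], "b": [(3, 4), (7, 8)]}
right = {"x": [(1, 2), (9, 9)], "y": [(0, 0), (7, 8)], "z": [(1, 3), (6, 6)]}

joined = similarity_join(left, right)
neighbours = and_or_candidates(build_index(right), [(1, 2), (7, 8)])
```

    In this toy data, row "z" shares only one of the two hash values with "a"
    in table 0, so the vector-typed key does not match and the pair is never
    produced; each lookup touches a single bucket per table.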
    
    I also suggest we limit the scope to the implementation of 
AND-amplification here. We can open other tickets to discuss memory issues, etc.



