[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...

2017-05-06 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16966 @MLnick @jkbradley @sethah Could you take a review? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-05-06 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 @MLnick @jkbradley @sethah Could you take a review? Thanks!

[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-04-06 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 Ping.

[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...

2017-04-06 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16966 Ping.

[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-03-09 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 @jkbradley @sethah Please take a review when you have time. Thanks!

[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...

2017-03-09 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16966 @MLnick @jkbradley Please take a review when you have time. Thanks!

[GitHub] spark issue #17104: [MINOR][ML] Fix comments in LSH Examples and Python API

2017-02-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17104 @srowen The full name works. Just want to make the comments shorter so that it's easier to read.

[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17092 @jkbradley @MLnick Here is a clean PR. Sorry for messing up the previous one! @merlintang I am happy to continue our discussion here: https://issues.apache.org/jira/browse/SPARK-19771

[GitHub] spark pull request #17104: [MINOR][ML] Fix comments in LSH Examples and Pyth...

2017-02-28 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/17104 [MINOR][ML] Fix comments in LSH Examples and Python API ## What changes were proposed in this pull request? Remove `org.apache.spark.examples.` in Add slash in one of the python doc

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r103361528 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,196 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #17092: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2017-02-27 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/17092 [SPARK-18450][ML] Scala API Change for LSH AND-amplification ## What changes were proposed in this pull request? Implemented a new Param numHashFunctions as the dimension of AND-amplification
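
The AND-amplification behind `numHashFunctions` can be illustrated with a small plain-Python sketch. This is not Spark's implementation. The helper names `and_or_signatures` and `candidate_pair` are hypothetical, and so is the layout of one tuple of `num_hash_functions` values per hash table. Within a table, two entries match only if all of their hash values agree (AND). Across tables, a match in any one table makes them a candidate (OR).

```python
from typing import Callable, List, Sequence, Tuple

# A base hash function maps an input (here: a set of feature indices) to an int.
HashFn = Callable[[Sequence[int]], int]

def and_or_signatures(x: Sequence[int], hash_fns: List[HashFn],
                      num_hash_tables: int,
                      num_hash_functions: int) -> List[Tuple[int, ...]]:
    """Group base hashes into num_hash_tables signatures of num_hash_functions values each."""
    assert len(hash_fns) == num_hash_tables * num_hash_functions
    values = [h(x) for h in hash_fns]
    return [tuple(values[t * num_hash_functions:(t + 1) * num_hash_functions])
            for t in range(num_hash_tables)]

def candidate_pair(sig_a, sig_b) -> bool:
    """AND within a table (the whole tuple must match), OR across tables (any table suffices)."""
    return any(a == b for a, b in zip(sig_a, sig_b))
```

A partial match inside a table does not count. For example, `(1, 2)` and `(1, 9)` share their first value but still do not collide in that table.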

[GitHub] spark pull request #16965: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2017-02-27 Thread Yunni
Github user Yunni closed the pull request at: https://github.com/apache/spark/pull/16965

[GitHub] spark issue #16965: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-27 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 Looks like the rebase is making it even worse. I will reopen a PR.

[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...

2017-02-26 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16966 @MLnick I did some experiments with WEX datasets. I have put the results in the description.

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-26 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 The number of rows would be O(LN). The memory usage will be different as the size of each row has changed before and after the explode. Also the Catalyst Optimizer may do projections during join

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-24 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang Not exactly. Each row will explode to L rows, where L is the number of hash tables. Like the following
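
Here is a minimal in-memory illustration of "each row will explode to L rows" followed by a join. It assumes each entry already carries a list of L hash signatures, one per hash table. The function names and the dict-of-lists layout are hypothetical. The thread describes doing this on DataFrames with an explode and a join. Either way, N input rows become O(LN) exploded rows.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Signature = Tuple[int, ...]

def explode(rows: Dict[str, List[Signature]]) -> List[Tuple[int, Signature, str]]:
    """Turn each (id -> L signatures) row into L rows of (table_index, signature, id)."""
    return [(t, sig, row_id)
            for row_id, sigs in rows.items()
            for t, sig in enumerate(sigs)]

def candidate_pairs(left: Dict[str, List[Signature]],
                    right: Dict[str, List[Signature]]) -> Set[Tuple[str, str]]:
    """Equi-join the exploded rows on (table_index, signature) and deduplicate pairs."""
    buckets = defaultdict(list)
    for t, sig, row_id in explode(right):
        buckets[(t, sig)].append(row_id)
    pairs = set()
    for t, sig, left_id in explode(left):
        for right_id in buckets.get((t, sig), []):
            # A pair that collides in several tables is kept only once.
            pairs.add((left_id, right_id))
    return pairs
```

The exact distance would then be computed only on these candidate pairs, never on the full cross product.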

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-24 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang (1) `hashDistance` is only used for multi-probe NN Search. The term `numHashTables`, `numHashFunctions` is very hard to interpret in OR-AND cases. (2) For similarity join, we
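
The sketch below shows why a hash distance matters only for multi-probe search. With single probing, a row is a candidate only if it collides with the key in some table. Multi-probe search instead ranks rows by how close their buckets are to the key's buckets. The sketch makes two assumptions, neither taken from the thread:

- The hash is bucketed-random-projection style, with one bucket id per hash function per table.
- The distance is the minimum over tables of the summed absolute bucket differences.

The function names are hypothetical.

```python
from typing import List, Sequence

def hash_distance(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> float:
    """For each table, sum the absolute bucket differences; return the minimum over
    tables (0 means the two entries collide in at least one table)."""
    return min(sum(abs(a - b) for a, b in zip(tx, ty)) for tx, ty in zip(x, y))

def probe_order(key_hash: Sequence[Sequence[float]],
                dataset_hashes: List[Sequence[Sequence[float]]]) -> List[int]:
    """Multi-probe style ordering: rows whose buckets are closest to the key come first."""
    return sorted(range(len(dataset_hashes)),
                  key=lambda i: hash_distance(key_hash, dataset_hashes[i]))
```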

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-23 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang Sorry I still don't quite get why we need to support OR-AND when the effective threshold is low. My understanding is that we can always tune numHashTables and numHashFunctions

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-23 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang We use AND-OR in both approxNearestNeighbor and approxSimilarityJoin, and it's more difficult for approxSimilarityJoin to adopt OR-AND than AND-OR. My understanding: for a (d1
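
The AND-OR versus OR-AND trade-off can be made concrete with the standard amplification formulas. These assume that a single hash function collides with probability p and that hashes are independent. `num_hash_tables` (L) and `num_hash_functions` (k) mirror the Param names in the thread. The functions are an illustrative sketch, not Spark code.

```python
def and_or(p: float, num_hash_tables: int, num_hash_functions: int) -> float:
    """Collision probability when k hashes are ANDed inside each of L tables, then ORed."""
    return 1.0 - (1.0 - p ** num_hash_functions) ** num_hash_tables

def or_and(p: float, num_hash_tables: int, num_hash_functions: int) -> float:
    """Collision probability when L hashes are ORed inside each of k groups, then ANDed."""
    return (1.0 - (1.0 - p) ** num_hash_tables) ** num_hash_functions
```

Take a (d1, d2, p1, p2)-sensitive family with p1 = 0.8 and p2 = 0.2, and set L = 4, k = 2:

- AND-OR gives about 0.983 for near pairs and 0.151 for far pairs.
- OR-AND gives about 0.997 and 0.349.

At this setting, both constructions push near pairs toward collision, but AND-OR suppresses far pairs more. Other (L, k) choices shift the balance.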

[GitHub] spark issue #16715: [Spark-18080][ML][PYTHON] Python API & Examples for Loca...

2017-02-21 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Hi @e-m-m, I think the Python API will be included in Spark 2.2.

[GitHub] spark pull request #16966: [SPARK-18409][ML]LSH approxNearestNeighbors shoul...

2017-02-20 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16966#discussion_r102065786 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -147,6 +148,15 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #16966: [SPARK-18409][ML]LSH approxNearestNeighbors shoul...

2017-02-17 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16966#discussion_r101832855 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -147,6 +148,15 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #16966: [SPARK-18409][ML]LSH approxNearestNeighbors shoul...

2017-02-16 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/16966 [SPARK-18409][ML]LSH approxNearestNeighbors should use approxQuantile instead of sort ## What changes were proposed in this pull request? In previous implementation of LSH approxNearestNeighbors
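
The approxQuantile idea can be sketched locally. Instead of sorting every row by hash distance, the sketch:

1. picks a distance threshold at roughly the k/N quantile,
2. keeps only rows at or below that threshold,
3. sorts that small candidate set.

In Spark the quantile would come from `DataFrame.stat.approxQuantile` (PySpark: `DataFrame.approxQuantile`). The exact `quantile` helper below is only a local stand-in for it. Padding the target quantile by `relative_error` is an assumption made here so the filtered set is likely to hold at least k rows.

```python
import math
from typing import List, Sequence

def quantile(values: Sequence[float], q: float) -> float:
    """Exact nearest-rank quantile; a local stand-in for an approximate quantile."""
    ordered = sorted(values)
    rank = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[rank]

def approx_nearest_neighbors(hash_dists: Sequence[float], k: int,
                             relative_error: float = 0.05) -> List[int]:
    """Return indices of up to k rows with the smallest hash distance to the key."""
    n = len(hash_dists)
    # Aim slightly above k/n so the filtered set very likely holds at least k rows.
    q = min(1.0, k / n + relative_error)
    threshold = quantile(hash_dists, q)
    candidates = [i for i, d in enumerate(hash_dists) if d <= threshold]
    # Only the (small) candidate set is sorted, not the whole dataset.
    candidates.sort(key=lambda i: hash_dists[i])
    return candidates[:k]
```

For example, take hash distances `[5, 1, 3, 9, 2, 7, 4, 8, 6, 0]` and `k = 3`. The threshold lands at 3 and four rows survive the filter. The result is `[9, 1, 4]`.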

[GitHub] spark pull request #16965: [Spark-18450][ML] Scala API Change for LSH AND-am...

2017-02-16 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/16965 [Spark-18450][ML] Scala API Change for LSH AND-amplification ## What changes were proposed in this pull request? Implemented a new Param numHashFunctions as the dimension of AND-amplification

[GitHub] spark issue #16715: [Spark-18080][ML][PYTHON] Python API & Examples for Loca...

2017-02-16 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Sure. Will do.

[GitHub] spark issue #16715: [Spark-18080][ML][PYTHON] Python API & Examples for Loca...

2017-02-14 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 @sethah Really appreciate your detailed code review and comments. :) @MLnick @yanboliang Thank you for the help as well. Please let me know if you guys have any other comments.

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-14 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r101089800 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -222,17 +222,18 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-14 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r101089807 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala --- @@ -37,38 +43,45 @@ object MinHashLSHExample { (0

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-14 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r101089762 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +945,103 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966548 --- Diff: docs/ml-features.md --- @@ -1558,6 +1558,15 @@ for more details on the API. {% include_example java/org/apache/spark/examples/ml

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966555 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966541 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala --- @@ -37,38 +38,44 @@ object MinHashLSHExample { (0

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966552 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh_example.py --- @@ -0,0 +1,81 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966561 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/BucketedRandomProjectionLSHExample.scala --- @@ -38,40 +39,45 @@ object

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966545 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java --- @@ -35,6 +35,8 @@ import

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966554 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966534 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966539 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala --- @@ -111,8 +111,8 @@ class BucketedRandomProjectionLSHModel
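
The bucketed random projection hash can be sketched as projecting the input onto random unit vectors and bucketing each projection by `bucketLength`. This is a minimal plain-Python illustration, not the Spark source. It assumes the hash is floor(x·v / bucketLength) for each random unit vector v, and the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence

def random_unit_vectors(dim: int, n: int, seed: int = 0) -> List[List[float]]:
    """Draw n random directions (Gaussian entries, normalized to unit length)."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        vectors.append([c / norm for c in v])
    return vectors

def brp_hash(x: Sequence[float], unit_vectors: Sequence[Sequence[float]],
             bucket_length: float) -> List[int]:
    """One bucket id per projection: floor(<x, v> / bucket_length)."""
    return [math.floor(sum(a * b for a, b in zip(x, v)) / bucket_length)
            for v in unit_vectors]
```

A larger `bucket_length` puts more points in each bucket. That raises recall at the cost of more candidates to check.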

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966530 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966537 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-02-08 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Jenkins retest this please

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100199037 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100198559 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192059 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100193058 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100193020 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100193043 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192985 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py --- @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192933 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py --- @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192881 --- Diff: examples/src/main/python/ml/min_hash_lsh.py --- @@ -0,0 +1,75 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192685 --- Diff: examples/src/main/python/ml/min_hash_lsh.py --- @@ -0,0 +1,75 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192402 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192347 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192333 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192298 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192314 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192074 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192026 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-02-06 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 @yanboliang, just a friendly reminder please don't forget to review the PR when you have time. Thanks!

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-01-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Thanks very much, @yanboliang ~~

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-01-26 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 @yanboliang @jkbradley Please take a look. Thanks!

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-01-26 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/16715 [Spark-18080][ML] Python API & Examples for Locality Sensitive Hashing ## What changes were proposed in this pull request? This pull request includes python API and examples for LSH. The

[GitHub] spark issue #15795: [SPARK-18081][ML][DOCS] Add user guide for Locality Sens...

2016-12-02 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15795 @MLnick @jkbradley I have changed the examples to be 1 example per algorithm which does transform, approxNearestNeighbor, and approxSimilarityJoin. PTAL.

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736878 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736862 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736883 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ApproxSimilarityJoinExample.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736839 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736831 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736852 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736546 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736531 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736515 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081][ML][DOCS] Add user guide for Locali...

2016-12-02 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r90736506 --- Diff: docs/ml-features.md --- @@ -1478,3 +1478,139 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark issue #15795: [SPARK-18081] Add user guide for Locality Sensitive Hash...

2016-11-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15795 @sethah I think so. I have made changes for the docs but I haven't made changes to the examples. Please take a look when you get a chance.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-27 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 @jkbradley If you don't have more comments, can we merge this because I need to change the examples in #15795 ?

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711243 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711247 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711255 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711233 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711231 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711207 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711204 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711166 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711162 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...

2016-11-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15795#discussion_r89711165 --- Diff: docs/ml-features.md --- @@ -1396,3 +1396,149 @@ for more details on the API. {% include_example python/ml/chisq_selector_example.py

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 Thanks @sethah ! Your comment was very helpful and detailed :-)

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 @sethah PTAL

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89215405 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala --- @@ -31,36 +31,38 @@ import org.apache.spark.sql.types.StructType

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89215190 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala --- @@ -112,25 +116,26 @@ class MinHash(override val uid: String) extends LSH

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89215142 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala --- @@ -97,12 +118,31 @@ class MinHashSuite extends SparkFunSuite

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89175604 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -43,70 +43,73 @@ class RandomProjectionSuite

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89175438 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -155,8 +148,30 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89175473 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -155,8 +148,30 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89175497 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala --- @@ -31,36 +31,38 @@ import org.apache.spark.sql.types.StructType

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89175448 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -155,8 +148,30 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r89175401 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala --- @@ -31,36 +31,38 @@ import org.apache.spark.sql.types.StructType

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 Hi @sethah, grouping into a number of buckets does not really affect the independence, since p is a much larger prime. For example, in http://people.csail.mit.edu/mip/papers/kwise-lb/kwise-lb.pdf
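The point about bucketing can be checked exactly on a toy example. The sketch below assumes a Carter-Wegman style hash h(x) = ((a*x + b) mod p) mod m with a drawn from [1, p-1] and b from [0, p-1]; the function names and the small values p = 101, m = 10 are illustrative assumptions, not taken from the Spark code under review. It enumerates every (a, b) and counts how often two distinct keys land in the same bucket.

```python
from fractions import Fraction


def bucket_hash(x: int, a: int, b: int, p: int, m: int) -> int:
    """Hypothetical helper: ((a*x + b) mod p) mod m."""
    return ((a * x + b) % p) % m


def collision_probability(x: int, y: int, p: int, m: int) -> Fraction:
    """Exact probability, over a in [1, p-1] and b in [0, p-1],
    that keys x != y (both < p) fall into the same bucket."""
    collisions = 0
    total = 0
    for a in range(1, p):
        for b in range(p):
            total += 1
            if bucket_hash(x, a, b, p, m) == bucket_hash(y, a, b, p, m):
                collisions += 1
    return Fraction(collisions, total)


if __name__ == "__main__":
    p, m = 101, 10
    # Worst case over all pairs of distinct small keys.
    worst = max(
        collision_probability(x, y, p, m)
        for x in range(10)
        for y in range(x + 1, 10)
    )
    print(f"worst collision probability: {worst} (1/m = {Fraction(1, m)})")
```

Because (a, b) maps bijectively onto pairs of distinct residues (r, s) mod p, the collision probability is the same for every distinct pair; with p = 101 and m = 10 it is 46/505 (about 0.091), which stays below 1/m. As p grows relative to m, the bucketed family gets closer to the ideal 1/m.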

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 @jkbradley Awesome, thanks so much! :) Now that the API is finalized, I will work on the User Doc

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 Hi @jkbradley, **MinHash** Yes, I agree that I shouldn't have said it's perfect hashing. Theoretically, it should be Min-wise Independent Permutation Family. What we used here is 2
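For context on the MinHash discussion, here is a minimal sketch of MinHash in which random linear hash functions stand in for truly min-wise independent permutations. It is not Spark's `MinHashLSH` implementation; the prime `PRIME`, the `(1 + x)` offset, and all function names are assumptions chosen for illustration. Each signature entry is the minimum hash value over a set, and the fraction of matching entries estimates the Jaccard similarity.

```python
import random
from typing import Iterable, List, Set, Tuple

# Assumed prime modulus for this sketch (2**31 - 1); elements must satisfy 0 <= x < PRIME - 1.
PRIME = 2_147_483_647


def make_hash_params(num_hashes: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw (a, b) for each hypothetical hash h(x) = (a * (1 + x) + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randint(1, PRIME - 1), rng.randint(0, PRIME - 1)) for _ in range(num_hashes)]


def minhash_signature(elems: Set[int], params: Iterable[Tuple[int, int]]) -> List[int]:
    """For each hash function, keep the minimum hash value over the set."""
    return [min((a * (1 + x) + b) % PRIME for x in elems) for a, b in params]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions where the two minima agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


if __name__ == "__main__":
    params = make_hash_params(256, seed=7)
    rng = random.Random(42)
    base = rng.sample(range(10_000), 400)
    a_set, b_set = set(base[:300]), set(base[100:400])  # overlap 200, union 400
    exact = len(a_set & b_set) / len(a_set | b_set)
    est = estimate_jaccard(minhash_signature(a_set, params), minhash_signature(b_set, params))
    print(f"exact Jaccard = {exact:.3f}, MinHash estimate = {est:.3f}")
```

Since a is nonzero modulo a prime, each hash is injective on the allowed elements, so disjoint sets never share a minimum and always estimate 0. Linear hashes are only approximately min-wise independent, so on highly structured inputs the estimate can be biased; the random sets in the demo avoid that case.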

[GitHub] spark pull request #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15874#discussion_r88569303 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala --- @@ -43,70 +43,72 @@ class RandomProjectionSuite
