[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192881 --- Diff: examples/src/main/python/ml/min_hash_lsh.py --- @@ -0,0 +1,75 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192347 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192333 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192402 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192298 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192314 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100193020 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100193058 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192685 --- Diff: examples/src/main/python/ml/min_hash_lsh.py --- @@ -0,0 +1,75 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192933 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py --- @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192985 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py --- @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100193043 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +951,102 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192074 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192026 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100199037 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100192059 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-02-08 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100198559 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,200 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-02-08 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Jenkins retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-02-06 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 @yanboliang, just a friendly reminder please don't forget to review the PR when you have time. Thanks!

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-01-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Thanks very much, @yanboliang ~~

[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-01-26 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 @yanboliang @jkbradley Please take a look. Thanks!

[GitHub] spark pull request #16715: [Spark-18080][ML] Python API & Examples for Local...

2017-01-26 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/16715 [Spark-18080][ML] Python API & Examples for Locality Sensitive Hashing ## What changes were proposed in this pull request? This pull request includes the Python API and examples for LSH. The

[GitHub] spark pull request #16966: [SPARK-18409][ML]LSH approxNearestNeighbors shoul...

2017-02-20 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16966#discussion_r102065786 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -147,6 +148,15 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #16966: [SPARK-18409][ML]LSH approxNearestNeighbors shoul...

2017-02-17 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16966#discussion_r101832855 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -147,6 +148,15 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark pull request #16966: [SPARK-18409][ML]LSH approxNearestNeighbors shoul...

2017-02-16 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/16966 [SPARK-18409][ML]LSH approxNearestNeighbors should use approxQuantile instead of sort ## What changes were proposed in this pull request? In the previous implementation of LSH approxNearestNeighbors

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-23 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang Sorry I still don't quite get why we need to support OR-AND when the effective threshold is low. My understanding is that we can always tune numHashTables and numHashFunctions

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-24 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang Not exactly. Each row will explode to L rows, where L is the number of hash tables. Like the following

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-24 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang (1) `hashDistance` is only used for multi-probe NN Search. The terms `numHashTables` and `numHashFunctions` are very hard to interpret in OR-AND cases. (2) For similarity join, we

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-23 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 @merlintang We use AND-OR in both approxNearestNeighbor and approxSimilarityJoin, and it's more difficult for approxSimilarityJoin to adopt OR-AND than AND-OR. My understanding: for a (d1
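For readers following the AND-OR versus OR-AND discussion in this thread, a minimal sketch of the standard amplification formulas may help. It is not taken from the pull request: the function and parameter names (`and_width`, `or_width`) are hypothetical, and `p` is assumed to be the probability that a single base hash function collides for a given pair of points.

```python
def and_or_probability(p, and_width, or_width):
    """Collision probability for AND-OR amplification.

    A pair collides in one table only if all `and_width` hash functions
    agree (AND), and the pair is a candidate if any of the `or_width`
    tables collide (OR). Names are illustrative, not Spark's API.
    """
    return 1.0 - (1.0 - p ** and_width) ** or_width


def or_and_probability(p, and_width, or_width):
    """Collision probability for OR-AND amplification.

    A group collides if any of its `or_width` hash functions agree (OR),
    and the pair is a candidate only if all `and_width` groups collide (AND).
    """
    return (1.0 - (1.0 - p) ** or_width) ** and_width


if __name__ == "__main__":
    # Compare the two constructions for the same base probability.
    for p in (0.2, 0.5, 0.8):
        print(p, and_or_probability(p, 2, 3), or_and_probability(p, 2, 3))
```

Under this reading, `numHashFunctions` would play the role of `and_width` and `numHashTables` the role of `or_width` in the AND-OR case; that mapping is an assumption based on the parameter names mentioned in the thread.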

[GitHub] spark issue #16715: [Spark-18080][ML][PYTHON] Python API & Examples for Loca...

2017-02-21 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Hi @e-m-m, I think the Python API will be included in Spark 2.2.

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966530 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966534 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966545 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java --- @@ -35,6 +35,8 @@ import

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966554 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966552 --- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh_example.py --- @@ -0,0 +1,81 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966539 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala --- @@ -111,8 +111,8 @@ class BucketedRandomProjectionLSHModel

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966537 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966555 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,198 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966561 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/BucketedRandomProjectionLSHExample.scala --- @@ -38,40 +39,45 @@ object

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966548 --- Diff: docs/ml-features.md --- @@ -1558,6 +1558,15 @@ for more details on the API. {% include_example java/org/apache/spark/examples/ml

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-13 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r100966541 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala --- @@ -37,38 +38,44 @@ object MinHashLSHExample { (0

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-14 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r101089762 --- Diff: python/pyspark/ml/feature.py --- @@ -755,6 +945,103 @@ def maxAbs(self): @inherit_doc +class MinHashLSH(JavaEstimator

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-14 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r101089807 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala --- @@ -37,38 +43,45 @@ object MinHashLSHExample { (0

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-14 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r101089800 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -222,17 +222,18 @@ private[ml] abstract class LSHModel[T <: LSHMode

[GitHub] spark issue #16715: [Spark-18080][ML][PYTHON] Python API & Examples for Loca...

2017-02-14 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 @sethah Really appreciate your detailed code review and comments. :) @MLnick @yanboliang Thank you for the help as well. Please let me know if you guys have any other comments.

[GitHub] spark issue #16715: [Spark-18080][ML][PYTHON] Python API & Examples for Loca...

2017-02-16 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 Sure. Will do.

[GitHub] spark pull request #16965: [Spark-18450][ML] Scala API Change for LSH AND-am...

2017-02-16 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/16965 [Spark-18450][ML] Scala API Change for LSH AND-amplification ## What changes were proposed in this pull request? Implemented a new Param numHashFunctions as the dimension of AND-amplification

[GitHub] spark pull request #17092: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2017-02-27 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/17092 [SPARK-18450][ML] Scala API Change for LSH AND-amplification ## What changes were proposed in this pull request? Implemented a new Param numHashFunctions as the dimension of AND-amplification

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-27 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r103361528 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,196 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark issue #17104: [MINOR][ML] Fix comments in LSH Examples and Python API

2017-02-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/17104 @srowen The full name works. Just want to make the comments shorter so that it's easier to read.

[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-26 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 The number of rows would be O(LN). The memory usage will be different as the size of each row has changed before and after the explode. Also the Catalyst Optimizer may do projections during join
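To make the O(LN) row count concrete, here is a small plain-Python sketch of the explode step, assuming each of the N input rows carries one hash value per hash table (L values in total). It does not use Spark; `explode_hashes` and the `(row_id, hashes)` tuple layout are hypothetical names chosen for illustration.

```python
def explode_hashes(rows):
    """Turn N rows with L hash values each into N * L rows.

    `rows` is a list of (row_id, hashes) pairs, where `hashes` holds one
    hash value per hash table. Each output row is
    (row_id, table_index, hash_value). Illustrative only, not Spark code.
    """
    exploded = []
    for row_id, hashes in rows:
        for table_index, hash_value in enumerate(hashes):
            exploded.append((row_id, table_index, hash_value))
    return exploded


if __name__ == "__main__":
    # N = 2 rows, L = 3 hash tables, so 6 exploded rows.
    sample = [("a", [7, 1, 4]), ("b", [7, 2, 9])]
    for row in explode_hashes(sample):
        print(row)
```

Each exploded row is narrower than the original row, which is consistent with the remark that memory usage changes even though the row count grows by a factor of L.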

[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...

2017-02-26 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16966 @MLnick I did some experiments with WEX datasets. I have put the results in the description.

[GitHub] spark pull request #17104: [MINOR][ML] Fix comments in LSH Examples and Pyth...

2017-02-28 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/17104 [MINOR][ML] Fix comments in LSH Examples and Python API ## What changes were proposed in this pull request? Remove `org.apache.spark.examples.` in Add slash in one of the python doc

[GitHub] spark issue #16965: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-27 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16965 Looks like the rebase is making it even worse. I will reopen a PR.

[GitHub] spark pull request #16965: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2017-02-27 Thread Yunni
Github user Yunni closed the pull request at: https://github.com/apache/spark/pull/16965

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-25 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r80411490 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-25 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r80411374 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82018962 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82027088 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82027065 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82027114 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82027003 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82026834 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 @jkbradley Take your time for the code review. :) I will be working on the open dataset testing at the same time.

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82027195 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,334 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-20 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @sethah, Thanks for the comments. - I agree. I have moved `lsh` package to be under `feature` - In "Similarity search in high dimensions via hashing", there is an

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-20 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79639164 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15148: Spark 5992 yunn lsh

2016-09-19 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/15148 Spark 5992 yunn lsh ## What changes were proposed in this pull request? Implement Locality Sensitive Hashing along with approximate nearest neighbors and approximate similarity join based

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-23 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @sethah - My understanding is h(x) = floor((g1 dot x) / w) is one hash function, as is in the wiki. - In bullet point 6 of "Approach found on Wikipedia and here and here"
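The hash function quoted in this comment, h(x) = floor((g1 dot x) / w), can be sketched in a few lines of plain Python. This is only an illustration of the formula as written: `bucket_hash` is a hypothetical name, `g` stands for the projection vector g1, and `w` is assumed to be a positive bucket width.

```python
import math


def bucket_hash(x, g, w):
    """Compute h(x) = floor((g . x) / w) for one projection vector.

    `x` and `g` are equal-length sequences of floats and `w` is a positive
    bucket width. Illustrative sketch, not the Spark implementation.
    """
    if len(x) != len(g):
        raise ValueError("x and g must have the same length")
    if w <= 0:
        raise ValueError("bucket width w must be positive")
    dot = sum(gi * xi for gi, xi in zip(g, x))
    return math.floor(dot / w)


if __name__ == "__main__":
    g = [3.0, 4.0]
    print(bucket_hash([1.0, 1.0], g, 2.0))    # dot = 7.0, 7.0 / 2.0 = 3.5
    print(bucket_hash([-1.0, -1.0], g, 2.0))  # dot = -7.0, -7.0 / 2.0 = -3.5
```

Points whose projections onto `g` differ by less than `w` tend to land in the same or adjacent buckets, which is why a larger `w` makes collisions more likely.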

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @sethah, I have updated the reference in the PR and scaladoc for LSH.

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505015 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505439 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79504728 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505085 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505534 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/RandomProjection.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505523 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/RandomProjection.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505554 --- Diff: mllib/src/test/scala/org/apache/spark/ml/lsh/LSHTest.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505486 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79505544 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/RandomProjection.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Thanks very much for reviewing, @viirya. I made some changes based on your comments. PTAL.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 @jkbradley I see. Thanks Joseph!

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81031684 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000669 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000698 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000599 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/RandomProjection.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000659 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/RandomProjection.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000563 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000695 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000707 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000597 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/MinHash.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000578 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/MinHash.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000584 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/MinHash.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000555 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000528 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000516 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000488 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000523 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @MLnick @jkbradley Thanks for the code review. I made some changes based on your comments. - I agree it's better to align the input types to vector in internal implementation

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000447 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r81000455 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala --- @@ -0,0 +1,304 @@ +/* + * Licensed to the Apache Software Foundation (ASF
