Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192881
--- Diff: examples/src/main/python/ml/min_hash_lsh.py ---
@@ -0,0 +1,75 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192347
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192333
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192402
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192298
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192314
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100193020
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +951,102 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100193058
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +951,102 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192685
--- Diff: examples/src/main/python/ml/min_hash_lsh.py ---
@@ -0,0 +1,75 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192933
--- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py ---
@@ -0,0 +1,76 @@
+#
+# Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192985
--- Diff: examples/src/main/python/ml/bucketed_random_projection_lsh.py ---
@@ -0,0 +1,76 @@
+#
+# Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100193043
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +951,102 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192074
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192026
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100199037
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100192059
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100198559
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,200 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16715
Jenkins retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16715
@yanboliang, just a friendly reminder: please don't forget to review the PR
when you have time. Thanks!
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16715
Thanks very much, @yanboliang ~~
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16715
@yanboliang @jkbradley Please take a look. Thanks!
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/16715
[Spark-18080][ML] Python API & Examples for Locality Sensitive Hashing
## What changes were proposed in this pull request?
This pull request includes the Python API and examples for LSH. The
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16966#discussion_r102065786
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -147,6 +148,15 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16966#discussion_r101832855
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -147,6 +148,15 @@ private[ml] abstract class LSHModel[T <: LSHMode
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/16966
[SPARK-18409][ML]LSH approxNearestNeighbors should use approxQuantile
instead of sort
## What changes were proposed in this pull request?
In the previous implementation of LSH approxNearestNeighbors
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16965
@merlintang Sorry I still don't quite get why we need to support OR-AND
when the effective threshold is low. My understanding is that we can always
tune numHashTables and numHashFunctions
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16965
@merlintang Not exactly. Each row will explode to L rows, where L is the
number of hash tables. Like the following
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16965
@merlintang
(1) `hashDistance` is only used for multi-probe NN Search. The terms
`numHashTables` and `numHashFunctions` are very hard to interpret in OR-AND cases.
(2) For similarity join, we
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16965
@merlintang We use AND-OR in both approxNearestNeighbor and
approxSimilarityJoin, and it's more difficult for approxSimilarityJoin to adopt
OR-AND than AND-OR.
My understanding: for a (d1
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16715
Hi @e-m-m, I think the Python API will be included in Spark 2.2.
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966530
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966534
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966545
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaBucketedRandomProjectionLSHExample.java
---
@@ -35,6 +35,8 @@
import
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966554
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966552
--- Diff:
examples/src/main/python/ml/bucketed_random_projection_lsh_example.py ---
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966539
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala
---
@@ -111,8 +111,8 @@ class BucketedRandomProjectionLSHModel
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966537
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966555
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,198 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966561
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/BucketedRandomProjectionLSHExample.scala
---
@@ -38,40 +39,45 @@ object
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966548
--- Diff: docs/ml-features.md ---
@@ -1558,6 +1558,15 @@ for more details on the API.
{% include_example
java/org/apache/spark/examples/ml
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r100966541
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala ---
@@ -37,38 +38,44 @@ object MinHashLSHExample {
(0
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r101089762
--- Diff: python/pyspark/ml/feature.py ---
@@ -755,6 +945,103 @@ def maxAbs(self):
@inherit_doc
+class MinHashLSH(JavaEstimator
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r101089807
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/MinHashLSHExample.scala ---
@@ -37,38 +43,45 @@ object MinHashLSHExample {
(0
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r101089800
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -222,17 +222,18 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16715
@sethah Really appreciate your detailed code review and comments. :)
@MLnick @yanboliang Thank you for the help as well. Please let me know if
you guys have any other comments.
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16715
Sure. Will do.
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/16965
[Spark-18450][ML] Scala API Change for LSH AND-amplification
## What changes were proposed in this pull request?
Implemented a new Param numHashFunctions as the dimension of
AND-amplification
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/17092
[SPARK-18450][ML] Scala API Change for LSH AND-amplification
## What changes were proposed in this pull request?
Implemented a new Param numHashFunctions as the dimension of
AND-amplification
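As a rough illustration of what AND-amplification (via `numHashFunctions`) combined with OR-amplification (via `numHashTables`) does to the collision probability, here is a minimal sketch. It assumes every individual hash function collides independently with the same probability `p`, which is a textbook simplification and not something stated in the PR. The function name `collision_probability` is hypothetical.

```python
def collision_probability(p: float, num_hash_functions: int, num_hash_tables: int) -> float:
    """Chance that two points share at least one bucket under AND-OR amplification.

    Assumption (not from the PR): each single hash function collides
    independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if num_hash_functions < 1 or num_hash_tables < 1:
        raise ValueError("both dimensions must be at least 1")
    # AND: all hash functions inside one table must agree.
    and_prob = p ** num_hash_functions
    # OR: at least one of the tables must agree.
    return 1.0 - (1.0 - and_prob) ** num_hash_tables


if __name__ == "__main__":
    for k, tables in [(1, 1), (2, 1), (1, 2), (2, 2)]:
        print(k, tables, collision_probability(0.5, k, tables))
```

Under this assumption, raising `num_hash_functions` lowers the collision probability, while raising `num_hash_tables` raises it, so the two parameters can be tuned against each other.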
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/16715#discussion_r103361528
--- Diff: python/pyspark/ml/feature.py ---
@@ -120,6 +122,196 @@ def getThreshold(self):
return self.getOrDefault(self.threshold
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/17104
@srowen The full name works. Just want to make the comments shorter so that
it's easier to read.
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16965
The number of rows would be O(LN). The memory usage will be different as
the size of each row has changed before and after the explode. Also the
Catalyst Optimizer may do projections during join
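To make the O(LN) row count concrete, here is a small pure-Python sketch of exploding N rows into one row per hash table. It is only an illustration and not the Spark implementation; the function name `explode_by_hash_table` and the tuple layout are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def explode_by_hash_table(
    rows: Sequence[Tuple[Hashable, Sequence[int]]]
) -> List[Tuple[Hashable, int, int]]:
    """Turn each (row_id, hashes) pair into one (row_id, table_index, hash) row per table.

    With N input rows and L hash values per row, the output has L * N rows.
    """
    return [
        (row_id, table_index, hash_value)
        for row_id, hashes in rows
        for table_index, hash_value in enumerate(hashes)
    ]


if __name__ == "__main__":
    data = [("a", [1, 2, 3]), ("b", [4, 5, 6])]  # N = 2 rows, L = 3 tables
    for row in explode_by_hash_table(data):
        print(row)
```

Each exploded row is narrower than the original, which is one reason the memory footprint is not simply L times the input size.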
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16966
@MLnick I did some experiments with WEX datasets. I have put the results in
the description.
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/17104
[MINOR][ML] Fix comments in LSH Examples and Python API
## What changes were proposed in this pull request?
Remove `org.apache.spark.examples.` in
Add a slash in one of the Python docs
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/16965
Looks like the rebase is making it even worse. I will reopen a PR.
Github user Yunni closed the pull request at:
https://github.com/apache/spark/pull/16965
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r80411490
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,290 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r80411374
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,290 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82018962
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82027088
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82027065
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82027114
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82027003
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82026834
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@jkbradley Take your time for the code review. :) I will be working on the
open dataset testing at the same time.
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82027195
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @sethah,
Thanks for the comments.
- I agree. I have moved `lsh` package to be under `feature`
- In "Similarity search in high dimensions via hashing", there is an
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79639164
--- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/15148
Spark 5992 yunn lsh
## What changes were proposed in this pull request?
Implement Locality Sensitive Hashing along with approximate nearest
neighbors and approximate similarity join based
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @sethah
- My understanding is h(x) = floor((g1 dot x) / w) is one hash function, as
is in the wiki.
- In bulletpoint 6 of "Approach found on Wikipedia and here and here"
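The single hash function h(x) = floor((g1 dot x) / w) mentioned above can be sketched in plain Python as follows. This is only an illustration: it assumes `g` is a projection vector of the same length as `x` and `w` is a positive bucket width, and the name `bucket_hash` is hypothetical and not part of the Spark API.

```python
import math
from typing import Sequence


def bucket_hash(x: Sequence[float], g: Sequence[float], w: float) -> int:
    """Compute floor((g . x) / w) for one projection vector g and bucket width w."""
    if len(x) != len(g):
        raise ValueError("x and g must have the same length")
    if w <= 0:
        raise ValueError("bucket width w must be positive")
    dot = sum(gi * xi for gi, xi in zip(g, x))
    return math.floor(dot / w)


if __name__ == "__main__":
    print(bucket_hash([1.0, 2.0], [0.5, 0.5], 1.0))
    print(bucket_hash([-1.0, 0.0], [1.0, 0.0], 2.0))
```

Points whose projections onto `g` fall within the same interval of width `w` get the same integer, so nearby points tend to share a bucket.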
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @sethah, I have updated the reference in the PR and scaladoc for LSH.
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505015
--- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505439
--- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79504728
--- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505085
--- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505534
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/lsh/RandomProjection.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505523
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/lsh/RandomProjection.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505554
--- Diff: mllib/src/test/scala/org/apache/spark/ml/lsh/LSHTest.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505486
--- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r79505544
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/lsh/RandomProjection.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Thanks very much for reviewing, @viirya. I made some changes based on your
comments. PTAL.
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@jkbradley I see. Thanks Joseph!
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81031684
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000669
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000698
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000599
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/lsh/RandomProjection.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000659
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/lsh/RandomProjection.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000563
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000695
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000707
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/lsh/LSHTest.scala ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000597
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/lsh/MinHash.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000578
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/lsh/MinHash.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000584
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/lsh/MinHash.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000555
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000528
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000516
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000488
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,290 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000523
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Hi @MLnick @jkbradley
Thanks for the code review. I made some changes based on your comments.
- I agree it's better to align the input types to vector in internal
implementation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000447
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r81000455
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/lsh/LSH.scala ---
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF