Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88569315
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala
---
@@ -115,64 +117,83 @@ class RandomProjectionSuite
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88569321
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala ---
@@ -86,9 +94,24 @@ class MinHashSuite extends SparkFunSuite
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88569056
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala
---
@@ -147,15 +151,17 @@ class RandomProjection(override val
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88569084
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -31,36 +31,34 @@ import org.apache.spark.sql.types.StructType
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88569066
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -31,36 +31,34 @@ import org.apache.spark.sql.types.StructType
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88169546
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -179,16 +211,13 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88150618
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -179,16 +211,13 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88129780
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala ---
@@ -24,7 +24,7 @@ import org.apache.spark.ml.util.DefaultReadWriteTest
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88129663
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -179,16 +211,13 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88129409
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -106,22 +123,24 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128756
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -144,12 +152,12 @@ class MinHash(override val uid: String) extends
LSH
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128823
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -66,10 +66,10 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
s
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128732
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -125,11 +125,11 @@ class MinHash(override val uid: String) extends
LSH
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128687
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -74,9 +72,12 @@ class MinHashModel private[ml
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128341
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -74,9 +72,12 @@ class MinHashModel private[ml
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128287
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -46,21 +42,23 @@ import org.apache.spark.sql.types.StructType
@Since
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128199
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -31,13 +31,9 @@ import org.apache.spark.sql.types.StructType
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88128252
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -46,21 +42,23 @@ import org.apache.spark.sql.types.StructType
@Since
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15874
Thanks, @sethah. I have reverted "AND-amplification" related changes. PTAL.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as wel
Github user Yunni closed the pull request at:
https://github.com/apache/spark/pull/15800
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
OK. I am abandoning this PR since we are making MultiProbe NN Search and
`hashDistance` private. Related changes are included in #15874.
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/15874
Spark 18408 yunn api improvements
## What changes were proposed in this pull request?
(1) Change output schema to `Array of Vector` instead of `Vectors`
(2) Use `numHashTables
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
@MLnick Thanks! That's very good to know!
@sethah I agree with your comments. @jkbradley If you have no objection,
shall I remove MultiProbe NN Search and `hashDistance`, so we
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
@jkbradley I agree with your idea to get rid of full sorting and use
`approxQuantile` to find the threshold. Doing a full sort on the whole dataset
hurts performance a lot. Please file a ticket
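The idea in the comment above (estimate a distance threshold, filter, and sort only the survivors) can be sketched in plain Scala. This is a rough illustration under stated assumptions, not Spark's code: `sampledQuantile` and `approxSmallestK` are hypothetical helpers, and a random sample stands in for an approximate-quantile routine such as `approxQuantile`.

```scala
import scala.util.Random

object ThresholdSelectionSketch {
  // Estimate the q-quantile from a random sample (with replacement) instead of
  // sorting every value. Assumption: this stands in for an approximate-quantile routine.
  def sampledQuantile(values: IndexedSeq[Double], q: Double, sampleSize: Int, rng: Random): Double = {
    require(values.nonEmpty && sampleSize > 0, "need at least one value and one sample")
    val sample = IndexedSeq.fill(sampleSize)(values(rng.nextInt(values.length))).sorted
    val idx = math.min(sample.length - 1, math.max(0, math.ceil(q * sample.length).toInt - 1))
    sample(idx)
  }

  // Keep only distances at or below the estimated threshold, then sort that subset.
  // `slack` over-provisions the quantile; the result may still hold fewer than k items
  // if the threshold is underestimated.
  def approxSmallestK(distances: IndexedSeq[Double], k: Int, slack: Double, rng: Random): IndexedSeq[Double] = {
    val q = math.min(1.0, slack * k / distances.length)
    val threshold = sampledQuantile(distances, q, 1000, rng)
    distances.filter(_ <= threshold).sorted.take(k)
  }
}
```

The sort now touches only the values that pass the filter, at the cost of an approximate answer; a larger `slack` trades extra work for a lower chance of returning fewer than `k` items.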
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
> One way to look at it is that (a) will contain many duplicates in the L
sets of points, so (b) is more likely to have higher precision and recall.
I think this might be the place
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
@sethah That sounds good to me, except that there is no `posexplode()` in
Spark AFAIK. Do you think `hashDistance(x: Array[Vector], y: Array[Vector])` is
a better workaround, or we should still use
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
> If a query point vector q hashes to some MinHash Vector [5.0, 22.0, 13.0]
the best candidates will be ones that hash to that same vector.
My second half is suggesting: If a query po
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
Hi @jkbradley,
I agree with your claim on estimating Jaccard similarity, but it looks like
your `L` and `k` have the same effect on the performance. Consider a case
when we want to trade
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Thanks for the discussion, everyone! I will take a look at the JIRA.
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
@jkbradley There are 2 reasons I don't think averaging indicators is a good
`hashDistance` for the current implementation.
(1) SingleProbe NN performance relies on OR-amplification, changing
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15800#discussion_r87298552
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -32,13 +32,7 @@ import org.apache.spark.sql.types.StructType
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
@jkbradley Averaging indicators makes more sense for an AND-amplified
MinHash function. The hash distance is 0 when all hash values are equal, and
grows as more hash values differ.
As we
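The averaged-indicator distance described in this comment can be sketched as follows. This is a minimal illustration, assuming a signature is a plain sequence of hash values; `averagedIndicatorDistance` is a hypothetical name, not Spark's `hashDistance`.

```scala
object HashDistanceSketch {
  // Fraction of positions where two hash signatures disagree:
  // 0.0 when every hash value matches, growing toward 1.0 as more values differ.
  def averagedIndicatorDistance(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.nonEmpty && x.length == y.length, "signatures must be non-empty and of equal length")
    val mismatches = x.zip(y).count { case (a, b) => a != b }
    mismatches.toDouble / x.length
  }
}
```

For example, comparing `Seq(5.0, 22.0, 13.0)` with itself gives 0.0, while changing one of the three values gives one third.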
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15800
@sethah Not exactly. Based on the logic in `approxNearestNeighbor`, if
there aren't enough candidates where the distance is zero, we'll scan the
whole dataset.
I don't think multi
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@jkbradley I agree with most of your comments above. And I would like to
suggest the following:
- I would recommend a more intuitive name like `HyperplaneProjection`
instead of `PStableHashing
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15795
@bravo-zhang @srowen I am OK with using the example in #15787.
But I still think `approxNearestNeighbor` and `approxSimilarityJoin` are
different algorithms and it would be easier for user
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r86889880
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/RandomProjectionExample.scala
---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r86889863
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaRandomProjectionExample.java
---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r86889844
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,134 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r86889702
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaRandomProjectionExample.java
---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r86889774
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,134 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15795#discussion_r86877678
--- Diff: docs/ml-features.md ---
@@ -1396,3 +1396,134 @@ for more details on the API.
{% include_example python/ml/chisq_selector_example.py
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/15800
[SPARK-18334] MinHash should use binary hash distance
## What changes were proposed in this pull request?
MinHash currently is using the same `hashDistance` function as
RandomProjection
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@sethah Yes, that's why `outputDim` is introduced for users to trade off
between false negative rate and running time.
During my tests, LSH without amplification can be (0.5, 0.5)-sensitive
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r86724596
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/15795
[SPARK-18081] Add user guide for Locality Sensitive Hashing(LSH)
## What changes were proposed in this pull request?
The user guide for LSH is added to ml-features.md, with several scala/java
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@sethah I think you are right. OR-amplification is only applied inside NN
search and similarity join through `hashDistance` and `explode`. `transform`
itself does not apply amplifications
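A small sketch of OR-amplification through an explode-style step, as this comment describes for NN search and similarity join: each point is expanded into one entry per hash table, and two points become candidates if they share a bucket in any table. The `Hashed` type and `candidatePairs` function are hypothetical and use plain Scala collections rather than Spark DataFrames.

```scala
object OrAmplificationSketch {
  // Each point carries one hash value per hash table.
  final case class Hashed(id: Int, hashes: Seq[Double])

  // "Explode" every point into ((tableIndex, hashValue), id) entries, then pair up
  // points that land in the same bucket. Matching in any single table is enough (OR).
  def candidatePairs(points: Seq[Hashed]): Set[(Int, Int)] = {
    val exploded = for {
      p <- points
      (h, table) <- p.hashes.zipWithIndex
    } yield ((table, h), p.id)

    exploded
      .groupBy(_._1)
      .values
      .flatMap { bucket =>
        val ids = bucket.map(_._2).distinct.sorted
        for (i <- ids; j <- ids if i < j) yield (i, j)
      }
      .toSet
  }
}
```

Because the hash table index is part of the bucket key, equal hash values in different tables do not count as a match.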
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Awesome! Thanks Joseph and thanks everyone else for reviewing this!
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85591762
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85459596
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,336 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85459447
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85444756
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85444781
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85424257
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85418671
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85417885
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Thanks @jkbradley. I have made several changes to the unit tests. Please let
me know if I missed any.
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85248006
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/RandomProjectionSuite.scala ---
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85248016
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r85247717
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashSuite.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84586831
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84586829
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
Thanks @jkbradley. I have removed BitSampling and SignRandomProjection for
a follow-up PR.
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
I have no idea how to fix this MiMa test failure. Could anyone give me a clue?
```
java.lang.ArrayIndexOutOfBoundsException: 1660
at com.typesafe.tools.mima.core.BufferReader.nextByte
```
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r83149648
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82871238
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82726587
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722577
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722244
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722195
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722184
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722187
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722189
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722181
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722185
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82722177
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82721024
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82676608
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635922
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635900
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635989
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635943
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635973
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635955
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635937
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635871
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635849
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635887
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635879
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635859
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635828
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635792
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635810
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635817
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635840
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82635804
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82619926
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82539368
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82534311
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15148
@jkbradley Take your time with the code review. :) I will be working on the
open dataset testing at the same time.
Github user Yunni commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r82027195
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,334 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF