[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Thanks for the discussion, everyone! I will take a look at the JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 Phew, done! https://issues.apache.org/jira/browse/SPARK-18392
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 Good points: Array of Vectors sounds good to me. There has been a lot of discussion. I'm going to try to summarize things in a follow-up JIRA, which I'll link here shortly. LSH turned out to be a much messier area than I expected; thanks a lot to everyone for all of the post-hoc reviews and discussions!
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148

If we were to use a matrix for the output, then when we do `approxSimilarityJoin` we would want to explode the output column by matrix rows, assuming the matrix structure was:

| ---g1(x) |
| ---g2(x) |
| ... |
| ---gL(x) |

This is probably possible, but might be a bit awkward? `Array[Vector]` might make it a bit easier.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148

> This is very common in academic research and literature, but it may not be in industry. I'm fine with not considering it for now.

Ok makes sense - for the `transform` case if users are looking to directly use the hash sigs as lower-dim representation, they can always set `L=1` and `d` (assuming we do AND + OR later) to get just one "vector" output.

For the public vals - sorry if I wasn't clear. I meant we should probably not expose them until the API is fully baked. But yes I see that they are useful to expose once we're happy with the API. I just don't love the idea of changing things later (and throwing errors and whatnot) if we can avoid it - I think we saw similar issues with e.g. NaiveBayes now.

> What about outputting a Matrix instead of an Array of Vectors? That will make it easy to change in the future, without us having weird Vectors of length 1.

Matrix can work - I don't think `Array[Vector]` is an issue either. I seem to recall a comment above that Matrix was a bit less easy to work with (exploding indices and so on). I don't see a big difference between an Lx1 matrix and an L-length Array of 1-d vectors in practical terms. So, I'm ok with either approach.

I'll check the JIRA - sorry I missed the links.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148

@MLnick I agree with most of your comments. A few responses:

> In terms of transform - I disagree somewhat that the main use case is "dimensionality reduction". Perhaps there are common examples of using the hash signatures as a lower-dim representation as a feature in some model (e.g. in a similar way to say a PCA transform), but I haven't seen that.

This is very common in academic research and literature, but it may not be in industry. I'm fine with not considering it for now.

> I also don't see why randUnitVectors or randCoefficients needs to be public

You mentioned people using LSH outside of Spark for serving. In order to do that, we will need to expose randUnitVectors and randCoefficients so that users can compute hash values for query points. That said, I'm fine with making those private for now and preventing this use case for 1 release while we stabilize the API.

> One issue I have is that currently we would output a 1 x L set of hash values. But it actually should be L x 1 i.e. a set of signatures of length 1. I guess we can leave it as is, but document what the output actually is.

What about outputting a Matrix instead of an Array of Vectors? That will make it easy to change in the future, without us having weird Vectors of length 1.

> Finally, my understanding was results from some performance testing would be posted. I don't believe we've seen this yet.

You can see some results linked from the JIRA.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148 Oh and for naming - I'm ok with the current ones actually. However we could think about changing to `ScalarRandomProjectionLSH` (a term mentioned in @karlhigley's package), as later we will have `SignRandomProjectionLSH` for cosine distance; and `MinHashLSH`, etc - just to make it clear what the class is doing. (perhaps later we have some other random projection algorithm that conflicts etc). We could name according to the estimated metric such as `EuclideanLSH` or so on, but if we want to support say Euclidean and Manhattan distance at some point that becomes problematic. So perhaps best not to?
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148

I tend to agree that the terminology used here is a little confusing, and doesn't seem to match up with the "general" terminology (I use that term loosely however).

Terminology

In my dealings with LSH, I too have tended to come across the version that @sethah mentions (and @karlhigley's package, and others such as https://github.com/marufaytekin/lsh-spark, implement). That is, each input vector is hashed into `L` "tables" of hash signatures of "length" or "dimension" `d`. Each hash signature is created by concatenating the result of applying `d` "hash functions".

I agree what's effectively implemented here is `L = outputDim` and `d=1`. What I find a bit troubling is that it is done "implicitly", as part of the `hashDistance` function. Without knowing that is what is happening, it is not clear to a new user - coming from other common LSH implementations - that `outputDim` is not the "number of hash functions" or "length of the hash signatures" but actually the "number of hash tables".

Transform semantics

In terms of `transform` - I disagree somewhat that the main use case is "dimensionality reduction". Perhaps there are common examples of using the hash signatures as a lower-dim representation as a feature in some model (e.g. in a similar way to say a PCA transform), but I haven't seen that. In my view, the real use case is the approximate nearest neighbour search.

I'll give a concrete example for the `transform` output. Let's say I want to export recommendation model factor vectors (from ALS), or Word2Vec vectors, etc, to a real-time scoring system. I have many items, so I'd like to use LSH to make my scoring feasible. I do this by effectively doing a real-time version of OR-amplification. I store the hash tables (`L` tables of `d` hash signatures) with my vectors.
When doing "similar items" for a given item, I retrieve the hash sigs of the query item, and use these to filter down the candidate item set for my scoring. This is in fact something I'm working on in a demo project currently.

So if we will support the OR/AND combo, then it will be very important to output the full `L x d` set of hash sigs in `transform`.

Proposal:

My recommendation is:

1. future proof the API by returning `Array[Vector]` in `transform` (as mentioned above by others);
2. we need to update the docs / user guide to make it really clear what the implementation is doing;
3. I think we need to make it clear that the implied `d` value here is `1` - we can mention that AND amplification will be implemented later and perhaps even link to a JIRA.
4. rename `outputDim` to something like `numHashTables`.
5. when we add AND-amp, we can add the parameter `hashSignatureLength` or `numHashFunctions`.
6. make as much private as possible to avoid being stuck with any implementation detail in future releases (e.g. I also don't see why `randUnitVectors` or `randCoefficients` needs to be public).

One issue I have is that currently we would output a `1 x L` set of hash values. But it actually should be `L x 1` i.e. a set of signatures of length `1`. I guess we can leave it as is, but document what the output actually is.

I believe we should support OR/AND in future. If so, then to me many things need to change - `hashFunction`, `hashDistance` etc will need to be refactored. Most of the implementation is private/protected so I think it will be ok. Let's just ensure we're not left with an API that we can't change in future. Setting `L` and `d=1` must then yield the same result as current impl to avoid a behavior change (I guess this will be ok since current default for `L` is `1`, and we can make the default for `d` when added also `1`).

Finally, my understanding was results from some performance testing would be posted. I don't believe we've seen this yet.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148

@sethah

> What is the intended use of the output column generated by transform? As an alternative set of features with decreased dimensionality?

I agree it's mainly for dimensionality reduction, though these LSH functions are not ideal for that. (E.g., most people doing dimensionality reduction would probably want to use random projections without bucketing.)

@karlhigley I agree with your description of different dimensionalities and agree we may just have to pick some terminology out of many choices. I'm fairly ambivalent about what terminology we choose, though it would be great for it to match whatever references we cite. (And maybe we do need another reference cited for describing OR vs AND amplification and "dimensions.")

@Yunni
* Have you seen "HyperplaneProjection" used in literature?
* I'll respond about the hashDistance in [https://github.com/apache/spark/pull/15800]
* Let's not implement both types of amplification just yet. Let's either:
  * Fix the API so we can add them in the future, or
  * Make LSH private for now so that we can fix its API for 2.2.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148

@jkbradley I agree with most of your comments above. And I would like to suggest the following:
- I would recommend a more intuitive name like `HyperplaneProjection` instead of `PStableHashing` if we adopt the LSH function @sethah suggested.
- `x.toDense.values.zip(y.toDense.values).count(pair => pair._1 == pair._2).toDouble / x.size` is AND-amplification. I think we should use OR-amplification here. I have already made a pull request to fix the issue in #15800.
- I think for MinHash, multi-probing NN Search is either single probing or full scan.
- Here is my reference for Multi-probing: http://www.cs.princeton.edu/cass/papers/mplsh_vldb07.pdf

@sethah @karlhigley Now I see your LSH function for Euclidean distance is the AND-amplification of what I have implemented.
- Do you have any reference for compound AND/OR-amplification? I see that it does not always work without assumptions on the distance threshold and sensitivity, for example, `(0.6, 0.4)` => `(0.426, 0.098)` for `L = 4, d = 4`, and `(0.8, 0.2)` => `(0.678, 0.000)` for `L = 10, d = 10`
- For the schema of `transform()`, I think we either add a generic type for the output column in the LSH class or change the output type to `Array[Vector]`. I would recommend the latter because (1) it's very easy to explode the array to get what @sethah suggested and (2) the type of the output column still needs to be Spark SQL compatible, which is not so generic.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user karlhigley commented on the issue: https://github.com/apache/spark/pull/15148 @jkbradley: "Multi-probe" seems like a standard term, and I think this is the [original paper](http://www.cs.princeton.edu/cass/papers/mplsh_vldb07.pdf) that coined it. > Terminology: For LSH, "dimensionality" = "number of hash functions" and is relevant only for amplification. Do you agree? I have yet to see a hash function used for LSH which does not have a discrete set. I confess that I'm a little confused what you mean by the above. There are several relevant dimensionalities: the dimensionality of the input points (`x`), the dimensionality of the computed hashes (i.e. the results of applying `g(x)`), and the number of hash tables computed (i.e. how many `g(x)` functions are applied), which is the dimensionality of AND-amplification (in a sense). After wrestling with inconsistent terminology for a while, what I settled on for spark-neighbors was to refer to `g(x)` as a hash function, the outputs of `g(x)` as hashes, the sub-elements of `g(x)` -- `h1(x)` etc. -- as whatever made sense for the particular method (e.g. `permutations` for Minhash), and the output of each of the L `g(x)` functions as a hash table. While that terminology isn't necessarily standard, it helped me identify the common concepts across LSH methods clearly enough to build some abstractions around them. Using those terms, the dimensionality of the `g(x)` hash functions and the hashes they produce is equivalent to the number of `h(x)` sub-elements they contain. I thought of applying OR-amplification as producing multiple hash tables by using multiple `g(x)` functions, with a collision in any one hash table producing a pair of candidate neighbors. Does that make any more (or less) sense?
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user karlhigley commented on the issue: https://github.com/apache/spark/pull/15148 @sethah: Your description of the combination of AND and OR amplification from the literature matches my understanding, and the combination of the two is what I was aiming for in spark-neighbors. I also concur with your assessment of the potential performance impacts of OR-amplification without first applying AND-amplification, in terms of both precision/recall and runtime.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148

I was using L to refer to the number of compound hash functions, but you're right that in my explanation L was the "OR" parameter and d was the "AND" parameter.

Thinking more about it, this is a tough question. What is the intended use of the output column generated by transform? As an alternative set of features with decreased dimensionality? When/if we use the AND/OR amplification, we could go a couple of different routes. Let's say for d = 3 and L = 3 we could first apply our hashing scheme to the input to obtain:

|features|g1|g2|g3|
|---|---|---|---|
|[12.5609584702036...|[112.0,1.0,12.0]|[1.0,120.0,16.0]|[102.0,1.0,14.0]|
|...|...|...|...|

Then we generate `g1(q), g2(q), g3(q)` where q is the query point and we would select all points where `g1(q) == g1(x_i) OR g2(q) == g2(x_i) OR ...`. In spark-neighbors, the output dataframe instead has `L * N` rows, where N is the number of rows in the input dataframe. Then you can join on the hashed column plus a "table identifier" (the index l in range [1, L]). Still, this makes a temporary dataframe within the near-neighbors or approx-join algos, and I'm not sure the output schema of `transform` needs to have all `L` hashed values. We could store `randUnitVectors: Array[Array[Vector]]` and for transform output the hashed value for only the first sequence of random vectors, but that seems a bit strange to me. Thoughts?
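The "explode, then join on table identifier plus hash" idea in the comment above can be sketched with plain Scala collections. This is only an illustration, not the Spark or spark-neighbors implementation: `ExplodeJoinSketch`, `explode`, and `candidates` are hypothetical names, signatures are modelled as `Vector[Int]`, and table indices start at 0 rather than 1.

```scala
// Hypothetical sketch: none of these names come from Spark or spark-neighbors.
object ExplodeJoinSketch {
  type Signature = Vector[Int]

  // Turn N rows holding L signatures each into L * N rows keyed by
  // (tableIndex, signature); the value is the id of the originating point.
  def explode(rows: Seq[(Long, Seq[Signature])]): Seq[((Int, Signature), Long)] =
    for {
      (id, sigs) <- rows
      (sig, table) <- sigs.zipWithIndex
    } yield ((table, sig), id)

  // OR across tables: a point is a candidate if it shares the query's
  // signature in at least one table (same table index, equal signature).
  def candidates(index: Seq[(Long, Seq[Signature])], query: Seq[Signature]): Set[Long] = {
    val buckets: Map[(Int, Signature), Seq[Long]] =
      explode(index).groupBy(_._1).map { case (key, hits) => key -> hits.map(_._2) }
    query.zipWithIndex.flatMap { case (sig, table) =>
      buckets.getOrElse((table, sig), Seq.empty)
    }.toSet
  }
}
```

Keying on the pair `(tableIndex, signature)` rather than on the signature alone is what keeps a match in table 1 from being confused with an equal signature that happens to appear in a different table.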
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 > The current implementation is equivalent to the L = 1 case always, and outputDim corresponds to d. That is true if you're talking about comparing hash values. But for approx similarity and nearest neighbors, this is doing d = 1 and L = outputDim (i.e., OR amplification). (Did you swap accidentally?) Definitely need to clarify in the docs. I'm not too worried about making ```randUnitVectors``` private. We can always deprecate it and have it throw an exception when it is not applicable. I'm more worried about the schema for transform(). Do you think we should go ahead and output a Matrix so we can support AND and OR in the future?
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148

So I'll try to summarize the AND/OR amplification and how I think it fits into the current API right now.

LSH relies on a single hashing function `h(x)` which is (R, cR, p1, p2)-sensitive, which just means it meets certain properties needed for LSH. In the case of the 2-stable method, `h(x) = floor((x dot r) / w)`, which maps `Vector[Double] => Int`. p1 and p2 correspond to "good" and "bad" collision probabilities respectively.

To decrease the probability of a bad collision we can use AND-amplification by creating a new, compound hash function `g(x) = [h1(x), h2(x), ..., hd(x)]` where the `h_i(x)` correspond to different random vectors `r`. Now we only consider collisions for two vectors x and y if g(x) == g(y) (i.e. standard vector equality). This makes the probability of both types of collisions decrease to `(p1^d, p2^d)`. For a hypothetical (0.8, 0.2)-sensitive distribution this goes to `(0.41, 0.0016)` for d = 4, which makes the false-positive rate very low but also means we miss a lot of good candidates.

To mitigate this we can further apply OR-amplification by generating not one compound hash function g(x) but `L` compound functions

`g1(x) = [h11(x), ..., h1d(x)]`
`g2(x) = [h21(x), ..., h2d(x)]`
`...`
`gL(x) = [hL1(x), ..., hLd(x)]`

Then we convert the original probabilities to `(1 - (1 - p1^d)^L, 1 - (1 - p2^d)^L)` and in our example `(0.8, 0.2) => (0.8785, 0.0064)` for L = 4, d = 4.

The current implementation is equivalent to the `L = 1` case always, and `outputDim` corresponds to `d`. The concern I have with the RandomProjection API right now is that if we extend to offer arbitrary `L` then our models do not store just a d-dimensional array of random vectors but more like a `L x d` matrix of random vectors. And we would have `hashFunctions` instead of `hashFunction` (though this is still private).

One question I have is - why do we expose `randUnitVectors` at all? I feel it leaves us more room for changes in the future if we do not expose it, especially considering the points I just made. There may be some reason to expose it that I haven't thought of though. What do we think about changing it to private?

I like the idea of changing `outputDim` to something related to OR-amplification a lot.

I think minhash is done properly right now but the `hashDistance` measure doesn't make sense as already discussed.

Right now, I'd like to focus on making sure we don't corner ourselves with the API since internal algo details and documentation can always be changed later.
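As a rough illustration of the summary above, the 2-stable hash `h(x) = floor((x dot r) / w)` and an `L x d` grid of compound hash functions might look like the sketch below. It uses plain arrays rather than Spark's `Vector`, and every name in it (`PStableSketch`, `randomVectors`, `signatures`, `isCandidate`, the bucket width `w`, the seed) is a hypothetical choice for this example, not part of the API under review.

```scala
import scala.util.Random

// Hypothetical sketch of the 2-stable scheme; not the Spark ML implementation.
object PStableSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // Elementary hash: h(x) = floor((x dot r) / w), where w is the bucket width.
  def h(r: Vec, w: Double)(x: Vec): Int = math.floor(dot(x, r) / w).toInt

  // An L x d grid of Gaussian random vectors: numTables rows, sigLength columns.
  def randomVectors(numTables: Int, sigLength: Int, dim: Int, seed: Long): Array[Array[Vec]] = {
    val rng = new Random(seed)
    Array.fill(numTables, sigLength)(Array.fill(dim)(rng.nextGaussian()))
  }

  // One compound hash g_l(x) = [h_l1(x), ..., h_ld(x)] per table l.
  def signatures(vectors: Array[Array[Vec]], w: Double)(x: Vec): Array[Vector[Int]] =
    vectors.map(table => table.map(r => h(r, w)(x)).toVector)

  // AND inside a table (whole-signature equality), OR across the L tables.
  def isCandidate(a: Array[Vector[Int]], b: Array[Vector[Int]]): Boolean =
    a.zip(b).exists { case (ga, gb) => ga == gb }
}
```

With `sigLength = 1` this reduces to OR-amplification over `numTables` single hashes, which is the `d = 1`, `L = outputDim` reading of the current behaviour discussed elsewhere in the thread.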
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148

It sounds like discussions are converging, but I want to confirm a few things + make a few additions.

### Amplification

Is this agreed?
* Approx neighbors and similarity are doing OR-amplification when comparing hash values, as described in the [Wikipedia article](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Stable_distributions). This is computing an amplified hash function *implicitly*.
* transform() is not doing amplification. It outputs the value of a collection of hash functions, rather than aggregating them to do amplification.
* This is my main question: Is amplification ever done explicitly, and when would you ever need that?

Adding combined AND and OR amplification in the future sounds good to me. My main question right now is whether we need to adjust the API before the 2.1 release. I don't see a need to, but please comment if you see an issue with the current API.
* One possibility: We could rename outputDim to something specific to OR-amplification.

Terminology: For LSH, "dimensionality" = "number of hash functions" and is relevant only for amplification. Do you agree? I have yet to see a hash function used for LSH which does not have a discrete set.

### Random Projection

I agree this should be renamed to something like "PStableHashing." My apologies for not doing enough background research to disambiguate.

### MinHash

I think this is implemented correctly, according to the reference given in the linked Wikipedia article.
* [This reference](https://github.com/apache/spark/blob/8f0ea011a7294679ec4275b2fef349ef45b6eb81/mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala#L36) to perfect hash functions may be misleading. I'd prefer to remove it.

### hashDistance

Rethinking this, I am unsure about what function we should use. Currently, hashDistance is only used by approxNearestNeighbors. Since approxNearestNeighbors sorts by hashDistance, using a soft measure might be better than what we currently have:
* MinHash
  * Currently: Uses OR-amplification for single probing, and something odd for multiple probing
  * Best option for approxNearestNeighbors: [this Wikipedia section](https://en.wikipedia.org/wiki/MinHash#Variant_with_many_hash_functions), which is equivalent to OR-amplification when using single probing. I.e., replace [this line of code](https://github.com/apache/spark/blob/8f0ea011a7294679ec4275b2fef349ef45b6eb81/mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala#L79) with: ```x.toDense.values.zip(y.toDense.values).count(pair => pair._1 == pair._2).toDouble / x.size```
* RandomProjection
  * Currently: Uses OR-amplification for single probing, and something reasonable for multiple probing

@Yunni What is the best resource you have for single vs multiple probing? I'm wondering now if they are uncommon terms and should be renamed.
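To make the "soft measure" suggestion above concrete, here is a small standalone sketch of the fraction-of-matching-hashes idea. It works on plain `Array[Double]` values (what `toDense.values` would yield for a Spark vector) so it runs without Spark. `HashSimilaritySketch`, `matchingFraction`, and `softHashDistance` are hypothetical names, and turning the fraction into a distance via `1 - fraction` is an assumption made here for sorting purposes, not something the thread settled on.

```scala
// Hypothetical sketch; not the hashDistance used in Spark.
object HashSimilaritySketch {
  // Fraction of positions at which two hash signatures agree, in [0, 1].
  def matchingFraction(x: Array[Double], y: Array[Double]): Double = {
    require(x.nonEmpty && x.length == y.length, "signatures must be non-empty and equal length")
    x.zip(y).count { case (a, b) => a == b }.toDouble / x.length
  }

  // Assumed conversion to a distance so that smaller means closer.
  def softHashDistance(x: Array[Double], y: Array[Double]): Double =
    1.0 - matchingFraction(x, y)
}
```

Under this reading, a single-probe check (any hash matching) corresponds to `matchingFraction > 0`, while sorting by `softHashDistance` ranks points that agree on more hashes ahead of those that agree on fewer.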
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 @sethah Yes, that's why `outputDim` is introduced for users to trade off between false negative rate and running time. During my tests, LSH without amplification can be (0.5, 0.5)-sensitive or even worse depending on the input distribution. Even in that case, `outputDim = 4` or `outputDim = 5` already gives very good accuracy. And the number of rows being scanned should be proportional to `outputDim * averageBucketSize`.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 Ok, I'm looking more closely at this algorithm versus the literature. I agree that there is a lot of inconsistent terminology which is probably leading to some of the confusion here. Most or all of the LSH algorithms in the literature describe a process which applies a composition of AND and OR amplification. @karlhigley This is what the package spark-neighbors does as well, correct? AND amplification is applied by generating hash functions `g(x) = (h1(x), h2(x), ..., hd(x))` which are concatenations of several of the vanilla locality sensitive hashing functions. These algorithms only compare `g(x) == g(y)` for near-neighbor candidacy. Still, they then apply OR amplification by using `L` of these hashing functions and accepting a point as a candidate if any of the `g_i for i = 1 to L` hash functions fall into the same bucket as the query point. In this patch we only apply OR amplification by generating a single `g(x) = (h1(x), h2(x), ..., hd(x))` and we consider candidates if any of the `h_i for i = 1 to d` match. For a `(p1, p2)` sensitive hashing family, this OR amplification transforms it into a `(1 - (1 - p1)^d, 1 - (1 - p2)^d)` family, where p1 is a "good" collision and p2 is a "bad" collision. Consider a (0.8, 0.2) hash family where we apply OR amplification with a dimension `d = 10`. We will transform this into a `(0.9989, 0.893)` sensitive family. Basically, we amplify the good and bad collisions. If instead we implement the composition of **AND then OR** amplification as in the literature, we transform a `(0.8, 0.2)` sensitive family into a `(.8785, .0064)`. In this way, we amplify the "good" collision and dampen the "bad" collision probabilities. If this is correct, then I think the current implementation will end up selecting most of the points as candidates and may impact the runtime performance. 
[This reference](http://web.stanford.edu/class/cs345a/slides/05-LSH.pdf) sums it up nicely IMO. I will look into testing this out more concretely.
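The amplification arithmetic discussed in this comment can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the choice of `r = 4` AND-ed hash functions and `b = 4` OR-ed tables is an assumption picked because it reproduces the AND-then-OR figures quoted in the comment; the comment itself does not state which values were used.

```python
def and_amplify(p: float, r: int) -> float:
    """Collision probability when all r independent hash functions must match."""
    return p ** r


def or_amplify(p: float, b: int) -> float:
    """Collision probability when at least one of b independent hashes must match."""
    return 1.0 - (1.0 - p) ** b


def and_then_or(p: float, r: int, b: int) -> float:
    """AND over r component hashes inside each table, then OR over b tables."""
    return or_amplify(and_amplify(p, r), b)


if __name__ == "__main__":
    p1, p2 = 0.8, 0.2  # "good" and "bad" collision probabilities of the base family

    # OR-only amplification with d = 10, as described for the patch.
    print("OR only (d=10):       ", or_amplify(p1, 10), or_amplify(p2, 10))

    # AND-then-OR with the assumed r = 4, b = 4.
    print("AND then OR (r=4,b=4):", and_then_or(p1, 4, 4), and_then_or(p2, 4, 4))
```

With `r = 4` and `b = 4`, the AND-then-OR composition comes out at roughly `(0.8785, 0.0064)`, while OR-only amplification with `d = 10` pushes the "bad" collision probability up to about `0.893`.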
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 @sethah I think you are right. OR-amplification is only applied inside NN search and similarity join through `hashDistance` and `explode`. `transform` itself does not apply any amplification. Sorry for missing this. I will clarify this in the user guide, and I would be happy to see a PR from you fixing the documentation. @jkbradley @MLnick
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 @karlhigley Thanks for your detailed response. From the amplification section on [Wikipedia](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Amplification), it is pretty clear to me that this implementation is not doing OR/AND amplification. `outputDim` is just the number of concatenated random hash functions (`k` in the wiki article). For now we can clarify some of this a bit better in the documentation, and perhaps in the future we can extend this implementation to use optional AND/OR amplification. I can work on a PR for it this week, unless there are any objections. @jkbradley @Yunni @MLnick ?
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user karlhigley commented on the issue: https://github.com/apache/spark/pull/15148 @sethah: I think you're right that there's a discrepancy here, and I'm embarrassed that I didn't see it when I first reviewed the PR. On a reread of the source and your comment above, it looks like the LSH models in this PR use a single hash function to compute a single hash table, which doesn't match my understanding of OR-amplification. For OR-amplification, multiple hash functions would be applied to compute multiple hash tables, and points placed in the same bucket in any hash table would be considered candidate neighbors. From the [comments](https://github.com/apache/spark/pull/15148/files#diff-e3391977ca23d69ff7201c8bdcd88437R36), it looks like the discrepancy might be due to some confusion between the number of hash functions applied and the dimensionality of the hash functions. This is a subtle point that I was confused about too, and it took me quite a while to work it out because different authors use the term "hash function" to refer to different things at different levels of abstraction. In one sense (at a lower level), a random projection is made up of many component hash functions, but in another sense (at a higher level) a random projection represents a single hash function for the purposes of OR-amplification. Given that the PR has already been merged, I concur that the best way forward is to adjust the comments and documentation. That probably involves changing the references to OR-amplification to simply refer to the dimensionality of the hash function. On the other issue you mentioned regarding mismatches between what's implemented and the linked documents, I think some of that confusion also stems from inconsistent terminology in the source material. LSH based on p-stable distributions (for Euclidean distance) does involve random projections, although the authors don't directly say so in the paper. 
There's a somewhat similar LSH method for cosine distance that's sometimes referred to as "sign random projection" (though the authors of the paper don't use that term either). Sign random projection is what the "Random Projection" section of the Wikipedia page is referring to; what's implemented here looks like LSH based on p-stable distributions. Maybe one way to clarify would be to name the models after the distance measures they're intended to approximate, and provide explanations of the methods they use in the comments?
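The distinctions drawn in this comment (component hash versus composite hash function, one table versus several, p-stable versus sign random projection) can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration: the function names, the bucket width `w`, and the parameters `k` and `L` are hypothetical and are not taken from the PR or from spark-neighbors. The p-stable component is the usual `floor((a . x + b) / w)` form with Gaussian `a`; the sign component keeps only the sign of `a . x`.

```python
import math
import random


def make_pstable_hash(dim, w, rng):
    """One component hash for Euclidean distance: floor((a . x + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian is 2-stable
    b = rng.uniform(0.0, w)                        # random offset in [0, w]

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)

    return h


def make_sign_hash(dim, rng):
    """One component hash for cosine distance: the sign of a random projection."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def h(x):
        return 1 if sum(ai * xi for ai, xi in zip(a, x)) >= 0 else 0

    return h


def make_composite(k, make_component):
    """Concatenate k component hashes into one hash function g (one hash table)."""
    components = [make_component() for _ in range(k)]

    def g(x):
        return tuple(h(x) for h in components)

    return g


def is_candidate(tables, x, y):
    """OR-amplification: x and y are candidates if any table buckets them together."""
    return any(g(x) == g(y) for g in tables)


if __name__ == "__main__":
    rng = random.Random(42)
    dim, k, L, w = 5, 3, 4, 4.0  # hypothetical parameters

    # L independent tables, each keyed by a k-dimensional composite hash.
    tables = [make_composite(k, lambda: make_pstable_hash(dim, w, rng))
              for _ in range(L)]

    x = [1.0, 0.5, -0.2, 0.3, 0.9]
    near = [v + 0.01 for v in x]
    far = [v + 50.0 for v in x]
    print("near is candidate:", is_candidate(tables, x, near))
    print("far is candidate: ", is_candidate(tables, x, far))
```

In this sketch, `k` plays the role of the dimensionality of a single hash function `g`, while `L` is the number of hash functions (tables) used for OR-amplification. A model with a single table corresponds to `L = 1`.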
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 I apologize for coming late to this, but I am taking a look at some of the documentation now. For the `RandomProjection` class there are two links: one to the Wikipedia entry on stable distributions and one to a survey paper. The Wikipedia link points to the "stable distributions" section despite the page also having a section on random projections, which is supposedly the algorithm implemented. The paper has a "Random Projection" section as well, but neither of the Random Projection methods in the links matches the code here. I expressed this concern before. The approach in the `RandomProjection` class does not match either the "Random Projection" method OR the "P-Stable distribution" methods that I find in the literature. I summarized this in a comment way up towards the top. If this method is some well-accepted hybrid of the two, fine, but I think the references would leave users quite confused. I think it's nice to have certainty about the practical effectiveness of this method since it has already been deployed in industry, so my main concern is really just documentation. Right now, we're linking to sources which describe distinctly different algorithms than what we have implemented. Thoughts? For convenience, some references: * http://cseweb.ucsd.edu/~dasgupta/254-embeddings/lawrence.pdf * https://en.wikipedia.org/wiki/Locality-sensitive_hashing#LSH_algorithm_for_nearest_neighbor_search * https://people.csail.mit.edu/indyk/p117-andoni.pdf
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Awesome! Thanks Joseph and thanks everyone else for reviewing this!
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 This LGTM now. Any other comments from other reviewers? I'll merge this, but we can follow up as needed. Thanks very much @Yunni for the PR and everyone else for helping to review! Merging with master
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67721 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67721/consoleFull)** for PR 15148 at commit [`3570845`](https://github.com/apache/spark/commit/35708458a0ee156c097ca604efeafaa37d3c8a6d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67721/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67721/consoleFull)** for PR 15148 at commit [`3570845`](https://github.com/apache/spark/commit/35708458a0ee156c097ca604efeafaa37d3c8a6d).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67688/consoleFull)** for PR 15148 at commit [`97e1238`](https://github.com/apache/spark/commit/97e1238ddf14938539237facf354e0ce4fc4ed1c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67688/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67688/consoleFull)** for PR 15148 at commit [`97e1238`](https://github.com/apache/spark/commit/97e1238ddf14938539237facf354e0ce4fc4ed1c).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67683/ Test FAILed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67683/consoleFull)** for PR 15148 at commit [`6cda936`](https://github.com/apache/spark/commit/6cda936cf2c14f3e4c0e164b0d688fd4c8996b5d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67683/consoleFull)** for PR 15148 at commit [`6cda936`](https://github.com/apache/spark/commit/6cda936cf2c14f3e4c0e164b0d688fd4c8996b5d).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 Only 2 comments remain, I believe: * I'd still like to remove the default outputCol value * Discussion about approxNearestNeighbors internals (in comments above)
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67676/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67676/consoleFull)** for PR 15148 at commit [`9a3704c`](https://github.com/apache/spark/commit/9a3704c6252c842c750c8cf98b0271ab51e3d44e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67676/consoleFull)** for PR 15148 at commit [`9a3704c`](https://github.com/apache/spark/commit/9a3704c6252c842c750c8cf98b0271ab51e3d44e).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67668/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67668/consoleFull)** for PR 15148 at commit [`9bb3fd6`](https://github.com/apache/spark/commit/9bb3fd607519d245f72afedf95def63e0e7400a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67665/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67665/consoleFull)** for PR 15148 at commit [`20a9ebf`](https://github.com/apache/spark/commit/20a9ebf03d9bd1d32ea46454352a2ae5500ad5ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67668/consoleFull)** for PR 15148 at commit [`9bb3fd6`](https://github.com/apache/spark/commit/9bb3fd607519d245f72afedf95def63e0e7400a7).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67665/consoleFull)** for PR 15148 at commit [`20a9ebf`](https://github.com/apache/spark/commit/20a9ebf03d9bd1d32ea46454352a2ae5500ad5ea).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67609/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67609/consoleFull)** for PR 15148 at commit [`1c4b9fb`](https://github.com/apache/spark/commit/1c4b9fb6821d5f86037a5f55976a72e85cb2440b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Thanks @jkbradley. I have made several changes to unit tests. Please let me know if I missed any.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67609/consoleFull)** for PR 15148 at commit [`1c4b9fb`](https://github.com/apache/spark/commit/1c4b9fb6821d5f86037a5f55976a72e85cb2440b).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67401/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67401/consoleFull)** for PR 15148 at commit [`e14f73e`](https://github.com/apache/spark/commit/e14f73e8a49d409e09a6ed541d4b40f07dc81013). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67401/consoleFull)** for PR 15148 at commit [`e14f73e`](https://github.com/apache/spark/commit/e14f73e8a49d409e09a6ed541d4b40f07dc81013).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67398/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67398/consoleFull)** for PR 15148 at commit [`cad4ecb`](https://github.com/apache/spark/commit/cad4ecb3cea47e16b9c1073d30d8fd57bc397621). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Thanks @jkbradley. I have removed BitSampling and SignRandomProjection from this PR; they will go into a follow-up PR.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67398/consoleFull)** for PR 15148 at commit [`cad4ecb`](https://github.com/apache/spark/commit/cad4ecb3cea47e16b9c1073d30d8fd57bc397621).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67055/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67055/consoleFull)** for PR 15148 at commit [`66d553a`](https://github.com/apache/spark/commit/66d553a4e2bd8c219c09e17db11962cd49114a24). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #67055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67055/consoleFull)** for PR 15148 at commit [`66d553a`](https://github.com/apache/spark/commit/66d553a4e2bd8c219c09e17db11962cd49114a24).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66914/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66914/consoleFull)** for PR 15148 at commit [`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66914/consoleFull)** for PR 15148 at commit [`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148 jenkins retest this please
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 I have no idea how to fix this MiMa test failure. Could anyone give me a clue?
```
java.lang.ArrayIndexOutOfBoundsException: 1660
    at com.typesafe.tools.mima.core.BufferReader.nextByte(BufferReader.scala:33)
    at com.typesafe.tools.mima.core.ClassfileParser$ConstantPool.(ClassfileParser.scala:91)
    at com.typesafe.tools.mima.core.ClassfileParser.parseAll(ClassfileParser.scala:67)
    at com.typesafe.tools.mima.core.ClassfileParser.parse(ClassfileParser.scala:59)
    at com.typesafe.tools.mima.core.ClassInfo.ensureLoaded(ClassInfo.scala:86)
    at com.typesafe.tools.mima.core.ClassInfo.methods(ClassInfo.scala:101)
    at com.typesafe.tools.mima.core.ClassInfo$$anonfun$lookupClassMethods$2.apply(ClassInfo.scala:123)
    at com.typesafe.tools.mima.core.ClassInfo$$anonfun$lookupClassMethods$2.apply(ClassInfo.scala:123)
```
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66873/consoleFull)** for PR 15148 at commit [`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66873/ Test FAILed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66873/consoleFull)** for PR 15148 at commit [`a35e261`](https://github.com/apache/spark/commit/a35e26186a0d069e1c43907e257fa7b4ab31d140).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66800/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66800/consoleFull)** for PR 15148 at commit [`1b63173`](https://github.com/apache/spark/commit/1b6317396629b9f290a279dd735923c0fc8efd89). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class BitSampling(override val uid: String) extends LSH[BitSamplingModel]`
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66800/consoleFull)** for PR 15148 at commit [`1b63173`](https://github.com/apache/spark/commit/1b6317396629b9f290a279dd735923c0fc8efd89).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66774/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)** for PR 15148 at commit [`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)** for PR 15148 at commit [`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66717/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66717/consoleFull)** for PR 15148 at commit [`2c95e5c`](https://github.com/apache/spark/commit/2c95e5c1d89e2db0350b5d8667e2ae8d293df7a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MinHash(override val uid: String) extends LSH[MinHashModel] with HasSeed`
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66717/consoleFull)** for PR 15148 at commit [`2c95e5c`](https://github.com/apache/spark/commit/2c95e5c1d89e2db0350b5d8667e2ae8d293df7a9).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66677/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66677/consoleFull)** for PR 15148 at commit [`40d1f1b`](https://github.com/apache/spark/commit/40d1f1b077232a8feeb2dd66d9b846ded1839e63). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66677/consoleFull)** for PR 15148 at commit [`40d1f1b`](https://github.com/apache/spark/commit/40d1f1b077232a8feeb2dd66d9b846ded1839e63).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/4/ Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #4 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/4/consoleFull)** for PR 15148 at commit [`142d8e9`](https://github.com/apache/spark/commit/142d8e96f7c7e5ef80b3fe11ada1be9cd499bc8a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #4 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/4/consoleFull)** for PR 15148 at commit [`142d8e9`](https://github.com/apache/spark/commit/142d8e96f7c7e5ef80b3fe11ada1be9cd499bc8a).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66659/ Test FAILed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66659/consoleFull)** for PR 15148 at commit [`efe323c`](https://github.com/apache/spark/commit/efe323cd69b87cea6a19d39be0e480e9322b5fe5). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MinHash(override val uid: String) extends LSH[MinHashModel] with MinHashParams ` * `class RandomProjection(override val uid: String) extends LSH[RandomProjectionModel]`