Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/15148
> This is very common in academic research and literature, but it may not be in industry. I'm fine with not considering it for now.
Ok, makes sense - for the `transform` case, if users want to use the hash signatures directly as a lower-dimensional representation, they can always set `L=1` and `d` (assuming we do AND + OR later) to get just one "vector" output.
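For reference, a minimal sketch of that usage, assuming the estimator ends up looking roughly like the current proposal (a `MinHashLSH` estimator with a `numHashTables` param playing the role of `L`; the names here are illustrative, not final API):

```scala
import org.apache.spark.ml.feature.MinHashLSH
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("lsh-sketch").getOrCreate()
import spark.implicits._

// Sparse binary feature vectors, e.g. one-hot encoded sets.
val df = Seq(
  (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0), (2, 1.0)))),
  (1, Vectors.sparse(6, Seq((2, 1.0), (3, 1.0), (4, 1.0))))
).toDF("id", "features")

// With a single hash table (L = 1), the output column holds exactly one
// hash "vector" per row, usable directly as a low-dimensional signature.
val lsh = new MinHashLSH()
  .setNumHashTables(1)
  .setInputCol("features")
  .setOutputCol("hashes")

val model = lsh.fit(df)
model.transform(df).show(truncate = false)
```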
For the public vals - sorry if I wasn't clear. I meant we should probably
not expose them until the API is fully baked. But yes, I see that they are
useful to expose once we're happy with the API. I just don't love the idea of
changing things later (and throwing errors and whatnot) if we can avoid it - I
think we saw similar issues recently with e.g. NaiveBayes.
> What about outputting a Matrix instead of an Array of Vectors? That will make it easy to change in the future, without us having weird Vectors of length 1.
Matrix can work - I don't think `Array[Vector]` is an issue either. I seem
to recall a comment above that Matrix was a bit less easy to work with
(exploding indices and so on). In practical terms I don't see a big difference
between an Lx1 matrix and an L-length Array of 1-element vectors, so I'm ok with
either approach.
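To illustrate why the two representations look practically equivalent to me, a small sketch using the `ml.linalg` types (variable names here are just for illustration):

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vector, Vectors}

// One hash value per hash table, L = 3.
val hashValues = Array(3.0, 7.0, 1.0)

// Option A: an L x 1 matrix (values stored column-major).
val asMatrix: DenseMatrix =
  new DenseMatrix(numRows = hashValues.length, numCols = 1, values = hashValues)

// Option B: an L-length array of 1-element vectors.
val asVectors: Array[Vector] = hashValues.map(v => Vectors.dense(v))

// Either way, the i-th hash is trivially recoverable.
assert(asMatrix(2, 0) == asVectors(2)(0))
```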
I'll check the JIRA - sorry I missed the links.