Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/15148
> This is very common in academic research and literature, but it may not be in industry. I'm fine with not considering it for now.
Ok, makes sense - for the `transform` case, if users want to use the hash signatures directly as a lower-dimensional representation, they can always set `L=1` and `d` (assuming we do AND + OR later) to get just one "vector" output.
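For reference, a minimal sketch of that usage, assuming the estimator ends up looking roughly like the current proposal (a `MinHashLSH` estimator with a `numHashTables` param playing the role of `L`; the names here are illustrative, not final API):

```scala
import org.apache.spark.ml.feature.MinHashLSH
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("lsh-sketch").getOrCreate()
import spark.implicits._

// Sparse binary feature vectors, e.g. one-hot encoded sets.
val df = Seq(
  (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0), (2, 1.0)))),
  (1, Vectors.sparse(6, Seq((2, 1.0), (3, 1.0), (4, 1.0))))
).toDF("id", "features")

// With a single hash table (L = 1), the output column holds exactly one
// hash "vector" per row, usable directly as a low-dimensional signature.
val lsh = new MinHashLSH()
  .setNumHashTables(1)
  .setInputCol("features")
  .setOutputCol("hashes")

val model = lsh.fit(df)
model.transform(df).show(truncate = false)
```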
For the public vals - sorry if I wasn't clear. I meant we should probably
not expose them until the API is fully baked. But yes, I see that they are
useful to expose once we're happy with the API. I just don't love the idea of
changing things later (and throwing errors and whatnot) if we can avoid it - I
think we saw similar issues recently with e.g. NaiveBayes.
> What about outputting a Matrix instead of an Array of Vectors? That will make it easy to change in the future, without us having weird Vectors of length 1.
Matrix can work - I don't think `Array[Vector]` is an issue either. I seem
to recall a comment above that Matrix was a bit less easy to work with
(exploding indices and so on). In practical terms I don't see a big difference
between an Lx1 matrix and an L-length Array of 1-element vectors, so I'm ok with
either approach.
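To illustrate why the two representations look practically equivalent to me, a small sketch using the `ml.linalg` types (variable names here are just for illustration):

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vector, Vectors}

// One hash value per hash table, L = 3.
val hashValues = Array(3.0, 7.0, 1.0)

// Option A: an L x 1 matrix (values stored column-major).
val asMatrix: DenseMatrix =
  new DenseMatrix(numRows = hashValues.length, numCols = 1, values = hashValues)

// Option B: an L-length array of 1-element vectors.
val asVectors: Array[Vector] = hashValues.map(v => Vectors.dense(v))

// Either way, the i-th hash is trivially recoverable.
assert(asMatrix(2, 0) == asVectors(2)(0))
```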
I'll check the JIRA - sorry I missed the links.