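Negative values are not really there to compensate for hash collisions. They're there because the random sign makes the hashed vector space an approximation of the full vector space under the inner product: in expectation, the inner product of two hashed vectors equals the inner product of the original vectors.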
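A minimal pure-Python sketch of that point, assuming a generic signed feature-hashing scheme. It is not scikit-learn's implementation: MD5 stands in for the MurmurHash3 that scikit-learn uses, and `token_hash`, `hash_vector` and `dot` are hypothetical helpers. With only 4 buckets, collisions are frequent. Averaged over many hash functions (salts), the signed inner product stays close to the exact one, while the unsigned version is biased upward because every collision adds a positive cross term.

```python
import hashlib
from collections import Counter


def token_hash(token: str, salt: str = "") -> int:
    """Deterministic 64-bit hash (hypothetical; MD5 is a stand-in for MurmurHash3)."""
    digest = hashlib.md5(f"{salt}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hash_vector(tokens, n_features, signed=True, salt=""):
    """Feature-hash a list of tokens into a sparse {bucket: value} dict."""
    vec = Counter()
    for tok in tokens:
        h = token_hash(tok, salt)
        bucket = h % n_features              # low bits pick the bucket
        sign = -1 if (h >> 63) & 1 else 1    # top bit picks the sign
        vec[bucket] += sign if signed else 1
    return vec


def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(val * v.get(k, 0) for k, val in u.items())


a = "the cat sat on the mat".split()
b = "the dog sat on the log".split()
exact = dot(Counter(a), Counter(b))  # 2*2 (the) + 1 (sat) + 1 (on) = 6

# Average the hashed inner product over many salts (i.e. many hash functions)
# using only 4 buckets, so that collisions happen often.
trials = 2000
signed_mean = sum(
    dot(hash_vector(a, 4, True, str(s)), hash_vector(b, 4, True, str(s)))
    for s in range(trials)
) / trials
unsigned_mean = sum(
    dot(hash_vector(a, 4, False, str(s)), hash_vector(b, 4, False, str(s)))
    for s in range(trials)
) / trials
print(exact, round(signed_mean, 2), round(unsigned_mean, 2))
```

When there are no collisions (e.g. `n_features=2**20` for these few tokens), the absolute values in `hash_vector(a, 2**20)` are exactly the token counts, which matches the remark in the quoted reply. Later scikit-learn releases expose the sign choice on `HashingVectorizer` as the `alternate_sign` parameter; `alternate_sign=False` together with `norm=None` yields non-negative counts, with colliding tokens summed into the same column.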
On 2 October 2016 at 00:17, Roman Yurchak <[email protected]> wrote:
> On 01/10/16 15:34, Moyi Dang wrote:
> > However, I don't understand why the negatives are there in the first
> > place, or what they mean. I'm not sure if the absolute values are
> > corresponding to the token counts.
> >
> > Can someone please help explain what the HashingVectorizer is doing? How
> > do I get the HashingVectorizer to return token counts?
>
> Hi Moyi,
>
> it's a mechanism to compensate for hash collisions, see
> https://github.com/scikit-learn/scikit-learn/issues/7513 The absolute
> values are token counts for most practical applications (if you don't
> have too many collisions). There will be a PR shortly to make this more
> consistent.
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
