Have a look at Ted's talk about Mahout's SGD classifier: http://vimeo.com/21273655

As far as I remember he also covers the hashing issues you describe.

--sebastian

On 20.04.2011 15:06, Stanley Xu wrote:
Dear all,

As I understand it, feature hashing in SGD compresses the feature
dimensions into a fixed-length vector. Won't that make the training result
incorrect when a hash collision happens? Won't two features hashed to the
same slot be treated as the same feature? Even if we use multiple probes to
reduce collisions, as a Bloom filter does, won't a slot that has a
collision still look like a combination feature?
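
To make the question concrete, here is a minimal sketch of hashed encoding
with multiple probes. It is not Mahout's actual encoder; the names slot,
slots_of and encode, the MD5-based hash and the default of 2 probes are all
assumptions chosen for illustration.

```python
import hashlib


def slot(feature: str, probe: int, num_slots: int) -> int:
    """Map (feature, probe) to a slot index in [0, num_slots).

    Hypothetical helper: MD5 is used only to get a hash that is stable
    across runs; it is not claimed to be what Mahout uses.
    """
    digest = hashlib.md5(f"{probe}:{feature}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def slots_of(feature: str, num_slots: int, num_probes: int = 2) -> tuple:
    """Return the slot index chosen by each probe for one feature."""
    return tuple(slot(feature, p, num_slots) for p in range(num_probes))


def encode(features, num_slots: int, num_probes: int = 2) -> list:
    """Add 1.0 to every probed slot of every feature.

    Two features that land in the same slot simply add up there, so the
    learner sees one combined weight for that slot.
    """
    vec = [0.0] * num_slots
    for f in features:
        for s in slots_of(f, num_slots, num_probes):
            vec[s] += 1.0
    return vec
```

With several probes, two features would have to collide on every probe to
become indistinguishable; a collision on a single probe only makes them
share part of their weight.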

Thanks.

Best wishes,
Stanley Xu

