Stanley,

Yes.  What you say is correct.  Feature hashing can cause degradation.

With multiple hashing, however, you do have a fairly strong guarantee that
the feature hashing is very close to information-preserving.  This is
related to the fact that the feature hashing operation is a random linear
transformation.  Since we are hashing into a space that is still quite
high-dimensional, inner products and distances between feature vectors are
approximately preserved, so the information loss is likely to be minimal.
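
To make the "random linear transformation" point concrete, here is a minimal
sketch in Python.  It is not Mahout's encoder: the function names
(hash_features, dot), the MD5-derived slots, the +/-1 sign per probe and the
1/sqrt(probes) scaling are all illustrative assumptions.  Each feature adds
its value to several hashed slots, so the map is linear, and in a large
enough space the inner products of hashed vectors come out close to the
originals.  A feature that collides with another on one probe still lands
elsewhere on its other probes, so the two stay distinguishable as vectors.

```python
import hashlib
import math
from collections import defaultdict


def _slot_and_sign(name: str, probe: int, dim: int) -> tuple[int, float]:
    """Hash (probe, feature name) to a slot in [0, dim) and a +/-1 sign.
    MD5 is an assumption, used only because it is deterministic."""
    digest = hashlib.md5(f"{probe}:{name}".encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "little") % dim
    sign = 1.0 if digest[8] & 1 else -1.0
    return slot, sign


def hash_features(features: dict[str, float], dim: int, probes: int) -> dict[int, float]:
    """Hypothetical multi-probe encoder returning a sparse vector {slot: value}.
    Each feature adds sign * value / sqrt(probes) to `probes` slots; every
    step is a sum, so the result is linear in the input feature values."""
    out: dict[int, float] = defaultdict(float)
    scale = 1.0 / math.sqrt(probes)
    for name, value in features.items():
        for probe in range(probes):
            slot, sign = _slot_and_sign(name, probe, dim)
            out[slot] += sign * value * scale
    return dict(out)


def dot(a: dict, b: dict) -> float:
    """Inner product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


# Two overlapping bags of words: 20 shared features, so <x, y> = 20.
x = {f"word{i}": 1.0 for i in range(40)}
y = {f"word{i}": 1.0 for i in range(20, 60)}
hx = hash_features(x, dim=2**20, probes=3)
hy = hash_features(y, dim=2**20, probes=3)
print(dot(x, y), round(dot(hx, hy), 3))  # the hashed value should be near 20
```

With 2**20 slots and only a few hundred probe positions, collisions are rare
and each one perturbs an inner product by only about 1/probes, which is the
sense in which the loss is small.  Shrinking dim makes the perturbation grow.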

On Wed, Apr 20, 2011 at 6:06 AM, Stanley Xu wrote:

> Dear all,
>
> Per my understanding, what Feature Hashing does in SGD is compress the
> feature dimensions into a fixed-length vector. Won't that make the training
> result incorrect if a Feature Hashing collision happens? Won't two features
> that hash to the same slot be treated as the same feature? Even if we use
> multiple probes to reduce the total number of collisions, as in a Bloom
> filter, won't a slot that has a collision still look like a combined
> feature?
>
> Thanks.
>
> Best wishes,
> Stanley Xu
>
