Stanley,

Yes, what you say is correct. Feature hashing can cause degradation.
With multiple hashing, however, you do have a fairly strong guarantee that feature hashing is very close to information preserving. This is related to the fact that the feature hashing operation is a random linear transformation. Since we are hashing into a space that is still quite high dimensional, the information loss is likely to be minimal.

On Wed, Apr 20, 2011 at 6:06 AM, Stanley Xu <[email protected]> wrote:

> Dear all,
>
> Per my understand, what Feature Hashing did in SGD do compress the Feature
> Dimensions to a fixed length Vector. Won't that make the training result
> incorrect if Feature Hashing Collision happened? Won't the two features
> hashed to the same slot would be thought as the same feature? Even if we
> have multiple probes to reduce the total collision like a bloom filter.
> Won't it also make the slot that has the collision looks like a combination
> feature?
>
> Thanks.
>
> Best wishes,
> Stanley Xu
>
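As a rough illustration of the multiple-probe hashing discussed in the reply, here is a minimal Python sketch. It is not Mahout's encoder: the names hash_features, _hash and dot, the md5-based hash, the random sign per probe, and the 1/sqrt(probes) scaling are all assumptions chosen for illustration. The point is only that each feature is spread over several slots and that the whole mapping is linear in the feature values, so a collision in one slot does not make two features identical.

```python
import hashlib


def _hash(name: str, probe: int, salt: str) -> int:
    # Deterministic 64-bit hash of (salt, probe, feature name).
    # md5 is used only for reproducibility; this is an illustrative choice.
    digest = hashlib.md5(f"{salt}:{probe}:{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_features(features, dim=1024, probes=2):
    """Map a {name: value} dict to a fixed-length list of floats.

    Each feature is written to `probes` slots, each with a pseudo-random
    sign (an assumption of this sketch, not something stated in the thread).
    """
    vec = [0.0] * dim
    scale = 1.0 / probes ** 0.5  # keeps the expected squared norm unchanged
    for name, value in features.items():
        for p in range(probes):
            slot = _hash(name, p, "slot") % dim
            sign = 1.0 if _hash(name, p, "sign") % 2 == 0 else -1.0
            vec[slot] += sign * value * scale
    return vec


def dot(a, b):
    # Plain dot product of two equal-length lists.
    return sum(x * y for x, y in zip(a, b))
```

One way to try it is to compare dot(hash_features(u), hash_features(v)) with the dot product of the original sparse vectors u and v for a few values of dim and probes. With probes greater than 1, two features that share one slot will usually still differ in their other slots, which is why a learner can often still tell them apart.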
