Re: Conditional Random Fields

Olivier Grisel Tue, 25 Jan 2011 16:11:06 -0800

2011/1/26 Jörn Kottmann <[email protected]>:
>
> I tested that on the Leipzig Corpora for all languages, and generated
> all possible features. In the end I did not see a single hash collision.
>
> Even if there are collisions once in a while it might not harm the detection
> performance that much.


Natural languages are very redundant, and linear models can tolerate
quite a high ratio of collisions before their performance starts to
degrade:

  http://hunch.net/~jl/projects/hash_reps/index.html

I think you can project to 1e6 dimensions with a hash function without
any problem in practice for text categorization. I don't know whether
that holds for other kinds of problems, such as NER features.
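
To make the idea concrete, here is a minimal sketch of hashing string
features into a fixed number of dimensions and measuring how many of
them collide. The names (hash_index, hashed_vector, collision_ratio)
and the choice of CRC32 as the hash function are assumptions made for
illustration only; they are not taken from any particular library or
from the experiments mentioned above.

```python
import zlib
from collections import Counter

N_DIMS = 10 ** 6  # target dimensionality, matching the 1e6 figure above


def hash_index(feature, n_dims=N_DIMS):
    """Map a string feature to a column index in [0, n_dims)."""
    # Assumption: CRC32 is used because it is stable across runs,
    # unlike Python's built-in hash() for str, which is salted per process.
    return zlib.crc32(feature.encode("utf-8")) % n_dims


def hashed_vector(tokens, n_dims=N_DIMS):
    """Build a sparse bag-of-words vector as {column index: count}."""
    return dict(Counter(hash_index(t, n_dims) for t in tokens))


def collision_ratio(features, n_dims=N_DIMS):
    """Fraction of distinct features sharing a column with another feature."""
    distinct = set(features)
    if not distinct:
        return 0.0
    per_column = Counter(hash_index(f, n_dims) for f in distinct)
    collided = sum(c for c in per_column.values() if c > 1)
    return collided / len(distinct)
```

Calling collision_ratio on the full set of generated feature strings
for a corpus gives a rough idea of how crowded the hashed space is for
a given n_dims, before training any model on it.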

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
