On 9/1/11 4:50 PM, [email protected] wrote:
Maybe you need some language-specific features. I just evaluated the
Portuguese proper name finder with the default OpenNLP features and got the
following:


Evaluated 56994 samples with 26462 entities; found: 26623 entities; correct: 23077.
        TOTAL: precision:   86,68%;  recall:   87,21%; F1:   86,94%.
         prop: precision:   86,68%;  recall:   87,21%; F1:   86,94%. [target: 26462; tp: 23077; fp: 3546]
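
For reference, the percentages follow directly from the three counts in the
evaluator output. A minimal sketch of that arithmetic; the function name
prf and its parameter names are made up for illustration and are not part
of OpenNLP:

```python
def prf(target, found, correct):
    """Return (precision, recall, F1) from raw entity counts."""
    precision = correct / found    # share of predicted entities that are right
    recall = correct / target      # share of reference entities that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the evaluation output above.
p, r, f1 = prf(target=26462, found=26623, correct=23077)
print(f"precision: {p:.2%}  recall: {r:.2%}  F1: {f1:.2%}")

# False positives are the predicted entities that were not correct.
print("fp:", 26623 - 23077)
```

This reproduces the 86,68 / 87,21 / 86,94 figures and the fp count of 3546.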

A friend of mine is working directly with Maxent and got better results
because he is using specific features he developed for Portuguese. But it is
really difficult to tune it.

I am still not sure how the feature generation should be modified. These
papers suggest that using prefix and suffix features helps, and we already
have such feature generators. When I use them, the recall and the precision
both go up a little. I now get 85% precision and 44% recall, but I would
still like a much higher recall, somewhere in the range of 70% or even 80%.
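
To make clear what I mean by prefix and suffix features, here is a rough
sketch of the idea. It is not the OpenNLP implementation; the function name,
the "pre=" / "suf=" feature labels, and the maximum length of 4 are
assumptions chosen for illustration:

```python
def prefix_suffix_features(token, max_len=4):
    """Emit the leading and trailing character n-grams of a token as features.

    max_len and the "pre=" / "suf=" labels are illustrative assumptions.
    """
    feats = []
    for n in range(1, min(max_len, len(token)) + 1):
        feats.append("pre=" + token[:n])    # first n characters
        feats.append("suf=" + token[-n:])   # last n characters
    return feats


print(prefix_suffix_features("Lisboa"))
```

The hope is that such character n-grams let the model generalize to names it
has not seen in training, which is where recall is lost.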

Some also use trigger words or other dictionaries; I am not sure that helps
much. Maybe compound noun splitting helps, but I am not sure about that either.
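
A rough sketch of how trigger-word and dictionary features could look. The
word lists, the feature labels "indict" and "prevtrigger=", and the function
name are all hypothetical; a real setup would load much larger lists:

```python
# Hypothetical example lists, lowercased for matching.
TRIGGERS = {"sr.", "dr.", "rua"}     # words that often precede a name
NAME_DICT = {"lisboa", "silva"}      # known names from a dictionary


def context_features(tokens, i):
    """Return dictionary and trigger-word features for tokens[i]."""
    feats = []
    if tokens[i].lower() in NAME_DICT:
        feats.append("indict")
    if i > 0 and tokens[i - 1].lower() in TRIGGERS:
        feats.append("prevtrigger=" + tokens[i - 1].lower())
    return feats


tokens = ["O", "Dr.", "Silva", "mora", "em", "Lisboa"]
for i, tok in enumerate(tokens):
    print(tok, context_features(tokens, i))
```

Whether such features improve recall would have to be measured; the sketch
only shows where they would plug into the feature generation.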

Or should I try to use a topic model, like they do in more modern NERs?

Jörn
