Hi Jörn,

On Thu, Sep 1, 2011 at 10:45 AM, Jörn Kottmann <[email protected]> wrote:
> Hi All,
>
> I did a little testing with the German CONLL03 data, we only
> get a recall of around 38% and a precision of 82% on the
> development data for person names.
>
> I wonder what we are doing wrong here, that the numbers are
> so bad compared to other systems which participated back then and
> get a similar precision but much higher recall.
>
> Is the lack of lemma and POS features causing this? Or could it
> be something else?
>
> These guys have a much better recall, and also use a maxent based
> system:
> http://www.cnts.ua.ac.be/conll2003/pdf/18083kle.pdf
>
> Any ideas what could be done to improve our name finder?
>
> Jörn
>

Maybe you need some language-specific features. I just evaluated the
Portuguese proper name finder with the default OpenNLP features and got
the following:

Evaluated 56994 samples with 26462 entities; found: 26623 entities; correct: 23077.
TOTAL: precision: 86,68%; recall: 87,21%; F1: 86,94%.
prop: precision: 86,68%; recall: 87,21%; F1: 86,94%. [target: 26462; tp: 23077; fp: 3546]

A friend of mine is working directly with Maxent and got better results
because he is using specific features he developed for Portuguese. But
it is really difficult to tune.

William
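For anyone who wants to check the figures by hand, here is a minimal sketch of how the numbers in the evaluation output relate to the raw counts. It assumes the usual definitions (precision = tp / (tp + fp), recall = tp / target, F1 = harmonic mean of the two); the function names `prf` and `f1_from` are my own and not part of OpenNLP.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def prf(target, tp, fp):
    """Return (precision, recall, F1) from entity counts.

    target: number of entities in the reference data
    tp:     correctly found entities
    fp:     found entities that are not in the reference data
    """
    found = tp + fp
    precision = tp / found if found else 0.0
    recall = tp / target if target else 0.0
    return precision, recall, f1_from(precision, recall)


# Counts taken from the Portuguese evaluation output quoted above.
p, r, f1 = prf(target=26462, tp=23077, fp=3546)
print(f"precision: {p:.2%}; recall: {r:.2%}; F1: {f1:.2%}")
# prints: precision: 86.68%; recall: 87.21%; F1: 86.94%

# The approximate German figures from the quoted mail (82% / 38%).
german_f1 = f1_from(0.82, 0.38)
print(f"German person names, approximate F1: {german_f1:.2%}")
```

The output uses a decimal point where the evaluator log above shows a decimal comma, which is only a locale difference. With the rounded German figures, the F1 comes out at roughly 52%, so the gap is almost entirely a recall problem.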
