Hi everybody,

I was just evaluating the OpenNLP sentence detector trained on some of our data (using the Evaluator class provided with OpenNLP). It did not perform very well. When I looked at the misclassified sentences and debugged a little, I realized that only these EOS (end-of-sentence) characters are currently supported:

'.', '!', '?'

However, in our case we have many other EOS characters (":" being one of the most common).

As I understand it, the EOS characters are defined in DefaultSDContextGenerator.java, which is called from SentenceDetectorME.train(...).

If I understand correctly, there is currently no way to configure the EOS characters (for example, as a parameter). Is that right?

Of course, I could write my own train method and do things differently, but then I would not be able to use the Evaluator and CrossValidator classes, which I find very handy.

Did I misunderstand anything, and is there a way to configure which EOS characters should be used? If not, do you think it would be a good thing to have, and if so, how can I contribute at this point?
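
To make the request concrete, here is a minimal sketch of the kind of configurability I have in mind. It is independent of OpenNLP: the class EosScanner and the method findCandidates are hypothetical names made up for illustration, not existing OpenNLP API. It only shows how candidate sentence-boundary positions change once the EOS characters are passed in as a parameter instead of being fixed to '.', '!' and '?'.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; NOT part of OpenNLP.
final class EosScanner {

    // The three characters that appear to be fixed at the moment.
    static final char[] DEFAULT_EOS = {'.', '!', '?'};

    // Returns the index of every character in text that is one of eosChars.
    // These are only boundary candidates; a trained model would still have
    // to decide which of them really end a sentence.
    static List<Integer> findCandidates(String text, char[] eosChars) {
        Set<Character> eos = new HashSet<>();
        for (char c : eosChars) {
            eos.add(c);
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (eos.contains(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Note: see below. Done";
        // Fixed defaults: the ':' at index 4 is never a candidate.
        System.out.println(findCandidates(text, DEFAULT_EOS)); // prints [15]
        // Configurable set that also contains ':'.
        System.out.println(findCandidates(text, new char[] {'.', '!', '?', ':'})); // prints [4, 15]
    }
}
```

With the fixed defaults, the ':' is never offered to the classifier as a candidate, so no amount of training data can make it a sentence boundary. That matches what I observe on our data.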

Best
Katrin



--
Dr. Katrin Tomanek
Averbis GmbH
Tennenbacher Strasse 11
D-79106 Freiburg

Fon: +49 (0) 761 - 203 97696
Fax: +49 (0) 761 - 203 97694
E-Mail: katrin.toma...@averbis.com

Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
Sitz der Gesellschaft: Freiburg i. Br.
AG Freiburg i. Br., HRB 701080
