Hi everybody,
I was just evaluating the opennlp sentence detector trained on some of
our data (using the Evaluator-class provided with opennlp). It did not
perform very well and when I checked out the misclassified sentences and
debugged a little bit, I realized that only these EOS (end of sentence)
characters are currently supported:
'.', '!', '?'
However, in our data many other characters mark the end of a sentence
(":" being one of the most common ones).
As I understand it, the EOS characters are defined in
DefaultSDContextGenerator.java, which is called from
SentenceDetectorME.train(...).
If I got it correctly, there is currently no way to configure the EOS
characters (e.g. via a parameter). Right?
Of course, I could write my own train method and do things differently,
but then, I would not be able to use the Evaluator and CrossValidator
classes which I find very handy.
Did I misunderstand anything, or is there a way to configure which EOS
characters should be used? If not, do you think it would be a good
thing to have, and if so, how can I contribute at this point?
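
To make the request more concrete, here is a minimal sketch of the kind
of configurability I have in mind. Please note that
ConfigurableEosScanner and getPositions are hypothetical names chosen
for illustration, not existing OpenNLP classes or methods. The sketch
only collects candidate end-of-sentence positions for a caller-supplied
set of characters; it does not touch the training or evaluation code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): collects candidate
// end-of-sentence positions for a configurable set of EOS characters.
public class ConfigurableEosScanner {

    private final char[] eosCharacters;

    // The caller decides which characters count as EOS candidates,
    // e.g. '.', '!', '?' plus ':' for data like ours.
    public ConfigurableEosScanner(char... eosCharacters) {
        this.eosCharacters = eosCharacters.clone();
        Arrays.sort(this.eosCharacters); // sorted so binarySearch works
    }

    // Returns the index of every character in the text that is one of
    // the configured EOS characters, in ascending order.
    public List<Integer> getPositions(CharSequence text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (Arrays.binarySearch(eosCharacters, text.charAt(i)) >= 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Diagnosis: none. Next visit?";

        // The three characters that are supported today.
        ConfigurableEosScanner standard =
                new ConfigurableEosScanner('.', '!', '?');
        // The same set extended with ':'.
        ConfigurableEosScanner extended =
                new ConfigurableEosScanner('.', '!', '?', ':');

        System.out.println(standard.getPositions(text));
        System.out.println(extended.getPositions(text));
    }
}
```

With something along these lines, the character set would become a
parameter of training and evaluation instead of a fixed list, so the
Evaluator and CrossValidator classes could still be used unchanged.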
Best
Katrin
--
Dr. Katrin Tomanek
Averbis GmbH
Tennenbacher Strasse 11
D-79106 Freiburg
Fon: +49 (0) 761 - 203 97696
Fax: +49 (0) 761 - 203 97694
E-Mail: katrin.toma...@averbis.com
Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
Sitz der Gesellschaft: Freiburg i. Br.
AG Freiburg i. Br., HRB 701080