On Wed, Feb 8, 2012 at 5:52 PM, Katrin Tomanek
<katrin.toma...@averbis.com> wrote:

> Hi everybody,
>
> I was just evaluating the opennlp sentence detector trained on some of our
> data (using the Evaluator-class provided with opennlp). It did not perform
> very well and when I checked out the misclassified sentences and debugged a
> little bit, I realized that only these EOS (end of sentence) characters are
> currently supported:
>
> '.', '!', '?'
>
> However, in our case we have many other EOS characters (":" being one of
> the most common ones).
>
> As I understand it, the EOS characters are defined in
> DefaultSDContextGenerator.java, which is called from
> SentenceDetectorME.train(...).
>
> If I got it correctly, there is currently no way to configure (as a
> parameter or so) the EOS characters. Right?
>
> Of course, I could write my own train method and do things differently,
> but then, I would not be able to use the Evaluator and CrossValidator
> classes which I find very handy.
>
> Did I misunderstand anything, or is there a way to configure which EOS
> characters should be used? If not, do you think it would be a good thing
> to have, and if so, how can I contribute at this point?
>
>

You are absolutely right, we should have this option. William just started a
thread on the dev list to discuss this.

Our current idea for solving this is that you can pass in the name of a
factory class which puts the SentenceDetector together the way you need it.
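
To make the factory idea a bit more concrete, here is a rough sketch of
loading a factory by class name. Nothing in it is existing OpenNLP API: the
SdFactory interface, both implementations and the load method are
hypothetical names made up for illustration, and the real design is still to
be decided on the dev list.

```java
public class FactorySketch {

    // Hypothetical extension point: a factory that decides which
    // characters count as end-of-sentence candidates.
    public interface SdFactory {
        char[] eosCharacters();
    }

    // Mirrors the EOS characters that are supported today.
    public static class DefaultFactory implements SdFactory {
        public char[] eosCharacters() {
            return new char[] {'.', '!', '?'};
        }
    }

    // A user-supplied factory that additionally treats ':' as EOS.
    public static class ColonAwareFactory implements SdFactory {
        public char[] eosCharacters() {
            return new char[] {'.', '!', '?', ':'};
        }
    }

    // Instantiates the factory named by the user via its no-arg constructor.
    static SdFactory load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(SdFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate factory: " + className, e);
        }
    }

    public static void main(String[] args) {
        SdFactory factory = load(ColonAwareFactory.class.getName());
        System.out.println(new String(factory.eosCharacters()));
    }
}
```

The point of going through a class name is that training, the Evaluator and
the CrossValidator could all receive the same string and build the detector
the same way, without the user having to write a custom train method.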

But now that I think about it, maybe we should define a properties file
which can contain custom configuration for a component. In this file we
could have a property for a custom factory class and maybe a property which
contains the EOS chars for the Sentence Detector.
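
As a minimal sketch of the properties idea, assuming a made-up key name: the
key "sentdetect.eosChars" and the class below are hypothetical and do not
exist in OpenNLP. The sketch only shows how the EOS characters could be read
from such a file, falling back to the three characters supported today.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EosConfigSketch {

    // Hypothetical property key, chosen only for this example.
    static final String EOS_KEY = "sentdetect.eosChars";

    // The EOS characters that are currently supported.
    static final String DEFAULT_EOS = ".!?";

    // Parses properties-file content given as a string.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the configured EOS characters, or the defaults if the key is absent.
    static char[] eosChars(Properties props) {
        return props.getProperty(EOS_KEY, DEFAULT_EOS).toCharArray();
    }

    public static void main(String[] args) {
        // ':' is added as an extra EOS character.
        Properties props = parse("sentdetect.eosChars=.!?:\n");
        System.out.println(new String(eosChars(props)));
    }
}
```

One thing to watch with this format is that '!' and '#' start a comment when
they are the first character of a line, and a backslash is an escape
character, so such characters would need care if they were ever used as EOS
characters or placed at the start of a line.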

Anyway, help is always very welcome. We should decide how we will implement
it in the thread on the dev list, and then we can open a few Jira issues to
actually do the work. This way you should be able to contribute easily.

Jörn
