Riccardo,

You can tune your sentence detector using a custom context generator.

At
http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/java/opennlp/tools/sentdetect/
take a look at DummySentenceDetectorFactory.java
and SentenceDetectorFactoryTest.java

If you prefer a concrete example, take a look at an implementation I did
for another project:
https://github.com/cogroo/cogroo4/tree/master/cogroo-nlp/src/main/java/org/cogroo/tools/sentdetect
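To make the idea concrete: a context generator turns each candidate end-of-sentence position into a set of string features that the maxent model learns from. The following is a standalone toy, loosely shaped like the getContext(CharSequence, int) method of OpenNLP's SDContextGenerator. It does not implement that interface, and the class name and the feature names (prev=, next=, nextCap, openingDash/closingDash) are hypothetical. The dash features hint at how a custom generator could give the model evidence about dash-delimited asides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy context generator; NOT OpenNLP's DefaultSDContextGenerator.
class ToyContextGenerator {

    // Returns string features for the candidate boundary at `position`,
    // which is assumed to index an end-of-sentence character in `sb`.
    String[] getContext(CharSequence sb, int position) {
        List<String> feats = new ArrayList<>();
        char eos = sb.charAt(position);
        feats.add("eos=" + eos);
        String prev = tokenBefore(sb, position);
        String next = tokenAfter(sb, position + 1);
        feats.add("prev=" + prev);
        feats.add("next=" + next);
        if (!next.isEmpty() && Character.isUpperCase(next.charAt(0))) {
            feats.add("nextCap");
        }
        // Hypothetical feature for dash-delimited asides: an odd number of
        // free-standing dashes before this one means this dash closes an aside.
        if (eos == '-') {
            feats.add(freeDashesBefore(sb, position) % 2 == 1 ? "closingDash" : "openingDash");
        }
        return feats.toArray(new String[0]);
    }

    // Characters between the previous whitespace and `position` (exclusive).
    private static String tokenBefore(CharSequence sb, int position) {
        int start = position;
        while (start > 0 && !Character.isWhitespace(sb.charAt(start - 1))) {
            start--;
        }
        return sb.subSequence(start, position).toString();
    }

    // First whitespace-delimited token at or after `from`.
    private static String tokenAfter(CharSequence sb, int from) {
        int start = from;
        while (start < sb.length() && Character.isWhitespace(sb.charAt(start))) {
            start++;
        }
        int end = start;
        while (end < sb.length() && !Character.isWhitespace(sb.charAt(end))) {
            end++;
        }
        return sb.subSequence(start, end).toString();
    }

    // Counts dashes before `position` that start the text or follow
    // whitespace, so the hyphen in "well-known" is not counted.
    private static int freeDashesBefore(CharSequence sb, int position) {
        int count = 0;
        for (int i = 0; i < position; i++) {
            if (sb.charAt(i) == '-' && (i == 0 || Character.isWhitespace(sb.charAt(i - 1)))) {
                count++;
            }
        }
        return count;
    }
}
```

For "Hello Mr. Smith" at the dot (index 8) this yields eos=., prev=Mr, next=Smith and nextCap; a real generator would add many more features, and the model weighs them during training.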

William


On Tue, Mar 26, 2013 at 9:52 AM, Riccardo Tasso <[email protected]> wrote:

> Thank you Jörn, in fact the results improved a lot:
> Precision: 0.5325131810193322
> Recall: 0.4745497259201253
> F-Measure: 0.5018633540372671
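As a quick check, the F-measure above is the harmonic mean of precision and recall, F = 2PR / (P + R). A minimal Java sketch (the class name F1Score is just for illustration) reproduces the reported value from the other two numbers:

```java
// Illustrative helper: F1 as the harmonic mean of precision and recall.
class F1Score {
    static double f1(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid division by zero when nothing was found
        }
        return 2 * precision * recall / (precision + recall);
    }
}
```

F1Score.f1(0.5325131810193322, 0.4745497259201253) gives approximately 0.50186, matching the F-Measure line.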
>
> I guess the splitter could give better results if it were able to detect
> parenthetical structures such as:
> some text - speech - other text
> which in my dataset is split as:
> some text
> - speech -
> other text
> Is it possible?
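One way to experiment with this outside the model is a rule-based pass that cuts dash-delimited asides out as their own segments. The following is a hypothetical helper, not part of OpenNLP, and it assumes an aside is a free-standing dash, some text without dashes, and a closing dash followed by whitespace or the end of the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule-based helper, not part of OpenNLP.
class DashAsideSplitter {

    // A dash at the start or after whitespace, some non-dash text, and a
    // closing dash followed by whitespace or the end of the text.
    private static final Pattern ASIDE =
            Pattern.compile("(?:^|(?<=\\s))-[^-]+-(?=\\s|$)");

    static List<String> split(String text) {
        List<String> segments = new ArrayList<>();
        Matcher m = ASIDE.matcher(text);
        int last = 0;
        while (m.find()) {
            addIfNotBlank(segments, text.substring(last, m.start())); // text before the aside
            segments.add(m.group());                                 // the aside itself
            last = m.end();
        }
        addIfNotBlank(segments, text.substring(last));               // trailing text
        return segments;
    }

    private static void addIfNotBlank(List<String> out, String s) {
        String t = s.trim();
        if (!t.isEmpty()) {
            out.add(t);
        }
    }
}
```

With this sketch, "some text - speech - other text" becomes "some text", "- speech -", "other text", while a hyphenated word such as "well-known" is left alone because its dash is not preceded by whitespace.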
>
> Another improvement would be detecting end-of-sentence symbols longer than
> one character, for example "...".
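A simple way to treat "..." (or "?!") as one boundary is to collapse each run of end-of-sentence characters into a single candidate at its last character. The following is a standalone illustration with a hypothetical class name, not how OpenNLP's detector is implemented; the EOS set is a parameter, so it also covers custom characters such as ':' or '-'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: candidate boundaries for a custom EOS character set.
class EosCandidates {

    // Returns the index of the last character of each run of EOS characters,
    // so "..." yields one candidate boundary instead of three.
    static List<Integer> candidates(String text, String eosChars) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (eosChars.indexOf(text.charAt(i)) >= 0) {
                boolean nextIsEos = i + 1 < text.length()
                        && eosChars.indexOf(text.charAt(i + 1)) >= 0;
                if (!nextIsEos) {
                    out.add(i);
                }
            }
        }
        return out;
    }
}
```

For example, EosCandidates.candidates("Wait... what? Yes.", ".?!") returns [6, 12, 17]: one position for the ellipsis, one for the question mark and one for the final dot.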
>
> Can you tell me more about the following parameters?
>
>    - iterations
>    - cutoff
>
> Is there any guideline on how to tune them?
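For reference: in OpenNLP's maxent training, iterations is the number of training iterations the trainer runs over the events, and cutoff is the minimum number of times a feature must occur in the training events to be kept in the model. A higher cutoff gives a smaller model that ignores rare features; more iterations fit the training data more closely. A reasonable way to tune both is to compare a few settings on held-out data. The sketch below only illustrates what the cutoff does. It is a simplified standalone example, not OpenNLP's data indexer, and CutoffDemo/keptFeatures are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified illustration of feature cutoff; not OpenNLP's implementation.
class CutoffDemo {

    // Counts how often each feature occurs across all training events and
    // keeps only the features seen at least `cutoff` times.
    static Set<String> keptFeatures(List<List<String>> events, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> context : events) {
            for (String feature : context) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

With events [a, b], [a, c], [a, b] and a cutoff of 2, the feature c (seen once) is dropped and only a and b remain.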
>
> Cheers,
> Riccardo
>
>
>
> 2013/3/26 Jörn Kottmann <[email protected]>
>
> > On 03/26/2013 08:40 AM, Riccardo Tasso wrote:
> >
> >> Can the Sentence Detector also split on characters other than the dot? In
> >> my case other characters can also delimit the end of a segment, such as
> >> colon (:), dash (-), and various kinds of quotation marks (", `, ',
> >> ...).
> >>
> >
> > The Sentence Detector can only split on end-of-sentence characters. By
> > default these are . ! ? but with 1.5.3 you can set them to a custom set
> > during training; there is a command line argument for it on the Sentence
> > Detector Trainer, have a look at the help.
> >
> > If you don't want to compile it yourself, use the 1.5.3 RC2, which we are
> > currently testing.
> >
> > Jörn
> >
> >
> >
>
