I should have mentioned that to use your factory you can simply pass its fully qualified class name to the "-factory" argument of the command line tool.
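Because "-factory" takes a fully qualified class name, the factory class has to be on the classpath when the tool runs. The usual Java mechanism for turning such a name into an object is reflection. The sketch below shows that general pattern with a hypothetical helper, `FactoryLoader.instantiate`. It is not OpenNLP's actual loading code, and it assumes the class has a public no-argument constructor. `com.example.MySentenceDetectorFactory` is a made-up name.

```java
import java.lang.reflect.InvocationTargetException;

public final class FactoryLoader {

    private FactoryLoader() {}

    /**
     * Hypothetical helper: resolves a fully qualified class name (e.g. the
     * made-up "com.example.MySentenceDetectorFactory"), checks that it is a
     * subtype of the expected type, and instantiates it through its public
     * no-argument constructor (an assumption of this sketch).
     */
    public static <T> T instantiate(String fullyQualifiedName, Class<T> expectedType) {
        try {
            Class<?> clazz = Class.forName(fullyQualifiedName);
            if (!expectedType.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                    fullyQualifiedName + " is not a subtype of " + expectedType.getName());
            }
            Object instance = clazz.getConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on the classpath: " + fullyQualifiedName, e);
        } catch (NoSuchMethodException | InstantiationException
                 | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalArgumentException("Cannot instantiate " + fullyQualifiedName, e);
        }
    }

    public static void main(String[] args) {
        // Demonstrated with a JDK class, since a real factory name is project-specific.
        java.util.List<?> list = instantiate("java.util.ArrayList", java.util.List.class);
        System.out.println(list.getClass().getName());
    }
}
```

A misspelled name or a missing jar surfaces as a "not on the classpath" error, which is the first thing to check if the tool cannot find your factory.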
On Tue, Mar 26, 2013 at 10:21 AM, William Colen wrote:

> Riccardo,
>
> You can tune your sentence detector using a custom context generator.
>
> At
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/java/opennlp/tools/sentdetect/
> take a look at DummySentenceDetectorFactory.java
> and SentenceDetectorFactoryTest.java
>
> If you prefer a concrete example, take a look at an implementation I did
> for another project:
>
> https://github.com/cogroo/cogroo4/tree/master/cogroo-nlp/src/main/java/org/cogroo/tools/sentdetect
>
> William
>
>
> On Tue, Mar 26, 2013 at 9:52 AM, Riccardo Tasso wrote:
>
>> Thank you Jörn, in fact the results improved a lot:
>> Precision: 0.5325131810193322
>> Recall: 0.4745497259201253
>> F-Measure: 0.5018633540372671
>>
>> I guess the splitter could get better results if it were able to detect
>> parenthetical structures such as:
>> some text - speech - other text
>> which in my dataset is split as:
>> some text
>> - speech -
>> other text
>> Is it possible?
>>
>> Another optimization would be detecting sentence-ending symbols longer
>> than one character, for example "...".
>>
>> Can you tell me more about the following parameters?
>>
>> - iterations
>> - cutoff
>>
>> Is there any guideline on how to tune them?
>>
>> Cheers,
>> Riccardo
>>
>>
>> 2013/3/26 Jörn Kottmann
>>
>> > On 03/26/2013 08:40 AM, Riccardo Tasso wrote:
>> >
>> >> Is the Sentence Detector able to split also on non-dot characters? In my
>> >> case there are also other characters delimiting the end of a segment,
>> >> such as: colon (:), dash (-), various kinds of quotation marks (", `, ',
>> >> ...).
>> >
>> > The Sentence Detector can only split on end-of-sentence characters; by
>> > default these are . ! ? but with 1.5.3 you can set them to your custom
>> > set during training. There is a command line argument for it on the
>> > Sentence Detector Trainer; have a look at the help.
>> >
>> > If you don't want to compile it yourself, use the 1.5.3 RC2, which we are
>> > currently testing.
>> >
>> > Jörn
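Jörn's point is that only positions holding one of the configured end-of-sentence characters are ever considered as split points; the trained model then decides, for each such candidate, whether it really ends a sentence. The self-contained sketch below illustrates only that first step, candidate generation, for a custom set such as . ! ? :. It is not OpenNLP code: the class and method names are hypothetical, and collapsing a run of consecutive EOS characters (e.g. "...") into a single candidate at its last character is an assumption made here to illustrate Riccardo's multi-character question, not documented Sentence Detector behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public final class EosCandidateScanner {

    private final char[] eosCharacters;

    // Hypothetical scanner configured with a custom end-of-sentence set.
    public EosCandidateScanner(char[] eosCharacters) {
        this.eosCharacters = eosCharacters.clone();
    }

    private boolean isEos(char c) {
        for (char eos : eosCharacters) {
            if (eos == c) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns offsets of candidate sentence ends. A run of consecutive EOS
     * characters (e.g. "..." or "?!") yields one candidate at the offset of
     * its last character (an assumption of this sketch).
     */
    public List<Integer> candidates(String text) {
        List<Integer> positions = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (isEos(text.charAt(i))) {
                int end = i;
                while (end + 1 < text.length() && isEos(text.charAt(end + 1))) {
                    end++;
                }
                positions.add(end);
                i = end + 1;
            } else {
                i++;
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        EosCandidateScanner scanner =
            new EosCandidateScanner(new char[] {'.', '!', '?', ':'});
        // prints [6, 14, 19, 29]
        System.out.println(scanner.candidates("Wait... really? Yes: it works."));
    }
}
```

With the default set . ! ? the colon at offset 19 would never be a candidate, so no amount of training data could teach the model to split there; that is why the custom characters have to be chosen at training time.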
