Re: OpenNLP Sentence Detector: EOS Characters

Katrin Tomanek Thu, 09 Feb 2012 00:21:38 -0800

Hi Jörn,

I only modified the training process.

However, when I check the predictions, it turns out that the model never learns to split at ":" positions.

Shouldn't it be enough to modify the DefaultSDContextGenerator and the DefaultEndOfSentenceScanner so that they know about ":" as an EOS character? Or are there other places where ":" has to be added?


Best
Katrin


On 02/09/2012 09:18 AM, Joern Kottmann wrote:

Did you modify the evaluation as well? If you only change it for training, the
evaluator will not consider ":" as an EOS character.

To me it sounds like it fails to split on ":" somewhere.

The sentence detector uses a maxent model to classify every EOS character
as either SPLIT or NO_SPLIT.
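To make the SPLIT/NO_SPLIT idea concrete, here is a minimal, self-contained Java sketch. It is not OpenNLP code: `EosSplitSketch`, `candidatePositions`, `detect` and the `Classifier` interface are hypothetical names, and a trivial whitespace rule stands in for the trained maxent model. What it illustrates is that only positions of characters in the configured EOS set ever reach the classifier, so if ":" is missing from that set at any stage (training, evaluation or the detector used at runtime), a split at ":" can never be predicted there.

```java
import java.util.ArrayList;
import java.util.List;

public class EosSplitSketch {

    /** Hypothetical stand-in for the maxent model: true = SPLIT, false = NO_SPLIT. */
    interface Classifier {
        boolean isSplit(String text, int pos);
    }

    /** Returns the offsets of all characters in text that belong to the EOS set. */
    static List<Integer> candidatePositions(String text, char[] eosChars) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (char c : eosChars) {
                if (text.charAt(i) == c) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }

    /** Cuts the text after every EOS candidate that the classifier labels SPLIT. */
    static List<String> detect(String text, char[] eosChars, Classifier model) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int pos : candidatePositions(text, eosChars)) {
            if (model.isSplit(text, pos)) {
                String sentence = text.substring(start, pos + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                start = pos + 1;
            }
        }
        // Whatever follows the last SPLIT becomes the final sentence.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Toy rule instead of a trained model: SPLIT if the candidate ends the text or is followed by whitespace.
        Classifier rule = (s, p) -> p + 1 >= s.length() || Character.isWhitespace(s.charAt(p + 1));
        String text = "Diagnosis: pneumonia. Therapy: antibiotics.";
        System.out.println(detect(text, new char[] {'.', '?', '!'}, rule));
        System.out.println(detect(text, new char[] {'.', '?', '!', ':'}, rule));
    }
}
```

With the EOS set {'.', '?', '!'} the toy text comes back as two sentences; once ':' is added to the set, the same rule yields four, because the colon positions now become candidates at all.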

Jörn

On Thu, Feb 9, 2012 at 8:59 AM, Katrin Tomanek
<katrin.toma...@averbis.com> wrote:

Hi Willian,

I am currently using opennlp-1.5.2 and am trying to use it as an API, i.e.
not modifying its code but writing my own code around it. Still, what I
described below (with the SDEventStream) amounts to the same thing you are
describing: I am changing the set of EOS characters.

I am just wondering why adding ":" as an EOS character makes the results
worse (the F-measure for sentence splitting drops from ~80 to 45, even though
":" is always a sentence boundary in my data!)
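To see why never splitting at ":" can cost so much F-measure, here is a minimal Java sketch of span-level scoring. It assumes that a predicted sentence counts as correct only if its span matches a reference span exactly (my understanding of how OpenNLP's sentence detector evaluation scores, but treat that as an assumption). `SpanFMeasureSketch`, `Span` and `f1` are hypothetical names, and the spans are toy values, not the data from this thread. Under this scoring, each missed split at ":" removes two correct sentences and adds one wrong, merged one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class SpanFMeasureSketch {

    /** A sentence as a character span [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) return false;
            Span other = (Span) o;
            return start == other.start && end == other.end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end);
        }
    }

    /** F1 where a predicted span is a true positive only if it equals a reference span. */
    static double f1(List<Span> reference, List<Span> predicted) {
        Set<Span> gold = new HashSet<>(reference);
        int truePositives = 0;
        for (Span s : predicted) {
            if (gold.contains(s)) truePositives++;
        }
        if (truePositives == 0) return 0.0;
        double precision = (double) truePositives / predicted.size();
        double recall = (double) truePositives / reference.size();
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Toy reference: four sentences; the first and third are assumed to end in ':'.
        List<Span> reference = Arrays.asList(
                new Span(0, 10), new Span(11, 21), new Span(22, 30), new Span(31, 43));
        // One split at ':' missed: the first two sentences are merged into one span.
        List<Span> oneColonMissed = Arrays.asList(
                new Span(0, 21), new Span(22, 30), new Span(31, 43));
        // No split at ':' at all: every pair is merged.
        List<Span> noColonSplits = Arrays.asList(new Span(0, 21), new Span(22, 43));

        System.out.println(f1(reference, reference));      // 1.0
        System.out.println(f1(reference, oneColonMissed)); // approximately 0.57 (= 4/7)
        System.out.println(f1(reference, noColonSplits));  // 0.0
    }
}
```

With one missed split, precision is 2/3 and recall is 2/4, so F1 falls to 4/7 even though only one boundary out of four was lost.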

Looks like I need to debug a bit more to see what is happening in the
DefaultSDContextGenerator.



--
Dr. Katrin Tomanek
Averbis GmbH
Tennenbacher Strasse 11
D-79106 Freiburg

Fon: +49 (0) 761 - 203 97696
Fax: +49 (0) 761 - 203 97694
E-Mail: katrin.toma...@averbis.com

Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
Sitz der Gesellschaft: Freiburg i. Br.
AG Freiburg i. Br., HRB 701080
