Hi Jörn,
I only modified the training process.
However, when I check the predictions, it turns out that the model never
learns to split at ":" positions.
Shouldn't it be enough to modify the DefaultSDContextGenerator and the
DefaultEndOfSentenceScanner so that they treat ":" as an EOS character?
Or are there other places where ":" needs to be added?
Best
Katrin
On 02/09/2012 09:18 AM, Joern Kottmann wrote:
Did you modify the evaluation as well? If you only do it during training, the
evaluator will not be able to consider ":" as an EOS character.
To me it sounds like it fails to split on ":" somewhere.
The sentence detector uses a maxent model to classify every EOS character
as either a SPLIT or NO_SPLIT.
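To illustrate why training and evaluation need the same EOS set, here is a minimal sketch in plain Java. It does not use the OpenNLP API: `EosCandidateDemo` and `eosCandidates` are hypothetical names, and the sketch assumes that only characters in the scanner's EOS set are ever handed to the SPLIT/NO_SPLIT classifier.

```java
import java.util.ArrayList;
import java.util.List;

public class EosCandidateDemo {

    // Hypothetical stand-in for an end-of-sentence scanner (NOT the OpenNLP API):
    // returns the offset of every character that belongs to the given EOS set.
    static List<Integer> eosCandidates(String text, char[] eosChars) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            for (char eos : eosChars) {
                if (c == eos) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Diagnosis: fracture. Therapy: cast!";
        char[] trainingEos = {'.', '!', '?', ':'};  // ":" added for training
        char[] evaluationEos = {'.', '!', '?'};     // evaluator left unchanged

        // Only the positions returned here would reach the classifier, so a ":"
        // missing from the evaluation set can never be predicted as SPLIT.
        System.out.println("training candidates:   " + eosCandidates(text, trainingEos));
        System.out.println("evaluation candidates: " + eosCandidates(text, evaluationEos));
    }
}
```

Running it prints `[9, 19, 28, 34]` for the training set and `[19, 34]` for the evaluation set: the two ":" positions are simply never classified at evaluation time, so every gold boundary there counts as a miss.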
Jörn
On Thu, Feb 9, 2012 at 8:59 AM, Katrin Tomanek
<katrin.toma...@averbis.com> wrote:
Hi Willian,
I am currently using opennlp-1.5.2 and am trying to use it as an API, i.e. not
modifying its code but writing my own code around it. However, what I
described below (with the SDEventStream) amounts to the same thing you are
describing: I am changing the set of EOS characters.
I am just wondering why adding ":" as an EOS character makes the results
worse (sentence splitting drops from ~80 to 45 F-score, even though ":" is
always a sentence boundary in my data!).
Looks like I need to debug a bit more to see what's happening in the
DefaultSDContextGenerator.
--
Dr. Katrin Tomanek
Averbis GmbH
Tennenbacher Strasse 11
D-79106 Freiburg
Fon: +49 (0) 761 - 203 97696
Fax: +49 (0) 761 - 203 97694
E-Mail: katrin.toma...@averbis.com
Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
Sitz der Gesellschaft: Freiburg i. Br.
AG Freiburg i. Br., HRB 701080