[
https://issues.apache.org/jira/browse/OPENNLP-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046833#comment-13046833
]
Jörn Kottmann commented on OPENNLP-197:
---------------------------------------
The europarl file also contains sentences which do not have an end-of-sentence
(eos) character at the end. This might lead to invalid training events, because
the training code assumes that the last eos character in a line marks the end of
a sentence. If there is none at the end of the line, one in the middle of the
sentence will accidentally be treated as the end of the sentence.
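
To illustrate the assumption described above, here is a minimal sketch. It is
not the OpenNLP training code; the function names and the set of eos characters
are assumptions made for the example only.

```python
# Hypothetical illustration, not OpenNLP code.
EOS_CHARS = ".!?"  # assumed set of end-of-sentence characters


def last_eos_index(line, eos_chars=EOS_CHARS):
    """Return the index of the last eos character in the line, or -1 if none."""
    return max(line.rfind(c) for c in eos_chars)


def ends_with_eos(line, eos_chars=EOS_CHARS):
    """Return True if the line, ignoring trailing whitespace, ends with an eos character."""
    stripped = line.rstrip()
    return bool(stripped) and stripped[-1] in eos_chars


good = "Mr. Smith arrived."
bad = "Mr. Smith arrived"  # no eos character at the end

# For `good`, the last eos character is the final period.
# For `bad`, the last eos character is the period after "Mr",
# so a trainer relying on "last eos in the line" would mark the
# sentence end in the middle of the sentence.
print(last_eos_index(good), last_eos_index(bad))
print(ends_with_eos(good), ends_with_eos(bad))
```

A check like ends_with_eos could be used to skip or flag such lines before
training, assuming that dropping them is acceptable for the corpus at hand.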
> The UIMA "Sentence Detector Trainer" may build erratic models depending on
> the covered text format of the sentence annotations.
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: OPENNLP-197
> URL: https://issues.apache.org/jira/browse/OPENNLP-197
> Project: OpenNLP
> Issue Type: Question
> Components: UIMA Integration
> Reporter: Nicolas Hernandez
> Attachments: fr-sent.zip
>
>
> In the opennlp-uima subproject, the "Sentence Detector Training" component
> asks for a Sentence annotation type as a parameter.
> The component does not check whether each corresponding sentence is written
> on its own line.
> As a consequence, the built model would not work as expected.
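
The kind of check the description asks for could look roughly like the sketch
below. It is not part of opennlp-uima; the function name and the input (a list
of covered-text strings, one per Sentence annotation) are assumptions for the
example. It collapses line breaks inside each covered text so that every
sentence ends up on exactly one line.

```python
# Hypothetical illustration, not opennlp-uima code.
def one_sentence_per_line(covered_texts):
    """Join sentence texts into training data with one sentence per line.

    Whitespace inside each covered text (including line breaks) is
    collapsed to single spaces; empty sentences are dropped.
    """
    lines = []
    for text in covered_texts:
        line = " ".join(text.split())  # collapse newlines, tabs, repeated spaces
        if line:
            lines.append(line)
    return "\n".join(lines)


sample = ["First\nsentence.", "  Second one. ", "\n"]
print(one_sentence_per_line(sample))
```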
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira