[ https://issues.apache.org/jira/browse/OPENNLP-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045427#comment-13045427 ]
Jörn Kottmann commented on OPENNLP-197:
---------------------------------------
I am still not sure I understand you. The trainer takes the covered text of a
sentence annotation and performs whitespace-based tokenization. So if you have
newlines, tabs, or any other kind of whitespace between two tokens, it should
not affect the trained model.
Does whitespace inside a sentence annotation actually affect the trained model?
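
To illustrate the point above, here is a minimal sketch of whitespace-based
tokenization. It uses only the JDK and is not the trainer's actual code; the
class name WhitespaceTokenDemo and the method tokenize are hypothetical names
chosen for this example. It assumes that "whitespace" means anything matched by
the regular expression \s (spaces, tabs, line breaks).

```java
import java.util.Arrays;

// Hypothetical demo class, not part of opennlp-uima.
class WhitespaceTokenDemo {

    // Splits the covered text of a sentence on any run of whitespace
    // (spaces, tabs, line breaks). Returns an empty array for blank input.
    static String[] tokenize(String coveredText) {
        String trimmed = coveredText.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String oneLine = "The trainer reads the covered text.";
        String multiLine = "The trainer\n\treads the   covered\r\ntext.";

        // Both variants yield the same six tokens, so the comparison prints "true".
        System.out.println(Arrays.equals(tokenize(oneLine), tokenize(multiLine)));
    }
}
```

If the trainer behaves like this sketch, the two inputs are indistinguishable
after tokenization, which is why line breaks inside a sentence annotation would
not be expected to change the model.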
> The UIMA "Sentence Detector Trainer" may build erratic models depending on
> the covered text format of the sentence annotations.
> ---------------------------------------------------------------------------
>
> Key: OPENNLP-197
> URL: https://issues.apache.org/jira/browse/OPENNLP-197
> Project: OpenNLP
> Issue Type: Bug
> Components: UIMA Integration
> Reporter: Nicolas Hernandez
>
> In the opennlp-uima subproject, the "Sentence Detector Training" component
> asks for a Sentence annotation type as a parameter.
> The component does not check whether each corresponding sentence is written
> on its own line.
> As a result, the built model does not work as expected.