[jira] [Commented] (OPENNLP-197) The UIMA "Sentence Detector Trainer" may build erratic models depending on the covered text format of the sentence annotations.

JIRA Wed, 08 Jun 2011 03:50:47 -0700

    [ 
https://issues.apache.org/jira/browse/OPENNLP-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045894#comment-13045894
 ]


Jörn Kottmann commented on OPENNLP-197:
---------------------------------------

The computation of the "space previous" and "space next" features in 
DefaultSDContextGenerator (line 99) treats only the space character as 
whitespace, but I think all kinds of whitespace should be considered here. That 
can easily be tested with StringUtil.isWhitespace.
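To illustrate the difference, here is a minimal, self-contained sketch. It does not use OpenNLP. The helper `isWhitespace` is a hypothetical stand-in for StringUtil.isWhitespace, assumed to accept Java whitespace plus Unicode space separators such as the no-break space. The feature names "sp"/"sn" are illustrative only. A check of the form `c == ' '` misses a line break or a no-break space after the sentence-final period, while a general whitespace test catches both.

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceFeatures {

    // Hypothetical stand-in for StringUtil.isWhitespace: Java whitespace
    // (space, tab, line breaks, ...) plus Unicode space separators such as
    // U+00A0, which Character.isWhitespace alone rejects.
    public static boolean isWhitespace(char c) {
        return Character.isWhitespace(c)
                || Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    // Collects "space previous"/"space next" features around the candidate
    // end-of-sentence character at position. With anyWhitespace == false it
    // mimics a check that only recognizes the plain ' ' character.
    public static List<String> spaceFeatures(CharSequence text, int position,
                                             boolean anyWhitespace) {
        List<String> feats = new ArrayList<>();
        if (position > 0) {
            char prev = text.charAt(position - 1);
            if (anyWhitespace ? isWhitespace(prev) : prev == ' ') {
                feats.add("sp");
            }
        }
        if (position < text.length() - 1) {
            char next = text.charAt(position + 1);
            if (anyWhitespace ? isWhitespace(next) : next == ' ') {
                feats.add("sn");
            }
        }
        return feats;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Il pleut. Il fait froid.",       // plain space after '.'
            "Il pleut.\nIl fait froid.",      // line break after '.'
            "Il pleut.\u00A0Il fait froid."   // no-break space after '.'
        };
        for (String s : samples) {
            int dot = s.indexOf('.');
            // Space-only check vs. general whitespace check.
            System.out.println(spaceFeatures(s, dot, false) + " vs "
                    + spaceFeatures(s, dot, true));
        }
    }
}
```

With the space-only check, only the first sample gets the "sn" feature. With the general whitespace check, all three do, so a sentence followed by a line break yields the same features as one followed by a space.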

> The UIMA "Sentence Detector Trainer" may build erratic models depending on 
> the covered text format of the sentence annotations.
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-197
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-197
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: UIMA Integration
>            Reporter: Nicolas Hernandez
>         Attachments: fr-sent.zip
>
>
> In the opennlp-uima subproject, the "Sentence Detector Training" component 
> asks for a Sentence annotation type as a parameter. 
> The component does not check whether each corresponding sentence is written 
> on its own line. 
> As a consequence, the built model may not work as expected.
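The missing check described in the report could look roughly like the sketch below. Both helpers are hypothetical and are not part of opennlp-uima. They take the covered text of each sentence annotation as a plain string. `multiLineSentences` flags sentences that span more than one line. `toSingleLine` shows one possible normalization: it collapses every whitespace run, including line breaks and Unicode space separators, into a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceLineCheck {

    // Hypothetical pre-training check: indices of sentences whose covered
    // text contains a line break, i.e. is not written on its own line.
    public static List<Integer> multiLineSentences(List<String> coveredTexts) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < coveredTexts.size(); i++) {
            String s = coveredTexts.get(i);
            if (s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0) {
                bad.add(i);
            }
        }
        return bad;
    }

    // One possible normalization: collapse whitespace runs (\s covers line
    // breaks and tabs, \p{Zs} covers Unicode space separators such as
    // U+00A0) into one ASCII space, then trim the ends.
    public static String toSingleLine(String coveredText) {
        return coveredText.replaceAll("[\\s\\p{Zs}]+", " ").trim();
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "Il pleut.",
                "Le chat dort\nsur le tapis.",
                "Il fait froid.");
        System.out.println(multiLineSentences(sentences));
        System.out.println(toSingleLine(sentences.get(1)));
    }
}
```

Running `main` prints `[1]` and then `Le chat dort sur le tapis.`. Whether to reject such sentences or normalize them silently is a design choice. Rejecting them makes bad annotations visible, while normalizing keeps training going.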

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
