UIMA Sentence Detector Trainer builds models which do not split sentences
correctly
----------------------------------------------------------------------------------------

                 Key: OPENNLP-203
                 URL: https://issues.apache.org/jira/browse/OPENNLP-203
             Project: OpenNLP
          Issue Type: Bug
          Components: UIMA Integration
         Environment: OS
Linux version 2.6.32-30-generic (buildd@vernadsky) (gcc version 4.4.3 (Ubuntu 
4.4.3-4ubuntu5) ) #59-Ubuntu SMP Tue Mar 1 21:30:21 UTC 2011

JVM
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
            Reporter: Nicolas Hernandez


Models trained with the UIMA component produce wrong begin/end offsets, even
though they do manage to split the text into sentences.
I observed that each sentence begins with the final punctuation character of
the previous sentence as its first token, while the previous sentence does not
include that character as its last token.
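
The symptom described above can be illustrated with a small, self-contained
sketch. It does not use the OpenNLP or UIMA APIs: the class name, the helper
methods, the sample text, and the offsets are all hypothetical, chosen only to
show what "the next sentence starts with the previous sentence's punctuation"
looks like in terms of begin/end offsets, and how such spans could be flagged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of OpenNLP or its UIMA integration.
final class SentenceOffsetCheck {

    // Returns the indexes of sentence spans whose first character is an
    // end-of-sentence punctuation mark. Each span is {begin, end}, with end
    // exclusive.
    static List<Integer> suspiciousSentences(String text, int[][] spans) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < spans.length; i++) {
            int begin = spans[i][0];
            if (begin >= 0 && begin < text.length()
                    && isEndOfSentenceChar(text.charAt(begin))) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    // Assumed set of end-of-sentence characters for this sketch.
    static boolean isEndOfSentenceChar(char c) {
        return c == '.' || c == '!' || c == '?';
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";

        // Offsets one would expect: the period belongs to the first sentence.
        int[][] expected = {{0, 12}, {13, 25}};

        // Offsets matching the reported symptom (made-up values): the period
        // is dropped from the first sentence and opens the second one.
        int[][] observed = {{0, 11}, {11, 25}};

        for (int[] span : observed) {
            System.out.println("[" + text.substring(span[0], span[1]) + "]");
        }
        System.out.println("flagged in expected: " + suspiciousSentences(text, expected));
        System.out.println("flagged in observed: " + suspiciousSentences(text, observed));
    }
}
```

A check of this kind, run over the sentence annotations produced by a trained
model, might help confirm whether the offsets are shifted by one punctuation
character as described.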


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
