[ 
https://issues.apache.org/jira/browse/UIMA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734083#action_12734083
 ] 

Jérôme Rocheteau commented on UIMA-1447:
----------------------------------------

I would just like to know whether the Whitespace Tokenizer behaves as expected 
when a '\t' character that follows a ' ' character is annotated as a 
TokenAnnotation. If this is not the intended behaviour, could a patch be applied? 
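
To make the question concrete, here is a minimal sketch in plain Java of the 
behaviour I would expect. It is only an illustration and not the actual 
WhitespaceTokenizer code: the class name WhitespaceSplitSketch and the method 
tokenSpans are made up, and it assumes that every character for which 
Character.isWhitespace returns true (including '\t') separates tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the UIMA Sandbox implementation.
final class WhitespaceSplitSketch {

    /** Returns [begin, end) offsets of maximal runs of non-whitespace characters. */
    static List<int[]> tokenSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        int begin = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                // Any whitespace (space, tab, newline) closes the open token, if any.
                if (begin >= 0) {
                    spans.add(new int[] {begin, i});
                    begin = -1;
                }
            } else if (begin < 0) {
                begin = i; // first character of a new token
            }
        }
        if (begin >= 0) {
            spans.add(new int[] {begin, text.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "follows: \ti.e.";
        for (int[] span : tokenSpans(text)) {
            System.out.println(span[0] + "-" + span[1] + ": "
                    + text.substring(span[0], span[1]));
        }
    }
}
```

Under that assumption, a space followed by a tab yields no token of its own, 
so no annotation would cover only the '\t' character.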

I don't have examples where the getCoveredText method returns a zero-length 
string. Actually, it returns a string of length 1 (presumably the '\t' 
character itself), even though it is displayed as an empty string.
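
A small sketch of how such a value can be told apart from a truly empty 
string, in plain Java without any UIMA dependency. The class name 
CoveredTextCheck and the helper visible are hypothetical, and the string "\t" 
only stands in for whatever getCoveredText returns in this case.

```java
// Hypothetical helper for inspecting strings that look empty when printed.
final class CoveredTextCheck {

    /** Replaces tab, newline and carriage return with visible escape sequences. */
    static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String covered = "\t"; // assumed stand-in for the covered text of the token
        System.out.println("length  = " + covered.length());
        System.out.println("isEmpty = " + covered.isEmpty());
        System.out.println("visible = " + visible(covered));
    }
}
```

Printing the length together with the escaped form makes it clear whether the 
covered text is empty or a single whitespace character.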

> Tabulations are annotated as tokens after a space
> -------------------------------------------------
>
>                 Key: UIMA-1447
>                 URL: https://issues.apache.org/jira/browse/UIMA-1447
>             Project: UIMA
>          Issue Type: Bug
>          Components: Sandbox-WhitespaceTokenizer
>    Affects Versions: 2.3S
>         Environment: Unix (ubuntu 8.04), Eclipse Galileo 3.5
>            Reporter: Jérôme Rocheteau
>
> This is a test-text for the Whitespace Tokenizer in the UIMA Sandbox. 
> It behaves as follows:        i.e. a '\t' character after a space is 
> annotated as a token and its covered text is set to the empty string ""! 
> I suppose it shouldn't be the case, am I wrong?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
