[ https://issues.apache.org/jira/browse/UIMA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734745#action_12734745 ]

Marshall Schor commented on UIMA-1447:
--------------------------------------

A couple of comments:

1) If we change the behavior of this annotator, existing applications that use 
it may start to fail, because they were built with the previous behavior in mind.

2) If we solve (1) and want a version of this annotator that defines 
whitespace differently, then I would prefer Jörn's fix, because it puts all the 
programming logic that determines what counts as whitespace in one spot.
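A minimal sketch of what "all the logic in one spot" can look like, written in 
plain Java rather than against the UIMA API. The class and method names are 
hypothetical, and using Character.isWhitespace as the definition is an 
assumption (it is not necessarily what Jörn's patch does). Because every 
decision goes through isWhitespace, a tab that follows a space is skipped like 
any other separator instead of producing an empty token.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the Sandbox WhitespaceTokenizer's actual code:
 * every "is this whitespace?" decision goes through a single method.
 */
public class CentralizedWhitespaceSplitter {

    /** The one place that defines what counts as whitespace. */
    static boolean isWhitespace(char c) {
        // Character.isWhitespace treats ' ', '\t', '\n', '\r' (among others) as whitespace.
        return Character.isWhitespace(c);
    }

    /** Returns the [begin, end) offsets of each token; never emits an empty span. */
    static List<int[]> tokenize(String text) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (isWhitespace(text.charAt(i))) {
                if (start >= 0) { // close the token we were in
                    spans.add(new int[] {start, i});
                    start = -1;
                }
            } else if (start < 0) { // first character of a new token
                start = i;
            }
        }
        if (start >= 0) { // text ended inside a token
            spans.add(new int[] {start, text.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "follows: \ti.e."; // a space followed by a tab
        for (int[] span : tokenize(text)) {
            System.out.println("[" + text.substring(span[0], span[1]) + "]");
        }
    }
}
```

Running main prints [follows:] and [i.e.]; changing the definition of 
whitespace later means editing only isWhitespace.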

A possible fix for (1) would be to add an optional parameter that, when set, 
causes this alternate definition of whitespace to be used; when it is not 
specified, the annotator keeps its current mode of operation.
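One way such an optional parameter could behave, sketched in plain Java with 
hypothetical names (useAlternateWhitespace, resolve); it does not use UIMA's 
configuration-parameter API. The narrow definition below (only characters of 
type Character.SPACE_SEPARATOR) is an assumed stand-in for the current 
behavior, chosen because it excludes '\t'; the real annotator's rules may 
differ. An absent parameter selects the current definition, so existing users 
see no change.

```java
import java.util.Map;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch: an optional parameter selects the whitespace definition.
 * Parameter name and both predicates are illustrative assumptions.
 */
public class WhitespaceModeSketch {

    // Hypothetical parameter name.
    static final String PARAM_ALTERNATE_WHITESPACE = "useAlternateWhitespace";

    // Assumed stand-in for the current definition: space separators only, so '\t' is excluded.
    static final IntPredicate CURRENT_DEFINITION =
            c -> Character.getType(c) == Character.SPACE_SEPARATOR;

    // Alternate definition: anything Java considers whitespace, including '\t'.
    static final IntPredicate ALTERNATE_DEFINITION = Character::isWhitespace;

    /** A missing (or non-true) parameter keeps the backward-compatible behavior. */
    static IntPredicate resolve(Map<String, Object> params) {
        boolean useAlternate = Boolean.TRUE.equals(params.get(PARAM_ALTERNATE_WHITESPACE));
        return useAlternate ? ALTERNATE_DEFINITION : CURRENT_DEFINITION;
    }

    public static void main(String[] args) {
        IntPredicate byDefault = resolve(Map.of());
        IntPredicate optedIn = resolve(Map.of(PARAM_ALTERNATE_WHITESPACE, true));
        System.out.println(byDefault.test('\t')); // false with the stand-in definition
        System.out.println(optedIn.test('\t'));   // true
    }
}
```

Defaulting to the old predicate is what keeps point (1) satisfied: only 
deployments that explicitly set the parameter get the new behavior.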

Of course, another fix is simply to have users who want other definitions of 
character classes copy this annotator, rename it, and change the code to their 
liking :-) .

In this particular case, I agree with Jérôme that this definition of whitespace 
is not what would normally be expected, so I'm in favor of finding some way to 
correct it (without breaking backward compatibility).

> Tabulations are annotated as tokens after a space
> -------------------------------------------------
>
>                 Key: UIMA-1447
>                 URL: https://issues.apache.org/jira/browse/UIMA-1447
>             Project: UIMA
>          Issue Type: Bug
>          Components: Sandbox-WhitespaceTokenizer
>    Affects Versions: 2.3S
>         Environment: Unix (ubuntu 8.04), Eclipse Galileo 3.5
>            Reporter: Jérôme Rocheteau
>         Attachments: patch-an-wst.txt
>
>
> This is a test-text for the Whitespace Tokenizer in the UIMA Sandbox. 
> It behaves as follows:        i.e. a '\t' character after a space is 
> annotated as a token and its covered text is set to the empty string ""! 
> I suppose that shouldn't be the case; am I wrong?

-- 
This message is automatically generated by JIRA.