Tokenizers alpha numeric optimization only recognizes a-z as alpha chars
------------------------------------------------------------------------

                 Key: OPENNLP-141
                 URL: https://issues.apache.org/jira/browse/OPENNLP-141
             Project: OpenNLP
          Issue Type: Bug
          Components: Tokenizer
    Affects Versions: tools-1.5.0-sourceforge
            Reporter: Jörn Kottmann
            Priority: Minor


The Tokenizer has an optimization that skips tokens made up only of numeric or 
alphabetic characters. In languages other than English, alphabetic characters 
include umlauts and other letters outside the a-z range, so the optimization 
does not recognize such tokens.
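
The difference can be illustrated with a minimal sketch. It assumes the
optimization is a regular-expression check over ASCII letters and digits; the
class, method, and pattern names below are hypothetical and are not taken from
the OpenNLP source. The Unicode-aware pattern is one possible alternative, not
necessarily the fix that should be applied.

```java
import java.util.regex.Pattern;

public class AlphaNumericCheck {

    // ASCII-only check, assumed here to resemble the behavior described in this issue.
    static final Pattern ASCII_ALPHA_NUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Unicode-aware alternative: \p{L} matches any letter, \p{N} any numeric character.
    static final Pattern UNICODE_ALPHA_NUMERIC = Pattern.compile("^[\\p{L}\\p{N}]+$");

    static boolean isAsciiAlphaNumeric(String token) {
        return ASCII_ALPHA_NUMERIC.matcher(token).matches();
    }

    static boolean isUnicodeAlphaNumeric(String token) {
        return UNICODE_ALPHA_NUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        // "H\u00e4user" is the German word for "houses", written with an escaped a-umlaut.
        String[] tokens = {"house", "H\u00e4user", "1234", "don't"};
        for (String token : tokens) {
            System.out.println(token
                + " ascii=" + isAsciiAlphaNumeric(token)
                + " unicode=" + isUnicodeAlphaNumeric(token));
        }
    }
}
```

With the ASCII-only pattern a token containing an umlaut is not treated as
alphanumeric, while the Unicode-aware pattern accepts it. Tokens containing
punctuation, such as "don't", are rejected by both.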


