Tokenizers alpha numeric optimization only recognizes a-z as alpha chars
------------------------------------------------------------------------
Key: OPENNLP-141
URL: https://issues.apache.org/jira/browse/OPENNLP-141
Project: OpenNLP
Issue Type: Bug
Components: Tokenizer
Affects Versions: tools-1.5.0-sourceforge
Reporter: Jörn Kottmann
Priority: Minor
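The Tokenizer has an optimization that skips tokens made up only of numeric
or alphabetic characters. The check treats only a-z as alphabetic, but in
languages other than English the alphabetic characters include umlauts and
other letters outside the a-z range, so tokens containing them are not
covered by the optimization.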
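
The difference can be illustrated with a small sketch. It is not the
OpenNLP code: the class name, method names, and both patterns are
assumptions chosen for illustration. It contrasts an ASCII-only
alphanumeric check with one based on the Unicode letter and digit
categories of java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of OpenNLP.
class AlphaNumericCheck {

    // ASCII-only check (assumed to resemble the behavior reported here).
    private static final Pattern ASCII_ALNUM =
            Pattern.compile("^[A-Za-z0-9]+$");

    // Unicode-aware check: \p{L} is any letter, \p{Nd} any decimal digit.
    private static final Pattern UNICODE_ALNUM =
            Pattern.compile("^[\\p{L}\\p{Nd}]+$");

    static boolean isAsciiAlphaNumeric(String token) {
        return ASCII_ALNUM.matcher(token).matches();
    }

    static boolean isUnicodeAlphaNumeric(String token) {
        return UNICODE_ALNUM.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding-independent:
        // "H\u00e4user" is "Häuser", "Stra\u00dfe" is "Straße".
        String[] tokens = {"Haus", "H\u00e4user", "2011", "Stra\u00dfe", "end."};
        for (String token : tokens) {
            System.out.println(token
                    + " ascii=" + isAsciiAlphaNumeric(token)
                    + " unicode=" + isUnicodeAlphaNumeric(token));
        }
    }
}
```

With these patterns, "Häuser" and "Straße" fail the ASCII-only check but
pass the Unicode-aware one, while "end." fails both because of the
trailing period. Note that \p{L} does not match combining marks, so text
in decomposed form (a base letter followed by a combining diaeresis) would
need to be normalized first or the pattern extended with \p{M}.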