Doccat's bag-of-words feature generator should not use numbers as features
--------------------------------------------------------------------------
Key: OPENNLP-327
URL: https://issues.apache.org/jira/browse/OPENNLP-327
Project: OpenNLP
Issue Type: Improvement
Components: Doccat
Reporter: Joern Kottmann
Assignee: Joern Kottmann
Priority: Minor
It turned out that Doccat's bag-of-words feature generator can be very
sensitive to numbers when used for language identification. Numeric tokens
should therefore be excluded from the bag of words.
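
A minimal sketch of the idea, not the actual OpenNLP implementation: the
class NumberFreeBagOfWords, its method names, and the "bow=" feature prefix
are all hypothetical and chosen only for illustration. It assumes the input
is already tokenized and treats any token containing a digit as a number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone example; it does not use any OpenNLP classes.
public class NumberFreeBagOfWords {

    // Assumption: a token counts as a "number" if it contains any digit,
    // so "12", "12:30" and "3.14" are all treated as numeric.
    static boolean containsDigit(String token) {
        return token.codePoints().anyMatch(Character::isDigit);
    }

    // Emits one feature per token, skipping tokens that contain digits.
    static List<String> extractFeatures(String[] tokens) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            if (!containsDigit(token)) {
                features.add("bow=" + token);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Der", "Zug", "fährt", "um", "12:30", "Uhr", "ab"};
        // The token "12:30" is dropped; all other tokens become features.
        System.out.println(extractFeatures(tokens));
    }
}
```

Note that this filter is stricter than "numbers only": mixed tokens such as
"B2B" are dropped as well. Whether that is desirable for language
identification is a design choice this sketch does not settle.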