Doccat's bag-of-words feature generator should not use numbers as features
--------------------------------------------------------------------------

                 Key: OPENNLP-327
                 URL: https://issues.apache.org/jira/browse/OPENNLP-327
             Project: OpenNLP
          Issue Type: Improvement
          Components: Doccat
            Reporter: Joern Kottmann
            Assignee: Joern Kottmann
            Priority: Minor


It turned out that Doccat's bag-of-words feature generator can be very
sensitive to numbers when used for language identification. Therefore, numbers
should not be included in the bag of words.
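
The proposed behaviour could be sketched roughly as follows. This is a minimal,
self-contained illustration and not the actual OpenNLP change: the class name
NumberFreeBagOfWords, the "bow=" feature prefix as used here, and the rule for
what counts as a number (at least one digit, otherwise only digits, '.', ',',
'+' or '-') are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bag-of-words feature generator; it does not
// depend on OpenNLP and is only meant to illustrate the idea.
class NumberFreeBagOfWords {

    // Assumed definition of "number": the token contains at least one digit
    // and no characters other than digits, '.', ',', '+' and '-'.
    static boolean isNumber(String token) {
        boolean sawDigit = false;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (Character.isDigit(c)) {
                sawDigit = true;
            } else if (c != '.' && c != ',' && c != '+' && c != '-') {
                return false; // any other character means it is not a plain number
            }
        }
        return sawDigit;
    }

    // Emits one feature per token, skipping tokens that look like numbers.
    static List<String> extractFeatures(String[] tokens) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            if (!isNumber(token)) {
                features.add("bow=" + token);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "year", "2011", "was", "3.5", "x86"};
        // prints [bow=The, bow=year, bow=was, bow=x86]
        System.out.println(extractFeatures(tokens));
    }
}
```

Mixed tokens such as "x86" are kept in this sketch because they contain a
letter; whether such tokens should also be dropped is a separate design
decision that the issue does not specify.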
