Replace the regex token class feature generation with the fast string pattern 
implementation
--------------------------------------------------------------------------------------------

                 Key: OPENNLP-172
                 URL: https://issues.apache.org/jira/browse/OPENNLP-172
             Project: OpenNLP
          Issue Type: Improvement
          Components: Name Finder
    Affects Versions: tools-1.5.1-incubating
            Reporter: Jörn Kottmann
            Assignee: Jörn Kottmann
            Priority: Minor
             Fix For: tools-1.5.2-incubating


The token class feature is currently computed with regular expressions, 
which are slower than the new fast token class feature method that uses the 
Character class to compute the token class.

The old regular-expression-based token class feature computation should be 
replaced with the new fast token class method.
The output of both methods is identical, so the change will not break 
backward compatibility, but it will increase the throughput of the name 
finder by roughly 10%.
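
To illustrate the difference between the two approaches, here is a minimal 
sketch. It is not the OpenNLP code: the class name TokenClassSketch, the two 
method names, and the simplified class labels ("lc", "ac", "ic", "num", 
"other") are assumptions made for this example only. It shows the same 
classification done once with precompiled regular expressions and once with a 
single pass over the token using java.lang.Character.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; names and token classes are assumptions,
// not the actual OpenNLP feature generator.
final class TokenClassSketch {

    // Regex variant: one precompiled pattern per token class.
    private static final Pattern ALL_LOWER = Pattern.compile("\\p{javaLowerCase}+");
    private static final Pattern ALL_DIGIT = Pattern.compile("\\p{Nd}+");
    private static final Pattern ALL_UPPER = Pattern.compile("\\p{javaUpperCase}+");
    private static final Pattern INIT_CAP =
            Pattern.compile("\\p{javaUpperCase}\\p{javaLowerCase}*");

    static String regexTokenClass(String token) {
        // Up to four separate matcher runs per token.
        if (ALL_LOWER.matcher(token).matches()) return "lc";
        if (ALL_DIGIT.matcher(token).matches()) return "num";
        if (ALL_UPPER.matcher(token).matches()) return "ac";
        if (INIT_CAP.matcher(token).matches()) return "ic";
        return "other";
    }

    // Character variant: one pass over the code points, no regex engine.
    static String fastTokenClass(String token) {
        if (token.isEmpty()) return "other";
        boolean allLower = true, allUpper = true, allDigit = true;
        boolean restLower = true; // every code point after the first is lowercase
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            boolean lower = Character.isLowerCase(cp);
            allLower &= lower;
            allUpper &= Character.isUpperCase(cp);
            allDigit &= Character.isDigit(cp);
            if (i > 0) restLower &= lower;
            i += Character.charCount(cp);
        }
        // Same precedence as the regex variant above.
        if (allLower) return "lc";
        if (allDigit) return "num";
        if (allUpper) return "ac";
        if (Character.isUpperCase(token.codePointAt(0)) && restLower) return "ic";
        return "other";
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "Hello", "USA", "2011", "iPhone", "x-ray"};
        for (String s : samples) {
            System.out.println(s + " regex=" + regexTokenClass(s)
                    + " fast=" + fastTokenClass(s));
        }
    }
}
```

In this sketch the regex variant may run several matchers over the same 
token, while the Character variant decides all classes in one loop. That is 
the kind of saving the issue describes, though the real feature set and the 
measured gain depend on the actual implementation.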

In a measurement on the Leipzig corpus with 300K sentences, throughput 
increased from 556 sent/s to 618 sent/s (about 11%).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
