Replace the regex token class feature generation with the fast string pattern
implementation
--------------------------------------------------------------------------------------------
Key: OPENNLP-172
URL: https://issues.apache.org/jira/browse/OPENNLP-172
Project: OpenNLP
Issue Type: Improvement
Components: Name Finder
Affects Versions: tools-1.5.1-incubating
Reporter: Jörn Kottmann
Assignee: Jörn Kottmann
Priority: Minor
Fix For: tools-1.5.2-incubating
The token class feature is currently computed with regular expressions, which
are slower than the new fast token class feature method that uses the
Character class to compute the token class.
The old regular expression based token class feature computation should be
replaced with the new fast token class method.
The output of both methods is identical, so the change will not break
backward compatibility, and it will increase the throughput of the name
finder by roughly 10%.
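
To illustrate the difference between the two approaches, here is a minimal
sketch. It is not the OpenNLP code: the class name, the method names, and the
class labels ("lc", "num", "ac", "ic", "other") are hypothetical and chosen
only for this example. The regex variant uses ASCII ranges, so the two methods
are only expected to agree on ASCII tokens.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the OpenNLP implementation.
final class TokenClassSketch {

    // Regex variant: one precompiled pattern per class, tried in order.
    private static final Pattern LOWER = Pattern.compile("[a-z]+");
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");
    private static final Pattern ALL_CAPS = Pattern.compile("[A-Z]+");
    private static final Pattern INIT_CAP = Pattern.compile("[A-Z][a-z]+");

    static String tokenClassRegex(String token) {
        // matches() requires the whole token to match the pattern.
        if (LOWER.matcher(token).matches()) return "lc";
        if (DIGITS.matcher(token).matches()) return "num";
        if (ALL_CAPS.matcher(token).matches()) return "ac";
        if (INIT_CAP.matcher(token).matches()) return "ic";
        return "other";
    }

    // Character-based variant: a single pass over the token, no matcher objects.
    static String tokenClassFast(String token) {
        if (token.isEmpty()) return "other";
        boolean allLower = true;
        boolean allUpper = true;
        boolean allDigit = true;
        boolean restLower = true; // every character after the first is lower case
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean lower = Character.isLowerCase(c);
            allLower &= lower;
            allUpper &= Character.isUpperCase(c);
            allDigit &= Character.isDigit(c);
            if (i > 0) restLower &= lower;
        }
        // Same precedence as the regex variant above.
        if (allLower) return "lc";
        if (allDigit) return "num";
        if (allUpper) return "ac";
        if (token.length() > 1 && Character.isUpperCase(token.charAt(0)) && restLower) {
            return "ic";
        }
        return "other";
    }
}
```

The single pass avoids allocating a Matcher per pattern and per token, which
is one plausible source of the speedup. For non-ASCII input the Character
methods are Unicode-aware while the ASCII ranges above are not, so a real
replacement would have to be checked against the exact patterns it replaces.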
In a measurement on the Leipzig corpus with 300K sentences, throughput rose
from 556 sent/s to 618 sent/s.
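
As a quick check of the "roughly 10%" figure against the measured rates, the
relative gain can be computed as sketched below. The class and method names
are hypothetical; only the two throughput values come from the measurement.

```java
// Hypothetical helper for checking the reported speedup.
final class ThroughputGain {

    // Relative change from `before` to `after`, in percent.
    static double relativeGainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }
}
```

With before = 556 and after = 618 this gives (618 - 556) / 556, or about
11.2%, which is consistent with the rounded figure.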