[ https://issues.apache.org/jira/browse/OPENNLP-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jörn Kottmann reopened OPENNLP-172:
-----------------------------------
It looks like it handles non-English content better, but that might cause
regressions; more testing is needed before the issue can be closed.
> Replace the regex token class feature generation with the fast string pattern implementation
> --------------------------------------------------------------------------------------------
>
> Key: OPENNLP-172
> URL: https://issues.apache.org/jira/browse/OPENNLP-172
> Project: OpenNLP
> Issue Type: Improvement
> Components: Name Finder
> Affects Versions: tools-1.5.1-incubating
> Reporter: Jörn Kottmann
> Assignee: Jörn Kottmann
> Priority: Minor
> Fix For: tools-1.5.2-incubating
>
>
> The token class feature is computed with the help of regular expressions,
> which are slower than the new fast token class feature method that uses the
> Character class to compute the token class.
> The old regular-expression-based token class feature computation should be
> replaced with the new fast token class method.
> The output of both methods is identical, so this change will not break
> backward compatibility, but it will increase the throughput of the name
> finder by roughly 10%.
> On the Leipzig corpus with 300K sentences, the throughput increased from
> 556 sent/s to 618 sent/s.
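A minimal Java sketch of the two approaches, for illustration only. The class name TokenClassSketch, its methods, the class labels (lc, ac, ic, num, other) and the regular expressions are all hypothetical simplifications, not OpenNLP's actual feature generator code. It assumes the regex version uses ASCII ranges such as [a-z]. Under that assumption both methods agree on ASCII tokens but disagree on tokens such as "Jörn", which may be one reason the fast method behaves differently on non-English content.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: simplified token classes, not OpenNLP's real feature set.
final class TokenClassSketch {

    // Assumed ASCII-only patterns, as a regex-based implementation might use.
    private static final Pattern LOWERCASE = Pattern.compile("^[a-z]+$");
    private static final Pattern ALL_CAPS = Pattern.compile("^[A-Z]+$");
    private static final Pattern INITIAL_CAP = Pattern.compile("^[A-Z][a-z]+$");
    private static final Pattern DIGITS = Pattern.compile("^[0-9]+$");

    /** Regex-based classification: up to one full pattern match per candidate class. */
    static String regexTokenClass(String token) {
        if (LOWERCASE.matcher(token).matches()) return "lc";
        if (ALL_CAPS.matcher(token).matches()) return "ac";
        if (INITIAL_CAP.matcher(token).matches()) return "ic";
        if (DIGITS.matcher(token).matches()) return "num";
        return "other";
    }

    /** Character-based classification: a single pass over the token's UTF-16 chars. */
    static String fastTokenClass(String token) {
        if (token.isEmpty()) return "other";
        boolean allLower = true, allUpper = true, allDigits = true, restLower = true;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean lower = Character.isLowerCase(c); // Unicode-aware, unlike [a-z]
            allLower &= lower;
            allUpper &= Character.isUpperCase(c);
            allDigits &= Character.isDigit(c);
            if (i > 0) restLower &= lower;
        }
        if (allLower) return "lc";
        if (allUpper) return "ac";
        if (token.length() > 1 && Character.isUpperCase(token.charAt(0)) && restLower) return "ic";
        if (allDigits) return "num";
        return "other";
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding-independent ("Jörn", "größe").
        String[] tokens = {"name", "NLP", "Kottmann", "2011", "J\u00f6rn", "gr\u00f6\u00dfe"};
        for (String token : tokens) {
            System.out.println(token + ": regex=" + regexTokenClass(token)
                    + " fast=" + fastTokenClass(token));
        }
    }
}
```

Running main prints both classes for each sample token. The four ASCII tokens get the same class from both methods, while for the two German tokens the regex version falls back to "other" and the Character-based version returns "ic" and "lc".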
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira