[jira] [Reopened] (OPENNLP-172) Replace the regex token class feature generation with the fast string pattern implementation

JIRA Mon, 16 May 2011 06:45:30 -0700

     [ 
https://issues.apache.org/jira/browse/OPENNLP-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jörn Kottmann reopened OPENNLP-172:
-----------------------------------


Looks like it handles non-English content better, but that might cause 
regressions; more testing is needed before the issue can be closed.

> Replace the regex token class feature generation with the fast string pattern 
> implementation
> --------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-172
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-172
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Name Finder
>    Affects Versions: tools-1.5.1-incubating
>            Reporter: Jörn Kottmann
>            Assignee: Jörn Kottmann
>            Priority: Minor
>             Fix For: tools-1.5.2-incubating
>
>
> The token class feature is currently computed with regular expressions, which 
> are slower than the new fast token class feature method. The new method uses 
> the Character class to compute the token class.
> The old regular expression based token class feature computation should be 
> replaced with the new fast token class method.
> The output of both methods is identical, so changing this will not break 
> backward compatibility, but it will increase the throughput of the name 
> finder by roughly 10%.
> In a measurement on the Leipzig corpus with 300K sentences, the throughput 
> increased from 556 sent/s to 618 sent/s.
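
The difference between the two approaches described above can be sketched in
Java as follows. This is a minimal illustration, not the OpenNLP
implementation: the class name TokenClassSketch, both method names, the
reduced set of class labels ("lc", "num", "ac", "ic", "other") and the
ASCII-only regular expressions are all assumptions made for the example.
Under those assumptions the two methods agree on ASCII tokens but can
disagree on tokens containing non-ASCII letters, which is one plausible way
such a replacement could change features for non-English text.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names and label set are assumptions, not OpenNLP code.
class TokenClassSketch {

    // ASCII-only patterns, assumed here purely for illustration.
    private static final Pattern LOWER = Pattern.compile("[a-z]+");
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");
    private static final Pattern ALL_CAPS = Pattern.compile("[A-Z]+");
    private static final Pattern INIT_CAP = Pattern.compile("[A-Z].*");

    // Regex variant: up to four full-token matches per call.
    static String regexTokenClass(String token) {
        if (LOWER.matcher(token).matches()) return "lc";
        if (DIGITS.matcher(token).matches()) return "num";
        if (ALL_CAPS.matcher(token).matches()) return "ac";
        if (INIT_CAP.matcher(token).matches()) return "ic";
        return "other";
    }

    // Character-based variant: a single pass over the token.
    static String fastTokenClass(String token) {
        if (token.isEmpty()) return "other";
        boolean allLower = true;
        boolean allUpper = true;
        boolean allDigit = true;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            allLower &= Character.isLowerCase(c);
            allUpper &= Character.isUpperCase(c);
            allDigit &= Character.isDigit(c);
        }
        if (allLower) return "lc";
        if (allDigit) return "num";
        if (allUpper) return "ac";
        if (Character.isUpperCase(token.charAt(0))) return "ic";
        return "other";
    }

    public static void main(String[] args) {
        // "\u00fcber" is "über"; written as an escape to avoid encoding issues.
        String[] tokens = {"house", "NASA", "Berlin", "2011", "\u00fcber"};
        for (String t : tokens) {
            System.out.println(t + " regex=" + regexTokenClass(t)
                + " fast=" + fastTokenClass(t));
        }
    }
}
```

In this sketch the first four tokens get the same class from both methods,
while "über" is classified as "other" by the ASCII regex variant and as "lc"
by the Character-based variant, because Character.isLowerCase is
Unicode-aware.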

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
