[ https://issues.apache.org/jira/browse/OPENNLP-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154234#comment-13154234 ]
Joern Kottmann commented on OPENNLP-397:
----------------------------------------
The POS Tagger assumes that its input is already tokenized (that is, tokens
are separated by white space), with one sentence per line.
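To illustrate the expected input format, here is a minimal sketch in plain Java. It does not call any OpenNLP API; the class and method names (PosInputFormat, tokensOf, sentencesOf) and the sample text are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class PosInputFormat {

    // Splits one line (= one sentence) into tokens on runs of white space.
    static String[] tokensOf(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Treats every non-empty line of the text as one sentence.
    static List<String[]> sentencesOf(String text) {
        List<String[]> sentences = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String[] tokens = tokensOf(line);
            if (tokens.length > 0) {
                sentences.add(tokens);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Hypothetical sample: punctuation is already split off as its own token.
        String input = "The quick brown fox jumps .\nIt lands softly .";
        for (String[] sentence : sentencesOf(input)) {
            System.out.println(sentence.length + " tokens: " + String.join(" | ", sentence));
        }
    }
}
```

Note that the punctuation is a separate token here; text that still has "jumps." glued together would not count as tokenized in this sense.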
Anyway, I will redo my measurements on the test data I used before. The actual
data shouldn't make a big difference as long as the same data is used for both
measurements (as you did).
As far as I know, java.util.HashMap always uses a power of two for the array
size, while we use the load factor directly. When the array is larger, the map
is usually faster because you get fewer collisions.
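A rough sketch of why the two sizing strategies can end up with different array sizes for the same load factor. This is not the IndexHashTable or HashMap source; the names (TableSizing, directSize, powerOfTwoSize) are hypothetical, and "direct" sizing is assumed here to mean entries divided by the load factor, rounded up.

```java
public class TableSizing {

    // Smallest power of two that is >= n, for 1 <= n <= 2^30.
    static int nextPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    // Assumed "direct" sizing: just enough slots to stay at the load factor.
    static int directSize(int entries, double loadFactor) {
        return (int) Math.ceil(entries / loadFactor);
    }

    // Simplified power-of-two sizing: round the direct size up to a power of two.
    static int powerOfTwoSize(int entries, double loadFactor) {
        return nextPowerOfTwo(directSize(entries, loadFactor));
    }

    public static void main(String[] args) {
        int entries = 1000;        // hypothetical number of keys
        double loadFactor = 0.7;   // hypothetical load factor

        int direct = directSize(entries, loadFactor);
        int rounded = powerOfTwoSize(entries, loadFactor);

        // The rounded array is never smaller, so its real fill ratio is never higher.
        System.out.println("direct size:       " + direct
                + ", fill ratio " + (double) entries / direct);
        System.out.println("power-of-two size: " + rounded
                + ", fill ratio " + (double) entries / rounded);
    }
}
```

With the same nominal load factor, the power-of-two array can be noticeably larger and therefore less full, which makes it hard to compare the two implementations fairly by load factor alone.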
> IndexHashTable can be improved
> ------------------------------
>
> Key: OPENNLP-397
> URL: https://issues.apache.org/jira/browse/OPENNLP-397
> Project: OpenNLP
> Issue Type: Improvement
> Components: Maxent
> Affects Versions: maxent-3.0.3-incubating
> Reporter: Catalin Mititelu
> Priority: Minor
> Labels: patch
> Attachments: patch-IndexHashTable.txt
>
>
> Running a profiler on POSTagger with a maxent model showed a lot of CPU
> usage in the IndexHashTable class. This class can be optimized to be faster.