[ 
https://issues.apache.org/jira/browse/OPENNLP-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154234#comment-13154234
 ] 

Joern Kottmann commented on OPENNLP-397:
----------------------------------------

The POS Tagger expects input that is already tokenized (tokens separated by 
white space), with one sentence per line.
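
To illustrate the expected input format, here is a minimal sketch in plain 
Java that reads one sentence per line and splits each line on white space. 
The class and method names (WhitespaceInput, readSentences) are made up for 
this example and are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceInput {

    // Hypothetical helper: treats every non-empty line as one sentence and
    // splits it into tokens on runs of white space.
    static List<String[]> readSentences(String text) {
        List<String[]> sentences = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed.split("\\s+"));
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Punctuation is already separated from the words, as a tokenizer
        // would leave it.
        String input = "The dog barks .\nIt is loud , is n't it ?\n";
        for (String[] tokens : readSentences(input)) {
            System.out.println(tokens.length + " tokens: " + String.join(" | ", tokens));
        }
    }
}
```

Each String[] produced this way is the kind of token array a tagger would 
receive for one sentence.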

Anyway, I will redo my measurements on the test data I used before. The actual 
data shouldn't make a big difference as long as the same data is used for both 
measurements (as you did).

As far as I know, java.util.HashMap always uses a power of two for the array 
size, while we use the load factor directly. When the array is larger, the map 
is usually faster because there are fewer collisions.
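
The difference between the two sizing strategies can be sketched roughly as 
follows. This is not the IndexHashTable code: directSize is only an assumed 
reading of "use the load factor directly", and countCollisions uses a 
simplified model in which a key collides if its slot is already taken. All 
names here are invented for the illustration.

```java
import java.util.Random;

public class TableSizing {

    // Smallest power of two >= n, for 1 <= n <= 2^30. java.util.HashMap
    // keeps its bucket array at a power-of-two length.
    static int nextPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    // Assumed rule: divide the number of entries by the load factor and
    // use the result as the array size, without rounding to a power of two.
    static int directSize(int entries, double loadFactor) {
        return (int) Math.ceil(entries / loadFactor);
    }

    // Counts keys whose slot (hash modulo table size) is already occupied.
    static int countCollisions(int[] hashes, int tableSize) {
        boolean[] used = new boolean[tableSize];
        int collisions = 0;
        for (int h : hashes) {
            int slot = Math.floorMod(h, tableSize);
            if (used[slot]) {
                collisions++;
            } else {
                used[slot] = true;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int entries = 1000;
        double loadFactor = 0.7;
        int[] hashes = new Random(42).ints(entries).toArray();

        int direct = directSize(entries, loadFactor);
        int powerOfTwo = nextPowerOfTwo(direct);

        System.out.println("direct size " + direct + ": "
                + countCollisions(hashes, direct) + " collisions");
        System.out.println("power-of-two size " + powerOfTwo + ": "
                + countCollisions(hashes, powerOfTwo) + " collisions");
    }
}
```

Rounding up to a power of two gives a larger array for the same number of 
entries, so a speed-up measured against HashMap may come partly from the 
lower effective load rather than from the implementation itself.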
                
> IndexHashTable can be improved
> ------------------------------
>
>                 Key: OPENNLP-397
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-397
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Maxent
>    Affects Versions: maxent-3.0.3-incubating
>            Reporter: Catalin Mititelu
>            Priority: Minor
>              Labels: patch
>         Attachments: patch-IndexHashTable.txt
>
>
> Running a profiler on POSTagger with a maxent model showed me a lot of CPU 
> usage in the IndexHashTable class. This class can be optimized to be faster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
