[ 
https://issues.apache.org/jira/browse/OPENNLP-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154219#comment-13154219
 ] 

Catalin Mititelu commented on OPENNLP-397:
------------------------------------------

I used a profiler to find out why POS tagging is "so slow". I also ran some 
tests before and after the patch. I'm running on an i7 machine with 16GB of 
memory, and the model used is en-pos-maxent.bin. The test file is about 13 MB. 
The results are as follows:

Before the patch (3 steps): 
1st step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt 
>samples/ebooks-en-pos-maxent.txt
Loading POS Tagger model ... done (1.192s)
Average: 3285.3 sent/s 
Total: 281320 sent
Runtime: 85.629s


2nd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt 
>samples/ebooks-en-pos-maxent2.txt
Loading POS Tagger model ... done (1.136s)
Average: 3926.6 sent/s 
Total: 281320 sent
Runtime: 71.644s


3rd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt 
>samples/ebooks-en-pos-maxent3.txt
Loading POS Tagger model ... done (0.930s)
Average: 3952.2 sent/s 
Total: 281320 sent
Runtime: 71.181s


After the patch (using a HashMap), again in 3 steps:

1st step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt 
>samples/ebooks-en-pos-maxent-patched.txt
Loading POS Tagger model ... done (0.920s)
Average: 5711.3 sent/s 
Total: 281320 sent
Runtime: 49.257s


2nd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt 
>samples/ebooks-en-pos-maxent-patched2.txt
Loading POS Tagger model ... done (0.927s)
Average: 5739.8 sent/s 
Total: 281320 sent
Runtime: 49.012s

3rd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt 
>samples/ebooks-en-pos-maxent-patched3.txt
Loading POS Tagger model ... done (0.928s)
Average: 5716.5 sent/s 
Total: 281320 sent
Runtime: 49.212s
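
The runtimes above can be reduced to a single speedup figure. This is a minimal sketch, assuming the mean of the three runs is a fair summary of each configuration; the class and method names are illustrative only and not part of OpenNLP:

```java
import java.util.Arrays;

class SpeedupEstimate {

    // Arithmetic mean of a set of runtimes, in seconds.
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Ratio of mean runtime before the patch to mean runtime after it.
    static double speedup(double[] before, double[] after) {
        return mean(before) / mean(after);
    }

    public static void main(String[] args) {
        // Runtimes copied from the three runs reported for each configuration.
        double[] before = {85.629, 71.644, 71.181};
        double[] after = {49.257, 49.012, 49.212};

        System.out.printf("Mean runtime before: %.3f s%n", mean(before));
        System.out.printf("Mean runtime after:  %.3f s%n", mean(after));
        System.out.printf("Speedup:             %.2fx%n", speedup(before, after));
    }
}
```

The first unpatched run is noticeably slower than the other two, possibly a cold-cache effect, so comparing only the 2nd and 3rd runs of each configuration gives a more conservative estimate.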


I don't have any information about how much memory is required.
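
For readers who have not opened the attachment, the general idea of a HashMap-backed index lookup can be sketched as below. This is only an illustration of the technique, not the content of patch-IndexHashTable.txt; the class name PredicateIndex, its methods, and the sample keys are hypothetical, and returning -1 for a missing key is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a key-to-array-index lookup table.
class PredicateIndex {

    private final Map<String, Integer> indexByKey;

    // Maps each key to its position in the given array.
    PredicateIndex(String[] keys) {
        // Pre-size the map so it does not need to rehash while filling.
        indexByKey = new HashMap<>((int) (keys.length / 0.75f) + 1);
        for (int i = 0; i < keys.length; i++) {
            if (indexByKey.put(keys[i], i) != null) {
                throw new IllegalArgumentException("Duplicate key: " + keys[i]);
            }
        }
    }

    // Returns the index of the key, or -1 if the key is unknown (assumed convention).
    int get(String key) {
        Integer index = indexByKey.get(key);
        return index == null ? -1 : index;
    }

    int size() {
        return indexByKey.size();
    }

    public static void main(String[] args) {
        PredicateIndex index = new PredicateIndex(new String[] {"w=the", "w=cat", "suf=ing"});
        System.out.println(index.get("w=cat"));
        System.out.println(index.get("unknown"));
    }
}
```

A java.util.HashMap stores one entry object and one boxed Integer per key, so it will generally use more memory than a table built on plain arrays; how much more would have to be measured.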

Regards,
Catalin
                
> IndexHashTable can be improved
> ------------------------------
>
>                 Key: OPENNLP-397
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-397
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Maxent
>    Affects Versions: maxent-3.0.3-incubating
>            Reporter: Catalin Mititelu
>            Priority: Minor
>              Labels: patch
>         Attachments: patch-IndexHashTable.txt
>
>
> Running a profiler on POSTagger with a maxent model showed me a lot of CPU 
> usage in the IndexHashTable class. This class can be optimized to be faster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
