[
https://issues.apache.org/jira/browse/OPENNLP-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154219#comment-13154219
]
Catalin Mititelu commented on OPENNLP-397:
------------------------------------------
I used a profiler to find out why POS tagging is "so slow". I also ran some
tests before and after the patch. I'm running on an i7 machine with 16GB of
memory, and the model used is en-pos-maxent.bin. The test file is about 13M.
The results follow:
Before the patch (3 steps):
1st step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt
>samples/ebooks-en-pos-maxent.txt
Loading POS Tagger model ... done (1.192s)
Average: 3285.3 sent/s
Total: 281320 sent
Runtime: 85.629s
2nd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt
>samples/ebooks-en-pos-maxent2.txt
Loading POS Tagger model ... done (1.136s)
Average: 3926.6 sent/s
Total: 281320 sent
Runtime: 71.644s
3rd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt
>samples/ebooks-en-pos-maxent3.txt
Loading POS Tagger model ... done (0.930s)
Average: 3952.2 sent/s
Total: 281320 sent
Runtime: 71.181s
After the patch (using a HashMap), again in 3 steps:
1st step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt
>samples/ebooks-en-pos-maxent-patched.txt
Loading POS Tagger model ... done (0.920s)
Average: 5711.3 sent/s
Total: 281320 sent
Runtime: 49.257s
2nd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt
>samples/ebooks-en-pos-maxent-patched2.txt
Loading POS Tagger model ... done (0.927s)
Average: 5739.8 sent/s
Total: 281320 sent
Runtime: 49.012s
3rd step
bin/opennlp POSTagger models/en-pos-maxent.bin <samples/ebooks.txt
>samples/ebooks-en-pos-maxent-patched3.txt
Loading POS Tagger model ... done (0.928s)
Average: 5716.5 sent/s
Total: 281320 sent
Runtime: 49.212s
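
To put the six runs side by side, the reported runtimes can be averaged and compared. The following is a minimal sketch, not part of the patch: the class name PosTaggerSpeedup and its methods are hypothetical, and the only inputs are the "Runtime" values listed above. On these numbers the patched runs come out roughly 1.5 times faster on average.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of OpenNLP or the attached patch.
public class PosTaggerSpeedup {
    // "Runtime" values in seconds, copied from the three steps before the patch.
    public static final double[] BEFORE = {85.629, 71.644, 71.181};
    // "Runtime" values in seconds, copied from the three steps after the patch.
    public static final double[] AFTER = {49.257, 49.012, 49.212};

    // Arithmetic mean of the given values (NaN for an empty array).
    public static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Ratio of mean runtime before to mean runtime after; > 1 means faster.
    public static double speedup(double[] before, double[] after) {
        return mean(before) / mean(after);
    }

    public static void main(String[] args) {
        System.out.printf("mean runtime before: %.3f s%n", mean(BEFORE));
        System.out.printf("mean runtime after:  %.3f s%n", mean(AFTER));
        System.out.printf("speedup:             %.2fx%n", speedup(BEFORE, AFTER));
    }
}
```

Note that the first unpatched step is noticeably slower than the other two, so the mean slightly overstates the gain compared with the steady-state runs.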
I don't have any information yet about how much memory is needed.
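
For readers without the attachment at hand, the general idea of a HashMap-backed index can be sketched as below. This is an assumption-laden illustration, not the contents of patch-IndexHashTable.txt: the class name HashMapIndex, the String key type, and the convention of returning -1 for a missing key are all hypothetical choices made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a key-to-index lookup backed by java.util.HashMap.
// It is NOT the attached patch and not the real IndexHashTable class.
public class HashMapIndex {
    private final Map<String, Integer> indexByKey;

    // Maps each key to its position in the given array.
    public HashMapIndex(String[] keys) {
        // Pre-size the map so it does not need to rehash while being filled
        // (0.75 is HashMap's default load factor).
        int capacity = Math.max(16, (int) (keys.length / 0.75f) + 1);
        indexByKey = new HashMap<>(capacity);
        for (int i = 0; i < keys.length; i++) {
            if (indexByKey.put(keys[i], i) != null) {
                throw new IllegalArgumentException("duplicate key: " + keys[i]);
            }
        }
    }

    // Returns the index of the key, or -1 if the key is unknown
    // (the -1 convention is an assumption made for this sketch).
    public int get(String key) {
        Integer index = indexByKey.get(key);
        return index == null ? -1 : index;
    }

    public int size() {
        return indexByKey.size();
    }
}
```

On the memory question: a HashMap keeps one entry object per key and stores the index as a boxed Integer, so a map like this will generally use more memory than a hand-written table of plain arrays. How much more would have to be measured.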
Regards,
Catalin
> IndexHashTable can be improved
> ------------------------------
>
> Key: OPENNLP-397
> URL: https://issues.apache.org/jira/browse/OPENNLP-397
> Project: OpenNLP
> Issue Type: Improvement
> Components: Maxent
> Affects Versions: maxent-3.0.3-incubating
> Reporter: Catalin Mititelu
> Priority: Minor
> Labels: patch
> Attachments: patch-IndexHashTable.txt
>
>
> Running a profiler on POSTagger with a maxent model showed a lot of CPU
> usage in the IndexHashTable class. This class can be optimized to be faster.