[ https://issues.apache.org/jira/browse/OPENNLP-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044776#comment-13044776 ]
Jörn Kottmann commented on OPENNLP-196:
---------------------------------------
First test without the fix:
Got 64691 sequences
Indexing events using cutoff of 5
Computing event counts... done. 1422335 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1422335
Number of Outcomes: 45
Number of Predicates: 103444
Computing model parameters...
Performing 5 iterations.
1: . (1338465/1422335) 0.9410335821026692
2: . (1362083/1422335) 0.9576386716209613
3: . (1368995/1422335) 0.962498286268706
4: . (1373167/1422335) 0.9654314911747233
5: . (1376065/1422335) 0.9674689858577621
. (1381396/1422335) 0.971217048023145
...done.
Writing pos tagger model ... Compressed 103444 parameters to 74134
22966 outcome patterns
done (4.927s)
Wrote pos tagger model to
path: /Users/joern/dev/opennlp-apache/opennlp/opennlp-tools/en-pos.bin
real 24m35.059s
user 24m31.178s
sys 1m10.894s
Second test with the fix:
Got 64691 sequences
Indexing events using cutoff of 5
Computing event counts... done. 1422335 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1422335
Number of Outcomes: 45
Number of Predicates: 103444
Computing model parameters...
Performing 5 iterations.
1: . (1338465/1422335) 0.9410335821026692
2: . (1362083/1422335) 0.9576386716209613
3: . (1368995/1422335) 0.962498286268706
4: . (1373167/1422335) 0.9654314911747233
5: . (1376065/1422335) 0.9674689858577621
. (1381396/1422335) 0.971217048023145
...done.
Writing pos tagger model ... Compressed 103444 parameters to 74134
22966 outcome patterns
done (5.564s)
Wrote pos tagger model to
path: /Users/joern/dev/opennlp-apache/opennlp/opennlp-tools/en-pos.bin
real 14m34.409s
user 13m28.532s
sys 0m36.698s
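
Both logs report the same event counts and the same per-iteration accuracies, so the only visible difference between the two runs is the elapsed time. As a rough illustration of how the two "real" values compare, the following sketch converts them to seconds and computes the ratio. The class and method names are made up for this example and are not part of OpenNLP; the numbers come from a single run each, so they are an indication rather than a benchmark.

```java
import java.util.Locale;

public class TimingComparison {

    // Parses a duration in the format printed by the shell `time` command,
    // e.g. "24m35.059s", and returns it in seconds.
    static double toSeconds(String t) {
        int m = t.indexOf('m');
        double minutes = Double.parseDouble(t.substring(0, m));
        double seconds = Double.parseDouble(t.substring(m + 1, t.length() - 1));
        return minutes * 60 + seconds;
    }

    public static void main(String[] args) {
        double before = toSeconds("24m35.059s"); // "real" time without the fix
        double after = toSeconds("14m34.409s");  // "real" time with the fix

        System.out.printf(Locale.ROOT, "before: %.3f s, after: %.3f s%n", before, after);
        System.out.printf(Locale.ROOT, "speedup: %.2fx, time saved: %.1f%%%n",
                before / after, 100 * (before - after) / before);
    }
}
```

For these two runs this works out to roughly a 1.69x speedup, or about 40.7% less wall-clock time.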
> POS Tagger Sequence streams calls generateEvents in a loop
> -----------------------------------------------------------
>
> Key: OPENNLP-196
> URL: https://issues.apache.org/jira/browse/OPENNLP-196
> Project: OpenNLP
> Issue Type: Bug
> Components: POS Tagger
> Affects Versions: tools-1.5.1-incubating
> Reporter: Jörn Kottmann
> Assignee: Jörn Kottmann
> Priority: Trivial
> Fix For: tools-1.5.2-incubating
>
>
> The POS Tagger Sequence Stream class calls generateEvents in a loop, but one
> call is enough.
> To fix this issue, remove the loop around generateEvents.
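
The pattern described in the issue can be sketched roughly as follows. This is a self-contained illustration, not the actual OpenNLP source: the class, the simplified generateEvents signature, and the assumption that the loop ran once per token are all hypothetical. It only shows why a call whose result does not depend on the loop variable can be moved out of the loop without changing the output.

```java
import java.util.ArrayList;
import java.util.List;

public class RedundantLoopSketch {

    // Counts how often the (hypothetical) expensive step is invoked.
    static int calls = 0;

    // Hypothetical stand-in for an event generator: one event per token.
    static List<String> generateEvents(String[] tokens, String[] tags) {
        calls++;
        List<String> events = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            events.add(tokens[i] + "/" + tags[i]);
        }
        return events;
    }

    // Assumed shape of the bug: the call sits inside a loop, so the same
    // events for the whole sentence are recomputed on every iteration.
    static List<String> withLoop(String[] tokens, String[] tags) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            events = generateEvents(tokens, tags); // does not depend on i
        }
        return events;
    }

    // Shape of the fix: a single call produces the same events.
    static List<String> withoutLoop(String[] tokens, String[] tags) {
        return generateEvents(tokens, tags);
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barks"};
        String[] tags = {"DT", "NN", "VBZ"};

        calls = 0;
        List<String> looped = withLoop(tokens, tags);
        int loopedCalls = calls;

        calls = 0;
        List<String> single = withoutLoop(tokens, tags);
        int singleCalls = calls;

        System.out.println("same events: " + looped.equals(single));
        System.out.println("calls with loop: " + loopedCalls + ", without loop: " + singleCalls);
    }
}
```

Under these assumptions the redundant work grows with sentence length, which would be consistent with the identical training output and shorter run time reported in the comment above.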