[
https://issues.apache.org/jira/browse/OPENNLP-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037955#comment-13037955
]
Jörn Kottmann commented on OPENNLP-29:
--------------------------------------
In order to parallelize the implementation, the computation of the model expects
should be performed in multiple threads, since a significant amount of time is
spent there during training.
The model expects are frequently updated during the computation, which makes
parallelization inefficient because updates to the model expects would need to
be synchronized. I evaluated different synchronization strategies (lock-free
updates and locking), but all of them improved the training runtime only
marginally. The additional computation power is almost entirely lost to more
expensive writes and waiting time.
For this reason, the following strategy turned out to work almost as well as
no synchronization.
The model expects are kept local to each thread, and the n copies are joined
after they have been computed. This solution is almost as fast as not
synchronizing the model expect updates at all (which of course results in
incorrect parameters, but serves well enough as a runtime performance
baseline). The solution has the disadvantage that the required amount of
memory grows with the number of threads, but this is not seen as a problem
because the model expects usually only need several tens of MB of memory per
copy, and modern multi-core systems usually have many GB of memory.
Additionally, this parallelization strategy makes good use of the per-core CPU
caches compared to a solution which shares the model expects.
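The thread-local strategy described above can be sketched roughly as follows. This is an illustrative toy, not OpenNLP's actual GIS code: the class and field names (ParallelExpects, NUM_PREDS, computeLocalExpects) and the trivial update rule are all assumptions standing in for the real expected-count computation.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the per-thread "model expects" strategy: each worker accumulates
// expected feature counts into its own local array, and the n copies are
// summed in a join step once all workers have finished. No synchronization
// is needed while the workers run, because each writes only its own copy.
public class ParallelExpects {

    static final int NUM_PREDS = 4;    // number of features (illustrative)
    static final int NUM_THREADS = 2;

    // Each worker gets a disjoint slice of the events and its own copy of
    // the expects array. The update here is a stand-in for the real one.
    static double[] computeLocalExpects(int[] eventSlice) {
        double[] local = new double[NUM_PREDS];
        for (int event : eventSlice) {
            local[event % NUM_PREDS] += 1.0;
        }
        return local;
    }

    public static void main(String[] args) throws Exception {
        int[] events = {0, 1, 2, 3, 0, 1, 2, 3};
        ExecutorService pool = Executors.newFixedThreadPool(NUM_THREADS);
        List<Future<double[]>> futures = new ArrayList<>();

        // Split the events across the threads.
        int chunk = events.length / NUM_THREADS;
        for (int t = 0; t < NUM_THREADS; t++) {
            int[] slice = Arrays.copyOfRange(events, t * chunk, (t + 1) * chunk);
            futures.add(pool.submit(() -> computeLocalExpects(slice)));
        }

        // Join step: sum the per-thread copies into the global expects.
        double[] global = new double[NUM_PREDS];
        for (Future<double[]> f : futures) {
            double[] local = f.get();
            for (int i = 0; i < NUM_PREDS; i++) {
                global[i] += local[i];
            }
        }
        pool.shutdown();
        System.out.println(Arrays.toString(global));
    }
}
```

The memory cost is one extra array per thread, which matches the trade-off discussed above: a few extra copies of the expects in exchange for contention-free writes and better cache locality.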
> Add multi threading support to GIS training
> -------------------------------------------
>
> Key: OPENNLP-29
> URL: https://issues.apache.org/jira/browse/OPENNLP-29
> Project: OpenNLP
> Issue Type: Improvement
> Components: Maxent
> Reporter: Jörn Kottmann
> Assignee: Jörn Kottmann
> Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>
> The GIS training is famous for taking quite some time to finish. Nowadays
> CPUs have many cores.
> The training algorithm should be updated to use multiple CPU cores to perform
> the training.
> There are various approaches to solve this task; we will document them in
> the wiki and discuss them on the mailing list.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira