[ 
https://issues.apache.org/jira/browse/OPENNLP-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037955#comment-13037955
 ] 

Jörn Kottmann commented on OPENNLP-29:
--------------------------------------

To parallelize the implementation, the computation of the model expects should 
be performed in multiple threads, since a significant amount of training time 
is spent there.

The model expects are updated frequently during the computation, which makes 
parallelization inefficient because updates to them would need to be 
synchronized. I evaluated different synchronization strategies (lock-free 
updates and locking), but all of them improved the training runtime only 
marginally. The additional computation power is almost entirely lost to more 
expensive writes and waiting time.
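
For illustration only, the two kinds of synchronized update mentioned above 
could look roughly like the following Java sketch. It is not the code that was 
evaluated for this issue; the class name SharedExpectsSketch and all of its 
methods are hypothetical. The lock-free variant stores each double as raw long 
bits in an AtomicLongArray and retries a compare-and-set, while the locking 
variant guards a plain array with a monitor.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: one shared expectation array updated by all threads.
public class SharedExpectsSketch {

    // Lock-free storage: each double is kept as its raw long bit pattern.
    private final AtomicLongArray bits;

    // Lock-based storage: a plain array guarded by a single monitor.
    private final double[] locked;
    private final Object lock = new Object();

    public SharedExpectsSketch(int numParams) {
        bits = new AtomicLongArray(numParams); // all-zero bits represent 0.0
        locked = new double[numParams];
    }

    // Lock-free update: retry the compare-and-set until no other thread
    // has changed the slot between our read and our write.
    public void addLockFree(int index, double delta) {
        long oldBits;
        long newBits;
        do {
            oldBits = bits.get(index);
            double updated = Double.longBitsToDouble(oldBits) + delta;
            newBits = Double.doubleToLongBits(updated);
        } while (!bits.compareAndSet(index, oldBits, newBits));
    }

    // Locking update: every writer has to wait for the shared monitor.
    public void addLocked(int index, double delta) {
        synchronized (lock) {
            locked[index] += delta;
        }
    }

    public double getLockFree(int index) {
        return Double.longBitsToDouble(bits.get(index));
    }

    public double getLocked(int index) {
        synchronized (lock) {
            return locked[index];
        }
    }
}
```

Both variants keep the result correct, but every single update pays for a 
retry loop or a monitor, which is the kind of extra write and waiting cost 
described above.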

For these reasons, the following strategy turned out to work almost as well as 
no synchronization.
The model expects are local to each thread, and the n copies are joined after 
computing them. This solution is almost as fast as not synchronizing the model 
expect updates at all (which of course results in incorrect parameters, but 
seems good enough as a runtime performance baseline). The solution has the 
disadvantage that the required amount of memory grows with the number of 
threads, but this is not seen as a problem because the model expects usually 
need only a few tens of MB of memory per copy, and modern multi-core systems 
usually have many GBs of memory. Additionally, this parallelization strategy 
makes good use of the CPU core caches compared to a solution which shares the 
model expects.
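
A minimal Java sketch of the thread-local strategy, under stated assumptions: 
the class ThreadLocalExpectsSketch, the Event type, and the computeExpects 
method are hypothetical names used for illustration and are not the actual 
GIS trainer code. Each task accumulates into its own array without any 
synchronization, and the copies are summed once all tasks have finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: per-thread expectation arrays that are merged at the end.
public class ThreadLocalExpectsSketch {

    // Stand-in for a training event: the indices of the parameters it touches
    // and the amount it contributes to each of them.
    public static final class Event {
        final int[] paramIndex;
        final double[] contribution;

        public Event(int[] paramIndex, double[] contribution) {
            this.paramIndex = paramIndex;
            this.contribution = contribution;
        }
    }

    public static double[] computeExpects(List<Event> events, int numParams,
                                          int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<double[]>> parts = new ArrayList<>();
            // Ceiling division so that every event lands in exactly one slice.
            int chunk = (events.size() + numThreads - 1) / numThreads;

            for (int t = 0; t < numThreads; t++) {
                int from = Math.min(t * chunk, events.size());
                int to = Math.min(from + chunk, events.size());
                List<Event> slice = events.subList(from, to);

                parts.add(pool.submit(() -> {
                    // Private copy: no other thread ever writes to this array.
                    double[] local = new double[numParams];
                    for (Event e : slice) {
                        for (int i = 0; i < e.paramIndex.length; i++) {
                            local[e.paramIndex[i]] += e.contribution[i];
                        }
                    }
                    return local;
                }));
            }

            // Join step: sum the n copies in the calling thread.
            double[] merged = new double[numParams];
            for (Future<double[]> part : parts) {
                double[] local = part.get();
                for (int i = 0; i < numParams; i++) {
                    merged[i] += local[i];
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

The memory cost is one array of numParams doubles per task, which corresponds 
to the per-thread copies discussed above; the only coordination point is the 
final merge.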


> Add multi threading support to GIS training
> -------------------------------------------
>
>                 Key: OPENNLP-29
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-29
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Maxent
>            Reporter: Jörn Kottmann
>            Assignee: Jörn Kottmann
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>
> The GIS training is famous for taking quite some time to finish. Nowadays 
> CPUs have many cores. 
> The training algorithm should be updated to use multiple CPU cores to perform 
> the training.
> There are various approaches to solve this task; we will document them in 
> the wiki and
> discuss them on the mailing list.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
