On 2/22/11 4:55 PM, Grant Ingersoll wrote:
Hi,

I'm using 1.4.3, but it looks like trunk has the same issue.  That is, it 
doesn't appear that the POSTaggerME class is thread safe, but perhaps I am 
misreading it.  It looks that way because the bestSequence instance is captured 
in a member variable, and the tag and probs methods both access it.  I want to 
use this inside of Solr, which is multithreaded and could be serving a lot of 
requests, and I certainly can't afford to load the model for each request.  The 
fix for this particular class seems relatively straightforward, at the cost of 
breaking backward compatibility of the API (which is a whole other topic).

I haven't looked deeper, but can people think of any other classes whose thread 
safety I should be aware of?

The components are not thread safe. They must only be called from one thread.
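To see why such a component breaks when it is shared, here is a minimal sketch of the pattern described in the question. StatefulTagger is a hypothetical class, not OpenNLP code: its tag() stores its result in a member field that probs() reads later, similar to how bestSequence is described above. Both calls run on one thread so the bad interleaving is deterministic; with real threads the same mix-up happens unpredictably.

```java
import java.util.Arrays;

// Hypothetical component (not OpenNLP code): tag() stores its result in a
// member field that probs() reads later.
class StatefulTagger {
    private double[] bestProbs; // per-instance state, overwritten by every tag() call

    String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            tags[i] = "NN";                  // dummy tag
            probs[i] = 1.0 / tokens.length;  // dummy probability, depends on input length
        }
        bestProbs = probs;
        return tags;
    }

    double[] probs() {
        return bestProbs; // whatever the LAST tag() call on this instance stored
    }
}

public class SharedStateDemo {
    public static void main(String[] args) {
        StatefulTagger shared = new StatefulTagger();
        // Simulated interleaving of two "threads" using one instance:
        shared.tag(new String[] {"a", "b"});           // "thread A" tags 2 tokens
        shared.tag(new String[] {"x", "y", "z", "w"}); // "thread B" tags 4 tokens
        double[] aProbs = shared.probs();              // "thread A" reads B's result
        // prints [0.25, 0.25, 0.25, 0.25]: B's four values, not the two A expects
        System.out.println(Arrays.toString(aProbs));
    }
}
```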
How do you run OpenNLP in multiple threads, then?

The models are thread-safe because they are immutable, and can be shared
between multiple instances of the same component. (Strictly speaking they are only
effectively immutable, so make sure they are safely published to the other threads,
for example by holding the model in a static final field or loading it before the
worker threads start.) Just create one component instance per thread and share the
model instance. In your case I guess you can just use a ThreadLocal to maintain one
instance per thread, combined with lazy initialization.
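A minimal sketch of the ThreadLocal approach, assuming Java 8 or later. SharedModel and Tagger are hypothetical stand-ins, not OpenNLP classes: in OpenNLP the shared role is played by the model object and the per-thread role by the *ME component (for example POSModel and POSTaggerME in later releases). loadModel() is a placeholder for the expensive model load.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model stand-in: state is fixed at construction and only read afterwards.
final class SharedModel {
    private final Map<String, String> tags;

    SharedModel(Map<String, String> tags) {
        this.tags = Collections.unmodifiableMap(new HashMap<>(tags));
    }

    String lookup(String token) {
        return tags.getOrDefault(token, "NN");
    }
}

// Hypothetical component stand-in: keeps per-call state in a field, so it is NOT thread safe.
final class Tagger {
    private final SharedModel model;
    private String[] lastTags; // mutable per-instance state

    Tagger(SharedModel model) {
        this.model = model;
    }

    String[] tag(String[] tokens) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = model.lookup(tokens[i]);
        }
        lastTags = out;
        return out;
    }
}

public class PerThreadTagger {
    // Loaded once and shared by all threads; class initialization publishes it safely.
    private static final SharedModel MODEL = loadModel();

    // One Tagger per thread, created lazily the first time that thread calls get().
    private static final ThreadLocal<Tagger> TAGGER =
            ThreadLocal.withInitial(() -> new Tagger(MODEL));

    private static SharedModel loadModel() {
        // Placeholder for the expensive load; a real application would read a model file here.
        Map<String, String> m = new HashMap<>();
        m.put("the", "DT");
        m.put("dog", "NN");
        m.put("runs", "VBZ");
        return new SharedModel(m);
    }

    // The Tagger bound to the calling thread (exposed for illustration).
    static Tagger current() {
        return TAGGER.get();
    }

    static String[] tag(String[] tokens) {
        return current().tag(tokens);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String[]>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                results.add(pool.submit(() -> tag(new String[] {"the", "dog", "runs"})));
            }
            for (Future<String[]> f : results) {
                System.out.println(String.join(" ", f.get())); // "DT NN VBZ" from every task
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With a pool of four threads, at most four Tagger instances are created however many requests arrive, and the model is loaded only once. No locks are involved. The trade-off is that each instance lives as long as its thread, which suits long-lived worker pools such as those in a servlet container.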

This way we are lock free and avoid multi-threading code that is difficult to
understand and test. Making sure that our models are immutable is easy, and even if
we make a mistake there, it is unlikely that a user changes the model in an
application like yours. In the end I believe that we found a really simple, solid
and nice solution for this problem.

Hope that helps,
Jörn
