On 2/22/11 4:55 PM, Grant Ingersoll wrote:
Hi,
I'm using 1.4.3, but it looks like trunk has the same issue: the POSTaggerME
class doesn't appear to be thread safe, though perhaps I am misreading it. I
ask because the bestSequence instance is captured in a member variable, and
the tag and probs methods both access that variable. I want to use this inside
Solr, which is multithreaded and could be serving a lot of requests, and I
certainly can't afford to load the model for each request. The fix for this
particular class seems relatively straightforward, at the cost of breaking
backward compatibility of the API (which is a whole other topic).
I haven't looked deeper, but are there any other classes with thread-safety
issues that I should be aware of?
The components are not thread safe. Each instance must only be called from
one thread.
How do you run OpenNLP in multiple threads, then?
The models are thread safe because they are effectively immutable, so one
model can be shared between multiple instances of the same component. They
are not strictly immutable, so make sure to publish them safely to the other
threads. Just create one component instance per thread and share the model
instance. In your case I guess you can just use a ThreadLocal, combined with
lazy initialization, to maintain one instance per thread.
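
A minimal sketch of that pattern follows. It does not use the real OpenNLP
classes: Model and Tagger are hypothetical stand-ins for an immutable model
object and a stateful component such as POSTaggerME, and the names
SHARED_MODEL, TAGGER and distinctTaggersAcross are made up for illustration.
In a real application you would load the model once and construct the actual
tagger in the initializer instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadTagger {

    // Hypothetical stand-in for an immutable model that is loaded once.
    static final class Model {
        private final String defaultTag;
        Model(String defaultTag) { this.defaultTag = defaultTag; }
        String defaultTag() { return defaultTag; }
    }

    // Hypothetical stand-in for a stateful component: it keeps the result of
    // the last call in a field, so one instance must not be shared by threads.
    static final class Tagger {
        private final Model model;
        private String[] lastResult;
        Tagger(Model model) { this.model = model; }
        String[] tag(String[] tokens) {
            lastResult = new String[tokens.length];
            Arrays.fill(lastResult, model.defaultTag());
            return lastResult;
        }
    }

    // A static final field is safely published to every thread.
    static final Model SHARED_MODEL = new Model("NN");

    // One Tagger per thread, created lazily on that thread's first get().
    static final ThreadLocal<Tagger> TAGGER =
            ThreadLocal.withInitial(() -> new Tagger(SHARED_MODEL));

    // Entry point that request-handling threads would call; no locking needed.
    static String[] tag(String[] tokens) {
        return TAGGER.get().tag(tokens);
    }

    // Starts the given number of threads and counts how many distinct Tagger
    // instances they end up using (identity comparison, equals not overridden).
    static int distinctTaggersAcross(int threads) {
        Set<Tagger> seen = Collections.newSetFromMap(new ConcurrentHashMap<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                tag(new String[] {"some", "tokens"});
                seen.add(TAGGER.get());
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }
}
```

Every thread that calls PerThreadTagger.tag(...) gets its own Tagger, while
all of them read the same Model. If the threads come from a long-lived pool,
as in a servlet container, each pooled thread keeps its instance for as long
as the thread lives.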
This way we are lock free and avoid multi-threading code that is difficult
to understand and test. Making sure that our models are immutable is easy,
and even if we make a mistake there, it is unlikely that a user changes the
model in an application like yours. In the end I believe that we found a
really simple, solid, and nice solution for this problem.
Hope that helps,
Jörn