On Mar 14, 2011, at 8:10 AM, Jörn Kottmann wrote:

> On 3/9/11 3:33 PM, Grant Ingersoll wrote:
>> On Mar 7, 2011, at 7:16 AM, Jörn Kottmann wrote:
>> 
>>> On 3/6/11 1:37 PM, Grant Ingersoll wrote:
>>>> On Mar 5, 2011, at 2:13 PM, Jörn Kottmann wrote:
>>>>> I actually tried to ask how you would do that. I don't think it is super 
>>>>> simple. Can you please briefly
>>>>> explain what you have in mind?
>>>> From the looks of it, we'd just need to return the bestSequence object (or 
>>>> some larger containing object) to the user and not keep it (or other 
>>>> pieces that may change) as a member variable.  Granted, I'm still learning 
>>>> the code, so I may be misreading some things.  Still, simply changing the 
>>>> tag method to return the bestSequence would let the user make the 
>>>> appropriate calls to get the best outcome and the probabilities (or the 
>>>> probs() method could take the bestSequence object as a parameter if you 
>>>> wanted to keep that convenience).
>>>> 
>>>> I suppose I should just work up a patch, it would be a lot easier than 
>>>> discussing it in the abstract.
>>>> 
>>> There is also a cache which would then have to be created per call; we need 
>>> to measure how expensive that is compared to the current solution.
>>> 
>>> The POS Tagger should also use the new feature generation we made
>>> for the name finder, but that is not thread safe by design, because it has 
>>> state. The state is necessary to support per-document features like we have 
>>> in the name finder.
>>> 
>>> Do you think making the name finder and other components thread safe in the
>>> same way is also possible?
>> Not sure.  I only noticed it in the POS tagger.
>> 
>>> Right now we have the same thread-safety convention
>>> for all components, which I like because it is easy for someone new to 
>>> learn.
>>> When it is mixed, e.g. the POS Tagger thread safe but the name finder not, 
>>> people will get confused.
>> It is no doubt a hard problem.  There always seems to be a tradeoff between 
>> being easy to learn and being fast.  In my experience, most programmers 
>> aren't good at concurrent programming (and I certainly don't claim to be 
>> either), so it is hard to get right.  I think one of the big wins for us 
>> could be to make OpenNLP really fast, which would increase its viability 
>> and attract others.
> 
> Making OpenNLP much faster is of course good. When we discuss performance 
> changes we also need to know how much a change would speed things up. In my 
> eyes the most is currently to be gained by optimizing the feature 
> generation, making the caching more efficient, etc.
> 

Agreed.  If others aren't aware of it, YourKit gives free open-source 
licenses for its profiler to Apache committers.  Details are on their website.

> How much faster do you think the POS Tagger will be with your proposed change?

This change isn't about performance, it's about thread safety.  Like I said, 
instead of talking about it in the abstract, I'll put up a patch as soon as I 
get some spare time.
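Roughly, the pattern I have in mind looks like this (a minimal sketch only; 
the class and method names here are illustrative, not the actual OpenNLP API): 
the tag method builds a result object and returns it to the caller, so no 
mutable per-call state lives in a field and concurrent calls don't interfere.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only -- these classes are hypothetical stand-ins,
// not the real OpenNLP POSTaggerME / Sequence classes.
class Sequence {
    private final List<String> outcomes;
    private final double[] probs;

    Sequence(List<String> outcomes, double[] probs) {
        this.outcomes = outcomes;
        this.probs = probs;
    }

    List<String> getOutcomes() { return outcomes; }
    double[] getProbs() { return probs.clone(); }
}

class PosTaggerSketch {
    // Thread-safe variant: the best sequence is returned, not stored in a
    // member variable, so concurrent tag() calls cannot clobber each other.
    Sequence tag(String[] tokens) {
        // A real implementation would run a beam search here; this stub
        // just produces a placeholder tag and probability per token.
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        Arrays.fill(tags, "NN");
        Arrays.fill(probs, 1.0);
        return new Sequence(Arrays.asList(tags), probs);
    }
}
```

The caller then reads the outcomes and probabilities off the returned object, 
which also keeps the door open for a convenience probs() overload that takes 
the sequence as a parameter.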
