On 3/9/11 3:33 PM, Grant Ingersoll wrote:
On Mar 7, 2011, at 7:16 AM, Jörn Kottmann wrote:

On 3/6/11 1:37 PM, Grant Ingersoll wrote:
On Mar 5, 2011, at 2:13 PM, Jörn Kottmann wrote:
I actually tried to ask how you would do that. I don't think it is super
simple. Can you please briefly
explain what you have in mind?
 From the looks of it, we'd just need to return the bestSequence object (or
some larger containing object) to the user, instead of using it (or other pieces
that may change) as a member variable.  Granted, I'm still learning the code,
so I may be misreading some things.  From the looks of it, though, simply
changing the tag method to return the bestSequence would let the user make the
appropriate calls to get the best outcome and the probabilities (or the probs()
method could take the bestSequence object as a parameter if you wanted to keep
that convenience)
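A minimal sketch of the idea, with illustrative names (TagResult, StatelessTagger, and the dummy decoding are hypothetical, not the actual OpenNLP API): the tag method returns an immutable result object instead of storing the best sequence in a field, so a single tagger instance can be shared across threads.

```java
import java.util.Arrays;
import java.util.List;

// Immutable result returned per call, standing in for "bestSequence
// or some larger containing object".
final class TagResult {
    private final List<String> outcomes;
    private final double[] probs;

    TagResult(List<String> outcomes, double[] probs) {
        this.outcomes = outcomes;
        this.probs = probs.clone();
    }

    List<String> getOutcomes() { return outcomes; }
    double[] getProbs() { return probs.clone(); }
}

final class StatelessTagger {
    // No mutable fields: nothing from one call can leak into another,
    // so tag() is safe to call from multiple threads.
    TagResult tag(String[] tokens) {
        // Dummy decoding standing in for the real beam search.
        String[] outcomes = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            outcomes[i] = "NN";
            probs[i] = 1.0;
        }
        return new TagResult(Arrays.asList(outcomes), probs);
    }
}
```

A probs() convenience method could then simply delegate to the returned TagResult, keeping the current API surface.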

I suppose I should just work up a patch, it would be a lot easier than 
discussing it in the abstract.

There is also a cache which would then have to be created per call; we need to
measure how expensive that is compared to the current solution.
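One option worth measuring alongside the per-call cache: a per-thread cache via ThreadLocal, so each thread reuses its own cache across calls without synchronization. This is only a sketch with made-up names (ContextCache, getOrCompute), not the actual OpenNLP cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ContextCache {
    // Each thread gets its own map: no locking, and the cache survives
    // across calls on the same thread instead of being rebuilt per call.
    private static final ThreadLocal<Map<String, String[]>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    static String[] getOrCompute(String key, Function<String, String[]> compute) {
        return CACHE.get().computeIfAbsent(key, compute);
    }
}
```

The trade-off is memory (one cache per thread) versus allocation cost (one cache per call), which is exactly what the measuring should tell us.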

The POS Tagger should also use the new feature generation stuff we made
for the name finder, but that is not thread safe by design, because it has
state. The state is necessary to support per-document features like the ones
we have in the name finder.
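To illustrate why that state blocks thread safety, here is a toy generator in the spirit of OpenNLP's adaptive feature generation (the class and the "pd=" feature are made up; OpenNLP's AdaptiveFeatureGenerator interface has similar update/clear methods, but this is not its actual implementation): tags decoded earlier in the document feed into the features of later sentences, so two threads sharing one instance would corrupt each other's document state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PreviousTagFeatureGenerator {
    // Mutable per-document state: tags assigned earlier in the document.
    // Sharing one instance across threads mixes state from different documents.
    private final Map<String, String> previousTags = new HashMap<>();

    // Emits a feature based on how this token was tagged earlier in the document.
    List<String> createFeatures(String token) {
        List<String> features = new ArrayList<>();
        String prev = previousTags.get(token);
        features.add("pd=" + (prev == null ? "null" : prev));
        return features;
    }

    // Called after each sentence to record the decoded tags.
    void updateAdaptiveData(String[] tokens, String[] tags) {
        for (int i = 0; i < tokens.length; i++) {
            previousTags.put(tokens[i], tags[i]);
        }
    }

    // Called at a document boundary to reset the state.
    void clearAdaptiveData() {
        previousTags.clear();
    }
}
```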

Do you think making the name finder and other components thread safe in the
same way is also possible?
Not sure.  I only noticed it in the POS tagger.

Right now we have the same thread-safety convention
for all components, which I like because it is easy for someone new to learn.
When it is mixed, e.g. the POS Tagger thread safe and the name finder not, then
people will get confused.
It is no doubt a hard problem.   There is always this tradeoff between easy to 
learn and fast, it seems.  In my experience, most programmers aren't good at 
concurrent programming (and I certainly don't claim to be either) and so it is 
hard to get it right.   I think one of the big wins for us could be to make 
OpenNLP really fast, which will increase its viability and attract others.

Making OpenNLP much faster is of course good. When we discuss performance
changes we also need to know how much a change would speed things up. In my
eyes the most is currently to be gained from optimizing the feature generation,
making the caching more efficient, etc.

How much faster do you think the POS Tagger will be with your proposed change?

Jörn
