Hi,

> Any ideas if this can cause problems
Yes, it can definitely cause problems. I've just observed such a problem
in our custom plugin which traverses the DOM tree to extract nodes by CSS3
selectors.

> and how to make it thread safe?
That's hard, if not impossible: the internal state
(current node, stack, etc.) has to be stored somewhere.
The solution is to make sure that a NodeWalker instance
is never shared between threads. It's a small class, so
you can create it as a local variable in the filter() method
of your plugin (or in any other method):

 public ParseResult filter(..., DocumentFragment doc) {
   ...
   // local variable: every call (and so every thread) gets its own walker
   NodeWalker walker = new NodeWalker(doc);
   while (walker.hasNext()) {
     ...
   }
   ...
 }

That should be safe. The problems seen and discussed in the thread
"Wrong ParseData in segment" result from a DOM traverser held in an
instance variable (member variable, field) and therefore shared
between threads.
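
To illustrate the pattern, here is a minimal, self-contained sketch. It
does not use Nutch: SimpleWalker is a hypothetical stand-in for NodeWalker
that also keeps its traversal state in a field, and LocalWalkerDemo,
buildDoc() and runConcurrently() are names made up for this example. Each
task builds its own DOM and creates its own walker as a local variable, so
no traversal state is shared between threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class LocalWalkerDemo {

  /** Hypothetical stand-in for NodeWalker: traversal state lives in a field. */
  static final class SimpleWalker {
    private final Deque<Node> stack = new ArrayDeque<>();

    SimpleWalker(Node root) {
      stack.push(root);
    }

    boolean hasNext() {
      return !stack.isEmpty();
    }

    Node nextNode() {
      Node current = stack.pop();
      // Push children last-to-first so they are popped in document order.
      for (Node c = current.getLastChild(); c != null; c = c.getPreviousSibling()) {
        stack.push(c);
      }
      return current;
    }
  }

  /** The walker is a local variable, so concurrent calls do not interfere. */
  static int countElements(Node root) {
    SimpleWalker walker = new SimpleWalker(root);
    int count = 0;
    while (walker.hasNext()) {
      if (walker.nextNode().getNodeType() == Node.ELEMENT_NODE) {
        count++;
      }
    }
    return count;
  }

  /** Builds a small document: one root element with the given number of children. */
  static Document buildDoc(int children) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element root = doc.createElement("html");
    doc.appendChild(root);
    for (int i = 0; i < children; i++) {
      root.appendChild(doc.createElement("p"));
    }
    return doc;
  }

  /** Runs one traversal per thread; task i walks a document with i child elements. */
  static List<Integer> runConcurrently(int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int i = 0; i < threads; i++) {
        final int children = i;
        Callable<Integer> task = () -> countElements(buildDoc(children));
        futures.add(pool.submit(task));
      }
      List<Integer> results = new ArrayList<>();
      for (Future<Integer> f : futures) {
        results.add(f.get());
      }
      return results;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runConcurrently(4)); // prints [1, 2, 3, 4]
  }
}
```

If the walker were instead stored in a field of a plugin object that several
threads call into, concurrent calls would push to and pop from the same
stack, which is the kind of interference described above.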

Cheers,
Sebastian

On 01/16/2013 06:51 PM, [email protected] wrote:
> Hello,
> 
> I use this class  NodeWalker at 
> src/java/org/apache/nutch/util/NodeWalker.java in one of our plugins. I 
> noticed this comment 
> //Currently this class is not thread safe.  It is assumed that only one   
> thread will be accessing the <code>NodeWalker</code> at any given time."
> above the class definition.
> 
> Any ideas if this can cause problems and how to make it thread safe?
> 
> Thanks.
> Alex.
> 
