Briggs wrote:
> What is the design contract on plugins when it comes to thread safety?
> I was under the assumption that plugins should be thread safe, but I
> have been running into concurrent modification exceptions from the
> language identifier plugin while indexing.  My application is a bit

They should be thread-safe. For example, Fetcher runs many threads in parallel,
and each thread uses plugins to handle fetching, parsing, URL filtering,
and so on.
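
To illustrate what that contract means in practice, here is a minimal sketch that assumes a single plugin instance is shared by every worker thread, as in a Fetcher-style thread pool. UrlFilterPlugin, HttpOnlyFilter and SharedPluginDemo are hypothetical names for illustration, not Nutch's actual plugin API. The filter is safe to share because it keeps no mutable state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical plugin contract, loosely modeled on a URL filter (NOT Nutch's real API). */
interface UrlFilterPlugin {
    /** Returns the URL if it is accepted, or null to reject it. */
    String filter(String url);
}

/** Stateless plugin: it has no mutable fields, so one instance can be shared by all threads. */
class HttpOnlyFilter implements UrlFilterPlugin {
    @Override
    public String filter(String url) {
        return (url.startsWith("http://") || url.startsWith("https://")) ? url : null;
    }
}

class SharedPluginDemo {
    /** Filters the URLs on a thread pool where every task calls the same plugin instance. */
    static int countAccepted(UrlFilterPlugin plugin, List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String url : urls) {
                Callable<Boolean> task = () -> plugin.filter(url) != null;
                results.add(pool.submit(task));
            }
            int accepted = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    accepted++;
                }
            }
            return accepted;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A plugin that caches results or reuses buffers in instance fields would break this pattern unless those fields are guarded by a lock or replaced with thread-safe structures.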


> different from the normal Nutch way.  I have many crawls going on
> concurrently within an application.  So, that means I would also have
> many concurrent indexing tasks.  So, if plugins aren't guaranteed to be
> thread-safe, I may need to do a nasty thing and synchronize
> my index() method (ouch).
> 
> 
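
If a particular plugin does turn out not to be thread-safe, the lock does not have to cover the whole index() method: it can be confined to that one plugin. The sketch below is an assumption-laden illustration, not Nutch code. IndexingPlugin, CountingPlugin, SynchronizedIndexingPlugin and runConcurrently are hypothetical names, and IndexingPlugin is not the signature of Nutch's real IndexingFilter.

```java
/** Hypothetical single-method plugin contract (NOT Nutch's real IndexingFilter API). */
interface IndexingPlugin {
    String index(String text);
}

/** Deliberately thread-unsafe plugin: the unsynchronized counter can lose updates. */
class CountingPlugin implements IndexingPlugin {
    int calls;

    @Override
    public String index(String text) {
        calls++; // read-modify-write, not atomic without a lock
        return text.toLowerCase();
    }
}

/** Wrapper that serializes every call to the wrapped plugin on a private lock. */
final class SynchronizedIndexingPlugin implements IndexingPlugin {
    private final IndexingPlugin delegate;
    private final Object lock = new Object();

    SynchronizedIndexingPlugin(IndexingPlugin delegate) {
        this.delegate = delegate;
    }

    @Override
    public String index(String text) {
        synchronized (lock) {
            return delegate.index(text);
        }
    }

    /** Calls plugin.index() from several threads and waits for all of them to finish. */
    static void runConcurrently(IndexingPlugin plugin, int threads, int callsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    plugin.index("Some Text");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

The trade-off behind the "(ouch)" still applies: every thread now queues on that one lock, so this is a stopgap until the plugin itself is fixed.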
> Here is the exception, just for info:
> 
> java.util.ConcurrentModificationException
>        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
>        at java.util.HashMap$ValueIterator.next(HashMap.java:817)
>        at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)
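
For context on the exception itself: HashMap iterators are fail-fast. If the map is structurally modified at any time after an iterator is created, other than through that iterator's own remove method, the next call to next() throws ConcurrentModificationException. The hypothetical FailFastDemo below triggers this deterministically in a single thread; in the trace above the modification presumably came from another thread, where the HashMap documentation notes that detection is only best-effort.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    /**
     * Returns true if adding a key to a HashMap while iterating over its
     * values causes the iterator to throw ConcurrentModificationException.
     */
    static boolean modifyWhileIterating() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("th", 3);
        counts.put("he", 2);
        counts.put("in", 1);
        try {
            for (Integer value : counts.values()) {
                // Adding a new key is a structural change that invalidates the live iterator.
                counts.put("extra-" + value, value);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```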

This is a bug. My guess is that NGramProfile.getSorted() should be 
synchronized. Could you please test if this works?
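
A sketch of the kind of fix suggested here, using a hypothetical and much-simplified LazySortedProfile class rather than the real NGramProfile source. It assumes the sorted view is built lazily on first use: without synchronization, two threads can both see the cache as empty and iterate the map while another thread is still modifying it. Marking getSorted() synchronized only helps if every other method that touches the same map locks the same monitor, which is why add() is synchronized too.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a shared n-gram profile (NOT the real NGramProfile). */
class LazySortedProfile {
    private final Map<String, Integer> counts = new HashMap<>();
    private List<String> sorted; // lazily built cache, guarded by 'this'

    /** Records one occurrence of an n-gram and invalidates the cached sorted view. */
    synchronized void add(String ngram) {
        counts.merge(ngram, 1, Integer::sum);
        sorted = null;
    }

    /** Builds the sorted view at most once per modification; callers share the result. */
    synchronized List<String> getSorted() {
        if (sorted == null) {
            List<String> keys = new ArrayList<>(counts.keySet());
            // Highest count first; ties broken alphabetically for a stable order.
            keys.sort((a, b) -> {
                int byCount = Integer.compare(counts.get(b), counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            });
            // Unmodifiable, so callers cannot corrupt the shared cache.
            sorted = Collections.unmodifiableList(keys);
        }
        return sorted;
    }
}
```

Returning an unmodifiable list matters here: once several threads hold the same cached list, any caller that sorted or trimmed it in place would reintroduce the race.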

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
