What is the design contract for plugins when it comes to thread safety? I was under the assumption that plugins should be thread safe, but I have been running into ConcurrentModificationExceptions from the language identifier plugin while indexing. My application is a bit different from the normal Nutch setup: I have many crawls going on concurrently within one application, which means I also have many concurrent indexing tasks. So, if I can't be guaranteed that plugins are thread safe, I may need to do a nasty thing and synchronize my index() method (ouch).
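To make the workaround concrete, here is a rough sketch of the two options I can see short of synchronizing all of index(): guard only the call into the shared plugin with a lock, or give each thread its own instance. Everything here is hypothetical and only illustrates the pattern. UnsafeProfile is a made-up stand-in for a plugin that keeps mutable state in a HashMap, and GuardedFilterSketch, filter() and filterPerThread() are names I invented, not Nutch APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class GuardedFilterSketch {

    /** Hypothetical stand-in for a plugin with shared mutable state (not a Nutch class). */
    public static class UnsafeProfile {
        private final Map<String, Integer> counts = new HashMap<String, Integer>();

        /** Counts character bigrams in text and returns how many were seen. */
        public int analyze(String text) {
            counts.clear();
            for (int i = 0; i + 2 <= text.length(); i++) {
                String gram = text.substring(i, i + 2);
                Integer old = counts.get(gram);
                counts.put(gram, old == null ? 1 : old + 1);
            }
            int total = 0;
            // Iterating the map is where a ConcurrentModificationException
            // can surface if another thread is mutating it at the same time.
            for (int c : counts.values()) {
                total += c;
            }
            return total;
        }
    }

    // Option 1: one shared instance, with only the plugin call serialized.
    private final UnsafeProfile shared = new UnsafeProfile();
    private final Object lock = new Object();

    public int filter(String text) {
        synchronized (lock) {
            return shared.analyze(text);
        }
    }

    // Option 2: one instance per thread, so no state is shared and no lock is needed.
    private static final ThreadLocal<UnsafeProfile> PER_THREAD =
        new ThreadLocal<UnsafeProfile>() {
            @Override
            protected UnsafeProfile initialValue() {
                return new UnsafeProfile();
            }
        };

    public static int filterPerThread(String text) {
        return PER_THREAD.get().analyze(text);
    }
}
```

Option 1 still serializes the plugin call across crawls, but the rest of index() can run in parallel. Option 2 avoids the lock entirely, at the cost of one instance per thread, and only works if the plugin can be instantiated cheaply and has no other shared state.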
Here is the exception, just for info:

java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
    at java.util.HashMap$ValueIterator.next(HashMap.java:817)
    at org.apache.nutch.analysis.lang.NGramProfile.normalize(NGramProfile.java:277)
    at org.apache.nutch.analysis.lang.NGramProfile.analyze(NGramProfile.java:244)
    at org.apache.nutch.analysis.lang.LanguageIdentifier.identify(LanguageIdentifier.java:409)
    at org.apache.nutch.analysis.lang.LanguageIndexingFilter.filter(LanguageIndexingFilter.java:84)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:131)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:240)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)

--briggs
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers