On Tue, Feb 01, 2005 at 10:38:06AM +0100, Andrzej Bialecki wrote:
> John X wrote:
> >Stefan,
> >
> >On Tue, Feb 01, 2005 at 01:55:03AM +0100, Stefan Groschupf wrote:
> >
> >>John,
> >>
> >>by the way, is the url filter multithreaded?
> >>Do you think it is possible to implement the url filter extension
> >>point multithreaded?
> >
> >
> >As far as I know, none of the tools that currently use URLFilter service
> >is multithreaded (WebDBInjector.java, UpdateDatabaseTool.java, etc.),
> >though it would be nice to make sure URLFilter plugins are thread-safe.
>
> I was involved in implementation of Nutch-based "multi-crawler". We
> wanted to run several Intranet crawls inside a single JVM - each crawl
> with its own set of parameters, filters and configuration. This proved
> to be rather difficult to implement, because in many places Nutch
> assumes there is only one processing task (i.e. 1 or more threads, like
> e.g. updatedb, generate, or fetch) per JVM.
>
> There is no concept of processing context, which would tie together
> plugins, filters, configuration parameters etc. This is now implemented
> as static methods on a couple of classes, the worst example being the
> use of LOG.severe to terminate processing.
>
> An alternative would be to pass instances of "NutchContext" to all
> processing tasks, so that they could read necessary parameters, or even
> retrieve instances of plugins, filters etc. Such context could also
> provide a data container to pass messages (like LOG.severe) to other
> parts of the processing chain.
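
For reference, a minimal sketch of how such a per-crawl context could look. This is only an illustration of the idea quoted above, not existing Nutch code: the names UrlFilter, PrefixFilter, NutchContext (beyond the name itself), applyFilters and report are all hypothetical, and the "return null to reject" convention is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical filter contract: return the URL to keep it, or null to reject it. */
interface UrlFilter {
    String filter(String url);
}

/** Hypothetical stateless filter that keeps only URLs starting with a prefix. */
final class PrefixFilter implements UrlFilter {
    private final String prefix;
    PrefixFilter(String prefix) { this.prefix = prefix; }
    public String filter(String url) { return url.startsWith(prefix) ? url : null; }
}

/** Hypothetical per-crawl context: owns its configuration, filters and messages. */
final class NutchContext {
    private final Map<String, String> config;
    private final List<UrlFilter> filters;
    // Thread-safe container so several threads of one crawl can report problems.
    private final List<String> messages = new CopyOnWriteArrayList<>();

    NutchContext(Map<String, String> config, List<? extends UrlFilter> filters) {
        // Defensive copies: the context cannot be changed from outside after construction.
        this.config = Collections.unmodifiableMap(new HashMap<String, String>(config));
        this.filters = Collections.unmodifiableList(new ArrayList<UrlFilter>(filters));
    }

    String get(String key, String defaultValue) {
        String value = config.get(key);
        return value != null ? value : defaultValue;
    }

    /** Runs the URL through this crawl's filters; stops at the first rejection. */
    String applyFilters(String url) {
        String current = url;
        for (UrlFilter f : filters) {
            if (current == null) {
                return null;
            }
            current = f.filter(current);
        }
        return current;
    }

    /** Records a message for other parts of the chain instead of terminating the JVM. */
    void report(String message) { messages.add(message); }

    List<String> messages() { return Collections.unmodifiableList(messages); }
}

class MultiCrawlDemo {
    public static void main(String[] args) {
        // Two independent crawls in one JVM, each with its own filters and settings.
        Map<String, String> confA = new HashMap<>();
        confA.put("crawl.name", "intranet-a");
        NutchContext a = new NutchContext(confA,
                Collections.singletonList(new PrefixFilter("http://a.")));

        Map<String, String> confB = new HashMap<>();
        confB.put("crawl.name", "intranet-b");
        NutchContext b = new NutchContext(confB,
                Collections.singletonList(new PrefixFilter("http://b.")));

        String url = "http://a.example/index.html";
        for (NutchContext ctx : new NutchContext[] { a, b }) {
            if (ctx.applyFilters(url) == null) {
                ctx.report(ctx.get("crawl.name", "unnamed") + ": rejected " + url);
            }
        }
        System.out.println(a.messages());
        System.out.println(b.messages());
    }
}
```

The context itself is immutable apart from the message list, so sharing it between the threads of one crawl is safe as long as the filter instances it holds are themselves thread-safe, which is the property asked about at the start of the thread.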

Do you have template code?

John

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
