Stefan,
On Tue, Feb 01, 2005 at 01:55:03AM +0100, Stefan Groschupf wrote:
> John,
> by the way, is the url filter multithreaded?
> Do you think it is possible to implement the url filter extension point multithreaded?
As far as I know, none of the tools that currently use the URLFilter service are multithreaded (WebDBInjector.java, UpdateDatabaseTool.java, etc.), though it would be nice to make sure URLFilter plugins are thread-safe.
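To illustrate what "thread-safe" could mean for a filter, here is a minimal sketch. It is not an existing Nutch class: SuffixRejectFilter and its filter(String) method are hypothetical names, and I am assuming the usual convention that a filter returns the URL when it accepts it and null when it rejects it. The filter keeps only immutable state (a compiled java.util.regex.Pattern, which is safe to share between threads) and creates a fresh Matcher on every call, so one instance can be used by several threads at once.

```java
import java.util.regex.Pattern;

class ThreadSafeFilterSketch {
    public static void main(String[] args) throws InterruptedException {
        // One shared filter instance, used concurrently by two threads.
        final SuffixRejectFilter filter = new SuffixRejectFilter("\\.(gif|jpg|png)$");

        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) {
                    filter.filter("http://example.com/page" + i + ".html");
                    filter.filter("http://example.com/image" + i + ".gif");
                }
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(filter.filter("http://example.com/index.html"));
        System.out.println(filter.filter("http://example.com/logo.gif"));
    }
}

// Hypothetical filter: returns the URL if accepted, null if rejected.
final class SuffixRejectFilter {
    // Pattern is immutable and safe to share; Matcher is not, so a new
    // Matcher is created per call instead of being stored in a field.
    private final Pattern reject;

    SuffixRejectFilter(String regex) {
        this.reject = Pattern.compile(regex);
    }

    public String filter(String url) {
        return reject.matcher(url).find() ? null : url;
    }
}
```

The point of the sketch is only that a filter with no mutable fields needs no locking; a filter that caches per-call state in fields would need synchronization or one instance per thread.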
I was involved in the implementation of a Nutch-based "multi-crawler". We wanted to run several intranet crawls inside a single JVM, each crawl with its own set of parameters, filters and configuration. This proved rather difficult to implement, because in many places Nutch assumes there is only one processing task per JVM (a task being one or more threads doing e.g. updatedb, generate, or fetch).
There is no concept of a processing context that would tie together plugins, filters, configuration parameters etc. Instead, all of this is currently reached through static methods on a couple of classes, the worst example being the use of LOG.severe to terminate processing.
An alternative would be to pass an instance of "NutchContext" to every processing task, so that the task could read the parameters it needs, or even retrieve instances of plugins, filters etc. Such a context could also provide a data container for passing messages (like LOG.severe) to other parts of the processing chain.
If we go with this approach, then we don't have to make plugins multi-threaded, because there will always be a single instance per "processing task".
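A rough sketch of what such a context might look like, purely for illustration. Nothing here exists in Nutch today: the methods on NutchContext, the UrlFilter interface, PrefixFilter and the parameter name are all hypothetical. Each processing task gets its own context holding its parameters, its own filter instances, and a message list that replaces the "LOG.severe terminates everything" behaviour with a per-task failure flag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiCrawlSketch {
    public static void main(String[] args) {
        // Two crawls in one JVM, each with its own parameters and filters.
        NutchContext crawlA = new NutchContext();
        crawlA.set("fetcher.threads", "4"); // hypothetical parameter name
        crawlA.addFilter(new PrefixFilter("http://intranet-a.example/"));

        NutchContext crawlB = new NutchContext();
        crawlB.addFilter(new PrefixFilter("http://intranet-b.example/"));

        String url = "http://intranet-a.example/index.html";
        System.out.println("A accepts: " + (crawlA.filter(url) != null));
        System.out.println("B accepts: " + (crawlB.filter(url) != null));

        // A severe error in crawl B is recorded in B's context only.
        crawlB.reportSevere("disk full");
        System.out.println("A failed: " + crawlA.hasFailed());
        System.out.println("B failed: " + crawlB.hasFailed());
    }
}

// Hypothetical filter contract: return the URL if accepted, null if rejected.
interface UrlFilter {
    String filter(String url);
}

final class PrefixFilter implements UrlFilter {
    private final String prefix;
    PrefixFilter(String prefix) { this.prefix = prefix; }
    public String filter(String url) { return url.startsWith(prefix) ? url : null; }
}

// Hypothetical per-task context: parameters, filter instances and messages.
final class NutchContext {
    private final Map<String, String> params = new HashMap<String, String>();
    private final List<UrlFilter> filters = new ArrayList<UrlFilter>();
    private final List<String> messages = new ArrayList<String>();
    private boolean failed = false;

    void set(String key, String value) { params.put(key, value); }

    String get(String key, String defaultValue) {
        String value = params.get(key);
        return value != null ? value : defaultValue;
    }

    void addFilter(UrlFilter filter) { filters.add(filter); }

    // Runs the URL through this task's filters; null means rejected.
    String filter(String url) {
        String current = url;
        for (UrlFilter f : filters) {
            if (current == null) break;
            current = f.filter(current);
        }
        return current;
    }

    // Records a severe error for this task instead of stopping the JVM.
    void reportSevere(String message) { messages.add(message); failed = true; }

    boolean hasFailed() { return failed; }

    List<String> getMessages() { return messages; }
}
```

With something along these lines, a task would check hasFailed() on its own context at suitable points and stop itself, leaving other crawls in the same JVM untouched.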
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
