Using this model is also important from another point of view: with the current code, where NutchConf is a singleton, it's not possible to run several tasks in parallel within a single JVM with radically different parameters. E.g., if you want to run several CrawlTool instances with different parameters under a single JVM, that is currently not possible. With the setConfig() change it becomes possible.
But is that really how a search platform will be tuned in "real life"?
On the contrary, I was thinking that NutchConf must be a singleton across many JVMs (across many nodes).
No? Isn't that the real use case?
Well, different people use it differently, I guess - this shows that Nutch has a much greater potential than originally thought. I hacked my way around that limitation, others do it similarly...
It seems to me that there are two major use cases for Nutch. One is Internet-wide crawling, and you are right that there it would not be practical to run more than one processing task per JVM. But the second use case is also common: Intranet or selective Web crawling. Here the resources needed to run individual tasks are smaller, so it is often convenient to run several tasks in parallel under the same JVM.
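To make the difference concrete, here is a minimal sketch of the per-task configuration idea, assuming each task is handed its own configuration object instead of reading a JVM-wide singleton. The names TaskConf, CrawlTask, ParallelCrawlSketch and the "crawl.depth" key are hypothetical stand-ins, not the actual NutchConf or CrawlTool API, and the setConfig() signature shown is only a guess at what the proposed change might look like.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical per-task configuration object (a stand-in, not NutchConf).
final class TaskConf {
    private final Map<String, String> props = new HashMap<>();

    TaskConf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

// Hypothetical task that is given its configuration explicitly
// instead of looking it up in a global singleton.
final class CrawlTask implements Callable<String> {
    private final String name;
    private TaskConf conf = new TaskConf(); // empty config: defaults apply

    CrawlTask(String name) {
        this.name = name;
    }

    // Assumed shape of the setConfig() change discussed in this thread.
    void setConfig(TaskConf conf) {
        this.conf = conf;
    }

    @Override
    public String call() {
        // A real task would crawl; this one only reports the setting it sees.
        return name + ": depth=" + conf.get("crawl.depth", "1");
    }
}

public class ParallelCrawlSketch {
    // Runs all tasks concurrently in one JVM; results keep the input order.
    static List<String> runAll(List<CrawlTask> tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlTask intranet = new CrawlTask("intranet");
        intranet.setConfig(new TaskConf().set("crawl.depth", "10"));

        CrawlTask selective = new CrawlTask("selective");
        selective.setConfig(new TaskConf().set("crawl.depth", "3"));

        for (String line : runAll(List.of(intranet, selective))) {
            System.out.println(line);
        }
    }
}
```

With a singleton, both tasks would necessarily see the same "crawl.depth"; here each task sees only the configuration it was given, which is what the Intranet and selective-crawl case needs.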
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
