I see, complex indeed. I'll manage for now. Thanks for your answer.

On Tuesday 28 September 2010 14:18:06 Andrzej Bialecki wrote:
> On 2010-09-28 13:55, Markus Jelsma wrote:
> > Thanks. Could we modify the code so it will only output the info before
> > the tasks are initialized? If so, how to proceed?
>
> This is a bit tricky, because the code is executed differently depending
> on whether it executes in local mode (or from a local application) and
> in distributed mode (or from one of the mapreduce tasks).
>
> In local mode resources are taken from a classpath determined during the
> execution of the driver application (the one with main()), and these may
> include (and often do!) multiple copies of local files, such as
> conf/nutch-site.xml and nutch-site.xml that is packed inside a job jar.
> Furthermore, plugins in local mode are NOT loaded from nutch.job, but
> instead from the plugins/ directory... so their composition may be
> different than the one that is used by distributed tasks.
>
> Now, the crux of the matter is that in order to print this list only
> once you would have to do this from the driver application - but when
> you run Nutch in distributed mode the driver application uses a
> different classpath than each of the tasks will use, so the list could
> be different, which would be very confusing...
>
> All in all, I think it's best to print it possibly many times from
> tasks, or not at all. This choice could be implemented as a logging
> level, or as a config property.
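
For illustration, the "logging level or config property" choice from the quoted reply could look roughly like the sketch below. It is not Nutch code: the class name PluginListReporter and the property name plugin.list.print are hypothetical, and java.util.Properties and java.util.logging stand in for whatever configuration and logging classes the real code uses, purely to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch only: decides whether a task prints the list of loaded plugins. */
class PluginListReporter {

    // Hypothetical property name, not defined by Nutch.
    static final String PRINT_KEY = "plugin.list.print";

    /** Option 1: a config property that defaults to "do not print". */
    static boolean enabledByProperty(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(PRINT_KEY, "false"));
    }

    /** Option 2: a logging level, so the list only shows at FINE or more verbose. */
    static boolean enabledByLogLevel(Logger log) {
        return log.isLoggable(Level.FINE);
    }

    /** Either switch is enough to turn the output on. */
    static boolean shouldPrint(Properties conf, Logger log) {
        return enabledByProperty(conf) || enabledByLogLevel(log);
    }

    /** Builds the message a task would emit, or null when output is switched off. */
    static String report(Properties conf, Logger log, Iterable<String> pluginIds) {
        if (!shouldPrint(conf, log)) {
            return null;
        }
        return "Loaded plugins: " + String.join(", ", pluginIds);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PRINT_KEY, "true");
        Logger log = Logger.getLogger(PluginListReporter.class.getName());
        // The plugin ids below are placeholders for the example.
        String msg = report(conf, log, Arrays.asList("plugin-a", "plugin-b"));
        if (msg != null) {
            System.out.println(msg);
        }
    }
}
```

Each task would still evaluate this against its own classpath, so the list can be printed many times or not at all, but never "once" from the driver.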
Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350