I see, complex indeed. I'll manage for now. Thanks for your answer.

On Tuesday 28 September 2010 14:18:06 Andrzej Bialecki wrote:
> On 2010-09-28 13:55, Markus Jelsma wrote:
> > Thanks. Could we modify the code so it will only output the info before
> > the tasks are initialized? If so, how to proceed?
> 
> This is a bit tricky, because the code is executed differently depending
> on whether it runs in local mode (or from a local application) or in
> distributed mode (or from one of the mapreduce tasks).
> 
> In local mode resources are taken from a classpath determined during the
> execution of the driver application (the one with main()), and these may
> include (and often do!) multiple copies of the same file, such as
> conf/nutch-site.xml and the nutch-site.xml that is packed inside a job jar.
> Furthermore, plugins in local mode are NOT loaded from nutch.job, but
> instead from the plugins/ directory... so their composition may be
> different than the one that is used by distributed tasks.
> 
> Now, the crux of the matter is that in order to print this list only
> once you would have to do this from the driver application - but when
> you run Nutch in distributed mode the driver application uses a
> different classpath than each of the tasks will use, so the list could
> be different, which would be very confusing...
> 
> All in all, I think it's best to print it possibly many times from
> tasks, or not at all. This choice could be implemented as a logging
> level, or as a config property.
> 
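For reference, the "config property" option mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions: the class PluginListReporter and the property name plugin.list.print are hypothetical and do not exist in Nutch, and plain java.util.Properties and java.util.logging stand in for the Hadoop Configuration and the logger that Nutch actually uses.

```java
import java.util.List;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, NOT part of Nutch: it only illustrates gating the
// plugin list output behind a config property and a logging level.
final class PluginListReporter {

    // Hypothetical property name, invented for this sketch.
    static final String PRINT_KEY = "plugin.list.print";

    private static final Logger LOG =
        Logger.getLogger(PluginListReporter.class.getName());

    private PluginListReporter() {}

    // Defaults to true so that the list is still printed when the
    // property is not set at all.
    static boolean shouldPrint(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(PRINT_KEY, "true"));
    }

    // Builds the single log line that lists the plugin ids.
    static String format(List<String> pluginIds) {
        return "Registered plugins: " + String.join(", ", pluginIds);
    }

    // Logs the list at INFO level if both the property and the logging
    // level allow it. Returns true if the list was actually logged.
    static boolean report(Properties conf, List<String> pluginIds) {
        if (!shouldPrint(conf) || !LOG.isLoggable(Level.INFO)) {
            return false;
        }
        LOG.info(format(pluginIds));
        return true;
    }
}
```

Each task would call report() with its own configuration, so the output still reflects the classpath that the task really uses; setting the property to false simply silences it everywhere.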

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
