I'm doing some testing with Nutch 2.0 and I noticed a possible issue.  When
I call nutch as follows:

Fetch -all -threads 100 -parse

The hadoop.log does not seem to indicate that the parameters are ever used.
In the log file all I see is:
2010-12-10 16:22:00,487 INFO  fetcher.FetcherJob - FetcherJob: threads: 10
2010-12-10 16:22:00,487 INFO  fetcher.FetcherJob - FetcherJob: parsing: false

These are the default settings, not the values I specified on the command line.

Is it actually using the parameters?  The logs suggest it isn't, but I think
it is.

If I insert a couple of LOG.info statements in run(Map<String,Object> args):

    LOG.info("USED FetcherJob: threads: " + getConf().getInt(THREADS_KEY, 10));
    LOG.info("USED FetcherJob: parsing: " + getConf().getBoolean(PARSE_KEY, true));

it appears that the values are used, just not recorded in the hadoop.log file.

It may be better to put the LOG.info statements in run(), after the arguments
are read, rather than in the fetch method.  Or do both, but make it clear when
a command-line value overrides the one from the conf file.  That would make it
easier to understand what is going on.
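As a rough illustration of that suggestion, here is a minimal, self-contained Java sketch that resolves each setting once, after the arguments are read, and logs where the value came from. It is not Nutch code: plain Maps stand in for Hadoop's Configuration and for the args map, and the class name, the resolve() helper, and the key names "threads" and "parse" are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch, not Nutch's actual code. Plain Maps stand in for
// Hadoop's Configuration and for the args map passed to run(); the key
// names are illustrative, not Nutch's real property names.
class EffectiveSettings {
    private static final Logger LOG =
            Logger.getLogger(EffectiveSettings.class.getName());

    // Returns the effective value: command line wins over the conf file,
    // and the conf file wins over the built-in default. Logs the source.
    static String resolve(String key, Map<String, Object> args,
                          Map<String, String> conf, String defaultValue) {
        String value;
        String source;
        if (args.get(key) != null) {
            value = String.valueOf(args.get(key));
            source = "command line (overrides conf)";
        } else if (conf.containsKey(key)) {
            value = conf.get(key);
            source = "conf file";
        } else {
            value = defaultValue;
            source = "default";
        }
        LOG.info("FetcherJob: " + key + ": " + value + " [from " + source + "]");
        return value;
    }

    public static void main(String[] argv) {
        Map<String, String> conf = new HashMap<>();
        conf.put("threads", "10");   // value supplied by the conf file

        Map<String, Object> args = new HashMap<>();
        args.put("threads", 100);    // as if from -threads 100
        args.put("parse", true);     // as if from -parse

        int threads = Integer.parseInt(resolve("threads", args, conf, "10"));
        boolean parse = Boolean.parseBoolean(resolve("parse", args, conf, "false"));
        System.out.println("threads=" + threads + " parse=" + parse);
    }
}
```

Running main() prints "threads=100 parse=true", and the log lines say each value came from the command line, which is the kind of record that would answer the question from the log file alone.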

Thanks
Brad
