I'm doing some testing with Nutch 2.0 and I noticed a possible issue. When
I call Nutch as follows:
Fetch -all -threads 100 -parse
The hadoop.log does not seem to indicate that the parameters are ever used.
In the log file all I see is:
2010-12-10 16:22:00,487 INFO fetcher.FetcherJob - FetcherJob: threads: 10
2010-12-10 16:22:00,487 INFO fetcher.FetcherJob - FetcherJob: parsing: false
These are the default settings, not what was specified on the command line.
Is it actually using the parameters? The logs don't seem to show it is, but
I think it is.
If I insert a couple of LOG.info statements in run(Map<String,Object> args):
LOG.info("USED FetcherJob: threads: " + getConf().getInt(THREADS_KEY, 10));
LOG.info("USED FetcherJob: parsing: " + getConf().getBoolean(PARSE_KEY, true));
it appears the values are used, just not recorded in the hadoop.log file.
It may be better to put the LOG.info statements in run after the arguments
are read, rather than in the fetch method. Or do both, but show that the
command line is overriding the conf file. It would make it easier to
understand what is going on.
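Something along these lines is what I have in mind. It is only a standalone sketch, not a patch against FetcherJob: plain maps stand in for the conf file and the parsed command-line arguments, the key strings are placeholders, and describe() is a made-up helper, not an existing Nutch method.

```java
import java.util.HashMap;
import java.util.Map;

public class EffectiveSettingsSketch {
    // Placeholder key strings; the real constants in FetcherJob may differ.
    static final String THREADS_KEY = "fetcher.threads.fetch";
    static final String PARSE_KEY = "fetcher.parse";

    // Hypothetical helper: builds a log message that says where a value came from.
    // argValue is null when the option was not given on the command line.
    static String describe(String name, Object confValue, Object argValue) {
        if (argValue == null) {
            return name + ": " + confValue + " (from conf)";
        }
        return name + ": " + argValue
                + " (command line overrides conf value " + confValue + ")";
    }

    public static void main(String[] unused) {
        // Stand-in for the values loaded from the conf file.
        Map<String, Object> conf = new HashMap<>();
        conf.put(THREADS_KEY, 10);
        conf.put(PARSE_KEY, false);

        // Stand-in for the arguments parsed from "-threads 100 -parse".
        Map<String, Object> args = new HashMap<>();
        args.put(THREADS_KEY, 100);
        args.put(PARSE_KEY, true);

        // Log after the arguments are read, so the effective values are recorded.
        System.out.println("FetcherJob: "
                + describe("threads", conf.get(THREADS_KEY), args.get(THREADS_KEY)));
        System.out.println("FetcherJob: "
                + describe("parsing", conf.get(PARSE_KEY), args.get(PARSE_KEY)));
    }
}
```

In the real job the same messages would go through LOG.info instead of System.out, at the point in run where the arguments have already been applied.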
Thanks
Brad