Nutch uses Log4j, so you could configure a different log level for the Hadoop classes. But I think that's bad practice, because the JobClient logs important information (the job counters) when a job finishes.
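If you still want to try it, something along these lines should quiet the mapred.* output. This is only a sketch in standard Log4j 1.x properties syntax; the logger names are simply the Hadoop package names visible in your output, not settings that Nutch ships with, and the file that matters is the log4j.properties the client JVM actually picks up (in deploy mode that is usually Hadoop's own conf/log4j.properties, since bin/nutch runs through the hadoop script):

  # keep Nutch's own classes at INFO so crawl.Injector etc. still report progress
  log4j.logger.org.apache.nutch=INFO
  # drop the MapReduce client chatter (mapred.FileInputFormat, mapred.JobClient progress)
  # note: this also hides the counters printed at job completion
  log4j.logger.org.apache.hadoop.mapred=WARN

Note also that the lines you marked are printed by the client process that submits the job, not by the tasks on the slaves, so editing the copies of log4j.properties on the slave nodes won't affect them.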
On Wednesday 21 March 2012 09:00:00 Andy Xue wrote:
> Hi,
>
> When using Nutch 1.4 and Hadoop 1.0.1 to do distributed crawling, the
> log4j logging is way too verbose. Which entries in the log4j.properties
> file should I modify in order to display only WARN and above messages? I
> have tried replacing "ALL", "DEBUG", and "INFO" with "WARN" in both the
> Hadoop and Nutch log4j.properties files, but it still doesn't work at all.
> I did build a new job file and copy the log4j.properties files to each
> slave node.
>
> I appreciate your help.
>
> Below is an example. Bold lines (marked with asterisks) are the ones that
> I want to remove:
> ===========================================================================
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: starting at 2012-03-21 18:42:39
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: crawlDb: /crawl_database/crawldb
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: urlDir: /urls
> 12/03/21 18:42:39 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
> *12/03/21 18:42:44 INFO mapred.FileInputFormat: Total input paths to process : 1*
> *12/03/21 18:42:44 INFO mapred.JobClient: Running job: job_201203211827_0011*
> *12/03/21 18:42:45 INFO mapred.JobClient: map 0% reduce 0%*
> *12/03/21 18:42:58 INFO mapred.JobClient: map 10% reduce 0%*
> *12/03/21 18:43:01 INFO mapred.JobClient: map 15% reduce 0%*
> *12/03/21 18:43:04 INFO mapred.JobClient: map 20% reduce 0%*
> *12/03/21 18:43:07 INFO mapred.JobClient: map 25% reduce 2%*
> *12/03/21 18:43:10 INFO mapred.JobClient: map 35% reduce 5%*
> *12/03/21 18:43:13 INFO mapred.JobClient: map 41% reduce 6%*
> *12/03/21 18:43:16 INFO mapred.JobClient: map 51% reduce 7%*
> *12/03/21 18:43:19 INFO mapred.JobClient: map 56% reduce 10%*
> *12/03/21 18:43:22 INFO mapred.JobClient: map 66% reduce 13%*
> *12/03/21 18:43:25 INFO mapred.JobClient: map 71% reduce 17%*
> *12/03/21 18:43:28 INFO mapred.JobClient: map 82% reduce 19%*
> *12/03/21 18:43:31 INFO mapred.JobClient: map 87% reduce 21%*
> *12/03/21 18:43:34 INFO mapred.JobClient: map 97% reduce 23%*
> *12/03/21 18:43:37 INFO mapred.JobClient: map 100% reduce 26%*
> *12/03/21 18:43:40 INFO mapred.JobClient: map 100% reduce 29%*
> *12/03/21 18:43:43 INFO mapred.JobClient: map 100% reduce 41%*
> *12/03/21 18:43:46 INFO mapred.JobClient: map 100% reduce 65%*
> *12/03/21 18:43:49 INFO mapred.JobClient: map 100% reduce 100%*
> *12/03/21 18:43:54 INFO mapred.JobClient: Job complete: job_201203211827_0011*
> *12/03/21 18:43:54 INFO mapred.JobClient: Counters: 31*
> *12/03/21 18:43:54 INFO mapred.JobClient: Job Counters*
> *12/03/21 18:43:54 INFO mapred.JobClient: Launched reduce tasks=6*
> *12/03/21 18:43:54 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=201894*
> *12/03/21 18:43:54 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Rack-local map tasks=14*
> *12/03/21 18:43:54 INFO mapred.JobClient: Launched map tasks=39*
> *12/03/21 18:43:54 INFO mapred.JobClient: Data-local map tasks=25*
> *12/03/21 18:43:54 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=274583*
> *12/03/21 18:43:54 INFO mapred.JobClient: File Input Format Counters*
> *12/03/21 18:43:54 INFO mapred.JobClient: Bytes Read=2339*
> *12/03/21 18:43:54 INFO mapred.JobClient: File Output Format Counters*
> *12/03/21 18:43:54 INFO mapred.JobClient: Bytes Written=817*
> *12/03/21 18:43:54 INFO mapred.JobClient: FileSystemCounters*
> *12/03/21 18:43:54 INFO mapred.JobClient: FILE_BYTES_READ=307*
> *12/03/21 18:43:54 INFO mapred.JobClient: HDFS_BYTES_READ=5615*
> *12/03/21 18:43:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1357045*
> *12/03/21 18:43:54 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=817*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map-Reduce Framework*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map output materialized bytes=1675*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map input records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce shuffle bytes=1651*
> *12/03/21 18:43:54 INFO mapred.JobClient: Spilled Records=10*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map output bytes=261*
> *12/03/21 18:43:54 INFO mapred.JobClient: Total committed heap usage (bytes)=6684082176*
> *12/03/21 18:43:54 INFO mapred.JobClient: CPU time spent (ms)=26660*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map input bytes=116*
> *12/03/21 18:43:54 INFO mapred.JobClient: SPLIT_RAW_BYTES=3276*
> *12/03/21 18:43:54 INFO mapred.JobClient: Combine input records=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce input records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce input groups=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Combine output records=0*
> *12/03/21 18:43:54 INFO mapred.JobClient: Physical memory (bytes) snapshot=8386580480*
> *12/03/21 18:43:54 INFO mapred.JobClient: Reduce output records=5*
> *12/03/21 18:43:54 INFO mapred.JobClient: Virtual memory (bytes) snapshot=21784285184*
> *12/03/21 18:43:54 INFO mapred.JobClient: Map output records=5*
> ===========================================================================
>
> Regards,
> Andy

--
Markus Jelsma - CTO - Openindex