Hi Sebastian,

I am referring to the INFO messages that are printed to the console when Nutch 1.14 is running in distributed mode. For example, injecting seed URLs:
/mnt/nutch/runtime/deploy/bin/nutch inject /user/hadoop/crawlDIR/crawldb seed.txt
17/07/29 06:51:18 INFO crawl.Injector: Injector: starting at 2017-07-29 06:51:18
17/07/29 06:51:18 INFO crawl.Injector: Injector: crawlDb: /user/hadoop/crawlDIR/crawldb
17/07/29 06:51:18 INFO crawl.Injector: Injector: urlDir: seed.txt
17/07/29 06:51:18 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
17/07/29 06:51:19 INFO client.RMProxy: Connecting to ResourceManager at ip-*-*-*-*.ec2.internal/*.*.*.*:8032
17/07/29 06:51:20 INFO input.FileInputFormat: Total input paths to process : 0
17/07/29 06:51:20 INFO input.FileInputFormat: Total input paths to process : 1
...
17/07/29 06:51:20 INFO mapreduce.Job: Running job: job_1500749038440_0003
17/07/29 06:51:28 INFO mapreduce.Job: Job job_1500749038440_0003 running in uber mode : false
17/07/29 06:51:28 INFO mapreduce.Job: map 0% reduce 0%
17/07/29 06:51:33 INFO mapreduce.Job: map 100% reduce 0%
17/07/29 06:51:38 INFO mapreduce.Job: map 100% reduce 4%
17/07/29 06:51:40 INFO mapreduce.Job: map 100% reduce 6%
17/07/29 06:51:41 INFO mapreduce.Job: map 100% reduce 49%
17/07/29 06:51:42 INFO mapreduce.Job: map 100% reduce 66%
17/07/29 06:51:43 INFO mapreduce.Job: map 100% reduce 87%
17/07/29 06:51:44 INFO mapreduce.Job: map 100% reduce 100%

I am running Nutch on an EMR cluster. I checked around the log directories, and I don't see the messages that appear in the console anywhere else.
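As an aside for anyone hitting the same question: whether the per-container INFO messages survive after a job finishes depends on whether YARN log aggregation is enabled on the cluster. A minimal sketch of the relevant yarn-site.xml settings follows; the property names are standard YARN settings, but the directory and retention values here are only examples, not EMR defaults:

```xml
<!-- yarn-site.xml: sketch of YARN log aggregation settings (example values) -->
<property>
  <!-- when true, NodeManagers upload finished container logs to HDFS -->
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS directory where aggregated container logs are stored -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <!-- how long to keep aggregated logs, in seconds (example: 7 days) -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```

With aggregation on, the task logs for a finished job can be pulled with the `yarn logs -applicationId <app_id>` command mentioned later in this thread.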
One more thing I noticed: when I issue the command *ps -ef | grep nutch*, I see

hadoop 21616 18344 2 06:59 pts/1 00:00:09 /usr/lib/jvm/java-1.8.0-openjdk.x86_64/bin/java -Xmx1000m -server -XX:OnOutOfMemoryError=kill -9 %p *-Dhadoop.log.dir=/usr/lib/hadoop/logs* *-Dhadoop.log.file=hadoop.log* -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= *-Dhadoop.root.logger=INFO,console* -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender -Dsun.net.inetaddr.ttl=30 org.apache.hadoop.util.RunJar /mnt/nutch/runtime/deploy/apache-nutch-1.14-SNAPSHOT.job org.apache.nutch.fetcher.Fetcher -D mapreduce.map.java.opts=-Xmx2304m -D mapreduce.map.memory.mb=2880 -D mapreduce.reduce.java.opts=-Xmx4608m -D mapreduce.reduce.memory.mb=5760 -D mapreduce.job.reduces=12 -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 /user/hadoop/crawlDIR/segments/20170729065841 -noParsing -threads 100

The logger mentioned in the running process is "console". How do I change it to the log file rotated by log4j? I tried modifying the conf/log4j.properties file to use DRFA instead of the cmdstdout logger, but that did not help either. Any help would be appreciated.

Thanks
Srini

On Mon, Jul 24, 2017 at 12:52 AM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:

> Hi Srini,
>
> in distributed mode the bulk of Nutch's log output is kept in the Hadoop
> task logs. Whether, how long, and where these logs are kept depends on the
> configuration of your Hadoop cluster. You can easily find tutorials and
> examples of how to configure this if you google for "hadoop task logs".
>
> Be careful: the Nutch logs are usually huge.
> The easiest way to get them for a job is to run the following command on
> the master node:
>
>     yarn logs -applicationId <app_id>
>
> Best,
> Sebastian
>
> On 07/21/2017 10:09 PM, Srinivasan Ramaswamy wrote:
> > Hi
> >
> > I am running Nutch in distributed mode. I would like to see all Nutch
> > logs written to files. I only see the console output. Can I see the same
> > information logged to some log files?
> >
> > When I run Nutch in local mode, I do see the logs in the
> > runtime/local/logs directory. But when I run Nutch in distributed mode,
> > I don't see it anywhere except the console.
> >
> > Can anyone help me with the settings that I need to change?
> >
> > Thanks
> > Srini
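A note on the driver-side console output discussed above: the ps listing shows the launcher passing *-Dhadoop.root.logger=INFO,console*, and in the stock Hadoop log4j.properties the root logger is defined as the variable ${hadoop.root.logger}, so that system property wins over whatever is edited in the file. This is likely why switching conf/log4j.properties to DRFA had no visible effect. A sketch of the DRFA route, assuming the log4j 1.x setup shipped with Hadoop 2.x, and assuming your hadoop wrapper scripts honor the HADOOP_ROOT_LOGGER environment variable:

```properties
# conf/log4j.properties — sketch only; repackage the .job file after editing
# so the deployed configuration actually changes.
# The launcher fills in ${hadoop.root.logger}; export HADOOP_ROOT_LOGGER=INFO,DRFA
# before running bin/nutch so the value below becomes INFO,DRFA instead of
# INFO,console (assumption: your scripts pass that variable through).
log4j.rootLogger=${hadoop.root.logger}

# Daily rolling file appender; file location comes from -Dhadoop.log.dir
# and -Dhadoop.log.file, also visible in the ps output above.
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```

This only redirects the driver JVM's messages (Injector totals, job progress); the per-task fetch/parse logs still live in the YARN container logs as described above.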