What changes can be made in Nutch's log4j.properties to reduce the size of
the Nutch logs?
Shubham
On Wednesday 07 September 2016 04:04 AM, Markus Jelsma wrote:
I've seen Hadoop not honouring some log settings before. Are you really sure
these are org.apache.nutch.* logs? If so, and as said before, change
log4j.properties to not log INFO messages. If they are Hadoop logs, of which
there are many, then change some Hadoop settings which I don't remember right
now.
Hadoop is notorious for verbose logging, and any job can easily create
hundreds of MBs of logs on datanodes, the HDFS master, the YARN master,
containers etc., because so much is happening when a job runs. This is normal
and should be expected. If your Hadoop cluster is not designed to take GBs of
logs, then your disk space is just too small. Either increase disk space, or
set all logging levels to WARN or higher.
In any case, a Hadoop cluster always logs more than Nutch does, so Nutch
logging is the least of your problems.
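For reference, a minimal sketch of what such a change might look like in
conf/log4j.properties. It assumes the stock Nutch file, where the root logger
writes to an appender named DRFA; that appender name and the exact logger
lines are assumptions, so check them against your own copy before editing:

```properties
# Assumption: DRFA is the file appender already defined in the stock
# conf/log4j.properties. Keep whatever appender name your file uses.
# Raise the root level from INFO to WARN so INFO messages are no longer written.
log4j.rootLogger=WARN,DRFA

# Nutch classes: log WARN and above only.
log4j.logger.org.apache.nutch=WARN

# Hadoop classes that log through this same configuration.
log4j.logger.org.apache.hadoop=WARN
```

This only covers what is logged through Nutch's own log4j configuration. Logs
written by the Hadoop daemons and YARN containers may still be governed by
Hadoop's own configuration and would not be reduced by this file.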
M.
-----Original message-----
From:shubham.gupta <[email protected]>
Sent: Tuesday 6th September 2016 6:57
To: [email protected]
Subject: Re: Application creating huge amount of logs : Nutch 2.3.1 + Hadoop
2.7.1
Hey,
I have changed the user.log_retain size to 10 MB, but it is still creating a
huge volume of logs. This causes the datanode to fail, and then the job
fails. And if the logs are deleted periodically, the fetch phase takes a lot
of time, and it is uncertain whether it will complete or not.
Shubham Gupta
On Wednesday 24 August 2016 05:20 PM, Markus Jelsma wrote:
If it is Nutch logging, change its level in conf/log4j.properties. It can also
be Hadoop logging.
M.
-----Original message-----
From:shubham.gupta <[email protected]>
Sent: Tuesday 23rd August 2016 8:15
To: [email protected]
Subject: Application creating huge amount of logs : Nutch 2.3.1 + Hadoop 2.7.1
Hey
I have integrated Nutch 2.3.1 with Hadoop 2.7.1. The fetcher.parse property
is set to TRUE and the database used is MongoDB. While the Nutch map job
runs, it creates a huge volume of nodelogs, over 13 GB in size, and the cause
of such a large amount of files is unknown. Any suggestion would help.
Thanks in advance.
Shubham Gupta