[
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278993#comment-15278993
]
Karthik Kambatla commented on YARN-4882:
----------------------------------------
Was debugging something else in the RM and saw this. We should definitely drive
towards reducing the log spew.
In terms of the discussion here, in my experience:
# We need logs for applications that fail to recover so we know why they failed
to recover.
# We don't really need logs for applications that recovered fine. At least so
far, I haven't run into any issues with the RM recovering a job that should not
have been recovered. Does anyone have experience with these? I am fine with
logging at DEBUG or even TRACE level.
# It might be nice to log something about how many applications have been
recovered successfully and how many failed. And, since I am expressing wishes
here, maybe the time we spent recovering applications.
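A minimal sketch of what the three points above could look like, not a patch for this issue. It assumes a hypothetical RecoverySummary helper (no such class exists in YARN) and uses java.util.logging so that it stays self-contained, with FINE standing in for DEBUG. The method names and message formats are illustrative only.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper, not part of the YARN code base. It counts recovery
 * outcomes and emits a single INFO summary line instead of several INFO
 * lines per application.
 */
final class RecoverySummary {
  private static final Logger LOG =
      Logger.getLogger(RecoverySummary.class.getName());

  private final long startNanos = System.nanoTime();
  private int recovered;
  private int failed;

  /** Successful recovery: per-application detail only at FINE (DEBUG). */
  void recordSuccess(String appId) {
    recovered++;
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Recovered app: " + appId);
    }
  }

  /** Failed recovery: always logged, with the cause, so it can be diagnosed. */
  void recordFailure(String appId, Exception cause) {
    failed++;
    LOG.log(Level.WARNING, "Failed to recover app: " + appId, cause);
  }

  int getRecovered() {
    return recovered;
  }

  int getFailed() {
    return failed;
  }

  /** Logs and returns one summary line with counts and elapsed time. */
  String summarize() {
    long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
    String summary = "Recovered " + recovered + " applications, "
        + failed + " failed, in " + elapsedMs + " ms";
    LOG.info(summary);
    return summary;
  }
}
```

With this shape, a restart that recovers 10K completed applications would produce one INFO line plus one WARNING per failure, all in the same log file.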
App recovery failures should be less frequent, and I wonder if we need a
separate log file just for those. Also, in my experience, the RM can itself
fail if the app recovery fails. It is infinitely easier to have all the info in
one log file at debug time, as opposed to looking at two log files and tracking
timestamps. Are there any major advantages to a separate log file that I am
missing?
[~jlowe], [~rohithsharma], [~vvasudev], [~templedf] - can we reconsider this
and come to a consensus?
> Change the log level to DEBUG for recovering completed applications
> -------------------------------------------------------------------
>
> Key: YARN-4882
> URL: https://issues.apache.org/jira/browse/YARN-4882
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Daniel Templeton
>
> I think there is no need to log the recovery of completed applications at
> INFO; it can be made DEBUG instead. The problem seen on large clusters is that
> if any issue happens during RM startup and the RM keeps switching, the RM logs
> are filled mostly with application recovery messages.
> Six lines are logged per application, as shown in the logs below. Given that
> the RM default value for max-completed applications is 10K, each switch adds
> 10K*6=60K lines, which I feel is not useful.
> {noformat}
> 2016-03-01 10:20:59,077 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority
> level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering
> app: application_1456298208485_21507 with 1 attempts and final state =
> FINISHED
> 2016-03-01 10:20:59,100 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> Recovering attempt: appattempt_1456298208485_21507_000001 with final state:
> FINISHED
> 2016-03-01 10:20:59,107 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1456298208485_21507_000001 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=rohith
> OPERATION=Application Finished - Succeeded TARGET=RMAppManager
> RESULT=SUCCESS APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from before the RM became
> unstable goes missing from the logs. Even with 50 or 100 rolled log files
> retained, these logs are rolled out within a short period, and what remains
> contains only RM switching information, and mostly application recovery at that.
> I suggest that at least completed application recovery be logged at DEBUG.