[ https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857340#comment-15857340 ]
Wilfred Spiegelenburg commented on YARN-6042:
---------------------------------------------

I looked at the changes, and they will help a lot with debugging the FS once we get this into a release.

A couple of things:
# In the FairScheduler change you add a new method {{dumpSchedulerState()}}. Why are you not passing the rootQueue in to the method? It saves getting it again, since you have already got it in the update method.
# I am missing one number for the applications in the {{dumpStateInternal()}} for the FSLeafQueue: {{getNumPendingApps()}} or {{getNumActiveApps()}}. We need one of those to have a full view of the application state in the queue.
# We add the LastTimeAtMinShare but not the LastTimeAtFairShare for the leaf queue, as per {{getLastTimeAtFairShareThreshold()}}.

I am also a bit worried about the test: in the output we build the debug string and get the time in milliseconds for the LastTimeAtMinShare. What if the {{updateStarvationStats()}} call ran 1 millisecond earlier than the debug string was built? The comparison would fail, and the test would fail because of that. I don't think we can guarantee that those two calls will happen in the same millisecond.

> Fairscheduler: Dump scheduler state in log
> ------------------------------------------
>
>                 Key: YARN-6042
>                 URL: https://issues.apache.org/jira/browse/YARN-6042
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-6042.001.patch, YARN-6042.002.patch
>
>
> To improve the debugging of scheduler issues it would be a big improvement to
> be able to dump the scheduler state into a log on request.
> Dumping the scheduler state at a point in time would allow debugging of a
> scheduler that is not hung (deadlocked) but also not assigning containers.
> Currently we do not have a proper overview of what state the scheduler and
> the queues are in and we have to make assumptions or guess.
> The scheduler and queue state needed would include (not exhaustive):
> - instantaneous and steady fair share (app / queue)
> - AM share and resources
> - weight
> - app demand
> - application run state (runnable/non runnable)
> - last time at fair/min share
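
To illustrate the timing concern raised in the comment about the test, here is a minimal sketch of one possible way to avoid depending on the wall clock: inject a time source and advance it by hand in the test. This is not the code from the patch. {{SketchQueue}}, {{ManualClock}} and {{dumpState()}} are hypothetical names made up for this example, and only loosely stand in for the leaf queue, {{updateStarvationStats()}} and {{dumpStateInternal()}}.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a leaf queue; NOT the real FSLeafQueue.
class SketchQueue {
    private final LongSupplier clock;   // injected time source, in milliseconds
    private long lastTimeAtMinShare;

    SketchQueue(LongSupplier clock) {
        this.clock = clock;
        this.lastTimeAtMinShare = clock.getAsLong();
    }

    // Loosely plays the role of updateStarvationStats(): records "now".
    void updateStarvationStats() {
        lastTimeAtMinShare = clock.getAsLong();
    }

    // Loosely plays the role of dumpStateInternal(): prints the stored value.
    String dumpState() {
        return "LastTimeAtMinShare: " + lastTimeAtMinShare;
    }
}

// Hypothetical clock that only moves when the test tells it to.
class ManualClock implements LongSupplier {
    private long now;

    ManualClock(long startMillis) {
        this.now = startMillis;
    }

    void advance(long millis) {
        now += millis;
    }

    @Override
    public long getAsLong() {
        return now;
    }
}
```

With a clock like this, a test can call {{updateStarvationStats()}}, build the expected string from the same clock value, and compare it with the dump. Time cannot move between the two reads, so a one millisecond gap between the calls no longer matters.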