[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452936#comment-15452936 ] Haibo Chen commented on MAPREDUCE-6771:
---
bq. Note that we aren't stuck with TaskAttemptUnsuccessfulCompletion event for doing diagnostics.

Agree. I am guessing the reason the diagnostics are included in TaskAttemptUnsuccessfulCompletionEvent is that users only want to see diagnostics when task attempts fail. Parsing a new event, and ignoring such events for successful task attempts, would require additional changes.

bq. but waiting for a container completion event is not something the state machine does today.

There is no need to wait for a container completion event. My proposal is to wait for the transition into the FAILED state. As long as the task attempt goes into the FAILED state, which does not necessarily need to be triggered by a container completion event (a timeout, TA_TIMED_OUT, is already built into the transitions from FAIL_FINISHING_CONTAINER to FAILED), the diagnostics will be written into the .jhist file. But your point that the wider window is susceptible to an AM crash is still very convincing.

Given that there is no clearly preferable approach to the case in MAPREDUCE-4955, do you think I can go ahead and address the issue in this jira? The symptom of MAPREDUCE-4955 and this one is the same, but the causes are not quite the same. The case in MAPREDUCE-4955 happens when the AM thinks the task attempt is already dead, i.e. the diagnostics arrive after a TaskAttemptUnsuccessfulCompletionEvent has already been generated, whereas the case in this jira happens when the diagnostics arrive while the task attempt is still in the RUNNING state, i.e. before a TaskAttemptUnsuccessfulCompletionEvent. The case in this jira is easy to fix, and we can keep MAPREDUCE-4955 open to address the other case once we decide what to do.

> Diagnostics information can be lost in .jhist if task containers are killed
> by Node Manager.
>
> Key: MAPREDUCE-6771
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 2.7.3
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: TaUnsuccessfullyEventEmission.jpg, mapreduce6771.001.patch
>
> Task containers can go over their resource limit and be killed by the Node Manager.
> The MR AM then gets notified of the container status and diagnostics information
> through its heartbeat with the RM. However, it is possible that the diagnostics
> information never gets into the .jhist file, so when the job completes, the
> diagnostics information associated with the failed task attempts is empty.
> This makes it hard for users to root-cause job failures that are often caused
> by memory leaks.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447195#comment-15447195 ] Jason Lowe commented on MAPREDUCE-6771:
---
bq. so my understanding of this is there should be ideally one such event in the jhist file

Yes, ideally we should avoid emitting more than one TaskAttemptUnsuccessfulCompletion event. There are other tools besides the JHS that look at these jhist files, and I don't know how well they would handle more than one of these for the same attempt. Note that we aren't stuck with the TaskAttemptUnsuccessfulCompletion event for doing diagnostics. We could use some new diagnostic event just for this purpose, but that too could cause trouble for jhist parsers that don't skip unknown records.

As for the postponing, we probably can move it farther down the state machine, but waiting for a container completion event is not something the state machine does today. For example, the FAIL_FINISHING_CONTAINER state is just waiting for the AM to send a kill-container request to the NM, not actually waiting for the container completion event; it would need to do so. Another issue with postponing is the dependency on the container completion event. There have been issues in the past where the MR AM "missed" a container completion event and caused a scheduling hang. We'd need some kind of safety valve to prevent the AM from waiting forever for a completion event that would never arrive. Another issue with waiting is that if the AM crashes after a task reported failure but before the container completion event arrived, then that won't be noticed by the subsequent AM attempt. (Yes, this race occurs today, but the window would be significantly wider.)

Those kinds of issues make the "let's record some additional diagnostics after the fact" approach more appealing, since we would do exactly what we do today, plus an addendum if a container completion event has more info after an attempt completion has already been recorded in the jhist file. Both approaches have pros and cons, and I'm not sure which I prefer yet.
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447008#comment-15447008 ] Haibo Chen commented on MAPREDUCE-6771:
---
Thanks [~jlowe] for bringing up the case in MAPREDUCE-4955. Looking at that jira, it is indeed another case where diagnostics could be lost.

bq. The AM would either need to postpone recording the attempt completion event until it receives the container completion event to see if there are any diagnostics or there needs to be a way to record postmortem diagnostics for attempts in the jhist file.

The diagnostics are included as part of a TaskAttemptUnsuccessfulCompletionEvent, so my understanding is that there should ideally be one such event in the .jhist file (if we emit multiple instances, JobHistoryParser will always take the last instance seen in the .jhist file). Please correct me if I am wrong. Therefore, I am thinking of postponing recording the unsuccessful completion event.

bq. postpone recording the attempt completion event until it receives the container completion event to see if there are any diagnostics

A TaskAttemptUnsuccessfulCompletionEvent is generated upon receipt of TA_KILL, TA_TOO_MANY_FETCH_FAILURES, or TA_FAILMSG. Postponing the event emission until a container completion event is received makes the handling of TA_FAILMSG semantically inconsistent with that of the other cases. I wonder if it is semantically cleaner to postpone the completion event until the transition into the final states (FAILED, KILLED). The emission of TaskAttemptUnsuccessfulCompletionEvents currently happens before the transition into the FAIL_FINISHING_CONTAINER, FAILED, or KILLED state, but given that FAIL_FINISHING_CONTAINER will eventually transition into the FAILED state, we could reduce the three cases to two (see the attachment showing the transitions during which a TaskAttemptUnsuccessfulCompletionEvent is generated). That is, right before a task attempt goes into KILLED or FAILED, a TaskAttemptUnsuccessfulCompletionEvent is written into the .jhist file.
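The postponement proposal can be illustrated with a toy state machine. This is self-contained Java, not the Hadoop TaskAttemptImpl code; the states are taken from the discussion above, but the structure is a sketch: the history record is emitted exactly once, on entry to a terminal state, so diagnostics arriving during FAIL_FINISHING_CONTAINER are still captured.

```java
// Toy model (illustrative only, not Hadoop's TaskAttemptImpl) of emitting the
// unsuccessful-completion record on entry to a terminal state, rather than on
// the transition out of RUNNING.
public class PostponedEmissionDemo {
    enum State { RUNNING, FAIL_FINISHING_CONTAINER, FAILED, KILLED }

    State state = State.RUNNING;
    final StringBuilder diagnostics = new StringBuilder();
    String jhistRecord;  // written exactly once, on entering FAILED or KILLED

    void addDiagnostics(String msg) {
        diagnostics.append(msg);
    }

    void transition(State next) {
        state = next;
        // Emit the history record only on entry to a terminal state.
        if ((next == State.FAILED || next == State.KILLED) && jhistRecord == null) {
            jhistRecord = diagnostics.toString();
        }
    }

    public static void main(String[] args) {
        PostponedEmissionDemo ta = new PostponedEmissionDemo();
        ta.transition(State.FAIL_FINISHING_CONTAINER); // no record emitted yet
        ta.addDiagnostics("Container killed by NM");   // arrives late, still captured
        ta.transition(State.FAILED);                   // record emitted here
        System.out.println("jhist record: " + ta.jhistRecord);
        // prints: jhist record: Container killed by NM
    }
}
```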
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446117#comment-15446117 ] Jason Lowe commented on MAPREDUCE-6771:
---
bq. Not sure how a unit test can be written. Any suggestion is greatly appreciated.

A unit test could verify that when the RMCommunicator receives a container completion event with diagnostics, it sends the diagnostic event _before_ it sends the completion event. That test will fail before this change and pass afterwards.
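The shape of that test can be sketched as follows. This is self-contained Java, not the real Hadoop test harness; the real test would drive RMContainerAllocator with a mocked event handler, while here a stand-in just records event names so the ordering assertion is clear. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggested unit test: record the events emitted when a
// completed container is handled, then assert the diagnostics-update event
// precedes the completion event.
public class OrderingTestSketch {
    // Stand-in for the allocator's completed-container handling; the flag
    // selects the current (buggy) vs. patched event order.
    static List<String> handleCompletedContainer(boolean patched) {
        List<String> emitted = new ArrayList<>();
        if (patched) {
            emitted.add("TaskAttemptDiagnosticsUpdateEvent"); // patched: diagnostics first
            emitted.add("TaskAttemptContainerCompletedEvent");
        } else {
            emitted.add("TaskAttemptContainerCompletedEvent"); // current: completion first
            emitted.add("TaskAttemptDiagnosticsUpdateEvent");
        }
        return emitted;
    }

    // The assertion the test would make.
    static boolean diagnosticsBeforeCompletion(List<String> emitted) {
        return emitted.indexOf("TaskAttemptDiagnosticsUpdateEvent")
            < emitted.indexOf("TaskAttemptContainerCompletedEvent");
    }

    public static void main(String[] args) {
        System.out.println("before patch: "
            + diagnosticsBeforeCompletion(handleCompletedContainer(false)));
        System.out.println("after patch:  "
            + diagnosticsBeforeCompletion(handleCompletedContainer(true)));
        // prints: before patch: false, then after patch: true
    }
}
```

As described in the comment above, the same assertion against the real allocator fails before the patch and passes afterwards.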
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446112#comment-15446112 ] Jason Lowe commented on MAPREDUCE-6771:
---
Thanks for the report and patch! This is closely related to MAPREDUCE-4955. I think this change is necessary but not sufficient to fix the problem. As noted in MAPREDUCE-4955, the task can fail due to the SIGTERM and report that failure via the umbilical before the container completion event arrives at the AM. At that point the task attempt is already dead from the AM's perspective and the .jhist entry is already recorded, so the extra diagnostics have nowhere to go. The AM would either need to postpone recording the attempt completion event until it receives the container completion event to see if there are any diagnostics, or there needs to be a way to record postmortem diagnostics for attempts in the .jhist file.
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440439#comment-15440439 ] Hadoop QA commented on MAPREDUCE-6771:
---
(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 21s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 17s | trunk passed |
| +1 | compile | 0m 22s | trunk passed |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 0m 29s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 15s | trunk passed |
| +1 | mvninstall | 0m 21s | the patch passed |
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 39s | the patch passed |
| +1 | javadoc | 0m 12s | the patch passed |
| +1 | unit | 8m 44s | hadoop-mapreduce-client-app in the patch passed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 34m 58s | |

|| Subsystem || Report/Notes ||
| Docker | Image: yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825749/mapreduce6771.001.patch |
| JIRA Issue | MAPREDUCE-6771 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 3f99f0b54520 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 19c743c |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6700/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6700/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440379#comment-15440379 ] Haibo Chen commented on MAPREDUCE-6771:
---
If tasks are killed or fail on the NM before they can notify the AM, the user needs to dig through the NM logs or the task logs, hoping to find some useful information as to why the task attempt failed.
[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440372#comment-15440372 ] Haibo Chen commented on MAPREDUCE-6771:
---
Analysis:
{code:java}
// RMContainerAllocator.getResources()
...
for (ContainerStatus cont : finishedContainers) {
  LOG.info("Received completed container " + cont.getContainerId());
  TaskAttemptId attemptID = assignedRequests.get(cont.getContainerId());
  if (attemptID == null) {
    LOG.error("Container complete event for unknown container id "
        + cont.getContainerId());
  } else {
    pendingRelease.remove(cont.getContainerId());
    assignedRequests.remove(attemptID);

    // send the container completed event to the task attempt
    eventHandler.handle(createContainerFinishedEvent(cont, attemptID));

    // send the diagnostics
    String diagnostics = StringInterner.weakIntern(cont.getDiagnostics());
    eventHandler.handle(
        new TaskAttemptDiagnosticsUpdateEvent(attemptID, diagnostics));

    preemptionPolicy.handleCompletedContainer(attemptID);
  }
}
...
{code}
The scenario in question is as follows. A job is running, and one of its task attempts is killed by the NM because the container exceeds its resource limit. The container status/diagnostics are sent to the RM by the NM, and then forwarded to the MR AM in its periodic heartbeat with the RM, as shown above. From the AM's perspective the task attempt is still in the RUNNING state, since the task heartbeat has not timed out. Upon learning from the RM that the task attempt's container has finished, the RMCommunicator thread places a ContainerFinishedEvent and a TaskAttemptDiagnosticsUpdateEvent in the event queue. The ContainerFinishedEvent causes the task attempt in the MR AM to transition from RUNNING to FAILED, and a TaskAttemptUnsuccessfulCompletionEvent containing the associated diagnostics information to be written to the .jhist file. The TaskAttemptDiagnosticsUpdateEvent updates the diagnostics information associated with the task attempt. But since the ContainerFinishedEvent is placed and processed before the TaskAttemptDiagnosticsUpdateEvent, the TaskAttemptUnsuccessfulCompletionEvent written to the .jhist file does not contain the diagnostics received from the RM. After the job completes, when the user accesses the failed task attempts through the JHS, the TaskAttemptUnsuccessfulCompletionEvent is parsed to generate the failed-attempt page. That page has no diagnostics from the RM (such as "container killed by Node Manager...") because they were never written to .jhist in the first place.
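To make the ordering hazard concrete, here is a small self-contained model of the analysis above. This is plain Java, not the actual Hadoop classes; the event names and queue are illustrative stand-ins for the AM's dispatcher. The attempt snapshots its diagnostics into the history record when it sees the container-finished event, so a diagnostics update queued after that event is lost.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the AM event queue: a FIFO drained in order, where the
// diagnostics snapshot is taken when the container-finished event is handled.
public class EventOrderDemo {
    static final String CONTAINER_FINISHED = "CONTAINER_FINISHED";
    static final String DIAG_UPDATE = "DIAG_UPDATE:";

    // Drains events in order; returns the diagnostics captured in the
    // (simulated) jhist record, or null if no completion event was seen.
    static String recordedDiagnostics(Queue<String> queue) {
        String diagnostics = "";   // diagnostics accumulated on the attempt
        String jhistRecord = null; // what gets written on completion
        for (String event : queue) {
            if (event.startsWith(DIAG_UPDATE)) {
                diagnostics = event.substring(DIAG_UPDATE.length());
            } else if (event.equals(CONTAINER_FINISHED) && jhistRecord == null) {
                jhistRecord = diagnostics; // snapshot taken here, never updated again
            }
        }
        return jhistRecord;
    }

    public static void main(String[] args) {
        // Current order: completion first, diagnostics second -> empty record.
        Queue<String> buggy = new ArrayDeque<>();
        buggy.add(CONTAINER_FINISHED);
        buggy.add(DIAG_UPDATE + "killed by NM: over memory limit");
        System.out.println("completion-first -> [" + recordedDiagnostics(buggy) + "]");

        // Reordered: diagnostics first -> the record contains the NM message.
        Queue<String> fixed = new ArrayDeque<>();
        fixed.add(DIAG_UPDATE + "killed by NM: over memory limit");
        fixed.add(CONTAINER_FINISHED);
        System.out.println("diagnostics-first -> [" + recordedDiagnostics(fixed) + "]");
    }
}
```

Running the model shows an empty record for the completion-first order and the NM message for the diagnostics-first order, which is exactly the reordering the attached patch targets.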