[jira] [Updated] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6771:
----------------------------------
    Attachment: TaUnsuccessfullyEventEmission.jpg

> Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6771
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: TaUnsuccessfullyEventEmission.jpg, mapreduce6771.001.patch
>
> Task containers can exceed their resource limits and be killed by the Node Manager.
> The MR AM is then notified of the container status and diagnostics information
> through its heartbeat with the RM. However, the diagnostics information may never
> be written to the .jhist file, so when the job completes, the diagnostics
> information associated with the failed task attempts is empty. This makes it hard
> for users to root-cause job failures that are often caused by memory leaks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6771:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6771:
----------------------------------
    Attachment: mapreduce6771.001.patch

Uploading a patch to fix this. Not sure how a unit test can be written. Any suggestion is greatly appreciated.
[jira] [Updated] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6771:
----------------------------------
    Summary: Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.  (was: Diagnostics information is lost in .jhist if task containers are killed by Node Manager.)