[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event
[ https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-6771:
--
       Resolution: Fixed
    Fix Version/s: 3.0.0-alpha2
                   2.7.4
                   2.8.0
           Status: Resolved  (was: Patch Available)

Thanks, [~haibochen]! I committed this to trunk, branch-2, branch-2.8, and branch-2.7.

> RMContainerAllocator sends container diagnostics event after corresponding completion event
> ---
>
>                 Key: MAPREDUCE-6771
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
>         Attachments: TaUnsuccessfullyEventEmission.jpg,
> mapreduce6771.001.patch, mapreduce6771.002.patch, mapreduce6771.003.patch,
> mapreduce6771.004.patch, mapreduce6771.branch-2.8.patch
>
>
> Task containers can exceed their resource limit and be killed by the Node Manager.
> The MR AM is then notified of the container status and diagnostics information
> through its heartbeat with the RM. However, the diagnostics information may never
> reach the .jhist file, so when the job completes, the diagnostics information
> associated with the failed task attempts is empty. This makes it hard for users to
> find the root cause of job failures, which are often caused by memory leaks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
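The ordering problem named in the issue title can be illustrated with a toy model. The sketch below is not the Hadoop code: `EventOrderSketch`, `ToyAttempt`, `Event`, `EventType`, and `replay` are hypothetical names, and the assumption that the history record is a snapshot taken when the completion event is handled is a simplification made only for illustration. Under that assumption, diagnostics that arrive after the completion event never make it into the record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EventOrderSketch {
    // Hypothetical event kinds; the real MR AM uses its own event classes.
    enum EventType { DIAGNOSTICS_UPDATE, CONTAINER_COMPLETED }

    static final class Event {
        final EventType type;
        final String payload;

        Event(EventType type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    /** Toy stand-in for a task attempt; not the real Hadoop class. */
    static final class ToyAttempt {
        private final List<String> diagnostics = new ArrayList<>();
        private String historyRecord; // assumed: snapshot taken at completion

        void handle(Event e) {
            if (e.type == EventType.DIAGNOSTICS_UPDATE) {
                // Diagnostics are always stored, but only a later
                // completion event copies them into the history record.
                diagnostics.add(e.payload);
            } else if (historyRecord == null) {
                // The first completion event freezes the record.
                historyRecord = String.join("; ", diagnostics);
            }
        }

        String historyRecord() {
            return historyRecord;
        }
    }

    /** Feeds the events to a fresh attempt in order and returns its record. */
    static String replay(List<Event> events) {
        ToyAttempt attempt = new ToyAttempt();
        for (Event e : events) {
            attempt.handle(e);
        }
        return attempt.historyRecord();
    }

    public static void main(String[] args) {
        Event diag = new Event(EventType.DIAGNOSTICS_UPDATE,
                "container exceeded memory limit");
        Event done = new Event(EventType.CONTAINER_COMPLETED, "");

        // Completion handled first: the record is frozen while still empty.
        System.out.println("completion first:  ["
                + replay(Arrays.asList(done, diag)) + "]");
        // Diagnostics handled first: the record contains the message.
        System.out.println("diagnostics first: ["
                + replay(Arrays.asList(diag, done)) + "]");
    }
}
```

In this model the only difference between the two runs is the order in which the same two events are delivered, which is the condition the issue title describes.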
[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event
Haibo Chen updated MAPREDUCE-6771:
--
    Attachment: mapreduce6771.branch-2.8.patch
[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event
Haibo Chen updated MAPREDUCE-6771:
--
    Attachment: mapreduce6771.004.patch
[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event
Haibo Chen updated MAPREDUCE-6771:
--
    Attachment: mapreduce6771.003.patch

Uploaded a new one to address checkstyle issues.
[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event
Haibo Chen updated MAPREDUCE-6771:
--
    Attachment: mapreduce6771.002.patch

Sorry for the long delay. Uploading a new patch that adds a unit test.
[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event
Jason Lowe updated MAPREDUCE-6771:
--
    Summary: RMContainerAllocator sends container diagnostics event after corresponding completion event  (was: RMContainerAlllocator sends container diagnostics event after corresponding completion event)