[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

2016-09-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6771:
--
       Resolution: Fixed
    Fix Version/s: 3.0.0-alpha2
                   2.7.4
                   2.8.0
           Status: Resolved  (was: Patch Available)

Thanks, [~haibochen]!  I committed this to trunk, branch-2, branch-2.8, and 
branch-2.7.

> RMContainerAllocator sends container diagnostics event after corresponding 
> completion event
> ---
>
>                 Key: MAPREDUCE-6771
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: TaUnsuccessfullyEventEmission.jpg, 
> mapreduce6771.001.patch, mapreduce6771.002.patch, mapreduce6771.003.patch, 
> mapreduce6771.004.patch, mapreduce6771.branch-2.8.patch
>
>
> Task containers can exceed their resource limits and be killed by the Node 
> Manager. The MR AM is then notified of the container status and diagnostics 
> information through its heartbeat with the RM.  However, the diagnostics 
> information may never reach the .jhist file, so when the job completes, the 
> diagnostics associated with the failed task attempts are empty.  This makes 
> it hard for users to find the root cause of job failures, which are often 
> caused by memory leaks.
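
The ordering problem named in the issue title can be illustrated with a small, self-contained sketch. It is not the committed patch and does not use Hadoop classes: the event kinds, the Attempt class, and the replay helper are all hypothetical names introduced for illustration. It assumes the history record is written once, at completion time, from whatever diagnostics are known at that moment.

```java
public class DiagnosticsOrderingSketch {

    /** Hypothetical event kinds; these are not the real Hadoop event types. */
    enum EventType { DIAGNOSTICS_UPDATE, CONTAINER_COMPLETED }

    /** Hypothetical stand-in for a task attempt that writes one history record. */
    static final class Attempt {
        private final StringBuilder diagnostics = new StringBuilder();
        private String historyRecord = null; // what would end up in the history file

        void handle(EventType type, String payload) {
            switch (type) {
                case DIAGNOSTICS_UPDATE:
                    diagnostics.append(payload);
                    break;
                case CONTAINER_COMPLETED:
                    // Assumption: the record is written once, using only the
                    // diagnostics that have arrived so far.
                    if (historyRecord == null) {
                        historyRecord = diagnostics.toString();
                    }
                    break;
            }
        }

        String historyRecord() {
            return historyRecord;
        }
    }

    /** Replays the two events in the given order and returns the recorded diagnostics. */
    static String replay(boolean diagnosticsFirst) {
        Attempt attempt = new Attempt();
        String message = "Container killed: memory limit exceeded";
        if (diagnosticsFirst) {
            attempt.handle(EventType.DIAGNOSTICS_UPDATE, message);
            attempt.handle(EventType.CONTAINER_COMPLETED, "");
        } else {
            attempt.handle(EventType.CONTAINER_COMPLETED, "");
            attempt.handle(EventType.DIAGNOSTICS_UPDATE, message);
        }
        return attempt.historyRecord();
    }

    public static void main(String[] args) {
        System.out.println("completion first:  '" + replay(false) + "'");
        System.out.println("diagnostics first: '" + replay(true) + "'");
    }
}
```

In this model, sending the completion event first leaves the recorded diagnostics empty, while sending the diagnostics event first preserves the message.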



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

2016-09-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6771:
--
Attachment: mapreduce6771.branch-2.8.patch







[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

2016-09-26 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6771:
--
Attachment: mapreduce6771.004.patch







[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

2016-09-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6771:
--
Attachment: mapreduce6771.003.patch

Uploaded a new patch to address checkstyle issues.







[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

2016-09-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6771:
--
Attachment: mapreduce6771.002.patch

Sorry for the long delay. Uploading a new patch that adds a unit test.







[jira] [Updated] (MAPREDUCE-6771) RMContainerAllocator sends container diagnostics event after corresponding completion event

2016-08-31 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6771:
--
Summary: RMContainerAllocator sends container diagnostics event after 
corresponding completion event  (was: RMContainerAlllocator sends container 
diagnostics event after corresponding completion event)



