[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-31 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452936#comment-15452936
 ] 

Haibo Chen commented on MAPREDUCE-6771:
---

bq. Note that we aren't stuck with TaskAttemptUnsuccessfulCompletion event for 
doing diagnostics. 
Agreed. I am guessing the reason diagnostics are included in 
TaskAttemptUnsuccessfulCompletionEvent is that users only want to see 
diagnostics when task attempts fail. Parsing a new event, and ignoring such 
events for successful task attempts, would require additional changes.
bq.  but waiting for a container completion event is not something the state 
machine does today.
There is no need to wait for the container completion event. My proposal is to 
wait for the transition into the FAILED state. As long as the task attempt goes 
into the FAILED state, which does not necessarily need to be triggered by a 
container completion event (a timeout, TA_TIMED_OUT, is already built into the 
transitions from FAIL_FINISHING_CONTAINER to FAILED), the diagnostics will be 
written into the .jhist file. But your point that a wide window is susceptible 
to AM crashes is still very convincing.

Given that there is no clearly preferable approach to address the case in 
MAPREDUCE-4955, do you think I can go ahead and address the issue in this jira? 
The symptom of MAPREDUCE-4955 and this one is the same, but the cause is not 
quite the same. The case in MAPREDUCE-4955 happens when the AM thinks the task 
attempt is already dead, that is, the diagnostics arrive after a 
TaskAttemptUnsuccessfulCompletionEvent has already been generated, whereas the 
case in this jira happens when the diagnostics arrive while the task attempt is 
still in the RUNNING state, that is, before a 
TaskAttemptUnsuccessfulCompletionEvent. The case in this jira is easy to fix, 
and we can keep MAPREDUCE-4955 open to address the other once we decide what to 
do.

> Diagnostics information can be lost in .jhist if task containers are killed 
> by Node Manager.
> 
>
> Key: MAPREDUCE-6771
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: TaUnsuccessfullyEventEmission.jpg, 
> mapreduce6771.001.patch
>
>
> Task containers can go over their resource limit and be killed by the Node 
> Manager. The MR AM then gets notified of the container status and diagnostics 
> information through its heartbeat with the RM.  However, it is possible that 
> the diagnostics information never gets into the .jhist file, so when the job 
> completes, the diagnostics information associated with the failed task 
> attempts is empty.  This makes it hard for users to root-cause job failures 
> that are often caused by memory leaks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447195#comment-15447195
 ] 

Jason Lowe commented on MAPREDUCE-6771:
---

bq.  so my understanding of this is there should be ideally one such event in 
the jhist file

Yes, ideally we should avoid emitting more than one 
TaskAttemptUnsuccessfulCompletion event.  There are other tools besides the JHS 
that look at these jhist files, and I don't know how well they will handle more 
than one of these for the same attempt.

Note that we aren't stuck with TaskAttemptUnsuccessfulCompletion event for 
doing diagnostics.  We could use some new diagnostic event just for this 
purpose, but that too could cause troubles for jhist parsers that don't skip 
unknown records.

As for postponing, we probably can move it farther down the state machine, but 
waiting for a container completion event is not something the state machine 
does today.  For example, the FAIL_FINISHING_CONTAINER state is just waiting 
for the AM to send a kill container request to the NM and is not actually 
waiting for the container completion event; it would need to do so.  Another 
issue with postponing is the dependency on the container completion event.  
There have been issues in the past where the MR AM "missed" a container 
completion event and caused a scheduling hang.  We'd need some kind of safety 
valve to prevent the AM from waiting forever for a completion event that would 
never arrive.  Another issue with waiting is that if the AM crashes after a 
task reported failure but before the container completion event arrived, then 
that failure won't be noticed by the subsequent AM attempt.  (Yes, this race 
occurs today, but the window would be significantly wider.)  Those kinds of 
issues make the "let's record some additional diagnostics after the fact" 
approach more appealing, since we would do exactly what we do today and write 
an addendum if a container completion event has more info after an attempt 
completion has already been recorded in the jhist file.

Both approaches have pros and cons, and I'm not sure which I prefer yet.




[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-29 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447008#comment-15447008
 ] 

Haibo Chen commented on MAPREDUCE-6771:
---

Thanks [~jlowe] for bringing up the case in MAPREDUCE-4955. Looking at that 
jira, it is indeed another case where diagnostics could be lost. 

bq. The AM would either need to postpone recording the attempt completion event 
until it receives the container completion event to see if there are any 
diagnostics or there needs to be a way to record postmortem diagnostics for 
attempts in the jhist file.
The diagnostics are included as part of a 
TaskAttemptUnsuccessfulCompletionEvent, so my understanding is that there 
should ideally be one such event in the jhist file (if we emit multiple 
instances, JobHistoryParser will always take the last instance seen in the 
.jhist file). Please correct me if I am wrong. Therefore, I am thinking of 
postponing recording the unsuccessful completion event.

bq. postpone recording the attempt completion event until it receives the 
container completion event to see if there are any diagnostics
A TaskAttemptUnsuccessfulCompletionEvent is generated upon receipt of TA_KILL, 
TA_TOO_MANY_FETCH_FAILURE, and TA_FAILMSG. Postponing the event emission until 
a container completion event is received makes the handling of TA_FAILMSG 
semantically inconsistent with that of the other cases. I wonder if it is 
semantically cleaner to postpone the completion event until the transition into 
the final states (FAILED, KILLED).  The emission of 
TaskAttemptUnsuccessfulCompletionEvents currently happens before the transition 
into the FAIL_FINISHING_CONTAINER, FAILED, or KILLED state, but given that 
FAIL_FINISHING_CONTAINER will eventually transition into the FAILED state, we 
could reduce the three cases to two (see the attachment showing the transitions 
during which a TaskAttemptUnsuccessfulCompletionEvent is generated). That is, 
right before a task attempt goes into KILLED or FAILED, a 
TaskAttemptUnsuccessfulCompletionEvent is written into the .jhist file.
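A minimal, self-contained sketch of the postponement idea; the enum, class, and field names below are simplified stand-ins for the real TaskAttemptImpl state machine, not Hadoop code:

```java
// Minimal model of the proposed postponement; State, AttemptSketch, and the
// field names are simplified stand-ins, not the real TaskAttemptImpl code.
enum State { RUNNING, FAIL_FINISHING_CONTAINER, FAILED, KILLED }

class AttemptSketch {
  State state = State.RUNNING;
  StringBuilder diagnostics = new StringBuilder();
  String jhistRecord = null; // stands in for the .jhist write

  void addDiagnostics(String d) { diagnostics.append(d); }

  void transition(State next) {
    state = next;
    // Emit the unsuccessful-completion record exactly once, and only on
    // entry to a terminal state, so diagnostics that arrive while the
    // attempt sits in FAIL_FINISHING_CONTAINER are still captured.
    if ((next == State.FAILED || next == State.KILLED) && jhistRecord == null) {
      jhistRecord = diagnostics.toString();
    }
  }
}

public class PostponeDemo {
  public static void main(String[] args) {
    AttemptSketch a = new AttemptSketch();
    a.transition(State.FAIL_FINISHING_CONTAINER); // no record written yet
    a.addDiagnostics("Container killed by the NodeManager"); // late arrival
    a.transition(State.FAILED); // record written here, diagnostics included
    System.out.println("record: '" + a.jhistRecord + "'");
  }
}
```

Because the record is written only on entry to FAILED or KILLED, diagnostics that arrive during FAIL_FINISHING_CONTAINER still make it into the record.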






[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446117#comment-15446117
 ] 

Jason Lowe commented on MAPREDUCE-6771:
---

bq. Not sure how a unit test can be written. Any suggestion is greatly 
appreciated.

A unit test could verify that when the RMCommunicator receives a container 
completion event with diagnostics it sends the diagnostic event _before_ it 
sends the completion event.  That test will fail before this change and pass 
afterwards.
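A rough sketch of such a test, using a plain list as a stand-in recorder for the AM's event handler; the real test would dispatch through RMContainerAllocator with a mocked handler, and the string event names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DiagnosticsOrderTestSketch {

  // Stand-in for the patched RMContainerAllocator logic: for a finished
  // container, hand the diagnostics update to the handler before the
  // completion event. (Events are plain strings here; the real code uses
  // TaskAttemptDiagnosticsUpdateEvent and the container-finished event.)
  static void onFinishedContainer(String diagnostics, Consumer<String> handler) {
    handler.accept("DiagnosticsUpdate:" + diagnostics);
    handler.accept("ContainerCompleted");
  }

  public static void main(String[] args) {
    List<String> recorded = new ArrayList<>();
    onFinishedContainer("Container killed by the NodeManager", recorded::add);

    // The assertion the real unit test would make: the diagnostics event
    // precedes the completion event, so the .jhist record can include it.
    int diag = recorded.indexOf(
        "DiagnosticsUpdate:Container killed by the NodeManager");
    int done = recorded.indexOf("ContainerCompleted");
    System.out.println(diag >= 0 && done >= 0 && diag < done
        ? "diagnostics-before-completion: ok" : "order wrong");
  }
}
```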




[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446112#comment-15446112
 ] 

Jason Lowe commented on MAPREDUCE-6771:
---

Thanks for the report and patch!  This is closely related to MAPREDUCE-4955.

I think this is necessary but not sufficient to fix the problem.  As noted in 
MAPREDUCE-4955 the task can fail due to the SIGTERM and report that failure via 
the umbilical before the container completion event arrives at the AM.  At that 
point the task attempt is already dead from the AM perspective and the .jhist 
entry already recorded, so the extra diagnostics have nowhere to go.  The AM 
would either need to postpone recording the attempt completion event until it 
receives the container completion event to see if there are any diagnostics or 
there needs to be a way to record postmortem diagnostics for attempts in the 
jhist file.





[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440439#comment-15440439
 ] 

Hadoop QA commented on MAPREDUCE-6771:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 44s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 58s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825749/mapreduce6771.001.patch
 |
| JIRA Issue | MAPREDUCE-6771 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 3f99f0b54520 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 19c743c |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6700/testReport/ |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6700/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-26 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440379#comment-15440379
 ] 

Haibo Chen commented on MAPREDUCE-6771:
---

If tasks are killed or fail on the NM before they can notify the AM, users need 
to dig through NM logs or task logs hoping to find some useful information as 
to why the task attempt failed.




[jira] [Commented] (MAPREDUCE-6771) Diagnostics information can be lost in .jhist if task containers are killed by Node Manager.

2016-08-26 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440372#comment-15440372
 ] 

Haibo Chen commented on MAPREDUCE-6771:
---

Analysis:
{code:java}
RMContainerAllocator.getResources() {
  ...
  for (ContainerStatus cont : finishedContainers) {
    LOG.info("Received completed container " + cont.getContainerId());
    TaskAttemptId attemptID = assignedRequests.get(cont.getContainerId());
    if (attemptID == null) {
      LOG.error("Container complete event for unknown container id "
          + cont.getContainerId());
    } else {
      pendingRelease.remove(cont.getContainerId());
      assignedRequests.remove(attemptID);

      // send the container completed event to the task attempt
      eventHandler.handle(createContainerFinishedEvent(cont, attemptID));

      // send the diagnostics
      String diagnostics = StringInterner.weakIntern(cont.getDiagnostics());
      eventHandler.handle(new TaskAttemptDiagnosticsUpdateEvent(attemptID,
          diagnostics));

      preemptionPolicy.handleCompletedContainer(attemptID);
    }
  }
  ...
}
{code}
The scenario in question is as follows: a job is running, and one of its task 
attempts is killed by the NM because the container exceeds its resource limit. 
The container status/diagnostics is sent by the NM to the RM, and then later to 
the MR AM in its periodic heartbeat with the RM, as shown above. In the MR AM, 
the task attempt is still in the RUNNING state from the AM's perspective, since 
the task heartbeat has not timed out. 

Upon learning from the RM that the task attempt's container has finished, the 
RMCommunicator thread places a ContainerFinishedEvent and a 
TaskAttemptDiagnosticsUpdateEvent in the event queue. 

The ContainerFinishedEvent causes the task attempt in the MR AM to transition 
from RUNNING to FAILED and a TaskAttemptUnsuccessfulCompletionEvent containing 
the diagnostics known at that point to be written to the .jhist file.  The 
TaskAttemptDiagnosticsUpdateEvent updates the diagnostics information 
associated with the task attempt. 

But since the ContainerFinishedEvent is placed and processed before the 
TaskAttemptDiagnosticsUpdateEvent, the TaskAttemptUnsuccessfulCompletionEvent 
written to the .jhist file does not contain the diagnostics info received from 
the RM.
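To illustrate why the ordering matters, here is a toy model of the race; the classes below are simplified stand-ins, not the real Hadoop event types. The completion handler snapshots whatever diagnostics are known when it fires, so a diagnostics event queued after it never reaches the record:

```java
// Toy model of the AM event-queue ordering; simplified stand-ins only.
class Attempt {
  StringBuilder diagnostics = new StringBuilder();
  String jhistRecord; // stands in for the TaskAttemptUnsuccessfulCompletionEvent

  void onDiagnosticsUpdate(String d) { diagnostics.append(d); }

  void onContainerFinished() {
    // The .jhist record snapshots whatever diagnostics are known right now.
    if (jhistRecord == null) jhistRecord = diagnostics.toString();
  }
}

public class JhistRaceDemo {
  public static void main(String[] args) {
    // Current order: completion first, diagnostics second -> record is empty.
    Attempt buggy = new Attempt();
    buggy.onContainerFinished();
    buggy.onDiagnosticsUpdate("Container killed by the NodeManager");
    System.out.println("current order record: '" + buggy.jhistRecord + "'");

    // Patched order: diagnostics first, completion second -> record kept.
    Attempt fixed = new Attempt();
    fixed.onDiagnosticsUpdate("Container killed by the NodeManager");
    fixed.onContainerFinished();
    System.out.println("patched order record: '" + fixed.jhistRecord + "'");
  }
}
```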

After the job completes and the user tries to access the failed task attempts 
through the JHS, the TaskAttemptUnsuccessfulCompletionEvent is parsed to 
generate the failed-attempt page.  The page will not have the diagnostics info 
from the RM (such as "container killed by the Node Manager...") because it was 
never written to the .jhist file in the first place.
