[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364279#comment-17364279
 ] 

Hadoop QA commented on MAPREDUCE-7353:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 10s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 16m 17s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 45s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| 

[jira] [Updated] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7353:
-
Status: Patch Available  (was: Open)

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> The job fails because a task fails due to too many fetch failures:
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054=4=attempt_1622107691213_1054_m_00_0
>  | Fetcher.java:686
>   Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting 
> fetch failure for 

[jira] [Updated] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7353:
-
Attachment: MAPREDUCE-7353.001.patch


[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364233#comment-17364233
 ] 

Bilwa S T commented on MAPREDUCE-7353:
--

cc [~epayne] [~jbrennan]


[jira] [Created] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7353:


 Summary: Mapreduce job fails when NM is stopped
 Key: MAPREDUCE-7353
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T


The job fails because a task fails due to too many fetch failures:
{code:java}
Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | Processing 
the event EventType: CONTAINER_REMOTE_CLEANUP for container 
container_e03_1622107691213_1054_01_05 taskAttempt 
attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
handler | TaskAttempt killed because it ran on unusable node 
node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
JobImpl.java:1401
Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
TaskAttemptImpl.java:1390
Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
| Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
running on unusable node:node-group-1ZYEq0002:26009 | 
RMContainerAllocator.java:1066
Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
TaskAttemptImpl.java:1390
Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
Container released on a *lost* node | TaskAttemptImpl.java:2649
Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
TaskAttemptImpl.java:1390
Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
handler | Too many fetch-failures for output of task attempt: 
attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
JobImpl.java:2005
Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
cleanup failed for container container_e03_1622107691213_1054_01_05 : 
java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
node-group-1ZYEq0002:26009 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
going to fetch from node-group-1ZYEq0002:26008 for: 
[attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
for node-group-1ZYEq0002:26008 -> 
http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054=4=attempt_1622107691213_1054_m_00_0
 | Fetcher.java:686
Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting 
fetch failure for attempt_1622107691213_1054_m_00_0 to MRAppMaster. | 
ShuffleSchedulerImpl.java:349
{code}
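
The excerpt above is ordered by line number in the collected logs rather than by time, so the fetcher entries from 16:23:56 appear after the 16:30:02 entries. A minimal sketch of putting such entries in chronological order, assuming each entry begins with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp once the `Line N:` prefix is removed. The class and method names are hypothetical, and the sample entries are shortened copies of lines from the excerpt.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LogTimeline {
    // Timestamp layout used at the start of each entry in the excerpt.
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Parses the leading 23-character timestamp of one log entry.
    static LocalDateTime timestampOf(String entry) {
        return LocalDateTime.parse(entry.substring(0, 23), FMT);
    }

    // Returns a copy of the entries sorted by timestamp. The sort is stable,
    // so entries with equal timestamps keep their original relative order.
    static List<String> chronological(List<String> entries) {
        List<String> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing(LogTimeline::timestampOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
            "2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | KILLING attempt",
            "2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event handler | TaskAttempt killed because it ran on unusable node",
            "2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event handler | Too many fetch-failures",
            "2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event handler | cleanup failed for container",
            "2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 going to fetch",
            "2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting fetch failure");
        for (String entry : chronological(entries)) {
            System.out.println(entry);
        }
    }
}
```

Read in time order, the fetch attempt starts first, container cleanup begins about a minute later, and the fetch failure is reported one second before the AM raises the fetch failure to the map.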

As the logs show, the RM notified the AM of the node update at 16:26:34, but 
the resulting TA_KILL event was ignored because TaskAttemptImpl was in the 
SUCCESS_CONTAINER_CLEANUP state. The attempt then received a 
TA_TOO_MANY_FETCH_FAILURE event, which moved it to FAILED and caused the task 
to fail.
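
The sequence described above can be illustrated with a toy state machine. This is a simplified sketch and not the actual TaskAttemptImpl code: the class `AttemptStateSketch` and its methods are hypothetical, only the SUCCESS_CONTAINER_CLEANUP row is modeled, and the TA_CONTAINER_CLEANED transition to SUCCEEDED is an assumption that is not shown in the logs.

```java
import java.util.Arrays;
import java.util.List;

public class AttemptStateSketch {
    enum State { SUCCESS_CONTAINER_CLEANUP, SUCCEEDED, FAILED }
    enum Event { TA_KILL, TA_CONTAINER_CLEANED, TA_TOO_MANY_FETCH_FAILURE }

    // Toy transition function for the behaviour described in the report.
    static State next(State state, Event event) {
        if (state != State.SUCCESS_CONTAINER_CLEANUP) {
            return state; // other states are outside the scope of this sketch
        }
        switch (event) {
            case TA_KILL:
                return state;           // ignored in this state, per the report
            case TA_TOO_MANY_FETCH_FAILURE:
                return State.FAILED;    // matches the transition logged at 16:26:57
            case TA_CONTAINER_CLEANED:
                return State.SUCCEEDED; // assumption, not shown in the logs
            default:
                return state;
        }
    }

    // Applies the events in order and returns the final state.
    static State replay(State start, List<Event> events) {
        State current = start;
        for (Event event : events) {
            current = next(current, event);
        }
        return current;
    }

    public static void main(String[] args) {
        // Event order taken from the log timestamps: the kill at 16:26:34, the
        // fetch failure at 16:26:57, and the cleanup completion at 16:30:02.
        State end = replay(State.SUCCESS_CONTAINER_CLEANUP, Arrays.asList(
            Event.TA_KILL,
            Event.TA_TOO_MANY_FETCH_FAILURE,
            Event.TA_CONTAINER_CLEANED));
        System.out.println(end);
    }
}
```

In this model the attempt ends in FAILED even though the kill for the unusable node arrived first, because the kill is dropped while the container cleanup is still pending.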
 


