[jira] [Commented] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2021-01-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268809#comment-17268809
 ] 

Hadoop QA commented on MAPREDUCE-7314:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 3m 8s{color} | {color:red}{color} | {color:red} Docker failed to build yetus/hadoop:9560f252cf1. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | MAPREDUCE-7314 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13019086/MAPREDUCE-7314-MR-6749.001.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-MAPREDUCE-Build/49/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Job will hang if NM is restarted while its running
> --
>
> Key: MAPREDUCE-7314
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7314-MR-6749.001.patch
>
>
> This is due to three different reasons:
>  # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
>  # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't kill 
> the current attempt that is assigned to the container, because the task 
> attempt is not updated in the ContainerLauncherImpl#Container class.
>  # A container gets assigned to a task attempt even after the container has 
> stopped running, i.e. after the container-completed event has been processed. 
> This is because we add the reuse container map to the allocated list. 
> makeRemoteRequest gets the same container in allocationResponse, whereas the 
> RM has sent the same container in the finished container list. To avoid this, 
> we need to make sure the allocated list doesn't contain any containers that 
> are finished.
> Test credits : [~Rajshree]
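
The third point in the description above can be illustrated with a minimal sketch. It is not taken from the attached patch: the class name AllocatedListFilter, the method dropFinished, and the use of plain strings as container IDs are all assumptions made for illustration, whereas the MapReduce AM works with YARN Container and ContainerStatus objects. The idea is simply to drop from the allocated list any container that the same response also reports as finished, so that such a container is never assigned to a task attempt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AllocatedListFilter {

    /**
     * Hypothetical helper: returns the allocated container IDs that do not
     * also appear in the finished list. Plain strings stand in for YARN
     * container IDs here.
     */
    static List<String> dropFinished(List<String> allocated, List<String> finished) {
        // A set gives constant-time membership checks for finished IDs.
        Set<String> finishedIds = new HashSet<>(finished);
        List<String> live = new ArrayList<>();
        for (String id : allocated) {
            if (!finishedIds.contains(id)) {
                live.add(id);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // container_2 is in both lists, as in the scenario described above.
        List<String> allocated = List.of("container_1", "container_2", "container_3");
        List<String> finished = List.of("container_2");
        System.out.println(dropFinished(allocated, finished));
        // prints [container_1, container_3]
    }
}
```

Filtering before assignment, rather than after, keeps a finished container from ever being handed to a task attempt, which is the condition the description identifies as causing the hang.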



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2021-01-20 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268769#comment-17268769
 ] 

Bilwa S T commented on MAPREDUCE-7314:
--

Hi [~epayne]

This is on branch MR-6749, when container reuse is enabled.




[jira] [Commented] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2021-01-20 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268715#comment-17268715
 ] 

Eric Payne commented on MAPREDUCE-7314:
---

[~BilwaST], on what version are you seeing this?
