[jira] [Assigned] (FLINK-9351) RM stop assigning slot to Job because the TM killed before connecting to JM successfully

Sihua Zhou (JIRA) Fri, 25 May 2018 05:56:29 -0700

     [ https://issues.apache.org/jira/browse/FLINK-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sihua Zhou reassigned FLINK-9351:
---------------------------------

    Assignee: Sihua Zhou

> RM stop assigning slot to Job because the TM killed before connecting to JM 
> successfully
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-9351
>                 URL: https://issues.apache.org/jira/browse/FLINK-9351
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> The steps are as follows (copied from Stephan's comments in 
> [5931|https://github.com/apache/flink/pull/5931]):
> - JobMaster / SlotPool requests a slot (AllocationID) from the ResourceManager
> - ResourceManager starts a container with a TaskManager
> - TaskManager registers at ResourceManager, which tells the TaskManager to 
> push a slot to the JobManager.
> - TaskManager container is killed
> - The ResourceManager does not queue back the slot requests (AllocationIDs) 
> that it sent to the previous TaskManager, so the requests are lost and need 
> to time out before another attempt is tried.
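
The sequence above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class SlotRequestRequeueSketch, its methods, and the requeue flag are hypothetical and are not Flink's ResourceManager or SlotManager API. Allocation IDs and TaskManager IDs are modelled as plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical toy model of the scenario; NOT Flink's actual classes.
class SlotRequestRequeueSketch {
    // Slot requests (allocation IDs) not yet handed to any TaskManager.
    private final Queue<String> pendingRequests = new ArrayDeque<>();
    // Allocation ID -> TaskManager that was told to offer the slot to the JobMaster.
    private final Map<String, String> assignedToTaskManager = new HashMap<>();

    // Step 1: JobMaster / SlotPool requests a slot.
    void requestSlot(String allocationId) {
        pendingRequests.add(allocationId);
    }

    // Step 3: a TaskManager registers and receives all pending requests.
    void registerTaskManager(String taskManagerId) {
        String allocationId;
        while ((allocationId = pendingRequests.poll()) != null) {
            assignedToTaskManager.put(allocationId, taskManagerId);
        }
    }

    // Step 4: the TaskManager container is killed before reaching the JobMaster.
    // With requeue == false the affected requests are dropped (the reported behaviour);
    // with requeue == true they go back to the pending queue (one possible remedy).
    List<String> taskManagerLost(String taskManagerId, boolean requeue) {
        List<String> affected = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = assignedToTaskManager.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            if (entry.getValue().equals(taskManagerId)) {
                affected.add(entry.getKey());
                it.remove();
            }
        }
        if (requeue) {
            pendingRequests.addAll(affected);
        }
        return affected;
    }

    int pendingCount() {
        return pendingRequests.size();
    }

    public static void main(String[] args) {
        for (boolean requeue : new boolean[] {false, true}) {
            SlotRequestRequeueSketch rm = new SlotRequestRequeueSketch();
            rm.requestSlot("alloc-1");
            rm.registerTaskManager("tm-1");
            rm.taskManagerLost("tm-1", requeue);
            System.out.println("requeue=" + requeue + " pending=" + rm.pendingCount());
        }
    }
}
```

Without re-queueing, the request disappears from the model entirely, which mirrors the description: nothing retries it until the requester's own timeout fires. With re-queueing, the next registering TaskManager would pick the request up again.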



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
