Stefan Richter created FLINK-4141:
-------------------------------------

             Summary: TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
                 Key: FLINK-4141
                 URL: https://issues.apache.org/jira/browse/FLINK-4141
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.0.3
            Reporter: Stefan Richter


In high-availability (HA) mode on Yarn, recovery often fails in the following test scenario:

1. Kill the ApplicationMaster process.
2. While the ApplicationMaster is recovering, kill several randomly chosen
TaskManagers, with some delay between the kills.

After the ApplicationMaster has recovered, not all of the killed TaskManagers
are brought back, and no further attempts are made to restart them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
