Stefan Richter created FLINK-4141:
-------------------------------------

Summary: TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
Key: FLINK-4141
URL: https://issues.apache.org/jira/browse/FLINK-4141
Project: Flink
Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Stefan Richter

High availability on Yarn often fails to recover in the following test scenario:

1. Kill the application master process.
2. Then, while the application master is recovering, randomly kill several task managers (with some delay).

After the application master has recovered, not all of the killed task managers are brought back, and no further attempts are made to restart them.
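
The test scenario can be sketched as a small driver script. This is only an illustration under stated assumptions, not the script used for the original test: the names `KillStep`, `plan_kills`, and `run_plan` are hypothetical, the process IDs must be supplied by the caller (how they are discovered on the cluster nodes is out of scope), all processes are assumed to run on the local host, and SIGKILL is assumed as the way to simulate a crash.

```python
import os
import random
import signal
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class KillStep:
    delay_s: float  # seconds to wait before issuing this kill
    role: str       # "application_master" or "task_manager"
    pid: int


def plan_kills(am_pid: int, tm_pids: Sequence[int], num_tms: int,
               max_delay_s: float, rng: random.Random) -> List[KillStep]:
    """Build the kill sequence: the application master first, then a
    random subset of task managers, each after a random delay."""
    if not 0 <= num_tms <= len(tm_pids):
        raise ValueError("num_tms must be between 0 and len(tm_pids)")
    # Step 1: kill the application master immediately.
    steps = [KillStep(0.0, "application_master", am_pid)]
    # Step 2: while it recovers, kill randomly chosen task managers.
    for pid in rng.sample(list(tm_pids), num_tms):
        steps.append(KillStep(rng.uniform(0.0, max_delay_s), "task_manager", pid))
    return steps


def run_plan(steps: Sequence[KillStep],
             kill: Callable[[int, int], None] = os.kill,
             sleep: Callable[[float], None] = time.sleep) -> None:
    """Execute the plan in order. SIGKILL is an assumption made here to
    simulate a hard crash; the report does not say which signal was used."""
    for step in steps:
        sleep(step.delay_s)
        kill(step.pid, signal.SIGKILL)
```

Calling `run_plan(plan_kills(am_pid, tm_pids, 2, 30.0, random.Random()))` would kill the application master and then two task managers, each up to 30 seconds after the previous kill. Passing a seeded `random.Random` makes a failing sequence reproducible, which may help when the recovery problem only shows up for some orderings.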