[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted
[ https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Lin updated FLINK-22506:
----------------------------
    Attachment: yarn application attempts.png

> YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-22506
>                 URL: https://issues.apache.org/jira/browse/FLINK-22506
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>   Affects Versions: 1.11.3
>            Reporter: Paul Lin
>            Priority: Major
>         Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or inaccessible)
> occurs during the initialization of the job manager, the job cluster exits with
> an error code. But since it does not mark the attempt as failed, the attempt is
> not counted as a failed attempt, and YARN will keep retrying forever.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted
[ https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Lin updated FLINK-22506:
----------------------------
    Attachment: corrupted_savepoint.log

> YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-22506
>                 URL: https://issues.apache.org/jira/browse/FLINK-22506
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>   Affects Versions: 1.11.3
>            Reporter: Paul Lin
>            Priority: Major
>         Attachments: corrupted_savepoint.log
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or inaccessible)
> occurs during the initialization of the job manager, the job cluster exits with
> an error code. But since it does not mark the attempt as failed, the attempt is
> not counted as a failed attempt, and YARN will keep retrying forever.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted
[ https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Knauf updated FLINK-22506:
------------------------------------
    Issue Type: Improvement  (was: Bug)

> YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-22506
>                 URL: https://issues.apache.org/jira/browse/FLINK-22506
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>   Affects Versions: 1.11.3
>            Reporter: Paul Lin
>            Priority: Major
>
> If a non-retryable error (e.g. the savepoint is corrupted or inaccessible)
> occurs during the initialization of the job manager, the job cluster exits with
> an error code. But since it does not mark the attempt as failed, the attempt is
> not counted as a failed attempt, and YARN will keep retrying forever.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
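
The behaviour described in the issue can be illustrated with a toy model. The sketch below is not YARN or Flink code, and it does not claim to reproduce how YARN actually accounts for attempts. It only assumes, as the report states, that a relaunch loop stops once the number of attempts *counted as failed* reaches a limit. All names (`AttemptResult`, `run_with_retries`, `max_attempts`, `safety_cap`) are hypothetical, and `safety_cap` exists only so that the otherwise endless loop terminates in the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AttemptResult:
    exit_code: int         # exit code of the attempt's process
    reported_failed: bool  # whether the attempt was marked as failed


def run_with_retries(
    attempt: Callable[[], AttemptResult],
    max_attempts: int,
    safety_cap: int = 20,
) -> Tuple[int, int, str]:
    """Toy relaunch loop (hypothetical, not YARN's real logic).

    Relaunches `attempt` until it succeeds or until the number of attempts
    counted as failed reaches `max_attempts`. `safety_cap` bounds the number
    of launches so the example cannot loop forever.
    Returns (launches, counted_failures, final_state).
    """
    counted_failures = 0
    launches = 0
    while launches < safety_cap:
        launches += 1
        result = attempt()
        if result.exit_code == 0:
            return launches, counted_failures, "SUCCEEDED"
        # Only attempts explicitly marked as failed count toward the limit.
        if result.reported_failed:
            counted_failures += 1
        if counted_failures >= max_attempts:
            return launches, counted_failures, "FAILED"
    return launches, counted_failures, "STILL_RETRYING"


def corrupted_savepoint_unreported() -> AttemptResult:
    # Exits with an error code but never marks the attempt as failed.
    return AttemptResult(exit_code=1, reported_failed=False)


def corrupted_savepoint_reported() -> AttemptResult:
    # Exits with an error code and marks the attempt as failed.
    return AttemptResult(exit_code=1, reported_failed=True)


if __name__ == "__main__":
    print(run_with_retries(corrupted_savepoint_unreported, max_attempts=2))
    print(run_with_retries(corrupted_savepoint_reported, max_attempts=2))
```

In this model the unreported case never reaches the limit and is only stopped by `safety_cap`, while the reported case stops after `max_attempts` launches. That is the difference the issue asks for: a non-retryable error should be registered as a failed attempt so that the retry limit can take effect.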