[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted

2021-04-30 Thread Paul Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Lin updated FLINK-22506:
-
Attachment: yarn application attempts.png

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.11.3
> Reporter: Paul Lin
> Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or inaccessible) 
> occurs during the initialization of the job manager, the job cluster exits 
> with an error code. But since it does not mark the attempt as failed, the 
> attempt is not counted as a failed attempt, and YARN keeps retrying forever.
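
The retry behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not YARN's or Flink's actual logic: `AttemptResult`, `run_with_retries`, `max_failed_attempts` and `safety_cap` are hypothetical names invented for illustration, and the cap exists only so the example terminates.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AttemptResult:
    """Outcome of one (hypothetical) application attempt."""
    exit_code: int
    counted_as_failure: bool


def run_with_retries(
    launch: Callable[[], AttemptResult],
    max_failed_attempts: int,
    safety_cap: int = 50,
) -> Tuple[int, str]:
    """Toy retry loop: relaunch until success, or until enough attempts
    have been *counted* as failed. Not the real YARN implementation."""
    failed = 0
    launches = 0
    while launches < safety_cap:
        launches += 1
        result = launch()
        if result.exit_code == 0:
            return launches, "succeeded"
        # Only attempts explicitly marked as failed consume the retry budget.
        if result.counted_as_failure:
            failed += 1
            if failed >= max_failed_attempts:
                return launches, "gave up"
    # Reached only because this sketch caps the loop; the report describes
    # retries that never stop.
    return launches, "still retrying"


def uncounted_error() -> AttemptResult:
    # Assumed scenario: non-zero exit code, attempt not marked as failed.
    return AttemptResult(exit_code=1, counted_as_failure=False)


def counted_error() -> AttemptResult:
    # Same error, but the attempt is marked as failed.
    return AttemptResult(exit_code=1, counted_as_failure=True)


if __name__ == "__main__":
    print(run_with_retries(uncounted_error, max_failed_attempts=2))
    print(run_with_retries(counted_error, max_failed_attempts=2))
```

In this model the uncounted error never exhausts the retry budget and the loop runs until the artificial cap, while the counted error stops after `max_failed_attempts` launches.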



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted

2021-04-30 Thread Paul Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Lin updated FLINK-22506:
-
Attachment: corrupted_savepoint.log

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.11.3
> Reporter: Paul Lin
> Priority: Major
> Attachments: corrupted_savepoint.log
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or inaccessible) 
> occurs during the initialization of the job manager, the job cluster exits 
> with an error code. But since it does not mark the attempt as failed, the 
> attempt is not counted as a failed attempt, and YARN keeps retrying forever.





[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted

2021-04-29 Thread Konstantin Knauf (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Knauf updated FLINK-22506:
-
Issue Type: Improvement  (was: Bug)

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Affects Versions: 1.11.3
> Reporter: Paul Lin
> Priority: Major
>
> If a non-retryable error (e.g. the savepoint is corrupted or inaccessible) 
> occurs during the initialization of the job manager, the job cluster exits 
> with an error code. But since it does not mark the attempt as failed, the 
> attempt is not counted as a failed attempt, and YARN keeps retrying forever.


