Adrian Vasiliu created FLINK-25098:
--------------------------------------

             Summary: Jobmanager CrashLoopBackOff in HA configuration
                 Key: FLINK-25098
                 URL: https://issues.apache.org/jira/browse/FLINK-25098
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.13.3, 1.13.2
         Environment: Reproduced with:
* Persistent jobs storage provided by the rocks-cephfs storage class.
* OpenShift 4.9.5.
            Reporter: Adrian Vasiliu


In a Kubernetes deployment of Flink 1.13.2 (also reproduced with Flink 1.13.3), 
enabling Flink HA by running 3 replicas of the jobmanager leads to 
CrashLoopBackOff for all replicas.

Attaching the full logs of the `jobmanager` and `tls-proxy` containers of 
the jobmanager pod:
[^jm-flink-ha-jobmanager-log.txt]
[^jm-flink-ha-tls-proxy-log.txt]

Remarks:
* This is a follow-up of 
https://issues.apache.org/jira/browse/FLINK-22014?focusedCommentId=17450524&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17450524.
 
* Picked Critical severity as HA is critical for our product.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
