Hi Flink Community,

Recently I deployed a Flink cluster (1 JM, 1 TM) in Kubernetes standalone mode. Later on I noticed that the pod the TM is running on gets restarted by k8s very frequently (3 times within 10 minutes), and I didn't see any error in the pod's logs. I tried increasing the container memory in both the flink-conf.yaml file and the k8s yaml file, but that didn't solve the problem either. Are there any other issues that may cause this behavior? My k8s cluster has 5 nodes, each with 4 vCPUs and 16 GB of memory, and the TM is not running any job.
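In case it is useful, I can gather more details with something like the following (the pod name flink-taskmanager-xxxxx is just a placeholder for my actual TM pod) and share the output:

# show restart count and the last termination reason / exit code of the container
kubectl describe pod flink-taskmanager-xxxxx

# logs from the previous (restarted) container instance
kubectl logs flink-taskmanager-xxxxx --previous

# recent events in the namespace (liveness probe failures, OOMKilled, evictions, ...)
kubectl get events --sort-by=.metadata.creationTimestamp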
flink-conf.yaml:

jobmanager.memory.process.size: 1600Mb
jobmanager.rpc.address: flink-test-job-jobmanager-service
blob.server.port: 6124
query.server.port: 6125
taskmanager.memory.process.size: 2048Mb
taskmanager.numberOfTaskSlots: 1
state.backend: filesystem
state.checkpoints.dir: file:///tmp/flink-checkpoints-directory
state.savepoints.dir: file:///tmp/flink-savepoints-directory
heartbeat.interval: 1000
heartbeat.timeout: 5000

TaskManager yaml file:

spec:
  containers:
    - name: taskmanager
      image: ###
      imagePullPolicy: Always
      command: ["taskmanager.sh"]
      args: ["start-foreground"]
      env:
        - name: JOB_MANAGER_RPC_ADDRESS
          value: flink-test-job-jobmanager-service
      resources:
        limits:
          cpu: 4
          memory: "4096Mi"
        requests:
          cpu: 1
          memory: "2048Mi"
      ports:
        - containerPort: 6122
          name: rpc
      livenessProbe:
        tcpSocket:
          port: 6122
        initialDelaySeconds: 30
        periodSeconds: 60
      volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
      securityContext:
        runAsUser: 9999
  volumes:
    - name: flink-config-volume
      configMap:
        name: test-job-config
        items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
          - key: log4j-cli.properties
            path: log4j-cli.properties
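If it helps, I can also pull the last termination state of the TM container directly (the container name "taskmanager" matches the spec above; the pod name is again a placeholder):

# prints reason, exitCode and timestamps of the last terminated instance of the "taskmanager" container
kubectl get pod flink-taskmanager-xxxxx \
  -o jsonpath='{.status.containerStatuses[?(@.name=="taskmanager")].lastState.terminated}'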