Hi Flink Community,

Recently I deployed a Flink cluster (1 JM, 1 TM) in Kubernetes standalone mode. Later on I noticed that the pod the TM is running on gets restarted by k8s very frequently (3 times within 10 minutes), and I didn't see any error in the pod's logs. I tried increasing the container memory in both the flink-conf.yaml file and the k8s yaml file, but that didn't solve the problem either. Are there any other issues that may cause this behavior? My k8s cluster has 5 nodes, each with 4 vCPUs and 16 GB of memory, and the TM is not running any job.
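In case it is useful, I can gather more details with something like the following (the pod name flink-taskmanager-xxxxx is just a placeholder for my actual TM pod) and share the output:

# show restart count and the last termination reason / exit code of the container
kubectl describe pod flink-taskmanager-xxxxx

# logs from the previous (restarted) container instance
kubectl logs flink-taskmanager-xxxxx --previous

# recent events in the namespace (liveness probe failures, OOMKilled, evictions, ...)
kubectl get events --sort-by=.metadata.creationTimestamp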
flink-conf.yaml:

jobmanager.memory.process.size: 1600Mb
jobmanager.rpc.address: flink-test-job-jobmanager-service
blob.server.port: 6124
query.server.port: 6125
taskmanager.memory.process.size: 2048Mb
taskmanager.numberOfTaskSlots: 1
state.backend: filesystem
state.checkpoints.dir: file:///tmp/flink-checkpoints-directory
state.savepoints.dir: file:///tmp/flink-savepoints-directory
heartbeat.interval: 1000
heartbeat.timeout: 5000

TaskManager yaml file:

spec:
  containers:
    - name: taskmanager
      image: ###
      imagePullPolicy: Always
      command: ["taskmanager.sh"]
      args: ["start-foreground"]
      env:
        - name: JOB_MANAGER_RPC_ADDRESS
          value: flink-test-job-jobmanager-service
      resources:
        limits:
          cpu: 4
          memory: "4096Mi"
        requests:
          cpu: 1
          memory: "2048Mi"
      ports:
        - containerPort: 6122
          name: rpc
      livenessProbe:
        tcpSocket:
          port: 6122
        initialDelaySeconds: 30
        periodSeconds: 60
      volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
      securityContext:
        runAsUser: 9999
  volumes:
    - name: flink-config-volume
      configMap:
        name: test-job-config
        items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
          - key: log4j-cli.properties
            path: log4j-cli.properties
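If it helps, I can also pull the last termination state of the TM container directly (the container name "taskmanager" matches the spec above; the pod name is again a placeholder):

# prints reason, exitCode and timestamps of the last terminated instance of the "taskmanager" container
kubectl get pod flink-taskmanager-xxxxx \
  -o jsonpath='{.status.containerStatuses[?(@.name=="taskmanager")].lastState.terminated}'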