Did you use the "jobmanager.sh start-foreground" in your own "run-job-manager.sh", just like what Flink has done in the docker-entrypoint.sh[1]?
I strongly suggest starting the Flink session cluster with the official yamls[2].

[1]. https://github.com/apache/flink-docker/blob/master/1.13/scala_2.11-java11-debian/docker-entrypoint.sh#L114
[2]. https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/standalone/kubernetes/#starting-a-kubernetes-cluster-session-mode

Best,
Yang

Qihua Yang <yang...@gmail.com> wrote on Fri, Oct 1, 2021 at 2:59 AM:

> Looks like after the script *flink-daemon.sh* completes, it returns exit 0.
> Kubernetes regards it as done. Is that expected?
>
> Thanks,
> Qihua
>
> On Thu, Sep 30, 2021 at 11:11 AM Qihua Yang <yang...@gmail.com> wrote:
>
>> Thank you for your reply.
>> From the log, the exit code is 0 and the reason is Completed.
>> Looks like the cluster is fine. But why does Kubernetes restart the pod? As
>> you said, from the perspective of Kubernetes everything is done. Then how do
>> I prevent the restart?
>> It didn't even give me a chance to upload and run a jar....
>>
>>     Ports:          8081/TCP, 6123/TCP, 6124/TCP, 6125/TCP
>>     Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
>>     Command:
>>       /opt/flink/bin/entrypoint.sh
>>     Args:
>>       /opt/flink/bin/run-job-manager.sh
>>     State:          Waiting
>>       Reason:       CrashLoopBackOff
>>     Last State:     Terminated
>>       Reason:       Completed
>>       Exit Code:    0
>>       Started:      Wed, 29 Sep 2021 20:12:30 -0700
>>       Finished:     Wed, 29 Sep 2021 20:12:45 -0700
>>     Ready:          False
>>     Restart Count:  131
>>
>> Thanks,
>> Qihua
>>
>> On Thu, Sep 30, 2021 at 1:00 AM Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> Is the run-job-manager.sh script actually blocking?
>>> Since you (apparently) use that as an entrypoint, if that script exits
>>> after starting the JM, then from the perspective of Kubernetes everything
>>> is done.
>>>
>>> On 30/09/2021 08:59, Matthias Pohl wrote:
>>>
>>> Hi Qihua,
>>> I guess looking into kubectl describe and the JobManager logs would
>>> help in understanding what's going on.
>>>
>>> Best,
>>> Matthias
>>>
>>> On Wed, Sep 29, 2021 at 8:37 PM Qihua Yang <yang...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I deployed Flink in session mode. I didn't run any jobs. I saw the logs
>>>> below. That is normal, the same as the Flink manual shows.
>>>>
>>>>     + /opt/flink/bin/run-job-manager.sh
>>>>     Starting HA cluster with 1 masters.
>>>>     Starting standalonesession daemon on host job-manager-776dcf6dd-xzs8g.
>>>>     Starting taskexecutor daemon on host job-manager-776dcf6dd-xzs8g.
>>>>
>>>> But when I check kubectl, it shows the status is Completed. After a while,
>>>> the status changed to CrashLoopBackOff, and the pod restarted.
>>>>
>>>>     NAME                          READY   STATUS      RESTARTS   AGE
>>>>     job-manager-776dcf6dd-xzs8g   0/1     Completed   5          5m27s
>>>>
>>>>     NAME                          READY   STATUS             RESTARTS   AGE
>>>>     job-manager-776dcf6dd-xzs8g   0/1     CrashLoopBackOff   5          7m35s
>>>>
>>>> Can anyone help me understand why?
>>>> Why does Kubernetes regard this pod as completed and restart it? Should I
>>>> configure something, either on the Flink side or the Kubernetes side? From
>>>> the Flink manual, after the cluster is started, I can upload a jar to run
>>>> the application.
>>>>
>>>> Thanks,
>>>> Qihua
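P.S. If you go with the official yamls from [2], bringing the session cluster up is only a handful of kubectl commands, roughly like the sketch below (file names as listed on the linked 1.14 docs page; please double-check them against [2]):

    # Cluster-wide Flink configuration and logging properties
    kubectl create -f flink-configuration-configmap.yaml
    # Internal service so the TaskManagers can reach the JobManager
    kubectl create -f jobmanager-service.yaml
    # JobManager and TaskManager deployments (session mode, non-HA)
    kubectl create -f jobmanager-session-deployment-non-ha.yaml
    kubectl create -f taskmanager-session-deployment.yaml

Those manifests already run the JobManager and TaskManager in the foreground, so the Completed/CrashLoopBackOff problem does not show up.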