Did you use "jobmanager.sh start-foreground" in your own
"run-job-manager.sh", just as Flink does in its
docker-entrypoint.sh[1]?
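
If not, here is a minimal sketch of what I mean (the script name and path
are just taken from your pod spec, adjust as needed):

    #!/usr/bin/env bash
    # run-job-manager.sh (sketch): keep the JobManager as the container's
    # foreground process so the pod does not exit after startup.
    exec /opt/flink/bin/jobmanager.sh start-foreground

The key point is that the last command blocks. "jobmanager.sh start" only
forks a daemon and returns, so the container's main process finishes with
exit code 0 and Kubernetes treats the pod as completed.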

I strongly suggest starting the Flink session cluster with the official
YAMLs[2].
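
If I remember the linked page correctly, the session-mode example there
boils down to applying a handful of manifests (file names taken from the
docs, please double-check against your Flink version):

    kubectl create -f flink-configuration-configmap.yaml
    kubectl create -f jobmanager-service.yaml
    kubectl create -f jobmanager-session-deployment-non-ha.yaml
    kubectl create -f taskmanager-session-deployment.yaml

Those deployments use the official image, whose entrypoint already runs the
JobManager and TaskManager in the foreground, so they do not hit this
problem.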

[1].
https://github.com/apache/flink-docker/blob/master/1.13/scala_2.11-java11-debian/docker-entrypoint.sh#L114
[2].
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/standalone/kubernetes/#starting-a-kubernetes-cluster-session-mode

Best,
Yang

Qihua Yang <yang...@gmail.com> wrote on Fri, Oct 1, 2021 at 2:59 AM:

> Looks like after the *flink-daemon.sh* script completes, it returns exit 0.
> Kubernetes regards it as done. Is that expected?
>
> Thanks,
> Qihua
>
> On Thu, Sep 30, 2021 at 11:11 AM Qihua Yang <yang...@gmail.com> wrote:
>
>> Thank you for your reply.
>> From the log, the exit code is 0 and the reason is Completed.
>> Looks like the cluster is fine, so why does Kubernetes restart the pod? As
>> you said, from the perspective of Kubernetes everything is done. Then how
>> do I prevent the restart?
>> It didn't even give me a chance to upload and run a jar...
>>
>>     Ports:         8081/TCP, 6123/TCP, 6124/TCP, 6125/TCP
>>     Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
>>     Command:
>>       /opt/flink/bin/entrypoint.sh
>>     Args:
>>       /opt/flink/bin/run-job-manager.sh
>>     State:          Waiting
>>       Reason:       CrashLoopBackOff
>>     Last State:     Terminated
>>       Reason:       Completed
>>       Exit Code:    0
>>       Started:      Wed, 29 Sep 2021 20:12:30 -0700
>>       Finished:     Wed, 29 Sep 2021 20:12:45 -0700
>>     Ready:          False
>>     Restart Count:  131
>>
>> Thanks,
>> Qihua
>>
>> On Thu, Sep 30, 2021 at 1:00 AM Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> Is the run-job-manager.sh script actually blocking?
>>> Since you (apparently) use that as an entrypoint, if that script exits
>>> after starting the JM, then from the perspective of Kubernetes everything
>>> is done.
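>>>
>>> A quick way to check (just a sketch, assuming you can run your image
>>> locally and the script lives at the path from your pod spec):
>>>
>>>     docker run --rm --entrypoint /opt/flink/bin/run-job-manager.sh <your-image>
>>>
>>> If that returns you to the shell right after the "Starting ..." lines
>>> instead of staying attached, the script is not blocking, and Kubernetes
>>> will consider the container finished as soon as it exits.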
>>>
>>> On 30/09/2021 08:59, Matthias Pohl wrote:
>>>
>>> Hi Qihua,
>>> I guess looking into kubectl describe and the JobManager logs would
>>> help in understanding what's going on.
>>>
>>> Best,
>>> Matthias
>>>
>>> On Wed, Sep 29, 2021 at 8:37 PM Qihua Yang <yang...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I deployed Flink in session mode. I didn't run any jobs. I saw the logs
>>>> below. That looks normal, the same as the Flink manual shows.
>>>>
>>>> + /opt/flink/bin/run-job-manager.sh
>>>> Starting HA cluster with 1 masters.
>>>> Starting standalonesession daemon on host job-manager-776dcf6dd-xzs8g.
>>>> Starting taskexecutor daemon on host job-manager-776dcf6dd-xzs8g.
>>>>
>>>>
>>>> But when I check kubectl, it shows the status is Completed. After a
>>>> while, the status changed to CrashLoopBackOff and the pod restarted.
>>>> NAME                          READY   STATUS             RESTARTS   AGE
>>>> job-manager-776dcf6dd-xzs8g   0/1     Completed          5          5m27s
>>>>
>>>> NAME                          READY   STATUS             RESTARTS   AGE
>>>> job-manager-776dcf6dd-xzs8g   0/1     CrashLoopBackOff   5          7m35s
>>>>
>>>> Can anyone help me understand why?
>>>> Why does Kubernetes regard this pod as completed and restart it? Should I
>>>> configure something, either on the Flink side or the Kubernetes side? From
>>>> the Flink manual, after the cluster is started, I should be able to upload
>>>> a jar to run the application.
>>>>
>>>> Thanks,
>>>> Qihua
>>>>
>>>
>>>
