Great to hear that it works on K8s, and thanks for letting us know that the problem is most likely caused by Minikube.
Cheers,
Till

On Fri, Sep 4, 2020 at 8:53 AM superainbower <superainbo...@163.com> wrote:

Hi Till & Yang,
I can deploy Flink on Kubernetes (not Minikube) and it works well, so there is some problem with my Minikube, but I can't find and fix it. Anyway, I can deploy on K8s now. Thanks for your help!

superainbower

On 09/3/2020 15:47, Till Rohrmann <trohrm...@apache.org> wrote:

In order to exclude a Minikube problem, you could also try to run Flink on an older Minikube and an older K8s version. Our end-to-end tests use Minikube v1.8.2, for example.

Cheers,
Till

On Thu, Sep 3, 2020 at 8:44 AM Yang Wang <danrtsey...@gmail.com> wrote:

Sorry, I forgot that the JobManager binds its RPC address to flink-jobmanager, not to the IP address. So you also need to update jobmanager-session-deployment.yaml with the following changes:

...
containers:
- name: jobmanager
  env:
  - name: JM_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP
  image: flink:1.11
  args: ["jobmanager", "$(JM_IP)"]
...

After that, the JobManager binds its RPC address to its pod IP.

Best,
Yang

On Thu, Sep 3, 2020 at 11:38 AM superainbower <superainbo...@163.com> wrote:

Hi Yang,
I updated taskmanager-session-deployment.yaml like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
      - name: regcred

Then I deleted the TaskManager pod and let it restart, but the logs print this:

Could not resolve ResourceManager address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*

So it changed flink-jobmanager to 172.18.0.5, but still cannot connect.

superainbower
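All the failures in this thread come down to whether the JobManager's address is resolvable and reachable from inside the TaskManager pod. A minimal sketch for checking name resolution from a pod shell (assuming the image provides getent, which the Debian-based official Flink image does):

```shell
# Check whether a hostname resolves, e.g. after
# `kubectl exec -ti <taskmanager-pod> -- /bin/bash`.
check_dns() {
  if getent hosts "$1" > /dev/null 2>&1; then
    echo "$1: resolvable"
  else
    echo "$1: unresolvable"
  fi
}

check_dns localhost          # should always resolve
check_dns flink-jobmanager   # the Service name the TaskManager must resolve
```

If the Service name is unresolvable while the pod IP is reachable, the problem is cluster DNS (kube-dns/CoreDNS), not Flink's configuration.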
On 09/3/2020 11:09, Yang Wang <danrtsey...@gmail.com> wrote:

I guess something is wrong with your kube proxy, which causes the TaskManager to be unable to connect to the JobManager. You could verify this by directly using the JobManager pod IP instead of the service name.

Please do as follows:
* Edit the TaskManager deployment (via kubectl edit deployment flink-taskmanager) and update the args field to the following:
  args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
  given that "172.18.0.5" is the JobManager pod IP.
* Delete the current TaskManager pod and let it restart.
* Now check the TaskManager logs to see whether it could register successfully.

Best,
Yang

On Thu, Sep 3, 2020 at 9:35 AM superainbower <superainbo...@163.com> wrote:

Hi Till,
I found something that may be helpful. The Kubernetes dashboard shows the jobmanager IP as 172.18.0.5 and the taskmanager IP as 172.18.0.6.
When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash' and then 'ping 172.18.0.5', I get a response. But when I ping flink-jobmanager, there is no response.

superainbower

On 09/3/2020 09:03, superainbower <superainbo...@163.com> wrote:

Hi Till,
This is the TaskManager log. As you can see, the logs print 'line 92 -- Could not connect to flink-jobmanager:6123', then print 'line 128 -- Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.', and this repeats. A few minutes later, the TaskManager shuts down and restarts.

These are my yaml files. Could you help me confirm whether I omitted something? Thanks a lot!

---------------------------------------------------
flink-configuration-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1024m
    taskmanager.memory.process.size: 1024m
    parallelism.default: 1
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
    logger.akka.name = akka
    logger.akka.level = INFO
    logger.kafka.name = org.apache.kafka
    logger.kafka.level = INFO
    logger.hadoop.name = org.apache.hadoop
    logger.hadoop.level = INFO
    logger.zookeeper.name = org.apache.zookeeper
    logger.zookeeper.level = INFO
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.name = RollingFileAppender
    appender.rolling.type = RollingFile
    appender.rolling.append = false
    appender.rolling.fileName = ${sys:log.file}
    appender.rolling.filePattern = ${sys:log.file}.%i
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.policies.type = Policies
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size = 100MB
    appender.rolling.strategy.type = DefaultRolloverStrategy
    appender.rolling.strategy.max = 10
    logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
    logger.netty.level = OFF

---------------------------------------------------
jobmanager-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: ClusterIP
  ports:
  - name: rpc
    port: 6123
  - name: blob-server
    port: 6124
  - name: webui
    port: 8081
  selector:
    app: flink
    component: jobmanager

---------------------------------------------------
jobmanager-session-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
      - name: regcred

---------------------------------------------------
taskmanager-session-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
      - name: regcred

superainbower

On 09/2/2020 20:38, Till Rohrmann <trohrm...@apache.org> wrote:

Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally you set the log level to debug. Thanks a lot.
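For reference, raising the log level here means editing the log4j-console.properties entry in the flink-config ConfigMap and restarting the pods; only the root logger level changes. A sketch of the relevant excerpt:

```yaml
# flink-configuration-configmap.yaml (excerpt): raise the root log level
# so the TaskManager logs show the registration attempts in detail.
data:
  log4j-console.properties: |+
    rootLogger.level = DEBUG
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
```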
Cheers,
Till

On Wed, Sep 2, 2020 at 12:45 PM art <superainbo...@163.com> wrote:

Hi Till,

The full output when I run 'kubectl get all' looks like this:

NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-jobmanager    1/1     1            1           2m34s
deployment.apps/flink-taskmanager   1/1     1            1           2m34s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s

And I can open the Flink UI, but the number of TaskManagers is 0, so the JobManager works well. I think the problem is that the TaskManager cannot register itself with the JobManager. Did I miss some configuration?

On Sep 2, 2020, at 5:24 PM, Till Rohrmann <trohrm...@apache.org> wrote:

Hi art,

could you check what `kubectl get services` returns? Usually if you run `kubectl get all` you should also see the services, but in your case there were no services listed. You should see something like service/flink-jobmanager; otherwise the flink-jobmanager service (K8s service) is not running.

Cheers,
Till

On Wed, Sep 2, 2020 at 11:15 AM art <superainbo...@163.com> wrote:

Hi Till,

I'm sure the jobmanager-service is started; I can find it in the Kubernetes dashboard.

When I run 'kubectl get deployment' I get this:
flink-jobmanager    1/1   1   1   33s
flink-taskmanager   1/1   1   1   33s

When I run 'kubectl get all' I get this:
NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s

So I think flink-jobmanager works well, but the taskmanager is restarted every few minutes.

My Minikube version: v1.12.3
Flink version: v1.11.1

On Sep 2, 2020, at 4:27 PM, Till Rohrmann <trohrm...@apache.org> wrote:

Hi art,

could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It would also help to know the Minikube and K8s versions you are using.

Cheers,
Till

On Wed, Sep 2, 2020 at 9:50 AM art <superainbo...@163.com> wrote:

Hi, I'm trying to deploy Flink on Minikube following https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html:

kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml

But I got this:

2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
2020-09-02 06:45:42,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:02,731 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor [] - No response from remote for outbound association. Associate timed out after [20000 ms].

And when I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager', I find I cannot ping flink-jobmanager from the taskmanager.

I am new to K8s, can anyone point me to a tutorial? Thanks a lot!
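One caveat worth noting about the ping tests in this thread: a ClusterIP Service is a virtual IP implemented by kube-proxy rules, so it typically does not answer ICMP pings even when it is healthy; only pod IPs do. A TCP connect test against the actual RPC port is more meaningful. A minimal sketch (pure shell; /dev/tcp is a bash feature, and the hostname/port are the ones from this thread):

```shell
# TCP reachability probe; run inside a pod, e.g. via kubectl exec.
tcp_check() {
  if (exec 3<> "/dev/tcp/$1/$2") 2> /dev/null; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}

tcp_check flink-jobmanager 6123   # JobManager RPC port, from inside the cluster
```

Run from outside the cluster (or with broken cluster DNS, as here) this reports unreachable; from a pod with working DNS and a live JobManager it reports reachable.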