To rule out a Minikube problem, you could also try running Flink on an older Minikube and an older K8s version. Our end-to-end tests use Minikube v1.8.2, for example.
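A command sketch of this suggestion (the pinned version below is a placeholder, not from the thread; `minikube start --kubernetes-version` accepts any supported release):

```shell
# Start over with an older Kubernetes release to rule out a version issue.
# A minikube binary can run Kubernetes versions older than itself.
minikube delete
minikube start --kubernetes-version=v1.15.12

# Confirm which client/server versions you ended up with.
kubectl version --short
```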
Cheers,
Till

On Thu, Sep 3, 2020 at 8:44 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Sorry, I forgot that the JobManager binds its rpc address to
> flink-jobmanager, not to the pod IP. So you also need to update
> jobmanager-session-deployment.yaml with the following changes.
>
> ...
>       containers:
>       - name: jobmanager
>         env:
>         - name: JM_IP
>           valueFrom:
>             fieldRef:
>               apiVersion: v1
>               fieldPath: status.podIP
>         image: flink:1.11
>         args: ["jobmanager", "$(JM_IP)"]
> ...
>
> After that, the JobManager binds its rpc address to its pod IP.
>
> Best,
> Yang
>
> superainbower <superainbo...@163.com> wrote on Thu, Sep 3, 2020 at 11:38 AM:
>
>> Hi Yang,
>> I updated taskmanager-session-deployment.yaml like this:
>>
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>>   name: flink-taskmanager
>> spec:
>>   replicas: 1
>>   selector:
>>     matchLabels:
>>       app: flink
>>       component: taskmanager
>>   template:
>>     metadata:
>>       labels:
>>         app: flink
>>         component: taskmanager
>>     spec:
>>       containers:
>>       - name: taskmanager
>>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>         args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
>>         ports:
>>         - containerPort: 6122
>>           name: rpc
>>         - containerPort: 6125
>>           name: query-state
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6122
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>         volumeMounts:
>>         - name: flink-config-volume
>>           mountPath: /opt/flink/conf/
>>         securityContext:
>>           runAsUser: 9999  # refers to user _flink_ from the official flink image, change if necessary
>>       volumes:
>>       - name: flink-config-volume
>>         configMap:
>>           name: flink-config
>>           items:
>>           - key: flink-conf.yaml
>>             path: flink-conf.yaml
>>           - key: log4j-console.properties
>>             path: log4j-console.properties
>>       imagePullSecrets:
>>       - name: regcred
>>
>> Then I deleted the TaskManager pod and let it restart, but the logs print this:
>>
>> Could not resolve ResourceManager address
>> akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms:
>> Could not connect to rpc endpoint under address
>> akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*
>>
>> It changed flink-jobmanager to 172.18.0.5.
>>
>> superainbower
>> superainbo...@163.com
>>
>> On 09/3/2020 11:09, Yang Wang <danrtsey...@gmail.com> wrote:
>>
>> I guess something is wrong with your kube proxy, which causes the
>> TaskManager to be unable to connect to the JobManager.
>> You could verify this by directly using the JobManager pod IP instead of
>> the service name.
>>
>> Please do as follows.
>> * Edit the TaskManager deployment (via kubectl edit deployment
>>   flink-taskmanager) and update the args field to the following,
>>   args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
>>   given that "172.18.0.5" is the JobManager pod IP.
>> * Delete the current TaskManager pod and let it restart.
>> * Then check the TaskManager logs to see whether it registers
>>   successfully.
>>
>> Best,
>> Yang
>>
>> superainbower <superainbo...@163.com> wrote on Thu, Sep 3, 2020 at 9:35 AM:
>>
>>> Hi Till,
>>> I found something that may be helpful.
>>> The Kubernetes dashboard shows the job-manager IP as 172.18.0.5 and the
>>> task-manager IP as 172.18.0.6.
>>> When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash'
>>> and then 'ping 172.18.0.5', I get a response.
>>> But when I ping flink-jobmanager, there is no response.
>>>
>>> superainbower
>>> superainbo...@163.com
>>>
>>> On 09/3/2020 09:03, superainbower <superainbo...@163.com> wrote:
>>>
>>> Hi Till,
>>> This is the TaskManager log.
>>> As you can see, it prints (line 92) 'Could not connect to
>>> flink-jobmanager:6123', then prints (line 128) 'Could not resolve
>>> ResourceManager address
>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.'
>>> and keeps repeating this.
>>>
>>> A few minutes later, the TaskManager shuts down and restarts.
>>>
>>> These are my yaml files; could you help me confirm whether I
>>> omitted something? Thanks a lot!
>>> ---------------------------------------------------
>>> flink-configuration-configmap.yaml
>>>
>>> apiVersion: v1
>>> kind: ConfigMap
>>> metadata:
>>>   name: flink-config
>>>   labels:
>>>     app: flink
>>> data:
>>>   flink-conf.yaml: |+
>>>     jobmanager.rpc.address: flink-jobmanager
>>>     taskmanager.numberOfTaskSlots: 1
>>>     blob.server.port: 6124
>>>     jobmanager.rpc.port: 6123
>>>     taskmanager.rpc.port: 6122
>>>     queryable-state.proxy.ports: 6125
>>>     jobmanager.memory.process.size: 1024m
>>>     taskmanager.memory.process.size: 1024m
>>>     parallelism.default: 1
>>>   log4j-console.properties: |+
>>>     rootLogger.level = INFO
>>>     rootLogger.appenderRef.console.ref = ConsoleAppender
>>>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>>>     logger.akka.name = akka
>>>     logger.akka.level = INFO
>>>     logger.kafka.name = org.apache.kafka
>>>     logger.kafka.level = INFO
>>>     logger.hadoop.name = org.apache.hadoop
>>>     logger.hadoop.level = INFO
>>>     logger.zookeeper.name = org.apache.zookeeper
>>>     logger.zookeeper.level = INFO
>>>     appender.console.name = ConsoleAppender
>>>     appender.console.type = CONSOLE
>>>     appender.console.layout.type = PatternLayout
>>>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>>     appender.rolling.name = RollingFileAppender
>>>     appender.rolling.type = RollingFile
>>>     appender.rolling.append = false
>>>     appender.rolling.fileName = ${sys:log.file}
>>>     appender.rolling.filePattern = ${sys:log.file}.%i
>>>     appender.rolling.layout.type = PatternLayout
>>>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>>     appender.rolling.policies.type = Policies
>>>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>>>     appender.rolling.policies.size.size = 100MB
>>>     appender.rolling.strategy.type = DefaultRolloverStrategy
>>>     appender.rolling.strategy.max = 10
>>>     logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>>>     logger.netty.level = OFF
>>> ---------------------------------------------------
>>> jobmanager-service.yaml
>>>
>>> apiVersion: v1
>>> kind: Service
>>> metadata:
>>>   name: flink-jobmanager
>>> spec:
>>>   type: ClusterIP
>>>   ports:
>>>   - name: rpc
>>>     port: 6123
>>>   - name: blob-server
>>>     port: 6124
>>>   - name: webui
>>>     port: 8081
>>>   selector:
>>>     app: flink
>>>     component: jobmanager
>>> ---------------------------------------------------
>>> jobmanager-session-deployment.yaml
>>>
>>> apiVersion: apps/v1
>>> kind: Deployment
>>> metadata:
>>>   name: flink-jobmanager
>>> spec:
>>>   replicas: 1
>>>   selector:
>>>     matchLabels:
>>>       app: flink
>>>       component: jobmanager
>>>   template:
>>>     metadata:
>>>       labels:
>>>         app: flink
>>>         component: jobmanager
>>>     spec:
>>>       containers:
>>>       - name: jobmanager
>>>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>>         args: ["jobmanager"]
>>>         ports:
>>>         - containerPort: 6123
>>>           name: rpc
>>>         - containerPort: 6124
>>>           name: blob-server
>>>         - containerPort: 8081
>>>           name: webui
>>>         livenessProbe:
>>>           tcpSocket:
>>>             port: 6123
>>>           initialDelaySeconds: 30
>>>           periodSeconds: 60
>>>         volumeMounts:
>>>         - name: flink-config-volume
>>>           mountPath: /opt/flink/conf
>>>         securityContext:
>>>           runAsUser: 9999  # refers to user _flink_ from the official flink image, change if necessary
>>>       volumes:
>>>       - name: flink-config-volume
>>>         configMap:
>>>           name: flink-config
>>>           items:
>>>           - key: flink-conf.yaml
>>>             path: flink-conf.yaml
>>>           - key: log4j-console.properties
>>>             path: log4j-console.properties
>>>       imagePullSecrets:
>>>       - name: regcred
>>> ---------------------------------------------------
>>> taskmanager-session-deployment.yaml
>>>
>>> apiVersion: apps/v1
>>> kind: Deployment
>>> metadata:
>>>   name: flink-taskmanager
>>> spec:
>>>   replicas: 1
>>>   selector:
>>>     matchLabels:
>>>       app: flink
>>>       component: taskmanager
>>>   template:
>>>     metadata:
>>>       labels:
>>>         app: flink
>>>         component: taskmanager
>>>     spec:
>>>       containers:
>>>       - name: taskmanager
>>>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>>         args: ["taskmanager"]
>>>         ports:
>>>         - containerPort: 6122
>>>           name: rpc
>>>         - containerPort: 6125
>>>           name: query-state
>>>         livenessProbe:
>>>           tcpSocket:
>>>             port: 6122
>>>           initialDelaySeconds: 30
>>>           periodSeconds: 60
>>>         volumeMounts:
>>>         - name: flink-config-volume
>>>           mountPath: /opt/flink/conf/
>>>         securityContext:
>>>           runAsUser: 9999  # refers to user _flink_ from the official flink image, change if necessary
>>>       volumes:
>>>       - name: flink-config-volume
>>>         configMap:
>>>           name: flink-config
>>>           items:
>>>           - key: flink-conf.yaml
>>>             path: flink-conf.yaml
>>>           - key: log4j-console.properties
>>>             path: log4j-console.properties
>>>       imagePullSecrets:
>>>       - name: regcred
>>>
>>> superainbower
>>> superainbo...@163.com
>>>
>>> On 09/2/2020 20:38, Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>> Hmm, this is indeed strange. Could you share the logs of the TaskManager
>>> with us? Ideally, set the log level to debug. Thanks a lot.
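To raise the log level as suggested, only the first line of the `log4j-console.properties` block in the configmap from this thread needs to change (a fragment; the remaining appender/logger lines stay as they are):

```yaml
  log4j-console.properties: |+
    rootLogger.level = DEBUG
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
    # ... remaining lines unchanged ...
```

After re-applying the configmap, the TaskManager pod has to be deleted and recreated for the mounted file to take effect.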
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Sep 2, 2020 at 12:45 PM art <superainbo...@163.com> wrote:
>>>
>>>> Hi Till,
>>>>
>>>> The full output when I run 'kubectl get all' looks like this:
>>>>
>>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
>>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s
>>>>
>>>> NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
>>>> service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
>>>> service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h
>>>>
>>>> NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
>>>> deployment.apps/flink-jobmanager    1/1     1            1           2m34s
>>>> deployment.apps/flink-taskmanager   1/1     1            1           2m34s
>>>>
>>>> NAME                                           DESIRED   CURRENT   READY   AGE
>>>> replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
>>>> replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s
>>>>
>>>> And I can open the Flink UI, but the number of task managers is 0, so the
>>>> job manager itself works well.
>>>> I think the problem is that the taskmanager cannot register itself with
>>>> the jobmanager. Did I miss some configuration?
>>>>
>>>> On Sep 2, 2020, at 5:24 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>
>>>> Hi art,
>>>>
>>>> could you check what `kubectl get services` returns? Usually if you run
>>>> `kubectl get all` you should also see the services. But in your case there
>>>> are no services listed. You should see something like
>>>> service/flink-jobmanager; otherwise the flink-jobmanager service (K8s
>>>> service) is not running.
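A sketch of that check, with an endpoints lookup added as an extra step not from the thread (an empty ENDPOINTS column would mean the Service's selector matches no running pod):

```shell
# Does the Service object exist at all?
kubectl get services flink-jobmanager

# Does it actually select the JobManager pod?
# An empty ENDPOINTS column means the selector
# (app: flink, component: jobmanager) matches no ready pod.
kubectl get endpoints flink-jobmanager
```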
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Sep 2, 2020 at 11:15 AM art <superainbo...@163.com> wrote:
>>>>
>>>>> Hi Till,
>>>>>
>>>>> I'm sure the jobmanager-service is started; I can find it in the
>>>>> Kubernetes dashboard.
>>>>>
>>>>> When I run 'kubectl get deployment' I get this:
>>>>> flink-jobmanager    1/1   1   1   33s
>>>>> flink-taskmanager   1/1   1   1   33s
>>>>>
>>>>> When I run 'kubectl get all' I get this:
>>>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
>>>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s
>>>>>
>>>>> So I think flink-jobmanager works well, but the taskmanager restarts
>>>>> every few minutes.
>>>>>
>>>>> My minikube version: v1.12.3
>>>>> Flink version: v1.11.1
>>>>>
>>>>> On Sep 2, 2020, at 4:27 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>>
>>>>> Hi art,
>>>>>
>>>>> could you verify that the jobmanager-service has been started? It
>>>>> looks as if the name flink-jobmanager is not resolvable. It would also
>>>>> help to know the Minikube and K8s versions you are using.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Wed, Sep 2, 2020 at 9:50 AM art <superainbo...@163.com> wrote:
>>>>>
>>>>>> Hi, I'm deploying Flink on minikube, following
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html:
>>>>>>
>>>>>> kubectl create -f flink-configuration-configmap.yaml
>>>>>> kubectl create -f jobmanager-service.yaml
>>>>>> kubectl create -f jobmanager-session-deployment.yaml
>>>>>> kubectl create -f taskmanager-session-deployment.yaml
>>>>>>
>>>>>> But I got this:
>>>>>>
>>>>>> 2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor [] -
>>>>>> Association with remote system [akka.tcp://flink@flink-jobmanager:6123]
>>>>>> has failed, address is now gated for [50] ms.
>>>>>> Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]]
>>>>>> Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary
>>>>>> failure in name resolution]
>>>>>> 2020-09-02 06:45:42,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] -
>>>>>> Could not resolve ResourceManager address
>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>>> 2020-09-02 06:46:02,731 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] -
>>>>>> Could not resolve ResourceManager address
>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>>> 2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor [] -
>>>>>> No response from remote for outbound association. Associate timed out
>>>>>> after [20000 ms].
>>>>>>
>>>>>> And when I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash'
>>>>>> and then 'ping flink-jobmanager', I find I cannot ping flink-jobmanager
>>>>>> from the taskmanager.
>>>>>>
>>>>>> I am new to k8s; can anyone point me to a tutorial? Thanks a lot!
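The UnknownHostException above points at cluster DNS rather than at Flink itself. A few commands to probe it (a sketch: the pod name is the one from this thread, adjust to yours, and it assumes nslookup/bash are present in the Flink image; note that a ClusterIP service often does not answer ping even when healthy, so name resolution plus a TCP check is the meaningful test):

```shell
# Is the cluster DNS (kube-dns / CoreDNS) running at all?
kubectl -n kube-system get pods -l k8s-app=kube-dns

# Can the TaskManager pod resolve the service name?
kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- nslookup flink-jobmanager

# Is the JobManager rpc port actually reachable by name?
# (uses bash's /dev/tcp pseudo-device inside the container)
kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- \
  bash -c 'timeout 3 bash -c "</dev/tcp/flink-jobmanager/6123" && echo reachable'
```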