TaskManager的Slot的释放时机
各位大佬好,请教一个问题。 我根据ResourceID主动释放TM的链接的时候,我发现TM对应的Slots仅仅是标记为free。 而其真正是释放却要等到JobMaster主动cancel整个ExecuteGraph的时候,此时会逐个调用每个定点所在的slot的TM的cancel方法。 但是此时相关联的TM已经close掉,触发了rpc超时,默认20s。然后slot才会被释放。 我的问题是:为什么不在调用TaskExecutor的cancelTask之间判断下TM是否存活,如果不存活就直接走cancel的流程,不用等rpc超时后,才进行下一步??? 附上日志截图: | | johnjlong | | johnjl...@163.com | 签名由网易邮箱大师定制
K8s模式指定-C参数不生效
大佬们 我部署作业到k8s集群,有多个作业共享一个外部的jar的情况。外部的jar的通过共享存储,每个JM、TM都能访问。 我使用-C指定依赖的classpath。但是依然找不到对应的class。问题可能出在哪里呢?? 提交命令如下: bin/flink run-application -t kubernetes-application -Dkubernetes.cluster-id=demo -Dkubernetes.container.image=devhub.baymax.oppo.local/flink-browser-cluster/flink:1.13.2-scala_2.11-java8-centos-v1.0.0 -Dkubernetes.container.image.pull-policy=Always -Dkubernetes.rest-service.exposed.type=NodePort -C file:///home/service/var/chubaofs/flink/user/usrlib/demo/httpclient-4.5.2.jar -c com.martin.RandomDemo -p 2 local:///opt/flink/user/jars/flink-demo-1.0-without-http-SNAPSHOT.jar 部署截图: 错误截图: | | johnjlong | | johnjl...@163.com | 签名由网易邮箱大师定制
托管内存为什么不能够指定最小或者最大值?
大佬们,托管内存为什么不能够指定最小或者最大值? 还是说 taskmanager.memory.managed.fraction 计算出来的就是最大值? | | johnjlong | | johnjl...@163.com | 签名由网易邮箱大师定制
回复:flink on native k8s 无法查看火焰图问题
开启这个参数:rest.flamegraph.enabled: true | | johnjlong | | johnjl...@163.com | 签名由网易邮箱大师定制 在2021年9月14日 11:29,赵旭晨 写道: Unable to load requested file /jobs/d2fcac59f4a42ad17ceba8c5371862bb/。。 请问这是什么原因啊,恳请大佬解惑 镜像版本flink:1.13.2-scala_2.12-java8
回复:flink k8s部署使用s3做HA问题
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.s3.model.AmazonS3Exception | | johnjlong | | johnjl...@163.com | 签名由网易邮箱大师定制 在2021年7月27日 15:18,maker_d...@foxmail.com 写道: 各位开发者: 大家好! 我在使用flink native Kubernetes方式部署,使用minio做文件系统,配置如下: state.backend: filesystem fs.allowed-fallback-filesystems: s3 s3.endpoint: http://172.16.14.40:9000 s3.path-style: true s3.access-key: admin s3.secret-key: admin123 containerized.master.env.ENABLE_BUILT_IN_PLUGINS: flink-s3-fs-presto-1.12.4.jar containerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS: flink-s3-fs-presto-1.12.4.jar minio使用正常。 随后根据官方文档设置了HA,配置如下: kubernetes.cluster-id: flink-sessoion high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory high-availability.storageDir: s3:///flink/recovery flink-session正常部署,但在提交作业时报错如下: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph. at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:366) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219) at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812) at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246) at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054) at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132) at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132) Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph. at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:316) at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) at org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:129) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:93) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349) ... 8 more Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph. at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) ... 16 more Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph. at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:400) at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:364) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.runtime.rest.util.RestClientException: [org.apache.flink.runtime.rest.handler.RestHandlerException: Could not upload job files. at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$uploadJobGraphFiles$4(JobSubmitHandler.java:201) at java.util.concurrent.CompletableFuture.biApply(CompletableFuture.java:1119