TaskManager的Slot的释放时机

2022-01-24 文章 johnjlong
各位大佬好,请教一个问题。
我根据ResourceID主动释放TM的链接的时候,我发现TM对应的Slots仅仅是标记为free。
而其真正是释放却要等到JobMaster主动cancel整个ExecuteGraph的时候,此时会逐个调用每个定点所在的slot的TM的cancel方法。
但是此时相关联的TM已经close掉,触发了rpc超时,默认20s。然后slot才会被释放。


我的问题是:为什么不在调用TaskExecutor的cancelTask之间判断下TM是否存活,如果不存活就直接走cancel的流程,不用等rpc超时后,才进行下一步???


附上日志截图:


| |
johnjlong
|
|
johnjl...@163.com
|
签名由网易邮箱大师定制

K8s模式指定-C参数不生效

2022-01-05 文章 johnjlong


大佬们
我部署作业到k8s集群,有多个作业共享一个外部的jar的情况。外部的jar的通过共享存储,每个JM、TM都能访问。
我使用-C指定依赖的classpath。但是依然找不到对应的class。问题可能出在哪里呢??
提交命令如下:


bin/flink run-application -t kubernetes-application 
-Dkubernetes.cluster-id=demo 
-Dkubernetes.container.image=devhub.baymax.oppo.local/flink-browser-cluster/flink:1.13.2-scala_2.11-java8-centos-v1.0.0
 -Dkubernetes.container.image.pull-policy=Always 
-Dkubernetes.rest-service.exposed.type=NodePort -C 
file:///home/service/var/chubaofs/flink/user/usrlib/demo/httpclient-4.5.2.jar 
-c com.martin.RandomDemo -p 2 
local:///opt/flink/user/jars/flink-demo-1.0-without-http-SNAPSHOT.jar





部署截图:


错误截图:


| |
johnjlong
|
|
johnjl...@163.com
|
签名由网易邮箱大师定制

托管内存为什么不能够指定最小或者最大值?

2021-12-21 文章 johnjlong
大佬们,托管内存为什么不能够指定最小或者最大值?
还是说 taskmanager.memory.managed.fraction 计算出来的就是最大值?


| |
johnjlong
|
|
johnjl...@163.com
|
签名由网易邮箱大师定制

回复:flink on native k8s 无法查看火焰图问题

2021-09-13 文章 johnjlong
开启这个参数:rest.flamegraph.enabled: true


| |
johnjlong
|
|
johnjl...@163.com
|
签名由网易邮箱大师定制
在2021年9月14日 11:29,赵旭晨 写道:
Unable to load requested file /jobs/d2fcac59f4a42ad17ceba8c5371862bb/。。


请问这是什么原因啊,恳请大佬解惑
镜像版本flink:1.13.2-scala_2.12-java8




 

回复:flink k8s部署使用s3做HA问题

2021-07-27 文章 johnjlong
Caused by: java.lang.ClassNotFoundException: 
com.amazonaws.services.s3.model.AmazonS3Exception


| |
johnjlong
|
|
johnjl...@163.com
|
签名由网易邮箱大师定制
在2021年7月27日 15:18,maker_d...@foxmail.com 写道:
各位开发者:
大家好!

我在使用flink native Kubernetes方式部署,使用minio做文件系统,配置如下:
state.backend: filesystem
fs.allowed-fallback-filesystems: s3
s3.endpoint: http://172.16.14.40:9000
s3.path-style: true
s3.access-key: admin
s3.secret-key: admin123
containerized.master.env.ENABLE_BUILT_IN_PLUGINS: flink-s3-fs-presto-1.12.4.jar
containerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS: 
flink-s3-fs-presto-1.12.4.jar
minio使用正常。

随后根据官方文档设置了HA,配置如下:
kubernetes.cluster-id: flink-sessoion
high-availability: 
org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3:///flink/recovery

flink-session正常部署,但在提交作业时报错如下:
org.apache.flink.client.program.ProgramInvocationException: The main method 
caused an error: java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
JobGraph.
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:366)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at 
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
JobGraph.
at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:316)
at 
org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061)
at 
org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:129)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
... 8 more
Caused by: java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
JobGraph.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
... 16 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
submit JobGraph.
at 
org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:400)
at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at 
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:364)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.rest.util.RestClientException: 
[org.apache.flink.runtime.rest.handler.RestHandlerException: Could not upload 
job files.
at 
org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$uploadJobGraphFiles$4(JobSubmitHandler.java:201)
at java.util.concurrent.CompletableFuture.biApply(CompletableFuture.java:1119