Re: Flink docker on k8s job submission timeout

Chesnay Schepler Thu, 11 Nov 2021 05:54:24 -0800

You should be able to increase the timeout by setting client.timeout.
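A minimal sketch, assuming the option is set in the same $FLINK_HOME/conf/flink-conf.yaml that the custom image already prepares in its pre-start step. The [60000ms] in the log is consistent with the 1 min default of client.timeout. The helper class `SetClientTimeout` is hypothetical. The `/opt/flink` fallback for FLINK_HOME and the example value `300 s` are assumptions, not values taken from this thread. The helper replaces an existing top-level `client.timeout:` line, or appends one, before the entrypoint is exec'd.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-start helper: makes sure flink-conf.yaml carries a
 * client.timeout entry before /docker-entrypoint.sh is exec'd.
 */
public class SetClientTimeout {

    /** Replaces any "client.timeout:" line (commented lines are left alone), else appends one. */
    public static void setClientTimeout(Path conf, String value) throws IOException {
        List<String> lines = Files.exists(conf)
                ? new ArrayList<>(Files.readAllLines(conf, StandardCharsets.UTF_8))
                : new ArrayList<>();
        String entry = "client.timeout: " + value;
        boolean replaced = false;
        for (int i = 0; i < lines.size(); i++) {
            // "# client.timeout: ..." starts with '#', so comments are not matched
            if (lines.get(i).trim().startsWith("client.timeout:")) {
                lines.set(i, entry);
                replaced = true;
            }
        }
        if (!replaced) {
            lines.add(entry);
        }
        Files.write(conf, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Path layout from the original post: $FLINK_HOME/conf/flink-conf.yaml
        // (the /opt/flink fallback is an assumption about the image)
        String flinkHome = System.getenv().getOrDefault("FLINK_HOME", "/opt/flink");
        Path conf = Paths.get(flinkHome, "conf", "flink-conf.yaml");
        // Example value only; pick a timeout that fits your submission time
        String value = args.length > 0 ? args[0] : "300 s";
        setClientTimeout(conf, value);
    }
}
```

In the custom image this would run as one of the pre-start actions, for example `java SetClientTimeout.java "300 s"` on Java 11+, before `exec /docker-entrypoint.sh standalone-job "$1"`. Adding the same `client.timeout: 300 s` line directly to the baked-in flink-conf.yaml would work just as well. The helper only matters when the file is generated at start-up.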


On 10/11/2021 15:32, dhanesh arole wrote:

Hello all,
We are trying to run a Flink job in standalone mode using the official docker image on k8s. As per this documentation <https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#advanced-customization> we have created our custom docker image, which extends the official image, performs some pre-start actions, and finally runs `exec /docker-entrypoint.sh standalone-job "$1"` to start the job manager. We have ensured that flink-conf.yaml is present at the expected path, i.e. $FLINK_HOME/conf/flink-conf.yaml, and have set JOB_MANAGER_RPC_ADDRESS from the pod IP.
We submit our job for execution in the application's main thread using `StreamExecutionEnvironment#executeAsync`. But while submitting the job we consistently get an AskTimeout exception from dispatcher#SubmitJob (see logs below).
Based on some previous answers on mailing lists and issues, we tried increasing "web.timeout" and "akka.ask.timeout", but neither helped. It seems like the timeout value used for this particular future is hardcoded somewhere in the code. It would be great if someone could provide some help or pointers on what we are missing or what we should check.
Error logs:
Caused by: java.util.concurrent.TimeoutException: Invocation of public abstract java.util.concurrent.CompletableFuture org.apache.flink.runtime.dispatcher.DispatcherGateway.submitJob(org.apache.flink.runtime.jobgraph.JobGraph,org.apache.flink.api.common.time.Time) timed out.
    at org.apache.flink.runtime.rpc.akka.$Proxy31.submitJob(Unknown Source) ~[?:1.13.2]
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$submitJob$6(EmbeddedExecutor.java:183) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:?]
    at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?
    ...
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/rpc/dispatcher_1#2019478781]] after [60000ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:650) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328) ~[flink.jar:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:279) ~[flink.jar:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:283) ~[flink.jar:?]
-
Dhanesh Arole
