You should be able to increase the timeout by setting client.timeout.
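Here is a minimal sketch of applying this in a setup like the one described below, where a custom entrypoint runs pre-start actions before handing off to the official one. It writes `client.timeout` into flink-conf.yaml before the final `exec`. The 60000 ms in the trace would be consistent with a 60 s default for `client.timeout`, which would also explain why raising `akka.ask.timeout` and `web.timeout` made no difference. The `set_flink_conf` helper is hypothetical (it is not part of the official image) and assumes `FLINK_HOME` is set, as it is in the official image.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the official Flink image: set one "key: value"
# entry in $FLINK_HOME/conf/flink-conf.yaml, replacing an existing top-level entry
# for that key or appending a new one. Aborts the script if FLINK_HOME is unset.
set_flink_conf() {
  local key="$1" value="$2"
  local conf="${FLINK_HOME:?FLINK_HOME must be set}/conf/flink-conf.yaml"
  local tmp
  tmp="$(mktemp)" || return 1
  # Lines starting with "key:" are rewritten; commented-out entries are left alone.
  awk -v k="$key" -v v="$value" '
    index($0, k ":") == 1 { print k ": " v; found = 1; next }
    { print }
    END { if (!found) print k ": " v }
  ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back rather than mv, so the original file keeps its owner and permissions.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

For example, call `set_flink_conf client.timeout "600 s"` just before the `exec` line. The value is only an illustration. awk writes to a temporary file first, so a failed run leaves the original configuration untouched.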
On 10/11/2021 15:32, dhanesh arole wrote:
Hello all,
We are trying to run a Flink job in standalone mode using the official
Docker image on k8s. Following this documentation
<https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#advanced-customization>, we
have created a custom Docker image that extends the official
image, performs some pre-start actions, and finally runs
`exec /docker-entrypoint.sh standalone-job "$1"` to start the JobManager. We
have ensured that flink-conf.yaml is present at the expected path,
i.e. `$FLINK_HOME/conf/flink-conf.yaml`, and have set
JOB_MANAGER_RPC_ADDRESS to the pod IP.
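This is a minimal sketch of an entrypoint with that shape. It assumes the pod IP is injected into a `POD_IP` environment variable through the Kubernetes downward API, and that the official entrypoint picks up `JOB_MANAGER_RPC_ADDRESS` as described here. `start_jobmanager` and the `FLINK_ENTRYPOINT` override (there only to make the script testable) are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of a custom entrypoint for an image that extends the official Flink image.
# Assumptions: POD_IP is provided via the Kubernetes downward API; FLINK_ENTRYPOINT is a
# hypothetical override that defaults to the official /docker-entrypoint.sh.
start_jobmanager() {
  # ... custom pre-start actions go here ...

  # Fail fast if the pod IP is missing; otherwise export it as the RPC address.
  export JOB_MANAGER_RPC_ADDRESS="${POD_IP:?POD_IP must be set}"

  # Replace the current shell with the official entrypoint in application mode,
  # forwarding all arguments (the command above forwards only "$1").
  exec "${FLINK_ENTRYPOINT:-/docker-entrypoint.sh}" standalone-job "$@"
}
```

The script would then end with `start_jobmanager "$@"`. Because of `exec`, the official entrypoint replaces the shell rather than running as its child.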
We submit our job for execution from the application's main thread using
`StreamExecutionEnvironment#executeAsync`. However, while submitting the
job we consistently get an AskTimeoutException from
`DispatcherGateway#submitJob` (see logs below).
Based on some previous answers on the mailing lists and in issues, we tried
increasing `web.timeout` and `akka.ask.timeout`, but neither
helped. It seems that the timeout value used for this particular
future is hardcoded somewhere in the code. It would be great if someone could
provide some help or pointers on what we are missing or what we
should check.
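One way to confirm which of those keys actually reach the JobManager's configuration file is a small check run inside the container, for example from a shell opened with `kubectl exec`. This sketch assumes the default configuration directory `$FLINK_HOME/conf`. `show_timeout_conf` is a hypothetical helper, not a Flink tool.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: for each timeout key discussed in this thread, print the
# entry found in $FLINK_HOME/conf/flink-conf.yaml, or note that it is absent there.
# Only the file is inspected; values supplied any other way are not reported.
show_timeout_conf() {
  local conf="${FLINK_HOME:?FLINK_HOME must be set}/conf/flink-conf.yaml"
  local key
  for key in client.timeout akka.ask.timeout web.timeout; do
    # Uncommented lines that start with "key:" count as set.
    awk -v k="$key" '
      index($0, k ":") == 1 { print; found = 1 }
      END { if (!found) print k ": (not set in this file)" }
    ' "$conf"
  done
}
```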
Error logs:
Caused by: java.util.concurrent.TimeoutException: Invocation of public abstract java.util.concurrent.CompletableFuture org.apache.flink.runtime.dispatcher.DispatcherGateway.submitJob(org.apache.flink.runtime.jobgraph.JobGraph,org.apache.flink.api.common.time.Time) timed out.
    at org.apache.flink.runtime.rpc.akka.$Proxy31.submitJob(Unknown Source) ~[?:1.13.2]
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$submitJob$6(EmbeddedExecutor.java:183) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:?]
    at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?
    ...
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/rpc/dispatcher_1#2019478781]] after [60000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:650) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328) ~[flink.jar:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:279) ~[flink.jar:?]
    at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:283) ~[flink.jar:?]
-
Dhanesh Arole