[ https://issues.apache.org/jira/browse/FLINK-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667421#comment-15667421 ]
Chesnay Schepler commented on FLINK-5072:
-----------------------------------------

Hmm. Well, it isn't critical if the MetricFetcher request times out; the metrics in the web interface simply won't be updated, and the fetcher will try again 10 seconds later if required.

However, the MetricQueryService is a separate actor that, once a job is fully running, only receives requests from the fetcher. I would think that it should be able to serve such a request within the 10 second timeout.

But frankly, I don't know a lot about the network conditions under heavy load.

> MetricFetcher Ask Timeout
> -------------------------
>
>                 Key: FLINK-5072
>                 URL: https://issues.apache.org/jira/browse/FLINK-5072
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Ufuk Celebi
>
> Running a large scale test with 1.2-SNAPSHOT and heavy load on the TMs, I encountered a lot of ask timeouts for the metric fetcher:
> {code}
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@10.240.0.52:34471/user/MetricQueryService_container_1479207428252_0014_01_000026]] after [10000 ms]
>       at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>       at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>       at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>       at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>       at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>       at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>       at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>       at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> [~zentol] Does it make sense to investigate this further?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)