[ 
https://issues.apache.org/jira/browse/FLINK-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667421#comment-15667421
 ] 

Chesnay Schepler commented on FLINK-5072:
-----------------------------------------

Hmm. Well, it isn't critical if a MetricFetcher request times out; the fetcher 
will simply not update the metrics in the web interface and will try again 
10 seconds later if required.
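
As a rough illustration of that behavior, here is a minimal sketch. It is not 
Flink's actual MetricFetcher code: the class MetricPollSketch, its methods, and 
the use of CompletableFuture in place of an Akka ask are all assumptions made 
for the example. A timed-out request leaves the last known values untouched, 
and the caller simply tries again on the next cycle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical stand-in for a metric fetcher; names are illustrative only. */
class MetricPollSketch {
    // Last successfully fetched metrics; this is what a web interface would display.
    private final Map<String, String> lastKnown = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    MetricPollSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Runs one fetch attempt. Returns true if the metrics were refreshed,
     * false if the request failed or timed out (stale values are kept).
     */
    boolean fetchOnce(Supplier<CompletableFuture<Map<String, String>>> query) {
        try {
            Map<String, String> fresh =
                    query.get().get(timeoutMillis, TimeUnit.MILLISECONDS);
            lastKnown.putAll(fresh);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            // Non-critical: skip this update and let the next cycle retry.
            return false;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the calling thread.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a copy of the currently displayed metrics. */
    Map<String, String> snapshot() {
        return new HashMap<>(lastKnown);
    }
}
```

A caller would invoke fetchOnce once per refresh cycle (every 10 seconds in the 
scenario described above) and ignore a false result.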

However, the MetricQueryService is a separate actor that, once a job is fully 
running, only receives requests from the fetcher. I would expect it to be able 
to serve such a request within the 10 second timeout. But frankly, I don't 
know much about the network conditions under heavy load.

> MetricFetcher Ask Timeout
> -------------------------
>
>                 Key: FLINK-5072
>                 URL: https://issues.apache.org/jira/browse/FLINK-5072
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Ufuk Celebi
>
> Running a large scale test with 1.2-SNAPSHOT and heavy load on the TMs, I 
> encountered a lot of ask timeouts for the metric fetcher:
> {code}
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@10.240.0.52:34471/user/MetricQueryService_container_1479207428252_0014_01_000026]] after [10000 ms]
>       at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>       at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>       at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>       at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>       at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>       at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>       at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>       at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> [~zentol] Does it make sense to investigate this further?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
