Gary Yao created FLINK-8887:
-------------------------------

             Summary: ClusterClient.getJobStatus can throw FencingTokenException
                 Key: FLINK-8887
                 URL: https://issues.apache.org/jira/browse/FLINK-8887
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.5.0
            Reporter: Gary Yao
             Fix For: 1.5.0


*Description*
Calling {{RestClusterClient.getJobStatus}} or 
{{MiniClusterClient.getJobStatus}} can result in a {{FencingTokenException}}. 

*Analysis*
{{Dispatcher.requestJobStatus}} first looks the {{JobManagerRunner}} up by job 
id. If a reference is found, {{requestJobStatus}} is called on the respective 
instance. If not, the {{ArchivedExecutionGraphStore}} is queried. However, 
between the lookup and the method call, the {{JobMaster}} of the respective job 
may have lost leadership already (job finished), and has set the fencing token 
to {{null}}.

*Stacktrace*

{noformat}
Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: 
Fencing token mismatch: Ignoring message LocalFencedMessage(null, 
LocalRpcInvocation(requestJobStatus(Time))) because the fencing token null did 
not match the expected fencing token b8423c75bc6838244b8c93c8bd4a4f51.
        at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:73)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132)
        at 
akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{noformat}

{noformat}
Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: 
Fencing token not set: Ignoring message LocalFencedMessage(null, 
LocalRpcInvocation(requestJobStatus(Time))) because the fencing token is null.
        at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleMessage(FencedAkkaRpcActor.java:56)
        at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132)
        at 
akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to