这个应该不是root cause,slot was removed通常是tm挂掉了导致的,需要找下对应的tm日志看下挂掉的原因。

Thank you~

Xintong Song



On Tue, Dec 24, 2019 at 10:06 PM hiliuxg <[email protected]> wrote:

> 偶尔发现,分配好的slot突然就被remove了,导致作业重启,看不出是什么原因导致?CPU和FULL GC都没有,异常信息如下:
>
> org.apache.flink.util.FlinkException: The assigned slot
> bae00218c818157649eb9e3c533b86af_11 was removed.
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:893)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:863)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1058)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:385)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:847)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener$1.run(ResourceManager.java:1161)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:392)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:185)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> akka.actor.ActorCell.invoke(ActorCell.scala:495)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
> &nbsp; &nbsp; &nbsp; &nbsp; at akka.dispatch.Mailbox.run(Mailbox.scala:224)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> akka.dispatch.Mailbox.exec(Mailbox.scala:234)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> &nbsp; &nbsp; &nbsp; &nbsp; at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

回复