Hi,
This is itself a bug, and it probably has not been fixed yet.

JasonLee
Email: 17610775...@163.com


On 2020-08-04 at 15:41, bradyMk wrote:
Hello,
I submit the job in per-job mode, and this problem only occurs intermittently. This time the error log looks like this:

2020-08-04 10:30:14,475 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job flink2Ots (e11a22af324049217fdff28aca9f73a5) switched from state FAILING to FAILED.
java.lang.Exception: Container released on a *lost* node
   at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:370)
   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
   at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
   at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
   at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
   at akka.actor.ActorCell.invoke(ActorCell.scala:561)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
   at akka.dispatch.Mailbox.run(Mailbox.scala:225)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
   at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-08-04 10:30:14,476 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph        - Could not restart the job flink2Ots (e11a22af324049217fdff28aca9f73a5) because the restart strategy prevented it.
java.lang.Exception: Container released on a *lost* node
   at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:370)
   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
   at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
   at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
   at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
   at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
   at akka.actor.ActorCell.invoke(ActorCell.scala:561)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
   at akka.dispatch.Mailbox.run(Mailbox.scala:225)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
   at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-08-04 10:30:14,476 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Stopping checkpoint coordinator for job e11a22af324049217fdff28aca9f73a5.
2020-08-04 10:30:14,476 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore  - Shutting down

However, when I ran into this same error before, the application on YARN was able to exit.



-----
Best Wishes
--
Sent from: http://apache-flink.147419.n8.nabble.com/
