Do you have the JobManager and TaskManager logs of the corresponding TM, by
any chance?

On Mon, Jun 29, 2015 at 8:12 PM, Andra Lungu <lungu.an...@gmail.com> wrote:

> Something similar in flink-0.10-SNAPSHOT:
>
> 06/29/2015 10:33:46     CHAIN Join(Join at main(TriangleCount.java:79)) ->
> Combine (Reduce at main(TriangleCount.java:79))(222/224) switched to FAILED
> java.lang.Exception: The slot in which the task was executed has been
> released. Probably loss of TaskManager 57c67d938c9144bec5ba798bb8ebe636 @
> wally025 - 8 slots - URL: akka.tcp://flink@130.149.249.35:56135/user/taskmanager
>         at
> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151)
>         at
> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>         at
> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>         at
> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:154)
>         at
> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:182)
>         at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:421)
>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>         at
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
>         at
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>         at
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> 06/29/2015 10:33:46     Job execution switched to status FAILING.
>
>
> On Mon, Jun 29, 2015 at 1:08 PM, Alexander Alexandrov <
> alexander.s.alexand...@gmail.com> wrote:
>
>> I witnessed a similar issue yesterday on a simple job (single task chain,
>> no shuffles) with a release-0.9 based fork.
>>
>> 2015-04-15 14:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> Yes, sorry for that. I found it somewhere in the logs. The problem was
>>> that the program didn't die immediately but was somehow hanging, and I
>>> discovered the source of the problem only by running the program on a
>>> subset of the data.
>>>
>>> Thanks for the support,
>>> Flavio
>>>
>>> On Wed, Apr 15, 2015 at 2:56 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> This means that the TaskManager was lost. The JobManager can no longer
>>>> reach the TaskManager and considers all tasks executing on the TaskManager
>>>> as failed.
>>>>
>>>> Have a look at the TaskManager log, it should describe why the
>>>> TaskManager failed.
>>>> On 15.04.2015 at 14:45, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>>>
>>>>> Hi to all,
>>>>>
>>>>> I have this strange error in my job and I don't know what's going on.
>>>>> What can I do?
>>>>>
>>>>> The full exception is:
>>>>>
>>>>> The slot in which the task was scheduled has been killed (probably
>>>>> loss of TaskManager).
>>>>> at
>>>>> org.apache.flink.runtime.instance.SimpleSlot.cancel(SimpleSlot.java:98)
>>>>> at
>>>>> org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroupAssignment.releaseSimpleSlot(SlotSharingGroupAssignment.java:335)
>>>>> at
>>>>> org.apache.flink.runtime.jobmanager.scheduler.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:319)
>>>>> at
>>>>> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:106)
>>>>> at
>>>>> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:151)
>>>>> at
>>>>> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:182)
>>>>> at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:435)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>> at
>>>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>>> at
>>>>> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>> at
>>>>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>>
>>>
>>
>
