Hey,

I have been facing a similar issue lately, where the JobManager releases
the task slots because it cannot contact the TaskManager.

Meanwhile, the TaskManager is also trying to connect to the JobManager and
fails multiple times. This happens on multiple TaskManagers, seemingly at
random. So the TM stays alive, but the connection is lost.

Maybe these are related. We are currently trying to debug this problem.
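
In case it helps with the debugging: a minimal sketch for pulling the
TaskManager ID and address out of the "slot ... has been released" message
quoted below, to see which TaskManager log to check. The regular expression
only assumes the message layout shown in Radu's trace, and the function name
parse_lost_taskmanager is made up for illustration; it is not part of Flink.

```python
import re

# Assumed layout, copied from the exception text quoted in this thread:
# "... loss of TaskManager <id> @ <host> - <n> slots - URL:
#  akka.tcp://flink@<ip>:<port>/user/taskmanager"
LOST_TM = re.compile(
    r"loss of TaskManager\s+(?P<id>[0-9a-f]+)\s+@\s+(?P<host>\S+)\s+-\s+"
    r"(?P<slots>\d+)\s+slots\s+-\s+URL:\s+"
    r"akka\.tcp://\s*flink@(?P<ip>[\d.]+):(?P<port>\d+)"
)


def parse_lost_taskmanager(message):
    """Return a dict describing the lost TaskManager, or None if no match."""
    # Mail clients and loggers may wrap the line, so collapse whitespace first.
    flat = " ".join(message.split())
    match = LOST_TM.search(flat)
    if match is None:
        return None
    info = match.groupdict()
    info["slots"] = int(info["slots"])
    info["port"] = int(info["port"])
    return info
```

Feeding it the first lines of the trace should give the host and port of the
TaskManager whose log is worth looking at.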

Gyula

On Thu, Feb 4, 2016 at 15:55, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Radu,
>
> what does the log of the TaskManager 10.204.62.80:57910 say?
>
> Cheers,
> Till
>
> On Wed, Feb 3, 2016 at 6:00 PM, Radu Tudoran <radu.tudo...@huawei.com>
> wrote:
>
>> Hello,
>>
>> I am facing an error for which I cannot figure out the cause. Any idea
>> what could cause such an error?
>>
>> java.lang.Exception: The slot in which the task was executed has been released. Probably loss of TaskManager a8b69bd9449ee6792e869a9ff9e843e2 @ cloudr6-admin - 4 slots - URL: akka.tcp://flink@10.204.62.80:57910/user/taskmanager
>>         at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151)
>>         at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
>>         at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
>>         at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
>>         at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
>>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:696)
>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>         at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>         at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>         at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>         at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>         at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>         at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>         at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> Dr. Radu Tudoran