Hi Anindya,
The problem occurred again. The following is the scheduler driver log on the
Chronos side:
I0812 08:15:43.902712 96 sched.cpp:1937] Asked to abort the driver
I0812 08:15:43.902763 96 sched.cpp:981] Scheduler::statusUpdate took 1.436378441secs
I0812 08:15:43.902788 96 sched.cpp:988] Not sending status update acknowledgment message because the driver is not running!
I0812 08:15:43.902866 96 sched.cpp:919] Ignoring task status update message because the driver is not running!
However, the earlier part of the log gives no clue as to why the scheduler
driver was aborted.
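To look for the cause, it may help to pull out the lines that immediately precede each abort. Below is a minimal Python sketch, assuming the driver log is a plain text file with one entry per line; the function name `context_before_abort` and the default of 20 context lines are made up for illustration, and only the marker text comes from the log above.

```python
from collections import deque

# Marker text copied from the scheduler driver log above.
ABORT_MARKER = "Asked to abort the driver"

def context_before_abort(lines, context=20):
    """For each abort line, return (abort_line, preceding_lines).

    `lines` is any iterable of log lines (for example an open file).
    `context` is how many preceding lines to keep; 20 is an arbitrary choice.
    """
    recent = deque(maxlen=context)  # rolling window of the latest lines
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if ABORT_MARKER in line:
            hits.append((line, list(recent)))
        recent.append(line)
    return hits
```

It could be called as `context_before_abort(open(path, errors="replace"))`, where `path` is wherever the Chronos driver log is written in your setup.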
Thanks,
Zhichang Yu
________________________________
From: 志昌 余 <[email protected]>
Sent: August 9, 2016 18:03:31
To: [email protected]
Subject: Re: Deactivationg framework unexpectly
Hi Anindya,
Thanks for the info. I'll enable the scheduler driver log to see what happens.
Regards,
Zhichang Yu
________________________________
From: [email protected] <[email protected]> on behalf of Anindya Sinha
<[email protected]>
Sent: August 8, 2016 23:50:10
To: [email protected]
Subject: Re: Deactivationg framework unexpectly
Looks like your framework (Chronos) is sending a DeactivateFrameworkMessage
to the master. The scheduler driver also sends a DeactivateFrameworkMessage
if it is aborted
(https://github.com/apache/mesos/blob/master/src/sched/sched.cpp#L1224).
Also, the master can deactivate your framework if the framework disconnects
or fails over. Please check the master logs, or see whether your framework
received a FrameworkErrorMessage.
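One way to line up the master log with the framework's own log is to match entries by timestamp. The following is a rough Python sketch, assuming both logs use the glog line prefix (severity letter, `mmdd`, then `hh:mm:ss.uuuuuu`) and that the clocks on both sides are in sync. The helper names `glog_time` and `lines_near`, the 5-second window, and the default year are illustrative assumptions; glog does not record the year.

```python
import re
from datetime import datetime, timedelta

# glog prefix: severity letter, mmdd, hh:mm:ss.uuuuuu (no year in the line).
GLOG_RE = re.compile(r"^[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6}) ")

def glog_time(line, year=2016):
    """Parse the timestamp of a glog line; return None for other lines."""
    m = GLOG_RE.match(line)
    if m is None:
        return None
    month, day, hh, mm, ss, us = (int(g) for g in m.groups())
    return datetime(year, month, day, hh, mm, ss, us)

def lines_near(event_line, other_lines, window_seconds=5.0, year=2016):
    """Return lines from `other_lines` logged within the window of `event_line`."""
    t0 = glog_time(event_line, year)
    if t0 is None:
        return []
    window = timedelta(seconds=window_seconds)
    matches = []
    for raw in other_lines:
        t = glog_time(raw, year)
        if t is not None and abs(t - t0) <= window:
            matches.append(raw.rstrip("\n"))
    return matches
```

Passing a "Deactivating framework" line from the master as `event_line` and the scheduler driver log as `other_lines` would show what the driver logged around that moment.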
Thanks
Anindya
On Aug 8, 2016, at 3:35 AM, 志昌 余 <[email protected]> wrote:
Hi,
I recently ran into a weird problem. I'm running Mesos + Chronos. Chronos
often (once every several days) stops scheduling tasks because Mesos has
deactivated the framework.
The following is the log of the leading Mesos master:
# grep -iP "activat|disconnected" /var/log/mesos/mesos-master.INFO
I0806 13:40:33.143658 30 master.cpp:2551] Deactivating framework 90a6a7dc-7256-4e55-bd7e-573233c5df74-0000 (chronos-2.5.0-SNAPSHOT) at [email protected]:34544
I0806 13:40:33.143908 23 hierarchical.cpp:375] Deactivated framework 90a6a7dc-7256-4e55-bd7e-573233c5df74-0000
The only fix I have found is to manually reboot the Chronos leader.
My environment:
There are 3 physical machines, each running a containerized Mesos master and
Chronos. When the issue occurred, the Mesos leader and the Chronos leader were
both running on the same machine.
Software Version:
mesos-master:0.28.0-2.0.16.ubuntu1404
chronos:2.5.0-ce4469d.ubuntu1404-mesos-0.28.0-2.0.16.ubuntu1404
Can anyone give some insight into this problem?
Thanks,
Zhichang Yu