Re: Framework stops to receive the heartbeats and events and gets removed from master

Vova Shelgunov Mon, 23 Jan 2017 08:27:56 -0800

Logs from the Mesos master:

I0123 15:53:44.523613     7 http.cpp:391] HTTP POST for
/master/api/v1/scheduler from 172.18.0.1:58864 with User-Agent='AHC/2.0'
I0123 15:53:44.524159     7 master.cpp:4827] Processing ACKNOWLEDGE call
ac9a6e5e-67b3-490a-930f-0024eab734b4 for task 10336 of framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework) on agent
16c100c1-13fe-47b8-a2a0-aed9bafbbf8c-S0
I0123 15:53:44.524849     7 master.cpp:7744] Removing task 10336 with
resources cpus(*):0.1; mem(*):32 of framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 on agent
16c100c1-13fe-47b8-a2a0-aed9bafbbf8c-S0 at slave(1)@172.18.0.3:5051
(mesos-slave)
I0123 15:53:44.529033     7 master.cpp:1297] Framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework) disconnected
I0123 15:53:44.529636     7 master.cpp:2902] Disconnecting framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework)
I0123 15:53:44.529974     7 master.cpp:2926] Deactivating framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework)
I0123 15:53:44.530299     7 master.cpp:1310] Giving framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework) 0ns to
failover
I0123 15:53:44.530594     7 hierarchical.cpp:386] Deactivated framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005
I0123 15:53:44.531962     7 master.cpp:6369] Framework failover timeout,
removing framework 3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP
Framework)
I0123 15:53:44.534992     7 master.cpp:7103] Removing framework
3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005 (Test HTTP Framework)


It seems the failover timeout is set to zero for the framework.

This may be a coding error on my side that shows up when the framework loses
its connection to the master more than once: I see that I do not pass the
failover_timeout value during reconnection.
I will check whether fixing this solves my issue.
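The fix described above can be sketched as follows: a minimal Python sketch that builds the JSON body of a SUBSCRIBE call to /master/api/v1/scheduler so that the first subscription and every re-subscription carry the same FrameworkInfo, including failover_timeout. It assumes the v1 Scheduler HTTP API JSON layout; the helper name build_subscribe_call, the user value and the one-week timeout are hypothetical. FrameworkInfo.failover_timeout defaults to 0, which is consistent with the "0ns to failover" line in the log when the field is omitted on reconnection.

```python
import json
from typing import Optional

# Hypothetical values for illustration; choose ones that suit your framework.
FRAMEWORK_NAME = "Test HTTP Framework"
FRAMEWORK_USER = "root"
FAILOVER_TIMEOUT_SECONDS = 7 * 24 * 3600.0  # assumption: one week


def build_subscribe_call(framework_id: Optional[str],
                         failover_timeout: float = FAILOVER_TIMEOUT_SECONDS) -> dict:
    """Build the body of a SUBSCRIBE call (hypothetical helper).

    The same FrameworkInfo, including failover_timeout, is used for the first
    subscription and for every re-subscription, so a reconnect does not fall
    back to the FrameworkInfo default of 0.
    """
    framework_info = {
        "user": FRAMEWORK_USER,
        "name": FRAMEWORK_NAME,
        "failover_timeout": failover_timeout,
    }
    call = {"type": "SUBSCRIBE", "subscribe": {"framework_info": framework_info}}
    if framework_id is not None:
        # On re-subscription, reuse the ID received in the original
        # SUBSCRIBED event so the master treats this as the same framework.
        framework_info["id"] = {"value": framework_id}
        call["framework_id"] = {"value": framework_id}
    return call


first = build_subscribe_call(None)
again = build_subscribe_call("3edce0a6-2a9e-448f-a5c2-666e2c2c3086-0005")
print(json.dumps(again, indent=2))
```

The point of routing both cases through one function is that the reconnection path cannot silently drop a field that the initial subscription sets.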

Thanks

2017-01-23 19:05 GMT+03:00 Vova Shelgunov <[email protected]>:

> Hi,
>
> I have run into a very strange situation with my framework, which talks to
> the Mesos master via the Scheduler HTTP API:
>
> Sometimes my framework stops receiving heartbeats and task updates
> from the master.
> I read the Mesos documentation
> (http://mesos.apache.org/documentation/latest/scheduler-http-api/), the
> *Network partitions* section, and I see that if a framework does not receive
> heartbeats within some time, it should reconnect to the master.
>
> I have written a heartbeat monitor that reconnects if no heartbeats have
> arrived in the last n seconds, but after every reconnection I receive an
> ERROR event from the Mesos master saying that my framework has been removed.
>
> Why is this happening?
>
> Regards,
> Uladzimir
>
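A heartbeat monitor like the one described in the quoted message could look roughly like this minimal Python sketch. The class name HeartbeatMonitor, its interval and missed parameters, and the injectable clock are hypothetical; the interval would typically come from the heartbeat_interval_seconds field of the SUBSCRIBED event (an assumption about how the value is obtained). Any event read from the stream, not only HEARTBEAT, is treated as evidence that the connection is alive.

```python
import threading
import time
from typing import Callable


class HeartbeatMonitor:
    """Fires on_timeout when the event stream has been silent too long.

    interval: expected heartbeat interval in seconds.
    missed: how many intervals may pass without any event before firing.
    """

    def __init__(self, interval: float, missed: int,
                 on_timeout: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence = interval * missed
        self.on_timeout = on_timeout
        self.clock = clock
        self._lock = threading.Lock()
        self._last_seen = clock()

    def record_event(self) -> None:
        # Call for every event read from the subscription stream.
        with self._lock:
            self._last_seen = self.clock()

    def check(self) -> bool:
        # Returns True and fires on_timeout if the stream looks dead.
        with self._lock:
            silent_for = self.clock() - self._last_seen
        if silent_for > self.max_silence:
            self.on_timeout()
            with self._lock:
                # Restart the window so the callback does not fire on every check.
                self._last_seen = self.clock()
            return True
        return False
```

check() is meant to be called periodically, for example from a timer thread. on_timeout would close the current stream and re-subscribe with the framework ID received earlier and the same failover_timeout; judging by the log above, a re-subscription with a zero timeout lets the master remove the framework as soon as it disconnects.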
