Itamar,
you are right, the Mesos executor and containerizer cannot distinguish
between busy and stuck processes. However, since you use your own
custom executor, you may want to implement some sort of health check.
It depends on what your task processes are doing.
There are hundreds of reasons why an
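One minimal sketch of such a health check, assuming (purely as an illustrative convention, not anything Mesos provides) that the task process touches a heartbeat file while it is making progress; the executor can then treat a stale heartbeat as "stuck" rather than "busy":

```python
import os
import time

def is_stuck(heartbeat_path, timeout_seconds):
    """Return True if the heartbeat file has not been touched
    within timeout_seconds. The heartbeat-file convention is a
    hypothetical example; any liveness signal the task can emit
    (progress counter, log line, etc.) would work the same way."""
    try:
        last_beat = os.path.getmtime(heartbeat_path)
    except OSError:
        # No heartbeat file yet -- conservatively report stuck;
        # a real executor might allow a startup grace period.
        return True
    return (time.time() - last_beat) > timeout_seconds
```

The executor could poll this periodically and, on a stale heartbeat, kill the task process and send a TASK_FAILED update instead of leaving the task in RUNNING forever.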
Alex, Sharma, thanks for your input!
I have been trying to recreate the issue on a small cluster for the last few
days, but I was not able to observe a scenario in which I could be sure that
my executor sent the TASK_FINISHED update and the scheduler did not receive it.
I did observe multiple times a scenario that
Itamar,
beyond checking the master and slave logs, could you please verify that
your executor actually sends the TASK_FINISHED update? You may want to add
some logging and then check the executor log. Mesos guarantees the delivery
of status updates, so I suspect the problem is on the executor's side.
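One way to add that logging, sketched here with a stub in place of the real executor driver so the snippet stands alone (the old mesos.interface-style Python bindings expose a sendStatusUpdate(status) call on the driver; the dict-shaped status and the function names below are illustrative, not the real API objects):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("executor")

class StubDriver:
    """Stand-in for the executor driver, recording what was sent."""
    def __init__(self):
        self.sent = []

    def sendStatusUpdate(self, status):
        self.sent.append(status)

def finish_task(driver, task_id):
    # Log immediately before and after handing the update to the
    # driver, so the executor log shows whether the call was made
    # at all -- which is the question being debugged here.
    status = {"task_id": task_id, "state": "TASK_FINISHED"}
    log.info("sending TASK_FINISHED for task %s", task_id)
    driver.sendStatusUpdate(status)
    log.info("TASK_FINISHED for task %s handed to driver", task_id)
    return status
```

If the "sending TASK_FINISHED" line never shows up in the executor log for the stuck task, the bug is upstream of the driver call.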
On Wed, Jan
Have you checked the mesos-slave and mesos-master logs for that task id?
There should be logs in there for task state updates, including FINISHED.
There are specific cases where the task status is not reliably
sent to your scheduler (due to mesos-master restarts, leader election
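A quick way to sift those logs for one task's state transitions is to filter lines that mention both the task id and a task-state keyword. The log-line format below is purely illustrative sample data, not actual Mesos log output:

```python
def task_state_lines(lines, task_id):
    """Return log lines that mention task_id together with a
    Mesos task-state keyword, in their original order."""
    states = ("TASK_STAGING", "TASK_STARTING", "TASK_RUNNING",
              "TASK_FINISHED", "TASK_FAILED", "TASK_KILLED",
              "TASK_LOST")
    return [line for line in lines
            if task_id in line and any(s in line for s in states)]
```

Running this over both the mesos-master and mesos-slave logs shows whether the FINISHED transition was ever recorded on either side.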
I'm using a custom internal framework, loosely based on MesosSubmit.
The phenomenon I'm seeing is something like this:
1. Task X is assigned to slave S.
2. I know this task should run for ~10 minutes.
3. On the master dashboard, I see that task X is in the Running state for
several *hours*.
4. I