Re: Trying to debug an issue in mesos task tracking

2015-01-26 Thread Alex Rukletsov
Itamar, you are right: the Mesos executor and containerizer cannot distinguish between busy and stuck processes. However, since you use your own custom executor, you may want to implement some kind of health check. What that looks like depends on what your task processes are doing. There are hundreds of reasons why an
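
As an illustration of the health-check idea in the message above, here is a minimal sketch. It assumes the task process can report progress periodically (a "heartbeat") to the custom executor. `HeartbeatWatchdog`, `beat`, `is_stuck` and `timeout_s` are hypothetical names, not part of any Mesos API.

```python
import time


class HeartbeatWatchdog:
    """Tracks when a task last reported progress (hypothetical helper, not a Mesos API)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()

    def beat(self):
        # The task calls this whenever it makes progress.
        self._last_beat = self._clock()

    def is_stuck(self):
        # A busy task keeps beating; a stuck one stays silent past the timeout.
        return self._clock() - self._last_beat > self.timeout_s
```

An executor could poll `is_stuck()` from a separate thread and decide what to report to the scheduler. Which status to send in that case is a design choice this sketch leaves open.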

Re: Trying to debug an issue in mesos task tracking

2015-01-24 Thread Itamar Ostricher
Alex, Sharma, thanks for your input! After trying to recreate the issue on a small cluster for the last few days, I was not able to observe a scenario in which I can be sure that my executor sent the TASK_FINISHED update but the scheduler did not receive it. I did, multiple times, observe a scenario that

Re: Trying to debug an issue in mesos task tracking

2015-01-23 Thread Alex Rukletsov
Itamar, beyond checking master and slave logs, could you please verify that your executor does send the TASK_FINISHED update? You may want to add some logging and then check the executor log. Mesos guarantees the delivery of status updates, so I suspect the problem is on the executor's side. On Wed, Jan
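
The logging suggested in the message above could look like the following sketch. It assumes the executor sends updates through some callable, here called `send`, which stands in for whatever the custom executor actually uses. `send_with_logging` and the logger name are hypothetical.

```python
import logging

log = logging.getLogger("executor")


def send_with_logging(send, task_id, state):
    """Log before and after handing a status update to `send` (a stand-in callable)."""
    log.info("sending status update: task=%s state=%s", task_id, state)
    result = send(task_id, state)
    # Reaching this line only shows that `send` returned, not that the scheduler got the update.
    log.info("status update handed off: task=%s state=%s", task_id, state)
    return result
```

If the first log line for TASK_FINISHED never appears in the executor log, the update was never sent, which points at the executor rather than at Mesos.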

Re: Trying to debug an issue in mesos task tracking

2015-01-21 Thread Sharma Podila
Have you checked the mesos-slave and mesos-master logs for that task id? There should be entries in there for task state updates, including FINISHED. In some specific cases the task status may not be reliably sent to your scheduler (due to mesos-master restarts, leader election
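
The log check suggested in the message above can be sketched as a plain substring search. It assumes only that the task id appears verbatim in the relevant log lines. `lines_for_task` is a hypothetical helper, and nothing about the actual Mesos log format is assumed.

```python
def lines_for_task(lines, task_id):
    """Return (line_number, line) pairs for every line that mentions task_id."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if task_id in line
    ]
```

It accepts any iterable of lines, so an open log file can be passed directly, for example `with open(log_path) as f: hits = lines_for_task(f, task_id)`, where `log_path` is whatever path your installation writes its logs to.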

Trying to debug an issue in mesos task tracking

2015-01-21 Thread Itamar Ostricher
I'm using a custom internal framework, loosely based on MesosSubmit. The phenomenon I'm seeing is something like this: 1. Task X is assigned to slave S. 2. I know this task should run for ~10 minutes. 3. On the master dashboard, I see that task X is in the Running state for several *hours*. 4. I