Re: orphan executor

2017-11-02 Thread Benjamin Mahler
>>> - I don't know what the interaction between mesos agent and executor is. >>> Is there a health check? >>> - There is a reconciliation between Mesos and Frameworks: will Mesos >>> include the "orphan" executor in the list there, so framework can find

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
what the interaction between mesos agent and executor is. >> Is there a health check? >> - There is a reconciliation between Mesos and Frameworks: will Mesos >> include the "orphan" executor in the list there, so framework can find >> runaways and kill them (u

Re: orphan executor

2017-10-31 Thread Benjamin Mahler
ut it down, etc). On Tue, Oct 31, 2017 at 4:27 PM, Mohit Jaggi wrote: > Good question. > - I don't know what the interaction between mesos agent and executor is. > Is there a health check? > - There is a reconciliation between Mesos and Frameworks: will Mesos > include the

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
Good question. - I don't know what the interaction between mesos agent and executor is. Is there a health check? - There is a reconciliation between Mesos and Frameworks: will Mesos include the "orphan" executor in the list there, so framework can find runaways and kill them (using

Re: orphan executor

2017-10-31 Thread Benjamin Mahler
What defines a runaway executor? Mesos does not know that this particular executor should self-terminate within some reasonable time after its task terminates. In this case the framework (Aurora) knows this expected behavior of Thermos and can clean up ones that get stuck after the task terminates
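Since only the framework knows the contract "the executor exits shortly after its task is terminal", the cleanup policy Benjamin describes belongs on the framework side. A minimal sketch of that policy, assuming the framework records when each task went terminal and can observe whether the matching executor is still alive; `ExecutorView`, `find_stuck_executors` and the 60-second grace period are hypothetical names and values, not Aurora or Mesos APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ExecutorView:
    """Hypothetical framework-side record of one executor (1:1 with its task)."""
    executor_id: str
    task_terminal_at: Optional[float]  # epoch seconds the task went terminal; None while active
    executor_alive: bool               # whether the agent still reports the executor as running


def find_stuck_executors(views: Iterable[ExecutorView], now: float,
                         grace_secs: float = 60.0) -> List[str]:
    """Return IDs of executors whose task is terminal but which outlived the grace period."""
    stuck = []
    for v in views:
        if v.task_terminal_at is None or not v.executor_alive:
            continue  # task still active, or executor already gone: nothing to clean up
        if now - v.task_terminal_at > grace_secs:
            stuck.append(v.executor_id)
    return stuck
```

For example, with a task that went terminal at t=1000 and an executor still alive at t=1100, the executor is reported as stuck. The framework would then act on each reported ID, for instance through the SHUTDOWN call that the Mesos v1 scheduler HTTP API documents for custom executors.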

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
I was asking if this can happen automatically. On Tue, Oct 31, 2017 at 2:41 PM, Benjamin Mahler wrote: > You can kill it manually by SIGKILLing the executor process. > Using the agent API, you can launch a nested container session and kill > the executor. +jie,gilbert, is there a CLI command for

Re: orphan executor

2017-10-31 Thread Benjamin Mahler
You can kill it manually by SIGKILLing the executor process. Using the agent API, you can launch a nested container session and kill the executor. +jie,gilbert, is there a CLI command for 'exec'ing into the container? On Tue, Oct 31, 2017 at 12:47 PM, Mohit Jaggi wrote: > Yes. There is a fix ava
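The manual option above, SIGKILLing the executor process, can be scripted once you have found the executor's PID on the agent host (for example with `ps`). A minimal POSIX-only sketch; `sigkill_executor` is a hypothetical helper, and it does not cover the nested-container-session route through the agent API.

```python
import os
import signal


def sigkill_executor(pid: int) -> bool:
    """Send SIGKILL to an executor process identified by its host PID (POSIX only).

    Returns True if the signal was sent, False if no such process exists.
    SIGKILL cannot be caught or ignored, so a hung executor cannot block it,
    but the executor also gets no chance to clean up after itself.
    PermissionError propagates if the caller may not signal that PID.
    """
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return False  # already exited (and reaped) between discovery and kill
    return True
```

Because the PID is looked up and signalled in two separate steps, a script like this should re-check that the PID still belongs to the executor right before killing it, since PIDs can be reused.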

Re: orphan executor

2017-10-31 Thread Mohit Jaggi
Yes. There is a fix available now in Aurora/Thermos to try and exit in such scenarios. But I am curious to know if the Mesos agent has the functionality to reap runaway executors. On Tue, Oct 31, 2017 at 12:08 PM, Benjamin Mahler wrote: > Is my understanding correct that the Thermos transitions the

Re: orphan executor

2017-10-31 Thread Benjamin Mahler
Is my understanding correct that the Thermos transitions the task to TASK_FAILED, but Thermos gets stuck and can't terminate itself? The typical workflow for thermos, as a 1:1 task:executor approach, is that the executor terminates itself after the task is terminal. The full logs of the agent duri
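The 1:1 task:executor workflow relies on the executor terminating itself once its task is terminal. One defensive pattern is a watchdog that force-exits the process if graceful shutdown has not finished within a deadline. This is a generic sketch, not the actual Thermos fix; `arm_exit_watchdog` and any timeout you pass are hypothetical. It uses `os._exit` by default because that skips cleanup handlers which might themselves hang.

```python
import os
import threading
from typing import Callable


def arm_exit_watchdog(timeout_secs: float,
                      hard_exit: Callable[[int], None] = os._exit,
                      exit_code: int = 1) -> threading.Timer:
    """Force the process to exit if graceful shutdown takes longer than timeout_secs.

    Arm this right after reporting the terminal task status. If the normal
    shutdown path finishes first, call .cancel() on the returned timer.
    """
    timer = threading.Timer(timeout_secs, hard_exit, args=(exit_code,))
    timer.daemon = True  # the watchdog itself must never keep the process alive
    timer.start()
    return timer
```

In an executor this would be armed with something like `watchdog = arm_exit_watchdog(30.0)` after sending TASK_FAILED, followed by `watchdog.cancel()` once the normal shutdown completes. The `hard_exit` parameter exists only so the behaviour can be exercised without killing the test process.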

Re: orphan executor

2017-10-27 Thread Mohit Jaggi
Here are some relevant logs. Aurora scheduler logs show the task going from: INIT -> PENDING -> ASSIGNED -> STARTING -> RUNNING (for a long time) -> FAILED due to health check error, OSError: Resource temporarily unavailable (I think this is referring to running out of PID space, see Thermos logs below)
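"Resource temporarily unavailable" is the message for errno EAGAIN (on Linux with glibc), which is what fork(2) returns when a process or thread limit is hit: RLIMIT_NPROC, kernel.pid_max, kernel.threads-max, or a pids cgroup limit such as Docker's `--pids-limit`. A small sketch of how a health checker might flag that case; `looks_like_process_limit` is a hypothetical helper, and EAGAIN is only a hint, since it is also used for non-blocking I/O that would block.

```python
import errno
import os


def looks_like_process_limit(exc: OSError) -> bool:
    """Heuristic: True if an OSError carries EAGAIN, the errno fork() uses for
    process/thread limits. Not proof of PID exhaustion on its own."""
    # Python maps EAGAIN to BlockingIOError, an OSError subclass (PEP 3151),
    # so checking the errno attribute covers both spellings.
    return exc.errno == errno.EAGAIN


def describe(exc: OSError) -> str:
    """Render an OSError with a hint when it may indicate PID/process exhaustion."""
    msg = f"[errno {exc.errno}] {os.strerror(exc.errno)}"
    if looks_like_process_limit(exc):
        msg += " (possible process/PID limit; check pid_max, ulimit -u, pids cgroup)"
    return msg
```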

Re: orphan executor

2017-10-27 Thread Vinod Kone
Can you share the agent and executor logs of an example orphaned executor? That would help us diagnose the issue. On Fri, Oct 27, 2017 at 8:19 PM, Mohit Jaggi wrote: > Folks, > Often I see some orphaned executors in my cluster. These are cases where > the framework was informed of task loss, so

orphan executor

2017-10-27 Thread Mohit Jaggi
Folks, Often I see some orphaned executors in my cluster. These are cases where the framework was informed of task loss, so has forgotten about them as expected, but the container(docker) is still around. AFAIK, the Mesos agent is the only entity that has knowledge of these containers. How do I ensure
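Finding these orphans amounts to comparing what the agent still runs against what the framework still tracks. A minimal sketch of that comparison, assuming you can already obtain executor/container IDs from the agent (for example from its `/containers` endpoint) and the executor IDs the framework considers live; the dictionary shapes and `find_orphans` are hypothetical. To avoid killing an executor the framework simply has not recorded yet, a candidate is only confirmed once it has been absent from the framework's view in two consecutive scans.

```python
from typing import Dict, Set, Tuple


def find_orphans(agent_executors: Dict[str, str],
                 framework_known: Set[str],
                 previous_candidates: Set[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Compare agent and framework views of running executors.

    agent_executors: executor_id -> container_id, as reported by the agent.
    framework_known: executor IDs the framework still considers live.
    previous_candidates: executor IDs flagged as orphans by the previous scan.

    Returns (candidates, confirmed): candidates are unknown to the framework in
    this scan; confirmed were also unknown in the previous scan.
    """
    candidates = {eid: cid for eid, cid in agent_executors.items()
                  if eid not in framework_known}
    confirmed = {eid: cid for eid, cid in candidates.items()
                 if eid in previous_candidates}
    return candidates, confirmed
```

A periodic job would feed `set(candidates)` from one run into `previous_candidates` of the next, and only act (e.g. kill the container) on `confirmed`.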