On Tue, Oct 31, 2017 at 4:27 PM, Mohit Jaggi wrote:
Good question.
- I don't know what the interaction between the Mesos agent and the executor is. Is
there a health check?
- There is a reconciliation between Mesos and frameworks: will Mesos
include the "orphan" executor in the list there, so the framework can find
runaways and kill them (using …
What defines a runaway executor?
Mesos does not know that this particular executor should self-terminate
within some reasonable time after its task terminates. In this case the
framework (Aurora) knows this expected behavior of Thermos and can clean up
the ones that get stuck after the task terminates.
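To make "runaway" concrete: the rule lives in the framework, not in Mesos. Below is a minimal Python sketch of such a framework-side check, assuming the framework can see whether an executor is alive and when its task went terminal. The `ExecutorInfo` record, its field names, and the 60-second grace period are all hypothetical, not Aurora's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical grace period; the thread does not say what Aurora actually uses.
GRACE_PERIOD_SECS = 60.0


@dataclass
class ExecutorInfo:
    executor_id: str
    alive: bool                        # executor process still running?
    task_terminal_at: Optional[float]  # time its task became terminal, or None


def is_runaway(info: ExecutorInfo, now: float,
               grace: float = GRACE_PERIOD_SECS) -> bool:
    """True if the task is terminal but the executor has outlived the grace
    period within which it was expected to exit on its own."""
    if not info.alive or info.task_terminal_at is None:
        return False
    return now - info.task_terminal_at > grace


executors = [
    ExecutorInfo("thermos-a", alive=True, task_terminal_at=None),     # task still running
    ExecutorInfo("thermos-b", alive=True, task_terminal_at=1000.0),   # stuck after task ended
    ExecutorInfo("thermos-c", alive=False, task_terminal_at=1000.0),  # exited cleanly
]
runaways = [e.executor_id for e in executors if is_runaway(e, now=1100.0)]
print(runaways)  # ['thermos-b']
```

The grace period is a policy knob: set it too short and a slow but healthy shutdown gets treated as a runaway.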
I was asking if this can happen automatically.
On Tue, Oct 31, 2017 at 2:41 PM, Benjamin Mahler wrote:
You can kill it manually by SIGKILLing the executor process.
Using the agent API, you can launch a nested container session and kill the
executor. +jie, gilbert: is there a CLI command for 'exec'ing into the
container?
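For the manual route, SIGKILL cannot be caught, blocked, or ignored, so it works even when the executor is wedged. The following is a minimal POSIX-only Python sketch. A throwaway child process stands in for the stuck executor, and finding the real executor's PID on the agent host is not shown.

```python
import os
import signal
import subprocess
import sys


def sigkill(pid: int) -> None:
    """Send SIGKILL to a single PID. It cannot be caught, blocked or ignored."""
    os.kill(pid, signal.SIGKILL)


# Stand-in for a stuck executor: a child that would otherwise sleep for a minute.
stuck = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
sigkill(stuck.pid)
returncode = stuck.wait()

# On POSIX, a return code of -N means "terminated by signal N".
print(returncode == -signal.SIGKILL)  # True
```

`os.kill` signals only that one PID. Any processes the executor spawned are not signalled by this call.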
On Tue, Oct 31, 2017 at 12:47 PM, Mohit Jaggi wrote:
Yes. There is a fix available now in Aurora/Thermos to try to exit in such
scenarios. But I am curious to know whether the Mesos agent has the functionality to
reap runaway executors.
On Tue, Oct 31, 2017 at 12:08 PM, Benjamin Mahler wrote:
Is my understanding correct that Thermos transitions the task to
TASK_FAILED, but then gets stuck and can't terminate itself? The typical
workflow for Thermos, as a 1:1 task:executor approach, is that the executor
terminates itself after the task is terminal.
The full logs of the agent during …
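The 1:1 task:executor pattern can be sketched as "run the task, report a terminal state, then exit." This is a hypothetical Python outline, not Thermos or the Mesos executor API. `report_status` is an assumed stand-in for however a real executor sends status updates to the agent.

```python
import subprocess
import sys
from typing import Callable, List


def run_one_task(cmd: List[str], report_status: Callable[[str], None]) -> int:
    """Run exactly one task, report its terminal state, and return the
    executor's own exit code. report_status is a hypothetical stand-in for
    sending a status update to the agent."""
    report_status("TASK_RUNNING")
    try:
        rc = subprocess.run(cmd).returncode
    except OSError:
        # e.g. the command is missing, or fork/exec fails with EAGAIN
        report_status("TASK_FAILED")
        return 1
    report_status("TASK_FINISHED" if rc == 0 else "TASK_FAILED")
    # Nothing is left to supervise, so the caller should exit right away,
    # e.g. sys.exit(run_one_task(...)).
    return 0


updates: List[str] = []
code = run_one_task([sys.executable, "-c", "raise SystemExit(3)"], updates.append)
print(updates, code)  # ['TASK_RUNNING', 'TASK_FAILED'] 0
```

The failure mode in this thread is the step after the last `report_status` never completing, so the executor outlives its task.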
Here are some relevant logs. The Aurora scheduler logs show the task going
from:
INIT
-> PENDING
-> ASSIGNED
-> STARTING
-> RUNNING for a long time
-> FAILED due to a health check error, OSError: Resource temporarily
unavailable (I think this refers to running out of PID space; see the
Thermos logs below)
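`OSError: Resource temporarily unavailable` is the text for `EAGAIN`, which `fork(2)` returns when a process or thread limit is hit. Exhausting the PID space (`kernel.pid_max`) is one cause, as guessed above. Per-user `RLIMIT_NPROC` and cgroup `pids` limits are others. The following is a Linux-only Python sketch that compares live processes against `pid_max`. The 90% warning threshold is an arbitrary assumption.

```python
import errno
import os
from typing import Tuple


def pid_usage(proc_root: str = "/proc") -> Tuple[int, int]:
    """Linux only: return (live processes, kernel.pid_max) from procfs.
    Threads also consume PIDs but live under /proc/<pid>/task, so the
    first number undercounts actual PID usage."""
    with open(os.path.join(proc_root, "sys", "kernel", "pid_max")) as f:
        pid_max = int(f.read().strip())
    live = sum(1 for name in os.listdir(proc_root) if name.isdigit())
    return live, pid_max


def near_exhaustion(live: int, pid_max: int, threshold: float = 0.9) -> bool:
    # Hypothetical alert rule: warn once 90% of the PID space is in use.
    return live >= threshold * pid_max


print(os.strerror(errno.EAGAIN))  # Linux/glibc: Resource temporarily unavailable
if os.path.exists("/proc/sys/kernel/pid_max"):
    live, pid_max = pid_usage()
    print(live, pid_max, near_exhaustion(live, pid_max))
```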
Can you share the agent and executor logs of an example orphaned executor?
That would help us diagnose the issue.
On Fri, Oct 27, 2017 at 8:19 PM, Mohit Jaggi wrote:
Folks,
Often I see some orphaned executors in my cluster. These are cases where
the framework was informed of task loss, so it has forgotten about them as
expected, but the container (Docker) is still around. AFAIK, the Mesos agent is
the only entity that has knowledge of these containers. How do I ensure …
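One way to frame "orphaned" here is as a set difference: containers still running on an agent whose task IDs the framework no longer tracks. This is a hypothetical Python sketch. How the two inputs are obtained (from the agent's view and from the framework's own store) is not shown, and all names are illustrative only.

```python
from typing import Dict, Set


def find_orphans(running: Dict[str, str], tracked_tasks: Set[str]) -> Set[str]:
    """running maps container ID -> task ID as reported by the agent;
    tracked_tasks holds the task IDs the framework still considers live.
    Returns the IDs of containers whose task the framework has forgotten."""
    return {cid for cid, task_id in running.items() if task_id not in tracked_tasks}


running = {"c1": "task-1", "c2": "task-2", "c3": "task-3"}
tracked = {"task-1", "task-3"}  # task-2 was reported lost and then forgotten
print(find_orphans(running, tracked))  # {'c2'}
```

The hard part, as the question notes, is that only the agent holds the `running` side of this comparison.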