+AlexR

On Mon, May 2, 2016 at 2:31 PM, Jeff Schroeder <[email protected]>
wrote:

> Some frameworks like Aurora use custom executors to distribute the
> health checks with the tasks. This allows the task to survive a network
> partition without the scheduler setting it to TASK_LOST.
>
> Marathon uses mesos-health-check for command-based health checks, but runs
> TCP and HTTP health checks from the elected scheduler (Marathon issue
> #3728). On a partition event, it sets those tasks to TASK_LOST, causing the
> master to kill them when the partition heals. It also means the scheduler
> gets bogged down when many tasks have many health checks defined.
>
> Can this feature get a shepherd, since it would be useful for making Mesos
> tasks more resilient in general? There is an open review from Haosdent for
> fixing it.
>
> Thanks!
>
>
> --
> Text by Jeff, typos by iPhone
>
