+AlexR

On Mon, May 2, 2016 at 2:31 PM, Jeff Schroeder <[email protected]> wrote:
> Some frameworks like Aurora use custom executors to distribute the
> healthchecks with the tasks. This allows the task to survive a network
> partition without the scheduler setting it to TASK_LOST.
>
> Marathon uses mesos-health-check for command-based health checks, but does
> TCP and HTTP healthchecks from the elected scheduler (marathon issue
> #3728). On a partition event, it sets those tasks to TASK_LOST, causing the
> master to kill them on partition heal. It also means the scheduler gets
> bogged down when you have many tasks with many healthchecks defined.
>
> Can this feature get a shepherd, as it would be useful for making Mesos
> tasks more resilient in general? There is an open review from Haosdent for
> fixing it.
>
> Thanks!
>
> --
> Text by Jeff, typos by iPhone

