Re: Status of MESOS-2533?

haosdent Wed, 04 May 2016 10:54:46 -0700

Sorry, I have been blocked by other things recently and didn't reply in time.
@Jeff, I already contacted Alex last week; he will review this shortly when he
is available. In the meantime, though, I think you could use curl to check the
HTTP status instead. Do you encounter any problems when using curl in the
command health check?
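
If curl is not available in the task's environment, the same idea can be
sketched in a few lines of Python. This is only an illustration, assuming the
command health check treats exit code 0 as healthy and anything else as
unhealthy. The URL, the `is_healthy` and `check` names, and the "2xx means
healthy" rule are all hypothetical and not part of Mesos or Marathon.

```python
import sys
import urllib.error
import urllib.request


def is_healthy(status: int) -> bool:
    """Treat any 2xx response as healthy (an assumption; adjust as needed)."""
    return 200 <= status < 300


def check(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` yields a healthy status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses.
        return is_healthy(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; pass the real one as the first argument.
    url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:8080/health"
    # Exit 0 when healthy and 1 otherwise, so the script can serve as the command.
    sys.exit(0 if check(url) else 1)
```

Saved under a hypothetical name such as `http_check.py`, it could be run as
`python3 http_check.py http://localhost:8080/health` in place of the curl call.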


On Thu, May 5, 2016 at 1:16 AM, Benjamin Mahler <[email protected]> wrote:

> +AlexR
>
> On Mon, May 2, 2016 at 2:31 PM, Jeff Schroeder <[email protected]> wrote:
>
>> Some frameworks like Aurora use custom executors to distribute the
>> healthchecks with the tasks. This allows the task to survive a network
>> partition without the scheduler setting it to TASK_LOST.
>>
>> Marathon uses mesos-health-check for command-based health checks, but
>> does TCP and HTTP healthchecks from the elected scheduler (marathon issue
>> #3728). On a partition event, it sets those tasks to TASK_LOST, causing the
>> master to kill them when the partition heals. It also means the scheduler
>> gets bogged down when you have many tasks with many healthchecks defined.
>>
>> Can this feature get a shepherd, as it would be useful for making Mesos
>> tasks more resilient in general? There is an open review from Haosdent for
>> fixing it.
>>
>> Thanks!
>>
>>
>> --
>> Text by Jeff, typos by iPhone
>>
>
>


-- 
Best Regards,
Haosdent Huang
