Sagar Sadashiv Patwardhan created MESOS-8933:
------------------------------------------------

             Summary: Stop sending offers from agents in draining mode
                 Key: MESOS-8933
                 URL: https://issues.apache.org/jira/browse/MESOS-8933
             Project: Mesos
          Issue Type: Improvement
            Reporter: Sagar Sadashiv Patwardhan


*Background:*

At Yelp, we use Mesos to run microservices (Marathon), batch jobs (Chronos and 
custom frameworks), Spark (the Spark Mesos framework), etc. We also autoscale the 
number of agents in our cluster based on current demand and some other 
metrics. We use the Mesos maintenance primitives to gracefully shut down Mesos 
agents.

*Problem:*

When we want to shut down an agent for some reason, we first move the agent 
into draining mode. This allows us to gracefully terminate the microservices 
and other tasks. However, Mesos continues to send offers from that agent with 
unavailability set. Frameworks such as Marathon, Chronos, and Spark ignore the 
unavailability and schedule tasks on the agent. To prevent this, we allocate 
all the available resources on that agent to a maintenance role. This approach 
is not foolproof: there is still a race between the moment we move the agent 
into draining mode and the moment we allocate all the available resources on 
the agent to the maintenance role, and a framework can still launch tasks on 
the agent in that window.
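
For illustration, this is roughly the check a framework would need to make to honor unavailability instead of ignoring it. It is only a sketch: it assumes offers are plain dicts shaped like the JSON offers of the v1 scheduler API, where an offer from an agent with scheduled maintenance carries an "unavailability" field. The helper names are hypothetical and are not taken from Marathon, Chronos, or Spark.

```python
def is_draining(offer):
    """Return True if the offer carries an unavailability window.

    Assumption: `offer` is a dict shaped like a v1 scheduler API JSON
    offer, where "unavailability" is present only for agents with
    scheduled maintenance.
    """
    return "unavailability" in offer


def partition_offers(offers):
    """Split offers into (usable, to_decline) based on unavailability."""
    usable, to_decline = [], []
    for offer in offers:
        if is_draining(offer):
            to_decline.append(offer)
        else:
            usable.append(offer)
    return usable, to_decline
```

A framework would launch tasks only on the first list and decline the second. Since we cannot change every framework, a fix on the Mesos side is still preferable.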

*Proposal:*

 It would be nice if Mesos stopped sending offers from agents in draining 
mode. Something like this: 
[https://gist.github.com/sagar8192/0b9dbccc908818f8f9f5a18d1f634513]. I don't 
know whether this affects the allocator. We could put this behind a 
flag (something like --do-not-send-offers-from-agents-in-draining-mode) and 
make it optional.
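
To make the intended behavior concrete, here is a minimal sketch of the filtering step, written in Python for brevity. It is not Mesos allocator code and does not reproduce the gist; the Agent class, its fields, and the skip_draining parameter (standing in for the proposed flag) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    # Hypothetical stand-in for the master's view of an agent.
    agent_id: str
    # Start of the maintenance window in nanoseconds, or None if the
    # agent has no unavailability set (i.e. it is not draining).
    unavailability_start_ns: Optional[int] = None


def offerable_agents(agents: List[Agent], skip_draining: bool) -> List[Agent]:
    """Return the agents whose resources may be offered to frameworks.

    With skip_draining=False (the default behavior today), every agent
    is eligible. With skip_draining=True, agents with unavailability
    set are excluded, so no offers are generated from them.
    """
    if not skip_draining:
        return list(agents)
    return [a for a in agents if a.unavailability_start_ns is None]
```

With the flag off, behavior is unchanged, which is why making it optional should be safe for existing clusters.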



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
