Re: Design doc: Agent draining and deprecation of maintenance primitives

Benjamin Mahler Thu, 06 Jun 2019 21:25:31 -0700

> With the new proposal, it's going to be as difficult as before to have
> SLA-aware maintenance because it will need cooperation from the frameworks
> anyway and we know this is rarely a priority for them. We will also lose
> the ability to signal future maintenance in order to optimize allocations.

Personally, I think right now we should solve the basic need of draining a
node. The plan to add SLA-awareness into draining was to introduce a
capability that schedulers opt into that enables them to (1) take control
over the killing of tasks when an agent is put into the draining state and
(2) still get offers when an agent is in the draining state, in case the
scheduler needs to restart a task that *must* run. This allows an SLA-aware
scheduler to avoid killing during a drain if its task(s) would have their
SLAs violated.

Perhaps this functionality can live alongside the maintenance schedule
information we currently support, without being coupled together. As far as
I'm aware, that's something we hadn't considered (we considered integrating
into the maintenance schedules or replacing them).

> For example, I had this idea to improve the allocator (or write a custom
> one) that would offer resources from agents with no maintenance planned
> first, and then sort agents by maintenance date in decreasing order.

Right now there is no meaning to the order of offers. Adding some meaning
to the ordering of offers quickly becomes an issue for us as soon as there
are multiple criteria that need to be evaluated. For example, if you want
to incorporate maintenance, load spreading, fault domain spreading, etc.
across machines, it becomes less clear how offers should be ordered. One
could try to build some scoring model in Mesos for ordering, but it will be
woefully inadequate since Mesos does not know anything about the pending
workloads: it's ultimately the schedulers that are best positioned to make
these decisions. This is why we are going to move towards an "optimistic
concurrency" model where schedulers can choose what they want and Mesos
enforces constraints (e.g. quota limits), thereby eliminating the
multi-scheduler scalability issues of the current offer model.
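As a concrete point of reference for the ordering question, the
single-criterion policy quoted above (agents with no planned maintenance
first, then the rest by maintenance date, latest first) can be sketched in a
few lines of Python. This is a minimal illustration under assumptions: the
`Agent` record and `maintenance_aware_order` function are hypothetical and
are not part of the Mesos allocator interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Agent:
    # Hypothetical agent record for illustration; not a Mesos type.
    agent_id: str
    # None means no maintenance is planned for this agent.
    maintenance_start: Optional[datetime] = None


def maintenance_aware_order(agents: List[Agent]) -> List[Agent]:
    """Order agents for offering: agents with no planned maintenance come
    first (in their input order), followed by agents with planned
    maintenance, sorted by maintenance start time in decreasing order
    (the maintenance furthest in the future comes first)."""
    unscheduled = [a for a in agents if a.maintenance_start is None]
    scheduled = sorted(
        (a for a in agents if a.maintenance_start is not None),
        key=lambda a: a.maintenance_start,
        reverse=True,
    )
    return unscheduled + scheduled
```

With one criterion the sort key is obvious. Once load spreading or fault
domain spreading must also be weighed, the key has to combine several
signals, and choosing that combination without knowledge of pending
workloads is the hard part.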

And as somewhat of an aside, the lack of built-in scheduling has been bad
for the Mesos ecosystem. The vast majority of users just need to schedule:
services, jobs and cron jobs. These have a pretty standard look and feel
(including the SLA aspect of them!). Many of the existing schedulers could
be thinner "orchestrators" that know when to submit something to be
scheduled by a common scheduler, rather than reimplementing all of the
typical scheduling primitives (constraints, SLA awareness, dealing with the
low-level Mesos scheduling API). My point here is that we ask too much of
frameworks and it hurts users. I would love to see scheduling become more
standardized and built into Mesos.

On Thu, Jun 6, 2019 at 10:52 AM Greg Mann <[email protected]> wrote:

> Maxime,
> Thanks for the feedback, it's much appreciated. I agree that it would be
> possible to evolve the existing primitives to accomplish something similar
> to the proposal. That is one option that was considered before writing the
> design doc, but after some discussion, it seemed more
> appropriate to start over with a simpler model that accomplishes what we
> perceive to be the predominant use case: the automated draining of agent
> nodes, without the concept of a maintenance window or designated
> maintenance time in the future. However, perhaps this perception is
> incorrect?
>
> Using maintenance metadata to alter the sorting order in the allocator is
> an interesting idea; currently, the allocator does not have access to
> information about maintenance, but it's conceivable that we could extend
> the allocator interface to accommodate this. While the currently-proposed
> design would not allow this, it would allow operators to deactivate nodes,
> which is an extreme version of this, since deactivated agents would never
> have their resources offered to frameworks. This provides a blunt mechanism
> to prevent scheduling on nodes which have upcoming maintenance, although it
> sounds like you see some benefit to a more subtle notion of scheduling
> priority based on upcoming maintenance? Do you think that maintenance-aware
> sorting would provide much more benefit to you over agent deactivation? Do
> you make use of the existing maintenance primitives to signal upcoming
> maintenance on agents?
>
> Thanks!
> Greg
>
> On Thu, Jun 6, 2019 at 9:37 AM Maxime Brugidou <[email protected]>
> wrote:
>
>> Hi,
>>
>> As a Mesos operator, I am really surprised by this proposal.
>>
>> The main advantage of the proposed design is that we can finally set
>> nodes down for maintenance with a configurable kill grace period and a
>> proper task status (with maintenance primitives, it was TASK_LOST I think)
>> without any specific cooperation from the frameworks.
>>
>> I think this could simply be an evolution of the current primitives.
>>
>> With the new proposal, it's going to be as difficult as before to have
>> SLA-aware maintenance because it will need cooperation from the frameworks
>> anyway and we know this is rarely a priority for them. We will also lose
>> the ability to signal future maintenance in order to optimize allocations.
>>
>> For example, I had this idea to improve the allocator (or write a custom
>> one) that would offer resources from agents with no maintenance planned
>> first, and then sort agents by maintenance date in decreasing order.
>> This would be a big improvement to prevent cluster reboots from triggering
>> too many task restarts. This will not be possible with the new primitives.
>> The same idea applies to frameworks too.
>>
>> Maxime
>>
On Thu, May 30, 2019 at 22:16, Joseph Wu <[email protected]> wrote:
>>
>>> As far as I can tell, the document is public.
>>>
>>> On Thu, May 30, 2019 at 12:22 AM Marc Roos <[email protected]>
>>> wrote:
>>>
>>>>
>>>> Is the doc not public?
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Joseph Wu [mailto:[email protected]]
>>>> Sent: Thursday, May 30, 2019 2:07
>>>> To: dev; user
>>>> Subject: Design doc: Agent draining and deprecation of maintenance
>>>> primitives
>>>>
>>>> Hi all,
>>>>
>>>> A few years back, we added some constructs called maintenance
>>>> primitives to Mesos.  This feature was meant to allow operators and
>>>> frameworks to cooperate in draining tasks off nodes scheduled for
>>>> maintenance.  As far as we've observed since, this feature never
>>>> achieved enough adoption to be useful for operators.
>>>>
>>>> As such, we are proposing a more opinionated approach for draining
>>>> tasks.  The goal is to have Mesos perform draining in lieu of
>>>> frameworks, minimizing or eliminating the need to change frameworks to
>>>> account for draining.  We will also be simplifying the operator
>>>> workflow, which would only require a single call (holding an AgentID)
>>>> to start draining, and a single call to bring an agent back into the
>>>> cluster.
>>>>
>>>>
>>>> Due to how closely this proposed feature overlaps with maintenance
>>>> primitives, we will be deprecating maintenance primitives upon
>>>> implementation of agent draining.
>>>>
>>>>
>>>> If interested, please take a look at the design document:
>>>>
>>>>
>>>> https://docs.google.com/document/d/1w3O80NFE6m52XNMv7EdXSO-1NebEs8opA8VZPG1tW0Y/
>>>>
>>>>
>>>>
