Re: [Design doc] RFC: Fault domains in Mesos

2017-06-08, Neil Conway
Folks,

Thanks to everyone for their feedback! Based on discussions with
members of the Mesos community, we've made a few changes to this
proposal. To summarize:

(1) Renamed "rack" to "zone", both to be a bit more abstract and to
match the terminology used by most public cloud providers. That is, a
fault domain now consists of a zone and a region.

(2) To accommodate future kinds of domains, the DomainInfo message now
has a nested "FaultDomain" field. New types of domains (e.g., latency
domains, power domains) might be represented in the future via
additional fields in DomainInfo, but such extensions are out of scope
for the current proposal. (A sketch of the revised message follows
this list.)

(3) Clarified that allowing an agent to transition from "no configured
domain" to "configured domain" will require an agent drain in the MVP,
and added some discussion of the implementation and framework API
challenges around supporting domain opt-in without a drain.
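
For concreteness, here is roughly the message shape described in (2)
above. This is a sketch only; exact field names and numbers may differ
from what eventually lands via MESOS-7607:

  message DomainInfo {
    message FaultDomain {
      message RegionInfo {
        required string name = 1;
      }

      message ZoneInfo {
        required string name = 1;
      }

      required RegionInfo region = 1;
      required ZoneInfo zone = 2;
    }

    // Future kinds of domains (e.g., latency or power domains) would
    // be represented as additional fields alongside fault_domain.
    optional FaultDomain fault_domain = 1;
  }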

The review chain for the MVP of this feature is up now (MESOS-7607).

Neil


On Mon, Apr 17, 2017 at 9:44 AM, Neil Conway wrote:
> Folks,
>
> I'd like to enhance Mesos to support a first-class notion of "fault
> domains" -- i.e., identifying the "rack" and "region" (DC) where a
> Mesos agent or master is located. The goal is to enable two main
> features:
>
> (1) To make it easier to write "rack-aware" Mesos frameworks that are
> portable to different Mesos clusters.
>
> (2) To improve the experience of configuring Mesos with a set of
> masters and agents in one DC, and another pool of "remote" agents in a
> different DC.
>
> For more information, please see the design doc:
>
> https://docs.google.com/document/d/1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8
>
> I'd love any feedback, either directly on the Google doc or via email.
>
> Thanks,
> Neil


Re: [Design doc] RFC: Fault domains in Mesos

2017-04-19, Neil Conway
Hi Maxime,

Thanks for the feedback!

The proposed approach is deliberately simple. The "Discussion"
section of the design doc describes some of the rationale for starting
with a very simple scheme: basically, because

(a) we want to assign clear semantics to the levels of the hierarchy
(regions are far away from each other and inter-region network links
have high latency; racks are close together and inter-rack network
links have low latency); the sketch after this list makes this
concrete.

(b) we don't want to make life too difficult for framework authors.

(c) most server software (e.g., HDFS, Kafka, Cassandra, etc.) only
understands a simple hierarchy -- in many cases, just a single level
("racks"), or occasionally two levels ("racks" and "DCs").

Can you elaborate on the use-cases that you see for a more complex
hierarchy of fault domains? I'd be happy to chat off-list if you'd
prefer.
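
In the meantime, to check my understanding: here is a rough sketch (in
plain Python, with a made-up path-list representation rather than any
proposed API) of the recursive "spread" strategy you describe below:

  def spread(tasks, nodes, level=0):
      # `nodes` is a list of (domain_path, hostname) pairs, where
      # domain_path is an ordered list such as
      # ["US-WEST-1", "Building 2", "Cage 3", "POD 12", "Rack 3"].
      if not nodes:
          return []

      # Base case: every path is exhausted; place tasks round-robin.
      if all(level >= len(path) for path, _ in nodes):
          return [(nodes[i % len(nodes)][1], task)
                  for i, task in enumerate(tasks)]

      # Group nodes by their domain at this level, deal the tasks out
      # across the groups as evenly as possible, and recurse one level
      # deeper inside each group.
      groups = {}
      for path, node in nodes:
          key = path[level] if level < len(path) else None
          groups.setdefault(key, []).append((path, node))

      placements = []
      group_list = list(groups.values())
      for i, group in enumerate(group_list):
          placements += spread(tasks[i::len(group_list)], group, level + 1)
      return placements

  nodes = [
      (["US-WEST-1", "Building 2", "Rack 1"], "host1"),
      (["US-WEST-1", "Building 2", "Rack 2"], "host2"),
      (["US-EAST-2", "Building 1", "Rack 1"], "host3"),
  ]
  # Spreads region-first, then building, then rack:
  # [('host1', 't1'), ('host2', 't3'), ('host3', 't2'), ('host3', 't4')]
  print(spread(["t1", "t2", "t3", "t4"], nodes))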

Thanks!

Neil

On Tue, Apr 18, 2017 at 1:33 AM, Maxime Brugidou wrote:
> Hi Neil,
>
> I really like the idea of incorporating the concept of fault domains in
> Mesos; however, I feel the proposed implementation is a bit too narrow
> to be useful for most users.
>
> I feel like we could make the fault domain definition more generic. As
> an example, in our setup we would like a hierarchy such as
> Region > Building > Cage > Pod > Rack, with fault domains arranged
> hierarchically (meaning a domain at one level is contained in exactly
> one domain at the level above).
>
> As a concrete example, we could have the Mesos masters be aware of the fault
> domain hierarchy (with a config map for example), and slaves would just need
> to declare their lowest-level domain (for example their rack id). Then
> frameworks could use this domain hierarchy at will. If they need to "spread"
> their tasks for a very highly available setup, they could first spread using
> the highest fault domain (like the region), then if they have enough tasks
> to launch they could spread within each sub-domain recursively until they
> run out of tasks to spread. We do not need to artificially limit the
> number of levels or the names of the fault domains. Schedulers do not
> need to know the names either, just the hierarchy.
>
> Then, to provide the other feature of "remote" slaves that you describe, we
> could configure the Mesos master to only send offers from a "default" local
> fault domain, and frameworks would need to advertise a certain capability to
> receive offers for other remote fault domains.
>
> I feel we could implement this by identifying a fault domain with a simple
> list of ids like ["US-WEST-1", "Building 2", "Cage 3", "POD 12", "Rack 3"]
> or ["US-EAST-2", "Building 1"]. Slaves would advertise their lowest-level
> fault domains and schedulers could use this arbitrarily as a hierarchical
> list.
>
> Thanks,
> Maxime
>
> On Mon, Apr 17, 2017 at 6:45 PM Neil Conway wrote:
>>
>> Folks,
>>
>> I'd like to enhance Mesos to support a first-class notion of "fault
>> domains" -- i.e., identifying the "rack" and "region" (DC) where a
>> Mesos agent or master is located. The goal is to enable two main
>> features:
>>
>> (1) To make it easier to write "rack-aware" Mesos frameworks that are
>> portable to different Mesos clusters.
>>
>> (2) To improve the experience of configuring Mesos with a set of
>> masters and agents in one DC, and another pool of "remote" agents in a
>> different DC.
>>
>> For more information, please see the design doc:
>>
>>
>> https://docs.google.com/document/d/1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8
>>
>> I'd love any feedback, either directly on the Google doc or via email.
>>
>> Thanks,
>> Neil


[Design doc] RFC: Fault domains in Mesos

2017-04-17, Neil Conway
Folks,

I'd like to enhance Mesos to support a first-class notion of "fault
domains" -- i.e., identifying the "rack" and "region" (DC) where a
Mesos agent or master is located. The goal is to enable two main
features:

(1) To make it easier to write "rack-aware" Mesos frameworks that are
portable to different Mesos clusters.

(2) To improve the experience of configuring Mesos with a set of
masters and agents in one DC, and another pool of "remote" agents in a
different DC.

For more information, please see the design doc:

https://docs.google.com/document/d/1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8

I'd love any feedback, either directly on the Google doc or via email.

Thanks,
Neil