Re: Rack awareness support for Mesos

james Wed, 15 Jun 2016 09:03:18 -0700

@Joris,

OK. Now I understand where you are coming from. As soon as I get some time, I'll join that design discussion. Thanks for the clarifications.


James





On 06/15/2016 02:45 AM, Joris Van Remoortere wrote:

Since your interest is in the determination of the values, as opposed to
their propagation, I would just urge that you keep in mind that we may
(as a project) not want to support this information as the current
string attributes.

Huh? Why not? If the attributes change, why can't this sub-project
just change with those changing string attributes? Maybe some
elaboration on why this might not be able to evolve naturally is a
warranted detail of discussion?

Sorry, I should clarify what I meant by support. By support I mean that
we may not want to promise that those values will be there (support as a
feature), and what schemas are mangled into the random strings that we
currently call attributes. I did not mean that we wouldn't allow users
to inject their own values if they wanted to. We just wouldn't control
the standard or schema as a project and therefore couldn't support it.

Any random collection of strings that has previously had no reserved
keywords is notoriously difficult to build new schemas in.
This is why we may want to instead introduce a typed structure that is
dedicated to fault domain information. This:

* Prevents us from colliding with current users' attributes.
* Allows us to have more control over the types (YAY) and ranges of
values.
* Allows us to introduce explicit structure such as dependency or
hierarchy.
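
To make the contrast in the list above concrete, here is a minimal sketch in Python. It is not Mesos code: `parse_attributes` and `FaultDomain` are hypothetical names, the `key:value;key:value` input format is an assumption for illustration, and the `level`/`name`/`parent` fields are just one possible way to express a hierarchy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_attributes(raw: str) -> Dict[str, str]:
    """Split a free-form 'key:value;key:value' string into a dict of strings.

    Hypothetical helper: every value stays an untyped string and no key
    is reserved, so "rack" is only a naming convention.
    """
    result: Dict[str, str] = {}
    for item in raw.split(";"):
        if ":" not in item:
            continue  # skip empty or malformed entries
        key, value = item.split(":", 1)
        result[key.strip()] = value.strip()
    return result


@dataclass(frozen=True)
class FaultDomain:
    """Hypothetical typed fault-domain record with an explicit hierarchy."""
    level: str                            # e.g. "rack" or "datacenter"
    name: str
    parent: Optional["FaultDomain"] = None

    def path(self) -> str:
        """Return the hierarchy from the outermost domain down to this one."""
        prefix = self.parent.path() + "/" if self.parent else ""
        return f"{prefix}{self.level}={self.name}"


# Untyped: a user value such as "team:rack" lives in the same namespace.
attrs = parse_attributes("rack:r12;kernel:4.4;team:rack")

# Typed: the rack sits in its own structure, separate from user attributes.
domain = FaultDomain("rack", "r12", parent=FaultDomain("datacenter", "dc1"))
```

In the typed form, `domain.path()` gives the full hierarchy, while the attribute dict has no way to say that a rack belongs to a datacenter other than by convention.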

The fact that users have already encoded information in attributes is
not a reason for us to limit ourselves to that scope when better
structures may be available. This is why we shouldn't assume that the
project will *provide support for* (as opposed to allow users to) using
attributes.

As you said, it is their prerogative to join the design discussion to
ensure that any formalized structure or schema we introduce is one that
they are agreeable with.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 6:31 PM, james <gar...@verizon.net> wrote:

On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:

On the condition of compatibility with existing frameworks, which
already rely on parsing attributes for rack information.

There is currently nothing in Mesos that specifies the format or
structure for rack information in attributes.
The fact that operators / frameworks have decided to add this
information out of band is their problem to solve.
We don't need to be backwards compatible with something we never
published to begin with. This is why it's ok for us to consider adding a
typed form of failure domain information that is separate from the
typeless string attributes.

True. But you have to start somewhere, knowing that the schema and
codes will morph over time to maintain relevance and usefulness. In
that vein, if folks have established interesting and useful
parameters for this work, then it is most beneficial that those
methods and codes are considered carefully. AKA: speak up now.
Diversity and inclusion are keenly beneficial, where practical.

Since your interest is in the determination of the values, as opposed to
their propagation, I would just urge that you keep in mind that we may
(as a project) not want to support this information as the current
string attributes.

I would venture that both 'determination of the values and
propagation (delays)' are inherently important in a cluster of many
things: hardware, resources, frameworks, security codes, etc.
The author and others seem to be keenly aware that a tight focus is not
going to work at this stage, so a broad appeal to a multitude of needs
is best.
And in fact, until some idea is proven to be useless or too difficult to
implement, the bigger the tent, the more useful the codes that
define this project/idea become. Personally, I'm very excited that
someone has stepped up in this area; hoping they keep an open mind
and flexibility geared toward multiplicative usage, in the future.
Most mature hardware folks who build ideas into robust systems do
exactly that, to motivate a multiplicative usage for organizing
hardware, performance and state metrics, and timing signals,
gregariously. All of this is routine semantics from a hardware
perspective.

At some point, folks will realize that kernel configuration, testing
and tweaks are critical to cluster performance, regardless of the codes
running on top of the cluster. So this project could easily use cgroups
and such to achieve robustness in many areas of need.

Like it or not, large amounts of hardware need schema, planning and
architectural robustness to stay pristinely available, so that software
efficiency can come anywhere near optimal deployment. This really becomes
critical when a mix of different CPU types, GPUs and RAM is to be
considered in future deployments, regardless of whether you outsource or
run your own cluster. Hardware vendors are going to want to sell their
products to as wide a customer base as possible, and customers are going
to demand seamless management for expansion of resources. Furthermore,
as a consultant, my experience is that much of the future market is
going to demand outsourced, hybrid and in-house options as a
fundamental tenet of cluster resource adoption.

hth,
James

*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan...@intel.com> wrote:

On 2016/6/14 20:32, Joris Van Remoortere wrote:

#1. Stick with attributes for rack awareness

I don't think this is the right approach; however, there seem to be 2
components to this discussion:

1. How the values are presented (Attributes vs. a new type-aware structure)
2. How the values are determined (scripts vs. automation vs. modules)

It seems you are more interested in working on #2. If that's the case,
please make sure that you don't assume anything about #1, as not
everyone agrees that we will use the existing attributes in the future.

On the condition of compatibility with existing frameworks, which
already rely on parsing attributes for rack information.

Quotes from my original statements:
> For compatibility with existing frameworks, I tend to be ok with using
> attributes to convey the rack information

By all means, no matter what internal structures are used, current
behavior should be honored. By the way, I'm also thinking about #1, but
it's too early to bring up the details before the ticket gets ACCEPTED.

Anyway, I'm always open to all kinds of discussion. Thanks for your
comments, Joris!

For #2, you should focus on an API (module or script results) that will
support all the different methods the community wants to use to generate
this data.

As you mentioned, updating the values for a running agent is not
straightforward. A lot of design work will need to go into how these
values are propagated to frameworks that have made assumptions about
them, and which values are allowed to change vs. not.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <aca...@ilm.com> wrote:

#3 would be very helpful for us. Also related:

https://issues.apache.org/jira/browse/MESOS-3059

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150

________________________________________
From: Du, Fan [fan...@intel.com]
Sent: 14 June 2016 07:24
To: user@mesos.apache.org; d...@mesos.apache.org
Cc: Joris Van Remoortere; vinodk...@apache.org
Subject: Re: Rack awareness support for Mesos

Hi everyone

Let me summarize the discussion about rack awareness in the community so
far. First, thanks for all the comments, advice and challenges! :)

#1. Stick with attributes for rack awareness

For compatibility with existing frameworks, I tend to be OK with using
attributes to convey the rack information, but with the goal of doing it
automatically, in a way that is easy to maintain and has a good attribute
schema. This brings up the question below, where the controversy starts.

#2. Scripts vs programmatic way

Both can be used to set attributes. I've made my arguments in the Jira
and the design doc, so I'm not going to argue more here. But please take a
look at the earlier discussion in MESOS-3366, which allows
resources/attributes discovery.

A module implementing the *slaveAttributesDecorator* hook will work like
a charm here in a static way. We still need to justify updating attributes.

#3. Allow updating attributes
Several cases need to be covered here:

a). Mesos runs inside VMs or containers, where live migration happens, so
rack information needs to be updated.

b). LLDP packets are broadcast at an interval of 10s~30s (vendor-specific
implementation), and rack information is usually stored in the LLDP
daemon to be queried. In the worst case (a freshly rebooted node, or a
daemon restart), the Mesos slave would have to wait 10s~30s for valid rack
information before registering with the master. Allowing attributes to be
updated will mitigate this problem.

c). Framework affinity

Framework X prefers to run on the same nodes as another framework Y.
For example, it's desirable for Shark or Spark-SQL to reside on the
*worker* nodes where Alluxio (formerly Tachyon) runs to gain a performance
boost, as in the SPARK-6707 ticket message
{tachyon=true;us-east-1=false}

If a framework could advertise agent attributes in the ResourcesOffer
process, awesome!

#4. Rearrange agents in a more scalable manner, e.g. on a per-rack basis

Randomly offering agents' resources to frameworks does not improve data
locality. Imagine the likelihood of a framework getting resources
underneath the same rack at a scale of 30000+ nodes. Moreover, the time to
randomly shuffle the agents also grows.

How about rearranging the agents on a per-rack basis? A minor change
to the way resources are allocated will fix this.
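
As a rough illustration of the per-rack arrangement suggested above, here is a minimal Python sketch. It is not the Mesos allocator: `group_by_rack` and `pick_agents` are hypothetical names, agents are modelled as plain `(agent_id, rack)` pairs, and filling the largest rack first is just one assumed policy for keeping a framework's resources under the same rack.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_rack(agents: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each rack name to the agent IDs in it, preserving input order."""
    racks: Dict[str, List[str]] = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


def pick_agents(racks: Dict[str, List[str]], wanted: int) -> List[str]:
    """Pick up to `wanted` agents, exhausting the largest racks first.

    Assumed policy: starting from the rack with the most agents keeps as
    many of the picked agents as possible under a single rack.
    """
    picked: List[str] = []
    for rack in sorted(racks, key=lambda r: len(racks[r]), reverse=True):
        for agent_id in racks[rack]:
            if len(picked) == wanted:
                return picked
            picked.append(agent_id)
    return picked


# Hypothetical cluster: five agents spread over two racks.
agents = [("a1", "r1"), ("a2", "r2"), ("a3", "r1"), ("a4", "r2"), ("a5", "r2")]
racks = group_by_rack(agents)
same_rack = pick_agents(racks, 2)
```

Compared with shuffling all agents, the grouping step is a single pass over the agent list, and a request that fits inside one rack is served from that rack only.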

I might not see the whole picture here, so comments are welcome!

On 2016/6/6 17:17, Du, Fan wrote:
> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while.
> It’s a common interest for lots of data center applications to provide
> data locality, fault tolerance and better task placement. I created
> MESOS-5545 to track the story, and here is the initial design doc [1]
> to support rack awareness in Mesos.
>
> Looking forward to hearing any comments from end users and other
> developers.
>
> Thanks!
>
> [1]:
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
