Re: Rack awareness support for Mesos

2016-06-17 Thread Du, Fan
   Cc: Joris Van Remoortere; vinodk...@apache.org


   Subject: Re: Rack awareness support for Mesos

   Hi everyone

   Let me summarize the discussion about rack awareness in the community so
   far. First, thanks for all the comments, advice, and challenges! :)

   #1. Stick with attributes for rack awareness

   For compatibility with existing frameworks, I tend to be OK with using
   attributes to convey the rack information, but with the goal of doing it
   automatically, keeping it easy to maintain, and having a good attribute
   schema. This brings up the question below, which is where the
   controversy starts.

   #2. Scripts vs programmatic way

   Both can be used to set attributes. I've made my arguments in the JIRA
   and the design doc, and I'm not going to argue more here. But please
   take a look at the earlier discussion in MESOS-3366, which allows
   resource/attribute discovery.

   A module implementing the *slaveAttributesDecorator* hook would work
   like a charm here in a static way. The need to update attributes still
   has to be justified, though.

   #3. Allow updating attributes
   Several cases need to be covered here:

   a). Mesos runs inside VMs or containers, where live migration happens,
   so rack information needs to be updated.

   b). LLDP packets are broadcast at an interval of 10s~30s (vendor-specific),
   and rack information is usually stored in the LLDP daemon to be queried.
   In the worst cases (a fresh node reboot, or a daemon restart), the Mesos
   slave would have to wait 10s~30s for valid rack information before
   registering with the master. Allowing attributes to be updated would
   mitigate this problem.

   c). Framework affinity

   Framework X prefers to run on the same nodes as another framework Y.
   For example, it's desirable for Shark or Spark-SQL to reside on the
   *worker* nodes where Alluxio (formerly Tachyon) runs, for a further
   performance boost, as described in the SPARK-6707 ticket:
   {tachyon=true;us-east-1=false}

   If frameworks could advertise agent attributes in the ResourcesOffer

Re: Rack awareness support for Mesos

2016-06-16 Thread Joris Van Remoortere
>> you should focus on an API (module or script results) that will
>> support all the different methods the community wants to use to
>> generate this data.
>>
>> As you mentioned, updating the values for a running agent is not
>> straightforward. A lot of design work will need to go into how these
>> values are propagated to frameworks that have made assumptions about
>> them, and which values are allowed to change vs. not.
>>
>>  —
>>  *Joris Van Remoortere*
>>  Mesosphere
>>
>> On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <aca...@ilm.com> wrote:
>>
>>   #3 would be very helpful for us. Also related:
>>
>> https://issues.apache.org/jira/browse/MESOS-3059
>>
>>   --
>>
>>   Aaron Carey
>>   Production Engineer - Cloud Pipeline
>>   Industrial Light & Magic
>>   London
>>   020 3751 9150
>>
>>   
>>   From: Du, Fan [fan...@intel.com]
>>   Sent: 14 June 2016 07:24
>>   To: user@mesos.apache.org; d...@mesos.apache.org
>>   Cc: Joris Van Remoortere; vinodk...@apache.org
>>
>>
>>   Subject: Re: Rack awareness support for Mesos
>>

Re: Rack awareness support for Mesos

2016-06-15 Thread james
  Cc: Joris Van Remoortere; vinodk...@apache.org


  Subject: Re: Rack awareness support for Mesos

  Hi everyone

  Let me summarize the discussion about rack awareness in the community so
  far. First, thanks for all the comments, advice, and challenges! :)

  #1. Stick with attributes for rack awareness

  For compatibility with existing frameworks, I tend to be OK with using
  attributes to convey the rack information, but with the goal of doing it
  automatically, keeping it easy to maintain, and having a good attribute
  schema. This brings up the question below, which is where the
  controversy starts.

  #2. Scripts vs programmatic way

  Both can be used to set attributes. I've made my arguments in the JIRA
  and the design doc, and I'm not going to argue more here. But please
  take a look at the earlier discussion in MESOS-3366, which allows
  resource/attribute discovery.

  A module implementing the *slaveAttributesDecorator* hook would work
  like a charm here in a static way. The need to update attributes still
  has to be justified, though.
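For the script route in #2, the end product is just the value handed to the agent's `--attributes` flag, which takes `name:value` pairs separated by `;`. A minimal Python sketch, assuming a hypothetical naming convention in which top-of-rack switch names (as learned via LLDP) look like `tor-r12-a`; the regex and both helper names are illustrative only, not part of Mesos:

```python
import re
from typing import Dict, Optional

# Hypothetical convention: top-of-rack switches are named like "tor-r12-a".
TOR_NAME = re.compile(r"^tor-(?P<rack>r\d+)-[a-z]$")


def rack_from_switch_name(switch_name: str) -> Optional[str]:
    """Return the rack ID encoded in a switch name, or None if it doesn't match."""
    match = TOR_NAME.match(switch_name.strip())
    return match.group("rack") if match else None


def format_attributes(attrs: Dict[str, str]) -> str:
    """Render attributes as 'name:value;name:value', sorted by name."""
    for name, value in attrs.items():
        # ':' and ';' are the separators, so they cannot appear inside a pair.
        if any(sep in name or sep in value for sep in (":", ";")):
            raise ValueError(f"separator character in {name!r}={value!r}")
    return ";".join(f"{name}:{value}" for name, value in sorted(attrs.items()))


if __name__ == "__main__":
    rack = rack_from_switch_name("tor-r12-a")
    if rack is not None:
        print(format_attributes({"rack": rack, "zone": "us-east-1"}))
```

Run as a script, this prints `rack:r12;zone:us-east-1`. A script computes this once, before the agent starts; a module implementing the hook would presumably produce equivalent data from inside the agent, which is the trade-off #2 is about.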

  #3. Allow updating attributes
  Several cases need to be covered here:

  a). Mesos runs inside VMs or containers, where live migration happens,
  so rack information needs to be updated.

  b). LLDP packets are broadcast at an interval of 10s~30s (vendor-specific),
  and rack information is usually stored in the LLDP daemon to be queried.
  In the worst cases (a fresh node reboot, or a daemon restart), the Mesos
  slave would have to wait 10s~30s for valid rack information before
  registering with the master. Allowing attributes to be updated would
  mitigate this problem.
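Case b) amounts to a bounded wait with a fallback. A minimal sketch, assuming a hypothetical `query_rack` callable that asks the local LLDP daemon for the rack and returns `None` until that information is available; the 30s default mirrors the upper end of the 10s~30s interval above, and `clock`/`sleep` are injectable only so the logic can be tested without real waiting:

```python
import time
from typing import Callable, Optional


def wait_for_rack(query_rack: Callable[[], Optional[str]],
                  timeout_s: float = 30.0,
                  poll_s: float = 1.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll a (hypothetical) LLDP-backed lookup until it yields a rack name.

    Returns the rack name, or None once the deadline has passed without one.
    """
    deadline = clock() + timeout_s
    while True:
        rack = query_rack()
        if rack:
            return rack
        if clock() >= deadline:
            return None
        sleep(poll_s)
```

If this returns `None`, the agent could register without a rack attribute and fill it in later, but only if attributes can be updated after registration; otherwise registration has to block for up to a full LLDP interval.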

  c). Framework affinity

  Framework X prefers to run on the same nodes as another framework Y.
  For example, it's desirable for Shark or Spark-SQL to reside on the
  *worker* nodes where Alluxio (formerly Tachyon) runs, for a further
  performance boost, as described in the SPARK-6707 ticket:
  {tachyon=true;us-east-1=false}

  If frameworks could advertise agent attributes in the ResourcesOffer
  process, awesome!
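The affinity case in c) boils down to matching offered agent attributes against a constraint. A minimal sketch, loosely modeled on the `{tachyon=true;us-east-1=false}` example; the `key=value;key=v1,v2` syntax and the rule that a missing attribute never matches are assumptions of this sketch, not documented Spark or Mesos semantics:

```python
from typing import Dict, Set


def parse_constraints(spec: str) -> Dict[str, Set[str]]:
    """Parse '{key=value;key=v1,v2}' into {key: allowed values} (hypothetical syntax)."""
    constraints: Dict[str, Set[str]] = {}
    clauses = (c.strip() for c in spec.strip().strip("{}").split(";"))
    for clause in filter(None, clauses):
        key, _, values = clause.partition("=")
        constraints[key.strip()] = {v.strip() for v in values.split(",") if v.strip()}
    return constraints


def offer_matches(attributes: Dict[str, str], constraints: Dict[str, Set[str]]) -> bool:
    """True if every constrained attribute is present with one of its allowed values."""
    return all(attributes.get(key) in allowed for key, allowed in constraints.items())
```

Under this rule an agent that does not advertise `us-east-1` at all is rejected; a real scheduler might prefer to treat absence as `false`, which is exactly the kind of schema decision raised in #1.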


  #4. Rearrange agents in a more scalable manner, such as on a per-rack basis

  Randomly offering agents' resources to frameworks does not improve data
  locality: imagine the likelihood of a framework getting resources within
  the same rack at the scale of +3 nodes. Moreover, the time needed to
  randomly shuffle the agents also grows.

  How about arranging the agents on a per-rack basis? That, plus a minor
  change to the way resources are allocated, would fix this.
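One way to read the proposal in #4: keep the randomness, but shuffle racks, and agents within each rack, instead of the flat agent list, so consecutive offers come from the same rack. A minimal sketch, assuming each agent is known as an `(agent_id, rack)` pair with `None` for agents lacking rack information; this illustrates the idea only and is not the Mesos allocator's actual logic:

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Agent = Tuple[str, Optional[str]]  # (agent ID, rack name or None)


def group_by_rack(agents: List[Agent]) -> Dict[str, List[str]]:
    """Map rack name -> agent IDs; agents without rack info share an 'unknown' bucket."""
    racks: Dict[str, List[str]] = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack or "unknown"].append(agent_id)
    return dict(racks)


def rack_ordered(agents: List[Agent], rng: random.Random) -> List[str]:
    """Randomize the order of racks and of agents inside a rack, keeping each rack contiguous."""
    racks = group_by_rack(agents)
    rack_names = sorted(racks)  # fixed starting order so a seeded rng is reproducible
    rng.shuffle(rack_names)
    order: List[str] = []
    for name in rack_names:
        members = list(racks[name])
        rng.shuffle(members)
        order.extend(members)
    return order
```

Handing a framework a prefix of this order makes it more likely that its resources share a rack; what the "minor change" to allocation would look like is left open in the proposal.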


  I might not see the whole picture here, so comments are welcome!


  On 2016/6/6 17:17, Du, Fan wrote:
   > Hi, Mesos folks
   >
   > I’ve been thinking about Mesos rack awareness support for a while,

Re: Rack awareness support for Mesos

2016-06-15 Thread Joris Van Remoortere
> Like it or not, large amounts of hardware need schema, planning, and
> architectural robustness to stay pristinely available if software
> efficiency is to be anywhere near an optimal deployment. This becomes
> really critical when a mix of different CPU types, GPUs and RAM is to be
> considered in future deployments, regardless of whether you outsource or
> run your own cluster. Hardware vendors are going to want to sell their
> products to as wide a customer base as possible, and customers are going
> to demand seamless management for expansion of resources. Furthermore, as
> a consultant, my experience is that much of the future market is going to
> demand outsourced, hybrid and in-house options as a fundamental tenet of
> cluster resource adoption.
>
> hth,
> James
>

Re: Rack awareness support for Mesos

2016-06-14 Thread james

Re: Rack awareness support for Mesos

2016-06-14 Thread Du, Fan



On 2016/6/14 21:14, Joris Van Remoortere wrote:
>> On the condition of compatibility with existing frameworks which already
>> rely on parsing attributes for rack information.
> There is currently nothing in Mesos that specifies the format or structure
> for rack information in attributes.
> The fact that operators / frameworks have decided to add this information
> out of band is their problem to solve.
> We don't need to be backwards compatible with something we never published
> to begin with. This is why it's ok for us to consider adding a typed form
> of failure domain information that is separate from the typeless string
> attributes.

Hmm, sounds promising, then we can travel light!

> Since your interest is in the determination of the values, as opposed to

You are presuming my work scope; that was not true from the very beginning.

> their propagation, I would just urge that you keep in mind that we may (as
> a project) not want to support this information as the current string
> attributes.

Well understood, thanks for the explanation!
Any comments about #3 and #4?


Re: Rack awareness support for Mesos

2016-06-14 Thread Joris Van Remoortere
> On the condition of compatibility with existing frameworks which already
> rely on parsing attributes for rack information.
There is currently nothing in Mesos that specifies the format or structure
for rack information in attributes.
The fact that operators / frameworks have decided to add this information
out of band is their problem to solve.
We don't need to be backwards compatible with something we never published
to begin with. This is why it's ok for us to consider adding a typed form
of failure domain information that is separate from the typeless string
attributes.
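To make the point above concrete: rack information in attributes is only a convention layered on the agent's `--attributes` string (`name:value` pairs separated by `;`), so every consumer has to parse it and guess the key names. A minimal sketch contrasting that with a typed record; the `rack`/`zone` keys and the `FailureDomain` type are hypothetical, not a Mesos API, and every value is treated as plain text (ignoring the scalar, range, and set values attributes can also carry):

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_attributes(text: str) -> Dict[str, str]:
    """Split an agent attribute string of the form 'name:value;name:value'."""
    attrs: Dict[str, str] = {}
    for pair in filter(None, (p.strip() for p in text.split(";"))):
        name, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"attribute without ':' separator: {pair!r}")
        attrs[name.strip()] = value.strip()
    return attrs


@dataclass(frozen=True)
class FailureDomain:
    """Hypothetical typed record; field names are illustrative only."""
    zone: Optional[str]
    rack: Optional[str]


def domain_from_attributes(attrs: Dict[str, str]) -> FailureDomain:
    # Relies on an unpublished key-name convention ('zone', 'rack'):
    # an agent that writes 'rackname' instead silently yields rack=None.
    return FailureDomain(zone=attrs.get("zone"), rack=attrs.get("rack"))
```

With a typed form, the key-name convention lives in one place instead of in every framework, which is the separation from typeless string attributes described above.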

Since your interest is in the determination of the values, as opposed to
their propagation, I would just urge that you keep in mind that we may (as
a project) not want to support this information as the current string
attributes.



—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan...@intel.com> wrote:

>
>
> On 2016/6/14 20:32, Joris Van Remoortere wrote:
>
>> #1. Stick with attributes for rack awareness
>>
>> I don't think this is the right approach; however, there seem to be 2
>> components to this discussion:
>>
>> 1. How the values are presented (Attributes vs. a new type-aware
>> structure)
>> 2. How the values are determined (scripts vs. automation vs. modules)
>>
>> It seems you are more interested in working on #2. If that's the case,
>> please make sure that you don't assume anything about #1, as not
>> everyone agrees that we will use the existing attributes in the future.
>>
>
> On the condition of compatibility with existing frameworks which already
> rely on parsing attributes for rack information.
>
> Quoting my original statement:
> > For compatibility with existing frameworks, I tend to be OK with using
> > attributes to convey the rack information
>
> By all means, no matter what internal structures are used, current
> behavior should be honored. BTW, I'm also thinking about #1, but it's too
> early to bring up the details before the ticket gets ACCEPTED.
>
> Anyway, I'm always open to all kinds of discussion, thanks for your
> comments, Joris!

Re: Rack awareness support for Mesos

2016-06-14 Thread Du, Fan



On 2016/6/14 20:32, Joris Van Remoortere wrote:
>> #1. Stick with attributes for rack awareness
>
> I don't think this is the right approach; however, there seem to be 2
> components to this discussion:
>
> 1. How the values are presented (Attributes vs. a new type-aware structure)
> 2. How the values are determined (scripts vs. automation vs. modules)
>
> It seems you are more interested in working on #2. If that's the case,
> please make sure that you don't assume anything about #1, as not
> everyone agrees that we will use the existing attributes in the future.

On the condition of compatibility with existing frameworks which already
rely on parsing attributes for rack information.

Quoting my original statement:
> For compatibility with existing frameworks, I tend to be OK with using
> attributes to convey the rack information

By all means, no matter what internal structures are used, current
behavior should be honored. BTW, I'm also thinking about #1, but it's too
early to bring up the details before the ticket gets ACCEPTED.

Anyway, I'm always open to all kinds of discussion, thanks for your
comments, Joris!




Re: Rack awareness support for Mesos

2016-06-14 Thread Joris Van Remoortere
>
> #1. Stick with attributes for rack awareness

I don't think this is the right approach; however, there seem to be 2
components to this discussion:

1. How the values are presented (Attributes vs. a new type-aware structure)
2. How the values are determined (scripts vs. automation vs. modules)

It seems you are more interested in working on #2. If that's the case,
please make sure that you don't assume anything about #1, as not
everyone agrees that we will use the existing attributes in the future.

For #2, you should focus on an API (module or script results) that will
support all the different methods the community wants to use to generate
this data.

As you mentioned, updating the values for a running agent is not
straightforward. A lot of design work will need to go into how these values
are propagated to frameworks that have made assumptions about them, and
which values are allowed to change vs. not.
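The "which values are allowed to change" question can at least be stated precisely. A minimal sketch, assuming attributes are available as plain `name -> value` dicts and that the set of mutable names (e.g. `rack`, after a live migration) is something the project would still have to decide; the function names are hypothetical:

```python
from typing import Dict, FrozenSet, Optional, Tuple

Change = Tuple[Optional[str], Optional[str]]  # (old value, new value); None means absent


def diff_attributes(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Change]:
    """Return every attribute that was added, removed, or changed."""
    return {key: (old.get(key), new.get(key))
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)}


def validate_update(old: Dict[str, str], new: Dict[str, str],
                    mutable: FrozenSet[str]) -> Dict[str, Change]:
    """Accept the update only if every changed attribute is declared mutable.

    Raises ValueError naming the offending attributes; on success returns the
    diff, i.e. what would have to be propagated to frameworks.
    """
    changes = diff_attributes(old, new)
    rejected = sorted(key for key in changes if key not in mutable)
    if rejected:
        raise ValueError(f"immutable attributes changed: {', '.join(rejected)}")
    return changes
```

The returned diff is what would then need to reach frameworks that already hold offers or run tasks on the agent; that propagation is the harder design problem and is not modeled here.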

—
*Joris Van Remoortere*
Mesosphere



RE: Rack awareness support for Mesos

2016-06-14 Thread Aaron Carey
#3 would be very helpful for us. Also related:

https://issues.apache.org/jira/browse/MESOS-3059

--

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150




Re: Rack awareness support for Mesos

2016-06-14 Thread Du, Fan

Hi everyone

Let me summarize the community discussion about rack awareness so
far. First, thanks for all the comments, advice, and challenges! :)


#1. Stick with attributes for rack awareness

For compatibility with existing frameworks, I am OK with using
attributes to convey the rack information, with the goal of setting them
automatically, keeping them easy to maintain, and having a good attribute
schema. This raises the question below, which is where the controversy starts.


#2. Scripts vs programmatic way

Both can be used to set attributes. I've made my arguments in the JIRA
and the design doc, so I won't argue more here. But please take a look
at the earlier discussion in MESOS-3366, which allows resource/attribute
discovery.


A module implementing the *slaveAttributesDecorator* hook would work like
a charm here in a static way, but the need to update attributes still has to be justified.
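As an illustration of the script approach, here is a minimal Python sketch that reads a rack identifier from a statically configured file and merges it into the `name:value;name:value` string accepted by the agent's `--attributes` flag. The `/etc/mesos/rack_id` path is the one mentioned elsewhere in this thread and is an assumption here; the helper names are hypothetical. A `slaveAttributesDecorator` hook module could do something equivalent in C++ inside the agent.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_attributes(spec: str) -> Dict[str, str]:
    """Parse an agent attribute string such as 'rack:r1;zone:west'."""
    attrs: Dict[str, str] = {}
    for item in filter(None, spec.split(";")):  # skip empty segments
        name, _, value = item.partition(":")
        attrs[name.strip()] = value.strip()
    return attrs


def format_attributes(attrs: Dict[str, str]) -> str:
    """Serialize attributes back to 'name:value;name:value' (sorted by name)."""
    return ";".join(f"{k}:{v}" for k, v in sorted(attrs.items()))


def read_rack_id(path: str = "/etc/mesos/rack_id") -> Optional[str]:
    """Return the statically configured rack id (assumed path), or None."""
    p = Path(path)
    if not p.is_file():
        return None
    rack = p.read_text().strip()
    return rack or None


def decorate(spec: str, rack: Optional[str]) -> str:
    """Add or overwrite the 'rack' attribute when a rack id is known."""
    attrs = parse_attributes(spec)
    if rack is not None:
        attrs["rack"] = rack
    return format_attributes(attrs)


print(decorate("zone:west;gpu:false", "r42"))  # gpu:false;rack:r42;zone:west
```

The resulting string would then be handed to the agent's `--attributes` flag at startup, which is exactly why this approach is static: changing it later needs the attribute-update support discussed next.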

#3. Allow updating attributes
Several cases need to be covered here:

a). Mesos runs inside VMs or containers where live migration happens, so
the rack information needs to be updated.


b). LLDP packets are broadcast at an interval of 10s~30s (the interval is
vendor specific), and rack information is usually stored in the LLDP
daemon to be queried. In the worst case (a freshly rebooted node, or a
daemon restart), the Mesos slave has to wait 10s~30s for valid rack
information before registering with the master. Allowing attributes to be
updated would mitigate this problem.
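The trade-off in case b) can be sketched as a bounded poll: the agent asks an LLDP-style source for its rack id until a deadline, and if nothing arrives it registers without rack information and relies on a later attribute update. In this minimal Python sketch, `query_rack` stands in for whatever LLDP daemon query is used and is hypothetical; the clock and sleep functions are injectable so the logic can be tried without real waiting.

```python
import time
from typing import Callable, Optional


def wait_for_rack(
    query_rack: Callable[[], Optional[str]],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll `query_rack` until it returns a rack id or `timeout_s` elapses.

    Returns the rack id, or None once the deadline has passed; the caller
    would then register without a rack attribute and update it later.
    """
    deadline = clock() + timeout_s
    while True:
        rack = query_rack()  # hypothetical LLDP-daemon lookup
        if rack is not None:
            return rack
        if clock() >= deadline:
            return None
        sleep(interval_s)


class FakeClock:
    """Deterministic clock so the loop can be exercised without real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Simulated daemon that only knows the rack from the third query onward.
answers = iter([None, None, "rack-7"])
fake = FakeClock()
print(wait_for_rack(lambda: next(answers), clock=fake.time, sleep=fake.sleep))
```

With a 30 s timeout this reproduces the worst case above; if attributes could be updated after registration, the timeout could be made much shorter without losing the rack information for good.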


c). Framework affinity

Framework X prefers to run on the same nodes as another framework Y.
For example, it's desirable for Shark or Spark-SQL to reside on the
*worker* nodes where Alluxio (formerly Tachyon) runs, to gain a performance
boost, as in the SPARK-6707 ticket's example {tachyon=true;us-east-1=false}.


If frameworks could advertise agent attributes in the ResourcesOffer
process, that would be awesome!
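On the framework side, a constraint like the SPARK-6707 example `{tachyon=true;us-east-1=false}` can be checked against the attributes of an offered agent. This is a minimal Python sketch assuming text-valued attributes and the `name=value;name=value` form shown in the example; the function names are hypothetical, and real frameworks (Spark included) use their own constraint parsers.

```python
from typing import Dict


def parse_constraints(spec: str) -> Dict[str, str]:
    """Parse '{tachyon=true;us-east-1=false}' into {'tachyon': 'true', ...}.

    Assumes every non-empty item contains '='; otherwise unpacking fails.
    """
    body = spec.strip().strip("{}")
    pairs = (item.split("=", 1) for item in body.split(";") if item)
    return {k.strip(): v.strip() for k, v in pairs}


def offer_matches(agent_attrs: Dict[str, str], constraints: Dict[str, str]) -> bool:
    """True if every constrained attribute is present with the required value."""
    return all(agent_attrs.get(k) == v for k, v in constraints.items())


wanted = parse_constraints("{tachyon=true;us-east-1=false}")
print(offer_matches({"tachyon": "true", "us-east-1": "false", "rack": "r3"}, wanted))  # True
print(offer_matches({"tachyon": "false", "us-east-1": "false"}, wanted))  # False
```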



#4. Rearrange agents in a more scalable manner, e.g. on a per-rack basis

Randomly offering agents' resources to frameworks does not improve data
locality: imagine the likelihood of a framework getting resources within
the same rack at the scale of +3 nodes. Moreover, the time needed to
randomly shuffle the agents also grows.
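To put a number on the locality argument: if k agents are picked uniformly at random (without replacement) from R racks of n agents each, the chance that all k share one rack is R * C(n, k) / C(R*n, k), which for n much larger than k is roughly R^(1-k). The Python sketch below evaluates that formula; equal-sized racks and uniform sampling are simplifying assumptions, not a model of the actual Mesos allocator.

```python
from math import comb


def same_rack_probability(racks: int, per_rack: int, k: int) -> float:
    """P(k agents drawn uniformly without replacement all share one rack).

    Assumes `racks` equally sized racks holding `per_rack` agents each.
    """
    total = racks * per_rack
    return racks * comb(per_rack, k) / comb(total, k)


for racks in (2, 10, 50):
    print(racks, same_rack_probability(racks, per_rack=20, k=3))
```

Even with only 10 racks of 20 agents, three random agents land in a single rack less than 1% of the time.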


How about arranging the agents on a per-rack basis? A minor change
to the way resources are allocated would fix this.
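One way to read the per-rack proposal is to bucket agents by their rack attribute and hand them out rack by rack instead of from one globally shuffled list. A minimal Python sketch under that assumption; the names are hypothetical, and shuffling the order of racks (rather than of agents) is just one option for keeping some randomness.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_rack(agents: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Bucket (agent_id, rack) pairs into {rack: [agent_id, ...]}, keeping input order."""
    racks: Dict[str, List[str]] = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


def rack_ordered(racks: Dict[str, List[str]],
                 rng: Optional[random.Random] = None) -> List[str]:
    """Return agent ids with each rack's agents kept contiguous.

    Only the list of racks is shuffled, so a framework that accepts a run
    of consecutive agents tends to stay within one rack.
    """
    if rng is None:
        rng = random.Random()
    order = sorted(racks)
    rng.shuffle(order)
    return [agent for rack in order for agent in racks[rack]]


agents = [("a1", "r1"), ("a2", "r2"), ("a3", "r1"), ("a4", "r2")]
print(rack_ordered(group_by_rack(agents), random.Random(0)))
```

A side effect of this design is that the shuffle cost grows with the number of racks rather than the number of agents.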



I might not see the whole picture here, so comments are welcome!


On 2016/6/6 17:17, Du, Fan wrote:
> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while,
> it’s a common interest for lots of data center applications to provide
> data locality, fault tolerance and better task placement. Create
> MESOS-5545 to track the story, and here is the initial design doc [1]
> to support rack awareness in Mesos.
>
> Looking forward to hear any comments from end user and other developers,
>
> Thanks!
>
> [1]:
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing



Re: Rack awareness support for Mesos

2016-06-08 Thread Du, Fan



On 2016/6/8 0:58, james wrote:
> Do I have access to the jira system by default joining this list,
> or do I have to request permission somewhere? (sorry jira is new to me
> so recommendations on jira, per mesos, in a document, would be keen.)


You need a JIRA account; you can sign up for one here:
https://issues.apache.org/jira/secure/Signup!default.jspa


Re: Rack awareness support for Mesos

2016-06-07 Thread Jeff Schroeder
On Tuesday, June 7, 2016, Du, Fan  wrote:

>
>
> On 2016/6/6 23:48, Jörg Schad wrote:
>
>> Hi,
>> thanks for your idea and design doc!
>> Just a few thoughts:
>> a) The scheduling part would be implemented in a framework scheduler and
>> not the Mesos Core, or?
>>
>
> I'm not sure which level of scheduling part do you indicate,
> For the "Future" section of proposal?, It's Mesos allocation logic.
> And how to use rack information to implement advanced features (fault
> tolerance, data locality) is up to the framework scheduling part.
>
>> b) As mentioned by James, this needs to be very flexible (and not
>> necessarily based on network structure),
>
> The proposed network topology detection is modular, to fit into Ethernet,
> Infiniband, or other network implementation. And yes, user can statically
> configure /etc/mesos/rack_id to manipulate the logical network topology
> easily.
>
>> afaik people are using labels
>> on the agents to identify different fault domains which can then be
>> interpreted by framework scheduler. Maybe it would make sense (instead
>> of identifying the network structure) to come up with a common label
>> naming scheme which can be understood by all/different frameworks.
>
> I'm not convinced here why still using labels,
> Based on what information to label the agents? IMO, cluster operator
> still needs something like lldp to find out the network topology,
> every cluster operator will need to do it by his own, and it's better
> to abstract the logical inside Mesos to provide common interface to
> frameworks.


LLDP is Ethernet-specific, however. To go into Mesos, it would need to be
higher level, as there are people who run Mesos with InfiniBand or perhaps
an exotic custom networking fabric (Cray and IBM bits come to mind) who
might want to take advantage of this functionality. Labels are more
generic, and also more flexible in that regard.
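One way to address this is to keep LLDP as just one pluggable source behind a fabric-agnostic interface, with a static, operator-maintained source that works on any fabric. The Python sketch below assumes a hypothetical `TopologySource` protocol; the first source that yields a rack id wins. None of these names are existing Mesos APIs, and the fabric-specific lookup is left as an injected callable rather than a real LLDP or InfiniBand query.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Protocol


class TopologySource(Protocol):
    """Hypothetical interface: anything that can report this host's rack."""

    def rack_id(self) -> Optional[str]: ...


class StaticFileSource:
    """Operator-maintained file; independent of the network fabric."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def rack_id(self) -> Optional[str]:
        if not self.path.is_file():
            return None
        return self.path.read_text().strip() or None


class CallbackSource:
    """Wraps a fabric-specific lookup (e.g. an LLDP or InfiniBand query)."""

    def __init__(self, lookup: Callable[[], Optional[str]]) -> None:
        self.lookup = lookup

    def rack_id(self) -> Optional[str]:
        return self.lookup()


def detect_rack(sources: Iterable[TopologySource]) -> Optional[str]:
    """Return the first rack id reported, trying sources in priority order."""
    for source in sources:
        rack = source.rack_id()
        if rack:
            return rack
    return None


print(detect_rack([StaticFileSource("/nonexistent/rack_id"),
                   CallbackSource(lambda: "ib-leaf-3")]))
```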


-- 
Text by Jeff, typos by iPhone


Re: Rack awareness support for Mesos

2016-06-07 Thread Joris Van Remoortere
+dev.

@Fan, I responded on the JIRA with some next steps.
Thanks for bringing this up!

—
*Joris Van Remoortere*
Mesosphere


Re: Rack awareness support for Mesos

2016-06-07 Thread james

On 06/07/2016 09:57 AM, Du, Fan wrote:
> On 2016/6/6 21:27, james wrote:
>> Hello,
>>
>> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
>> get's access to the information, particularly cluster/cloud devops,
>> customers or interlopers?
>
> ACLs should play in this part to address security concern.

YES, and so much more! I know folks whose primary (in-house cluster)
usage is deep packet inspection on the cluster. With a cluster (inside),
there is no limit to new tools that can be judiciously altered to
benefit from cluster codes.

>> @Fan:: As a consultant, most of my customers either have  or are
>> planning hybrid installations, where some codes run on a local cluster
>> or using 'the cloud' for dynamic load requirements. I would think your
>> proposed scheme needs to be very flexible, both in application to a
>> campus or Metropolitan Area Network, if not massively distributed around
>> the globe. What about different resouce types (racks of arm64, gpu
>> centric hardware, DSPs, FPGA etc etc. Hardware diversity bring many
>> benefits to the cluster/cloud capabilities.
>>
>> This also begs the quesion of hardware management (boot/config/online)
>> of the various hardware, such as is built into coreOS. Are several
>> applications going to be supported? Standards track? Just Mesos DC/OS
>> centric?
>
> It depends whether this proposal is accepted by Mesos, if you think
> this feature is useful, let's discuss detailed requirement under
> MESOS-5545.

OK. Take a look at 'Rackview' on SourceForge:
'http://rackview.sourceforge.net/'

Do I have access to the JIRA system by default by joining this list,
or do I have to request permission somewhere? (Sorry, JIRA is new to me,
so recommendations on using JIRA for Mesos, in a document, would be keen.)

> btw, I have limited knowledge of CoreOS, will look into it.

CoreOS has some great ideas. But many of their codes are not current
(when compared to the Gentoo Portage tree), and thus many are suspect
for security/function.

I thought the purpose was to get more folks involved here in discussions,
so that better-formulated ideas can migrate to the ticket (MESOS-5545) and
repos.

>> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>> in resources' you need to add timing (latency) data to encourage robust
>> and diversified use of of this data. For HPC, this could be very
>> valuable for rDMA abusive algorithms where memory constrained workloads
>> not only need the knowledge of additional nearby memory resources, but
>> the approximated (based on previous data collected) latency and
>> bandwidth constraints to use those additional resources.
>
> Out of curiosity, which open sourced Mesos framework do you/your
> customer run MPI?

Easy, dude. Most of this work is tightly held, with nothing to publish
or open up yet. It's a mess (my professional opinion) right now, and
I'm testing a variety of tools just to be able to have better
instrumentation on these codes. Still, RDMA is very attractive, so it
does warrant much attention and extreme internal excitement.

> Mesos can support MPI framework, but AFIK, it's immature [1][2].

YEP.

> I think this part of work should be investigated in future.
>
> [1]: https://github.com/apache/mesos/tree/master/mpi   <- mpd ring version
> [2]: https://github.com/mesosphere/mesos-hydra <- hydra version

Many codes floating around. Much excitement on new compiler features.
Lots of hard work and testing going on. That said, the point I was trying
to make is that "vectoring in" resources, with a variety of parameters, as
a companion to your idea is warranted for the aforementioned use cases
and other opportunities.
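A rough illustration of the "vectoring in" idea: treat each candidate agent as a small vector of measured properties (here latency and bandwidth to some data source) and let the scheduler filter and rank on it. This is a hypothetical Python sketch; the `Candidate` fields, their units, and the assumption that such measurements come from some external probe are all illustrative, not part of any existing Mesos offer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    agent_id: str
    latency_us: float      # assumed: measured round-trip latency to the data
    bandwidth_gbps: float  # assumed: measured achievable bandwidth


def pick_nearby(candidates: List[Candidate], count: int,
                min_bandwidth_gbps: float) -> List[str]:
    """Choose up to `count` agents meeting a bandwidth floor, lowest latency first."""
    eligible = [c for c in candidates if c.bandwidth_gbps >= min_bandwidth_gbps]
    # Ties on latency are broken in favour of higher bandwidth.
    eligible.sort(key=lambda c: (c.latency_us, -c.bandwidth_gbps))
    return [c.agent_id for c in eligible[:count]]


pool = [
    Candidate("same-rack", 2.0, 40.0),
    Candidate("next-rack", 8.0, 40.0),
    Candidate("remote-dc", 900.0, 1.0),
]
print(pick_nearby(pool, count=2, min_bandwidth_gbps=10.0))  # ['same-rack', 'next-rack']
```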




>> Great idea. I do like it very much.
>>
>> hth,
>> James
>>
>>
>> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>>> Hi,
>>>
>>> This looks potentially interesting.  How does it work in a public cloud
>>> deployment scenario?  I assume you would just have to disable this
>>> feature, or not enable it?
>>>
>>> Cheers,
>>>
>>> On 06/06/16 10:17, Du, Fan wrote:
>>>> Hi, Mesos folks
>>>>
>>>> I’ve been thinking about Mesos rack awareness support for a while,
>>>> it’s a common interest for lots of data center applications to provide
>>>> data locality, fault tolerance and better task placement. Create
>>>> MESOS-5545 to track the story, and here is the initial design doc [1]
>>>> to support rack awareness in Mesos.
>>>>
>>>> Looking forward to hear any comments from end user and other
>>>> developers,
>>>>
>>>> Thanks!
>>>>
>>>> [1]:
>>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing

RE: Rack awareness support for Mesos

2016-06-07 Thread Aaron Carey
Would this perhaps make sense as a Mesos module that automatically assigns
labels to the agents, rather than something in the core itself?

--

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150




Re: Rack awareness support for Mesos

2016-06-07 Thread Du, Fan



On 2016/6/6 23:48, Jörg Schad wrote:
> Hi,
> thanks for your idea and design doc!
> Just a few thoughts:
> a) The scheduling part would be implemented in a framework scheduler and
> not the Mesos Core, or?

I'm not sure which level of scheduling you are referring to. If you mean
the "Future" section of the proposal, that is Mesos allocation logic.
How to use rack information to implement advanced features (fault
tolerance, data locality) is up to the framework scheduler.

> b) As mentioned by James, this needs to be very flexible (and not
> necessarily based on network structure),

The proposed network topology detection is modular, to fit into Ethernet,
InfiniBand, or other network implementations. And yes, users can statically
configure /etc/mesos/rack_id to manipulate the logical network topology
easily.

> afaik people are using labels
> on the agents to identify different fault domains which can then be
> interpreted by framework scheduler. Maybe it would make sense (instead
> of identifying the network structure) to come up with a common label
> naming scheme which can be understood by all/different frameworks.

I'm not convinced that we should still use labels. Based on what
information would the agents be labeled? IMO, cluster operators still
need something like LLDP to find out the network topology, and every
cluster operator would need to do it on their own, so it's better to
abstract the logic inside Mesos to provide a common interface to
frameworks.

Honestly speaking, I don't follow the argument here for labels.
The proposal is designed to do it *automatically* to reduce maintenance
effort.


Re: Rack awareness support for Mesos

2016-06-07 Thread Du, Fan



On 2016/6/6 21:27, james wrote:
> Hello,
>
> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
> get's access to the information, particularly cluster/cloud devops,
> customers or interlopers?

ACLs should play a part here to address the security concern.

> @Fan:: As a consultant, most of my customers either have  or are
> planning hybrid installations, where some codes run on a local cluster
> or using 'the cloud' for dynamic load requirements. I would think your
> proposed scheme needs to be very flexible, both in application to a
> campus or Metropolitan Area Network, if not massively distributed around
> the globe. What about different resouce types (racks of arm64, gpu
> centric hardware, DSPs, FPGA etc etc. Hardware diversity bring many
> benefits to the cluster/cloud capabilities.
>
> This also begs the quesion of hardware management (boot/config/online)
> of the various hardware, such as is built into coreOS. Are several
> applications going to be supported? Standards track? Just Mesos DC/OS
> centric?

It depends on whether this proposal is accepted by Mesos. If you think
this feature is useful, let's discuss the detailed requirements under MESOS-5545.

BTW, I have limited knowledge of CoreOS; I will look into it.

> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
> in resources' you need to add timing (latency) data to encourage robust
> and diversified use of of this data. For HPC, this could be very
> valuable for rDMA abusive algorithms where memory constrained workloads
> not only need the knowledge of additional nearby memory resources, but
> the approximated (based on previous data collected) latency and
> bandwidth constraints to use those additional resources.

Out of curiosity, which open-source Mesos framework do you/your
customers use to run MPI?

Mesos can support MPI frameworks, but AFAIK they are immature [1][2].
I think this part of the work should be investigated in the future.

[1]: https://github.com/apache/mesos/tree/master/mpi   <- mpd ring version
[2]: https://github.com/mesosphere/mesos-hydra <- hydra version



> Great idea. I do like it very much.
>
> hth,
> James
>
>
> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>> Hi,
>>
>> This looks potentially interesting.  How does it work in a public cloud
>> deployment scenario?  I assume you would just have to disable this
>> feature, or not enable it?
>>
>> Cheers,
>>
>> On 06/06/16 10:17, Du, Fan wrote:
>>> Hi, Mesos folks
>>>
>>> I’ve been thinking about Mesos rack awareness support for a while,
>>> it’s a common interest for lots of data center applications to provide
>>> data locality, fault tolerance and better task placement. Create
>>> MESOS-5545 to track the story, and here is the initial design doc [1]
>>> to support rack awareness in Mesos.
>>>
>>> Looking forward to hear any comments from end user and other developers,
>>>
>>> Thanks!
>>>
>>> [1]:
>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing

Re: Rack awareness support for Mesos

2016-06-06 Thread Jörg Schad
Hi,
thanks for your idea and design doc!
Just a few thoughts:
a) The scheduling part would be implemented in a framework scheduler and
not in the Mesos core, right?
b) As mentioned by James, this needs to be very flexible (and not
necessarily based on network structure). AFAIK, people are using labels on
the agents to identify different fault domains, which can then be
interpreted by framework schedulers. Maybe it would make sense (instead of
identifying the network structure) to come up with a common label naming
scheme that can be understood by all/different frameworks.
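To make point b) concrete, here is a minimal sketch of how a framework scheduler might interpret such labels. It assumes agents advertise their fault domain through the `key:value;key:value` string accepted by the agent's `--attributes` flag, and it assumes a hypothetical naming scheme with `rack` and `zone` keys; the function names `parse_attributes` and `spread_by_domain` are illustrative only, not part of any Mesos API.

```python
from collections import defaultdict
from typing import Dict, List


def parse_attributes(spec: str) -> Dict[str, str]:
    """Parse a 'key:value;key:value' attribute string into a dict.

    Only plain text values are handled; ranges/sets are kept as raw strings.
    """
    attrs: Dict[str, str] = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing ';'
        key, _, value = item.partition(":")
        attrs[key.strip()] = value.strip()
    return attrs


def spread_by_domain(agents: Dict[str, str], count: int,
                     label: str = "rack") -> List[str]:
    """Pick up to `count` agents round-robin across distinct values of `label`.

    `agents` maps a (hypothetical) agent id to its attribute string.
    Agents without the label are grouped under 'unknown'.
    """
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for agent_id in sorted(agents):
        domain = parse_attributes(agents[agent_id]).get(label, "unknown")
        by_domain[domain].append(agent_id)

    chosen: List[str] = []
    domains = sorted(by_domain)
    # Take one agent per domain per round, so tasks land in as many
    # distinct fault domains as possible before any domain is reused.
    while len(chosen) < count and any(by_domain.values()):
        for domain in domains:
            if by_domain[domain] and len(chosen) < count:
                chosen.append(by_domain[domain].pop(0))
    return chosen
```

With four agents on racks r1, r1, r2 and r3, asking for three tasks picks one agent from each rack; passing `label="zone"` spreads over zones instead, which is the point of agreeing on common key names across frameworks.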

Looking forward to your thoughts on this!



Re: Rack awareness support for Mesos

2016-06-06 Thread james

Hello,


@Stephen:: I guess Stephen is bringing up the 'security' aspect of who
gets access to the information: cluster/cloud devops, customers, or
interlopers?



@Fan:: As a consultant, most of my customers either have or are
planning hybrid installations, where some codes run on a local cluster
or use 'the cloud' for dynamic load requirements. I would think your
proposed scheme needs to be very flexible, whether applied to a
campus or Metropolitan Area Network or massively distributed around
the globe. What about different resource types (racks of arm64,
GPU-centric hardware, DSPs, FPGAs, etc.)? Hardware diversity brings many
benefits to cluster/cloud capabilities.


This also raises the question of hardware management (boot/config/online)
of the various hardware, such as is built into CoreOS. Are several
applications going to be supported? Standards track? Or is it just
Mesos/DC/OS-centric?


TIMING DATA:: This is the main issue I see. Once you start 'vectoring
in resources', you need to add timing (latency) data to encourage robust
and diversified use of this data. For HPC, this could be very
valuable for RDMA-intensive algorithms, where memory-constrained workloads
need not only knowledge of additional nearby memory resources but also
the approximate latency and bandwidth constraints (based on previously
collected data) of using those additional resources.
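A minimal sketch of what "timing data" could mean for placement, assuming each candidate host is annotated with latency and bandwidth figures averaged from previously collected measurements. The `Link` record, its fields, and `pick_remote_memory` are hypothetical names for illustration; nothing here reflects an existing Mesos interface, and the greedy lowest-latency-first policy is just one simple choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    host: str
    free_mem_gb: float
    latency_us: float      # hypothetical: mean of previously collected samples
    bandwidth_gbps: float  # hypothetical: mean of previously collected samples


def pick_remote_memory(links: List[Link], need_gb: float,
                       max_latency_us: float,
                       min_bandwidth_gbps: float) -> List[str]:
    """Greedily pick hosts, lowest latency first, until need_gb is covered.

    Hosts outside the latency/bandwidth budget are ignored.
    Returns [] if the qualifying hosts cannot cover need_gb.
    """
    eligible = [c for c in links
                if c.latency_us <= max_latency_us
                and c.bandwidth_gbps >= min_bandwidth_gbps]
    # Prefer lower latency; break ties with higher bandwidth.
    eligible.sort(key=lambda c: (c.latency_us, -c.bandwidth_gbps))

    chosen: List[str] = []
    total = 0.0
    for link in eligible:
        if total >= need_gb:
            break
        chosen.append(link.host)
        total += link.free_mem_gb
    return chosen if total >= need_gb else []
```

For example, with a same-rack host (32 GB, 2 us), a next-rack host (64 GB, 5 us) and a remote-DC host (256 GB, 500 us), a 64 GB request with a 10 us budget uses the two nearby hosts and never considers the remote one, even though it alone has enough memory.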



Great idea. I do like it very much.

hth,
James



Re: Rack awareness support for Mesos

2016-06-06 Thread Stephen Gran
Hi,

This looks potentially interesting.  How does it work in a public cloud 
deployment scenario?  I assume you would just have to disable this 
feature, or not enable it?

Cheers,

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Rack awareness support for Mesos

2016-06-06 Thread Du, Fan
Hi, Mesos folks

I've been thinking about Mesos rack awareness support for a while.
Rack awareness is of common interest to many data center applications, since
it provides data locality, fault tolerance and better task placement. I
created MESOS-5545 to track the story, and here is the initial design doc [1]
for supporting rack awareness in Mesos.

Looking forward to hearing any comments from end users and other developers.
Thanks!

[1]: 
https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing