Re: Rack awareness support for Mesos
> Like it or not, large amounts of hardware need schema, planning, and
> architectural robustness to be kept pristinely available if software
> efficiency is to be anywhere near optimal in deployment. This becomes
> critical when a mix of different CPU types, GPUs, and RAM has to be
> considered in future deployments, regardless of whether you outsource or
> run your own cluster. Hardware vendors are going to want to sell their
> products to as wide a customer base as possible, and customers are going
> to demand seamless management for expansion of resources. Furthermore, as
> a consultant, my experience is that much of the future market is going to
> demand outsourced, hybrid, and in-house options as a fundamental tenet of
> cluster resource adoption.
>
> hth,
> James
>
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan...@intel.com> wrote:
>>
>> On 2016/6/14 20:32, Joris Van Remoortere wrote:
>>
>> #1. Stick with attributes for rack awareness
>>
>> I don't think this is the right approach; however, there seem to be 2
>> components to this discussion:
>>
>> 1. How the values are presented (Attributes vs. a new type-aware
>> structure)
>> 2. How the values are determined (scripts vs. automation vs. modules)
>>
>> It seems you are more interested in working on #2. If that's the case,
>> please make sure that you don't assume anything about #1, as not
>> everyone agrees that we will use the existing attributes in the future.
>>
>> On the condition of being compatible with existing frameworks which
>> already rely on parsing attributes for rack information.
>>
>> Quoting my original statement:
>> > For compatibility with existing frameworks, I tend to be ok with using
>> > attributes to convey the rack information
>>
>> By all means, no matter what internal structures are used, current
>> behavior should be honored. btw, I'm also thinking about #1; it's too
>> early to bring up the details before the ticket gets ACCEPTED.
>>
>> Anyway, I'm always open to all kinds of discussion, thanks for your
>> comments, Joris!
Re: Rack awareness support for Mesos
On 2016/6/14 21:14, Joris Van Remoortere wrote:

>> On the condition of being compatible with existing frameworks which
>> already rely on parsing attributes for rack information.
>
> There is currently nothing in Mesos that specifies the format or structure
> for rack information in attributes. The fact that operators / frameworks
> have decided to add this information out of band is their problem to
> solve. We don't need to be backwards compatible with something we never
> published to begin with. This is why it's ok for us to consider adding a
> typed form of failure domain information that is separate from the
> typeless string attributes.

Hmm, sounds promising, then we can travel light!

> Since your interest is in the determination of the values, as opposed to

You are presuming my work scope; this is not true from the very beginning.

> their propagation, I would just urge that you keep in mind that we may (as
> a project) not want to support this information as the current string
> attributes.

Well understood, thanks for the explanation!
Any comments about #3 and #4?

> --
> *Joris Van Remoortere*
> Mesosphere
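On case #3 b) from the summary (the agent may have to wait 10s~30s for LLDP-derived rack information before registering), here is a minimal Python sketch of why updatable attributes would help. It assumes a hypothetical `query_rack` callable standing in for a query to the local LLDP daemon, and a hypothetical `UNKNOWN_RACK` placeholder; none of these names are Mesos APIs. The idea is only that the agent waits up to a short deadline, registers with a placeholder if nothing arrived, and relies on a later attribute update to correct the value.

```python
import time
from typing import Callable, Optional

UNKNOWN_RACK = "unknown"  # hypothetical placeholder, not a Mesos constant


def wait_for_rack(query_rack: Callable[[], Optional[str]],
                  timeout_s: float,
                  poll_interval_s: float = 0.01,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll query_rack until it yields a value or timeout_s has elapsed.

    Returns the rack name, or UNKNOWN_RACK if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        rack = query_rack()
        if rack is not None:
            return rack
        if clock() >= deadline:
            # Give up waiting; the real value would be applied later
            # through whatever attribute-update mechanism is agreed on.
            return UNKNOWN_RACK
        sleep(poll_interval_s)
```

Without attribute updates, `timeout_s` would have to cover the full LLDP interval; with them, a much shorter deadline becomes acceptable.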
Re: Rack awareness support for Mesos
> On the condition of being compatible with existing frameworks which already
> rely on parsing attributes for rack information.

There is currently nothing in Mesos that specifies the format or structure
for rack information in attributes. The fact that operators / frameworks
have decided to add this information out of band is their problem to solve.
We don't need to be backwards compatible with something we never published
to begin with. This is why it's ok for us to consider adding a typed form of
failure domain information that is separate from the typeless string
attributes.

Since your interest is in the determination of the values, as opposed to
their propagation, I would just urge that you keep in mind that we may (as a
project) not want to support this information as the current string
attributes.

--
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan...@intel.com> wrote:

> On 2016/6/14 20:32, Joris Van Remoortere wrote:
>
>> #1. Stick with attributes for rack awareness
>>
>> I don't think this is the right approach; however, there seem to be 2
>> components to this discussion:
>>
>> 1. How the values are presented (Attributes vs. a new type-aware
>> structure)
>> 2. How the values are determined (scripts vs. automation vs. modules)
>>
>> It seems you are more interested in working on #2. If that's the case,
>> please make sure that you don't assume anything about #1, as not
>> everyone agrees that we will use the existing attributes in the future.
>
> On the condition of being compatible with existing frameworks which already
> rely on parsing attributes for rack information.
>
> Quoting my original statement:
> > For compatibility with existing frameworks, I tend to be ok with using
> > attributes to convey the rack information
>
> By all means, no matter what internal structures are used, current behavior
> should be honored. btw, I'm also thinking about #1; it's too early to
> bring up the details before the ticket gets ACCEPTED.
>
> Anyway, I'm always open to all kinds of discussion, thanks for your
> comments, Joris!
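To illustrate the distinction drawn above between typeless string attributes and a typed form of failure-domain information, here is a minimal Python sketch. The `FailureDomain` type, its `rack` and `zone` fields, and the `key:value;key:value` attribute string are assumptions made up for illustration; nothing here is a published Mesos structure.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse a free-form 'key:value;key:value' string (assumed format)."""
    attrs: Dict[str, str] = {}
    for item in text.split(";"):
        if not item.strip():
            continue
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"malformed attribute: {item!r}")
        attrs[key.strip()] = value.strip()
    return attrs


@dataclass(frozen=True)
class FailureDomain:
    """Hypothetical typed alternative: the field names are fixed by the
    type, so every consumer reads the rack the same way."""
    rack: str
    zone: Optional[str] = None

    def same_rack(self, other: "FailureDomain") -> bool:
        return self.zone == other.zone and self.rack == other.rack
```

With plain strings, one operator may write `rack:r1` and another `rack_id:r1`, and a framework looking up `"rack"` silently misses the second; a typed structure moves that agreement out of band conventions and into the schema.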
Re: Rack awareness support for Mesos
On 2016/6/14 20:32, Joris Van Remoortere wrote:

>> #1. Stick with attributes for rack awareness
>
> I don't think this is the right approach; however, there seem to be 2
> components to this discussion:
>
> 1. How the values are presented (Attributes vs. a new type-aware structure)
> 2. How the values are determined (scripts vs. automation vs. modules)
>
> It seems you are more interested in working on #2. If that's the case,
> please make sure that you don't assume anything about #1, as not
> everyone agrees that we will use the existing attributes in the future.

On the condition of being compatible with existing frameworks which already
rely on parsing attributes for rack information.

Quoting my original statement:
> For compatibility with existing frameworks, I tend to be ok with using
> attributes to convey the rack information

By all means, no matter what internal structures are used, current behavior
should be honored. btw, I'm also thinking about #1; it's too early to bring
up the details before the ticket gets ACCEPTED.

Anyway, I'm always open to all kinds of discussion, thanks for your
comments, Joris!

> For #2, you should focus on an API (module or script results) that will
> support all the different methods the community wants to use to generate
> this data.
>
> As you mentioned, updating the values for a running agent is not
> straightforward. A lot of design work will need to go into how these
> values are propagated to frameworks that have made assumptions about
> them, and which values are allowed to change vs. not.
>
> --
> *Joris Van Remoortere*
> Mesosphere
Re: Rack awareness support for Mesos
> #1. Stick with attributes for rack awareness

I don't think this is the right approach; however, there seem to be 2
components to this discussion:

1. How the values are presented (Attributes vs. a new type-aware structure)
2. How the values are determined (scripts vs. automation vs. modules)

It seems you are more interested in working on #2. If that's the case,
please make sure that you don't assume anything about #1, as not everyone
agrees that we will use the existing attributes in the future.

For #2, you should focus on an API (module or script results) that will
support all the different methods the community wants to use to generate
this data.

As you mentioned, updating the values for a running agent is not
straightforward. A lot of design work will need to go into how these values
are propagated to frameworks that have made assumptions about them, and
which values are allowed to change vs. not.

--
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <aca...@ilm.com> wrote:

> #3 would be very helpful for us. Also related:
>
> https://issues.apache.org/jira/browse/MESOS-3059
>
> --
>
> Aaron Carey
> Production Engineer - Cloud Pipeline
> Industrial Light & Magic
> London
> 020 3751 9150
>
>
> From: Du, Fan [fan...@intel.com]
> Sent: 14 June 2016 07:24
> To: user@mesos.apache.org; d...@mesos.apache.org
> Cc: Joris Van Remoortere; vinodk...@apache.org
> Subject: Re: Rack awareness support for Mesos
>
> Hi everyone
>
> Let me summarize the discussion about rack awareness in the community so
> far. First, thanks for all the comments, advice and challenges! :)
>
> #1. Stick with attributes for rack awareness
>
> For compatibility with existing frameworks, I tend to be ok with using
> attributes to convey the rack information, but with the goal of doing it
> automatically, keeping it easy to maintain, and with a good attributes
> schema. This brings up the question below, where the controversy starts.
>
> #2. Scripts vs programmatic way
>
> Both can be used to set attributes. I've made my arguments in the Jira
> and the design doc, so I'm not going to argue more here. But please first
> take a look at the discussion in MESOS-3366, which allows
> resources/attributes discovery.
>
> A module implementing the *slaveAttributesDecorator* hook will work like
> a charm here in a static way. Attribute updating still needs to be
> justified.
>
> #3. Allow updating attributes
>
> Several cases need to be covered here:
>
> a). Mesos runs inside VMs or containers, where live migration happens, so
> rack information needs to be updated.
>
> b). LLDP packets are broadcast at an interval of 10s~30s (a
> vendor-specific implementation), and rack information is usually stored
> in the LLDP daemon to be queried. The worst case (a fresh node reboot or
> a daemon restart) would be: the Mesos slave has to wait 10s~30s for valid
> rack information before registering with the master. Allowing attributes
> to be updated will mitigate this problem.
>
> c). Framework affinity
>
> Framework X prefers to run on the same nodes as another framework Y.
> For example, it's desirable for Shark or Spark-SQL to reside on the
> *worker* node where Alluxio (formerly Tachyon) runs, to gain more
> performance, as in the SPARK-6707 ticket message
> {tachyon=true;us-east-1=false}
>
> If a framework could advertise agent attributes in the ResourcesOffer
> process, awesome!
>
> #4. Rearrange agents in a more scalable manner, like on a per-rack basis
>
> Randomly offering agents' resources to a framework does not improve data
> locality; imagine the likelihood of a framework getting resources
> underneath the same rack, at the scale of +3 nodes. Moreover, the time to
> randomly shuffle the agents also grows.
>
> How about rearranging the agents on a per-rack basis? Together with a
> minor change to the way resources are allocated, this will fix it.
>
> I might not see the whole picture here, so comments are welcome!
>
> On 2016/6/6 17:17, Du, Fan wrote:
> > Hi, Mesos folks
> >
> > I've been thinking about Mesos rack awareness support for a while;
> > it's a common interest for lots of data center applications to provide
> > data locality, fault tolerance and better task placement. I created
> > MESOS-5545 to track the story, and here is the initial design doc [1]
> > to support rack awareness in Mesos.
> >
> > Looking forward to hearing any comments from end users and other
> > developers.
> >
> > Thanks!
> >
> > [1]:
> > https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
RE: Rack awareness support for Mesos
#3 would be very helpful for us. Also related:

https://issues.apache.org/jira/browse/MESOS-3059

--
Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150
Re: Rack awareness support for Mesos
Hi everyone,

Let me summarize the discussion about rack awareness in the community so
far. First, thanks for all the comments, advice, and challenges! :)

#1. Stick with attributes for rack awareness

For compatibility with existing frameworks, I tend to be OK with using
attributes to convey the rack information, but with the goal of doing it
automatically, keeping it easy to maintain, and using a good attribute
schema. This brings up the question below, where the controversy starts.

#2. Scripts vs. a programmatic way

Both can be used to set attributes. I've made my arguments in the JIRA and
the design doc, so I'm not going to argue more here. But please take a look
at the earlier discussion in MESOS-3366, which allows resource/attribute
discovery.

A module implementing the *slaveAttributesDecorator* hook will work like a
charm here in a static way. Updating attributes still needs to be justified.

#3. Allow updating attributes

Several cases need to be covered here:

a) Mesos runs inside VMs or containers, where live migration happens, so
rack information needs to be updated.

b) LLDP packets are broadcast at an interval of 10s~30s (this is vendor
specific), and rack information is usually stored in the LLDP daemon to be
queried. In the worst case (a freshly rebooted node, or a daemon restart),
the Mesos slave has to wait 10s~30s for valid rack information before
registering with the master. Allowing attributes to be updated will mitigate
this problem.

c) Framework affinity

Framework X prefers to run on the same nodes as another framework Y. For
example, it's desirable for Shark or Spark-SQL to reside on the *worker*
node where Alluxio (formerly Tachyon) runs to gain a performance boost, as
in the SPARK-6707 ticket message {tachyon=true;us-east-1=false}.

If a framework could advertise agent attributes in the ResourcesOffer
process, awesome!

#4. Rearrange agents in a more scalable manner, e.g. on a per-rack basis

Randomly offering agents' resources to frameworks does not improve data
locality; imagine the likelihood of a framework getting resources under the
same rack, at the scale of +3 nodes. Moreover, the time to randomly shuffle
the agents also grows.

How about rearranging the agents on a per-rack basis? A minor change to the
way resources are allocated would fix this.

I might not see the whole picture here, so comments are welcome!
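The per-rack rearrangement proposed in #4 of this summary can be sketched as
follows. This is only an illustration under stated assumptions, not Mesos
allocator code: agents are modeled as a plain dict mapping a hypothetical
agent ID to its attributes, the attribute name `rack` is an assumption, and
`group_agents_by_rack` and `rack_ordered` are made-up helper names.

```python
from collections import defaultdict


def group_agents_by_rack(agents, rack_key="rack", unknown="unknown"):
    """Group agent IDs by the value of their rack attribute.

    `agents` maps a (hypothetical) agent ID to a dict of attributes.
    Agents without a rack attribute land in the `unknown` bucket.
    """
    racks = defaultdict(list)
    for agent_id, attributes in agents.items():
        racks[attributes.get(rack_key, unknown)].append(agent_id)
    return dict(racks)


def rack_ordered(agents, rack_key="rack"):
    """Yield agent IDs one rack at a time instead of in a global random order."""
    racks = group_agents_by_rack(agents, rack_key)
    for rack in sorted(racks):
        for agent_id in sorted(racks[rack]):
            yield agent_id


# Illustrative data only; the IDs and rack names are invented.
agents = {
    "agent-1": {"rack": "r1"},
    "agent-2": {"rack": "r2"},
    "agent-3": {"rack": "r1"},
    "agent-4": {},
}
```

Walking the agents rack by rack means a framework that accepts consecutive
offers tends to land under one rack, which is the locality effect the
proposal is after. A real allocator would also have to preserve fairness
between frameworks, which this sketch ignores.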
Re: Rack awareness support for Mesos
On 2016/6/8 0:58, james wrote:

> Do I have access to the jira system by default joining this list,
> or do I have to request permission somewhere? (sorry jira is new to me
> so recommendations on jira, per mesos, in a document, would be keen.)

You need a JIRA account; you can sign up for one here:
https://issues.apache.org/jira/secure/Signup!default.jspa
Re: Rack awareness support for Mesos
On Tuesday, June 7, 2016, Du, Fan wrote:

> On 2016/6/6 23:48, Jörg Schad wrote:
>
>> Hi,
>> thanks for your idea and design doc!
>> Just a few thoughts:
>> a) The scheduling part would be implemented in a framework scheduler and
>> not the Mesos Core, or?
>
> I'm not sure which level of scheduling part do you indicate,
> For the "Future" section of proposal?, It's Mesos allocation logic.
> And how to use rack information to implement advanced features (fault
> tolerance, data locality) is up to the framework scheduling part.
>
>> b) As mentioned by James, this needs to be very flexible (and not
>> necessarily based on network structure),
>
> The proposed network topology detection is modular, to fit into Ethernet,
> Infiniband, or other network implementation. And yes, user can statically
> configure /etc/mesos/rack_id to manipulate the logical network topology
> easily.
>
>> afaik people are using labels
>> on the agents to identify different fault domains which can then be
>> interpreted by framework scheduler. Maybe it would make sense (instead
>> of identifying the network structure) to come up with a common label
>> naming scheme which can be understood by all/different frameworks.
>
> I'm not convinced here why still using labels,
> Based on what information to label the agents? IMO, cluster operator
> still needs something like lldp to find out the network topology,
> every cluster operator will need to do it by his own, and it's better
> to abstract the logical inside Mesos to provide common interface to
> frameworks.

LLDP is Ethernet specific however. To go into Mesos, it would need to be
higher level as there are people who run Mesos with Infiniband or perhaps an
exotic custom networking fabric (Cray and IBM bits come to mind) that might
want to take advantage of this functionality. Labels are more generic, but
also more flexible in that regard.

--
Text by Jeff, typos by iPhone
Re: Rack awareness support for Mesos
+dev.

@Fan, I responded on the JIRA with some next steps. Thanks for bringing this
up!

*Joris Van Remoortere*
Mesosphere
Re: Rack awareness support for Mesos
On 06/07/2016 09:57 AM, Du, Fan wrote:

> On 2016/6/6 21:27, james wrote:
>
>> Hello,
>>
>> @Stephen:: I guess Stephen is bringing up the 'security' aspect of who
>> gets access to the information, particularly cluster/cloud devops,
>> customers or interlopers?
>
> ACLs should play in this part to address security concern.

YES, and so much more! I know folks whose primary (in-house cluster) usage
is deep packet inspection on the cluster. With a cluster (inside) there is
no limit to new tools that can be judiciously altered to benefit from
cluster codes.

>> @Fan:: As a consultant, most of my customers either have or are
>> planning hybrid installations, where some codes run on a local cluster
>> or using 'the cloud' for dynamic load requirements. I would think your
>> proposed scheme needs to be very flexible, both in application to a
>> campus or Metropolitan Area Network, if not massively distributed around
>> the globe. What about different resource types (racks of arm64, gpu
>> centric hardware, DSPs, FPGA etc etc.)? Hardware diversity brings many
>> benefits to the cluster/cloud capabilities.
>>
>> This also begs the question of hardware management (boot/config/online)
>> of the various hardware, such as is built into coreOS. Are several
>> applications going to be supported? Standards track? Just Mesos DC/OS
>> centric?
>
> It depends whether this proposal is accepted by Mesos, if you think
> this feature is useful, let's discuss detailed requirement under
> MESOS-5545.

OK. Take a look at 'Rackview' on sourceforge:
http://rackview.sourceforge.net/

Do I have access to the jira system by default joining this list, or do I
have to request permission somewhere? (Sorry, jira is new to me, so
recommendations on jira, per mesos, in a document, would be keen.)

> btw, I have limited knowledge of CoreOS, will look into it.

CoreOS has some great ideas. But many of their codes are not current (when
compared to the gentoo portage tree) and thus many are suspect for
security/function.

I thought the purpose was to get more folks involved here in discussions,
and then better formulated ideas can migrate to the ticket (5545) and repos.

>> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>> in resources' you need to add timing (latency) data to encourage robust
>> and diversified use of this data. For HPC, this could be very
>> valuable for rDMA abusive algorithms where memory constrained workloads
>> not only need the knowledge of additional nearby memory resources, but
>> the approximated (based on previous data collected) latency and
>> bandwidth constraints to use those additional resources.
>
> Out of curiosity, which open sourced Mesos framework do you/your
> customer run MPI?

Easy, dude. Most of this work is tightly held, with nothing to publish or
open up yet. It's a mess (my professional opinion) right now and I'm testing
a variety of tools just to be able to have better instrumentation on these
codes. Still, rDMA is very attractive, so it does warrant much attention and
extreme, internal, excitement.

> Mesos can support MPI framework, but AFIK, it's immature [1][2].

YEP.

> I think this part of work should be investigated in future.
>
> [1]: https://github.com/apache/mesos/tree/master/mpi <- mpd ring version
> [2]: https://github.com/mesosphere/mesos-hydra <- hydra version

Many codes floating around. Much excitement on new compiler features. Lots
of hard work and testing going on. That said, the point I was trying to make
is that "vectoring in" resources, with a variety of parameters as a
companion to your idea, is warranted for these aforementioned use cases and
other opportunities.
RE: Rack awareness support for Mesos
Would this perhaps make sense as a Mesos module which can automatically
assign labels to the agents, rather than something in the core itself?

--
Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150
Re: Rack awareness support for Mesos
On 2016/6/6 23:48, Jörg Schad wrote:

> Hi,
> thanks for your idea and design doc!
> Just a few thoughts:
> a) The scheduling part would be implemented in a framework scheduler and
> not the Mesos Core, or?

I'm not sure which level of scheduling you mean. If it is the "Future"
section of the proposal, that is Mesos allocation logic. How to use rack
information to implement advanced features (fault tolerance, data locality)
is up to the framework's scheduler.

> b) As mentioned by James, this needs to be very flexible (and not
> necessarily based on network structure),

The proposed network topology detection is modular, to fit Ethernet,
InfiniBand, or other network implementations. And yes, users can statically
configure /etc/mesos/rack_id to manipulate the logical network topology
easily.

> afaik people are using labels
> on the agents to identify different fault domains which can then be
> interpreted by framework scheduler. Maybe it would make sense (instead
> of identifying the network structure) to come up with a common label
> naming scheme which can be understood by all/different frameworks.

I'm not convinced why we should still use labels. Based on what information
would the agents be labeled? IMO, the cluster operator still needs something
like LLDP to find out the network topology. Every cluster operator would
need to do it on their own, so it's better to abstract the logic inside
Mesos and provide a common interface to frameworks.

Honestly speaking, I don't follow the argument here for the labels. The
proposal is designed to do it *automatically* to reduce maintenance effort.
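The static override and the modular detection described in this reply could
be combined roughly as in the sketch below. It is an assumption-laden
illustration, not code from the design doc: `resolve_rack_id` and the
`detect` callable are hypothetical names, and only the path
/etc/mesos/rack_id comes from this thread. The 30-second default timeout
mirrors the 10s~30s LLDP interval mentioned in the summary message.

```python
import time
from pathlib import Path


def resolve_rack_id(detect, static_path="/etc/mesos/rack_id",
                    timeout=30.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Return a rack ID, or None if nothing is known before the timeout.

    A non-empty static file wins, so an operator can always override the
    detected topology. Otherwise `detect` (a hypothetical callable wrapping
    LLDP, InfiniBand, or any other probe) is polled until it returns a
    truthy value or `timeout` seconds have passed.
    """
    path = Path(static_path)
    if path.is_file():
        value = path.read_text().strip()
        if value:
            return value

    deadline = clock() + timeout
    while True:
        value = detect()
        if value:
            return value
        if clock() >= deadline:
            return None
        sleep(interval)
```

Passing the probe in as a callable keeps the function independent of any
particular network fabric. Returning None on timeout leaves the caller to
decide whether to register without rack information or keep waiting, which
is where the question of updating attributes later comes in.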
Re: Rack awareness support for Mesos
On 2016/6/6 21:27, james wrote:

> Hello,
>
> @Stephen:: I guess Stephen is bringing up the 'security' aspect of who
> gets access to the information, particularly cluster/cloud devops,
> customers or interlopers?

ACLs should come into play here to address the security concern.

> @Fan:: As a consultant, most of my customers either have or are
> planning hybrid installations, where some codes run on a local cluster
> or using 'the cloud' for dynamic load requirements. I would think your
> proposed scheme needs to be very flexible, both in application to a
> campus or Metropolitan Area Network, if not massively distributed around
> the globe. What about different resource types (racks of arm64, gpu
> centric hardware, DSPs, FPGA etc etc.)? Hardware diversity brings many
> benefits to the cluster/cloud capabilities.
>
> This also begs the question of hardware management (boot/config/online)
> of the various hardware, such as is built into coreOS. Are several
> applications going to be supported? Standards track? Just Mesos DC/OS
> centric?

It depends on whether this proposal is accepted by Mesos. If you think this
feature is useful, let's discuss the detailed requirements under MESOS-5545.

btw, I have limited knowledge of CoreOS; I will look into it.

> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
> in resources' you need to add timing (latency) data to encourage robust
> and diversified use of this data. For HPC, this could be very
> valuable for rDMA abusive algorithms where memory constrained workloads
> not only need the knowledge of additional nearby memory resources, but
> the approximated (based on previous data collected) latency and
> bandwidth constraints to use those additional resources.

Out of curiosity, which open source Mesos framework do you or your customers
use to run MPI?

Mesos can support MPI frameworks, but AFAIK they are immature [1][2]. I
think this part of the work should be investigated in the future.

[1]: https://github.com/apache/mesos/tree/master/mpi <- mpd ring version
[2]: https://github.com/mesosphere/mesos-hydra <- hydra version
Re: Rack awareness support for Mesos
Hi,
thanks for your idea and design doc!
Just a few thoughts:

a) The scheduling part would be implemented in a framework scheduler and
not in the Mesos core, right?

b) As mentioned by James, this needs to be very flexible (and not
necessarily based on network structure). AFAIK, people are using labels on
the agents to identify different fault domains, which can then be
interpreted by the framework scheduler. Maybe it would make sense (instead
of identifying the network structure) to come up with a common label naming
scheme which can be understood by all/different frameworks.

Looking forward to your thoughts on this!
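A framework scheduler interpreting a common label naming scheme, as
suggested in this message, might look like the minimal sketch below. It
assumes labels arrive as a `key:value;key:value` text string (the form used
by the agent's `--attributes` flag, text values only) and that operators
agree on the key `rack`. Both function names are hypothetical and the offer
data is invented.

```python
def parse_attributes(text):
    """Parse 'key:value;key:value' into a dict (text values only)."""
    attrs = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"malformed attribute: {item!r}")
        attrs[key.strip()] = value.strip()
    return attrs


def spread_across_racks(offers, count, rack_key="rack"):
    """Pick up to `count` agents, at most one per rack.

    `offers` is a list of (agent_id, attribute_text) pairs. Agents that do
    not carry the rack label are skipped, since their fault domain is
    unknown.
    """
    chosen, seen = [], set()
    for agent_id, text in offers:
        rack = parse_attributes(text).get(rack_key)
        if rack is not None and rack not in seen:
            seen.add(rack)
            chosen.append(agent_id)
            if len(chosen) == count:
                break
    return chosen


# Illustrative offers only.
offers = [
    ("a1", "rack:r1;zone:z1"),
    ("a2", "rack:r1;zone:z1"),
    ("a3", "rack:r2;zone:z1"),
    ("a4", "zone:z2"),
]
```

The scheduler side only needs the agreed key name, not knowledge of how the
value was produced, so the same code works whether the label was set by
hand, by a script, or by a detection module.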
Re: Rack awareness support for Mesos
Hello,

@Stephen: I guess Stephen is bringing up the 'security' aspect of who gets access to the information, particularly cluster/cloud devops, customers or interlopers?

@Fan: As a consultant, most of my customers either have or are planning hybrid installations, where some codes run on a local cluster or use 'the cloud' for dynamic load requirements. I would think your proposed scheme needs to be very flexible, both in application to a campus or Metropolitan Area Network, if not massively distributed around the globe. What about different resource types (racks of arm64, GPU-centric hardware, DSPs, FPGAs, etc.)? Hardware diversity brings many benefits to the cluster/cloud capabilities.

This also raises the question of hardware management (boot/config/online) of the various hardware, such as is built into CoreOS. Are several applications going to be supported? Standards track? Just Mesos DC/OS centric?

TIMING DATA: This is the main issue I see. Once you start 'vectoring in resources' you need to add timing (latency) data to encourage robust and diversified use of this data. For HPC, this could be very valuable for RDMA-heavy algorithms, where memory-constrained workloads need not only knowledge of additional nearby memory resources but also the approximate (based on previously collected data) latency and bandwidth constraints for using those additional resources.

Great idea. I do like it very much.

hth,
James

On 06/06/2016 05:06 AM, Stephen Gran wrote:

> Hi,
>
> This looks potentially interesting. How does it work in a public cloud
> deployment scenario? I assume you would just have to disable this
> feature, or not enable it?
>
> Cheers,
>
> On 06/06/16 10:17, Du, Fan wrote:
>
>> Hi, Mesos folks
>>
>> I’ve been thinking about Mesos rack awareness support for a while.
>> It’s a common interest for lots of data center applications, as it
>> provides data locality, fault tolerance and better task placement.
>> I created MESOS-5545 to track the story, and here is the initial
>> design doc [1] to support rack awareness in Mesos.
>>
>> Looking forward to hearing any comments from end users and other
>> developers.
>>
>> Thanks!
>>
>> [1]:
>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
Re: Rack awareness support for Mesos
Hi,

This looks potentially interesting. How does it work in a public cloud deployment scenario? I assume you would just have to disable this feature, or not enable it?

Cheers,

On 06/06/16 10:17, Du, Fan wrote:

> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while.
> It’s a common interest for lots of data center applications, as it
> provides data locality, fault tolerance and better task placement.
> I created MESOS-5545 to track the story, and here is the initial
> design doc [1] to support rack awareness in Mesos.
>
> Looking forward to hearing any comments from end users and other
> developers.
>
> Thanks!
>
> [1]:
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing

--
Stephen Gran
Senior Technical Architect
picture the possibilities | piksel.com
Rack awareness support for Mesos
Hi, Mesos folks

I've been thinking about Mesos rack awareness support for a while. It's a common interest for lots of data center applications, as it provides data locality, fault tolerance and better task placement. I created MESOS-5545 to track the story, and here is the initial design doc [1] to support rack awareness in Mesos.

Looking forward to hearing any comments from end users and other developers.

Thanks!

[1]: https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing