Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre
Hi

On 25 September 2013 04:15, Robert Collins robe...@robertcollins.net wrote:

> E.g. for any node I should be able to ask:
> - what failure domains is this in? [e.g. power-45, switch-23, ac-15, az-3, region-1]
> - what locality-of-reference features does this have? [e.g. switch-23, az-3, region-1]
> - where is it? [e.g. DC 2, pod 4, enclosure 2, row 5, rack 3, RU 30, cartridge 40]
>
> So, what do you think?

As a recovering data-centre person, I love the idea of being able to map a given thing not only to its physical location but also to its failure domain. +1

--
Cheers, Chris
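For concreteness, a minimal sketch (Python; all names and values are invented for illustration, not an existing Tuskar or Nova structure) of a per-node record that could answer those three questions:

    # Hypothetical node descriptor answering the three queries above.
    node = {
        "id": "node-1234",
        # failure domains: everything whose failure takes this node down
        "failure_domains": ["power-45", "switch-23", "ac-15", "az-3", "region-1"],
        # locality-of-reference: shared infrastructure that implies "nearness"
        "locality": ["switch-23", "az-3", "region-1"],
        # physical location, most specific component last
        "location": {
            "dc": 2, "pod": 4, "enclosure": 2, "row": 5,
            "rack": 3, "ru": 30, "cartridge": 40,
        },
    }

    def shares_failure_domain(a, b):
        """True if two such node records share any single point of failure."""
        return bool(set(a["failure_domains"]) & set(b["failure_domains"]))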
Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre
On 09/25/2013 05:15 AM, Robert Collins wrote:

> One of the major things Tuskar does is model a datacenter - which is very useful for error correlation, capacity planning and scheduling. Long term I'd like this to be held somewhere where it is accessible for schedulers and ceilometer etc. E.g. network topology + switch information might be held by neutron where schedulers can rely on it being available, or possibly held by a unified topology db with the scheduler glued into that, but updated by neutron / nova / cinder. Obviously this is a) non-trivial and b) not designed yet.
>
> However, the design of Tuskar today needs to accommodate a few things:
> - multiple reference architectures for clouds (unless there really is one true design)
> - the fact that today we don't have such an integrated vertical scheduler.
>
> So the current Tuskar model has three constructs that tie together to model the DC:
> - nodes
> - resource classes (grouping different types of nodes into service offerings - e.g. nodes that offer swift, or those that offer nova)
> - 'racks'
>
> AIUI the initial concept of Rack was to map to a physical rack, but this rapidly got shifted to be 'Logical Rack' rather than physical rack, but I think of Rack as really just a special case of a general modelling problem.

Yeah. Eventually, we settled on Logical Rack meaning a set of nodes on the same L2 network (in a setup where you would group nodes into isolated L2 segments). Which kind of suggests we come up with a better name.

I agree there's a lot more useful stuff to model than just racks (or just L2 node groups).

> From a deployment perspective, if you have two disconnected infrastructures, that's two AZs, and two underclouds: so we know that any one undercloud is fully connected (possibly multiple subnets, but one infrastructure). When would we want to subdivide that?
>
> One case is quick fault aggregation: if a physical rack loses power, rather than having 16 NOC folk independently investigating the same 16 down hypervisors, one would prefer to identify that the power to the rack has failed (for non-HA powered racks); likewise if a single switch fails (for non-HA network topologies) you want to identify that that switch is down rather than investigating all the cascaded errors independently.
>
> A second case is scheduling: you may want to put nova instances on the same switch as the cinder service delivering their block devices, when possible, or split VMs serving HA tasks apart. (We currently do this with host aggregates, but being able to do it directly would be much nicer.)
>
> Lastly, if doing physical operations like power maintenance or moving racks around in a datacentre, being able to identify machines in the same rack can be super useful for planning, downtime announcements, or host evacuation, and being able to find a specific machine in a DC is also important (e.g. what shelf in the rack, what cartridge in a chassis).

I agree. However, we should take care not to commit ourselves to building a DCIM just yet.
> Back to 'Logical Rack' - you can see then that having a single construct to group machines together doesn't really support these use cases in a systematic fashion: physical rack modelling supports only a subset of the location/performance/failure use cases, and Logical Rack doesn't support them at all. We're missing all the rich data we need to aggregate faults rapidly - power, network, air conditioning - and these things cover single machine / groups of machines / racks / rows of racks scale (consider a networked PDU with 10 hosts on it - that's a fraction of a rack).
>
> So, what I'm suggesting is that we model the failure and performance domains directly, and include location (which is the incremental data racks add once failure and performance domains are modelled) too. We can separately noodle on exactly what failure domain and performance domain modelling looks like - e.g. the scheduler focus group would be a good place to have that discussion.

Yeah, I think it's pretty clear that the current Tuskar concept where Racks are the first-class objects isn't going to fly. We should switch our focus to the individual nodes and their grouping and metadata.

I'd like to start with something small and simple that we can improve upon, though. How about just going with freeform tags and key/value metadata for the nodes? We can define some well-known tags and keys to begin with (rack, l2-network, power, switch, etc.); it would be easy to iterate, and once we settle on the things we need, we can solidify them more. In the meantime, we have the API flexible enough to handle whatever architectures we end up supporting, and the UI can provide the appropriate views into the data. And this would allow people to add their own criteria that we didn't consider.
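A minimal sketch of the freeform key/value idea (Python; keys and values are hypothetical, not an existing Tuskar API), using the well-known keys mentioned above and showing how fault aggregation falls out of such metadata:

    # Hypothetical node metadata keyed by the suggested well-known keys.
    nodes = {
        "node-01": {"rack": "rack-3", "l2-network": "seg-7", "power": "pdu-45", "switch": "sw-23"},
        "node-02": {"rack": "rack-3", "l2-network": "seg-7", "power": "pdu-45", "switch": "sw-23"},
        "node-03": {"rack": "rack-4", "l2-network": "seg-8", "power": "pdu-46", "switch": "sw-24"},
    }

    def nodes_in_domain(key, value):
        """All nodes sharing one failure/locality domain, e.g. a PDU or switch."""
        return [name for name, meta in nodes.items() if meta.get(key) == value]

    # Fault aggregation: if pdu-45 reports a failure, these hypervisors are
    # expected to be down, so one alert instead of N independent investigations.
    affected = nodes_in_domain("power", "pdu-45")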
Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre
I agree that such a thing is useful for scheduling.

I see a bit of a tension here: for software engineering reasons we want some independence, but we also want to avoid wasteful duplication. I think we are collectively backing into the problem of metamodeling for datacenters, and establishing one or more software thingies that will contain/communicate datacenter models. A collection of nodes annotated with tags is a metamodel. You could define a graph-based metamodel without mandating any particular graph shape. You could be more prescriptive and mandate a tree shape as a good compromise between flexibility and making something that is reasonably easy to process. We can debate what the metamodel should be, but that is different from debating whether there is a metamodel.

Regards,
Mike

From: Tomas Sedovic tsedo...@redhat.com
To: openstack-dev@lists.openstack.org
Date: 09/25/2013 10:37 AM
Subject: Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

On 09/25/2013 05:15 AM, Robert Collins wrote:

> One of the major things Tuskar does is model a datacenter - which is very useful for error correlation, capacity planning and scheduling. Long term I'd like this to be held somewhere where it is accessible for schedulers and ceilometer etc. E.g. network topology + switch information might be held by neutron where schedulers can rely on it being available, or possibly held by a unified topology db with the scheduler glued into that, but updated by neutron / nova / cinder. Obviously this is a) non-trivial and b) not designed yet.
>
> However, the design of Tuskar today needs to accommodate a few things:
> - multiple reference architectures for clouds (unless there really is one true design)
> - the fact that today we don't have such an integrated vertical scheduler.
>
> So the current Tuskar model has three constructs that tie together to model the DC:
> - nodes
> - resource classes (grouping different types of nodes into service offerings - e.g. nodes that offer swift, or those that offer nova)
> - 'racks'
>
> AIUI the initial concept of Rack was to map to a physical rack, but this rapidly got shifted to be 'Logical Rack' rather than physical rack, but I think of Rack as really just a special case of a general modelling problem.

Yeah. Eventually, we settled on Logical Rack meaning a set of nodes on the same L2 network (in a setup where you would group nodes into isolated L2 segments). Which kind of suggests we come up with a better name.

I agree there's a lot more useful stuff to model than just racks (or just L2 node groups).

> From a deployment perspective, if you have two disconnected infrastructures, that's two AZs, and two underclouds: so we know that any one undercloud is fully connected (possibly multiple subnets, but one infrastructure). When would we want to subdivide that?
>
> One case is quick fault aggregation: if a physical rack loses power, rather than having 16 NOC folk independently investigating the same 16 down hypervisors, one would prefer to identify that the power to the rack has failed (for non-HA powered racks); likewise if a single switch fails (for non-HA network topologies) you want to identify that that switch is down rather than investigating all the cascaded errors independently.
>
> A second case is scheduling: you may want to put nova instances on the same switch as the cinder service delivering their block devices, when possible, or split VMs serving HA tasks apart. (We currently do this with host aggregates, but being able to do it directly would be much nicer.)
> Lastly, if doing physical operations like power maintenance or moving racks around in a datacentre, being able to identify machines in the same rack can be super useful for planning, downtime announcements, or host evacuation, and being able to find a specific machine in a DC is also important (e.g. what shelf in the rack, what cartridge in a chassis).

I agree. However, we should take care not to commit ourselves to building a DCIM just yet.

> Back to 'Logical Rack' - you can see then that having a single construct to group machines together doesn't really support these use cases in a systematic fashion: physical rack modelling supports only a subset of the location/performance/failure use cases, and Logical Rack doesn't support them at all. We're missing all the rich data we need to aggregate faults rapidly - power, network, air conditioning - and these things cover single machine / groups of machines / racks / rows of racks scale (consider a networked PDU with 10 hosts on it - that's a fraction of a rack).
>
> So, what I'm suggesting is that we model the failure and performance domains directly, and include location (which is the incremental data racks add once failure and performance domains are modelled) too. We can separately noodle
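A small sketch of the two metamodel shapes Mike contrasts (Python; invented names, not a concrete schema proposal): a graph with typed edges can express orthogonal concerns such as power and network, while a single tree is easier to process but forces one containment hierarchy.

    # Graph metamodel: nodes plus typed edges; power and network topologies
    # can coexist because the edge types are independent of each other.
    edges = [
        ("node-01", "fed-by",     "pdu-45"),
        ("node-01", "uplinks-to", "switch-23"),
        ("node-02", "fed-by",     "pdu-45"),
        ("node-02", "uplinks-to", "switch-24"),
    ]

    def neighbours(entity, relation):
        """Entities reachable from `entity` over edges of the given type."""
        return [dst for src, rel, dst in edges if src == entity and rel == relation]

    # Tree metamodel: one containment hierarchy, reasonably easy to process,
    # but unable to express power and network domains that cut across it.
    tree = {
        "region-1": {
            "az-3": {
                "rack-3": ["node-01", "node-02"],
                "rack-4": ["node-03"],
            },
        },
    }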
Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre
On Sep 25, 2013, at 10:36 AM, Tomas Sedovic wrote:

>> One of the major things Tuskar does is model a datacenter - which is very useful for error correlation, capacity planning and scheduling.

Tuskar was designed for general infrastructure modeling within the scope of OpenStack. Yes, Tuskar could be used to model a datacenter, but that was not its original design goal. This is not to say that modeling a datacenter wouldn't be useful, and some of the points, concepts and ideas later in the post are very good. But in terms of an MVP, we were focused on providing an easy approach for cloud operators wishing to deploy OpenStack. What we're seeing is a use case where deployments are fairly small (small being 2-30 racks of gear).

>> Long term I'd like this to be held somewhere where it is accessible for schedulers and ceilometer etc. E.g. network topology + switch information might be held by neutron where schedulers can rely on it being available, or possibly held by a unified topology db with the scheduler glued into that, but updated by neutron / nova / cinder. Obviously this is a) non-trivial and b) not designed yet.
>>
>> However, the design of Tuskar today needs to accommodate a few things:
>> - multiple reference architectures for clouds (unless there really is one true design)
>> - the fact that today we don't have such an integrated vertical scheduler.

+1 to both, but recognizing that these are long term asks.

>> So the current Tuskar model has three constructs that tie together to model the DC:
>> - nodes
>> - resource classes (grouping different types of nodes into service offerings - e.g. nodes that offer swift, or those that offer nova)
>> - 'racks'
>>
>> AIUI the initial concept of Rack was to map to a physical rack, but this rapidly got shifted to be 'Logical Rack' rather than physical rack, but I think of Rack as really just a special case of a general modelling problem.

> Yeah. Eventually, we settled on Logical Rack meaning a set of nodes on the same L2 network (in a setup where you would group nodes into isolated L2 segments). Which kind of suggests we come up with a better name.
>
> I agree there's a lot more useful stuff to model than just racks (or just L2 node groups).

Indeed. We chose the label rack because most folk understand it. When generating a bill of materials for cloud gear, for example, people tend to think in rack elevations, etc. The rack model breaks down a bit when you start to consider things like system-on-chip solutions such as Moonshot, with the possibility of a number of chassis within a physical rack. This prompted further refinement of the concept. And as Tomas mentioned, we have shifted to logical racks based on L2 binding between nodes. Better, more fitting naming ideas here are welcome.

>> From a deployment perspective, if you have two disconnected infrastructures, that's two AZs, and two underclouds: so we know that any one undercloud is fully connected (possibly multiple subnets, but one infrastructure). When would we want to subdivide that?
>>
>> One case is quick fault aggregation: if a physical rack loses power, rather than having 16 NOC folk independently investigating the same 16 down hypervisors, one would prefer to identify that the power to the rack has failed (for non-HA powered racks); likewise if a single switch fails (for non-HA network topologies) you want to identify that that switch is down rather than investigating all the cascaded errors independently.
>> A second case is scheduling: you may want to put nova instances on the same switch as the cinder service delivering their block devices, when possible, or split VMs serving HA tasks apart. (We currently do this with host aggregates, but being able to do it directly would be much nicer.)
>>
>> Lastly, if doing physical operations like power maintenance or moving racks around in a datacentre, being able to identify machines in the same rack can be super useful for planning, downtime announcements, or host evacuation, and being able to find a specific machine in a DC is also important (e.g. what shelf in the rack, what cartridge in a chassis).

> I agree. However, we should take care not to commit ourselves to building a DCIM just yet.

>> Back to 'Logical Rack' - you can see then that having a single construct to group machines together doesn't really support these use cases in a systematic fashion: physical rack modelling supports only a subset of the location/performance/failure use cases, and Logical Rack doesn't support them at all. We're missing all the rich data we need to aggregate faults rapidly - power, network, air conditioning - and these things cover single machine / groups of machines / racks / rows of racks scale (consider a networked PDU with 10 hosts on it - that's a fraction of a rack).
>>
>> So, what I'm suggesting
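For illustration, a tiny sketch (Python; the inventory data is invented) of deriving "Logical Racks" in the sense used above - groups of nodes sharing an L2 segment - from an inventory where a single physical rack holds multiple chassis, as in the Moonshot case:

    from collections import defaultdict

    # Hypothetical inventory: one physical rack holding two chassis,
    # each cabled to a different L2 segment.
    inventory = [
        {"node": "cart-01", "physical_rack": "rack-3", "chassis": "moonshot-1", "l2": "seg-7"},
        {"node": "cart-02", "physical_rack": "rack-3", "chassis": "moonshot-1", "l2": "seg-7"},
        {"node": "cart-45", "physical_rack": "rack-3", "chassis": "moonshot-2", "l2": "seg-8"},
    ]

    # Group by L2 segment: one physical rack yields two logical racks here.
    logical_racks = defaultdict(list)
    for entry in inventory:
        logical_racks[entry["l2"]].append(entry["node"])

    # => {"seg-7": ["cart-01", "cart-02"], "seg-8": ["cart-45"]}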
Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre
We're interested in topology support in order to support placement locality (to optimize task placement in ensembles). Vish and I started talking about what would be needed a few months ago, and came up with two approaches that would work:

- modelling the system as a full graph (i.e. rich enough topology information to represent orthogonal concerns, like power and network, for example)
- a limited approach where location is described through a feature vector that can be used for determining group diameter, which can in turn be used to compute group affinity and dispersion.

We're also beginning to think about trying to expose network topology upwards for scheduling as well. When you are interested in full topology, you can't take any shortcuts, so I think that we're stuck with a full graph for this.

It definitely makes sense to have a well maintained, flexible, single definition of this data that can be used everywhere.

-nld
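A rough sketch of the second, limited approach (Python; the representation is an assumption for illustration, not something specified in the thread): treat each node's location as a hierarchical feature vector, take the distance between two nodes as the number of levels below their deepest shared component, and take a group's diameter as the largest pairwise distance - a small diameter indicates affinity, a large one dispersion.

    from itertools import combinations

    # Hypothetical location feature vectors, most general component first.
    locations = {
        "node-01": ("region-1", "az-3", "rack-3", "switch-23"),
        "node-02": ("region-1", "az-3", "rack-3", "switch-23"),
        "node-03": ("region-1", "az-3", "rack-4", "switch-24"),
    }

    def distance(a, b):
        """Levels below the deepest shared component of the two vectors."""
        shared = 0
        for x, y in zip(locations[a], locations[b]):
            if x != y:
                break
            shared += 1
        return len(locations[a]) - shared

    def group_diameter(group):
        """Largest pairwise distance within the group (0 for a single node)."""
        return max((distance(a, b) for a, b in combinations(group, 2)), default=0)

    # Affinity scheduling prefers a small diameter; anti-affinity a large one.
    print(group_diameter(["node-01", "node-02"]))  # 0 - same switch
    print(group_diameter(["node-01", "node-03"]))  # 2 - diverge at rack level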