Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

2013-09-25 Thread Chris Jones
Hi

On 25 September 2013 04:15, Robert Collins robe...@robertcollins.net wrote:

 E.g. for any node I should be able to ask:
 - what failure domains is this in? [e.g. power-45, switch-23, ac-15,
 az-3, region-1]
 - what locality-of-reference features does this have? [e.g. switch-23,
 az-3, region-1]
 - where is it [e.g. DC 2, pod 4, enclosure 2, row 5, rack 3, RU 30,
 cartridge 40].




 So, what do you think?


As a recovering data-centre person, I love the idea of being able to map a
given thing to not only its physical location, but its failure domain. +1

-- 
Cheers,

Chris
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

2013-09-25 Thread Tomas Sedovic

On 09/25/2013 05:15 AM, Robert Collins wrote:

One of the major things Tuskar does is model a datacenter - which is
very useful for error correlation, capacity planning and scheduling.

Long term I'd like this to be held somewhere where it is accessible
for schedulers and ceilometer etc. E.g. network topology + switch
information might be held by neutron where schedulers can rely on it
being available, or possibly held by a unified topology db with
scheduler glued into that, but updated by neutron / nova / cinder.
Obviously this is a) non-trivial and b) not designed yet.

However, the design of Tuskar today needs to accommodate a few things:
  - multiple reference architectures for clouds (unless there really is
one true design)
  - the fact that today we don't have such an integrated vertical scheduler.

So the current Tuskar model has three constructs that tie together to
model the DC:
  - nodes
  - resource classes (grouping different types of nodes into service
offerings - e.g. nodes that offer swift, or those that offer nova).
  - 'racks'
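
In data-structure terms that's roughly the following (illustrative
only, not the actual Tuskar schema):

# A rough, hypothetical picture of the three constructs.
nodes = ["node-1", "node-2", "node-3", "node-4"]

resource_classes = {   # group node types into service offerings
    "swift-storage": ["node-1", "node-2"],
    "nova-compute": ["node-3", "node-4"],
}

racks = {              # 'racks': a single grouping of nodes
    "rack-1": ["node-1", "node-3"],
    "rack-2": ["node-2", "node-4"],
}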

AIUI the initial concept of Rack was to map to a physical rack, but
this rapidly got shifted to 'Logical Rack' rather than physical rack;
I think of Rack as really just a special case of a general modelling
problem.


Yeah. Eventually, we settled on Logical Rack meaning a set of nodes on 
the same L2 network (in a setup where you would group nodes into 
isolated L2 segments). Which kind of suggests we come up with a better name.


I agree there's a lot more useful stuff to model than just racks (or 
just L2 node groups).





From a deployment perspective, if you have two disconnected
infrastructures, that's two AZs and two underclouds: so we know that
any one undercloud is fully connected (possibly multiple subnets, but
one infrastructure). When would we want to subdivide that?

One case is quick fault aggregation: if a physical rack loses power,
rather than having 16 NOC folk independently investigating the same 16
down hypervisors, one would prefer to identify that the power to the
rack has failed (for non-HA powered racks); likewise if a single
switch fails (for non-HA network topologies) you want to identify that
that switch is down rather than investigating all the cascaded errors
independently.
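
To make that concrete, a toy sketch of the aggregation this enables
(the node data is hypothetical and the helper is illustrative only,
not an existing Tuskar or nova API):

from collections import defaultdict

# Hypothetical outage: 16 hypervisors go down; each node reports the
# failure domains it belongs to (PDU, switch, ...).
down_nodes = {}
for i in range(16):
    down_nodes["node-%02d" % i] = {
        "power-45",                             # all 16 share one PDU
        "switch-23" if i < 8 else "switch-24",  # but sit on two switches
    }

# Count which of the down nodes each failure domain covers.
domain_hits = defaultdict(set)
for node, domains in down_nodes.items():
    for domain in domains:
        domain_hits[domain].add(node)

# A domain covering every down node is the probable common cause, so
# one alert is raised instead of sixteen.
for domain, nodes in domain_hits.items():
    if nodes == set(down_nodes):
        print("Probable common cause: %s (%d hosts)" % (domain, len(nodes)))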

A second case is scheduling: you may want to put nova instances on the
same switch as the cinder service delivering their block devices, when
possible, or split VMs serving HA tasks apart. (We currently do this
with host aggregates, but being able to do it directly would be much
nicer).
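
Roughly the kind of switch-affinity preference that implies, sketched
as a standalone helper (the topology mapping is assumed input data,
not an existing nova filter):

def pick_compute_host(candidates, volume_host, topology):
    """Prefer candidates sharing a switch with the volume's host."""
    volume_switches = {f for f in topology.get(volume_host, set())
                       if f.startswith("switch-")}
    same_switch = [h for h in candidates
                   if topology.get(h, set()) & volume_switches]
    return (same_switch or list(candidates))[0]

# Hypothetical locality data keyed by host name.
topology = {
    "cinder-1": {"switch-23", "az-3"},
    "compute-1": {"switch-23", "az-3"},
    "compute-2": {"switch-7", "az-3"},
}
print(pick_compute_host(["compute-2", "compute-1"], "cinder-1", topology))
# -> compute-1 (same switch as the cinder service)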

Lastly, if doing physical operations like power maintenance or moving
racks around in a datacentre, being able to identify machines in the
same rack can be super useful for planning, downtime announcements, or
host evacuation, and being able to find a specific machine in a DC is
also important (e.g. what shelf in the rack, what cartridge in a
chassis).


I agree. However, we should take care not to commit ourselves to 
building a DCIM just yet.




Back to 'Logical Rack' - you can see then that having a single
construct to group machines together doesn't really support these use
cases in a systematic fashion:- Physical rack modelling supports only a
subset of the location/performance/failure use cases, and Logical rack
doesn't support them at all: we're missing all the rich data we need
to aggregate faults rapidly : power, network, air conditioning - and
these things cover both single machine/groups of machines/racks/rows
of racks scale (consider a networked PDU with 10 hosts on it - that's a
fraction of a rack).

So, what I'm suggesting is that we model the failure and performance
domains directly, and include location (which is the incremental data
racks add once failure and performance domains are modelled) too. We
can separately noodle on exactly what failure domain and performance
domain modelling looks like - e.g. the scheduler focus group would be
a good place to have that discussion.


Yeah, I think it's pretty clear that the current Tuskar concept where 
Racks are the first-class objects isn't going to fly. We should switch 
our focus to the individual nodes and their grouping and metadata.


I'd like to start with something small and simple that we can improve 
upon, though. How about just going with freeform tags and key/value 
metadata for the nodes?


We can define some well-known tags and keys to begin with (rack, 
l2-network, power, switch, etc.). It would be easy to iterate, and once 
we settle on the things we need, we can solidify them further.
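
A minimal sketch of what that could look like, assuming nothing beyond
freeform tags and key/value metadata per node (all names below are
illustrative, not the actual Tuskar API):

class Node(object):
    def __init__(self, name, tags=None, metadata=None):
        self.name = name
        self.tags = set(tags or [])           # freeform, e.g. {"nova-compute"}
        self.metadata = dict(metadata or {})  # well-known keys plus anything else

# Well-known keys we could standardise first; everything else stays freeform.
FAILURE_DOMAIN_KEYS = ("power", "switch", "ac", "az", "region")
LOCATION_KEYS = ("dc", "row", "rack", "shelf", "chassis")

def failure_domains(node):
    return {k: node.metadata[k] for k in FAILURE_DOMAIN_KEYS if k in node.metadata}

def location(node):
    return {k: node.metadata[k] for k in LOCATION_KEYS if k in node.metadata}

node = Node("node-0001",
            tags={"nova-compute"},
            metadata={"rack": "rack-3", "l2-network": "seg-7",
                      "power": "pdu-45", "switch": "switch-23",
                      "az": "az-3", "dc": "dc-2", "shelf": "30"})
print(failure_domains(node))  # which failure domains is this node in?
print(location(node))         # where is it physically?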


In the meantime, we keep the API flexible enough to handle whatever 
architectures we end up supporting, and the UI can provide the 
appropriate views into the data.


And this would allow people to add their own criteria that we didn't 
consider.





Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

2013-09-25 Thread Mike Spreitzer
I agree that such a thing is useful for scheduling.  I see a bit of a 
tension here: for software engineering reasons we want some independence, 
but we also want to avoid wasteful duplication.

I think we are collectively backing into the problem of metamodeling for 
datacenters, and establishing one or more software thingies that will 
contain/communicate datacenter models.  A collection of nodes annotated 
with tags is a metamodel.  You could define a graph-based metamodel 
without mandating any particular graph shape.  You could be more 
prescriptive and mandate a tree shape as a good compromise between 
flexibility and making something that is reasonably easy to process.  We 
can debate what the metamodel should be, but that is different from 
debating whether there is a metamodel.
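
To make the distinction concrete, a graph-based metamodel can be as
small as elements plus typed relationships between them, with a tree
being the special case where each element has at most one parent; the
sketch below is purely illustrative:

# Elements of the model and typed edges between them.
elements = {"node-1", "node-2", "pdu-45", "switch-23", "rack-3", "az-3"}
edges = {
    ("rack-3", "contains", "node-1"),
    ("rack-3", "contains", "node-2"),
    ("az-3", "contains", "rack-3"),
    ("pdu-45", "powers", "node-1"),
    ("pdu-45", "powers", "node-2"),
    ("switch-23", "uplinks", "node-1"),
}

def related(element, relation):
    """Elements directly reachable from `element` via one relation type."""
    return {dst for (src, rel, dst) in edges if src == element and rel == relation}

print(related("pdu-45", "powers"))     # {'node-1', 'node-2'}
# Restricting the model to "contains" edges only, one parent per
# element, gives the tree-shaped compromise.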

Regards,
Mike



Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

2013-09-25 Thread Keith Basil
On Sep 25, 2013, at 10:36 AM, Tomas Sedovic wrote:

 On 09/25/2013 05:15 AM, Robert Collins wrote:
 One of the major things Tuskar does is model a datacenter - which is
 very useful for error correlation, capacity planning and scheduling.

Tuskar was designed for general infrastructure modeling within the scope of 
OpenStack.

Yes, Tuskar could be used to model a datacenter, but that was not its original 
design goal.  This is not to say that modeling a datacenter wouldn't be useful, 
and some of the points, concepts and ideas later in the post are very good.

But in terms of an MVP, we were focused on providing an easy approach for cloud 
operators wishing to deploy OpenStack.  What we're seeing is a use case where 
deployments are fairly small (small being 2-30 racks of gear).


 Long term I'd like this to be held somewhere where it is accessible
 for schedulers and ceilometer etc. E.g. network topology + switch
 information might be held by neutron where schedulers can rely on it
 being available, or possibly held by a unified topology db with
 scheduler glued into that, but updated by neutron / nova / cinder.
 Obviously this is a) non-trivial and b) not designed yet.
 
 However, the design of Tuskar today needs to accommodate a few things:
  - multiple reference architectures for clouds (unless there really is
 one true design)

  - the fact that today we don't have such an integrated vertical scheduler.

+1 to both, but recognizing that these are long-term asks.

 So the current Tuskar model has three constructs that tie together to
 model the DC:
  - nodes
  - resource classes (grouping different types of nodes into service
 offerings - e.g. nodes that offer swift, or those that offer nova).
  - 'racks'
 
 AIUI the initial concept of Rack was to map to a physical rack, but
 this rapidly got shifted to be 'Logical Rack' rather than physical
 rack, but I think of Rack as really just a special case of a general
 modelling problem..
 
 Yeah. Eventually, we settled on Logical Rack meaning a set of nodes on the 
 same L2 network (in a setup where you would group nodes into isolated L2 
 segments). Which kind of suggests we come up with a better name.
 
 I agree there's a lot more useful stuff to model than just racks (or just L2 
 node groups).

Indeed.  We chose the label 'rack' because most folk understand it.  When 
generating a bill of materials for cloud gear, for example, people tend to think 
in rack elevations, etc.  The rack model breaks down a bit when you start to 
consider system-on-chip solutions like Moonshot, with the possibility of a 
number of chassis within a physical rack.  This prompted further refinement of 
the concept.  And as Tomas mentioned, we have shifted to logical racks based on 
L2 binding between nodes.  Better, more fitting naming ideas are welcome here.



Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

2013-09-25 Thread Narayan Desai
We're interested in topology support so that we can express placement locality
(to optimize task placement in ensembles). Vish and I started talking about
what would be needed a few months ago, and came up with two approaches that
would work:
 - modelling the system as a full graph (i.e. topology information rich
enough to represent orthogonal concerns such as power and network)
 - a limited approach where location was described through a feature vector
that could be used for determining group diameter, which could in turn be
used to compute group affinity and dispersion (see the sketch below).

We're also beginning to think about trying to expose network topology
upwards for scheduling as well. When you are interested in full topology,
you can't take any shortcuts, so I think that we're stuck with a full graph
for this.
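
A rough sketch of the second (feature vector) approach, under the
assumption that the distance between two nodes is the number of
trailing levels at which their location vectors differ (all data and
names are hypothetical):

# Location vectors ordered from coarsest to finest level.
locations = {
    "node-a": ("region-1", "az-3", "row-5", "rack-3", "chassis-2"),
    "node-b": ("region-1", "az-3", "row-5", "rack-3", "chassis-4"),
    "node-c": ("region-1", "az-2", "row-1", "rack-9", "chassis-1"),
}

def distance(a, b):
    """Number of trailing levels at which the two vectors differ."""
    shared = 0
    for x, y in zip(locations[a], locations[b]):
        if x != y:
            break
        shared += 1
    return len(locations[a]) - shared

def diameter(group):
    """Largest pairwise distance in a group: small favours affinity,
    large favours dispersion."""
    return max(distance(a, b) for a in group for b in group)

print(diameter(["node-a", "node-b"]))  # 1 -> tightly packed
print(diameter(["node-a", "node-c"]))  # 4 -> widely dispersed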

It definitely makes sense to have a single, well-maintained, flexible
definition of this data that can be used everywhere.
 -nld