On 10 December 2013 09:55, Tzu-Mainn Chen <tzuma...@redhat.com> wrote:
* created as part of undercloud install process
By that note I meant that Nodes are not resources; Resource instances
run on Nodes. Nodes are the generic pool of hardware we can deploy
things onto.
I don't think "resource nodes" is intended to imply that nodes are
resources; rather, it's meant to indicate a node where a resource
instance runs, and to distinguish it from "management node" and
"unallocated node".
So the question is: are we looking at /nodes/ that have a /current
role/, or are we looking at /roles/ that have some /current nodes/?
My contention is that the role is the interesting thing, and the nodes
are the incidental thing. That is, as a sysadmin, my hierarchy of
concerns is something like:
A: are all services running
B: are any of them in a degraded state where I need to take prompt
action to prevent a service outage [might mean many things: software
updates, critical disk space, a machine failed and we need to scale
the cluster back up, too much load]
C: are there any planned changes I need to make [new software deploy,
feature request from user, replacing a faulty machine]
D: are there long term issues sneaking up on me [capacity planning,
machine obsolescence]
If we take /nodes/ as the interesting thing, and what they are doing
right now as the incidental thing, it's much harder to map that onto
the sysadmin concerns. If we start with /roles/ then we can answer:
A: by showing the list of roles and the summary stats (how many
machines, service status aggregate), role level alerts (e.g. nova-api
is not responding)
B: by showing the list of roles and more detailed stats (overall
load, response times of services, tickets against services)
and a list of in-trouble instances in each role - instances with
alerts against them: low disk, overload, failed service,
early-detection alerts from hardware
C: probably out of our remit for now in the general case, but we need
to enable some things here like replacing faulty machines
D: by looking at trend graphs for roles (not machines), but also by
looking at the hardware in aggregate - breakdown by age of machines,
summary data for tickets filed against instances that were deployed to
a particular machine
C: and D: are (F) category work, but for all but the very last thing,
it seems clear how to approach this from a roles perspective.
I've tried to approach this using /nodes/ as the starting point, and
after two terrible drafts I've deleted the section. I'd love it if
someone could show me how it would work :)
* Unallocated nodes
This implies an 'allocation' step, that we don't have - how about
'Idle nodes' or something.
It can be auto-allocation. I don't see a problem with the 'unallocated' term.
Ok, it's not a biggy. I do think it will frame things poorly and lead
to an expectation about how TripleO works that doesn't match how it
does, but we can change it later if I'm right, and if I'm wrong, well
it won't be the first time :).
I'm interested in what distinction you're making here. I'd rather get
things defined correctly the first time, and it's very possible that
I'm missing a fundamental definition here.
So we have:
- node - a physical general-purpose machine capable of running in
many roles. Some nodes may have a hardware layout that is particularly
useful for a given role.
- role - a specific workload we want to map onto one or more nodes.
Examples include 'undercloud control plane', 'overcloud control
plane', 'overcloud storage', 'overcloud compute' etc.
- instance - A role deployed on a node - this is where work actually
happens.
- scheduling - the process of deciding which role is deployed on which node.
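To make those four definitions concrete, here's a minimal Python sketch of how they relate. All names are illustrative only (this is not actual TripleO or Nova code), and the scheduler is a naive placeholder:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """A physical general-purpose machine."""
    hostname: str

@dataclass(frozen=True)
class Role:
    """A specific workload we want mapped onto one or more nodes."""
    name: str           # e.g. 'overcloud compute'
    desired_count: int  # policy: how many instances of this role we want

@dataclass(frozen=True)
class Instance:
    """A role deployed on a node - this is where work actually happens."""
    role: Role
    node: Node

def schedule(roles: list[Role], free_nodes: list[Node]) -> list[Instance]:
    """Scheduling: decide which role is deployed on which node.

    Naive stand-in for the real scheduler: just take nodes off the pool.
    """
    pool = list(free_nodes)
    instances = []
    for role in roles:
        for _ in range(role.desired_count):
            if not pool:
                raise RuntimeError(f"not enough nodes for role {role.name!r}")
            instances.append(Instance(role, pool.pop()))
    return instances
```

Note that the user-facing input here is the list of roles with desired counts; the node assignments fall out of scheduling.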
The way TripleO works is that we define a Heat template that lays out
policy: '5 instances of overcloud control plane, please', '20
hypervisors', etc. Heat passes that to Nova, which pulls the image for
the role out of Glance, picks a node, and deploys the image to the
node.
Note in particular the order: Heat -> Nova -> Scheduler -> Node chosen.
The user action is not 'allocate a Node to the overcloud control
plane'; it is 'size the control plane through Heat'.
So when we talk about 'unallocated Nodes', the implication is that
users 'allocate Nodes', but they don't: they size roles, and after
doing all that there may be some Nodes that are - yes - unallocated,
or have nothing scheduled to them. So... I'm not debating that we
should have a list of free hardware - we totally should - I'm debating
how we frame it. 'Available Nodes' or 'Undeployed machines' or
whatever. I just want to get away from talking about something
([manual] allocation) that we don't offer.
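That framing can be sketched in a few lines of Python (names and node counts are made up for illustration; this is not TripleO code): the user only edits the per-role sizing policy, and the 'free hardware' list is simply whatever never got anything scheduled to it.

```python
# Policy: desired instance counts per role - the thing the user actually edits.
policy = {"overcloud control plane": 2, "overcloud compute": 3}

# The generic hardware pool.
nodes = {"n0", "n1", "n2", "n3", "n4", "n5", "n6"}

def size_roles(policy, nodes):
    """Pick a node for each requested instance; return (assignments, idle).

    Nobody 'allocates' the idle nodes - they just have nothing
    scheduled to them after the roles are sized.
    """
    pool = sorted(nodes)  # deterministic order, just for the example
    assignments = {}
    for role, count in policy.items():
        assignments[role] = [pool.pop() for _ in range(count)]
    return assignments, set(pool)

assignments, idle = size_roles(policy, nodes)
```

Running this leaves two nodes in `idle` without any allocation step having been performed on them, which is the sense in which 'available' or 'undeployed' describes the outcome better than 'unallocated'.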
-Rob
--
Robert Collins <rbtcoll...@hp.com>
Distinguished Technologist
HP Converged Cloud
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev