On 05/21/2016 08:35 PM, Dan Prince wrote:
On Fri, 2016-05-20 at 14:06 +0200, Dmitry Tantsur wrote:
On 05/20/2016 01:44 PM, Dan Prince wrote:

On Thu, 2016-05-19 at 15:31 +0200, Dmitry Tantsur wrote:

Hi all!

We started some discussions on https://review.openstack.org/#/c/300200/
about the future of node management (registering, configuring and
introspecting) in the new API, but I think it's more fair (and
convenient) to move it here. The goal is to fix several long-standing
design flaws that affect the logic behind tripleoclient. So fasten your
seatbelts, here it goes.

If you already understand why we need to change this logic, just scroll
down to the "what do you propose?" section.

"introspection bulk start" is evil
----------------------------------

As many of you obviously know, TripleO used the following command for
introspection:

  openstack baremetal introspection bulk start

As not everyone knows though, this command does not come from the
ironic-inspector project; it's part of TripleO itself. And the ironic
team has some big problems with it.

The way it works is:

1. Take all nodes in "available" state and move them to "manageable"
   state.
2. Execute introspection for all nodes in "manageable" state.
3. Move all nodes with successful introspection to "available" state.
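
Roughly speaking, the whole thing boils down to something like the
following (a sketch using python-ironicclient and
python-ironic-inspector-client; the credentials are placeholders, error
handling and most waiting are omitted):

  import time

  from ironicclient import client as ir_client
  import ironic_inspector_client

  # placeholder credentials, for illustration only
  ironic = ir_client.get_client(1,
                                os_username='admin',
                                os_password='password',
                                os_tenant_name='admin',
                                os_auth_url='http://127.0.0.1:5000/v2.0')
  # defaults to the local ironic-inspector endpoint
  inspector = ironic_inspector_client.ClientV1()

  # 1. move every "available" node to "manageable"
  for node in ironic.node.list():
      if node.provision_state == 'available':
          ironic.node.set_provision_state(node.uuid, 'manage')

  # 2. start introspection on every "manageable" node
  manageable = [node.uuid for node in ironic.node.list()
                if node.provision_state == 'manageable']
  for uuid in manageable:
      inspector.introspect(uuid)

  # 3. wait for introspection and move the successfully introspected
  #    nodes back to "available"
  for uuid in manageable:
      status = inspector.get_status(uuid)
      while not status['finished']:
          time.sleep(10)
          status = inspector.get_status(uuid)
      if not status['error']:
          ironic.node.set_provision_state(uuid, 'provide')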

Step 3 is pretty controversial, step 1 is just horrible. This is not how
the ironic-inspector team designed introspection to work (hence it
refuses to run on nodes in "available" state), and that's not how the
ironic team expects the ironic state machine to be handled. To explain
it I'll provide brief information on the ironic state machine.

ironic node lifecycle
---------------------

With recent versions of the bare metal API (starting with 1.11), nodes
begin their life in a state called "enroll". Nodes in this state are not
available for deployment, nor for most other actions. Ironic does not
touch such nodes in any way.

To make nodes alive an operator uses the "manage" provisioning action to
move nodes to "manageable" state. During this transition the power and
management credentials (IPMI, SSH, etc) are validated to ensure that
nodes in "manageable" state are, well, manageable. Nodes in this state
are still not available for deployment. With nodes in this state an
operator can execute various pre-deployment actions, such as
introspection, RAID configuration, etc. So to sum it up, nodes in
"manageable" state are being configured before being exposed to the
cloud.

The last step before deployment is to make nodes "available" using
the "provide" provisioning action. Such nodes are exposed to nova and
can be deployed to at any moment. No long-running configuration actions
should be run in this state. The "manage" action can be used to bring
nodes back to "manageable" state for configuration (e.g.
reintrospection).
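
In terms of API calls the intended lifecycle looks roughly like this
(another sketch with python-ironicclient, reusing the "ironic" client
handle from the snippet above; all the values are made up):

  # enroll: create the node, ironic does not touch it yet
  node = ironic.node.create(driver='pxe_ipmitool',
                            driver_info={'ipmi_address': '192.0.2.10',
                                         'ipmi_username': 'admin',
                                         'ipmi_password': 'password'})

  # manage: enroll -> manageable, the power/management credentials
  # are validated along the way
  ironic.node.set_provision_state(node.uuid, 'manage')

  # ... introspection, RAID configuration and other pre-deployment
  # actions happen while the node is "manageable" ...

  # provide: manageable -> available, the node is now exposed to nova
  ironic.node.set_provision_state(node.uuid, 'provide')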

so what's the problem?
----------------------

The problem is that TripleO essentially bypasses this logic by keeping
all nodes "available" and walking them through provisioning steps
automatically. Just a couple of examples of what gets broken:

(1) Imagine I have 10 nodes in my overcloud, 10 nodes ready for
deployment (including potential autoscaling) and I want to enroll 10
more nodes.

Both introspection and ready-state operations nowadays will touch both
the 10 new nodes AND the 10 nodes which are ready for deployment,
potentially making the latter not ready for deployment any more (and
definitely moving them out of the pool for some time).

In particular, any manual configuration made by an operator before
making nodes "available" may get destroyed.

(2) TripleO has to disable automated cleaning. Automated cleaning is a
set of steps (currently only wiping the hard drive) that happens in
ironic 1) before nodes become available, 2) after an instance is
deleted. As the TripleO CLI constantly moves nodes back and forth to
and from the "available" state, cleaning kicks in every time. Unless
it's disabled.

Disabling cleaning might sound like a sufficient workaround, until you
need it. And you actually do. Here is a real-life example of how to get
yourself broken by not having cleaning:
a. Deploy an overcloud instance
b. Delete it
c. Deploy an overcloud instance on a different hard drive
d. Boom.

This sounds like an Ironic bug to me. Cleaning (wiping a disk) and
removing state that would break subsequent installations on a different
drive are different things. In TripleO I think the reason we disable
cleaning is largely because of the extra time it takes and the fact
that our baremetal cloud isn't multi-tenant (currently at least).

We fix this "bug" by introducing cleaning. This is the process to
guarantee each deployment starts with a clean environment. It's hard to
know which leftover data can cause which problem (e.g. what about a
remaining UEFI partition? any remnants of Ceph? I don't know).





As we didn't pass cleaning, there is still a config drive on the disk
used in the first deployment. With 2 config drives present, cloud-init
will pick a random one, breaking the deployment.

TripleO isn't using config drive, is it? Until Nova supports config
drives via Ironic I think we are blocked on using it.

TripleO does use config drives (btw I'm telling you about a real bug
here, not something I made up). Nova does support Ironic config drives;
it does not support (and does not want to) injecting random data from
an Ironic node there (we wanted to pass data from introspection to the
node).





To top it all off, TripleO users tend not to use root device hints, so
the root disk may switch randomly between deployments. Have fun
debugging.
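
For the record, setting a root device hint is a one-line node update
(same python-ironicclient sketch as above; the serial number is made
up):

  # pin the root device by disk serial so deployments always land on
  # the same disk
  ironic.node.update(node.uuid,
                     [{'op': 'add',
                       'path': '/properties/root_device',
                       'value': {'serial': 'VB1234567890'}}])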

what do you propose?
--------------------

I would like the new TripleO Mistral workflows to start following the
ironic state machine more closely. Imagine the following workflows:

1. register: take JSON, create nodes in "manageable" state. I do
believe we can automate the enroll->manageable transition, as it serves
the purpose of validation (and discovery, but let's put that aside).

2. provide: take a list of nodes or all "manageable" nodes and move
them to "available". By using this workflow an operator will make a
*conscious* decision to add some nodes to the cloud.

3. introspect: take a list of "manageable" (!!!) nodes or all
"manageable" nodes and move them through introspection. This is an
optional step between "register" and "provide".

4. set_node_state: a helper workflow to move nodes between states. The
"provide" workflow is essentially set_node_state with verb=provide, but
is separate due to its high importance in the node lifecycle.

5. configure: given a couple of parameters (deploy image, local boot
flag, etc), update the given or all "manageable" nodes with them.
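
For illustration, at the ironic level such a "configure" workflow would
presumably boil down to something like this (a python-ironicclient
sketch; the image UUIDs are placeholders):

  deploy_patch = [
      {'op': 'add', 'path': '/driver_info/deploy_kernel',
       'value': 'DEPLOY-KERNEL-IMAGE-UUID'},
      {'op': 'add', 'path': '/driver_info/deploy_ramdisk',
       'value': 'DEPLOY-RAMDISK-IMAGE-UUID'},
      {'op': 'add', 'path': '/properties/capabilities',
       'value': 'boot_option:local'},
  ]
  for node in ironic.node.list():
      if node.provision_state == 'manageable':
          ironic.node.update(node.uuid, deploy_patch)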

I like how you've split things up into the above workflows.
Furthermore, I think we'll actually be able to accomplish most, if not
all, of it by using pure Mistral workflows (with very few custom
actions involved).

One refinement I might suggest is that for the workflows that take a
list of uuids *or* search for a type of node, we might split them into
two workflows, one of which calls the other.
Good idea.

Ryan Brady and I spent some time yesterday implementing the suggested
workflows (all except for the 'config' one I think which could come
later).

Fantastic, thank you! Let's continue in gerrit now.


How does this one look:

https://review.openstack.org/#/c/300200/4/workbooks/baremetal.yaml

We've got some python-tripleoclient patches coming soon too which use
this updated workflow to do the node registration bits.

Dan




For example, a 'provide_managed_nodes' workflow would call into the
'provide' workflow, which takes a list of uuids? I think this gives us
the same features we need and exposes the required input parameters
more cleanly to the end user.

So long as we can do the above and still make the existing
python-tripleoclient calls backwards compatible, I think we should be
in good shape.
Awesome!



Dan




Essentially the only addition here is the "provide" action, which I
hope you already realize should be an explicit step.

what about tripleoclient
------------------------

Of course we want to keep backward compatibility. The existing commands

  openstack baremetal import
  openstack baremetal configure boot
  openstack baremetal introspection bulk start

will use some combination of the workflows above and will be
deprecated.

The new commands (also avoiding hijacking the bare metal namespace)
will be provided, strictly matching the workflows (especially in terms
of the state machine):

  openstack overcloud node import
  openstack overcloud node configure
  openstack overcloud node introspect
  openstack overcloud node provide

(I have good reasoning behind each of these names, but if I put it here
this mail will be way too long.)

Now to save a user some typing:
1. the configure command will be optional, as the import command will
   set the defaults
2. the introspect command will get a --provide flag
3. the import command will get --introspect and --provide flags

So the simplest flow for people will be:

  openstack overcloud node import --provide instackenv.json

this command will use 2 workflows and will result in a bunch of
"available" nodes, essentially making it a synonym of the "baremetal
import" command.

With introspection it becomes:

  openstack overcloud node import --introspect --provide instackenv.json

this command will use 3 workflows and will result in "available" and
introspected nodes.
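
Under the hood the client would simply chain Mistral workflow
executions, roughly like this (a sketch with python-mistralclient; the
workflow names and inputs are hypothetical, the real ones are being
settled in the review linked above):

  import json

  from mistralclient.api import client as mistral_client

  # placeholder credentials again
  mistral = mistral_client.client(username='admin', api_key='password',
                                  project_name='admin',
                                  auth_url='http://127.0.0.1:5000/v2.0')

  with open('instackenv.json') as f:
      nodes_json = json.load(f)

  # hypothetical workflow names -- the real ones live in
  # workbooks/baremetal.yaml from the review linked above
  mistral.executions.create('tripleo.baremetal.register_or_update',
                            workflow_input={'nodes_json': nodes_json})
  mistral.executions.create('tripleo.baremetal.provide_manageable_nodes')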


Thanks for reading such a long email (ping me on IRC if you actually
read it all the way through, just for statistics). I hope it makes
sense to you.

Dmitry.
