On 06/13/2016 06:28 PM, Ben Nemec wrote:
On 06/13/2016 09:41 AM, Jiri Tomasek wrote:
Hi all,
As we are close to merging the initial Nodes Registration workflows and
action [1, 2] using Mistral which successfully provides the current
registration logic via common API, I'd like to start discussion on how
to improve it so it satisfies GUI and CLI requirements. I'd like to try
to describe the clients goals and define requirements, describe current
workflow problems and propose a solution. I'd like to record the result
of discussion to Blueprint [3] which Ryan already created.
CLI goals and optimal workflow
========================
CLI's main benefit is based on the fact that it's commands can simply
become part of a script, so it is important that the operation is
idempotent. The optimal CLI workflow is:
User runs 'openstack baremetal import' and provides instackenv.json file
which includes all nodes information. When the registration fails at
some point, user is notified about the error and re-runs the command
with the same set of nodes. Rinse and repeat until all nodes are
properly registered.
I would actually not describe this as the optimal workflow for CLI
registration either. It would be much better if the registration
completed for all the nodes that it can in the first place and then any
failed nodes can be cleaned up later. There's no reason one bad node in
a file containing 100 nodes should block the entire deployment.
On that note, the only part of your proposal that I'm not sure would be
useful for the CLI is the non-blocking part. I don't know that a CLI
fire-and-forget mode makes a lot of sense, although if there's a way for
the CLI to check the status then that might be fine too. However,
pretty much all of the other usability stuff you're talking about would
benefit the CLI too.
GUI goals and optimal workflow
=========================
GUI's main goal is to provide a user friendly way to register nodes,
inform the user on the process, handle the problems and lets user fix
them. GUI strives for being responsive and interactive.
GUI allows user to add nodes specification manually one by one by
provided form or allow user (in same manner as CLI) to provide the
instackenv.json file which holds the nodes description. Importing the
file (or adding node manually) will populate an array of nodes the user
wants to register. User is able to browse these nodes and make
corrections to their configuration. GUI provides client side validations
to verify inputs (node name format, required fields, mac address, ip
address format etc.)
Then user triggers the registration. The nodes are moved to nodes table
as they are being registered. If an error occurs during registration of
any of the nodes, user is notified about the issue and can fix it in
registration form and can re-trigger registration for failed nodes.
Rinse and repeat until all nodes are successfully registered and in
proper state (manageable).
Such workflow keeps the GUI interactive, user does not have to look at
the spinner for several minutes (in case of registering hundreds of
nodes), hoping that something does not happen wrong. User is constantly
informed about the progress, user is able to react to the situation as
he wants, User is able to freely interact with the GUI while
registration is happening on the background. User is able to register
nodes in batches.
Current solution
=============
Current solution uses register_or_update_nodes workflow [1] which takes
a nodes_json array and runs register_or_update_nodes and
set_nodes_managed tasks. When the whole operation completes it sends
Zaqar message notifying about the result of the registration of the
whole batch of nodes.
register_or_update_nodes runs tripleo.register_or_update_nodes action
[2] which uses business logic in tripleo_common/utils/nodes.py
utils.nodes.py module has been originally extracted from tripleoclient
to get the business logic behind the common API. It does following:
- converts the instackenv.json nodes format to appropriate ironic driver
format (driver-info fields)
- sets kernel and ramdisk ids defaults if they're not provided
- for each node it tests if node already exists (finds nodes by mac
addresses) and updates it or registers it as new based on the result.
Current Problems:
- no zaqar notification is sent for each node
- nodes are registered in batch, registration fails when error happens
on a certain node, leaving already registered nodes in inconsistent state
- workflow does not notify user about what nodes have been registered
and what failed, only thing user gets is relevant error message
- when the workflow succeeds, the registered_nodes list sent by Zaqar
message has outdated information
- when nodes are updated using nodes registration, the forkflow ends up
as failed, without any error output, although the nodes are updated
successfully
- utils/nodes.py decides whether the node should be created or updated
based on mac address which is subject to change. It needs to be done by
UUID which is fixed.
The UUID is not fixed. If you register the same node twice, it will get
a different UUID each time (unless you explicitly set the UUID as I see
you can now do, but we can't and shouldn't require that). The MAC
shouldn't change unless you spoof it (which is not something I think we
should be designing around) or point it at a different NIC. In the
latter case, we have no way of knowing whether you changed to a
different NIC on the same system or reassigned an IPMI address to a
completely different system, so requiring that to be a new node is the
only sane way to handle it AFAICT.
Starting last week you can put UUID to instackenv.json and ironic will
use it.
I'm not actually sure from just reading the code that we would require a
new node on a mac change anyway. In that case I think it would match
the IPMI address of the node instead and select that existing node for
update. I could be wrong about that since I haven't actually tested it,
but that's how it looks to me.
We already use IPMI address to match, but
1. it can be changed
2. lookup will fail if we switch driver
- utils/nodes.py uses instackenv.json nodes list format - the conversion
should be done in client
Why? Putting logic like that in the client is evil.
- instackenv.json uses nodes list format which is not compatible with
ironic which forces us to do the format conversions and limit the ironic
driver support
Totally agree that we should just drop the translation layer. It isn't
a useful abstraction - if you're using a vendor-specific driver, then
you probably have to set driver-specific keys, so you can't just swap
out drivers in your config anyway. If you're not using driver-specific
keys then you still aren't gaining anything from this extra layer.
From looking at the existing code, the only _possible_ benefit from this
layer is some abstraction of drivers that use a different key for the
actual IPMI address, but honestly that inconsistency sounds like
something that should be fixed in Ironic, not papered over in the node
registration code. I guess that's a discussion we'd have to have with
the Ironic team.
+1. we were talking about it, but it wasn't a big priority. Some push
from TripleO side would probably help :)
Proposed changes
===============
To satisfy clients requirements we need to:
- assure the idempotency of idempotency of running the nodes
registration providing the instackenv.json
- enable the workflow to track each node registration workflow separately
The changes can be done in 2 steps:
1. refactor register_or_update_nodes workflow and utils/nodes.py
- register_or_update_nodes workflow calls register_or_update_node
workflow for each node, that workflow then runs tasks:
tripleo.register_or_update_node (action), set_node_managed,
send_message. When whole workflow finishes, summary message is sent.
- reduce the register action and utils/nodes.py to the mechanism of
deciding on whether we create new or update the node - based on whether
the uuid is provided in nodes list and whether node is present in ironic
- move the format conversion from utils/nodes.py to client
These changes allow each node to finish it's registration without being
interrupted by failure on other node, avoid nodes getting into
inconsistent state, allow interactive reporting on each nodes state,
allows to run validations on each node separately
2. change instackenv.json nodes format to match Ironic
- this change allows the client to pass data from instackenv.json to
ironic without intervention, so since the interaction is data > ironic,
user can specify any ironic driver in instackenv.json. In addition, GUI
can dynamically display relevant properties user needs to fill in based
on selected driver.
Possible problems:
We need to consider performance impact by running separate workflow for
each node, doing some benchmark tests would be beneficial.
Alternative solution:
If we decide to stick with single workflow it will have the UX impact as
we won't be able to interactively report on each node changes. But we'll
still be able to report on each task result. Other requirements still stand:
- nodes which got registered before the failure happened need to
continue with next tasks (set_node_managed)
- workflow needs to report registered nodes list, failed nodes list and
error.
[1] https://review.openstack.org/#/c/300200/
[2] https://review.openstack.org/#/c/319587/
[3]
https://blueprints.launchpad.net/tripleo-common/+spec/improve-baremetal-workflows
-- Jirka
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev