On 12/19/2013 04:55 AM, Radomir Dopieralski wrote:
> On 14/12/13 16:51, Jay Pipes wrote:

> [snip]

>> Instead of focusing on locking issues -- which I agree are very
>> important in the virtualized side of things where resources are
>> "thinner" -- I believe that in the bare-metal world, a more useful focus
>> would be to ensure that the Tuskar API service treats related group
>> operations (like "deploy an undercloud on these nodes") in a way that
>> can handle failures in a graceful and/or atomic way.

> Atomicity of operations can be achieved by introducing critical
> sections. You basically have two ways of doing that: optimistic and
> pessimistic. A pessimistic critical section is implemented with a
> locking mechanism that prevents all other processes from entering the
> critical section until it is finished.
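For readers less familiar with the distinction, the two approaches can be sketched in a few lines of Python. This is a minimal single-process illustration only (the counter classes are hypothetical, not anything in Tuskar); in the pessimistic case a lock excludes all other writers for the duration of the critical section, while in the optimistic case work happens outside any lock and the commit succeeds only if nobody else changed the record in the meantime:

```python
import threading

class PessimisticCounter:
    """Pessimistic: hold a lock for the whole critical section."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # competing threads block here until we finish
            self.value += 1

class OptimisticCounter:
    """Optimistic: snapshot a version, work unlocked, commit only if
    the version is unchanged; otherwise another writer won -- retry."""
    def __init__(self):
        self.value = 0
        self._version = 0
        # Stands in for the datastore's atomic compare-and-swap primitive
        self._commit = threading.Lock()

    def increment(self):
        while True:
            snapshot = self._version
            new_value = self.value + 1   # computed outside any critical section
            with self._commit:           # only the commit itself is atomic
                if self._version == snapshot:
                    self.value = new_value
                    self._version += 1
                    return
            # version moved on: a concurrent writer committed first; retry
```

The optimistic variant never blocks other workers while the "real work" happens; it pays for that with retries under contention, which is usually the right trade when conflicts are rare.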

I'm familiar with the traditional non-distributed software concept of a mutex (or in Windows world, a critical section). But we aren't dealing with traditional non-distributed software here. We're dealing with highly distributed software where components involved in the "transaction" may not be running on the same host or have much awareness of each other at all.

And, in any case (see below), I don't think that this is a problem that needs to be solved in Tuskar.

> Perhaps you have some other way of making them atomic that I can't
> think of?

I should not have used the term atomic above. I actually do not think that the things that Tuskar/Ironic does should be viewed as an atomic operation. More below.

>> For example, if the construction or installation of one compute worker
>> failed, adding some retry or retry-after-wait-for-event logic would be
>> more useful than trying to put locks in a bunch of places to prevent
>> multiple sysadmins from trying to deploy on the same bare-metal nodes
>> (since it's just not gonna happen in the real world, and IMO, if it did
>> happen, the sysadmins/deployers should be punished and have to clean up
>> their own mess ;)

> I don't see why they should be punished, if the UI was assuring them
> that they are doing exactly the thing that they wanted to do, at every
> step, and in the end it did something completely different, without any
> warning. If anyone deserves punishment in such a situation, it's the
> programmers who wrote the UI in such a way.

The issue I am getting at is that, in the real world, the problem of multiple users of Tuskar attempting to deploy an undercloud on the exact same set of bare-metal machines is just not going to happen. If you think this is actually a real-world problem, and have seen two sysadmins actively trying to deploy an undercloud on bare-metal machines at the same time, unbeknownst to each other, then I feel bad for the sysadmins who found themselves in such a situation, but I feel it's their own fault for not knowing what the other was doing.

Trying to make a complex series of related but distributed actions -- like the underlying actions of the Tuskar -> Ironic API calls -- into an atomic operation is just not a good use of programming effort, IMO. Instead, I'm advocating that programming effort should instead be spent coding a workflow/taskflow pipeline that can gracefully retry failed operations and report the state of the total taskflow back to the user.
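To make the retry-and-report idea concrete, here is a rough sketch of that style in Python. To be clear, this is a hypothetical illustration, not the actual taskflow library API or Tuskar code; the task names and the `run_with_retries`/`deploy_pipeline` helpers are invented for the example:

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=0.1):
    """Run a callable, retrying with exponential backoff on failure.

    Re-raises the last error once the retry budget is exhausted, so the
    caller is left with a clear failure to report -- not a stale lock.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # back off, then retry

def deploy_pipeline(tasks):
    """Run (name, task) pairs in order, recording per-task state so the
    state of the whole flow can be reported back to the user."""
    states = {}
    for name, task in tasks:
        try:
            run_with_retries(task)
            states[name] = "SUCCESS"
        except Exception as exc:
            states[name] = "FAILED: %s" % exc
            break  # later steps depend on earlier ones; stop and report
    return states
```

The point is that transient failures (a node that is slow to PXE-boot, a flaky power driver) get absorbed by the retry logic, and a genuine failure surfaces as an explicit per-step state the operator can act on, rather than a partially-held distributed lock that someone has to clean up.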

Hope that makes more sense,
-jay

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
