On 01/13/2015 01:15 AM, Ton Ngo wrote:
     I was also thinking of using the environment to hold the breakpoint,
similarly to parameters.  The CLI and API would process it just like
parameters.
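A minimal sketch of what "the environment holds the breakpoints, just like parameters" could look like. The "breakpoints" key and the helper are assumptions for illustration, not existing Heat API:

```python
# Hypothetical sketch: breakpoints carried in the stack environment
# alongside parameters, so the CLI/API can pass them through unchanged.
# The "breakpoints" key and get_breakpoints() are assumed names.

def get_breakpoints(environment):
    """Return the set of resource names flagged as breakpoints."""
    return set(environment.get("breakpoints", []))

env = {
    "parameters": {"flavor": "m1.small"},
    "breakpoints": ["my_server", "my_volume"],
}

print(sorted(get_breakpoints(env)))
```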

    As for the state of a stack hitting the breakpoint, leveraging the
FAILED state seems to be sufficient, we just need to add enough information
to differentiate between a failed resource and a resource at a breakpoint.
Something like emitting an event or a message should be enough to make that
distinction.  Debuggers for native programs typically do the same thing,
leveraging the exception handling in the OS by inserting an artificial
error at the breakpoint to force the program to stop.  The debugger then
just remembers the addresses of these artificial errors to decode the
state of the stopped program.

    As for the workflow, instead of spinning in the scheduler waiting for a
signal, I was thinking of moving the stack off the engine as a failed
stack. So this would be an end-state for the stack as Steve suggested, but
without adding a new stack state.   Again, this is similar to how a program
being debugged is handled:  they are moved off the ready queue and their
context is preserved for examination.  This seems to keep the
implementation simple and we don't have to worry about timeout,
performance, etc.  Continuing from the breakpoint then should be similar to
stack-update on a failed stack.  We do need some additional handling, such
as allowing in-progress resources to run to completion instead of aborting.

     For the parallel paths in a template, I am thinking about these
alternatives:
1. Stop after all the current in-progress resources complete, but do not
start any new resources even if there is no dependency.  This should be
easier to implement, but the state of the stack would be non-deterministic.
2. Stop only the paths with the breakpoint, continue all other parallel
paths to completion.  This seems harder to implement, but the stack would
be in a deterministic state that is easier for the user to reason about.
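Alternative 2 amounts to blocking the breakpoint resource and everything that transitively depends on it, while letting independent paths finish. A rough sketch, assuming a simple child -> parents dependency mapping (names are illustrative, not Heat internals):

```python
# Sketch of alternative 2: given a dependency graph mapping each
# resource to the set of resources it depends on, compute which
# resources may still run when a breakpoint is set.

def runnable_resources(deps, breakpoints):
    """Resources that neither are breakpoints nor depend
    (transitively) on one."""
    blocked = set(breakpoints)
    changed = True
    while changed:
        changed = False
        for res, parents in deps.items():
            if res not in blocked and parents & blocked:
                blocked.add(res)
                changed = True
    return set(deps) - blocked

deps = {
    "net": set(),
    "server": {"net"},
    "volume": set(),
    "attach": {"server", "volume"},
}
# a breakpoint on "server" blocks "attach" too; "net" and "volume" run
print(sorted(runnable_resources(deps, {"server"})))  # -> ['net', 'volume']
```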

    To be compatible with convergence, I had suggested to Clint earlier to
add a mode where the convergence engine does not attempt to retry so the
user can debug, and I believe this was added to the blueprint.

Ton,


Regarding the spinning scheduler: I get the concerns about token expiry and the rest, but it is *super simple* to implement.

Literally a while loop that yields. Two lines of code.
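Roughly this, in coroutine form. In Heat it would sit in the scheduler-driven resource task; the names here are illustrative, and the toy driver below just stands in for the scheduler loop:

```python
# Minimal sketch of the "spin in the scheduler" approach: the resource
# task yields control back to the scheduler until the breakpoint is
# cleared by a user signal.

def wait_at_breakpoint(breakpoint_cleared):
    """The two-line loop: yield until the predicate becomes true."""
    while not breakpoint_cleared():
        yield

# toy driver standing in for Heat's scheduler: the "signal" arrives
# on the third poll
signals = iter([False, False, True])
state = {"cleared": False}

def cleared():
    state["cleared"] = state["cleared"] or next(signals)
    return state["cleared"]

steps = sum(1 for _ in wait_at_breakpoint(cleared))
print(steps)  # number of yields before the signal arrived
```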

And we don't have to change anything in the scheduler or in the way we handle stacks. Heat already knows how to handle this situation.

Can we start with that implementation (because it's simple and correct) and then take it from there? Assuming we can stick to the same API/UI, we should be able to change it later when we've documented issues with the current approach.


As for parallel execution, I definitely prefer the deterministic approach: stop on the breakpoint and everything that depends on it, but resolve everything else that you can.

Again, this is trivially handled by Heat already (my patch has no special handling for this case). If you want to pause everything, you can always set up more breakpoints and advance them either manually or all at once with the (to be implemented) stepping functionality.





From:   Steven Hardy <[email protected]>
To:     "OpenStack Development Mailing List (not for usage questions)"
             <[email protected]>
Date:   01/12/2015 02:40 PM
Subject:        Re: [openstack-dev] [Heat] Where to keep data about stack
             breakpoints?



On Mon, Jan 12, 2015 at 05:10:47PM -0500, Zane Bitter wrote:
On 12/01/15 13:05, Steven Hardy wrote:
I also had a chat with Steve Hardy and he suggested adding a STOPPED
state to the stack (this isn't in the spec). While not strictly necessary
to implement the spec, this would help people figure out that the stack
has reached a breakpoint instead of just waiting on a resource that takes
a long time to finish (the heat-engine log and event-list still show that
a breakpoint was reached but I'd like to have it in stack-list and
resource-list, too).

It makes more sense to me to call it PAUSED (we're not completely
stopping the stack creation after all, just pausing it for a bit), I'll
let Steve explain why that's not the right choice :-).
So, I've not got strong opinions on the name, it's more the workflow:

1. User triggers a stack create/update
2. Heat walks the graph, hits a breakpoint and stops.
3. Heat explicitly triggers continuation of the create/update

Did you mean the user rather than Heat for (3)?

Oops, yes I did.

My argument is that (3) is always a stack update, either a PUT or PATCH
update, e.g. we *are* completely stopping stack creation, then a user can
choose to re-start it (either with the same or a different definition).

Hmmm, ok that's interesting. I have not been thinking of it that way.
I've always thought of it like this:


http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/adding-lifecycle-hooks.html


(Incidentally, this suggests an implementation where the lifecycle hook
is actually a resource - with its own API, naturally.)

So, if it's requested, before each operation we send out a notification
(hopefully via Zaqar), and if a breakpoint is set that operation is not
carried out until the user makes an API call acknowledging it.

I guess I was trying to keep it initially simpler than that, given that we
don't have any integration with a heat-user messaging system at present.

So, it *is* really an end state, as a user might never choose to update
from the stopped state, in which case *_STOPPED makes more sense.

That makes a bit more sense now.

I think this is going to be really hard to implement though. Because
while one branch of the graph stops, other branches have to continue as
far as they can. At what point do you change the state of the stack?

True, this is a disadvantage of specifying a single breakpoint when there
may be parallel paths through the graph.

However, I was thinking we could just reuse our existing error path
implementation, so it needn't be hard to implement at all, e.g.

1. Stack action started where a resource has a breakpoint set
2. Stack.stack_task.resource_action checks if resource is a breakpoint
3. If a breakpoint is set, we raise an exception.ResourceFailure subclass
4. The normal error_wait_time is respected, e.g. currently in-progress
actions are given a chance to complete.

Basically, the only implementation would be raising a special new type of
exception, which would enable a suitable message (and event) to be shown to
the user "Stack create aborted due to breakpoint on resource foo".
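Steps 2-4 above could be sketched roughly as follows. The class and function names mirror the discussion but are assumptions, not the actual Heat code:

```python
# Rough sketch of the error-path approach: resource_action checks for a
# breakpoint and raises a ResourceFailure subclass, so the existing
# failure handling marks the stack FAILED with a suitable message.

class ResourceFailure(Exception):
    """Stand-in for heat's exception.ResourceFailure."""

class BreakpointReached(ResourceFailure):
    def __init__(self, resource_name):
        super().__init__(
            "Stack create aborted due to breakpoint on resource %s"
            % resource_name)

def resource_action(resource_name, breakpoints):
    if resource_name in breakpoints:
        raise BreakpointReached(resource_name)
    # ... normal create/update handling would go here ...

try:
    resource_action("foo", breakpoints={"foo"})
except ResourceFailure as exc:
    print(exc)
```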

Pre/post breakpoint actions/messaging could be added later via a similar
method to the stack-level lifecycle plugin hooks.

If folks are happy with e.g. CREATE_FAILED as a post-breakpoint state,
this could simplify things a lot, as we'd not need any new state or much
new code at all?

Paused implies the same action as the PATCH update, only we trigger
continuation of the operation from the point we reached via some sort of
user signal.

If we actually pause an in-progress action via the scheduler, we'd have
to start worrying about stuff like token expiry, hitting timeouts,
resilience to engine restarts, etc, etc.  So forcing an explicit update
seems simpler to me.

Yes, token expiry and stack timeouts are annoying things we'd have to
deal with. (Resilience to engine restarts is not affected though.)
However, I'm not sure your model is simpler, and in particular it sounds
much harder to implement in the convergence architecture.

So you're advocating keeping the scheduler spinning, until a user sends a
signal to the resource to clear the breakpoint?

I don't see why we couldn't do both, have an abort_on_breakpoint flag or
something, but I'd be interested in further understanding how the
error-path approach outlined above would be incompatible with convergence.

Thanks,

Steve

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev






