On Mon, Jan 12, 2015 at 05:10:47PM -0500, Zane Bitter wrote:
> On 12/01/15 13:05, Steven Hardy wrote:
> >>>I also had a chat with Steve Hardy and he suggested adding a STOPPED state
> >>>to the stack (this isn't in the spec). While not strictly necessary to
> >>>implement the spec, this would help people figure out that the stack has
> >>>reached a breakpoint instead of just waiting on a resource that takes a
> >>>long time to finish (the heat-engine log and event-list still show that a
> >>>breakpoint was reached but I'd like to have it in stack-list and
> >>>resource-list, too).
> >>>
> >>>It makes more sense to me to call it PAUSED (we're not completely stopping
> >>>the stack creation after all, just pausing it for a bit), I'll let Steve
> >>>explain why that's not the right choice :-).
> >
> >So, I've not got strong opinions on the name, it's more the workflow:
> >
> >1. User triggers a stack create/update
> >2. Heat walks the graph, hits a breakpoint and stops.
> >3. Heat explicitly triggers continuation of the create/update
>
> Did you mean the user rather than Heat for (3)?

Oops, yes I did.

> >My argument is that (3) is always a stack update, either a PUT or PATCH
> >update, e.g. we _are_ completely stopping stack creation, then a user can
> >choose to re-start it (either with the same or a different definition).
>
> Hmmm, ok that's interesting. I have not been thinking of it that way. I've
> always thought of it like this:
>
> http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/adding-lifecycle-hooks.html
>
> (Incidentally, this suggests an implementation where the lifecycle hook is
> actually a resource - with its own API, naturally.)
>
> So, if it's requested, before each operation we send out a notification
> (hopefully via Zaqar), and if a breakpoint is set that operation is not
> carried out until the user makes an API call acknowledging it.

I guess I was trying to keep it initially simpler than that, given that we
don't have any integration with a heat-user messaging system at present.

> >So, it _is_ really an end state, as a user might never choose to update
> >from the stopped state, in which case *_STOPPED makes more sense.
>
> That makes a bit more sense now.
>
> I think this is going to be really hard to implement though. Because while
> one branch of the graph stops, other branches have to continue as far as
> they can. At what point do you change the state of the stack?

True, this is a disadvantage of specifying a single breakpoint when there
may be parallel paths through the graph.

However, I was thinking we could just reuse our existing error path
implementation, so it needn't be hard to implement at all, e.g.

1. Stack action started where a resource has a breakpoint set
2. Stack.stack_task.resource_action checks if the resource has a breakpoint
3. If a breakpoint is set, we raise an exception.ResourceFailure subclass
4. The normal error_wait_time is respected, e.g. currently in-progress
   actions are given a chance to complete.

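A minimal sketch of steps 1-4, to make the error-path idea concrete. This is
not Heat code: ResourceFailure below is a simplified stand-in for Heat's
exception.ResourceFailure, and BreakpointReached, resource_action,
run_stack_action and the breakpoints set are hypothetical names used only
for illustration. The walk is sequential, so it does not model parallel
branches or error_wait_time.

```python
class ResourceFailure(Exception):
    """Simplified stand-in for Heat's exception.ResourceFailure."""

    def __init__(self, resource_name, action, message):
        super().__init__(message)
        self.resource_name = resource_name
        self.action = action


class BreakpointReached(ResourceFailure):
    """Hypothetical subclass raised when a resource has a breakpoint set."""

    def __init__(self, resource_name, action):
        message = "Stack %s aborted due to breakpoint on resource %s" % (
            action.lower(), resource_name)
        super().__init__(resource_name, action, message)


def resource_action(resource_name, action, breakpoints):
    """Step 2/3: check for a breakpoint before acting on the resource."""
    if resource_name in breakpoints:
        raise BreakpointReached(resource_name, action)
    return "%s_COMPLETE" % action


def run_stack_action(resources, action, breakpoints):
    """Walk resources in order; a breakpoint takes the normal error path."""
    completed = []
    for name in resources:
        try:
            resource_action(name, action, breakpoints)
        except ResourceFailure as exc:
            # The breakpoint is handled exactly like any other failure,
            # so the stack ends up in the existing *_FAILED state.
            return "%s_FAILED" % action, completed, str(exc)
        completed.append(name)
    return "%s_COMPLETE" % action, completed, None
```

Because the breakpoint exception subclasses the failure type, the caller
needs no new handling: the stack lands in the existing failed state with a
breakpoint-specific reason, and a later update would resume from there.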

Basically, the only implementation work would be raising a special new type
of exception, which would enable a suitable message (and event) to be shown
to the user, e.g. "Stack create aborted due to breakpoint on resource foo".

Pre/post breakpoint actions/messaging could be added later via a similar
method to the stack-level lifecycle plugin hooks.

If folks are happy with e.g. CREATE_FAILED as a post-breakpoint state, this
could simplify things a lot, as we'd not need any new state or much new code
at all?

> >Paused implies the same action as the PATCH update, only we trigger
> >continuation of the operation from the point we reached via some sort of
> >user signal.
> >
> >If we actually pause an in-progress action via the scheduler, we'd have to
> >start worrying about stuff like token expiry, hitting timeouts, resilience
> >to engine restarts, etc, etc. So forcing an explicit update seems simpler
> >to me.
>
> Yes, token expiry and stack timeouts are annoying things we'd have to deal
> with. (Resilience to engine restarts is not affected though.) However, I'm
> not sure your model is simpler, and in particular it sounds much harder to
> implement in the convergence architecture.

So you're advocating keeping the scheduler spinning until a user sends a
signal to the resource to clear the breakpoint?

I don't see why we couldn't do both, with an "abort_on_breakpoint" flag or
something, but I'd be interested in further understanding how the error-path
approach outlined above would be incompatible with convergence.

Thanks,

Steve

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
