> Right - examining the current state isn't a good way to determine
> what happened with one particular request. This is exactly one of
> the reasons some providers create Jobs for all actions. Checking the
> resource "later" to see why something bad happened is fragile since
> other operations might have happened since then, erasing any "error
> message" type of state info. And relying on event/error logs is hard
> since correlating one particular action with a flood of events is
> tricky - especially in a multi-user environment where several
> actions could be underway at once. If each action resulted in a Job
> URI being returned then the client can check that Job resource when
> it's convenient for them - and this could be quite useful in both
> happy and unhappy situations.
>
> And to be clear, a Job doesn't necessarily need to be a full new
> resource, it could (under the covers) map to a grouping of event
> log entries but the point is that from a client's perspective they
> have an easy mechanism (e.g. issue a GET to a single URI) that
> returns all of the info needed to determine what happened with one
> particular operation.

Agreed on all points.

I wonder could we simply leverage the existing X-Compute-Request-Id
header to provide the context on the over-arching operation that the
client wishes to be informed about? For example, by providing an
administrative API extension allowing queries on the async "Job"
status, identified via the req-<UUID> string returned from the
initial call invoking the operation.

Since the components serving such an operation are generally
distributed (e.g. nova-api, nova-scheduler, nova-compute etc.) and
tied together via async messaging, I don't think simple log scraping
would be sufficient.

But it could work if each component was to follow logic such as:

1. when a context is received, check status in the nova DB for that
   request ID - if absent, mark as in-progress

2. when an operation hits an unrecoverable error condition, the
   exception-handling path should mark the request as failed in the
   nova DB

3. when an operation reaches a definitive endpoint, e.g. the instance
   is successfully launched, then the request status is marked as
   complete

Step #3 would probably be most problematic, in the sense of
identifying what constitutes the logical endpoint for every operation
(e.g. a volume might be created from a snapshot in order to be
attached somewhere in a subsequent operation, or as part of a
boot-from-volume operation).

There would be some extra DB manipulation to consider, adding
overhead & latency. There would also be wrinkles around the lifecycle
of entries in the request status table, when to reap old entries etc.

Just a thought in any case ...

Cheers,
Eoghan
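
As a rough illustration of the three-step logic above, here is a
minimal sketch in Python. Everything in it is an assumption made for
the sake of the example: the request_status table, its columns, and
the helper names are hypothetical and are not part of the nova DB
schema or API, and an in-memory SQLite database stands in for the
nova DB.

```python
import sqlite3

# Hypothetical table; not taken from the real nova DB schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS request_status (
    request_id TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    detail     TEXT
)
"""


def init_db(conn):
    """Create the (hypothetical) request status table."""
    conn.execute(SCHEMA)
    conn.commit()


def note_request_seen(conn, request_id):
    """Step 1: if the request ID is absent, mark it as in-progress."""
    conn.execute(
        "INSERT OR IGNORE INTO request_status (request_id, status) "
        "VALUES (?, 'in-progress')",
        (request_id,),
    )
    conn.commit()


def mark_failed(conn, request_id, detail):
    """Step 2: called from the exception-handling path."""
    conn.execute(
        "INSERT OR REPLACE INTO request_status (request_id, status, detail) "
        "VALUES (?, 'failed', ?)",
        (request_id, detail),
    )
    conn.commit()


def mark_complete(conn, request_id):
    """Step 3: mark complete, but never overwrite a recorded failure."""
    conn.execute(
        "UPDATE request_status SET status = 'complete' "
        "WHERE request_id = ? AND status = 'in-progress'",
        (request_id,),
    )
    conn.commit()


def get_status(conn, request_id):
    """Return (status, detail), or None if the request ID is unknown."""
    return conn.execute(
        "SELECT status, detail FROM request_status WHERE request_id = ?",
        (request_id,),
    ).fetchone()
```

Each component would call note_request_seen() on receiving a
context, and an admin API extension could serve get_status() keyed on
the req-<UUID> string. The sketch leaves open the harder questions
raised above: what counts as the logical endpoint of an operation,
and when old entries get reaped.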

