> However, considering the unhappy-path for a second, is there a place
> for surfacing some more context as to why the new instance unexpectedly
> went into the ERROR state?


I assume the philosophy is that the API has validated the request as far as it 
can and returned any meaningful error messages. Anything that fails 
past that point is something going wrong on the cloud provider's side, and there 
is nothing the user could have done to avoid the error, so any additional 
information won't help them.

However, on the basis that up-front validation is seldom perfect and things can 
change while a request is in flight, I think that being able to tell a user 
that, for example, their request failed because the image was deleted before it 
could be downloaded would be useful.

One approach might be to make the task_state more granular and use that to 
qualify the error. In general our users have found that having the state shown 
as "vm_state (task_state)" is useful, as it shows progress during things like 
building.
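
To make that display format concrete, here is a minimal Python sketch. The function name format_status and the finer-grained task_state value "image_downloading" are hypothetical and only for illustration; they are not taken from Nova.

```python
from typing import Optional


def format_status(vm_state: str, task_state: Optional[str] = None) -> str:
    """Render the state as "vm_state (task_state)".

    The task_state qualifier is dropped when no task is recorded.
    """
    if task_state:
        return f"{vm_state} ({task_state})"
    return vm_state


# Progress while building, then an error qualified by the task that failed.
# "image_downloading" is an assumed, more granular task_state.
print(format_status("building", "spawning"))
print(format_status("error", "image_downloading"))
print(format_status("active"))
```

With a granular enough task_state, the last task recorded before the failure gives the user a hint about where things went wrong without exposing operational detail.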

Phil



From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Doug Davis
Sent: 29 June 2012 12:45
To: Eoghan Glynn
Cc: [email protected]
Subject: Re: [Openstack] Nova and asynchronous instance launching


Right - examining the current state isn't a good way to determine what happened 
with one particular request.  This is exactly one of the reasons some providers 
create Jobs for all actions.  Checking the resource "later" to see why 
something bad happened is fragile since other operations might have happened 
since then, erasing any "error message" type of state info.  And relying on 
event/error logs is hard since correlating one particular action with a flood 
of events is tricky - especially in a multi-user environment where several 
actions could be underway at once.  If each action resulted in a Job URI being 
returned then the client could check that Job resource when it's convenient for 
them - and this could be quite useful in both happy and unhappy situations.

And to be clear, a Job doesn't necessarily need to be a full new resource, it 
could (under the covers) map to a grouping of event log entries but the point 
is that from a client's perspective they have an easy mechanism (e.g. issue a 
GET to a single URI) that returns all of the info needed to determine what 
happened with one particular operation.
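
A rough sketch of the idea in Python, purely as an illustration: the Job and JobRegistry classes, the /jobs/ URI layout and the status strings are all assumptions made up for this example, not an existing Nova API. Each action gets its own Job, so a later action on the same instance cannot overwrite the record of an earlier failure.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    """Record of one client-initiated action (hypothetical resource)."""
    action: str
    resource_id: str
    status: str = "PENDING"
    events: List[str] = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def uri(self) -> str:
        # Assumed URI layout; a real service would define its own.
        return f"/jobs/{self.id}"


class JobRegistry:
    """In-memory stand-in for whatever backs the Job URIs (e.g. grouped event logs)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def start(self, action: str, resource_id: str) -> Job:
        job = Job(action, resource_id)
        self._jobs[job.uri] = job
        return job

    def record(self, uri: str, message: str, status: Optional[str] = None) -> None:
        job = self._jobs[uri]
        job.events.append(message)
        if status is not None:
            job.status = status

    def get(self, uri: str) -> Job:
        # The equivalent of the client issuing a GET to the Job URI.
        return self._jobs[uri]


registry = JobRegistry()
create = registry.start("create", "instance-1")
registry.record(create.uri, "image was deleted before it could be downloaded", status="ERROR")

# A later action on the same instance gets its own Job.
reboot = registry.start("reboot", "instance-1")
registry.record(reboot.uri, "reboot completed", status="DONE")

failed = registry.get(create.uri)
print(failed.status, failed.events)
```

The create Job still reports its own error after the reboot, which is the property that checking the instance "later" cannot guarantee.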

thanks
-Doug
______________________________________________________
STSM |  Standards Architect  |  IBM Software Group
(919) 254-6905  |  IBM 444-6905  |  [email protected]
The more I'm around some people, the more I like my dog.

From: Eoghan Glynn <[email protected]>
Sent: 06/29/2012 06:00 AM
To: Doug Davis/Raleigh/IBM@IBMUS
Cc: [email protected], Jay Pipes <[email protected]>
Subject: Re: [Openstack] Nova and asynchronous instance launching

> Note that I do distinguish between a 'real' async op (where you
> really return little more than a 202) and one that returns a
> skeleton of the resource being created - like instance.create() does
> now.

So the latter approach at least provides a way to poll on the resource
status, so as to figure out if and when it becomes usable.

In the happy-path, eventually the instance status transitions to
ACTIVE and away we go.
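
A minimal sketch of that polling loop in Python, under stated assumptions: get_status is a hypothetical callable standing in for whatever client call fetches the instance status, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_instance(get_status: Callable[[], str],
                      interval: float = 1.0,
                      timeout: float = 300.0) -> str:
    """Poll until the instance reaches ACTIVE or ERROR, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("ACTIVE", "ERROR"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"instance still in {status} after {timeout}s")
        time.sleep(interval)


# Canned statuses stand in for real API responses.
statuses = iter(["BUILD", "BUILD", "ACTIVE"])
result = wait_for_instance(lambda: next(statuses), interval=0)
print(result)
```

Note that on failure the loop can only report the bare string ERROR; the status alone carries no reason.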

However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state?

For example even just an indication that failure occurred in the scheduler
(e.g. resource starvation) or on the target compute node. Is the thought
that such information may be operationally sensitive, or just TMI for a
typical cloud user?

Cheers,
Eoghan

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
