On 6/7/2018 9:02 AM, Matt Riedemann wrote:
We have a nova spec which is at the point that it needs some API
user (and operator) feedback on what nova API should be doing when
listing servers and there are down cells (unable to reach the cell DB or
it times out).
tl;dr: the spec proposes to return "shell" instances which have the
server uuid and created_at fields set, and maybe some other fields we
can set, but otherwise a bunch of fields in the server response would be
set to UNKNOWN sentinel values. This would be unversioned, and therefore
could wreak havoc on existing client side code that expects fields like
'config_drive' and 'updated' to be of a certain format.
There are alternatives listed in the spec so please read this over and
provide feedback since this is a pretty major UX change.
Oh, and no pressure, but today is the spec freeze deadline for Rocky.
The options laid out right now are:
1. Without a new microversion, include 'shell' servers in the response
when listing over down cells. These would have UNKNOWN values for the
fields in the server object. gibi and I didn't like this because
existing client code wouldn't know how to deal with these UNKNOWN shell
instances - and not all of the server fields are simple strings, we have
booleans, integers, dicts and lists, so what would those values be?
2. In a new microversion, return a new top-level parameter when listing
servers which would include minimal details about servers that are in
down cells (minimal like just the uuid). This was an alternative gibi
and I had discussed because we didn't like the client-side impacts w/o a
microversion or the full 'shell' servers in option 1. From an IRC
conversation last week with mordred, dansmith and mordred don't care
for the new top-level parameter since clients would have to merge that
in to the full list of available servers. Plus, in the future, if we
ever have some kind of caching mechanism in the API from which we can
pull instance information if it's in a down cell, then the new top-level
parameter becomes kind of pointless.
3. In a new microversion, include servers from down cells in the same
top-level servers response parameter but for those in down cells, we'll
just include minimal information (status=UNKNOWN and the uuid). Clients
would opt-in to the new microversion when they know how to deal with
what an instance in UNKNOWN status means. In the future, we could use a
caching mechanism to fill in these details for instances in down cells.
#3 is kind of a compromise on options 1 and 2, and I'm OK with it
(barring any hairy details).
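
To make option 3 concrete from the client side, here is a rough sketch of how a client that has opted in to the new microversion might separate full server records from the minimal ones coming out of down cells. The response layout and the split_servers helper are assumptions for illustration only; the spec does not define either, and the only fields assumed on a down-cell record are the uuid (shown as "id") and status=UNKNOWN.

```python
def split_servers(servers):
    """Separate full server records from minimal down-cell records.

    Assumption: a record from a down cell carries status UNKNOWN and
    little else, as described in option 3.
    """
    full, partial = [], []
    for server in servers:
        if server.get("status") == "UNKNOWN":
            partial.append(server)
        else:
            full.append(server)
    return full, partial


# Hypothetical response body; the field values are made up.
response = {
    "servers": [
        {"id": "uuid-1", "status": "ACTIVE", "name": "web-1"},
        {"id": "uuid-2", "status": "UNKNOWN"},
    ]
}

full, partial = split_servers(response["servers"])
```

Because everything stays in the one top-level servers list, the client has nothing to merge, which was the objection to option 2.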
In all cases, we won't include 'shell' servers in the response if the
user is filtering (or paging?) because we can't be honest about the
results and just have to treat the filters as if they don't apply to the
instances in the down cell.
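
A minimal sketch of that rule, assuming a hypothetical build_listing helper (not nova code): minimal records for down-cell instances are only appended when the request carries no filters. Paging is left out here since it is still an open question.

```python
def build_listing(reachable_servers, down_cell_uuids, filters=None):
    """Return the servers list for a list request.

    Hypothetical helper: minimal UNKNOWN records for down-cell instances
    are added only when no filters were requested, since filters cannot
    be evaluated against instances we cannot read.
    """
    servers = list(reachable_servers)
    if not filters:
        servers.extend(
            {"id": uuid, "status": "UNKNOWN"} for uuid in down_cell_uuids
        )
    return servers


reachable = [{"id": "uuid-1", "status": "ACTIVE"}]
unfiltered = build_listing(reachable, ["uuid-2"])
filtered = build_listing(reachable, ["uuid-2"], filters={"status": "ACTIVE"})
```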
If you have a server in a down cell, you can't delete it or really do
anything with it because we literally can't pull the instance out of the
cell database while the cell is down. You'd get a 500 or 503 in that case.
Regardless of microversion, we plan on omitting instances from down
cells when listing which is a backportable reliability bug fix so we
don't 500 the API when listing across 70 cells and 1 is down.
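
Roughly, the idea behind that bug fix could be sketched as below. This is not how nova is implemented; CellUnreachable and list_across_cells are hypothetical names, and each cell is modelled as a plain callable that either returns its servers or raises.

```python
class CellUnreachable(Exception):
    """Hypothetical error for a cell DB that is down or timed out."""


def list_across_cells(cell_queries):
    """Collect servers from every reachable cell and skip the rest.

    cell_queries maps a cell name to a callable returning that cell's
    servers. A failing cell is recorded and omitted rather than failing
    the whole request.
    """
    servers, down_cells = [], []
    for name, query in cell_queries.items():
        try:
            servers.extend(query())
        except CellUnreachable:
            down_cells.append(name)
    return servers, down_cells


def _cell1():
    return [{"id": "uuid-1", "status": "ACTIVE"}]


def _cell2():
    raise CellUnreachable("cell2 timed out")


servers, down_cells = list_across_cells({"cell1": _cell1, "cell2": _cell2})
```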
OpenStack Development Mailing List (not for usage questions)