On 02/18/2017 01:46 PM, Matt Riedemann wrote:
> I haven't fully dug into testing this, but I got wondering about this
> question from reviewing a change [1] which would make the unshelve
> operation start to check the volume AZ compared to the instance AZ when
> the compute manager calls _prep_block_device.
> That change is attempting to remove the check_attach() method in
> nova.volume.cinder.API since it's mostly redundant with state checks
> that Cinder does when reserving the volume. The only other thing that
> Nova does in there right now is compare the AZs.
> What I'm wondering is, with that change, will things break because of a
> scenario like this:
> 1. Create volume in AZ 1.
> 2. Create server in AZ 1.
> 3. Attach volume to server (or boot server from volume in step 2).
> 4. Shelve (offload) server.
> 5. Unshelve server - nova-scheduler puts it into AZ 2.
> 6. _prep_block_device compares instance AZ 2 to volume AZ 1 and unshelve
> fails with InvalidVolume.
> If unshelving a server in AZ 1 can't move it outside of AZ 1, then we're
> fine and the AZ check when unshelving is redundant but harmless.
> [1]
> https://review.openstack.org/#/c/335358/38/nova/virt/block_device.py@249
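To make the scenario Matt describes concrete, here is a minimal Python sketch. It is not Nova code: the Volume and Instance classes and the check_volume_az() helper are hypothetical stand-ins, and InvalidVolume is redefined locally just to mirror the exception named in step 6. It models only the AZ comparison, not any of the Cinder state checks.

```python
from dataclasses import dataclass


class InvalidVolume(Exception):
    """Local stand-in for the exception named in the scenario."""


@dataclass
class Volume:
    availability_zone: str


@dataclass
class Instance:
    availability_zone: str


def check_volume_az(instance: Instance, volume: Volume) -> None:
    # Hypothetical helper: models only the AZ comparison.
    if instance.availability_zone != volume.availability_zone:
        raise InvalidVolume(
            f"instance AZ {instance.availability_zone!r} != "
            f"volume AZ {volume.availability_zone!r}"
        )


volume = Volume(availability_zone="az1")    # step 1
server = Instance(availability_zone="az1")  # step 2
check_volume_az(server, volume)             # step 3: attach passes the check

# Steps 4-5: shelve offload, then the scheduler (hypothetically) picks AZ 2.
server.availability_zone = "az2"

try:
    check_volume_az(server, volume)         # step 6
    unshelve_failed = False
except InvalidVolume:
    unshelve_failed = True
```

In this sketch the failure can only happen if step 5 is allowed to move the server to a different AZ, which is the open question.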
When an instance is unshelved, the unshelve_instance() RPC API method is
passed a RequestSpec object as the request_spec parameter:
https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L600
This request spec object is passed to schedule_instances():
https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L660
(Note that the code directly above there resets the force_hosts
parameter, ostensibly to prevent any forced destination host from being
passed to the scheduler.)
The question is: does the above request spec contain availability zone
information for the original instance? If it does, we're good. If it
doesn't, we can get into the problem described above.
From what I can tell (and Sylvain might be the best person to answer
this, hence the cc), the availability zone is *always* stored in the
request spec for an instance:
https://github.com/openstack/nova/blob/master/nova/compute/api.py#L966
This means that upon unshelving after a shelve_offload, we will always
pass the scheduler the original AZ.
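As a rough illustration of that reasoning (again, not Nova's actual code), the sketch below uses a simplified, hypothetical RequestSpec and Host plus made-up schedule() and unshelve() functions. It assumes the scheduler only considers hosts in the AZ recorded on the request spec, and that unshelve clears force_hosts before scheduling, as described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestSpec:
    # Simplified stand-in: only the fields relevant to this thread.
    availability_zone: Optional[str]
    force_hosts: Optional[List[str]] = None


@dataclass
class Host:
    name: str
    availability_zone: str


def schedule(spec: RequestSpec, hosts: List[Host]) -> List[Host]:
    """Return candidate hosts; with no AZ on the spec, any host qualifies."""
    if spec.availability_zone is None:
        return list(hosts)
    return [h for h in hosts if h.availability_zone == spec.availability_zone]


def unshelve(spec: RequestSpec, hosts: List[Host]) -> List[Host]:
    # Mirror the "reset force_hosts" step, then schedule with the stored AZ.
    spec.force_hosts = None
    return schedule(spec, hosts)


hosts = [Host("compute1", "az1"), Host("compute2", "az2")]

# AZ stored on the spec: the server can only land back in az1.
with_az = unshelve(RequestSpec("az1", force_hosts=["compute1"]), hosts)

# No AZ on the spec: either host is a candidate, which is the problem case.
without_az = unshelve(RequestSpec(None), hosts)
```

If the AZ really is always present on the request spec, only the first case can occur and the AZ check on unshelve is harmless.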
Sylvain, do you concur?
Best,
-jay
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)