Re: [openstack-dev] [nova] Will unshelving an offloaded instance respect the original AZ?

2017-02-20 Thread Sylvain Bauza


On 20/02/2017 09:41, Jay Pipes wrote:
> On 02/18/2017 01:46 PM, Matt Riedemann wrote:
>> I haven't fully dug into testing this, but I got wondering about this
>> question from reviewing a change [1] which would make the unshelve
>> operation start to check the volume AZ compared to the instance AZ when
>> the compute manager calls _prep_block_device.
>>
>> That change is attempting to remove the check_attach() method in
>> nova.volume.cinder.API since it's mostly redundant with state checks
>> that Cinder does when reserving the volume. The only other thing that
>> Nova does in there right now is compare the AZs.
>>
>> What I'm wondering is, with that change, will things break because of a
>> scenario like this:
>>
>> 1. Create volume in AZ 1.
>> 2. Create server in AZ 1.
>> 3. Attach volume to server (or boot server from volume in step 2).
>> 4. Shelve (offload) server.
>> 5. Unshelve server - nova-scheduler puts it into AZ 2.
>> 6. _prep_block_device compares instance AZ 2 to volume AZ 1 and unshelve
>> fails with InvalidVolume.
>>
>> If unshelving a server in AZ 1 can't move it outside of AZ 1, then we're
>> fine and the AZ check when unshelving is redundant but harmless.
>>
>> [1]
>> https://review.openstack.org/#/c/335358/38/nova/virt/block_device.py@249
> 
> When an instance is unshelved, the unshelve_instance() RPC API method is
> passed a RequestSpec object as the request_spec parameter:
> 
> https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L600
> 
> 
> This request spec object is passed to schedule_instances():
> 
> https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L660
> 
> 
> (you will note that the code directly above there "resets force_hosts"
> parameters, ostensibly to prevent any forced destination host from being
> passed to the scheduler)
> 
> The question is: does the above request spec contain availability zone
> information for the original instance? If it does, we're good. If it
> doesn't, we can get into the problem described above.
> 
> From what I can tell (and Sylvain might be the best person to answer
> this, thus his cc'ing), the availability zone is *always* stored in the
> request spec for an instance:
> 
> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L966
> 
> Which means that upon unshelving after a shelve_offload, we will always
> pass the scheduler the original AZ.
> 
> Sylvain, do you concur?
> 

tl;dr: Exactly this. Since Mitaka, it has not been possible to unshelve
into a different AZ if you have the AZFilter enabled.

Longer version:

Exactly this. If the instance was booted with a specific AZ flag, then:

 #1 the instance.az field is set to something other than the conf opt
    default, and
 #2 the AZ field is set on the attached RequestSpec.

Both are persisted later in the conductor.


Now, say this instance is shelved and then unshelved. We fetch the
original RequestSpec at the API level:
https://github.com/openstack/nova/blob/466769e588dc44d11987430b54ca1bd7188abffb/nova/compute/api.py#L3275-L3276

That's how the conductor method you pointed to above gets the spec
passed as an argument.

Later, when the call is made to the scheduler, if the AZFilter is
enabled, it checks the spec_obj.az field against the compute's AZ
and rejects the host if the AZ is different.

One side note though: if the instance was not booted with an explicit
AZ, then of course it can be unshelved on a compute that is not in the
same AZ, since the user didn't explicitly ask to stick with an AZ.
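
To make the filter behaviour described above concrete, here is a minimal Python sketch. It is not the nova AZFilter code: the HostState and RequestSpec classes and the az_filter_passes() function are hypothetical stand-ins that keep only the AZ comparison, under the assumption that a spec with no AZ means the user did not ask for one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostState:
    """Hypothetical stand-in for a compute host known to the scheduler."""
    name: str
    availability_zone: str


@dataclass
class RequestSpec:
    """Hypothetical stand-in that keeps only the AZ requested at boot."""
    availability_zone: Optional[str] = None


def az_filter_passes(host: HostState, spec: RequestSpec) -> bool:
    """Return True if the host is acceptable for the spec's AZ."""
    if spec.availability_zone is None:
        # No AZ was requested at boot, so any host is acceptable.
        return True
    # An AZ was requested: only hosts in that AZ are accepted.
    return host.availability_zone == spec.availability_zone


hosts = [HostState("compute1", "az1"), HostState("compute2", "az2")]

pinned = RequestSpec(availability_zone="az1")  # booted with an AZ flag
unpinned = RequestSpec()                       # booted without one

pinned_hosts = [h.name for h in hosts if az_filter_passes(h, pinned)]
unpinned_hosts = [h.name for h in hosts if az_filter_passes(h, unpinned)]
print(pinned_hosts)
print(unpinned_hosts)
```

With the pinned spec only the az1 host survives filtering, while the unpinned spec lets both hosts through, which is the side note above.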

HTH,
-Sylvain


> Best,
> -jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Will unshelving an offloaded instance respect the original AZ?

2017-02-20 Thread Jay Pipes

On 02/18/2017 01:46 PM, Matt Riedemann wrote:
> I haven't fully dug into testing this, but I got wondering about this
> question from reviewing a change [1] which would make the unshelve
> operation start to check the volume AZ compared to the instance AZ when
> the compute manager calls _prep_block_device.
>
> That change is attempting to remove the check_attach() method in
> nova.volume.cinder.API since it's mostly redundant with state checks
> that Cinder does when reserving the volume. The only other thing that
> Nova does in there right now is compare the AZs.
>
> What I'm wondering is, with that change, will things break because of a
> scenario like this:
>
> 1. Create volume in AZ 1.
> 2. Create server in AZ 1.
> 3. Attach volume to server (or boot server from volume in step 2).
> 4. Shelve (offload) server.
> 5. Unshelve server - nova-scheduler puts it into AZ 2.
> 6. _prep_block_device compares instance AZ 2 to volume AZ 1 and unshelve
> fails with InvalidVolume.
>
> If unshelving a server in AZ 1 can't move it outside of AZ 1, then we're
> fine and the AZ check when unshelving is redundant but harmless.
>
> [1]
> https://review.openstack.org/#/c/335358/38/nova/virt/block_device.py@249


When an instance is unshelved, the unshelve_instance() RPC API method is 
passed a RequestSpec object as the request_spec parameter:


https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L600

This request spec object is passed to schedule_instances():

https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L660

(you will note that the code directly above there "resets force_hosts" 
parameters, ostensibly to prevent any forced destination host from being 
passed to the scheduler)


The question is: does the above request spec contain availability zone 
information for the original instance? If it does, we're good. If it 
doesn't, we can get into the problem described above.


From what I can tell (and Sylvain might be the best person to answer 
this, thus his cc'ing), the availability zone is *always* stored in the 
request spec for an instance:


https://github.com/openstack/nova/blob/master/nova/compute/api.py#L966

Which means that upon unshelving after a shelve_offload, we will always 
pass the scheduler the original AZ.
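
As a rough illustration of the flow described above, the following Python sketch shows a stored request spec being prepared for the scheduler on unshelve. It is only a sketch under assumptions: the RequestSpec dataclass and prepare_spec_for_unshelve() are hypothetical and are not the nova objects or the conductor code. The point is that the forced-host reset leaves the AZ untouched.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestSpec:
    """Hypothetical, trimmed-down request spec (not the nova object)."""
    availability_zone: Optional[str] = None
    force_hosts: List[str] = field(default_factory=list)


def prepare_spec_for_unshelve(stored: RequestSpec) -> RequestSpec:
    """Return the spec that would be handed to the scheduler."""
    # Work on a copy so the persisted spec is not modified.
    spec = copy.deepcopy(stored)
    # Reset any forced destination so the scheduler picks a host again.
    spec.force_hosts = []
    # availability_zone is deliberately left as it was stored at boot.
    return spec


stored_spec = RequestSpec(availability_zone="az1", force_hosts=["compute1"])
unshelve_spec = prepare_spec_for_unshelve(stored_spec)
print(unshelve_spec)
```

If the stored spec did not carry the AZ, the scheduler would have nothing to filter on, which is the failure case in question.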


Sylvain, do you concur?

Best,
-jay



[openstack-dev] [nova] Will unshelving an offloaded instance respect the original AZ?

2017-02-18 Thread Matt Riedemann
I haven't fully dug into testing this, but I got wondering about this 
question from reviewing a change [1] which would make the unshelve 
operation start to check the volume AZ compared to the instance AZ when 
the compute manager calls _prep_block_device.


That change is attempting to remove the check_attach() method in 
nova.volume.cinder.API since it's mostly redundant with state checks 
that Cinder does when reserving the volume. The only other thing that 
Nova does in there right now is compare the AZs.


What I'm wondering is, with that change, will things break because of a 
scenario like this:


1. Create volume in AZ 1.
2. Create server in AZ 1.
3. Attach volume to server (or boot server from volume in step 2).
4. Shelve (offload) server.
5. Unshelve server - nova-scheduler puts it into AZ 2.
6. _prep_block_device compares instance AZ 2 to volume AZ 1 and unshelve 
fails with InvalidVolume.


If unshelving a server in AZ 1 can't move it outside of AZ 1, then we're 
fine and the AZ check when unshelving is redundant but harmless.
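
The scenario above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the nova code under review: the InvalidVolume class here is a local stand-in named after the error in step 6, and check_availability_zone() and unshelve() are hypothetical helpers that model just the AZ comparison.

```python
class InvalidVolume(Exception):
    """Local stand-in for the error mentioned in step 6."""


def check_availability_zone(instance_az: str, volume_az: str) -> None:
    """Raise InvalidVolume if the instance and volume AZs differ."""
    if instance_az != volume_az:
        raise InvalidVolume(
            f"instance is in {instance_az} but volume is in {volume_az}"
        )


def unshelve(volume_az: str, scheduled_az: str) -> str:
    """Model steps 5 and 6: schedule the server, then compare the AZs."""
    try:
        check_availability_zone(scheduled_az, volume_az)
    except InvalidVolume as exc:
        return f"unshelve failed: {exc}"
    return "unshelve succeeded"


# The scheduler keeps the server in AZ 1: the check is redundant but harmless.
same_az_result = unshelve(volume_az="az1", scheduled_az="az1")
# The scheduler moves the server to AZ 2: the check makes the unshelve fail.
other_az_result = unshelve(volume_az="az1", scheduled_az="az2")
print(same_az_result)
print(other_az_result)
```

Whether the second case can actually happen depends on whether the scheduler is told the original AZ when unshelving.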


[1] https://review.openstack.org/#/c/335358/38/nova/virt/block_device.py@249

--

Thanks,

Matt Riedemann
