Re: [openstack-dev] [nova][cinder] Can all non-Ironic compute drivers handle volume size extension?

2017-04-12 Mathieu Gagné
On Wed, Apr 12, 2017 at 2:54 PM, Matt Riedemann  wrote:
>
> Correct, I thought about this yesterday too. And this should be a detail in
> the Cinder spec for sure, but Cinder should probably have a specific policy
> check for attempting to extend an attached volume. Having said this, I see
> that Cinder has a "volume:extend" policy rule but I don't see it actually
> checked in the code, is that a bug?
>
> But the idea is, you, as a deployer, could allow extending volumes that are
> not attached (using the existing volume:extend policy) but disable the
> ability to extend attached volumes (maybe new rule volume:extend_attached?).
> Then if you're running older computes, or not running libvirt/hyperv
> computes, etc, then you just disable the API entrypoint for the entire
> operation on the Cinder side.
>
> ^ should all be captured in the Cinder spec.
>

It has been added to the Cinder spec:
https://review.openstack.org/#/c/453286/

--
Mathieu

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][cinder] Can all non-Ironic compute drivers handle volume size extension?

2017-04-12 Matt Riedemann

On 4/12/2017 12:30 PM, Mathieu Gagné wrote:
> Thanks for starting this discussion. There is a lot to cover/answer.
>
> On Tue, Apr 11, 2017 at 6:35 PM, Matt Riedemann  wrote:
>>
>> This is not discoverable at the moment, for the end user or cinder, so I'm
>> trying to figure out what the failure mode looks like.
>>
>> This all starts on the cinder side to extend the size of the attached
>> volume. Cinder is going to have to see if Nova is new enough to handle this
>> (via the available API versions) before accepting the request and resizing
>> the volume. Then Cinder sends the event to Nova. This is where it gets
>> interesting.
>>
>> On the Nova side, if all of the computes aren't new enough, we could just
>> fail the request outright with a 409. What does Cinder do then? Roll back the
>> volume resize?
>
> This means an extend volume operation would need to check for Nova
> support first.
> This also means adding a new API call to fetch and discover such
> capabilities per instance (from the associated compute node).
> If we want to catch errors in volume size extension in Nova, we will
> need to find another way; external events are async.


Today cinder can GET /versions from the compute API and tell if it 
should even start attempting volume extend or not for an attached 
volume. If the microversion support isn't there on the compute side, 
cinder should fail fast in the API. That's a detail for the cinder spec.
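A minimal Python sketch of that fail-fast check, assuming Cinder has already fetched and parsed the compute API's version document. The required microversion is a hypothetical placeholder (the thread does not name one), and nova_supports_extend is an illustrative helper, not existing Cinder code.

```python
from typing import Any, Dict, Tuple

# Hypothetical placeholder: the thread does not say which compute API
# microversion would add the attached-volume extend event.
REQUIRED_MICROVERSION: Tuple[int, int] = (2, 99)


def parse_microversion(value: str) -> Tuple[int, int]:
    """Turn "2.42" into (2, 42) so versions compare numerically."""
    major, minor = value.split(".")
    return int(major), int(minor)


def nova_supports_extend(
    versions_doc: Dict[str, Any],
    required: Tuple[int, int] = REQUIRED_MICROVERSION,
) -> bool:
    """Return True if the CURRENT compute API advertises a high enough microversion.

    versions_doc is assumed to look like:
    {"versions": [{"id": "v2.1", "status": "CURRENT",
                   "version": "<max microversion>", "min_version": "2.1"}]}
    """
    for entry in versions_doc.get("versions", []):
        if entry.get("status") != "CURRENT":
            continue
        max_version = entry.get("version") or ""
        if not max_version:
            # An empty "version" means the API has no microversion support.
            return False
        return parse_microversion(max_version) >= required
    # No CURRENT API version advertised: fail fast instead of guessing.
    return False


# Example document, trimmed to the fields used above.
EXAMPLE_VERSIONS = {
    "versions": [
        {"id": "v2.0", "status": "SUPPORTED", "version": "", "min_version": ""},
        {"id": "v2.1", "status": "CURRENT", "version": "2.42", "min_version": "2.1"},
    ]
}

if __name__ == "__main__":
    print(nova_supports_extend(EXAMPLE_VERSIONS, required=(2, 40)))  # True
    print(nova_supports_extend(EXAMPLE_VERSIONS, required=(2, 100)))  # False
```

Parsing into an integer tuple matters here: compared as strings, "2.9" would sort above "2.10".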


Once the request reaches nova, we could technically look up the service 
version for the compute from the API and tell if it's new enough to 
support this capability and fail fast if it won't. I don't know if we'll 
do that, but we have it in our pocket. Either way, the Cinder side 
should handle an error response from Nova and proceed accordingly 
(roll back the volume extend).
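A rough sketch of both halves, with every name hypothetical: the Nova API answers 409 (the code suggested in this thread) when the instance's compute service version is below an assumed minimum, and Cinder rolls the extend back on an error response. The service-version threshold is made up, and the backend resize and Nova notification are passed in as plain callables.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Callable

# Hypothetical threshold: the nova-compute service version that would first
# understand the extend event. The real value would come from the Nova spec.
MIN_COMPUTE_SERVICE_VERSION = 20


def nova_accept_extend_event(compute_service_version: int,
                             minimum: int = MIN_COMPUTE_SERVICE_VERSION) -> HTTPStatus:
    """Nova API side (illustrative): fail fast with 409 if the compute is too old."""
    if compute_service_version < minimum:
        return HTTPStatus.CONFLICT
    # Accepted only means "queued"; the compute handles the event asynchronously.
    return HTTPStatus.OK


@dataclass
class Volume:
    id: str
    size_gb: int


def extend_attached_volume(volume: Volume, new_size_gb: int,
                           resize_backend: Callable[[Volume, int], None],
                           notify_nova: Callable[[Volume], HTTPStatus]) -> bool:
    """Cinder side (illustrative): extend, notify Nova, roll back on an error response."""
    old_size = volume.size_gb
    resize_backend(volume, new_size_gb)
    volume.size_gb = new_size_gb
    status = notify_nova(volume)
    if status >= 400:
        # Whether a backend can really shrink back is backend-specific;
        # this only models the "roll back the volume extend" intent.
        resize_backend(volume, old_size)
        volume.size_gb = old_size
        return False
    return True
```

Keeping the backend and Nova calls as injected callables keeps the rollback decision testable on its own, separate from any real driver.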





>> But let's say the computes are new enough, but the instance is on a compute
>> that does not support the operation. Then what? Do we register an instance
>> fault and put the instance into ERROR state? Then the admin would need to
>> intervene.
>>
>> Are there other ideas? Until we have capabilities (info) exposed out of the
>> API we're stuck with questions like this.
>
> Like TommyLike mentioned in a review, AWS introduced Live Volume
> Modifications available on some instance types.
> On instance types with limited support, you need to stop/start the
> instance or detach/attach the volume.
> On instances started before a certain date, you need to stop/start the
> instance or detach/attach the volume at least once.
> In all cases, the end user needs to extend the partition/filesystem in
> the instance.
>
> They have the luxury of fully controlling the environment and
> synchronizing the compute service with the volume service, perhaps even
> (speculatively) with some kind of bidirectional
> orchestration/synchronization/communication between them.
>
> I have that same luxury since I only support one volume backend and
> virt driver combination.
> But I am now starting to grasp the extent of what adding such a feature
> requires, especially when it implies cross-service support...


Yeah it's super fun isn't it. :) This is why it takes a long time to get 
some features into Nova.




> We have a matrix of compute drivers and volume backends to support,
> with some combinations that might never support online volume
> extension.
> There is a desire for OpenStack to be interoperable between clouds,
> so there is a strong incentive to make it work for all combinations.
>
> I will still take the liberty to ask:
>
> Would it be in the realm of possibility for a deployer to have to
> explicitly enable this feature?
> A deployer would be able to enable such a feature once all
> services/components they chose to deploy fully support online volume
> extension.


Correct, I thought about this yesterday too. And this should be a detail 
in the Cinder spec for sure, but Cinder should probably have a specific 
policy check for attempting to extend an attached volume. Having said 
this, I see that Cinder has a "volume:extend" policy rule but I don't 
see it actually checked in the code, is that a bug?


But the idea is, you, as a deployer, could allow extending volumes that 
are not attached (using the existing volume:extend policy) but disable 
the ability to extend attached volumes (maybe new rule 
volume:extend_attached?). Then if you're running older computes, or not 
running libvirt/hyperv computes, etc, then you just disable the API 
entrypoint for the entire operation on the Cinder side.


^ should all be captured in the Cinder spec.
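To make the policy split concrete, here is a sketch assuming a JSON policy file in oslo.policy syntax, where "!" denies unconditionally. volume:extend_attached is only the rule name floated above, not an existing Cinder rule; admin_or_owner is assumed to be defined elsewhere in the file; and the helpers only show which rule would apply. Real enforcement would go through oslo.policy itself.

```python
import json

# Hypothetical policy file in oslo.policy JSON syntax. "volume:extend_attached"
# is only the rule name proposed in this thread; "!" always denies, and
# "rule:admin_or_owner" refers to a rule assumed to be defined elsewhere.
POLICY_JSON = """
{
    "volume:extend": "rule:admin_or_owner",
    "volume:extend_attached": "!"
}
"""

POLICY = json.loads(POLICY_JSON)


def rule_for_extend(volume_status: str) -> str:
    """Pick the rule to check; Cinder reports attached volumes as "in-use"."""
    return "volume:extend_attached" if volume_status == "in-use" else "volume:extend"


def disabled_outright(policy: dict, rule_name: str) -> bool:
    """True when the deployer has turned the operation off with "!"."""
    return policy.get(rule_name) == "!"


if __name__ == "__main__":
    for status in ("available", "in-use"):
        rule = rule_for_extend(status)
        print(status, rule, "disabled" if disabled_outright(POLICY, rule) else "policy-checked")
```

With older or non-libvirt/Hyper-V computes, the deployer leaves the attached rule at "!" and detached extends keep working under the existing rule.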



> I know it won't address cases where a mix of volume backends and
> virt drivers is deployed.
> So we would still need capabilities discoverability. This includes
> volume type capabilities discoverability, which I'm not sure exists
> today.
>
> Let's not even start on how Horizon will discover such capabilities per
> instance/volume. That's another can of worms. =)
>
> --
> Mathieu


Re: [openstack-dev] [nova][cinder] Can all non-Ironic compute drivers handle volume size extension?

2017-04-12 Mathieu Gagné
Thanks for starting this discussion. There is a lot to cover/answer.

On Tue, Apr 11, 2017 at 6:35 PM, Matt Riedemann  wrote:
>
> This is not discoverable at the moment, for the end user or cinder, so I'm
> trying to figure out what the failure mode looks like.
>
> This all starts on the cinder side to extend the size of the attached
> volume. Cinder is going to have to see if Nova is new enough to handle this
> (via the available API versions) before accepting the request and resizing
> the volume. Then Cinder sends the event to Nova. This is where it gets
> interesting.
>
> On the Nova side, if all of the computes aren't new enough, we could just
> fail the request outright with a 409. What does Cinder do then? Roll back the
> volume resize?

This means an extend volume operation would need to check for Nova
support first.
This also means adding a new API call to fetch and discover such
capabilities per instance (from the associated compute node).
If we want to catch errors in volume size extension in Nova, we will
need to find another way; external events are async.
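A toy model of why the response to an external event cannot carry the compute-side result, assuming only that the API enqueues the event and the compute handles it later. The event dict loosely follows the name/server_uuid/tag shape of Nova's external events request; the event name used here and the FakeNova class are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict


@dataclass
class FakeNova:
    """Toy stand-in for the Nova API plus one compute (hypothetical)."""

    pending: Deque[dict] = field(default_factory=deque)
    outcomes: Dict[str, str] = field(default_factory=dict)

    def post_external_event(self, event: dict) -> int:
        """API side: queue the event and answer right away."""
        self.pending.append(event)
        # 200 only means "accepted"; nothing has run on the compute yet.
        return 200

    def run_compute(self, can_extend: Callable[[str], bool]) -> None:
        """Compute side: handle queued events later, keyed by volume tag."""
        while self.pending:
            event = self.pending.popleft()
            ok = can_extend(event["server_uuid"])
            self.outcomes[event["tag"]] = "completed" if ok else "failed"


if __name__ == "__main__":
    nova = FakeNova()
    status = nova.post_external_event(
        # Event name is made up for illustration.
        {"name": "volume-extended", "server_uuid": "vm-1", "tag": "vol-1"}
    )
    print(status, nova.outcomes)  # the caller sees 200 and no outcome yet
    nova.run_compute(can_extend=lambda server_uuid: False)
    print(nova.outcomes)  # the failure only shows up afterwards
```

Whatever the caller gets back is decided before the compute runs, so errors have to be surfaced some other way (for example, recorded on the instance).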

> But let's say the computes are new enough, but the instance is on a compute
> that does not support the operation. Then what? Do we register an instance
> fault and put the instance into ERROR state? Then the admin would need to
> intervene.
>
> Are there other ideas? Until we have capabilities (info) exposed out of the
> API we're stuck with questions like this.
>

Like TommyLike mentioned in a review, AWS introduced Live Volume
Modifications available on some instance types.
On instance types with limited support, you need to stop/start the
instance or detach/attach the volume.
On instances started before a certain date, you need to stop/start the
instance or detach/attach the volume at least once.
In all cases, the end user needs to extend the partition/filesystem in
the instance.

They have the luxury of fully controlling the environment and
synchronizing the compute service with the volume service, perhaps even
(speculatively) with some kind of bidirectional
orchestration/synchronization/communication between them.

I have that same luxury since I only support one volume backend and
virt driver combination.
But I am now starting to grasp the extent of what adding such a feature
requires, especially when it implies cross-service support...

We have a matrix of compute drivers and volume backends to support,
with some combinations that might never support online volume
extension.
There is a desire for OpenStack to be interoperable between clouds,
so there is a strong incentive to make it work for all combinations.

I will still take the liberty to ask:

Would it be in the realm of possibility for a deployer to have to
explicitly enable this feature?
A deployer would be able to enable such a feature once all
services/components they chose to deploy fully support online volume
extension.

I know it won't address cases where a mix of volume backends and
virt drivers is deployed.
So we would still need capabilities discoverability. This includes
volume type capabilities discoverability, which I'm not sure exists
today.
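One way to picture the discoverability problem is a tri-state support table. The data below is hypothetical and taken only from this thread: libvirt and Hyper-V go through os-brick, nobody has checked PowerVM, VMware or Xen, and Ironic is left out of scope by the subject line, so it is marked NO here. The per-backend boolean stands in for a volume type capability that may not exist today. A deployer-wide switch would only be safe when every deployed combination is a definite yes.

```python
from enum import Enum
from typing import Dict, List


class Support(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# Hypothetical table built from this thread only: libvirt and hyperv go
# through os-brick's extend_volume; powervm/vmware/xen are unknown; ironic
# is out of scope per the thread's subject line.
VIRT_DRIVER_SUPPORT: Dict[str, Support] = {
    "libvirt": Support.YES,
    "hyperv": Support.YES,
    "powervm": Support.UNKNOWN,
    "vmware": Support.UNKNOWN,
    "xen": Support.UNKNOWN,
    "ironic": Support.NO,
}


def can_extend_online(virt_driver: str, backend_supports_online_extend: bool) -> Support:
    """Combine the compute side with a (hypothetical) volume type capability flag."""
    if not backend_supports_online_extend:
        return Support.NO
    return VIRT_DRIVER_SUPPORT.get(virt_driver, Support.UNKNOWN)


def safe_to_enable(deployed_drivers: List[str], backend_flags: List[bool]) -> bool:
    """A deployer-wide switch is only safe if every combination is a definite yes."""
    return all(
        can_extend_online(driver, flag) is Support.YES
        for driver in deployed_drivers
        for flag in backend_flags
    )
```

Treating UNKNOWN like NO in safe_to_enable is the conservative choice: an unverified driver keeps the feature off for the whole deployment.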

Let's not even start on how Horizon will discover such capabilities per
instance/volume. That's another can of worms. =)

--
Mathieu



Re: [openstack-dev] [nova][cinder] Can all non-Ironic compute drivers handle volume size extension?

2017-04-12 Mathieu Gagné
On Wed, Apr 12, 2017 at 12:58 PM, Matt Riedemann  wrote:
>
> I guess we also have the instance action events. So we could avoid putting
> the instance into ERROR state but record an instance action event that the
> extend volume event failed on the compute, so at least the user/admin could
> figure out why it didn't change on the nova side.
>
> What they do after that I'm not sure. Would detaching and then re-attaching
> the now resized volume fix it?
>

There are two steps to the volume size extension: iscsi --rescan and
virsh blockresize.
As an admin, you could call the same API endpoint to retrigger both of
those steps.
If iscsi --rescan succeeds but virsh blockresize fails, stopping and
starting the instance will fix the issue.
This is the step we asked our customer to perform before virsh
blockresize was added to our implementation.
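The two steps and the recovery rule can be sketched as follows. This assumes the rescan is done with open-iscsi's iscsiadm -m session --rescan (the message only says "iscsi --rescan") and that the disk is addressed by its target name; the domain name and helper functions are hypothetical. virsh blockresize reads a bare number as KiB, so the size is passed with a "B" suffix for bytes.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Returns the exit status of a command; injectable so the logic can be tested.
Runner = Callable[[Sequence[str]], int]


def extend_steps(domain: str, disk_target: str, new_size_bytes: int) -> List[List[str]]:
    """Build the two host-side commands (illustrative, not Nova's code).

    Step 1 assumes open-iscsi's session rescan stands in for "iscsi --rescan".
    Step 2 uses virsh blockresize; a bare size means KiB, so "B" marks bytes.
    """
    return [
        ["iscsiadm", "-m", "session", "--rescan"],
        ["virsh", "blockresize", domain, disk_target, f"{new_size_bytes}B"],
    ]


def run_steps(commands: Sequence[Sequence[str]], runner: Runner) -> Optional[int]:
    """Run the steps in order; return the index of the first failure, or None."""
    for index, cmd in enumerate(commands):
        if runner(cmd) != 0:
            return index
    return None


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Real runner: executes the command and reports its exit status."""
    return subprocess.run(list(cmd)).returncode


if __name__ == "__main__":
    # Hypothetical libvirt domain and disk target; dry run with a fake runner
    # in which blockresize fails after a successful rescan.
    steps = extend_steps("instance-00000001", "vdb", 20 * 1024**3)
    failed = run_steps(steps, runner=lambda cmd: 1 if cmd[0] == "virsh" else 0)
    print(failed)  # 1: rescan worked, blockresize did not
```

A failed index of 1 is the case described above: the rescan worked but blockresize did not, so retriggering the API call or stopping and starting the instance is the way out.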

--
Mathieu



Re: [openstack-dev] [nova][cinder] Can all non-Ironic compute drivers handle volume size extension?

2017-04-12 Matt Riedemann

On 4/11/2017 5:35 PM, Matt Riedemann wrote:
> I'm reading through mgagne's spec to support attached volume size
> extension callbacks in Nova [1] and the question that comes up is what
> happens when the backend compute does not support this, either because
> it's too old (Ocata) or the virt driver does not support the event?
>
> The spec is targeted at libvirt to use os-brick, but the hyper-v driver
> has also been using os-brick since Ocata, and the Windows connector
> supports the extend_volume operation, so that should work.
>
> I don't know about powervm, vmware or xen though.
>
> This is not discoverable at the moment, for the end user or cinder, so
> I'm trying to figure out what the failure mode looks like.
>
> This all starts on the cinder side to extend the size of the attached
> volume. Cinder is going to have to see if Nova is new enough to handle
> this (via the available API versions) before accepting the request and
> resizing the volume. Then Cinder sends the event to Nova. This is where
> it gets interesting.
>
> On the Nova side, if all of the computes aren't new enough, we could
> just fail the request outright with a 409. What does Cinder do then?
> Roll back the volume resize?
>
> But let's say the computes are new enough, but the instance is on a
> compute that does not support the operation. Then what? Do we register
> an instance fault and put the instance into ERROR state? Then the admin
> would need to intervene.
>
> Are there other ideas? Until we have capabilities (info) exposed out of
> the API we're stuck with questions like this.
>
> [1] https://review.openstack.org/#/c/453272/



I guess we also have the instance action events. So we could avoid 
putting the instance into ERROR state but record an instance action 
event that the extend volume event failed on the compute, so at least 
the user/admin could figure out why it didn't change on the nova side.
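A sketch of that alternative, with all class and event names hypothetical (modeled loosely on Nova's instance actions, whose events carry a Success/Error result): the compute-side handler records a failed action event and leaves the instance's vm_state alone instead of setting ERROR.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical event name; Nova's real naming may differ.
EXTEND_EVENT = "compute_extend_volume"


@dataclass
class ActionEvent:
    event: str
    result: str  # "Success" or "Error"
    details: str = ""


@dataclass
class Instance:
    uuid: str
    vm_state: str = "active"
    action_events: List[ActionEvent] = field(default_factory=list)


def handle_extend_event(instance: Instance, volume_id: str,
                        extend: Callable[[str], None]) -> None:
    """Compute side (illustrative): record the outcome, never flip to ERROR."""
    try:
        extend(volume_id)
    except Exception as exc:  # any driver failure is recorded, not raised
        instance.action_events.append(
            ActionEvent(EXTEND_EVENT, "Error", f"volume {volume_id}: {exc}"))
        return
    instance.action_events.append(ActionEvent(EXTEND_EVENT, "Success"))
```

The instance stays usable, and the user or admin can read the recorded event to see why the new size never showed up on the Nova side.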


What they do after that I'm not sure. Would detaching and then 
re-attaching the now resized volume fix it?


--

Thanks,

Matt



[openstack-dev] [nova][cinder] Can all non-Ironic compute drivers handle volume size extension?

2017-04-11 Matt Riedemann
I'm reading through mgagne's spec to support attached volume size 
extension callbacks in Nova [1] and the question that comes up is what 
happens when the backend compute does not support this, either because 
it's too old (Ocata) or the virt driver does not support the event?


The spec is targeted at libvirt to use os-brick, but the hyper-v driver 
has also been using os-brick since Ocata, and the Windows connector 
supports the extend_volume operation, so that should work.


I don't know about powervm, vmware or xen though.

This is not discoverable at the moment, for the end user or cinder, so 
I'm trying to figure out what the failure mode looks like.


This all starts on the cinder side to extend the size of the attached 
volume. Cinder is going to have to see if Nova is new enough to handle 
this (via the available API versions) before accepting the request and 
resizing the volume. Then Cinder sends the event to Nova. This is where 
it gets interesting.


On the Nova side, if all of the computes aren't new enough, we could 
just fail the request outright with a 409. What does Cinder do then? 
Roll back the volume resize?


But let's say the computes are new enough, but the instance is on a 
compute that does not support the operation. Then what? Do we register 
an instance fault and put the instance into ERROR state? Then the admin 
would need to intervene.


Are there other ideas? Until we have capabilities (info) exposed out of 
the API we're stuck with questions like this.


[1] https://review.openstack.org/#/c/453272/

--

Thanks,

Matt
