Re: [openstack-dev] [cinder][nova] proper syncing of cinder volume state

2014-12-01 Thread Duncan Thomas
John:

Here are the states where the driver can/should do some cleanup work
during the transition:

attaching -> available or error
detaching -> available or error
error -> available or error
deleting -> deleted or error_deleting

Also possibly wanted in future, but much harder:
backing_up -> available or error (need to make sure the backup service
copes)
restoring -> error (need to make sure the backup service copes)

I haven't looked at the entire state space yet; these are the obvious
ones off the top of my head.
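The transition list above can be sketched as a simple whitelist map. The states come straight from this email; the map and helper function are illustrative only, not actual Cinder code:

```python
# Whitelist of the cleanup transitions listed above.  The states are
# taken from this email; the structure is illustrative, not Cinder code.
CLEANUP_TRANSITIONS = {
    "attaching": {"available", "error"},
    "detaching": {"available", "error"},
    "error": {"available", "error"},
    "deleting": {"deleted", "error_deleting"},
}


def is_valid_cleanup(current_state, target_state):
    """Return True if resetting current_state to target_state is allowed."""
    return target_state in CLEANUP_TRANSITIONS.get(current_state, set())
```

Anything not in the map (e.g. jumping straight to "in-use") would be rejected rather than written blindly to the DB.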


On 1 December 2014 at 06:30, John Griffith  wrote:

> Hey Scott,
>
> Thanks for posting this to the ML, I stated my opinion on the spec,
> but for completeness:
> My feeling is that reset-state has morphed into something entirely
> different than originally intended.  That's actually great, nothing
> wrong there at all.  I strongly disagree with the statements that
> "setting the status in the DB only is almost always the wrong thing to
> do".  The whole point was to allow the state to be changed in the DB
> so the item could in most cases be deleted.  There was never an intent
> (that I'm aware of) to make this some sort of uber resync and heal API
> call.
>
> All of that history aside, I think it would be great to add some
> driver interaction here.  I am however very unclear on what that would
> actually include.  For example, would you let a Volume's state be
> changed from "Error-Attaching" to "In-Use" and just run through the
> process of retrying an attach?  To me that seems like a bad idea.  I'm
> much happier with the current state of changing the state from "Error"
> to "Available" (and NOTHING else) so that an operation can be retried,
> or the resource can be deleted.  If you start allowing any state
> transition (which sadly we've started to do) you're almost never going
> to get things correct.  This also covers almost every situation even
> though it means you have to explicitly retry operations or steps (I
> don't think that's a bad thing) and make the code significantly more
> robust IMO (we have some issues lately with things being robust).
>
> My proposal would be to go back to limiting the things you can do with
> reset-state (basically make it so you can only release the resource back
> to available) and add the driver interaction to clean up any mess if
> possible.  This could be a simple driver call added like
> "make_volume_available" whereby the driver just ensures that there are
> no attachments; honestly, nothing else comes to mind as
> being something the driver cares about here. The final option then
> being to add some more power to force-delete.
>
> Is there anything other than attach that matters from a driver?  If
> people are talking error-recovery that to me is a whole different
> topic and frankly I think we need to spend more time preventing errors
> as opposed to trying to recover from them via new API calls.
>
> Curious to see if any other folks have input here?
>
> John
>
> 

Re: [openstack-dev] [cinder][nova] proper syncing of cinder volume state

2014-11-30 Thread John Griffith
On Fri, Nov 28, 2014 at 11:25 AM, D'Angelo, Scott  wrote:
> A Cinder blueprint has been submitted to allow the python-cinderclient to
> involve the back end storage driver in resetting the state of a cinder
> volume:
>
> https://blueprints.launchpad.net/cinder/+spec/reset-state-with-driver
>
> and the spec:
>
> https://review.openstack.org/#/c/134366
>
>
>
> This blueprint contains various use cases for a volume that may be listed in
> the Cinder database in state detaching|attaching|creating|deleting.
>
> The proposed solution involves augmenting the python-cinderclient command
> ‘reset-state’, but other options are listed, including those that
>
> involve Nova, since the state of a volume in the Nova XML found in
> /etc/libvirt/qemu/.xml may also be out-of-sync with the
>
> Cinder DB or storage back end.
>
>
>
> A related proposal for adding a new non-admin API for changing volume status
> from ‘attaching’ to ‘error’ has also been proposed:
>
> https://review.openstack.org/#/c/137503/
>
>
>
> Some questions have arisen:
>
> 1) Should ‘reset-state’ command be changed at all, since it was originally
> just to modify the Cinder DB?
>
> 2) Should ‘reset-state’ be fixed to prevent the naïve admin from changing
> the Cinder DB to be out-of-sync with the back end storage?
>
> 3) Should ‘reset-state’ be kept the same, but augmented with new options?
>
> 4) Should a new command be implemented, with possibly a new admin API to
> properly sync state?
>
> 5) Should Nova be involved? If so, should this be done as a separate body of
> work?
>
>
>
> This has proven to be a complex issue and there seems to be a good bit of
> interest. Please provide feedback, comments, and suggestions.
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

Hey Scott,

Thanks for posting this to the ML, I stated my opinion on the spec,
but for completeness:
My feeling is that reset-state has morphed into something entirely
different than originally intended.  That's actually great, nothing
wrong there at all.  I strongly disagree with the statements that
"setting the status in the DB only is almost always the wrong thing to
do".  The whole point was to allow the state to be changed in the DB
so the item could in most cases be deleted.  There was never an intent
(that I'm aware of) to make this some sort of uber resync and heal API
call.

All of that history aside, I think it would be great to add some
driver interaction here.  I am however very unclear on what that would
actually include.  For example, would you let a Volume's state be
changed from "Error-Attaching" to "In-Use" and just run through the
process of retrying an attach?  To me that seems like a bad idea.  I'm
much happier with the current state of changing the state from "Error"
to "Available" (and NOTHING else) so that an operation can be retried,
or the resource can be deleted.  If you start allowing any state
transition (which sadly we've started to do) you're almost never going
to get things correct.  This also covers almost every situation even
though it means you have to explicitly retry operations or steps (I
don't think that's a bad thing) and make the code significantly more
robust IMO (we have some issues lately with things being robust).

My proposal would be to go back to limiting the things you can do with
reset-state (basically make it so you can only release the resource back
to available) and add the driver interaction to clean up any mess if
possible.  This could be a simple driver call added like
"make_volume_available" whereby the driver just ensures that there are
no attachments; honestly, nothing else comes to mind as
being something the driver cares about here. The final option then
being to add some more power to force-delete.
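A minimal sketch of the "make_volume_available" hook John floats here: only the method name comes from the email; the base-class shape and the helper calls are assumptions, not the real Cinder driver interface.

```python
# Sketch of the "make_volume_available" hook suggested above.  Only the
# method name comes from the email; the class shape and helpers are
# hypothetical, not the actual Cinder driver API.
class BaseVolumeDriver:
    def make_volume_available(self, volume):
        """Best-effort cleanup so `volume` can safely go to 'available'.

        Per the proposal, the driver only ensures the backend holds no
        attachments (exports, initiator groups, and the like) for this
        volume; everything else is left alone.
        """
        for attachment in list(self._backend_attachments(volume)):
            self._terminate_connection(volume, attachment)

    def _backend_attachments(self, volume):
        """Return backend-side attachment records for the volume."""
        raise NotImplementedError

    def _terminate_connection(self, volume, attachment):
        """Tear down one backend-side attachment."""
        raise NotImplementedError
```

A concrete driver would override the two hypothetical helpers with its backend's own export/unexport calls.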

Is there anything other than attach that matters from a driver?  If
people are talking error-recovery that to me is a whole different
topic and frankly I think we need to spend more time preventing errors
as opposed to trying to recover from them via new API calls.

Curious to see if any other folks have input here?

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] proper syncing of cinder volume state

2014-11-28 Thread D'Angelo, Scott
A Cinder blueprint has been submitted to allow the python-cinderclient to 
involve the back end storage driver in resetting the state of a cinder volume:
https://blueprints.launchpad.net/cinder/+spec/reset-state-with-driver
and the spec:
https://review.openstack.org/#/c/134366

This blueprint contains various use cases for a volume that may be listed in 
the Cinder database in state detaching|attaching|creating|deleting.
The proposed solution involves augmenting the python-cinderclient command 
'reset-state', but other options are listed, including those that
involve Nova, since the state of a volume in the Nova XML found in 
/etc/libvirt/qemu/.xml may also be out-of-sync with the
Cinder DB or storage back end.

A related proposal for adding a new non-admin API for changing volume status 
from 'attaching' to 'error' has also been proposed:
https://review.openstack.org/#/c/137503/

Some questions have arisen:
1) Should 'reset-state' command be changed at all, since it was originally just 
to modify the Cinder DB?
2) Should 'reset-state' be fixed to prevent the naïve admin from changing the 
Cinder DB to be out-of-sync with the back end storage?
3) Should 'reset-state' be kept the same, but augmented with new options?
4) Should a new command be implemented, with possibly a new admin API to 
properly sync state?
5) Should Nova be involved? If so, should this be done as a separate body of 
work?

This has proven to be a complex issue and there seems to be a good bit of 
interest. Please provide feedback, comments, and suggestions.
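The gap under discussion can be sketched as two layers: today's reset-state only overwrites the DB record, while the blueprint would involve the driver first. All names below are illustrative, not actual Cinder internals:

```python
# Illustrative sketch of the gap under discussion: today's reset-state
# touches only the DB record, while the blueprint would involve the
# storage driver first.  All names here are hypothetical.
def reset_state_db_only(db, volume_id, new_status):
    """Current behavior: overwrite the status column, nothing else."""
    db[volume_id]["status"] = new_status


def reset_state_with_driver(db, driver, volume_id, new_status):
    """Proposed behavior: the driver cleans up before the DB is updated."""
    driver.cleanup_for_state(volume_id, new_status)
    db[volume_id]["status"] = new_status
```

The first form is exactly why a naive admin can drive the DB out of sync with the back end; the second keeps the DB write, but only after the back end has had a chance to reconcile.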
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev