Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

On 3/1/2016 11:36 PM, John Griffith wrote:
> On Tue, Mar 1, 2016 at 3:48 PM, Murray, Paul (HP Cloud) <pmur...@hpe.com> wrote:
> [snip: quoted exchange between Scott D'Angelo, Matt Riedemann, Walter A. Boring IV, and John Garbutt; the messages appear in full later in this thread.]
Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

On Tue, Mar 1, 2016 at 3:48 PM, Murray, Paul (HP Cloud) <pmur...@hpe.com> wrote:
> [snip: Paul Murray's message, quoted in full; it appears later in this thread.]
Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

> -----Original Message-----
> From: D'Angelo, Scott
>
> Matt, changing Nova to store the connector info at volume attach time does
> help. Where the gap will remain is after Nova evacuation or live migration,

This will happen with shelve as well I think. Volumes are not detached in shelve IIRC.

> when that info will need to be updated in Cinder. We need to change the
> Cinder API to have some mechanism to allow this.
> [snip: the rest of Scott's message and the earlier thread; they appear in full later.]
Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

Matt, changing Nova to store the connector info at volume attach time does help. Where the gap will remain is after Nova evacuation or live migration, when that info will need to be updated in Cinder. We need to change the Cinder API to have some mechanism to allow this. We'd also like Cinder to store the appropriate info to allow a force-detach for the cases where Nova cannot make the call to Cinder.

Ongoing work for this and related issues is tracked and discussed here:
https://etherpad.openstack.org/p/cinder-nova-api-changes

Scott D'Angelo (scottda)

From: Matt Riedemann [mrie...@linux.vnet.ibm.com]
Sent: Monday, February 29, 2016 7:48 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

[snip: Matt Riedemann's message, quoted in full; it appears later in this thread.]
Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

On 2/22/2016 4:08 PM, Walter A. Boring IV wrote:
> [snip: Walter's reply to John Garbutt; it appears in full later in this thread.]

Regarding storing off the initial connector information from the attach, does this [1] help bridge the gap? That adds the connector dict to the connection_info dict that is serialized and stored in the nova block_device_mappings table, and then in that patch is used to pass it to terminate_connection in the case that the host has changed.

[1] https://review.openstack.org/#/c/266095/

--

Thanks,

Matt Riedemann
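A minimal sketch of the approach Matt describes above (embedding the attach-time connector inside the serialized connection_info, and preferring that saved copy at detach time when the instance has moved hosts). The helper names here are illustrative only, not actual Nova code:

```python
# Sketch, under assumptions: save_connection_info / connector_for_detach
# are invented names, not Nova APIs. The idea is the one in review [1]:
# keep the connector gathered at attach time alongside the rest of the
# serialized connection_info in the block_device_mappings row, so a later
# detach can hand Cinder the connector that matches the original export.
import json


def save_connection_info(connection_info, connector):
    """Embed the attach-time connector before serializing to the BDM row."""
    connection_info = dict(connection_info)
    connection_info['connector'] = connector
    return json.dumps(connection_info)


def connector_for_detach(serialized_info, current_connector):
    """Pick the connector to pass to Cinder's terminate_connection.

    If the host recorded at attach time differs from the current host
    (e.g. after evacuate or live migration), the saved connector is the
    one the backend needs to tear down the original export.
    """
    info = json.loads(serialized_info)
    saved = info.get('connector')
    if saved and saved.get('host') != current_connector.get('host'):
        return saved
    return current_connector
```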
Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

On 22 February 2016 at 22:08, Walter A. Boring IV wrote:
> [snip]
> So, the plan at the Mitaka summit was to add this new API, but it required
> microversions to land, which we still don't have in Cinder's API today.

Ah, OK. We do keep looping back to that core issue.

Thanks,
johnthetubaguy
Re: [openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

On 02/22/2016 11:24 AM, John Garbutt wrote:
> [snip: John's original message; it appears in full at the end of this thread.]

The problem is a little more complicated.

In order for cinder backends to be able to do a force detach correctly, the Cinder driver needs to have the correct 'connector' dictionary passed in to terminate_connection. That connector dictionary is the collection of initiator side information which is gleaned here:
https://github.com/openstack/os-brick/blob/master/os_brick/initiator/connector.py#L99-L144

The plan was to save that connector information in the Cinder volume_attachment table. When a force detach is called, Cinder has the existing connector saved if Nova doesn't have it. The problem was live migration. When you migrate to the destination n-cpu host, the connector that Cinder had is now out of date. There is no API in Cinder today to allow updating an existing attachment.

So, the plan at the Mitaka summit was to add this new API, but it required microversions to land, which we still don't have in Cinder's API today.

Walt
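As a reader aid, the connector dictionary Walt refers to is roughly shaped like the following. The values are invented, and the exact set of keys depends on the transport (iSCSI vs FC) and the os-brick version, so treat this as an illustration rather than a spec:

```python
# Illustrative shape of the 'connector' dict that os-brick gathers on the
# compute host (os_brick/initiator/connector.py, linked above) and that
# Cinder drivers expect in terminate_connection. All values are made up.
example_connector = {
    'platform': 'x86_64',
    'os_type': 'linux2',
    'ip': '192.0.2.10',            # initiator-side IP (RFC 5737 example range)
    'host': 'compute-01',          # n-cpu hostname; goes stale after migration
    'initiator': 'iqn.1993-08.org.debian:01:abc123',  # iSCSI IQN
    'multipath': False,
    'wwpns': ['50014380242b9751'],  # FC port WWNs, present only with an HBA
    'wwnns': ['50014380242b9750'],  # FC node WWNs
}
```

The 'host' key is the crux of the live-migration problem described above: after the instance lands on another n-cpu host, every initiator-side value Cinder saved at attach time describes the wrong machine.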
[openstack-dev] [nova][cinder] volumes stuck detaching attaching and force detach

Hi,

Just came up on IRC: when nova-compute gets killed half way through a volume attach (i.e. no graceful shutdown), things get stuck in a bad state, like volumes stuck in the attaching state.

This looks like a new addition to this conversation:
http://lists.openstack.org/pipermail/openstack-dev/2015-December/082683.html

And brings us back to this discussion:
https://blueprints.launchpad.net/nova/+spec/add-force-detach-to-nova

What if we move our attention towards automatically recovering from the above issue? I am wondering if we can look at making our usual recovery code deal with the above situation:
https://github.com/openstack/nova/blob/834b5a9e3a4f8c6ee2e3387845fc24c79f4bf615/nova/compute/manager.py#L934

Did we get the Cinder APIs in place that enable the force-detach? I think we did and it was this one?
https://blueprints.launchpad.net/python-cinderclient/+spec/nova-force-detach-needs-cinderclient-api

I think diablo_rojo might be able to help dig for any bugs we have related to this. I just wanted to get this idea out there before I head out.

Thanks,
John

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
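For context on what the force-detach call discussed in this thread looks like on the wire: Cinder exposes it as an os-force_detach volume action. The sketch below only builds the request; the endpoint shape, tenant handling, and whether a connector field is accepted vary by release, so check the Cinder API reference before relying on any of it:

```python
# Hedged sketch: construct the URL and JSON body for Cinder's
# os-force_detach volume action. The endpoint and tenant values are
# placeholders; whether 'connector' (and e.g. 'attachment_id') may be
# supplied in the action body depends on the Cinder release.


def force_detach_request(volume_id, connector=None, tenant_id='TENANT',
                         endpoint='http://cinder.example:8776/v2'):
    url = '%s/%s/volumes/%s/action' % (endpoint, tenant_id, volume_id)
    body = {'os-force_detach': {}}
    if connector is not None:
        # Supplying the connector lets the backend tear down the right
        # export even when Cinder's saved copy is stale.
        body['os-force_detach']['connector'] = connector
    return url, body
```

A caller would POST `body` to `url` with an admin token; by default the os-force_detach action is admin-only via policy.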