On 12/2/2015 11:37 AM, Rosa, Andrea (HP Cloud Services) wrote:
Hi
thanks, Sean, for raising this point. I have been working on the change and on
the (abandoned) spec.
I'll try to summarize here all the discussions we had and what we decided.
From: Sean Dague [mailto:s...@dague.net]
Sent: 02 December 2015 13:31
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [nova] what are the key errors with volume detach
This patch, which adds a bunch of logic to nova-manage for forcing volume
detach, raised a bunch of questions:
https://review.openstack.org/#/c/184537/24/nova/cmd/manage.py,cm
On this specific review there are some valid concerns that I am happy to
address, but first we need to understand whether we want this change at all.
FWIW I think it is still a valid change; please see below.
In thinking about this for the last day, I think the real concern is that we
have so many safety checks on volume delete that, if we fail with a partially
set up volume, we have too many safety latches to tear it down again.
Do we have some detailed bugs about how that happens? Is it possible to
just fix DELETE to work correctly even when we're in these odd states?
In a simplified view of a volume detach, we can say that the nova code does:
1. detach the volume from the instance
2. inform cinder about the detach and call terminate_connection on the
cinder API
3. delete the bdm record in the nova DB
We actually:
1. terminate the connection in cinder:
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2312
2. detach the volume
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2315
3. delete the volume (if marked for delete_on_termination):
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L2348
4. delete the bdm in the nova db:
https://github.com/openstack/nova/blob/c4ca1abb4a49bf0bce765acd3ce906bd117ce9b7/nova/compute/manager.py#L908
So if terminate_connection fails, we shouldn't get to detach. And if
detach fails, we shouldn't get to delete.
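In pseudocode, that ordering looks roughly like the sketch below (the function
and parameter names are simplified stand-ins for the manager code linked
above, not the real signatures):

def cleanup_volume(driver, volume_api, db, context, instance, bdm):
    connector = driver.get_volume_connector(instance)
    # 1. tell cinder to tear down the export/connection for this host
    volume_api.terminate_connection(context, bdm.volume_id, connector)
    # 2. mark the volume as detached on the cinder side
    volume_api.detach(context, bdm.volume_id)
    # 3. delete the volume if it was created with delete_on_termination
    if bdm.delete_on_termination:
        volume_api.delete(context, bdm.volume_id)
    # 4. finally remove the bdm row from the nova DB
    db.block_device_mapping_destroy(context, bdm.id)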
If step 2 fails, the volume gets stuck in a 'detaching' status and any further
attempt to delete or detach it will fail:
"Delete for volume <volume_id> failed: Volume <volume_id> is still attached, detach
volume first. (HTTP 400)"
And if you try to detach:
"EROR (BadRequest): Invalid input received: Invalid volume: Unable to detach volume.
Volume status must be 'in-use' and attach_status must be 'attached' to detach. Currently:
status: 'detaching', attach_status: 'attached.' (HTTP 400)"
At the moment the only way to clean up the situation is to hack the nova DB to
delete the bdm record, plus some hacking on the cinder side as well.
We wanted a way to clean up the situation that avoids the manual hack to the
nova DB.
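For concreteness, the manual fix usually boils down to soft-deleting the stuck
row, something like the sketch below (table and column names follow the nova
schema, where soft delete sets deleted = id; the DSN and volume id are
placeholders):

import sqlalchemy as sa

engine = sa.create_engine("mysql+pymysql://nova:***@dbhost/nova")
with engine.begin() as conn:
    conn.execute(
        sa.text("UPDATE block_device_mapping "
                "SET deleted = id, deleted_at = NOW() "
                "WHERE volume_id = :vol AND deleted = 0"),
        {"vol": "<volume_id>"},
    )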
Can't cinder rollback state somehow if it's bogus or failed an
operation? For example, if detach failed, shouldn't we not be in
'detaching' state? This is like auto-reverting task_state on server
instances when an operation fails so that we can reset or delete those
servers if needed.
Solution proposed #1
Move the deletion of the bdm record so that it happens before calling cinder.
I thought that was OK: from the nova side we would be done, with no leaked
bdm, and the problem would be confined to the cinder side. But I was wrong.
We have to call terminate_connection, otherwise the device may show up again
on the nova host; for example, that is true for iSCSI volumes:
"if an iSCSI session from the compute host to the storage backend still exists
(because other volumes are connected), then the volume you just removed will show back up
on the next scsi bus rescan."
The key point here is that Nova must call terminate_connection because only
Nova has the "connector info" needed for that call, so Cinder can't fix it.
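A short sketch of that call path, following the manager code linked above (the
wrapper function itself is illustrative):

def terminate_volume_connection(manager, context, instance, volume_id):
    # The connector describes this compute host's side of the transport
    # (iSCSI initiator name, host IP, ...). Only the virt driver on the
    # compute host can produce it, which is why cinder alone can't do this.
    connector = manager.driver.get_volume_connector(instance)
    # e.g. {'ip': '10.0.0.5', 'host': 'compute1',
    #       'initiator': 'iqn.1993-08.org.debian:01:abcdef'}
    manager.volume_api.terminate_connection(context, volume_id, connector)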
Solution proposed #2
Then I thought: OK, let's expose a new nova API, "force delete volume", which
skips all the checks and allows detaching a volume in 'detaching' status. I
thought it was OK, but I was wrong (again).
The main concern here is that we do not want to have the concept of a "force
delete": the user already asked to detach the volume, and that call should be
idempotent and just work.
So adding a new API would just add technical debt to the REST API to paper
over a buggy/weak interaction between the Cinder API and Nova; in other words,
we would be adding a Nova API to fix a bug in Cinder, which is very odd.
Solution proposed #3
OK, so the solution is to fix the Cinder API and make the interaction between
the Nova volume manager and that API robust.
This time I was right (YAY), but as you can imagine this fix is not going to
be an easy one, and after talking with the Cinder guys they clearly told me
that it is going to be a massive change in the Cinder API which is unlikely to
land in the N(utella) or O(melette) release.
Solution proposed #4: The trade-off
This is the solution I am proposing in the patch you mentioned. The idea is to
have a temporary solution which gives operators a handy tool to easily fix a
problem that can occur quite often.
It should have a low impact on the nova code and be easy to remove once we
have a proper solution for the root cause.
"quick", "useful", "dirty", "tool", "trade-off", "db"....we call it
nova-manage!
The idea is to put a new method in compute/api.py which skips all the checks
on the volume status and goes ahead with calling detach_volume on the compute
manager to detach the volume, call terminate_connection and clean up the bdm
entry.
But I thought calling detach on the cinder API when the volume is
already in detaching status would be a failure? Or is that only the
check in the nova API?
nova-manage will have a command to call that new method directly.
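A self-contained sketch of the proposed shape, not the actual patch (the
method name and wiring are assumptions based on the description above):

class ComputeAPI:
    def __init__(self, volume_api, compute_rpcapi):
        self.volume_api = volume_api
        self.compute_rpcapi = compute_rpcapi

    def force_detach_volume(self, context, instance, volume):
        # Deliberately skip check_detach()/volume-status validation and
        # hand straight off to the compute manager, which detaches the
        # device, calls terminate_connection and removes the bdm row.
        self.compute_rpcapi.detach_volume(context, instance=instance,
                                          volume_id=volume['id'])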
That is a recap that I hope helps explain how we decided on nova-manage
instead of the other solutions. It was a bit long, but I tried to condense the
comments from 53 spec patches + 24 code change patches (and counting).
PS: I added the cinder tag as they are interested in this change as well.
Thanks
--
Andrea Rosa
As Sean pointed out in another reply, I feel like what we're really missing
here is some rollback code for the case where delete fails, so we don't get
into this stuck state and have to rely on deleting the BDMs manually in the
database just to delete the instance.
We should roll back on the first failed delete so that a second delete request
can pass the 'check attach' checks again.
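A sketch of that rollback, assuming the cinder 'os-roll_detaching' action (the
surrounding function and parameter names are illustrative):

def detach_with_rollback(driver, volume_api, context, instance, volume_id):
    try:
        driver.detach_volume(instance, volume_id)
    except Exception:
        # revert 'detaching' back to 'in-use' so the next detach/delete
        # request is not rejected by the status checks quoted above
        volume_api.roll_detaching(context, volume_id)
        raise
    volume_api.detach(context, volume_id)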
--
Thanks,
Matt Riedemann