I am developing as cinder plugin and I recently faced one problem while running
the cinder driver's certification test (
devstack/driver_certs/cinder_driver_cert.sh )
Here is what happened and my understanding of the problem (correct me if I am
wrong)
>From the test logs I can see the following information:
{2}
tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments
[238.283153s] ... FAILED
.....
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
File "tempest/api/compute/volumes/test_attach_volume.py", line 46, in
_detach
self.volumes_client.wait_for_volume_status(volume_id, 'available')
File "tempest/services/volume/json/volumes_client.py", line 182, in
wait_for_volume_status
raise exceptions.TimeoutException(message)
TimeoutException: Request timed out
Details: Volume bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a failed to reach
available status (current: detaching) within the required time (196 s).
Traceback (most recent call last):
_StringException: Empty attachments:
stderr
stdout
when I check what the test case actually does, I see the following
44 if self.attachment:
45 self.servers_client.detach_volume(server_id, volume_id)
46 self.volumes_client.wait_for_volume_status(volume_id,
'available')
So the volume failed to detach within the timeout period.
To do simple the test that failed does the following:
1 attach a volume
2 wait for the volume to be attached ( by checking the volume status to be
'in-use')
3 detach the volume
4 wait for the volume to be detached ( by checking the volume status to be
'available')
In nova-compute's log I see an Exception that seems to be the reason why we get
this timeout. This happens when the detach request is received by nova-compute:
1226617609] Exception during message handling: <type 'NoneType'> can't be
decoded
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher Traceback
(most recent call last):
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line
142, in _dispatch_and_reply
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
executor_callback))
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line
186, in _dispatch
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
executor_callback)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line
130, in _do_dispatch
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher result =
func(ctxt, **new_args)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/compute/manager.py", line 414, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return
function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/exception.py", line 88, in wrapped
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher payload)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in
__exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/exception.py", line 71, in wrapped
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return
f(self, context, *args, **kw)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/compute/manager.py", line 298, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher pass
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in
__exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/compute/manager.py", line 283, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return
function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/compute/manager.py", line 326, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
kwargs['instance'], e, sys.exc_info())
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in
__exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/compute/manager.py", line 314, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return
function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/compute/manager.py", line 4730, in detach_volume
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
self._detach_volume(context, instance, bdm)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/opt/stack/nova/nova/compute/manager.py", line 4671, in _detach_volume
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
connection_info = jsonutils.loads(bdm.connection_info)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line
213, in loads
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return
json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File
"/usr/local/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in
safe_decode
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher raise
TypeError("%s can't be decoded" % type(text))
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher TypeError:
<type 'NoneType'> can't be decoded
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
When I run only this specific test case, the problem does not occur.
Hence I added some logs in nova-compute to find out the difference between
these two executions. From those, I can see that the connection_info field is
None when the Error is triggered.
This log is printed when nova receive the detach request ( just before parsing
the connection_info field ).
2015-02-26 13:50:00.572 AUDIT nova.compute.manager
[req-9930623e-c073-4c5b-bcaf-f4098245277f AttachVolumeTestJSON-1822846528
AttachVolumeTestJSON-1226617609] print connection info :
BlockDeviceMapping(boot_index=None,connection_info=None,created_at=2015-02-26T05:49:36Z,delete_on_termination=False,deleted=False,deleted_at=None,destination_type='volume',device_name='/dev/vdb',device_type=None,disk_bus=None,guest_format=None,id=166,image_id=None,instance=<?>,instance_uuid=9aeea261-30af-46eb-9fbb-53d03b65b43f,no_device=False,snapshot_id=None,source_type='volume',updated_at=None,volume_id='bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a',volume_size=None)
Then I tried to figure out why this field was None and went back to the
previous step : attach volume. From my basic understanding, when nova attach a
volume, it will update two things ( source : nova/virt/libvirt/block_device.py)
- 1 The volume status ( in cinder's database )
- 2 The connection_info ( after the call to virt_driver.attach_volume )
>From the logs in cinder (c-api.log )
The first time the volume status is mentioned as 'in-use' is at 13:49:59.491 in
(cinder-api's log)
2-26 13:49:59.491 31396 INFO cinder.api.v1.volumes
[req-dc56cd34-8ae3-42b0-8c35-8c01cd142955 690795b5c8c74e4fb6d292eeebef12d5
b49a40aa157e460592f3e3ac9a9f16eb - - -] vol={'migration_status': None,
'provider_id': None, 'availability_zone': u'nova', 'terminated_at': None,
'updated_at': datetime.datetime(2015, 2, 26, 5, 49, 58), 'provider_geometry':
None, 'replication_extended_status': None, 'replication_status': u'disabled',
'snapshot_id': None, 'ec2_id': None, 'mountpoint': u'/dev/vdb', 'deleted_at':
None, 'id': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', 'size': 1L, 'user_id':
u'690795b5c8c74e4fb6d292eeebef12d5', 'attach_time':
u'2015-02-26T05:49:52.128113', 'attached_host': None, 'display_description':
None, 'volume_admin_metadata':
u'[<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ed8cd10>,
<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ef789d0>]',
'project_id': u'b49a40aa157e460592f3e3ac9a9f16eb', 'launched_at':
datetime.datetime(2015, 2, 26, 5, 49, 35), 'scheduled_at':
datetime.datetime(2015, 2, 26, 5, 49, 33), 'status': u'in-use',
'volume_type_id': None, 'deleted': False, 'provider_location': u'999993877',
'host': u'cwisr3s18#disco', 'consistencygroup_id': None, 'source_volid': None,
'provider_auth': None, 'display_name': u'test', 'instance_uuid':
u'9aeea261-30af-46eb-9fbb-53d03b65b43f', 'bootable': False, 'created_at':
datetime.datetime(2015, 2, 26, 5, 49, 32), 'attach_status': u'attached',
'volume_type': None, 'consistencygroup': None, 'volume_metadata': [],
'_name_id': None, 'encryption_key_id': None, 'replication_driver_data': None,
'metadata': {u'readonly': u'False', u'attached_mode': u'rw'}}
Which matches approximately the time the test script got the in-use status
2015-02-26 13:49:59,558 31513 DEBUG [tempest_lib.common.rest_client] Request
(AttachVolumeTestJSON:test_list_get_volume_attachments): 200 GET
http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a
0.099s
Request - Headers: {'Content-Type': 'application/json', 'Accept':
'application/json', 'X-Auth-Token': '<omitted>'}
Body: None
Response - Headers: {'status': '200', 'content-length': '748',
'content-location':
'http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a',
'x-compute-request-id': 'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955',
'connection': 'close', 'date': 'Thu, 26 Feb 2015 05:49:59 GMT', 'content-type':
'application/json', 'x-openstack-request-id':
'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955'}
Body: {"volume": {"status": "in-use", "display_name": "test",
"attachments": [{"host_name": null, "device": "/dev/vdb", "server_id":
"9aeea261-30af-46eb-9fbb-53d03b65b43f", "id":
"bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "volume_id":
"bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a"}], "availability_zone": "nova",
"bootable": "false", "encrypted": false, "created_at":
"2015-02-26T05:49:32.000000", "os-vol-tenant-attr:tenant_id":
"b49a40aa157e460592f3e3ac9a9f16eb", "os-volume-replication:driver_data": null,
"display_description": null, "os-volume-replication:extended_status": null,
"volume_type": null, "snapshot_id": null, "source_volid": null, "metadata":
{"readonly": "False", "attached_mode": "rw"}, "id":
"bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "size": 1}}
After that the test script will call detach volume and we will face the error
previously mentioned
Besides I also added one log at the very last line of the function attach in
nova/virt/block_device.py in order to check the connection info
@update_db
def attach(self, context, instance, volume_api, virt_driver,
do_check_attach=True, do_driver_attach=False):
>From Nova-cpu's log I can see that the log is displayed at
2015-02-26 13:50:02.003 AUDIT nova.virt.block_device
[req-fd8124ca-d608-44a0-a1b9-7cb933fd840f AttachVolumeTestJSON-1822846528
AttachVolumeTestJSON-1226617609] test thelo 7 {u'driver_volume_type': u'disco',
'serial': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', u'data': {'device_path':
u'/dev/dms999993877', u'access_mode': u'rw', u'disco_id': u'999993877',
u'qos_specs': None, u'name': u'volume-bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a'}}
Which means that the detach_volume is called before attach_volume functions
actually ends and potentially update the connection_info field in the database.
To confirm this logic, I added a "sleep" before we detach the volume in tempest
test and run the test again. This time the test passed.
Does this logic seem correct? If yes, is that a bug and shouldn't we guarantee
that the connection_info is saved into the database before we update the volume
status?
Thanks for taking the time to read this,
Thelo
====================================================================
This email may contain confidential information. Please do not use or disclose
it in any way and delete it if you are not the intended recipient.
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev