I am developing a Cinder plugin and recently ran into a problem while running 
the Cinder driver certification test 
(devstack/driver_certs/cinder_driver_cert.sh).

Here is what happened and my understanding of the problem (please correct me 
if I am wrong).

From the test logs I can see the following information:

{2} 
tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments
 [238.283153s] ... FAILED
.....
Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "tempest/api/compute/volumes/test_attach_volume.py", line 46, in 
_detach
        self.volumes_client.wait_for_volume_status(volume_id, 'available')
      File "tempest/services/volume/json/volumes_client.py", line 182, in 
wait_for_volume_status
        raise exceptions.TimeoutException(message)
    TimeoutException: Request timed out
    Details: Volume bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a failed to reach 
available status (current: detaching) within the required time (196 s).
    Traceback (most recent call last):
    _StringException: Empty attachments:
      stderr
      stdout

When I check what the test case actually does, I see the following:

44         if self.attachment:
45             self.servers_client.detach_volume(server_id, volume_id)
46             self.volumes_client.wait_for_volume_status(volume_id, 'available')

So the volume failed to detach within the timeout period.

To put it simply, the failed test does the following:

1. Attach a volume
2. Wait for the volume to be attached (by checking that the volume status is 
'in-use')
3. Detach the volume
4. Wait for the volume to be detached (by checking that the volume status is 
'available')
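The four steps above can be sketched as a small, self-contained simulation. 
FakeVolumesClient is a hypothetical stand-in for tempest's volumes client; the 
real wait_for_volume_status lives in 
tempest/services/volume/json/volumes_client.py:

```python
import time

class FakeVolumesClient:
    """Hypothetical stand-in for tempest's volumes client."""
    def __init__(self):
        self.status = 'available'

    def attach(self):
        self.status = 'in-use'

    def detach(self):
        self.status = 'available'

    def get_status(self):
        return self.status

def wait_for_volume_status(client, expected, timeout=5, interval=0.1):
    """Poll until the volume reaches the expected status, or raise on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if client.get_status() == expected:
            return
        time.sleep(interval)
    raise TimeoutError('volume failed to reach %s status (current: %s)'
                       % (expected, client.get_status()))

# The failing test's flow: attach, wait for 'in-use', detach, wait for 'available'.
client = FakeVolumesClient()
client.attach()
wait_for_volume_status(client, 'in-use')
client.detach()
wait_for_volume_status(client, 'available')
```

In my case the fourth step is where the TimeoutException fires, because the 
volume stays in 'detaching' and never reaches 'available'.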

In nova-compute's log I see an exception that seems to be the reason for this 
timeout. It happens when the detach request is received by nova-compute:

1226617609] Exception during message handling: <type 'NoneType'> can't be 
decoded
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher Traceback 
(most recent call last):
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
142, in _dispatch_and_reply
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
executor_callback))
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
186, in _dispatch
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
executor_callback)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 
130, in _do_dispatch
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     result = 
func(ctxt, **new_args)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/compute/manager.py", line 414, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return 
function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/exception.py", line 88, in wrapped
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     payload)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in 
__exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/exception.py", line 71, in wrapped
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return 
f(self, context, *args, **kw)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/compute/manager.py", line 298, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     pass
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in 
__exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/compute/manager.py", line 283, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return 
function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/compute/manager.py", line 326, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
kwargs['instance'], e, sys.exc_info())
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in 
__exit__
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
six.reraise(self.type_, self.value, self.tb)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/compute/manager.py", line 314, in decorated_function
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return 
function(self, context, *args, **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/compute/manager.py", line 4730, in detach_volume
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
self._detach_volume(context, instance, bdm)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/opt/stack/nova/nova/compute/manager.py", line 4671, in _detach_volume
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     
connection_info = jsonutils.loads(bdm.connection_info)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 
213, in loads
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     return 
json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in 
safe_decode
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher     raise 
TypeError("%s can't be decoded" % type(text))
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher TypeError: 
<type 'NoneType'> can't be decoded
2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher
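The failure at the bottom of this trace is easy to reproduce in isolation: 
oslo's safe_decode rejects anything that is not a string, so 
jsonutils.loads(None) raises exactly this TypeError. A simplified 
re-implementation (not the actual oslo code) behaves the same way:

```python
import json

def safe_decode(text, incoming='utf-8'):
    """Simplified sketch of oslo_utils.encodeutils.safe_decode: anything
    that is not str/bytes is rejected with the TypeError seen above."""
    if not isinstance(text, (str, bytes)):
        raise TypeError("%s can't be decoded" % type(text))
    if isinstance(text, bytes):
        return text.decode(incoming)
    return text

def loads(s, **kwargs):
    """Simplified sketch of oslo_serialization's jsonutils.loads: decode, then parse."""
    return json.loads(safe_decode(s), **kwargs)

# With connection_info populated, parsing succeeds:
loads('{"driver_volume_type": "disco"}')

# With connection_info=None, as in the BlockDeviceMapping printed below,
# we hit the same TypeError (on Python 3 the message reads
# "<class 'NoneType'> can't be decoded"):
try:
    loads(None)
except TypeError as exc:
    print(exc)
```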

When I run only this specific test case, the problem does not occur.

Hence I added some logs in nova-compute to find out the difference between 
these two executions. From those, I can see that the connection_info field is 
None when the error is triggered.

This log is printed when nova receives the detach request (just before parsing 
the connection_info field):
2015-02-26 13:50:00.572 AUDIT nova.compute.manager 
[req-9930623e-c073-4c5b-bcaf-f4098245277f AttachVolumeTestJSON-1822846528 
AttachVolumeTestJSON-1226617609] print connection info : 
BlockDeviceMapping(boot_index=None,connection_info=None,created_at=2015-02-26T05:49:36Z,delete_on_termination=False,deleted=False,deleted_at=None,destination_type='volume',device_name='/dev/vdb',device_type=None,disk_bus=None,guest_format=None,id=166,image_id=None,instance=<?>,instance_uuid=9aeea261-30af-46eb-9fbb-53d03b65b43f,no_device=False,snapshot_id=None,source_type='volume',updated_at=None,volume_id='bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a',volume_size=None)

Then I tried to figure out why this field was None and went back to the 
previous step: attach volume. From my basic understanding, when nova attaches a 
volume, it updates two things (source: nova/virt/libvirt/block_device.py):

1. The volume status (in Cinder's database)

2. The connection_info field (after the call to virt_driver.attach_volume)
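A toy model of the suspected race (all names are hypothetical, not the real 
nova code): if the volume status flips to 'in-use' before connection_info is 
persisted, a detach issued immediately after the status change observes 
connection_info=None, exactly as in the traceback above:

```python
import threading
import time

class FakeBDM:
    """Hypothetical stand-in for nova's BlockDeviceMapping row."""
    def __init__(self):
        self.connection_info = None

class FakeVolume:
    """Hypothetical stand-in for the cinder-side volume record."""
    def __init__(self):
        self.status = 'available'

def attach(volume, bdm, driver_delay=0.5):
    # Step 1: the 'in-use' status becomes visible to the test script ...
    volume.status = 'in-use'
    # ... while virt_driver.attach_volume is still running ...
    time.sleep(driver_delay)
    # Step 2: ... and connection_info is only persisted afterwards.
    bdm.connection_info = '{"driver_volume_type": "disco"}'

def detach(volume, bdm):
    # Mirrors _detach_volume: a None connection_info cannot be decoded.
    if bdm.connection_info is None:
        raise TypeError("%s can't be decoded" % type(bdm.connection_info))
    volume.status = 'available'

volume, bdm = FakeVolume(), FakeBDM()
worker = threading.Thread(target=attach, args=(volume, bdm))
worker.start()
while volume.status != 'in-use':   # the test polls for 'in-use' ...
    time.sleep(0.01)
try:
    detach(volume, bdm)            # ... and detaches immediately
    raced = False
except TypeError:
    raced = True
worker.join()
```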
From the logs in Cinder (c-api.log):

The first time the volume status is reported as 'in-use' is at 13:49:59.491 
(in cinder-api's log):
2-26 13:49:59.491 31396 INFO cinder.api.v1.volumes 
[req-dc56cd34-8ae3-42b0-8c35-8c01cd142955 690795b5c8c74e4fb6d292eeebef12d5 
b49a40aa157e460592f3e3ac9a9f16eb - - -] vol={'migration_status': None, 
'provider_id': None, 'availability_zone': u'nova', 'terminated_at': None, 
'updated_at': datetime.datetime(2015, 2, 26, 5, 49, 58), 'provider_geometry': 
None, 'replication_extended_status': None, 'replication_status': u'disabled', 
'snapshot_id': None, 'ec2_id': None, 'mountpoint': u'/dev/vdb', 'deleted_at': 
None, 'id': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', 'size': 1L, 'user_id': 
u'690795b5c8c74e4fb6d292eeebef12d5', 'attach_time': 
u'2015-02-26T05:49:52.128113', 'attached_host': None, 'display_description': 
None, 'volume_admin_metadata': 
u'[<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ed8cd10>, 
<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ef789d0>]', 
'project_id': u'b49a40aa157e460592f3e3ac9a9f16eb', 'launched_at': 
datetime.datetime(2015, 2, 26, 5, 49, 35), 'scheduled_at': 
datetime.datetime(2015, 2, 26, 5, 49, 33), 'status': u'in-use', 
'volume_type_id': None, 'deleted': False, 'provider_location': u'999993877', 
'host': u'cwisr3s18#disco', 'consistencygroup_id': None, 'source_volid': None, 
'provider_auth': None, 'display_name': u'test', 'instance_uuid': 
u'9aeea261-30af-46eb-9fbb-53d03b65b43f', 'bootable': False, 'created_at': 
datetime.datetime(2015, 2, 26, 5, 49, 32), 'attach_status': u'attached', 
'volume_type': None, 'consistencygroup': None, 'volume_metadata': [], 
'_name_id': None, 'encryption_key_id': None, 'replication_driver_data': None, 
'metadata': {u'readonly': u'False', u'attached_mode': u'rw'}}


This approximately matches the time at which the test script saw the 'in-use' status:

2015-02-26 13:49:59,558 31513 DEBUG    [tempest_lib.common.rest_client] Request 
(AttachVolumeTestJSON:test_list_get_volume_attachments): 200 GET 
http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a
 0.099s
        Request - Headers: {'Content-Type': 'application/json', 'Accept': 
'application/json', 'X-Auth-Token': '<omitted>'}
            Body: None
        Response - Headers: {'status': '200', 'content-length': '748', 
'content-location': 
'http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a',
 'x-compute-request-id': 'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955', 
'connection': 'close', 'date': 'Thu, 26 Feb 2015 05:49:59 GMT', 'content-type': 
'application/json', 'x-openstack-request-id': 
'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955'}
            Body: {"volume": {"status": "in-use", "display_name": "test", 
"attachments": [{"host_name": null, "device": "/dev/vdb", "server_id": 
"9aeea261-30af-46eb-9fbb-53d03b65b43f", "id": 
"bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "volume_id": 
"bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a"}], "availability_zone": "nova", 
"bootable": "false", "encrypted": false, "created_at": 
"2015-02-26T05:49:32.000000", "os-vol-tenant-attr:tenant_id": 
"b49a40aa157e460592f3e3ac9a9f16eb", "os-volume-replication:driver_data": null, 
"display_description": null, "os-volume-replication:extended_status": null, 
"volume_type": null, "snapshot_id": null, "source_volid": null, "metadata": 
{"readonly": "False", "attached_mode": "rw"}, "id": 
"bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "size": 1}}

After that, the test script calls detach volume and we hit the error mentioned 
above.

Additionally, I added a log at the very last line of the attach function in 
nova/virt/block_device.py in order to check the connection_info:

    @update_db
    def attach(self, context, instance, volume_api, virt_driver,
               do_check_attach=True, do_driver_attach=False):

From nova-compute's log I can see that this log line is printed at:

2015-02-26 13:50:02.003 AUDIT nova.virt.block_device 
[req-fd8124ca-d608-44a0-a1b9-7cb933fd840f AttachVolumeTestJSON-1822846528 
AttachVolumeTestJSON-1226617609] test thelo 7 {u'driver_volume_type': u'disco', 
'serial': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', u'data': {'device_path': 
u'/dev/dms999993877', u'access_mode': u'rw', u'disco_id': u'999993877', 
u'qos_specs': None, u'name': u'volume-bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a'}}

This means that detach_volume is called before the attach_volume function 
actually ends and updates the connection_info field in the database.

To confirm this theory, I added a "sleep" before detaching the volume in the 
tempest test and ran the test again. This time the test passed.

Does this logic seem correct? If so, is this a bug? Shouldn't we guarantee 
that the connection_info is saved to the database before we update the volume 
status?
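In other words, the ordering I would expect is something like the following 
sketch (hypothetical names, not the actual nova code): persist connection_info 
first, and only then expose the 'in-use' status, so a detach can never observe 
a half-finished attach:

```python
class BDM:
    """Hypothetical stand-in for the BlockDeviceMapping database row."""
    def __init__(self):
        self.connection_info = None

    def save(self, connection_info):
        self.connection_info = connection_info

class Volume:
    """Hypothetical stand-in for the cinder-side volume record."""
    def __init__(self):
        self.status = 'available'

def attach_fixed(volume, bdm, connection_info):
    bdm.save(connection_info)   # 1. persist connection_info first
    volume.status = 'in-use'    # 2. only then expose the new status

volume, bdm = Volume(), BDM()
attach_fixed(volume, bdm, '{"driver_volume_type": "disco"}')
# Invariant: whenever the status is 'in-use', connection_info is set.
assert bdm.connection_info is not None
```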

Thanks for taking the time to read this,

Thelo


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
