Public bug reported:

Sometimes a dirty BDM entry like the following (1. row) can be seen in the database: multiple BDMs with the same volume_id and instance_uuid.
mysql> select * from block_device_mapping where volume_id='153bcab4-1f88-440c-9782-3c661a7502a8' \G
*************************** 1. row ***************************
           created_at: 2017-02-02 02:28:45
           updated_at: NULL
           deleted_at: NULL
                   id: 9754
          device_name: /dev/vdb
delete_on_termination: 0
          snapshot_id: NULL
            volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8
          volume_size: NULL
            no_device: NULL
      connection_info: NULL
        instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc
              deleted: 0
          source_type: volume
     destination_type: volume
         guest_format: NULL
          device_type: NULL
             disk_bus: NULL
           boot_index: NULL
             image_id: NULL
*************************** 2. row ***************************
           created_at: 2017-02-02 02:29:31
           updated_at: 2017-02-27 10:59:42
           deleted_at: NULL
                   id: 9757
          device_name: /dev/vdc
delete_on_termination: 0
          snapshot_id: NULL
            volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8
          volume_size: NULL
            no_device: NULL
      connection_info: {"driver_volume_type": "rbd", "serial": "153bcab4-1f88-440c-9782-3c661a7502a8", "data": {"secret_type": "ceph", "name": "cinder-ceph/volume-153bcab4-1f88-440c-9782-3c661a7502a8", "secret_uuid": null, "qos_specs": null, "hosts": ["10.7.1.202", "10.7.1.203", "10.7.1.204"], "auth_enabled": true, "access_mode": "rw", "auth_username": "cinder-ceph", "ports": ["6789", "6789", "6789"]}}
        instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc
              deleted: 0
          source_type: volume
     destination_type: volume
         guest_format: NULL
          device_type: disk
             disk_bus: virtio
           boot_index: NULL
             image_id: NULL

Detaching the volume then fails with the following error, because the connection_info of row 1 is NULL:

2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher     self._detach_volume(context, instance, bdm)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4801, in _detach_volume
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher     connection_info = jsonutils.loads(bdm.connection_info)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 215, in loads
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher     return json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in safe_decode
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher     raise TypeError("%s can't be decoded" % type(text))
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher TypeError: <type 'NoneType'> can't be decoded

This kind of dirty data can be produced when volume_bdm.destroy() in _attach_volume() [1] happens to fail. I think either of these conditions can cause it:

1. losing the database connection during volume_bdm.destroy()
2. losing the MQ connection, or an RPC timeout, during volume_bdm.destroy()

If you lose the database during any operation, things are going to be bad, so in general I'm not sure how realistic guarding against that case is. Losing an MQ connection or an RPC timeout is probably more realistic. The fix [2] seems to address point 2. However, I'm thinking we could bypass the dirty BDM entry based on the condition that its connection_info is NULL, no matter how it was produced. A sketch of that idea follows.
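For illustration only, a minimal sketch of such a guard, assuming duck-typed BDM objects with volume_id and connection_info attributes. The helper names (pick_bdm_for_detach, load_connection_info) are hypothetical, not Nova's actual API; the point is just to treat a BDM whose connection_info is NULL as an orphaned record instead of passing it to jsonutils.loads():

    from oslo_serialization import jsonutils

    def pick_bdm_for_detach(bdms, volume_id):
        """Choose which BDM to detach for volume_id, skipping dirty rows.

        A BDM whose connection_info is NULL never completed an attach
        (e.g. a leftover row from a failed volume_bdm.destroy()), so it
        holds no attach state and should not shadow the valid row.
        """
        candidates = [b for b in bdms if b.volume_id == volume_id]
        valid = [b for b in candidates if b.connection_info is not None]
        # Prefer the row that actually carries connection_info; if only
        # dirty rows exist, there is nothing to detach on the host side.
        return valid[0] if valid else None

    def load_connection_info(bdm):
        # Guard against the NULL column seen in the traceback above:
        # jsonutils.loads(None) raises TypeError because None cannot
        # be decoded as a string.
        if bdm.connection_info is None:
            return None
        return jsonutils.loads(bdm.connection_info)

With the two rows above, this would skip row 1 (id 9754) and select row 2 (id 9757), so the detach would proceed with the real RBD connection_info.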
[1] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3724
[2] https://review.openstack.org/#/c/290793

** Affects: nova
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1681998

Title:
  Bypass the dirty BDM entry no matter how it is produced

Status in OpenStack Compute (nova):
  New