[Yahoo-eng-team] [Bug 1835926] Re: Volume attachment may fail after rescuing instance on an image with different hw_disk_bus
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Changed in: nova/stein
   Status: New => In Progress

** Changed in: nova/rocky
   Status: New => In Progress

** Changed in: nova/queens
   Status: New => In Progress

** Changed in: nova/train
   Assignee: (unassigned) => Alexandre arents (aarents)

** Changed in: nova/stein
   Assignee: (unassigned) => Alexandre arents (aarents)

** Changed in: nova/rocky
   Assignee: (unassigned) => Alexandre arents (aarents)

** Changed in: nova/queens
   Assignee: (unassigned) => Alexandre arents (aarents)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1835926

Title:
  Volume attachment may fail after rescuing instance on an image with
  different hw_disk_bus

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  Fix Released

Bug description:
  Description
  ===========

  It looks like rescue may update instances.root_device_name if the
  rescue image has a different disk bus (image property hw_disk_bus)
  than the instance. This introduces a mismatch between the device name
  and the driver used for the instance: during guest config generation,
  nova guesses the disk bus driver from
  instance_system_metadata.image_hw_disk_bus, but gets the root device
  name from instances.root_device_name.
  Because of this mismatch, a cinder attachment may fail with the
  following error message in the compute log:

    unable to execute QEMU command 'device_add': Duplicate ID
    'virtio-disk0' for device

  A probable solution is to prevent the rescue action from updating
  instance.root_device_name.

  Steps to reproduce
  ==================

  On a fresh master devstack:

  $ openstack image save cirros-0.4.0-x86_64-disk --file /tmp/cirros-0.4.0-x86_64-disk.disk

  # create a new image, but a SCSI one:
  $ openstack image create --container-format bare --disk-format qcow2 \
      --file /tmp/cirros-0.4.0-x86_64-disk.disk \
      --property hw_disk_bus='scsi' --property hw_scsi_model='virtio-scsi' \
      cirros-0.4.0-x86_64-scsi-disk

  # create an instance with the default virtio driver:
  $ openstack server create --flavor m1.small --image cirros-0.4.0-x86_64-disk --nic net-id=private test

  mysql> select root_device_name from instances where uuid='xxx'
  /dev/vda

  # rescue the instance, but with the SCSI image:
  $ openstack server rescue --image cirros-0.4.0-x86_64-scsi-disk

  mysql> select root_device_name from instances where uuid='xxx'
  /dev/sda

  $ openstack server unrescue

  # root_device_name is still sda; it should be vda according to the instance metadata
  mysql> select root_device_name from instances where uuid='xxx'
  /dev/sda

  $ virsh dumpxml instance-0001 | grep "bus='virtio"

  # at the next hard reboot, a new XML is generated with the SCSI device
  # name BUT with the virtio driver.
  $ openstack server reboot --hard xxx
  $ virsh dumpxml instance-0001 | grep -A 1 "bus='virtio"

  $ openstack volume create --size 10 test
  $ openstack server add volume 1c9b1582-5fc7-417a-a8a0-387e8833731f 0621430c-b0d2-4cca-8868-f86f36f1ef29

  $ sudo journalctl -u devstack@n-cpu.service | grep Duplicate
  Jul 05 09:29:54 alex-devstack-compute2 nova-compute[28285]: ERROR
  nova.virt.libvirt.driver [None req-38714989-4deb-4a05-bdfc-3418edbda7e3
  demo demo] [instance: 1c9b1582-5fc7-417a-a8a0-387e8833731f] Failed to
  attach volume at mountpoint: /dev/vda: libvirtError: internal error:
  unable to execute QEMU command 'device_add': Duplicate ID
  'virtio-disk0' for device

  The error probably comes from the fact that nova looks up the next
  available virtio device by name, which is vda -> virtio-disk0 (as the
  root device is currently sda); but because the root device sda is
  already using virtio-disk0, the attach fails.

  Expected result
  ===============

  The instance root_device_name should remain the same as before
  rescue/unrescue, regardless of the image used for rescuing.

  Actual result
  =============

  The instance root_device_name is updated according to the hw_disk_bus
  property of the image used during rescue (and never set back to its
  original value).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1835926/+subscriptions
-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.ne
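The duplicate-ID failure above can be modeled in a few lines of Python. This is a hedged simplification with hypothetical helper names, not nova internals: the root disk's libvirt alias follows the virtio bus index, while the "next free" device name is derived from the stale root_device_name.

```python
# Simplified model of the aliasing clash (hypothetical, not nova code).

def virtio_alias_for(device_name):
    """Map a vdX device name to a libvirt virtio alias, e.g. vda -> virtio-disk0."""
    index = ord(device_name[-1]) - ord('a')
    return f"virtio-disk{index}"

# After the buggy rescue, the DB records the root disk as /dev/sda, but the
# image metadata still says virtio, so the generated XML attaches the root
# disk with the virtio driver under alias 'virtio-disk0'.
root_alias = "virtio-disk0"

# When attaching a volume, the next free vdX *name* is computed from the
# recorded block devices; since the root is recorded as sda, vda looks free,
# and its alias collides with the root disk's.
new_volume_alias = virtio_alias_for("vda")

assert new_volume_alias == root_alias  # duplicate ID -> QEMU rejects device_add
```

Keeping root_device_name at its build-time value (/dev/vda) makes the name-based lookup skip vda, so no collision occurs.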
[Yahoo-eng-team] [Bug 1835926] Re: Volume attachment may fail after rescuing instance on an image with different hw_disk_bus
Reviewed:  https://review.opendev.org/67
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b
Submitter: Zuul
Branch:    master

commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b
Author: Alexandre Arents
Date:   Tue Jul 9 16:13:01 2019 +

    Do not update root_device_name during guest config

    _get_guest_config() currently updates instance.root_device_name and
    is called from many paths, such as: _hard_reboot(), rescue(),
    spawn(), resume(), finish_migration(), finish_revert_migration().

    This is an issue because root_device_name is initially set during
    the instance build and should remain the same afterwards:

    manager.py: _do_build_and_run_instance()
                .. _default_block_device_names() <- here
                .. driver.spawn()

    This may lead to edge cases, like rescue, where the value can be
    mistakenly updated to reflect the disk bus property of the rescue
    image (hw_disk_bus). Furthermore, a _get* method should not modify
    the instance object.

    Note that the test test_get_guest_config_bug_1118829 is removed
    because it is no longer relevant to the current code.

    Change-Id: I1787f9717618d0837208844e8065840d30341cf7
    Closes-Bug: #1835926

** Changed in: nova
   Status: In Progress => Fix Released
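The commit's point that a _get* method should not mutate the object it reads can be illustrated with a generic before/after sketch. The names below are hypothetical and the logic is deliberately minimal; this is not the actual nova diff.

```python
# Hypothetical sketch of the anti-pattern the fix removes (not nova code).

class Instance:
    def __init__(self, root_device_name):
        self.root_device_name = root_device_name

def get_guest_config_buggy(instance, image_default_root):
    # Anti-pattern: building the config overwrites persistent state as a
    # side effect, so generating config against a rescue image with a
    # different hw_disk_bus clobbers root_device_name for good.
    instance.root_device_name = image_default_root
    return {"root": instance.root_device_name}

def get_guest_config_fixed(instance, image_default_root):
    # The config is derived from the stored value; the instance record is
    # left untouched, keeping the value set once at build time.
    return {"root": instance.root_device_name}

inst = Instance("/dev/vda")
get_guest_config_buggy(inst, "/dev/sda")
print(inst.root_device_name)   # /dev/sda -- silently changed

inst = Instance("/dev/vda")
get_guest_config_fixed(inst, "/dev/sda")
print(inst.root_device_name)   # /dev/vda -- unchanged
```

With the fixed shape, rescue can still build a guest config for the SCSI rescue image without leaving a stale /dev/sda behind after unrescue.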