[Yahoo-eng-team] [Bug 1835926] Re: Volume attachment may fail after rescuing instance on an image with different hw_disk_bus

2019-12-06 Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/train
   Status: New => Fix Released

** Changed in: nova/stein
   Status: New => In Progress

** Changed in: nova/rocky
   Status: New => In Progress

** Changed in: nova/queens
   Status: New => In Progress

** Changed in: nova/train
 Assignee: (unassigned) => Alexandre arents (aarents)

** Changed in: nova/stein
 Assignee: (unassigned) => Alexandre arents (aarents)

** Changed in: nova/rocky
 Assignee: (unassigned) => Alexandre arents (aarents)

** Changed in: nova/queens
 Assignee: (unassigned) => Alexandre arents (aarents)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1835926

Title:
  Volume attachment may fail after rescuing instance on an image with
  different hw_disk_bus

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  Fix Released

Bug description:
  Description
  ===========

  It looks like rescue may update instances.root_device_name if the rescue
  image has a different disk bus (image property hw_disk_bus) than the
  instance. This introduces a mismatch between the device name and the disk
  bus used by the instance:

  During instance config generation, nova guesses the disk bus driver from
  instance_system_metadata.image_hw_disk_bus, but gets the root device name
  from instances.root_device_name. Because of this mismatch, the cinder
  volume attachment may fail with the following error message in the
  compute log:
   unable to execute QEMU command 'device_add': Duplicate ID 'virtio-disk0' for device

  A probable solution is to prevent the rescue action from updating
  instance.root_device_name.
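  The following is a minimal Python sketch of the mismatch described above.
  It uses a greatly simplified model in which the bus is read from the
  image_hw_disk_bus system-metadata value and the target name from
  instances.root_device_name. The names PREFIX_TO_BUS, bus_implied_by_name
  and root_disk_config, and the 'virtio' default when the property is
  absent, are hypothetical and do not come from nova.

```python
# Hypothetical, simplified model; not nova's actual implementation.
PREFIX_TO_BUS = {"vd": "virtio", "sd": "scsi", "hd": "ide"}


def bus_implied_by_name(device_name: str) -> str:
    """Return the bus a name such as '/dev/vda' conventionally implies."""
    short = device_name.rsplit("/", 1)[-1]  # '/dev/vda' -> 'vda'
    return PREFIX_TO_BUS.get(short[:2], "unknown")


def root_disk_config(system_metadata: dict, root_device_name: str) -> dict:
    """Combine the two inputs the way the report describes.

    The bus comes from the image property saved at build time (assumed
    default: 'virtio'), while the target name comes from
    instances.root_device_name, which rescue may have overwritten.
    """
    bus = system_metadata.get("image_hw_disk_bus", "virtio")
    target = root_device_name.rsplit("/", 1)[-1]
    return {
        "target_dev": target,
        "bus": bus,
        "consistent": bus_implied_by_name(root_device_name) == bus,
    }


# Original build: virtio image and /dev/vda agree.
print(root_disk_config({"image_hw_disk_bus": "virtio"}, "/dev/vda"))
# {'target_dev': 'vda', 'bus': 'virtio', 'consistent': True}

# After rescue/unrescue with the SCSI image: name is /dev/sda, bus still virtio.
print(root_disk_config({"image_hw_disk_bus": "virtio"}, "/dev/sda"))
# {'target_dev': 'sda', 'bus': 'virtio', 'consistent': False}
```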

  Steps to reproduce
  ==================

  On a fresh master devstack:
  $ openstack image save cirros-0.4.0-x86_64-disk --file /tmp/cirros-0.4.0-x86_64-disk.disk
  # Create a new image, but a SCSI one:
  $ openstack image create --container-format bare --disk-format qcow2 --file /tmp/cirros-0.4.0-x86_64-disk.disk --property hw_disk_bus='scsi' --property hw_scsi_model='virtio-scsi' cirros-0.4.0-x86_64-scsi-disk
  # Create an instance with the default virtio driver:
  $ openstack server create --flavor m1.small --image cirros-0.4.0-x86_64-disk --nic net-id=private test
  mysql> select root_device_name from instances where uuid='xxx';
  /dev/vda
  # Rescue the instance, but with the SCSI image:
  $ openstack server rescue --image cirros-0.4.0-x86_64-scsi-disk xxx
  mysql> select root_device_name from instances where uuid='xxx';
  /dev/sda
  $ openstack server unrescue xxx
  # root_device_name is still /dev/sda; according to the instance metadata it should be /dev/vda
  mysql> select root_device_name from instances where uuid='xxx';
  /dev/sda
  $ virsh dumpxml instance-0001 | grep "bus='virtio"
  
  
  # At the next hard reboot, new XML is generated with the SCSI device name BUT with the virtio bus.
  $ openstack server reboot --hard xxx
  $ virsh dumpxml instance-0001 | grep -A 1 "bus='virtio"
  
  
  $ openstack volume create --size 10 test
  $ openstack server add volume 1c9b1582-5fc7-417a-a8a0-387e8833731f 0621430c-b0d2-4cca-8868-f86f36f1ef29
  $ sudo journalctl -u devstack@n-cpu.service | grep Duplicate
  Jul 05 09:29:54 alex-devstack-compute2 nova-compute[28285]: ERROR nova.virt.libvirt.driver [None req-38714989-4deb-4a05-bdfc-3418edbda7e3 demo demo] [instance: 1c9b1582-5fc7-417a-a8a0-387e8833731f] Failed to attach volume at mountpoint: /dev/vda: libvirtError: internal error: unable to execute QEMU command 'device_add': Duplicate ID 'virtio-disk0' for device
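  The following Python sketch shows one way to spot such a disk in domain
  XML. The XML excerpt is hypothetical and is modeled on the description: a
  root disk named sda on the virtio bus. The EXPECTED_PREFIX table and the
  find_mismatched_disks helper are illustrative assumptions and are not
  part of nova or libvirt.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of `virsh dumpxml` output after the hard reboot.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/nova/instances/xxx/disk'/>
      <target dev='sda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

# Conventional device-name prefix for each bus (simplified assumption).
EXPECTED_PREFIX = {"virtio": "vd", "scsi": "sd", "ide": "hd", "sata": "sd"}


def find_mismatched_disks(domain_xml: str) -> list:
    """Return (dev, bus) pairs whose target name does not match the bus prefix."""
    root = ET.fromstring(domain_xml.strip())
    mismatches = []
    for target in root.findall("./devices/disk/target"):
        dev, bus = target.get("dev", ""), target.get("bus", "")
        prefix = EXPECTED_PREFIX.get(bus)
        # Unknown buses are skipped rather than reported.
        if prefix is not None and not dev.startswith(prefix):
            mismatches.append((dev, bus))
    return mismatches


print(find_mismatched_disks(SAMPLE_DOMAIN_XML))  # [('sda', 'virtio')]
```

  The real output of `virsh dumpxml instance-0001` could be passed in place
  of the sample string.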

  The error probably comes from the fact that nova looks up the next
  available virtio device by name. Because the root device is currently
  named sda, that lookup returns vda, which maps to virtio-disk0. The root
  device sda is already using virtio-disk0, so the attachment fails.
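  The collision can be reproduced with a toy model, sketched in Python
  below. It assumes two things. First, the next device name is chosen by
  looking only at names already in use for the target bus. Second, the
  alias follows the '<bus>-disk<index>' pattern visible in the error
  message, with the index taken from the letters after the two-letter
  prefix, so vda and sda both map to 0. The function names are
  hypothetical; nova's real lookup and libvirt's alias assignment are more
  involved.

```python
import string

# Simplified, hypothetical model; not nova's or libvirt's actual code.
BUS_PREFIX = {"virtio": "vd", "scsi": "sd"}


def disk_index(dev: str) -> int:
    """Index from the letters after a two-letter prefix: vda/sda -> 0, vdb -> 1, vdaa -> 26."""
    index = 0
    for ch in dev[2:]:
        index = index * 26 + (string.ascii_lowercase.index(ch) + 1)
    return index - 1


def libvirt_style_alias(dev: str, bus: str) -> str:
    """Alias in the 'virtio-disk0' style seen in the error message."""
    return f"{bus}-disk{disk_index(dev)}"


def next_free_dev(bus: str, used_names: set) -> str:
    """Pick the first unused name with the bus prefix, looking only at names."""
    prefix = BUS_PREFIX[bus]
    for letter in string.ascii_lowercase:
        candidate = prefix + letter
        if candidate not in used_names:
            return candidate
    raise RuntimeError("no free device name")


# State after rescue/unrescue and hard reboot: root named sda but on virtio.
attached = {"sda": "virtio"}
new_dev = next_free_dev("virtio", set(attached))
existing_aliases = {libvirt_style_alias(d, b) for d, b in attached.items()}
new_alias = libvirt_style_alias(new_dev, "virtio")
print(new_dev, new_alias, new_alias in existing_aliases)  # vda virtio-disk0 True
```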

  
  Expected result
  ===============
  The instance root_device_name should remain the same before and after
  rescue/unrescue, regardless of the image used for rescuing.

  Actual result
  =============
  The instance root_device_name is updated according to the hw_disk_bus
  property of the image used during rescue (and is never set back to its
  original value).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1835926/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.ne

[Yahoo-eng-team] [Bug 1835926] Re: Volume attachment may fail after rescuing instance on an image with different hw_disk_bus

2019-11-25 OpenStack Infra
Reviewed:  https://review.opendev.org/67
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b
Submitter: Zuul
Branch:master

commit 5e0ed5e7fee3c4c887263a0e9fa847c2dcc5cf3b
Author: Alexandre Arents 
Date:   Tue Jul 9 16:13:01 2019 +

Do not update root_device_name during guest config

_get_guest_config() currently updates instance.root_device_name
and is called from many places, such as:

_hard_reboot(), rescue(), spawn(), resume(), finish_migration(),
finish_revert_migration()

This is an issue because root_device_name is initially set during instance
build and should remain the same afterwards:

manager.py: _do_build_and_run_instance()
  ..
  _default_block_device_names() <- here
  ..
  driver.spawn()

This may lead to edge cases, such as rescue, where this value can be
mistakenly updated to reflect the disk bus property of the rescue image
(hw_disk_bus). Furthermore, a _get* method should not modify the instance
object.

Note that test_get_guest_config_bug_1118829 is removed because it is no
longer relevant to the current code.

Change-Id: I1787f9717618d0837208844e8065840d30341cf7
Closes-Bug: #1835926
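The fix can be illustrated with the toy Python model below. It is a sketch
under stated assumptions, not nova's code: FakeInstance, guest_config and
the write_back flag are all hypothetical. In the model, rescue builds a
config whose root disk is named after the rescue image's bus, while other
paths reuse the stored root_device_name. As the reproduction steps suggest,
the model also assumes that unrescue does not regenerate the config, so the
next config is built at the hard reboot. With write_back=True (the old
behaviour), the rescue name leaks into the hard-reboot config. With
write_back=False (the behaviour after this commit), it does not.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified, hypothetical model of the behaviour described in the commit
# message; it is not the code of nova.virt.libvirt.driver.
BUS_PREFIX = {"virtio": "vd", "scsi": "sd"}


@dataclass
class FakeInstance:
    """Hypothetical stand-in for nova's Instance object."""
    root_device_name: Optional[str]


def default_root_name(bus: str) -> str:
    return f"/dev/{BUS_PREFIX[bus]}a"


def guest_config(instance: FakeInstance, bus: str, rescue: bool = False,
                 write_back: bool = True) -> dict:
    """Build a toy root-disk config.

    For rescue, the root disk is the rescue image, named after its own bus.
    Otherwise the stored root_device_name is reused. write_back=True mimics
    the old behaviour (mutating the instance); False mimics the fix.
    """
    if rescue or not instance.root_device_name:
        root_dev = default_root_name(bus)
    else:
        root_dev = instance.root_device_name
    if write_back:
        instance.root_device_name = root_dev  # the side effect the fix removes
    return {"root_dev": root_dev, "bus": bus}


def rescue_unrescue_hard_reboot(write_back: bool) -> dict:
    inst = FakeInstance(root_device_name="/dev/vda")  # set once at build time
    guest_config(inst, "scsi", rescue=True, write_back=write_back)  # rescue
    # Assumed: unrescue restores the saved domain, so no config is built here.
    return guest_config(inst, "virtio", write_back=write_back)  # hard reboot


print(rescue_unrescue_hard_reboot(write_back=True))
# {'root_dev': '/dev/sda', 'bus': 'virtio'}  (name/bus mismatch)
print(rescue_unrescue_hard_reboot(write_back=False))
# {'root_dev': '/dev/vda', 'bus': 'virtio'}
```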


** Changed in: nova
   Status: In Progress => Fix Released
