Public bug reported:

SRU Justification

[Impact]
During large-scale deployment testing, we found the call trace below when
provisioning an Ubuntu 18.04 VM of size Standard_NV24. An engineer deployed
the instance 10 times and hit the issue once.

It looks like a race condition during device probing, but all devices are
eventually probed successfully.

[ 4.938162] sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/47505500-0003-0000-3130-444531334632/pci0003:00/0003:00:00.0/config'
[ 4.944816] sr 5:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
[ 4.951818] CPU: 0 PID: 135 Comm: kworker/0:2 Not tainted 5.4.0-1061-azure #64~18.04.1-Ubuntu
[ 4.951820] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 4.958943] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 4.955812] Workqueue: hv_pri_chan vmbus_add_channel_work
[ 4.955812] Call Trace:
[ 4.955812] dump_stack+0x57/0x6d
[ 4.955812] sysfs_warn_dup+0x5b/0x70
[ 4.955812] sysfs_add_file_mode_ns+0x158/0x180
[ 4.955812] sysfs_create_bin_file+0x64/0x90
[ 4.955812] pci_create_sysfs_dev_files+0x72/0x270
[ 4.955812] pci_bus_add_device+0x30/0x80
[ 4.955812] pci_bus_add_devices+0x31/0x70
[ 4.955812] hv_pci_probe+0x48c/0x650
[ 4.955812] vmbus_probe+0x3e/0x90
[ 4.955812] really_probe+0xf5/0x440
[ 4.955812] driver_probe_device+0x11b/0x130
[ 4.955812] __device_attach_driver+0x7b/0xe0
[ 4.955812] ? driver_allows_async_probing+0x60/0x60
[ 4.955812] bus_for_each_drv+0x6e/0xb0
[ 4.955812] __device_attach+0xe4/0x160
[ 4.955812] device_initial_probe+0x13/0x20
[ 4.955812] bus_probe_device+0x92/0xa0
[ 4.955812] device_add+0x402/0x690
[ 4.955812] device_register+0x1a/0x20
[ 4.955812] vmbus_device_register+0x5e/0xf0
[ 4.955812] vmbus_add_channel_work+0x2c4/0x640
[ 4.955812] process_one_work+0x209/0x400
[ 4.955812] worker_thread+0x34/0x400
[ 4.955812] kthread+0x121/0x140
[ 4.955812] ? process_one_work+0x400/0x400
[ 4.955812] ? kthread_park+0x90/0x90
[ 4.955812] ret_from_fork+0x35/0x40
[ 5.043612] hv_pci 47505500-0004-0001-3130-444531334632: PCI VMBus probing: Using version 0x10002
[ 5.260563] hv_pci 47505500-0004-0001-3130-444531334632: PCI host bridge to bus 0004:00
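
When checking many deployment logs for this warning, a small scanner can help. The following is a minimal sketch, assuming the kernel log has been captured as plain text; the function name find_duplicate_sysfs_paths is hypothetical, and the sample text is taken from the trace above. The pattern tolerates a line break between the message and the quoted path, since mail clients may wrap the line.

```python
import re

# Matches the sysfs duplicate-filename warning and captures the quoted path.
# \s+ allows the path to sit on the next line if the log was line-wrapped.
DUP_RE = re.compile(r"sysfs: cannot create duplicate filename\s+'([^']+)'")


def find_duplicate_sysfs_paths(log_text):
    """Return every sysfs path reported as a duplicate in log_text."""
    return DUP_RE.findall(log_text)


# Sample built from the trace in this report (wrapped as in the email).
sample = (
    "[ 4.938162] sysfs: cannot create duplicate filename \n"
    "'/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/"
    "47505500-0003-0000-3130-444531334632/pci0003:00/0003:00:00.0/config'\n"
    "[ 4.955812] Call Trace:\n"
)

paths = find_duplicate_sysfs_paths(sample)
for path in paths:
    print(path)
```

A deployment run could be counted as a reproduction whenever the returned list is non-empty.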

Dexuan did some research, and this looks like a longstanding race condition
in the generic PCI subsystem (depending on timing, more than one place in
the PCI code can try to create the same ‘config’ sysfs file):
https://patchwork.kernel.org/project/linux-pci/patch/20200716110423.xtfyb3n6tn5ixedh@pali/#23669641
The bug was reported on 7/16/2020, and the last reply was on 6/25/2021. It
appears to have remained unfixed for more than a year.
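
The shape of the problem can be illustrated with a toy model. This is only a sketch under stated assumptions: ToySysfsDir and the caller names path_a and path_b are hypothetical stand-ins, not kernel code, and it does not represent the actual fix. It shows two code paths attempting to create the same ‘config’ entry: one succeeds, the other triggers a duplicate warning, and the file still exists afterwards, which is consistent with all devices eventually being probed.

```python
import threading


class ToySysfsDir:
    """Toy stand-in for a sysfs directory (hypothetical, not kernel code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.files = set()
        self.warnings = []

    def create_file(self, name, caller):
        # Creation is atomic per call, but nothing stops two callers
        # from each asking to create the same name.
        with self._lock:
            if name in self.files:
                self.warnings.append(
                    f"cannot create duplicate filename '{name}' (from {caller})"
                )
                return False
            self.files.add(name)
            return True


def run_race():
    directory = ToySysfsDir()
    barrier = threading.Barrier(2)
    results = {}

    def creator(caller):
        barrier.wait()  # release both callers at roughly the same time
        results[caller] = directory.create_file("config", caller)

    threads = [
        threading.Thread(target=creator, args=(caller,))
        for caller in ("path_a", "path_b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return directory, results


directory, results = run_race()
print(sorted(results.values()), len(directory.warnings))
```

Which caller wins is timing dependent, but exactly one creation succeeds and exactly one warning is recorded.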

[Test Case]

Repeatedly deploy a Standard_NV24 instance. Microsoft reported a
reproduction rate of 3/551 before the patch and 0/838 with the patch.
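
As a rough sanity check on these numbers, one can ask how likely 838 clean deployments would be if the patch had no effect. The sketch below assumes deployments are independent and that 3/551 is the true unpatched failure rate; both are assumptions, and with only 3 observed failures the baseline estimate itself is uncertain. The function name prob_zero_failures is hypothetical.

```python
def prob_zero_failures(baseline_failures, baseline_runs, patched_runs):
    """Probability of seeing no failures in patched_runs independent runs
    if the per-run failure rate were still baseline_failures/baseline_runs."""
    rate = baseline_failures / baseline_runs
    return (1.0 - rate) ** patched_runs


# Figures from the test case: 3/551 before the patch, 0/838 with it.
p_clean = prob_zero_failures(3, 551, 838)
print(f"chance of 0/838 at the unpatched rate: {p_clean:.4f}")
```

Under those assumptions the chance is roughly 1%, so the 0/838 result is unlikely to be luck alone.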

[Where things could go wrong]

Deployments could fail for other reasons.

[Other info]

SF: #00321027

** Affects: linux-azure (Ubuntu)
     Importance: Medium
     Assignee: Tim Gardner (timg-tpi)
         Status: New

** Affects: linux-azure (Ubuntu Bionic)
     Importance: Undecided
         Status: New

** Affects: linux-azure (Ubuntu Focal)
     Importance: Undecided
         Status: New

** Package changed: linux (Ubuntu) => linux-azure (Ubuntu)

** Changed in: linux-azure (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux-azure (Ubuntu)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Also affects: linux-azure (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Bionic)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1952621

Title:
  Bionic/linux-azure: Call trace on Ubuntu 18.04 VM with Standard NV24

Status in linux-azure package in Ubuntu:
  New
Status in linux-azure source package in Bionic:
  New
Status in linux-azure source package in Focal:
  New

