Public bug reported:

$ lsb_release -rd
Description:    Ubuntu 19.10
Release:        19.10

[Impact]

Devices affected:

* [1022:149c] USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
* [1022:1487] Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
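To locate these functions on a given machine, a minimal POSIX sh sketch like the following may help. It assumes the standard sysfs layout (`/sys/bus/pci/devices/<address>/vendor`, `device`, and the `iommu_group` symlink). The function name `list_affected` and the `PCI_SYSFS` override are hypothetical, added only so the sketch can be pointed at a test directory.

```shell
#!/bin/sh
# Hypothetical helper: list PCI functions whose IDs match the two devices in
# this report (1022:149c USB, 1022:1487 audio) and show each one's IOMMU group.
# PCI_SYSFS is an illustrative override; it defaults to the real sysfs path.
PCI_SYSFS="${PCI_SYSFS:-/sys/bus/pci/devices}"

list_affected() {
    for dev in "$PCI_SYSFS"/*; do
        # Skip entries that are not PCI functions (no readable ID files).
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        vendor=$(cat "$dev/vendor")
        device=$(cat "$dev/device")
        [ "$vendor" = "0x1022" ] || continue
        case "$device" in
            0x149c|0x1487) ;;
            *) continue ;;
        esac
        # The iommu_group symlink only exists when an IOMMU is active.
        group="none"
        if [ -L "$dev/iommu_group" ]; then
            group=$(basename "$(readlink "$dev/iommu_group")")
        fi
        printf '%s %s:%s iommu_group=%s\n' \
            "$(basename "$dev")" "${vendor#0x}" "${device#0x}" "$group"
    done
}
```

Typical use is `. ./list-affected.sh && list_affected` (the file name is illustrative). A function reported with `iommu_group=none` usually means the IOMMU is not enabled in firmware or on the kernel command line.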

Despite advertising the FLReset device capability, performing a function
level reset (FLR) of either of these devices causes the system to lock up.
This matters most where these devices appear in their own IOMMU groups,
which makes them well suited to VFIO passthrough.
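Whether a function advertises FLR can be read from the DevCap section of `lspci -vv` output, where it appears as `FLReset+`. The sketch below classifies that output. `flr_status` is a hypothetical helper, and it assumes lspci is run as root, since otherwise the capability list is printed as `<access denied>`.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vv` output for one device on stdin and
# report whether the PCIe DevCap section advertises Function Level Reset.
flr_status() {
    out=$(cat)
    case "$out" in
        # DevCap shows FLReset+ when FLR is advertised; DevCtl shows FLReset-
        # (the "initiate FLR" bit), so matching "+" picks the capability.
        *'FLReset+'*) echo "advertised" ;;
        # Without root, lspci hides the capability list entirely.
        *'<access denied>'*) echo "unknown (run lspci as root)" ;;
        *) echo "not advertised" ;;
    esac
}
```

Usage: `sudo lspci -vv -s 0000:10:00.3 | flr_status`. A kernel quirk changes how the kernel resets the device, not the capability bits lspci reads from configuration space, so `FLReset+` would still be shown once such a quirk is applied.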

The issue was introduced in AMD's "AGESA Combo-AM4 1.0.0.4 Patch B"
firmware update, and affects dozens of motherboard models across
various vendors.

Additional discussion of this issue:
https://www.reddit.com/r/VFIO/comments/eba5mh/workaround_patch_for_passing_through_usb_and/

[Fix]

Add a quirk to disable FLR on these devices.  Sample patch attached.

[Test Case]

Perform the test on an impacted system:

* B350, B450, X370, X470, X570 motherboards (practically anything with an AM4 socket);
* Ryzen 3000-series CPU (2000-series possibly also affected);
* BIOS/UEFI firmware that includes "AGESA Combo-AM4 1.0.0.4 Patch B" (check vendor release notes)
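The details needed to compare a machine against this list can be gathered with a short sketch: board and BIOS strings come from `/sys/class/dmi/id` and the CPU model from `/proc/cpuinfo`. The AGESA version itself is generally not exposed there, so the BIOS version still has to be matched against the vendor's release notes. `platform_summary`, `DMI_DIR` and `CPUINFO` are hypothetical names introduced for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print board, BIOS and CPU details for comparison with
# the list of impacted systems. DMI_DIR and CPUINFO are illustrative overrides.
DMI_DIR="${DMI_DIR:-/sys/class/dmi/id}"
CPUINFO="${CPUINFO:-/proc/cpuinfo}"

platform_summary() {
    # These DMI attributes are normally world-readable; skip any that are not.
    for f in board_vendor board_name bios_version bios_date; do
        if [ -r "$DMI_DIR/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$DMI_DIR/$f")"
        fi
    done
    # First "model name" entry, e.g. the Ryzen model string.
    cpu=$(grep -m1 '^model name' "$CPUINFO" | cut -d: -f2- | sed 's/^ *//')
    printf 'cpu: %s\n' "$cpu"
}
```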

On such a system, where '0000:10:00.3' is the address of the USB
controller '1022:149c' (see `lspci -nn -d 1022:149c`), issue a reset command:

  $ echo 1 | sudo tee /sys/bus/pci/devices/0000\:10\:00.3/reset
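Because a mistyped address would reset some other device, a guarded variant of the command above may be useful. This sketch (hypothetical `reset_if_affected` helper and `CONFIRM_RESET`/`PCI_SYSFS` variables) checks that the address belongs to one of the two IDs in this report and only writes to `reset` when explicitly confirmed. It must run as root for the write to succeed.

```shell
#!/bin/sh
# Hypothetical helper: reset a PCI function only if it is one of the devices
# named in this report, and only when CONFIRM_RESET=yes is set.
PCI_SYSFS="${PCI_SYSFS:-/sys/bus/pci/devices}"

reset_if_affected() {
    bdf="$1"
    dev="$PCI_SYSFS/$bdf"
    if [ ! -r "$dev/vendor" ] || [ ! -r "$dev/device" ]; then
        echo "no such device: $bdf" >&2
        return 2
    fi
    id="$(cat "$dev/vendor"):$(cat "$dev/device")"
    case "$id" in
        0x1022:0x149c|0x1022:0x1487) ;;
        *) echo "$bdf is $id, not an affected device" >&2; return 1 ;;
    esac
    # Dry run unless explicitly confirmed: on an impacted system the
    # write below is expected to hang the machine.
    if [ "${CONFIRM_RESET:-}" != "yes" ]; then
        echo "would reset $bdf ($id); set CONFIRM_RESET=yes to proceed" >&2
        return 0
    fi
    echo 1 > "$dev/reset"
}
```

For example, `sudo sh -c '. ./flr-reset.sh && CONFIRM_RESET=yes reset_if_affected 0000:10:00.3'` (the file name is illustrative). The guard only prevents resetting the wrong device; it does not make the reset safe on an impacted system.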

On impacted systems the command does not return, and the system becomes
unstable and requires a reboot.  `/var/log/syslog` will show something
resembling the following:

  Mar  4 14:51:26 bunty kernel: [ 1745.043914] xhci_hcd 0000:10:00.3: not ready 1023ms after FLR; waiting
  Mar  4 14:51:28 bunty kernel: [ 1747.091910] xhci_hcd 0000:10:00.3: not ready 2047ms after FLR; waiting
  Mar  4 14:51:32 bunty kernel: [ 1750.163972] xhci_hcd 0000:10:00.3: not ready 4095ms after FLR; waiting
  Mar  4 14:51:37 bunty kernel: [ 1755.283933] xhci_hcd 0000:10:00.3: not ready 8191ms after FLR; waiting
  Mar  4 14:51:46 bunty kernel: [ 1764.499943] xhci_hcd 0000:10:00.3: not ready 16383ms after FLR; waiting
  Mar  4 14:52:04 bunty kernel: [ 1782.164126] xhci_hcd 0000:10:00.3: not ready 32767ms after FLR; waiting
  Mar  4 14:52:39 bunty kernel: [ 1816.979432] xhci_hcd 0000:10:00.3: not ready 65535ms after FLR; giving up
  Mar  4 14:52:39 bunty kernel: [ 1817.978790] clocksource: timekeeping watchdog on CPU14: Marking clocksource 'tsc' as unstable because the skew is too large:
  Mar  4 14:52:39 bunty kernel: [ 1817.978806] clocksource:                       'hpet' wd_now: f63fcfe wd_last: d468894 mask: ffffffff
  Mar  4 14:52:39 bunty kernel: [ 1817.978809] clocksource:                       'tsc' cs_now: 60e67e17758 cs_last: 60d2a81ce24 mask: ffffffffffffffff
  Mar  4 14:52:39 bunty kernel: [ 1817.978818] tsc: Marking TSC unstable due to clocksource watchdog
  Mar  4 14:52:40 bunty kernel: [ 1817.978892] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
  Mar  4 14:52:40 bunty kernel: [ 1817.978894] sched_clock: Marking unstable (1817664630139, 314261908)<-(1817981099530, -2209419)
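After rebooting, the telltale entry is the final `not ready ... after FLR; giving up` line. The following sketch (hypothetical `flr_timeouts` helper) extracts the PCI addresses from such lines in log text read on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print each PCI address (domain:bus:device.function)
# for which the kernel logged that it gave up waiting after an FLR.
flr_timeouts() {
    grep 'after FLR; giving up' |
        grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' |
        sort -u
}
```

Usage: `flr_timeouts < /var/log/syslog`, or, if the journal from the previous boot was preserved, `journalctl -k -b -1 | flr_timeouts`.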

[Regression Risk]

Unknown

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Patch added: "amdnoflr-ubuntu-5.3.0-40.32-generic.patch"
   
https://bugs.launchpad.net/bugs/1865988/+attachment/5333271/+files/amdnoflr-ubuntu-5.3.0-40.32-generic.patch

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1865988

Title:
  Performing function level reset of AMD onboard USB and audio devices
  causes system lockup

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1865988/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
