[Kernel-packages] [Bug 1867455] Re: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot.
It appears the firmware is not loading, looking through the previous logs it appears the firmware has never loaded during any of the previous testing. The Ubuntu 19.10 VM is having the same problem as the Ubuntu 20.04 host. root@r920-cmwhv52:~# grep smu /var/log/kern.log Mar 13 22:40:40 r920-cmwhv52 kernel: [ 11.394925] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu Mar 13 22:40:40 r920-cmwhv52 kernel: [ 14.788159] smu firmware loading failed Mar 13 23:35:05 r920-cmwhv52 kernel: [ 11.368465] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu Mar 13 23:35:05 r920-cmwhv52 kernel: [ 14.750625] smu firmware loading failed Mar 14 01:23:06 r920-cmwhv52 kernel: [ 11.484064] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu Mar 14 01:23:06 r920-cmwhv52 kernel: [ 14.872698] smu firmware loading failed Mar 14 11:44:19 r920-cmwhv52 kernel: [ 11.487538] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu Mar 14 11:44:19 r920-cmwhv52 kernel: [ 14.869816] smu firmware loading failed Mar 14 13:36:40 r920-cmwhv52 kernel: [ 815.669131] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu Mar 14 13:36:43 r920-cmwhv52 kernel: [ 819.095834] smu firmware loading failed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-5.4 in Ubuntu. https://bugs.launchpad.net/bugs/1867455 Title: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot. Status in linux-5.4 package in Ubuntu: New Bug description: I have a Dell R920 with three MSI Radeon RX 580 8G V1 cards, if more than one of these cards is installed in the system, a bus fatal error will be generated causing the system to reset itself when the amdgpu module is loaded and it tries to initialize the device. ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: linux-modules-5.4.0-14-lowlatency 5.4.0-14.17 ProcVersionSignature: Ubuntu 5.4.0-14.17-lowlatency 5.4.18 Uname: Linux 5.4.0-14-lowlatency x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu20 Architecture: amd64 AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Sat Mar 14 12:20:47 2020 Dependencies: MachineType: Dell Inc. PowerEdge R920 ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgag200drmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-14-lowlatency root=ZFS=rpool/ROOT/ubuntu ro intel_iommu=on iommu=pt root=ZFS=rpool/ROOT/ubuntu intremap=no_x2apic_optout ipv6.disable=1 mitigations=off PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions: linux-restricted-modules-5.4.0-14-lowlatency N/A linux-backports-modules-5.4.0-14-lowlatency N/A linux-firmware 1.186 RfKill: SourcePackage: linux-5.4 UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 06/26/2019 dmi.bios.vendor: Dell Inc. dmi.bios.version: 1.9.0 dmi.board.name: 0V7HD0 dmi.board.vendor: Dell Inc. dmi.board.version: A06 dmi.chassis.type: 23 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvr1.9.0:bd06/26/2019:svnDellInc.:pnPowerEdgeR920:pvr:rvnDellInc.:rn0V7HD0:rvrA06:cvnDellInc.:ct23:cvr: dmi.product.name: PowerEdge R920 dmi.sys.vendor: Dell Inc. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1867455] Re: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot.
I noticed the follow in /var/log/kern.log when I passed through two RX580 to the same Ubuntu 19.10 VM: root@over-zfs-test-1:/var/log# grep amd kern.log Mar 14 19:13:31 over-zfs-test-1 kernel: [0.00] Linux version 5.3.0-40-generic (buildd@lcy01-amd64-026) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #32-Ubuntu SMP Fri Jan 31 20:24:34 UTC 2020 (Ubuntu 5.3.0-40.32-generic 5.3.18) Mar 14 19:13:31 over-zfs-test-1 kernel: [5.896408] [drm] amdgpu kernel modesetting enabled. Mar 14 19:13:31 over-zfs-test-1 kernel: [5.899443] amdgpu :06:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x12 -> 0x13 Mar 14 19:13:31 over-zfs-test-1 kernel: [5.899495] amdgpu :06:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x14 -> 0x14001f Mar 14 19:13:31 over-zfs-test-1 kernel: [5.899560] amdgpu :06:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfc20 -> 0xfc23 Mar 14 19:13:31 over-zfs-test-1 kernel: [6.324289] amdgpu :06:00.0: GPU pci config reset Mar 14 19:13:31 over-zfs-test-1 kernel: [6.844695] amdgpu :06:00.0: BAR 2: releasing [mem 0x14-0x14001f 64bit pref] Mar 14 19:13:31 over-zfs-test-1 kernel: [6.846800] amdgpu :06:00.0: BAR 0: releasing [mem 0x12-0x13 64bit pref] Mar 14 19:13:31 over-zfs-test-1 kernel: [6.851331] amdgpu :06:00.0: BAR 0: assigned [mem 0x12-0x13 64bit pref] Mar 14 19:13:31 over-zfs-test-1 kernel: [6.853799] amdgpu :06:00.0: BAR 2: assigned [mem 0x14-0x14001f 64bit pref] Mar 14 19:13:31 over-zfs-test-1 kernel: [7.055675] amdgpu :06:00.0: VRAM: 8192M 0x00F4 - 0x00F5 (8192M used) Mar 14 19:13:31 over-zfs-test-1 kernel: [7.057405] amdgpu :06:00.0: GART: 256M 0x00FF - 0x00FF0FFF Mar 14 19:13:31 over-zfs-test-1 kernel: [7.062079] [drm] amdgpu: 8192M of VRAM memory ready Mar 14 19:13:31 over-zfs-test-1 kernel: [7.063325] [drm] amdgpu: 8192M of GTT memory ready. Mar 14 19:13:31 over-zfs-test-1 kernel: [7.086747] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu Mar 14 19:13:31 over-zfs-test-1 kernel: [ 10.693155] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 12.530859] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 16.088944] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 17.909031] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 21.309891] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 23.014157] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 26.406895] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 28.115737] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 31.511862] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 33.210902] amdgpu: [powerplay] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.974621] amdgpu: [powerplay] SMU load firmware failed Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.975513] amdgpu: [powerplay] fw load failed Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.976809] amdgpu :06:00.0: amdgpu_device_ip_init failed Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.977443] amdgpu :06:00.0: Fatal error during GPU init Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.978183] [drm] amdgpu: finishing device. Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.992069] WARNING: CPU: 22 PID: 361 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:930 amdgpu_bo_unpin.cold+0x0/0x40 [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.993699] Modules linked in: hid_generic usbhid hid amdgpu(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel amd_iommu_v2 gpu_sched i2c_algo_bit qxl aesni_intel ttm drm_kms_helper aes_x86_64 syscopyarea crypto_simd sysfillrect sysimgblt cryptd ahci fb_sys_fops psmouse virtio_net glue_helper lpc_ich i2c_i801 libahci drm net_failover virtio_blk failover Mar 14 19:13:31 over-zfs-test-1 kernel: [ 34.997757] RIP: 0010:amdgpu_bo_unpin.cold+0x0/0x40 [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.007489] amdgpu_bo_free_kernel+0x70/0x120 [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.008290] amdgpu_gfx_rlc_fini+0x4b/0x70 [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.009090] gfx_v8_0_sw_fini+0xac/0x1a0 [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.009905] amdgpu_device_fini+0x26b/0x4ac [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.010697] amdgpu_driver_unload_kms+0x52/0xa0 [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.011517] amdgpu_driver_load_kms.cold+0x39/0x5c [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.013070] amdgpu_pci_probe+0xf7/0x160 [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.023395] amdgpu_init+0x83/0x8d [amdgpu] Mar 14 19:13:31 over-zfs-test-1 kernel: [ 35.037565] amdgpu :06:00.0: a855cc3c unpin not necessary Mar 14 19:13:31
[Kernel-packages] [Bug 1867455] Re: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot.
With the amdgpu module blacklisted on the host hypervisor, I was able to successfully passthrough the GPU to a separate Ubuntu 19.10 VM, one GPU per VM. See attached screenshot. So I would say this demonstrates that the physical hardware is fine and that this is a software problem. ** Attachment added: "Screenshot of GPU passthrough to VMs" https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+attachment/5336977/+files/dual-vms-screenshot.png -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-5.4 in Ubuntu. https://bugs.launchpad.net/bugs/1867455 Title: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot. Status in linux-5.4 package in Ubuntu: New Bug description: I have a Dell R920 with three MSI Radeon RX 580 8G V1 cards, if more than one of these cards is installed in the system, a bus fatal error will be generated causing the system to reset itself when the amdgpu module is loaded and it tries to initialize the device. ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: linux-modules-5.4.0-14-lowlatency 5.4.0-14.17 ProcVersionSignature: Ubuntu 5.4.0-14.17-lowlatency 5.4.18 Uname: Linux 5.4.0-14-lowlatency x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu20 Architecture: amd64 AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Sat Mar 14 12:20:47 2020 Dependencies: MachineType: Dell Inc. PowerEdge R920 ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgag200drmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-14-lowlatency root=ZFS=rpool/ROOT/ubuntu ro intel_iommu=on iommu=pt root=ZFS=rpool/ROOT/ubuntu intremap=no_x2apic_optout ipv6.disable=1 mitigations=off PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions: linux-restricted-modules-5.4.0-14-lowlatency N/A linux-backports-modules-5.4.0-14-lowlatency N/A linux-firmware 1.186 RfKill: SourcePackage: linux-5.4 UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 06/26/2019 dmi.bios.vendor: Dell Inc. dmi.bios.version: 1.9.0 dmi.board.name: 0V7HD0 dmi.board.vendor: Dell Inc. dmi.board.version: A06 dmi.chassis.type: 23 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvr1.9.0:bd06/26/2019:svnDellInc.:pnPowerEdgeR920:pvr:rvnDellInc.:rn0V7HD0:rvrA06:cvnDellInc.:ct23:cvr: dmi.product.name: PowerEdge R920 dmi.sys.vendor: Dell Inc. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1867455] Re: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot.
After blacklisting the amdgpu module via echo "blacklist amdgpu" > /etc/modprobe.d/blacklist-amdgpu.conf && update-initramfs -u; the system will let me boot all the way up with more than one RX580 installed (currently two are installed). Running modprobe amdgpu after the system was already booted cause a kernel panic, which I have captured and attached a screenshot of below. root@r920-cmwhv52:~# lspci | grep AMD 22:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) 22:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] 61:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) 61:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] root@r920-cmwhv52:~# lspci -vs 22:00 22:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller]) Subsystem: Micro-Star International Co., Ltd. [MSI] Radeon RX 580 Flags: fast devsel, IRQ 15, NUMA node 1 Memory at 3c0 (64-bit, prefetchable) [disabled] [size=256M] Memory at 3c01000 (64-bit, prefetchable) [disabled] [size=2M] I/O ports at 4000 [disabled] [size=256] Memory at afe8 (32-bit, non-prefetchable) [disabled] [size=256K] Expansion ROM at afec [disabled] [size=128K] Capabilities: [48] Vendor Specific Information: Len=08 Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 Capabilities: [150] Advanced Error Reporting Capabilities: [200] Resizable BAR Capabilities: [270] Secondary PCI Express Capabilities: [2b0] Address Translation Service (ATS) Capabilities: [2c0] Page Request Interface (PRI) Capabilities: [2d0] Process Address Space ID (PASID) Capabilities: [320] Latency Tolerance Reporting Capabilities: [328] Alternative Routing-ID Interpretation (ARI) Capabilities: [370] L1 PM Substates lspci: Unable to load libkmod resources: error -12 22:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] Subsystem: Micro-Star International Co., Ltd. [MSI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] Flags: bus master, fast devsel, latency 0, IRQ 379, NUMA node 1 Memory at afefc000 (64-bit, non-prefetchable) [size=16K] Capabilities: [48] Vendor Specific Information: Len=08 Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 Capabilities: [150] Advanced Error Reporting Capabilities: [328] Alternative Routing-ID Interpretation (ARI) Kernel driver in use: snd_hda_intel root@r920-cmwhv52:~# lspci -vs 61:00 61:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller]) Subsystem: Micro-Star International Co., Ltd. [MSI] Radeon RX 580 Flags: fast devsel, IRQ 15, NUMA node 3 Memory at 33ff000 (64-bit, prefetchable) [disabled] [size=256M] Memory at 33fefe0 (64-bit, prefetchable) [disabled] [size=2M] I/O ports at 6000 [disabled] [size=256] Memory at a7e8 (32-bit, non-prefetchable) [disabled] [size=256K] Expansion ROM at a7ec [disabled] [size=128K] Capabilities: [48] Vendor Specific Information: Len=08 Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 Capabilities: [150] Advanced Error Reporting Capabilities: [200] Resizable BAR Capabilities: [270] Secondary PCI Express Capabilities: [2b0] Address Translation Service (ATS) Capabilities: [2c0] Page Request Interface (PRI) Capabilities: [2d0] Process Address Space ID (PASID) Capabilities: [320] Latency Tolerance Reporting Capabilities: [328] Alternative Routing-ID Interpretation (ARI) Capabilities: [370] L1 PM Substates lspci: Unable to load libkmod resources: error -12 61:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] Subsystem:
[Kernel-packages] [Bug 1867455] Re: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot.
** Attachment added: "Screenshot of console just before system reboot" https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+attachment/5336947/+files/rpviewer-28.png -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-5.4 in Ubuntu. https://bugs.launchpad.net/bugs/1867455 Title: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot. Status in linux-5.4 package in Ubuntu: New Bug description: I have a Dell R920 with three MSI Radeon RX 580 8G V1 cards, if more than one of these cards is installed in the system, a bus fatal error will be generated causing the system to reset itself when the amdgpu module is loaded and it tries to initialize the device. ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: linux-modules-5.4.0-14-lowlatency 5.4.0-14.17 ProcVersionSignature: Ubuntu 5.4.0-14.17-lowlatency 5.4.18 Uname: Linux 5.4.0-14-lowlatency x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu20 Architecture: amd64 AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Sat Mar 14 12:20:47 2020 Dependencies: MachineType: Dell Inc. PowerEdge R920 ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgag200drmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-14-lowlatency root=ZFS=rpool/ROOT/ubuntu ro intel_iommu=on iommu=pt root=ZFS=rpool/ROOT/ubuntu intremap=no_x2apic_optout ipv6.disable=1 mitigations=off PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions: linux-restricted-modules-5.4.0-14-lowlatency N/A linux-backports-modules-5.4.0-14-lowlatency N/A linux-firmware 1.186 RfKill: SourcePackage: linux-5.4 UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 06/26/2019 dmi.bios.vendor: Dell Inc. dmi.bios.version: 1.9.0 dmi.board.name: 0V7HD0 dmi.board.vendor: Dell Inc. dmi.board.version: A06 dmi.chassis.type: 23 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvr1.9.0:bd06/26/2019:svnDellInc.:pnPowerEdgeR920:pvr:rvnDellInc.:rn0V7HD0:rvrA06:cvnDellInc.:ct23:cvr: dmi.product.name: PowerEdge R920 dmi.sys.vendor: Dell Inc. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1867455] Re: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot.
** Attachment added: "Screenshot of iDRAC Lifecycle Log" https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+attachment/5336948/+files/idrac-screenshot.png -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-5.4 in Ubuntu. https://bugs.launchpad.net/bugs/1867455 Title: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot. Status in linux-5.4 package in Ubuntu: New Bug description: I have a Dell R920 with three MSI Radeon RX 580 8G V1 cards, if more than one of these cards is installed in the system, a bus fatal error will be generated causing the system to reset itself when the amdgpu module is loaded and it tries to initialize the device. ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: linux-modules-5.4.0-14-lowlatency 5.4.0-14.17 ProcVersionSignature: Ubuntu 5.4.0-14.17-lowlatency 5.4.18 Uname: Linux 5.4.0-14-lowlatency x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu20 Architecture: amd64 AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Sat Mar 14 12:20:47 2020 Dependencies: MachineType: Dell Inc. PowerEdge R920 ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgag200drmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-14-lowlatency root=ZFS=rpool/ROOT/ubuntu ro intel_iommu=on iommu=pt root=ZFS=rpool/ROOT/ubuntu intremap=no_x2apic_optout ipv6.disable=1 mitigations=off PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions: linux-restricted-modules-5.4.0-14-lowlatency N/A linux-backports-modules-5.4.0-14-lowlatency N/A linux-firmware 1.186 RfKill: SourcePackage: linux-5.4 UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 06/26/2019 dmi.bios.vendor: Dell Inc. dmi.bios.version: 1.9.0 dmi.board.name: 0V7HD0 dmi.board.vendor: Dell Inc. dmi.board.version: A06 dmi.chassis.type: 23 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvr1.9.0:bd06/26/2019:svnDellInc.:pnPowerEdgeR920:pvr:rvnDellInc.:rn0V7HD0:rvrA06:cvnDellInc.:ct23:cvr: dmi.product.name: PowerEdge R920 dmi.sys.vendor: Dell Inc. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1867455] [NEW] Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot.
Public bug reported: I have a Dell R920 with three MSI Radeon RX 580 8G V1 cards, if more than one of these cards is installed in the system, a bus fatal error will be generated causing the system to reset itself when the amdgpu module is loaded and it tries to initialize the device. ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: linux-modules-5.4.0-14-lowlatency 5.4.0-14.17 ProcVersionSignature: Ubuntu 5.4.0-14.17-lowlatency 5.4.18 Uname: Linux 5.4.0-14-lowlatency x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu20 Architecture: amd64 AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Sat Mar 14 12:20:47 2020 Dependencies: MachineType: Dell Inc. PowerEdge R920 ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgag200drmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-14-lowlatency root=ZFS=rpool/ROOT/ubuntu ro intel_iommu=on iommu=pt root=ZFS=rpool/ROOT/ubuntu intremap=no_x2apic_optout ipv6.disable=1 mitigations=off PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions: linux-restricted-modules-5.4.0-14-lowlatency N/A linux-backports-modules-5.4.0-14-lowlatency N/A linux-firmware 1.186 RfKill: SourcePackage: linux-5.4 UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 06/26/2019 dmi.bios.vendor: Dell Inc. dmi.bios.version: 1.9.0 dmi.board.name: 0V7HD0 dmi.board.vendor: Dell Inc. dmi.board.version: A06 dmi.chassis.type: 23 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvr1.9.0:bd06/26/2019:svnDellInc.:pnPowerEdgeR920:pvr:rvnDellInc.:rn0V7HD0:rvrA06:cvnDellInc.:ct23:cvr: dmi.product.name: PowerEdge R920 dmi.sys.vendor: Dell Inc. ** Affects: linux-5.4 (Ubuntu) Importance: Undecided Status: New ** Tags: amd64 apport-bug focal -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-5.4 in Ubuntu. https://bugs.launchpad.net/bugs/1867455 Title: Having more than one AMD Radeon RX 580 installed will trigger a PCIe bus fatal error during boot. Status in linux-5.4 package in Ubuntu: New Bug description: I have a Dell R920 with three MSI Radeon RX 580 8G V1 cards, if more than one of these cards is installed in the system, a bus fatal error will be generated causing the system to reset itself when the amdgpu module is loaded and it tries to initialize the device. ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: linux-modules-5.4.0-14-lowlatency 5.4.0-14.17 ProcVersionSignature: Ubuntu 5.4.0-14.17-lowlatency 5.4.18 Uname: Linux 5.4.0-14-lowlatency x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu20 Architecture: amd64 AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Sat Mar 14 12:20:47 2020 Dependencies: MachineType: Dell Inc. PowerEdge R920 ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgag200drmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-14-lowlatency root=ZFS=rpool/ROOT/ubuntu ro intel_iommu=on iommu=pt root=ZFS=rpool/ROOT/ubuntu intremap=no_x2apic_optout ipv6.disable=1 mitigations=off PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions: linux-restricted-modules-5.4.0-14-lowlatency N/A linux-backports-modules-5.4.0-14-lowlatency N/A linux-firmware 1.186 RfKill: SourcePackage: linux-5.4 UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 06/26/2019 dmi.bios.vendor: Dell Inc. dmi.bios.version: 1.9.0 dmi.board.name: 0V7HD0 dmi.board.vendor: Dell Inc. dmi.board.version: A06 dmi.chassis.type: 23 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvr1.9.0:bd06/26/2019:svnDellInc.:pnPowerEdgeR920:pvr:rvnDellInc.:rn0V7HD0:rvrA06:cvnDellInc.:ct23:cvr: dmi.product.name: PowerEdge R920 dmi.sys.vendor: Dell Inc. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-5.4/+bug/1867455/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1765232] Re: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers
I just did a fresh install of the April 24 daily build with kernel 4.15.0-19 on my Dell R820 and I can confirm this has been fix. Thank you! -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1765232 Title: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Released Bug description: SRU Justification Impact: Since 4.15.0-15 some machines have been failing to boot due to IO hangs. This is caused by patches applied for LP #1759723, which assigned managed interrupt vectors and reply queues for all possible CPUs, not just present CPUs. Some drivers were not prepared to cope with this and end up selecting reply queues not mapped to an online CPU, causing IO hangs during boot. Fix: There are driver fixes available upstream, but there are 8-ish patches in total and we're extremely close to release, so the safer bet it to just revert the patches for LP #1759723. We can consider reintroducing them with required fixes at a later time. Regression Potential: This is obviously going to reintroduce the problem the patches were intended to fix. These are less serious than the problems which the patches introduced, and IBM has given their okay to revert them as well. Test Case: Verified to fix affected hardware on LP #1765232. --- For Ubuntu 18.04 amd64 server, I updated the kernel from 4.15.0-13 to 4.15.0-15. The system hangs at boot up now. Rolling back to the old kernel everything works as expected. It appears to hang while trying to enumerate an SD card device, which isn't even installed. I have tried this on a Dell R620 and R820 and both have the exact same problem. I even downloaded a new installer iso from the daily builds to reinstall the OS and the installer hangs too now. Support for Dell PowerEdge 12th gen servers appears to be broken with kernel 4.15.0-15. https://imgur.com/VT9zd0w https://imgur.com/4BQPCXo To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765232/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1765232] Re: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers
The kernel from post #12 also works without issue for me too. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1765232 Title: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers Status in linux package in Ubuntu: Confirmed Status in linux source package in Bionic: Confirmed Bug description: For Ubuntu 18.04 amd64 server, I updated the kernel from 4.15.0-13 to 4.15.0-15. The system hangs at boot up now. Rolling back to the old kernel everything works as expected. It appears to hang while trying to enumerate an SD card device, which isn't even installed. I have tried this on a Dell R620 and R820 and both have the exact same problem. I even downloaded a new installer iso from the daily builds to reinstall the OS and the installer hangs too now. Support for Dell PowerEdge 12th gen servers appears to be broken with kernel 4.15.0-15. https://imgur.com/VT9zd0w https://imgur.com/4BQPCXo To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765232/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1765232] Re: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers
I just tried the proposed kernel (4.15.0-18.19) and it still has the same problem. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1765232 Title: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers Status in linux package in Ubuntu: Confirmed Status in linux source package in Bionic: Confirmed Bug description: For Ubuntu 18.04 amd64 server, I updated the kernel from 4.15.0-13 to 4.15.0-15. The system hangs at boot up now. Rolling back to the old kernel everything works as expected. It appears to hang while trying to enumerate an SD card device, which isn't even installed. I have tried this on a Dell R620 and R820 and both have the exact same problem. I even downloaded a new installer iso from the daily builds to reinstall the OS and the installer hangs too now. Support for Dell PowerEdge 12th gen servers appears to be broken with kernel 4.15.0-15. https://imgur.com/VT9zd0w https://imgur.com/4BQPCXo To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765232/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1765105] Re: HP Smart Array driver fails to load, kernel panics boot process
This bug appears to present with similar symptoms as this bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765232 I presume that Dell doesn't use the hpsa driver, but both are hanging at approximately the same time in the boot up process and the same subsystems appear to be involved... I would speculate that they have something in common that is triggering this. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1765105 Title: HP Smart Array driver fails to load, kernel panics boot process Status in linux package in Ubuntu: Confirmed Bug description: Hi I have an HP ProLiant G6 with a Smart Array controller on Bionic, Kernel 4.15.0-15, that fails to load the hpsa driver for the Smart Array, (occasionally) drops to the initramfs shell, as it cannot find a root device in a reasonable amount of time, and then finally kernel panics with the stack trace leading to the hpsa driver. Occasionally it will complain about resetting the logical volume scsi bus. Downgrading to the previous kernel 4.15.0-13 works just fine. Attached is a screenshot showing the kernel panic, and below is a link to a YouTube video of the whole boot process, a video recording from ILO. To save you the time of watching the whole thing, 39 seconds is the bus reset error, then it hangs around waiting for a root device, and the kernel panics at 4:16. The relevant information are separate below Video: https://youtu.be/45ka2Btgiqk lsb_release: Description: Ubuntu Bionic Beaver (development branch) Release: 18.04 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765105/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1765232] Re: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers
Re bot, the stated command can't be run because the system will not boot to a bash prompt. ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1765232 Title: Kernel 4.15.0-15 breaks Dell PowerEdge 12th Gen servers Status in linux package in Ubuntu: Confirmed Bug description: For Ubuntu 18.04 amd64 server, I updated the kernel from 4.15.0-13 to 4.15.0-15. The system hangs at boot up now. Rolling back to the old kernel everything works as expected. It appears to hang while trying to enumerate an SD card device, which isn't even installed. I have tried this on a Dell R620 and R820 and both have the exact same problem. I even downloaded a new installer iso from the daily builds to reinstall the OS and the installer hangs too now. Support for Dell PowerEdge 12th gen servers appears to be broken with kernel 4.15.0-15. https://imgur.com/VT9zd0w https://imgur.com/4BQPCXo To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765232/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp