[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
This bug is awaiting verification that the linux-gcp-5.4/5.4.0-1140.149~18.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic-linux-gcp-5.4' to 'verification-done-bionic-linux-gcp-5.4'. If the problem still exists, change the tag 'verification-needed-bionic-linux-gcp-5.4' to 'verification-failed-bionic-linux-gcp-5.4'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-bionic-linux-gcp-5.4-v2 verification-needed-bionic-linux-gcp-5.4

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2081085

Title:
  wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Fix Released

Bug description:
  [Impact]
  Systems with storage devices that utilize the GENHD_FL_HIDDEN flag, such as NVMe disks declaring support for multiple controllers (aka native multipathing), will have a request queue with backing_dev_info->dev set to NULL. When tracing is enabled with any of the wbt:wbt_* events enabled, a NULL pointer dereference will occur in the corresponding trace function called from wb_timer_fn. This occurs when the trace function attempts to access the device's name with dev_name.

  On a DGXA100 system, this can be reproduced by running the following, where /dev/nvme0n1 is one of the 4 NVMe disks in the system that support native multipathing:

  $ echo 1 | sudo tee /sys/kernel/tracing/events/wbt/enable
  $ echo 1 | sudo tee /sys/kernel/tracing/tracing_on
  $ sudo dd if=/dev/zero of=/dev/nvme0n1

  A NULL pointer dereference will occur and the system will become unresponsive.

  [Fix]
  The upstream commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") resolves this by changing the wbt:wbt_* trace functions to use the bdi_dev_name function instead of dev_name. The bdi_dev_name function safely handles the case where the supplied device is NULL.

  [Test Case]
  Verified that the commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") resolves the issue on DGXA100 when applied to the "Ubuntu-5.4.0-196.216" tag. The reproducer no longer causes a NULL pointer dereference or otherwise crashes the system.

  [Regression Potential]
  There is a low risk of a regression:
  * In the focal K5.4 kernel, the bdi_dev_name function is used in other trace event functions for the same purpose of catching the case where bdi->dev is NULL.
  * This change is already present in kernel versions 5.7 and newer.

  [Other]
  The patch d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") is already present in Jammy K5.15 and newer.
  -
  Originally found via kernel regression testing and reported here by @cypressyew: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072972.

  [ 1073.837085] BUG: kernel NULL pointer dereference, address: 0050
  [ 1073.844858] #PF: supervisor read access in kernel mode
  [ 1073.850589] #PF: error_code(0x) - not-present page
  [ 1073.856318] PGD 0 P4D 0
  [ 1073.859141] Oops: [#1] SMP NOPTI
  [ 1073.863226] CPU: 9 PID: 0 Comm: swapper/9 Tainted: P OE 5.4.0-196-generic #216-Ubuntu
  [ 1073.873319] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023
  [ 1073.882547] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 1073.889248] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 00 aa c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 18
  [ 1073.910200] RSP: 0018:af08598a8da0 EFLAGS: 00010282
  [ 1073.916029] RAX: RBX: RCX: 8100
  [ 1073.923988] RDX: 970f0ba9501c RSI: 0100 RDI: 970f0ba95018
  [ 1073.931947] RBP: af08598a8e08 R08: 970f0ba95010 R09: 0100
  [ 1073.939906] R10: cf083fdc9a58 R11: 0386 R12: 970f0ba9501c
  [ 1073.947864] R13: R14: 0001 R15: 976efdfba000
  [ 1073.955824] FS: () GS:970f0f64() knlGS:
  [ 1073.964850] CS: 0010 DS: ES: CR0: 80050033
  [ 1073.971259] CR2: 0050 CR3: 001f7de8 CR4: 00340ee0
  [ 1073.979217] Call Trace:
  [ 1073.981942]
  [ 1073.984188] ? show_regs.cold+0x1a/0x1f
  [ 1073.988456] ? __die+0x90/0xd9
  [ 1073.991862] ? no_context.isra.0+0x12c/0x320
  [ 1073.996626] ? update_group_capacity+0x2c/0x1d0
  [ 1074.001679] ? __bad_area_nosemaphore+0x45/0x1a0
  [
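For readers unfamiliar with the pattern described under [Fix], here is a minimal userspace sketch of a NULL-safe name lookup. It is not the kernel source: the mock_* types and functions are hypothetical stand-ins for struct device, struct backing_dev_info, and bdi_dev_name(), and the "(unknown)" placeholder string is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mock_device {
    const char *name;
};

struct mock_bdi {
    struct mock_device *dev; /* NULL for hidden disks, per the bug report */
};

/* Placeholder returned when no device is attached (assumed value). */
static const char *const mock_unknown_name = "(unknown)";

/*
 * NULL-safe lookup in the spirit of bdi_dev_name(): a missing bdi or a
 * missing bdi->dev yields the placeholder instead of being dereferenced.
 */
static const char *mock_bdi_dev_name(const struct mock_bdi *bdi)
{
    if (bdi == NULL || bdi->dev == NULL)
        return mock_unknown_name;
    return bdi->dev->name;
}

/* Copies the name into a fixed-size field, roughly as a trace event would. */
static void mock_trace_fill_name(char *dst, size_t len,
                                 const struct mock_bdi *bdi)
{
    snprintf(dst, len, "%s", mock_bdi_dev_name(bdi));
}
```

Calling mock_trace_fill_name() with a mock_bdi whose dev is NULL stores the placeholder rather than faulting, which is the behaviour the report attributes to the fixed trace functions.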
[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
This bug was fixed in the package linux - 5.4.0-200.220

---
linux (5.4.0-200.220) focal; urgency=medium

  * focal/linux: 5.4.0-200.220 -proposed tracker (LP: #2082937)

  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions (main/2024.09.30)

  * CVE-2024-26800
    - tls: rx: coalesce exit paths in tls_decrypt_sg()
    - tls: separate no-async decryption request handling from async
    - tls: fix use-after-free on failed backlog decryption

  * CVE-2024-26641
    - ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()

  * CVE-2021-47212
    - net/mlx5: Update error handler for UCTX and UMEM

  * wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks (LP: #2081085)
    - bdi: use bdi_dev_name() to get device name

  * Focal update: v5.4.284 upstream stable release (LP: #2081278)
    - drm: panel-orientation-quirks: Add quirk for OrangePi Neo
    - i2c: Fix conditional for substituting empty ACPI functions
    - net: usb: qmi_wwan: add MeiG Smart SRM825L
    - drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acr
    - drm/amdgpu: fix overflowed array index read warning
    - drm/amd/display: Check gpio_id before used as array index
    - drm/amd/display: Stop amdgpu_dm initialize when stream nums greater than 6
    - drm/amd/display: Check num_valid_sets before accessing reader_wm_sets[]
    - drm/amd/display: Fix Coverity INTEGER_OVERFLOW within dal_gpio_service_create
    - drm/amdgpu: fix ucode out-of-bounds read warning
    - drm/amdgpu: fix mc_data out-of-bounds read warning
    - drm/amdkfd: Reconcile the definition and use of oem_id in struct kfd_topology_device
    - apparmor: fix possible NULL pointer dereference
    - ionic: fix potential irq name truncation
    - usbip: Don't submit special requests twice
    - usb: typec: ucsi: Fix null pointer dereference in trace
    - smack: tcp: ipv4, fix incorrect labeling
    - wifi: cfg80211: make hash table duplicates more survivable
    - drm/amd/display: Skip wbscl_set_scaler_filter if filter is null
    - media: uvcvideo: Enforce alignment of frame and interval
    - block: initialize integrity buffer to zero before writing it to media
    - net: set SOCK_RCU_FREE before inserting socket into hashtable
    - virtio_net: Fix napi_skb_cache_put warning
    - udf: Limit file size to 4TB
    - i2c: Use IS_REACHABLE() for substituting empty ACPI functions
    - sch/netem: fix use after free in netem_dequeue
    - ASoC: dapm: Fix UAF for snd_soc_pcm_runtime object
    - ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devices
    - ata: libata: Fix memory leak for error path in ata_host_alloc()
    - irqchip/gic-v2m: Fix refcount leak in gicv2m_of_init()
    - mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K
    - mmc: sdhci-of-aspeed: fix module autoloading
    - fuse: update stats for pages in dropped aux writeback list
    - fuse: use unsigned type for getxattr/listxattr size truncation
    - reset: hi6220: Add support for AO reset controller
    - clk: hi6220: use CLK_OF_DECLARE_DRIVER
    - clk: qcom: clk-alpha-pll: Fix the pll post div mask
    - clk: qcom: clk-alpha-pll: Fix the trion pll postdiv set rate API
    - ila: call nf_unregister_net_hooks() sooner
    - sched: sch_cake: fix bulk flow accounting logic for host fairness
    - nilfs2: fix missing cleanup on rollforward recovery error
    - nilfs2: fix state management in error path of log writing function
    - ALSA: hda: Add input value sanity checks to HDMI channel map controls
    - smack: unix sockets: fix accept()ed socket label
    - irqchip/armada-370-xp: Do not allow mapping IRQ 0 and 1
    - af_unix: Remove put_pid()/put_cred() in copy_peercred().
    - netfilter: nf_conncount: fix wrong variable type
    - udf: Avoid excessive partition lengths
    - wifi: brcmsmac: advertise MFP_CAPABLE to enable WPA3
    - usb: uas: set host status byte on data completion error
    - PCI: keystone: Add workaround for Errata #i2037 (AM65x SR 1.0)
    - media: qcom: camss: Add check for v4l2_fwnode_endpoint_parse
    - pcmcia: Use resource_size function on resource object
    - can: bcm: Remove proc entry when dev is unregistered.
    - igb: Fix not clearing TimeSync interrupts for 82580
    - platform/x86: dell-smbios: Fix error path in dell_smbios_init()
    - tcp_bpf: fix return value of tcp_bpf_sendmsg()
    - cx82310_eth: re-enable ethernet mode after router reboot
    - drivers/net/usb: Remove all strcpy() uses
    - net: usb: don't write directly to netdev->dev_addr
    - usbnet: modern method to get random MAC
    - net: bridge: fdb: convert is_local to bitops
    - net: bridge: fdb: convert is_static to bitops
    - net: bridge: fdb: convert is_sticky to bitops
    - net: bridge: fdb: convert added_by_user to bitops
    - net: bridge: fdb: convert added_by_external_learn to use bitops
    - net: bridge: br_fdb_external_lear
[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
With 5.4.0-198-generic, running the reproducer causes the reported crash on a DGXA100, leaving the system unresponsive. After enabling -proposed, upgrading to 5.4.0-200-generic, and rebooting, the reproducer does not cause a crash.

$ echo 1 | sudo tee /sys/kernel/tracing/events/wbt/enable
$ echo 1 | sudo tee /sys/kernel/tracing/tracing_on
$ sudo dd if=/dev/zero of=/dev/nvme3n1

Note, /dev/nvme3n1 is one of the NVMe disks on DGXA100 with native multipathing, as evidenced by the link /sys/class/block/nvme3c3n1 being present:

$ readlink /sys/class/block/nvme3c3n1
../../devices/pci:40/:40:01.1/:41:00.0/:42:08.0/:50:00.0/:51:00.0/:52:00.0/nvme/nvme3/nvme3c3n1

** Tags removed: verification-needed-focal-linux
** Tags added: verification-done-focal-linux
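The verification note identifies a natively multipathed disk by the presence of a /sys/class/block/nvme3c3n1 entry. As a rough sketch, the helper below checks whether a block device name has that per-controller shape. The function name is hypothetical, and the "nvme<subsystem>c<controller>n<namespace>" pattern is an assumption inferred from the example name in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns true when `name` looks like a per-controller NVMe path node,
 * e.g. "nvme3c3n1", and false for names such as "nvme3n1" or "sda".
 * Assumption: the pattern is nvme<number>c<number>n<number> with nothing
 * following it. sscanf's %u is lenient (it accepts leading whitespace or
 * a sign), so this is a sketch rather than a strict validator.
 */
static bool is_nvme_path_node(const char *name)
{
    unsigned int subsys, ctrl, ns;
    int consumed = 0;

    if (name == NULL)
        return false;
    if (sscanf(name, "nvme%uc%un%u%n", &subsys, &ctrl, &ns, &consumed) != 3)
        return false;
    /* Reject trailing characters such as a partition suffix. */
    return name[consumed] == '\0';
}
```

A caller could apply this to each entry name under /sys/class/block to pick candidate disks for the reproducer; whether a matching entry implies GENHD_FL_HIDDEN on a given kernel is not something this check can confirm.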
[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
This bug is awaiting verification that the linux/5.4.0-200.220 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux' to 'verification-done-focal-linux'. If the problem still exists, change the tag 'verification-needed-focal-linux' to 'verification-failed-focal-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: kernel-spammed-focal-linux-v2 verification-needed-focal-linux
[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
       Status: In Progress => Fix Committed
[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
** Description changed:

  [Impact]
  Systems with storage devices that utilize the GENHD_FL_HIDDEN flag, such as NVMe disks declaring support for multiple controllers (aka native multipathing), will have a request queue with backing_dev_info->dev set to NULL. When tracing is enabled with any of the wbt:wbt_* events enabled, a NULL pointer dereference will occur in the corresponding trace function called from wb_timer_fn. This occurs when the trace function attempts to access the device's name with dev_name.

  On a DGXA100 system, this can be reproduced by running the following, where /dev/nvme0n1 is one of the 4 NVMe disks in the system that support native multipathing:

  $ echo 1 | sudo tee /sys/kernel/tracing/events/wbt/enable
  $ echo 1 | sudo tee /sys/kernel/tracing/tracing_on
  $ sudo dd if=/dev/zero of=/dev/nvme0n1

  A NULL pointer dereference will occur and the system will become unresponsive.

  [Fix]
  The upstream commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") resolves this by changing the wbt:wbt_* trace functions to use the bdi_dev_name function instead of dev_name. The bdi_dev_name function safely handles the case where the supplied device is NULL.

  [Test Case]
  Verified that the commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") resolves the issue on DGXA100 when applied to the "Ubuntu-5.4.0-196.216" tag. The reproducer no longer causes a NULL pointer dereference or otherwise crashes the system.

  [Regression Potential]
  There is a low risk of a regression:
  * In the focal K5.4 kernel, the bdi_dev_name function is used in other trace event functions for the same purpose of catching the case where bdi->dev is NULL.
  * This change is already present in kernel versions 5.7 and newer.

  [Other]
  The patch d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") is already present in Jammy K5.15 and newer.
  -
- Originally found via kernel regression testing and reported here: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072972.
+ Originally found via kernel regression testing and reported here by @cypressyew: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072972.

  [ 1073.837085] BUG: kernel NULL pointer dereference, address: 0050
  [ 1073.844858] #PF: supervisor read access in kernel mode
  [ 1073.850589] #PF: error_code(0x) - not-present page
- [ 1073.856318] PGD 0 P4D 0
+ [ 1073.856318] PGD 0 P4D 0
  [ 1073.859141] Oops: [#1] SMP NOPTI
  [ 1073.863226] CPU: 9 PID: 0 Comm: swapper/9 Tainted: P OE 5.4.0-196-generic #216-Ubuntu
  [ 1073.873319] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023
  [ 1073.882547] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 1073.889248] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 00 aa c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 18
  [ 1073.910200] RSP: 0018:af08598a8da0 EFLAGS: 00010282
  [ 1073.916029] RAX: RBX: RCX: 8100
  [ 1073.923988] RDX: 970f0ba9501c RSI: 0100 RDI: 970f0ba95018
  [ 1073.931947] RBP: af08598a8e08 R08: 970f0ba95010 R09: 0100
  [ 1073.939906] R10: cf083fdc9a58 R11: 0386 R12: 970f0ba9501c
  [ 1073.947864] R13: R14: 0001 R15: 976efdfba000
  [ 1073.955824] FS: () GS:970f0f64() knlGS:
  [ 1073.964850] CS: 0010 DS: ES: CR0: 80050033
  [ 1073.971259] CR2: 0050 CR3: 001f7de8 CR4: 00340ee0
  [ 1073.979217] Call Trace:
  [ 1073.981942]
  [ 1073.984188] ? show_regs.cold+0x1a/0x1f
  [ 1073.988456] ? __die+0x90/0xd9
  [ 1073.991862] ? no_context.isra.0+0x12c/0x320
  [ 1073.996626] ? update_group_capacity+0x2c/0x1d0
  [ 1074.001679] ? __bad_area_nosemaphore+0x45/0x1a0
  [ 1074.006829] ? bad_area_nosemaphore+0x16/0x20
  [ 1074.011687] ? do_user_addr_fault+0x267/0x440
  [ 1074.016547] ? __enqueue_entity+0x96/0xa0
  [ 1074.021018] ? enqueue_entity+0x139/0x670
  [ 1074.025490] ? __do_page_fault+0x58/0x90
  [ 1074.029861] ? do_page_fault+0x2c/0xe0
  [ 1074.034044] ? page_fault+0x34/0x40
  [ 1074.037933] ? trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 1074.043954] ? enqueue_entity+0x139/0x670
  [ 1074.048426] wb_timer_fn+0x1d6/0x3c0
  [ 1074.052413] ? blk_mq_tag_update_depth+0x100/0x100
  [ 1074.057755] blk_stat_timer_fn+0x13a/0x140
  [ 1074.062326] call_timer_fn+0x32/0x130
  [ 1074.066409] __run_timers.part.0+0x180/0x280
  [ 1074.071174] ? tick_sched_handle+0x33/0x60
  [ 1074.075740] ? tick_sched_timer+0x3d/0x80
  [ 1074.080211] ? recalibrate_cpu_khz+0x10/0x10
  [ 1074.084971] ? ktime_get+0x3e/0xa0
  [ 1074.088765] ? native_apic_msr_write+0x2b/0x30
  [ 1074.093719] run_timer_softirq+0x2a/0x50
  [ 1074.0
[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
Patch submitted to kernel-team mailing list: https://lists.ubuntu.com/archives/kernel-team/2024-September/153806.html.