[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

2024-11-21 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the
linux-gcp-5.4/5.4.0-1140.149~18.04.1 kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results.
If the problem is solved, change the tag
'verification-needed-bionic-linux-gcp-5.4' to
'verification-done-bionic-linux-gcp-5.4'. If the problem still exists,
change the tag 'verification-needed-bionic-linux-gcp-5.4' to
'verification-failed-bionic-linux-gcp-5.4'.


If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-bionic-linux-gcp-5.4-v2 
verification-needed-bionic-linux-gcp-5.4

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2081085

Title:
  wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN
  disks

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Fix Released

Bug description:
  [Impact]
  Systems with storage devices that use the GENHD_FL_HIDDEN flag, such as
  NVMe disks declaring support for multiple controllers (aka native
  multipathing), will have a request queue with backing_dev_info->dev set
  to NULL. When tracing is on and any of the wbt:wbt_* events is enabled,
  a NULL pointer dereference occurs in the corresponding trace function
  called from wb_timer_fn, at the point where the trace function tries to
  read the device's name with dev_name.

  On a DGXA100 system, this can be reproduced by running the following,
  where /dev/nvme0n1 is one of the 4 NVMe disks in the system that support
  native multipathing:
  $ echo 1 | sudo tee /sys/kernel/tracing/events/wbt/enable
  $ echo 1 | sudo tee /sys/kernel/tracing/tracing_on
  $ sudo dd if=/dev/zero of=/dev/nvme0n1

  A NULL pointer dereference will occur and the system will become
  unresponsive.
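
  Before or after running the reproducer, it can help to confirm that both
  tracing switches are actually set. The following is a minimal sketch,
  assuming tracefs is mounted at /sys/kernel/tracing as in the commands
  above; the function name wbt_trace_state is hypothetical and not part of
  any existing tool:

```shell
#!/bin/sh
# Report whether the reproducer's two tracing switches are currently set.
# The optional argument is the tracefs mount point; it defaults to the
# /sys/kernel/tracing path used in the reproducer commands.
wbt_trace_state() {
    tracefs="${1:-/sys/kernel/tracing}"
    for knob in events/wbt/enable tracing_on; do
        if [ -r "$tracefs/$knob" ]; then
            printf '%s=%s\n' "$knob" "$(cat "$tracefs/$knob")"
        else
            # Not readable: tracefs not mounted here, no wbt events, or not root.
            printf '%s=unavailable\n' "$knob"
        fi
    done
}
```

  Reading these files normally requires root. Each value is 0 or 1; the
  events/wbt/enable file can also read X when only some of the wbt events
  are enabled.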

  [Fix]
  The upstream commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get device
  name") resolves this by changing the wbt:wbt_* trace functions to use the
  bdi_dev_name function instead of dev_name. The bdi_dev_name function
  safely handles the case where the supplied device is NULL.

  [Test Case]
  Verified that the commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get
  device name") resolves the issue on DGXA100 when applied to the
  "Ubuntu-5.4.0-196.216" tag. The reproducer no longer causes a NULL
  pointer dereference or otherwise crashes the system.

  [Regression Potential]
  There is a low risk of a regression:
  * In the focal K5.4 kernel, the bdi_dev_name function is used in other
    trace event functions for the same purpose of catching the case where
    bdi->dev is NULL.
  * This change is already present in kernel versions 5.7 and newer.

  [Other]
  The patch d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") is
  already present in Jammy K5.15 and newer.

  -
  Originally found via kernel regression testing and reported here by
  @cypressyew: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072972.

  [ 1073.837085] BUG: kernel NULL pointer dereference, address: 0050
  [ 1073.844858] #PF: supervisor read access in kernel mode
  [ 1073.850589] #PF: error_code(0x) - not-present page
  [ 1073.856318] PGD 0 P4D 0
  [ 1073.859141] Oops:  [#1] SMP NOPTI
  [ 1073.863226] CPU: 9 PID: 0 Comm: swapper/9 Tainted: P   OE 5.4.0-196-generic #216-Ubuntu
  [ 1073.873319] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023
  [ 1073.882547] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 1073.889248] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 00 aa c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 18
  [ 1073.910200] RSP: 0018:af08598a8da0 EFLAGS: 00010282
  [ 1073.916029] RAX:  RBX:  RCX: 8100
  [ 1073.923988] RDX: 970f0ba9501c RSI: 0100 RDI: 970f0ba95018
  [ 1073.931947] RBP: af08598a8e08 R08: 970f0ba95010 R09: 0100
  [ 1073.939906] R10: cf083fdc9a58 R11: 0386 R12: 970f0ba9501c
  [ 1073.947864] R13:  R14: 0001 R15: 976efdfba000
  [ 1073.955824] FS:  () GS:970f0f64() knlGS:
  [ 1073.964850] CS:  0010 DS:  ES:  CR0: 80050033
  [ 1073.971259] CR2: 0050 CR3: 001f7de8 CR4: 00340ee0
  [ 1073.979217] Call Trace:
  [ 1073.981942]  
  [ 1073.984188]  ? show_regs.cold+0x1a/0x1f
  [ 1073.988456]  ? __die+0x90/0xd9
  [ 1073.991862]  ? no_context.isra.0+0x12c/0x320
  [ 1073.996626]  ? update_group_capacity+0x2c/0x1d0
  [ 1074.001679]  ? __bad_area_nosemaphore+0x45/0x1a0
  [ 

[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

2024-10-30 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.4.0-200.220

---
linux (5.4.0-200.220) focal; urgency=medium

  * focal/linux: 5.4.0-200.220 -proposed tracker (LP: #2082937)

  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2024.09.30)

  * CVE-2024-26800
    - tls: rx: coalesce exit paths in tls_decrypt_sg()
    - tls: separate no-async decryption request handling from async
    - tls: fix use-after-free on failed backlog decryption

  * CVE-2024-26641
    - ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()

  * CVE-2021-47212
    - net/mlx5: Update error handler for UCTX and UMEM

  * wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks
    (LP: #2081085)
    - bdi: use bdi_dev_name() to get device name

  * Focal update: v5.4.284 upstream stable release (LP: #2081278)
    - drm: panel-orientation-quirks: Add quirk for OrangePi Neo
    - i2c: Fix conditional for substituting empty ACPI functions
    - net: usb: qmi_wwan: add MeiG Smart SRM825L
    - drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acr
    - drm/amdgpu: fix overflowed array index read warning
    - drm/amd/display: Check gpio_id before used as array index
    - drm/amd/display: Stop amdgpu_dm initialize when stream nums greater than 6
    - drm/amd/display: Check num_valid_sets before accessing reader_wm_sets[]
    - drm/amd/display: Fix Coverity INTEGER_OVERFLOW within
      dal_gpio_service_create
    - drm/amdgpu: fix ucode out-of-bounds read warning
    - drm/amdgpu: fix mc_data out-of-bounds read warning
    - drm/amdkfd: Reconcile the definition and use of oem_id in struct
      kfd_topology_device
    - apparmor: fix possible NULL pointer dereference
    - ionic: fix potential irq name truncation
    - usbip: Don't submit special requests twice
    - usb: typec: ucsi: Fix null pointer dereference in trace
    - smack: tcp: ipv4, fix incorrect labeling
    - wifi: cfg80211: make hash table duplicates more survivable
    - drm/amd/display: Skip wbscl_set_scaler_filter if filter is null
    - media: uvcvideo: Enforce alignment of frame and interval
    - block: initialize integrity buffer to zero before writing it to media
    - net: set SOCK_RCU_FREE before inserting socket into hashtable
    - virtio_net: Fix napi_skb_cache_put warning
    - udf: Limit file size to 4TB
    - i2c: Use IS_REACHABLE() for substituting empty ACPI functions
    - sch/netem: fix use after free in netem_dequeue
    - ASoC: dapm: Fix UAF for snd_soc_pcm_runtime object
    - ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius
      devices
    - ata: libata: Fix memory leak for error path in ata_host_alloc()
    - irqchip/gic-v2m: Fix refcount leak in gicv2m_of_init()
    - mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K
    - mmc: sdhci-of-aspeed: fix module autoloading
    - fuse: update stats for pages in dropped aux writeback list
    - fuse: use unsigned type for getxattr/listxattr size truncation
    - reset: hi6220: Add support for AO reset controller
    - clk: hi6220: use CLK_OF_DECLARE_DRIVER
    - clk: qcom: clk-alpha-pll: Fix the pll post div mask
    - clk: qcom: clk-alpha-pll: Fix the trion pll postdiv set rate API
    - ila: call nf_unregister_net_hooks() sooner
    - sched: sch_cake: fix bulk flow accounting logic for host fairness
    - nilfs2: fix missing cleanup on rollforward recovery error
    - nilfs2: fix state management in error path of log writing function
    - ALSA: hda: Add input value sanity checks to HDMI channel map controls
    - smack: unix sockets: fix accept()ed socket label
    - irqchip/armada-370-xp: Do not allow mapping IRQ 0 and 1
    - af_unix: Remove put_pid()/put_cred() in copy_peercred().
    - netfilter: nf_conncount: fix wrong variable type
    - udf: Avoid excessive partition lengths
    - wifi: brcmsmac: advertise MFP_CAPABLE to enable WPA3
    - usb: uas: set host status byte on data completion error
    - PCI: keystone: Add workaround for Errata #i2037 (AM65x SR 1.0)
    - media: qcom: camss: Add check for v4l2_fwnode_endpoint_parse
    - pcmcia: Use resource_size function on resource object
    - can: bcm: Remove proc entry when dev is unregistered.
    - igb: Fix not clearing TimeSync interrupts for 82580
    - platform/x86: dell-smbios: Fix error path in dell_smbios_init()
    - tcp_bpf: fix return value of tcp_bpf_sendmsg()
    - cx82310_eth: re-enable ethernet mode after router reboot
    - drivers/net/usb: Remove all strcpy() uses
    - net: usb: don't write directly to netdev->dev_addr
    - usbnet: modern method to get random MAC
    - net: bridge: fdb: convert is_local to bitops
    - net: bridge: fdb: convert is_static to bitops
    - net: bridge: fdb: convert is_sticky to bitops
    - net: bridge: fdb: convert added_by_user to bitops
    - net: bridge: fdb: convert added_by_external_learn to use bitops
    - net: bridge: br_fdb_external_lear

[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

2024-10-27 Thread Jacob Martin
With 5.4.0-198-generic, running the reproducer causes the reported crash
on a DGXA100, leaving the system unresponsive.

After enabling -proposed, upgrading to 5.4.0-200-generic, and rebooting,
the reproducer does not cause a crash.

$ echo 1 | sudo tee /sys/kernel/tracing/events/wbt/enable
$ echo 1 | sudo tee /sys/kernel/tracing/tracing_on
$ sudo dd if=/dev/zero of=/dev/nvme3n1
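
Before re-running the reproducer, it is worth confirming that the machine
really booted the -proposed kernel. The following is a minimal sketch,
assuming GNU sort (for -V) is available; the function name kernel_at_least
is hypothetical, and the comparison is only meaningful within one kernel
flavour (for example the focal generic series), since other flavours use
different ABI numbering:

```shell
#!/bin/sh
# Succeed if kernel release $1 (e.g. the output of `uname -r`) is at or
# above the minimum "version-ABI" string $2, e.g. 5.4.0-200.
kernel_at_least() {
    # Keep only the leading "X.Y.Z-ABI" part, dropping a flavour suffix
    # such as "-generic".
    cur=$(printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9.]*-[0-9][0-9]*\).*/\1/p')
    # Return 2 if the release string does not look like "X.Y.Z-ABI...".
    [ -n "$cur" ] || return 2
    # GNU sort -V orders version strings; if the minimum sorts first (or
    # the two are equal), the given kernel is new enough.
    lowest=$(printf '%s\n%s\n' "$2" "$cur" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

A typical call would be: kernel_at_least "$(uname -r)" 5.4.0-200 && echo
"fixed kernel is running".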

Note, /dev/nvme3n1 is one of the NVMe disks on DGXA100 with native
multipathing, as evidenced by the link /sys/class/block/nvme3c3n1 being
present:

$ readlink /sys/class/block/nvme3c3n1
../../devices/pci:40/:40:01.1/:41:00.0/:42:08.0/:50:00.0/:51:00.0/:52:00.0/nvme/nvme3/nvme3c3n1
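
The same check can be made for every disk at once. The following is a
minimal sketch, assuming the kernel exposes the per-disk sysfs attribute
"hidden" (which reads 1 for GENHD_FL_HIDDEN disks); the function name
list_hidden_disks is hypothetical:

```shell
#!/bin/sh
# Print the name of every block device whose sysfs "hidden" attribute is 1.
# The optional argument is the sysfs mount point (default /sys).
list_hidden_disks() {
    sysfs="${1:-/sys}"
    for attr in "$sysfs"/class/block/*/hidden; do
        # Skip the unexpanded glob and any unreadable attribute file.
        [ -r "$attr" ] || continue
        if [ "$(cat "$attr")" = "1" ]; then
            basename "$(dirname "$attr")"
        fi
    done
}
```

On a system like the one above, the output would be expected to include
per-controller namespace devices such as nvme3c3n1; an empty result
suggests that no disk on the system uses GENHD_FL_HIDDEN.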

** Tags removed: verification-needed-focal-linux
** Tags added: verification-done-focal-linux


[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

2024-10-01 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux/5.4.0-200.220 kernel in
-proposed solves the problem. Please test the kernel and update this bug
with the results. If the problem is solved, change the tag
'verification-needed-focal-linux' to 'verification-done-focal-linux'. If
the problem still exists, change the tag
'verification-needed-focal-linux' to 'verification-failed-focal-linux'.


If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.


See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: kernel-spammed-focal-linux-v2 verification-needed-focal-linux


[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

2024-09-25 Thread Stefan Bader
** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Committed


[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

2024-09-18 Thread Jacob Martin
** Description changed:

  [Impact]
  Systems with storage devices that use the GENHD_FL_HIDDEN flag, such as
  NVMe disks declaring support for multiple controllers (aka native
  multipathing), will have a request queue with backing_dev_info->dev set
  to NULL. When tracing is on and any of the wbt:wbt_* events is enabled,
  a NULL pointer dereference occurs in the corresponding trace function
  called from wb_timer_fn, at the point where the trace function tries to
  read the device's name with dev_name.
  
  On a DGXA100 system, this can be reproduced by running the following,
  where /dev/nvme0n1 is one of the 4 NVMe disks in the system that support
  native multipathing:
  $ echo 1 | sudo tee /sys/kernel/tracing/events/wbt/enable
  $ echo 1 | sudo tee /sys/kernel/tracing/tracing_on
  $ sudo dd if=/dev/zero of=/dev/nvme0n1
  
  A NULL pointer dereference will occur and the system will become
  unresponsive.
  
  [Fix]
  The upstream commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get device
  name") resolves this by changing the wbt:wbt_* trace functions to use the
  bdi_dev_name function instead of dev_name. The bdi_dev_name function
  safely handles the case where the supplied device is NULL.
  
  [Test Case]
  Verified that the commit d51cfc53ade318 ("bdi: use bdi_dev_name() to get
  device name") resolves the issue on DGXA100 when applied to the
  "Ubuntu-5.4.0-196.216" tag. The reproducer no longer causes a NULL
  pointer dereference or otherwise crashes the system.
  
  [Regression Potential]
  There is a low risk of a regression:
  * In the focal K5.4 kernel, the bdi_dev_name function is used in other
    trace event functions for the same purpose of catching the case where
    bdi->dev is NULL.
  * This change is already present in kernel versions 5.7 and newer.
  
  [Other]
  The patch d51cfc53ade318 ("bdi: use bdi_dev_name() to get device name") is
  already present in Jammy K5.15 and newer.
  
  -
- Originally found via kernel regression testing and reported here: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072972.
+ Originally found via kernel regression testing and reported here by @cypressyew: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072972.
  
  [ 1073.837085] BUG: kernel NULL pointer dereference, address: 0050
  [ 1073.844858] #PF: supervisor read access in kernel mode
  [ 1073.850589] #PF: error_code(0x) - not-present page
- [ 1073.856318] PGD 0 P4D 0 
+ [ 1073.856318] PGD 0 P4D 0
  [ 1073.859141] Oops:  [#1] SMP NOPTI
  [ 1073.863226] CPU: 9 PID: 0 Comm: swapper/9 Tainted: P   OE 5.4.0-196-generic #216-Ubuntu
  [ 1073.873319] Hardware name: NVIDIA DGXA100 920-23687-2530-000/DGXA100, BIOS 1.25 08/31/2023
  [ 1073.882547] RIP: 0010:trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 1073.889248] Code: 59 80 e5 02 0f 85 8f 00 00 00 4c 89 e6 ba 34 00 00 00 48 8d 7d a0 e8 00 aa c9 ff 49 89 c4 48 85 c0 74 37 49 8b 87 b8 03 00 00 <48> 8b 70 50 48 85 f6 74 45 49 8d 7c 24 08 ba 20 00 00 00 e8 c9 18
  [ 1073.910200] RSP: 0018:af08598a8da0 EFLAGS: 00010282
  [ 1073.916029] RAX:  RBX:  RCX: 8100
  [ 1073.923988] RDX: 970f0ba9501c RSI: 0100 RDI: 970f0ba95018
  [ 1073.931947] RBP: af08598a8e08 R08: 970f0ba95010 R09: 0100
  [ 1073.939906] R10: cf083fdc9a58 R11: 0386 R12: 970f0ba9501c
  [ 1073.947864] R13:  R14: 0001 R15: 976efdfba000
  [ 1073.955824] FS:  () GS:970f0f64() knlGS:
  [ 1073.964850] CS:  0010 DS:  ES:  CR0: 80050033
  [ 1073.971259] CR2: 0050 CR3: 001f7de8 CR4: 00340ee0
  [ 1073.979217] Call Trace:
  [ 1073.981942]  
  [ 1073.984188]  ? show_regs.cold+0x1a/0x1f
  [ 1073.988456]  ? __die+0x90/0xd9
  [ 1073.991862]  ? no_context.isra.0+0x12c/0x320
  [ 1073.996626]  ? update_group_capacity+0x2c/0x1d0
  [ 1074.001679]  ? __bad_area_nosemaphore+0x45/0x1a0
  [ 1074.006829]  ? bad_area_nosemaphore+0x16/0x20
  [ 1074.011687]  ? do_user_addr_fault+0x267/0x440
  [ 1074.016547]  ? __enqueue_entity+0x96/0xa0
  [ 1074.021018]  ? enqueue_entity+0x139/0x670
  [ 1074.025490]  ? __do_page_fault+0x58/0x90
  [ 1074.029861]  ? do_page_fault+0x2c/0xe0
  [ 1074.034044]  ? page_fault+0x34/0x40
  [ 1074.037933]  ? trace_event_raw_event_wbt_timer+0x6f/0x100
  [ 1074.043954]  ? enqueue_entity+0x139/0x670
  [ 1074.048426]  wb_timer_fn+0x1d6/0x3c0
  [ 1074.052413]  ? blk_mq_tag_update_depth+0x100/0x100
  [ 1074.057755]  blk_stat_timer_fn+0x13a/0x140
  [ 1074.062326]  call_timer_fn+0x32/0x130
  [ 1074.066409]  __run_timers.part.0+0x180/0x280
  [ 1074.071174]  ? tick_sched_handle+0x33/0x60
  [ 1074.075740]  ? tick_sched_timer+0x3d/0x80
  [ 1074.080211]  ? recalibrate_cpu_khz+0x10/0x10
  [ 1074.084971]  ? ktime_get+0x3e/0xa0
  [ 1074.088765]  ? native_apic_msr_write+0x2b/0x30
  [ 1074.093719]  run_timer_softirq+0x2a/0x50
  [ 1074.0

[Kernel-packages] [Bug 2081085] Re: wbt:wbt_* trace event NULL pointer dereference with GENHD_FL_HIDDEN disks

2024-09-18 Thread Jacob Martin
Patch submitted to kernel-team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2024-September/153806.html.
