A fix patch for Jammy and Noble has been submitted to the kernel-team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2025-July/161181.html

** Description changed:

+ SRU Justification:
+ 
+ [Impact]
+ 
+ On systems with ConnectX devices using the mlx5_ib driver, the dereg_mr
+ InfiniBand operation will produce a kernel WARNING message when
+ deregistering device memory. The WARNING occurs only when an IOMMU is
+ in use and not in passthrough/identity mode, but in this case the
+ mlx5_ib driver is still behaving incorrectly.
+ 
+ [ 343.588824] ------------[ cut here ]------------
+ [ 343.588829] WARNING: CPU: 68 PID: 4076 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0x12c/0x190
+ ...
+ [ 343.589101] Call trace:
+ [ 343.589102] iommu_dma_unmap_page+0x12c/0x190
+ [ 343.589104] dma_unmap_page_attrs+0x1f8/0x290
+ [ 343.589107] mlx5_free_priv_descs+0x94/0xe0 [mlx5_ib]
+ [ 343.589121] mlx5_ib_dereg_mr+0x330/0x4f8 [mlx5_ib]
+ [ 343.589131] ib_dereg_mr_user+0x54/0x178 [ib_core]
+ [ 343.589148] uverbs_free_mr+0x24/0x50 [ib_uverbs]
+ [ 343.589155] destroy_hw_idr_uobject+0x38/0x98 [ib_uverbs]
+ [ 343.589160] uverbs_destroy_uobject+0x4c/0x230 [ib_uverbs]
+ [ 343.589165] uobj_destroy+0x60/0xe8 [ib_uverbs]
+ [ 343.589170] ib_uverbs_run_method+0x194/0x310 [ib_uverbs]
+ [ 343.589175] ib_uverbs_cmd_verbs+0x1ac/0x288 [ib_uverbs]
+ [ 343.589180] ib_uverbs_ioctl+0xb0/0x150 [ib_uverbs]
+ [ 343.589185] __arm64_sys_ioctl+0xd0/0x150
+ [ 343.589189] invoke_syscall.constprop.0+0x84/0x100
+ [ 343.589191] do_el0_svc+0x4c/0x100
+ [ 343.589192] el0_svc+0x48/0x1c8
+ [ 343.589195] el0t_64_sync_handler+0x148/0x158
+ [ 343.589197] el0t_64_sync+0x1b0/0x1b8
+ [ 343.589199] ---[ end trace 0000000000000000 ]---
+ 
+ Oracular obtained the fix via stable updates, and 6.14 kernels and newer
+ already have this fix.
+ 
+ Jammy and Noble are still affected.
+ 
+ [Fix]
+ 
+ This is resolved by backporting abc7b3f1f056 ("RDMA/mlx5: Fix a WARN
+ during dereg_mr for DM type") from upstream. The patch submitted with
+ this cover letter was originally submitted to noble:linux-nvidia, but
+ benefits jammy:linux and noble:linux as well.
+ 
+ [Test Plan]
+ 
+ For systems with ConnectX devices configured for InfiniBand, this can be
+ reproduced with:
+ 
+ $ ibv_rc_pingpong -g 0 -j &
+ $ ibv_rc_pingpong -g 0 -j 127.0.0.1
+ 
+ Finally, check dmesg for a WARNING message.
+ 
+ [Where issues could arise]
+ 
+ These changes affect the mlx5_ib driver. Regressions would likely appear
+ as misbehavior of this driver, particularly where it handles releasing
+ RDMA/IB memory regions.
+ 
+ ----------- above SRU justification added by ~jacobmartin -----------
+ 
  Running ibv_rc_pingpong like this:
  ibv_rc_pingpong -g 0 -j
  with kernel 6.8.0-1025-nvidia-64k
  
  produces this warning in dmesg:
  
  [  343.588824] ------------[ cut here ]------------
  [  343.588829] WARNING: CPU: 68 PID: 4076 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0x12c/0x190
  [  343.588837] Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm xt_conntrack xt_MASQUERADE bridge stp llc xt_set ip_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables xfrm_user xfrm_algo qrtr overlay cfg80211 sunrpc binfmt_misc nls_iso8859_1 dax_hmem nvidia_cspmu cxl_acpi ast ses cxl_core i2c_algo_bit arm_smmuv3_pmu ipmi_ssif enclosure arm_cspmu_module coresight_trbe arm_spe_pmu acpi_power_meter cppc_cpufreq acpi_ipmi ipmi_devintf spi_nor nvidia_uvm(OE) coresight_tmc coresight_funnel coresight_stm ipmi_msghandler stm_p_basic coresight stm_core nvidia_drm(OE) uio_pdrv_genirq uio nvidia_modeset(OE) video ib_umad nvidia_fs(O) nvidia(OE) ecc dm_multipath nvme_fabrics nvme_keyring efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 mlx5_ib ib_uverbs macsec ib_core crct10dif_ce
  [  343.588907]  polyval_ce mlx5_core polyval_generic ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 mlxfw i2c_smbus mpt3sas nvme sha3_ce psample sha2_ce nvme_core raid_class tls xhci_pci sha256_arm64 sha1_ce xhci_pci_renesas scsi_transport_sas nvme_auth pci_hyperv_intf i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  [  343.588929] CPU: 68 PID: 4076 Comm: ibv_rc_pingpong Tainted: G        W  OE      6.8.0-1025-nvidia-64k #28-Ubuntu
  [  343.588931] Hardware name: Quanta Cloud Technology Inc. QuantaGrid S74G-2U 1S7GZ9Z0002/S7G MB (CG1), BIOS 3A21 07/10/2024
  [  343.588932] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  [  343.588933] pc : iommu_dma_unmap_page+0x12c/0x190
  [  343.588935] lr : iommu_dma_unmap_page+0x44/0x190
  [  343.588936] sp : ffff8000bd04f840
  [  343.588936] x29: ffff8000bd04f840 x28: ffff8000bd04fba0 x27: 0000000000000001
  [  343.588939] x26: 0000000000000000 x25: 0000000000000010 x24: 0000000000000001
  [  343.588941] x23: 0000000000000000 x22: 0000000000000000 x21: ffff000116904000
  [  343.588942] x20: 0000000000000000 x19: ffff00008e3530c8 x18: ffff8000bd980088
  [  343.588944] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffef1a0390
  [  343.588946] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  [  343.588949] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff800080df8d90
  [  343.588952] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
  [  343.588954] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
  [  343.588959] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
  [  343.589101] Call trace:
  [  343.589102]  iommu_dma_unmap_page+0x12c/0x190
  [  343.589104]  dma_unmap_page_attrs+0x1f8/0x290
  [  343.589107]  mlx5_free_priv_descs+0x94/0xe0 [mlx5_ib]
  [  343.589121]  mlx5_ib_dereg_mr+0x330/0x4f8 [mlx5_ib]
  [  343.589131]  ib_dereg_mr_user+0x54/0x178 [ib_core]
  [  343.589148]  uverbs_free_mr+0x24/0x50 [ib_uverbs]
  [  343.589155]  destroy_hw_idr_uobject+0x38/0x98 [ib_uverbs]
  [  343.589160]  uverbs_destroy_uobject+0x4c/0x230 [ib_uverbs]
  [  343.589165]  uobj_destroy+0x60/0xe8 [ib_uverbs]
  [  343.589170]  ib_uverbs_run_method+0x194/0x310 [ib_uverbs]
  [  343.589175]  ib_uverbs_cmd_verbs+0x1ac/0x288 [ib_uverbs]
  [  343.589180]  ib_uverbs_ioctl+0xb0/0x150 [ib_uverbs]
  [  343.589185]  __arm64_sys_ioctl+0xd0/0x150
  [  343.589189]  invoke_syscall.constprop.0+0x84/0x100
  [  343.589191]  do_el0_svc+0x4c/0x100
  [  343.589192]  el0_svc+0x48/0x1c8
  [  343.589195]  el0t_64_sync_handler+0x148/0x158
  [  343.589197]  el0t_64_sync+0x1b0/0x1b8
  [  343.589199] ---[ end trace 0000000000000000 ]---
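
The reproduction steps from the [Test Plan] above could be scripted
roughly as follows. This is a minimal sketch and not part of the
submitted patch: the has_unmap_warning and reproduce helpers, the --run
flag, and the one-second startup delay are assumptions made for
illustration. Only the two ibv_rc_pingpong invocations come from the
report.

```shell
#!/bin/sh
# Sketch of the test plan: run a loopback ibv_rc_pingpong pair, then look
# for the iommu_dma_unmap_page WARNING in the kernel log.

# Hypothetical helper: succeeds if the log text on stdin contains the
# WARNING line described in this bug.
has_unmap_warning() {
    grep -q 'WARNING: .*iommu_dma_unmap_page'
}

# Hypothetical helper: start the server side in the background, then
# connect to it over loopback (commands taken from the test plan).
reproduce() {
    ibv_rc_pingpong -g 0 -j >/dev/null &
    server_pid=$!
    sleep 1   # assumption: one second is enough for the server to listen
    ibv_rc_pingpong -g 0 -j 127.0.0.1 >/dev/null
    wait "$server_pid"
}

# Only touch the hardware when explicitly asked to (hypothetical flag).
if [ "${1:-}" = "--run" ]; then
    reproduce
    # Reading dmesg may require root, depending on kernel.dmesg_restrict.
    if dmesg | has_unmap_warning; then
        echo "WARNING found: kernel appears affected"
    else
        echo "no WARNING found"
    fi
fi
```

Run the script with --run on a freshly booted system, since a WARNING
left over from earlier in the boot would also match. Per the [Impact]
section, the WARNING is only expected when an IOMMU is in use and not in
passthrough/identity mode.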

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: linux-nvidia (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** No longer affects: linux-nvidia (Ubuntu Jammy)

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Low

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Jacob Martin (jacobmartin)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2107816

Title:
  warning at iommu_dma_unmap_page when running ibv_rc_pingpong

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2107816/+subscriptions

