Public bug reported:

[Impact]
A follow_pte() warning is shown with the NVIDIA driver on the 6.11 kernel.

Aug 09 09:20:42 ubuntu-202407-34200 kernel: WARNING: CPU: 0 PID: 2918 at
include/linux/rwsem.h:80 follow_pte+0x220/0x230

[Fix]
This occurs during suspend, when the NVIDIA driver function
'nv_revoke_gpu_mappings_locked()' calls the kernel function
'unmap_mapping_range()', which eventually calls 'follow_pte()'. The
function 'follow_pte()' uses the assertion 'mmap_assert_locked()' to check
that 'mmap_lock' is held.

This assertion fails and a warning call trace is printed (no functional issue,
only output in dmesg). This happens in kernel versions v6.10 through v6.11.
This is a kernel bug, not an NVIDIA driver bug, and it has also been discussed
on the kernel mailing list:
https://lore.kernel.org/linux-mm/[email protected]/T/#u
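
The failing check can be sketched as a small userspace model. This is a hypothetical illustration, not kernel code: every model_* name is invented here, and it assumes (from how unmap_mapping_range() works in mainline, not from this report) that the unmap path holds the file mapping's i_mmap_rwsem rather than mmap_lock. The model only counts warnings, mirroring how the real assertion prints a call trace without stopping execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins: each struct only records which lock is held. */
struct model_mm {
	bool mmap_lock_held;    /* stands in for mm->mmap_lock */
};

struct model_vma {
	struct model_mm *mm;
	bool i_mmap_rwsem_held; /* stands in for the file mapping's i_mmap_rwsem (assumed) */
};

static int warnings; /* number of WARN_ON-style reports emitted so far */

/* Models mmap_assert_locked(): reports a warning, but does not stop execution. */
static void model_mmap_assert_locked(const struct model_mm *mm)
{
	if (!mm->mmap_lock_held) {
		warnings++;
		fprintf(stderr, "WARNING: mmap_lock not held (model of follow_pte)\n");
	}
}

/* Models the old follow_pte(): asserts mmap_lock, then does the lookup anyway. */
static int model_follow_pte(struct model_vma *vma)
{
	model_mmap_assert_locked(vma->mm);
	return 0; /* lookup still succeeds: warning only, no functional failure */
}

/* Models unmap_mapping_range(): assumed to hold i_mmap_rwsem, not mmap_lock. */
static void model_unmap_mapping_range(struct model_vma *vma)
{
	vma->i_mmap_rwsem_held = true;
	model_follow_pte(vma); /* reached through deeper calls, elided here */
	vma->i_mmap_rwsem_held = false;
}

/* Models the suspend-time caller in the NVIDIA driver. */
static void model_nv_revoke_gpu_mappings_locked(struct model_vma *vma)
{
	model_unmap_mapping_range(vma);
}
```

In this model, calling model_nv_revoke_gpu_mappings_locked() with mmap_lock not held increments `warnings` once, while the same lookup made with mmap_lock held adds no warning.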

A patch series addresses this issue and replaces follow_pte():
https://lore.kernel.org/linux-mm/[email protected]/
We cherry-pick the new functions while preserving follow_pte() for
compatibility with older drivers.

b1b46751671b mm: fix follow_pfnmap API lockdep assert
75182022a043 mm/x86: support large pfn mappings
cbea8536d933 mm/x86/pat: use the new follow_pfnmap API
6da8e9634bb7 mm: new follow_pfnmap API
6857be5fecae mm: introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud
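
The commit "mm: fix follow_pfnmap API lockdep assert" in the list above suggests the new API checks locking differently from follow_pte(). The sketch below is a hypothetical userspace model of that difference; it assumes (not stated in this report) that for file-backed VMAs the new check accepts either mmap_lock or the mapping's i_mmap_rwsem, while follow_pte() accepts only mmap_lock. All model_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a VMA: which locks the caller currently holds. */
struct model_vma {
	bool is_file_backed;    /* stands in for vma->vm_file != NULL */
	bool mmap_lock_held;    /* stands in for mm->mmap_lock */
	bool i_mmap_rwsem_held; /* stands in for the file mapping's i_mmap_rwsem */
};

/* Old check, as in follow_pte(): only mmap_lock satisfies the assertion. */
static bool model_follow_pte_lock_ok(const struct model_vma *vma)
{
	return vma->mmap_lock_held;
}

/*
 * Assumed shape of the relaxed follow_pfnmap check: for file-backed VMAs,
 * holding either lock is enough; otherwise mmap_lock is still required.
 */
static bool model_pfnmap_lock_ok(const struct model_vma *vma)
{
	if (vma->is_file_backed)
		return vma->mmap_lock_held || vma->i_mmap_rwsem_held;
	return vma->mmap_lock_held;
}
```

Under this assumption, the unmap_mapping_range() path (i_mmap_rwsem held, mmap_lock not held) fails the old check but passes the new one, which is why moving callers to the new API would silence the warning.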

[Test]
1. Boot the machine with the 6.11 kernel and the NVIDIA driver
2. Suspend and resume, then check dmesg
3. There should be no NVIDIA call trace
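
Step 3 can be automated by saving the kernel log after resume (for example with `dmesg > after-resume.log`) and scanning it. The helper below is a hypothetical sketch: its name is invented, and it assumes the log is available as one NUL-terminated string. It reports a hit only when "WARNING:" and "follow_pte" appear on the same line, matching the signature quoted in [Impact].

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if any line of a captured kernel log
 * contains "WARNING:" followed later on that same line by "follow_pte".
 */
static bool log_has_follow_pte_warning(const char *log)
{
	const char *w = log;

	while ((w = strstr(w, "WARNING:")) != NULL) {
		const char *eol = strchr(w, '\n');   /* end of this log line, if any */
		const char *hit = strstr(w, "follow_pte");

		/* Count it only if follow_pte appears before the line ends. */
		if (hit && (!eol || hit < eol))
			return true;
		w += strlen("WARNING:"); /* continue after this marker */
	}
	return false;
}
```

Matching on the same line avoids false positives from unrelated warnings that happen to precede a later mention of follow_pte.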

[Where problems could occur]
Only this patch modifies existing code, replacing follow_pfn() with
follow_pfnmap():
cbea8536d933 ("mm/x86/pat: use the new follow_pfnmap API")
The changes map one-to-one and should behave identically.
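
A one-to-one conversion can be illustrated with a hypothetical userspace model: an old-style lookup that returns a PFN through an out-parameter (shaped like follow_pfn()) and a converted caller that uses a start/end pair with an args struct (loosely modeled on the new follow_pfnmap API; the real kernel signatures and locking are not reproduced). The toy page table and all model_* names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PAGES 4

/* Hypothetical one-level "page table": index = virtual page, value = pfn (0 = unmapped). */
static const unsigned long model_ptes[MODEL_NR_PAGES] = { 0x1000, 0x1001, 0, 0x2000 };
static bool model_ptl_held; /* stands in for the page-table lock */

/* Old style: a single call that returns the pfn through an out-parameter. */
static int model_follow_pfn(unsigned long addr, unsigned long *pfn)
{
	unsigned long idx = addr >> MODEL_PAGE_SHIFT;

	if (idx >= MODEL_NR_PAGES || model_ptes[idx] == 0)
		return -1;
	*pfn = model_ptes[idx];
	return 0;
}

/* New style: inputs and outputs travel in an args struct. */
struct model_pfnmap_args {
	unsigned long address; /* input: virtual address to look up */
	unsigned long pfn;     /* output: valid between start and end */
};

static int model_pfnmap_start(struct model_pfnmap_args *args)
{
	unsigned long idx = args->address >> MODEL_PAGE_SHIFT;

	if (idx >= MODEL_NR_PAGES || model_ptes[idx] == 0)
		return -1;
	model_ptl_held = true; /* lock stays held until model_pfnmap_end() */
	args->pfn = model_ptes[idx];
	return 0;
}

static void model_pfnmap_end(struct model_pfnmap_args *args)
{
	(void)args;
	model_ptl_held = false;
}

/* Converted caller: same input and output contract as model_follow_pfn(). */
static int model_lookup_pfn_new(unsigned long addr, unsigned long *pfn)
{
	struct model_pfnmap_args args = { .address = addr };

	if (model_pfnmap_start(&args))
		return -1;
	*pfn = args.pfn; /* copy the result out while the lock is held */
	model_pfnmap_end(&args);
	return 0;
}
```

In this model both lookups return the same status and PFN for every address, and the converted caller always releases the lock before returning.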

** Affects: hwe-next
     Importance: Undecided
         Status: New

** Affects: linux-oem-6.11 (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Affects: linux-oem-6.11 (Ubuntu Noble)
     Importance: Undecided
     Assignee: AceLan Kao (acelankao)
         Status: In Progress


** Tags: jira-somerville-371 jira-stella-147 oem-priority

** Also affects: linux-oem-6.11 (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Changed in: linux-oem-6.11 (Ubuntu Noble)
       Status: New => In Progress

** Changed in: linux-oem-6.11 (Ubuntu Noble)
     Assignee: (unassigned) => AceLan Kao (acelankao)

** Changed in: linux-oem-6.11 (Ubuntu)
       Status: New => Invalid

** Tags added: jira-stella-147 oem-priority

** Tags added: jira-somerville-371

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2086668

Title:
  NVIDIA WARN_ON call trace right after power on or resumed on 6.11
  kernel

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
