** Description changed:

  SRU Justification:

  [ Impact ]

-  * KVM 2nd level guest (means KVM VM that runs nested on top of a Power 10
-    PowerVM hypervisor) hangs during LTP (Linux Test Project) test suite.
+  * KVM 2nd level guest (means KVM VM that runs nested on top of a Power 10
+    PowerVM hypervisor) hangs during LTP (Linux Test Project) test suite.

-  * It hangs with:
-    "Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab"
+  * It hangs with:
+    "Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab"

-  * Diagnosing the issue points to this fix/upstream commit:
-    [commit message, by Barry Song <[email protected]>]
-      Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE
-      modifications preceded by pte clear. While iterating over PTEs of a large folio,
-      it only starts acquiring PTL from the first valid (present) PTE.
-      PTE modifications can temporarily set PTEs to pte_none.
-      Consequently, the initial PTEs of a large folio might be skipped
-      in try_to_unmap_one().
-      For example, for an anon folio, if we skip PTE0, we may have PTE0 which is
-      still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
-      try_to_unmap_one().
-      So folio will be still mapped, the folio fails to be reclaimed and is put
-      back to LRU in this round.
-      This also breaks up PTEs optimization such as CONT-PTE on this large folio
-      and may lead to accident folio_split() afterwards.
-      And since a part of PTEs are now swap entries, accessing those parts will
-      introduce overhead - do_swap_page.
-      Although the kernel can withstand all of the above issues, the situation
-      still seems quite awkward and warrants making it more ideal.
-      The same race also occurs with small folios, but they have only one PTE,
-      thus, it won't be possible for them to be partially unmapped.
-      This patch [see below] holds PTL from PTE0, allowing us to avoid reading
-      PTE values that are in the process of being transformed. With stable PTE
-      values, we can ensure that this large folio is either completely reclaimed
-      or that all PTEs remain untouched in this round.
-      A corner case is that if we hold PTL from PTE0 and most initial PTEs have
-      been really unmapped before that, we may increase the duration of holding
-      PTL. Thus we only apply this optimization to folios which are still entirely
-      mapped (not in deferred_split list).
+  * Diagnosing the issue points to this fix/upstream commit:
+    [commit message, by Barry Song <[email protected]>]
+      Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE
+      modifications preceded by pte clear. While iterating over PTEs of a large folio,
+      it only starts acquiring PTL from the first valid (present) PTE.
+      PTE modifications can temporarily set PTEs to pte_none.
+      Consequently, the initial PTEs of a large folio might be skipped
+      in try_to_unmap_one().
+      For example, for an anon folio, if we skip PTE0, we may have PTE0 which is
+      still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
+      try_to_unmap_one().
+      So folio will be still mapped, the folio fails to be reclaimed and is put
+      back to LRU in this round.
+      This also breaks up PTEs optimization such as CONT-PTE on this large folio
+      and may lead to accident folio_split() afterwards.
+      And since a part of PTEs are now swap entries, accessing those parts will
+      introduce overhead - do_swap_page.
+      Although the kernel can withstand all of the above issues, the situation
+      still seems quite awkward and warrants making it more ideal.
+      The same race also occurs with small folios, but they have only one PTE,
+      thus, it won't be possible for them to be partially unmapped.
+      This patch [see below] holds PTL from PTE0, allowing us to avoid reading
+      PTE values that are in the process of being transformed. With stable PTE
+      values, we can ensure that this large folio is either completely reclaimed
+      or that all PTEs remain untouched in this round.
+      A corner case is that if we hold PTL from PTE0 and most initial PTEs have
+      been really unmapped before that, we may increase the duration of holding
+      PTL. Thus we only apply this optimization to folios which are still entirely
+      mapped (not in deferred_split list).

  [ Fix ]

-  * 73bc32875ee9 73bc32875ee9b1881dd780308c6793fe463fe803
-    "mm: hold PTL from the first PTE while reclaiming a large folio"
+  * 73bc32875ee9 73bc32875ee9b1881dd780308c6793fe463fe803
+    "mm: hold PTL from the first PTE while reclaiming a large folio"

  [ Test Plan ]

-  * An IBM Power 10 system (where PowerVM is mandatory)
-    running Ubuntu Server 24.04 (kernel 6.8) or later
-    with (nested) KVM setup (so KVM on top of PowerVM).
+  * An IBM Power 10 system (where PowerVM is mandatory)
+    running Ubuntu Server 24.04 (kernel 6.8) or later
+    with (nested) KVM setup (so KVM on top of PowerVM).

-  * Run LTP test suite
-    Tests running: SLS(io,base)
+  * Run LTP test suite
+    Tests running: SLS(io,base)

-  * Without the patch the above test will hang with
-    Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab
+  * Without the patch the above test will hang with
+    Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

  [ Where problems could occur ]

-  * This is a common code change in the memory management sub-system,
-    hence great care needs to be taken, even if it was discussed upfront
-    at the https://lore.kernel.org/ mailing list and the upstream commit
-    provenance shows that many eyes had a look at this.
+  * This is a common code change in the memory management sub-system,
+    hence great care needs to be taken, even if it was discussed upfront
+    at the https://lore.kernel.org/ mailing list and the upstream commit
+    provenance shows that many eyes had a look at this.

-  * The modification is relatively small with just one if statement
-    (across two lines) in mm/vmscan.c.
+  * The modification is relatively small with just one if statement
+    (across two lines) in mm/vmscan.c.

-  * This change is to assist 'try_to_unmap' to acquire page table locks (PTL)
-    from the first page table entry (PTE) and to eliminate the influence of
-    temporary and volatile PTE values.
+  * This change is to assist 'try_to_unmap' to acquire page table locks (PTL)
+    from the first page table entry (PTE) and to eliminate the influence of
+    temporary and volatile PTE values.

-  * If done wrong it can especially have a negative impact in case of large folios,
-    and wrong hints might be given to try_to_unmap
-    which may lead to bad page swapping.
+  * If done wrong it can especially have a negative impact in case of large folios,
+    and wrong hints might be given to try_to_unmap
+    which may lead to bad page swapping.

-  * In case of an issue with this patch the result can also be decreased
-    performance and efficiency in the page table handling - the opposite
-    of what the patch is supposed to address.
+  * In case of an issue with this patch the result can also be decreased
+    performance and efficiency in the page table handling - the opposite
+    of what the patch is supposed to address.

-  * Fortunately several developers had their eyes on this commit,
-    as the provenance of the patch and the discussion ot lkml shows.
+  * Fortunately several developers had their eyes on this commit,
+    as the provenance of the patch and the discussion at lkml shows.

  [ Other Info ]
-
-  * The commit is upstream since v6.10(-rc1), hence it will be included
-    in oracular with the planned target kernel.
+
+  * The commit is upstream since v6.10(-rc1), hence it will be included
+    in oracular with the planned target kernel.

  __________

  == Comment: #0 - SEETEENA THOUFEEK <[email protected]> - 2024-08-06 00:20:57 ==
  +++ This bug was initially created as a clone of Bug #206372 +++

  ---Problem Description---
  L2 Guest hung during LTP Tests.
  Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

  ---uname output---
  NA

  ---Additional Hardware Info---
  NA

  Contact Information = na

  ---Debugger Data---
  NA

  ---Patches Installed---
  NA

  ---Steps to Reproduce---
  Tests running: SLS(io,base)

  LPAR Config:
  ============
  PHYP Environment: PowerVM
  LPAR Hostname/IP: 10.33.2.107
  Rootvg Filesystem: xfs
  Network Interface: Shiner-T
  vNIC/SR-IOV Config: n/a
  IO Type: SAN
  IO Disk Type: raw
  Multipath Enabled: No
  -------------------------------------------------------------------------------------
  DUMP Config:
  ============
  KDUMP configured: Yes
  XMON enabled no
  DUMP Available: no

  Machine Type = na

  Userspace rpm: NA

  The userspace tool has the following bit modes: NA

  Userspace tool obtained from project website: na

  Userspace tool common name: NA

  *Additional Instructions for na:
  -Post a private note with access information to the machine that is currently in the debugger.
  -Attach ltrace and strace of userspace application.

  Please include this commit in Ubuntu 24.04.

  Upstream commit which is solving these data store lockups:
  73bc32875ee9b1881dd780308c6793fe463fe803 mm: hold PTL from the first PTE while reclaiming a large folio
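The condition described in the commit message (hold the PTL from the first PTE only for large folios that are still entirely mapped) can be modeled in user space roughly as follows. This is a minimal sketch under stated assumptions, not the kernel patch: `struct folio_model`, the `MODEL_*` flag names and `reclaim_unmap_flags()` are hypothetical stand-ins introduced here for illustration, and the flag values are arbitrary.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's folio state; not the real struct folio. */
struct folio_model {
    unsigned int nr_pages;        /* number of base pages (and PTEs) in the folio */
    bool on_deferred_split_list;  /* true if the folio is already partially unmapped */
};

/* Illustrative flag bits only; these are not the kernel's unmap flag values. */
enum model_unmap_flags {
    MODEL_UNMAP_DEFAULT           = 1u << 0,
    MODEL_HOLD_PTL_FROM_FIRST_PTE = 1u << 1,
};

/*
 * Decide which hints reclaim would pass to the unmap step.
 * Only a large folio that is still entirely mapped asks for the page
 * table lock to be held from the first PTE; small folios and folios on
 * the deferred split list keep the default behaviour.
 */
static unsigned int reclaim_unmap_flags(const struct folio_model *folio)
{
    unsigned int flags = MODEL_UNMAP_DEFAULT;

    if (folio->nr_pages > 1 &&
        !folio->on_deferred_split_list)
        flags |= MODEL_HOLD_PTL_FROM_FIRST_PTE;

    return flags;
}
```

In this model a 16-page folio that is not on the deferred split list gets the extra hint, while a single-page folio or a folio already queued for deferred split does not, which mirrors the corner case the commit message calls out.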
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076147

Title:
  Add 'mm: hold PTL from the first PTE while reclaiming a large folio'
  to fix L2 Guest hang during LTP Test

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2076147/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
