** Description changed:

  SRU Justification:

  [ Impact ]

   * A KVM guest (VM) that got live migrated between two Power 10 systems
     (using nested virtualization, i.e. KVM on top of PowerVM) will very
     likely crash after about an hour.

   * At that point the live migration itself looked like it had already
     completed successfully, but it had not, and the crash is caused by
     this.

  [ Test Plan ]

   * Set up two Power 10 systems (with firmware level FW1060 or newer,
     which supports nested KVM) with Ubuntu Server 24.04 for ppc64el.

   * Set up a qemu/KVM environment that allows live migrating a KVM
     guest from one P10 system to the other.

   * (The disk type does not seem to matter; NFS-based disk storage can
     be used, for example.)

   * After about an hour the live-migrated guest is likely to crash.
     Hence wait for 2 hours (which increases the likelihood) until a
     crash in
     "migrate_misplaced_folio+0x540/0x5d0"
     occurs. (A minimal sketch of driving the migration step follows
     below.)
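  For convenience, the migration step of the test plan can be scripted.
  The following is only a minimal sketch, assuming a libvirt-managed
  guest and the purely hypothetical names "noble-guest" (guest) and
  "p10-dst" (destination host), which are not taken from this bug report:

  /*
   * Minimal sketch of the live-migration step of the test plan.
   * Assumptions (placeholders, not from the bug report): a libvirt-managed
   * guest called "noble-guest" and a destination host reachable as "p10-dst".
   * Build with: gcc -o migrate-test migrate-test.c
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
      /* Live-migrate the guest to the second Power 10 system. */
      int rc = system("virsh migrate --live --persistent noble-guest "
                      "qemu+ssh://p10-dst/system");
      if (rc != 0) {
          fprintf(stderr, "virsh migrate failed (rc=%d)\n", rc);
          return 1;
      }

      /* Without the fixes the guest tends to crash within ~1-2 hours, so
       * wait two hours before checking its console/dmesg on the
       * destination host for "migrate_misplaced_folio". */
      sleep(2 * 60 * 60);
      puts("Check the guest on the destination host for a crash now.");
      return 0;
  }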
  [ Where problems could occur ]

   * The 'fix' (avoiding calls to folio_likely_mapped_shared in cases
     where the folio might have already been unmapped) and the move of
     the checks under the page table lock (PTL) might have an impact on
     page table locking if done wrong, which may lead to wrong locks,
     blocked memory and finally crashes. (A generic userspace sketch of
     the 'check under the lock' pattern follows below.)

   * The direct folio calls in mm/huge_memory.c and mm/memory.c are now
     indirect, which may lead to different behaviour and side effects.
     However, isolation is still done, just slightly differently: instead
     of using numamigrate_isolate_folio, it now happens in (the renamed)
     migrate_misplaced_folio_prepare.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]
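  For illustration only, and not the actual kernel code: the following
  userspace sketch shows the generic 'check under the lock' pattern that
  the commit applies to the NUMA hinting fault path, with a pthread mutex
  standing in for the page table lock; all names are made up:

  /*
   * Userspace analogy (NOT kernel code) for moving checks under the lock:
   * if a state check runs before the lock is taken, the state can change
   * before the action happens; re-checking under the lock avoids acting
   * on stale state.  All names here are invented for illustration.
   */
  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static bool still_mapped = true;  /* stands in for "folio is still mapped" */

  /* Racy pattern: the check happens outside the lock. */
  static void migrate_racy(void)
  {
      if (!still_mapped)            /* state may change right after this */
          return;
      pthread_mutex_lock(&lock);
      /* ... acts on a possibly stale assumption ... */
      pthread_mutex_unlock(&lock);
  }

  /* Safer pattern: take the lock first, then (re-)check, then act. */
  static void migrate_checked(void)
  {
      pthread_mutex_lock(&lock);
      if (still_mapped) {
          /* ... the state cannot change while the lock is held ... */
      }
      pthread_mutex_unlock(&lock);
  }

  int main(void)
  {
      migrate_racy();
      migrate_checked();
      puts("done");
      return 0;
  }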
   * Fixing a confusing return code, so that 0 is now simply returned on
     success, clarifies the return code handling and usage and was mainly
     done in preparation of further changes, but it can have bad side
     effects if the old return code is still used as-is elsewhere in the
     code.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]

   * The fix addresses the fact that NUMA balancing so far prohibited
     mTHP (multi-size Transparent Hugepage Support), which seems
     unreasonable in case of an exclusive mapping. Allowing it seems to
     bring significant performance improvements (see the commit message
     of d2136d749d76), but it introduced significant changes to PTE
     mapping and modification and even relies on further commits:
     859d4adc3415 ("mm: numa: do not trap faults on shared data section pages")
     80d47f5de5e3 ("mm: don't try to NUMA-migrate COW pages that have other uses")
     This can cause issues on systems configured for THP, may confuse the
     ordering, and may even lead to memory corruption. It may especially
     hit (NUMA) systems with high core counts, where balancing is needed
     more often. (A sketch of the per-size mTHP sysfs switches follows
     below.)

   * Further upstream conversations:
     https://lore.kernel.org/all/[email protected]/
     https://lkml.kernel.org/r/c33a5c0b0a0323b1f8ed53772f50501f4b196e25.1712132950.git.baolin.w...@linux.alibaba.com
     https://lkml.kernel.org/r/d28d276d599c26df7f38c9de8446f60e22dd1950.1711683069.git.baolin.w...@linux.alibaba.com
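  As a side note, the per-size mTHP switches live under
  /sys/kernel/mm/transparent_hugepage/. The sketch below enables one size
  so the new mTHP NUMA balancing path can be exercised at all; the
  hugepages-64kB directory is only an assumed example, since the
  available sizes depend on the architecture and base page size:

  /*
   * Sketch: enable one mTHP size so NUMA balancing of multi-size THP (as
   * allowed by d2136d749d76) can actually be exercised.  The size directory
   * used here (hugepages-64kB) is only an example; the sizes available under
   * /sys/kernel/mm/transparent_hugepage/ depend on the architecture and base
   * page size.  Needs root.
   */
  #include <stdio.h>

  int main(void)
  {
      const char *knob =
          "/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled";
      FILE *f = fopen(knob, "w");

      if (!f) {
          perror(knob);
          return 1;
      }
      /* "always" makes the kernel try this folio size for anonymous memory;
       * "inherit" would follow the global THP setting instead. */
      fputs("always\n", f);
      fclose(f);
      return 0;
  }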
   * The refactoring that moves the code for rebuilding the NUMA mapping
     into a new helper seems to be straightforward, since the active code
     stays unchanged; the new function only needs to be callable, which
     is the case since it all lives in mm/memory.c.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/8bc2586bdd8dbbe6d83c09b77b360ec8fcac3736.1711683069.git.baolin.w...@linux.alibaba.com

   * The refactoring of folio_estimated_sharers to
     folio_likely_mapped_shared is more significant, since the logic
     changed from 'estimate the number of sharers of a folio'
     (folio_estimated_sharers) to 'estimate if the folio is mapped into
     the page tables of more than one MM' (folio_likely_mapped_shared).

   * Since this is an estimation, the results may be unpredictable
     (especially for bigger folios) and not as expected or assumed (there
     are quite a few side notes in the code comments of ebb34f78d72c that
     mention potentially fuzzy results), hence this may lead to
     unforeseen behavior.

   * The condition statements became clearer since they are now based on
     (more or less obvious) number counts, but they can still be
     erroneous in case the underlying estimate is incorrect.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]

   * mm/numa_balancing: allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy:
     Commit 133d04b1eee9 extends commit bda420b98505 ("numa balancing:
     migrate on fault among multiple bound nodes") from allowing NUMA
     fault migrations when the executing node is part of the policy mask
     for MPOL_BIND, to also supporting the MPOL_PREFERRED_MANY policy.
     Both cases (MPOL_BIND and MPOL_PREFERRED_MANY) are treated in the
     same way. If the NUMA topology is not correctly considered, changes
     here may lead to decreased memory performance. However, the code
     changes themselves are relatively traceable.

   * Further upstream conversations:
     https://lkml.kernel.org/r/158acc57319129aa46d50fd64c9330f3e7c7b4bf.1711373653.git.donet...@linux.ibm.com
     https://lkml.kernel.org/r/369d6a58758396335fd1176d97bbca4e7730d75a.1709909210.git.donet...@linux.ibm.com

   * Finally, commit f8fd525ba3a2 ("mm/mempolicy: use numa_node_id()
     instead of cpu_to_node()") is part of a patch set to further
     optimize cross-socket memory access with the MPOL_PREFERRED_MANY
     policy. The mpol_misplaced changes mainly move from cpu_to_node to
     numa_node_id and with that make the code more NUMA aware; based on
     that, vm_fault/vmf needs to be considered instead of
     vm_area_struct/vma. This may have consequences for the memory policy
     itself. (A small userspace illustration of the policy and the
     'current node' notion follows below.)

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/6059f034f436734b472d066db69676fb3a459864.1711373653.git.donet...@linux.ibm.com
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/744646531af02cc687cde8ae788fb1779e99d02c.1709909210.git.donet...@linux.ibm.com
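  For context, a small userspace illustration (not the kernel change
  itself): it sets the MPOL_PREFERRED_MANY policy discussed above via the
  raw set_mempolicy(2) syscall and prints the node the task currently
  executes on, which is the notion numa_node_id() provides inside the
  kernel. The nodemask value assumes nodes 0 and 1 are online, and the
  MPOL_PREFERRED_MANY constant is defined locally in case the installed
  headers do not ship it:

  /*
   * Userspace illustration (not the kernel patch itself): set the
   * MPOL_PREFERRED_MANY task policy via the raw set_mempolicy(2) syscall and
   * print the NUMA node this task currently runs on - the same "executing
   * node" notion that numa_node_id() provides in the kernel.
   */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  #ifndef MPOL_PREFERRED_MANY
  #define MPOL_PREFERRED_MANY 5  /* kernel uapi value, kernels >= 5.15 */
  #endif

  int main(void)
  {
      /* Prefer nodes 0 and 1 (bits 0 and 1); adjust to the local topology. */
      unsigned long nodemask = 0x3;
      unsigned int cpu = 0, node = 0;

      if (syscall(SYS_set_mempolicy, MPOL_PREFERRED_MANY,
                  &nodemask, 8 * sizeof(nodemask)) != 0) {
          perror("set_mempolicy(MPOL_PREFERRED_MANY)");
          return 1;
      }

      if (getcpu(&cpu, &node) != 0) {
          perror("getcpu");
          return 1;
      }
      printf("running on cpu %u, node %u\n", cpu, node);
      return 0;
  }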
   * The overall patch set touches quite a bit of common code, but the
     modifications were intensely discussed with many experts in the
     various mailing-list threads that are referenced above.

  [ Other Info ]

   * The first two "mm/migrate" commits are the newest and were accepted
     upstream with kernel v6.11(-rc1); all others are upstream already
     since v6.10(-rc1).

   * Hence oracular (with a planned target kernel of 6.11) is not
     affected, and the SRU is for noble only.

   * And since (nested) KVM virtualization on ppc64el was (re-)introduced
     just with noble, no Ubuntu release older than noble is affected.

  __________

  == Comment: #0 - SEETEENA THOUFEEK <[email protected]> - 2024-08-12 23:50:17 ==
  +++ This bug was initially created as a clone of Bug #207985 +++

  ---Problem Description---
  Post migration, the non-MDC L1 eralp1 crashed with
  migrate_misplaced_folio+0x4cc/0x5d0

  Machine Type = na
  Contact Information = [email protected]

  ---Steps to Reproduce---
  Problem description: After 1 hour of successful migration from doodlp1
  [MDC mode] to eralp1 [non-MDC mode], the eralp1 guest crashed and a
  dump was collected.

  ---uname output---
  na

  ---Debugger---
  A debugger is not configured

  [281827.975244] NIP [c0000000005f0620] migrate_misplaced_folio+0x4f0/0x5d0
  [281827.975251] LR [c0000000005f067c] migrate_misplaced_folio+0x54c/0x5d0
  [281827.975258] Call Trace:
  [281827.975260] [c000001e19ff7140] [c0000000005f0670] migrate_misplaced_folio+0x540/0x5d0 (unreliable)
  [281827.975268] [c000001e19ff71d0] [c00000000054c9f0] __handle_mm_fault+0xf70/0x28e0
  [281827.975276] [c000001e19ff7310] [c00000000054e478] handle_mm_fault+0x118/0x400
  [281827.975284] [c000001e19ff7360] [c00000000053598c] __get_user_pages+0x1ec/0x5b0
  [281827.975291] [c000001e19ff7420] [c000000000536920] get_user_pages_unlocked+0x120/0x4f0
  [281827.975298] [c000001e19ff74c0] [c00800001894ea9c] hva_to_pfn+0xf4/0x630 [kvm]
  [281827.975316] [c000001e19ff7550] [c008000018b4efc4] kvmppc_book3s_instantiate_page+0xec/0x790 [kvm_hv]
  [281827.975326] [c000001e19ff7660] [c008000018b4f750] kvmppc_book3s_radix_page_fault+0xe8/0x380 [kvm_hv]
  [281827.975335] [c000001e19ff7700] [c008000018b488fc] kvmppc_book3s_hv_page_fault+0x294/0xd60 [kvm_hv]
  [281827.975344] [c000001e19ff77e0] [c008000018b43f5c] kvmppc_vcpu_run_hv+0xf94/0x11d0 [kvm_hv]
  [281827.975352] [c000001e19ff78a0] [c00800001896131c] kvmppc_vcpu_run+0x34/0x48 [kvm]
  [281827.975365] [c000001e19ff78c0] [c00800001895c164] kvm_arch_vcpu_ioctl_run+0x39c/0x570 [kvm]
  [281827.975379] [c000001e19ff7950] [c00800001894a104] kvm_vcpu_ioctl+0x20c/0x9a8 [kvm]
  [281827.975391] [c000001e19ff7b30] [c000000000683974] sys_ioctl+0x574/0x16a0
  [281827.975395] [c000001e19ff7c30] [c000000000030838] system_call_exception+0x168/0x310
  [281827.975400] [c000001e19ff7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
  [281827.975406] --- interrupt: 3000 at 0x7fffb7d4d2bc

  Mirroring to distro as per message in group channel.

  Please pick these patches for this bug:

  ee86814b0562 ("mm/migrate: move NUMA hinting fault folio isolation + checks under PTL")
  4b88c23ab8c9 ("mm/migrate: make migrate_misplaced_folio() return 0 on success")
  d2136d749d76 ("mm: support multi-size THP numa balancing")
  6b0ed7b3c775 ("mm: factor out the numa mapping rebuilding into a new helper")
  ebb34f78d72c ("mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()")
  133d04b1eee9 ("mm/numa_balancing: allow migrate on protnone reference with MPOL_PREFERRED_MANY policy")
  f8fd525ba3a2 ("mm/mempolicy: use numa_node_id() instead of cpu_to_node()")

  Thanks,
  Amit
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076866

Title:
  Guest crahses post migration with migrate_misplaced_folio+0x4cc/0x5d0
