** Description changed:

  SRU Justification:

  [ Impact ]

   * A KVM guest (VM) that got live migrated between two Power 10 systems
     (using nested virtualization, i.e. KVM on top of PowerVM) will very
     likely crash after about an hour.

   * At that point the live migration itself looked like it had already
     completed successfully, but it had not, and the crash is caused by
     this.

  [ Test Plan ]

   * Set up two Power 10 systems (with firmware level FW1060 or newer,
     which supports nested KVM) with Ubuntu Server 24.04 for ppc64el.

   * Set up a qemu/KVM environment that allows live migrating a KVM
     guest from one P10 system to the other.

   * (The disk type does not seem to matter; NFS-based disk storage can
     be used, for example.)

   * After about an hour the live-migrated guest is likely to crash.
     Hence wait for 2 hours (which increases the likelihood) until a
     crash in
     "migrate_misplaced_folio+0x540/0x5d0"
     occurs. (A minimal sketch of driving the migration step follows
     below.)
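  For convenience, the migration step of the test plan can be scripted.
  The following is only a minimal sketch, assuming a libvirt-managed
  guest and the purely hypothetical names "noble-guest" (guest) and
  "p10-dst" (destination host), which are not taken from this bug report:

  /*
   * Minimal sketch of the live-migration step of the test plan.
   * Assumptions (placeholders, not from the bug report): a libvirt-managed
   * guest called "noble-guest" and a destination host reachable as "p10-dst".
   * Build with: gcc -o migrate-test migrate-test.c
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
      /* Live-migrate the guest to the second Power 10 system. */
      int rc = system("virsh migrate --live --persistent noble-guest "
                      "qemu+ssh://p10-dst/system");
      if (rc != 0) {
          fprintf(stderr, "virsh migrate failed (rc=%d)\n", rc);
          return 1;
      }

      /* Without the fixes the guest tends to crash within ~1-2 hours, so
       * wait two hours before checking its console/dmesg on the
       * destination host for "migrate_misplaced_folio". */
      sleep(2 * 60 * 60);
      puts("Check the guest on the destination host for a crash now.");
      return 0;
  }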
  [ Where problems could occur ]

   * The 'fix' (avoiding calls to folio_likely_mapped_shared in cases
     where the folio might have already been unmapped) and the move of
     the checks under the page table lock (PTL) might have an impact on
     page table locking if done wrong, which may lead to wrong locks,
     blocked memory and finally crashes. (A generic userspace sketch of
     the 'check under the lock' pattern follows below.)

   * The direct folio calls in mm/huge_memory.c and mm/memory.c are now
     indirect, which may lead to different behaviour and side effects.
     However, isolation is still done, just slightly differently: instead
     of using numamigrate_isolate_folio, it now happens in (the renamed)
     migrate_misplaced_folio_prepare.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]
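  For illustration only, and not the actual kernel code: the following
  userspace sketch shows the generic 'check under the lock' pattern that
  the commit applies to the NUMA hinting fault path, with a pthread mutex
  standing in for the page table lock; all names are made up:

  /*
   * Userspace analogy (NOT kernel code) for moving checks under the lock:
   * if a state check runs before the lock is taken, the state can change
   * before the action happens; re-checking under the lock avoids acting
   * on stale state.  All names here are invented for illustration.
   */
  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static bool still_mapped = true;  /* stands in for "folio is still mapped" */

  /* Racy pattern: the check happens outside the lock. */
  static void migrate_racy(void)
  {
      if (!still_mapped)            /* state may change right after this */
          return;
      pthread_mutex_lock(&lock);
      /* ... acts on a possibly stale assumption ... */
      pthread_mutex_unlock(&lock);
  }

  /* Safer pattern: take the lock first, then (re-)check, then act. */
  static void migrate_checked(void)
  {
      pthread_mutex_lock(&lock);
      if (still_mapped) {
          /* ... the state cannot change while the lock is held ... */
      }
      pthread_mutex_unlock(&lock);
  }

  int main(void)
  {
      migrate_racy();
      migrate_checked();
      puts("done");
      return 0;
  }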
   * Fixing a confusing return code, so that 0 is now simply returned on
     success, clarifies the return code handling and usage and was mainly
     done in preparation of further changes, but it can have bad side
     effects if the old return code is still used as-is elsewhere in the
     code.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]

   * The fix addresses the fact that NUMA balancing so far prohibited
     mTHP (multi-size Transparent Hugepage Support), which seems
     unreasonable in case of an exclusive mapping. Allowing it seems to
     bring significant performance improvements (see the commit message
     of d2136d749d76), but it introduced significant changes to PTE
     mapping and modification and even relies on further commits:
     859d4adc3415 ("mm: numa: do not trap faults on shared data section pages")
     80d47f5de5e3 ("mm: don't try to NUMA-migrate COW pages that have other uses")
     This can cause issues on systems configured for THP, may confuse the
     ordering, and may even lead to memory corruption. It may especially
     hit (NUMA) systems with high core counts, where balancing is needed
     more often. (A sketch of the per-size mTHP sysfs switches follows
     below.)

   * Further upstream conversations:
     https://lore.kernel.org/all/[email protected]/
     https://lkml.kernel.org/r/c33a5c0b0a0323b1f8ed53772f50501f4b196e25.1712132950.git.baolin.w...@linux.alibaba.com
     https://lkml.kernel.org/r/d28d276d599c26df7f38c9de8446f60e22dd1950.1711683069.git.baolin.w...@linux.alibaba.com
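  As a side note, the per-size mTHP switches live under
  /sys/kernel/mm/transparent_hugepage/. The sketch below enables one size
  so the new mTHP NUMA balancing path can be exercised at all; the
  hugepages-64kB directory is only an assumed example, since the
  available sizes depend on the architecture and base page size:

  /*
   * Sketch: enable one mTHP size so NUMA balancing of multi-size THP (as
   * allowed by d2136d749d76) can actually be exercised.  The size directory
   * used here (hugepages-64kB) is only an example; the sizes available under
   * /sys/kernel/mm/transparent_hugepage/ depend on the architecture and base
   * page size.  Needs root.
   */
  #include <stdio.h>

  int main(void)
  {
      const char *knob =
          "/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled";
      FILE *f = fopen(knob, "w");

      if (!f) {
          perror(knob);
          return 1;
      }
      /* "always" makes the kernel try this folio size for anonymous memory;
       * "inherit" would follow the global THP setting instead. */
      fputs("always\n", f);
      fclose(f);
      return 0;
  }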
   * The refactoring that moves the code for rebuilding the NUMA mapping
     into a new helper seems to be straightforward, since the active code
     stays unchanged; the new function only needs to be callable, which
     is the case since it all lives in mm/memory.c.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/8bc2586bdd8dbbe6d83c09b77b360ec8fcac3736.1711683069.git.baolin.w...@linux.alibaba.com

   * The refactoring of folio_estimated_sharers to
     folio_likely_mapped_shared is more significant, since the logic
     changed from 'estimate the number of sharers of a folio'
     (folio_estimated_sharers) to 'estimate if the folio is mapped into
     the page tables of more than one MM' (folio_likely_mapped_shared).

   * Since this is an estimation, the results may be unpredictable
     (especially for bigger folios) and not as expected or assumed (there
     are quite a few side notes in the code comments of ebb34f78d72c that
     mention potentially fuzzy results), hence this may lead to
     unforeseen behavior.

   * The condition statements became clearer since they are now based on
     (more or less obvious) number counts, but they can still be
     erroneous in case the underlying estimate is incorrect.

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/[email protected]

   * mm/numa_balancing: allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy:
     Commit 133d04b1eee9 extends commit bda420b98505 ("numa balancing:
     migrate on fault among multiple bound nodes") from allowing NUMA
     fault migrations when the executing node is part of the policy mask
     for MPOL_BIND, to also supporting the MPOL_PREFERRED_MANY policy.
     Both cases (MPOL_BIND and MPOL_PREFERRED_MANY) are treated in the
     same way. If the NUMA topology is not correctly considered, changes
     here may lead to decreased memory performance. However, the code
     changes themselves are relatively traceable.

   * Further upstream conversations:
     https://lkml.kernel.org/r/158acc57319129aa46d50fd64c9330f3e7c7b4bf.1711373653.git.donet...@linux.ibm.com
     https://lkml.kernel.org/r/369d6a58758396335fd1176d97bbca4e7730d75a.1709909210.git.donet...@linux.ibm.com

   * Finally, commit f8fd525ba3a2 ("mm/mempolicy: use numa_node_id()
     instead of cpu_to_node()") is part of a patch set to further
     optimize cross-socket memory access with the MPOL_PREFERRED_MANY
     policy. The mpol_misplaced changes mainly move from cpu_to_node to
     numa_node_id and with that make the code more NUMA aware; based on
     that, vm_fault/vmf needs to be considered instead of
     vm_area_struct/vma. This may have consequences for the memory policy
     itself. (A small userspace illustration of the policy and the
     'current node' notion follows below.)

   * Further upstream conversations:
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/6059f034f436734b472d066db69676fb3a459864.1711373653.git.donet...@linux.ibm.com
     https://lkml.kernel.org/r/[email protected]
     https://lkml.kernel.org/r/744646531af02cc687cde8ae788fb1779e99d02c.1709909210.git.donet...@linux.ibm.com
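  For context, a small userspace illustration (not the kernel change
  itself): it sets the MPOL_PREFERRED_MANY policy discussed above via the
  raw set_mempolicy(2) syscall and prints the node the task currently
  executes on, which is the notion numa_node_id() provides inside the
  kernel. The nodemask value assumes nodes 0 and 1 are online, and the
  MPOL_PREFERRED_MANY constant is defined locally in case the installed
  headers do not ship it:

  /*
   * Userspace illustration (not the kernel patch itself): set the
   * MPOL_PREFERRED_MANY task policy via the raw set_mempolicy(2) syscall and
   * print the NUMA node this task currently runs on - the same "executing
   * node" notion that numa_node_id() provides in the kernel.
   */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  #ifndef MPOL_PREFERRED_MANY
  #define MPOL_PREFERRED_MANY 5  /* kernel uapi value, kernels >= 5.15 */
  #endif

  int main(void)
  {
      /* Prefer nodes 0 and 1 (bits 0 and 1); adjust to the local topology. */
      unsigned long nodemask = 0x3;
      unsigned int cpu = 0, node = 0;

      if (syscall(SYS_set_mempolicy, MPOL_PREFERRED_MANY,
                  &nodemask, 8 * sizeof(nodemask)) != 0) {
          perror("set_mempolicy(MPOL_PREFERRED_MANY)");
          return 1;
      }

      if (getcpu(&cpu, &node) != 0) {
          perror("getcpu");
          return 1;
      }
      printf("running on cpu %u, node %u\n", cpu, node);
      return 0;
  }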
   * The overall patch set touches quite a bit of common code, but the
     modifications were intensely discussed with many experts in the
     various mailing-list threads that are referenced above.

  [ Other Info ]

   * The first two "mm/migrate" commits are the newest and were accepted
     upstream with kernel v6.11(-rc1); all others are upstream already
     since v6.10(-rc1).

   * Hence oracular (with a planned target kernel of 6.11) is not
     affected, and the SRU is for noble only.

   * And since (nested) KVM virtualization on ppc64el was (re-)introduced
     just with noble, no Ubuntu release older than noble is affected.

  __________

  == Comment: #0 - SEETEENA THOUFEEK <[email protected]> - 2024-08-12 23:50:17 ==
  +++ This bug was initially created as a clone of Bug #207985 +++

  ---Problem Description---
  Post migration, the non-MDC L1 eralp1 crashed with
  migrate_misplaced_folio+0x4cc/0x5d0

  Machine Type = na
  Contact Information = [email protected]

  ---Steps to Reproduce---
  Problem description: After 1 hour of successful migration from doodlp1
  [MDC mode] to eralp1 [non-MDC mode], the eralp1 guest crashed and a
  dump was collected.

  ---uname output---
  na

  ---Debugger---
  A debugger is not configured

  [281827.975244] NIP [c0000000005f0620] migrate_misplaced_folio+0x4f0/0x5d0
  [281827.975251] LR [c0000000005f067c] migrate_misplaced_folio+0x54c/0x5d0
  [281827.975258] Call Trace:
  [281827.975260] [c000001e19ff7140] [c0000000005f0670] migrate_misplaced_folio+0x540/0x5d0 (unreliable)
  [281827.975268] [c000001e19ff71d0] [c00000000054c9f0] __handle_mm_fault+0xf70/0x28e0
  [281827.975276] [c000001e19ff7310] [c00000000054e478] handle_mm_fault+0x118/0x400
  [281827.975284] [c000001e19ff7360] [c00000000053598c] __get_user_pages+0x1ec/0x5b0
  [281827.975291] [c000001e19ff7420] [c000000000536920] get_user_pages_unlocked+0x120/0x4f0
  [281827.975298] [c000001e19ff74c0] [c00800001894ea9c] hva_to_pfn+0xf4/0x630 [kvm]
  [281827.975316] [c000001e19ff7550] [c008000018b4efc4] kvmppc_book3s_instantiate_page+0xec/0x790 [kvm_hv]
  [281827.975326] [c000001e19ff7660] [c008000018b4f750] kvmppc_book3s_radix_page_fault+0xe8/0x380 [kvm_hv]
  [281827.975335] [c000001e19ff7700] [c008000018b488fc] kvmppc_book3s_hv_page_fault+0x294/0xd60 [kvm_hv]
  [281827.975344] [c000001e19ff77e0] [c008000018b43f5c] kvmppc_vcpu_run_hv+0xf94/0x11d0 [kvm_hv]
  [281827.975352] [c000001e19ff78a0] [c00800001896131c] kvmppc_vcpu_run+0x34/0x48 [kvm]
  [281827.975365] [c000001e19ff78c0] [c00800001895c164] kvm_arch_vcpu_ioctl_run+0x39c/0x570 [kvm]
  [281827.975379] [c000001e19ff7950] [c00800001894a104] kvm_vcpu_ioctl+0x20c/0x9a8 [kvm]
  [281827.975391] [c000001e19ff7b30] [c000000000683974] sys_ioctl+0x574/0x16a0
  [281827.975395] [c000001e19ff7c30] [c000000000030838] system_call_exception+0x168/0x310
  [281827.975400] [c000001e19ff7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
  [281827.975406] --- interrupt: 3000 at 0x7fffb7d4d2bc

  Mirroring to distro as per message in group channel.

  Please pick these patches for this bug:

  ee86814b0562 ("mm/migrate: move NUMA hinting fault folio isolation + checks under PTL")
  4b88c23ab8c9 ("mm/migrate: make migrate_misplaced_folio() return 0 on success")
  d2136d749d76 ("mm: support multi-size THP numa balancing")
  6b0ed7b3c775 ("mm: factor out the numa mapping rebuilding into a new helper")
  ebb34f78d72c ("mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()")
  133d04b1eee9 ("mm/numa_balancing: allow migrate on protnone reference with MPOL_PREFERRED_MANY policy")
  f8fd525ba3a2 ("mm/mempolicy: use numa_node_id() instead of cpu_to_node()")

  Thanks,
  Amit
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076866

Title:
  Guest crahses post migration with migrate_misplaced_folio+0x4cc/0x5d0
