** Description changed:

  SRU Justification:
  
  [ Impact ]
  
-  * While running a (nested) KVM guest on Power 10 (with PowerVM)
-    and performing a CPU hotplug, trying to set to 68 vCPUs,
-    the KVM guest crashes.
+  * While running a (nested) KVM guest on Power 10 (with PowerVM)
+    and performing a CPU hotplug, trying to set it to 68 vCPUs,
+    the KVM guest crashes.
  
-  * In the failure case the KVM guest has maxvcpus 128,
-    and it starts fine with an initial value of 4 vCPUs,
-    but fails after a larger increase (here to 68 vCPUs).
+  * In the failure case the KVM guest has maxvcpus 128,
+    and it starts fine with an initial value of 4 vCPUs,
+    but fails after a larger increase (here to 68 vCPUs).
  
-  * The error reported is:
-    [ 662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
-    error: Unable to read from monitor: Connection reset by peer
+  * The error reported is:
+    [ 662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
+    error: Unable to read from monitor: Connection reset by peer
  
-  * This especially seems to happen in memory constraint systems.
+  * This especially seems to happen on memory-constrained systems.
  
-  * This can be avoided by pre-creating and parking vCPUs on success
-    or return error otherwise, which then leads to a graceful error 
-    in case of a vCPU hotplug failure, while the guest keeps running.
+  * This can be avoided by pre-creating and parking vCPUs on success,
+    or returning an error otherwise, which then leads to a graceful error
+    in case of a vCPU hotplug failure, while the guest keeps running.
  
  [ Fix ]
  
-  * 08c3286822 ("accel/kvm: Extract common KVM vCPU {creation,parking} code") [pre-req]
+  * 08c3286822 ("accel/kvm: Extract common KVM vCPU {creation,parking} code") [pre-req]
  
-  * c6a3d7bc9e ("accel/kvm: Introduce kvm_create_and_park_vcpu() helper")
+  * c6a3d7bc9e ("accel/kvm: Introduce kvm_create_and_park_vcpu() helper")
  
-  * 18530e7c57 ("cpu-common.c: export cpu_get_free_index to be reused later")
+  * 18530e7c57 ("cpu-common.c: export cpu_get_free_index to be reused later")
  
-  * cfb52d07f5 ("target/ppc: handle vcpu hotplug failure gracefully")
+  * cfb52d07f5 ("target/ppc: handle vcpu hotplug failure gracefully")
  
  [ Test Plan ]
  
-  * Setup an IBM Power10 system (with firmware FW1060 or newer,
-    that comes with nested KVM support), running Ubuntu Server 24.04.
+  * Set up an IBM Power10 system (with firmware FW1060 or newer,
+    which comes with nested KVM support), running Ubuntu Server 24.04.
  
-  * Install and configure KVM on this system with a (higher)
-    maxvcpus value of 128, but have a (smaller) initial value of 4 vCPUs.
-    $ virsh define ubu2404.xml
+  * Install and configure KVM on this system and define a guest with a
+    (higher) maxvcpus value of 128, but a (smaller) initial value of 4 vCPUs
+    (see the note after this list):
+    $ virsh define ubu2404.xml
+    (https://launchpadlibrarian.net/748483993/check.xml)
  
-  * Now after successful definition, start the VM:
-    $ virsh start ubu2404 --console
+  * Now after successful definition, start the VM:
+    $ virsh start ubu2404 --console
  
-  * If the VM is up and running increase the vCPUs to a larger value
-    here 68:
-    $ virsh setvcpus ubu2404 68
+  * Once the VM is up and running, increase the vCPUs to a larger value,
+    here 68:
+    $ virsh setvcpus ubu2404 68
  
-  * A system with an unpatched qemu will crash, showing:
-    [ 662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
-    error: Unable to read from monitor: Connection reset by peer
+  * A system with an unpatched qemu will crash, showing:
+    [ 662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
+    error: Unable to read from monitor: Connection reset by peer
  
-  * A patches environment will:
-    - either just successfully hotplug the new amount (68) of vCPUs
-      without further messages
-    - or (in case very memory constraint) print a (graceful) error
-      message that hotplug couldn't be performed,
-      but stays up and running:
-      error: internal error: unable to execute QEMU command 'device_add': \
-      kvmppc_cpu_realize: vcpu hotplug failed with -12
+  * A patched environment will:
+    - either just successfully hotplug the new number (68) of vCPUs
+      without further messages
+    - or (in case the system is very memory constrained) print a (graceful)
+      error message that the hotplug couldn't be performed,
+      while the guest stays up and running:
+      error: internal error: unable to execute QEMU command 'device_add': \
+      kvmppc_cpu_realize: vcpu hotplug failed with -12
  
-  * Since certain firmware is required, IBM is doing the test and validation
-    (and already successfully verified based on the PPA test builds).
+  * Since certain firmware is required, IBM is doing the test and validation
+    (and has already verified this successfully based on the PPA test builds).
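
  Note: a rough sketch for verifying the configuration and hotplug steps
  above; the domain name is taken from this test plan, and the exact vCPU
  XML attributes may differ in the linked check.xml:
    $ virsh dumpxml ubu2404 | grep -m1 '<vcpu'
    <vcpu placement='static' current='4'>128</vcpu>
    $ virsh setvcpus ubu2404 68
    $ virsh domstate ubu2404    # must still report "running" with a patched qemu
    $ virsh vcpucount ubu2404   # shows the maximum/current vCPU counts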
  
  [ Where problems could occur ]
  
-  * All modification were done in target/ppc/kvm.c
-    and are with that limited to the IBM Power platform,
-    and will not affect other architectures.
+  * All modifications were done in target/ppc/kvm.c
+    and are therefore limited to the IBM Power platform;
+    they will not affect other architectures.
  
-  * The implementation of the pre-creation of vCPUs (init cpu_target_realize)
-    may lead to early failures when a user doesn't expect to have such an
-    amount of vCPUs yet.
+  * The implementation of the pre-creation of vCPUs (init cpu_target_realize)
+    may lead to early failures when a user doesn't yet expect to have
+    such a number of vCPUs.
  
-  * And the pre-creation and especially parking (kvm_create_and_park_vcpu)
-    will probably consume more resources than before.
+  * And the pre-creation and especially parking (kvm_create_and_park_vcpu)
+    will probably consume more resources than before.
  
-  * Hence a patched system might run with a reduced max amount of vCPUs,
-    but instead will not crash hard, but gracefully fail on lack of resources.
+  * Hence a patched system might run with a reduced maximum number of vCPUs,
+    but in return it will not crash hard and instead fails gracefully
+    on lack of resources.
  
-  * This case and the patch(es) are also discussed in more detail here:
-    https://lore.kernel.org/qemu-devel/[email protected]/T/#t
-    and here:
-    https://bugzilla.redhat.com/show_bug.cgi?id=2304078
+  * This case and the patch(es) are also discussed in more detail here:
+    https://lore.kernel.org/qemu-devel/[email protected]/T/#t
+    and here:
+    https://bugzilla.redhat.com/show_bug.cgi?id=2304078
  
  [ Other Info ]
  
-  * The code is upstream accepted with qemu v9.1.0(-rc0),
-    and the upload to oracular was done,
-    and now only noble is affected.
+  * The fix was accepted upstream with qemu v9.1.0(-rc0),
+    the upload to oracular was already done,
+    and now only noble is still affected
+    (see the version check note after this list).
  
-  * Ubuntu releases older than noble are not affected,
-    since (nested) KVM virtualization on P10
-    was introduced starting with noble.
+  * Ubuntu releases older than noble are not affected,
+    since (nested) KVM virtualization on P10
+    was introduced starting with noble.
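
  Note: a small sketch for checking which qemu is installed on the host when
  verifying this; it assumes the fix ships in the qemu-system-ppc binary
  package (which provides the ppc64 system emulator on Ubuntu):
    $ apt-cache policy qemu-system-ppc
    $ qemu-system-ppc64 --version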
  __________
  
  == Comment: #0 - SEETEENA THOUFEEK <[email protected]> - 2024-08-12 03:47:06 ==
  +++ This bug was initially created as a clone of Bug #205620 +++
  
  ---Problem Description---
  cpu hotplug crashes the guest!
  
  ---Steps to Reproduce---
  I have been trying CPU hotplugging to the guest with maxvcpus set to 128
  and a current value of 4, but when I try to hotplug 68 vCPUs to the guest,
  it crashes and we get the error message:
  [  303.808494] KVM: Create Guest vcpu hcall failed, rc=-44
  error: Unable to read from monitor: Connection reset by peer
  
  Steps to reproduce:
  
  1) virsh define bug.xml
  
  2) virsh start Fedora39 --console
  
  3) virsh setvcpus Fedora39 68
  
  Output :
  [  662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
  error: Unable to read from monitor: Connection reset by peer
  
  If resources are insufficient, in my opinion it should fail gracefully.
  Attaching the XML file that I have used; I will post the observations from
  the MDC system, where I saw this same failure at a higher number of vCPUs.
  
  Fixed with upstream commit:
  
  https://github.com/qemu/qemu/commit/cfb52d07f53aa916003d43f69c945c2b42bc6374
  
  Machine Type = na
  
  ---Debugger---
  A debugger is not configured
  
  Contact Information = [email protected]
  
  ---uname output---
  NA

https://bugs.launchpad.net/bugs/2076587

Title:
  cpu hotplug crashes the guest!
