On 3/27/2025 8:27 PM, Steven Sistare wrote:
On 3/26/2025 5:34 PM, Michael Roth wrote:
On Wed, Mar 26, 2025 at 05:13:50PM -0300, Fabiano Rosas wrote:
Michael Roth <michael.r...@amd.com> writes:

Quoting Tom Lendacky (2025-03-26 14:21:31)
On 3/26/25 13:46, Tom Lendacky wrote:
On 3/7/25 12:15, Fabiano Rosas wrote:
From: Steve Sistare <steven.sist...@oracle.com>

Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks in the migration stream file and recreate them later, because the physical memory for the blocks is pinned and registered for vfio.  Add a blocker
for volatile ram blocks.

Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd may be
sufficient for CPR, but it has not been tested yet.

Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
Reviewed-by: Fabiano Rosas <faro...@suse.de>
Reviewed-by: Peter Xu <pet...@redhat.com>
Reviewed-by: David Hildenbrand <da...@redhat.com>
Message-ID: <1740667681-257312-1-git-send-email-steven.sist...@oracle.com>
Signed-off-by: Fabiano Rosas <faro...@suse.de>
---
  include/exec/memory.h   |  3 ++
  include/exec/ramblock.h |  1 +
  migration/savevm.c      |  2 ++
  system/physmem.c        | 66 +++++++++++++++++++++++++++++++++++++++++++++
  4 files changed, 72 insertions(+)
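
For readers following the thread, here is a rough sketch of the blocker-registration pattern the commit message above describes. It is paraphrased from the error text and code hunks quoted later in this thread, not the exact upstream patch, and the compatibility check shown is an assumption:

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "exec/ramblock.h"
    #include "migration/blocker.h"

    /* Paraphrased sketch, not the exact patch: register a migration blocker
     * for a RAMBlock that cpr-transfer cannot preserve.  The guard is an
     * assumption based on the discussion below: shared, fd-backed blocks can
     * be remapped by the new process, everything else gets blocked.
     */
    void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp)
    {
        if (qemu_ram_is_shared(rb) && rb->fd >= 0) {
            return;                       /* CPR-compatible, nothing to block */
        }

        error_setg(&rb->cpr_blocker,
                   "Memory region %s is not compatible with CPR. share=on "
                   "is required for memory-backend objects, and "
                   "aux-ram-share=on is required.",
                   memory_region_name(rb->mr));
        migrate_add_blocker_modes(&rb->cpr_blocker, errp,
                                  MIG_MODE_CPR_TRANSFER, -1);
    }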

This patch breaks booting an SNP guest as it triggers the following
assert:

qemu-system-x86_64: ../util/error.c:68: error_setv: Assertion `*errp == NULL' failed.


Usually this means the error has already been set previously, which is
not allowed.

I tracked it to the error_setg() call in ram_block_add_cpr_blocker().
It looks like the error message cannot be printed because
rb->cpr_blocker is NULL.

Adding aux-ram-share=on to the -machine options gets me past the error and
therefore the assertion, but isn't that an incompatible change to how an
SNP guest has to be started?

If I update the error_setg() call to use the errp parameter that is passed
into ram_block_add_cpr_blocker(), I get the following message and then
the guest launch terminates:


The usage in ram_block_add_cpr_blocker() is correct: the cpr_blocker
gets initialized and registered as a migration blocker. The errp only
becomes relevant later, when migration starts and the error condition is
met.

qemu-system-x86_64: Memory region pc.bios is not compatible with CPR.
share=on is required for memory-backend objects, and aux-ram-share=on is
required.

Since errp is &error_fatal, it causes QEMU to exit, so this^ error
message is bogus.
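
(Aside, for anyone not familiar with the error API: &error_fatal is a sentinel errp. Roughly, an error set through it is reported and QEMU exits on the spot instead of returning the error to the caller. A small illustration; the caller name is hypothetical:)

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "exec/memory.h"

    /* Illustration only (hypothetical caller): any error set through
     * &error_fatal is printed and QEMU exits immediately, so the
     * "share=on is required ..." text above is the last output before exit.
     */
    static void hypothetical_caller(RAMBlock *rb)
    {
        ram_block_add_cpr_blocker(rb, &error_fatal);   /* exits here on error */
    }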


The qemu parameters I used prior to this patch that allowed an SNP guest
to launch were:

-machine type=q35,confidential-guest-support=sev0,memory-backend=ram1
-object memory-backend-memfd,id=ram1,size=16G,share=true,prealloc=false

With these parameters after this patch, the launch fails.

I think it might be failing because the caller of
ram_block_add_cpr_blocker() is passing in &error_abort, but if the
error_setg() is called on a properly initialized cpr_blocker value then
SNP is still able to boot for me.
I'm not sure where the best spot is
to initialize cpr_blocker; it probably needs to be done before either
ram_block_add_cpr_blocker() or ram_block_del_cpr_blocker() is callable,
but the following avoids the reported crash at least:

diff --git a/system/physmem.c b/system/physmem.c
index 44dd129662..bff0fdcaac 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -4176,6 +4176,7 @@ void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp)
          return;
      }

+    rb->cpr_blocker = NULL;

Could it be the cpr_blocker already got set at ram_block_add() in the
RAM_GUEST_MEMFD path?

That seems to be the case: in some cases ram_block_add() sets cpr_blocker
when (new_block->flags & RAM_GUEST_MEMFD) is true, and then, soon after,
ram_block_add_cpr_blocker() is called on the same RAMBlock:

  2025-03-26T21:08:15.092427Z qemu-system-x86_64: warning: ram_block_add: new_block 0x55c247e4c880 new_block->cpr_blocker (nil) name ram1
  2025-03-26T21:08:15.124710Z qemu-system-x86_64: warning: ram_block_add: new_block 0x55c2480fde00 new_block->cpr_blocker (nil) name pc.bios
  2025-03-26T21:08:15.126190Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c2480fde00 rb->cpr_blocker 0x55c2480fe050 name pc.bios
  2025-03-26T21:08:15.138582Z qemu-system-x86_64: warning: ram_block_add: new_block 0x55c247e3c1e0 new_block->cpr_blocker (nil) name pc.rom
  2025-03-26T21:08:15.138938Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c247e3c1e0 rb->cpr_blocker 0x55c247e3c890 name pc.rom
  2025-03-26T21:08:16.185577Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c248db9200 rb->cpr_blocker (nil) name /rom@etc/acpi/tables
  2025-03-26T21:08:16.187140Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c248085620 rb->cpr_blocker (nil) name /rom@etc/table-loader
  2025-03-26T21:08:16.188029Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c2480ce220 rb->cpr_blocker (nil) name /rom@etc/acpi/rsd
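
For reference, the assertion fires because error_setg() refuses to overwrite an Error that is already set. A minimal standalone illustration (the demo() function is hypothetical):

    #include "qemu/osdep.h"
    #include "qapi/error.h"

    /* Hypothetical demo: setting the same Error pointer twice trips the
     * "*errp == NULL" assertion in error_setv(), which is what happens when
     * ram_block_add() has already set rb->cpr_blocker for a guest_memfd
     * block and ram_block_add_cpr_blocker() then sets it again.
     */
    static void demo(void)
    {
        Error *blocker = NULL;

        error_setg(&blocker, "first blocker reason");   /* ok, was NULL */
        error_setg(&blocker, "second blocker reason");  /* assertion fails */
    }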

Thanks everyone for debugging this.  To summarize, ram_block_add_cpr_blocker already blocks guest_memfd, because rb->fd < 0.  The fix is to delete this redundant code in ram_block_add:

        error_setg(&new_block->cpr_blocker,
                   "Memory region %s uses guest_memfd, "
                   "which is not supported with CPR.",
                   memory_region_name(new_block->mr));
        migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
                                  MIG_MODE_CPR_TRANSFER, MIG_MODE_CPR_EXEC,
                                  -1);
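
In other words, the fix amounts to dropping those lines from ram_block_add(). A sketch of what that removal could look like (context and line numbers omitted, so this is illustrative rather than the actual patch):

    --- a/system/physmem.c
    +++ b/system/physmem.c
    @@ ... @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
    -        error_setg(&new_block->cpr_blocker,
    -                   "Memory region %s uses guest_memfd, "
    -                   "which is not supported with CPR.",
    -                   memory_region_name(new_block->mr));
    -        migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
    -                                  MIG_MODE_CPR_TRANSFER, MIG_MODE_CPR_EXEC,
    -                                  -1);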

I just encountered the same issue with a TDX guest, after rebasing the TDX code to 10.0.0-rc0.

Thank you all for reporting it and for the quick solution.

I will submit a fix (unless Tom or Michael would prefer to author it).

- Steve


