On 12/12/2025 12:27 AM, Peter Xu wrote:
On Thu, Dec 11, 2025 at 03:41:46PM +0800, Xiaoyao Li wrote:
On 11/20/2025 1:29 AM, Peter Xu wrote:
Host backends supports guest-memfd now by detecting whether it's a
confidential VM.  There's no way to choose it yet from the memory level to
use it in-place.  If we use guest-memfd, it so far always implies we need
two layers of memory backends, while the guest-memfd only provides the
private set of pages.

This patch introduces a way so that QEMU can consume guest memfd as the
only source of memory to back the object (aka, in place), rather than
having another backend supporting the pages converted to shared.

To use the in-place guest-memfd, one can add a memfd object with:

    -object memory-backend-memfd,guest-memfd=on,share=on

Note that share=on is required with in-place guest_memfd.

Signed-off-by: Peter Xu <[email protected]>

overall looks good to me except a few comments below,

Reviewed-by: Xiaoyao Li <[email protected]>

---
   qapi/qom.json            |  6 +++-
   backends/hostmem-memfd.c | 66 +++++++++++++++++++++++++++++++++++++---
   2 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 6f5c9de0f0..9ebf17bfc7 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -763,13 +763,17 @@
   # @seal: if true, create a sealed-file, which will block further
   #     resizing of the memory (default: true)
   #
+# @guest-memfd: if true, use guest-memfd to back the memory region.
+#     (default: false, since: 11.0)
+#
   # Since: 2.12
   ##
   { 'struct': 'MemoryBackendMemfdProperties',
     'base': 'MemoryBackendProperties',
     'data': { '*hugetlb': 'bool',
               '*hugetlbsize': 'size',
-            '*seal': 'bool' },
+            '*seal': 'bool',
+            '*guest-memfd': 'bool' },
     'if': 'CONFIG_LINUX' }
   ##
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index ea93f034e4..1fa16c1e1d 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -18,6 +18,8 @@
   #include "qapi/error.h"
   #include "qom/object.h"
   #include "migration/cpr.h"
+#include "system/kvm.h"
+#include <linux/kvm.h>
   OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendMemfd, MEMORY_BACKEND_MEMFD)
@@ -28,6 +30,13 @@ struct HostMemoryBackendMemfd {
       bool hugetlb;
       uint64_t hugetlbsize;
       bool seal;
+    /*
+     * NOTE: this differs from HostMemoryBackend's guest_memfd_private,
+     * which represents a internally private guest-memfd that only backs
+     * private pages.  Instead, this flag marks the memory backend will
+     * 100% use the guest-memfd pages in-place.
+     */
+    bool guest_memfd;
   };
   static bool
@@ -47,10 +56,40 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, 
Error **errp)
           goto have_fd;
       }
-    fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
-                           m->hugetlb, m->hugetlbsize, m->seal ?
-                           F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
-                           errp);
+    if (m->guest_memfd) {
+        /* User choose to use in-place guest-memfd to back the VM.. */
+        if (!backend->share) {
+            error_setg(errp, "In-place guest-memfd must be used with 
share=on");
+            return false;
+        }
+
+        /*
+         * This is the request to have a guest-memfd to back private pages.
+         * In-place guest-memfd doesn't work like that.  Disable it for now

This seems not correct to me. I think in-place guest-memfd can work with
guest_memfd_private. The former serves as shared memory and referenced by
the userspace_addr while the latter serves as private memory referenced by
the fd of guest_memfd.

While the argument of "disable it for now to make it simple" does make sense
to me.

Oops, I forgot to touch up quite a few places that kept mentioning
in-place, sorry.

I'll squash this diff into this patch when repost:

diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 1fa16c1e1d..e9e288651e 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -57,16 +57,16 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, 
Error **errp)
      }
if (m->guest_memfd) {
-        /* User choose to use in-place guest-memfd to back the VM.. */
+        /* User choose to use fully shared guest-memfd to back the VM.. */
          if (!backend->share) {
-            error_setg(errp, "In-place guest-memfd must be used with 
share=on");
+            error_setg(errp, "Guest-memfd=on must be used with share=on");

lower-case the guest-memfd? since it's the name of the property?

              return false;
          }
/*
           * This is the request to have a guest-memfd to back private pages.
-         * In-place guest-memfd doesn't work like that.  Disable it for now
-         * to make it simple, so that each memory backend can only have
+         * Fully shared guest-memfd doesn't work like that.  Disable it for
+         * now to make it simple, so that each memory backend can only have
           * guest-memfd either as private, or fully shared.
           */
          if (backend->guest_memfd_private) {

After rethinking on it, I think we just remove the check. There is nothing from QEMU side to prevent such usage. It is KVM currently that doesn't support INIT_SHARED guest-memfd for confidential VMs.

If we change to KVM to allow it:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8bb8830561d4..a658b636a9cf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13957,7 +13957,8 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
  */
 bool kvm_arch_supports_gmem_init_shared(struct kvm *kvm)
 {
-       return !kvm_arch_has_private_mem(kvm);
+       return true;
+

and remove this check in QEMU, it can actually boot a TDX guest with
fully-shared guest-memfd back'ed as shared part.

I'll also fix the commit message on in-place, now the one to be reposted:

   hostmem: Support fully shared guest memfd to back a VM
Host backends supports guest-memfd now by detecting whether it's a
   confidential VM.  There's no way to choose it yet from the memory level to
   use it fully shared.  If we use guest-memfd, it so far always implies we
   need two layers of memory backends, while the guest-memfd only provides the
   private set of pages.
This patch introduces a way so that QEMU can consume guest memfd as the
   only source of memory to back the object (aka, fully shared), rather than
   having another backend supporting the pages converted to shared.

As above, I think what the patch achieves is to enable guest-memfd (with MMAP and INIT_SHARED) to back shared memory. And it is not conflicted with using another guest-memfd to back private memory.

   To use the fully shared guest-memfd, one can add a memfd object with:
-object memory-backend-memfd,guest-memfd=on,share=on Note that share=on is required with fully shared guest_memfd.

I'll not take your R-b as of now, please check and ack again if you see fit
after reading.


+         * to make it simple, so that each memory backend can only have
+         * guest-memfd either as private, or fully shared.
+         */
+        if (backend->guest_memfd_private) {
+            error_setg(errp, "In-place guest-memfd cannot be used with another 
"
+                       "private guest-memfd");
+            return false;
+        }

please add the following check as I commented in v1:

        if (!kvm_enabled()) {
                error_setg(errp, "in-place guest-memfd requires KVM");
                return false;
        }

IMHO it's redundant to set here, when kvm not enabled,
kvm_create_guest_memfd() should be a stub.

No. The KVM stub is for the case where KVM is disable at compile time.

The kvm_enabled() check here is for the case where users use different accelerators other than KVM, e.g., -accel tcg.

However indeed I found the stub didn't set an error, so how about add one
trivial patch to add a verbal error for it instead?

commit aeeaba6dfc68a1c89af90c12f36cb8fe48faecfd
Author: Peter Xu <[email protected]>
Date:   Thu Dec 11 11:19:44 2025 -0500

     kvm/stub: Provide explicit error for kvm_create_guest_memfd()
So that there will be a verbal string returned when kvm not enabled. Signed-off-by: Peter Xu <[email protected]>
---
  accel/stubs/kvm-stub.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 73f04eb589..01b1d6285e 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -123,6 +123,7 @@ bool kvm_hwpoisoned_mem(void)
int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
  {
+    error_setg(errp, "KVM is not enabled");
      return -ENOSYS;
  }

IIUC it'll achieve the same goal with better layering.


+        /* TODO: add huge page support */
+        fd = kvm_create_guest_memfd(backend->size,
+                                    GUEST_MEMFD_FLAG_MMAP |
+                                    GUEST_MEMFD_FLAG_INIT_SHARED,
+                                    errp);
+        if (fd < 0) {
+            return false;
+        }

how about just removing the fd check here because ...

We needed it because at least the stub returns -ENOSYS..

I can remove it, but I'll need to change below to "fd<0" check.  That I can
do.

yeah, change to "fd < 0" looks better.

Thanks,


+    } else {
+        fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
+                               m->hugetlb, m->hugetlbsize, m->seal ?
+                               F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
+                               errp);
+    }
+
       if (fd == -1) {
           return false;
       }

... the existing check can work for the guest memfd as well.


Reply via email to