Hi David,

On 2025/8/20 17:29, David Hildenbrand wrote:
> On 20.08.25 06:10, Gao Xiang wrote:
>> (try to Cc David and Paolo for some discussion...)
>>
>> Hi David and Paolo,
>
> Hi!
>
>> If possible, could you share some thoughts on this? Currently each
>> `memory-backend-file` has its own page cache on the host, but if QEMU
>> could provide one nvdimm device backed by multiple files, EROFS could
>> share memory at a finer (per-layer) granularity on the host.  (We
>> wouldn't need to attach so many devices, because some container
>> images can have dozens of layers.)
>
> Sounds a bit like what virtio-fs does?

Thanks for your reply!

From the use cases themselves, I think it's similar. I'd also say it's
even closer to using virtio-blk to pass a golden image to the guest:
using a memory device to provide a golden-image filesystem (with many
layers) is better for security and data-integrity checks, especially
since the user already has a single secure hash (for example, sha256)
of the golden image.

It also avoids certain performance issues, such as unnecessary metadata
messages and the virtio-fs DAX slot reclaim problem.
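
For reference, a minimal sketch of what this looks like today with one
`memory-backend-file` plus one DAX-capable device per layer (the ids,
paths and sizes below are made up for illustration):

  qemu-system-x86_64 ... \
    -machine pc,nvdimm=on \
    -m 4G,slots=8,maxmem=32G \
    -object memory-backend-file,id=layer0,share=on,mem-path=/images/layer0.erofs,size=128M \
    -device nvdimm,id=nv0,memdev=layer0,unarmed=on \
    -object memory-backend-file,id=layer1,share=on,mem-path=/images/layer1.erofs,size=128M \
    -device nvdimm,id=nv1,memdev=layer1,unarmed=on

  # in the guest, each layer shows up as a separate /dev/pmemN:
  mount -t erofs -o dax /dev/pmem0 /mnt/layer0

The idea above is the status quo; the proposal is to collapse the
per-layer devices while keeping the per-layer backing files (and thus
the shared host page cache) intact.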



>> Without further investigation, I wonder which direction is
>> better:
>>
>>     1) one memory-backend-file backed by multiple files;
>
> No.
>
>>     2) nvdimm, virtio-pmem, .. backed by multiple
>>        `memory-backend-file`s..
>
> Better.

But it sounds like that would need a per-device modification...

>> Currently I don't have extra slot to look into the QEMU codebase,
>> but if the idea is acceptable, I will try to work on this later.
>
> But is this really better than just using many devices?

I think hot-plugging too many devices might be a problem (there could
be many container images in a pod (VM), and each container image can
have dozens of layers), since I've heard similar concerns about block
device hot-plugging from our internal virt team and from folks at
other companies, though I haven't looked into it myself.
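
To spell it out, with per-layer devices each new layer brought into the
pod would need a hot-plug sequence like the following on the QEMU
monitor (ids and paths are illustrative), and all of it has to fit in
the slots=/maxmem= budget chosen at boot:

  (qemu) object_add memory-backend-file,id=layerN,share=on,mem-path=/images/layerN.erofs,size=128M
  (qemu) device_add nvdimm,id=nvN,memdev=layerN,unarmed=on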

I've also heard that PMEM devices need to be aligned to the guest's
sparse memory SECTION_SIZE, which seems unfriendly to small layers;
I don't know the latest status or the details since I'm not actively
working on this stuff.
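
As a rough illustration (assuming a 128 MiB section size, as on
x86-64, and assuming each device region really has to be padded to a
full section):

  30 layers x 128 MiB alignment   = 3.75 GiB of guest address space
  30 layers x ~4 MiB of EROFS data = ~120 MiB actually useful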

Thanks,
Gao Xiang



