On Tue, May 08, 2018 at 05:03:09PM +0200, Kevin Wolf wrote:
> Am 08.05.2018 um 16:41 hat Eric Blake geschrieben:
> > On 12/25/2017 01:33 AM, He Junyan wrote:
>
> 2. Make the nvdimm device use the QEMU block layer so that it is backed
>    by a non-raw disk image (such as a qcow2 file representing the
>    content of the nvdimm) that supports snapshots.
>
>    This part is hard because it requires some completely new
>    infrastructure such as mapping clusters of the image file to guest
>    pages, and doing cluster allocation (including the copy on write
>    logic) by handling guest page faults.
>
> I think it makes sense to invest some effort into such interfaces, but
> be prepared for a long journey.
I like the suggestion, but it needs to be followed up with a concrete design that is feasible and fair for Junyan and others to implement. Otherwise the "long journey" is really just a way of rejecting this feature.

Let's discuss the details of using the block layer for NVDIMM and try to come up with a plan.

The biggest issue with using the block layer is that persistent memory applications use load/store instructions to access data directly. This is fundamentally different from the block layer, which transfers blocks of data to and from the device.

Because block I/O is submitted as explicit requests (DMA to and from the device), QEMU is able to perform processing at each node of the block driver graph. Nothing equivalent exists for persistent memory because software does not trap the guest's loads and stores. Therefore the concept of filter nodes doesn't make sense for persistent memory - we certainly do not want to trap every access, because performance would be terrible.

Another difference is that persistent memory I/O is synchronous: load/store instructions execute quickly. Perhaps we could use KVM async page faults in the cases where QEMU needs to perform processing, but again the performance would be bad.

Most protocol drivers do not support direct memory access; iscsi, curl, etc. just don't fit the model. One might be tempted to implement buffering, but at that point it's better to just use block devices.

I have CCed Pankaj, who is working on the virtio-pmem device. I need to be clear that emulated NVDIMM cannot be supported with the block layer since it lacks a guest flush mechanism: there is no way for applications to let the hypervisor know that the file needs to be fsynced. That's what virtio-pmem addresses.

Summary: A subset of the block layer could be used to back virtio-pmem. This requires a new block driver API and the KVM async page fault mechanism for trapping and mapping pages. Actual emulated NVDIMM devices cannot be supported unless the hardware specification is extended with a virtualization-friendly interface in the future. A very rough sketch of what such a block driver API might look like is below.
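To make this more concrete, here is the shape of the driver callbacks I have in mind. All of the names and signatures below are made up for the sake of discussion; none of this exists in the tree today, it is only meant to show where cluster allocation, copy-on-write, and flushing would hook in:

#include <stdbool.h>
#include <stdint.h>

/* Stands in for the real QEMU type; opaque for this sketch */
typedef struct BlockDriverState BlockDriverState;

/* A host mapping for the cluster containing a faulting guest offset */
typedef struct BdrvPmemMapping {
    void    *host_addr;  /* page-aligned pointer into the image cluster */
    uint64_t length;     /* contiguous length of the mapping in bytes */
    bool     writable;   /* false until copy-on-write has been done */
} BdrvPmemMapping;

/* Hypothetical callbacks a driver would implement to back virtio-pmem */
typedef struct BdrvPmemOps {
    /*
     * Called from the (KVM async) page fault path.  For qcow2 this is
     * where cluster allocation and copy-on-write would happen before
     * the page is mapped into the guest address space.
     */
    int (*pmem_map)(BlockDriverState *bs, uint64_t offset, bool write,
                    BdrvPmemMapping *mapping);

    /* Invalidate existing mappings, e.g. around snapshots or reopen */
    int (*pmem_unmap)(BlockDriverState *bs, uint64_t offset,
                      uint64_t length);

    /*
     * Persist previously written clusters.  A virtio-pmem flush
     * request from the guest would end up here and turn into
     * fsync()/msync() on the underlying image file.
     */
    int (*pmem_flush)(BlockDriverState *bs);
} BdrvPmemOps;

Presumably only drivers that can hand out mappings of a local file (file-posix, and qcow2 on top of it) would implement these callbacks; protocol drivers like iscsi or curl simply would not, which matches the point above that they do not fit the model.

Please let me know your thoughts.

Stefan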
