On Tue, May 08, 2018 at 05:03:09PM +0200, Kevin Wolf wrote:
> Am 08.05.2018 um 16:41 hat Eric Blake geschrieben:
> > On 12/25/2017 01:33 AM, He Junyan wrote:
>
> 2. Make the nvdimm device use the QEMU block layer so that it is backed
>    by a non-raw disk image (such as a qcow2 file representing the
>    content of the nvdimm) that supports snapshots.
>
>    This part is hard because it requires some completely new
>    infrastructure such as mapping clusters of the image file to guest
>    pages, and doing cluster allocation (including the copy on write
>    logic) by handling guest page faults.
>
> I think it makes sense to invest some effort into such interfaces, but
> be prepared for a long journey.
I like the suggestion, but it needs to be followed up with a concrete design that is feasible and fair for Junyan and others to implement. Otherwise the "long journey" is really just a way of rejecting this feature.

Let's discuss the details of using the block layer for NVDIMM and try to come up with a plan.

The biggest issue with using the block layer is that persistent memory applications use load/store instructions to access data directly. This is fundamentally different from the block layer, which transfers blocks of data to and from the device.

Because block I/O is submitted as explicit requests (DMA to and from the device), QEMU is able to perform processing at each node of the block driver graph. Nothing equivalent exists for persistent memory because software does not trap the guest's loads and stores. Therefore the concept of filter nodes doesn't make sense for persistent memory - we certainly do not want to trap every access, because performance would be terrible.

Another difference is that persistent memory I/O is synchronous: load/store instructions execute quickly. Perhaps we could use KVM async page faults in the cases where QEMU needs to perform processing, but again the performance would be bad.

Most protocol drivers do not support direct memory access; iscsi, curl, etc. just don't fit the model. One might be tempted to implement buffering, but at that point it's better to just use block devices.

I have CCed Pankaj, who is working on the virtio-pmem device. I need to be clear that emulated NVDIMM cannot be supported with the block layer since it lacks a guest flush mechanism: there is no way for applications to let the hypervisor know that the file needs to be fsynced. That's what virtio-pmem addresses.

Summary: A subset of the block layer could be used to back virtio-pmem. This requires a new block driver API and the KVM async page fault mechanism for trapping and mapping pages. Actual emulated NVDIMM devices cannot be supported unless the hardware specification is extended with a virtualization-friendly interface in the future. A very rough sketch of what such a block driver API might look like is below.
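To make this more concrete, here is the shape of the driver callbacks I have in mind. All of the names and signatures below are made up for the sake of discussion; none of this exists in the tree today, it is only meant to show where cluster allocation, copy-on-write, and flushing would hook in:

#include <stdbool.h>
#include <stdint.h>

/* Stands in for the real QEMU type; opaque for this sketch */
typedef struct BlockDriverState BlockDriverState;

/* A host mapping for the cluster containing a faulting guest offset */
typedef struct BdrvPmemMapping {
    void    *host_addr;  /* page-aligned pointer into the image cluster */
    uint64_t length;     /* contiguous length of the mapping in bytes */
    bool     writable;   /* false until copy-on-write has been done */
} BdrvPmemMapping;

/* Hypothetical callbacks a driver would implement to back virtio-pmem */
typedef struct BdrvPmemOps {
    /*
     * Called from the (KVM async) page fault path.  For qcow2 this is
     * where cluster allocation and copy-on-write would happen before
     * the page is mapped into the guest address space.
     */
    int (*pmem_map)(BlockDriverState *bs, uint64_t offset, bool write,
                    BdrvPmemMapping *mapping);

    /* Invalidate existing mappings, e.g. around snapshots or reopen */
    int (*pmem_unmap)(BlockDriverState *bs, uint64_t offset,
                      uint64_t length);

    /*
     * Persist previously written clusters.  A virtio-pmem flush
     * request from the guest would end up here and turn into
     * fsync()/msync() on the underlying image file.
     */
    int (*pmem_flush)(BlockDriverState *bs);
} BdrvPmemOps;

Presumably only drivers that can hand out mappings of a local file (file-posix, and qcow2 on top of it) would implement these callbacks; protocol drivers like iscsi or curl simply would not, which matches the point above that they do not fit the model.

Please let me know your thoughts.

Stefan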
