Re: [Qemu-devel] [RFC] qemu: Add virtio pmem device

2018-04-08 Thread David Hildenbrand
On 09.04.2018 05:26, Stefan Hajnoczi wrote:
> On Thu, Apr 05, 2018 at 08:09:26AM -0400, Pankaj Gupta wrote:
>>> Will this raw file already have the "disk information header" (no idea
>>> what that stuff is called) encoded? Are there any plans/possible ways to
>>>
>>> a) automatically create the headers? (if that's even possible)
>>
>> It's raw. Right now we are just supporting the raw format.
>>
>> As this is a direct mapping of memory into the guest address space, I don't
>> think we can have an abstraction of headers for block-specific features.
>> Or maybe we can get the opinion of others (QEMU block people) on whether it
>> is at all possible?
> 
> memdev and the block layer are completely separate.  The block layer
> isn't designed for memory-mapped access.
> 

I'm not questioning whether this is the right thing to do now. I was wondering
whether we could expose any block device as virtio-pmem in the future, and I
think that, with quite some work, it could be possible.

As you said, we will need some buffering. Maybe userfaultfd and friends
(write-protect mode) could allow us to implement that.
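
Very roughly, the registration side of such a scheme could look like the
sketch below. This assumes a kernel with uffd write-protect support (which
is not upstream today), and the helper name is made up; it only illustrates
the direction, it is not an actual implementation.

/* Rough sketch only: register a mapped region with userfaultfd in
 * write-protect mode so the first write to each page can be trapped
 * and the page marked dirty. Assumes uffd-wp support in the kernel;
 * error handling is omitted. */
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int wp_register(void *addr, unsigned long len)
{
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)addr, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_WP,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    /* write-protect the whole range; a fault-handler thread would
     * record dirty pages and resolve faults by clearing WP again */
    struct uffdio_writeprotect wp = {
        .range = { .start = (unsigned long)addr, .len = len },
        .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

    return uffd;
}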

> I think it makes sense to use memdev here.  If the user wants a block
> device, they should use an emulated block device, not virtio-pmem,
> because buffering is necessary anyway when an image file format is used.
> 
> Stefan
> 


-- 

Thanks,

David / dhildenb


Re: [Qemu-devel] [RFC] qemu: Add virtio pmem device

2018-04-08 Thread Stefan Hajnoczi
On Thu, Apr 05, 2018 at 08:09:26AM -0400, Pankaj Gupta wrote:
> > Will this raw file already have the "disk information header" (no idea
> > what that stuff is called) encoded? Are there any plans/possible ways to
> > 
> > a) automatically create the headers? (if that's even possible)
> 
> It's raw. Right now we are just supporting the raw format.
> 
> As this is a direct mapping of memory into the guest address space, I don't
> think we can have an abstraction of headers for block-specific features.
> Or maybe we can get the opinion of others (QEMU block people) on whether it
> is at all possible?

memdev and the block layer are completely separate.  The block layer
isn't designed for memory-mapped access.

I think it makes sense to use memdev here.  If the user wants a block
device, they should use an emulated block device, not virtio-pmem,
because buffering is necessary anyway when an image file format is used.

Stefan




Re: [Qemu-devel] [RFC] qemu: Add virtio pmem device

2018-04-05 Thread David Hildenbrand

>>
>> So right now you're just using some memdev for testing.
> 
> yes.
> 
>>
>> I assume that the memory region we will provide to the guest will be a
>> simple memory mapped raw file. Dirty tracking (using the kvm slot) will
>> be used to detect which blocks actually changed and have to be flushed
>> to disk.
> 
> Not really, we will perform fsync on the raw file. As this file is created
> on regular storage and not an NVDIMM, the host page cache radix tree will have
> the dirty page information, which will be used for fsync.

Ah right. That makes things a lot easier!
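
The host side of a flush request then essentially boils down to an fsync()
on the backing file, along the lines of the sketch below (illustrative only;
the function name is made up and the QEMU part is not implemented yet):

/* Sketch of the flush path described above: the guest's flush/sync
 * request is serviced by fsync()ing the backing file, relying on the
 * host page cache to know which pages are dirty. Not the actual QEMU
 * implementation. */
#include <stdio.h>
#include <unistd.h>

static int virtio_pmem_flush_backing_file(int backing_fd)
{
    if (fsync(backing_fd) < 0) {
        perror("fsync");
        return -1;   /* the real device would report an error to the guest */
    }
    return 0;
}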

>
>>
>> Will this raw file already have the "disk information header" (no idea
>> what that stuff is called) encoded? Are there any plans/possible ways to
>>
>> a) automatically create the headers? (if that's even possible)
> 
> It's raw. Right now we are just supporting the raw format.
> 
> As this is a direct mapping of memory into the guest address space, I don't
> think we can have an abstraction of headers for block-specific features.
> Or maybe we can get the opinion of others (QEMU block people) on whether it
> is at all possible?
> 
>> b) support anything but raw files?
>>
>> Please note that under x86, a KVM memory slot still has a (in my
>> opinion) fairly big overhead depending on the size of the slot (rmap,
>> page_track). We might have to optimize that.
> 
> I have not tried/observed this. Right now I just used a single memory slot and
> cold-added a few MBs of memory in QEMU. Can you please provide more details on
> this?
> 

You can have a look at kvm_arch_create_memslot() in arch/x86/kvm/x86.c.

"npages" is used to allocate certain arrays (rmap for shadow page
tables). Also kvm_page_track_create_memslot() allocates data for page_track.
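
To get a rough feeling for the numbers, here is a back-of-the-envelope
estimate (the per-page sizes are assumptions for illustration, not exact
figures taken from the kernel):

/* Back-of-the-envelope estimate of the per-slot overhead on x86:
 * rmap costs roughly one pointer-sized entry per 4k page (plus much
 * smaller arrays for the 2M/1G levels, ignored here) and page_track
 * adds a 16-bit counter per page. The sizes are assumptions for
 * illustration only. */
#include <stdio.h>

int main(void)
{
    unsigned long long slot_bytes = 1ULL << 40;               /* 1 TiB slot */
    unsigned long long pages      = slot_bytes >> 12;         /* 4k pages   */

    unsigned long long rmap_bytes  = pages * sizeof(void *);  /* ~8 B/page  */
    unsigned long long track_bytes = pages * sizeof(short);   /* ~2 B/page  */

    printf("slot %llu GiB -> rmap ~%llu MiB, page_track ~%llu MiB\n",
           slot_bytes >> 30, rmap_bytes >> 20, track_bytes >> 20);
    return 0;
}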

Having a big disk involves a lot of memory overhead due to the big kvm
memory slot. This is already the case for NVDIMMs as of now.

Other architectures (e.g. s390x) don't have this "problem". They don't
allocate any such data depending on the size of a memory slot.

This is certainly something to work on in the future.

-- 

Thanks,

David / dhildenb


Re: [Qemu-devel] [RFC] qemu: Add virtio pmem device

2018-04-05 Thread Pankaj Gupta

Hi David,

> >  This patch adds the virtio-pmem QEMU device.
> >
> >  This device configures a memory address range with a file backend
> >  type. It acts like a persistent memory device for a KVM guest.
> >  It presents the memory address range to the virtio-pmem driver over
> >  the virtio channel and performs the block flush whenever the guest
> >  requests a flush/sync. (The QEMU part for the backing file flush
> >  is yet to be implemented.)
> >
> >  The current code is an RFC to support a guest with a persistent
> >  memory range & DAX.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  hw/virtio/Makefile.objs |   2 +-
> >  hw/virtio/virtio-pci.c  |  44 +
> >  hw/virtio/virtio-pci.h  |  14 +++
> >  hw/virtio/virtio-pmem.c | 133
> >  
> >  include/hw/pci/pci.h|   1 +
> >  include/hw/virtio/virtio-pmem.h |  43 +
> >  include/standard-headers/linux/virtio_ids.h |   1 +
> >  7 files changed, 237 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/virtio/virtio-pmem.c
> >  create mode 100644 include/hw/virtio/virtio-pmem.h
> > 
> > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> > index 765d363c1f..bb5573d2ef 100644
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -5,7 +5,7 @@ common-obj-y += virtio-bus.o
> >  common-obj-y += virtio-mmio.o
> >  
> >  obj-y += virtio.o virtio-balloon.o
> > -obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> > +obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o virtio-pmem.o
> >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
> >  obj-y += virtio-crypto.o
> >  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index c20537f31d..114ca05497 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -2491,6 +2491,49 @@ static const TypeInfo virtio_rng_pci_info = {
> >      .class_init    = virtio_rng_pci_class_init,
> >  };
> >  
> > +/* virtio-pmem-pci */
> > +
> > +static void virtio_pmem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > +{
> > +    VirtIOPMEMPCI *vpmem = VIRTIO_PMEM_PCI(vpci_dev);
> > +    DeviceState *vdev = DEVICE(&vpmem->vdev);
> > +
> > +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> > +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> > +}
> > +
> > +static void virtio_pmem_pci_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> > +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> > +    k->realize = virtio_pmem_pci_realize;
> > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> > +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PMEM;
> > +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> > +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> > +}
> > +
> > +static void virtio_pmem_pci_instance_init(Object *obj)
> > +{
> > +    VirtIOPMEMPCI *dev = VIRTIO_PMEM_PCI(obj);
> > +
> > +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> > +                                TYPE_VIRTIO_PMEM);
> > +    object_property_add_alias(obj, "memdev", OBJECT(&dev->vdev), "memdev",
> > +                              &error_abort);
> > +}
> > +
> > +static const TypeInfo virtio_pmem_pci_info = {
> > +    .name          = TYPE_VIRTIO_PMEM_PCI,
> > +    .parent        = TYPE_VIRTIO_PCI,
> > +    .instance_size = sizeof(VirtIOPMEMPCI),
> > +    .instance_init = virtio_pmem_pci_instance_init,
> > +    .class_init    = virtio_pmem_pci_class_init,
> > +};
> > +
> > +
> >  /* virtio-input-pci */
> >  
> >  static Property virtio_input_pci_properties[] = {
> > @@ -2683,6 +2726,7 @@ static void virtio_pci_register_types(void)
> >      type_register_static(&virtio_balloon_pci_info);
> >      type_register_static(&virtio_serial_pci_info);
> >      type_register_static(&virtio_net_pci_info);
> > +    type_register_static(&virtio_pmem_pci_info);
> >  #ifdef CONFIG_VHOST_SCSI
> >      type_register_static(&vhost_scsi_pci_info);
> >  #endif
> > diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> > index 813082b0d7..fe74fcad3f 100644
> > --- a/hw/virtio/virtio-pci.h
> > +++ b/hw/virtio/virtio-pci.h
> > @@ -19,6 +19,7 @@
> >  #include "hw/virtio/virtio-blk.h"
> >  #include "hw/virtio/virtio-net.h"
> >  #include "hw/virtio/virtio-rng.h"
> > +#include "hw/virtio/virtio-pmem.h"
> >  #include "hw/virtio/virtio-serial.h"
> >  #include "hw/virtio/virtio-scsi.h"
> >  #include "hw/virtio/virtio-balloon.h"
> > @@ -57,6 +58,7 @@ typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
> >  typedef struct VirtIOGPUPCI VirtIOGPUPCI;
> >  typedef struct VHostVSockPCI VHostVSockPCI;
> >  typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
> > +typedef struct VirtIOPMEMPCI VirtIOPMEMPCI;
> >