Re: [PATCH RFC 1/4] include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR

2019-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2019 at 11:35:52AM +0100, Miguel Ojeda wrote:
> On Tue, Jan 8, 2019 at 6:44 PM Nick Desaulniers  
> wrote:
> >
> > Also for more context, see:
> > commit 7829fb09a2b4 ("lib: make memzero_explicit more robust against
> > dead store elimination")
> 
> By the way, shouldn't that barrier_data() be directly in compiler.h
> too, since it is for both gcc & clang?
> 
> > Reviewed-by: Nick Desaulniers 
> >
> > + Miguel
> > Miguel, would you mind taking this into your compiler-attributes tree?
> 
> Sure, at least we quickly get some linux-next time.


BTW, why linux-next? Shouldn't this go into 5.0 and stable? It's a bugfix,
after all.

> Note it would be nice to separate the patch into two (one for the
> comments, another for OPTIMIZER_HIDE_VAR), and also possibly another
> for barrier_data().
> 
> Cheers,
> Miguel
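
For reference, the two macros under discussion look roughly like this in
the compiler headers (a minimal sketch for orientation, not the exact
patch under review):

/* barrier_data(): tell the compiler that memory reachable through ptr is
 * live, so stores to it cannot be elided as dead (this is what makes
 * memzero_explicit() robust, per commit 7829fb09a2b4). */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

/* OPTIMIZER_HIDE_VAR(): route var through an empty asm so the optimizer
 * loses track of its value and cannot fold it away. */
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))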


Re: [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
>  This patch series has the implementation for "virtio pmem".
>  "virtio pmem" is fake persistent memory (nvdimm) in the guest
>  which allows bypassing the guest page cache. This also
>  implements a VIRTIO based asynchronous flush mechanism.

Hmmm. Sharing the host page cache directly into the guest VM. Sounds
like a good idea, but...

This means the guest VM can now run timing attacks to observe host
side page cache residency, and depending on the implementation I'm
guessing that the guest will be able to control host side page
cache eviction, too (e.g. via discard or hole punch operations).

Which means this functionality looks to me like a new vector for
information leakage into and out of the guest VM via guest
controlled host page cache manipulation.

https://arxiv.org/pdf/1901.01161

I might be wrong, but if I'm not we're going to have to be very
careful about how guest VMs can access and manipulate host side
resources like the page cache.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] virtio-ccw: wire up ->bus_name callback

2019-01-09 Thread Cornelia Huck
On Thu,  3 Jan 2019 15:11:45 +0100
Cornelia Huck  wrote:

> Return the bus id of the ccw proxy device. This makes 'ethtool -i'
> show a more useful value than 'virtio' in the bus-info field.
> 
> Signed-off-by: Cornelia Huck 
> ---
>  drivers/s390/virtio/virtio_ccw.c | 8 
>  1 file changed, 8 insertions(+)

Queued.


Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2019 at 07:22:50PM +0100, Christian Borntraeger wrote:
> 
> On 09.01.2019 15:52, Michael S. Tsirkin wrote:
> > On Wed, Jan 09, 2019 at 01:07:16PM +0100, Christian Borntraeger wrote:
> >> On 09.01.2019 11:35, Wei Wang wrote:
> >>> On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
> 
>  On 08.01.2019 06:35, Wei Wang wrote:
> > On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
> >> On 07.01.2019 08:01, Wei Wang wrote:
> >>> virtio-ccw has deadlock issues with reading the config space inside 
> >>> the
> >>> interrupt context, so we tweak the virtballoon_changed implementation
> >>> by moving the config read operations into the related workqueue 
> >>> contexts.
> >>> The config_read_bitmap is used as a flag to the workqueue callbacks
> >>> about the related config fields that need to be read.
> >>>
> >>> The cmd_id_received is also renamed to cmd_id_received_cache, and
> >>> the value should be obtained via virtio_balloon_cmd_id_received.
> >>>
> >>> Reported-by: Christian Borntraeger 
> >>> Signed-off-by: Wei Wang 
> >>> Reviewed-by: Cornelia Huck 
> >>> Reviewed-by: Halil Pasic 
> >> Together with
> >>     virtio_pci: use queue idx instead of array idx to set up the vq
> >>     virtio: don't allocate vqs when names[i] = NULL
> >>
> >> Tested-by: Christian Borntraeger 
> > OK. I don't plan to send a new version of the above patches as no 
> > changes needed so far.
> >
> > Michael, if the above two patches look good to you, please help add the 
> > related tested-by
> > and reviewed-by tags. Thanks.
>  Can we also make sure that
> 
>  virtio_pci: use queue idx instead of array idx to set up the vq
>  virtio: don't allocate vqs when names[i] = NULL
> 
>  also land in stable?
> 
> >>>
> >>> You could also send the request to stable after it gets merged to Linus' 
> >>> tree.
> >>> The stable review committee will decide whether to take it.
> >>>
> >>> Please see Option 2:
> >>>
> >>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> >>>
> >>
> >> Those patches are not upstream yet, correct?
> >>
> >> Michael,
> >>
> >> can you add the stable tag before submitting? If not, can you give me a 
> >> heads up when doing the
> >> pull request so that I can ping the stable folks.
> > 
> > Can you reply to patches that you feel are needed on stable with just
> > 
> > Cc: sta...@vger.kernel.org
> > 
> > in the message body?
> > 
> > Then it's all automatically handled by ack attaching scripts.
> 
> Done. But this only works if those patches are not already part of a tree. I 
> guess they have to go via
> your tree, correct?

Yes. It works because I rebase my tree.

-- 
MST
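
Stepping back to the patch itself, the deadlock-avoidance pattern it
describes (defer virtio config-space reads from the interrupt handler to
workqueue context) reduces to roughly the following; a hedged sketch, with
the bit name illustrative rather than taken from the patch:

static void virtballoon_changed(struct virtio_device *vdev)
{
	struct virtio_balloon *vb = vdev->priv;

	/* Interrupt context: only record which config fields need reading. */
	set_bit(BALLOON_CONFIG_READ_CMD_ID, &vb->config_read_bitmap);
	queue_work(vb->balloon_wq, &vb->report_free_page_work);
}

static void report_free_page_func(struct work_struct *work)
{
	struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
						 report_free_page_work);

	/* Process context: a config read that sleeps (as on virtio-ccw) is safe. */
	if (test_and_clear_bit(BALLOON_CONFIG_READ_CMD_ID, &vb->config_read_bitmap))
		virtio_cread(vb->vdev, struct virtio_balloon_config,
			     free_page_report_cmd_id, &vb->cmd_id_received_cache);
}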


Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-09 Thread Christian Borntraeger

On 09.01.2019 15:52, Michael S. Tsirkin wrote:
> On Wed, Jan 09, 2019 at 01:07:16PM +0100, Christian Borntraeger wrote:
>> On 09.01.2019 11:35, Wei Wang wrote:
>>> On 01/08/2019 04:46 PM, Christian Borntraeger wrote:

 On 08.01.2019 06:35, Wei Wang wrote:
> On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
>> On 07.01.2019 08:01, Wei Wang wrote:
>>> virtio-ccw has deadlock issues with reading the config space inside the
>>> interrupt context, so we tweak the virtballoon_changed implementation
>>> by moving the config read operations into the related workqueue 
>>> contexts.
>>> The config_read_bitmap is used as a flag to the workqueue callbacks
>>> about the related config fields that need to be read.
>>>
>>> The cmd_id_received is also renamed to cmd_id_received_cache, and
>>> the value should be obtained via virtio_balloon_cmd_id_received.
>>>
>>> Reported-by: Christian Borntraeger 
>>> Signed-off-by: Wei Wang 
>>> Reviewed-by: Cornelia Huck 
>>> Reviewed-by: Halil Pasic 
>> Together with
>>     virtio_pci: use queue idx instead of array idx to set up the vq
>>     virtio: don't allocate vqs when names[i] = NULL
>>
>> Tested-by: Christian Borntraeger 
> OK. I don't plan to send a new version of the above patches as no changes 
> needed so far.
>
> Michael, if the above two patches look good to you, please help add the 
> related tested-by
> and reviewed-by tags. Thanks.
 Can we also make sure that

 virtio_pci: use queue idx instead of array idx to set up the vq
 virtio: don't allocate vqs when names[i] = NULL

 also land in stable?

>>>
>>> You could also send the request to stable after it gets merged to Linus' 
>>> tree.
>>> The stable review committee will decide whether to take it.
>>>
>>> Please see Option 2:
>>>
>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>>
>>
>> Those patches are not upstream yet, correct?
>>
>> Michael,
>>
>> can you add the stable tag before submitting? If not, can you give me a 
>> heads up when doing the
>> pull request so that I can ping the stable folks.
> 
> Can you reply to patches that you feel are needed on stable with just
> 
> Cc: sta...@vger.kernel.org
> 
> in the message body?
> 
> Then it's all automatically handled by ack attaching scripts.

Done. But this only works if those patches are not already part of a tree. I 
guess they have to go via
your tree, correct?


Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin

2019-01-09 Thread Alex Deucher
On Wed, Jan 9, 2019 at 12:36 PM Daniel Vetter  wrote:
>
> On Wed, Jan 9, 2019 at 3:54 PM Gerd Hoffmann  wrote:
> >
> > On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> > > On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > > > The buffer object must be reserved before calling
> > > > ttm_bo_validate for pinning/unpinning.
> > > >
> > > > Signed-off-by: Gerd Hoffmann 
> > >
> > > Seems a bit of a bisect fumble in your series here: legacy kms code reserved
> > > the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I
> > > think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to
> > > avoid bisect fail I think you need to have these temporarily in your
> > > cleanup/prepare_plane functions too.
> >
> > I think I've sorted that.  Have some other changes too, will probably
> > send v3 tomorrow.
> >
> > > Looked through the entire series, this here is the only issue I think
> > > should be fixed before merging (making atomic_enable optional can be done
> > > as a follow-up if you feel like it). With that addressed on the series:
> > >
> > > Acked-by: Daniel Vetter 
> >
> > Thanks.
> >
> > While we're at it: I'm also looking at dma-buf export and import
> > support for the qemu drivers.
> >
> > Right now both qxl and virtio have gem_prime_get_sg_table and
> > gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
> > an error.
> >
> > If I understand things correctly it is valid to set all import/export
> > callbacks (prime_handle_to_fd, prime_fd_to_handle,
> > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> > supporting dma-buf import/export and still advertise DRIVER_PRIME to
> > indicate the other prime callbacks are supported (so generic fbdev
> > emulation can use gem_prime_vmap etc).  Is that correct?
>
> I'm not sure how much that's a good idea ... Never thought about it
> tbh. All the fbdev/dma-buf stuff has plenty of hacks and
> inconsistencies still, so I guess we can't make it much worse really.
>
> > On exporting:
> >
> > TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
> > feed that into drm_prime_pages_to_sg.  Didn't try yet though.  Is that
> > approach correct?
> >
> > Is it possible to export TTM_PL_VRAM objects (with backing storage being
> > a pci memory bar)?  If so, how?
>
> Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
> and then knows the internals so it can do a proper pci peer2peer
> mapping. Or at least there's been lots of patches floating around to
> make that happen.

Here's Christian's WIP stuff for adding device memory support to dma-buf:
https://cgit.freedesktop.org/~deathsimple/linux/log/?h=p2p

Alex

>
> I think other drivers migrate the bo out of VRAM.
>
> > On importing:
> >
> > Importing into TTM_PL_TT object looks easy again, at least when the
> > object is actually stored in RAM.  What if not?
>
> They are all supposed to be stored in RAM. Note that all current ttm
> importers totally break the abstraction, by taking the sg list,
> throwing the dma mapping away and assuming there's a struct page
> backing it. Would be good if we could stop spreading that abuse - the
> dma-buf interfaces have been modelled after the ttm bo interfaces, so
> shouldn't be too hard to wire this up correctly.
>
> > Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> > the data.  Should that be done?  If so, how?  Or is it better to just
> > not support import then?
>
> Hm, since you ask about TTM concepts and not what this means in terms
> of dma-buf: As long as you upcast to the ttm_bo you can do whatever
> you want to really. But with plain dma-buf this doesn't work right now
> (not least because ttm assumes it gets system RAM on import, in theory you
> could put the peer2peer dma mapping into the sg list and it should
> work).
> -Daniel
>
> >
> > thanks,
> >   Gerd
> >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
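
The reservation rule under discussion (take the bo reservation before any
ttm_bo_validate()-based pin or unpin) comes down to a pattern like the
following; a hedged sketch against the TTM API of that era:

	u64 gpu_addr;
	int ret;

	ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
	if (ret)
		return ret;

	/* bochs_bo_pin() calls ttm_bo_validate(), which now runs locked. */
	ret = bochs_bo_pin(bo, TTM_PL_FLAG_VRAM, &gpu_addr);
	ttm_bo_unreserve(&bo->bo);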


Re: [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta


> >
> > This patch adds 'DAXDEV_BUFFERED' flag which is set
> > for virtio pmem corresponding nd_region. This later
> > is used to disable MAP_SYNC functionality for ext4
> > & xfs filesystem.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/dax/super.c  | 17 +
> >  drivers/nvdimm/pmem.c|  3 +++
> >  drivers/nvdimm/region_devs.c |  7 +++
> >  drivers/virtio/pmem.c|  1 +
> >  include/linux/dax.h  |  9 +
> >  include/linux/libnvdimm.h|  6 ++
> >  6 files changed, 43 insertions(+)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 6e928f3..9128740 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -167,6 +167,8 @@ enum dax_device_flags {
> > DAXDEV_ALIVE,
> > /* gate whether dax_flush() calls the low level flush routine */
> > DAXDEV_WRITE_CACHE,
> > +   /* flag to disable MAP_SYNC for virtio based host page cache flush
> > */
> > +   DAXDEV_BUFFERED,
> >  };
> >
> >  /**
> > @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device
> > *dax_dev)
> >  }
> >  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
> >
> > +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
> > +{
> > +   if (wc)
> > +   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +   else
> > +   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
> 
> The "write_cache" property was structured this way because it can
> conceivably change at runtime. The MAP_SYNC capability should be
> static and never changed after init.

o.k. Will change.

> 
> > +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
> > +{
> > +   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
> 
> Echoing Darrick and Jan this is should be a generic property of a
> dax_device and not specific to virtio. I don't like the "buffered"
> designation as that's not accurate. There may be hardware reasons why
> a dax_device is not synchronous, like a requirement to flush a
> write-pending queue or otherwise notify the device of new writes.

Agree.

> 
> I would just have a dax_synchronous() helper and a DAXDEV_SYNC flag. I
> would also modify alloc_dax() to take a flags argument so that the
> capability can be instantiated when the dax_device is allocated.

o.k. Will make the change.

Thanks,
Pankaj
> 
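
Dan's suggested shape would look roughly like this (a hedged sketch of the
proposed interface, not merged code):

/* Capability fixed at allocation time, never toggled afterwards. */
struct dax_device *alloc_dax(void *private, const char *host,
			     const struct dax_operations *ops, unsigned long flags);

static inline bool dax_synchronous(struct dax_device *dax_dev)
{
	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
}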


Re: [PATCH v1 2/2] virtio: don't allocate vqs when names[i] = NULL

2019-01-09 Thread Christian Borntraeger
Cc: sta...@vger.kernel.org
Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")

On 28.12.2018 03:26, Wei Wang wrote:
> Some vqs may not need to be allocated when their related feature bits
> are disabled. So callers may pass in such vqs with "names = NULL".
> Then we skip such vq allocations.
> 
> Signed-off-by: Wei Wang 
> ---
>  drivers/misc/mic/vop/vop_main.c|  9 +++--
>  drivers/remoteproc/remoteproc_virtio.c |  9 +++--
>  drivers/s390/virtio/virtio_ccw.c   | 12 +---
>  drivers/virtio/virtio_mmio.c   |  9 +++--
>  4 files changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
> index 6b212c8..2bfa3a9 100644
> --- a/drivers/misc/mic/vop/vop_main.c
> +++ b/drivers/misc/mic/vop/vop_main.c
> @@ -394,16 +394,21 @@ static int vop_find_vqs(struct virtio_device *dev, 
> unsigned nvqs,
>   struct _vop_vdev *vdev = to_vopvdev(dev);
>   struct vop_device *vpdev = vdev->vpdev;
>   struct mic_device_ctrl __iomem *dc = vdev->dc;
> - int i, err, retry;
> + int i, err, retry, queue_idx = 0;
>  
>   /* We must have this many virtqueues. */
>   if (nvqs > ioread8(&dc->desc->num_vq))
>   return -ENOENT;
>  
>   for (i = 0; i < nvqs; ++i) {
> + if (!names[i]) {
> + vqs[i] = NULL;
> + continue;
> + }
> +
>   dev_dbg(_vop_dev(vdev), "%s: %d: %s\n",
>   __func__, i, names[i]);
> - vqs[i] = vop_find_vq(dev, i, callbacks[i], names[i],
> + vqs[i] = vop_find_vq(dev, queue_idx++, callbacks[i], names[i],
>ctx ? ctx[i] : false);
>   if (IS_ERR(vqs[i])) {
>   err = PTR_ERR(vqs[i]);
> diff --git a/drivers/remoteproc/remoteproc_virtio.c 
> b/drivers/remoteproc/remoteproc_virtio.c
> index 183fc42..2d7cd344 100644
> --- a/drivers/remoteproc/remoteproc_virtio.c
> +++ b/drivers/remoteproc/remoteproc_virtio.c
> @@ -153,10 +153,15 @@ static int rproc_virtio_find_vqs(struct virtio_device 
> *vdev, unsigned int nvqs,
>const bool * ctx,
>struct irq_affinity *desc)
>  {
> - int i, ret;
> + int i, ret, queue_idx = 0;
>  
>   for (i = 0; i < nvqs; ++i) {
> - vqs[i] = rp_find_vq(vdev, i, callbacks[i], names[i],
> + if (!names[i]) {
> + vqs[i] = NULL;
> + continue;
> + }
> +
> + vqs[i] = rp_find_vq(vdev, queue_idx++, callbacks[i], names[i],
>   ctx ? ctx[i] : false);
>   if (IS_ERR(vqs[i])) {
>   ret = PTR_ERR(vqs[i]);
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index fc9dbad..ae1d56d 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -635,7 +635,7 @@ static int virtio_ccw_find_vqs(struct virtio_device 
> *vdev, unsigned nvqs,
>  {
>   struct virtio_ccw_device *vcdev = to_vc_device(vdev);
>   unsigned long *indicatorp = NULL;
> - int ret, i;
> + int ret, i, queue_idx = 0;
>   struct ccw1 *ccw;
>  
>   ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
> @@ -643,8 +643,14 @@ static int virtio_ccw_find_vqs(struct virtio_device 
> *vdev, unsigned nvqs,
>   return -ENOMEM;
>  
>   for (i = 0; i < nvqs; ++i) {
> - vqs[i] = virtio_ccw_setup_vq(vdev, i, callbacks[i], names[i],
> -  ctx ? ctx[i] : false, ccw);
> + if (!names[i]) {
> + vqs[i] = NULL;
> + continue;
> + }
> +
> + vqs[i] = virtio_ccw_setup_vq(vdev, queue_idx++, callbacks[i],
> +  names[i], ctx ? ctx[i] : false,
> +  ccw);
>   if (IS_ERR(vqs[i])) {
>   ret = PTR_ERR(vqs[i]);
>   vqs[i] = NULL;
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 4cd9ea5..d9dd0f78 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -468,7 +468,7 @@ static int vm_find_vqs(struct virtio_device *vdev, 
> unsigned nvqs,
>  {
>   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
>   unsigned int irq = platform_get_irq(vm_dev->pdev, 0);
> - int i, err;
> + int i, err, queue_idx = 0;
>  
>   err = request_irq(irq, vm_interrupt, IRQF_SHARED,
>   dev_name(&vdev->dev), vm_dev);
> @@ -476,7 +476,12 @@ static int vm_find_vqs(struct virtio_device *vdev, 
> unsigned nvqs,
>   return err;
>  
>   for (i = 0; i < nvqs; ++i) {
> - vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i],
> + if (!names[i]) {
> +
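
On the caller side, the point of the change is that a driver can now pass
a NULL name for a virtqueue whose feature was not negotiated and have
find_vqs() skip that slot while still numbering the remaining queues
correctly. A hedged sketch (enum names as used by virtio-balloon, shown
for illustration only):

	const char *names[VIRTIO_BALLOON_VQ_MAX];

	names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
	names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
	/* Feature not negotiated: no vq is allocated for this slot. */
	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;

	err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs,
			      callbacks, names, NULL);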

Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin

2019-01-09 Thread Daniel Vetter
On Wed, Jan 9, 2019 at 3:54 PM Gerd Hoffmann  wrote:
>
> On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> > On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > > The buffer object must be reserved before calling
> > > ttm_bo_validate for pinning/unpinning.
> > >
> > > Signed-off-by: Gerd Hoffmann 
> >
> > Seems a bit of a bisect fumble in your series here: legacy kms code reserved
> > the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I
> > think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to
> > avoid bisect fail I think you need to have these temporarily in your
> > cleanup/prepare_plane functions too.
>
> I think I've sorted that.  Have some other changes too, will probably
> send v3 tomorrow.
>
> > Looked through the entire series, this here is the only issue I think
> > should be fixed before merging (making atomic_enable optional can be done
> > as a follow-up if you feel like it). With that addressed on the series:
> >
> > Acked-by: Daniel Vetter 
>
> Thanks.
>
> While we're at it: I'm also looking at dma-buf export and import
> support for the qemu drivers.
>
> Right now both qxl and virtio have gem_prime_get_sg_table and
> gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
> an error.
>
> If I understand things correctly it is valid to set all import/export
> callbacks (prime_handle_to_fd, prime_fd_to_handle,
> gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> supporting dma-buf import/export and still advertise DRIVER_PRIME to
> indicate the other prime callbacks are supported (so generic fbdev
> emulation can use gem_prime_vmap etc).  Is that correct?

I'm not sure how much that's a good idea ... Never thought about it
tbh. All the fbdev/dma-buf stuff has plenty of hacks and
inconsistencies still, so I guess we can't make it much worse really.

> On exporting:
>
> TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
> feed that into drm_prime_pages_to_sg.  Didn't try yet though.  Is that
> approach correct?
>
> Is it possible to export TTM_PL_VRAM objects (with backing storage being
> a pci memory bar)?  If so, how?

Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
and then knows the internals so it can do a proper pci peer2peer
mapping. Or at least there's been lots of patches floating around to
make that happen.

I think other drivers migrate the bo out of VRAM.

> On importing:
>
> Importing into TTM_PL_TT object looks easy again, at least when the
> object is actually stored in RAM.  What if not?

They are all supposed to be stored in RAM. Note that all current ttm
importers totally break the abstraction, by taking the sg list,
throwing the dma mapping away and assuming there's a struct page
backing it. Would be good if we could stop spreading that abuse - the
dma-buf interfaces have been modelled after the ttm bo interfaces, so
shouldn't be too hard to wire this up correctly.

> Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> the data.  Should that be done?  If so, how?  Or is it better to just
> not support import then?

Hm, since you ask about TTM concepts and not what this means in terms
of dma-buf: As long as you upcast to the ttm_bo you can do whatever
you want to really. But with plain dma-buf this doesn't work right now
(not least because ttm assumes it gets system RAM on import, in theory you
could put the peer2peer dma mapping into the sg list and it should
work).
-Daniel

>
> thanks,
>   Gerd
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
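
The TTM_PL_TT export path Gerd sketches (pin the bo, collect its backing
pages, feed them to drm_prime_pages_to_sg()) might look like this; a
hedged sketch, with the bochs upcast helper assumed:

static struct sg_table *bochs_gem_prime_get_sg_table(struct drm_gem_object *obj)
{
	struct bochs_bo *bo = gem_to_bochs_bo(obj);	/* assumed upcast helper */

	/* Caller must have pinned the bo into TTM_PL_TT so that the
	 * bo->bo.ttm->pages array stays stable while the sg table lives. */
	return drm_prime_pages_to_sg(bo->bo.ttm->pages, bo->bo.num_pages);
}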


Re: [PATCH RFC 1/4] include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR

2019-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2019 at 11:35:52AM +0100, Miguel Ojeda wrote:
> On Tue, Jan 8, 2019 at 6:44 PM Nick Desaulniers  
> wrote:
> >
> > Also for more context, see:
> > commit 7829fb09a2b4 ("lib: make memzero_explicit more robust against
> > dead store elimination")
> 
> By the way, shouldn't that barrier_data() be directly in compiler.h
> too, since it is for both gcc & clang?
> 
> > Reviewed-by: Nick Desaulniers 
> >
> > + Miguel
> > Miguel, would you mind taking this into your compiler-attributes tree?
> 
> Sure, at least we quickly get some linux-next time.
> 
> Note it would be nice to separate the patch into two (one for the
> comments, another for OPTIMIZER_HIDE_VAR), and also possibly another
> for barrier_data().
> 
> Cheers,
> Miguel

Okay, I will try.

-- 
MST


Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2019 at 01:07:16PM +0100, Christian Borntraeger wrote:
> On 09.01.2019 11:35, Wei Wang wrote:
> > On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
> >>
> >> On 08.01.2019 06:35, Wei Wang wrote:
> >>> On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
>  On 07.01.2019 08:01, Wei Wang wrote:
> > virtio-ccw has deadlock issues with reading the config space inside the
> > interrupt context, so we tweak the virtballoon_changed implementation
> > by moving the config read operations into the related workqueue 
> > contexts.
> > The config_read_bitmap is used as a flag to the workqueue callbacks
> > about the related config fields that need to be read.
> >
> > The cmd_id_received is also renamed to cmd_id_received_cache, and
> > the value should be obtained via virtio_balloon_cmd_id_received.
> >
> > Reported-by: Christian Borntraeger 
> > Signed-off-by: Wei Wang 
> > Reviewed-by: Cornelia Huck 
> > Reviewed-by: Halil Pasic 
>  Together with
>      virtio_pci: use queue idx instead of array idx to set up the vq
>      virtio: don't allocate vqs when names[i] = NULL
> 
>  Tested-by: Christian Borntraeger 
> >>> OK. I don't plan to send a new version of the above patches as no changes 
> >>> needed so far.
> >>>
> >>> Michael, if the above two patches look good to you, please help add the 
> >>> related tested-by
> >>> and reviewed-by tags. Thanks.
> >> Can we also make sure that
> >>
> >> virtio_pci: use queue idx instead of array idx to set up the vq
> >> virtio: don't allocate vqs when names[i] = NULL
> >>
> >> also land in stable?
> >>
> > 
> > You could also send the request to stable after it gets merged to Linus' 
> > tree.
> > The stable review committee will decide whether to take it.
> > 
> > Please see Option 2:
> > 
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > 
> 
> Those patches are not upstream yet, correct?
> 
> Michael,
> 
> can you add the stable tag before submitting? If not, can you give me a heads 
> up when doing the
> pull request so that I can ping the stable folks.

Can you reply to patches that you feel are needed on stable with just

Cc: sta...@vger.kernel.org

in the message body?

Then it's all automatically handled by ack attaching scripts.

-- 
MST


Re: [PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta


> 
> On Wed 09-01-19 19:26:05, Pankaj Gupta wrote:
> > Virtio pmem provides asynchronous host page cache flush
> > mechanism. We don't support 'MAP_SYNC' with virtio pmem
> > and ext4.
> > 
> > Signed-off-by: Pankaj Gupta 
> ...
> > @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct
> > vm_area_struct *vma)
> > if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
> > return -EOPNOTSUPP;
> >  
> > +   /* We don't support synchronous mappings with guest direct access
> > +* and virtio based host page cache flush mechanism.
> > +*/
> > +   if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
> > +   && (vma->vm_flags & VM_SYNC))
> > +   return -EOPNOTSUPP;
> > +
> 
> Shouldn't there rather be some generic way of doing this? Having
> virtio_pmem_host_cache_enabled() check in filesystem code just looks like
> filesystem sniffing into details it should not care about... Maybe just
> naming this (or having a wrapper) dax_dev_map_sync_supported()?

Thanks for the feedback.

Just wanted to avoid 'dax' in the function name, to differentiate this from
real dax.
But yes can add a wrapper: dax_dev_map_sync_supported()

Thanks,
Pankaj
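
The wrapper being agreed on here could be as small as the following (a
hedged sketch; the name is the one Jan suggests):

static inline bool dax_dev_map_sync_supported(struct dax_device *dax_dev)
{
	return !virtio_pmem_host_cache_enabled(dax_dev);
}

ext4 and xfs would then test one dax-level predicate instead of reaching
into virtio-specific details.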


[PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem
and xfs.

Signed-off-by: Pankaj Gupta 
---
 fs/xfs/xfs_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e474250..eae4aa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1190,6 +1190,14 @@ xfs_file_mmap(
if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache mechanism.
+*/
+   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
+   xfs_find_daxdev_for_inode(file_inode(filp))) &&
+   (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(filp);
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(file_inode(filp)))
-- 
2.9.3



Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2019 at 06:35:01PM +0800, Wei Wang wrote:
> On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
> > 
> > On 08.01.2019 06:35, Wei Wang wrote:
> > > On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
> > > > On 07.01.2019 08:01, Wei Wang wrote:
> > > > > virtio-ccw has deadlock issues with reading the config space inside 
> > > > > the
> > > > > interrupt context, so we tweak the virtballoon_changed implementation
> > > > > by moving the config read operations into the related workqueue 
> > > > > contexts.
> > > > > The config_read_bitmap is used as a flag to the workqueue callbacks
> > > > > about the related config fields that need to be read.
> > > > > 
> > > > > The cmd_id_received is also renamed to cmd_id_received_cache, and
> > > > > the value should be obtained via virtio_balloon_cmd_id_received.
> > > > > 
> > > > > Reported-by: Christian Borntraeger 
> > > > > Signed-off-by: Wei Wang 
> > > > > Reviewed-by: Cornelia Huck 
> > > > > Reviewed-by: Halil Pasic 
> > > > Together with
> > > > virtio_pci: use queue idx instead of array idx to set up the vq
> > > > virtio: don't allocate vqs when names[i] = NULL
> > > > 
> > > > Tested-by: Christian Borntraeger 
> > > OK. I don't plan to send a new version of the above patches as no changes 
> > > needed so far.
> > > 
> > > Michael, if the above two patches look good to you, please help add the 
> > > related tested-by
> > > and reviewed-by tags. Thanks.
> > Can we also make sure that
> > 
> > virtio_pci: use queue idx instead of array idx to set up the vq
> > virtio: don't allocate vqs when names[i] = NULL
> > 
> > also land in stable?
> > 
> 
> You could also send the request to stable after it gets merged to Linus'
> tree.
> The stable review committee will decide whether to take it.
> 
> Please see Option 2:
> 
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> 
> 
> Best,
> Wei

That's mostly for 3rd party reporters.
Why not Cc stable directly in the patches?

-- 
MST


[PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem 
and ext4. 

Signed-off-by: Pankaj Gupta 
---
 fs/ext4/file.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d4..e54f48b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
@@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
vm_area_struct *vma)
if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache flush mechanism.
+*/
+   if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
+   && (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
-- 
2.9.3



[PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta
This patch adds 'DAXDEV_BUFFERED' flag which is set
for virtio pmem corresponding nd_region. This later
is used to disable MAP_SYNC functionality for ext4
& xfs filesystem.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/super.c  | 17 +
 drivers/nvdimm/pmem.c|  3 +++
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/virtio/pmem.c|  1 +
 include/linux/dax.h  |  9 +
 include/linux/libnvdimm.h|  6 ++
 6 files changed, 43 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 6e928f3..9128740 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to disable MAP_SYNC for virtio based host page cache flush */
+   DAXDEV_BUFFERED,
 };
 
 /**
@@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+   if (wc)
+   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+   else
+   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
+
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(&dax_srcu);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fe1217b..8d190a3 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+
+   /* Set buffered bit in 'dax_dev' for virtio pmem */
+   virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region));
pmem->dax_dev = dax_dev;
 
gendev = disk_to_dev(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index f8218b4..1f8b2be 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, 
resource_size_t start,
return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
 }
 
+int nvdimm_is_buffered(struct nd_region *nd_region)
+{
+   return is_nd_pmem(&nd_region->dev) &&
+   test_bit(ND_REGION_BUFFERED, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(nvdimm_is_buffered);
+
 void __exit nd_region_devs_exit(void)
 {
ida_destroy(&region_ida);
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
index 51f5349..901767b 100644
--- a/drivers/virtio/pmem.c
+++ b/drivers/virtio/pmem.c
@@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
ndr_desc.numa_node = nid;
ndr_desc.flush = virtio_pmem_flush;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+   set_bit(ND_REGION_BUFFERED, &ndr_desc.flags);
nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
nd_region->provider_data = dev_to_virtio(nd_region->dev.parent->parent);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a..d16e03e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc);
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+}
+static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return false;
+}
 #endif
 
 struct writeback_control;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ca8bc07..94616f1 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -64,6 +64,11 @@ enum {
 */
ND_REGION_PERSIST_MEMCTRL = 2,
 
+   /* provides virtio based asynchronous flush mechanism for buffered
+* host page cache.
+*/
+   ND_REGION_BUFFERED = 3,
+
/* mark newly adjusted resources as requiring a label update */
DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+int nvdimm_is_buffered(struct nd_region 

[PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta
 This patch series has the implementation for "virtio pmem".
 "virtio pmem" is fake persistent memory (nvdimm) in the guest
 which allows bypassing the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v2. Tested with Qemu side device 
 emulation for virtio-pmem [6]. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
-
   - Reads persistent memory range from paravirt device and 
 registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
 persistent memory region and setup filesystem operations 
 to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
 interface to flush from guest to host.

2. Qemu virtio-pmem device
-
   - Creates virtio pmem device and exposes a memory range to 
 KVM guest. 
   - At host side this is file backed memory which acts as 
 persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
 for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[6] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem errors handling:
 
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
- As per current logic if error page belongs to Qemu process, 
  host MCE handler isolates(hwpoison) that page and send SIGBUS. 
  Qemu SIGBUS handler injects exception to KVM guest. 
- KVM guest then isolates the page and send SIGBUS to guest 
  userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
- Handles such errors with MCE notifier and creates a list 
  of bad blocks. Read/direct access DAX operation return EIO 
  if accessed memory page fall in bad block list.
- It also starts background scrubbing.
- Similar functionality can be reused in virtio-pmem with MCE 
  notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
  confirm if this behaviour is ok or needs any change?

Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req- Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (5):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2

 

[PATCH v3 2/5] virtio-pmem: Add virtio pmem driver

2019-01-09 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
---
 drivers/nvdimm/virtio_pmem.c |  84 ++
 drivers/virtio/Kconfig   |  10 
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/pmem.c| 124 +++
 include/linux/virtio_pmem.h  |  60 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 
 7 files changed, 290 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000..2a1b1ba
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include 
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   unsigned int len;
+   unsigned long flags;
+   struct virtio_pmem_request *req, *req_buf;
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+   req->done = true;
+   wake_up(&req->host_acked);
+
+   if (!list_empty(&vpmem->req_list)) {
+   req_buf = list_first_entry(&vpmem->req_list,
+   struct virtio_pmem_request, list);
+   list_del(&vpmem->req_list);
+   req_buf->wq_buf_avail = true;
+   wake_up(&req_buf->wq_buf);
+   }
+   }
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   int err;
+   unsigned long flags;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct virtio_pmem_request *req;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = req->wq_buf_avail = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(&req->host_acked);
+   init_waitqueue_head(&req->wq_buf);
+   sg_init_one(&sg, req->name, strlen(req->name));
+   sgs[0] = &sg;
+   sg_init_one(&ret, &req->ret, sizeof(req->ret));
+   sgs[1] = &ret;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+   if (err) {
+   dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+   list_add_tail(&vpmem->req_list, &req->list);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   }
+   virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->host_acked, req->done);
+   err = req->ret;
+   kfree(req);
+
+   return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..9f634a2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_PMEM
+   tristate "Support for virtio pmem driver"
+   depends on VIRTIO
+   depends on LIBNVDIMM
+   help
+   This driver provides support for virtio based flushing interface
+   for persistent memory range.
+
+   If unsure, say M.
+
 config VIRTIO_BALLOON
tristate "Virtio balloon driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..143ce91 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 

[PATCH v3 1/5] libnvdimm: nd_region flush callback support

2019-01-09 Thread Pankaj Gupta
This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

This also handles asynchronous flush requests from the block layer
by creating a child bio and chaining it with parent bio.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 12 
 drivers/nvdimm/region_devs.c | 38 --
 include/linux/libnvdimm.h|  5 -
 6 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..f154852 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..a1dfa06 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d53a2d1 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -160,6 +160,7 @@ struct nd_region {
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
struct nd_mapping mapping[0];
+   int (*flush)(struct nd_region *nd_region);
 };
 
 struct nd_blk_region {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..5d6a4a1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
 
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..5508727 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct 
device_attribute *att
return rc;
if 
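
The indirection this patch describes (a per-region flush callback with a
generic fallback) boils down to something like the following; a hedged
sketch of the dispatch, not the literal hunk:

int nvdimm_flush(struct nd_region *nd_region, struct bio *bio, bool async)
{
	/* virtio pmem regions install their own flush; the async block-layer
	 * path additionally chains a child bio, per the changelog above. */
	if (nd_region->flush)
		return nd_region->flush(nd_region);

	return generic_nvdimm_flush(nd_region);
}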

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta


Please ignore this series as my network went down while 
sending this. I will send this series again.

Thanks,
Pankaj  

> 
> This patch series has the implementation for "virtio pmem".
>  "virtio pmem" is fake persistent memory (nvdimm) in the guest
>  which allows bypassing the guest page cache. This also
>  implements a VIRTIO based asynchronous flush mechanism.
>  
>  Sharing guest kernel driver in this patchset with the
>  changes suggested in v2. Tested with Qemu side device
>  emulation for virtio-pmem [6].
>  
>  Details of project idea for 'virtio pmem' flushing interface
>  is shared [3] & [4].
> 
>  Implementation is divided into two parts:
>  New virtio pmem guest driver and qemu code changes for new
>  virtio pmem paravirtualized device.
> 
> 1. Guest virtio-pmem kernel driver
> -
>- Reads persistent memory range from paravirt device and
>  registers with 'nvdimm_bus'.
>- 'nvdimm/pmem' driver uses this information to allocate
>  persistent memory region and setup filesystem operations
>  to the allocated memory.
>- virtio pmem driver implements asynchronous flushing
>  interface to flush from guest to host.
> 
> 2. Qemu virtio-pmem device
> -
>- Creates virtio pmem device and exposes a memory range to
>  KVM guest.
>- At host side this is file backed memory which acts as
>  persistent memory.
>- Qemu side flush uses aio thread pool API's and virtio
>  for asynchronous guest multi request handling.
> 
>David Hildenbrand CCed also posted a modified version[6] of
>qemu virtio-pmem code based on updated Qemu memory device API.
> 
>  Virtio-pmem errors handling:
>  
>   Checked behaviour of virtio-pmem for below types of errors
>   Need suggestions on expected behaviour for handling these errors?
> 
>   - Hardware Errors: Uncorrectable recoverable Errors:
>   a] virtio-pmem:
> - As per current logic if error page belongs to Qemu process,
>   host MCE handler isolates(hwpoison) that page and send SIGBUS.
>   Qemu SIGBUS handler injects exception to KVM guest.
> - KVM guest then isolates the page and send SIGBUS to guest
>   userspace process which has mapped the page.
>   
>   b] Existing implementation for ACPI pmem driver:
> - Handles such errors with MCE notifier and creates a list
>   of bad blocks. Read/direct access DAX operation return EIO
>   if accessed memory page fall in bad block list.
> - It also starts background scrubbing.
> - Similar functionality can be reused in virtio-pmem with MCE
>   notifier but without scrubbing(no ACPI/ARS)? Need inputs to
>   confirm if this behaviour is ok or needs any change?
> 
> Changes from PATCH v2: [1]
> - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
> - Use name 'virtio pmem' in place of 'fake dax'
> 
> Changes from PATCH v1: [2]
> - 0-day build test for build dependency on libnvdimm
> 
>  Changes suggested by - [Dan Williams]
> - Split the driver into two parts virtio & pmem
> - Move queuing of async block request to block layer
> - Add "sync" parameter in nvdimm_flush function
> - Use indirect call for nvdimm_flush
> - Don’t move declarations to common global header e.g nd.h
> - nvdimm_flush() return 0 or -EIO if it fails
> - Teach nsio_rw_bytes() that the flush can fail
> - Rename nvdimm_flush() to generic_nvdimm_flush()
> - Use 'nd_region->provider_data' for long dereferencing
> - Remove virtio_pmem_freeze/restore functions
> - Remove BSD license text with SPDX license text
> 
> - Add might_sleep() in virtio_pmem_flush - [Luiz]
> - Make spin_lock_irqsave() narrow
> 
> Changes from RFC v3
> - Rebase to latest upstream - Luiz
> - Call ndregion->flush in place of nvdimm_flush- Luiz
> - kmalloc return check - Luiz
> - virtqueue full handling - Stefan
> - Don't map entire virtio_pmem_req to device - Stefan
> - request leak, correct sizeof req- Stefan
> - Move declaration to virtio_pmem.c
> 
> Changes from RFC v2:
> - Add flush function in the nd_region in place of switching
>   on a flag - Dan & Stefan
> - Add flush completion function with proper locking and wait
>   for host side flush completion - Stefan & Dan
> - Keep userspace API in uapi header file - Stefan, MST
> - Use LE fields & New device id - MST
> - Indentation & spacing suggestions - MST & Eric
> - Remove extra header files & add licensing - Stefan
> 
> Changes from RFC v1:
> - Reuse existing 'pmem' code for registering persistent
>   memory and other operations instead of creating an entirely
>   new block driver.
> - Use VIRTIO driver to register memory information with
>   nvdimm_bus and create region_type accordingly.
> - Call VIRTIO flush from existing pmem driver.
> 
> Pankaj Gupta (5):
>libnvdimm: nd_region flush callback support
>virtio-pmem: Add virtio-pmem guest driver
>libnvdimm: add nd_region buffered dax_dev flag
>ext4: disable map_sync for virtio 

Re: [PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Jan Kara
On Wed 09-01-19 19:26:05, Pankaj Gupta wrote:
> Virtio pmem provides an asynchronous host page cache flush
> mechanism. We don't support 'MAP_SYNC' with virtio pmem 
> and ext4. 
> 
> Signed-off-by: Pankaj Gupta 
...
> @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
> vm_area_struct *vma)
>   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
>   return -EOPNOTSUPP;
>  
> + /* We don't support synchronous mappings with guest direct access
> +  * and virtio based host page cache flush mechanism.
> +  */
> + if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
> + && (vma->vm_flags & VM_SYNC))
> + return -EOPNOTSUPP;
> +

Shouldn't there rather be some generic way of doing this? Having a
virtio_pmem_host_cache_enabled() check in filesystem code just looks like
the filesystem sniffing into details it should not care about... Maybe just
naming this (or having a wrapper) dax_dev_map_sync_supported()?
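
Something like the following, perhaps (an illustrative sketch only, the
helper name is just a suggestion):

    /* dax-level predicate so the filesystem never has to know
     * about virtio-pmem internals */
    static inline bool dax_dev_map_sync_supported(struct dax_device *dax_dev)
    {
            return !virtio_pmem_host_cache_enabled(dax_dev);
    }

and then ext4_file_mmap() would only check
!dax_dev_map_sync_supported(dax_dev) for VM_SYNC mappings.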

Honza
-- 
Jan Kara 
SUSE Labs, CR
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio:linux:kernel:NULL check after kmalloc is needed

2019-01-09 Thread Michael S. Tsirkin
On Tue, Jan 08, 2019 at 01:40:33PM +0800, Yi Wang wrote:
> From: "huang.zijiang" 
> 
> NULL check is needed because kmalloc may return NULL.
> 
> Signed-off-by: huang.zijiang 

Can't hurt, I will queue it.

> ---
>  tools/virtio/linux/kernel.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
> index 7ef45a4..2afcad8 100644
> --- a/tools/virtio/linux/kernel.h
> +++ b/tools/virtio/linux/kernel.h
> @@ -65,6 +65,8 @@ static inline void *kzalloc(size_t s, gfp_t gfp)
>  {
>   void *p = kmalloc(s, gfp);
>  
> + if (!p)
> + return NULL;
>   memset(p, 0, s);
>   return p;
>  }
> -- 
> 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next V2 1/3] virtio: introduce in order feature bit

2019-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2019 at 04:05:28PM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang 
> ---
>  include/uapi/linux/virtio_config.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_config.h 
> b/include/uapi/linux/virtio_config.h
> index 1196e1c1d4f6..2698e069ed9e 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -78,6 +78,12 @@
>  /* This feature indicates support for the packed virtqueue layout. */
>  #define VIRTIO_F_RING_PACKED 34
>  
> +/*
> + * Device uses buffers in the same order in which they have been
> + * available.

been *made* available

> + */
> +#define VIRTIO_F_IN_ORDER 35
> +
>  /*
>   * Does the device support Single Root I/O Virtualization?
>   */
> -- 
> 2.17.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides an asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem
and xfs.

Signed-off-by: Pankaj Gupta 
---
 fs/xfs/xfs_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e474250..eae4aa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1190,6 +1190,14 @@ xfs_file_mmap(
if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache flush mechanism.
+*/
+   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
+   xfs_find_daxdev_for_inode(file_inode(filp))) &&
+   (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(filp);
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(file_inode(filp)))
-- 
2.9.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net V2] vhost: log dirty page correctly

2019-01-09 Thread Michael S. Tsirkin
On Wed, Jan 09, 2019 at 03:29:47PM +0800, Jason Wang wrote:
> Vhost dirty page logging API is designed to sync through GPA. But we
> try to log GIOVA when device IOTLB is enabled. This is wrong and may
> lead to missing data after migration.
> 
> To solve this issue, when logging with device IOTLB enabled, we will:
> 
> 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
>get HVA, for writable descriptor, get HVA through iovec. For used
>ring update, translate its GIOVA to HVA
> 2) traverse the GPA->HVA mapping to get the possible GPA and log
>through GPA. Pay attention this reverse mapping is not guaranteed
>to be unique, so we should log each possible GPA in this case.
> 
> This fixes the failure of scp to guest during migration. In -next, we
> will probably support passing GIOVA->GPA instead of GIOVA->HVA.
> 
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Jintack Lim 
> Cc: Jintack Lim 
> Signed-off-by: Jason Wang 
> ---
> The patch is needed for stable.
> Changes from V1:
> - return error instead of warn
> ---
>  drivers/vhost/net.c   |  3 +-
>  drivers/vhost/vhost.c | 82 +++
>  drivers/vhost/vhost.h |  3 +-
>  3 files changed, 72 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 36f3d0f49e60..bca86bf7189f 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1236,7 +1236,8 @@ static void handle_rx(struct vhost_net *net)
>   if (nvq->done_idx > VHOST_NET_BATCH)
>   vhost_net_signal_used(nvq);
>   if (unlikely(vq_log))
> - vhost_log_write(vq, vq_log, log, vhost_len);
> + vhost_log_write(vq, vq_log, log, vhost_len,
> + vq->iov, in);
>   total_len += vhost_len;
>   if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
>   vhost_poll_queue(&vq->poll);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9f7942cbcbb2..ee095f08ffd4 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1733,11 +1733,70 @@ static int log_write(void __user *log_base,
>   return r;
>  }
>  
> +static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
> +{
> + struct vhost_umem *umem = vq->umem;
> + struct vhost_umem_node *u;
> + u64 gpa;
> + int r;
> + bool hit = false;
> +
> + list_for_each_entry(u, &umem->umem_list, link) {
> + if (u->userspace_addr < hva &&
> + u->userspace_addr + u->size >=
> + hva + len) {

So this tries to see that the GPA range is completely within
the GVA region. Does this have to be the case?
And if yes why not return 0 below instead of hit = true?
I'm also a bit concerned about overflow when addr + len is on a 64 bit
boundary.  Why not check addr + size - 1 and hva + len - 1 instead?
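
I.e. something like (sketch; assumes size and len are both at least 1 so
the subtractions cannot wrap):

    if (u->userspace_addr < hva &&
        u->userspace_addr + u->size - 1 >= hva + len - 1) {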


> + gpa = u->start + hva - u->userspace_addr;
> + r = log_write(vq->log_base, gpa, len);
> + if (r < 0)
> + return r;
> + hit = true;
> + }
> + }
> +
> + if (!hit)
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> +static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
> +{
> + struct iovec iov[64];
> + int i, ret;
> +
> + if (!vq->iotlb)
> + return log_write(vq->log_base, vq->log_addr + used_offset, len);
> +
> + ret = translate_desc(vq, (u64)(uintptr_t)vq->used + used_offset,
> +  len, iov, 64, VHOST_ACCESS_WO);


We don't need the cast to u64 here do we?

> + if (ret)
> + return ret;
> +
> + for (i = 0; i < ret; i++) {
> + ret = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,


We don't need the cast to u64 here do we?

> + iov[i].iov_len);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> - unsigned int log_num, u64 len)
> + unsigned int log_num, u64 len, struct iovec *iov, int count)
>  {
>   int i, r;
>  
> + if (vq->iotlb) {
> + for (i = 0; i < count; i++) {
> + r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,
> +   iov[i].iov_len);


We don't need the cast to u64 here do we?

> + if (r < 0)
> + return r;
> + }
> + return 0;
> + }
> +
>   /* Make sure data written is seen before log. */
>   smp_wmb();

Shouldn't the wmb be before log_write_hva too?
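
I.e. something like this at the top of vhost_log_write (sketch only, the
rest of the function unchanged):

    int i, r;

    /* Make sure data written is seen before log - for the iotlb
     * path as well as for the legacy path. */
    smp_wmb();

    if (vq->iotlb) {
            for (i = 0; i < count; i++) {
                    r = log_write_hva(vq, (u64)(uintptr_t)iov[i].iov_base,
                                      iov[i].iov_len);
                    if (r < 0)
                            return r;
            }
            return 0;
    }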

>   for (i = 0; i < log_num; ++i) {
> @@ -1769,9 +1828,8 @@ static int vhost_update_used_flags(struct 
> vhost_virtqueue *vq)
>  

Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin

2019-01-09 Thread Daniel Vetter
On Wed, Jan 9, 2019 at 9:52 PM Gerd Hoffmann  wrote:
>
>   Hi,
>
> > > If I understand things correctly it is valid to set all import/export
> > > callbacks (prime_handle_to_fd, prime_fd_to_handle,
> > > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> > > supporting dma-buf import/export and still advertise DRIVER_PRIME to
> > > indicate the other prime callbacks are supported (so generic fbdev
> > > emulation can use gem_prime_vmap etc).  Is that correct?
> >
> > I'm not sure how much that's a good idea ... Never thought about it
> > tbh. All the fbdev/dma-buf stuff has plenty of hacks and
> > inconsistencies still, so I guess we can't make it much worse really.
>
> Setting prime_handle_to_fd + prime_fd_to_handle to NULL has the effect
> that drm stops advertising DRM_PRIME_CAP_{IMPORT,EXPORT} to userspace.
>
> Which looks better to me than telling userspace we support it and then
> throwing errors unconditionally when userspace tries to use that.
>
> > > Is it possible to export TTM_PL_VRAM objects (with backing storage being
> > > a pci memory bar)?  If so, how?
> >
> > Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
> > and then knows the internals so it can do a proper pci peer2peer
> > mapping. Or at least there's been lots of patches floating around to
> > make that happen.
>
> That is limited to bo sharing between two amdgpu devices, correct?
>
> > I think other drivers migrate the bo out of VRAM.
>
> Well, that doesn't look too useful.  bochs and qxl virtual hardware
> can't access buffers outside VRAM.  So, while I could migrate the
> buffers to RAM (via memcpy) when exporting they would at the same time
> become unusable for the GPU ...
>
> > > On importing:
> > >
> > > Importing into TTM_PL_TT object looks easy again, at least when the
> > > object is actually stored in RAM.  What if not?
> >
> > They are all supposed to be stored in RAM. Note that all current ttm
> > importers totally break the abstraction, by taking the sg list,
> > throwing the dma mapping away and assuming there's a struct page
> > backing it. Would be good if we could stop spreading that abuse - the
> > dma-buf interfaces have been modelled after the ttm bo interfaces, so
> > shouldn't be too hard to wire this up correctly.
>
> Ok.  With virtio-gpu (where objects are backed by RAM pages anyway)
> wiring this up should be easy.
>
> But given there is no correct sample code I can look at it would be cool
> if you could give some more hints how this is supposed to work.  The
> gem_prime_import_sg_table() callback gets a sg list passed in after all,
> so I probably would have tried to take the sg list too ...

I'm not a fan of that helper either, that's really the broken part
imo. i915 doesn't use it. It's a midlayer so that the nvidia blob can
avoid directly touching the EXPORT_SYMBOL_GPL dma-buf symbols, afaiui
there's really no other solid reason for it. What the new gem cma
helpers does is imo much better (it still uses the import_sg_table
midlayer, but oh well).

For ttm you'd need to make sure that all the various ttm cpu side
access functions also all go through the relevant dma-buf interfaces,
and not through the struct page list fished out of the sgt. That was
at least the idea, long ago.

> > > Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> > > the data.  Should that be done?  If so, how?  Or is it better to just
> > > not support import then?
> >
> > Hm, since you ask about TTM concepts and not what this means in terms
> > of dma-buf:
>
> Ok, more details on the question:
>
> dma-buf: whatever the driver gets passed into the
> gem_prime_import_sg_table() callback.
>
> import into TTM_PL_VRAM: qemu driver which supports VRAM storage only
> (bochs, qxl), so the buffer has to be stored there if we want to do
> something with it (like scanning out to a crtc).
>
> > As long as you upcast to the ttm_bo you can do whatever
> > you want to really.
>
> Well, if the dma-buf comes from another device (say export vgem bo, then
> try import into bochs/qxl/virtio) I can't upcast.

In that case you'll in practice only get system RAM, and you're not
allowed to move it (dma-buf is meant to be zero-copy after all). If
your hw can't scan these out directly, then userspace needs to arrange
for a buffer copy into a native buffer somehow (that's how Xorg prime
works at least I think). No idea whether your virtual gpus can make
use of that directly. You might also get some pci peer2peer range in
the future, but it's strictly opt-in (because there are too many dma-buf
importers that just blindly assume there's a struct page behind the
sgt).

> When the dma-buf comes from the same device drm_gem_prime_import_dev()
> will notice and take a shortcut (skip import, just increase refcount
> instead), so I don't have to worry about that case in the
> gem_prime_import_sg_table() callback.

You can also upcast if it's from the same driver, not just same device.
-Daniel

> > But with plain 

Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin

2019-01-09 Thread Gerd Hoffmann
  Hi,

> > If I understand things correctly it is valid to set all import/export
> > callbacks (prime_handle_to_fd, prime_fd_to_handle,
> > gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
> > supporting dma-buf import/export and still advertise DRIVER_PRIME to
> > indicate the other prime callbacks are supported (so generic fbdev
> > emulation can use gem_prime_vmap etc).  Is that correct?
> 
> I'm not sure how much that's a good idea ... Never thought about it
> tbh. All the fbdev/dma-buf stuff has plenty of hacks and
> inconsistencies still, so I guess we can't make it much worse really.

Setting prime_handle_to_fd + prime_fd_to_handle to NULL has the effect
that drm stops advertising DRM_PRIME_CAP_{IMPORT,EXPORT} to userspace.

Which looks better to me than telling userspace we support it and then
throwing errors unconditionally when userspace tries to use that.
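
For bochs that would look roughly like this (sketch; the three helper
names are hypothetical placeholders for whatever the generic fbdev
emulation ends up calling):

    static struct drm_driver bochs_driver = {
            .driver_features  = DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME,
            /* vmap/vunmap/mmap helpers for generic fbdev emulation */
            .gem_prime_vmap   = bochs_gem_prime_vmap,
            .gem_prime_vunmap = bochs_gem_prime_vunmap,
            .gem_prime_mmap   = bochs_gem_prime_mmap,
            /* .prime_handle_to_fd / .prime_fd_to_handle left NULL, so
             * DRM_PRIME_CAP_{IMPORT,EXPORT} are not advertised */
    };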

> > Is it possible to export TTM_PL_VRAM objects (with backing storage being
> > a pci memory bar)?  If so, how?
> 
> Not really in general. amdgpu upcasts to amdgpu_bo (if it's an amdgpu BO)
> and then knows the internals so it can do a proper pci peer2peer
> mapping. Or at least there's been lots of patches floating around to
> make that happen.

That is limited to bo sharing between two amdgpu devices, correct?

> I think other drivers migrate the bo out of VRAM.

Well, that doesn't look too useful.  bochs and qxl virtual hardware
can't access buffers outside VRAM.  So, while I could migrate the
buffers to RAM (via memcpy) when exporting they would at the same time
become unusable for the GPU ...

> > On importing:
> >
> > Importing into TTM_PL_TT object looks easy again, at least when the
> > object is actually stored in RAM.  What if not?
> 
> They are all supposed to be stored in RAM. Note that all current ttm
> importers totally break the abstraction, by taking the sg list,
> throwing the dma mapping away and assuming there's a struct page
> backing it. Would be good if we could stop spreading that abuse - the
> dma-buf interfaces have been modelled after the ttm bo interfaces, so
> shouldn't be too hard to wire this up correctly.

Ok.  With virtio-gpu (where objects are backed by RAM pages anyway)
wiring this up should be easy.

But given there is no correct sample code I can look at it would be cool
if you could give some more hints how this is supposed to work.  The
gem_prime_import_sg_table() callback gets a sg list passed in after all,
so I probably would have tried to take the sg list too ...

> > Importing into TTM_PL_VRAM:  Impossible I think, without copying over
> > the data.  Should that be done?  If so, how?  Or is it better to just
> > not support import then?
> 
> Hm, since you ask about TTM concepts and not what this means in terms
> of dma-buf:

Ok, more details on the question:

dma-buf: whatever the driver gets passed into the
gem_prime_import_sg_table() callback.

import into TTM_PL_VRAM: qemu driver which supports VRAM storage only
(bochs, qxl), so the buffer has to be stored there if we want to do
something with it (like scanning out to a crtc).

> As long as you upcast to the ttm_bo you can do whatever
> you want to really.

Well, if the dma-buf comes from another device (say export vgem bo, then
try import into bochs/qxl/virtio) I can't upcast.

When the dma-buf comes from the same device drm_gem_prime_import_dev()
will notice and take a shortcut (skip import, just increase refcount
instead), so I don't have to worry about that case in the
gem_prime_import_sg_table() callback.

> But with plain dma-buf this doesn't work right now
> (least because ttm assumes it gets system RAM on import, in theory you
> could put the peer2peer dma mapping into the sg list and it should
> work).

Well, qemu display devices don't have peer2peer dma support.
So I guess the answer is "doesn't work".

cheers,
  Gerd

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next V2 0/3] basic in order support for vhost_net

2019-01-09 Thread Jason Wang


On 2019/1/9 4:05 PM, Jason Wang wrote:

Hi:

This series implements basic in order feature support for
vhost_net. This feature requires both driver and device to use
descriptors in order, which can simplify the implementation and
optimization on both sides. The series also implements a simple
optimization that avoids reading the available ring. Tests show 10%
performance improvement at most.

More optimizations could be done on top.

Changes from V1:
- no code changes
- add result of SMAP off



Just noticed the patch works only for some specific cases, e.g. a buffer
that only contains one descriptor. Will respin.


Thanks




Jason Wang (3):
   virtio: introduce in order feature bit
   vhost_net: support in order feature
   vhost: don't touch avail ring if in_order is negotiated

  drivers/vhost/net.c|  6 --
  drivers/vhost/vhost.c  | 19 ---
  include/uapi/linux/virtio_config.h |  6 ++
  3 files changed, 22 insertions(+), 9 deletions(-)


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-09 Thread Christian Borntraeger
On 09.01.2019 11:35, Wei Wang wrote:
> On 01/08/2019 04:46 PM, Christian Borntraeger wrote:
>>
>> On 08.01.2019 06:35, Wei Wang wrote:
>>> On 01/07/2019 09:49 PM, Christian Borntraeger wrote:
 On 07.01.2019 08:01, Wei Wang wrote:
> virtio-ccw has deadlock issues with reading the config space inside the
> interrupt context, so we tweak the virtballoon_changed implementation
> by moving the config read operations into the related workqueue contexts.
> The config_read_bitmap is used as a flag to the workqueue callbacks
> about the related config fields that need to be read.
>
> The cmd_id_received is also renamed to cmd_id_received_cache, and
> the value should be obtained via virtio_balloon_cmd_id_received.
>
> Reported-by: Christian Borntraeger 
> Signed-off-by: Wei Wang 
> Reviewed-by: Cornelia Huck 
> Reviewed-by: Halil Pasic 
 Together with
     virtio_pci: use queue idx instead of array idx to set up the vq
     virtio: don't allocate vqs when names[i] = NULL

 Tested-by: Christian Borntraeger 
>>> OK. I don't plan to send a new version of the above patches as no changes 
>>> needed so far.
>>>
>>> Michael, if the above two patches look good to you, please help add the 
>>> related tested-by
>>> and reviewed-by tags. Thanks.
>> Can we also make sure that
>>
>> virtio_pci: use queue idx instead of array idx to set up the vq
>> virtio: don't allocate vqs when names[i] = NULL
>>
>> also land in stable?
>>
> 
> You could also send the request to stable after it gets merged to Linus' tree.
> The stable review committee will decide whether to take it.
> 
> Please see Option 2:
> 
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> 

Those patches are not upstream yet, correct?

Michael,

Can you add the stable tag before submitting? If not, can you give me a heads 
up when doing the
pull request so that I can ping the stable folks.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 1/3] virtio-balloon: tweak config_changed implementation

2019-01-09 Thread Wei Wang

On 01/08/2019 04:46 PM, Christian Borntraeger wrote:


On 08.01.2019 06:35, Wei Wang wrote:

On 01/07/2019 09:49 PM, Christian Borntraeger wrote:

On 07.01.2019 08:01, Wei Wang wrote:

virtio-ccw has deadlock issues with reading the config space inside the
interrupt context, so we tweak the virtballoon_changed implementation
by moving the config read operations into the related workqueue contexts.
The config_read_bitmap is used as a flag to the workqueue callbacks
about the related config fields that need to be read.

The cmd_id_received is also renamed to cmd_id_received_cache, and
the value should be obtained via virtio_balloon_cmd_id_received.

Reported-by: Christian Borntraeger 
Signed-off-by: Wei Wang 
Reviewed-by: Cornelia Huck 
Reviewed-by: Halil Pasic 

Together with
virtio_pci: use queue idx instead of array idx to set up the vq
virtio: don't allocate vqs when names[i] = NULL

Tested-by: Christian Borntraeger 

OK. I don't plan to send a new version of the above patches as no changes 
needed so far.

Michael, if the above two patches look good to you, please help add the related 
tested-by
and reviewed-by tags. Thanks.

Can we also make sure that

virtio_pci: use queue idx instead of array idx to set up the vq
virtio: don't allocate vqs when names[i] = NULL

also land in stable?



You could also send the request to stable after it gets merged to Linus' 
tree.

The stable review committee will decide whether to take it.

Please see Option 2:

https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html


Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta


> > Virtio pmem provides an asynchronous host page cache flush
> > mechanism. We don't support 'MAP_SYNC' with virtio pmem
> > and xfs.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  fs/xfs/xfs_file.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index e474250..eae4aa4 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1190,6 +1190,14 @@ xfs_file_mmap(
> > if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > return -EOPNOTSUPP;
> >  
> > +   /* We don't support synchronous mappings with guest direct access
> > +* and virtio based host page cache flush mechanism.
> > +*/
> > +   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
> 
> Echoing what Jan said, this ought to be some sort of generic function
> that tells us whether or not memory mapped from the dax device will
> always still be accessible even after a crash (i.e. supports MAP_SYNC).

o.k
> 
> What if the underlying file on the host is itself on pmem and can be
> MAP_SYNC'd?  Shouldn't the guest be able to use MAP_SYNC as well?

Guest MAP_SYNC on actual host pmem will sync guest metadata, as guest
writes are persistent on actual pmem. Host side Qemu MAP_SYNC enabling
work for pmem is in progress. It will make sure host-side metadata is in
a consistent state after a crash or any other metadata-corrupting
operation.

For virtio-pmem, we are emulating a pmem device over regular storage on
the host side. The guest needs to follow its writes with fsync to make
sure guest metadata is in a consistent state (or journalled). Host
backing file data & metadata will also be persistent after a guest to
host fsync.
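
From a guest application's point of view that means (userspace sketch;
dax_addr is the start of an mmap()ed region on a virtio-pmem backed
filesystem):

    memcpy(dax_addr, buf, len);
    /* no MAP_SYNC: an explicit msync (or fsync on the fd) is
     * required, which the guest kernel turns into a virtio flush
     * request to the host */
    if (msync(dax_addr, len, MS_SYNC) < 0)
            err(1, "msync");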

Thanks,
Pankaj

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Dan Williams
On Wed, Jan 9, 2019 at 5:53 AM Pankaj Gupta  wrote:
>
> This patch adds 'DAXDEV_BUFFERED' flag which is set
> for virtio pmem corresponding nd_region. This later
> is used to disable MAP_SYNC functionality for ext4
> & xfs filesystem.
>
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/dax/super.c  | 17 +
>  drivers/nvdimm/pmem.c|  3 +++
>  drivers/nvdimm/region_devs.c |  7 +++
>  drivers/virtio/pmem.c|  1 +
>  include/linux/dax.h  |  9 +
>  include/linux/libnvdimm.h|  6 ++
>  6 files changed, 43 insertions(+)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 6e928f3..9128740 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -167,6 +167,8 @@ enum dax_device_flags {
> DAXDEV_ALIVE,
> /* gate whether dax_flush() calls the low level flush routine */
> DAXDEV_WRITE_CACHE,
> +   /* flag to disable MAP_SYNC for virtio based host page cache flush */
> +   DAXDEV_BUFFERED,
>  };
>
>  /**
> @@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
>
> +void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
> +{
> +   if (wc)
> +   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> +   else
> +   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);

The "write_cache" property was structured this way because it can
conceivably change at runtime. The MAP_SYNC capability should be
static and never changed after init.

> +bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
> +{
> +   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
> +}
> +EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);

Echoing Darrick and Jan this is should be a generic property of a
dax_device and not specific to virtio. I don't like the "buffered"
designation as that's not accurate. There may be hardware reasons why
a dax_device is not synchronous, like a requirement to flush a
write-pending queue or otherwise notify the device of new writes.

I would just have a dax_synchronous() helper and a DAXDEV_SYNC flag. I
would also modify alloc_dax() to take a flags argument so that the
capability can be instantiated when the dax_device is allocated.
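
Roughly (sketch of the suggested shape only, untested):

    #define DAXDEV_F_SYNC	(1UL << 0)	/* alloc_dax() flag */

    struct dax_device *alloc_dax(void *private, const char *host,
                    const struct dax_operations *ops, unsigned long flags);

    /* DAXDEV_SYNC would be set once from DAXDEV_F_SYNC at alloc_dax()
     * time and never change afterwards */
    static inline bool dax_synchronous(struct dax_device *dax_dev)
    {
            return test_bit(DAXDEV_SYNC, &dax_dev->flags);
    }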
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin

2019-01-09 Thread Daniel Vetter
On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> The buffer object must be reserved before calling
> ttm_bo_validate for pinning/unpinning.
> 
> Signed-off-by: Gerd Hoffmann 

Seems a bit a bisect fumble in your series here: legacy kms code reserved
the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I
think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to
avoid bisect fail I think you need to have these temporarily in your
cleanup/prepare_plane functions too.

Looked through the entire series, this here is the only issue I think
should be fixed before merging (making atomic_enable optional can be done
as a follow-up if you feel like it). With that addressed on the series:

Acked-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/bochs/bochs_mm.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/bochs/bochs_mm.c 
> b/drivers/gpu/drm/bochs/bochs_mm.c
> index cfe061c25f..970a591908 100644
> --- a/drivers/gpu/drm/bochs/bochs_mm.c
> +++ b/drivers/gpu/drm/bochs/bochs_mm.c
> @@ -223,7 +223,11 @@ int bochs_bo_pin(struct bochs_bo *bo, u32 pl_flag)
>   bochs_ttm_placement(bo, pl_flag);
>   for (i = 0; i < bo->placement.num_placement; i++)
>   bo->placements[i].flags |= TTM_PL_FLAG_NO_EVICT;
> + ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
> + if (ret)
> + return ret;
>   ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx);
> + ttm_bo_unreserve(&bo->bo);
>   if (ret)
>   return ret;
>  
> @@ -247,7 +251,11 @@ int bochs_bo_unpin(struct bochs_bo *bo)
>  
>   for (i = 0; i < bo->placement.num_placement; i++)
>   bo->placements[i].flags &= ~TTM_PL_FLAG_NO_EVICT;
> + ret = ttm_bo_reserve(&bo->bo, true, false, NULL);
> + if (ret)
> + return ret;
>   ret = ttm_bo_validate(&bo->bo, &bo->placement, &ctx);
> + ttm_bo_unreserve(&bo->bo);
>   if (ret)
>   return ret;
>  
> -- 
> 2.9.3
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1 1/2] virtio_pci: use queue idx instead of array idx to set up the vq

2019-01-09 Thread Christian Borntraeger
Cc: sta...@vger.kernel.org
Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")


On 28.12.2018 03:26, Wei Wang wrote:
> When calling find_vqs, there will be no vq[i] allocation if its corresponding
> names[i] is NULL. For example, the caller may pass in names[i] (i=4)
> with names[2] being NULL because the related feature bit is turned off,
> so technically there are 3 queues on the device, and name[4] should
> correspond to the 3rd queue on the device.
> 
> So we use queue_idx as the queue index, which is increased only when the
> queue exists.
> 
> Signed-off-by: Wei Wang 
> ---
>  drivers/virtio/virtio_pci_common.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index 465a6f5..d0584c0 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -285,7 +285,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   u16 msix_vec;
> - int i, err, nvectors, allocated_vectors;
> + int i, err, nvectors, allocated_vectors, queue_idx = 0;
>  
>   vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>   if (!vp_dev->vqs)
> @@ -321,7 +321,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   msix_vec = allocated_vectors++;
>   else
>   msix_vec = VP_MSIX_VQ_VECTOR;
> - vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
> + vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
>ctx ? ctx[i] : false,
>msix_vec);
>   if (IS_ERR(vqs[i])) {
> @@ -356,7 +356,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> unsigned nvqs,
>   const char * const names[], const bool *ctx)
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> - int i, err;
> + int i, err, queue_idx = 0;
>  
>   vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
>   if (!vp_dev->vqs)
> @@ -374,7 +374,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
> unsigned nvqs,
>   vqs[i] = NULL;
>   continue;
>   }
> - vqs[i] = vp_setup_vq(vdev, i, callbacks[i], names[i],
> + vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
>ctx ? ctx[i] : false,
>VIRTIO_MSI_NO_VECTOR);
>   if (IS_ERR(vqs[i])) {
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 15/15] drm/bochs: reserve bo for pin/unpin

2019-01-09 Thread Gerd Hoffmann
On Wed, Jan 09, 2019 at 11:10:44AM +0100, Daniel Vetter wrote:
> On Tue, Jan 08, 2019 at 12:25:19PM +0100, Gerd Hoffmann wrote:
> > The buffer object must be reserved before calling
> > ttm_bo_validate for pinning/unpinning.
> > 
> > Signed-off-by: Gerd Hoffmann 
> 
> Seems a bit a bisect fumble in your series here: legacy kms code reserved
> the ttm bo before calling boch_bo_pin/unpin, your atomic code doesn't. I
> think pushing this into bochs_bo_pin/unpin makes sense for atomic, but to
> avoid bisect fail I think you need to have these temporarily in your
> cleanup/prepare_plane functions too.

I think I've sorted that.  Have some other changes too, will probably
send v3 tomorrow.

> Looked through the entire series, this here is the only issue I think
> should be fixed before merging (making atomic_enable optional can be done
> as a follow-up if you feel like it). With that addressed on the series:
> 
> Acked-by: Daniel Vetter 

Thanks.

While being at it:  I'm also looking at dma-buf export and import
support for the qemu drivers.

Right now both qxl and virtio have gem_prime_get_sg_table and
gem_prime_import_sg_table handlers which throw a WARN_ONCE() and return
an error.

If I understand things correctly it is valid to set all import/export
callbacks (prime_handle_to_fd, prime_fd_to_handle,
gem_prime_get_sg_table, gem_prime_import_sg_table) to NULL when not
supporting dma-buf import/export and still advertise DRIVER_PRIME to
indicate the other prime callbacks are supported (so generic fbdev
emulation can use gem_prime_vmap etc).  Is that correct?

On exporting:

TTM_PL_TT should be easy, just pin the buffer, grab the pages list and
feed that into drm_prime_pages_to_sg.  Didn't try yet though.  Is that
approach correct?
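
I.e. something like this (untested sketch; assumes the bo is pinned and
its ttm_tt populated so the page array is stable):

    static struct sg_table *
    bochs_gem_prime_get_sg_table(struct drm_gem_object *obj)
    {
            struct bochs_bo *bo = gem_to_bochs_bo(obj);

            if (!bo->bo.ttm)
                    return ERR_PTR(-EINVAL);
            return drm_prime_pages_to_sg(bo->bo.ttm->pages,
                                         bo->bo.ttm->num_pages);
    }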

Is it possible to export TTM_PL_VRAM objects (with backing storage being
a pci memory bar)?  If so, how?

On importing:

Importing into TTM_PL_TT object looks easy again, at least when the
object is actually stored in RAM.  What if not?

Importing into TTM_PL_VRAM:  Impossible I think, without copying over
the data.  Should that be done?  If so, how?  Or is it better to just
not support import then?

thanks,
  Gerd

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta
Virtio pmem provides an asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem 
and ext4. 

Signed-off-by: Pankaj Gupta 
---
 fs/ext4/file.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d4..e54f48b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
@@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
vm_area_struct *vma)
if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache flush mechanism.
+*/
+   if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
+   && (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
-- 
2.9.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta
This patch adds 'DAXDEV_BUFFERED' flag which is set
for virtio pmem corresponding nd_region. This later
is used to disable MAP_SYNC functionality for ext4
& xfs filesystem.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/super.c  | 17 +
 drivers/nvdimm/pmem.c|  3 +++
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/virtio/pmem.c|  1 +
 include/linux/dax.h  |  9 +
 include/linux/libnvdimm.h|  6 ++
 6 files changed, 43 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 6e928f3..9128740 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to disable MAP_SYNC for virtio based host page cache flush */
+   DAXDEV_BUFFERED,
 };
 
 /**
@@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+   if (wc)
+   set_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+   else
+   clear_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
+
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_BUFFERED, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(&dax_srcu);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fe1217b..8d190a3 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+
+   /* Set buffered bit in 'dax_dev' for virtio pmem */
+   virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region));
pmem->dax_dev = dax_dev;
 
gendev = disk_to_dev(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index f8218b4..1f8b2be 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, 
resource_size_t start,
return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
 }
 
+int nvdimm_is_buffered(struct nd_region *nd_region)
+{
+   return is_nd_pmem(&nd_region->dev) &&
+   test_bit(ND_REGION_BUFFERED, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(nvdimm_is_buffered);
+
 void __exit nd_region_devs_exit(void)
 {
ida_destroy(&region_ida);
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
index 51f5349..901767b 100644
--- a/drivers/virtio/pmem.c
+++ b/drivers/virtio/pmem.c
@@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
ndr_desc.numa_node = nid;
ndr_desc.flush = virtio_pmem_flush;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+   set_bit(ND_REGION_BUFFERED, &ndr_desc.flags);
nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
nd_region->provider_data =  dev_to_virtio
(nd_region->dev.parent->parent);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a..d16e03e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc);
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+}
+static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return false;
+}
 #endif
 
 struct writeback_control;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ca8bc07..94616f1 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -64,6 +64,11 @@ enum {
 */
ND_REGION_PERSIST_MEMCTRL = 2,
 
+   /* provides virtio based asynchronous flush mechanism for buffered
+* host page cache.
+*/
+   ND_REGION_BUFFERED = 3,
+
/* mark newly adjusted resources as requiring a label update */
DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+int nvdimm_is_buffered(struct nd_region 

[PATCH v3 2/5] virtio-pmem: Add virtio pmem driver

2019-01-09 Thread Pankaj Gupta
This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
---
 drivers/nvdimm/virtio_pmem.c |  84 ++
 drivers/virtio/Kconfig   |  10 
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/pmem.c| 124 +++
 include/linux/virtio_pmem.h  |  60 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 
 7 files changed, 290 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000..2a1b1ba
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include <linux/virtio_pmem.h>
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   unsigned int len;
+   unsigned long flags;
+   struct virtio_pmem_request *req, *req_buf;
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+   req->done = true;
+   wake_up(&req->host_acked);
+
+   if (!list_empty(&vpmem->req_list)) {
+   req_buf = list_first_entry(&vpmem->req_list,
+   struct virtio_pmem_request, list);
+   list_del(&vpmem->req_list);
+   req_buf->wq_buf_avail = true;
+   wake_up(&req_buf->wq_buf);
+   }
+   }
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   int err;
+   unsigned long flags;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct virtio_pmem_request *req;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = req->wq_buf_avail = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(&req->host_acked);
+   init_waitqueue_head(&req->wq_buf);
+   sg_init_one(&sg, req->name, strlen(req->name));
+   sgs[0] = &sg;
+   sg_init_one(&ret, &req->ret, sizeof(req->ret));
+   sgs[1] = &ret;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+   if (err) {
+   dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+   list_add_tail(&vpmem->req_list, &req->list);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   }
+   virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->host_acked, req->done);
+   err = req->ret;
+   kfree(req);
+
+   return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..9f634a2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_PMEM
+   tristate "Support for virtio pmem driver"
+   depends on VIRTIO
+   depends on LIBNVDIMM
+   help
+   This driver provides support for virtio based flushing interface
+   for persistent memory range.
+
+   If unsure, say M.
+
 config VIRTIO_BALLOON
tristate "Virtio balloon driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..143ce91 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 

[PATCH v3 1/5] libnvdimm: nd_region flush callback support

2019-01-09 Thread Pankaj Gupta
This patch adds functionality to perform a flush from guest
to host over VIRTIO. We register a callback based on the
'nd_region' type: the virtio_pmem driver requires this special
flush function, while for the rest of the region types we
register the existing flush function. Errors returned by a
host fsync failure are reported to userspace.

This also handles asynchronous flush requests from the block layer
by creating a child bio and chaining it with parent bio.
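
The chaining follows the usual pattern, along these lines (illustrative
sketch only, not the exact code in this patch):

    /* the parent bio completes only after the async flush child
     * has finished */
    struct bio *child = bio_alloc(GFP_ATOMIC, 0);

    if (!child)
            return -ENOMEM;
    bio_copy_dev(child, parent);
    child->bi_opf = REQ_PREFLUSH;
    child->bi_iter.bi_sector = -1;
    bio_chain(child, parent);
    submit_bio(child);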

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 12 
 drivers/nvdimm/region_devs.c | 38 --
 include/linux/libnvdimm.h|  5 -
 6 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..f154852 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..a1dfa06 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d53a2d1 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -160,6 +160,7 @@ struct nd_region {
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
struct nd_mapping mapping[0];
+   int (*flush)(struct nd_region *nd_region);
 };
 
 struct nd_blk_region {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..5d6a4a1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
 
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..5508727 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct 
device_attribute *att
return rc;
if 

[PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta
 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v2. Tested with Qemu side device 
 emulation for virtio-pmem [6]. 
 
 Details of the project idea for the 'virtio pmem' flushing interface
 are shared in [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
-
   - Reads persistent memory range from paravirt device and 
 registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
 persistent memory region and setup filesystem operations 
 to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
 interface to flush from guest to host.

2. Qemu virtio-pmem device
-
   - Creates virtio pmem device and exposes a memory range to 
 KVM guest. 
   - At host side this is file backed memory which acts as 
 persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
 for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[6] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem error handling:
 
  Checked behaviour of virtio-pmem for the below types of errors.
  Need suggestions on the expected behaviour for handling these errors.

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
- As per current logic if error page belongs to Qemu process, 
  host MCE handler isolates(hwpoison) that page and send SIGBUS. 
  Qemu SIGBUS handler injects exception to KVM guest. 
- KVM guest then isolates the page and send SIGBUS to guest 
  userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
- Handles such errors with MCE notifier and creates a list 
  of bad blocks. Read/direct access DAX operation return EIO 
  if accessed memory page fall in bad block list.
- It also starts background scrubbing.
- Similar functionality can be reused in virtio-pmem with MCE 
  notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
  confirm if this behaviour is ok or needs any change?

Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req- Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (5):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2

 

Re: [PATCH v2 03/15] drm/bochs: atomic: add atomic_flush+atomic_enable callbacks.

2019-01-09 Thread Daniel Vetter
On Tue, Jan 08, 2019 at 12:25:07PM +0100, Gerd Hoffmann wrote:
> Conversion to atomic modesetting, step one.
> Add atomic crtc helper callbacks.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/bochs/bochs_kms.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/bochs/bochs_kms.c 
> b/drivers/gpu/drm/bochs/bochs_kms.c
> index f7e6d1a9b3..2cbd406b1f 100644
> --- a/drivers/gpu/drm/bochs/bochs_kms.c
> +++ b/drivers/gpu/drm/bochs/bochs_kms.c
> @@ -115,6 +115,29 @@ static int bochs_crtc_page_flip(struct drm_crtc *crtc,
>   return 0;
>  }
>  
> +static void bochs_crtc_atomic_enable(struct drm_crtc *crtc,
> +  struct drm_crtc_state *old_crtc_state)
> +{
> +}

A patch to make this optional in the helpers would be neat.
-Daniel

> +
> +static void bochs_crtc_atomic_flush(struct drm_crtc *crtc,
> + struct drm_crtc_state *old_crtc_state)
> +{
> + struct drm_device *dev = crtc->dev;
> + struct drm_pending_vblank_event *event;
> +
> + if (crtc->state && crtc->state->event) {
> + unsigned long irqflags;
> +
> + spin_lock_irqsave(&dev->event_lock, irqflags);
> + event = crtc->state->event;
> + crtc->state->event = NULL;
> + drm_crtc_send_vblank_event(crtc, event);
> + spin_unlock_irqrestore(&dev->event_lock, irqflags);
> + }
> +}
> +
> +
>  /* These provide the minimum set of functions required to handle a CRTC */
>  static const struct drm_crtc_funcs bochs_crtc_funcs = {
>   .set_config = drm_crtc_helper_set_config,
> @@ -128,6 +151,8 @@ static const struct drm_crtc_helper_funcs 
> bochs_helper_funcs = {
>   .mode_set_base = bochs_crtc_mode_set_base,
>   .prepare = bochs_crtc_prepare,
>   .commit = bochs_crtc_commit,
> + .atomic_enable = bochs_crtc_atomic_enable,
> + .atomic_flush = bochs_crtc_atomic_flush,
>  };
>  
>  static const uint32_t bochs_formats[] = {
> -- 
> 2.9.3
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
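
As an aside on Daniel's suggestion: making the empty atomic_enable stub
unnecessary would amount to a NULL check in the atomic helpers before
invoking the callback. A rough, hypothetical sketch of the idea (this is
not the actual drm_atomic_helper code; only the funcs lookup follows the
usual helper pattern):

/* Hypothetical helper-side change: treat a missing atomic_enable as a
 * no-op, so drivers like bochs could drop their empty stub entirely. */
static void helper_crtc_enable(struct drm_crtc *crtc,
			       struct drm_crtc_state *old_crtc_state)
{
	const struct drm_crtc_helper_funcs *funcs = crtc->helper_private;

	if (funcs->atomic_enable)
		funcs->atomic_enable(crtc, old_crtc_state);
	/* else: nothing to do for this CRTC on enable */
}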


Re: [PATCH] qxl: Use struct_size() in kzalloc()

2019-01-09 Thread Gerd Hoffmann
On Tue, Jan 08, 2019 at 10:21:52AM -0600, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding the
> size of a structure that has a zero-sized array at the end, along with memory
> for some number of elements for that array. For example:
> 
> struct foo {
> int stuff;
> void *entry[];
> };
> 
> instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
> 
> Instead of leaving these open-coded and prone to type mistakes, we can now
> use the new struct_size() helper:
> 
> instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
> 
> This code was detected with the help of Coccinelle.

Patch queued up.

thanks,
  Gerd

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
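
Beyond readability, struct_size() also closes an integer-overflow hole
that the open-coded form leaves open. A minimal sketch of the
difference, assuming the hypothetical struct foo from the quoted mail
(the real macro lives in include/linux/overflow.h and saturates to
SIZE_MAX on overflow, so the allocation fails cleanly instead of being
undersized):

#include <linux/overflow.h>
#include <linux/slab.h>

struct foo {
	int stuff;
	void *entry[];
};

static struct foo *foo_alloc(size_t count)
{
	struct foo *instance;

	/* The open-coded form,
	 *   kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL),
	 * can wrap for a huge count and return an undersized buffer.
	 * struct_size() saturates to SIZE_MAX on overflow instead, so
	 * kzalloc() simply fails. */
	instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
	return instance;
}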


Re: [PATCH v2] drm/virtio: Drop deprecated load/unload initialization

2019-01-09 Thread Gerd Hoffmann
On Tue, Jan 08, 2019 at 11:59:30AM -0300, Ezequiel Garcia wrote:
> Move the code around so the driver is probed from the bus
> .probe callback and removed from the bus .remove callback.
> This commit is just a cleanup and shouldn't affect
> functionality.

That one works.

thanks,
  Gerd

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vhost/vsock: fix vhost vsock cid hashing inconsistent

2019-01-09 Thread Christian Borntraeger
Adding Peter,

is this the same problem that you reported to me today?
Can you test Zha Bin's patch?

Christian

On 08.01.2019 09:07, Zha Bin wrote:
> The vsock core only supports a 32bit CID, but the virtio-vsock spec
> defines the CID (dst_cid and src_cid) as u64 with the upper 32 bits
> reserved as zero. This inconsistency causes a bug in the vhost vsock
> driver. The scenario is:
> 
>   1. A hash table (vhost_vsock_hash) is used to map a CID to a vsock
>   object, and hash_min() is used to compute the hash key. hash_min() is
>   defined as:
>   (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
>   That means the hash algorithm depends on the size of the macro
>   argument 'val'.
>   2. In function vhost_vsock_set_cid(), a 64bit CID is passed to
>   hash_min() to compute the hash key when inserting a vsock object into
>   the hash table.
>   3. In function vhost_vsock_get(), a 32bit CID is passed to hash_min()
>   to compute the hash key when looking up the vsock object for a CID.
> 
> Because of the different sizes of the CID, hash_min() returns different
> hash keys, so the lookup fails to find the vsock object for the CID.
> 
> To fix this bug, we keep the CID as u64 in the IOCTLs and virtio message
> headers, but explicitly convert u64 to u32 when dealing with the hash
> table and the vsock core.
> 
> Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack 
> callers")
> Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.tex
> Signed-off-by: Zha Bin 
> Reviewed-by: Liu Jiang 
> ---
>  drivers/vhost/vsock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index bc42d38ae031..3fbc068eaa9b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -642,7 +642,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, 
> u64 guest_cid)
>   hash_del_rcu(&vsock->hash);
>  
>   vsock->guest_cid = guest_cid;
> - hash_add_rcu(vhost_vsock_hash, &vsock->hash, guest_cid);
> + hash_add_rcu(vhost_vsock_hash, &vsock->hash, vsock->guest_cid);
>   mutex_unlock(&vhost_vsock_mutex);
>  
>   return 0;
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
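
For readers following along, a minimal sketch of the size dependence
described in the commit message (illustrative only: the function and
variable names here are made up; 8 is the bucket-bits value used for
vhost_vsock_hash, and on a 64-bit kernel hash_long() is hash_64()):

#include <linux/hashtable.h>
#include <linux/types.h>

/* Sketch: inserting with a u64 CID but looking up with a u32 CID lands
 * in different buckets, because hash_min() picks the hash function
 * based on sizeof() of its argument. */
static bool cid_hash_keys_match(void)
{
	u64 cid64 = 3;	/* CID as hashed by vhost_vsock_set_cid() */
	u32 cid32 = 3;	/* CID as hashed by vhost_vsock_get() */

	/* sizeof(cid64) == 8, so this expands to hash_long(cid64, 8) */
	u32 key_insert = hash_min(cid64, 8);
	/* sizeof(cid32) == 4, so this expands to hash_32(cid32, 8) */
	u32 key_lookup = hash_min(cid32, 8);

	/* In general key_insert != key_lookup, so lookups miss the
	 * bucket the vsock object was inserted into. */
	return key_insert == key_lookup;
}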


[PATCH net-next V2 3/3] vhost: don't touch avail ring if in_order is negotiated

2019-01-09 Thread Jason Wang
The device uses the descriptor table in order, so there is no need to
read the head index from the available ring. This eliminates the cache
contention on the available ring completely.

Virtio-user + vhost_kernel + XDP_DROP on 2.60GHz Broadwell

          Before    After
SMAP on:  4.8Mpps   5.3Mpps (+10%)
SMAP off: 6.6Mpps   7.0Mpps (+6%)

Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 55e5aa662ad5..ab0d05262235 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2012,6 +2012,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
__virtio16 avail_idx;
__virtio16 ring_head;
int ret, access;
+   bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
 
/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vq->last_avail_idx;
@@ -2044,15 +2045,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
/* Grab the next descriptor number they're advertising, and increment
 * the index we've seen. */
-   if (unlikely(vhost_get_avail(vq, ring_head,
-   &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
-   vq_err(vq, "Failed to read head: idx %d address %p\n",
-  last_avail_idx,
-  &vq->avail->ring[last_avail_idx % vq->num]);
-   return -EFAULT;
+   if (!in_order) {
+   if (unlikely(vhost_get_avail(vq, ring_head,
+   &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
+   vq_err(vq, "Failed to read head: idx %d address %p\n",
+   last_avail_idx,
+   &vq->avail->ring[last_avail_idx % vq->num]);
+   return -EFAULT;
+   }
+   head = vhost16_to_cpu(vq, ring_head);
+   } else {
+   head = last_avail_idx & (vq->num - 1);
}
 
-   head = vhost16_to_cpu(vq, ring_head);
 
/* If their number is silly, that's an error. */
if (unlikely(head >= vq->num)) {
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
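
To make the saved access concrete, a simplified sketch of the two ways
of computing the next head (illustrative only: real vhost code must go
through vhost_get_avail() because the avail ring lives in guest memory,
and these helper names are made up):

static u16 next_head_legacy(struct vhost_virtqueue *vq, u16 last_avail_idx)
{
	/* One read of guest memory per request: this is the cache line
	 * the guest keeps writing, hence the contention. */
	return vhost16_to_cpu(vq,
			      vq->avail->ring[last_avail_idx & (vq->num - 1)]);
}

static u16 next_head_in_order(struct vhost_virtqueue *vq, u16 last_avail_idx)
{
	/* VIRTIO_F_IN_ORDER guarantees descriptors are used sequentially,
	 * so the head is implied by the running index: no memory read. */
	return last_avail_idx & (vq->num - 1);
}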


[PATCH net-next V2 2/3] vhost_net: support in order feature

2019-01-09 Thread Jason Wang
This makes vhost_net support the in order feature. This is as simple as
using the datacopy path when the feature has been negotiated. An
alternative would be to not advertise in order when zerocopy is enabled,
but that tends to be suboptimal considering zerocopy may suffer from
e.g. head-of-line (HOL) blocking issues.

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 36f3d0f49e60..0870f51a1c76 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -74,7 +74,8 @@ enum {
VHOST_NET_FEATURES = VHOST_FEATURES |
 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-(1ULL << VIRTIO_F_IOMMU_PLATFORM)
+(1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+(1ULL << VIRTIO_F_IN_ORDER)
 };
 
 enum {
@@ -977,7 +978,8 @@ static void handle_tx(struct vhost_net *net)
vhost_disable_notify(>dev, vq);
vhost_net_disable_vq(net, vq);
 
-   if (vhost_sock_zcopy(sock))
+   if (vhost_sock_zcopy(sock) &&
+   !vhost_has_feature(vq, VIRTIO_F_IN_ORDER))
handle_tx_zerocopy(net, sock);
else
handle_tx_copy(net, sock);
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net-next V2 1/3] virtio: introduce in order feature bit

2019-01-09 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 include/uapi/linux/virtio_config.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 1196e1c1d4f6..2698e069ed9e 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -78,6 +78,12 @@
 /* This feature indicates support for the packed virtqueue layout. */
 #define VIRTIO_F_RING_PACKED   34
 
+/*
+ * Device uses buffers in the same order in which they have been
+ * made available.
+ */
+#define VIRTIO_F_IN_ORDER  35
+
 /*
  * Does the device support Single Root I/O Virtualization?
  */
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net-next V2 0/3] basic in order support for vhost_net

2019-01-09 Thread Jason Wang
Hi:

This series implements basic in order feature support for
vhost_net. The feature requires both driver and device to use
descriptors in order, which can simplify the implementation and
optimization on both sides. The series also implements a simple
optimization that avoids reading the available ring. Tests show up to a
10% performance improvement.

More optimizations could be done on top.

Changes from V1:
- no code changes
- add result of SMAP off

Jason Wang (3):
  virtio: introduce in order feature bit
  vhost_net: support in order feature
  vhost: don't touch avail ring if in_order is negotiated

 drivers/vhost/net.c|  6 --
 drivers/vhost/vhost.c  | 19 ---
 include/uapi/linux/virtio_config.h |  6 ++
 3 files changed, 22 insertions(+), 9 deletions(-)

-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization