Re: [virtio-dev] Re: [PATCH] virtio-gpu: add shared resource feature

2019-12-10 Thread Gerd Hoffmann
  Hi,

> > They are never taken away from guest ram.  The guest is free to do
> > whatever it wants with the pages.  It's the job of the guest driver to
> > make sure the pages are not released until the host stopped using them.
> > So the host must drop all references before completing the
> > DETACH_BACKING command.
> 
> Let's write that in the spec.

Yes, saying so explicitly totally makes sense.
I'll update the patch accordingly.

> We (you?) need to modify drm_gem_shmem_* helpers to always use cached
> mappings for virtgpu for now, since the TTM removal changed that behavior.

We can just roll our own mmap function to handle that.
I'll prepare a patch (that is independent of F_SHARED anyway).

> Then, we should expose host properties to the guest, like [1] suggests
> (maybe something like VIRTIO_GPU_F_GUEST_ATTRIBUTE_OVERRIDE,
> VIRTIO_GPU_F_GUEST_CACHE_OPS).  If there's already a method to do
> this, that would be awesome.

We'll need that for sure when mapping host resources into the guest, but
we are not there yet with F_SHARED only.

> > Also: what is special in virtualization here?  There are arm drivers
> > using drm_gem_shmem_* helpers.  How do they manage this?
> 
> Panfrost, v3d.  I'm surprised they haven't hit this issue yet.

Maybe because all mappings are consistent.  With virtio-gpu you could
end up in a situation where the guest uses wc whereas the host uses
cached and things go boom from there.

cheers,
  Gerd


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH] virtio-gpu: add shared resource feature

2019-12-09 Thread Gurchetan Singh
On Mon, Dec 9, 2019 at 3:31 AM Gerd Hoffmann wrote:
>
> > > > 1) Driver sends RESOURCE_CREATE_2D shared request
> > > > 2) Driver sends ATTACH_BACKING request
> > > > 3) Device creates a shared resource
> > > > 4) Driver sends SET_SCANOUT request
> > > > 5) Device sends shared resource to display
> > > > 6) Driver sends DETACH_BACKING request
> > >
> > > Hmm, I think you can crash the current qemu code that way (didn't test
> > > though).  I think the proper solution would be to return -EBUSY on the
> > > DETACH_BACKING request.  Guess I'll add a new error code for that.
> >
> > That would add the extra complication of a "delayed detach list" in
> > the guest kernel.
>
> Why?  I can't see how that can happen with the linux guest driver.  You
> need a drm framebuffer to map a gem object to a scanout.  The drm
> framebuffer holds a reference on the gem object, so the linux kernel
> wouldn't try to delete the gem object (-> DETACH_BACKING + UNREF) while
> it is mapped to the scanout because the refcount wouldn't go down to
> zero.

The case I'm thinking about is:

1) Guest DRM framebuffer refcount goes to zero
2) Host DRM framebuffer refcount is non-zero

I'd prefer the guest kernel just complain loudly rather than do
complicated tricks in such an instance.

>
> > I don't know if it's absolutely necessary --
> > because the open source implementation (udmabuf) seems to take a
> > reference on all constituent pages like DRM gem does [a][b][c][d].  We
> > should figure out if the extra reference is sufficient to prevent the
> > pages from going back to guest RAM (the guest kernel takes references
> > on those pages too -- I don't know if guest references are treated
> > differently).
>
> They are never taken away from guest ram.  The guest is free to do
> whatever it wants with the pages.  It's the job of the guest driver to
> make sure the pages are not released until the host stopped using them.
> So the host must drop all references before completing the
> DETACH_BACKING command.

Let's write that in the spec.

>
> > > I think we can do 90% of that optimization without changing the
> > > protocol.  Display updates typically involve multiple commands.  A
> > > transfer, possibly a set-scanout (when page-flipping), finally flush.
> > > The driver can first queue all commands, then notify the host, so we
> > > will have only one vmexit per display update instead of one vmexit per
> > > command.
> >
> > We also want to prevent this scenario:
> >
> > 1) Guest creates a writecombine mapping (like drm_gem_shmem_* helpers
> > do), but the pages were previously accessed and thus some data is in
> > the cache.
> > 2) Cache eviction
> > 3) Non-zero data in buffer.
>
> This can happen on arm I guess?

Exactly.

> I know caching is more complicated on
> arm than on x86.  Care to explain this (or do you have a good link?).

Two differences:
- On x86, the guest can perform cache operations, while on ARM it's a
privileged operation [1].
- On ARM, the guest attribute seems to override the host attribute [1].
I've verified this.

Even on x86, it seems it's possible to get WC buffers by changing the
MTRR [2] (though I haven't verified this yet).

[1] https://events.static.linuxfound.org/sites/events/files/slides/slides_10.pdf
[2] https://lwn.net/Articles/646712/

We (you?) need to modify drm_gem_shmem_* helpers to always use cached
mappings for virtgpu for now, since the TTM removal changed that behavior.

Then, we should expose host properties to the guest, like [1] suggests
(maybe something like VIRTIO_GPU_F_GUEST_ATTRIBUTE_OVERRIDE,
VIRTIO_GPU_F_GUEST_CACHE_OPS).  If there's already a method to do
this, that would be awesome.

Then we can optimize away.

>
> Also: what is special in virtualization here?  There are arm drivers
> using drm_gem_shmem_* helpers.  How do they manage this?

Panfrost, v3d.  I'm surprised they haven't hit this issue yet.

>
> > > Is there a good reason why the driver should be aware of that?
> >
> > Technically TRANSFER_TO_HOST is legal for 2D resources (though rarely
> > used).
>
> It's used for dumb gem objects.
>
> > I suppose we can modify RESOURCE_INFO to inform userspace the
> > resource is shared, and thus avoid hypercalls.
> >
> > If we leave the decision to create the shared resource to the driver
> > and not the device, *_2D_SHARED makes sense.  Right now, it's
> > *_2D_MAYBE_SHARED.
>
> Yes, the guest can't force it.
>
> cheers,
>   Gerd
>




Re: [virtio-dev] Re: [PATCH] virtio-gpu: add shared resource feature

2019-12-09 Thread Gerd Hoffmann
> > > 1) Driver sends RESOURCE_CREATE_2D shared request
> > > 2) Driver sends ATTACH_BACKING request
> > > 3) Device creates a shared resource
> > > 4) Driver sends SET_SCANOUT request
> > > 5) Device sends shared resource to display
> > > 6) Driver sends DETACH_BACKING request
> >
> > Hmm, I think you can crash the current qemu code that way (didn't test
> > though).  I think the proper solution would be to return -EBUSY on the
> > DETACH_BACKING request.  Guess I'll add a new error code for that.
> 
> That would add the extra complication of a "delayed detach list" in
> the guest kernel.

Why?  I can't see how that can happen with the linux guest driver.  You
need a drm framebuffer to map a gem object to a scanout.  The drm
framebuffer holds a reference on the gem object, so the linux kernel
wouldn't try to delete the gem object (-> DETACH_BACKING + UNREF) while
it is mapped to the scanout because the refcount wouldn't go down to
zero.
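The refcount argument above can be sketched as a toy model (plain Python used purely to illustrate the lifetime logic; the class and method names are illustrative, not the actual DRM kernel API):

```python
# Toy model: a DRM framebuffer holds a reference on its GEM object,
# so the driver only sends DETACH_BACKING + UNREF once every holder
# (userspace handle, framebuffer, ...) has dropped its reference.
class GemObject:
    def __init__(self):
        self.refcount = 1       # the userspace handle's reference
        self.detached = False   # whether DETACH_BACKING + UNREF was sent

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            # Only now would the driver detach and free the backing pages.
            self.detached = True

bo = GemObject()
bo.get()                  # framebuffer created for set-scanout: ref taken
bo.put()                  # userspace closes its handle: 2 -> 1, no detach
assert not bo.detached    # still mapped to the scanout, pages stay alive
bo.put()                  # framebuffer destroyed: 1 -> 0
assert bo.detached        # now DETACH_BACKING + UNREF is safe
```

This is why, from the guest driver's point of view, the problematic ordering cannot occur: the refcount never reaches zero while the object is mapped to a scanout.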

> I don't know if it's absolutely necessary --
> because the open source implementation (udmabuf) seems to take a
> reference on all constituent pages like DRM gem does [a][b][c][d].  We
> should figure out if the extra reference is sufficient to prevent the
> pages from going back to guest RAM (the guest kernel takes references
> on those pages too -- I don't know if guest references are treated
> differently).

They are never taken away from guest ram.  The guest is free to do
whatever it wants with the pages.  It's the job of the guest driver to
make sure the pages are not released until the host stopped using them.
So the host must drop all references before completing the
DETACH_BACKING command.

> > I think we can do 90% of that optimization without changing the
> > protocol.  Display updates typically involve multiple commands.  A
> > transfer, possibly a set-scanout (when page-flipping), finally flush.
> > The driver can first queue all commands, then notify the host, so we
> > will have only one vmexit per display update instead of one vmexit per
> > command.
> 
> We also want to prevent this scenario:
> 
> 1) Guest creates a writecombine mapping (like drm_gem_shmem_* helpers
> do), but the pages were previously accessed and thus some data is in
> the cache.
> 2) Cache eviction
> 3) Non-zero data in buffer.

This can happen on arm I guess?  I know caching is more complicated on
arm than on x86.  Care to explain this (or do you have a good link?).

Also: what is special in virtualization here?  There are arm drivers
using drm_gem_shmem_* helpers.  How do they manage this?

> > Is there a good reason why the driver should be aware of that?
> 
> Technically TRANSFER_TO_HOST is legal for 2D resources (though rarely
> used).

It's used for dumb gem objects.

> I suppose we can modify RESOURCE_INFO to inform userspace the
> resource is shared, and thus avoid hypercalls.
> 
> If we leave the decision to create the shared resource to the driver
> and not the device, *_2D_SHARED makes sense.  Right now, it's
> *_2D_MAYBE_SHARED.

Yes, the guest can't force it.

cheers,
  Gerd





Re: [virtio-dev] Re: [PATCH] virtio-gpu: add shared resource feature

2019-12-05 Thread Gurchetan Singh
On Wed, Dec 4, 2019 at 11:09 PM Gerd Hoffmann wrote:
>
>   Hi,
>
> > If the following scenario happens:
> >
> > 1) Driver sends RESOURCE_CREATE_2D shared request
> > 2) Driver sends ATTACH_BACKING request
> > 3) Device creates a shared resource
> > 4) Driver sends SET_SCANOUT request
> > 5) Device sends shared resource to display
> > 6) Driver sends DETACH_BACKING request
>
> Hmm, I think you can crash the current qemu code that way (didn't test
> though).  I think the proper solution would be to return -EBUSY on the
> DETACH_BACKING request.  Guess I'll add a new error code for that.

That would add the extra complication of a "delayed detach list" in
the guest kernel.  I don't know if it's absolutely necessary --
because the open source implementation (udmabuf) seems to take a
reference on all constituent pages like DRM gem does [a][b][c][d].  We
should figure out if the extra reference is sufficient to prevent the
pages from going back to guest RAM (the guest kernel takes references
on those pages too -- I don't know if guest references are treated
differently).

[a] https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_gem.c#L578
[b] https://github.com/torvalds/linux/blob/master/drivers/dma-buf/udmabuf.c#L161
[c] https://github.com/torvalds/linux/blob/master/drivers/dma-buf/udmabuf.c#L203
[d] http://lkml.iu.edu/hypermail/linux/kernel/0907.3/02155.html

>
> > > +Otherwise the shared resources are used like normal resources.
> > > +Especially the driver must send explicit VIRTIO_GPU_CMD_TRANSFER_*
> > > +commands to the device for both normal and shared resources.  Reason:
> > > +The device might have to flush caches.
> >
> > Can we make this spec stronger to avoid transfers all the
> > time (i.e, in virtio_gpu_update_dumb_bo)?
> >
> > drm_gem_shmem_* helpers seem to use WC buffers, and dumb buffers are
> > traditionally WC as well.  If we mandate the host ("device?" here)
> > must properly clean caches at creation time, it may be possible to
> > avoid hypercalls for 2D_SHARED resources.
>
> I think we can do 90% of that optimization without changing the
> protocol.  Display updates typically involve multiple commands.  A
> transfer, possibly a set-scanout (when page-flipping), finally flush.
> The driver can first queue all commands, then notify the host, so we
> will have only one vmexit per display update instead of one vmexit per
> command.

We also want to prevent this scenario:

1) Guest creates a writecombine mapping (like drm_gem_shmem_* helpers
do), but the pages were previously accessed and thus some data is in
the cache.
2) Cache eviction
3) Non-zero data in buffer.

>
> > > The device MAY also choose to
> > > +not create mapping for the shared resource.  Especially for small
> > > +resources it might be more efficient to just copy the data instead of
> > > +establishing a shared mapping.
> >
> > Should the device send a response code (OK_SHARED) to inform the
> > driver that the resource is shared?
>
> Is there a good reason why the driver should be aware of that?

Technically TRANSFER_TO_HOST is legal for 2D resources (though rarely
used).  I suppose we can modify RESOURCE_INFO to inform userspace the
resource is shared, and thus avoid hypercalls.

If we leave the decision to create the shared resource to the driver
and not the device, *_2D_SHARED makes sense.  Right now, it's
*_2D_MAYBE_SHARED.

>
> I'd prefer to make that an internal implementation detail of
> the device and not inform the guest.
>
> cheers,
>   Gerd
>




Re: [virtio-dev] Re: [PATCH] virtio-gpu: add shared resource feature

2019-12-04 Thread Gerd Hoffmann
  Hi,

> If the following scenario happens:
> 
> 1) Driver sends RESOURCE_CREATE_2D shared request
> 2) Driver sends ATTACH_BACKING request
> 3) Device creates a shared resource
> 4) Driver sends SET_SCANOUT request
> 5) Device sends shared resource to display
> 6) Driver sends DETACH_BACKING request

Hmm, I think you can crash the current qemu code that way (didn't test
though).  I think the proper solution would be to return -EBUSY on the
DETACH_BACKING request.  Guess I'll add a new error code for that.
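The proposed device-side behavior could look like the following sketch (an illustrative Python model, not the qemu code; the EBUSY-style error code is the new one suggested in this mail, not an existing VIRTIO_GPU_RESP_* value):

```python
import errno

# Toy device model: DETACH_BACKING is refused while the resource is
# still mapped to a scanout, instead of leaving a dangling mapping.
class Device:
    def __init__(self):
        self.scanout_resource = None   # resource id currently scanned out

    def set_scanout(self, res_id):
        self.scanout_resource = res_id

    def detach_backing(self, res_id):
        if self.scanout_resource == res_id:
            return -errno.EBUSY        # proposed new error: still in use
        return 0                       # ok: host references already dropped

dev = Device()
dev.set_scanout(5)                     # step 4/5 of the scenario above
assert dev.detach_backing(5) == -errno.EBUSY   # step 6 is rejected
dev.set_scanout(None)                  # scanout moved away
assert dev.detach_backing(5) == 0      # now the detach succeeds
```

The point is simply that the device reports the conflict instead of crashing; how the guest reacts to the error is a separate question (see the "delayed detach list" concern below).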

> > +Otherwise the shared resources are used like normal resources.
> > +Especially the driver must send explicit VIRTIO_GPU_CMD_TRANSFER_*
> > +commands to the device for both normal and shared resources.  Reason:
> > +The device might have to flush caches.
> 
> Can we make this spec stronger to avoid transfers all the
> time (i.e, in virtio_gpu_update_dumb_bo)?
> 
> drm_gem_shmem_* helpers seem to use WC buffers, and dumb buffers are
> traditionally WC as well.  If we mandate the host ("device?" here)
> must properly clean caches at creation time, it may be possible to
> avoid hypercalls for 2D_SHARED resources.

I think we can do 90% of that optimization without changing the
protocol.  Display updates typically involve multiple commands.  A
transfer, possibly a set-scanout (when page-flipping), finally flush.
The driver can first queue all commands, then notify the host, so we
will have only one vmexit per display update instead of one vmexit per
command.
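The queue-first-notify-once idea can be sketched like this (an illustrative Python model of a virtqueue, not the actual driver code):

```python
# Toy model of batching: queue transfer + set-scanout + flush on the
# virtqueue first, then kick (notify) the host once, so a display
# update costs one vmexit instead of one per command.
class Virtqueue:
    def __init__(self):
        self.pending = []
        self.kicks = 0             # each kick corresponds to ~one vmexit

    def queue(self, cmd):
        self.pending.append(cmd)   # buffered; no host notification yet

    def kick(self):
        self.kicks += 1            # single notification drains the queue
        self.pending.clear()

vq = Virtqueue()
for cmd in ("TRANSFER_TO_HOST_2D", "SET_SCANOUT", "RESOURCE_FLUSH"):
    vq.queue(cmd)
vq.kick()
assert vq.kicks == 1               # one vmexit for the whole display update
```

Compared with kicking after every command, this gets most of the overhead reduction without any protocol change, which is the 90% claimed above.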

> > The device MAY also choose to
> > +not create mapping for the shared resource.  Especially for small
> > +resources it might be more efficient to just copy the data instead of
> > +establishing a shared mapping.
> 
> Should the device send a response code (OK_SHARED) to inform the
> driver that the resource is shared?

Is there a good reason why the driver should be aware of that?

I'd prefer to make that an internal implementation detail of
the device and not inform the guest.

cheers,
  Gerd

