Re: [Xen-devel][RFC 3/3] xen/gntdev: Add support for Linux dma buffers

2018-05-21 Thread Dongwon Kim
Still need more time to review the whole code changes but I noticed one thing.

We've been using the term "hyper_dmabuf" for hypervisor-agnostic linux dmabuf
solution and we are planning to call any of our future solution for other
hypervisors the same name. So having same name for this xen-specific structure
or functions you implemented is confusing. Would you change it to something
else like... "xen_"? 

On Thu, May 17, 2018 at 11:26:04AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko 
> 
> Signed-off-by: Oleksandr Andrushchenko 
> ---
>  drivers/xen/gntdev.c  | 954 +-
>  include/uapi/xen/gntdev.h | 101 
>  include/xen/gntdev_exp.h  |  23 +
>  3 files changed, 1066 insertions(+), 12 deletions(-)
>  create mode 100644 include/xen/gntdev_exp.h
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 9510f228efe9..0ee88e193362 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -4,6 +4,8 @@
>   * Device for accessing (in user-space) pages that have been granted by other
>   * domains.
>   *
> + * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
> + *
>   * Copyright (c) 2006-2007, D G Murray.
>   *   (c) 2009 Gerd Hoffmann 
>   *
> @@ -37,6 +39,9 @@
>  #include 
>  #include 
>  
> +#include 
> +#include 
> +
>  #include 
>  #include 
>  #include 
> @@ -61,16 +66,39 @@ static atomic_t pages_mapped = ATOMIC_INIT(0);
>  static int use_ptemod;
>  #define populate_freeable_maps use_ptemod
>  
> +#ifndef GRANT_INVALID_REF
> +/*
> + * Note on usage of grant reference 0 as invalid grant reference:
> + * grant reference 0 is valid, but never exposed to a driver,
> + * because of the fact it is already in use/reserved by the PV console.
> + */
> +#define GRANT_INVALID_REF0
> +#endif
> +
>  struct gntdev_priv {
>   /* maps with visible offsets in the file descriptor */
>   struct list_head maps;
>   /* maps that are not visible; will be freed on munmap.
>* Only populated if populate_freeable_maps == 1 */
>   struct list_head freeable_maps;
> + /* List of dma-bufs. */
> + struct list_head dma_bufs;
>   /* lock protects maps and freeable_maps */
>   struct mutex lock;
>   struct mm_struct *mm;
>   struct mmu_notifier mn;
> +
> + /* Private data of the hyper DMA buffers. */
> +
> + struct device *dev;
> + /* List of exported DMA buffers. */
> + struct list_head dmabuf_exp_list;
> + /* List of wait objects. */
> + struct list_head dmabuf_exp_wait_list;
> + /* List of imported DMA buffers. */
> + struct list_head dmabuf_imp_list;
> + /* This is the lock which protects dma_buf_xxx lists. */
> + struct mutex dmabuf_lock;
>  };
>  
>  struct unmap_notify {
> @@ -95,10 +123,65 @@ struct grant_map {
>   struct gnttab_unmap_grant_ref *kunmap_ops;
>   struct page **pages;
>   unsigned long pages_vm_start;
> +
> + /*
> +  * All the fields starting with dmabuf_ are only valid if this
> +  * mapping is used for exporting a DMA buffer.
> +  * If dmabuf_vaddr is not NULL then this mapping is backed by DMA
> +  * capable memory.
> +  */
> +
> + /* Flags used to create this DMA buffer: GNTDEV_DMABUF_FLAG_XXX. */
> + bool dmabuf_flags;
> + /* Virtual/CPU address of the DMA buffer. */
> + void *dmabuf_vaddr;
> + /* Bus address of the DMA buffer. */
> + dma_addr_t dmabuf_bus_addr;
> +};
> +
> +struct hyper_dmabuf {
> + struct gntdev_priv *priv;
> + struct dma_buf *dmabuf;
> + struct list_head next;
> + int fd;
> +
> + union {
> + struct {
> + /* Exported buffers are reference counted. */
> + struct kref refcount;
> + struct grant_map *map;
> + } exp;
> + struct {
> + /* Granted references of the imported buffer. */
> + grant_ref_t *refs;
> + /* Scatter-gather table of the imported buffer. */
> + struct sg_table *sgt;
> + /* dma-buf attachment of the imported buffer. */
> + struct dma_buf_attachment *attach;
> + } imp;
> + } u;
> +
> + /* Number of pages this buffer has. */
> + int nr_pages;
> + /* Pages of this buffer. */
> + struct page **pages;
> +};
> +
> +struct hyper_dmabuf_wait_obj {
> + struct list_head next;
> + struct hyper_dmabuf *hyper_dmabuf;
> + struct completion completion;
> +};
> +
> +struct hyper_dambuf_attachment {
minor typo: dam->dma (same thing in other places as well.)

> + struct sg_table *sgt;
> + enum dma_data_direction dir;
>  };
>  
>  static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
>  
> +static struct miscdevice gntdev_miscdev;
> +
>  /* 

buffer sharing across VMs - xen-zcopy and hyper_dmabuf discussion

2018-04-12 Thread Dongwon Kim
(changed subject and decoupling from udmabuf thread)

On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
> >On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
> >>On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> >>>On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> >>>>On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> >>>>>   Hi,
> >>>>>
> >>>>>>>I fail to see any common ground for xen-zcopy and udmabuf ...
> >>>>>>Does the above mean you can assume that xen-zcopy and udmabuf
> >>>>>>can co-exist as two different solutions?
> >>>>>Well, udmabuf route isn't fully clear yet, but yes.
> >>>>>
> >>>>>See also gvt (intel vgpu), where the hypervisor interface is abstracted
> >>>>>away into a separate kernel modules even though most of the actual vgpu
> >>>>>emulation code is common.
> >>>>Thank you for your input, I'm just trying to figure out
> >>>>which of the three z-copy solutions intersect and how much
> >>>>>>And what about hyper-dmabuf?
> >>>xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
> >>>in terms of these core sharing feature:
> >>>
> >>>1. the sharing process - import prime/dmabuf from the producer -> extract
> >>>underlying pages and get those shared -> return references for shared pages
> >Another thing is danvet was kind of against to the idea of importing existing
> >dmabuf/prime buffer and forward it to the other domain due to synchronization
> >issues. He proposed to make hyper_dmabuf only work as an exporter so that it
> >can have a full control over the buffer. I think we need to talk about this
> >further as well.
> Yes, I saw this. But this limits the use-cases so much.

I agree. Our current approach is a lot more flexible. You can find very
similar feedback in my reply to those review messages. However, I also
understand Daniel's concern as well. I believe we need more dicussion
regarding this matter.

> For instance, running Android as a Guest (which uses ION to allocate
> buffers) means that finally HW composer will import dma-buf into
> the DRM driver. Then, in case of xen-front for example, it needs to be
> shared with the backend (Host side). Of course, we can change user-space
> to make xen-front allocate the buffers (make it exporter), but what we try
> to avoid is to change user-space which in normal world would have remain
> unchanged otherwise.
> So, I do think we have to support this use-case and just have to understand
> the complexity.
> 
> >
> >danvet, can you comment on this topic?
> >
> >>>2. the page sharing mechanism - it uses Xen-grant-table.
> >>>
> >>>And to give you a quick summary of differences as far as I understand
> >>>between two implementations (please correct me if I am wrong, Oleksandr.)
> >>>
> >>>1. xen-zcopy is DRM specific - can import only DRM prime buffer
> >>>while hyper_dmabuf can export any dmabuf regardless of originator
> >>Well, this is true. And at the same time this is just a matter
> >>of extending the API: xen-zcopy is a helper driver designed for
> >>xen-front/back use-case, so this is why it only has DRM PRIME API
> >>>2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
> >>>while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
> >>>out synchronization message to the exporting VM for synchronization.
> >>This is true. Again, this is because of the use-cases it covers.
> >>But having synchronization for a generic solution seems to be a good idea.
> >Yeah, understood xen-zcopy works ok with your use case. But I am just curious
> >if it is ok not to have any inter-domain synchronization in this sharing 
> >model.
> The synchronization is done with displif protocol [1]
> >The buffer being shared is technically dma-buf and originator needs to be 
> >able
> >to keep track of it.
> As I am working in DRM terms the tracking is done by the DRM core
> for me for free. (This might be one of the reasons Daniel sees DRM
> based implementation fit very good from code-reuse POV).

yeah but once you have a DRM object (whether it's dmabuf or not) on a remote
domain, it is totally new object and out of sync (correct me if I am wrong)
with original DRM prime, isn't it? How could these two different but based on
same pages be synchronized?

> &g

Re: [RfC PATCH] Add udmabuf misc device

2018-04-10 Thread Dongwon Kim
On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
> On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> >On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> >>On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> >>>   Hi,
> >>>
> >>>>>I fail to see any common ground for xen-zcopy and udmabuf ...
> >>>>Does the above mean you can assume that xen-zcopy and udmabuf
> >>>>can co-exist as two different solutions?
> >>>Well, udmabuf route isn't fully clear yet, but yes.
> >>>
> >>>See also gvt (intel vgpu), where the hypervisor interface is abstracted
> >>>away into a separate kernel modules even though most of the actual vgpu
> >>>emulation code is common.
> >>Thank you for your input, I'm just trying to figure out
> >>which of the three z-copy solutions intersect and how much
> >>>>And what about hyper-dmabuf?
> >xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
> >in terms of these core sharing feature:
> >
> >1. the sharing process - import prime/dmabuf from the producer -> extract
> >underlying pages and get those shared -> return references for shared pages

Another thing is danvet was kind of against to the idea of importing existing
dmabuf/prime buffer and forward it to the other domain due to synchronization
issues. He proposed to make hyper_dmabuf only work as an exporter so that it
can have a full control over the buffer. I think we need to talk about this
further as well.

danvet, can you comment on this topic?

> >
> >2. the page sharing mechanism - it uses Xen-grant-table.
> >
> >And to give you a quick summary of differences as far as I understand
> >between two implementations (please correct me if I am wrong, Oleksandr.)
> >
> >1. xen-zcopy is DRM specific - can import only DRM prime buffer
> >while hyper_dmabuf can export any dmabuf regardless of originator
> Well, this is true. And at the same time this is just a matter
> of extending the API: xen-zcopy is a helper driver designed for
> xen-front/back use-case, so this is why it only has DRM PRIME API
> >
> >2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
> >while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
> >out synchronization message to the exporting VM for synchronization.
> This is true. Again, this is because of the use-cases it covers.
> But having synchronization for a generic solution seems to be a good idea.

Yeah, understood xen-zcopy works ok with your use case. But I am just curious
if it is ok not to have any inter-domain synchronization in this sharing model.
The buffer being shared is technically dma-buf and originator needs to be able
to keep track of it.

> >
> >3. 1-level references - when using grant-table for sharing pages, there will
> >be same # of refs (each 8 byte)
> To be precise, grant ref is 4 bytes
You are right. Thanks for correction.;)

> >as # of shared pages, which is passed to
> >the userspace to be shared with importing VM in case of xen-zcopy.
> The reason for that is that xen-zcopy is a helper driver, e.g.
> the grant references come from the display backend [1], which implements
> Xen display protocol [2]. So, effectively the backend extracts references
> from frontend's requests and passes those to xen-zcopy as an array
> of refs.
> >  Compared
> >to this, hyper_dmabuf does multiple level addressing to generate only one
> >reference id that represents all shared pages.
> In the protocol [2] only one reference to the gref directory is passed
> between VMs
> (and the gref directory is a single-linked list of shared pages containing
> all
> of the grefs of the buffer).

ok, good to know. I will look into its implementation in more details but is
this gref directory (chained grefs) something that can be used for any general
memory sharing use case or is it jsut for xen-display (in current code base)?

> 
> >
> >4. inter VM messaging (hype_dmabuf only) - hyper_dmabuf has inter-vm msg
> >communication defined for dmabuf synchronization and private data (meta
> >info that Matt Roper mentioned) exchange.
> This is true, xen-zcopy has no means for inter VM sync and meta-data,
> simply because it doesn't have any code for inter VM exchange in it,
> e.g. the inter VM protocol is handled by the backend [1].
> >
> >5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
> >notified when newdmabuf is exported from other VM - uevent can be optionally
> >generated when this happens.
> >
> >6. structure - hyper_dmabuf is targetting to provide a ge

Re: [RfC PATCH] Add udmabuf misc device

2018-04-06 Thread Dongwon Kim
On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> >   Hi,
> >
> >>>I fail to see any common ground for xen-zcopy and udmabuf ...
> >>Does the above mean you can assume that xen-zcopy and udmabuf
> >>can co-exist as two different solutions?
> >Well, udmabuf route isn't fully clear yet, but yes.
> >
> >See also gvt (intel vgpu), where the hypervisor interface is abstracted
> >away into a separate kernel modules even though most of the actual vgpu
> >emulation code is common.
> Thank you for your input, I'm just trying to figure out
> which of the three z-copy solutions intersect and how much
> >>And what about hyper-dmabuf?

xen z-copy solution is pretty similar fundamentally to hyper_dmabuf
in terms of these core sharing feature:

1. the sharing process - import prime/dmabuf from the producer -> extract
underlying pages and get those shared -> return references for shared pages

2. the page sharing mechanism - it uses Xen-grant-table.

And to give you a quick summary of differences as far as I understand
between two implementations (please correct me if I am wrong, Oleksandr.)

1. xen-zcopy is DRM specific - can import only DRM prime buffer
while hyper_dmabuf can export any dmabuf regardless of originator

2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs
while (as danvet called it as remote dmabuf api sharing) hyper_dmabuf sends
out synchronization message to the exporting VM for synchronization.

3. 1-level references - when using grant-table for sharing pages, there will
be same # of refs (each 8 byte) as # of shared pages, which is passed to
the userspace to be shared with importing VM in case of xen-zcopy. Compared
to this, hyper_dmabuf does multiple level addressing to generate only one
reference id that represents all shared pages.

4. inter VM messaging (hype_dmabuf only) - hyper_dmabuf has inter-vm msg
communication defined for dmabuf synchronization and private data (meta
info that Matt Roper mentioned) exchange.

5. driver-to-driver notification (hyper_dmabuf only) - importing VM gets
notified when newdmabuf is exported from other VM - uevent can be optionally
generated when this happens.

6. structure - hyper_dmabuf is targetting to provide a generic solution for
inter-domain dmabuf sharing for most hypervisors, which is why it has two
layers as mattrope mentioned, front-end that contains standard API and backend
that is specific to hypervisor.

> >No idea, didn't look at it in detail.
> >
> >Looks pretty complex from a distant view.  Maybe because it tries to
> >build a communication framework using dma-bufs instead of a simple
> >dma-buf passing mechanism.

we started with simple dma-buf sharing but realized there are many
things we need to consider in real use-case, so we added communication
, notification and dma-buf synchronization then re-structured it to 
front-end and back-end (this made things more compicated..) since Xen
was not our only target. Also, we thought passing the reference for the
buffer (hyper_dmabuf_id) is not secure so added uvent mechanism later.

> Yes, I am looking at it now, trying to figure out the full story
> and its implementation. BTW, Intel guys were about to share some
> test application for hyper-dmabuf, maybe I have missed one.
> It could probably better explain the use-cases and the complexity
> they have in hyper-dmabuf.

One example is actually in github. If you want take a look at it, please
visit:

https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export

> >
> >Like xen-zcopy it seems to depend on the idea that the hypervisor
> >manages all memory it is easy for guests to share pages with the help of
> >the hypervisor.
> So, for xen-zcopy we were not trying to make it generic,
> it just solves display (dumb) zero-copying use-cases for Xen.
> We implemented it as a DRM helper driver because we can't see any
> other use-cases as of now.
> For example, we also have Xen para-virtualized sound driver, but
> its buffer memory usage is not comparable to what display wants
> and it works somewhat differently (e.g. there is no "frame done"
> event, so one can't tell when the sound buffer can be "flipped").
> At the same time, we do not use virtio-gpu, so this could probably
> be one more candidate for shared dma-bufs some day.
> >   Which simply isn't the case on kvm.
> >
> >hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
> >on top of xen-zcopy.
> Hm, I can imagine that: xen-zcopy could be a library code for hyper-dmabuf
> in terms of implementing all that page sharing fun in multiple directions,
> e.g. Host->Guest, Guest->Host, Guest<->Guest.
> But I'll let Matt and Dongwon to comment on that.

I think we can definitely collaborate. Especially, maybe we are using some
outdated sharing mechanism/grant-table mechanism in our Xen backend (thanks
for bringing that up Oleksandr). However, the question is once we collaborate