Re: Number of planes from fourcc code
On 09/13/2018 02:46 PM, Hans Verkuil wrote:
> On 09/13/18 13:29, Oleksandr Andrushchenko wrote:
>> Hi, all!
>>
>> Is there a way in V4L2 to get the number of planes from a fourcc code?
>> Specifically, I need the number of planes for a given pixel format
>> expressed as a V4L2_PIX_FMT_* value.
>
> Sadly not. It's part of the documentation for the formats, but there is
> no naming scheme through which you can deduce this, nor are there helper
> functions for it.

Ok, then I'll try to explain what I want to do: I am implementing a Xen
frontend driver which implements a para-virtual camera protocol [1] that
can support multiple pixel formats. Thus, the driver will support both
single- and multi-plane formats. For that I need to implement both format
enumerators:

  .vidioc_enum_fmt_vid_cap
  .vidioc_enum_fmt_vid_cap_mplane

and for .vidioc_enum_fmt_vid_cap I have to filter the multi-planar
formats out of the supported pixel formats. So, I hoped I could use some
helper (like DRM provides) to get the number of planes for a given pixel
format. It seems I'll have to code a table similar to the one DRM has,
covering the various V4L2 pixel formats, to get num_planes...

> I think the main reason why this never happened is that drivers tend to
> have custom code for this anyway. I have proposed in the past that some
> of this information be exposed via VIDIOC_ENUM_FMT, but it never got
> traction.
>
>> I know that DRM has such a helper [1], but I am not quite sure if I can
>> call it with a V4L2_PIX_FMT_* value as an argument to get what I need.
>> I am a bit confused here because there are different definitions for
>> DRM [2] and V4L2 [3].
>
> I know. Each subsystem has traditionally been assigning fourccs
> independently. In all fairness, this seems to be the case for fourccs
> throughout the whole industry.
>
> Regards,
>
> Hans

Thank you,
Oleksandr

[1] https://elixir.bootlin.com/linux/v4.19-rc3/source/drivers/gpu/drm/drm_fourcc.c#L199
[2] https://elixir.bootlin.com/linux/v4.19-rc3/source/include/uapi/drm/drm_fourcc.h
[3] https://elixir.bootlin.com/linux/v4.19-rc3/source/include/uapi/linux/videodev2.h
[1] https://patchwork.kernel.org/patch/10595259/
Number of planes from fourcc code
Hi, all!

Is there a way in V4L2 to get the number of planes from a fourcc code?
Specifically, I need the number of planes for a given pixel format
expressed as a V4L2_PIX_FMT_* value.

I know that DRM has such a helper [1], but I am not quite sure if I can
call it with a V4L2_PIX_FMT_* value as an argument to get what I need.
I am a bit confused here because there are different definitions for
DRM [2] and V4L2 [3].

Thank you,
Oleksandr

[1] https://elixir.bootlin.com/linux/v4.19-rc3/source/drivers/gpu/drm/drm_fourcc.c#L199
[2] https://elixir.bootlin.com/linux/v4.19-rc3/source/include/uapi/drm/drm_fourcc.h
[3] https://elixir.bootlin.com/linux/v4.19-rc3/source/include/uapi/linux/videodev2.h
Re: [Xen-devel] [PATCH v2 5/9] xen/gntdev: Allow mappings for DMA buffers
On 06/08/2018 10:21 PM, Boris Ostrovsky wrote:
On 06/08/2018 01:59 PM, Stefano Stabellini wrote:

@@ -325,6 +401,14 @@ static int map_grant_pages(struct grant_map *map)
 		map->unmap_ops[i].handle = map->map_ops[i].handle;
 		if (use_ptemod)
 			map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+		else if (map->dma_vaddr) {
+			unsigned long mfn;
+
+			mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));

Not pfn_to_mfn()?

I'd love to, but pfn_to_mfn is only defined for x86, not ARM ([1] and [2]). Thus:

  drivers/xen/gntdev.c:408:10: error: implicit declaration of function
  ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
    mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));

So, I'll keep __pfn_to_mfn.

How will this work on non-PV x86?

So, you mean I need:

  #ifdef CONFIG_X86
  	mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
  #else
  	mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
  #endif

I'd rather fix it in ARM code. Stefano, why does ARM use the underscored version?

Do you want me to add one more patch for ARM to wrap __pfn_to_mfn in a static inline, e.g.

  static inline ... pfn_to_mfn(...) { __pfn_to_mfn(...); }

A Xen on ARM guest doesn't actually know the mfns behind its own pseudo-physical pages. This is why we stopped using pfn_to_mfn and started using pfn_to_bfn instead, which will generally return "pfn", unless the page is a foreign grant. See include/xen/arm/page.h. pfn_to_bfn was also introduced on x86; for example, see the usage of pfn_to_bfn in drivers/xen/swiotlb-xen.c. Otherwise, if you don't care about other mapped grants, you can just use pfn_to_gfn, which always returns pfn.

I think then this code needs to use pfn_to_bfn().

Ok.

Also, for your information, we support different page granularities in Linux as a Xen guest; see the comment at include/xen/arm/page.h:

  /*
   * The pseudo-physical frame (pfn) used in all the helpers is always based
   * on Xen page granularity (i.e 4KB).
   *
   * A Linux page may be split across multiple non-contiguous Xen pages so we
   * have to keep track with frames based on 4KB page granularity.
   *
   * PV drivers should never make a direct usage of those helpers (particularly
   * pfn_to_gfn and gfn_to_pfn).
   */

A Linux page could be 64K, but a Xen page is always 4K. A granted page is also 4K. We have helpers to take into account the offsets needed to map multiple Xen grants in a single Linux page; see for example drivers/xen/grant-table.c:gnttab_foreach_grant. Most PV drivers have been converted to work with 64K pages correctly, but if I remember correctly gntdev.c is the only remaining driver that doesn't support 64K pages yet, so you don't have to deal with it if you don't want to.

I believe somewhere in this series there is a test for PAGE_SIZE vs. XEN_PAGE_SIZE. Right, Oleksandr?

Not in gntdev. You might have seen this in xen-drmfront/xen-sndfront, but I didn't touch gntdev for that. Do you want me to add yet another patch in the series to check for that?

Thanks for the explanation.
-boris

Thank you,
Oleksandr
[PATCH 0/8] xen: dma-buf support for grant device
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

This work is in response to my previous attempt to introduce a Xen/DRM
zero-copy driver [1] to enable the Linux dma-buf API [2] for Xen based
frontends/backends. There is also an existing hyper_dmabuf approach
available [3] which, if reworked to utilize the proposed solution, can
greatly benefit as well. An RFC for this series was published and
discussed [9]; the comments have been addressed.

The original rationale behind this work was to enable zero-copy
use-cases while working with the Xen para-virtual display driver [4]:
when using the Xen PV DRM frontend driver, the backend side needs to
copy the contents of display buffers (filled by the frontend's
user-space) into buffers allocated at the backend side. Taking into
account the size of display buffers and the frame rate, this may result
in huge unneeded data bus occupation and performance loss.

The helper driver [4] allows implementing zero-copy use-cases when using
the Xen para-virtualized frontend display driver by implementing a
DRM/KMS helper driver running on the backend's side. It utilizes the
PRIME buffers API (implemented on top of Linux dma-buf) to share the
frontend's buffers with physical device drivers on the backend's side:

- a dumb buffer created on the backend's side can be shared with the Xen
  PV frontend driver, so the frontend directly writes into the backend
  domain's memory (into the buffer exported from the DRM/KMS driver of a
  physical display device)
- a dumb buffer allocated by the frontend can be imported into a
  physical device's DRM/KMS driver, thus avoiding copying as well

Finally, it was discussed and decided ([1], [5]) that it is worth
implementing such use-cases via an extension of the existing Xen gntdev
driver instead of introducing a new DRM-specific driver. Please note
that the support of dma-buf is Linux-only, as dma-buf is a Linux-only
thing.

Now to the proposed solution. The changes to the existing Xen drivers in
the Linux kernel fall into two categories:

1. DMA-able memory buffer allocation and increasing/decreasing the
   memory reservation of the pages of such a buffer. This is required if
   we are about to share a dma-buf with hardware that requires buffers
   allocated with the dma_alloc_xxx API. (It is still possible to
   allocate a dma-buf from any system memory, e.g. system pages.)

2. Extension of the gntdev driver to enable it to import/export
   dma-bufs.

The first four patches are in preparation for Xen dma-buf support, but I
consider them usable regardless of the dma-buf use-case; e.g. other
frontend/backend kernel modules may also benefit from them for better
code reuse:

  0001-xen-grant-table-Make-set-clear-page-private-code-sha.patch
  0002-xen-balloon-Move-common-memory-reservation-routines-.patch
  0003-xen-grant-table-Allow-allocating-buffers-suitable-fo.patch
  0004-xen-gntdev-Allow-mappings-for-DMA-buffers.patch

The next three patches are the Xen implementation of dma-buf as part of
the grant device:

  0005-xen-gntdev-Add-initial-support-for-dma-buf-UAPI.patch
  0006-xen-gntdev-Implement-dma-buf-export-functionality.patch
  0007-xen-gntdev-Implement-dma-buf-import-functionality.patch

The last patch makes it possible to use the Xen dma-buf API in-kernel:

  0008-xen-gntdev-Expose-gntdev-s-dma-buf-API-for-in-kernel.patch

The corresponding libxengnttab changes are available at [6]. All the
above was tested with the display backend [7] and its accompanying
helper library [8] on a Renesas ARM64 based board.

*To all the communities*: I would like to ask you to review the proposed
solution and give feedback on it, so I can improve it and send final
patches for review (this is still work in progress, but enough to start
discussing the implementation).

Thank you in advance,
Oleksandr Andrushchenko

[1] https://lists.freedesktop.org/archives/dri-devel/2018-April/173163.html
[2] https://elixir.bootlin.com/linux/v4.17-rc5/source/Documentation/driver-api/dma-buf.rst
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-02/msg01202.html
[4] https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/xen
[5] https://patchwork.kernel.org/patch/10279681/
[6] https://github.com/andr2000/xen/tree/xen_dma_buf_v1
[7] https://github.com/andr2000/displ_be/tree/xen_dma_buf_v1
[8] https://github.com/andr2000/libxenbe/tree/xen_dma_buf_v1
[9] https://lkml.org/lkml/2018/5/17/215

Oleksandr Andrushchenko (8):
  xen/grant-table: Make set/clear page private code shared
  xen/balloon: Move common memory reservation routines to a module
  xen/grant-table: Allow allocating buffers suitable for DMA
  xen/gntdev: Allow mappings for DMA buffers
  xen/gntdev: Add initial support for dma-buf UAPI
  xen/gntdev: Implement dma-buf export functionality
  xen/gntdev: Implement dma-buf import functionality
  xen/gntdev: Expose gntdev's dma-buf API for in-kernel use

 drivers/xen/Kconfig   | 23 +
 drivers/xen/Makefile  |  1 +
 drivers/xen/balloon.c
[PATCH 3/8] xen/grant-table: Allow allocating buffers suitable for DMA
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

Extend the grant table module API to allow allocating buffers that can
be used for DMA operations and mapping foreign grant references on top
of those. The resulting buffer is similar to the one allocated by the
balloon driver in that a proper memory reservation is made
({increase|decrease}_reservation and VA mappings are updated if needed).

This is useful for sharing foreign buffers with HW drivers which cannot
work with scattered buffers provided by the balloon driver, but require
DMA-able memory instead.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/Kconfig       |  13 +++
 drivers/xen/grant-table.c | 124 +++++++++++++++++++++++++++
 include/xen/grant_table.h |  25 ++++++
 3 files changed, 162 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index e5d0c28372ea..3431fe210624 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -161,6 +161,19 @@ config XEN_GRANT_DEV_ALLOC
 	  to other domains. This can be used to implement frontend drivers
 	  or as part of an inter-domain shared memory channel.
 
+config XEN_GRANT_DMA_ALLOC
+	bool "Allow allocating DMA capable buffers with grant reference module"
+	depends on XEN
+	help
+	  Extends grant table module API to allow allocating DMA capable
+	  buffers and mapping foreign grant references on top of it.
+	  The resulting buffer is similar to one allocated by the balloon
+	  driver in terms that proper memory reservation is made
+	  ({increase|decrease}_reservation and VA mappings updated if needed).
+	  This is useful for sharing foreign buffers with HW drivers which
+	  cannot work with scattered buffers provided by the balloon driver,
+	  but require DMAable memory instead.
+
 config SWIOTLB_XEN
 	def_bool y
 	select SWIOTLB

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index d7488226e1f2..06fe6e7f639c 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -45,6 +45,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+#include
+#endif
 #include
 #include
@@ -57,6 +60,7 @@
 #ifdef CONFIG_X86
 #include
 #endif
+#include
 #include
 #include
@@ -811,6 +815,82 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
 }
 EXPORT_SYMBOL(gnttab_alloc_pages);
 
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+/**
+ * gnttab_dma_alloc_pages - alloc DMAable pages suitable for grant mapping into
+ * @args: arguments to the function
+ */
+int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
+{
+	unsigned long pfn, start_pfn;
+	xen_pfn_t *frames;
+	size_t size;
+	int i, ret;
+
+	frames = kcalloc(args->nr_pages, sizeof(*frames), GFP_KERNEL);
+	if (!frames)
+		return -ENOMEM;
+
+	size = args->nr_pages << PAGE_SHIFT;
+	if (args->coherent)
+		args->vaddr = dma_alloc_coherent(args->dev, size,
+						 &args->dev_bus_addr,
+						 GFP_KERNEL | __GFP_NOWARN);
+	else
+		args->vaddr = dma_alloc_wc(args->dev, size,
+					   &args->dev_bus_addr,
+					   GFP_KERNEL | __GFP_NOWARN);
+	if (!args->vaddr) {
+		pr_err("Failed to allocate DMA buffer of size %zu\n", size);
+		ret = -ENOMEM;
+		goto fail_free_frames;
+	}
+
+	start_pfn = __phys_to_pfn(args->dev_bus_addr);
+	for (pfn = start_pfn, i = 0; pfn < start_pfn + args->nr_pages;
+			pfn++, i++) {
+		struct page *page = pfn_to_page(pfn);
+
+		args->pages[i] = page;
+		frames[i] = xen_page_to_gfn(page);
+		xenmem_reservation_scrub_page(page);
+	}
+
+	xenmem_reservation_va_mapping_reset(args->nr_pages, args->pages);
+
+	ret = xenmem_reservation_decrease(args->nr_pages, frames);
+	if (ret != args->nr_pages) {
+		pr_err("Failed to decrease reservation for DMA buffer\n");
+		xenmem_reservation_increase(ret, frames);
+		ret = -EFAULT;
+		goto fail_free_dma;
+	}
+
+	ret = gnttab_pages_set_private(args->nr_pages, args->pages);
+	if (ret < 0)
+		goto fail_clear_private;
+
+	kfree(frames);
+	return 0;
+
+fail_clear_private:
+	gnttab_pages_clear_private(args->nr_pages, args->pages);
+fail_free_dma:
+	xenmem_reservation_va_mapping_update(args->nr_pages, args->pages,
+					     frames);
+	if (args->coherent)
+		dma_free_coherent(args->dev, size,
+
[PATCH 1/8] xen/grant-table: Make set/clear page private code shared
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

Make the set/clear page private code shared and accessible to other
kernel modules, which can re-use it instead of open-coding.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/grant-table.c | 54 ++++++++++++++++++++++++---------------
 include/xen/grant_table.h |  3 +++
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 27be107d6480..d7488226e1f2 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -769,29 +769,18 @@ void gnttab_free_auto_xlat_frames(void)
 }
 EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
 
-/**
- * gnttab_alloc_pages - alloc pages suitable for grant mapping into
- * @nr_pages: number of pages to alloc
- * @pages: returns the pages
- */
-int gnttab_alloc_pages(int nr_pages, struct page **pages)
+int gnttab_pages_set_private(int nr_pages, struct page **pages)
 {
 	int i;
-	int ret;
-
-	ret = alloc_xenballooned_pages(nr_pages, pages);
-	if (ret < 0)
-		return ret;
 
 	for (i = 0; i < nr_pages; i++) {
 #if BITS_PER_LONG < 64
 		struct xen_page_foreign *foreign;
 
 		foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
-		if (!foreign) {
-			gnttab_free_pages(nr_pages, pages);
+		if (!foreign)
 			return -ENOMEM;
-		}
+
 		set_page_private(pages[i], (unsigned long)foreign);
 #endif
 		SetPagePrivate(pages[i]);
@@ -799,14 +788,30 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
 
 	return 0;
 }
-EXPORT_SYMBOL(gnttab_alloc_pages);
+EXPORT_SYMBOL(gnttab_pages_set_private);
 
 /**
- * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
- * @nr_pages; number of pages to free
- * @pages: the pages
+ * gnttab_alloc_pages - alloc pages suitable for grant mapping into
+ * @nr_pages: number of pages to alloc
+ * @pages: returns the pages
  */
-void gnttab_free_pages(int nr_pages, struct page **pages)
+int gnttab_alloc_pages(int nr_pages, struct page **pages)
+{
+	int ret;
+
+	ret = alloc_xenballooned_pages(nr_pages, pages);
+	if (ret < 0)
+		return ret;
+
+	ret = gnttab_pages_set_private(nr_pages, pages);
+	if (ret < 0)
+		gnttab_free_pages(nr_pages, pages);
+
+	return ret;
+}
+EXPORT_SYMBOL(gnttab_alloc_pages);
+
+void gnttab_pages_clear_private(int nr_pages, struct page **pages)
 {
 	int i;
 
@@ -818,6 +823,17 @@ void gnttab_free_pages(int nr_pages, struct page **pages)
 			ClearPagePrivate(pages[i]);
 		}
 	}
+}
+EXPORT_SYMBOL(gnttab_pages_clear_private);
+
+/**
+ * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
+ * @nr_pages; number of pages to free
+ * @pages: the pages
+ */
+void gnttab_free_pages(int nr_pages, struct page **pages)
+{
+	gnttab_pages_clear_private(nr_pages, pages);
 	free_xenballooned_pages(nr_pages, pages);
 }
 EXPORT_SYMBOL(gnttab_free_pages);

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 2e37741f6b8d..de03f2542bb7 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -198,6 +198,9 @@ void gnttab_free_auto_xlat_frames(void);
 int gnttab_alloc_pages(int nr_pages, struct page **pages);
 void gnttab_free_pages(int nr_pages, struct page **pages);
 
+int gnttab_pages_set_private(int nr_pages, struct page **pages);
+void gnttab_pages_clear_private(int nr_pages, struct page **pages);
+
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count);
-- 
2.17.0
[PATCH 2/8] xen/balloon: Move common memory reservation routines to a module
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

The memory {increase|decrease}_reservation and VA mappings update/reset
code used in the balloon driver can be made common, so other drivers can
re-use the same functionality without open-coding.

Create a dedicated module for the shared code and export the
corresponding symbols for other kernel modules.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/Makefile          |   1 +
 drivers/xen/balloon.c         |  71 ++------------
 drivers/xen/mem-reservation.c | 134 ++++++++++++++++++++++++
 include/xen/mem_reservation.h |  29 ++++++
 4 files changed, 170 insertions(+), 65 deletions(-)
 create mode 100644 drivers/xen/mem-reservation.c
 create mode 100644 include/xen/mem_reservation.h

diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 451e833f5931..3c87b0c3aca6 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_HOTPLUG_CPU)	+= cpu_hotplug.o
 obj-$(CONFIG_X86)		+= fallback.o
 obj-y	+= grant-table.o features.o balloon.o manage.o preempt.o time.o
+obj-y	+= mem-reservation.o
 obj-y	+= events/
 obj-y	+= xenbus/

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 065f0b607373..57b482d67a3a 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -71,6 +71,7 @@
 #include
 #include
 #include
+#include
 
 static int xen_hotplug_unpopulated;
@@ -157,13 +158,6 @@ static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);
 #define GFP_BALLOON \
 	(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
 
-static void scrub_page(struct page *page)
-{
-#ifdef CONFIG_XEN_SCRUB_PAGES
-	clear_highpage(page);
-#endif
-}
-
 /* balloon_append: add the given page to the balloon. */
 static void __balloon_append(struct page *page)
 {
@@ -463,11 +457,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 	int rc;
 	unsigned long i;
 	struct page *page;
-	struct xen_memory_reservation reservation = {
-		.address_bits = 0,
-		.extent_order = EXTENT_ORDER,
-		.domid        = DOMID_SELF
-	};
 
 	if (nr_pages > ARRAY_SIZE(frame_list))
 		nr_pages = ARRAY_SIZE(frame_list);
@@ -486,9 +475,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 		page = balloon_next_page(page);
 	}
 
-	set_xen_guest_handle(reservation.extent_start, frame_list);
-	reservation.nr_extents = nr_pages;
-	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+	rc = xenmem_reservation_increase(nr_pages, frame_list);
 	if (rc <= 0)
 		return BP_EAGAIN;
 
@@ -496,29 +483,7 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 		page = balloon_retrieve(false);
 		BUG_ON(page == NULL);
 
-#ifdef CONFIG_XEN_HAVE_PVMMU
-		/*
-		 * We don't support PV MMU when Linux and Xen is using
-		 * different page granularity.
-		 */
-		BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
-
-		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-			unsigned long pfn = page_to_pfn(page);
-
-			set_phys_to_machine(pfn, frame_list[i]);
-
-			/* Link back into the page tables if not highmem. */
-			if (!PageHighMem(page)) {
-				int ret;
-
-				ret = HYPERVISOR_update_va_mapping(
-						(unsigned long)__va(pfn << PAGE_SHIFT),
-						mfn_pte(frame_list[i], PAGE_KERNEL),
-						0);
-				BUG_ON(ret);
-			}
-		}
-#endif
+		xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
 
 		/* Relinquish the page back to the allocator. */
 		free_reserved_page(page);
@@ -535,11 +500,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 	unsigned long i;
 	struct page *page, *tmp;
 	int ret;
-	struct xen_memory_reservation reservation = {
-		.address_bits = 0,
-		.extent_order = EXTENT_ORDER,
-		.domid        = DOMID_SELF
-	};
 	LIST_HEAD(pages);
 
 	if (nr_pages > ARRAY_SIZE(frame_list))
@@ -553,7 +513,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 			break;
 		}
 		adjust_managed_page_count(page, -1);
-		scrub_page(page);
+		xenmem_reservation_scrub_page(page);
 		list_add(&page->lru, &pages);
 	}
 
@@ -575,25 +535,8 @@ static enum bp_state decr
[PATCH 4/8] xen/gntdev: Allow mappings for DMA buffers
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

Allow mappings for DMA backed buffers if the grant table module supports
such: this extends the grant device to map not only buffers made of
balloon pages, but also buffers allocated with dma_alloc_xxx.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/gntdev.c      | 100 +++++++++++++++++++++++++++++++++++++-
 include/uapi/xen/gntdev.h |  15 ++++++
 2 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index bd56653b9bbc..640a579f42ea 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -37,6 +37,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+#include
+#endif
 #include
 #include
@@ -72,6 +75,11 @@ struct gntdev_priv {
 	struct mutex lock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
+
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+	/* Device for which DMA memory is allocated. */
+	struct device *dma_dev;
+#endif
 };
 
 struct unmap_notify {
@@ -96,10 +104,28 @@ struct grant_map {
 	struct gnttab_unmap_grant_ref *kunmap_ops;
 	struct page **pages;
 	unsigned long pages_vm_start;
+
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+	/*
+	 * If dmabuf_vaddr is not NULL then this mapping is backed by DMA
+	 * capable memory.
+	 */
+
+	/* Device for which DMA memory is allocated. */
+	struct device *dma_dev;
+	/* Flags used to create this DMA buffer: GNTDEV_DMABUF_FLAG_XXX. */
+	bool dma_flags;
+	/* Virtual/CPU address of the DMA buffer. */
+	void *dma_vaddr;
+	/* Bus address of the DMA buffer. */
+	dma_addr_t dma_bus_addr;
+#endif
 };
 
 static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
 
+static struct miscdevice gntdev_miscdev;
+
 /* ------------------------------------------------------------------ */
 
 static void gntdev_print_maps(struct gntdev_priv *priv,
@@ -121,8 +147,26 @@ static void gntdev_free_map(struct grant_map *map)
 	if (map == NULL)
 		return;
 
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+	if (map->dma_vaddr) {
+		struct gnttab_dma_alloc_args args;
+
+		args.dev = map->dma_dev;
+		args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
+		args.nr_pages = map->count;
+		args.pages = map->pages;
+		args.vaddr = map->dma_vaddr;
+		args.dev_bus_addr = map->dma_bus_addr;
+
+		gnttab_dma_free_pages(&args);
+	} else if (map->pages) {
+		gnttab_free_pages(map->count, map->pages);
+	}
+#else
 	if (map->pages)
 		gnttab_free_pages(map->count, map->pages);
+#endif
+
 	kfree(map->pages);
 	kfree(map->grants);
 	kfree(map->map_ops);
@@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map *map)
 	kfree(map);
 }
 
-static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
+static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
+					  int dma_flags)
 {
 	struct grant_map *add;
 	int i;
@@ -155,8 +200,37 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
 	    NULL == add->pages)
 		goto err;
 
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+	add->dma_flags = dma_flags;
+
+	/*
+	 * Check if this mapping is requested to be backed
+	 * by a DMA buffer.
+	 */
+	if (dma_flags & (GNTDEV_DMA_FLAG_WC | GNTDEV_DMA_FLAG_COHERENT)) {
+		struct gnttab_dma_alloc_args args;
+
+		/* Remember the device, so we can free DMA memory. */
+		add->dma_dev = priv->dma_dev;
+
+		args.dev = priv->dma_dev;
+		args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
+		args.nr_pages = count;
+		args.pages = add->pages;
+
+		if (gnttab_dma_alloc_pages(&args))
+			goto err;
+
+		add->dma_vaddr = args.vaddr;
+		add->dma_bus_addr = args.dev_bus_addr;
+	} else {
+		if (gnttab_alloc_pages(count, add->pages))
+			goto err;
+	}
+#else
 	if (gnttab_alloc_pages(count, add->pages))
 		goto err;
+#endif
 
 	for (i = 0; i < count; i++) {
 		add->map_ops[i].handle = -1;
@@ -323,8 +397,19 @@ static int map_grant_pages(struct grant_map *map)
 		}
 		map->unmap_ops[i].handle = map->map_ops[i].handle;
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+		if (use_ptemod) {
+			map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
+		} else if (map->dm
[PATCH 6/8] xen/gntdev: Implement dma-buf export functionality
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

1. Create a dma-buf from grant references provided by the foreign
   domain. By default the dma-buf is backed by system memory pages, but
   by providing GNTDEV_DMA_FLAG_XXX flags it can also be created as a
   DMA write-combine/coherent buffer, e.g. allocated with the
   corresponding dma_alloc_xxx API. Export the resulting buffer as a new
   dma-buf.

2. Implement waiting for the dma-buf to be released: block until the
   dma-buf with the file descriptor provided is released. If the buffer
   is not released within the time-out provided, then -ETIMEDOUT is
   returned. If the buffer with that file descriptor does not exist or
   has already been released, then -ENOENT is returned. For valid file
   descriptors this must not be treated as an error.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/gntdev.c | 478 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 476 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 9e450622af1a..52abc6cd5846 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -4,6 +4,8 @@
  * Device for accessing (in user-space) pages that have been granted by other
  * domains.
  *
+ * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
+ *
  * Copyright (c) 2006-2007, D G Murray.
  *           (c) 2009 Gerd Hoffmann <kra...@redhat.com>
  *           (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
@@ -41,6 +43,9 @@
 #ifdef CONFIG_XEN_GRANT_DMA_ALLOC
 #include
 #endif
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+#include
+#endif
 #include
 #include
@@ -81,6 +86,17 @@ struct gntdev_priv {
 	/* Device for which DMA memory is allocated. */
 	struct device *dma_dev;
 #endif
+
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+	/* Private data of the hyper DMA buffers. */
+
+	/* List of exported DMA buffers. */
+	struct list_head dmabuf_exp_list;
+	/* List of wait objects. */
+	struct list_head dmabuf_exp_wait_list;
+	/* This is the lock which protects dma_buf_xxx lists. */
+	struct mutex dmabuf_lock;
+#endif
 };
 
 struct unmap_notify {
@@ -125,12 +141,38 @@ struct grant_map {
 
 #ifdef CONFIG_XEN_GNTDEV_DMABUF
 struct xen_dmabuf {
+	struct gntdev_priv *priv;
+	struct dma_buf *dmabuf;
+	struct list_head next;
+	int fd;
+
 	union {
+		struct {
+			/* Exported buffers are reference counted. */
+			struct kref refcount;
+			struct grant_map *map;
+		} exp;
 		struct {
 			/* Granted references of the imported buffer. */
 			grant_ref_t *refs;
 		} imp;
 	} u;
+
+	/* Number of pages this buffer has. */
+	int nr_pages;
+	/* Pages of this buffer. */
+	struct page **pages;
+};
+
+struct xen_dmabuf_wait_obj {
+	struct list_head next;
+	struct xen_dmabuf *xen_dmabuf;
+	struct completion completion;
+};
+
+struct xen_dmabuf_attachment {
+	struct sg_table *sgt;
+	enum dma_data_direction dir;
 };
 #endif
 
@@ -320,6 +362,16 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
 	gntdev_free_map(map);
 }
 
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+static void gntdev_remove_map(struct gntdev_priv *priv, struct grant_map *map)
+{
+	mutex_lock(&priv->lock);
+	list_del(&map->next);
+	gntdev_put_map(NULL /* already removed */, map);
+	mutex_unlock(&priv->lock);
+}
+#endif
+
 /* ------------------------------------------------------------------ */
 
 static int find_grant_ptes(pte_t *pte, pgtable_t token,
@@ -628,6 +680,12 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 	INIT_LIST_HEAD(&priv->freeable_maps);
 	mutex_init(&priv->lock);
 
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+	mutex_init(&priv->dmabuf_lock);
+	INIT_LIST_HEAD(&priv->dmabuf_exp_list);
+	INIT_LIST_HEAD(&priv->dmabuf_exp_wait_list);
+#endif
+
 	if (use_ptemod) {
 		priv->mm = get_task_mm(current);
 		if (!priv->mm) {
@@ -1053,17 +,433 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
 /* DMA buffer export support.                                         */
 /* ------------------------------------------------------------------ */
 
+/* ------------------------------------------------------------------ */
+/* Implementation of wait for exported DMA buffer to be released.     */
+/* ------------------------------------------------------------------ */
+
+static void dmabuf_exp_release(struct kref *kref);
+
+static struct xen_dmabuf_wait_obj *
+dmabuf_exp_wait_obj_new(struct gntdev_priv *priv,
+			struct xen_dmabuf *xen_dmabuf)
+{
+	struct xen_dmabuf_wait_obj *obj;
+
+	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
[PATCH 5/8] xen/gntdev: Add initial support for dma-buf UAPI
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

Add UAPI and IOCTLs for the dma-buf grant device driver extension: the
extension allows userspace processes and kernel modules to use the Xen
backed dma-buf implementation. With this extension, grant references to
the pages of an imported dma-buf can be exported for other domain use,
and grant references coming from a foreign domain can be converted into
a local dma-buf for local export.

Implement basic initialization and stubs for Xen DMA buffers' support.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/Kconfig       |  10 ++
 drivers/xen/gntdev.c      | 148 +++++++++++++++++++++++++++
 include/uapi/xen/gntdev.h |  91 ++++++++++++++++
 3 files changed, 249 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 3431fe210624..eaf63a2c7ae6 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -152,6 +152,16 @@ config XEN_GNTDEV
 	help
 	  Allows userspace processes to use grants.
 
+config XEN_GNTDEV_DMABUF
+	bool "Add support for dma-buf grant access device driver extension"
+	depends on XEN_GNTDEV && XEN_GRANT_DMA_ALLOC
+	help
+	  Allows userspace processes and kernel modules to use Xen backed
+	  dma-buf implementation. With this extension grant references to
+	  the pages of an imported dma-buf can be exported for other domain
+	  use and grant references coming from a foreign domain can be
+	  converted into a local dma-buf for local export.
+
 config XEN_GRANT_DEV_ALLOC
 	tristate "User-space grant reference allocator driver"
 	depends on XEN

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 640a579f42ea..9e450622af1a 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -6,6 +6,7 @@
  *
  * Copyright (c) 2006-2007, D G Murray.
  *           (c) 2009 Gerd Hoffmann <kra...@redhat.com>
+ *           (c) 2018 Oleksandr Andrushchenko, EPAM Systems Inc.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
@@ -122,6 +123,17 @@ struct grant_map {
 #endif
 };
 
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+struct xen_dmabuf {
+	union {
+		struct {
+			/* Granted references of the imported buffer. */
+			grant_ref_t *refs;
+		} imp;
+	} u;
+};
+#endif
+
 static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
 
 static struct miscdevice gntdev_miscdev;
@@ -1036,6 +1048,128 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
 	return ret;
 }
 
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+/* ------------------------------------------------------------------ */
+/* DMA buffer export support.                                         */
+/* ------------------------------------------------------------------ */
+
+static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
+				    int wait_to_ms)
+{
+	return -ETIMEDOUT;
+}
+
+static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
+				int count, u32 domid, u32 *refs, u32 *fd)
+{
+	*fd = -1;
+	return -EINVAL;
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer import support.                                         */
+/* ------------------------------------------------------------------ */
+
+static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
+{
+	return 0;
+}
+
+static struct xen_dmabuf *
+dmabuf_imp_to_refs(struct gntdev_priv *priv, int fd, int count, int domid)
+{
+	return ERR_PTR(-ENOMEM);
+}
+
+/* ------------------------------------------------------------------ */
+/* DMA buffer IOCTL support.                                          */
+/* ------------------------------------------------------------------ */
+
+static long
+gntdev_ioctl_dmabuf_exp_from_refs(struct gntdev_priv *priv,
+				  struct ioctl_gntdev_dmabuf_exp_from_refs __user *u)
+{
+	struct ioctl_gntdev_dmabuf_exp_from_refs op;
+	u32 *refs;
+	long ret;
+
+	if (copy_from_user(&op, u, sizeof(op)) != 0)
+		return -EFAULT;
+
+	refs = kcalloc(op.count, sizeof(*refs), GFP_KERNEL);
+	if (!refs)
+		return -ENOMEM;
+
+	if (copy_from_user(refs, u->refs, sizeof(*refs) * op.count) != 0) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = dmabuf_exp_from_refs(priv, op.flags, op.count,
+				   op.domid, refs, &op.fd);
+	if (ret)
+		goto out;
+
+	if (copy_to_user(u, &op, sizeof(op)) != 0)
+		ret = -EFAULT;
+
+out:
+	kfree(refs);
+	return ret;
+}
+
+static long
+gntdev_ioctl_
[PATCH 7/8] xen/gntdev: Implement dma-buf import functionality
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

1. Import a dma-buf with the file descriptor provided and export
   granted references to the pages of that dma-buf into the array
   of grant references.

2. Add API to close all references to an imported buffer, so it can be
   released by the owner. This is only valid for buffers created with
   IOCTL_GNTDEV_DMABUF_IMP_TO_REFS.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/gntdev.c | 237 ++-
 1 file changed, 234 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 52abc6cd5846..d8b6168f2cd9 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -71,6 +71,17 @@ static atomic_t pages_mapped = ATOMIC_INIT(0);
 static int use_ptemod;
 #define populate_freeable_maps use_ptemod
 
+#ifdef CONFIG_XEN_GNTDEV_DMABUF
+#ifndef GRANT_INVALID_REF
+/*
+ * Note on usage of grant reference 0 as invalid grant reference:
+ * grant reference 0 is valid, but never exposed to a driver,
+ * because of the fact it is already in use/reserved by the PV console.
+ */
+#define GRANT_INVALID_REF 0
+#endif
+#endif
+
 struct gntdev_priv {
 	/* maps with visible offsets in the file descriptor */
 	struct list_head maps;
@@ -94,6 +105,8 @@ struct gntdev_priv {
 	struct list_head dmabuf_exp_list;
 	/* List of wait objects. */
 	struct list_head dmabuf_exp_wait_list;
+	/* List of imported DMA buffers. */
+	struct list_head dmabuf_imp_list;
 	/* This is the lock which protects dma_buf_xxx lists. */
 	struct mutex dmabuf_lock;
 #endif
@@ -155,6 +168,10 @@ struct xen_dmabuf {
 		struct {
 			/* Granted references of the imported buffer. */
 			grant_ref_t *refs;
+			/* Scatter-gather table of the imported buffer. */
+			struct sg_table *sgt;
+			/* dma-buf attachment of the imported buffer. */
+			struct dma_buf_attachment *attach;
 		} imp;
 	} u;
@@ -684,6 +701,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 	mutex_init(&priv->dmabuf_lock);
 	INIT_LIST_HEAD(&priv->dmabuf_exp_list);
 	INIT_LIST_HEAD(&priv->dmabuf_exp_wait_list);
+	INIT_LIST_HEAD(&priv->dmabuf_imp_list);
 #endif
 
 	if (use_ptemod) {
@@ -1544,15 +1562,228 @@ static int dmabuf_exp_from_refs(struct gntdev_priv *priv, int flags,
 /* DMA buffer import support.                                         */
 /* ------------------------------------------------------------------ */
 
-static int dmabuf_imp_release(struct gntdev_priv *priv, u32 fd)
+static int
+dmabuf_imp_grant_foreign_access(struct page **pages, u32 *refs,
+				int count, int domid)
 {
-	return 0;
+	grant_ref_t priv_gref_head;
+	int i, ret;
+
+	ret = gnttab_alloc_grant_references(count, &priv_gref_head);
+	if (ret < 0) {
+		pr_err("Cannot allocate grant references, ret %d\n", ret);
+		return ret;
+	}
+
+	for (i = 0; i < count; i++) {
+		int cur_ref;
+
+		cur_ref = gnttab_claim_grant_reference(&priv_gref_head);
+		if (cur_ref < 0) {
+			ret = cur_ref;
+			pr_err("Cannot claim grant reference, ret %d\n", ret);
+			goto out;
+		}
+
+		gnttab_grant_foreign_access_ref(cur_ref, domid,
+						xen_page_to_gfn(pages[i]), 0);
+		refs[i] = cur_ref;
+	}
+
+	ret = 0;
+
+out:
+	gnttab_free_grant_references(priv_gref_head);
+	return ret;
+}
+
+static void dmabuf_imp_end_foreign_access(u32 *refs, int count)
+{
+	int i;
+
+	for (i = 0; i < count; i++)
+		if (refs[i] != GRANT_INVALID_REF)
+			gnttab_end_foreign_access(refs[i], 0, 0UL);
+}
+
+static void dmabuf_imp_free_storage(struct xen_dmabuf *xen_dmabuf)
+{
+	kfree(xen_dmabuf->pages);
+	kfree(xen_dmabuf->u.imp.refs);
+	kfree(xen_dmabuf);
+}
+
+static struct xen_dmabuf *dmabuf_imp_alloc_storage(int count)
+{
+	struct xen_dmabuf *xen_dmabuf;
+	int i;
+
+	xen_dmabuf = kzalloc(sizeof(*xen_dmabuf), GFP_KERNEL);
+	if (!xen_dmabuf)
+		goto fail;
+
+	xen_dmabuf->u.imp.refs = kcalloc(count,
+					 sizeof(xen_dmabuf->u.imp.refs[0]),
+					 GFP_KERNEL);
+	if (!xen_dmabuf->u.imp.refs)
+		goto fail;
+
+	xen_dmabuf->pages = kcalloc(count,
+				    sizeof(xen_dmabuf->pages[0]),
+				    GFP_KERNEL);
+	if (!xen_dmabuf->pages)
+		goto
[PATCH 8/8] xen/gntdev: Expose gntdev's dma-buf API for in-kernel use
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

Allow creating grant device context for use by kernel modules which
require functionality provided by gntdev. Export symbols for the
dma-buf API provided by the module.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/gntdev.c    | 116 ++--
 include/xen/grant_dev.h |  37 +
 2 files changed, 113 insertions(+), 40 deletions(-)
 create mode 100644 include/xen/grant_dev.h

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index d8b6168f2cd9..912056f3e909 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -684,14 +684,33 @@ static const struct mmu_notifier_ops gntdev_mmu_ops = {
 /* ------------------------------------------------------------------ */
 
-static int gntdev_open(struct inode *inode, struct file *flip)
+void gntdev_free_context(struct gntdev_priv *priv)
+{
+	struct grant_map *map;
+
+	pr_debug("priv %p\n", priv);
+
+	mutex_lock(&priv->lock);
+	while (!list_empty(&priv->maps)) {
+		map = list_entry(priv->maps.next, struct grant_map, next);
+		list_del(&map->next);
+		gntdev_put_map(NULL /* already removed */, map);
+	}
+	WARN_ON(!list_empty(&priv->freeable_maps));
+	mutex_unlock(&priv->lock);
+
+	kfree(priv);
+}
+EXPORT_SYMBOL(gntdev_free_context);
+
+struct gntdev_priv *gntdev_alloc_context(struct device *dev)
 {
 	struct gntdev_priv *priv;
-	int ret = 0;
 
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (!priv)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	INIT_LIST_HEAD(&priv->maps);
 	INIT_LIST_HEAD(&priv->freeable_maps);
@@ -704,6 +723,32 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 	INIT_LIST_HEAD(&priv->dmabuf_imp_list);
 #endif
 
+#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
+	priv->dma_dev = dev;
+
+	/*
+	 * The device is not spawned from a device tree, so arch_setup_dma_ops
+	 * is not called, thus leaving the device with dummy DMA ops.
+	 * Fix this by calling of_dma_configure() with a NULL node to set
+	 * default DMA ops.
+	 */
+	of_dma_configure(priv->dma_dev, NULL);
+#endif
+	pr_debug("priv %p\n", priv);
+
+	return priv;
+}
+EXPORT_SYMBOL(gntdev_alloc_context);
+
+static int gntdev_open(struct inode *inode, struct file *flip)
+{
+	struct gntdev_priv *priv;
+	int ret = 0;
+
+	priv = gntdev_alloc_context(gntdev_miscdev.this_device);
+	if (IS_ERR(priv))
+		return PTR_ERR(priv);
+
 	if (use_ptemod) {
 		priv->mm = get_task_mm(current);
 		if (!priv->mm) {
@@ -716,23 +761,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 	}
 
 	if (ret) {
-		kfree(priv);
+		gntdev_free_context(priv);
 		return ret;
 	}
 
 	flip->private_data = priv;
-#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
-	priv->dma_dev = gntdev_miscdev.this_device;
-
-	/*
-	 * The device is not spawned from a device tree, so arch_setup_dma_ops
-	 * is not called, thus leaving the device with dummy DMA ops.
-	 * Fix this by calling of_dma_configure() with a NULL node to set
-	 * default DMA ops.
-	 */
-	of_dma_configure(priv->dma_dev, NULL);
-#endif
-	pr_debug("priv %p\n", priv);
 
 	return 0;
 }
@@ -740,22 +773,11 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 static int gntdev_release(struct inode *inode, struct file *flip)
 {
 	struct gntdev_priv *priv = flip->private_data;
-	struct grant_map *map;
-
-	pr_debug("priv %p\n", priv);
-
-	mutex_lock(&priv->lock);
-	while (!list_empty(&priv->maps)) {
-		map = list_entry(priv->maps.next, struct grant_map, next);
-		list_del(&map->next);
-		gntdev_put_map(NULL /* already removed */, map);
-	}
-	WARN_ON(!list_empty(&priv->freeable_maps));
-	mutex_unlock(&priv->lock);
 
 	if (use_ptemod)
 		mmu_notifier_unregister(&priv->mn, priv->mm);
+
+	gntdev_free_context(priv);
 
 	return 0;
 }
@@ -1210,7 +1232,7 @@ dmabuf_exp_wait_obj_get_by_fd(struct gntdev_priv *priv, int fd)
 	return ret;
 }
 
-static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
+int gntdev_dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
 				    int wait_to_ms)
 {
 	struct xen_dmabuf *xen_dmabuf;
@@ -1242,6 +1264,7 @@ static int dmabuf_exp_wait_released(struct gntdev_priv *priv, int fd,
 	dmabuf_exp_wait_obj_free(priv, obj);
 	return ret;
 }
+EXPORT_SYMBOL(gntdev_dmabuf_exp_wait_released);

/* ---
Re: [Xen-devel] [RFC 1/3] xen/balloon: Allow allocating DMA buffers
On 05/22/2018 09:02 PM, Boris Ostrovsky wrote:
On 05/22/2018 11:00 AM, Oleksandr Andrushchenko wrote:
On 05/22/2018 05:33 PM, Boris Ostrovsky wrote:
On 05/22/2018 01:55 AM, Oleksandr Andrushchenko wrote:
On 05/21/2018 11:36 PM, Boris Ostrovsky wrote:
On 05/21/2018 03:13 PM, Oleksandr Andrushchenko wrote:
On 05/21/2018 09:53 PM, Boris Ostrovsky wrote:
On 05/21/2018 01:32 PM, Oleksandr Andrushchenko wrote:
On 05/21/2018 07:35 PM, Boris Ostrovsky wrote:
On 05/21/2018 01:40 AM, Oleksandr Andrushchenko wrote:
On 05/19/2018 01:04 AM, Boris Ostrovsky wrote:
On 05/17/2018 04:26 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

A commit message would be useful.

Sure, v1 will have it

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

 	for (i = 0; i < nr_pages; i++) {
-		page = alloc_page(gfp);
-		if (page == NULL) {
-			nr_pages = i;
-			state = BP_EAGAIN;
-			break;
+		if (ext_pages) {
+			page = ext_pages[i];
+		} else {
+			page = alloc_page(gfp);
+			if (page == NULL) {
+				nr_pages = i;
+				state = BP_EAGAIN;
+				break;
+			}
 		}
 		scrub_page(page);
 		list_add(&page->lru, &pages);
@@ -529,7 +565,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 	i = 0;
 	list_for_each_entry_safe(page, tmp, &pages, lru) {
 		/* XENMEM_decrease_reservation requires a GFN */
-		frame_list[i++] = xen_page_to_gfn(page);
+		frames[i++] = xen_page_to_gfn(page);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
 		/*
@@ -552,18 +588,22 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 #endif
 		list_del(&page->lru);
-		balloon_append(page);
+		if (!ext_pages)
+			balloon_append(page);

So what you are proposing is not really ballooning. You are just
piggybacking on existing interfaces, aren't you?

Sort of. Basically I need to {increase|decrease}_reservation, not
actually allocating ballooned pages. Do you think I can simply
EXPORT_SYMBOL for {increase|decrease}_reservation? Any other suggestion?
I am actually wondering how much of that code you end up reusing. You pretty much create new code paths in both routines and common code ends up being essentially the hypercall. Well, I hoped that it would be easier to maintain if I modify existing code to support both use-cases, but I am also ok to create new routines if this seems to be reasonable - please let me know So the question is --- would it make sense to do all of this separately from the balloon driver? This can be done, but which driver will host this code then? If we move from the balloon driver, then this could go to either gntdev or grant-table. What's your preference? A separate module? Is there any use for this feature outside of your zero-copy DRM driver? Intel's hyper dma-buf (Dongwon/Matt CC'ed), V4L/GPU at least. At the time I tried to upstream zcopy driver it was discussed and decided that it would be better if I remove all DRM specific code and move it to Xen drivers. Thus, this RFC. But it can also be implemented as a dedicated Xen dma-buf driver which will have all the code from this RFC + a bit more (char/misc device handling at least). This will also require a dedicated user-space library, just like libxengnttab.so for gntdev (now I have all new IOCTLs covered there). If the idea of a dedicated Xen dma-buf driver seems to be more attractive we can work toward this solution. BTW, I do support this idea, but was not sure if Xen community accepts yet another driver which duplicates quite some code of the existing gntdev/balloon/grant-table. And now after this RFC I hope that all cons and pros of both dedicated driver and gntdev/balloon/grant-table extension are clearly seen and we can make a decision. IIRC the objection for a separate module was in the context of gntdev was discussion, because (among other things) people didn't want to have yet another file in /dev/xen/ Here we are talking about (a new) balloon-like module which doesn't create any new user-visible interfaces. 
And as for duplicating code --- as I said, I am not convinced there is much of duplication. I might even argue that we should add a new config option for this module. I am not quite sure I am fully following you here: so, you suggest that we have balloon.c unchanged, but instead create a new module (namely a file under the same folder as balloon.c, e.g. dma-buf-reservation.c) and move those {increase|decrease}_reservation routines (specific to dma-buf) to that new file? And make it selectable via Kconfig? If so, then how about the changes to grant-table and gntdev? Those will look inconsistent then. Inconsistent with what? The changes to grant code will also be under
Re: [Xen-devel] [RFC 1/3] xen/balloon: Allow allocating DMA buffers
On 05/22/2018 05:33 PM, Boris Ostrovsky wrote: On 05/22/2018 01:55 AM, Oleksandr Andrushchenko wrote: On 05/21/2018 11:36 PM, Boris Ostrovsky wrote: On 05/21/2018 03:13 PM, Oleksandr Andrushchenko wrote: On 05/21/2018 09:53 PM, Boris Ostrovsky wrote: On 05/21/2018 01:32 PM, Oleksandr Andrushchenko wrote: On 05/21/2018 07:35 PM, Boris Ostrovsky wrote: On 05/21/2018 01:40 AM, Oleksandr Andrushchenko wrote: On 05/19/2018 01:04 AM, Boris Ostrovsky wrote: On 05/17/2018 04:26 AM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> A commit message would be useful. Sure, v1 will have it Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> for (i = 0; i < nr_pages; i++) { - page = alloc_page(gfp); - if (page == NULL) { - nr_pages = i; - state = BP_EAGAIN; - break; + if (ext_pages) { + page = ext_pages[i]; + } else { + page = alloc_page(gfp); + if (page == NULL) { + nr_pages = i; + state = BP_EAGAIN; + break; + } } scrub_page(page); list_add(>lru, ); @@ -529,7 +565,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) i = 0; list_for_each_entry_safe(page, tmp, , lru) { /* XENMEM_decrease_reservation requires a GFN */ - frame_list[i++] = xen_page_to_gfn(page); + frames[i++] = xen_page_to_gfn(page); #ifdef CONFIG_XEN_HAVE_PVMMU /* @@ -552,18 +588,22 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) #endif list_del(>lru); - balloon_append(page); + if (!ext_pages) + balloon_append(page); So what you are proposing is not really ballooning. You are just piggybacking on existing interfaces, aren't you? Sort of. Basically I need to {increase|decrease}_reservation, not actually allocating ballooned pages. Do you think I can simply EXPORT_SYMBOL for {increase|decrease}_reservation? Any other suggestion? I am actually wondering how much of that code you end up reusing. 
You pretty much create new code paths in both routines and common code ends up being essentially the hypercall. Well, I hoped that it would be easier to maintain if I modify existing code to support both use-cases, but I am also ok to create new routines if this seems to be reasonable - please let me know So the question is --- would it make sense to do all of this separately from the balloon driver? This can be done, but which driver will host this code then? If we move from the balloon driver, then this could go to either gntdev or grant-table. What's your preference? A separate module? Is there any use for this feature outside of your zero-copy DRM driver? Intel's hyper dma-buf (Dongwon/Matt CC'ed), V4L/GPU at least. At the time I tried to upstream zcopy driver it was discussed and decided that it would be better if I remove all DRM specific code and move it to Xen drivers. Thus, this RFC. But it can also be implemented as a dedicated Xen dma-buf driver which will have all the code from this RFC + a bit more (char/misc device handling at least). This will also require a dedicated user-space library, just like libxengnttab.so for gntdev (now I have all new IOCTLs covered there). If the idea of a dedicated Xen dma-buf driver seems to be more attractive we can work toward this solution. BTW, I do support this idea, but was not sure if Xen community accepts yet another driver which duplicates quite some code of the existing gntdev/balloon/grant-table. And now after this RFC I hope that all cons and pros of both dedicated driver and gntdev/balloon/grant-table extension are clearly seen and we can make a decision. IIRC the objection for a separate module was in the context of gntdev was discussion, because (among other things) people didn't want to have yet another file in /dev/xen/ Here we are talking about (a new) balloon-like module which doesn't create any new user-visible interfaces. 
And as for duplicating code --- as I said, I am not convinced there is much of duplication. I might even argue that we should add a new config option for this module. I am not quite sure I am fully following you here: so, you suggest that we have balloon.c unchanged, but instead create a new module (namely a file under the same folder as balloon.c, e.g. dma-buf-reservation.c) and move those {increase|decrease}_reservation routines (specific to dma-buf) to that new file? And make it selectable via Kconfig? If so, then how about the changes to grant-table and gntdev? Those will look inconsistent then. Inconsistent with what? The changes to grant code will also be under the new config option. Ah, ok. Option 1. We will have Kconfig option which will cover dma-buf changes in balloon, g
Re: [Xen-devel] [RFC 3/3] xen/gntdev: Add support for Linux dma buffers
On 05/22/2018 12:31 AM, Dongwon Kim wrote:

Still need more time to review the whole code changes

Take your time, I just wanted to make sure that all interested parties
are in the discussion, so we all finally have what we all want, not a
thing covering only my use-cases

but I noticed one thing. We've been using the term "hyper_dmabuf" for
hypervisor-agnostic linux dmabuf solution and we are planning to call
any of our future solution for other hypervisors the same name. So
having same name for this xen-specific structure or functions you
implemented is confusing. Would you change it to something else like...
"xen_"?

Np, will rename

On Thu, May 17, 2018 at 11:26:04AM +0300, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
---
 drivers/xen/gntdev.c      | 954 +-
 include/uapi/xen/gntdev.h | 101
 include/xen/gntdev_exp.h  |  23 +
 3 files changed, 1066 insertions(+), 12 deletions(-)
 create mode 100644 include/xen/gntdev_exp.h

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 9510f228efe9..0ee88e193362 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -4,6 +4,8 @@
  * Device for accessing (in user-space) pages that have been granted by other
  * domains.
  *
+ * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c.
+ *
  * Copyright (c) 2006-2007, D G Murray.
  *           (c) 2009 Gerd Hoffmann <kra...@redhat.com>
  *
@@ -37,6 +39,9 @@
 #include
 #include
 
+#include
+#include
+
 #include
 #include
 #include
@@ -61,16 +66,39 @@ static atomic_t pages_mapped = ATOMIC_INIT(0);
 static int use_ptemod;
 #define populate_freeable_maps use_ptemod
 
+#ifndef GRANT_INVALID_REF
+/*
+ * Note on usage of grant reference 0 as invalid grant reference:
+ * grant reference 0 is valid, but never exposed to a driver,
+ * because of the fact it is already in use/reserved by the PV console.
+ */
+#define GRANT_INVALID_REF 0
+#endif
+
 struct gntdev_priv {
 	/* maps with visible offsets in the file descriptor */
 	struct list_head maps;
 	/* maps that are not visible; will be freed on munmap.
 	 * Only populated if populate_freeable_maps == 1 */
 	struct list_head freeable_maps;
+	/* List of dma-bufs. */
+	struct list_head dma_bufs;
 	/* lock protects maps and freeable_maps */
 	struct mutex lock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
+
+	/* Private data of the hyper DMA buffers. */
+
+	struct device *dev;
+	/* List of exported DMA buffers. */
+	struct list_head dmabuf_exp_list;
+	/* List of wait objects. */
+	struct list_head dmabuf_exp_wait_list;
+	/* List of imported DMA buffers. */
+	struct list_head dmabuf_imp_list;
+	/* This is the lock which protects dma_buf_xxx lists. */
+	struct mutex dmabuf_lock;
 };
 
 struct unmap_notify {
@@ -95,10 +123,65 @@ struct grant_map {
 	struct gnttab_unmap_grant_ref *kunmap_ops;
 	struct page **pages;
 	unsigned long pages_vm_start;
+
+	/*
+	 * All the fields starting with dmabuf_ are only valid if this
+	 * mapping is used for exporting a DMA buffer.
+	 * If dmabuf_vaddr is not NULL then this mapping is backed by DMA
+	 * capable memory.
+	 */
+
+	/* Flags used to create this DMA buffer: GNTDEV_DMABUF_FLAG_XXX. */
+	bool dmabuf_flags;
+	/* Virtual/CPU address of the DMA buffer. */
+	void *dmabuf_vaddr;
+	/* Bus address of the DMA buffer. */
+	dma_addr_t dmabuf_bus_addr;
+};
+
+struct hyper_dmabuf {
+	struct gntdev_priv *priv;
+	struct dma_buf *dmabuf;
+	struct list_head next;
+	int fd;
+
+	union {
+		struct {
+			/* Exported buffers are reference counted. */
+			struct kref refcount;
+			struct grant_map *map;
+		} exp;
+		struct {
+			/* Granted references of the imported buffer. */
+			grant_ref_t *refs;
+			/* Scatter-gather table of the imported buffer. */
+			struct sg_table *sgt;
+			/* dma-buf attachment of the imported buffer. */
+			struct dma_buf_attachment *attach;
+		} imp;
+	} u;
+
+	/* Number of pages this buffer has. */
+	int nr_pages;
+	/* Pages of this buffer. */
+	struct page **pages;
+};
+
+struct hyper_dmabuf_wait_obj {
+	struct list_head next;
+	struct hyper_dmabuf *hyper_dmabuf;
+	struct completion completion;
+};
+
+struct hyper_dambuf_attachment {

minor typo: dam->dma (same thing in other places as well.)

sure, thanks

+	struct sg_ta
Re: [Xen-devel] [RFC 1/3] xen/balloon: Allow allocating DMA buffers
On 05/21/2018 11:36 PM, Boris Ostrovsky wrote: On 05/21/2018 03:13 PM, Oleksandr Andrushchenko wrote: On 05/21/2018 09:53 PM, Boris Ostrovsky wrote: On 05/21/2018 01:32 PM, Oleksandr Andrushchenko wrote: On 05/21/2018 07:35 PM, Boris Ostrovsky wrote: On 05/21/2018 01:40 AM, Oleksandr Andrushchenko wrote: On 05/19/2018 01:04 AM, Boris Ostrovsky wrote: On 05/17/2018 04:26 AM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> A commit message would be useful. Sure, v1 will have it Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> for (i = 0; i < nr_pages; i++) { - page = alloc_page(gfp); - if (page == NULL) { - nr_pages = i; - state = BP_EAGAIN; - break; + if (ext_pages) { + page = ext_pages[i]; + } else { + page = alloc_page(gfp); + if (page == NULL) { + nr_pages = i; + state = BP_EAGAIN; + break; + } } scrub_page(page); list_add(>lru, ); @@ -529,7 +565,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) i = 0; list_for_each_entry_safe(page, tmp, , lru) { /* XENMEM_decrease_reservation requires a GFN */ - frame_list[i++] = xen_page_to_gfn(page); + frames[i++] = xen_page_to_gfn(page); #ifdef CONFIG_XEN_HAVE_PVMMU /* @@ -552,18 +588,22 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) #endif list_del(>lru); - balloon_append(page); + if (!ext_pages) + balloon_append(page); So what you are proposing is not really ballooning. You are just piggybacking on existing interfaces, aren't you? Sort of. Basically I need to {increase|decrease}_reservation, not actually allocating ballooned pages. Do you think I can simply EXPORT_SYMBOL for {increase|decrease}_reservation? Any other suggestion? I am actually wondering how much of that code you end up reusing. You pretty much create new code paths in both routines and common code ends up being essentially the hypercall. 
Well, I hoped that it would be easier to maintain if I modify existing code to support both use-cases, but I am also ok to create new routines if this seems to be reasonable - please let me know So the question is --- would it make sense to do all of this separately from the balloon driver? This can be done, but which driver will host this code then? If we move from the balloon driver, then this could go to either gntdev or grant-table. What's your preference? A separate module? Is there any use for this feature outside of your zero-copy DRM driver? Intel's hyper dma-buf (Dongwon/Matt CC'ed), V4L/GPU at least. At the time I tried to upstream zcopy driver it was discussed and decided that it would be better if I remove all DRM specific code and move it to Xen drivers. Thus, this RFC. But it can also be implemented as a dedicated Xen dma-buf driver which will have all the code from this RFC + a bit more (char/misc device handling at least). This will also require a dedicated user-space library, just like libxengnttab.so for gntdev (now I have all new IOCTLs covered there). If the idea of a dedicated Xen dma-buf driver seems to be more attractive we can work toward this solution. BTW, I do support this idea, but was not sure if Xen community accepts yet another driver which duplicates quite some code of the existing gntdev/balloon/grant-table. And now after this RFC I hope that all cons and pros of both dedicated driver and gntdev/balloon/grant-table extension are clearly seen and we can make a decision. IIRC the objection for a separate module was in the context of gntdev was discussion, because (among other things) people didn't want to have yet another file in /dev/xen/ Here we are talking about (a new) balloon-like module which doesn't create any new user-visible interfaces. And as for duplicating code --- as I said, I am not convinced there is much of duplication. I might even argue that we should add a new config option for this module. 
I am not quite sure I am fully following you here: so, you suggest that we have balloon.c unchanged, but instead create a new module (namely a file under the same folder as balloon.c, e.g. dma-buf-reservation.c) and move those {increase|decrease}_reservation routines (specific to dma-buf) to that new file? And make it selectable via Kconfig? If so, then how about the changes to grant-table and gntdev? Those will look inconsistent then. If you suggest a new kernel driver module: IMO, there is nothing bad if we create a dedicated kernel module (driver) for Xen dma-buf handling selectable under Kconfig option. Yes, this will create a yet another device under /dev/xen, but most people will never see it if we set Kconfig to default to &qu
Re: [Xen-devel] [RFC 1/3] xen/balloon: Allow allocating DMA buffers
On 05/21/2018 09:53 PM, Boris Ostrovsky wrote: On 05/21/2018 01:32 PM, Oleksandr Andrushchenko wrote: On 05/21/2018 07:35 PM, Boris Ostrovsky wrote: On 05/21/2018 01:40 AM, Oleksandr Andrushchenko wrote: On 05/19/2018 01:04 AM, Boris Ostrovsky wrote: On 05/17/2018 04:26 AM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> A commit message would be useful. Sure, v1 will have it Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> for (i = 0; i < nr_pages; i++) { - page = alloc_page(gfp); - if (page == NULL) { - nr_pages = i; - state = BP_EAGAIN; - break; + if (ext_pages) { + page = ext_pages[i]; + } else { + page = alloc_page(gfp); + if (page == NULL) { + nr_pages = i; + state = BP_EAGAIN; + break; + } } scrub_page(page); list_add(>lru, ); @@ -529,7 +565,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) i = 0; list_for_each_entry_safe(page, tmp, , lru) { /* XENMEM_decrease_reservation requires a GFN */ - frame_list[i++] = xen_page_to_gfn(page); + frames[i++] = xen_page_to_gfn(page); #ifdef CONFIG_XEN_HAVE_PVMMU /* @@ -552,18 +588,22 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) #endif list_del(>lru); - balloon_append(page); + if (!ext_pages) + balloon_append(page); So what you are proposing is not really ballooning. You are just piggybacking on existing interfaces, aren't you? Sort of. Basically I need to {increase|decrease}_reservation, not actually allocating ballooned pages. Do you think I can simply EXPORT_SYMBOL for {increase|decrease}_reservation? Any other suggestion? I am actually wondering how much of that code you end up reusing. You pretty much create new code paths in both routines and common code ends up being essentially the hypercall. 
Well, I hoped that it would be easier to maintain if I modify existing code to support both use-cases, but I am also ok to create new routines if this seems to be reasonable - please let me know So the question is --- would it make sense to do all of this separately from the balloon driver? This can be done, but which driver will host this code then? If we move from the balloon driver, then this could go to either gntdev or grant-table. What's your preference? A separate module? Is there any use for this feature outside of your zero-copy DRM driver? Intel's hyper dma-buf (Dongwon/Matt CC'ed), V4L/GPU at least. At the time I tried to upstream zcopy driver it was discussed and decided that it would be better if I remove all DRM specific code and move it to Xen drivers. Thus, this RFC. But it can also be implemented as a dedicated Xen dma-buf driver which will have all the code from this RFC + a bit more (char/misc device handling at least). This will also require a dedicated user-space library, just like libxengnttab.so for gntdev (now I have all new IOCTLs covered there). If the idea of a dedicated Xen dma-buf driver seems to be more attractive we can work toward this solution. BTW, I do support this idea, but was not sure if Xen community accepts yet another driver which duplicates quite some code of the existing gntdev/balloon/grant-table. And now after this RFC I hope that all cons and pros of both dedicated driver and gntdev/balloon/grant-table extension are clearly seen and we can make a decision. -boris Thank you, Oleksandr [1] https://lists.freedesktop.org/archives/dri-devel/2018-April/173163.html
Re: [Xen-devel][RFC 2/3] xen/grant-table: Extend API to work with DMA buffers
On 05/19/2018 01:19 AM, Boris Ostrovsky wrote: On 05/17/2018 04:26 AM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> --- drivers/xen/grant-table.c | 49 +++ include/xen/grant_table.h | 7 ++ 2 files changed, 56 insertions(+) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index bb36b1e1dbcc..c27bcc420575 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -729,6 +729,55 @@ void gnttab_free_pages(int nr_pages, struct page **pages) } EXPORT_SYMBOL(gnttab_free_pages); +int gnttab_dma_alloc_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void **vaddr, dma_addr_t *dev_bus_addr) +{ + int i; + int ret; + + ret = alloc_dma_xenballooned_pages(dev, coherent, nr_pages, pages, + vaddr, dev_bus_addr); + if (ret < 0) + return ret; + + for (i = 0; i < nr_pages; i++) { +#if BITS_PER_LONG < 64 + struct xen_page_foreign *foreign; + + foreign = kzalloc(sizeof(*foreign), GFP_KERNEL); + if (!foreign) { + gnttab_dma_free_pages(dev, coherent, nr_pages, pages, + *vaddr, *dev_bus_addr); + return -ENOMEM; + } + set_page_private(pages[i], (unsigned long)foreign); +#endif + SetPagePrivate(pages[i]); + } + return 0; +} +EXPORT_SYMBOL(gnttab_dma_alloc_pages); + +void gnttab_dma_free_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void *vaddr, dma_addr_t dev_bus_addr) +{ + int i; + + for (i = 0; i < nr_pages; i++) { + if (PagePrivate(pages[i])) { +#if BITS_PER_LONG < 64 + kfree((void *)page_private(pages[i])); +#endif + ClearPagePrivate(pages[i]); + } + } + free_dma_xenballooned_pages(dev, coherent, nr_pages, pages, + vaddr, dev_bus_addr); +} +EXPORT_SYMBOL(gnttab_dma_free_pages); Given that these routines look almost exactly like their non-dma counterparts I wonder whether common code could be factored out.
Yes, this can be done -boris + /* Handling of paged out grant targets (GNTST_eagain) */ #define MAX_DELAY 256 static inline void diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h index 34b1379f9777..20ee2b5ba965 100644 --- a/include/xen/grant_table.h +++ b/include/xen/grant_table.h @@ -195,6 +195,13 @@ void gnttab_free_auto_xlat_frames(void); int gnttab_alloc_pages(int nr_pages, struct page **pages); void gnttab_free_pages(int nr_pages, struct page **pages); +int gnttab_dma_alloc_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void **vaddr, dma_addr_t *dev_bus_addr); +void gnttab_dma_free_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void *vaddr, dma_addr_t dev_bus_addr); + int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, struct page **pages, unsigned int count);
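One way the factoring agreed to above could go: pull the per-page private-data setup (the xen_page_foreign allocation on 32-bit and the SetPagePrivate() call) that gnttab_alloc_pages() and the proposed gnttab_dma_alloc_pages() both perform into a single helper. A sketch, assuming access to grant-table.c internals; the helper name is illustrative, not from the RFC:

```c
/*
 * Sketch of a common helper for the per-page setup duplicated by
 * gnttab_alloc_pages() and the proposed gnttab_dma_alloc_pages().
 * On failure the caller unwinds through its existing free path,
 * which already tolerates pages without private data.
 */
static int gnttab_pages_set_private(int nr_pages, struct page **pages)
{
	int i;

	for (i = 0; i < nr_pages; i++) {
#if BITS_PER_LONG < 64
		struct xen_page_foreign *foreign;

		foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
		if (!foreign)
			return -ENOMEM;
		set_page_private(pages[i], (unsigned long)foreign);
#endif
		SetPagePrivate(pages[i]);
	}
	return 0;
}
```

The matching teardown loop (kfree of the private data plus ClearPagePrivate()) could be factored the same way, leaving the dma and non-dma allocators to differ only in where the pages come from.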
Re: [Xen-devel] [RFC 1/3] xen/balloon: Allow allocating DMA buffers
On 05/19/2018 01:04 AM, Boris Ostrovsky wrote: On 05/17/2018 04:26 AM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> A commit message would be useful. Sure, v1 will have it Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> for (i = 0; i < nr_pages; i++) { - page = alloc_page(gfp); - if (page == NULL) { - nr_pages = i; - state = BP_EAGAIN; - break; + if (ext_pages) { + page = ext_pages[i]; + } else { + page = alloc_page(gfp); + if (page == NULL) { + nr_pages = i; + state = BP_EAGAIN; + break; + } } scrub_page(page); list_add(&page->lru, &pages); @@ -529,7 +565,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) i = 0; list_for_each_entry_safe(page, tmp, &pages, lru) { /* XENMEM_decrease_reservation requires a GFN */ - frame_list[i++] = xen_page_to_gfn(page); + frames[i++] = xen_page_to_gfn(page); #ifdef CONFIG_XEN_HAVE_PVMMU /* @@ -552,18 +588,22 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) #endif list_del(&page->lru); - balloon_append(page); + if (!ext_pages) + balloon_append(page); So what you are proposing is not really ballooning. You are just piggybacking on existing interfaces, aren't you? Sort of. Basically I need {increase|decrease}_reservation, without actually allocating ballooned pages. Do you think I can simply EXPORT_SYMBOL for {increase|decrease}_reservation? Any other suggestion? -boris Thank you, Oleksandr
[Xen-devel][RFC 2/3] xen/grant-table: Extend API to work with DMA buffers
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> --- drivers/xen/grant-table.c | 49 +++ include/xen/grant_table.h | 7 ++ 2 files changed, 56 insertions(+) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index bb36b1e1dbcc..c27bcc420575 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -729,6 +729,55 @@ void gnttab_free_pages(int nr_pages, struct page **pages) } EXPORT_SYMBOL(gnttab_free_pages); +int gnttab_dma_alloc_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void **vaddr, dma_addr_t *dev_bus_addr) +{ + int i; + int ret; + + ret = alloc_dma_xenballooned_pages(dev, coherent, nr_pages, pages, + vaddr, dev_bus_addr); + if (ret < 0) + return ret; + + for (i = 0; i < nr_pages; i++) { +#if BITS_PER_LONG < 64 + struct xen_page_foreign *foreign; + + foreign = kzalloc(sizeof(*foreign), GFP_KERNEL); + if (!foreign) { + gnttab_dma_free_pages(dev, coherent, nr_pages, pages, + *vaddr, *dev_bus_addr); + return -ENOMEM; + } + set_page_private(pages[i], (unsigned long)foreign); +#endif + SetPagePrivate(pages[i]); + } + return 0; +} +EXPORT_SYMBOL(gnttab_dma_alloc_pages); + +void gnttab_dma_free_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void *vaddr, dma_addr_t dev_bus_addr) +{ + int i; + + for (i = 0; i < nr_pages; i++) { + if (PagePrivate(pages[i])) { +#if BITS_PER_LONG < 64 + kfree((void *)page_private(pages[i])); +#endif + ClearPagePrivate(pages[i]); + } + } + free_dma_xenballooned_pages(dev, coherent, nr_pages, pages, + vaddr, dev_bus_addr); +} +EXPORT_SYMBOL(gnttab_dma_free_pages); + /* Handling of paged out grant targets (GNTST_eagain) */ #define MAX_DELAY 256 static inline void diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h index 34b1379f9777..20ee2b5ba965 100644 --- a/include/xen/grant_table.h +++ b/include/xen/grant_table.h @@ -195,6 +195,13 @@ void
gnttab_free_auto_xlat_frames(void); int gnttab_alloc_pages(int nr_pages, struct page **pages); void gnttab_free_pages(int nr_pages, struct page **pages); +int gnttab_dma_alloc_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void **vaddr, dma_addr_t *dev_bus_addr); +void gnttab_dma_free_pages(struct device *dev, bool coherent, + int nr_pages, struct page **pages, + void *vaddr, dma_addr_t dev_bus_addr); + int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops, struct gnttab_map_grant_ref *kmap_ops, struct page **pages, unsigned int count); -- 2.17.0
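For context, a frontend or backend driver would be expected to use the proposed pair roughly as in the sketch below. This is a hedged, non-standalone kernel-side sketch: the function name, local variable names, and the choice of a coherent allocation are illustrative assumptions, not taken from the RFC.

```c
#include <linux/slab.h>
#include <xen/grant_table.h>

/* Illustrative caller: allocate DMA-able ballooned-out pages, use them
 * as the backing store of a dma-buf, then release everything. */
static int example_alloc_dmabuf_backing(struct device *dev, int nr_pages)
{
	struct page **pages;
	void *vaddr;
	dma_addr_t bus_addr;
	int ret;

	pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	/* Pages come back ballooned out and DMA-mapped, ready for granting. */
	ret = gnttab_dma_alloc_pages(dev, true /* coherent */, nr_pages,
				     pages, &vaddr, &bus_addr);
	if (ret < 0)
		goto out_free;

	/* ... grant the pages and wrap vaddr/bus_addr into a dma-buf ... */

	gnttab_dma_free_pages(dev, true, nr_pages, pages, vaddr, bus_addr);
out_free:
	kfree(pages);
	return ret;
}
```

The point of the API shape is that the caller sees ordinary struct page pointers (usable with grant references) while also getting the CPU virtual address and bus address needed by the DMA side.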
[Xen-devel][RFC 1/3] xen/balloon: Allow allocating DMA buffers
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> --- drivers/xen/balloon.c | 214 +++--- drivers/xen/xen-balloon.c | 2 + include/xen/balloon.h | 11 +- 3 files changed, 188 insertions(+), 39 deletions(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index e4db19e88ab1..e3a145aa9f29 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -415,8 +415,10 @@ static bool balloon_is_inflated(void) return balloon_stats.balloon_low || balloon_stats.balloon_high; } -static enum bp_state increase_reservation(unsigned long nr_pages) +static enum bp_state increase_reservation(unsigned long nr_pages, + struct page **ext_pages) { + enum bp_state ret = BP_DONE; int rc; unsigned long i; struct page *page; @@ -425,32 +427,49 @@ static enum bp_state increase_reservation(unsigned long nr_pages) .extent_order = EXTENT_ORDER, .domid = DOMID_SELF }; + xen_pfn_t *frames; - if (nr_pages > ARRAY_SIZE(frame_list)) - nr_pages = ARRAY_SIZE(frame_list); + if (nr_pages > ARRAY_SIZE(frame_list)) { + frames = kcalloc(nr_pages, sizeof(xen_pfn_t), GFP_KERNEL); + if (!frames) + return BP_ECANCELED; + } else { + frames = frame_list; + } - page = list_first_entry_or_null(&ballooned_pages, struct page, lru); - for (i = 0; i < nr_pages; i++) { - if (!page) { - nr_pages = i; - break; - } + /* XENMEM_populate_physmap requires a PFN based on Xen +* granularity. +*/ + if (ext_pages) { + for (i = 0; i < nr_pages; i++) + frames[i] = page_to_xen_pfn(ext_pages[i]); + } else { + page = list_first_entry_or_null(&ballooned_pages, + struct page, lru); + for (i = 0; i < nr_pages; i++) { + if (!page) { + nr_pages = i; + break; + } - /* XENMEM_populate_physmap requires a PFN based on Xen -* granularity.
-*/ - frame_list[i] = page_to_xen_pfn(page); - page = balloon_next_page(page); + frames[i] = page_to_xen_pfn(page); + page = balloon_next_page(page); + } } - set_xen_guest_handle(reservation.extent_start, frame_list); + set_xen_guest_handle(reservation.extent_start, frames); reservation.nr_extents = nr_pages; rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); - if (rc <= 0) - return BP_EAGAIN; + if (rc <= 0) { + ret = BP_EAGAIN; + goto out; + } for (i = 0; i < rc; i++) { - page = balloon_retrieve(false); + if (ext_pages) + page = ext_pages[i]; + else + page = balloon_retrieve(false); BUG_ON(page == NULL); #ifdef CONFIG_XEN_HAVE_PVMMU @@ -463,14 +482,14 @@ static enum bp_state increase_reservation(unsigned long nr_pages) if (!xen_feature(XENFEAT_auto_translated_physmap)) { unsigned long pfn = page_to_pfn(page); - set_phys_to_machine(pfn, frame_list[i]); + set_phys_to_machine(pfn, frames[i]); /* Link back into the page tables if not highmem. */ if (!PageHighMem(page)) { int ret; ret = HYPERVISOR_update_va_mapping( (unsigned long)__va(pfn << PAGE_SHIFT), - mfn_pte(frame_list[i], PAGE_KERNEL), + mfn_pte(frames[i], PAGE_KERNEL), 0); BUG_ON(ret); } @@ -478,15 +497,22 @@ static enum bp_state increase_reservation(unsigned long nr_pages) #endif /* Relinquish the page back to the allocator. */ - __free_reserved_page(page); + if (!ext_pages) + __free_reserved_page(page); } - balloon_stats.current_pages += rc; + if (!ext_pages) + balloon_stats.current_pages += rc; - return BP_DONE; +out: + if (frames != frame_list) + kfree(frames); + + return ret; } -static enum bp_state decrease_reservation(unsigned long
[Xen-devel][RFC 3/3] xen/gntdev: Add support for Linux dma buffers
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> --- drivers/xen/gntdev.c | 954 +- include/uapi/xen/gntdev.h | 101 include/xen/gntdev_exp.h | 23 + 3 files changed, 1066 insertions(+), 12 deletions(-) create mode 100644 include/xen/gntdev_exp.h diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 9510f228efe9..0ee88e193362 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -4,6 +4,8 @@ * Device for accessing (in user-space) pages that have been granted by other * domains. * + * DMA buffer implementation is based on drivers/gpu/drm/drm_prime.c. + * * Copyright (c) 2006-2007, D G Murray. * (c) 2009 Gerd Hoffmann <kra...@redhat.com> * @@ -37,6 +39,9 @@ #include #include +#include +#include + #include #include #include @@ -61,16 +66,39 @@ static atomic_t pages_mapped = ATOMIC_INIT(0); static int use_ptemod; #define populate_freeable_maps use_ptemod +#ifndef GRANT_INVALID_REF +/* + * Note on usage of grant reference 0 as invalid grant reference: + * grant reference 0 is valid, but never exposed to a driver, + * because of the fact it is already in use/reserved by the PV console. + */ +#define GRANT_INVALID_REF 0 +#endif + struct gntdev_priv { /* maps with visible offsets in the file descriptor */ struct list_head maps; /* maps that are not visible; will be freed on munmap. * Only populated if populate_freeable_maps == 1 */ struct list_head freeable_maps; + /* List of dma-bufs. */ + struct list_head dma_bufs; /* lock protects maps and freeable_maps */ struct mutex lock; struct mm_struct *mm; struct mmu_notifier mn; + + /* Private data of the hyper DMA buffers. */ + + struct device *dev; + /* List of exported DMA buffers. */ + struct list_head dmabuf_exp_list; + /* List of wait objects. */ + struct list_head dmabuf_exp_wait_list; + /* List of imported DMA buffers. */ + struct list_head dmabuf_imp_list; + /* This is the lock which protects dma_buf_xxx lists. 
*/ + struct mutex dmabuf_lock; }; struct unmap_notify { @@ -95,10 +123,65 @@ struct grant_map { struct gnttab_unmap_grant_ref *kunmap_ops; struct page **pages; unsigned long pages_vm_start; + + /* +* All the fields starting with dmabuf_ are only valid if this +* mapping is used for exporting a DMA buffer. +* If dmabuf_vaddr is not NULL then this mapping is backed by DMA +* capable memory. +*/ + + /* Flags used to create this DMA buffer: GNTDEV_DMABUF_FLAG_XXX. */ + bool dmabuf_flags; + /* Virtual/CPU address of the DMA buffer. */ + void *dmabuf_vaddr; + /* Bus address of the DMA buffer. */ + dma_addr_t dmabuf_bus_addr; +}; + +struct hyper_dmabuf { + struct gntdev_priv *priv; + struct dma_buf *dmabuf; + struct list_head next; + int fd; + + union { + struct { + /* Exported buffers are reference counted. */ + struct kref refcount; + struct grant_map *map; + } exp; + struct { + /* Granted references of the imported buffer. */ + grant_ref_t *refs; + /* Scatter-gather table of the imported buffer. */ + struct sg_table *sgt; + /* dma-buf attachment of the imported buffer. */ + struct dma_buf_attachment *attach; + } imp; + } u; + + /* Number of pages this buffer has. */ + int nr_pages; + /* Pages of this buffer. */ + struct page **pages; +}; + +struct hyper_dmabuf_wait_obj { + struct list_head next; + struct hyper_dmabuf *hyper_dmabuf; + struct completion completion; +}; + +struct hyper_dambuf_attachment { + struct sg_table *sgt; + enum dma_data_direction dir; }; static int unmap_grant_pages(struct grant_map *map, int offset, int pages); +static struct miscdevice gntdev_miscdev; + /* -- */ static void gntdev_print_maps(struct gntdev_priv *priv, @@ -120,8 +203,17 @@ static void gntdev_free_map(struct grant_map *map) if (map == NULL) return; - if (map->pages) + if (map->dmabuf_vaddr) { + bool coherent = map->dmabuf_flags & + GNTDEV_DMABUF_FLAG_DMA_COHERENT; + + gnttab_dma_free_pages(gntdev_miscdev.this_device, + coherent, map->count, map->pages, + map->dmabuf_vaddr, map->dmab
[Xen-devel][RFC 0/3] dma-buf support for gntdev
From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> This work is in response to my previous attempt to introduce Xen/DRM zero-copy driver [1] to enable Linux dma-buf API [2] for Xen based frontends/backends. There is also an existing hyper_dmabuf approach available [3] which, if reworked to utilize the proposed solution, can greatly benefit as well. The original rationale behind this work was to enable zero-copying use-cases while working with Xen para-virtual display driver [4]: when using Xen PV DRM frontend driver then on backend side one will need to do copying of display buffers' contents (filled by the frontend's user-space) into buffers allocated at the backend side. Taking into account the size of display buffers and frames per second it may result in unneeded huge data bus occupation and performance loss. The helper driver [4] allows implementing zero-copying use-cases when using Xen para-virtualized frontend display driver by implementing a DRM/KMS helper driver running on backend's side. It utilizes PRIME buffers API (implemented on top of Linux dma-buf) to share frontend's buffers with physical device drivers on backend's side: - a dumb buffer created on backend's side can be shared with the Xen PV frontend driver, so it directly writes into backend's domain memory (into the buffer exported from DRM/KMS driver of a physical display device) - a dumb buffer allocated by the frontend can be imported into physical device DRM/KMS driver, thus allowing to achieve no copying as well Finally, it was discussed and decided ([1], [5]) that it is worth implementing such use-cases via extension of the existing Xen gntdev driver instead of introducing new DRM specific driver. Please note, that the support of dma-buf is Linux only, as dma-buf is a Linux only thing. Now to the proposed solution. The changes to the existing Xen drivers in the Linux kernel fall into 2 categories: 1. 
DMA-able memory buffer allocation and ballooning in/out the pages of such a buffer. This is required if we are about to share dma-buf with the hardware that does require those to be allocated with dma_alloc_xxx API. (It is still possible to allocate a dma-buf from any other memory, e.g. system pages). 2. Extension of the gntdev driver to enable it to import/export dma-buf’s. The first two patches in this series solve #1 and the last one is for #2. The corresponding libxengnttab changes are available at [6]. All the above was tested with display backend [7] and its accompanying helper library [8] on Renesas ARM64 based board. *To all the communities*: I would like to ask you to review the proposed solution and give feedback on it, so I can improve and send final patches for review (this is still work in progress, but enough to start discussing the implementation). Thank you in advance, Oleksandr Andrushchenko [1] https://lists.freedesktop.org/archives/dri-devel/2018-April/173163.html [2] https://elixir.bootlin.com/linux/v4.17-rc5/source/Documentation/driver-api/dma-buf.rst [3] https://lists.xenproject.org/archives/html/xen-devel/2018-02/msg01202.html [4] https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/xen [5] https://patchwork.kernel.org/patch/10279681/ [6] https://github.com/andr2000/xen/tree/xen_dma_buf_v1 [7] https://github.com/andr2000/displ_be/tree/xen_dma_buf_v1 [8] https://github.com/andr2000/libxenbe/tree/xen_dma_buf_v1 Oleksandr Andrushchenko (3): xen/balloon: Allow allocating DMA buffers xen/grant-table: Extend API to work with DMA buffers xen/gntdev: Add support for Linux dma buffers drivers/xen/balloon.c | 214 +++-- drivers/xen/gntdev.c | 954 +- drivers/xen/grant-table.c | 49 ++ drivers/xen/xen-balloon.c | 2 + include/uapi/xen/gntdev.h | 101 include/xen/balloon.h | 11 +- include/xen/gntdev_exp.h | 23 + include/xen/grant_table.h | 7 + 8 files changed, 1310 insertions(+), 51 deletions(-) create mode 100644 include/xen/gntdev_exp.h -- 2.17.0
Re: [RfC PATCH] Add udmabuf misc device
On 04/16/2018 12:32 PM, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 10:22 AM, Oleksandr Andrushchenko <andr2...@gmail.com> wrote: On 04/16/2018 10:43 AM, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote: On 04/13/2018 06:37 PM, Daniel Vetter wrote: On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote: On 04/10/2018 08:26 PM, Dongwon Kim wrote: On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 09:57 PM, Dongwon Kim wrote: On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 02:57 PM, Gerd Hoffmann wrote: Hi, I fail to see any common ground for xen-zcopy and udmabuf ... Does the above mean you can assume that xen-zcopy and udmabuf can co-exist as two different solutions? Well, udmabuf route isn't fully clear yet, but yes. See also gvt (intel vgpu), where the hypervisor interface is abstracted away into a separate kernel modules even though most of the actual vgpu emulation code is common. Thank you for your input, I'm just trying to figure out which of the three z-copy solutions intersect and how much And what about hyper-dmabuf? xen z-copy solution is pretty similar fundamentally to hyper_dmabuf in terms of these core sharing feature: 1. the sharing process - import prime/dmabuf from the producer -> extract underlying pages and get those shared -> return references for shared pages Another thing is danvet was kind of against to the idea of importing existing dmabuf/prime buffer and forward it to the other domain due to synchronization issues. He proposed to make hyper_dmabuf only work as an exporter so that it can have a full control over the buffer. I think we need to talk about this further as well. Yes, I saw this. But this limits the use-cases so much. For instance, running Android as a Guest (which uses ION to allocate buffers) means that finally HW composer will import dma-buf into the DRM driver. 
Then, in case of xen-front for example, it needs to be shared with the backend (Host side). Of course, we can change user-space to make xen-front allocate the buffers (make it exporter), but what we try to avoid is to change user-space which in normal world would have remain unchanged otherwise. So, I do think we have to support this use-case and just have to understand the complexity. Erm, why do you need importer capability for this use-case? guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes that dma-buf -> import to the real display hw No where in this chain do you need xen-zcopy to be able to import a dma-buf (within linux, it needs to import a bunch of pages from the hypervisor). Now if your plan is to use xen-zcopy in the guest1 instead of xen-front, then you indeed need to import. This is the exact use-case I was referring to while saying we need to import on Guest1 side. If hyper-dmabuf is so generic that there is no xen-front in the picture, then it needs to import a dma-buf, so it can be exported at Guest2 side. But that imo doesn't make sense: - xen-front gives you clearly defined flip events you can forward to the hypervisor. xen-zcopy would need to add that again. xen-zcopy is a helper driver which doesn't handle page flips and is not a KMS driver as one might think of: the DRM UAPI it uses is just to export a dma-buf as a PRIME buffer, but that's it. Flipping etc. is done by the backend [1], not xen-zcopy. Same for hyperdmabuf (and really we're not going to shuffle struct dma_fence over the wire in a generic fashion between hypervisor guests). - xen-front already has the idea of pixel format for the buffer (and any other metadata). Again, xen-zcopy and hyperdmabuf lack that, would need to add it shoehorned in somehow. Again, here you are talking of something which is implemented in Xen display backend, not xen-zcopy, e.g. 
display backend can implement para-virtual display w/o xen-zcopy at all, but in this case there is a memory copying for each frame. With the help of xen-zcopy the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or Weston or whatever as xen-zcopy exports remote buffers as PRIME buffers, thus no buffer copying is required. Why do you need to copy on every frame for xen-front? In the above pipeline, using xen-front I see 0 architectural reasons to have a copy anywhere. This seems to be the core of the confusion we're having here. Ok, so I'll try to explain: 1. xen-front - produces a display buffer to be shown at Guest2 by the backend, shares its grant references with the backend 2. xen-front sends page flip event to the backend specifying the buffer in question 3. Backend takes the shared buffer (which is only a buffer mapped into backend's memory, it is not a dma-buf/PRIME one) and makes memcpy from it to a local dumb/surface Why do you even do that? The copying here I mean - why don't you just di
Re: [RfC PATCH] Add udmabuf misc device
On 04/16/2018 10:43 AM, Daniel Vetter wrote: On Mon, Apr 16, 2018 at 10:16:31AM +0300, Oleksandr Andrushchenko wrote: On 04/13/2018 06:37 PM, Daniel Vetter wrote: On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote: On 04/10/2018 08:26 PM, Dongwon Kim wrote: On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 09:57 PM, Dongwon Kim wrote: On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 02:57 PM, Gerd Hoffmann wrote: Hi, I fail to see any common ground for xen-zcopy and udmabuf ... Does the above mean you can assume that xen-zcopy and udmabuf can co-exist as two different solutions? Well, udmabuf route isn't fully clear yet, but yes. See also gvt (intel vgpu), where the hypervisor interface is abstracted away into a separate kernel modules even though most of the actual vgpu emulation code is common. Thank you for your input, I'm just trying to figure out which of the three z-copy solutions intersect and how much And what about hyper-dmabuf? xen z-copy solution is pretty similar fundamentally to hyper_dmabuf in terms of these core sharing feature: 1. the sharing process - import prime/dmabuf from the producer -> extract underlying pages and get those shared -> return references for shared pages Another thing is danvet was kind of against to the idea of importing existing dmabuf/prime buffer and forward it to the other domain due to synchronization issues. He proposed to make hyper_dmabuf only work as an exporter so that it can have a full control over the buffer. I think we need to talk about this further as well. Yes, I saw this. But this limits the use-cases so much. For instance, running Android as a Guest (which uses ION to allocate buffers) means that finally HW composer will import dma-buf into the DRM driver. Then, in case of xen-front for example, it needs to be shared with the backend (Host side). 
Of course, we can change user-space to make xen-front allocate the buffers (make it exporter), but what we try to avoid is to change user-space which in normal world would have remain unchanged otherwise. So, I do think we have to support this use-case and just have to understand the complexity. Erm, why do you need importer capability for this use-case? guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes that dma-buf -> import to the real display hw No where in this chain do you need xen-zcopy to be able to import a dma-buf (within linux, it needs to import a bunch of pages from the hypervisor). Now if your plan is to use xen-zcopy in the guest1 instead of xen-front, then you indeed need to import. This is the exact use-case I was referring to while saying we need to import on Guest1 side. If hyper-dmabuf is so generic that there is no xen-front in the picture, then it needs to import a dma-buf, so it can be exported at Guest2 side. But that imo doesn't make sense: - xen-front gives you clearly defined flip events you can forward to the hypervisor. xen-zcopy would need to add that again. xen-zcopy is a helper driver which doesn't handle page flips and is not a KMS driver as one might think of: the DRM UAPI it uses is just to export a dma-buf as a PRIME buffer, but that's it. Flipping etc. is done by the backend [1], not xen-zcopy. Same for hyperdmabuf (and really we're not going to shuffle struct dma_fence over the wire in a generic fashion between hypervisor guests). - xen-front already has the idea of pixel format for the buffer (and any other metadata). Again, xen-zcopy and hyperdmabuf lack that, would need to add it shoehorned in somehow. Again, here you are talking of something which is implemented in Xen display backend, not xen-zcopy, e.g. display backend can implement para-virtual display w/o xen-zcopy at all, but in this case there is a memory copying for each frame. 
With the help of xen-zcopy the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or Weston or whatever as xen-zcopy exports remote buffers as PRIME buffers, thus no buffer copying is required. Why do you need to copy on every frame for xen-front? In the above pipeline, using xen-front I see 0 architectural reasons to have a copy anywhere. This seems to be the core of the confusion we're having here. Ok, so I'll try to explain: 1. xen-front - produces a display buffer to be shown at Guest2 by the backend, shares its grant references with the backend 2. xen-front sends page flip event to the backend specifying the buffer in question 3. Backend takes the shared buffer (which is only a buffer mapped into backend's memory, it is not a dma-buf/PRIME one) and makes memcpy from it to a local dumb/surface 4. Backend flips that local dumb buffer/surface If I have a xen-zcopy helper driver then I can avoid doing step 3): 1) 2) remain the same as above 3) Initially for a new display buffer, backend calls xen-zcopy to create a
Re: [RfC PATCH] Add udmabuf misc device
On 04/13/2018 06:37 PM, Daniel Vetter wrote: On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote: On 04/10/2018 08:26 PM, Dongwon Kim wrote: On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 09:57 PM, Dongwon Kim wrote: On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 02:57 PM, Gerd Hoffmann wrote: Hi, I fail to see any common ground for xen-zcopy and udmabuf ... Does the above mean you can assume that xen-zcopy and udmabuf can co-exist as two different solutions? Well, udmabuf route isn't fully clear yet, but yes. See also gvt (intel vgpu), where the hypervisor interface is abstracted away into a separate kernel modules even though most of the actual vgpu emulation code is common. Thank you for your input, I'm just trying to figure out which of the three z-copy solutions intersect and how much And what about hyper-dmabuf? xen z-copy solution is pretty similar fundamentally to hyper_dmabuf in terms of these core sharing feature: 1. the sharing process - import prime/dmabuf from the producer -> extract underlying pages and get those shared -> return references for shared pages Another thing is danvet was kind of against to the idea of importing existing dmabuf/prime buffer and forward it to the other domain due to synchronization issues. He proposed to make hyper_dmabuf only work as an exporter so that it can have a full control over the buffer. I think we need to talk about this further as well. Yes, I saw this. But this limits the use-cases so much. For instance, running Android as a Guest (which uses ION to allocate buffers) means that finally HW composer will import dma-buf into the DRM driver. Then, in case of xen-front for example, it needs to be shared with the backend (Host side). 
Of course, we can change user-space to make xen-front allocate the buffers (make it exporter), but what we try to avoid is to change user-space which in normal world would have remain unchanged otherwise. So, I do think we have to support this use-case and just have to understand the complexity. Erm, why do you need importer capability for this use-case? guest1 -> ION -> xen-front -> hypervisor -> guest 2 -> xen-zcopy exposes that dma-buf -> import to the real display hw No where in this chain do you need xen-zcopy to be able to import a dma-buf (within linux, it needs to import a bunch of pages from the hypervisor). Now if your plan is to use xen-zcopy in the guest1 instead of xen-front, then you indeed need to import. This is the exact use-case I was referring to while saying we need to import on Guest1 side. If hyper-dmabuf is so generic that there is no xen-front in the picture, then it needs to import a dma-buf, so it can be exported at Guest2 side. But that imo doesn't make sense: - xen-front gives you clearly defined flip events you can forward to the hypervisor. xen-zcopy would need to add that again. xen-zcopy is a helper driver which doesn't handle page flips and is not a KMS driver as one might think of: the DRM UAPI it uses is just to export a dma-buf as a PRIME buffer, but that's it. Flipping etc. is done by the backend [1], not xen-zcopy. Same for hyperdmabuf (and really we're not going to shuffle struct dma_fence over the wire in a generic fashion between hypervisor guests). - xen-front already has the idea of pixel format for the buffer (and any other metadata). Again, xen-zcopy and hyperdmabuf lack that, would need to add it shoehorned in somehow. Again, here you are talking of something which is implemented in Xen display backend, not xen-zcopy, e.g. display backend can implement para-virtual display w/o xen-zcopy at all, but in this case there is a memory copying for each frame. 
With the help of xen-zcopy the backend feeds xen-front's buffers directly into Guest2 DRM/KMS or Weston or whatever, as xen-zcopy exports remote buffers as PRIME buffers, thus no buffer copying is required. Ofc you won't be able to shovel sound or media stream data over to another guest like this, but that's what you have xen-v4l and xen-sound or whatever else for. Trying to make a new uapi, which means userspace must be changed for all the different use-cases, instead of reusing standard linux driver uapi (which just happens to send the data to another hypervisor guest instead of real hw) imo just doesn't make much sense. Also, at least for the gpu subsystem: Any new uapi must have full userspace available for it, see: https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements Adding more uapi is definitely the most painful way to fix a use-case. Personally I'd go as far as to also change the xen-zcopy side on the receiving guest to use some standard linux uapi. E.g. you could write an output v4l driver to receive the frames from guest1. So, we now know that xen-zcopy was not meant to handle page flips, but to implement ne
Re: [RfC PATCH] Add udmabuf misc device
On 04/10/2018 08:26 PM, Dongwon Kim wrote: On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 09:57 PM, Dongwon Kim wrote: On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 02:57 PM, Gerd Hoffmann wrote: Hi, I fail to see any common ground for xen-zcopy and udmabuf ... Does the above mean you can assume that xen-zcopy and udmabuf can co-exist as two different solutions? Well, the udmabuf route isn't fully clear yet, but yes. See also gvt (intel vgpu), where the hypervisor interface is abstracted away into a separate kernel module even though most of the actual vgpu emulation code is common. Thank you for your input, I'm just trying to figure out how much the three z-copy solutions intersect. And what about hyper-dmabuf? The xen z-copy solution is fundamentally pretty similar to hyper_dmabuf in terms of these core sharing features: 1. the sharing process - import prime/dmabuf from the producer -> extract the underlying pages and get those shared -> return references for the shared pages Another thing is that danvet was kind of against the idea of importing an existing dmabuf/prime buffer and forwarding it to the other domain due to synchronization issues. He proposed to make hyper_dmabuf work only as an exporter so that it can have full control over the buffer. I think we need to talk about this further as well. Yes, I saw this. But this limits the use-cases so much. For instance, running Android as a Guest (which uses ION to allocate buffers) means that finally the HW composer will import the dma-buf into the DRM driver. Then, in case of xen-front for example, it needs to be shared with the backend (Host side). Of course, we can change user-space to make xen-front allocate the buffers (make it the exporter), but what we try to avoid is changing user-space which in a normal world would otherwise have remained unchanged. So, I do think we have to support this use-case and just have to understand the complexity. 
danvet, can you comment on this topic? 2. the page sharing mechanism - it uses the Xen grant table. And to give you a quick summary of the differences as far as I understand between the two implementations (please correct me if I am wrong, Oleksandr.) 1. xen-zcopy is DRM specific - it can import only a DRM prime buffer while hyper_dmabuf can export any dmabuf regardless of originator Well, this is true. And at the same time this is just a matter of extending the API: xen-zcopy is a helper driver designed for the xen-front/back use-case, so this is why it only has the DRM PRIME API 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs while (as danvet called it, remote dmabuf api sharing) hyper_dmabuf sends out a synchronization message to the exporting VM for synchronization. This is true. Again, this is because of the use-cases it covers. But having synchronization for a generic solution seems to be a good idea. Yeah, understood xen-zcopy works ok with your use case. But I am just curious if it is ok not to have any inter-domain synchronization in this sharing model. The synchronization is done with the displif protocol [1] The buffer being shared is technically a dma-buf and the originator needs to be able to keep track of it. As I am working in DRM terms the tracking is done by the DRM core for me for free. (This might be one of the reasons Daniel sees a DRM-based implementation as a very good fit from a code-reuse POV). 3. 1-level references - when using the grant table for sharing pages, there will be the same # of refs (each 8 bytes) To be precise, a grant ref is 4 bytes You are right. Thanks for the correction. ;) as # of shared pages, which is passed to userspace to be shared with the importing VM in case of xen-zcopy. The reason for that is that xen-zcopy is a helper driver, e.g. the grant references come from the display backend [1], which implements the Xen display protocol [2]. 
So, effectively the backend extracts references from the frontend's requests and passes those to xen-zcopy as an array of refs. Compared to this, hyper_dmabuf does multi-level addressing to generate only one reference id that represents all shared pages. In the protocol [2] only one reference to the gref directory is passed between VMs (and the gref directory is a singly-linked list of shared pages containing all of the grefs of the buffer). ok, good to know. I will look into its implementation in more detail, but is this gref directory (chained grefs) something that can be used for any general memory sharing use case or is it just for xen-display (in the current code base)? Not to mislead you: one grant ref is passed via the displif protocol, but the page it's referencing contains the rest of the grant refs. As to whether this can be used for any memory: yes. It is the same for the sndif and displif Xen protocols, but defined twice as, strictly speaking, sndif and displif are two separate protocols. While reviewing your RFC v2 one of the comments I had [2] was that if we can
Re: [RfC PATCH] Add udmabuf misc device
On 04/06/2018 09:57 PM, Dongwon Kim wrote: On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote: On 04/06/2018 02:57 PM, Gerd Hoffmann wrote: Hi, I fail to see any common ground for xen-zcopy and udmabuf ... Does the above mean you can assume that xen-zcopy and udmabuf can co-exist as two different solutions? Well, the udmabuf route isn't fully clear yet, but yes. See also gvt (intel vgpu), where the hypervisor interface is abstracted away into a separate kernel module even though most of the actual vgpu emulation code is common. Thank you for your input, I'm just trying to figure out how much the three z-copy solutions intersect. And what about hyper-dmabuf? The xen z-copy solution is fundamentally pretty similar to hyper_dmabuf in terms of these core sharing features: 1. the sharing process - import prime/dmabuf from the producer -> extract the underlying pages and get those shared -> return references for the shared pages 2. the page sharing mechanism - it uses the Xen grant table. And to give you a quick summary of the differences as far as I understand between the two implementations (please correct me if I am wrong, Oleksandr.) 1. xen-zcopy is DRM specific - it can import only a DRM prime buffer while hyper_dmabuf can export any dmabuf regardless of originator Well, this is true. And at the same time this is just a matter of extending the API: xen-zcopy is a helper driver designed for the xen-front/back use-case, so this is why it only has the DRM PRIME API 2. xen-zcopy doesn't seem to have dma-buf synchronization between two VMs while (as danvet called it, remote dmabuf api sharing) hyper_dmabuf sends out a synchronization message to the exporting VM for synchronization. This is true. Again, this is because of the use-cases it covers. But having synchronization for a generic solution seems to be a good idea. 3. 
1-level references - when using the grant table for sharing pages, there will be the same # of refs (each 8 bytes) To be precise, a grant ref is 4 bytes as # of shared pages, which is passed to userspace to be shared with the importing VM in case of xen-zcopy. The reason for that is that xen-zcopy is a helper driver, e.g. the grant references come from the display backend [1], which implements the Xen display protocol [2]. So, effectively the backend extracts references from the frontend's requests and passes those to xen-zcopy as an array of refs. Compared to this, hyper_dmabuf does multi-level addressing to generate only one reference id that represents all shared pages. In the protocol [2] only one reference to the gref directory is passed between VMs (and the gref directory is a singly-linked list of shared pages containing all of the grefs of the buffer). 4. inter VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-vm msg communication defined for dmabuf synchronization and private data (meta info that Matt Roper mentioned) exchange. This is true, xen-zcopy has no means for inter VM sync and meta-data, simply because it doesn't have any code for inter VM exchange in it, e.g. the inter VM protocol is handled by the backend [1]. 5. driver-to-driver notification (hyper_dmabuf only) - the importing VM gets notified when a new dmabuf is exported from another VM - a uevent can optionally be generated when this happens. 6. structure - hyper_dmabuf is targeting to provide a generic solution for inter-domain dmabuf sharing for most hypervisors, which is why it has two layers as mattrope mentioned, a front-end that contains a standard API and a backend that is specific to the hypervisor. Again, xen-zcopy is decoupled from inter VM communication No idea, didn't look at it in detail. Looks pretty complex from a distant view. Maybe because it tries to build a communication framework using dma-bufs instead of a simple dma-buf passing mechanism. 
we started with simple dma-buf sharing but realized there are many things we need to consider in real use-cases, so we added communication, notification and dma-buf synchronization, then re-structured it into a front-end and a back-end (this made things more complicated..) since Xen was not our only target. Also, we thought passing the reference for the buffer (hyper_dmabuf_id) is not secure so we added a uevent mechanism later. Yes, I am looking at it now, trying to figure out the full story and its implementation. BTW, the Intel guys were about to share some test application for hyper-dmabuf, maybe I have missed one. It could probably better explain the use-cases and the complexity they have in hyper-dmabuf. One example is actually on github. If you want to take a look at it, please visit: https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export Thank you, I'll have a look Like xen-zcopy it seems to depend on the idea that, as the hypervisor manages all memory, it is easy for guests to share pages with the help of the hypervisor. So, for xen-zcopy we were not trying to make it generic, it just solves display (dumb) zero-copying use-cases for X
Re: [RfC PATCH] Add udmabuf misc device
On 04/06/2018 02:57 PM, Gerd Hoffmann wrote: Hi, I fail to see any common ground for xen-zcopy and udmabuf ... Does the above mean you can assume that xen-zcopy and udmabuf can co-exist as two different solutions? Well, the udmabuf route isn't fully clear yet, but yes. See also gvt (intel vgpu), where the hypervisor interface is abstracted away into a separate kernel module even though most of the actual vgpu emulation code is common. Thank you for your input, I'm just trying to figure out how much the three z-copy solutions intersect. And what about hyper-dmabuf? No idea, didn't look at it in detail. Looks pretty complex from a distant view. Maybe because it tries to build a communication framework using dma-bufs instead of a simple dma-buf passing mechanism. Yes, I am looking at it now, trying to figure out the full story and its implementation. BTW, the Intel guys were about to share some test application for hyper-dmabuf, maybe I have missed one. It could probably better explain the use-cases and the complexity they have in hyper-dmabuf. Like xen-zcopy it seems to depend on the idea that, as the hypervisor manages all memory, it is easy for guests to share pages with the help of the hypervisor. So, for xen-zcopy we were not trying to make it generic, it just solves display (dumb) zero-copying use-cases for Xen. We implemented it as a DRM helper driver because we can't see any other use-cases as of now. For example, we also have a Xen para-virtualized sound driver, but its buffer memory usage is not comparable to what display wants and it works somewhat differently (e.g. there is no "frame done" event, so one can't tell when the sound buffer can be "flipped"). At the same time, we do not use virtio-gpu, so this could probably be one more candidate for shared dma-bufs some day. Which simply isn't the case on kvm. hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf could build on top of xen-zcopy. 
Hm, I can imagine that: xen-zcopy could be library code for hyper-dmabuf in terms of implementing all that page sharing fun in multiple directions, e.g. Host->Guest, Guest->Host, Guest<->Guest. But I'll let Matt and Dongwon comment on that. cheers, Gerd Thank you, Oleksandr P.S. Sorry for making your original mail thread discuss things much broader than your RFC...
Re: [RfC PATCH] Add udmabuf misc device
On 04/06/2018 12:07 PM, Gerd Hoffmann wrote: I'm not sure we can create something which works on both kvm and xen. The memory management model is quite different ... On xen the hypervisor manages all memory. Guests can allow other guests to access specific pages (using grant tables). In theory any guest <=> guest communication is possible. In practice it is mostly guest <=> dom0, because guests access their virtual hardware that way. dom0 is the privileged guest which owns any hardware not managed by xen itself. Xen guests can ask the hypervisor to update the mapping of guest physical pages. They can balloon down (unmap and free pages). They can balloon up (ask the hypervisor to map fresh pages). They can map pages exported by other guests using grant tables. xen-zcopy makes heavy use of this. It balloons down, to make room in the guest physical address space, then maps the exported pages there, and finally composes a dma-buf. On kvm qemu manages all guest memory. qemu also has all guest memory mapped, so a grant-table-like mechanism isn't needed to implement virtual devices. qemu can decide how it backs memory for the guest. qemu propagates the guest memory map to the kvm driver in the linux kernel. kvm guests have some control over the guest memory map, for example they can map pci bars wherever they want in their guest physical address space by programming the base registers accordingly, but unlike xen guests they can't ask the host to remap individual pages. Due to qemu having all guest memory mapped, virtual devices are typically designed to have the guest allocate resources, then notify the host where they are located. This is where the udmabuf idea comes from: the guest tells the host (qemu) where the gem object is, and qemu then can create a dmabuf backed by those pages to pass it on to other processes such as the wayland display server. Possibly even without the guest explicitly asking for it, i.e. 
export the framebuffer placed by the guest in the (virtual) vga pci memory bar as a dma-buf. And I can imagine that this is useful outside virtualization too. I fail to see any common ground for xen-zcopy and udmabuf ... Does the above mean you can assume that xen-zcopy and udmabuf can co-exist as two different solutions? And what about hyper-dmabuf? Thank you, Oleksandr
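The Xen-side sequence Gerd describes above (balloon down, map the other guest's pages into the freed hole, compose a dma-buf) can be outlined in kernel-style pseudocode. The helpers named here are real kernel/Xen APIs, but the signatures are simplified and all setup and error handling is elided, so treat this as an outline of the technique, not the driver's actual code:

```c
/* Pseudocode outline of xen-zcopy's "balloon down, map, export" flow.
 * alloc_xenballooned_pages, gnttab_map_refs and drm_gem_prime_export
 * exist in the kernel, but arguments are abridged here. */
struct page **pages;

/* 1. Balloon down: punch a hole in the guest physical address space
 *    and get backing struct pages for it. */
alloc_xenballooned_pages(nr_pages, pages);

/* 2. Map the remote guest's pages into that hole, using the grant
 *    refs the backend extracted from the frontend's requests. */
gnttab_map_refs(map_ops, NULL, pages, nr_pages);

/* 3. Wrap the mapped pages in a GEM object and export it as a PRIME
 *    dma-buf for the local DRM/KMS stack or Weston to consume. */
dmabuf = drm_gem_prime_export(drm_dev, gem_obj, 0);
```

Releasing the buffer reverses the steps: unmap the grants, then balloon the pages back up, which is why the protocol needs a "buffer released" handshake before the exporting side may reuse the pages.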
Re: [RfC PATCH] Add udmabuf misc device
On 04/06/2018 12:07 PM, Gerd Hoffmann wrote: Hi, * The general interface should be able to express sharing from any guest:guest, not just guest:host. Arbitrary G:G sharing might be something some hypervisors simply aren't able to support, but the userspace API itself shouldn't make assumptions or restrict that. I think ideally the sharing API would include some kind of query_targets interface that would return a list of VM's that your current OS is allowed to share with; that list would depend on the policy established by the system integrator, but obviously wouldn't include targets that the hypervisor itself wouldn't be capable of handling. Can you give a use-case for this? I mean that the system integrator is the one who defines which guests/hosts talk to each other, but querying means that it is possible that VMs have some sort of discovery mechanism, so they can decide on their own whom to connect to. Note that vsock (created by vmware, these days also has a virtio transport for kvm) started with support for both guest <=> host and guest <=> guest support. But later on guest <=> guest was dropped. As far as I know the reasons were (a) lack of use cases and (b) security. So, I likewise would like to know more details on the use cases you have in mind here. Unless we have a compelling use case here I'd suggest to drop the guest <=> guest requirement as it makes the whole thing a lot more complex. This is exactly the use-case we have: in our setup Dom0 doesn't own any HW at all and all the HW is passed into a dedicated driver domain (DomD) which is still a guest domain. Then, buffers are shared between two guests, for example, DomD and DomA (Android guest) * The sharing API could be used to share multiple kinds of content in a single system. The sharing sink driver running in the content producer's VM should accept some additional metadata that will be passed over to the target VM as well. Not sure this should be part of hyper-dmabuf. 
A dma-buf is nothing but a block of data, period. Therefore protocols with dma-buf support (wayland for example) typically already send over metadata describing the content, so duplicating that in hyper-dmabuf looks pointless. 1. We are targeting ARM and one of the major requirements for the buffer sharing is the ability to allocate physically contiguous buffers, which gets even more complicated for systems not backed with an IOMMU. So, for some use-cases it is enough to make the buffers contiguous in terms of IPA and sometimes those need to be contiguous in terms of PA. Which pretty much implies the host must do the allocation. 2. For Xen we would love to see UAPI to create a dma-buf from grant references provided, so we can use this generic solution to implement zero-copying without breaking the existing Xen protocols. This can probably be extended to other hypervisors as well. I'm not sure we can create something which works on both kvm and xen. The memory management model is quite different ... On xen the hypervisor manages all memory. Guests can allow other guests to access specific pages (using grant tables). In theory any guest <=> guest communication is possible. In practice it is mostly guest <=> dom0, because guests access their virtual hardware that way. dom0 is the privileged guest which owns any hardware not managed by xen itself. Please see above for our setup with DomD and Dom0 being a generic ARMv8 domain, no HW Xen guests can ask the hypervisor to update the mapping of guest physical pages. They can balloon down (unmap and free pages). They can balloon up (ask the hypervisor to map fresh pages). They can map pages exported by other guests using grant tables. xen-zcopy makes heavy use of this. It balloons down, to make room in the guest physical address space, then maps the exported pages there, and finally composes a dma-buf. This is what it does On kvm qemu manages all guest memory. 
qemu also has all guest memory mapped, so a grant-table like mechanism isn't needed to implement virtual devices. qemu can decide how it backs memory for the guest. qemu propagates the guest memory map to the kvm driver in the linux kernel. kvm guests have some control over the guest memory map, for example they can map pci bars wherever they want in their guest physical address space by programming the base registers accordingly, but unlike xen guests they can't ask the host to remap individual pages. Due to qemu having all guest memory mapped virtual devices are typically designed to have the guest allocate resources, then notify the host where they are located. This is where the udmabuf idea comes from: Guest tells the host (qemu) where the gem object is, and qemu then can create a dmabuf backed by those pages to pass it on to other processes such as the wayland display server. Possibly even without the guest explicitly asking for it, i.e. export the framebuffer placed by the
Re: [RfC PATCH] Add udmabuf misc device
backend hypervisor-specific stuff. I'm not super familiar with xen-zcopy Let me describe the rationale and some implementation details of the Xen zero-copy driver I posted recently [1]. The main requirement for us to implement such a helper driver was the ability to avoid memory copying for large buffers in display use-cases. This is why we only focused on DRM use-cases, not trying to implement something generic. This is why the driver is somewhat coupled with the Xen para-virtualized DRM driver [2] via the grant references sharing mechanism of the Xen para-virtual display protocol [3], e.g. the backend receives an array of Xen grant references to the frontend's buffer pages. These grant references are then used to construct a PRIME buffer. The same mechanism is used when the backend shares a buffer with the frontend, but in the other direction. More details on the UAPI of the driver are available at [1]. So, when discussing a possibility to share dma-bufs in a generic way I would also like to have the following considered: 1. We are targeting ARM and one of the major requirements for the buffer sharing is the ability to allocate physically contiguous buffers, which gets even more complicated for systems not backed with an IOMMU. So, for some use-cases it is enough to make the buffers contiguous in terms of IPA and sometimes those need to be contiguous in terms of PA. (The use-case is that you use Wayland-DRM/KMS or share the buffer with a driver implemented with the DRM CMA helpers). 2. For Xen we would love to see UAPI to create a dma-buf from grant references provided, so we can use this generic solution to implement zero-copying without breaking the existing Xen protocols. This can probably be extended to other hypervisors as well. Thank you, Oleksandr Andrushchenko and udmabuf, but it sounds like they're approaching similar problems from slightly different directions, so we should make sure we can come up with something that satisfies everyone's requirements. 
Matt On Wed, Mar 14, 2018 at 9:03 AM, Gerd Hoffmann <kra...@redhat.com> wrote: Hi, Either mlock accounting (because it's mlocked de facto), and get_user_pages won't do that for you. Or you write the full-blown userptr implementation, including mmu_notifier support (see i915 or amdgpu), but that also requires Christian König's latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr buffers is a no-go). I guess I'll look at mlock accounting for starters then. Easier for now, and leaves the door open to switch to userptr later as this should be transparent to userspace. Known issue: Driver API isn't complete yet. Need to add some flags, for example to support read-only buffers. dma-buf has no concept of read-only. I don't think we can even enforce that (not many iommus can enforce this iirc), so we pretty much need to require r/w memory. Ah, ok. Just saw the 'write' arg for get_user_pages_fast and figured we might support that, but if iommus can't handle that anyway it's pointless indeed. Cc: David Airlie <airl...@linux.ie> Cc: Tomeu Vizoso <tomeu.viz...@collabora.com> Signed-off-by: Gerd Hoffmann <kra...@redhat.com> btw there's also the hyperdmabuf stuff from the xen folks, but imo their solution of forwarding the entire dma-buf api is over the top. This here looks _much_ better, pls cc all the hyperdmabuf people on your next version. Fun fact: googling for "hyperdmabuf" found me your mail and nothing else :-o (Trying "hyper dmabuf" instead worked better then). Yes, will cc them on the next version. Not sure it'll help much on xen though due to the memory management being very different. Basically xen owns the memory, not the kernel of the control domain (dom0), so creating dmabufs for guest memory chunks isn't that simple ... Also it's not clear whether they really need guest -> guest exports or guest -> dom0 exports. Overall I like the idea, but too lazy to review. Cool. General comments on the idea were all I was looking for for the moment. 
Spare your review cycles for the next version ;) Oh, some kselftests for this stuff would be lovely. I'll look into it. thanks, Gerd ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch [1] https://patchwork.freedesktop.org/series/40880/ [2] https://cgit.freedesktop.org/drm/drm-misc/commit/?id=c575b7eeb89f94356997abd62d6d5a0590e259b7 [3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/displif.h