Re: [PATCH] reservation: don't wait when timeout=0
On Tue, Nov 21, 2017 at 9:38 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote: > Quoting Rob Clark (2017-11-21 14:08:46) >> If we are testing if a reservation object's fences have been >> signaled with timeout=0 (non-blocking), we need to pass 0 for >> timeout to dma_fence_wait_timeout(). >> >> Plus bonus spelling correction. >> >> Signed-off-by: Rob Clark <robdcl...@gmail.com> >> --- >> drivers/dma-buf/reservation.c | 11 +-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c >> index dec3a815455d..71f51140a9ad 100644 >> --- a/drivers/dma-buf/reservation.c >> +++ b/drivers/dma-buf/reservation.c >> @@ -420,7 +420,7 @@ EXPORT_SYMBOL_GPL(reservation_object_get_fences_rcu); >> * >> * RETURNS >> * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or >> - * greater than zer on success. >> + * greater than zero on success. >> */ >> long reservation_object_wait_timeout_rcu(struct reservation_object *obj, >> bool wait_all, bool intr, >> @@ -483,7 +483,14 @@ long reservation_object_wait_timeout_rcu(struct >> reservation_object *obj, >> goto retry; >> } >> >> - ret = dma_fence_wait_timeout(fence, intr, ret); >> + /* >> +* Note that dma_fence_wait_timeout() will return 1 if >> +* the fence is already signaled, so in the wait_all >> +* case when we go through the retry loop again, ret >> +* will be greater than 0 and we don't want this to >> +* cause _wait_timeout() to block >> +*/ >> + ret = dma_fence_wait_timeout(fence, intr, timeout ? ret : 0); > > One should ask if we should just fix the interface to stop returning > incorrect results (stop "correcting" a completion with 0 jiffies remaining > as 1). A timeout can be distinguished by -ETIME (or your pick of errno). perhaps -EBUSY, if we go that route (although maybe it should be a follow-on patch, this one is suitable for backport to stable/lts if one should so choose..) 
I think the current approach was chosen to match schedule_timeout() and other such functions that take a timeout in jiffies. Not making a judgement on whether that is a good or bad reason..

BR,
-R

> -Chris
[PATCH] reservation: don't wait when timeout=0
If we are testing if a reservation object's fences have been signaled with timeout=0 (non-blocking), we need to pass 0 for timeout to dma_fence_wait_timeout(). Plus bonus spelling correction. Signed-off-by: Rob Clark <robdcl...@gmail.com> --- drivers/dma-buf/reservation.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index dec3a815455d..71f51140a9ad 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -420,7 +420,7 @@ EXPORT_SYMBOL_GPL(reservation_object_get_fences_rcu); * * RETURNS * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or - * greater than zer on success. + * greater than zero on success. */ long reservation_object_wait_timeout_rcu(struct reservation_object *obj, bool wait_all, bool intr, @@ -483,7 +483,14 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj, goto retry; } - ret = dma_fence_wait_timeout(fence, intr, ret); + /* +* Note that dma_fence_wait_timeout() will return 1 if +* the fence is already signaled, so in the wait_all +* case when we go through the retry loop again, ret +* will be greater than 0 and we don't want this to +* cause _wait_timeout() to block +*/ + ret = dma_fence_wait_timeout(fence, intr, timeout ? ret : 0); dma_fence_put(fence); if (ret > 0 && wait_all && (i + 1 < shared_count)) goto retry; -- 2.13.6
Re: DRM Format Modifiers in v4l2
On Fri, Sep 1, 2017 at 3:13 AM, Laurent Pinchart wrote: > Hi Nicolas, > > On Thursday, 31 August 2017 19:12:58 EEST Nicolas Dufresne wrote: >> On Thursday, 31 August 2017 at 17:28 +0300, Laurent Pinchart wrote: >> >> e.g. if I have two devices which support MODIFIER_FOO, I could attempt >> >> to share a buffer between them which uses MODIFIER_FOO without >> >> necessarily knowing exactly what it is/does. >> > >> > Userspace could certainly set modifiers blindly, but the point of >> > modifiers is to generate side effects beneficial to the use case at hand >> > (for instance by optimizing the memory access pattern). To use them >> > meaningfully userspace would need to have at least an idea of the side >> > effects they generate. >> >> Generic userspace will basically pick some random combination. > > In that case userspace could set no modifier at all by default (except in the > case where unmodified formats are not supported by the hardware, but I don't > expect that to be the most common case). > >> To allow generically picking the optimal configuration we could indeed rely >> on the application knowledge, but we could also enhance the spec so that >> the order in the enumeration becomes meaningful. > > I'm not sure how far we should go. I could imagine a system where the API > would report capabilities for modifiers (e.g. this modifier lowers the > bandwidth, this one enhances the quality, ...), but going in that direction, > where do we stop? In practice I expect userspace to know some information > about the hardware, so I'd rather avoid over-engineering the API. > I think in the (hopefully not too) long term, something like https://github.com/cubanismo/allocator/ is the way forward. That doesn't quite solve how the v4l2 kernel part sorts out, with the corresponding userspace .so, what is preferable, but at least that is compartmentalized to v4l2.. on the gl/vk side of things there will ofc be a hardware-specific userspace part that knows what it prefers. 
For v4l2, it probably makes sense to sort out what the userspace level API is and work backwards from there, rather than risk trying to design a kernel uapi that might turn out to be the wrong thing. BR, -R
Re: [PATCH] media: venus: hfi: fix error handling in hfi_sys_init_done()
On Sun, Jul 9, 2017 at 3:49 PM, Stanimir Varbanov <stanimir.varba...@linaro.org> wrote: > Hi Rob, > > On 07/09/2017 04:19 PM, Rob Clark wrote: >> Not entirely sure what triggers it, but with venus build as kernel >> module and in initrd, we hit this crash: > > Is it happens occasionally or everytime in the initrd? And also with > your patch it will bail out on venus_probe, does it crash again on next > venus_probe? seems to happen every time.. (module is ending up in initrd, but not sure if fw is.. which might be triggering this?) I could not boot successfully without this patch, but otoh I haven't yet tried if venus actually works after boot. BR, -R >> >> Unable to handle kernel paging request at virtual address 80003c039000 >> pgd = 0a14f000 >> [80003c039000] *pgd=bd9f7003, *pud=bd9f6003, >> *pmd=bd9f0003, *pte= >> Internal error: Oops: 9607 [#1] SMP >> Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) >> venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) >> cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) >> snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) >> snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) >> videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) sndi_pcm(E) >> videodev(E) media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) >> spi_qup(E) mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) >> msm_rng(E) uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) >> adv7511(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) >> fb_sys_fops(E) qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) >> pinctrl_spmi_mpp(E) >>pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) >> sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) >> regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) >> phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) >> 
i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) >> rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E) >> CPU: 2 PID: 551 Comm: irq/150-venus Tainted: PE 4.12.0+ #1625 >> Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS >> 2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017 >> task: 800037338000 task.stack: 800038e0 >> PC is at hfi_sys_init_done+0x64/0x140 [venus_core] >> LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core] >> pc : [] lr : [] pstate: 20400145 >> sp : 800038e03c60 >> x29: 800038e03c60 x28: >> x27: 000df018 x26: 0118f4d0 >> x25: 00020003 x24: 80003a8d3010 >> x23: 0118f760 x22: 800037b40028 >> x21: 8000382981f0 x20: 800037b40028 >> x19: 80003c039000 x18: 0020 >> x17: x16: 800037338000 >> x15: x14: 00100014 >> x13: 00011007 x12: 00010020 >> x11: 100e x10: 0001 >> x9 : 0002 x8 : 00140001 >> x7 : 1010 x6 : 0148 >> x5 : 1009 x4 : 80003c039000 >> x3 : cd770abb x2 : 0042 >> x1 : 0788 x0 : 0002 >> Process irq/150-venus (pid: 551, stack limit = 0x800038e0) >> Call trace: >> [] hfi_sys_init_done+0x64/0x140 [venus_core] >> [] hfi_process_msg_packet+0xcc/0x1e8 [venus_core] >> [] venus_isr_thread+0x1b4/0x208 [venus_core] >> [] hfi_isr_thread+0x28/0x38 [venus_core] >> [] irq_thread_fn+0x30/0x70 >> [] irq_thread+0x14c/0x1c8 >> [] kthread+0x138/0x140 >> [] ret_from_fork+0x10/0x40 >> Code: 52820125 52820207 7a431820 54000249 (b9400263) >> ---[ end trace c963460f20a984b6 ]--- >> >> The problem is that in the error case, we've incremented the data ptr >> but not decremented rem_bytes, and keep reading (presumably garbage) >> until eventually we go beyond the end of the buffer. >> >> Instead, on first error, we should probably just bail out. Other >> option is to increment read_bytes by sizeof(u32) before the switch, >> rather than only accounting for the ptype header in the non-error >> case. Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER, >> ie. 
an unrecognized/unsupported parameter, so interpreting the next >> word as a property type would be bogus. The other error cases are due to >> truncated buffer, so there isn't likely to be anything valid to interpret >> in the remainder of the buffer. So just bailing seems like a reasonable >> solution.
[PATCH] media: venus: hfi: fix error handling in hfi_sys_init_done()
Not entirely sure what triggers it, but with venus build as kernel module and in initrd, we hit this crash: Unable to handle kernel paging request at virtual address 80003c039000 pgd = 0a14f000 [80003c039000] *pgd=bd9f7003, *pud=bd9f6003, *pmd=bd9f0003, *pte= Internal error: Oops: 9607 [#1] SMP Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) snd_pcm(E) videodev(E) media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) spi_qup(E) mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) msm_rng(E) uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) adv7511(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) pinctrl_spmi_mpp(E) pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E) CPU: 2 PID: 551 Comm: irq/150-venus Tainted: PE 4.12.0+ #1625 Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017 task: 800037338000 task.stack: 800038e0 PC is at hfi_sys_init_done+0x64/0x140 [venus_core] LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core] pc : [] lr : [] pstate: 20400145 sp : 800038e03c60 x29: 800038e03c60 x28: x27: 000df018 x26: 0118f4d0 x25: 00020003 x24: 80003a8d3010 x23: 0118f760 x22: 800037b40028 
x21: 8000382981f0 x20: 800037b40028 x19: 80003c039000 x18: 0020 x17: x16: 800037338000 x15: x14: 00100014 x13: 00011007 x12: 00010020 x11: 100e x10: 0001 x9 : 0002 x8 : 00140001 x7 : 1010 x6 : 0148 x5 : 1009 x4 : 80003c039000 x3 : cd770abb x2 : 0042 x1 : 0788 x0 : 0002 Process irq/150-venus (pid: 551, stack limit = 0x800038e0) Call trace: [] hfi_sys_init_done+0x64/0x140 [venus_core] [] hfi_process_msg_packet+0xcc/0x1e8 [venus_core] [] venus_isr_thread+0x1b4/0x208 [venus_core] [] hfi_isr_thread+0x28/0x38 [venus_core] [] irq_thread_fn+0x30/0x70 [] irq_thread+0x14c/0x1c8 [] kthread+0x138/0x140 [] ret_from_fork+0x10/0x40 Code: 52820125 52820207 7a431820 54000249 (b9400263) ---[ end trace c963460f20a984b6 ]--- The problem is that in the error case, we've incremented the data ptr but not decremented rem_bytes, and keep reading (presumably garbage) until eventually we go beyond the end of the buffer. Instead, on first error, we should probably just bail out. Other option is to increment read_bytes by sizeof(u32) before the switch, rather than only accounting for the ptype header in the non-error case. Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER, ie. an unrecognized/unsupported parameter, so interpreting the next word as a property type would be bogus. The other error cases are due to truncated buffer, so there isn't likely to be anything valid to interpret in the remainder of the buffer. So just bailing seems like a reasonable solution. 
Signed-off-by: Rob Clark <robdcl...@gmail.com> --- drivers/media/platform/qcom/venus/hfi_msgs.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/media/platform/qcom/venus/hfi_msgs.c b/drivers/media/platform/qcom/venus/hfi_msgs.c index debf80a92797..4190825b20a1 100644 --- a/drivers/media/platform/qcom/venus/hfi_msgs.c +++ b/drivers/media/platform/qcom/venus/hfi_msgs.c @@ -239,11 +239,12 @@ static void hfi_sys_init_done(struct venus_core *core, struct venus_inst *inst, break; } - if (!error) { - rem_bytes -= read_bytes; - data += read_bytes; - num_properties--; - } + if (error) + break; + + rem_bytes -= read_bytes; + data += read_bytes; + num_properties--; } err_no_prop: -- 2.13.0
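As a sanity check on the bail-out logic, here is a self-contained model of the parse loop (plain C; the property IDs and one-word payloads are made up, not the real HFI layout) showing that breaking on the first unknown property stops the walk instead of marching `data` forward while `rem_bytes` never shrinks:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical property IDs standing in for the HFI ones. */
enum { PROP_CODECS = 1, PROP_MAX_SESSIONS = 2 };

/* Walk a buffer of { u32 ptype; payload... } records, mirroring the
 * hfi_sys_init_done() loop. Returns the number of properties parsed;
 * bails out on the first unrecognized property, since the following
 * word cannot be trusted to be a property type. */
static int parse_props(const uint32_t *data, size_t rem_bytes,
		       unsigned int num_properties)
{
	int parsed = 0;

	while (num_properties && rem_bytes >= sizeof(uint32_t)) {
		size_t read_bytes = sizeof(uint32_t);	/* the ptype header */
		int error = 0;

		switch (data[0]) {
		case PROP_CODECS:
		case PROP_MAX_SESSIONS:
			read_bytes += sizeof(uint32_t);	/* one-word payload */
			parsed++;
			break;
		default:
			error = 1;	/* unknown: next word is NOT a ptype */
			break;
		}

		if (error)
			break;	/* the fix: stop instead of spinning past the end */

		rem_bytes -= read_bytes;
		data += read_bytes / sizeof(uint32_t);
		num_properties--;
	}
	return parsed;
}
```

With the pre-patch behavior (only advancing on success, but looping anyway), an unknown ptype re-reads the same word forever or walks off the buffer; bailing on first error terminates cleanly in both the corrupt and the clean case.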
Re: [RFC PATCH 00/12] Ion cleanup in preparation for moving out of staging
On Mon, Mar 13, 2017 at 5:09 PM, Laura Abbott wrote: >> Hm, we might want to expose all the heaps as individual >> /dev/ion_$heapname nodes? Should we do this from the start, since >> we're massively revamping the uapi anyway (imo not needed, current >> state seems to work too)? >> -Daniel >> > > I thought about that. One advantage with separate /dev/ion_$heap > is that we don't have to worry about a limit of 32 possible > heaps per system (32-bit heap id allocation field). But dealing > with an ioctl seems easier than names. Userspace might be less > likely to hardcode random id numbers vs. names as well. The other advantage, I think, is selinux (brought up elsewhere on this thread).. heaps at known fixed PAs are useful for certain sorts of attacks, so being able to restrict access more easily seems like a good thing.

BR,
-R
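For reference, the 32-heap ceiling discussed above comes from heaps being addressed by bit position in a 32-bit mask in the allocation ioctl. A toy first-match selector over such a mask (`pick_heap()` is illustrative, not actual ion code):

```c
#include <stdint.h>

/* Toy model of ion-style heap selection: userspace passes a 32-bit
 * heap id mask, the kernel intersects it with the heaps that exist,
 * and takes the first match -- hence the hard cap of 32 heaps per
 * system that separate /dev/ion_$heap nodes would avoid. */
static int pick_heap(uint32_t heap_id_mask, uint32_t available_heaps)
{
	uint32_t usable = heap_id_mask & available_heaps;
	int id;

	for (id = 0; id < 32; id++)
		if (usable & (1u << id))
			return id;	/* lowest acceptable heap id wins */
	return -1;			/* no heap satisfies the request */
}
```

Per-heap device nodes would replace these mask bits with filesystem permissions and SELinux labels, which is exactly the access-control point raised in the reply above.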
Re: [RFC PATCH 00/12] Ion cleanup in preparation for moving out of staging
On Fri, Mar 10, 2017 at 7:40 AM, Daniel Vetter wrote: > On Fri, Mar 10, 2017 at 10:31:13AM +, Brian Starkey wrote: >> Hi, >> >> On Thu, Mar 09, 2017 at 09:38:49AM -0800, Laura Abbott wrote: >> > On 03/09/2017 02:00 AM, Benjamin Gaignard wrote: >> >> [snip] >> >> > > >> > > For me those patches are going in the right direction. >> > > >> > > I still have a few questions: >> > > - since alignment management has been removed from ion-core, should it >> > > be also removed from the ioctl structure? >> > >> > Yes, I think I'm going to go with the suggestion to fixup the ABI >> > so we don't need the compat layer and as part of that I'm also >> > dropping the align argument. >> > >> >> Is the only motivation for removing the alignment parameter that >> no-one got around to using it for something useful yet? >> The original comment was true - different devices do have different >> alignment requirements. >> >> Better alignment can help SMMUs use larger blocks when mapping, >> reducing TLB pressure and the chance of a page table walk causing >> display underruns. > > Extending ioctl uapi is easy, trying to get rid of bad uapi is much > harder. Given that right now we don't have an ion allocator that does > alignment I think removing it makes sense. And if we go with lots of > heaps, we might as well have an ion heap per alignment that your hw needs, > so there's different ways to implement this in the future. Slight correction: if you plan ahead (and do things like zero-init if userspace passes in a smaller ioctl struct, like drm_ioctl does), extending ioctl uapi is easy.. might be something worth fixing from the get-go..

BR,
-R

> At least from the unix device memory allocator pov it's probably simpler > to encode stuff like this into the heap name, instead of having to pass > heap + list of additional properties/constraints. 
> -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
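The zero-init trick Rob refers to can be sketched in isolation (userspace model: `memcpy` stands in for copy_from_user(), and the struct names are hypothetical): copy min(user size, kernel size) bytes and zero the tail, so a field appended in v2 reads as 0 when old userspace passes the shorter v1 struct:

```c
#include <stdint.h>
#include <string.h>

/* drm_ioctl()-style forward compatibility: copy what userspace gave us
 * and zero-fill the rest, so newly appended fields default to 0 for old
 * binaries. memcpy models copy_from_user(). */
static void compat_copy_args(void *kdst, size_t ksize,
			     const void *usrc, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;

	memcpy(kdst, usrc, n);
	if (n < ksize)
		memset((char *)kdst + n, 0, ksize - n);	/* new fields = 0 */
}

/* Hypothetical v1 of an allocation ioctl struct... */
struct alloc_args_v1 {
	uint64_t size;
};

/* ...and v2, which appended fields without breaking v1 callers,
 * provided 0 was defined to mean "old behavior" from the start. */
struct alloc_args_v2 {
	uint64_t size;
	uint32_t flags;
	uint32_t align;
};
```

(The real drm_ioctl() handles more cases; this shows only the smaller-user-struct direction Rob mentions. The point is that 0 must be defined as "previous behavior" in the very first uapi version, rather than bolting a compat layer on later.)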
Re: [RFC 0/6] Module for tracking/accounting shared memory buffers
On Tue, Oct 11, 2016 at 7:50 PM, Ruchi Kandoi wrote: > This patchstack introduces a new "memtrack" module for tracking and accounting > memory exported to userspace as shared buffers, like dma-buf fds or GEM > handles. btw, I wouldn't care much about the non-dmabuf case.. dri2/flink is kind of legacy and the sharing patterns there are not so complex that we have found the need for any more elaborate debug infrastructure than what we already have. Between existing dmabuf debugfs, and /proc/*/maps (and /proc/*/fd?), I wonder what is missing? Maybe there is a less intrusive way to get at the debugging info you want?

BR,
-R

> Any process holding a reference to these buffers will keep the kernel from > reclaiming its backing pages. mm counters don't provide a complete picture of > these allocations, since they only account for pages that are mapped into a > process's address space. This problem is especially bad for systems like > Android that use dma-buf fds to share graphics and multimedia buffers between > processes: these allocations are often large, have complex sharing patterns, > and are rarely mapped into every process that holds a reference to them. > > memtrack maintains a per-process list of shared buffer references, which is > exported to userspace as /proc/[pid]/memtrack. Buffers can be optionally > "tagged" with a short string: for example, Android userspace would use this > tag to identify whether buffers were allocated on behalf of the camera stack, > GL, etc. memtrack also exports the VMAs associated with these buffers so > that pages already included in the process's mm counters aren't > double-counted. > > Shared-buffer allocators can hook into memtrack by embedding > struct memtrack_buffer in their buffer metadata, calling > memtrack_buffer_{init,remove} at buffer allocation and free time, and > memtrack_buffer_{install,uninstall} when a userspace process takes or > drops a reference to the buffer. 
For fd-backed buffers like dma-bufs, hooks > in > fdtable.c and fork.c automatically notify memtrack when references are added > or > removed from a process's fd table. > > This patchstack adds memtrack hooks into dma-buf and ion. If there's upstream > interest in memtrack, it can be extended to other memory allocators as well, > such as GEM implementations. > > Greg Hackmann (1): > drivers: staging: ion: add ION_IOC_TAG ioctl > > Ruchi Kandoi (5): > fs: add installed and uninstalled file_operations > drivers: misc: add memtrack > dma-buf: add memtrack support > memtrack: Adds the accounting to keep track of all mmaped/unmapped > pages. > memtrack: Add memtrack accounting for forked processes. > > drivers/android/binder.c| 4 +- > drivers/dma-buf/dma-buf.c | 37 +++ > drivers/misc/Kconfig| 16 + > drivers/misc/Makefile | 1 + > drivers/misc/memtrack.c | 516 > > drivers/staging/android/ion/ion-ioctl.c | 17 ++ > drivers/staging/android/ion/ion.c | 60 +++- > drivers/staging/android/ion/ion_priv.h | 2 + > drivers/staging/android/uapi/ion.h | 25 ++ > fs/file.c | 38 ++- > fs/open.c | 2 +- > fs/proc/base.c | 4 + > include/linux/dma-buf.h | 5 + > include/linux/fdtable.h | 4 +- > include/linux/fs.h | 2 + > include/linux/memtrack.h| 130 > include/linux/mm.h | 3 + > include/linux/sched.h | 3 + > kernel/fork.c | 23 +- > 19 files changed, 875 insertions(+), 17 deletions(-) > create mode 100644 drivers/misc/memtrack.c > create mode 100644 include/linux/memtrack.h > > -- > 2.8.0.rc3.226.g39d4020 > -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 0/3] Secure Memory Allocation Framework
probably should keep the discussion on github (USAGE.md was updated a bit more and merged into https://github.com/cubanismo/allocator so look there for the latest).. but briefly: 1) my expectation is if the user is implementing some use-case, it knows what devices and APIs are involved, otherwise it wouldn't be able to pass a buffer to that device/API.. 2) if assertion/usage/devices haven't changed, you can re-use the merged caps for however many buffer allocations

BR,
-R

On Fri, Oct 7, 2016 at 9:54 AM, Benjamin Gaignard <benjamin.gaign...@linaro.org> wrote: > Rob, > > how do you know which devices are concerned when listing the constraints? > Is combine_capabilities done for each allocation, or can it be cached? > > Regards, > Benjamin > > 2016-10-06 18:54 GMT+02:00 Rob Clark <robdcl...@gmail.com>: >> so there is discussion about a "central userspace allocator" (ie. more >> like a common userspace API that could be implemented on top of >> various devices/APIs) to decide in a generic way which device could >> allocate. >> >> https://github.com/cubanismo/allocator >> >> and I wrote up some rough thoughts/proposal about how the usage might >> look.. just rough, so don't try to compile it or anything, and not >> consensus yet so it will probably change/evolve.. >> >> https://github.com/robclark/allocator/blob/master/USAGE.md >> >> I think ion could be just another device to share buffers with, which >> happens to not impose any specific constraints. How "liballoc-ion.so" >> backend figures out how to map constraints/usage to a heap is a bit >> hand-wavey at the moment. >> >> BR, >> -R >> >> On Wed, Oct 5, 2016 at 9:40 AM, Benjamin Gaignard >> <benjamin.gaign...@linaro.org> wrote: >>> because with ion it is up to userland to decide which heap to use >>> and until now userland doesn't have any way to get device constraints... >>> >>> I would prefer to let a central allocator (in kernel) decide from the >>> attached devices >>> which allocator is the best. 
It is what I have implemented in smaf. >>> >>> Benjamin >>> >>> 2016-10-05 15:19 GMT+02:00 Daniel Vetter <dan...@ffwll.ch>: >>>> On Tue, Oct 04, 2016 at 01:47:21PM +0200, Benjamin Gaignard wrote: >>>>> version 10 changes: >>>>> - rebased on kernel 4.8 tag >>>>> - minor typo fix >>>>> >>>>> version 9 changes: >>>>> - rebased on 4.8-rc5 >>>>> - struct dma_attrs doesn't exist anymore so update CMA allocator >>>>>to compile with new dma_*_attr functions >>>>> - add example SMAF use case in cover letter >>>>> >>>>> version 8 changes: >>>>> - rework of the structures used within ioctl >>>>>by adding a version field and padding to be future-proof >>>>> - rename fake secure module to test secure module >>>>> - fix the various remarks done on the previous patchset >>>>> >>>>> version 7 changes: >>>>> - rebased on kernel 4.6-rc7 >>>>> - simplify secure module API >>>>> - add vma ops to be able to detect mmap/munmap calls >>>>> - add ioctl to get number and allocator names >>>>> - update libsmaf with adding tests >>>>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git >>>>> - add debug log in fake secure module >>>>> >>>>> version 6 changes: >>>>> - rebased on kernel 4.5-rc4 >>>>> - fix mmapping bug while requested allocation size isn't a multiple of >>>>>PAGE_SIZE (add a test for this in libsmaf) >>>>> >>>>> version 5 changes: >>>>> - rebased on kernel 4.3-rc6 >>>>> - rework locking schema and make handle status use an atomic_t >>>>> - add a fake secure module to allow performing tests without trusted >>>>>environment >>>>> >>>>> version 4 changes: >>>>> - rebased on kernel 4.3-rc3 >>>>> - fix missing EXPORT_SYMBOL for smaf_create_handle() >>>>> >>>>> version 3 changes: >>>>> - Remove ioctl for allocator selection instead provide the name of >>>>>the targeted allocator with allocation request. >>>>>Selecting allocator from userland isn't the preferred way of working >>>>>but is needed when the first user of the buffer is a software component.
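A minimal illustration of points 1) and 2) above (toy structs — the real allocator proposal exchanges structured capability sets, not plain bitmasks): merging is a pure function of the device list, so the merged result can be computed once and reused across allocations until the devices or usage change:

```c
#include <stdint.h>

/* Toy capability record: layout flags every device must support, plus
 * the tightest size limit. Not the real allocator-project data model. */
struct caps {
	uint32_t layout_flags;
	uint64_t max_size;
};

/* Intersection of all devices' capabilities. A pure function of its
 * inputs, so the merged result is cacheable for repeated allocations
 * as long as the device list and usage stay the same. */
static struct caps merge_caps(const struct caps *devs, int n)
{
	struct caps m = { ~0u, UINT64_MAX };
	int i;

	for (i = 0; i < n; i++) {
		m.layout_flags &= devs[i].layout_flags;
		if (devs[i].max_size < m.max_size)
			m.max_size = devs[i].max_size;
	}
	return m;
}
```

This is also why point 1) matters: the caller has to know the full device list up front, since adding a device later can only shrink the merged set and may invalidate buffers already allocated against it.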
Re: [PATCH v10 0/3] Secure Memory Allocation Framework
so there is discussion about a "central userspace allocator" (ie. more like a common userspace API that could be implemented on top of various devices/APIs) to decide in a generic way which device could allocate. https://github.com/cubanismo/allocator and I wrote up some rough thoughts/proposal about how the usage might look.. just rough, so don't try to compile it or anything, and not consensus yet so it will probably change/evolve.. https://github.com/robclark/allocator/blob/master/USAGE.md I think ion could be just another device to share buffers with, which happens to not impose any specific constraints. How "liballoc-ion.so" backend figures out how to map constraints/usage to a heap is a bit hand-wavey at the moment.

BR,
-R

On Wed, Oct 5, 2016 at 9:40 AM, Benjamin Gaignard wrote: > because with ion it is up to userland to decide which heap to use > and until now userland doesn't have any way to get device constraints... > > I would prefer to let a central allocator (in kernel) decide from the > attached devices > which allocator is the best. It is what I have implemented in smaf. 
> > Benjamin > > > 2016-10-05 15:19 GMT+02:00 Daniel Vetter : >> On Tue, Oct 04, 2016 at 01:47:21PM +0200, Benjamin Gaignard wrote: >>> version 10 changes: >>> - rebased on kernel 4.8 tag >>> - minor typo fix >>> >>> version 9 changes: >>> - rebased on 4.8-rc5 >>> - struct dma_attrs doesn't exist anymore so update CMA allocator >>>to compile with new dma_*_attr functions >>> - add example SMAF use case in cover letter >>> >>> version 8 changes: >>> - rework of the structures used within ioctl >>>by adding a version field and padding to be futur proof >>> - rename fake secure moduel to test secure module >>> - fix the various remarks done on the previous patcheset >>> >>> version 7 changes: >>> - rebased on kernel 4.6-rc7 >>> - simplify secure module API >>> - add vma ops to be able to detect mmap/munmap calls >>> - add ioctl to get number and allocator names >>> - update libsmaf with adding tests >>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git >>> - add debug log in fake secure module >>> >>> version 6 changes: >>> - rebased on kernel 4.5-rc4 >>> - fix mmapping bug while requested allocation size isn't a a multiple of >>>PAGE_SIZE (add a test for this in libsmaf) >>> >>> version 5 changes: >>> - rebased on kernel 4.3-rc6 >>> - rework locking schema and make handle status use an atomic_t >>> - add a fake secure module to allow performing tests without trusted >>>environment >>> >>> version 4 changes: >>> - rebased on kernel 4.3-rc3 >>> - fix missing EXPORT_SYMBOL for smaf_create_handle() >>> >>> version 3 changes: >>> - Remove ioctl for allocator selection instead provide the name of >>>the targeted allocator with allocation request. >>>Selecting allocator from userland isn't the prefered way of working >>>but is needed when the first user of the buffer is a software component. >>> - Fix issues in case of error while creating smaf handle. >>> - Fix module license. 
>>> - Update libsmaf and tests to care of the SMAF API evolution >>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git >>> >>> version 2 changes: >>> - Add one ioctl to allow allocator selection from userspace. >>>This is required for the use case where the first user of >>>the buffer is a software IP which can't perform dma_buf attachment. >>> - Add name and ranking to allocator structure to be able to sort them. >>> - Create a tiny library to test SMAF: >>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git >>> - Fix one issue when try to secure buffer without secure module registered >>> >>> SMAF aim to solve two problems: allocating memory that fit with hardware IPs >>> constraints and secure those data from bus point of view. >>> >>> One example of SMAF usage is camera preview: on SoC you may use either an >>> USB >>> webcam or the built-in camera interface and the frames could be send >>> directly >>> to the display IP or handle by GPU. >>> Most of USB interfaces and GPU have mmu but almost all built-in camera >>> interface and display IPs don't have mmu so when selecting how allocate >>> buffer you need to be aware of each devices constraints (contiguous memory, >>> stride, boundary, alignment ...). >>> ION has solve this problem by let userland decide which allocator (heap) to >>> use >>> but this require to adapt userland for each platform and sometime for each >>> use case. >>> >>> To be sure to select the best allocation method for devices SMAF implement >>> deferred allocation mechanism: memory allocation is only done when the first >>> device effectively required it. >>> Allocator modules have to implement a match() to let SMAF know if they are >>> compatibles with devices needs. >>> This patch set provide an example of allocator module which use >>> dma_{alloc/free/mmap}_attrs() and
Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!
Tom, hmm, I wonder if it was a bug/oversight for the YUV capabilities of this extension to not depend on OES_EGL_image_external (which unfortunately, doesn't seem to have a GL counterpart)? I think this currently implies that you could sample from an imported YUV eglimg using (for example) sampler2D in GL or GLES, which I think was not the intention. BR, -R On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote: > Hi All, > > The final spec has had enum values assigned and been published on Khronos: > > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt > > Thanks to all who've provided input. > > > Cheers, > > Tom > > > >> -Original Message- >> From: mesa-dev-bounces+tom.cooksey=arm@lists.freedesktop.org >> [mailto:mesa-dev- >> bounces+tom.cooksey=arm@lists.freedesktop.org] On Behalf Of Tom Cooksey >> Sent: 04 October 2012 13:10 >> To: mesa-...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; dri- >> de...@lists.freedesktop.org; linux-media@vger.kernel.org >> Subject: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - New draft! >> >> Hi All, >> >> After receiving a fair bit of feedback (thanks!), I've updated the >> EGL_EXT_image_dma_buf_import spec >> and expanded it to resolve a number of the issues. Please find the latest >> draft below and let >> me >> know any additional feedback you might have, either on the lists or by >> private e-mail - I >> don't mind >> which. >> >> I think the only remaining issue now is if we need a mechanism whereby an >> application can >> query >> which drm_fourcc.h formats EGL supports or if just failing with >> EGL_BAD_MATCH when the >> application >> has use one EGL doesn't support is sufficient. Any thoughts? 
>> >> >> Cheers, >> >> Tom >> >> >> 8< >> >> >> Name >> >> EXT_image_dma_buf_import >> >> Name Strings >> >> EGL_EXT_image_dma_buf_import >> >> Contributors >> >> Jesse Barker >> Rob Clark >> Tom Cooksey >> >> Contacts >> >> Jesse Barker (jesse 'dot' barker 'at' linaro 'dot' org) >> Tom Cooksey (tom 'dot' cooksey 'at' arm 'dot' com) >> >> Status >> >> DRAFT >> >> Version >> >> Version 4, October 04, 2012 >> >> Number >> >> EGL Extension ??? >> >> Dependencies >> >> EGL 1.2 is required. >> >> EGL_KHR_image_base is required. >> >> The EGL implementation must be running on a Linux kernel supporting the >> dma_buf buffer sharing mechanism. >> >> This extension is written against the wording of the EGL 1.2 >> Specification. >> >> Overview >> >> This extension allows creating an EGLImage from a Linux dma_buf file >> descriptor or multiple file descriptors in the case of multi-plane YUV >> images. >> >> New Types >> >> None >> >> New Procedures and Functions >> >> None >> >> New Tokens >> >> Accepted by the parameter of eglCreateImageKHR: >> >> EGL_LINUX_DMA_BUF_EXT >> >> Accepted as an attribute in the parameter of >> eglCreateImageKHR: >> >> EGL_LINUX_DRM_FOURCC_EXT >> EGL_DMA_BUF_PLANE0_FD_EXT >> EGL_DMA_BUF_PLANE0_OFFSET_EXT >> EGL_DMA_BUF_PLANE0_PITCH_EXT >> EGL_DMA_BUF_PLANE1_FD_EXT >> EGL_DMA_BUF_PLANE1_OFFSET_EXT >> EGL_DMA_BUF_PLANE1_PITCH_EXT >> EGL_DMA_BUF_PLANE2_FD_EXT >> EGL_DMA_BUF_PLANE2_OFFSET_EXT >> EGL_DMA_BUF_PLANE2_PITCH_EXT >> EGL_YUV_COLOR_SPACE_HINT_EXT >> EGL_SAMPLE_RANGE_HINT_EXT >> EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT >> EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT >> >> Accepted as the value for the EGL_YUV_COLOR_SPACE_HINT_EXT attribute: >> >> EGL_ITU_REC601_EXT >> EGL_ITU_REC709_EXT >> EGL_ITU_REC2020_EXT >> >> Accepted as the value for the EGL_SAMPLE_RANGE_HINT_EXT attribute: >> >> EGL_YUV_FULL_RANGE_EXT >> EGL_YUV_NARROW_RANGE_EXT >> >> Accepted as the value for the EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT & &g
Re: DRM device memory writeback (Mali-DP)
On Mon, Jul 18, 2016 at 11:01 AM, Daniel Vetter wrote: > On Mon, Jul 18, 2016 at 12:47:38PM +0200, Hans Verkuil wrote: >> On 07/18/2016 12:29 PM, Brian Starkey wrote: >> > OK, so let's talk about using connectors instead then :-) >> > >> > We can attach a generic fixed_mode property which can represent panel >> > fitters or Mali-DP's writeback scaling, that sounds good. >> > >> > The DRM driver can create a new "virtual" encoder/connector pair, I >> > guess with a new connector type specifically for memory-writeback? >> > On this connector we allow the fb_id property to attach a framebuffer >> > for writing into. >> > We'd probably want an additional property to enable writeback, perhaps >> > an enum: "disabled", "streaming", "oneshot". >> > >> > I think it makes sense to put the output buffer format, size etc. >> > validation in the virtual encoder's atomic_check. It has access to >> > both CRTC and connector state, and the encoder is supposed to >> > represent the format/stream conversion element anyway. >> > >> > What I'm not so clear on is how to manage the API with userspace. >> >> I am not terribly enthusiastic (to say the least) if a new drm API is >> created for video capture. That's what V4L2 is for and it comes with >> kernel frameworks, documentation and utilities. Creating a new API will >> not make userspace developers happy. >> >> I know it is a pain to work with different subsystems, but reinventing >> the wheel is a much bigger pain. For you and for the poor sods who have >> to use yet another API to do the same thing. >> >> Of course this has to be hooked up to drm at the low level, and anything >> that can be done to help out with that from the V4L2 side of things is >> fine, but creating duplicate public APIs isn't the way IMHO. > > I agree for the streaming video use-case. But for compositors I do agree > with Eric that a simple frame capture interface integrated with the drm > atomic ioctl is what we want.
At least my understanding of v4l is that > it's not that well suited to grabbing specific (single) frames. same here, we defn want to use v4l for the streaming case (so we can take advantage of existing userspace support for capture/encode/stream, etc).. but for compositors, I think v4l would be awkward. Ideal case is single set of driver APIs, so driver doesn't have to care so much about the use case, and then some drm_v4l helpers to expose a v4l device for the streaming case. There are some older patches floating around that implemented v4l inside drm/msm.. which looked simple enough, although I don't think we should need to do that in each drm driver.. BR, -R -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!
On Mon, Jun 20, 2016 at 8:37 AM, Pekka Paalanen <ppaala...@gmail.com> wrote: > On Fri, 17 Jun 2016 11:44:34 -0400 > Rob Clark <robdcl...@gmail.com> wrote: > >> On Fri, Jun 17, 2016 at 9:31 AM, Pekka Paalanen <ppaala...@gmail.com> wrote: >> > On Fri, 17 Jun 2016 08:26:04 -0400 >> > Rob Clark <robdcl...@gmail.com> wrote: >> > >> >> On Fri, Jun 17, 2016 at 3:59 AM, Pekka Paalanen <ppaala...@gmail.com> >> >> wrote: >> >> > On Thu, 16 Jun 2016 10:40:51 -0400 >> >> > Rob Clark <robdcl...@gmail.com> wrote: >> >> > >> >> >> So, if we wanted to extend this to support the fourcc-modifiers that >> >> >> we have on the kernel side for compressed/tiled/etc formats, what >> >> >> would be the right approach? >> >> >> >> >> >> A new version of the existing extension or a new >> >> >> EGL_EXT_image_dma_buf_import2 extension, or ?? >> >> > >> >> > Hi Rob, >> >> > >> >> > there are actually several things it might be nice to add: >> >> > >> >> > - a fourth plane, to match what DRM AddFB2 supports >> >> > >> >> > - the 64-bit fb modifiers >> >> > >> >> > - queries for which pixel formats are supported by EGL, so a display >> >> > server can tell the applications that before the application goes and >> >> > tries with a random bunch of them, shooting in the dark >> >> > >> >> > - queries for which modifiers are supported for each pixel format, ditto >> >> > >> >> > I discussed these with Emil in the past, and it seems an appropriate >> >> > approach might be the following. >> >> > >> >> > Adding the 4th plane can be done as revising the existing >> >> > EGL_EXT_image_dma_buf_import extension. The plane count is tied to >> >> > pixel formats (and modifiers?), so the user does not need to know >> >> > specifically whether the EGL implementation could handle a 4th plane or >> >> > not. It is implied by the pixel format. >> >> > >> >> > Adding the fb modifiers needs to be a new extension, so that users can >> >> > tell if they are supported or not. 
This is to avoid the following false >> >> > failure: if user assumes modifiers are always supported, it will (may?) >> >> > provide zero modifiers explicitly. If EGL implementation does not >> >> > handle modifiers this would be rejected as unrecognized attributes, >> >> > while if the zero modifiers were not given explicitly, everything would >> >> > just work. >> >> >> >> hmm, if we design it as "not passing modifier" == "zero modifier", and >> >> "never explicitly pass a zero modifier" then modifiers could be added >> >> without a new extension. Although I agree that queries would need a >> >> new extension.. so perhaps not worth being clever. >> > >> > Indeed. >> > >> >> > The queries obviously(?) need a new extension. It might make sense >> >> > to bundle both modifier support and the queries in the same new >> >> > extension. >> >> > >> >> > We have some rough old WIP code at >> >> > https://git.collabora.com/cgit/user/lfrb/mesa.git/log/?h=T1410-modifiers >> >> > https://git.collabora.com/cgit/user/lfrb/egl-specs.git/log/?h=T1410 >> >> > >> >> > >> >> >> On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> >> >> >> wrote: >> >> >> > Hi All, >> >> >> > >> >> >> > The final spec has had enum values assigned and been published on >> >> >> > Khronos: >> >> >> > >> >> >> > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt >> >> >> > >> >> >> > Thanks to all who've provided input. >> >> > >> >> > May I also pull your attention to a detail with the existing spec and >> >> > Mesa behaviour I am asking about in >> >> > https://lists.freedesktop.org/archives/mesa-dev/2016-June/120249.html >> >> > "What is EGL_EXT_image_dma_buf_import image orientation as a GL >> >> > texture
Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!
On Fri, Jun 17, 2016 at 9:31 AM, Pekka Paalanen <ppaala...@gmail.com> wrote: > On Fri, 17 Jun 2016 08:26:04 -0400 > Rob Clark <robdcl...@gmail.com> wrote: > >> On Fri, Jun 17, 2016 at 3:59 AM, Pekka Paalanen <ppaala...@gmail.com> wrote: >> > On Thu, 16 Jun 2016 10:40:51 -0400 >> > Rob Clark <robdcl...@gmail.com> wrote: >> > >> >> So, if we wanted to extend this to support the fourcc-modifiers that >> >> we have on the kernel side for compressed/tiled/etc formats, what >> >> would be the right approach? >> >> >> >> A new version of the existing extension or a new >> >> EGL_EXT_image_dma_buf_import2 extension, or ?? >> > >> > Hi Rob, >> > >> > there are actually several things it might be nice to add: >> > >> > - a fourth plane, to match what DRM AddFB2 supports >> > >> > - the 64-bit fb modifiers >> > >> > - queries for which pixel formats are supported by EGL, so a display >> > server can tell the applications that before the application goes and >> > tries with a random bunch of them, shooting in the dark >> > >> > - queries for which modifiers are supported for each pixel format, ditto >> > >> > I discussed these with Emil in the past, and it seems an appropriate >> > approach might be the following. >> > >> > Adding the 4th plane can be done as revising the existing >> > EGL_EXT_image_dma_buf_import extension. The plane count is tied to >> > pixel formats (and modifiers?), so the user does not need to know >> > specifically whether the EGL implementation could handle a 4th plane or >> > not. It is implied by the pixel format. >> > >> > Adding the fb modifiers needs to be a new extension, so that users can >> > tell if they are supported or not. This is to avoid the following false >> > failure: if user assumes modifiers are always supported, it will (may?) >> > provide zero modifiers explicitly. 
If EGL implementation does not >> > handle modifiers this would be rejected as unrecognized attributes, >> > while if the zero modifiers were not given explicitly, everything would >> > just work. >> >> hmm, if we design it as "not passing modifier" == "zero modifier", and >> "never explicitly pass a zero modifier" then modifiers could be added >> without a new extension. Although I agree that queries would need a >> new extension.. so perhaps not worth being clever. > > Indeed. > >> > The queries obviously(?) need a new extension. It might make sense >> > to bundle both modifier support and the queries in the same new >> > extension. >> > >> > We have some rough old WIP code at >> > https://git.collabora.com/cgit/user/lfrb/mesa.git/log/?h=T1410-modifiers >> > https://git.collabora.com/cgit/user/lfrb/egl-specs.git/log/?h=T1410 >> > >> > >> >> On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote: >> >> > Hi All, >> >> > >> >> > The final spec has had enum values assigned and been published on >> >> > Khronos: >> >> > >> >> > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt >> >> > >> >> > Thanks to all who've provided input. >> > >> > May I also pull your attention to a detail with the existing spec and >> > Mesa behaviour I am asking about in >> > https://lists.freedesktop.org/archives/mesa-dev/2016-June/120249.html >> > "What is EGL_EXT_image_dma_buf_import image orientation as a GL texture?" >> > Doing a dmabuf import seems to imply an y-flip AFAICT. >> >> I would have expected that *any* egl external image (dma-buf or >> otherwise) should have native orientation rather than gl orientation. >> It's somewhat useless otherwise. > > In that case importing dmabuf works differently than importing a > wl_buffer (wl_drm), because for the latter, the y-invert flag is > returned such that the orientation will match GL. 
And the direct > scanout path goes through GBM since you have to import a wl_buffer, and > I haven't looked what GBM does wrt. y-flip if anything. > >> I didn't read it carefully yet (would need caffeine first ;-)) but >> EGL_KHR_image_base does say "This extension defines a new EGL resource >> type that is suitable for sharing 2D arrays of image data between >> client APIs" which to me implies native or
Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!
On Fri, Jun 17, 2016 at 3:59 AM, Pekka Paalanen <ppaala...@gmail.com> wrote: > On Thu, 16 Jun 2016 10:40:51 -0400 > Rob Clark <robdcl...@gmail.com> wrote: > >> So, if we wanted to extend this to support the fourcc-modifiers that >> we have on the kernel side for compressed/tiled/etc formats, what >> would be the right approach? >> >> A new version of the existing extension or a new >> EGL_EXT_image_dma_buf_import2 extension, or ?? > > Hi Rob, > > there are actually several things it might be nice to add: > > - a fourth plane, to match what DRM AddFB2 supports > > - the 64-bit fb modifiers > > - queries for which pixel formats are supported by EGL, so a display > server can tell the applications that before the application goes and > tries with a random bunch of them, shooting in the dark > > - queries for which modifiers are supported for each pixel format, ditto > > I discussed these with Emil in the past, and it seems an appropriate > approach might be the following. > > Adding the 4th plane can be done as revising the existing > EGL_EXT_image_dma_buf_import extension. The plane count is tied to > pixel formats (and modifiers?), so the user does not need to know > specifically whether the EGL implementation could handle a 4th plane or > not. It is implied by the pixel format. > > Adding the fb modifiers needs to be a new extension, so that users can > tell if they are supported or not. This is to avoid the following false > failure: if user assumes modifiers are always supported, it will (may?) > provide zero modifiers explicitly. If EGL implementation does not > handle modifiers this would be rejected as unrecognized attributes, > while if the zero modifiers were not given explicitly, everything would > just work. hmm, if we design it as "not passing modifier" == "zero modifier", and "never explicitly pass a zero modifier" then modifiers could be added without a new extension. Although I agree that queries would need a new extension.. 
so perhaps not worth being clever. > The queries obviously(?) need a new extension. It might make sense > to bundle both modifier support and the queries in the same new > extension. > > We have some rough old WIP code at > https://git.collabora.com/cgit/user/lfrb/mesa.git/log/?h=T1410-modifiers > https://git.collabora.com/cgit/user/lfrb/egl-specs.git/log/?h=T1410 > > >> On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote: >> > Hi All, >> > >> > The final spec has had enum values assigned and been published on Khronos: >> > >> > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt >> > >> > Thanks to all who've provided input. > > May I also pull your attention to a detail with the existing spec and > Mesa behaviour I am asking about in > https://lists.freedesktop.org/archives/mesa-dev/2016-June/120249.html > "What is EGL_EXT_image_dma_buf_import image orientation as a GL texture?" > Doing a dmabuf import seems to imply a y-flip AFAICT. I would have expected that *any* egl external image (dma-buf or otherwise) should have native orientation rather than gl orientation. It's somewhat useless otherwise. I didn't read it carefully yet (would need caffeine first ;-)) but EGL_KHR_image_base does say "This extension defines a new EGL resource type that is suitable for sharing 2D arrays of image data between client APIs" which to me implies native orientation. So that just sounds like a mesa bug somehow? Do you just get that w/ i965? I know some linaro folks have been using this extension to import buffers from video decoder with freedreno/gallium and no one mentioned the video being upside down. BR, -R > > Thanks, > pq
Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!
So, if we wanted to extend this to support the fourcc-modifiers that we have on the kernel side for compressed/tiled/etc formats, what would be the right approach? A new version of the existing extension or a new EGL_EXT_image_dma_buf_import2 extension, or ?? BR, -R On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote: > Hi All, > > The final spec has had enum values assigned and been published on Khronos: > > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt > > Thanks to all who've provided input. > > > Cheers, > > Tom > > > >> -Original Message- >> From: mesa-dev-bounces+tom.cooksey=arm@lists.freedesktop.org >> [mailto:mesa-dev- >> bounces+tom.cooksey=arm@lists.freedesktop.org] On Behalf Of Tom Cooksey >> Sent: 04 October 2012 13:10 >> To: mesa-...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; dri- >> de...@lists.freedesktop.org; linux-media@vger.kernel.org >> Subject: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - New draft! >> >> Hi All, >> >> After receiving a fair bit of feedback (thanks!), I've updated the >> EGL_EXT_image_dma_buf_import spec >> and expanded it to resolve a number of the issues. Please find the latest >> draft below and let >> me >> know any additional feedback you might have, either on the lists or by >> private e-mail - I >> don't mind >> which. >> >> I think the only remaining issue now is if we need a mechanism whereby an >> application can >> query >> which drm_fourcc.h formats EGL supports or if just failing with >> EGL_BAD_MATCH when the >> application >> has use one EGL doesn't support is sufficient. Any thoughts? 
>> >> >> Cheers, >> >> Tom >> >> >> 8< >> >> >> Name >> >> EXT_image_dma_buf_import >> >> Name Strings >> >> EGL_EXT_image_dma_buf_import >> >> Contributors >> >> Jesse Barker >> Rob Clark >> Tom Cooksey >> >> Contacts >> >> Jesse Barker (jesse 'dot' barker 'at' linaro 'dot' org) >> Tom Cooksey (tom 'dot' cooksey 'at' arm 'dot' com) >> >> Status >> >> DRAFT >> >> Version >> >> Version 4, October 04, 2012 >> >> Number >> >> EGL Extension ??? >> >> Dependencies >> >> EGL 1.2 is required. >> >> EGL_KHR_image_base is required. >> >> The EGL implementation must be running on a Linux kernel supporting the >> dma_buf buffer sharing mechanism. >> >> This extension is written against the wording of the EGL 1.2 >> Specification. >> >> Overview >> >> This extension allows creating an EGLImage from a Linux dma_buf file >> descriptor or multiple file descriptors in the case of multi-plane YUV >> images. >> >> New Types >> >> None >> >> New Procedures and Functions >> >> None >> >> New Tokens >> >> Accepted by the parameter of eglCreateImageKHR: >> >> EGL_LINUX_DMA_BUF_EXT >> >> Accepted as an attribute in the parameter of >> eglCreateImageKHR: >> >> EGL_LINUX_DRM_FOURCC_EXT >> EGL_DMA_BUF_PLANE0_FD_EXT >> EGL_DMA_BUF_PLANE0_OFFSET_EXT >> EGL_DMA_BUF_PLANE0_PITCH_EXT >> EGL_DMA_BUF_PLANE1_FD_EXT >> EGL_DMA_BUF_PLANE1_OFFSET_EXT >> EGL_DMA_BUF_PLANE1_PITCH_EXT >> EGL_DMA_BUF_PLANE2_FD_EXT >> EGL_DMA_BUF_PLANE2_OFFSET_EXT >> EGL_DMA_BUF_PLANE2_PITCH_EXT >> EGL_YUV_COLOR_SPACE_HINT_EXT >> EGL_SAMPLE_RANGE_HINT_EXT >> EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT >> EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT >> >> Accepted as the value for the EGL_YUV_COLOR_SPACE_HINT_EXT attribute: >> >> EGL_ITU_REC601_EXT >> EGL_ITU_REC709_EXT >> EGL_ITU_REC2020_EXT >> >> Accepted as the value for the EGL_SAMPLE_RANGE_HINT_EXT attribute: >> >> EGL_YUV_FULL_RANGE_EXT >> EGL_YUV_NARROW_RANGE_EXT >> >> Accepted as the value for the EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT & >> 
EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT attributes: >> >> EGL_YUV_CHROMA_SITING_0_EXT >> EGL_YUV_CHROMA_SITING_0_5_EXT >> >
Re: gstreamer: v4l2videodec plugin
On Thu, Apr 28, 2016 at 7:33 AM, Stanimir Varbanov <stanimir.varba...@linaro.org> wrote: > On 04/15/2016 07:09 PM, Nicolas Dufresne wrote: >> Le vendredi 15 avril 2016 à 11:58 -0400, Rob Clark a écrit : >>> The issue is probably the YUV format, which we cannot really deal >>> with >>> properly in gallium.. it's a similar issue to multi-planer even if >>> it >>> is in a single buffer. >>> >>> The best way to handle this would be to import the same dmabuf fd >>> twice, with appropriate offsets, to create one GL_RED eglimage for Y >>> and one GL_RG eglimage for UV, and then combine them in shader in a >>> similar way to how you'd handle separate Y and UV planes.. >> >> That's the strategy we use in GStreamer, as very few GL stack support >> implicit color conversions. For that to work you need to implement the >> "offset" field in winsys_handle, that was added recently, and make sure >> you have R8 and RG88 support (usually this is just mapping). > > Thanks, > > OK, I have made the relevant changes in Mesa and now I have image but > the U and V components are swapped (the format is NV12, the UV plane is > at the same buffer but at offset). Digging further and tracing gstreamer > with apitrace I'm observing something weird. > > The gst import dmabuf with following call: > > eglCreateImageKHR(dpy = 0x7fa8013030, ctx = NULL, target = > EGL_LINUX_DMA_BUF_EXT, buffer = NULL, attrib_list = {EGL_WIDTH, 640, > EGL_HEIGHT, 360, EGL_LINUX_DRM_FOURCC_EXT, 943215175, > EGL_DMA_BUF_PLANE0_FD_EXT, 29, EGL_DMA_BUF_PLANE0_OFFSET_EXT, 942080, > EGL_DMA_BUF_PLANE0_PITCH_EXT, 1280, EGL_NONE}) = 0x7f980027d0 > > the fourcc format is DRM_FORMAT_GR88 (943215175 decimal). 
> > after that: > > glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_RG8, > width = 640, height = 360, border = 0, format = GL_RG, type = > GL_UNSIGNED_BYTE, pixels = NULL) > > and finally on the fragment shader we have: > > yuv.x=texture2D(Ytex, texcoord * tex_scale0).r; > yuv.yz=texture2D(UVtex, texcoord * tex_scale1).rg; > > I was expecting to see DRM_FORMAT_RG88 / GL_RG and shader sampling > y <- r > z <- g > > or DRM_FORMAT_GR88 / GL_RG and shader sampling > y <- g > z <- r IIRC you are using gles? Could you recompile glimagesink to use desktop GL? I'm wondering a bit, but just speculation since I don't have a way to step through it, but the 'if (_mesa_is_gles())' case in st_ChooseTextureFormat.. normally for gles the driver is more free to choose the corresponding internal-format, which is maybe not the right thing to do for textures which are imported eglimages. (if recompiling mesa is easier, you could just change that to 'if (0)' and see if it "fixes" things.. that ofc is not the proper fix, but it would confirm whether this is what is going on..) BR, -R > Also, browsing the code in Mesa for Intel i965 dri driver I found where > the __DRI_IMAGE_FORMAT_GR88 becomes MESA_FORMAT_R8G8_UNORM [1]. > > So I'm wondering is that intentional? > > Depending on the answer I should make the same in the Gallium dri2 in > dri2_from_dma_bufs(). > > -- > regards, > Stan > > [1] > https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/common/dri_util.c#n878 > --
Re: gstreamer: v4l2videodec plugin
On Fri, Apr 15, 2016 at 12:09 PM, Nicolas Dufresne <nico...@ndufresne.ca> wrote: > Le vendredi 15 avril 2016 à 11:58 -0400, Rob Clark a écrit : >> The issue is probably the YUV format, which we cannot really deal >> with >> properly in gallium.. it's a similar issue to multi-planar even if >> it >> is in a single buffer. >> >> The best way to handle this would be to import the same dmabuf fd >> twice, with appropriate offsets, to create one GL_RED eglimage for Y >> and one GL_RG eglimage for UV, and then combine them in shader in a >> similar way to how you'd handle separate Y and UV planes.. > > That's the strategy we use in GStreamer, as very few GL stack support > implicit color conversions. For that to work you need to implement the > "offset" field in winsys_handle, that was added recently, and make sure > you have R8 and RG88 support (usually this is just mapping). oh, heh, looks like nobody bothered to add this yet:

diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c b/src/gallium/drivers/freedreno/freedreno_resource.c
index 9aded3b..fab78ab 100644
--- a/src/gallium/drivers/freedreno/freedreno_resource.c
+++ b/src/gallium/drivers/freedreno/freedreno_resource.c
@@ -669,6 +669,7 @@ fd_resource_from_handle(struct pipe_screen *pscreen,
 	rsc->base.vtbl = &fd_resource_vtbl;
 	rsc->cpp = util_format_get_blocksize(tmpl->format);
 	slice->pitch /= rsc->cpp;
+	slice->offset = handle->offset;
 	assert(rsc->cpp);

> cheers, > Nicolas
Re: gstreamer: v4l2videodec plugin
On Tue, Apr 12, 2016 at 4:57 AM, Stanimir Varbanov wrote: > Hi Nicolas, > > On 04/11/2016 07:25 PM, Nicolas Dufresne wrote: >> Le lundi 11 avril 2016 à 15:11 +0300, Stanimir Varbanov a écrit : >>> adding gstreamer-devel >>> >>> On 04/11/2016 03:03 PM, Stanimir Varbanov wrote: Hi, I'm working on QCOM v4l2 video decoder/encoder driver and in order to test its functionalities I'm using gstreamer v4l2videodec plugin. I am able to use the v4l2videodec plugin with MMAP, now I want to try the dmabuf export from v4l2 and import dmabuf buffers to glimagesink. I upgraded gst to 1.7.91 so that I have the dmabuf support in glimagesink. Mesa version is 11.1.2. >> >> I'm very happy to see this report. So far, we only had report that this >> element works on Freescale IMX.6 (CODA) and Exynos 4/5. > > In this context, I would be very happy to see v4l2videoenc merged soon :) > >> I'm using the following pipeline: GST_GL_PLATFORM=egl GST_GL_API=gles2 gst-launch-1.0 $GSTDEBUG $GSTFILESRC ! qtdemux name=m m.video_0 ! h264parse ! v4l2video32dec capture-io-mode=dmabuf ! glimagesink I stalled on this error: eglimagememory gsteglimagememory.c:473:gst_egl_image_memory_from_dmabuf:>>> llocator0> eglCreateImage failed: EGL_BAD_MATCH which in Mesa is: libEGL debug: EGL user error 0x3009 (EGL_BAD_MATCH) in dri2_create_image_khr_texture Does someone know how the dmabuf import is tested when the support has been added to glimagesink? Or some pointers how to continue with debugging? >> >> So far the DMABuf support in glimagesink has been tested on Intel/Mesa >> and libMALI. There is work in progress in Gallium/Mesa, but until >> recently there was no support for offset in imported buffer, which >> would result in BAD_MATCH error. I cannot guarantee this is the exact >> reason here, BAD_MATCH is used for a very wide variety of reasons in >> those extensions. The right place to dig into this issue would be >> through the Mesa list and/or Mesa code.
Find out what is missing for >> your driver in Mesa and then I may help you further. > > I came down to these conditions > > https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/dri/dri2.c?h=11.2#n1063 > > but I don't know how this is related. The gstreamer > (gst_egl_image_memory_from_dmabuf) doesn't set this "level" so it will > be zero. > >> >> For the reference, the importation strategy we use in GStreamer has >> been inspired by Kodi (XBMC). It consists of importing each YUV plane >> separately using R8 and RG88 textures and doing the color conversion >> using shaders. Though, if the frame is allocated as a single DMABuf, >> this requires using offset to access the frame data, and that support > > Yep that is my case, the driver capture buffers have one plane, hence one > dmabuf will be exported per buffer. > >> had only been recently added in Gallium base code and in Radeon driver >> recently. I don't know if Freedreno, VC4 have that, and I know nouveau >> don't. > > Rob, do we need to add something in Freedreno Gallium driver to handle > dmabuf import? The issue is probably the YUV format, which we cannot really deal with properly in gallium.. it's a similar issue to multi-planar even if it is in a single buffer. The best way to handle this would be to import the same dmabuf fd twice, with appropriate offsets, to create one GL_RED eglimage for Y and one GL_RG eglimage for UV, and then combine them in shader in a similar way to how you'd handle separate Y and UV planes.. BR, -R > -- > regards, > Stan
Re: [RFC] How implement Secure Data Path ?
On Wed, May 6, 2015 at 4:35 AM, Daniel Vetter <dan...@ffwll.ch> wrote: On Tue, May 05, 2015 at 05:54:05PM +0100, One Thousand Gnomes wrote: First what is Secure Data Path ? SDP is a set of hardware features to guarantee that some memory regions could only be read and/or written by specific hardware IPs. You can imagine it as a kind of memory firewall which grant/revoke accesses to memory per device. Firewall configuration must be done in a trusted environment: for ARM architecture we plan to use OP-TEE + a trusted application to do that. It's not just an ARM feature so any basis for this in the core code should be generic, whether it's being enforced by ARM SDP, various Intel feature sets or even via a hypervisor. I have tried 2 hacky approaches with dma_buf: - add a secure field in dma_buf structure and configure firewall in dma_buf_{map/unmap}_attachment() functions. How is SDP not just another IOMMU. The only oddity here is that it happens to configure buffers the CPU can't touch and it has a control mechanism that is designed to cover big media corp type uses where the threat model is that the system owner is the enemy. Why does anything care about it being SDP, there are also generic cases this might be a useful optimisation (eg knowing the buffer isn't CPU touched so you can optimise cache flushing). The control mechanism is a device/platform detail as with any IOMMU. It doesn't matter who configures it or how, providing it happens. We do presumably need some small core DMA changes - anyone trying to map such a buffer into CPU space needs to get a warning or error but what else ? From buffer allocation point of view I am also facing a problem because when v4l2 or drm/kms are exporting buffers by using dma_buf they don't attach themselves to it and never call dma_buf_{map/unmap}_attachment(). This is not an issue in those frameworks while it is how dma_buf exporters are supposed to work. Which could be addressed if need be.
So if SDP is just another IOMMU feature, just as stuff like IMR is on some x86 devices, and hypervisor enforced protection is on assorted platforms why do we need a special way to do it? Is there anything actually needed beyond being able to tell the existing DMA code that this buffer won't be CPU touched and wiring it into the DMA operations for the platform? Iirc most of the dma api stuff gets unhappy when memory isn't struct page backed. In i915 we do use sg tables everywhere though (even for memory not backed by struct page, e.g. the stolen range the bios prereserves), but we fill those out manually. A possible generic design I see is to have a secure memory allocator device which does nothing else but hand out dma-bufs. With that we can hide the platform-specific allocation methods in there (some need to allocate from carveouts, others just need to mark the pages specifically). Also dma-buf has explicit methods for cpu access, which are allowed to fail. And using the dma-buf attach tracking we can also reject dma to devices which cannot access the secure memory. Given all that I think going through the dma-buf interface but with a special-purpose allocator seems to fit. I'm not sure whether a special iommu is a good idea otoh: I'd expect that for most devices the driver would need to decide about which iommu to pick (or maybe keep track of some special flags for an extended dma_map interface). At least looking at gpu drivers using iommus would require special code, whereas fully hiding all this behind the dma-buf interface should fit in much better. jfwiw, I'd fully expect devices to be dealing with a mix of secure and insecure buffers, so I'm also not really sure how the 'special iommu' plan would play out.. I think 'secure' allocator device sounds attractive from PoV of separating out platform nonsense.. not sure if it is exactly that easy, since importing device probably needs to set some special bits here and there..
BR, -R -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Wed, Feb 11, 2015 at 6:12 AM, Russell King - ARM Linux <li...@arm.linux.org.uk> wrote: On Wed, Feb 11, 2015 at 09:28:37AM +0100, Marek Szyprowski wrote: Hello, On 2015-01-27 09:25, Sumit Semwal wrote: Add some helpers to share the constraints of devices while attaching to the dmabuf buffer. At each attach, the constraints are calculated based on the following: - max_segment_size, max_segment_count, segment_boundary_mask from device_dma_parameters. In case the attaching device's constraints don't match up, attach() fails. At detach, the constraints are recalculated based on the remaining attached devices. Two helpers are added: - dma_buf_get_constraints - which gives the current constraints as calculated during each attach on the buffer till the time, - dma_buf_recalc_constraints - which recalculates the constraints for all currently attached devices for the 'paranoid' ones amongst us. The idea of this patch is largely taken from Rob Clark's RFC at https://lkml.org/lkml/2012/7/19/285, and the comments received on it. Cc: Rob Clark <robdcl...@gmail.com> Signed-off-by: Sumit Semwal <sumit.sem...@linaro.org> The code looks okay, although it will probably work well only with typical cases like 'contiguous memory needed' or 'no constraints at all' (iommu). Which is a damn good reason to NAK it - by that admission, it's a half-baked idea. If all we want to know is whether the importer can accept only contiguous memory or not, make a flag to do that, and allow the exporter to test this flag. Don't over-engineer this to make it _seem_ like it can do something that it actually totally fails with. jfyi, I agree with that.. I think the flag is probably the right approach to start with. At the end of the day it *is* still just an in-kernel API (and not something that ends up as userspace ABI) so when we come up with the use case to make it more generic we can. Vs. making it look like something more generic when it isn't really yet.
As I've already pointed out, there's a major problem if you have already had a less restrictive attachment which has an active mapping, and a new more restrictive attachment comes along later. It seems from Rob's descriptions that we also need another flag in the importer to indicate whether it wants to have a valid struct page in the scatter list, or whether it (correctly) uses the DMA accessors on the scatter list - so that exporters can reject importers which are buggy.

to be completely generic, we would really need a way that the device could take over only just the last iommu (in case there were multiple levels of address translation).. I'm not completely sure, but I *think* the other arm gpu's have their own internal mmu for doing context switching, etc, so if there is an additional iommu in front of them they may actually still want to use the normal dma api's. Someone please contradict me if I am wrong. If this ends up being an issue only for msm, then I'm completely ok with the easier option of a less generic solution..

BR, -R

-- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net.

-- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Wed, Feb 11, 2015 at 7:56 AM, Daniel Vetter dan...@ffwll.ch wrote:

On Wed, Feb 11, 2015 at 06:23:52AM -0500, Rob Clark wrote:

On Wed, Feb 11, 2015 at 6:12 AM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

As I've already pointed out, there's a major problem if you have already had a less restrictive attachment which has an active mapping, and a new more restrictive attachment comes along later. It seems from Rob's descriptions that we also need another flag in the importer to indicate whether it wants to have a valid struct page in the scatter list, or whether it (correctly) uses the DMA accessors on the scatter list - so that exporters can reject importers which are buggy.

to be completely generic, we would really need a way that the device could take over only just the last iommu (in case there were multiple levels of address translation)..

I still hold that if the dma api steals the iommu your gpu needs for context switching then that's a bug in the platform setup code. dma api really doesn't have any concept of switchable hw contexts. So trying to work around this brokenness by mandating it as a valid dma-buf use-case is totally backwards.

sure, my only point is that if I'm the odd man out, I can live with a hack (ie. requiring drm/msm to be aware enough of the platform to know if there is 1 level of address translation and frob'ing my 'struct device' accordingly)... no point in a generic solution for one user. I like to be practical.

BR, -R

-Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 9:37 AM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

On Tue, Feb 03, 2015 at 09:04:03AM -0500, Rob Clark wrote:

Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to drop use of dma-mapping entirely (incl the current call to dma_map_sg, which I just need until we can use drm_cflush on arm), and attach/detach iommu domains directly to implement context switches. At that point, dma_addr_t really has no sensible meaning for me.

So how do you intend to import from a subsystem which only gives you the dma_addr_t? If you aren't passing system memory, you have no struct page. You can't fake up a struct page. What this means is that struct scatterlist can't represent it any other way.

Tell the exporter to stop using carveouts, and give me proper memory instead.. ;-) Well, at least on these SoC's, I think the only valid use for carveout memory is the bootloader splashscreen. And I was planning on just hanging on to that for myself for fbdev scanout buffer or other internal (non shared) usage..

BR, -R
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 9:41 AM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

On Tue, Feb 03, 2015 at 03:17:27PM +0100, Arnd Bergmann wrote:

On Tuesday 03 February 2015 09:04:03 Rob Clark wrote:

Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to drop use of dma-mapping entirely (incl the current call to dma_map_sg, which I just need until we can use drm_cflush on arm), and attach/detach iommu domains directly to implement context switches. At that point, dma_addr_t really has no sensible meaning for me.

I think what you see here is a quite common hardware setup and we really lack the right abstraction for it at the moment. Everybody seems to work around it with a mix of the dma-mapping API and the iommu API. These are doing different things, and even though the dma-mapping API can be implemented on top of the iommu API, they are not really compatible.

I'd go as far as saying that the DMA API on top of IOMMU is more intended to be for a system IOMMU for the bus in question, rather than a device-level IOMMU. If an IOMMU is part of a device, then the device should handle it (maybe via an abstraction) and not via the DMA API. The DMA API should be handing the bus addresses to the device driver which the device's IOMMU would need to generate. (In other words, in this circumstance, the DMA API shouldn't give you the device internal address.)

if the dma_addr_t becomes the address upstream of the iommu (in practice, the phys addr), that would, I think, address my concerns about dma_addr_t

BR, -R
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 9:52 AM, Arnd Bergmann a...@arndb.de wrote:

On Tuesday 03 February 2015 14:41:09 Russell King - ARM Linux wrote:

On Tue, Feb 03, 2015 at 03:17:27PM +0100, Arnd Bergmann wrote:

On Tuesday 03 February 2015 09:04:03 Rob Clark wrote:

Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to drop use of dma-mapping entirely (incl the current call to dma_map_sg, which I just need until we can use drm_cflush on arm), and attach/detach iommu domains directly to implement context switches. At that point, dma_addr_t really has no sensible meaning for me.

I think what you see here is a quite common hardware setup and we really lack the right abstraction for it at the moment. Everybody seems to work around it with a mix of the dma-mapping API and the iommu API. These are doing different things, and even though the dma-mapping API can be implemented on top of the iommu API, they are not really compatible.

I'd go as far as saying that the DMA API on top of IOMMU is more intended to be for a system IOMMU for the bus in question, rather than a device-level IOMMU. If an IOMMU is part of a device, then the device should handle it (maybe via an abstraction) and not via the DMA API. The DMA API should be handing the bus addresses to the device driver which the device's IOMMU would need to generate. (In other words, in this circumstance, the DMA API shouldn't give you the device internal address.)

Exactly. And the abstraction that people choose at the moment is the iommu API, for better or worse. It makes a lot of sense to use this API if the same iommu is used for other devices as well (which is the case on Tegra and probably a lot of others). Unfortunately the iommu API lacks support for cache management, and probably other things as well, because this was not an issue for the original use case (device assignment on KVM/x86).
This could be done by adding explicit or implied cache management to the IOMMU mapping interfaces, or by extending the dma-mapping interfaces in a way that covers the use case of the device managing its own address space, in addition to the existing coherent and streaming interfaces.

I think for gpu's, we'd prefer explicit and less abstraction.. which is probably the opposite of what every other driver would want. In the end, my eventual goal is explicit control of tlb flush, and control of my address space. And in fact in some cases we are going to want to use the gpu to bang on iommu registers to do context switches and tlb flushes. (Which is obviously not the first step.. and something that is fairly difficult to get right/secure.. but the performance win seems significant so I'm not sure we can avoid it.)

BR, -R

Arnd
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 11:12 AM, Arnd Bergmann a...@arndb.de wrote:

I agree for the case you are describing here. What I understood from Rob was that he is looking at something more like:

Fig 3 CPU--L1cache--L2cache--Memory--IOMMU---iobus--device

where the IOMMU controls one or more contexts per device, and is shared across GPU and non-GPU devices. Here, we need to use the dma-mapping interface to set up the IO page table for any device that is unable to address all of system RAM, and we can use it for purposes like isolation of the devices. There are also cases where using the IOMMU is not optional.

Actually, just to clarify, the IOMMU instance is specific to the GPU.. not shared with other devices. Otherwise managing multiple contexts would go quite badly.. But other devices have their own instance of the same IOMMU.. so same driver could be used.

BR, -R
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 7:28 AM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

On Tue, Feb 03, 2015 at 08:48:56AM +0100, Daniel Vetter wrote:

On Mon, Feb 02, 2015 at 03:30:21PM -0500, Rob Clark wrote:

On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:

My initial thought is for dma-buf to not try to prevent something that an exporter can actually do.. I think the scenario you describe could be handled by two sg-lists, if the exporter was clever enough.

That's already needed, each attachment has its own sg-list. After all there's no array of dma_addr_t in the sg tables, so you can't use one sg for more than one mapping. And due to different iommu different devices can easily end up with different addresses.

Well, to be fair it may not be explicitly stated, but currently one should assume the dma_addr_t's in the dmabuf sglist are bogus. With gpu's that implement per-process/context page tables, I'm not really sure that there is a sane way to actually do anything else..

Hm, what does per-process/context page tables have to do here? At least on i915 we have two levels of page tables:

- first level for vm/device isolation, used through dma api
- 2nd level for per-gpu-context isolation and context switching, handled internally.

Since atm the dma api doesn't have any concept of contexts or different pagetables, I don't see how you could use that at all.

What I've found with *my* etnaviv drm implementation (not Christian's - I found it impossible to work with Christian, especially with the endless "msm doesn't do it that way, so we shouldn't" responses and his attitude towards cherry-picking my development work [*]) is that it's much easier to keep the GPU MMU local to the GPU and under the control of the DRM MM code, rather than attaching the IOMMU to the DMA API and handling it that way. There are several reasons for that:

1. DRM has a better idea about when the memory needs to be mapped to the GPU, and it can more effectively manage the GPU MMU.

2. The GPU MMU may have TLBs which can only be flushed via a command in the GPU command stream, so it's fundamentally necessary for the MMU to be managed by the GPU driver so that it knows when (and how) to insert the flushes.

If gpu mmu needs some/all updates to happen from command-stream then probably better to handle it internally.. That is a slightly different scenario from msm, where we have many instances of the same iommu[*] scattered through the SoC in front of various different devices.

BR, -R

[*] at least from iommu register layout, same driver is used for all instances.. but maybe the tlb+walker are more tightly integrated to the gpu, but that is just speculation on implementation details based on some paper I found along the way

* - as a direct result of that, I've stopped all further development of etnaviv drm, and I'm intending to strip it out from my Xorg DDX driver as the etnaviv drm API which Christian wants is completely incompatible with the non-etnaviv drm, and that just creates far too much pain in the DDX driver.
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 2:48 AM, Daniel Vetter dan...@ffwll.ch wrote:

On Mon, Feb 02, 2015 at 03:30:21PM -0500, Rob Clark wrote:

On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:

My initial thought is for dma-buf to not try to prevent something that an exporter can actually do.. I think the scenario you describe could be handled by two sg-lists, if the exporter was clever enough.

That's already needed, each attachment has its own sg-list. After all there's no array of dma_addr_t in the sg tables, so you can't use one sg for more than one mapping. And due to different iommu different devices can easily end up with different addresses.

Well, to be fair it may not be explicitly stated, but currently one should assume the dma_addr_t's in the dmabuf sglist are bogus. With gpu's that implement per-process/context page tables, I'm not really sure that there is a sane way to actually do anything else..

Hm, what does per-process/context page tables have to do here? At least on i915 we have two levels of page tables:

- first level for vm/device isolation, used through dma api
- 2nd level for per-gpu-context isolation and context switching, handled internally.

Since atm the dma api doesn't have any concept of contexts or different pagetables, I don't see how you could use that at all.

Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to drop use of dma-mapping entirely (incl the current call to dma_map_sg, which I just need until we can use drm_cflush on arm), and attach/detach iommu domains directly to implement context switches. At that point, dma_addr_t really has no sensible meaning for me.

BR, -R

-Daniel
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 11:58 AM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

Okay, but switching contexts is not something which the DMA API has any knowledge of (so it can't know which context to associate with which mapping.) While it knows which device, it has no knowledge (nor is there any way for it to gain knowledge) about contexts.

My personal view is that extending the DMA API in this way feels quite dirty - it's a violation of the DMA API design, which is to (a) demark the buffer ownership between CPU and DMA agent, and (b) to translate buffer locations into a cookie which device drivers can use to instruct their device to access that memory. To see why, consider... that you map a buffer to a device in context A, and then you switch to context B, which means the dma_addr_t given previously is no longer valid. You then try to unmap it... which is normally done using the (now no longer valid) dma_addr_t.

It seems to me that to support this at DMA API level, we would need to completely revamp the DMA API, which IMHO isn't going to be nice. (It would mean that we end up with three APIs - the original PCI DMA API, the existing DMA API, and some new DMA API.) Do we have any views on how common this feature is?

I can't think of cases outside of GPU's.. if it were more common I'd be in favor of teaching dma api about multiple contexts, but right now I think that would just amount to forcing a lot of churn on everyone else for the benefit of GPU's. IMHO it makes more sense for GPU drivers to bypass the dma api if they need to. Plus, sooner or later, someone will discover that with some trick or optimization they can get moar fps, but the extra layer of abstraction will just be getting in the way.

BR, -R
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:

My initial thought is for dma-buf to not try to prevent something that an exporter can actually do.. I think the scenario you describe could be handled by two sg-lists, if the exporter was clever enough.

That's already needed, each attachment has its own sg-list. After all there's no array of dma_addr_t in the sg tables, so you can't use one sg for more than one mapping. And due to different iommu different devices can easily end up with different addresses.

Well, to be fair it may not be explicitly stated, but currently one should assume the dma_addr_t's in the dmabuf sglist are bogus. With gpu's that implement per-process/context page tables, I'm not really sure that there is a sane way to actually do anything else..

BR, -R
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Mon, Feb 2, 2015 at 4:46 PM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

On Mon, Feb 02, 2015 at 03:30:21PM -0500, Rob Clark wrote:

On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:

My initial thought is for dma-buf to not try to prevent something that an exporter can actually do.. I think the scenario you describe could be handled by two sg-lists, if the exporter was clever enough.

That's already needed, each attachment has its own sg-list. After all there's no array of dma_addr_t in the sg tables, so you can't use one sg for more than one mapping. And due to different iommu different devices can easily end up with different addresses.

Well, to be fair it may not be explicitly stated, but currently one should assume the dma_addr_t's in the dmabuf sglist are bogus. With gpu's that implement per-process/context page tables, I'm not really sure that there is a sane way to actually do anything else..

That's incorrect - and goes dead against the design of scatterlists.

yeah, a bit of an abuse, although I'm not sure I see a much better way when a device vaddr depends on user context..

Not only that, but it is entirely possible that you may get handed memory via dmabufs for which there are no struct page's associated with that memory - think about display systems which have their own video memory which is accessible to the GPU, but it isn't system memory.

well, I guess anyways when it comes to sharing buffers, it won't be the vram placement of the bo that gets shared ;-)

BR, -R

In those circumstances, you have to use the dma_addr_t's and not the pages.
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Thu, Jan 29, 2015 at 10:47 AM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

On Thu, Jan 29, 2015 at 09:00:11PM +0530, Sumit Semwal wrote:

So, short answer is, it is left to the exporter to decide. The dma-buf framework should not even attempt to decide or enforce any of the above. At each dma_buf_attach(), there's a callback to the exporter, where the exporter can decide, if it intends to handle these kind of cases, on the best way forward. The exporter might, for example, decide to migrate backing storage,

That's a decision which the exporter can not take. Think about it... If subsystem Y has mapped the buffer, it could be accessing the buffer's backing storage at the same time that subsystem Z tries to attach to the buffer.

The *theory* is that Y is map/unmap'ing the buffer around each use, so there will be some point where things could be migrated and remapped.. in practice, I am not sure that anyone is doing this yet. Probably it would be reasonable if a more restrictive subsystem tried to attach after the buffer was already allocated and mapped in a way that doesn't meet the new constraints, then -EBUSY. But from a quick look it seems like there needs to be a slight fixup to not return 0 if calc_constraints() fails..

Once the buffer has been exported to another user, the exporter has effectively lost control over mediating accesses to that buffer. All that it can do with the way the dma-buf API is today is to allocate a _different_ scatter list pointing at the same backing storage which satisfies the segment size and number of segments, etc.

There's also another issue which you haven't addressed. What if several attachments result in lowering max_segment_size and max_segment_count such that:

max_segment_size * max_segment_count < dmabuf->size

but individually, the attachments allow dmabuf->size to be represented as a scatterlist?

Quite possibly for some of these edge cases, some of the dma-buf exporters are going to need to get more clever (ie. hand off different scatterlists to different clients). Although I think by far the two common cases will be "I can support anything via an iommu/mmu" and "I need phys contig". But that isn't an issue w/ dma-buf itself, so much as it is an issue w/ drivers. I guess there would be more interest in fixing up drivers when actual hw comes along that needs it..

BR, -R

If an exporter were to take notice of the max_segment_size and max_segment_count, the resulting buffer is basically unrepresentable as a scatterlist.

Please consider the possible sequences of use (such as the scenario above) when creating or augmenting an API.

I tried to think of the scenarios I could think of, but if you still feel this approach doesn't help with your concerns, I'll graciously accept advice to improve it.

See the new one above :)
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Thu, Jan 29, 2015 at 2:26 PM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

On Thu, Jan 29, 2015 at 01:52:09PM -0500, Rob Clark wrote:

Quite possibly for some of these edge cases, some of the dma-buf exporters are going to need to get more clever (ie. hand off different scatterlists to different clients). Although I think by far the two common cases will be "I can support anything via an iommu/mmu" and "I need phys contig". But that isn't an issue w/ dma-buf itself, so much as it is an issue w/ drivers. I guess there would be more interest in fixing up drivers when actual hw comes along that needs it..

However, validating the attachments is the business of dma-buf. This is actual infrastructure, which should ensure some kind of sanity such as the issues I've raised.

My initial thought is for dma-buf to not try to prevent something that an exporter can actually do.. I think the scenario you describe could be handled by two sg-lists, if the exporter was clever enough.

That all said, I think probably all the existing exporters cache the sg-list. And I can't think of any actual hw which would hit this problem that can be solved by multiple sg-lists for the same physical memory. (And the constraint calculation kind of assumes the end result will be a single sg-list.) So it seems reasonable to me to check that max_segment_count * max_segment_size is not smaller than the buffer. If it was a less theoretical problem, I'd be more inclined toward a way that the exporter could override the checks, or something along those lines.

otoh, if the attachment is just not possible because the buffer has been already allocated and mapped by someone with more relaxed constraints.. then I think the driver should be the one returning the error since dma-buf doesn't know this.
The whole "we can push it onto our users" approach is really not on - what that results in is the users ignoring most of the requirements and just doing their own thing, which ultimately ends up with the whole thing turning into a disgusting mess - one which becomes very difficult to fix later.

Ideally at some point, dma-mapping or some helpers would support allocations matching constraints.. I think only actual gpu drivers want to do crazy enough things that they'd want to bypass dma-mapping. If everyone else can use dma-mapping and/or helpers then we make it harder for drivers to do the wrong thing than to do the right thing.

Now, if we're going to do the more clever thing you mention above, that rather negates the point of this two-part patch set, which is to provide the union of the DMA capabilities of all users. A union in that case is no longer sane as we'd be tailoring the SG lists to each user.

It doesn't really negate.. a different sg list representing the same physical memory cannot suddenly make the buffer physically contiguous (from the perspective of memory).. (unless we are not on the same page here, so to speak)

BR, -R

If we aren't going to do the more clever thing, then yes, we need this code to calculate that union, but we _also_ need it to do sanity checking right from the start, and refuse conditions which ultimately break the ability to make use of that union - in other words, when the union of the DMA capabilities means that the dmabuf can't be represented. Unless we do that, we'll just end up with random drivers interpreting what they want from the DMA capabilities, and we'll have some drivers exporting (eg) scatterlists which satisfy the maximum byte size of an element, but ignoring the maximum number of entries or vice versa, and that'll most probably hide the case of too small a union.
It really doesn't make sense to do both either: that route is even more madness, because we'll end up with two classes of drivers - those which use the union approach, and those which don't. The KISS principle applies here.
Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Thu, Jan 29, 2015 at 5:31 PM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

On Thu, Jan 29, 2015 at 05:18:33PM -0500, Rob Clark wrote:

On Thu, Jan 29, 2015 at 2:26 PM, Russell King - ARM Linux li...@arm.linux.org.uk wrote:

Now, if we're going to do the more clever thing you mention above, that rather negates the point of this two-part patch set, which is to provide the union of the DMA capabilities of all users. A union in that case is no longer sane as we'd be tailoring the SG lists to each user.

It doesn't really negate.. a different sg list representing the same physical memory cannot suddenly make the buffer physically contiguous (from the perspective of memory).. (unless we are not on the same page here, so to speak)

If we are really only interested in the physically contiguous vs scattered differentiation, why can't this be just a simple flag?

I'd be fine with that.. I was trying to make it a bit less of a point solution, but maybe trying to be too generic is not worth it.. There is apparently some hw which has iommu's but small # of tlb entries, and would prefer partially contiguous buffers. But that isn't a hard constraint, and maybe shouldn't be solved w/ max_segment_count. And I'm not sure how common that is.

I think I know where you're coming from on that distinction - most GPUs can cope with their buffers being discontiguous in memory, but scanout and capture hardware tends to need contiguous buffers. My guess is that you're looking for some way that a GPU driver could allocate a buffer, which can then be imported into the scanout hardware - and when it is, the underlying backing store is converted to a contiguous buffer. Is that the usage scenario you're thinking of?

Pretty much.. and maybe a few slight permutations on that involving cameras / video codecs / etc. But the really-really common case is gpu (with mmu/iommu) + display (without). Just solving this problem would be a really good first step.
BR, -R
Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)
, 0 if the wait timed out, or the
+ * remaining timeout in jiffies on success.
+ */
+long

Shouldn't this be signed to be explicit?

+fence_default_wait(struct fence *fence, bool intr, signed long timeout)
+{
+	struct default_wait_cb cb;
+	unsigned long flags;
+	long ret = timeout;
+	bool was_set;
+
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return timeout;
+
+	spin_lock_irqsave(fence->lock, flags);
+
+	if (intr && signal_pending(current)) {
+		ret = -ERESTARTSYS;
+		goto out;
+	}
+
+	was_set = test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
+
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		goto out;
+
+	if (!was_set) {
+		trace_fence_enable_signal(fence);
+
+		if (!fence->ops->enable_signaling(fence)) {
+			__fence_signal(fence);
+			goto out;
+		}
+	}
+
+	cb.base.func = fence_default_wait_cb;
+	cb.task = current;
+	list_add(&cb.base.node, &fence->cb_list);
+
+	while (!test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
+		if (intr)
+			__set_current_state(TASK_INTERRUPTIBLE);
+		else
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_unlock_irqrestore(fence->lock, flags);
+
+		ret = schedule_timeout(ret);
+
+		spin_lock_irqsave(fence->lock, flags);
+		if (ret > 0 && intr && signal_pending(current))
+			ret = -ERESTARTSYS;
+	}
+
+	if (!list_empty(&cb.base.node))
+		list_del(&cb.base.node);
+	__set_current_state(TASK_RUNNING);
+
+out:
+	spin_unlock_irqrestore(fence->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(fence_default_wait);
+
+/**
+ * __fence_init - Initialize a custom fence.
+ * @fence:   [in]  the fence to initialize
+ * @ops:     [in]  the fence_ops for operations on this fence
+ * @lock:    [in]  the irqsafe spinlock to use for locking this fence
+ * @context: [in]  the execution context this fence is run on
+ * @seqno:   [in]  a linear increasing sequence number for this context
+ *
+ * Initializes an allocated fence, the caller doesn't have to keep its
+ * refcount after committing with this fence, but it will need to hold a
+ * refcount again if fence_ops.enable_signaling gets called. This can
+ * be used for implementing other types of fence.
+ *
+ * context and seqno are used for easy comparison between fences, allowing
+ * to check which fence is later by simply using fence_later.
+ */
+void
+__fence_init(struct fence *fence, const struct fence_ops *ops,
+	     spinlock_t *lock, unsigned context, unsigned seqno)
+{
+	BUG_ON(!lock);
+	BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
+	       !ops->get_driver_name || !ops->get_timeline_name);
+
+	kref_init(&fence->refcount);
+	fence->ops = ops;
+	INIT_LIST_HEAD(&fence->cb_list);
+	fence->lock = lock;
+	fence->context = context;
+	fence->seqno = seqno;
+	fence->flags = 0UL;
+
+	trace_fence_init(fence);
+}
+EXPORT_SYMBOL(__fence_init);

Again with the __ exported function... I don't even see a fence_init() function anywhere, why the __ ?

think of it as a 'protected' constructor.. only the derived fence subclass should call.
diff --git a/include/linux/fence.h b/include/linux/fence.h new file mode 100644 index ..65f2a01ee7e4 --- /dev/null +++ b/include/linux/fence.h @@ -0,0 +1,333 @@ +/* + * Fence mechanism for dma-buf to allow for asynchronous dma access + * + * Copyright (C) 2012 Canonical Ltd + * Copyright (C) 2012 Texas Instruments + * + * Authors: + * Rob Clark robdcl...@gmail.com + * Maarten Lankhorst maarten.lankho...@canonical.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __LINUX_FENCE_H +#define __LINUX_FENCE_H + +#include linux/err.h +#include linux/wait.h +#include linux/list.h +#include linux/bitops.h +#include linux/kref.h +#include linux/sched.h +#include linux/printk.h + +struct fence; +struct fence_ops; +struct fence_cb; + +/** + * struct fence - software synchronization primitive + * @refcount: refcount for this fence + * @ops: fence_ops associated with this fence + * @cb_list: list of all
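The return convention of the wait loop above (0 means timed out, > 0 means remaining timeout on success) is also what the timeout=0 discussion at the top of this thread hinges on: a zero timeout must behave as a non-blocking test, yet success still has to be distinguishable from a timeout. A userspace toy model (all names hypothetical, not the kernel API):

```c
#include <stdbool.h>

struct toy_fence { bool signaled; };

/*
 * Mirrors the fence_default_wait() contract: 0 means the wait timed
 * out, > 0 is the remaining timeout on success.  A timeout of 0 is a
 * pure non-blocking query and must never sleep.
 */
static long toy_fence_wait_timeout(const struct toy_fence *f, long timeout)
{
	long remaining = timeout;

	/* stand-in for the schedule_timeout() loop: burn fake jiffies */
	while (!f->signaled && remaining > 0)
		remaining--;

	if (!f->signaled)
		return 0;			/* timed out (or busy, for timeout=0) */
	return remaining > 0 ? remaining : 1;	/* signaled: always report success */
}
```

Note the `: 1` clamp: without it, a fence that is already signaled would return 0 for a zero timeout and be indistinguishable from a timeout, which is exactly the ambiguity Chris suggests resolving with a distinct errno instead.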
Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)
On Thu, Jun 19, 2014 at 1:00 PM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Jun 19, 2014 at 10:00:18AM -0400, Rob Clark wrote: On Wed, Jun 18, 2014 at 9:13 PM, Greg KH gre...@linuxfoundation.org wrote: On Wed, Jun 18, 2014 at 12:36:54PM +0200, Maarten Lankhorst wrote:

+#define CREATE_TRACE_POINTS
+#include <trace/events/fence.h>
+
+EXPORT_TRACEPOINT_SYMBOL(fence_annotate_wait_on);
+EXPORT_TRACEPOINT_SYMBOL(fence_emit);

Are you really willing to live with these as tracepoints for forever? What is the use of them in debugging? Was it just for debugging the fence code, or for something else?

+/**
+ * fence_context_alloc - allocate an array of fence contexts
+ * @num: [in] amount of contexts to allocate
+ *
+ * This function will return the first index of the number of fences allocated.
+ * The fence context is used for setting fence->context to a unique number.
+ */
+unsigned fence_context_alloc(unsigned num)
+{
+	BUG_ON(!num);
+	return atomic_add_return(num, &fence_context_counter) - num;
+}
+EXPORT_SYMBOL(fence_context_alloc);

EXPORT_SYMBOL_GPL()? Same goes for all of the exports in here. Traditionally all of the driver core exports have been with this marking, any objection to making that change here as well?

tbh, I prefer EXPORT_SYMBOL().. well, I'd prefer even more if there wasn't even a need for EXPORT_SYMBOL_GPL(), but sadly it is a fact of life. We already went through this debate once with dma-buf. We aren't going to change $evil_vendor's mind about non-gpl modules. The only result will be a more flugly convoluted solution (ie. use syncpt EXPORT_SYMBOL() on top of fence EXPORT_SYMBOL_GPL()) just as a workaround, with the result that no-one benefits.

It has been proven that using _GPL() exports have caused companies to release their code properly over the years, so as these really are Linux-only apis, please change them to be marked this way, it helps everyone out in the end.

Well, maybe that is true in some cases. But it certainly didn't work out that way for dma-buf. And I think the end result is worse. I don't really like coming down on the side of EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL(), but if we do use EXPORT_SYMBOL_GPL() then the result will only be creative workarounds using the _GPL symbols indirectly by whatever is available via EXPORT_SYMBOL(). I don't really see how that will be better.

+int __fence_signal(struct fence *fence)
+{
+	struct fence_cb *cur, *tmp;
+	int ret = 0;
+
+	if (WARN_ON(!fence))
+		return -EINVAL;
+
+	if (!ktime_to_ns(fence->timestamp)) {
+		fence->timestamp = ktime_get();
+		smp_mb__before_atomic();
+	}
+
+	if (test_and_set_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+		ret = -EINVAL;
+
+		/*
+		 * we might have raced with the unlocked fence_signal,
+		 * still run through all callbacks
+		 */
+	} else
+		trace_fence_signaled(fence);
+
+	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
+		list_del_init(&cur->node);
+		cur->func(fence, cur);
+	}
+	return ret;
+}
+EXPORT_SYMBOL(__fence_signal);

Don't export a function with __ in front of it, you are saying that an internal function is somehow valid for everyone else to call? Why? You aren't even documenting the function, so who knows how to use it?

fwiw, the __ versions appear to mainly be concessions for android syncpt. That is the only user outside of fence.c, and it should stay that way.

How are you going to ensure this? And where did you document it? Please fix this up, it's a horrid way to create a new api. If the android code needs to be fixed to fit into this model, then fix it.

heh, and in fact I was wrong about this.. the __ versions are actually for when the lock is already held. Maarten needs to rename (ie _locked suffix) and add some API docs for this.

+void
+__fence_init(struct fence *fence, const struct fence_ops *ops,
+	     spinlock_t *lock, unsigned context, unsigned seqno)
+{
+	BUG_ON(!lock);
+	BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
+	       !ops->get_driver_name || !ops->get_timeline_name);
+
+	kref_init(&fence->refcount);
+	fence->ops = ops;
+	INIT_LIST_HEAD(&fence->cb_list);
+	fence->lock = lock;
+	fence->context = context;
+	fence->seqno = seqno;
+	fence->flags = 0UL;
+
+	trace_fence_init(fence);
+}
+EXPORT_SYMBOL(__fence_init);

Again with the __ exported function... I don't even see a fence_init() function anywhere, why the __ ?

think of it as a 'protected' constructor.. only the derived fence subclass should call.

Where do you say this? Again, not a good reason, fix up the api please.

+	kref_get(&fence->refcount);
+}

Why
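The `_locked` rename Rob suggests follows a common kernel naming convention: the public entry point takes the lock and delegates to a `*_locked` variant that callers already holding the lock may use directly. A minimal userspace sketch of the pattern (all names hypothetical; a plain flag stands in for the spinlock):

```c
#include <stdbool.h>

struct toy_fence_obj {
	bool lock_held;		/* stand-in for a spinlock */
	bool signaled;
};

/* caller must already hold the lock */
static int toy_fence_signal_locked(struct toy_fence_obj *f)
{
	if (f->signaled)
		return -1;	/* already signaled, mirrors -EINVAL */
	f->signaled = true;
	return 0;
}

/* public entry point: takes the lock, then delegates */
static int toy_fence_signal(struct toy_fence_obj *f)
{
	int ret;

	f->lock_held = true;	/* spin_lock_irqsave() */
	ret = toy_fence_signal_locked(f);
	f->lock_held = false;	/* spin_unlock_irqrestore() */
	return ret;
}
```

The suffix documents the locking contract in the name itself, which is the API-documentation concern Greg raises about the bare `__` prefix.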
Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)
On Thu, Jun 19, 2014 at 2:19 PM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Jun 19, 2014 at 01:45:30PM -0400, Rob Clark wrote: On Thu, Jun 19, 2014 at 1:00 PM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Jun 19, 2014 at 10:00:18AM -0400, Rob Clark wrote: On Wed, Jun 18, 2014 at 9:13 PM, Greg KH gre...@linuxfoundation.org wrote: On Wed, Jun 18, 2014 at 12:36:54PM +0200, Maarten Lankhorst wrote:

[snip - tracepoint export and fence_context_alloc() hunks, quoted in full earlier in the thread]

EXPORT_SYMBOL_GPL()? Same goes for all of the exports in here. Traditionally all of the driver core exports have been with this marking, any objection to making that change here as well?

tbh, I prefer EXPORT_SYMBOL().. well, I'd prefer even more if there wasn't even a need for EXPORT_SYMBOL_GPL(), but sadly it is a fact of life. We already went through this debate once with dma-buf. We aren't going to change $evil_vendor's mind about non-gpl modules. The only result will be a more flugly convoluted solution (ie. use syncpt EXPORT_SYMBOL() on top of fence EXPORT_SYMBOL_GPL()) just as a workaround, with the result that no-one benefits.

It has been proven that using _GPL() exports have caused companies to release their code properly over the years, so as these really are Linux-only apis, please change them to be marked this way, it helps everyone out in the end.

Well, maybe that is true in some cases. But it certainly didn't work out that way for dma-buf. And I think the end result is worse. I don't really like coming down on the side of EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL(), but if we do use EXPORT_SYMBOL_GPL() then the result will only be creative workarounds using the _GPL symbols indirectly by whatever is available via EXPORT_SYMBOL(). I don't really see how that will be better.

You are saying that you _know_ companies will violate our license, so you should just give up? And how do you know people aren't working on preventing those indirect usages as well? :)

Well, all I know is what happened with dmabuf. This seems like the exact same scenario (same vendor, same driver, same use-case). Not really sure how we could completely prevent indirect usage, given that drm core and many of the drivers are dual MIT/GPL. (But ofc, IANAL.)

Sorry, I'm not going to give up here, again, it has proven to work in the past in changing the ways of _very_ large companies, why stop now?

In the general case, I would agree. But in this specific case, I am not very optimistic. That said, it isn't really my loss if it is _GPL().. I don't have to use or support that particular driver. But given that we have some history from the same debate with dma-buf, I think it is pretty easy to infer the result from making fence EXPORT_SYMBOL_GPL().

BR, -R

thanks, greg k-h
Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)
On Thu, Jun 19, 2014 at 5:50 PM, Dave Airlie airl...@gmail.com wrote: On 20 June 2014 04:19, Greg KH gre...@linuxfoundation.org wrote: On Thu, Jun 19, 2014 at 01:45:30PM -0400, Rob Clark wrote: On Thu, Jun 19, 2014 at 1:00 PM, Greg KH gre...@linuxfoundation.org wrote: On Thu, Jun 19, 2014 at 10:00:18AM -0400, Rob Clark wrote: On Wed, Jun 18, 2014 at 9:13 PM, Greg KH gre...@linuxfoundation.org wrote: On Wed, Jun 18, 2014 at 12:36:54PM +0200, Maarten Lankhorst wrote:

[snip - tracepoint exports, fence_context_alloc() and the EXPORT_SYMBOL() vs EXPORT_SYMBOL_GPL() exchange, quoted in full earlier in the thread]

You are saying that you _know_ companies will violate our license, so you should just give up? And how do you know people aren't working on preventing those indirect usages as well? :)

Sorry, I'm not going to give up here, again, it has proven to work in the past in changing the ways of _very_ large companies, why stop now?

I've found large companies shipping lots of hw putting pressure on other large/small companies seems to be the only way this has ever happened; we'd like to cover that up and say it's some great GPL enforcement thing. To be honest, author's choice is how I'd treat this.

Personally I think _GPL is broken by design, and that Linus's initial point for them has been so diluted by random lobby groups asking for every symbol to be _GPL that they are becoming effectively pointless now. I also dislike the fact that the lobby groups don't just bring violators to court. I'm also sure someone like the LF could have a nice income stream if Linus gave them permission to enforce his copyrights.

But anyways, has someone checked that iOS or Windows don't have a fence interface? So we know that this is a Linux-only interface and any works using it are derived? Say the nvidia driver isn't a derived work now, will using this interface magically translate it into a derived work, so we can go sue them? I don't think so.

I've no idea about what the APIs are in windows, but windows has had multi-gpu support for a *long* time, which implies some mechanism like dmabuf and fence.. this isn't exactly an area where we are trailblazing here.

BR, -R

But it's up to Maarten and Rob, and if they say no _GPL then I don't think we should be overriding the authors' intents.

Dave.
Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)
;
+
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return timeout;
+
+	spin_lock_irqsave(fence->lock, flags);
+
+	if (intr && signal_pending(current)) {
+		ret = -ERESTARTSYS;
+		goto out;
+	}
+
+	was_set = test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
+
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		goto out;
+
+	if (!was_set) {
+		trace_fence_enable_signal(fence);
+
+		if (!fence->ops->enable_signaling(fence)) {
+			__fence_signal(fence);
+			goto out;
+		}
+	}
+
+	cb.base.func = fence_default_wait_cb;
+	cb.task = current;
+	list_add(&cb.base.node, &fence->cb_list);
+
+	while (!test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
+		if (intr)
+			__set_current_state(TASK_INTERRUPTIBLE);
+		else
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_unlock_irqrestore(fence->lock, flags);
+
+		ret = schedule_timeout(ret);
+
+		spin_lock_irqsave(fence->lock, flags);
+		if (ret > 0 && intr && signal_pending(current))
+			ret = -ERESTARTSYS;
+	}
+
+	if (!list_empty(&cb.base.node))
+		list_del(&cb.base.node);
+	__set_current_state(TASK_RUNNING);
+
+out:
+	spin_unlock_irqrestore(fence->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(fence_default_wait);
+
+/**
+ * __fence_init - Initialize a custom fence.
+ * @fence: [in] the fence to initialize
+ * @ops: [in] the fence_ops for operations on this fence
+ * @lock: [in] the irqsafe spinlock to use for locking this fence
+ * @context: [in] the execution context this fence is run on
+ * @seqno: [in] a linear increasing sequence number for this context
+ *
+ * Initializes an allocated fence, the caller doesn't have to keep its
+ * refcount after committing with this fence, but it will need to hold a
+ * refcount again if fence_ops.enable_signaling gets called. This can
+ * be used for implementing other types of fence.
+ *
+ * context and seqno are used for easy comparison between fences, allowing
+ * to check which fence is later by simply using fence_later.
+ */
+void
+__fence_init(struct fence *fence, const struct fence_ops *ops,
+	     spinlock_t *lock, unsigned context, unsigned seqno)
+{
+	BUG_ON(!lock);
+	BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
+	       !ops->get_driver_name || !ops->get_timeline_name);
+
+	kref_init(&fence->refcount);
+	fence->ops = ops;
+	INIT_LIST_HEAD(&fence->cb_list);
+	fence->lock = lock;
+	fence->context = context;
+	fence->seqno = seqno;
+	fence->flags = 0UL;
+
+	trace_fence_init(fence);
+}
+EXPORT_SYMBOL(__fence_init);

Again with the __ exported function... I don't even see a fence_init() function anywhere, why the __ ?

diff --git a/include/linux/fence.h b/include/linux/fence.h
new file mode 100644
index ..65f2a01ee7e4
--- /dev/null
+++ b/include/linux/fence.h
@@ -0,0 +1,333 @@
+/*
+ * Fence mechanism for dma-buf to allow for asynchronous dma access
+ *
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark robdcl...@gmail.com
+ * Maarten Lankhorst maarten.lankho...@canonical.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef __LINUX_FENCE_H
+#define __LINUX_FENCE_H
+
+#include <linux/err.h>
+#include <linux/wait.h>
+#include <linux/list.h>
+#include <linux/bitops.h>
+#include <linux/kref.h>
+#include <linux/sched.h>
+#include <linux/printk.h>
+
+struct fence;
+struct fence_ops;
+struct fence_cb;
+
+/**
+ * struct fence - software synchronization primitive
+ * @refcount: refcount for this fence
+ * @ops: fence_ops associated with this fence
+ * @cb_list: list of all callbacks to call
+ * @lock: spin_lock_irqsave used for locking
+ * @context: execution context this fence belongs to, returned by
+ *           fence_context_alloc()
+ * @seqno: the sequence number of this fence inside the execution context,
+ *         can be compared to decide which fence would be signaled later.
+ * @flags: A mask of FENCE_FLAG_* defined below
+ * @timestamp: Timestamp when the fence was signaled
Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)
to bool. Rework dma_fence_wait. Added dma_fence_is_signaled and dma_fence_wait_timeout. s/dma// and change exports to non GPL. Added fence_is_signaled and fence_enable_sw_signaling calls, add ability to override default wait operation.

v10: remove event_queue, use a custom list, export try_to_wake_up from scheduler. Remove fence lock and use a global spinlock instead, this should hopefully remove all the locking headaches I was having on trying to implement this. enable_signaling is called with this lock held.

v11: Use atomic ops for flags, lifting the need for some spin_lock_irqsaves. However I kept the guarantee that after fence_signal returns, it is guaranteed that enable_signaling has either been called to completion, or will not be called any more. Add contexts and seqno to base fence implementation. This allows you to wait for fewer fences, by testing for seqno + signaled, and then only waiting on the later fence. Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier. A CONFIG_DEBUG_FENCE option will be added to turn off the FENCE_TRACE spam, and another runtime option can turn it off at runtime.

v12: Add CONFIG_FENCE_TRACE. Add missing documentation for the fence->context and fence->seqno members.

v13: Fixup CONFIG_FENCE_TRACE kconfig description. Move fence_context_alloc to fence. Simplify fence_later. Kill priv member to fence_cb.

v14: Remove priv argument from fence_add_callback, oops!

v15: Remove priv from documentation. Explicitly include linux/atomic.h.

v16: Add trace events. Import changes required by android syncpoints.

v17: Use wake_up_state instead of try_to_wake_up. (Colin Cross) Fix up commit description for seqno_fence. (Rob Clark)

Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
Signed-off-by: Thierry Reding thierry.red...@gmail.com #use smp_mb__before_atomic()
Reviewed-by: Rob Clark robdcl...@gmail.com
---
 Documentation/DocBook/device-drivers.tmpl |   2
 drivers/base/Kconfig                      |   9 +
 drivers/base/Makefile                     |   2
 drivers/base/fence.c                      | 416 +
 include/linux/fence.h                     | 333 +++
 include/trace/events/fence.h              | 128 +
 6 files changed, 889 insertions(+), 1 deletion(-)
 create mode 100644 drivers/base/fence.c
 create mode 100644 include/linux/fence.h
 create mode 100644 include/trace/events/fence.h

Who is going to sign up to maintain this code? (hint, it's not me...)

that would be Sumit (dma-buf tree).. probably we should move fence/reservation/dma-buf into drivers/dma-buf (or something approximately like that)

BR, -R

thanks, greg k-h
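The context/seqno comparison the changelog mentions (v11's "only wait on the later fence" via fence_later) can be sketched as follows. The names are hypothetical; the point is the signed subtraction, which keeps the comparison correct across unsigned seqno wraparound, and that fences from different contexts have no global ordering:

```c
#include <stddef.h>

struct toy_ctx_fence {
	unsigned int context;	/* from a fence_context_alloc()-style counter */
	unsigned int seqno;	/* monotonically increasing per context */
};

/*
 * Return whichever fence signals later, or NULL if the fences belong to
 * different contexts and are therefore unordered.
 */
static const struct toy_ctx_fence *
toy_fence_later(const struct toy_ctx_fence *a, const struct toy_ctx_fence *b)
{
	if (a->context != b->context)
		return NULL;
	/* signed difference survives u32 wraparound */
	return (int)(a->seqno - b->seqno) > 0 ? a : b;
}
```

This is why waiting on the later of two same-context fences is sufficient: a seqno comparison plus the signaled bit orders them without taking any locks.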
Re: [PATCH 6/6] dma-buf: add poll support, v2
On Mon, Feb 17, 2014 at 10:58 AM, Maarten Lankhorst maarten.lankho...@canonical.com wrote:

Thanks to Fengguang Wu for spotting a missing static cast.

v2:
- Kill unused variable need_shared.

Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
---
 drivers/base/dma-buf.c  | 101 +++
 include/linux/dma-buf.h |  12 ++
 2 files changed, 113 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 85e792c2c909..77ea621ab59d 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -30,6 +30,7 @@
 #include <linux/export.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
+#include <linux/poll.h>
 #include <linux/reservation.h>

 static inline int is_dma_buf_file(struct file *);
@@ -52,6 +53,13 @@ static int dma_buf_release(struct inode *inode, struct file *file)
 	BUG_ON(dmabuf->vmapping_counter);

+	/*
+	 * Any fences that a dma-buf poll can wait on should be signaled
+	 * before releasing dma-buf. This is the responsibility of each
+	 * driver that uses the reservation objects.

minor nit.. but wouldn't hurt to mention in the comment that if you hit this BUG_ON it is because someone isn't holding a ref to the dmabuf while there is a pending fence

+	 */
+	BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
+
 	dmabuf->ops->release(dmabuf);

 	mutex_lock(&db_list.lock);
@@ -108,10 +116,99 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
 	return base + offset;
 }

+static void dma_buf_poll_cb(struct fence *fence, struct fence_cb *cb)
+{
+	struct dma_buf_poll_cb_t *dcb = (struct dma_buf_poll_cb_t *)cb;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dcb->poll->lock, flags);
+	wake_up_locked_poll(dcb->poll, dcb->active);
+	dcb->active = 0;
+	spin_unlock_irqrestore(&dcb->poll->lock, flags);
+}
+
+static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
+{
+	struct dma_buf *dmabuf;
+	struct reservation_object *resv;
+	unsigned long events;
+
+	dmabuf = file->private_data;
+	if (!dmabuf || !dmabuf->resv)
+		return POLLERR;
+
+	resv = dmabuf->resv;
+
+	poll_wait(file, &dmabuf->poll, poll);
+
+	events = poll_requested_events(poll) & (POLLIN | POLLOUT);
+	if (!events)
+		return 0;
+
+	ww_mutex_lock(&resv->lock, NULL);
+
+	if (resv->fence_excl && (!(events & POLLOUT) ||
+				 resv->fence_shared_count == 0)) {
+		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
+		unsigned long pevents = POLLIN;
+
+		if (resv->fence_shared_count == 0)
+			pevents |= POLLOUT;
+
+		spin_lock_irq(&dmabuf->poll.lock);
+		if (dcb->active) {
+			dcb->active |= pevents;
+			events &= ~pevents;
+		} else
+			dcb->active = pevents;
+		spin_unlock_irq(&dmabuf->poll.lock);
+
+		if (events & pevents) {
+			if (!fence_add_callback(resv->fence_excl,
+						&dcb->cb, dma_buf_poll_cb))
+				events &= ~pevents;
+			else
+				// No callback queued, wake up any additional waiters.

couple spots w/ c++ comments, which I assume you didn't mean to leave?

Anyways, other than those minor things,

Reviewed-by: Rob Clark robdcl...@gmail.com

+				dma_buf_poll_cb(NULL, &dcb->cb);
+		}
+	}
+
+	if ((events & POLLOUT) && resv->fence_shared_count > 0) {
+		struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
+		int i;
+
+		/* Only queue a new callback if no event has fired yet */
+		spin_lock_irq(&dmabuf->poll.lock);
+		if (dcb->active)
+			events &= ~POLLOUT;
+		else
+			dcb->active = POLLOUT;
+		spin_unlock_irq(&dmabuf->poll.lock);
+
+		if (!(events & POLLOUT))
+			goto out;
+
+		for (i = 0; i < resv->fence_shared_count; ++i)
+			if (!fence_add_callback(resv->fence_shared[i],
+						&dcb->cb, dma_buf_poll_cb)) {
+				events &= ~POLLOUT;
+				break;
+			}
+
+		// No callback queued, wake up any additional waiters.
+		if (i == resv->fence_shared_count)
+			dma_buf_poll_cb(NULL, &dcb->cb);
+	}
+
+out:
+	ww_mutex_unlock(&resv->lock);
+	return events;
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_release,
 	.mmap
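From userspace, the new file operation means a dma-buf fd works with plain poll(2): per the patch, POLLIN becomes ready when the exclusive fence signals, and POLLOUT when all fences (shared and exclusive) have signaled. A sketch of the consumer side (the helper name is made up; the event semantics are from the patch above):

```c
#include <poll.h>

/*
 * Wait until a dma-buf is idle enough for the requested access:
 * returns > 0 when ready, 0 on timeout, < 0 on error (see poll(2)).
 * Pass POLLIN to wait for the exclusive fence (pending writes done),
 * POLLOUT to wait for all fences (safe to write).
 */
static int wait_dmabuf(int fd, short events, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = events };

	return poll(&pfd, 1, timeout_ms);
}
```

A `timeout_ms` of 0 gives a non-blocking busy check, which is the same "test without waiting" semantic discussed for `reservation_object_wait_timeout_rcu()` at the top of this page.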
Re: [PATCH 3/6] dma-buf: use reservation objects
if maybe we just want to promote the 'struct reservation_object' ptr into 'struct drm_gem_object' so we can have a common get_prime_res_obj fxn for everyone using GEM? Anyways, that only matters within drivers/gpu/drm so easy enough to change it later.. so for the drm/fence/reservation/dmabuf bits:

Reviewed-by: Rob Clark robdcl...@gmail.com

+
+	return dma_buf_export(obj, &drm_gem_prime_dmabuf_ops, obj->size,
+			      flags, robj);
 }
 EXPORT_SYMBOL(drm_gem_prime_export);
diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
index 59827cc5e770..b5e89f46326e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
@@ -187,7 +187,7 @@ struct dma_buf *exynos_dmabuf_prime_export(struct drm_device *drm_dev,
 	struct exynos_drm_gem_obj *exynos_gem_obj = to_exynos_gem_obj(obj);

 	return dma_buf_export(obj, &exynos_dmabuf_ops,
-			      exynos_gem_obj->base.size, flags);
+			      exynos_gem_obj->base.size, flags, NULL);
 }

 struct drm_gem_object *exynos_dmabuf_prime_import(struct drm_device *drm_dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 9bb533e0d762..ea66f40e95b3 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -233,7 +233,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = {
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 				      struct drm_gem_object *gem_obj, int flags)
 {
-	return dma_buf_export(gem_obj, &i915_dmabuf_ops, gem_obj->size, flags);
+	return dma_buf_export(gem_obj, &i915_dmabuf_ops, gem_obj->size, flags, NULL);
 }

 static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 78c8e7146d56..2a15c8e8d199 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -816,6 +816,7 @@ driver = {
 	.gem_prime_export = drm_gem_prime_export,
 	.gem_prime_import = drm_gem_prime_import,
 	.gem_prime_pin = nouveau_gem_prime_pin,
+	.gem_prime_res_obj = nouveau_gem_prime_res_obj,
 	.gem_prime_unpin = nouveau_gem_prime_unpin,
 	.gem_prime_get_sg_table = nouveau_gem_prime_get_sg_table,
 	.gem_prime_import_sg_table = nouveau_gem_prime_import_sg_table,
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.h b/drivers/gpu/drm/nouveau/nouveau_gem.h
index 7caca057bc38..ddab762d81fe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.h
@@ -35,6 +35,7 @@ extern int nouveau_gem_ioctl_info(struct drm_device *, void *,
 				  struct drm_file *);
 extern int nouveau_gem_prime_pin(struct drm_gem_object *);
+struct reservation_object *nouveau_gem_prime_res_obj(struct drm_gem_object *);
 extern void nouveau_gem_prime_unpin(struct drm_gem_object *);
 extern struct sg_table *nouveau_gem_prime_get_sg_table(struct drm_gem_object *);
 extern struct drm_gem_object *nouveau_gem_prime_import_sg_table(
diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
index 51a2cb102b44..1f51008e4d26 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -102,3 +102,10 @@ void nouveau_gem_prime_unpin(struct drm_gem_object *obj)
 	nouveau_bo_unpin(nvbo);
 }
+
+struct reservation_object *nouveau_gem_prime_res_obj(struct drm_gem_object *obj)
+{
+	struct nouveau_bo *nvbo = nouveau_gem_object(obj);
+
+	return nvbo->bo.resv;
+}
diff --git a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
index 4fcca8d42796..a2dbfb1737b4 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
@@ -171,7 +171,7 @@ static struct dma_buf_ops omap_dmabuf_ops = {
 struct dma_buf *omap_gem_prime_export(struct drm_device *dev,
 				      struct drm_gem_object *obj, int flags)
 {
-	return dma_buf_export(obj, &omap_dmabuf_ops, obj->size, flags);
+	return dma_buf_export(obj, &omap_dmabuf_ops, obj->size, flags, NULL);
 }

 struct drm_gem_object *omap_gem_prime_import(struct drm_device *dev,
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 84a1bbb75f91..c15c1a1996fc 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -128,6 +128,7 @@ struct drm_gem_object *radeon_gem_prime_import_sg_table(struct drm_device *dev,
 							struct sg_table *sg);
 int radeon_gem_prime_pin(struct drm_gem_object *obj);
 void radeon_gem_prime_unpin(struct drm_gem_object *obj
Re: [PATCH 5/6] reservation: add support for fences to enable cross-device synchronisation
On Mon, Feb 17, 2014 at 10:58 AM, Maarten Lankhorst maarten.lankho...@canonical.com wrote:

Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com

Reviewed-by: Rob Clark robdcl...@gmail.com

---
 include/linux/reservation.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/reservation.h b/include/linux/reservation.h
index 813dae960ebd..92c4851b5a39 100644
--- a/include/linux/reservation.h
+++ b/include/linux/reservation.h
@@ -6,7 +6,7 @@
  * Copyright (C) 2012 Texas Instruments
  *
  * Authors:
- * Rob Clark rob.cl...@linaro.org
+ * Rob Clark robdcl...@gmail.com
  * Maarten Lankhorst maarten.lankho...@canonical.com
  * Thomas Hellstrom thellstrom-at-vmware-dot-com
  *
@@ -40,22 +40,38 @@
 #define _LINUX_RESERVATION_H

 #include <linux/ww_mutex.h>
+#include <linux/fence.h>

 extern struct ww_class reservation_ww_class;

 struct reservation_object {
 	struct ww_mutex lock;
+
+	struct fence *fence_excl;
+	struct fence **fence_shared;
+	u32 fence_shared_count, fence_shared_max;
 };

 static inline void
 reservation_object_init(struct reservation_object *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
+
+	obj->fence_shared_count = obj->fence_shared_max = 0;
+	obj->fence_shared = NULL;
+	obj->fence_excl = NULL;
 }

 static inline void
 reservation_object_fini(struct reservation_object *obj)
 {
+	int i;
+
+	if (obj->fence_excl)
+		fence_put(obj->fence_excl);
+	for (i = 0; i < obj->fence_shared_count; ++i)
+		fence_put(obj->fence_shared[i]);
+
 	ww_mutex_destroy(&obj->lock);
 }
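A userspace sketch (hypothetical names, not the kernel API) of the bookkeeping this struct implies: one exclusive slot plus a growable shared array, where each added fence takes a reference, matching reservation_object_fini() dropping one reference per entry:

```c
#include <stdlib.h>

struct toy_res_fence { int refcount; };

struct toy_resv {
	struct toy_res_fence *fence_excl;
	struct toy_res_fence **fence_shared;
	unsigned int fence_shared_count, fence_shared_max;
};

/* add a shared fence, growing the array as needed; returns 0 or -1 */
static int toy_resv_add_shared(struct toy_resv *obj, struct toy_res_fence *f)
{
	if (obj->fence_shared_count == obj->fence_shared_max) {
		unsigned int max = obj->fence_shared_max ?
				   obj->fence_shared_max * 2 : 4;
		struct toy_res_fence **s =
			realloc(obj->fence_shared, max * sizeof(*s));

		if (!s)
			return -1;
		obj->fence_shared = s;
		obj->fence_shared_max = max;
	}
	f->refcount++;	/* fence_get(): resv now owns a reference */
	obj->fence_shared[obj->fence_shared_count++] = f;
	return 0;
}
```

Keeping `fence_shared_max` separate from `fence_shared_count` amortizes the reallocation cost, which matters because shared fences are added on every read access to the buffer.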
Re: [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)
On Mon, Feb 17, 2014 at 10:56 AM, Maarten Lankhorst maarten.lankho...@canonical.com wrote: This type of fence can be used with hardware synchronization for simple hardware that can block execution until the condition (dma_buf[offset] - value) = 0 has been met. A software fallback still has to be provided in case the fence is used with a device that doesn't support this mechanism. It is useful to expose this for graphics cards that have an op to support this. Some cards like i915 can export those, but don't have an option to wait, so they need the software fallback. I extended the original patch by Rob Clark. v1: Original v2: Renamed from bikeshed to seqno, moved into dma-fence.c since not much was left of the file. Lots of documentation added. v3: Use fence_ops instead of custom callbacks. Moved to own file to avoid circular dependency between dma-buf.h and fence.h v4: Add spinlock pointer to seqno_fence_init Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com Reviewed-by: Rob Clark robdcl...@gmail.com --- Documentation/DocBook/device-drivers.tmpl |1 drivers/base/fence.c | 50 + include/linux/seqno-fence.h | 109 + 3 files changed, 160 insertions(+) create mode 100644 include/linux/seqno-fence.h diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index 7a0c9ddb4818..8c85c20942c2 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -131,6 +131,7 @@ X!Edrivers/base/interface.c !Edrivers/base/dma-buf.c !Edrivers/base/fence.c !Iinclude/linux/fence.h +!Iinclude/linux/seqno-fence.h !Edrivers/base/reservation.c !Iinclude/linux/reservation.h !Edrivers/base/dma-coherent.c diff --git a/drivers/base/fence.c b/drivers/base/fence.c index 12df2bf62034..cd0937127a89 100644 --- a/drivers/base/fence.c +++ b/drivers/base/fence.c @@ -25,6 +25,7 @@ #include linux/export.h #include linux/atomic.h #include linux/fence.h +#include linux/seqno-fence.h #define CREATE_TRACE_POINTS 
#include <trace/events/fence.h>

@@ -413,3 +414,52 @@ __fence_init(struct fence *fence, const struct fence_ops *ops,
 	trace_fence_init(fence);
 }
 EXPORT_SYMBOL(__fence_init);
+
+static const char *seqno_fence_get_driver_name(struct fence *fence)
+{
+	struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+	return seqno_fence->ops->get_driver_name(fence);
+}
+
+static const char *seqno_fence_get_timeline_name(struct fence *fence)
+{
+	struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+	return seqno_fence->ops->get_timeline_name(fence);
+}
+
+static bool seqno_enable_signaling(struct fence *fence)
+{
+	struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+	return seqno_fence->ops->enable_signaling(fence);
+}
+
+static bool seqno_signaled(struct fence *fence)
+{
+	struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+	return seqno_fence->ops->signaled && seqno_fence->ops->signaled(fence);
+}
+
+static void seqno_release(struct fence *fence)
+{
+	struct seqno_fence *f = to_seqno_fence(fence);
+
+	dma_buf_put(f->sync_buf);
+	if (f->ops->release)
+		f->ops->release(fence);
+	else
+		kfree(f);
+}
+
+static long seqno_wait(struct fence *fence, bool intr, signed long timeout)
+{
+	struct seqno_fence *f = to_seqno_fence(fence);
+	return f->ops->wait(fence, intr, timeout);
+}
+
+const struct fence_ops seqno_fence_ops = {
+	.get_driver_name = seqno_fence_get_driver_name,
+	.get_timeline_name = seqno_fence_get_timeline_name,
+	.enable_signaling = seqno_enable_signaling,
+	.signaled = seqno_signaled,
+	.wait = seqno_wait,
+	.release = seqno_release,
+};
+EXPORT_SYMBOL(seqno_fence_ops);

diff --git a/include/linux/seqno-fence.h b/include/linux/seqno-fence.h
new file mode 100644
index ..952f7909128c
--- /dev/null
+++ b/include/linux/seqno-fence.h
@@ -0,0 +1,109 @@
+/*
+ * seqno-fence, using a dma-buf to synchronize fencing
+ *
+ * Copyright (C) 2012 Texas Instruments
+ * Copyright (C) 2012 Canonical Ltd
+ * Authors:
+ * Rob Clark robdcl...@gmail.com
+ * Maarten Lankhorst maarten.lankho...@canonical.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program
Re: [PATCH 1/6] fence: dma-buf cross-device synchronization (v17)
dma_fence_is_signaled and dma_fence_wait_timeout. s/dma// and change exports to non GPL. Added fence_is_signaled and fence_enable_sw_signaling calls, add ability to override default wait operation. v10: remove event_queue, use a custom list, export try_to_wake_up from scheduler. Remove fence lock and use a global spinlock instead, this should hopefully remove all the locking headaches I was having on trying to implement this. enable_signaling is called with this lock held. v11: Use atomic ops for flags, lifting the need for some spin_lock_irqsaves. However I kept the guarantee that after fence_signal returns, it is guaranteed that enable_signaling has either been called to completion, or will not be called any more. Add contexts and seqno to base fence implementation. This allows you to wait for less fences, by testing for seqno + signaled, and then only wait on the later fence. Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier. An CONFIG_DEBUG_FENCE will be added to turn off the FENCE_TRACE spam, and another runtime option can turn it off at runtime. v12: Add CONFIG_FENCE_TRACE. Add missing documentation for the fence-context and fence-seqno members. v13: Fixup CONFIG_FENCE_TRACE kconfig description. Move fence_context_alloc to fence. Simplify fence_later. Kill priv member to fence_cb. v14: Remove priv argument from fence_add_callback, oops! v15: Remove priv from documentation. Explicitly include linux/atomic.h. v16: Add trace events. Import changes required by android syncpoints. v17: Use wake_up_state instead of try_to_wake_up. (Colin Cross) Fix up commit description for seqno_fence. 
(Rob Clark) Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com Reviewed-by: Rob Clark robdcl...@gmail.com --- Documentation/DocBook/device-drivers.tmpl |2 drivers/base/Kconfig |9 + drivers/base/Makefile |2 drivers/base/fence.c | 415 + include/linux/fence.h | 329 +++ include/trace/events/fence.h | 125 + 6 files changed, 881 insertions(+), 1 deletion(-) create mode 100644 drivers/base/fence.c create mode 100644 include/linux/fence.h create mode 100644 include/trace/events/fence.h diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index f5170082bdb3..7a0c9ddb4818 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -129,6 +129,8 @@ X!Edrivers/base/interface.c /sect1 sect1titleDevice Drivers DMA Management/title !Edrivers/base/dma-buf.c +!Edrivers/base/fence.c +!Iinclude/linux/fence.h !Edrivers/base/reservation.c !Iinclude/linux/reservation.h !Edrivers/base/dma-coherent.c diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index ec36e7772e57..b50ad30151ae 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -200,6 +200,15 @@ config DMA_SHARED_BUFFER APIs extension; the file's descriptor can then be passed on to other driver. +config FENCE_TRACE + bool Enable verbose FENCE_TRACE messages + depends on DMA_SHARED_BUFFER + help + Enable the FENCE_TRACE printks. This will add extra + spam to the console log, but will make it easier to diagnose + lockup related problems for dma-buffers shared across multiple + devices. 
+ config DMA_CMA bool DMA Contiguous Memory Allocator depends on HAVE_DMA_CONTIGUOUS CMA diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 04b314e0fa51..eb4864aee073 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_DMA_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o reservation.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o reservation.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER)+= firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/fence.c b/drivers/base/fence.c new file mode 100644 index ..12df2bf62034 --- /dev/null +++ b/drivers/base/fence.c @@ -0,0 +1,415 @@ +/* + * Fence mechanism for dma-buf and to allow for asynchronous dma access + * + * Copyright (C) 2012 Canonical Ltd + * Copyright (C) 2012 Texas Instruments + * + * Authors: + * Rob Clark robdcl...@gmail.com + * Maarten Lankhorst maarten.lankho...@canonical.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published
Re: [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)
On Mon, Feb 17, 2014 at 11:56 AM, Christian König deathsim...@vodafone.de wrote: Am 17.02.2014 16:56, schrieb Maarten Lankhorst:

This type of fence can be used with hardware synchronization for simple hardware that can block execution until the condition (dma_buf[offset] - value) >= 0 has been met.

Can't we make that just dma_buf[offset] != 0 instead? As far as I know this way it would match the definition M$ uses in their WDDM specification and so make it much more likely that hardware supports it.

well 'buf[offset] >= value' at least means the same slot can be used for multiple operations (with increasing values of 'value').. not sure if that is something people care about. >=value seems to be possible with adreno and radeon. I'm not really sure about others (although I presume it is at least supported for nv desktop stuff). For hw that cannot do >=value, we can either have a different fence implementation which uses the !=0 approach. Or change the seqno-fence implementation later if needed. But if someone has hw that can do !=0 but not >=value, speak up now ;-)

Apart from that I still don't like the idea of leaking a driver's IRQ context outside of the driver, but without a proper GPU scheduler there probably isn't much alternative.

I guess it will be a not uncommon scenario for a gpu device to just need to kick the display device to write a few registers for a page flip.. probably best not to schedule a worker just for this (unless the signalled device otherwise needs to). I think it is better in this case to give the signalee some rope to hang themselves, and make it the responsibility of the callback to kick things off to a worker if needed.

BR,
-R

Christian.

A software fallback still has to be provided in case the fence is used with a device that doesn't support this mechanism. It is useful to expose this for graphics cards that have an op to support this. Some cards like i915 can export those, but don't have an option to wait, so they need the software fallback.
Re: [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)
On Mon, Feb 17, 2014 at 12:36 PM, Christian König deathsim...@vodafone.de wrote: Am 17.02.2014 18:27, schrieb Rob Clark: On Mon, Feb 17, 2014 at 11:56 AM, Christian König deathsim...@vodafone.de wrote: Am 17.02.2014 16:56, schrieb Maarten Lankhorst:

This type of fence can be used with hardware synchronization for simple hardware that can block execution until the condition (dma_buf[offset] - value) >= 0 has been met.

Can't we make that just dma_buf[offset] != 0 instead? As far as I know this way it would match the definition M$ uses in their WDDM specification and so make it much more likely that hardware supports it.

well 'buf[offset] >= value' at least means the same slot can be used for multiple operations (with increasing values of 'value').. not sure if that is something people care about. >=value seems to be possible with adreno and radeon. I'm not really sure about others (although I presume it is at least supported for nv desktop stuff). For hw that cannot do >=value, we can either have a different fence implementation which uses the !=0 approach. Or change the seqno-fence implementation later if needed. But if someone has hw that can do !=0 but not >=value, speak up now ;-)

Here! Radeon can only do >=value on the DMA and 3D engine, but not with UVD or VCE. And for the 3D engine it means draining the pipe, which isn't really a good idea.

hmm, ok.. forgot you have a few extra rings compared to me. Is UVD re-ordering from decode-order to display-order for you in hw? If not, I guess you need sw intervention anyways when a frame is done for frame re-ordering, so maybe hw->hw sync doesn't really matter as much as compared to gpu/3d->display. For dma->3d interactions, it seems like you would care more about hw->hw sync, but I guess you aren't likely to use GPU A to do a resolve blit for GPU B..

For the 3D ring, I assume you probably want a CP_WAIT_FOR_IDLE before a CP_MEM_WRITE to update the fence value in memory (for the one signalling the fence).
But why would you need that before a CP_WAIT_REG_MEM (for the one waiting for the fence)? I don't exactly have documentation for the adreno version of CP_WAIT_REG_{MEM,EQ,GTE}.. but PFP and ME appear to be the same instruction set as r600, so I'm pretty sure they should have similar capabilities.. CP_WAIT_REG_MEM appears to be the same but with 32-bit gpu addresses vs 64b.

BR,
-R
Re: [PATCH] dma-buf: avoid using IS_ERR_OR_NULL
On Fri, Dec 20, 2013 at 7:43 PM, Colin Cross ccr...@android.com wrote: dma_buf_map_attachment and dma_buf_vmap can return NULL or ERR_PTR on a error. This encourages a common buggy pattern in callers: sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL); if (IS_ERR_OR_NULL(sgt)) return PTR_ERR(sgt); This causes the caller to return 0 on an error. IS_ERR_OR_NULL is almost always a sign of poorly-defined error handling. This patch converts dma_buf_map_attachment to always return ERR_PTR, and fixes the callers that incorrectly handled NULL. There are a few more callers that were not checking for NULL at all, which would have dereferenced a NULL pointer later. There are also a few more callers that correctly handled NULL and ERR_PTR differently, I left those alone but they could also be modified to delete the NULL check. This patch also converts dma_buf_vmap to always return NULL. All the callers to dma_buf_vmap only check for NULL, and would have dereferenced an ERR_PTR and panic'd if one was ever returned. This is not consistent with the rest of the dma buf APIs, but matches the expectations of all of the callers. Signed-off-by: Colin Cross ccr...@android.com --- drivers/base/dma-buf.c | 18 +++--- drivers/gpu/drm/drm_prime.c| 2 +- drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 2 +- drivers/media/v4l2-core/videobuf2-dma-contig.c | 2 +- 4 files changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c index 1e16cbd61da2..cfe1d8bc7bb8 100644 --- a/drivers/base/dma-buf.c +++ b/drivers/base/dma-buf.c @@ -251,9 +251,8 @@ EXPORT_SYMBOL_GPL(dma_buf_put); * @dmabuf:[in]buffer to attach device to. * @dev: [in]device to be attached. * - * Returns struct dma_buf_attachment * for this attachment; may return negative - * error codes. - * + * Returns struct dma_buf_attachment * for this attachment; returns ERR_PTR on + * error. 
 */
struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
					  struct device *dev)
@@ -319,9 +318,8 @@ EXPORT_SYMBOL_GPL(dma_buf_detach);
  * @attach:	[in]	attachment whose scatterlist is to be returned
  * @direction:	[in]	direction of DMA transfer
  *
- * Returns sg_table containing the scatterlist to be returned; may return NULL
- * or ERR_PTR.
- *
+ * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
+ * on error.
  */
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
					enum dma_data_direction direction)
@@ -334,6 +332,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 		return ERR_PTR(-EINVAL);

 	sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
+	if (!sg_table)
+		sg_table = ERR_PTR(-ENOMEM);

 	return sg_table;
 }
@@ -544,6 +544,8 @@ EXPORT_SYMBOL_GPL(dma_buf_mmap);
  * These calls are optional in drivers. The intended use for them
  * is for mapping objects linear in kernel space for high use objects.
  * Please attempt to use kmap/kunmap before thinking about these interfaces.
+ *
+ * Returns NULL on error.
  */
 void *dma_buf_vmap(struct dma_buf *dmabuf)
 {
@@ -566,7 +568,9 @@ void *dma_buf_vmap(struct dma_buf *dmabuf)
 	BUG_ON(dmabuf->vmap_ptr);

 	ptr = dmabuf->ops->vmap(dmabuf);
-	if (IS_ERR_OR_NULL(ptr))
+	if (WARN_ON_ONCE(IS_ERR(ptr)))

since vmap is optional, the WARN_ON might be a bit strong.. although it would be a bit strange for an exporter to supply a vmap fxn which always returned NULL, not sure about that. Just thought I'd mention it in case anyone else had an opinion about that.
But either way: Reviewed-by: Rob Clark robdcl...@gmail.com + ptr = NULL; + if (!ptr) goto out_unlock; dmabuf-vmap_ptr = ptr; diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 56805c39c906..bb516fdd195d 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -471,7 +471,7 @@ struct drm_gem_object *drm_gem_prime_import(struct drm_device *dev, get_dma_buf(dma_buf); sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL); - if (IS_ERR_OR_NULL(sgt)) { + if (IS_ERR(sgt)) { ret = PTR_ERR(sgt); goto fail_detach; } diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c index 59827cc5e770..c786cd4f457b 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c +++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c @@ -224,7 +224,7 @@ struct drm_gem_object *exynos_dmabuf_prime_import(struct drm_device *drm_dev, get_dma_buf(dma_buf); sgt
Re: [RFC PATCH] fence: dma-buf cross-device synchronization (v12)
On Thu, Aug 15, 2013 at 7:16 AM, Maarten Lankhorst maarten.lankho...@canonical.com wrote: Op 12-08-13 17:43, Rob Clark schreef: On Mon, Jul 29, 2013 at 10:05 AM, Maarten Lankhorst maarten.lankho...@canonical.com wrote: + [snip] +/** + * fence_add_callback - add a callback to be called when the fence + * is signaled + * @fence: [in]the fence to wait on + * @cb:[in]the callback to register + * @func: [in]the function to call + * @priv: [in]the argument to pass to function + * + * cb will be initialized by fence_add_callback, no initialization + * by the caller is required. Any number of callbacks can be registered + * to a fence, but a callback can only be registered to one fence at a time. + * + * Note that the callback can be called from an atomic context. If + * fence is already signaled, this function will return -ENOENT (and + * *not* call the callback) + * + * Add a software callback to the fence. Same restrictions apply to + * refcount as it does to fence_wait, however the caller doesn't need to + * keep a refcount to fence afterwards: when software access is enabled, + * the creator of the fence is required to keep the fence alive until + * after it signals with fence_signal. The callback itself can be called + * from irq context. 
+ *
+ */
+int fence_add_callback(struct fence *fence, struct fence_cb *cb,
+		       fence_func_t func, void *priv)
+{
+	unsigned long flags;
+	int ret = 0;
+	bool was_set;
+
+	if (WARN_ON(!fence || !func))
+		return -EINVAL;
+
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return -ENOENT;
+
+	spin_lock_irqsave(fence->lock, flags);
+
+	was_set = test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
+
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		ret = -ENOENT;
+	else if (!was_set && !fence->ops->enable_signaling(fence)) {
+		__fence_signal(fence);
+		ret = -ENOENT;
+	}
+
+	if (!ret) {
+		cb->func = func;
+		cb->priv = priv;
+		list_add_tail(&cb->node, &fence->cb_list);

since the user is providing the 'struct fence_cb', why not drop the priv and func args, and have some cb-initialize macro, ie.

INIT_FENCE_CB(&foo->fence, cbfxn);

and I guess we can just drop priv and let the user embed fence in whatever structure they like. Ie. make it look a bit like how work_struct works.

I don't mind killing priv. But an INIT_FENCE_CB macro is silly, when all it would do is set cb->func. So passing it as an argument to fence_add_callback is fine, unless you have a better reason to do so. INIT_WORK seems to have a bit more initialization than us; it seems work can be more complicated than callbacks, because callbacks can only be called once and work can be rescheduled multiple times.

yeah, INIT_WORK does more.. although maybe some day we want INIT_FENCE_CB to do more (ie. if we add some debug features to help catch misuse of fence/fence-cb's). And if nothing else, having it look a bit like other constructs that we have in the kernel seems useful. And with my point below, you'd want INIT_FENCE_CB to do an INIT_LIST_HEAD(), so it is (very) slightly more than just setting the fxn ptr.

maybe also, if (!list_empty(&cb->node)) return -EBUSY?

I think checking for list_empty(&cb->node) is a terrible idea. This is no different from any other list corruption, and it's a programming error.
Not a runtime error. :-)

I was thinking for crtc and page-flip, embed the fence_cb in the crtc. You should only use the cb once at a time, but in this case you might want to re-use it for the next page flip. Having something to catch cb mis-use in this sort of scenario seems useful. maybe how I am thinking to use fence_cb is not quite what you had in mind. I'm not sure. I was trying to think how I could just directly use fence/fence_cb in msm for everything (imported dmabuf or just regular 'ol gem buffers).

cb->node.next/prev may be NULL, which would fail with this check. The contents of cb->node are undefined before fence_add_callback is called. Calling fence_remove_callback on a fence that hasn't been added is undefined too. Calling fence_remove_callback works, but I'm thinking of changing the list_del_init to list_del, which would make calling fence_remove_callback twice a fatal error if CONFIG_DEBUG_LIST is enabled, and a possible memory corruption otherwise.

...

+ [snip]
+
+/**
+ * fence context counter: each execution context should have its own
+ * fence context, this allows checking if fences belong to the same
+ * context or not. One device can have multiple separate contexts,
+ * and they're used if some engine can run independently of another.
+ */
+extern atomic_t fence_context_counter;

context-alloc should not be in the critical path.. I'd think probably
Re: [RFC PATCH] fence: dma-buf cross-device synchronization (v12)
default wait operation. v10: remove event_queue, use a custom list, export try_to_wake_up from scheduler. Remove fence lock and use a global spinlock instead, this should hopefully remove all the locking headaches I was having on trying to implement this. enable_signaling is called with this lock held. v11: Use atomic ops for flags, lifting the need for some spin_lock_irqsaves. However I kept the guarantee that after fence_signal returns, it is guaranteed that enable_signaling has either been called to completion, or will not be called any more. Add contexts and seqno to base fence implementation. This allows you to wait for less fences, by testing for seqno + signaled, and then only wait on the later fence. Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier. An CONFIG_DEBUG_FENCE will be added to turn off the FENCE_TRACE spam, and another runtime option can turn it off at runtime. v12: Add CONFIG_FENCE_TRACE. Add missing documentation for the fence-context and fence-seqno members. Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com Hi, few (mostly minor/superficial comments).. 
I didn't really spot anything major (but then again, I think I've looked at all/most of the earlier versions of this too) Reviewed-by: Rob Clark robdcl...@gmail.com --- Documentation/DocBook/device-drivers.tmpl |2 drivers/base/Kconfig | 10 + drivers/base/Makefile |2 drivers/base/fence.c | 286 +++ include/linux/fence.h | 365 + 5 files changed, 664 insertions(+), 1 deletion(-) create mode 100644 drivers/base/fence.c create mode 100644 include/linux/fence.h diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index cbfdf54..241f4c5 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -126,6 +126,8 @@ X!Edrivers/base/interface.c /sect1 sect1titleDevice Drivers DMA Management/title !Edrivers/base/dma-buf.c +!Edrivers/base/fence.c +!Iinclude/linux/fence.h !Edrivers/base/reservation.c !Iinclude/linux/reservation.h !Edrivers/base/dma-coherent.c diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 5daa259..0ad35df 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -200,6 +200,16 @@ config DMA_SHARED_BUFFER APIs extension; the file's descriptor can then be passed on to other driver. +config FENCE_TRACE + bool Enable verbose FENCE_TRACE messages + default n + depends on DMA_SHARED_BUFFER + help + Enable the FENCE_TRACE printks. This will add extra + spam to the config log, but will make it easier to diagnose s/config/console/ I guess? + lockup related problems for dma-buffers shared across multiple + devices. 
+ config CMA bool Contiguous Memory Allocator depends on HAVE_DMA_CONTIGUOUS HAVE_MEMBLOCK diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 48029aa..8a55cb9 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o reservation.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o reservation.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER)+= firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/fence.c b/drivers/base/fence.c new file mode 100644 index 000..28e5ffd --- /dev/null +++ b/drivers/base/fence.c @@ -0,0 +1,286 @@ +/* + * Fence mechanism for dma-buf and to allow for asynchronous dma access + * + * Copyright (C) 2012 Canonical Ltd + * Copyright (C) 2012 Texas Instruments + * + * Authors: + * Rob Clark rob.cl...@linaro.org + * Maarten Lankhorst maarten.lankho...@canonical.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#include linux/slab.h +#include linux/export.h +#include linux/fence.h + +atomic_t fence_context_counter = ATOMIC_INIT(0); +EXPORT_SYMBOL(fence_context_counter); + +int
Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Fri, Aug 9, 2013 at 12:15 PM, Tom Cooksey <tom.cook...@arm.com> wrote:

Turning to DRM/KMS, it seems the supported formats of a plane can be queried using drm_mode_get_plane. However, there doesn't seem to be a way to query the supported formats of a crtc? If display HW only supports scanning out from a single buffer (like pl111 does), I think it won't have any planes and a fb can only be set on the crtc. In which case, how should user-space query which pixel formats that crtc supports?

it is exposed for drm planes. What is missing is to expose the primary-plane associated with the crtc.

Cool - so a patch which adds a way to query what formats a crtc supports would be welcome?

well, I kinda think we want something that exposes the primary plane of the crtc.. I'm thinking something roughly like:

diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 53db7ce..c7ffca8 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -157,6 +157,12 @@ struct drm_mode_get_plane {
 struct drm_mode_get_plane_res {
 	__u64 plane_id_ptr;
 	__u32 count_planes;
+	/* The primary planes are in matching order to crtc_id_ptr in
+	 * drm_mode_card_res (and same length). For crtc_id[n], its
+	 * primary plane is given by primary_plane_id[n].
+	 */
+	__u32 count_primary_planes;
+	__u64 primary_plane_id_ptr;
 };

 #define DRM_MODE_ENCODER_NONE 0

then use the existing GETPLANE ioctl to query the capabilities

What about a way to query the stride alignment constraints? Presumably using the drm_mode_get_property mechanism would be the right way to implement that?

I suppose you could try that.. typically that is in userspace, however. It seems like get_property would get messy quickly (ie. is it a pitch alignment constraint, or stride alignment? What if this is different for different formats (in particular tiled)? etc)

As with v4l2, DRM doesn't appear to have a way to query the stride constraints?
Assuming there is a way to query the stride constraints, there also isn't a way to specify them when creating a buffer with DRM, though perhaps the existing pitch parameter of drm_mode_create_dumb could be used to allow user-space to pass in a minimum stride as well as receive the allocated stride?

well, you really shouldn't be using create_dumb.. you should have a userspace piece that is specific to the drm driver, and knows how to use that driver's gem allocate ioctl.

Sorry, why does this need a driver-specific allocation function? It's just a display controller driver and I just want to allocate a scan-out buffer - all I'm asking is for the display controller driver to use a minimum stride alignment so I can export the buffer and use another device to fill it with data.

Sure.. but userspace has more information readily available to make a better choice. For example, for omapdrm I'd do things differently depending on whether I wanted to scan out that buffer (or a portion of it) rotated. This is something I know in the DDX driver, but not in the kernel. And it is quite likely something that is driver specific. Sure, we could add that to a generic "allocate me a buffer" ioctl. But that doesn't really seem better, and it becomes a problem as soon as we come across some hw that needs to know something different. In userspace, you have a lot more flexibility, since you don't really need to commit to an API for life.

And to bring back the GStreamer argument (since that seems a fitting example when you start talking about sharing buffers between many devices, for example camera+codec+display), it would already be negotiating format between v4l2src + fooencoder + displaysink.. the pitch/stride is part of that format information. If the display isn't the one with the strictest requirements, we don't want the display driver deciding what pitch to use. The whole point is to be able to allocate the buffer in such a way that another device can access it.
So the driver _can't_ use a special, device specific format, nor can it allocate it from a private memory pool because doing so would preclude it from being shared with another device. That other device doesn't need to be a GPU either, it could just as easily be a camera/ISP or video decoder.

So presumably you're talking about a GPU driver being the exporter here? If so, how could the GPU driver do these kind of tricks on memory shared with another device?

Yes, that is gpu-as-exporter. If someone else is allocating buffers, it is up to them to do these tricks or not. Probably there is a pretty good chance that if you aren't a GPU you don't need those sort of tricks for fast allocation of transient upload buffers, staging textures, temporary pixmaps, etc. Ie. I don't really think a v4l camera or video decoder would benefit from that sort of optimization.

Right - but none of
Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Wed, Aug 7, 2013 at 12:23 AM, John Stultz <john.stu...@linaro.org> wrote:

On Tue, Aug 6, 2013 at 5:15 AM, Rob Clark <robdcl...@gmail.com> wrote:

well, let's divide things up into two categories:

1) the arrangement and format of pixels.. ie. what userspace would need to know if it mmap's a buffer. This includes pixel format, stride, etc. This should be negotiated in userspace, it would be crazy to try to do this in the kernel.

2) the physical placement of the pages. Ie. whether it is contiguous or not. Which bank the pages in the buffer are placed in, etc. This is not visible to userspace. This is the purpose of the attach step, so you know all the devices involved in sharing up front before allocating the backing pages. (Or in the worst case, if you have a late attacher you at least know when no device is doing dma access to a buffer and can reallocate and move the buffer.)

One concern I know the Android folks have expressed previously (and correct me if its no longer an objection), is that this attach time in-kernel constraint solving / moving or reallocating buffers is likely to hurt determinism. If I understood, their perspective was that userland knows the device path the buffers will travel through, so why not leverage that knowledge, rather than having the kernel have to sort it out for itself after the fact.

If you know the device path, then attach the buffer at all the devices before you start using it. Problem solved.. kernel knows all devices before pages need be allocated ;-)

BR,
-R
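The attach-before-use flow described here can be sketched in a few lines of C. This is a toy userspace model, not the dma-buf API: the constraint fields are illustrative stand-ins for the contiguity and sglist-entry limits the thread discusses, and the names are made up:

```c
#include <stdbool.h>
#include <limits.h>

/* Toy model of deferred dma-buf allocation: each attach folds the
 * device's placement constraints into the buffer; the actual placement
 * decision is deferred until first map, when all attachers are known. */

struct dev_constraints {
	bool need_contiguous;
	int  max_segments;      /* max sglist entries the device can handle */
};

struct model_buf {
	bool allocated;
	bool contiguous;        /* how the backing pages were placed */
	int  nr_segments;
	struct dev_constraints merged;
	int  nr_attached;
};

static void model_buf_init(struct model_buf *b)
{
	b->allocated = false;
	b->nr_attached = 0;
	b->merged.need_contiguous = false;
	b->merged.max_segments = INT_MAX;
}

/* attach: record constraints; rejected after allocation (the "late
 * attacher" case would instead need a reallocate-and-move step) */
static bool model_buf_attach(struct model_buf *b, struct dev_constraints c)
{
	if (b->allocated)
		return false;
	if (c.need_contiguous)
		b->merged.need_contiguous = true;
	if (c.max_segments < b->merged.max_segments)
		b->merged.max_segments = c.max_segments;
	b->nr_attached++;
	return true;
}

/* first map triggers the placement decision over the merged constraints */
static void model_buf_map(struct model_buf *b)
{
	if (!b->allocated) {
		b->contiguous = b->merged.need_contiguous ||
				b->merged.max_segments == 1;
		b->nr_segments = b->contiguous ? 1 : b->merged.max_segments;
		b->allocated = true;
	}
}
```

The point of the sketch is the ordering: as long as every device attaches before the first map, the "constraint solving" is a cheap fold over the attach list rather than a reallocate-and-copy after the fact.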
Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Wed, Aug 7, 2013 at 1:33 PM, Tom Cooksey <tom.cook...@arm.com> wrote:

Didn't you say that programmatically describing device placement constraints was an unbounded problem? I guess we would have to accept that it's not possible to describe all possible constraints and instead find a way to describe the common ones?

well, the point I'm trying to make, is by dividing your constraints into two groups, one that impacts and is handled by userspace, and one that is in the kernel (ie. where the pages go), you cut down the number of permutations that the kernel has to care about considerably. And kernel already cares about, for example, what range of addresses that a device can dma to/from. I think really the only thing missing is the max # of sglist entries (contiguous or not)

I think it's more than physically contiguous or not. For example, it can be more efficient to use large page sizes on devices with IOMMUs to reduce TLB traffic. I think the size and even the availability of large pages varies between different IOMMUs.

sure.. but I suppose if we can spiff out dma_params to express "I need contiguous", perhaps we can add some way to express "I prefer as-contiguous-as-possible".. either way, this is about where the pages are placed, and not about the layout of pixels within the page, so should be in kernel. It's something that is missing, but I believe that it belongs in dma_params and hidden behind dma_alloc_*() for simple drivers.

Thinking about it, isn't this more a property of the IOMMU? I mean, are there any cases where an IOMMU had a large page mode but you wouldn't want to use it? So when allocating the memory, you'd have to take into account not just the constraints of the devices themselves, but also of any IOMMUs any of the devices sit behind?

perhaps yes. But the device is associated w/ the iommu it is attached to, so this shouldn't be a problem

There's also the issue of buffer stride alignment.
As I say, if the buffer is to be written by a tile-based GPU like Mali, it's more efficient if the buffer's stride is aligned to the max AXI bus burst length. Though I guess a buffer stride only makes sense as a concept when interpreting the data as a linear-layout 2D image, so perhaps belongs in user-space along with format negotiation?

Yeah.. this isn't about where the pages go, but about the arrangement within a page. And, well, except for hw that supports the same tiling (or compressed-fb) in display+gpu, you probably aren't sharing tiled buffers.

You'd only want to share a buffer between devices if those devices can understand the same pixel format. That pixel format can't be device-specific or opaque, it has to be explicit. I think drm_fourcc.h is what defines all the possible pixel formats. This is the enum I used in EGL_EXT_image_dma_buf_import at least. So if we get to the point where multiple devices can understand a tiled or compressed format, I assume we could just add that format to drm_fourcc.h and possibly v4l2's v4l2_mbus_pixelcode enum in v4l2-mediabus.h.

For user-space to negotiate a common pixel format and now also stride alignment, I guess it will obviously need a way to query what pixel formats a device supports and what its stride alignment requirements are.

I don't know v4l2 very well, but it certainly seems the pixel format can be queried using V4L2_SUBDEV_FORMAT_TRY when attempting to set a particular format. I couldn't however find a way to retrieve a list of supported formats - it seems the mechanism is to try out each format in turn to determine if it is supported. Is that right?

it is exposed for drm planes. What is missing is to expose the primary-plane associated with the crtc.

There doesn't, however, seem to be a way to query what stride constraints a v4l2 device might have. Does HW abstracted by v4l2 typically have such constraints?
If so, how can we query them such that a buffer allocated by a DRM driver can be imported into v4l2 and used with that HW? Turning to DRM/KMS, it seems the supported formats of a plane can be queried using drm_mode_get_plane. However, there doesn't seem to be a way to query the supported formats of a crtc? If display HW only supports scanning out from a single buffer (like pl111 does), I think it won't have any planes and a fb can only be set on the crtc. In which case, how should user-space query which pixel formats that crtc supports? Assuming user-space can query the supported formats and find a common one, it will need to allocate a buffer. Looks like drm_mode_create_dumb can do that, but it only takes a bpp parameter, there's no format parameter. I assume then that user-space defines the format and tells the DRM driver which format the buffer is in when creating the fb with drm_mode_fb_cmd2, which does take a format parameter? Is that right? Right, the gem object has no inherent format, it is just some bytes. The
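Since drm_fourcc.h comes up repeatedly as the shared vocabulary for explicit pixel formats, it is worth noting how cheap that vocabulary is: each format is four ASCII characters packed into a u32. The macro below mirrors the fourcc_code() packing from include/uapi/drm/drm_fourcc.h (the MODEL_ prefix is just to mark it as a local sketch):

```c
#include <stdint.h>

/* Same byte packing as fourcc_code() in include/uapi/drm/drm_fourcc.h:
 * first character in the low byte, last character in the high byte. */
#define MODEL_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* e.g. DRM_FORMAT_XRGB8888 is fourcc_code('X', 'R', '2', '4') */
static uint32_t model_xrgb8888(void)
{
	return MODEL_FOURCC('X', 'R', '2', '4');
}
```

Because the encoding is just character packing, the same u32 values can be passed between DRM, EGL_EXT_image_dma_buf_import, and any other API that agrees on the fourcc convention, which is exactly what makes it usable as a cross-device format vocabulary.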
Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Tue, Aug 6, 2013 at 7:31 AM, Tom Cooksey <tom.cook...@arm.com> wrote:

So in some respects, there is a constraint on how buffers which will be drawn to using the GPU are allocated. I don't really like the idea of teaching the display controller DRM driver about the GPU buffer constraints, even if they are fairly trivial like this. If the same display HW IP is being used on several SoCs, it seems wrong somehow to enforce those GPU constraints if some of those SoCs don't have a GPU.

Well, I suppose you could get min_pitch_alignment from devicetree, or something like this.. In the end, the easy solution is just to make the display allocate to the worst-case pitch alignment. In the early days of dma-buf discussions, we kicked around the idea of negotiating or programmatically describing the constraints, but that didn't really seem like a bounded problem.

Yeah - I was around for some of those discussions and agree it's not really an easy problem to solve.

We may also then have additional constraints when sharing buffers between the display HW and video decode or even camera ISP HW. Programmatically describing buffer allocation constraints is very difficult and I'm not sure you can actually do it - there's some pretty complex constraints out there! E.g. I believe there's a platform where Y and UV planes of the reference frame need to be in separate DRAM banks for real-time 1080p decode, or something like that?

yes, this was discussed. This is different from pitch/format/size constraints.. it is really just a placement constraint (ie. where do the physical pages go). IIRC the conclusion was to use a dummy device with its own CMA pool for attaching the Y vs UV buffers.

Anyway, I guess my point is that even if we solve how to allocate buffers which will be shared between the GPU and display HW such that both sets of constraints are satisfied, that may not be the end of the story.
that was part of the reason to punt this problem to userspace ;-)

In practice, the kernel drivers don't usually know too much about the dimensions/format/etc.. that is really userspace level knowledge. There are a few exceptions when the kernel needs to know how to setup GTT/etc for tiled buffers, but normally this sort of information is at the next level up (userspace, and drm_framebuffer in case of scanout). Userspace media frameworks like GStreamer already have a concept of format/caps negotiation. For non-display-gpu sharing, I think this is probably where this sort of constraint negotiation should be handled.

I agree that user-space will know which devices will access the buffer and thus can figure out at least a common pixel format. Though I'm not so sure userspace can figure out more low-level details like alignment and placement in physical memory, etc.

well, let's divide things up into two categories:

1) the arrangement and format of pixels.. ie. what userspace would need to know if it mmap's a buffer. This includes pixel format, stride, etc. This should be negotiated in userspace, it would be crazy to try to do this in the kernel.

2) the physical placement of the pages. Ie. whether it is contiguous or not. Which bank the pages in the buffer are placed in, etc. This is not visible to userspace. This is the purpose of the attach step, so you know all the devices involved in sharing up front before allocating the backing pages. (Or in the worst case, if you have a late attacher you at least know when no device is doing dma access to a buffer and can reallocate and move the buffer.)

A long time back, I had a patch that added a field or two to 'struct device_dma_parameters' so that it could be known if a device required contiguous buffers.. looks like that never got merged, so I'd need to dig that back up and resend it.
But the idea was to have the 'struct device' encapsulate all the information that would be needed to do-the-right-thing when it comes to placement.

Anyway, assuming user-space can figure out how a buffer should be stored in memory, how does it indicate this to a kernel driver and actually allocate it? Which ioctl on which device does user-space call, with what parameters? Are you suggesting using something like ION which exposes the low-level details of how buffers are laid out in physical memory to userspace? If not, what?

no, userspace should not need to know this. And having a central driver that knows this for all the other drivers in the system doesn't really solve anything and isn't really scalable. At best you might want, in some cases, a flag you can pass when allocating. For example, some of the drivers have a 'SCANOUT' flag that can be passed when allocating a GEM buffer, as a hint to the kernel that 'if this hw requires contig memory for scanout, allocate this buffer contig'. But really, when it comes to sharing buffers between devices, we want this sort of information in dev->dma_params of the
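The 'SCANOUT' hint mentioned above is easy to model: the flag does not promise anything by itself, it only asks the kernel to apply the hardware's scanout placement rule if it has one. The names below are illustrative, not any real driver's ioctl interface:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BO_SCANOUT (1u << 0)   /* hint: buffer may be scanned out */

struct model_hw {
	/* e.g. pl111: no IOMMU, so scanout buffers must be contiguous;
	 * a display behind an IOMMU would leave this false */
	bool contig_scanout;
};

/* decides whether the buffer gets physically contiguous backing:
 * only when the caller hints scanout AND the hw actually needs it */
static bool model_gem_alloc_contig(const struct model_hw *hw, uint32_t flags)
{
	return (flags & MODEL_BO_SCANOUT) && hw->contig_scanout;
}
```

This captures why a hint scales better than exposing placement to userspace: the same userspace flag produces contiguous memory on a pl111-class device and ordinary pages on an IOMMU-backed one, without userspace knowing which case it is in.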
Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach l.st...@pengutronix.de wrote: Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey: Hi Rob, +lkml On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey tom.cook...@arm.com wrote: * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to also allocate buffers for the GPU. Still not sure how to resolve this as we don't use DRM for our GPU driver. any thoughts/plans about a DRM GPU driver? Ideally long term (esp. once the dma-fence stuff is in place), we'd have gpu-specific drm (gpu-only, no kms) driver, and SoC/display specific drm/kms driver, using prime/dmabuf to share between the two. The extra buffers we were allocating from armsoc DDX were really being allocated through DRM/GEM so we could get an flink name for them and pass a reference to them back to our GPU driver on the client side. If it weren't for our need to access those extra off-screen buffers with the GPU we wouldn't need to allocate them with DRM at all. So, given they are really GPU buffers, it does absolutely make sense to allocate them in a different driver to the display driver. However, to avoid unnecessary memcpys related cache maintenance ops, we'd also like the GPU to render into buffers which are scanned out by the display controller. So let's say we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan out buffers with the display's DRM driver but a custom ioctl on the GPU's DRM driver to allocate non scanout, off-screen buffers. Sounds great, but I don't think that really works with DRI2. If we used two drivers to allocate buffers, which of those drivers do we return in DRI2ConnectReply? Even if we solve that somehow, GEM flink names are name-spaced to a single device node (AFAIK). So when we do a DRI2GetBuffers, how does the EGL in the client know which DRM device owns GEM flink name 1234? We'd need some pretty dirty hacks. You would return the name of the display driver allocating the buffers. 
On the client side you can use generic ioctls to go from flink -> handle -> dmabuf. So the client side would end up opening both the display drm device and the gpu, but without needing to know too much about the display.

I think the bit I was missing was that a GEM bo for a buffer imported using dma_buf/PRIME can still be flink'd. So the display controller's DRM driver allocates scan-out buffers via the DUMB buffer allocate ioctl. Those scan-out buffers can then be exported from the display's DRM driver and imported into the GPU's DRM driver using PRIME. Once imported into the GPU's driver, we can use flink to get a name for that buffer within the GPU DRM driver's name-space to return to the DRI2 client. That same namespace is also what DRI2 back-buffers are allocated from, so I think that could work... Except...

(and.. the general direction is that things will move more to just use dmabuf directly, ie. wayland or dri3)

I agree, DRI2 is the only reason why we need a system-wide ID. I also prefer buffers to be passed around by dma_buf fd, but we still need to support DRI2 and will do for some time I expect.

Anyway, that latter case also gets quite difficult. The GPU DRM driver would need to know the constraints of the display controller when allocating buffers intended to be scanned out. For example, pl111 typically isn't behind an IOMMU and so requires physically contiguous memory. We'd have to teach the GPU's DRM driver about the constraints of the display HW. Not exactly a clean driver model. :-( I'm still a little stuck on how to proceed, so any ideas would be greatly appreciated! My current train of thought is having a kind of SoC-specific DRM driver which allocates buffers for both display and GPU within a single GEM namespace. That SoC-specific DRM driver could then know the constraints of both the GPU and the display HW. We could then use PRIME to export buffers allocated with the SoC DRM driver and import them into the GPU and/or display DRM driver.
Usually if the display drm driver is allocating the buffers that might be scanned out, it just needs to have minimal knowledge of the GPU (pitch alignment constraints). I don't think we need a 3rd device just to allocate buffers. While Mali can render to pretty much any buffer, there is a mild performance improvement to be had if the buffer stride is aligned to the AXI bus's max burst length when drawing to the buffer. I suspect the display controllers might frequently benefit if the pitch is aligned to AXI burst length too.. If the display controller is going to be reading from linear memory I don't think it will make much difference - you'll just get an extra 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
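The pitch-alignment point in this exchange is concrete enough to compute: rounding the stride up to the bus's max burst length costs at most burst-1 bytes per scanline. A small sketch of the arithmetic (the 64-byte burst value in the usage notes is a made-up example, not Mali's actual figure):

```c
#include <stdint.h>

/* Round a line's byte width up to a power-of-two burst length,
 * like the ALIGN() macro used throughout the kernel. */
static uint32_t model_align_pitch(uint32_t width_px, uint32_t cpp,
				  uint32_t burst_bytes)
{
	uint32_t pitch = width_px * cpp;

	return (pitch + burst_bytes - 1) & ~(burst_bytes - 1);
}

/* e.g. a 1366px-wide XRGB8888 line is 5464 bytes; aligned to a
 * hypothetical 64-byte burst it becomes 5504 bytes, i.e. 40 bytes
 * of padding per line */
```

This is also why "allocate to the worst-case pitch alignment" is a tolerable fallback: the overhead is bounded and small relative to the scanline.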
Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Tue, Aug 6, 2013 at 10:03 AM, Tom Cooksey tom.cook...@arm.com wrote: Hi Rob, We may also then have additional constraints when sharing buffers between the display HW and video decode or even camera ISP HW. Programmatically describing buffer allocation constraints is very difficult and I'm not sure you can actually do it - there's some pretty complex constraints out there! E.g. I believe there's a platform where Y and UV planes of the reference frame need to be in separate DRAM banks for real-time 1080p decode, or something like that? yes, this was discussed. This is different from pitch/format/size constraints.. it is really just a placement constraint (ie. where do the physical pages go). IIRC the conclusion was to use a dummy devices with it's own CMA pool for attaching the Y vs UV buffers. Anyway, I guess my point is that even if we solve how to allocate buffers which will be shared between the GPU and display HW such that both sets of constraints are satisfied, that may not be the end of the story. that was part of the reason to punt this problem to userspace ;-) In practice, the kernel drivers doesn't usually know too much about the dimensions/format/etc.. that is really userspace level knowledge. There are a few exceptions when the kernel needs to know how to setup GTT/etc for tiled buffers, but normally this sort of information is up at the next level up (userspace, and drm_framebuffer in case of scanout). Userspace media frameworks like GStreamer already have a concept of format/caps negotiation. For non-display-gpu sharing, I think this is probably where this sort of constraint negotiation should be handled. I agree that user-space will know which devices will access the buffer and thus can figure out at least a common pixel format. Though I'm not so sure userspace can figure out more low-level details like alignment and placement in physical memory, etc. well, let's divide things up into two categories: 1) the arrangement and format of pixels.. ie. 
what userspace would need to know if it mmap's a buffer. This includes pixel format, stride, etc. This should be negotiated in userspace, it would be crazy to try to do this in the kernel.

Absolutely. Pixel format has to be negotiated by user-space as in most cases, user-space can map the buffer and thus will need to know how to interpret the data.

2) the physical placement of the pages. Ie. whether it is contiguous or not. Which bank the pages in the buffer are placed in, etc. This is not visible to userspace.

Seems sensible to me.

... This is the purpose of the attach step, so you know all the devices involved in sharing up front before allocating the backing pages. (Or in the worst case, if you have a late attacher you at least know when no device is doing dma access to a buffer and can reallocate and move the buffer.) A long time back, I had a patch that added a field or two to 'struct device_dma_parameters' so that it could be known if a device required contiguous buffers.. looks like that never got merged, so I'd need to dig that back up and resend it. But the idea was to have the 'struct device' encapsulate all the information that would be needed to do-the-right-thing when it comes to placement.

As I understand it, it's up to the exporting device to allocate the memory backing the dma_buf buffer. I guess the latest possible point you can allocate the backing pages is when map_dma_buf is first called? At that point the exporter can iterate over the current set of attachments, programmatically determine all the constraints of all the attached drivers and attempt to allocate the backing pages in such a way as to satisfy all those constraints?

yes, this is the idea.. possibly some room for some helpers to help out with this, but that is all under the hood from userspace perspective

Didn't you say that programmatically describing device placement constraints was an unbounded problem?
I guess we would have to accept that it's not possible to describe all possible constraints and instead find a way to describe the common ones?

well, the point I'm trying to make, is by dividing your constraints into two groups, one that impacts and is handled by userspace, and one that is in the kernel (ie. where the pages go), you cut down the number of permutations that the kernel has to care about considerably. And kernel already cares about, for example, what range of addresses that a device can dma to/from. I think really the only thing missing is the max # of sglist entries (contiguous or not)

One problem with this is it duplicates a lot of logic in each driver which can export a dma_buf buffer. Each exporter will need to do pretty much the same thing: iterate over all the attachments, determine all the constraints (assuming that can be done) and allocate pages such that the lowest-common-denominator
Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Tue, Aug 6, 2013 at 10:36 AM, Lucas Stach l.st...@pengutronix.de wrote: Am Dienstag, den 06.08.2013, 10:14 -0400 schrieb Rob Clark: On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach l.st...@pengutronix.de wrote: Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey: Hi Rob, +lkml On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey tom.cook...@arm.com wrote: * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to also allocate buffers for the GPU. Still not sure how to resolve this as we don't use DRM for our GPU driver. any thoughts/plans about a DRM GPU driver? Ideally long term (esp. once the dma-fence stuff is in place), we'd have gpu-specific drm (gpu-only, no kms) driver, and SoC/display specific drm/kms driver, using prime/dmabuf to share between the two. The extra buffers we were allocating from armsoc DDX were really being allocated through DRM/GEM so we could get an flink name for them and pass a reference to them back to our GPU driver on the client side. If it weren't for our need to access those extra off-screen buffers with the GPU we wouldn't need to allocate them with DRM at all. So, given they are really GPU buffers, it does absolutely make sense to allocate them in a different driver to the display driver. However, to avoid unnecessary memcpys related cache maintenance ops, we'd also like the GPU to render into buffers which are scanned out by the display controller. So let's say we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan out buffers with the display's DRM driver but a custom ioctl on the GPU's DRM driver to allocate non scanout, off-screen buffers. Sounds great, but I don't think that really works with DRI2. If we used two drivers to allocate buffers, which of those drivers do we return in DRI2ConnectReply? Even if we solve that somehow, GEM flink names are name-spaced to a single device node (AFAIK). So when we do a DRI2GetBuffers, how does the EGL in the client know which DRM device owns GEM flink name 1234? 
We'd need some pretty dirty hacks.

You would return the name of the display driver allocating the buffers. On the client side you can use generic ioctls to go from flink -> handle -> dmabuf. So the client side would end up opening both the display drm device and the gpu, but without needing to know too much about the display.

I think the bit I was missing was that a GEM bo for a buffer imported using dma_buf/PRIME can still be flink'd. So the display controller's DRM driver allocates scan-out buffers via the DUMB buffer allocate ioctl. Those scan-out buffers can then be exported from the display's DRM driver and imported into the GPU's DRM driver using PRIME. Once imported into the GPU's driver, we can use flink to get a name for that buffer within the GPU DRM driver's name-space to return to the DRI2 client. That same namespace is also what DRI2 back-buffers are allocated from, so I think that could work... Except...

(and.. the general direction is that things will move more to just use dmabuf directly, ie. wayland or dri3)

I agree, DRI2 is the only reason why we need a system-wide ID. I also prefer buffers to be passed around by dma_buf fd, but we still need to support DRI2 and will do for some time I expect.

Anyway, that latter case also gets quite difficult. The GPU DRM driver would need to know the constraints of the display controller when allocating buffers intended to be scanned out. For example, pl111 typically isn't behind an IOMMU and so requires physically contiguous memory. We'd have to teach the GPU's DRM driver about the constraints of the display HW. Not exactly a clean driver model. :-( I'm still a little stuck on how to proceed, so any ideas would be greatly appreciated! My current train of thought is having a kind of SoC-specific DRM driver which allocates buffers for both display and GPU within a single GEM namespace. That SoC-specific DRM driver could then know the constraints of both the GPU and the display HW.
We could then use PRIME to export buffers allocated with the SoC DRM driver and import them into the GPU and/or display DRM driver. Usually if the display drm driver is allocating the buffers that might be scanned out, it just needs to have minimal knowledge of the GPU (pitch alignment constraints). I don't think we need a 3rd device just to allocate buffers. While Mali can render to pretty much any buffer, there is a mild performance improvement to be had if the buffer stride is aligned to the AXI bus's max burst length when drawing to the buffer. I suspect the display controllers might frequently benefit if the pitch is aligned
Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Tue, Aug 6, 2013 at 1:38 PM, Tom Cooksey <tom.cook...@arm.com> wrote:

... This is the purpose of the attach step, so you know all the devices involved in sharing up front before allocating the backing pages. (Or in the worst case, if you have a late attacher you at least know when no device is doing dma access to a buffer and can reallocate and move the buffer.) A long time back, I had a patch that added a field or two to 'struct device_dma_parameters' so that it could be known if a device required contiguous buffers.. looks like that never got merged, so I'd need to dig that back up and resend it. But the idea was to have the 'struct device' encapsulate all the information that would be needed to do-the-right-thing when it comes to placement.

As I understand it, it's up to the exporting device to allocate the memory backing the dma_buf buffer. I guess the latest possible point you can allocate the backing pages is when map_dma_buf is first called? At that point the exporter can iterate over the current set of attachments, programmatically determine all the constraints of all the attached drivers and attempt to allocate the backing pages in such a way as to satisfy all those constraints?

yes, this is the idea.. possibly some room for some helpers to help out with this, but that is all under the hood from userspace perspective

Didn't you say that programmatically describing device placement constraints was an unbounded problem? I guess we would have to accept that it's not possible to describe all possible constraints and instead find a way to describe the common ones?

well, the point I'm trying to make, is by dividing your constraints into two groups, one that impacts and is handled by userspace, and one that is in the kernel (ie. where the pages go), you cut down the number of permutations that the kernel has to care about considerably. And kernel already cares about, for example, what range of addresses that a device can dma to/from.
I think really the only thing missing is the max # of sglist entries (contiguous or not)

I think it's more than physically contiguous or not. For example, it can be more efficient to use large page sizes on devices with IOMMUs to reduce TLB traffic. I think the size and even the availability of large pages varies between different IOMMUs.

sure.. but I suppose if we can spiff out dma_params to express "I need contiguous", perhaps we can add some way to express "I prefer as-contiguous-as-possible".. either way, this is about where the pages are placed, and not about the layout of pixels within the page, so it should be in the kernel. It's something that is missing, but I believe that it belongs in dma_params and hidden behind dma_alloc_*() for simple drivers.

There's also the issue of buffer stride alignment. As I say, if the buffer is to be written by a tile-based GPU like Mali, it's more efficient if the buffer's stride is aligned to the max AXI bus burst length. Though I guess a buffer stride only makes sense as a concept when interpreting the data as a linear-layout 2D image, so perhaps that belongs in user-space along with format negotiation?

Yeah.. this isn't about where the pages go, but about the arrangement within a page. And, well, except for hw that supports the same tiling (or compressed-fb) in display+gpu, you probably aren't sharing tiled buffers.

One problem with this is it duplicates a lot of logic in each driver which can export a dma_buf buffer. Each exporter will need to do pretty much the same thing: iterate over all the attachments, determine all the constraints (assuming that can be done) and allocate pages such that the lowest-common-denominator is satisfied. Perhaps rather than duplicating that logic in every driver, we could instead move allocation of the backing pages into dma_buf itself?

I tend to think it is better to add helpers as we see common patterns emerge, which drivers can opt-in to using.
I don't think that we should move allocation into dma_buf itself, but it would perhaps be useful to have dma_alloc_*() variants that could allocate for multiple devices.

A helper could work I guess, though I quite like the idea of having dma_alloc_*() variants which take a list of devices to allocate memory for.

That would help for simple stuff, although I'd suspect eventually a GPU driver will move away from that. (Since you probably want to play tricks w/ pools of pages that are pre-zero'd and in the correct cache state, use spare cycles on the gpu or dma engine to pre-zero uncached pages, and games like that.)

So presumably you're talking about a GPU driver being the exporter here? If so, how could the GPU driver do these kinds of tricks on memory shared with another device?

Yes, that is gpu-as-exporter. If someone else is allocating buffers, it is up to them to do these tricks or not. Probably there is a pretty good chance that if
Re: [PATCH V2] drm/exynos: Add fallback option to get non physically continous memory for fb
On Mon, Aug 5, 2013 at 5:44 AM, Vikas Sajjan vikas.saj...@linaro.org wrote: While trying to get boot-logo up on exynos5420 SMDK which has eDP panel connected with resolution 2560x1600, following error occurred even with IOMMU enabled:

[0.88] [drm:lowlevel_buffer_allocate] *ERROR* failed to allocate buffer.
[0.89] [drm] Initialized exynos 1.0.0 20110530 on minor 0

To address the case where physically contiguous memory MAY NOT be a mandatory requirement for fb, the patch adds a feature to get non physically contiguous memory for fb if IOMMU is supported and if CONTIG memory allocation fails.

Reviewed-by: Rob Clark robdcl...@gmail.com
Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
Signed-off-by: Arun Kumar arun...@samsung.com
---
changes since v1:
- Modified to add the fallback patch if CONTIG alloc fails as suggested by Rob Clark robdcl...@gmail.com and Tomasz Figa tomasz.f...@gmail.com.
- changed the commit message.
---
 drivers/gpu/drm/exynos/exynos_drm_fbdev.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
index 8e60bd6..9a4b886 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
@@ -16,6 +16,7 @@
 #include <drm/drm_crtc.h>
 #include <drm/drm_fb_helper.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/exynos_drm.h>
 #include "exynos_drm_drv.h"
 #include "exynos_drm_fb.h"
@@ -165,11 +166,21 @@ static int exynos_drm_fbdev_create(struct drm_fb_helper *helper,
 	size = mode_cmd.pitches[0] * mode_cmd.height;

-	/* 0 means to allocate physically continuous memory */
-	exynos_gem_obj = exynos_drm_gem_create(dev, 0, size);
+	exynos_gem_obj = exynos_drm_gem_create(dev, EXYNOS_BO_CONTIG, size);
 	if (IS_ERR(exynos_gem_obj)) {
-		ret = PTR_ERR(exynos_gem_obj);
-		goto err_release_framebuffer;
+		/*
+		 * If IOMMU is supported then try to get buffer from
+		 * non-contiguous memory area
+		 */
+		if (is_drm_iommu_supported(dev))
+			exynos_gem_obj = exynos_drm_gem_create(dev,
+					EXYNOS_BO_NONCONTIG, size);
+		if (IS_ERR(exynos_gem_obj)) {
+			ret = PTR_ERR(exynos_gem_obj);
+			goto err_release_framebuffer;
+		}
+		dev_warn(&pdev->dev, "exynos_gem_obj for FB is allocated with\n"
+				"non physically continuous memory\n");
 	}

 	exynos_fbdev->exynos_gem_obj = exynos_gem_obj;
--
1.7.9.5
--
To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
On Mon, Aug 5, 2013 at 1:10 PM, Tom Cooksey tom.cook...@arm.com wrote: Hi Rob, +linux-media, +linaro-mm-sig for discussion of video/camera buffer constraints...

On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey tom.cook...@arm.com wrote: * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to also allocate buffers for the GPU. Still not sure how to resolve this as we don't use DRM for our GPU driver.

any thoughts/plans about a DRM GPU driver? Ideally long term (esp. once the dma-fence stuff is in place), we'd have a gpu-specific drm (gpu-only, no kms) driver, and SoC/display specific drm/kms driver, using prime/dmabuf to share between the two.

The extra buffers we were allocating from armsoc DDX were really being allocated through DRM/GEM so we could get an flink name for them and pass a reference to them back to our GPU driver on the client side. If it weren't for our need to access those extra off-screen buffers with the GPU we wouldn't need to allocate them with DRM at all. So, given they are really GPU buffers, it does absolutely make sense to allocate them in a different driver to the display driver. However, to avoid unnecessary memcpys and related cache maintenance ops, we'd also like the GPU to render into buffers which are scanned out by the display controller. So let's say we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan out buffers with the display's DRM driver but a custom ioctl on the GPU's DRM driver to allocate non scanout, off-screen buffers. Sounds great, but I don't think that really works with DRI2. If we used two drivers to allocate buffers, which of those drivers do we return in DRI2ConnectReply? Even if we solve that somehow, GEM flink names are name-spaced to a single device node (AFAIK). So when we do a DRI2GetBuffers, how does the EGL in the client know which DRM device owns GEM flink name 1234? We'd need some pretty dirty hacks.

You would return the name of the display driver allocating the buffers.
On the client side you can use generic ioctls to go from flink -> handle -> dmabuf. So the client side would end up opening both the display drm device and the gpu, but without needing to know too much about the display.

I think the bit I was missing was that a GEM bo for a buffer imported using dma_buf/PRIME can still be flink'd. So the display controller's DRM driver allocates scan-out buffers via the DUMB buffer allocate ioctl. Those scan-out buffers can then be exported from the display's DRM driver and imported into the GPU's DRM driver using PRIME. Once imported into the GPU's driver, we can use flink to get a name for that buffer within the GPU DRM driver's name-space to return to the DRI2 client. That same namespace is also what DRI2 back-buffers are allocated from, so I think that could work... Except...

(and.. the general direction is that things will move more to just use dmabuf directly, ie. wayland or dri3)

Anyway, that latter case also gets quite difficult. The GPU DRM driver would need to know the constraints of the display controller when allocating buffers intended to be scanned out. For example, pl111 typically isn't behind an IOMMU and so requires physically contiguous memory. We'd have to teach the GPU's DRM driver about the constraints of the display HW. Not exactly a clean driver model. :-( I'm still a little stuck on how to proceed, so any ideas would be greatly appreciated! My current train of thought is having a kind of SoC-specific DRM driver which allocates buffers for both display and GPU within a single GEM namespace. That SoC-specific DRM driver could then know the constraints of both the GPU and the display HW. We could then use PRIME to export buffers allocated with the SoC DRM driver and import them into the GPU and/or display DRM driver.

Usually if the display drm driver is allocating the buffers that might be scanned out, it just needs to have minimal knowledge of the GPU (pitch alignment constraints).
I don't think we need a 3rd device just to allocate buffers.

While Mali can render to pretty much any buffer, there is a mild performance improvement to be had if the buffer stride is aligned to the AXI bus's max burst length when drawing to the buffer.

I suspect the display controllers might frequently benefit if the pitch is aligned to AXI burst length too..

So in some respects, there is a constraint on how buffers which will be drawn to using the GPU are allocated. I don't really like the idea of teaching the display controller DRM driver about the GPU buffer constraints, even if they are fairly trivial like this. If the same display HW IP is being used on several SoCs, it seems wrong somehow to enforce those GPU constraints if some of those SoCs don't have a GPU.

Well, I suppose you could get min_pitch_alignment from devicetree, or something like this.. In the end, the easy solution is just
Re: [PATCH] drm/exynos: Add check for IOMMU while passing physically continous memory flag
On Thu, Aug 1, 2013 at 7:20 PM, Tomasz Figa tomasz.f...@gmail.com wrote: Hi Vikas,

On Thursday 01 of August 2013 16:49:32 Vikas Sajjan wrote: While trying to get boot-logo up on exynos5420 SMDK which has eDP panel connected with resolution 2560x1600, following error occurred even with IOMMU enabled:

[0.88] [drm:lowlevel_buffer_allocate] *ERROR* failed to allocate buffer.
[0.89] [drm] Initialized exynos 1.0.0 20110530 on minor 0

This patch fixes the issue by adding a check for IOMMU.

Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
Signed-off-by: Arun Kumar arun...@samsung.com
---
 drivers/gpu/drm/exynos/exynos_drm_fbdev.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
index 8e60bd6..2a8 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
@@ -16,6 +16,7 @@
 #include <drm/drm_crtc.h>
 #include <drm/drm_fb_helper.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/exynos_drm.h>
 #include "exynos_drm_drv.h"
 #include "exynos_drm_fb.h"
@@ -143,6 +144,7 @@ static int exynos_drm_fbdev_create(struct drm_fb_helper *helper,
 	struct platform_device *pdev = dev->platformdev;
 	unsigned long size;
 	int ret;
+	unsigned int flag;

 	DRM_DEBUG_KMS("surface width(%d), height(%d) and bpp(%d\n",
 			sizes->surface_width, sizes->surface_height,
@@ -166,7 +168,12 @@ static int exynos_drm_fbdev_create(struct drm_fb_helper *helper,
 	size = mode_cmd.pitches[0] * mode_cmd.height;

 	/* 0 means to allocate physically continuous memory */
-	exynos_gem_obj = exynos_drm_gem_create(dev, 0, size);
+	if (!is_drm_iommu_supported(dev))
+		flag = 0;
+	else
+		flag = EXYNOS_BO_NONCONTIG;

While noncontig memory might be used for devices that support IOMMU, there should be no problem with using contig memory for them, so this seems more like masking the original problem rather than tracking it down.
it is probably a good idea to not require contig memory when it is not needed for performance or functionality (and if it is only performance, then fall back gracefully to non-contig).. but yeah, would be good to know if this is masking another issue all the same

BR, -R

Could you check why the allocation fails when requesting contiguous memory?

Best regards, Tomasz
Re: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework
On Tue, Jun 25, 2013 at 5:09 AM, Inki Dae daei...@gmail.com wrote: that should be the role of kernel memory management which of course needs synchronization btw A and B. But in no case should this be done using dma-buf. dma-buf is for sharing content btw different devices, not sharing resources.

hmm, is that true? And are you sure? Then how do you think about reservation? the reservation also uses dma-buf with the same reason as long as I know: actually, we use reservation to use dma-buf. As you may know, a reservation object is allocated and initialized when a buffer object is exported to a dma buf.

no, this is why the reservation object can be passed in when you construct the dmabuf. The fallback is for dmabuf to create its own, for compatibility and to make life easier for simple devices with few buffers... but I think pretty much all drm drivers would embed the reservation object in the gem buffer and pass it in when the dmabuf is created. It is pretty much imperative that synchronization works independently of dmabuf; you really don't want to have two different cases to deal with in your driver, one for synchronizing non-exported objects, and one for synchronizing dmabuf objects.

BR, -R
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 27, 2013 at 11:56 PM, Inki Dae inki@samsung.com wrote: -Original Message- From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-ow...@vger.kernel.org] On Behalf Of Rob Clark Sent: Tuesday, May 28, 2013 12:48 AM To: Inki Dae Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin Park; myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org Subject: Re: Introduce a new helper framework for buffer synchronization

On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com wrote: Hi all, I have removed the previous branch and added a new one with more cleanup. This time, the fence helper doesn't include user side interfaces and cache operation relevant code anymore, because not only are we not sure yet that coupling those two things, synchronizing caches and buffer access between CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a good idea, but also the existing code for the user side has problems with badly behaved or crashing userspace. So this could be discussed more later.

The below is the new branch,
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-fence-helper

And fence helper code,
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005

And example code for a device driver,
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae

I think the time is not yet ripe for RFC posting: maybe existing dma fence and reservation need more review and additional work. So I'd be glad for somebody giving other opinions and advice in advance before RFC posting.

thoughts from a *really* quick, pre-coffee, first look:

* any sort of helper to simplify single-buffer sort of use-cases (v4l) probably wouldn't want to bake in the assumption that seqno_fence is used.
* I guess g2d is probably not actually a simple use case, since I expect you can submit blits involving multiple buffers :-P

I don't think so. One or more buffers can be used: seqno_fence also has only one buffer. Actually, we have already applied this approach to most devices; multimedia, gpu and display controller. And this approach shows more performance and reduced power consumption against the traditional way. And the g2d example is just to show you how to apply my approach to a device driver.

no, you need the ww-mutex / reservation stuff any time you have multiple independent devices (or rings/contexts for hw that can support multiple contexts) which can do operations with multiple buffers. So you could conceivably hit this w/ gpu + g2d if multiple buffers were shared between the two. vram migration and such 'desktop stuff' might make the problem worse, but just because you don't have vram doesn't mean you don't have a problem with multiple buffers.

* otherwise, you probably don't want to depend on dmabuf, which is why reservation/fence is split out the way it is.. you want to be able to use a single reservation/fence mechanism within your driver without having to care about which buffers are exported to dmabuf's and which are not. Creating a dmabuf for every GEM bo is too heavyweight.

Right. But I think we should deal with this separately. Actually, we are trying to use reservation for gpu pipeline synchronization such as sgx sync object, and this approach is used without dmabuf. In other words, some device can use only reservation for such pipeline synchronization and, at the same time, fence helper or similar thing with dmabuf for buffer synchronization.

it is probably easier to approach from the reverse direction.. ie, get reservation/synchronization right first, and then dmabuf. (Well, that isn't really a problem because Maarten's reservation/fence patches support dmabuf from the beginning.)
BR, -R

I'm not entirely sure if reservation/fence could/should be made any simpler for multi-buffer users. Probably the best thing to do is just get reservation/fence rolled out in a few drivers and see if some common patterns emerge.

BR, -R

Thanks, Inki Dae
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com wrote: Hi all, I have removed the previous branch and added a new one with more cleanup. This time, the fence helper doesn't include user side interfaces and cache operation relevant code anymore, because not only are we not sure yet that coupling those two things, synchronizing caches and buffer access between CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a good idea, but also the existing code for the user side has problems with badly behaved or crashing userspace. So this could be discussed more later.

The below is the new branch,
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-fence-helper

And fence helper code,
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005

And example code for a device driver,
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae

I think the time is not yet ripe for RFC posting: maybe existing dma fence and reservation need more review and additional work. So I'd be glad for somebody giving other opinions and advice in advance before RFC posting.

thoughts from a *really* quick, pre-coffee, first look:

* any sort of helper to simplify single-buffer sort of use-cases (v4l) probably wouldn't want to bake in the assumption that seqno_fence is used.

* I guess g2d is probably not actually a simple use case, since I expect you can submit blits involving multiple buffers :-P

* otherwise, you probably don't want to depend on dmabuf, which is why reservation/fence is split out the way it is.. you want to be able to use a single reservation/fence mechanism within your driver without having to care about which buffers are exported to dmabuf's and which are not. Creating a dmabuf for every GEM bo is too heavyweight.
I'm not entirely sure if reservation/fence could/should be made any simpler for multi-buffer users. Probably the best thing to do is just get reservation/fence rolled out in a few drivers and see if some common patterns emerge.

BR, -R

Thanks, Inki Dae
Re: Introduce a new helper framework for buffer synchronization
On Wed, May 15, 2013 at 1:19 AM, Inki Dae inki@samsung.com wrote: -Original Message- From: Rob Clark [mailto:robdcl...@gmail.com] Sent: Tuesday, May 14, 2013 10:39 PM To: Inki Dae Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org Subject: Re: Introduce a new helper framework for buffer synchronization

On Mon, May 13, 2013 at 10:52 PM, Inki Dae inki@samsung.com wrote: well, for cache management, I think it is a better idea.. I didn't really catch that this was the motivation from the initial patch, but maybe I read it too quickly. But cache can be decoupled from synchronization, because CPU access is not asynchronous. For userspace/CPU access to buffer, you should: 1) wait for buffer 2) prepare-access 3) ... do whatever cpu access to buffer ... 4) finish-access 5) submit buffer for new dma-operation

For data flow from CPU to DMA device: 1) wait for buffer 2) prepare-access (dma_buf_begin_cpu_access) 3) cpu access to buffer

For data flow from DMA device to CPU: 1) wait for buffer

Right, but CPU access isn't asynchronous (from the point of view of the CPU), so there isn't really any wait step at this point. And if you do want the CPU to be able to signal a fence from userspace for some reason, you probably want something file/fd based so the refcnting/cleanup when a process dies doesn't leave some pending DMA action wedged. But I don't really see the point of that complexity when the CPU access isn't asynchronous in the first place.

Those were my missing comments, please see the below sequence. For data flow from CPU to DMA device and then from DMA device to CPU:

1) wait for buffer - at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) - including prepare-access (dma_buf_begin_cpu_access)
2) cpu access to buffer
3) wait for buffer - at device driver - but CPU is already accessing the buffer so blocked.
4) signal - at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
5) the thread, blocked at 3), is woken up by 4) - and then finish-access (dma_buf_end_cpu_access)

right, I understand you can have background threads, etc, in userspace. But there are already plenty of synchronization primitives that can be used for cpu-cpu synchronization, either within the same process or between multiple processes. For cpu access, even if it is handled by background threads/processes, I think it is better to use the traditional pthreads or unix synchronization primitives. They have existed forever, they are well tested, and they work.

So while it seems nice and orthogonal/clean to couple cache and synchronization and handle dma-cpu and cpu-cpu and cpu-dma in the same generic way, I think in practice we have to make things more complex than they otherwise need to be to do this. Otherwise I think we'll be having problems with badly behaved or crashing userspace.

BR, -R

6) dma access to buffer
7) wait for buffer - at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) - but DMA is already accessing the buffer so blocked.
8) signal - at device driver
9) the thread, blocked at 7), is woken up by 8) - and then prepare-access (dma_buf_begin_cpu_access)
10) cpu access to buffer

Basically, 'wait for buffer' includes buffer synchronization, committing processing, and cache operation. The buffer synchronization means that a current thread should wait for other threads accessing a shared buffer until the completion of their access. And the committing processing means that a current thread possesses the shared buffer, so any attempt to access the shared buffer by another thread makes that thread blocked. However, as I already mentioned before, it seems that these user interfaces are still ugly. So we need a better way. Give me more comments if there is a point I'm missing :)

Thanks, Inki Dae

BR, -R

2) finish-access (dma_buf_end_cpu_access) 3) dma access to buffer

1) and 2) are coupled with one function: we have implemented fence_helper_commit_reserve() for it.
Cache control (cache clean or cache invalidate) is performed properly, checking the previous access type and current access type. And the below is the actual code for it,
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 13, 2013 at 10:52 PM, Inki Dae inki@samsung.com wrote: well, for cache management, I think it is a better idea.. I didn't really catch that this was the motivation from the initial patch, but maybe I read it too quickly. But cache can be decoupled from synchronization, because CPU access is not asynchronous. For userspace/CPU access to buffer, you should: 1) wait for buffer 2) prepare-access 3) ... do whatever cpu access to buffer ... 4) finish-access 5) submit buffer for new dma-operation

For data flow from CPU to DMA device: 1) wait for buffer 2) prepare-access (dma_buf_begin_cpu_access) 3) cpu access to buffer

For data flow from DMA device to CPU: 1) wait for buffer

Right, but CPU access isn't asynchronous (from the point of view of the CPU), so there isn't really any wait step at this point. And if you do want the CPU to be able to signal a fence from userspace for some reason, you probably want something file/fd based so the refcnting/cleanup when a process dies doesn't leave some pending DMA action wedged. But I don't really see the point of that complexity when the CPU access isn't asynchronous in the first place.

BR, -R

2) finish-access (dma_buf_end_cpu_access) 3) dma access to buffer

1) and 2) are coupled with one function: we have implemented fence_helper_commit_reserve() for it. Cache control (cache clean or cache invalidate) is performed properly, checking the previous access type and current access type.
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 13, 2013 at 8:21 AM, Inki Dae inki@samsung.com wrote: In that case you still wouldn't give userspace control over the fences. I don't see any way that can end well. What if userspace never signals? What if userspace gets killed by the oom killer. Who keeps track of that?

In all cases, all kernel resources to user fence will be released by kernel once the fence is timed out: never signaling and process killing by oom killer makes the fence timed out. And if we use the mmap mechanism you mentioned before, I think user resources could also be freed properly.

I tend to agree w/ Maarten here.. there is no good reason for userspace to be *signaling* fences. The exception might be some blob gpu drivers which don't have enough knowledge in the kernel to figure out what to do. (In which case you can add driver private ioctls for that.. still not the right thing to do but at least you don't make a public API out of it.)

BR, -R
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 13, 2013 at 1:18 PM, Inki Dae inki@samsung.com wrote: 2013/5/13 Rob Clark robdcl...@gmail.com On Mon, May 13, 2013 at 8:21 AM, Inki Dae inki@samsung.com wrote: In that case you still wouldn't give userspace control over the fences. I don't see any way that can end well. What if userspace never signals? What if userspace gets killed by the oom killer. Who keeps track of that?

In all cases, all kernel resources to user fence will be released by kernel once the fence is timed out: never signaling and process killing by oom killer makes the fence timed out. And if we use the mmap mechanism you mentioned before, I think user resources could also be freed properly.

I tend to agree w/ Maarten here.. there is no good reason for userspace to be *signaling* fences. The exception might be some blob gpu drivers which don't have enough knowledge in the kernel to figure out what to do. (In which case you can add driver private ioctls for that.. still not the right thing to do but at least you don't make a public API out of it.)

Please do not care whether those are generic or not. Let's see the following three things. First, it's cache operation. As you know, ARM SoC has ACP (Accelerator Coherency Port) and can be connected to DMA engine or similar devices. And this port is used for cache coherency between CPU cache and DMA device. However, most devices on ARM based embedded systems don't use the ACP port. So they need proper cache operation before and after DMA or CPU access in case of using cachable mapping. Actually, I see many Linux based platforms call cache control interfaces directly for that. I think the reason they do so is that the kernel isn't aware of when and how the CPU accessed memory.

I think we had kicked around the idea of giving dmabuf's a prepare/finish ioctl quite some time back. This is probably something that should be at least a bit decoupled from fences. (Possibly 'prepare' waits for dma access to complete, but not the other way around.)
And I did implement in omapdrm support for simulating coherency via page fault-in / shoot-down.. It is one option that makes it completely transparent to userspace, although there is some performance cost, so I suppose it depends a bit on your use-case.

And second, user process has to do so many things in case of using shared memory with DMA device. User process should understand how the DMA device is operated and when the interfaces for controlling the DMA device are called. Such things would make the user application so complicated.

And third, it's performance optimization for multimedia and graphics devices. As I mentioned already, we should consider sequential processing for buffer sharing between CPU and DMA device. This means that the CPU should stay idle until the DMA device is completed and vice versa. That is why I proposed such user interfaces. Of course, these interfaces might be ugly yet: for this, Maarten pointed it out already and I agree with him. But there must be another better way. Don't you think we need a similar thing? With such interfaces, cache control and buffer synchronization can be performed at kernel level. Moreover, the user application doesn't need to consider DMA device control anymore. Therefore, one thread can access a shared buffer and another can control the DMA device with the shared buffer in parallel. We can really make the best use of CPU and DMA idle time. In other words, we can really make the best use of a multitasking OS, Linux. So could you please tell me whether there is any reason we don't use a public API for it? I think we can add and use a public API if NECESSARY.

well, for cache management, I think it is a better idea.. I didn't really catch that this was the motivation from the initial patch, but maybe I read it too quickly. But cache can be decoupled from synchronization, because CPU access is not asynchronous. For userspace/CPU access to buffer, you should: 1) wait for buffer 2) prepare-access 3) ...
do whatever cpu access to buffer ... 4) finish-access 5) submit buffer for new dma-operation I suppose you could combine the syscall for #1 and #2.. not sure if that is a good idea or not. But you don't need to. And there is never really any need for userspace to signal a fence. BR, -R Thanks, Inki Dae BR, -R ___ dri-devel mailing list dri-de...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
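The five-step wait/prepare/access/finish/submit sequence described above can be sketched as a toy model. Everything here (the `Buffer` class and its method names) is invented for illustration and is not a real dma-buf or kernel API, but it shows why the prepare/finish bracketing keeps CPU and DMA access from overlapping:

```python
# Toy model of the CPU-access sequence from the mail above:
#   1) wait for buffer  2) prepare-access  3) CPU access
#   4) finish-access    5) submit buffer for a new DMA operation
# All names here are hypothetical, invented for illustration only.

class Buffer:
    def __init__(self):
        self.dma_busy = False      # a pending DMA op owns the buffer
        self.cpu_prepared = False  # caches made coherent for CPU access

    def wait(self):                # step 1: wait for pending DMA
        self.dma_busy = False

    def prepare_access(self):      # step 2: e.g. cache invalidate
        assert not self.dma_busy, "must wait before CPU access"
        self.cpu_prepared = True

    def finish_access(self):       # step 4: e.g. cache clean/flush
        self.cpu_prepared = False

    def submit_dma(self):          # step 5: hand back to the device
        assert not self.cpu_prepared, "must finish-access before DMA"
        self.dma_busy = True

buf = Buffer()
buf.dma_busy = True   # a DMA operation is in flight

buf.wait()            # 1
buf.prepare_access()  # 2
# ... step 3: CPU reads/writes the buffer here ...
buf.finish_access()   # 4
buf.submit_dma()      # 5
print("ordered CPU/DMA access ok")
```

Note how steps 1 and 2 could indeed be merged into one call, as the mail suggests, since prepare-access is only legal once the wait has completed.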
Re: [PATCH] drm/udl: avoid swiotlb for imported vmap buffers.
On Mon, May 6, 2013 at 4:44 PM, Daniel Vetter dan...@ffwll.ch wrote: On Mon, May 6, 2013 at 9:56 PM, Dave Airlie airl...@gmail.com wrote: On Tue, May 7, 2013 at 1:59 AM, Daniel Vetter dan...@ffwll.ch wrote: On Mon, May 06, 2013 at 10:35:35AM +1000, Dave Airlie wrote: However if we don't set a dma mask on the usb device, the mapping ends up using swiotlb on machines that have it enabled, which is less than desirable. Signed-off-by: Dave Airlie airl...@redhat.com Fyi for everyone else who was not on irc when Dave & I discussed this: This really shouldn't be required and I think the real issue is that udl creates a dma_buf attachment (which is needed for device dma only), but only really wants to do cpu access through vmap/kmap. So not attaching the device should be good enough. Cc'ing a few more lists for better fyi ;-) Though I've looked at this a bit more, and since I want to be able to expose shared objects as proper GEM objects from the import side I really need that list of pages. Hm, what does proper GEM object mean in the context of udl? One that appears the same as a GEM object created by userspace. i.e. mmap works. Oh, we have an mmap interface in the dma_buf thing for that, and iirc Rob Clark even bothered to implement the gem->dma_buf mmap forwarding somewhere. And iirc android's ion-on-dma_buf stuff is even using the mmap interface stuff. fwiw, what I did was dma_buf -> gem mmap fwding, ie. implement mmap for the dmabuf object by fwd'ing it to my normal gem mmap code. Might be the opposite of what you are looking for here. Although I suppose the reverse could work too, I hadn't really thought about it yet. BR, -R Now for prime let's just ship this, dammit prevailed for now. But I still think that hiding the backing storage a bit better (with the eventual goal of supporting eviction with Maarten's fence/ww_mutex madness) feels like a worthy long-term goal. 
Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [RFC v2 0/5] Common Display Framework
On Fri, Feb 1, 2013 at 5:42 PM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Rahul, On Wednesday 09 January 2013 13:53:30 Rahul Sharma wrote: Hi Laurent, CDF will also be helpful in supporting Panels with integrated audio (HDMI/DP) if we can add audio related control operations to display_entity_control_ops. Video controls will be called by crtc in DRM/V4L and audio controls from Alsa. I knew that would come up at some point :-) I agree with you that adding audio support would be a very nice improvement, and I'm totally open to that, but I will concentrate on video, at least to start with. The first reason is that I'm not familiar enough with ALSA, and the second that there's only 24h per day :-) Please feel free, of course, to submit a proposal for audio support. Secondly, if I need to support get_modes operation in hdmi/dp panel, I need to implement edid parser inside the panel driver. It will be meaningful to add get_edid control operation for hdmi/dp. Even if EDID data is parsed in the panel driver, raw EDID will still need to be exported, so a get_edid control operation (or something similar) is definitely needed. There's no disagreement on this, I just haven't included that operation yet because my test hardware is purely panel-based. one of (probably many) places that just keeping CDF (CDH? common display helpers..) inside DRM makes life easier :-P BR, -R -- Regards, Laurent Pinchart
Re: [PATCH 2/7] mutex: add support for reservation style locks
On Wed, Jan 30, 2013 at 5:52 AM, Rob Clark robdcl...@gmail.com wrote: On Wed, Jan 30, 2013 at 5:08 AM, Daniel Vetter dan...@ffwll.ch wrote: On Wed, Jan 30, 2013 at 2:07 AM, Rob Clark robdcl...@gmail.com wrote: == Basic problem statement: - --- - GPU's do operations that commonly involve many buffers. Those buffers can be shared across contexts/processes, exist in different memory domains (for example VRAM vs system memory), and so on. And with PRIME / dmabuf, they can even be shared across devices. So there are a handful of situations where the driver needs to wait for buffers to become ready. If you think about this in terms of waiting on a buffer mutex for it to become available, this presents a problem because there is no way to guarantee that buffers appear in a execbuf/batch in the same order in all contexts. That is directly under control of userspace, and a result of the sequence of GL calls that an application makes. Which results in the potential for deadlock. The problem gets more complex when you consider that the kernel may need to migrate the buffer(s) into VRAM before the GPU operates on the buffer(s), which may in turn require evicting some other buffers (and you don't want to evict other buffers which are already queued up to the GPU), but for a simplified understanding of the problem you can ignore this. The algorithm that TTM came up with for dealing with this problem is quite simple. For each group of buffers (execbuf) that need to be locked, the caller would be assigned a unique reservation_id, from a global counter. In case of deadlock in the process of locking all the buffers associated with a execbuf, the one with the lowest reservation_id wins, and the one with the higher reservation_id unlocks all of the buffers that it has already locked, and then tries again. 
Originally TTM implemented this algorithm on top of an event-queue and atomic-ops, but Maarten Lankhorst realized that by merging this with the mutex code we could take advantage of the existing mutex fast-path code and result in a simpler solution, and so ticket_mutex was born. (Well, there were also some additional complexities with the original implementation when you start adding in cross-device buffer sharing for PRIME.. Maarten could probably better explain.) I think the motivational writeup above is really nice, but the example code below is a bit wrong How it is used: --- -- -- - A very simplified version: int submit_execbuf(execbuf) { /* acquiring locks, before queuing up to GPU: */ seqno = assign_global_seqno(); retry: for (buf in execbuf->buffers) { ret = mutex_reserve_lock(buf->lock, seqno); switch (ret) { case 0: /* we got the lock */ break; case -EAGAIN: /* someone with a lower seqno, so unreserve and try again: */ for (buf2 in reverse order starting before buf in execbuf->buffers) mutex_unreserve_unlock(buf2->lock); goto retry; default: goto err; } } /* now everything is good to go, submit job to GPU: */ ... } int finish_execbuf(execbuf) { /* when GPU is finished: */ for (buf in execbuf->buffers) mutex_unreserve_unlock(buf->lock); } == Since gpu command submission is all async (hopefully at least) we don't unlock once it completes, but right away after the commands are submitted. Otherwise you wouldn't be able to submit new execbufs using the same buffer objects (and besides, holding locks while going back out to userspace is evil). right.. but I was trying to simplify the explanation for non-gpu folk.. maybe that was an over-simplification ;-) Ok, a bit expanded version.. I meant to send this yesterday, but I forgot.. Basic problem statement: - --- - GPU's do operations that commonly involve many buffers. Those buffers can be shared across contexts/processes, exist in different memory domains (for example VRAM vs system memory), and so on. 
And with PRIME / dmabuf, they can even be shared across devices. So there are a handful of situations where the driver needs to wait for buffers to become ready. If you think about this in terms of waiting on a buffer mutex for it to become available, this presents a problem because there is no way to guarantee that buffers appear in a execbuf/batch in the same order in all contexts. That is directly under control of userspace, and a result of the sequence of GL calls that an application makes. Which results in the potential for deadlock. The problem gets more complex when you consider that the kernel may need to migrate the buffer(s) into VRAM before the GPU operates on the buffer(s), which may in turn require evicting some
Re: [PATCH 2/7] mutex: add support for reservation style locks
On Wed, Jan 30, 2013 at 5:08 AM, Daniel Vetter dan...@ffwll.ch wrote: On Wed, Jan 30, 2013 at 2:07 AM, Rob Clark robdcl...@gmail.com wrote: == Basic problem statement: - --- - GPU's do operations that commonly involve many buffers. Those buffers can be shared across contexts/processes, exist in different memory domains (for example VRAM vs system memory), and so on. And with PRIME / dmabuf, they can even be shared across devices. So there are a handful of situations where the driver needs to wait for buffers to become ready. If you think about this in terms of waiting on a buffer mutex for it to become available, this presents a problem because there is no way to guarantee that buffers appear in a execbuf/batch in the same order in all contexts. That is directly under control of userspace, and a result of the sequence of GL calls that an application makes. Which results in the potential for deadlock. The problem gets more complex when you consider that the kernel may need to migrate the buffer(s) into VRAM before the GPU operates on the buffer(s), which may in turn require evicting some other buffers (and you don't want to evict other buffers which are already queued up to the GPU), but for a simplified understanding of the problem you can ignore this. The algorithm that TTM came up with for dealing with this problem is quite simple. For each group of buffers (execbuf) that need to be locked, the caller would be assigned a unique reservation_id, from a global counter. In case of deadlock in the process of locking all the buffers associated with a execbuf, the one with the lowest reservation_id wins, and the one with the higher reservation_id unlocks all of the buffers that it has already locked, and then tries again. 
Originally TTM implemented this algorithm on top of an event-queue and atomic-ops, but Maarten Lankhorst realized that by merging this with the mutex code we could take advantage of the existing mutex fast-path code and result in a simpler solution, and so ticket_mutex was born. (Well, there were also some additional complexities with the original implementation when you start adding in cross-device buffer sharing for PRIME.. Maarten could probably better explain.) I think the motivational writeup above is really nice, but the example code below is a bit wrong How it is used: --- -- -- - A very simplified version: int submit_execbuf(execbuf) { /* acquiring locks, before queuing up to GPU: */ seqno = assign_global_seqno(); retry: for (buf in execbuf->buffers) { ret = mutex_reserve_lock(buf->lock, seqno); switch (ret) { case 0: /* we got the lock */ break; case -EAGAIN: /* someone with a lower seqno, so unreserve and try again: */ for (buf2 in reverse order starting before buf in execbuf->buffers) mutex_unreserve_unlock(buf2->lock); goto retry; default: goto err; } } /* now everything is good to go, submit job to GPU: */ ... } int finish_execbuf(execbuf) { /* when GPU is finished: */ for (buf in execbuf->buffers) mutex_unreserve_unlock(buf->lock); } == Since gpu command submission is all async (hopefully at least) we don't unlock once it completes, but right away after the commands are submitted. Otherwise you wouldn't be able to submit new execbufs using the same buffer objects (and besides, holding locks while going back out to userspace is evil). right.. but I was trying to simplify the explanation for non-gpu folk.. maybe that was an over-simplification ;-) BR, -R The trick is to add a fence object for async operation (essentially a waitqueue on steroids to support gpu-gpu direct signalling). And updating fences for a given execbuf needs to happen atomically for all buffers, for otherwise userspace could trick the kernel into creating a circular fence chain. 
This wouldn't deadlock the kernel, since everything is async, but it'll nicely deadlock the gpus involved. Hence why we need ticketing locks to get dma_buf fences off the ground. Maybe wait for Maarten's feedback, then update your motivational blurb a bit? Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
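The circular-fence-chain hazard described above (fence updates not applied atomically across an execbuf, letting userspace wedge two GPUs into waiting on each other) can be illustrated with a toy dependency-graph check. The structures and names here are invented for illustration only, not the kernel fence API:

```python
# Toy illustration of the circular fence chain hazard: if fence X waits
# on Y while Y waits on X, neither GPU job can ever signal.  A DFS over
# the wait edges finds such a cycle.  (Hypothetical structures only.)

def has_cycle(deps):
    """deps: fence name -> list of fence names it waits on."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {f: WHITE for f in deps}

    def visit(f):
        color[f] = GREY                    # currently on the DFS stack
        for d in deps.get(f, []):
            if color.get(d, WHITE) == GREY:
                return True                # back-edge: a wait cycle
            if color.get(d, WHITE) == WHITE and visit(d):
                return True
        color[f] = BLACK                   # fully explored, no cycle here
        return False

    return any(color[f] == WHITE and visit(f) for f in list(deps))

# GPU1's job fence waits on GPU2's and vice versa: deadlocked hardware.
assert has_cycle({"gpu1_job": ["gpu2_job"], "gpu2_job": ["gpu1_job"]})
# A straight-line chain is fine.
assert not has_cycle({"gpu1_job": ["gpu2_job"], "gpu2_job": []})
print("cycle check ok")
```

This is exactly the shape of deadlock that atomic per-execbuf fence updates (and hence the ticketing locks) are meant to make impossible.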
Re: [PATCH 2/7] mutex: add support for reservation style locks
On Tue, Jan 15, 2013 at 6:33 AM, Maarten Lankhorst m.b.lankho...@gmail.com wrote: Hi Maarten, This is a nice looking extension to avoid re-implementing a mutex in TTM/reservation code.. ofc, someone more familiar with the mutex code should probably review, but a bit of explanation about what and why would be helpful. mutex_reserve_lock, and mutex_reserve_lock_interruptible: Lock a buffer with a reservation_id set. reservation_id must not be set to 0, since this is a special value that means no reservation_id. Normally if reservation_id is not set, or is older than the reservation_id that's currently set on the mutex, the behavior will be to wait normally. However, if the reservation_id is newer than the current reservation_id, -EAGAIN will be returned, and this function must unreserve all other mutexes and then redo a blocking lock with normal mutex calls to prevent a deadlock, then call mutex_locked_set_reservation on successful locking to set the reservation_id inside the lock. It might be a bit more clear to write up how this works from the perspective of the user of ticket_mutex, separately from the internal implementation first, and then how it works internally? Ie, the mutex_set_reservation_fastpath() call is internal to the implementation of ticket_mutex, but -EAGAIN is something the caller of ticket_mutex shall deal with. This might give a clearer picture of how TTM / reservation uses this to prevent deadlock, so those less familiar with TTM could better understand. Well, here is an attempt to start a write-up, which should perhaps eventually be folded into Documentation/ticket-mutex-design.txt. But hopefully a better explanation of the problem and the solution will encourage some review of the ticket_mutex changes. == Basic problem statement: - --- - GPU's do operations that commonly involve many buffers. Those buffers can be shared across contexts/processes, exist in different memory domains (for example VRAM vs system memory), and so on. 
And with PRIME / dmabuf, they can even be shared across devices. So there are a handful of situations where the driver needs to wait for buffers to become ready. If you think about this in terms of waiting on a buffer mutex for it to become available, this presents a problem because there is no way to guarantee that buffers appear in a execbuf/batch in the same order in all contexts. That is directly under control of userspace, and a result of the sequence of GL calls that an application makes. Which results in the potential for deadlock. The problem gets more complex when you consider that the kernel may need to migrate the buffer(s) into VRAM before the GPU operates on the buffer(s), which may in turn require evicting some other buffers (and you don't want to evict other buffers which are already queued up to the GPU), but for a simplified understanding of the problem you can ignore this. The algorithm that TTM came up with for dealing with this problem is quite simple. For each group of buffers (execbuf) that need to be locked, the caller would be assigned a unique reservation_id, from a global counter. In case of deadlock in the process of locking all the buffers associated with a execbuf, the one with the lowest reservation_id wins, and the one with the higher reservation_id unlocks all of the buffers that it has already locked, and then tries again. Originally TTM implemented this algorithm on top of an event-queue and atomic-ops, but Maarten Lankhorst realized that by merging this with the mutex code we could take advantage of the existing mutex fast-path code and result in a simpler solution, and so ticket_mutex was born. (Well, there were also some additional complexities with the original implementation when you start adding in cross-device buffer sharing for PRIME.. Maarten could probably better explain.) 
How it is used: --- -- -- - A very simplified version: int submit_execbuf(execbuf) { /* acquiring locks, before queuing up to GPU: */ seqno = assign_global_seqno(); retry: for (buf in execbuf->buffers) { ret = mutex_reserve_lock(buf->lock, seqno); switch (ret) { case 0: /* we got the lock */ break; case -EAGAIN: /* someone with a lower seqno, so unreserve and try again: */ for (buf2 in reverse order starting before buf in execbuf->buffers) mutex_unreserve_unlock(buf2->lock); goto retry; default: goto err; } } /* now everything is good to go, submit job to GPU: */ ... } int finish_execbuf(execbuf) { /* when GPU is finished: */ for (buf in execbuf->buffers) mutex_unreserve_unlock(buf->lock); } == anyways, for the rest of the patch, I'm still going through the mutex/ticket_mutex
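The submit_execbuf() pseudocode above boils down to one rule: on contention, the holder with the higher reservation_id backs off and the lowest always makes progress. Here is a minimal single-threaded simulation of that rule; the names are invented for illustration and this is a toy model, not the actual ticket_mutex API:

```python
# Toy simulation of the TTM reservation rule described above: each
# submitter gets a global seqno, and on contention the *higher* seqno
# unreserves everything and retries, so the lowest seqno always makes
# progress and no deadlock cycle can form.  Hypothetical names only.

class Buf:
    def __init__(self, name):
        self.name, self.owner = name, None  # owner = seqno of holder

def reserve(buf, seqno):
    if buf.owner is None or buf.owner == seqno:
        buf.owner = seqno
        return "ok"
    if buf.owner < seqno:
        return "eagain"   # holder has a lower seqno: we must back off
    return "wait"         # we have the lower seqno: holder backs off

def unreserve(buf):
    buf.owner = None

A, B = Buf("A"), Buf("B")

# Context 2 (seqno 2) locks B first; context 1 (seqno 1) locks A first.
assert reserve(B, 2) == "ok"
assert reserve(A, 1) == "ok"

# Cross-grab: the classic ABBA deadlock shape, broken by the seqno rule.
assert reserve(B, 1) == "wait"    # 1 < 2: context 1 keeps A and waits
assert reserve(A, 2) == "eagain"  # 2 > 1: context 2 must unwind

unreserve(B)                      # context 2 drops B, will retry later
assert reserve(B, 1) == "ok"      # lowest seqno now holds both buffers
print("lowest seqno won:", A.owner, B.owner)  # prints: lowest seqno won: 1 1
```

The "wait" branch is where the real code blocks on the mutex; the simulation just shows that the backoff ordering leaves context 1 holding both A and B.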
Re: [RFC v2 0/5] Common Display Framework
On Tue, Jan 8, 2013 at 2:25 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Rob, On Thursday 27 December 2012 09:54:55 Rob Clark wrote: What I've done to avoid that so far is that the master device registers the drivers for it's output sub-devices before registering it's own device. I'm not sure to follow you here. The master device doesn't register anything, do you mean the master device driver ? If so, how does the master device driver register its own device ? Devices are not registered by their driver. sorry, that should have read master driver registers drivers for its sub-devices.. BR, -R
Re: [PATCHv16 5/7] fbmon: add of_videomode helpers
On Mon, Jan 7, 2013 at 2:46 AM, Mohammed, Afzal af...@ti.com wrote: Hi Steffen, On Mon, Jan 07, 2013 at 13:36:48, Steffen Trumtrar wrote: On Mon, Jan 07, 2013 at 06:10:13AM +, Mohammed, Afzal wrote: This breaks DaVinci (da8xx_omapl_defconfig), following change was required to get it build if OF_VIDEOMODE or/and FB_MODE_HELPERS is not defined. There may be better solutions, following was the one that was used by me to test this series. I just did a quick make da8xx_omapl_defconfig make and it builds just fine. On what version did you apply the series? At the moment I have the series sitting on 3.7. Didn't try any 3.8-rcx yet. But fixing this shouldn't be a problem. You are right, me idiot, the error will happen only upon trying to make use of of_get_fb_videomode() (defined in this patch) in the da8xx-fb driver (with da8xx_omapl_defconfig), to be exact upon adding "video: da8xx-fb: obtain fb_videomode info from dt" of my patch series. The change as I mentioned or something similar would be required as any driver that is going to make use of of_get_fb_videomode() would break if CONFIG_OF_VIDEOMODE or CONFIG_FB_MODE_HELPERS is not defined. Shouldn't the driver that depends on CONFIG_OF_VIDEOMODE and CONFIG_FB_MODE_HELPERS explicitly select them? I don't really see the point of having the static-inline fallbacks. fwiw, using 'select' is what I was doing for lcd panel support for lcdc/da8xx drm driver (which was using the of videomode helpers, albeit a slightly earlier version of the patches): https://github.com/robclark/kernel-omap4/commit/e2aef5f281348afaaaeaa132699efc2831aa8384 BR, -R And testing was done over v3.8-rc2. +#if IS_ENABLED(CONFIG_OF_VIDEOMODE) As _OF_VIDEOMODE is a bool type CONFIG, isn't #ifdef CONFIG_OF_VIDEOMODE sufficient ? Yes, that is right. But I think IS_ENABLED is the preferred way to do it, isn't it? Now I realize it is. 
Regards Afzal
Re: [RFC v2 0/5] Common Display Framework
On Mon, Dec 24, 2012 at 7:37 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Rob, On Tuesday 18 December 2012 00:21:32 Rob Clark wrote: On Mon, Dec 17, 2012 at 11:04 PM, Dave Airlie airl...@gmail.com wrote: Many developers showed interest in the first RFC, and I've had the opportunity to discuss it with most of them. I would like to thank (in no particular order) Tomi Valkeinen for all the time he spent helping me to draft v2, Marcus Lorentzon for his useful input during Linaro Connect Q4 2012, and Linaro for inviting me to Connect and providing a venue to discuss this topic. So this might be a bit off topic but this whole CDF triggered me looking at stuff I generally avoid: The biggest problem I'm having currently with the whole ARM graphics and output world is the proliferation of platform drivers for every little thing. The whole ordering of operations with respect to things like suspend/resume or dynamic power management is going to be a real nightmare if there are dependencies between the drivers. How do you enforce ordering of s/r operations between all the various components? I tend to think that sub-devices are useful just to have a way to probe hw which may or may not be there, since on ARM we often don't have any alternative.. but beyond that, suspend/resume, and other life-cycle aspects, they should really be treated as all one device. Especially to avoid undefined suspend/resume ordering. I tend to agree, except that I try to reuse the existing PM infrastructure when possible to avoid reinventing the wheel. So far handling suspend/resume ordering related to data busses in early suspend/late resume operations and allowing the Linux PM core to handle control busses using the Linux device tree worked pretty well. CDF or some sort of mechanism to share panel drivers between drivers is useful. 
Keeping it within drm is probably a good idea, if nothing else to simplify re-use of helper fxns (like avi-infoframe stuff, for example) and avoid dealing with merging changes across multiple trees. Treating them more like shared libraries and less like sub-devices which can be dynamically loaded/unloaded (ie. they should not be built as separate modules or suspend/resumed or probed/removed independently of the master driver) is a really good idea to avoid uncovering nasty synchronization issues later (remove vs modeset or pageflip) or surprising userspace in bad ways. We've tried that in V4L2 years ago and realized that the approach led to a dead-end, especially when OF/DT got involved. With DT-based device probing, I2C camera sensors started getting probed asynchronously to the main camera device, as they are children of the I2C bus master. We will have similar issues with I2C HDMI transmitters or panels, so we should be prepared for it. What I've done to avoid that so far is that the master device registers the drivers for its output sub-devices before registering its own device. At least this way I can control that they are probed first. Not the prettiest thing, but avoids even uglier problems. On PC hardware the I2C devices are connected to an I2C master provided by the GPU, but on embedded devices they are usually connected to an independent I2C master. We thus can't have a single self-contained driver that controls everything internally, and need to interface with the rest of the SoC drivers. I agree that probing/removing devices independently of the master driver can lead to bad surprises, which is why I want to establish clear rules in CDF regarding what can and can't be done with display entities. Reference counting will be one way to make sure that devices don't disappear all of a sudden. That at least helps cover some issues.. although it doesn't really help userspace confusion. Anyways, with enough work perhaps all problems could be solved.. 
otoh, there are plenty of other important problems to solve in the world of gpus and kms, so my preference is always not to needlessly over-complicate CDF and instead leave some time for other things BR, -R The other thing I'd like you guys to do is kill the idea of fbdev and v4l drivers that are shared with the drm codebase, really just implement fbdev and v4l on top of the drm layer, some people might think this is some sort of maintainer thing, but really nothing else makes sense, and having these shared display frameworks just to avoid having using drm/kms drivers seems totally pointless. Fix the drm fbdev emulation if an fbdev interface is needed. But creating a fourth framework because our previous 3 frameworks didn't work out doesn't seem like a situation I want to get behind too much. yeah, let's not have multiple frameworks to do the same thing.. For fbdev, it is pretty clear that it is a dead end. For v4l2 (subdev+mcf), it is perhaps a bit more flexible when it comes to random arbitrary hw pipelines than kms. But to take
Re: [RFC v2 0/5] Common Display Framework
On Mon, Dec 24, 2012 at 11:09 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: On the topic of discussions, would anyone be interested in a BoF/brainstorming/whatever session during the FOSDEM ? I will be at FOSDEM.. and from http://wiki.x.org/wiki/fosdem2013 it looks like at least Daniel will be there. If enough others are, it could be a good idea. BR, -R
Re: [RFC v2 0/5] Common Display Framework
On Mon, Dec 24, 2012 at 11:27 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: On Wednesday 19 December 2012 16:57:56 Jani Nikula wrote: It just seems to me that, at least from a DRM/KMS perspective, adding another layer (=CDF) for HDMI or DP (or legacy outputs) would be overengineering it. They are pretty well standardized, and I don't see there would be a need to write multiple display drivers for them. Each display controller has one, and can easily handle any chip specific requirements right there. It's my gut feeling that an additional framework would just get in the way. Perhaps there could be more common HDMI/DP helper style code in DRM to reduce overlap across KMS drivers, but that's another thing. So is the HDMI/DP drivers using CDF a more interesting idea from a non-DRM perspective? Or, put another way, is it more of an alternative to using DRM? Please enlighten me if there's some real benefit here that I fail to see! As Rob pointed out, you can have external HDMI/DP encoders, and even internal HDMI/DP encoder IPs can be shared between SoCs and SoC vendors. CDF aims at sharing a single driver between SoCs and boards for a given HDMI/DP encoder. just fwiw, drm already has something a bit like this.. the i2c encoder-slave. With support for a couple external i2c encoders which could in theory be shared between devices. BR, -R
Re: [RFC v2 0/5] Common Display Framework
On Mon, Dec 24, 2012 at 11:35 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: On Wednesday 19 December 2012 09:26:40 Rob Clark wrote: And, there are also external HDMI encoders (for example connected over i2c) that can also be shared between boards. So I think there will be a number of cases where CDF is appropriate for HDMI drivers. Although trying to keep this all independent of DRM (as opposed to just something similar to what drivers/gpu/i2c is today) seems a bit overkill for me. Being able to use the helpers in drm and avoiding an extra layer of translation seems like the better option to me. So my vote would be drivers/gpu/cdf. I don't think there will be any need for translation (except perhaps between the DRM mode structures and the common video mode structure that is being discussed). Add a drm_ prefix to the existing CDF functions and structures, and there you go :-) well, and translation for any properties that we'd want to expose to userspace, etc, etc.. I see there being a big potential for a lot of needless glue BR, -R The reason why I'd like to keep CDF separate from DRM (or at least not requiring a drm_device) is that HDMI/DP encoders can be used by pure V4L2 drivers. For DSI panels (or DSI-to-whatever bridges) it's of course another story. You typically need a panel specific driver. And here I see the main point of the whole CDF: decoupling display controllers and the panel drivers, and sharing panel (and converter chip) specific drivers across display controllers. Making it easy to write new drivers, as there would be a model to follow. I'm definitely in favour of coming up with some framework that would tackle that. 
-- Regards, Laurent Pinchart
Re: [RFC v2 0/5] Common Display Framework
On Thu, Dec 27, 2012 at 1:18 PM, Sascha Hauer s.ha...@pengutronix.de wrote: On Thu, Dec 27, 2012 at 09:54:55AM -0600, Rob Clark wrote: On Mon, Dec 24, 2012 at 7:37 AM, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Rob, On Tuesday 18 December 2012 00:21:32 Rob Clark wrote: On Mon, Dec 17, 2012 at 11:04 PM, Dave Airlie airl...@gmail.com wrote: Many developers showed interest in the first RFC, and I've had the opportunity to discuss it with most of them. I would like to thank (in no particular order) Tomi Valkeinen for all the time he spent helping me to draft v2, Marcus Lorentzon for his useful input during Linaro Connect Q4 2012, and Linaro for inviting me to Connect and providing a venue to discuss this topic. So this might be a bit off topic but this whole CDF triggered me looking at stuff I generally avoid: The biggest problem I'm having currently with the whole ARM graphics and output world is the proliferation of platform drivers for every little thing. The whole ordering of operations with respect to things like suspend/resume or dynamic power management is going to be a real nightmare if there are dependencies between the drivers. How do you enforce ordering of s/r operations between all the various components? I tend to think that sub-devices are useful just to have a way to probe hw which may or may not be there, since on ARM we often don't have any alternative.. but beyond that, suspend/resume, and other life-cycle aspects, they should really be treated as all one device. Especially to avoid undefined suspend/resume ordering. I tend to agree, except that I try to reuse the existing PM infrastructure when possible to avoid reinventing the wheel. So far handling suspend/resume ordering related to data busses in early suspend/late resume operations and allowing the Linux PM core to handle control busses using the Linux device tree worked pretty well. CDF or some sort of mechanism to share panel drivers between drivers is useful. 
Keeping it within drm is probably a good idea, if nothing else to simplify re-use of helper fxns (like avi-infoframe stuff, for example) and avoid dealing with merging changes across multiple trees. Treating them more like shared libraries and less like sub-devices which can be dynamically loaded/unloaded (ie. they should not be built as separate modules or suspend/resumed or probed/removed independently of the master driver) is a really good idea to avoid uncovering nasty synchronization issues later (remove vs modeset or pageflip) or surprising userspace in bad ways. We've tried that in V4L2 years ago and realized that the approach led to a dead-end, especially when OF/DT got involved. With DT-based device probing, I2C camera sensors started getting probed asynchronously to the main camera device, as they are children of the I2C bus master. We will have similar issues with I2C HDMI transmitters or panels, so we should be prepared for it. What I've done to avoid that so far is that the master device registers the drivers for its output sub-devices before registering its own device. At least this way I can control that they are probed first. Not the prettiest thing, but avoids even uglier problems. This implies that the master driver knows all potential subdevices, something which is not true for SoCs which have external i2c encoders attached to unrelated i2c controllers. well, it can be brute-forced.. ie. drm driver calls common register_all_panels() fxn, which, well, registers all the panel/display subdev's based on their corresponding CONFIG_FOO_PANEL defines. If you anyways aren't building the panels as separate modules, that would work. Maybe not the most *elegant* approach, but simple and functional. I guess it partly depends on the structure in devicetree. If you are assuming that the i2c encoder belongs inside the i2c bus, like: i2cN { foo-i2c-encoder { }; }; and you are letting devicetree create the devices, then it doesn't quite work.
I'm not entirely convinced you should do it that way. Really any device like that is going to be hooked up to at least a couple busses.. i2c, some sort of bus carrying pixel data, maybe some gpio's, etc. So maybe it makes more sense for a virtual drm/kms bus, and then use phandle stuff to link it to the various other busses it needs: mydrmdev { foo-i2c-encoder { i2c = <&i2cN>; gpio = <&gpioM 2 3 ...>; }; }; ok, admittedly that is a bit different from other proposals about how this all fits in devicetree.. but otoh, I'm not a huge believer in letting something that is supposed to make life easier (DT), actually make things harder or more complicated. Plus this CDF stuff all needs to also work on platforms not using OF/DT. BR, -R Sascha -- Pengutronix e.K
Re: [PATCH] [RFC] dma-buf: implement vmap refcounting in the interface logic
On Thu, Dec 20, 2012 at 7:14 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote: All drivers which implement this need to have some sort of refcount to allow concurrent vmap usage. Hence implement this in the dma-buf core. To protect against concurrent calls we need a lock, which potentially causes new funny locking inversions. But this shouldn't be a problem for exporters with statically allocated backing storage, and more dynamic drivers have decent issues already anyway. Inspired by some refactoring patches from Aaron Plattner, who implemented the same idea, but only for drm/prime drivers. v2: Check in dma_buf_release that no dangling vmaps are left. Suggested by Aaron Plattner. We might want to do similar checks for attachments, but that's for another patch. Also fix up ERR_PTR return for vmap. v3: Check whether the passed-in vmap address matches with the cached one for vunmap. Eventually we might want to remove that parameter - compared to the kmap functions there's no need for the vaddr for unmapping. Suggested by Chris Wilson. v4: Fix a brown-paper-bag bug spotted by Aaron Plattner. yeah, I think the locking does worry me a bit but hopefully should be able to do something better when reservations land Signed-off-by: Rob Clark r...@ti.com Cc: Aaron Plattner aplatt...@nvidia.com Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
---
 Documentation/dma-buf-sharing.txt |  6 +-
 drivers/base/dma-buf.c            | 43 ++-
 include/linux/dma-buf.h           |  4 +++-
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 0188903..4966b1b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -302,7 +302,11 @@ Access to a dma_buf from the kernel context involves three steps:
      void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
    The vmap call can fail if there is no vmap support in the exporter, or if it
-   runs out of vmalloc space. Fallback to kmap should be implemented.
+   runs out of vmalloc space. Fallback to kmap should be implemented. Note that
+   the dma-buf layer keeps a reference count for all vmap access and calls down
+   into the exporter's vmap function only when no vmapping exists, and only
+   unmaps it once. Protection against concurrent vmap/vunmap calls is provided
+   by taking the dma_buf->lock mutex.

 3. Finish access

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index a3f79c4..26b68de 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -39,6 +39,8 @@ static int dma_buf_release(struct inode *inode, struct file *file)
 	dmabuf = file->private_data;

+	BUG_ON(dmabuf->vmapping_counter);
+
 	dmabuf->ops->release(dmabuf);
 	kfree(dmabuf);
 	return 0;
@@ -482,12 +484,34 @@ EXPORT_SYMBOL_GPL(dma_buf_mmap);
  */
 void *dma_buf_vmap(struct dma_buf *dmabuf)
 {
+	void *ptr;
+
 	if (WARN_ON(!dmabuf))
 		return NULL;

-	if (dmabuf->ops->vmap)
-		return dmabuf->ops->vmap(dmabuf);
-	return NULL;
+	if (!dmabuf->ops->vmap)
+		return NULL;
+
+	mutex_lock(&dmabuf->lock);
+	if (dmabuf->vmapping_counter) {
+		dmabuf->vmapping_counter++;
+		BUG_ON(!dmabuf->vmap_ptr);
+		ptr = dmabuf->vmap_ptr;
+		goto out_unlock;
+	}
+
+	BUG_ON(dmabuf->vmap_ptr);
+
+	ptr = dmabuf->ops->vmap(dmabuf);
+	if (IS_ERR_OR_NULL(ptr))
+		goto out_unlock;
+
+	dmabuf->vmap_ptr = ptr;
+	dmabuf->vmapping_counter = 1;
+
+out_unlock:
+	mutex_unlock(&dmabuf->lock);
+	return ptr;
 }
 EXPORT_SYMBOL_GPL(dma_buf_vmap);

@@ -501,7 +525,16 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
 	if (WARN_ON(!dmabuf))
 		return;

-	if (dmabuf->ops->vunmap)
-		dmabuf->ops->vunmap(dmabuf, vaddr);
+	BUG_ON(!dmabuf->vmap_ptr);
+	BUG_ON(dmabuf->vmapping_counter == 0);
+	BUG_ON(dmabuf->vmap_ptr != vaddr);
+
+	mutex_lock(&dmabuf->lock);
+	if (--dmabuf->vmapping_counter == 0) {
+		if (dmabuf->ops->vunmap)
+			dmabuf->ops->vunmap(dmabuf, vaddr);
+		dmabuf->vmap_ptr = NULL;
+	}
+	mutex_unlock(&dmabuf->lock);
 }
 EXPORT_SYMBOL_GPL(dma_buf_vunmap);

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index bd2e52c..e3bf2f6 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -119,8 +119,10 @@ struct dma_buf {
 	struct file *file;
 	struct list_head attachments;
 	const struct dma_buf_ops *ops;
-	/* mutex to serialize list manipulation and attach/detach */
+	/* mutex to serialize list manipulation, attach/detach and vmap/unmap */
 	struct mutex
Re: [PATCH] [RFC] dma-buf: implement vmap refcounting in the interface logic
On Thu, Dec 20, 2012 at 10:50 AM, Rob Clark robdcl...@gmail.com wrote: On Thu, Dec 20, 2012 at 7:14 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote: All drivers which implement this need to have some sort of refcount to allow concurrent vmap usage. Hence implement this in the dma-buf core. [snip] yeah, I think the locking does worry me a bit but hopefully should be able to do something better when reservations land Signed-off-by: Rob Clark r...@ti.com or even, Reviewed-by: Rob Clark r...@ti.com [snip]
Re: [RFC v2 0/5] Common Display Framework
On Wed, Dec 19, 2012 at 8:57 AM, Jani Nikula jani.nik...@linux.intel.com wrote: Hi Laurent - On Tue, 18 Dec 2012, Laurent Pinchart laurent.pinch...@ideasonboard.com wrote: Hi Jani, On Monday 17 December 2012 18:53:37 Jani Nikula wrote: I can see the need for a framework for DSI panels and such (in fact Tomi and I have talked about it like 2-3 years ago already!) but what is the story for HDMI and DP? In particular, what's the relationship between DRM and CDF here? Is there a world domination plan to switch the DRM drivers to use this framework too? ;) Do you have some rough plans how DRM and CDF should work together in general? There's always a world domination plan, isn't there ? :-) I certainly want CDF to be used by DRM (or more accurately KMS). That's what the C stands for, common refers to sharing panel and other display entity drivers between FBDEV, KMS and V4L2. I currently have no plan to expose CDF internals to userspace through the KMS API. We might have to do so later if the hardware complexity grows in such a way that finer control than what KMS provides needs to be exposed to userspace, but I don't think we're there yet. The CDF API will thus only be used internally in the kernel by display controller drivers. The KMS core might get functions to handle common display entity operations, but the bulk of the work will be in the display controller drivers to start with. We will then see what can be abstracted in KMS helper functions. Regarding HDMI and DP, I imagine HDMI and DP drivers that would use the CDF API. That's just a thought for now, I haven't tried to implement them, but it would be nice to handle HDMI screens and DPI/DBI/DSI panels in a generic way. Do you have thoughts to share on this topic ? It just seems to me that, at least from a DRM/KMS perspective, adding another layer (=CDF) for HDMI or DP (or legacy outputs) would be overengineering it. 
They are pretty well standardized, and I don't see there would be a need to write multiple display drivers for them. Each display controller has one, and can easily handle any chip specific requirements right there. It's my gut feeling that an additional framework would just get in the way. Perhaps there could be more common HDMI/DP helper style code in DRM to reduce overlap across KMS drivers, but that's another thing. So is the HDMI/DP drivers using CDF a more interesting idea from a non-DRM perspective? Or, put another way, is it more of an alternative to using DRM? Please enlighten me if there's some real benefit here that I fail to see! fwiw, I think there are at least a couple cases where multiple SoC's have the same HDMI IP block. And, there are also external HDMI encoders (for example connected over i2c) that can also be shared between boards. So I think there will be a number of cases where CDF is appropriate for HDMI drivers. Although trying to keep this all independent of DRM (as opposed to just something similar to what drivers/gpu/i2c is today) seems a bit overkill for me. Being able to use the helpers in drm and avoiding an extra layer of translation seems like the better option to me. So my vote would be drivers/gpu/cdf. BR, -R For DSI panels (or DSI-to-whatever bridges) it's of course another story. You typically need a panel specific driver. And here I see the main point of the whole CDF: decoupling display controllers and the panel drivers, and sharing panel (and converter chip) specific drivers across display controllers. Making it easy to write new drivers, as there would be a model to follow. I'm definitely in favour of coming up with some framework that would tackle that. BR, Jani. 
-- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/5] Common Display Framework
On Wed, Dec 19, 2012 at 9:37 AM, Tomi Valkeinen tomi.valkei...@ti.com wrote: On 2012-12-19 17:26, Rob Clark wrote: And, there are also external HDMI encoders (for example connected over i2c) that can also be shared between boards. So I think there will be a number of cases where CDF is appropriate for HDMI drivers. Although trying to keep this all independent of DRM (as opposed to just something similar to what drivers/gpu/i2c is today) seems a bit overkill for me. Being able to use the helpers in drm and avoiding an extra layer of translation seems like the better option to me. So my vote would be drivers/gpu/cdf. Well, we need to think about that. I would like to keep CDF independent of DRM. I don't like tying different components/frameworks together if there's no real need for that. Also, something that Laurent mentioned in our face-to-face discussions: Some IPs/chips can be used for other purposes than with DRM. He had an example of a board that (if I understood right) gets video signal from somewhere outside the board, processes the signal with some IPs/chips, and then outputs the signal. So there's no framebuffer, and the image is not stored anywhere. I think the framework used in these cases is always v4l2. The IPs/chips in the above model may be the exact same IPs/chips that are used with normal display. If the CDF was tied to DRM, using the same drivers for normal and these streaming cases would probably not be possible. Well, maybe there is a way, but it really seems to be over-complicating things unnecessarily to keep CDF independent of DRM.. there will be a lot more traditional uses of CDF compared to one crazy use-case. So I don't really fancy making it more difficult than it needs to be for everyone. Probably the thing to do is take a step back and reconsider that one crazy use-case. For example, KMS doesn't enforce that the buffer handle passed when you create a drm framebuffer object to scan out is a GEM buffer.
So on that one crazy platform, maybe it makes sense to have a DRM/KMS display driver that takes a handle identifying which video stream is coming from the capture end of the pipeline. Anyways, that is just an off-the-top-of-my-head idea, probably there are other options too. BR, -R Tomi
Re: [RFC v2 0/5] Common Display Framework
On Mon, Dec 17, 2012 at 11:04 PM, Dave Airlie airl...@gmail.com wrote: Many developers showed interest in the first RFC, and I've had the opportunity to discuss it with most of them. I would like to thank (in no particular order) Tomi Valkeinen for all the time he spent helping me to draft v2, Marcus Lorentzon for his useful input during Linaro Connect Q4 2012, and Linaro for inviting me to Connect and providing a venue to discuss this topic. So this might be a bit off topic but this whole CDF triggered me looking at stuff I generally avoid: The biggest problem I'm having currently with the whole ARM graphics and output world is the proliferation of platform drivers for every little thing. The whole ordering of operations with respect to things like suspend/resume or dynamic power management is going to be a real nightmare if there are dependencies between the drivers. How do you enforce ordering of s/r operations between all the various components? I tend to think that sub-devices are useful just to have a way to probe hw which may or may not be there, since on ARM we often don't have any alternative.. but beyond that, suspend/resume, and other life-cycle aspects, they should really be treated as all one device. Especially to avoid undefined suspend/resume ordering. CDF or some sort of mechanism to share panel drivers between drivers is useful. Keeping it within drm is probably a good idea, if nothing else to simplify re-use of helper fxns (like avi-infoframe stuff, for example) and avoid dealing with merging changes across multiple trees. Treating them more like shared libraries and less like sub-devices which can be dynamically loaded/unloaded (ie. they should not be built as separate modules or suspend/resumed or probed/removed independently of the master driver) is a really good idea to avoid uncovering nasty synchronization issues later (remove vs modeset or pageflip) or surprising userspace in bad ways.
The other thing I'd like you guys to do is kill the idea of fbdev and v4l drivers that are shared with the drm codebase, really just implement fbdev and v4l on top of the drm layer, some people might think this is some sort of maintainer thing, but really nothing else makes sense, and having these shared display frameworks just to avoid using drm/kms drivers seems totally pointless. Fix the drm fbdev emulation if an fbdev interface is needed. But creating a fourth framework because our previous 3 frameworks didn't work out doesn't seem like a situation I want to get behind too much. yeah, let's not have multiple frameworks to do the same thing.. For fbdev, it is pretty clear that it is a dead end. For v4l2 (subdev+mcf), it is perhaps a bit more flexible when it comes to random arbitrary hw pipelines than kms. But to take advantage of that, your userspace isn't going to be portable anyways, so you might as well use driver specific properties/ioctls. But I tend to think that is more useful for cameras. And from userspace perspective, kms planes are less painful to use for output than v4l2, so lets stick to drm/kms for output (and not try to add camera/capture support to kms).. k, thx BR, -R Dave. ___ dri-devel mailing list dri-de...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Linaro-mm-sig] [PATCH] dma-buf: Add debugfs support
On Fri, Dec 14, 2012 at 5:57 AM, Maarten Lankhorst m.b.lankho...@gmail.com wrote: On 14-12-12 10:36, sumit.sem...@ti.com wrote: From: Sumit Semwal sumit.sem...@linaro.org Add debugfs support to make it easier to print debug information about the dma-buf buffers. I like the idea, I don't know if it could be done in a free manner, but for bonus points could we also have the dma-buf fd be obtainable that way from a debugfs entry? Doing so would allow me to 'steal' a dma-buf from an existing mapping easily, and test against that. Also I think the name of the device and process that exported the dma-buf would be useful to have as well, even if in case of the device that would mean changing the api slightly to record it. I was thinking of having a directory structure like this: /sys/kernel/debug/dma_buf/stats and then for each dma-buf: /sys/kernel/debug/dma-buf/exporting_file.c/number-fd /sys/kernel/debug/dma-buf/exporting_file.c/number-attachments /sys/kernel/debug/dma-buf/exporting_file.c/number-info Opening the fd file would give you back the original fd, or fail with -EIO if refcount was dropped to 0. Would something like this be doable? I don't know debugfs that well, but I don't see why it wouldn't be, yeah.. but it sort of back-doors the security benefits of an anonymous fd.. BR, -R ~Maarten
Re: [PATCH] dma-buf: Use EXPORT_SYMBOL
On Thu, Oct 11, 2012 at 6:13 AM, Mauro Carvalho Chehab mche...@redhat.com wrote: On Thu, 11 Oct 2012 09:20:12 +0200 Hans Verkuil hverk...@xs4all.nl wrote: my understanding is that the drivers/media/ authors should also ack with this licensing (possible) change. I am one of the main contributors there. Alan also has copyrights there, and at other parts of the Linux Kernel, including the driver's core, from where all Linux Kernel drivers are derivative work, including this one. As Alan well said, many other core Linux Kernel authors very likely share this point of view. So, developers implicitly or explicitly copied in this thread that might be considering the usage of dmabuf on proprietary drivers should consider this email as a formal notification of my viewpoint: e. g. that I consider any attempt of using DMABUF or media core/drivers together with proprietary Kernelspace code as a possible GPL infringement. As long as dmabuf uses EXPORT_SYMBOL_GPL that is definitely correct. Does your statement also hold if dmabuf would use EXPORT_SYMBOL? (Just asking) If you read the Kernel COPYING file, it is explicitly said there that the Kernel is licensed under GPLv2. The _ONLY_ exception there is the allowance to use the kernel via normal syscalls: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of derived work. Also note that the GPL below is copyrighted by the Free Software Foundation, but the instance of code that it refers to (the Linux kernel) is copyrighted by me and others who actually wrote it. The usage of EXPORT_SYMBOL() is not covered there, so those symbols are also covered by GPLv2.
As the usage of a kernel symbol by a proprietary driver is not explicitly listed there as a GPLv2 exception, the only concrete result of this patch is to spread FUD, as EXPORT_SYMBOL might generate some doubts on people that don't read the Kernel's COPYING file. With or without this patch, anyone with intellectual rights in the Kernel may go to court to assert their rights against the infringing closed source drivers. By not making it explicit, you're only trying to fool people that using it might be allowed. Maybe a dumb question (I'm a programmer, not a lawyer), but does it change anything if we make the APIs related to *exporting* a dmabuf EXPORT_SYMBOL() and keep the APIs related to *importing* as EXPORT_SYMBOL_GPL()? This at least keeps the non-GPL kernel module from calling into other driver code, while still allowing the non-GPL driver to export a buffer that GPL drivers could use. BR, -R BTW, we should consider changing the control framework API to EXPORT_SYMBOL_GPL. Agreed. The number of contributors to v4l2-ctrls.c is very limited, and I have no problem moving that to GPL. For me dmabuf is the rare exception where I prefer EXPORT_SYMBOL to prevent the worse evil of forcing vendors to create incompatible APIs. It's sad but true that many GPU drivers are still closed source, particularly in the embedded world for which dmabuf was primarily designed. My understanding is that even the creation of an incompatible Kernel API is a presumed GPL violation, as it is an attempt to circumvent the license. Basically, if vendors want to work with closed source, there are other options in the market. But if they want to work with Linux, they should be contributing upstream, instead of doing proprietary blobs.
Regards, Mauro
Re: [PATCH] dma-buf: Use EXPORT_SYMBOL
On Wed, Oct 10, 2012 at 1:17 PM, Alan Cox a...@lxorguk.ukuu.org.uk wrote: On Wed, 10 Oct 2012 08:56:32 -0700 Robert Morell rmor...@nvidia.com wrote: EXPORT_SYMBOL_GPL is intended to be used for an internal implementation issue, and not really an interface. The dma-buf infrastructure is explicitly intended as an interface between modules/drivers, so it should use EXPORT_SYMBOL instead. NAK. This needs at the very least the approval of all rights holders for the files concerned and all code exposed by this change. Well, for my contributions to dmabuf, I don't object.. and I think because we are planning to use dma-buf in userspace for dri3 / dri-next, I think that basically makes it a userspace facing kernel infrastructure which would be required for open and proprietary drivers alike. So I don't see much alternative to making this EXPORT_SYMBOL(). Of course, IANAL. BR, -R Also I'd note if you are trying to do this for the purpose of combining it with proprietary code then you are still in my view as a (and the view of many other) rights holder to the kernel likely to be in breach of the GPL requirements for a derivative work. You may consider that formal notification of my viewpoint. Your corporate legal team can explain to you why the fact you are now aware of my view is important to them. Alan
Re: [Linaro-mm-sig] [RFC] New dma_buf - EGLImage EGL extension
On Tue, Oct 2, 2012 at 2:10 PM, Maarten Lankhorst m.b.lankho...@gmail.com wrote: How do you want to deal with the case where Y' and CbCr are different hardware buffers? Could some support for 2d arrays be added in case Y' and CbCr are separated into top/bottom fields? How are semi-planar/planar formats handled that have a different width/height for Y' and CbCr? (YUV420) The API works (AFAIU) like the drm addfb2 ioctl; take I420 for example, you could either do:

single buffer:
  fd0 = fd, offset0 = 0, pitch0 = width
  fd1 = fd, offset1 = width * height, pitch1 = width / 2
  fd2 = fd, offset2 = offset1 + (width * height / 4), pitch2 = width / 2

multiple buffers:
  offset0 = offset1 = offset2 = 0
  fd0 = fd_luma, fd1 = fd_u, fd2 = fd_v

... and so on for interlaced/stereo.. is sticking our heads in sand an option? :-P You could get lots of permutations for data layout of fields between interlaced and stereo. One option might be to ignore and let the user create two egl-images and deal with blending in the shader? BR, -R
[PATCH] dma-buf: might_sleep() in dma_buf_unmap_attachment()
From: Rob Clark r...@ti.com

We never really clarified if unmap could be done in atomic context. But since mapping might require sleeping, this implies a mutex is in use to synchronize mapping/unmapping, so unmap could sleep as well. Add a might_sleep() to clarify this.

Signed-off-by: Rob Clark r...@ti.com
Acked-by: Daniel Vetter daniel.vet...@ffwll.ch
---
 drivers/base/dma-buf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index c30f3e1..877eacb 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -298,6 +298,8 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 				struct sg_table *sg_table,
 				enum dma_data_direction direction)
 {
+	might_sleep();
+
 	if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
 		return;
--
1.7.9.5
Re: [Linaro-mm-sig] [PATCH 2/4] dma-fence: dma-buf synchronization (v8 )
dma-buf.h All members are now initialized, so kmalloc can be used for allocating a dma-fence. More documentation added. Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com I like the design of this, and especially that it's rather simple ;-) A few comments to polish the interface, implementation and documentation a bit below. --- Documentation/DocBook/device-drivers.tmpl |2 drivers/base/Makefile |2 drivers/base/dma-fence.c | 268 + include/linux/dma-fence.h | 124 + 4 files changed, 395 insertions(+), 1 deletion(-) create mode 100644 drivers/base/dma-fence.c create mode 100644 include/linux/dma-fence.h diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index 7514dbf..36252ac 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -126,6 +126,8 @@ X!Edrivers/base/interface.c /sect1 sect1titleDevice Drivers DMA Management/title !Edrivers/base/dma-buf.c +!Edrivers/base/dma-fence.c +!Iinclude/linux/dma-fence.h !Edrivers/base/dma-coherent.c !Edrivers/base/dma-mapping.c /sect1 diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 5aa2d70..6e9f217 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o obj-y+= power/ obj-$(CONFIG_HAS_DMA)+= dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o dma-fence.o obj-$(CONFIG_ISA)+= isa.o obj-$(CONFIG_FW_LOADER) += firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/dma-fence.c b/drivers/base/dma-fence.c new file mode 100644 index 000..93448e4 --- /dev/null +++ b/drivers/base/dma-fence.c @@ -0,0 +1,268 @@ +/* + * Fence mechanism for dma-buf to allow for asynchronous dma access + * + * Copyright (C) 2012 Texas Instruments + * Author: Rob Clark rob.cl...@linaro.org + * + * This program is free software; you can redistribute 
+ * it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/export.h>
+#include <linux/dma-fence.h>
+
+/**
+ * dma_fence_signal - signal completion of a fence
+ * @fence: the fence to signal
+ *
+ * All registered callbacks will be called directly (synchronously) and
+ * all blocked waiters will be awoken. This should always be called on
+ * software only fences, or alternatively be called after
+ * dma_fence_ops::enable_signaling is called.

I think we need to be clearer here about when dma_fence_signal can be called:
- for a sw-only fence (i.e. created with dma_fence_create) dma_fence_signal
  _must_ be called under all circumstances.
- for any other fences, dma_fence_signal may be called, but it _must_ be
  called once the ->enable_signaling func has been called and returned 0
  (i.e. success).
- it may be called only _once_.
+ */
+int dma_fence_signal(struct dma_fence *fence)
+{
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	if (WARN_ON(!fence))
+		return -EINVAL;
+
+	spin_lock_irqsave(&fence->event_queue.lock, flags);
+	if (!fence->signaled) {
+		fence->signaled = true;
+		__wake_up_locked_key(&fence->event_queue, TASK_NORMAL,
+				     &fence->event_queue);
+		ret = 0;
+	} else
+		WARN(1, "Already signaled");
+	spin_unlock_irqrestore(&fence->event_queue.lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dma_fence_signal);
+
+static void release_fence(struct kref *kref)
+{
+	struct dma_fence *fence =
+		container_of(kref, struct dma_fence, refcount);
+
+	BUG_ON(waitqueue_active(&fence->event_queue));
+
+	if (fence->ops->release)
+		fence->ops->release(fence);
+
+	kfree(fence);
+}
+
+/**
+ * dma_fence_put - decreases refcount of the fence
+ * @fence: [in] fence to reduce refcount of
+ */
+void dma_fence_put(struct dma_fence *fence)
+{
+	if (WARN_ON(!fence))
+		return;
+	kref_put(&fence->refcount, release_fence);
+}
+EXPORT_SYMBOL_GPL(dma_fence_put);
+
+/**
+ * dma_fence_get - increases refcount of the fence
+ * @fence
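To make the signal-once rule discussed in the review concrete, here is a small user-space sketch using plain pthreads. It is not the kernel API — `struct toy_fence` and its functions are invented for illustration — but it mirrors the behavior of dma_fence_signal() above: the first signal wakes every waiter and returns 0, and any later signal is rejected with -EINVAL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Toy user-space analogue of the fence signaling rules (all names
 * invented for this sketch, not part of the posted patch).
 */
struct toy_fence {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	bool signaled;
};

void toy_fence_init(struct toy_fence *f)
{
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->wake, NULL);
	f->signaled = false;
}

/* First call wakes all waiters and returns 0; later calls fail. */
int toy_fence_signal(struct toy_fence *f)
{
	int ret = -22;	/* stands in for -EINVAL: already signaled */

	pthread_mutex_lock(&f->lock);
	if (!f->signaled) {
		f->signaled = true;
		pthread_cond_broadcast(&f->wake);	/* wake every waiter */
		ret = 0;
	}
	pthread_mutex_unlock(&f->lock);
	return ret;
}

/* Blocks until the fence is signaled; returns at once if it already was. */
void toy_fence_wait(struct toy_fence *f)
{
	pthread_mutex_lock(&f->lock);
	while (!f->signaled)
		pthread_cond_wait(&f->wake, &f->lock);
	pthread_mutex_unlock(&f->lock);
}
```

A waiter that arrives after the signal returns immediately, which is why the real code is safe to call for an already-completed hardware operation as long as it is signaled exactly once.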
Re: [Linaro-mm-sig] [PATCH 1/4] dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER
On Fri, Aug 10, 2012 at 2:32 PM, Daniel Vetter dan...@ffwll.ch wrote:
> On Fri, Aug 10, 2012 at 04:57:43PM +0200, Maarten Lankhorst wrote:
>> Documentation says that code requiring dma-buf should add it to select,
>> so inline fallbacks are not going to be used. A link error will make it
>> obvious what went wrong, instead of silently doing nothing at runtime.
>>
>> Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
>
> I've botched it more than once to update these when creating new dma-buf
> code. Hence
>
> Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch

yeah, I think the fallbacks date back to when it was a user configurable
option, rather than something select'd by drivers using dmabuf, and we
just never went back to clean up. Let's drop the fallbacks.

Reviewed-by: Rob Clark rob.cl...@linaro.org

> --
> Daniel Vetter
> Mail: dan...@ffwll.ch
> Mobile: +41 (0)79 365 57 48
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [PATCH 2/4] dma-fence: dma-buf synchronization (v8)
On Sat, Aug 11, 2012 at 2:22 PM, Daniel Vetter dan...@ffwll.ch wrote:
>>>> +
>>>> +/**
>>>> + * dma_fence_wait - wait for a fence to be signaled
>>>> + *
>>>> + * @fence:   [in]  The fence to wait on
>>>> + * @intr:    [in]  if true, do an interruptible wait
>>>> + * @timeout: [in]  absolute time for timeout, in jiffies.
>>>
>>> I don't quite like this, I think we should keep the style of all other
>>> wait_*_timeout functions and pass the arg as timeout in jiffies (and
>>> also the same return semantics). Otherwise we'll have funny code that
>>> needs to handle return values differently depending upon whether it
>>> waits upon a dma_fence or a native object (where it would use the
>>> wait_*_timeout functions directly).
>>
>> We did start out this way, but there was an ugly jiffies roll-over
>> problem that was difficult to deal with properly. Using an absolute
>> time avoided the problem.
>
> Well, as-is the api works differently than all the other _timeout apis
> I've seen in the kernel, which makes it confusing. Also, I don't quite
> see what jiffies wraparound issue there is?

iirc, the problem was in dmabufmgr, in dmabufmgr_wait_completed_cpu()..
with an absolute timeout, it could loop over all the fences without
having to adjust the timeout for the elapsed time. Otherwise it had to
adjust the timeout and keep track of when the timeout elapsed without
confusing itself via rollover.

BR,
-R
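For reference, the bookkeeping difference being debated can be shown with a toy 32-bit "jiffies" counter (`remaining()` and the types here are inventions of this sketch, not kernel code). With an absolute deadline, a loop over several fences just re-derives the time left before each wait; the signed subtraction keeps the comparison correct even when the counter wraps, which is the same trick the kernel's time_after() macros rely on.

```c
#include <assert.h>

typedef unsigned int u32;	/* toy stand-in for a 32-bit jiffies counter */

/*
 * Remaining ticks until an absolute deadline, or 0 if it has already
 * passed.  Casting the unsigned difference to a signed int makes the
 * comparison safe across counter wrap-around, as long as the deadline
 * is less than half the counter range away.
 */
static u32 remaining(u32 now, u32 deadline)
{
	int delta = (int)(deadline - now);	/* wrap-safe difference */

	return delta > 0 ? (u32)delta : 0;
}
```

With a relative timeout, each wait in the loop would instead have to subtract the elapsed time from the remaining budget and detect expiry itself, which is exactly the rollover bookkeeping the absolute-deadline interface avoided.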
Re: [PATCHv2 3/9] v4l: add buffer exporting via dmabuf
On Tue, Jul 31, 2012 at 7:11 AM, Hans Verkuil hverk...@xs4all.nl wrote:
>>>> For that matter, wouldn't it be useful to support exporting a userptr
>>>> buffer at some point in the future?
>>>
>>> Shouldn't USERPTR usage be discouraged once we get dma-buf support ?
>>
>> Why? It's perfectly fine to use it and it's not going away.
>
> I'm not saying that we should support exporting a userptr buffer as a
> dmabuf fd, but I'm just wondering if that is possible at all and how
> difficult it would be.

it seems not terribly safe, since you don't really have much control
over where the memory comes from w/ userptr. I'm more in favor of
discouraging usage of userptr

BR,
-R
Re: [PATCHv2 3/9] v4l: add buffer exporting via dmabuf
On Tue, Jul 31, 2012 at 8:39 AM, Rémi Denis-Courmont r...@remlab.net wrote:
> On Tuesday 31 July 2012 14:56:14, Laurent Pinchart wrote:
>>> For that matter, wouldn't it be useful to support exporting a userptr
>>> buffer at some point in the future?
>>
>> Shouldn't USERPTR usage be discouraged once we get dma-buf support ?
>
> USERPTR, where available, is currently the only way to perform zero-copy
> from kernel to userspace. READWRITE does not support zero-copy at all.
> MMAP only supports zero-copy if userspace knows a boundary on the number
> of concurrent buffers *and* the device can deal with that number of
> buffers; in general, MMAP requires memory copying.

hmm, this sounds like the problem is device pre-allocating buffers?
Anyways, last time I looked, the vb2 core supported changing dmabuf fd
each time you QBUF, in a similar way to what you can do w/ userptr. So
that seems to get you the advantages you miss w/ mmap without the
pitfalls of userptr.

> I am not sure DMABUF even supports transmitting data efficiently to
> userspace. In my understanding, it's meant for transmitting data between
> DSP's bypassing userspace entirely, in other words the exact opposite of
> what USERBUF does.

well, dmabuf's can be mmap'd.. so it is more a matter of where the
buffer gets allocated, malloc() or from some driver (v4l2 or other).
There are a *ton* of ways userspace allocated memory can go badly, esp.
if the hw has special requirements about memory (GFP_DMA32 in a system
w/ PAE/LPAE, certain ranges of memory, certain alignment of memory,
etc).

BR,
-R

> --
> Rémi Denis-Courmont
> http://www.remlab.net/
> http://fi.linkedin.com/in/remidenis
Re: [PATCH 2/2] dma-buf: add helpers for attacher dma-parms
Fyi, Daniel Vetter had suggested on IRC that it would be cleaner to have
a single helper fxn that returns the most-restrictive union of all
attached device's dma_parms. Really this should include dma_mask and
coherent_dma_mask, I think. But that touches a lot of other places in
the code. If no one objects to the cleanup of moving
dma_mask/coherent_dma_mask into dma_parms, I'll do this first.

So anyways, don't consider this patch yet for inclusion, I'll make an
updated one based on dma_parms..

BR,
-R

On Thu, Jul 19, 2012 at 11:23 AM, Rob Clark rob.cl...@linaro.org wrote:
From: Rob Clark r...@ti.com

Add some helpers to iterate through all attachers and get the most
restrictive segment size/count/boundary.

Signed-off-by: Rob Clark r...@ti.com
---
 drivers/base/dma-buf.c  |   63 ++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h |   19 ++++++++++
 2 files changed, 82 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 24e88fe..757ee20 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -192,6 +192,69 @@ void dma_buf_put(struct dma_buf *dmabuf)
 EXPORT_SYMBOL_GPL(dma_buf_put);

 /**
+ * dma_buf_max_seg_size - helper for exporters to get the minimum of
+ * all attached device's max segment size
+ */
+unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
+{
+	struct dma_buf_attachment *attach;
+	unsigned int max = (unsigned int)-1;
+
+	if (WARN_ON(!dmabuf))
+		return 0;
+
+	mutex_lock(&dmabuf->lock);
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		max = min(max, dma_get_max_seg_size(attach->dev));
+	mutex_unlock(&dmabuf->lock);
+
+	return max;
+}
+EXPORT_SYMBOL_GPL(dma_buf_max_seg_size);
+
+/**
+ * dma_buf_max_seg_count - helper for exporters to get the minimum of
+ * all attached device's max segment count
+ */
+unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
+{
+	struct dma_buf_attachment *attach;
+	unsigned int max = (unsigned int)-1;
+
+	if (WARN_ON(!dmabuf))
+		return 0;
+
+	mutex_lock(&dmabuf->lock);
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		max = min(max, dma_get_max_seg_count(attach->dev));
+	mutex_unlock(&dmabuf->lock);
+
+	return max;
+}
+EXPORT_SYMBOL_GPL(dma_buf_max_seg_count);
+
+/**
+ * dma_buf_get_seg_boundary - helper for exporters to get the most
+ * restrictive segment alignment of all the attached devices
+ */
+unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
+{
+	struct dma_buf_attachment *attach;
+	unsigned int mask = (unsigned int)-1;
+
+	if (WARN_ON(!dmabuf))
+		return 0;
+
+	mutex_lock(&dmabuf->lock);
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		mask &= dma_get_seg_boundary(attach->dev);
+	mutex_unlock(&dmabuf->lock);
+
+	return mask;
+}
+EXPORT_SYMBOL_GPL(dma_buf_get_seg_boundary);
+
+/**
  * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
  * calls attach() of dma_buf_ops to allow device-specific attach functionality
  * @dmabuf:	[in]	buffer to attach device to.

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index eb48f38..9533b9b 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -167,6 +167,10 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags);
 struct dma_buf *dma_buf_get(int fd);
 void dma_buf_put(struct dma_buf *dmabuf);

+unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf);
+unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf);
+unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf);
+
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
@@ -220,6 +224,21 @@ static inline void dma_buf_put(struct dma_buf *dmabuf)
 	return;
 }

+static inline unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
+{
+	return 0;
+}
+
+static inline unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
+{
+	return 0;
+}
+
+static inline unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
+{
+	return 0;
+}
+
 static inline struct sg_table *dma_buf_map_attachment(
 	struct
 dma_buf_attachment *attach,
	enum dma_data_direction write)
 {
--
1.7.9.5
[PATCH 0/2] dma-parms and helpers for dma-buf
From: Rob Clark r...@ti.com

Re-sending first patch, with a wider audience. Apparently I didn't spam
enough inboxes the first time.

And, at Daniel Vetter's suggestion, adding some helper functions in
dma-buf to get the most restrictive parameters of all the attached
devices.

Rob Clark (2):
  device: add dma_params->max_segment_count
  dma-buf: add helpers for attacher dma-parms

 drivers/base/dma-buf.c      |   63 ++++++++++++++++++++++++++++++++
 include/linux/device.h      |    1 +
 include/linux/dma-buf.h     |   19 +++++++++
 include/linux/dma-mapping.h |   16 ++++++++
 4 files changed, 99 insertions(+)

--
1.7.9.5
[PATCH 2/2] dma-buf: add helpers for attacher dma-parms
From: Rob Clark r...@ti.com

Add some helpers to iterate through all attachers and get the most
restrictive segment size/count/boundary.

Signed-off-by: Rob Clark r...@ti.com
---
 drivers/base/dma-buf.c  |   63 ++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h |   19 ++++++++++
 2 files changed, 82 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 24e88fe..757ee20 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -192,6 +192,69 @@ void dma_buf_put(struct dma_buf *dmabuf)
 EXPORT_SYMBOL_GPL(dma_buf_put);

 /**
+ * dma_buf_max_seg_size - helper for exporters to get the minimum of
+ * all attached device's max segment size
+ */
+unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
+{
+	struct dma_buf_attachment *attach;
+	unsigned int max = (unsigned int)-1;
+
+	if (WARN_ON(!dmabuf))
+		return 0;
+
+	mutex_lock(&dmabuf->lock);
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		max = min(max, dma_get_max_seg_size(attach->dev));
+	mutex_unlock(&dmabuf->lock);
+
+	return max;
+}
+EXPORT_SYMBOL_GPL(dma_buf_max_seg_size);
+
+/**
+ * dma_buf_max_seg_count - helper for exporters to get the minimum of
+ * all attached device's max segment count
+ */
+unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
+{
+	struct dma_buf_attachment *attach;
+	unsigned int max = (unsigned int)-1;
+
+	if (WARN_ON(!dmabuf))
+		return 0;
+
+	mutex_lock(&dmabuf->lock);
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		max = min(max, dma_get_max_seg_count(attach->dev));
+	mutex_unlock(&dmabuf->lock);
+
+	return max;
+}
+EXPORT_SYMBOL_GPL(dma_buf_max_seg_count);
+
+/**
+ * dma_buf_get_seg_boundary - helper for exporters to get the most
+ * restrictive segment alignment of all the attached devices
+ */
+unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
+{
+	struct dma_buf_attachment *attach;
+	unsigned int mask = (unsigned int)-1;
+
+	if (WARN_ON(!dmabuf))
+		return 0;
+
+	mutex_lock(&dmabuf->lock);
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		mask &=
 dma_get_seg_boundary(attach->dev);
+	mutex_unlock(&dmabuf->lock);
+
+	return mask;
+}
+EXPORT_SYMBOL_GPL(dma_buf_get_seg_boundary);
+
+/**
  * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
  * calls attach() of dma_buf_ops to allow device-specific attach functionality
  * @dmabuf:	[in]	buffer to attach device to.

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index eb48f38..9533b9b 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -167,6 +167,10 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags);
 struct dma_buf *dma_buf_get(int fd);
 void dma_buf_put(struct dma_buf *dmabuf);

+unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf);
+unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf);
+unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf);
+
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
@@ -220,6 +224,21 @@ static inline void dma_buf_put(struct dma_buf *dmabuf)
 	return;
 }

+static inline unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
+{
+	return 0;
+}
+
+static inline unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
+{
+	return 0;
+}
+
+static inline unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
+{
+	return 0;
+}
+
 static inline struct sg_table *dma_buf_map_attachment(
 	struct dma_buf_attachment *attach, enum dma_data_direction write)
 {
--
1.7.9.5
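As a stand-alone illustration of the most-restrictive union these helpers compute (and that the cover note suggests collapsing into a single function), here is a user-space sketch. `struct toy_dma_parms`, `most_restrictive()`, and the handling of a zero max_segment_count as "unlimited" are assumptions of this example, not part of the posted patch: take the smallest segment size, the smallest concrete segment count, and AND together the boundary masks.

```c
#include <assert.h>

typedef unsigned int u32;

/* Invented per-device constraint record for this sketch. */
struct toy_dma_parms {
	u32 max_segment_size;
	u32 max_segment_count;	/* 0 = unlimited, per this sketch */
	u32 segment_boundary_mask;
};

/* Reduce an array of per-device constraints to the tightest combination. */
struct toy_dma_parms most_restrictive(const struct toy_dma_parms *devs, int n)
{
	struct toy_dma_parms r = {
		.max_segment_size = (u32)-1,
		.max_segment_count = 0,		/* start "unlimited" */
		.segment_boundary_mask = (u32)-1,
	};
	int i;

	for (i = 0; i < n; i++) {
		if (devs[i].max_segment_size < r.max_segment_size)
			r.max_segment_size = devs[i].max_segment_size;
		/* 0 means unlimited, so only concrete counts participate */
		if (devs[i].max_segment_count &&
		    (!r.max_segment_count ||
		     devs[i].max_segment_count < r.max_segment_count))
			r.max_segment_count = devs[i].max_segment_count;
		r.segment_boundary_mask &= devs[i].segment_boundary_mask;
	}
	return r;
}
```

A contiguous-only device (count 1) attached alongside an unconstrained one forces the exporter down to a single contiguous segment, which is exactly the situation the max_segment_count patch in this series is meant to expose.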
[PATCH 1/2] device: add dma_params->max_segment_count
From: Rob Clark r...@ti.com

For devices which have constraints about maximum number of segments in
an sglist. For example, a device which could only deal with contiguous
buffers would set max_segment_count to 1.

The initial motivation is for devices sharing buffers via dma-buf, to
allow the buffer exporter to know the constraints of other devices
which have attached to the buffer. The dma_mask and fields in 'struct
device_dma_parameters' tell the exporter everything else that is
needed, except whether the importer has constraints about maximum
number of segments.

Signed-off-by: Rob Clark r...@ti.com
---
 include/linux/device.h      |    1 +
 include/linux/dma-mapping.h |   16 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 161d962..3813735 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -568,6 +568,7 @@ struct device_dma_parameters {
 	 * sg limitations.
 	 */
 	unsigned int max_segment_size;
+	unsigned int max_segment_count;		/* zero for unlimited */
 	unsigned long segment_boundary_mask;
 };

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dfc099e..f380f79 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -111,6 +111,22 @@ static inline unsigned int dma_set_max_seg_size(struct device *dev,
 		return -EIO;
 }

+static inline unsigned int dma_get_max_seg_count(struct device *dev)
+{
+	return dev->dma_parms ? dev->dma_parms->max_segment_count : 0;
+}
+
+static inline int dma_set_max_seg_count(struct device *dev,
+					unsigned int count)
+{
+	if (dev->dma_parms) {
+		dev->dma_parms->max_segment_count = count;
+		return 0;
+	} else
+		return -EIO;
+}
+
+
 static inline unsigned long dma_get_seg_boundary(struct device *dev)
 {
 	return dev->dma_parms ?
--
1.7.9.5
Re: [PATCH] Documentation: DocBook DRM framework documentation
On Thu, Jul 12, 2012 at 7:00 PM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
> Signed-off-by: Laurent Pinchart laurent.pinch...@ideasonboard.com
> ---
>  Documentation/DocBook/drm.tmpl | 2835 +++++++++++++++++++++-------
>  1 files changed, 2226 insertions(+), 609 deletions(-)
>
> Hi everybody,
>
> Here's the DRM kernel framework documentation previously posted to the
> dri-devel mailing list. The documentation has been reworked, converted
> to DocBook and merged with the existing DocBook DRM documentation stub.
> The result doesn't cover the whole DRM API but should hopefully be good
> enough for a start.
>
> I've done my best to follow a natural flow starting at initialization
> and covering the major DRM internal topics. As I'm not a native English
> speaker I'm not totally happy with the result, so if anyone wants to
> edit the text please feel free to do so.
>
> Review will as usual be appreciated, and acks will be even more welcome
> (I've been working on this document for longer than I feel comfortable
> with).

btw, thanks for this! One minor typo below.. with that,

Reviewed-by: Rob Clark rob.cl...@linaro.org

> diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
> index 196b8b9..44a2c66 100644

[snip]

> +    <sect2>
> +      <title>Output Polling</title>
> +      <synopsis>void (*output_poll_changed)(struct drm_device *dev);</synopsis>
> +      <para>
> +        This operation notifies the driver that the status of one or more
> +        connectors has changed. Drivers that use the fbdev helper can just call

s/fbdev/fb/

> +        the <function>drm_fb_helper_hotplug_event</function> function to
> +        handle this operation.

BR,
-R