Re: [PATCH] reservation: don't wait when timeout=0

2017-11-21 Thread Rob Clark
On Tue, Nov 21, 2017 at 9:38 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> Quoting Rob Clark (2017-11-21 14:08:46)
>> If we are testing if a reservation object's fences have been
>> signaled with timeout=0 (non-blocking), we need to pass 0 for
>> timeout to dma_fence_wait_timeout().
>>
>> Plus bonus spelling correction.
>>
>> Signed-off-by: Rob Clark <robdcl...@gmail.com>
>> ---
>>  drivers/dma-buf/reservation.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index dec3a815455d..71f51140a9ad 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -420,7 +420,7 @@ EXPORT_SYMBOL_GPL(reservation_object_get_fences_rcu);
>>   *
>>   * RETURNS
>>   * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
>> - * greater than zer on success.
>> + * greater than zero on success.
>>   */
>>  long reservation_object_wait_timeout_rcu(struct reservation_object *obj,
>>  bool wait_all, bool intr,
>> @@ -483,7 +483,14 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj,
>> goto retry;
>> }
>>
>> -   ret = dma_fence_wait_timeout(fence, intr, ret);
>> +   /*
>> +* Note that dma_fence_wait_timeout() will return 1 if
>> +* the fence is already signaled, so in the wait_all
>> +* case when we go through the retry loop again, ret
>> +* will be greater than 0 and we don't want this to
>> +* cause _wait_timeout() to block
>> +*/
>> +   ret = dma_fence_wait_timeout(fence, intr, timeout ? ret : 0);
>
> One should ask if we should just fix the interface to stop returning
> incorrect results (stop "correcting" a completion with 0 jiffies remaining
> as 1). A timeout can be distinguished by -ETIME (or your pick of errno).

perhaps -EBUSY, if we go that route (although maybe that should be a
follow-on patch; this one is suitable for backporting to stable/lts if
one should so choose..)

I think the current approach was chosen to match schedule_timeout() and
other such functions that take a timeout in jiffies.  Not making a
judgement on whether that is a good or bad reason..

BR,
-R

> -Chris


[PATCH] reservation: don't wait when timeout=0

2017-11-21 Thread Rob Clark
If we are testing if a reservation object's fences have been
signaled with timeout=0 (non-blocking), we need to pass 0 for
timeout to dma_fence_wait_timeout().

Plus bonus spelling correction.

Signed-off-by: Rob Clark <robdcl...@gmail.com>
---
 drivers/dma-buf/reservation.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..71f51140a9ad 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -420,7 +420,7 @@ EXPORT_SYMBOL_GPL(reservation_object_get_fences_rcu);
  *
  * RETURNS
  * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
- * greater than zer on success.
+ * greater than zero on success.
  */
 long reservation_object_wait_timeout_rcu(struct reservation_object *obj,
 bool wait_all, bool intr,
@@ -483,7 +483,14 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj,
goto retry;
}
 
-   ret = dma_fence_wait_timeout(fence, intr, ret);
+   /*
+* Note that dma_fence_wait_timeout() will return 1 if
+* the fence is already signaled, so in the wait_all
+* case when we go through the retry loop again, ret
+* will be greater than 0 and we don't want this to
+* cause _wait_timeout() to block
+*/
+   ret = dma_fence_wait_timeout(fence, intr, timeout ? ret : 0);
dma_fence_put(fence);
if (ret > 0 && wait_all && (i + 1 < shared_count))
goto retry;
-- 
2.13.6



Re: DRM Format Modifiers in v4l2

2017-09-01 Thread Rob Clark
On Fri, Sep 1, 2017 at 3:13 AM, Laurent Pinchart wrote:
> Hi Nicolas,
>
> On Thursday, 31 August 2017 19:12:58 EEST Nicolas Dufresne wrote:
>> Le jeudi 31 août 2017 à 17:28 +0300, Laurent Pinchart a écrit :
>> >> e.g. if I have two devices which support MODIFIER_FOO, I could attempt
>> >> to share a buffer between them which uses MODIFIER_FOO without
>> >> necessarily knowing exactly what it is/does.
>> >
>> > Userspace could certainly set modifiers blindly, but the point of
>> > modifiers is to generate side effects beneficial to the use case at hand
>> > (for instance by optimizing the memory access pattern). To use them
>> > meaningfully userspace would need to have at least an idea of the side
>> > effects they generate.
>>
>> Generic userspace will basically pick some random combination.
>
> In that case userspace could set no modifier at all by default (except in the
> case where unmodified formats are not supported by the hardware, but I don't
> expect that to be the most common case).
>
>> To allow generically picking the optimal configuration we could indeed rely
>> on the application knowledge, but we could also enhance the spec so that
>> the order in the enumeration becomes meaningful.
>
> I'm not sure how far we should go. I could imagine a system where the API
> would report capabilities for modifiers (e.g. this modifier lowers the
> bandwidth, this one enhances the quality, ...), but going in that direction,
> where do we stop ? In practice I expect userspace to know some information
> about the hardware, so I'd rather avoid over-engineering the API.
>

I think in the (hopefully not too) long term, something like
https://github.com/cubanismo/allocator/ is the way forward.  That
doesn't quite solve how the v4l2 kernel part sorts out, with the
corresponding userspace .so, what is preferable, but at least that is
compartmentalized to v4l2.. on the gl/vk side of things there will ofc
be a hardware-specific userspace part that knows what it prefers.  For
v4l2, it probably makes sense to sort out what the userspace-level API
is and work backwards from there, rather than risk trying to design a
kernel uapi that might turn out to be the wrong thing.

BR,
-R


Re: [PATCH] media: venus: hfi: fix error handling in hfi_sys_init_done()

2017-07-09 Thread Rob Clark
On Sun, Jul 9, 2017 at 3:49 PM, Stanimir Varbanov
<stanimir.varba...@linaro.org> wrote:
> Hi Rob,
>
> On 07/09/2017 04:19 PM, Rob Clark wrote:
>> Not entirely sure what triggers it, but with venus built as a kernel
>> module and in the initrd, we hit this crash:
>
> Does it happen occasionally or every time in the initrd? And also, with
> your patch it will bail out in venus_probe; does it crash again on the
> next venus_probe?

seems to happen every time.. (module is ending up in initrd, but not
sure if fw is.. which might be triggering this?)

I could not boot successfully without this patch, but otoh I haven't
yet tried if venus actually works after boot.

BR,
-R

>>
>>   Unable to handle kernel paging request at virtual address 80003c039000
>>   pgd = 0a14f000
>>   [80003c039000] *pgd=bd9f7003, *pud=bd9f6003, 
>> *pmd=bd9f0003, *pte=
>>   Internal error: Oops: 9607 [#1] SMP
>>   Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) 
>> venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) 
>> cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) 
>> snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) 
>> snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) 
>> videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) snd_pcm(E) 
>> videodev(E) media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) 
>> spi_qup(E) mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) 
>> msm_rng(E) uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) 
>> adv7511(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) 
>> fb_sys_fops(E) qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) 
>> pinctrl_spmi_mpp(E)
>>pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) 
>> sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) 
>> regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) 
>> phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) 
>> i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) 
>> rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E)
>>   CPU: 2 PID: 551 Comm: irq/150-venus Tainted: PE   4.12.0+ #1625
>>   Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 
>> 2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017
>>   task: 800037338000 task.stack: 800038e0
>>   PC is at hfi_sys_init_done+0x64/0x140 [venus_core]
>>   LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>>   pc : [] lr : [] pstate: 20400145
>>   sp : 800038e03c60
>>   x29: 800038e03c60 x28: 
>>   x27: 000df018 x26: 0118f4d0
>>   x25: 00020003 x24: 80003a8d3010
>>   x23: 0118f760 x22: 800037b40028
>>   x21: 8000382981f0 x20: 800037b40028
>>   x19: 80003c039000 x18: 0020
>>   x17:  x16: 800037338000
>>   x15:  x14: 00100014
>>   x13: 00011007 x12: 00010020
>>   x11: 100e x10: 0001
>>   x9 : 0002 x8 : 00140001
>>   x7 : 1010 x6 : 0148
>>   x5 : 1009 x4 : 80003c039000
>>   x3 : cd770abb x2 : 0042
>>   x1 : 0788 x0 : 0002
>>   Process irq/150-venus (pid: 551, stack limit = 0x800038e0)
>>   Call trace:
>>   [] hfi_sys_init_done+0x64/0x140 [venus_core]
>>   [] hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>>   [] venus_isr_thread+0x1b4/0x208 [venus_core]
>>   [] hfi_isr_thread+0x28/0x38 [venus_core]
>>   [] irq_thread_fn+0x30/0x70
>>   [] irq_thread+0x14c/0x1c8
>>   [] kthread+0x138/0x140
>>   [] ret_from_fork+0x10/0x40
>>   Code: 52820125 52820207 7a431820 54000249 (b9400263)
>>   ---[ end trace c963460f20a984b6 ]---
>>
>> The problem is that in the error case, we've incremented the data ptr
>> but not decremented rem_bytes, and keep reading (presumably garbage)
>> until eventually we go beyond the end of the buffer.
>>
>> Instead, on first error, we should probably just bail out.  Other
>> option is to increment read_bytes by sizeof(u32) before the switch,
>> rather than only accounting for the ptype header in the non-error
>> case.  Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER,
>> ie. an unrecognized/unsupported parameter, so interpreting the next
>> word as a property type would be bogus.  The other erro

[PATCH] media: venus: hfi: fix error handling in hfi_sys_init_done()

2017-07-09 Thread Rob Clark
Not entirely sure what triggers it, but with venus built as a kernel
module and in the initrd, we hit this crash:

  Unable to handle kernel paging request at virtual address 80003c039000
  pgd = 0a14f000
  [80003c039000] *pgd=bd9f7003, *pud=bd9f6003, 
*pmd=bd9f0003, *pte=
  Internal error: Oops: 9607 [#1] SMP
  Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) 
venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) 
cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) 
snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) 
snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) 
videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) snd_pcm(E) videodev(E) 
media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) spi_qup(E) 
mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) msm_rng(E) 
uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) adv7511(E) 
drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) 
qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) pinctrl_spmi_mpp(E)
   pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) 
sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) 
regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) 
phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) 
i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) 
rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E)
  CPU: 2 PID: 551 Comm: irq/150-venus Tainted: PE   4.12.0+ #1625
  Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 
2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017
  task: 800037338000 task.stack: 800038e0
  PC is at hfi_sys_init_done+0x64/0x140 [venus_core]
  LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
  pc : [] lr : [] pstate: 20400145
  sp : 800038e03c60
  x29: 800038e03c60 x28: 
  x27: 000df018 x26: 0118f4d0
  x25: 00020003 x24: 80003a8d3010
  x23: 0118f760 x22: 800037b40028
  x21: 8000382981f0 x20: 800037b40028
  x19: 80003c039000 x18: 0020
  x17:  x16: 800037338000
  x15:  x14: 00100014
  x13: 00011007 x12: 00010020
  x11: 100e x10: 0001
  x9 : 0002 x8 : 00140001
  x7 : 1010 x6 : 0148
  x5 : 1009 x4 : 80003c039000
  x3 : cd770abb x2 : 0042
  x1 : 0788 x0 : 0002
  Process irq/150-venus (pid: 551, stack limit = 0x800038e0)
  Call trace:
  [] hfi_sys_init_done+0x64/0x140 [venus_core]
  [] hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
  [] venus_isr_thread+0x1b4/0x208 [venus_core]
  [] hfi_isr_thread+0x28/0x38 [venus_core]
  [] irq_thread_fn+0x30/0x70
  [] irq_thread+0x14c/0x1c8
  [] kthread+0x138/0x140
  [] ret_from_fork+0x10/0x40
  Code: 52820125 52820207 7a431820 54000249 (b9400263)
  ---[ end trace c963460f20a984b6 ]---

The problem is that in the error case, we've incremented the data ptr
but not decremented rem_bytes, and keep reading (presumably garbage)
until eventually we go beyond the end of the buffer.

Instead, on first error, we should probably just bail out.  Other
option is to increment read_bytes by sizeof(u32) before the switch,
rather than only accounting for the ptype header in the non-error
case.  Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER,
ie. an unrecognized/unsupported parameter, so interpreting the next
word as a property type would be bogus.  The other error cases are
due to truncated buffer, so there isn't likely to be anything valid
to interpret in the remainder of the buffer.  So just bailing seems
like a reasonable solution.

Signed-off-by: Rob Clark <robdcl...@gmail.com>
---
 drivers/media/platform/qcom/venus/hfi_msgs.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/hfi_msgs.c b/drivers/media/platform/qcom/venus/hfi_msgs.c
index debf80a92797..4190825b20a1 100644
--- a/drivers/media/platform/qcom/venus/hfi_msgs.c
+++ b/drivers/media/platform/qcom/venus/hfi_msgs.c
@@ -239,11 +239,12 @@ static void hfi_sys_init_done(struct venus_core *core, struct venus_inst *inst,
break;
}
 
-   if (!error) {
-   rem_bytes -= read_bytes;
-   data += read_bytes;
-   num_properties--;
-   }
+   if (error)
+   break;
+
+   rem_bytes -= read_bytes;
+   data += read_bytes;
+   num_properties--;
}
 
 err_no_prop:
-- 
2.13.0



Re: [RFC PATCH 00/12] Ion cleanup in preparation for moving out of staging

2017-03-13 Thread Rob Clark
On Mon, Mar 13, 2017 at 5:09 PM, Laura Abbott  wrote:
>> Hm, we might want to expose all the heaps as individual
>> /dev/ion_$heapname nodes? Should we do this from the start, since
>> we're massively revamping the uapi anyway (imo not needed, current
>> state seems to work too)?
>> -Daniel
>>
>
> I thought about that. One advantage with separate /dev/ion_$heap
> is that we don't have to worry about a limit of 32 possible
> heaps per system (32-bit heap id allocation field). But dealing
> with an ioctl seems easier than names. Userspace might be less
> likely to hardcode random id numbers vs. names as well.


other advantage, I think, is selinux (brought up elsewhere in this
thread).. heaps at known fixed PAs are useful for certain sorts of
attacks, so being able to restrict access more easily seems like a
good thing

BR,
-R


Re: [RFC PATCH 00/12] Ion cleanup in preparation for moving out of staging

2017-03-10 Thread Rob Clark
On Fri, Mar 10, 2017 at 7:40 AM, Daniel Vetter  wrote:
> On Fri, Mar 10, 2017 at 10:31:13AM +, Brian Starkey wrote:
>> Hi,
>>
>> On Thu, Mar 09, 2017 at 09:38:49AM -0800, Laura Abbott wrote:
>> > On 03/09/2017 02:00 AM, Benjamin Gaignard wrote:
>>
>> [snip]
>>
>> > >
>> > > For me those patches are going in the right direction.
>> > >
>> > > I still have few questions:
>> > > - since alignment management has been remove from ion-core, should it
>> > > be also removed from ioctl structure ?
>> >
>> > Yes, I think I'm going to go with the suggestion to fixup the ABI
>> > so we don't need the compat layer and as part of that I'm also
>> > dropping the align argument.
>> >
>>
>> Is the only motivation for removing the alignment parameter that
>> no-one got around to using it for something useful yet?
>> The original comment was true - different devices do have different
>> alignment requirements.
>>
>> Better alignment can help SMMUs use larger blocks when mapping,
>> reducing TLB pressure and the chance of a page table walk causing
>> display underruns.
>
> Extending ioctl uapi is easy, trying to get rid of bad uapi is much
> harder. Given that right now we don't have an ion allocator that does
> alignment I think removing it makes sense. And if we go with lots of
> heaps, we might as well have an ion heap per alignment that your hw needs,
> so there's different ways to implement this in the future.

slight correction:  if you plan ahead (and do things like zero-init when
userspace passes in a smaller ioctl struct, like drm_ioctl does),
extending ioctl uapi is easy.. might be something worth fixing from
the get-go..

BR,
-R

> At least from the unix device memory allocator pov it's probably simpler
> to encode stuff like this into the heap name, instead of having to pass
> heap + list of additional properties/constraints.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC 0/6] Module for tracking/accounting shared memory buffers

2016-10-11 Thread Rob Clark
On Tue, Oct 11, 2016 at 7:50 PM, Ruchi Kandoi  wrote:
> This patchstack introduces a new "memtrack" module for tracking and accounting
> memory exported to userspace as shared buffers, like dma-buf fds or GEM 
> handles.

btw, I wouldn't care much about the non-dmabuf case.. dri2/flink is
kind of legacy and the sharing patterns there are not so complex that
we have found the need for any more elaborate debug infrastructure
than what we already have.

Between existing dmabuf debugfs, and /proc/*/maps (and /proc/*/fd?), I
wonder what is missing?  Maybe there is a less intrusive way to get at
the debugging info you want?

BR,
-R

> Any process holding a reference to these buffers will keep the kernel from
> reclaiming its backing pages.  mm counters don't provide a complete picture of
> these allocations, since they only account for pages that are mapped into a
> process's address space.  This problem is especially bad for systems like
> Android that use dma-buf fds to share graphics and multimedia buffers between
> processes: these allocations are often large, have complex sharing patterns,
> and are rarely mapped into every process that holds a reference to them.
>
> memtrack maintains a per-process list of shared buffer references, which is
> exported to userspace as /proc/[pid]/memtrack.  Buffers can be optionally
> "tagged" with a short string: for example, Android userspace would use this
> tag to identify whether buffers were allocated on behalf of the camera stack,
> GL, etc.  memtrack also exports the VMAs associated with these buffers so
> that pages already included in the process's mm counters aren't 
> double-counted.
>
> Shared-buffer allocators can hook into memtrack by embedding
> struct memtrack_buffer in their buffer metadata, calling
> memtrack_buffer_{init,remove} at buffer allocation and free time, and
> memtrack_buffer_{install,uninstall} when a userspace process takes or
> drops a reference to the buffer.  For fd-backed buffers like dma-bufs, hooks 
> in
> fdtable.c and fork.c automatically notify memtrack when references are added 
> or
> removed from a process's fd table.
>
> This patchstack adds memtrack hooks into dma-buf and ion.  If there's upstream
> interest in memtrack, it can be extended to other memory allocators as well,
> such as GEM implementations.
>
> Greg Hackmann (1):
>   drivers: staging: ion: add ION_IOC_TAG ioctl
>
> Ruchi Kandoi (5):
>   fs: add installed and uninstalled file_operations
>   drivers: misc: add memtrack
>   dma-buf: add memtrack support
>   memtrack: Adds the accounting to keep track of all mmaped/unmapped
> pages.
>   memtrack: Add memtrack accounting for forked processes.
>
>  drivers/android/binder.c|   4 +-
>  drivers/dma-buf/dma-buf.c   |  37 +++
>  drivers/misc/Kconfig|  16 +
>  drivers/misc/Makefile   |   1 +
>  drivers/misc/memtrack.c | 516 
> 
>  drivers/staging/android/ion/ion-ioctl.c |  17 ++
>  drivers/staging/android/ion/ion.c   |  60 +++-
>  drivers/staging/android/ion/ion_priv.h  |   2 +
>  drivers/staging/android/uapi/ion.h  |  25 ++
>  fs/file.c   |  38 ++-
>  fs/open.c   |   2 +-
>  fs/proc/base.c  |   4 +
>  include/linux/dma-buf.h |   5 +
>  include/linux/fdtable.h |   4 +-
>  include/linux/fs.h  |   2 +
>  include/linux/memtrack.h| 130 
>  include/linux/mm.h  |   3 +
>  include/linux/sched.h   |   3 +
>  kernel/fork.c   |  23 +-
>  19 files changed, 875 insertions(+), 17 deletions(-)
>  create mode 100644 drivers/misc/memtrack.c
>  create mode 100644 include/linux/memtrack.h
>
> --
> 2.8.0.rc3.226.g39d4020
>


Re: [PATCH v10 0/3] Secure Memory Allocation Framework

2016-10-07 Thread Rob Clark
probably should keep the discussion on github (USAGE.md was updated a
bit more and merged into https://github.com/cubanismo/allocator so
look there for the latest)..

but briefly:

1) my expectation is if the user is implementing some use-case, it
knows what devices and APIs are involved, otherwise it wouldn't be
able to pass a buffer to that device/API..

2) if assertion/usage/devices haven't changed, you can re-use the
merged caps for however many buffer allocations

BR,
-R

On Fri, Oct 7, 2016 at 9:54 AM, Benjamin Gaignard
<benjamin.gaign...@linaro.org> wrote:
> Rob,
>
> How do you know which devices are concerned when listing the constraints?
> Is combine_capabilities done for each allocation, or can it be cached?
>
> Regards,
> Benjamin
>
> 2016-10-06 18:54 GMT+02:00 Rob Clark <robdcl...@gmail.com>:
>> so there is discussion about a "central userspace allocator" (ie. more
>> like a common userspace API that could be implemented on top of
>> various devices/APIs) to decide in a generic way which device could
>> allocate.
>>
>>   https://github.com/cubanismo/allocator
>>
>> and I wrote up some rough thoughts/proposal about how the usage might
>> look.. just rough, so don't try to compile it or anything, and not
>> consensus yet so it will probably change/evolve..
>>
>>   https://github.com/robclark/allocator/blob/master/USAGE.md
>>
>> I think ion could be just another device to share buffers with, which
>> happens to not impose any specific constraints.  How "liballoc-ion.so"
>> backend figures out how to map constraints/usage to a heap is a bit
>> hand-wavey at the moment.
>>
>> BR,
>> -R
>>
>> On Wed, Oct 5, 2016 at 9:40 AM, Benjamin Gaignard
>> <benjamin.gaign...@linaro.org> wrote:
>>> because with ion it is up to userland to decide which heap to use
>>> and until now userland doesn't have any way to get device constraints...
>>>
>>> I would prefer to let a central allocator (in kernel) decide, from the
>>> attached devices, which allocator is best. That is what I have
>>> implemented in smaf.
>>>
>>> Benjamin
>>>
>>>
>>> 2016-10-05 15:19 GMT+02:00 Daniel Vetter <dan...@ffwll.ch>:
>>>> On Tue, Oct 04, 2016 at 01:47:21PM +0200, Benjamin Gaignard wrote:
>>>>> version 10 changes:
>>>>>  - rebased on kernel 4.8 tag
>>>>>  - minor typo fix
>>>>>
>>>>> version 9 changes:
>>>>>  - rebased on 4.8-rc5
>>>>>  - struct dma_attrs doesn't exist anymore so update CMA allocator
>>>>>to compile with new dma_*_attr functions
>>>>>  - add example SMAF use case in cover letter
>>>>>
>>>>> version 8 changes:
>>>>>  - rework of the structures used within ioctl
>>>>>by adding a version field and padding to be future-proof
>>>>>  - rename fake secure module to test secure module
>>>>>  - fix the various remarks made on the previous patchset
>>>>>
>>>>> version 7 changes:
>>>>>  - rebased on kernel 4.6-rc7
>>>>>  - simplify secure module API
>>>>>  - add vma ops to be able to detect mmap/munmap calls
>>>>>  - add ioctl to get number and allocator names
>>>>>  - update libsmaf with adding tests
>>>>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git
>>>>>  - add debug log in fake secure module
>>>>>
>>>>> version 6 changes:
>>>>>  - rebased on kernel 4.5-rc4
>>>>>  - fix mmapping bug when the requested allocation size isn't a multiple of
>>>>>PAGE_SIZE (add a test for this in libsmaf)
>>>>>
>>>>> version 5 changes:
>>>>>  - rebased on kernel 4.3-rc6
>>>>>  - rework locking scheme and make handle status use an atomic_t
>>>>>  - add a fake secure module to allow performing tests without trusted
>>>>>environment
>>>>>
>>>>> version 4 changes:
>>>>>  - rebased on kernel 4.3-rc3
>>>>>  - fix missing EXPORT_SYMBOL for smaf_create_handle()
>>>>>
>>>>> version 3 changes:
>>>>>  - Remove the ioctl for allocator selection; instead provide the name of
>>>>>the targeted allocator with the allocation request.
>>>>>Selecting the allocator from userland isn't the preferred way of working
>>>>>but is needed when the first user of th

Re: [PATCH v10 0/3] Secure Memory Allocation Framework

2016-10-06 Thread Rob Clark
so there is discussion about a "central userspace allocator" (ie. more
like a common userspace API that could be implemented on top of
various devices/APIs) to decide in a generic way which device could
allocate.

  https://github.com/cubanismo/allocator

and I wrote up some rough thoughts/proposal about how the usage might
look.. just rough, so don't try to compile it or anything, and not
consensus yet so it will probably change/evolve..

  https://github.com/robclark/allocator/blob/master/USAGE.md

I think ion could be just another device to share buffers with, which
happens to not impose any specific constraints.  How "liballoc-ion.so"
backend figures out how to map constraints/usage to a heap is a bit
hand-wavey at the moment.

BR,
-R

On Wed, Oct 5, 2016 at 9:40 AM, Benjamin Gaignard wrote:
> because with ion it is up to userland to decide which heap to use
> and until now userland doesn't have any way to get device constraints...
>
> I would prefer to let a central allocator (in kernel) decide, from the
> attached devices, which allocator is best. That is what I have
> implemented in smaf.
>
> Benjamin
>
>
> 2016-10-05 15:19 GMT+02:00 Daniel Vetter :
>> On Tue, Oct 04, 2016 at 01:47:21PM +0200, Benjamin Gaignard wrote:
>>> version 10 changes:
>>>  - rebased on kernel 4.8 tag
>>>  - minor typo fix
>>>
>>> version 9 changes:
>>>  - rebased on 4.8-rc5
>>>  - struct dma_attrs doesn't exist anymore so update CMA allocator
>>>to compile with new dma_*_attr functions
>>>  - add example SMAF use case in cover letter
>>>
>>> version 8 changes:
>>>  - rework of the structures used within ioctl
>>>by adding a version field and padding to be future-proof
>>>  - rename fake secure module to test secure module
>>>  - fix the various remarks made on the previous patchset
>>>
>>> version 7 changes:
>>>  - rebased on kernel 4.6-rc7
>>>  - simplify secure module API
>>>  - add vma ops to be able to detect mmap/munmap calls
>>>  - add ioctl to get number and allocator names
>>>  - update libsmaf with adding tests
>>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git
>>>  - add debug log in fake secure module
>>>
>>> version 6 changes:
>>>  - rebased on kernel 4.5-rc4
>>>  - fix mmapping bug when the requested allocation size isn't a multiple of
>>>PAGE_SIZE (add a test for this in libsmaf)
>>>
>>> version 5 changes:
>>>  - rebased on kernel 4.3-rc6
>>>  - rework locking scheme and make handle status use an atomic_t
>>>  - add a fake secure module to allow performing tests without trusted
>>>environment
>>>
>>> version 4 changes:
>>>  - rebased on kernel 4.3-rc3
>>>  - fix missing EXPORT_SYMBOL for smaf_create_handle()
>>>
>>> version 3 changes:
>>>  - Remove the ioctl for allocator selection; instead provide the name of
>>>the targeted allocator with the allocation request.
>>>Selecting the allocator from userland isn't the preferred way of working
>>>but is needed when the first user of the buffer is a software component.
>>>  - Fix issues in case of error while creating smaf handle.
>>>  - Fix module license.
>>>  - Update libsmaf and tests to care of the SMAF API evolution
>>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git
>>>
>>> version 2 changes:
>>>  - Add one ioctl to allow allocator selection from userspace.
>>>This is required for the use cases where the first user of
>>>the buffer is a software IP which can't perform dma_buf attachement.
>>>  - Add name and ranking to allocator structure to be able to sort them.
>>>  - Create a tiny library to test SMAF:
>>>https://git.linaro.org/people/benjamin.gaignard/libsmaf.git
>>>  - Fix one issue when trying to secure a buffer without a secure module registered
>>>
>>> SMAF aims to solve two problems: allocating memory that fits with hardware
>>> IPs' constraints, and securing that data from a bus point of view.
>>>
>>> One example of SMAF usage is camera preview: on a SoC you may use either a
>>> USB webcam or the built-in camera interface, and the frames could be sent
>>> directly to the display IP or handled by the GPU.
>>> Most USB interfaces and GPUs have an MMU, but almost all built-in camera
>>> interfaces and display IPs don't, so when selecting how to allocate a
>>> buffer you need to be aware of each device's constraints (contiguous
>>> memory, stride, boundary, alignment ...).
>>> ION has solved this problem by letting userland decide which allocator
>>> (heap) to use, but this requires adapting userland for each platform and
>>> sometimes for each use case.
>>>
>>> To be sure to select the best allocation method for the devices, SMAF
>>> implements a deferred allocation mechanism: memory allocation is only done
>>> when the first device effectively requires it.
>>> Allocator modules have to implement a match() to let SMAF know if they are
>>> compatible with the devices' needs.
>>> This patch set provides an example of an allocator module which uses
>>> dma_{alloc/free/mmap}_attrs() and 

Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!

2016-08-30 Thread Rob Clark
Tom,

hmm, I wonder if it was a bug/oversight for the YUV capabilities of
this extension not to depend on OES_EGL_image_external (which,
unfortunately, doesn't seem to have a GL counterpart)?

I think this currently implies that you could sample from an imported
YUV eglimg using (for example) sampler2D in GL or GLES, which I think
was not the intention.

BR,
-R

On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote:
> Hi All,
>
> The final spec has had enum values assigned and been published on Khronos:
>
> http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
>
> Thanks to all who've provided input.
>
>
> Cheers,
>
> Tom
>
>
>
>> -Original Message-
>> From: mesa-dev-bounces+tom.cooksey=arm@lists.freedesktop.org 
>> [mailto:mesa-dev-
>> bounces+tom.cooksey=arm@lists.freedesktop.org] On Behalf Of Tom Cooksey
>> Sent: 04 October 2012 13:10
>> To: mesa-...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; dri-
>> de...@lists.freedesktop.org; linux-media@vger.kernel.org
>> Subject: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - New draft!
>>
>> Hi All,
>>
>> After receiving a fair bit of feedback (thanks!), I've updated the
>> EGL_EXT_image_dma_buf_import spec and expanded it to resolve a number
>> of the issues. Please find the latest draft below and let me know any
>> additional feedback you might have, either on the lists or by private
>> e-mail - I don't mind which.
>>
>> I think the only remaining issue now is if we need a mechanism whereby an 
>> application can
>> query
>> which drm_fourcc.h formats EGL supports or if just failing with 
>> EGL_BAD_MATCH when the
>> application
>> has used one EGL doesn't support is sufficient. Any thoughts?
>>
>>
>> Cheers,
>>
>> Tom
>>
>>
>> 8<
>>
>>
>> Name
>>
>> EXT_image_dma_buf_import
>>
>> Name Strings
>>
>> EGL_EXT_image_dma_buf_import
>>
>> Contributors
>>
>> Jesse Barker
>> Rob Clark
>> Tom Cooksey
>>
>> Contacts
>>
>> Jesse Barker (jesse 'dot' barker 'at' linaro 'dot' org)
>> Tom Cooksey (tom 'dot' cooksey 'at' arm 'dot' com)
>>
>> Status
>>
>> DRAFT
>>
>> Version
>>
>> Version 4, October 04, 2012
>>
>> Number
>>
>> EGL Extension ???
>>
>> Dependencies
>>
>> EGL 1.2 is required.
>>
>> EGL_KHR_image_base is required.
>>
>> The EGL implementation must be running on a Linux kernel supporting the
>> dma_buf buffer sharing mechanism.
>>
>> This extension is written against the wording of the EGL 1.2 
>> Specification.
>>
>> Overview
>>
>> This extension allows creating an EGLImage from a Linux dma_buf file
>> descriptor or multiple file descriptors in the case of multi-plane YUV
>> images.
>>
>> New Types
>>
>> None
>>
>> New Procedures and Functions
>>
>> None
>>
>> New Tokens
>>
>> Accepted by the <target> parameter of eglCreateImageKHR:
>>
>> EGL_LINUX_DMA_BUF_EXT
>>
>> Accepted as an attribute in the <attrib_list> parameter of
>> eglCreateImageKHR:
>>
>> EGL_LINUX_DRM_FOURCC_EXT
>> EGL_DMA_BUF_PLANE0_FD_EXT
>> EGL_DMA_BUF_PLANE0_OFFSET_EXT
>> EGL_DMA_BUF_PLANE0_PITCH_EXT
>> EGL_DMA_BUF_PLANE1_FD_EXT
>> EGL_DMA_BUF_PLANE1_OFFSET_EXT
>> EGL_DMA_BUF_PLANE1_PITCH_EXT
>> EGL_DMA_BUF_PLANE2_FD_EXT
>> EGL_DMA_BUF_PLANE2_OFFSET_EXT
>> EGL_DMA_BUF_PLANE2_PITCH_EXT
>> EGL_YUV_COLOR_SPACE_HINT_EXT
>> EGL_SAMPLE_RANGE_HINT_EXT
>> EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT
>> EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT
>>
>> Accepted as the value for the EGL_YUV_COLOR_SPACE_HINT_EXT attribute:
>>
>> EGL_ITU_REC601_EXT
>> EGL_ITU_REC709_EXT
>> EGL_ITU_REC2020_EXT
>>
>> Accepted as the value for the EGL_SAMPLE_RANGE_HINT_EXT attribute:
>>
>> EGL_YUV_FULL_RANGE_EXT
>> EGL_YUV_NARROW_RANGE_EXT
>>
>> Accepted as the value for the EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT &

Re: DRM device memory writeback (Mali-DP)

2016-07-18 Thread Rob Clark
On Mon, Jul 18, 2016 at 11:01 AM, Daniel Vetter  wrote:
> On Mon, Jul 18, 2016 at 12:47:38PM +0200, Hans Verkuil wrote:
>> On 07/18/2016 12:29 PM, Brian Starkey wrote:
>> > OK, so let's talk about using connectors instead then :-)
>> >
>> > We can attach a generic fixed_mode property which can represent panel
>> > fitters or Mali-DP's writeback scaling, that sounds good.
>> >
>> > The DRM driver can create a new "virtual" encoder/connector pair, I
>> > guess with a new connector type specifically for memory-writeback?
>> > On this connector we allow the fb_id property to attach a framebuffer
>> > for writing into.
>> > We'd probably want an additional property to enable writeback, perhaps
>> > an enum: "disabled", "streaming", "oneshot".
>> >
>> > I think it makes sense to put the output buffer format, size etc.
>> > validation in the virtual encoder's atomic_check. It has access to
>> > both CRTC and connector state, and the encoder is supposed to
>> > represent the format/stream conversion element anyway.
>> >
>> > What I'm not so clear on is how to manage the API with userspace.
>>
>> I am not terribly enthusiastic (to say the least) if a new drm API is
>> created for video capture. That's what V4L2 is for and it comes with
>> kernel frameworks, documentation and utilities. Creating a new API will
>> not make userspace developers happy.
>>
>> I know it is a pain to work with different subsystems, but reinventing
>> the wheel is a much bigger pain. For you and for the poor sods who have
>> to use yet another API to do the same thing.
>>
>> Of course this has to be hooked up to drm at the low level, and anything
>> that can be done to help out with that from the V4L2 side of things is
>> fine, but creating duplicate public APIs isn't the way IMHO.
>
> I agree for the streaming video use-case. But for compositors I do agree
> with Eric that a simple frame capture interface integrated with the drm
> atomic ioctl is what we want. At least my understanding of v4l is that
> it's not that well suited to grabbing specific (single) frames.

same here, we definitely want to use v4l for the streaming case (so we can
take advantage of existing userspace support for
capture/encode/stream, etc)..  but for compositors, I think v4l would
be awkward.  Ideal case is single set of driver APIs, so driver
doesn't have to care so much about the use case, and then some drm_v4l
helpers to expose a v4l device for the streaming case.

There are some older patches floating around that implemented v4l
inside drm/msm.. which looked simple enough, although I don't think we
should need to do that in each drm driver..

BR,
-R
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!

2016-06-20 Thread Rob Clark
On Mon, Jun 20, 2016 at 8:37 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Fri, 17 Jun 2016 11:44:34 -0400
> Rob Clark <robdcl...@gmail.com> wrote:
>
>> On Fri, Jun 17, 2016 at 9:31 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
>> > On Fri, 17 Jun 2016 08:26:04 -0400
>> > Rob Clark <robdcl...@gmail.com> wrote:
>> >
>> >> On Fri, Jun 17, 2016 at 3:59 AM, Pekka Paalanen <ppaala...@gmail.com> 
>> >> wrote:
>> >> > On Thu, 16 Jun 2016 10:40:51 -0400
>> >> > Rob Clark <robdcl...@gmail.com> wrote:
>> >> >
>> >> >> So, if we wanted to extend this to support the fourcc-modifiers that
>> >> >> we have on the kernel side for compressed/tiled/etc formats, what
>> >> >> would be the right approach?
>> >> >>
>> >> >> A new version of the existing extension or a new
>> >> >> EGL_EXT_image_dma_buf_import2 extension, or ??
>> >> >
>> >> > Hi Rob,
>> >> >
>> >> > there are actually several things it might be nice to add:
>> >> >
>> >> > - a fourth plane, to match what DRM AddFB2 supports
>> >> >
>> >> > - the 64-bit fb modifiers
>> >> >
>> >> > - queries for which pixel formats are supported by EGL, so a display
>> >> >   server can tell the applications that before the application goes and
>> >> >   tries with a random bunch of them, shooting in the dark
>> >> >
>> >> > - queries for which modifiers are supported for each pixel format, ditto
>> >> >
>> >> > I discussed these with Emil in the past, and it seems an appropriate
>> >> > approach might be the following.
>> >> >
>> >> > Adding the 4th plane can be done as revising the existing
>> >> > EGL_EXT_image_dma_buf_import extension. The plane count is tied to
>> >> > pixel formats (and modifiers?), so the user does not need to know
>> >> > specifically whether the EGL implementation could handle a 4th plane or
>> >> > not. It is implied by the pixel format.
>> >> >
>> >> > Adding the fb modifiers needs to be a new extension, so that users can
>> >> > tell if they are supported or not. This is to avoid the following false
>> >> > failure: if user assumes modifiers are always supported, it will (may?)
>> >> > provide zero modifiers explicitly. If EGL implementation does not
>> >> > handle modifiers this would be rejected as unrecognized attributes,
>> >> > while if the zero modifiers were not given explicitly, everything would
>> >> > just work.
>> >>
>> >> hmm, if we design it as "not passing modifier" == "zero modifier", and
>> >> "never explicitly pass a zero modifier" then modifiers could be added
>> >> without a new extension.  Although I agree that queries would need a
>> >> new extension.. so perhaps not worth being clever.
>> >
>> > Indeed.
>> >
>> >> > The queries obviously(?) need a new extension. It might make sense
>> >> > to bundle both modifier support and the queries in the same new
>> >> > extension.
>> >> >
>> >> > We have some rough old WIP code at
>> >> > https://git.collabora.com/cgit/user/lfrb/mesa.git/log/?h=T1410-modifiers
>> >> > https://git.collabora.com/cgit/user/lfrb/egl-specs.git/log/?h=T1410
>> >> >
>> >> >
>> >> >> On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> 
>> >> >> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > The final spec has had enum values assigned and been published on 
>> >> >> > Khronos:
>> >> >> >
>> >> >> > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
>> >> >> >
>> >> >> > Thanks to all who've provided input.
>> >> >
>> >> > May I also pull your attention to a detail with the existing spec and
>> >> > Mesa behaviour I am asking about in
>> >> > https://lists.freedesktop.org/archives/mesa-dev/2016-June/120249.html
>> >> > "What is EGL_EXT_image_dma_buf_import image orientation as a GL 
>> >> > texture

Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!

2016-06-17 Thread Rob Clark
On Fri, Jun 17, 2016 at 9:31 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Fri, 17 Jun 2016 08:26:04 -0400
> Rob Clark <robdcl...@gmail.com> wrote:
>
>> On Fri, Jun 17, 2016 at 3:59 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
>> > On Thu, 16 Jun 2016 10:40:51 -0400
>> > Rob Clark <robdcl...@gmail.com> wrote:
>> >
>> >> So, if we wanted to extend this to support the fourcc-modifiers that
>> >> we have on the kernel side for compressed/tiled/etc formats, what
>> >> would be the right approach?
>> >>
>> >> A new version of the existing extension or a new
>> >> EGL_EXT_image_dma_buf_import2 extension, or ??
>> >
>> > Hi Rob,
>> >
>> > there are actually several things it might be nice to add:
>> >
>> > - a fourth plane, to match what DRM AddFB2 supports
>> >
>> > - the 64-bit fb modifiers
>> >
>> > - queries for which pixel formats are supported by EGL, so a display
>> >   server can tell the applications that before the application goes and
>> >   tries with a random bunch of them, shooting in the dark
>> >
>> > - queries for which modifiers are supported for each pixel format, ditto
>> >
>> > I discussed these with Emil in the past, and it seems an appropriate
>> > approach might be the following.
>> >
>> > Adding the 4th plane can be done as revising the existing
>> > EGL_EXT_image_dma_buf_import extension. The plane count is tied to
>> > pixel formats (and modifiers?), so the user does not need to know
>> > specifically whether the EGL implementation could handle a 4th plane or
>> > not. It is implied by the pixel format.
>> >
>> > Adding the fb modifiers needs to be a new extension, so that users can
>> > tell if they are supported or not. This is to avoid the following false
>> > failure: if user assumes modifiers are always supported, it will (may?)
>> > provide zero modifiers explicitly. If EGL implementation does not
>> > handle modifiers this would be rejected as unrecognized attributes,
>> > while if the zero modifiers were not given explicitly, everything would
>> > just work.
>>
>> hmm, if we design it as "not passing modifier" == "zero modifier", and
>> "never explicitly pass a zero modifier" then modifiers could be added
>> without a new extension.  Although I agree that queries would need a
>> new extension.. so perhaps not worth being clever.
>
> Indeed.
>
>> > The queries obviously(?) need a new extension. It might make sense
>> > to bundle both modifier support and the queries in the same new
>> > extension.
>> >
>> > We have some rough old WIP code at
>> > https://git.collabora.com/cgit/user/lfrb/mesa.git/log/?h=T1410-modifiers
>> > https://git.collabora.com/cgit/user/lfrb/egl-specs.git/log/?h=T1410
>> >
>> >
>> >> On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote:
>> >> > Hi All,
>> >> >
>> >> > The final spec has had enum values assigned and been published on 
>> >> > Khronos:
>> >> >
>> >> > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
>> >> >
>> >> > Thanks to all who've provided input.
>> >
>> > May I also pull your attention to a detail with the existing spec and
>> > Mesa behaviour I am asking about in
>> > https://lists.freedesktop.org/archives/mesa-dev/2016-June/120249.html
>> > "What is EGL_EXT_image_dma_buf_import image orientation as a GL texture?"
>> > Doing a dmabuf import seems to imply an y-flip AFAICT.
>>
>> I would have expected that *any* egl external image (dma-buf or
>> otherwise) should have native orientation rather than gl orientation.
>> It's somewhat useless otherwise.
>
> In that case importing dmabuf works differently than importing a
> wl_buffer (wl_drm), because for the latter, the y-invert flag is
> returned such that the orientation will match GL. And the direct
> scanout path goes through GBM since you have to import a wl_buffer, and
> I haven't looked what GBM does wrt. y-flip if anything.
>
>> I didn't read it carefully yet (would need caffeine first ;-)) but
>> EGL_KHR_image_base does say "This extension defines a new EGL resource
>> type that is suitable for sharing 2D arrays of image data between
>> client APIs" which to me implies native or

Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!

2016-06-17 Thread Rob Clark
On Fri, Jun 17, 2016 at 3:59 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Thu, 16 Jun 2016 10:40:51 -0400
> Rob Clark <robdcl...@gmail.com> wrote:
>
>> So, if we wanted to extend this to support the fourcc-modifiers that
>> we have on the kernel side for compressed/tiled/etc formats, what
>> would be the right approach?
>>
>> A new version of the existing extension or a new
>> EGL_EXT_image_dma_buf_import2 extension, or ??
>
> Hi Rob,
>
> there are actually several things it might be nice to add:
>
> - a fourth plane, to match what DRM AddFB2 supports
>
> - the 64-bit fb modifiers
>
> - queries for which pixel formats are supported by EGL, so a display
>   server can tell the applications that before the application goes and
>   tries with a random bunch of them, shooting in the dark
>
> - queries for which modifiers are supported for each pixel format, ditto
>
> I discussed these with Emil in the past, and it seems an appropriate
> approach might be the following.
>
> Adding the 4th plane can be done as revising the existing
> EGL_EXT_image_dma_buf_import extension. The plane count is tied to
> pixel formats (and modifiers?), so the user does not need to know
> specifically whether the EGL implementation could handle a 4th plane or
> not. It is implied by the pixel format.
>
> Adding the fb modifiers needs to be a new extension, so that users can
> tell if they are supported or not. This is to avoid the following false
> failure: if user assumes modifiers are always supported, it will (may?)
> provide zero modifiers explicitly. If EGL implementation does not
> handle modifiers this would be rejected as unrecognized attributes,
> while if the zero modifiers were not given explicitly, everything would
> just work.

hmm, if we design it as "not passing modifier" == "zero modifier", and
"never explicitly pass a zero modifier" then modifiers could be added
without a new extension.  Although I agree that queries would need a
new extension.. so perhaps not worth being clever.
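The false-failure scenario Pekka describes can be made concrete with a toy model. This is only a sketch: `EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT` is the token the later modifiers extension ended up using, and the validation logic below is illustrative, not Mesa's actual code. The point is that an implementation without modifier support rejects any modifier attribute as unrecognized, even an explicit 0 (linear), while the same import with the attribute simply omitted succeeds.

```python
def import_dmabuf(attribs, supports_modifiers):
    """Toy EGL-side attribute validation: unrecognized attributes are
    rejected with an error, as EGL_EXT_image_dma_buf_import requires."""
    known = {"EGL_WIDTH", "EGL_HEIGHT", "EGL_LINUX_DRM_FOURCC_EXT",
             "EGL_DMA_BUF_PLANE0_FD_EXT", "EGL_DMA_BUF_PLANE0_OFFSET_EXT",
             "EGL_DMA_BUF_PLANE0_PITCH_EXT"}
    if supports_modifiers:
        known.add("EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT")
    if any(k not in known for k in attribs):
        return "EGL_BAD_PARAMETER"  # unrecognized attribute -> hard failure
    return "EGL_SUCCESS"

base = {"EGL_WIDTH": 64, "EGL_HEIGHT": 64,
        "EGL_LINUX_DRM_FOURCC_EXT": 0x34325241,  # 'AR24' / DRM_FORMAT_ARGB8888
        "EGL_DMA_BUF_PLANE0_FD_EXT": 5, "EGL_DMA_BUF_PLANE0_OFFSET_EXT": 0,
        "EGL_DMA_BUF_PLANE0_PITCH_EXT": 256}

# Omitting the modifier works everywhere:
assert import_dmabuf(base, supports_modifiers=False) == "EGL_SUCCESS"
# An explicit "zero" (linear) modifier fails on an older implementation,
# even though the caller meant exactly the same thing:
assert import_dmabuf(dict(base, EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT=0),
                     supports_modifiers=False) == "EGL_BAD_PARAMETER"
```

This is why "absent" and "explicitly linear" cannot be treated as interchangeable without a new extension the caller can test for.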

> The queries obviously(?) need a new extension. It might make sense
> to bundle both modifier support and the queries in the same new
> extension.
>
> We have some rough old WIP code at
> https://git.collabora.com/cgit/user/lfrb/mesa.git/log/?h=T1410-modifiers
> https://git.collabora.com/cgit/user/lfrb/egl-specs.git/log/?h=T1410
>
>
>> On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote:
>> > Hi All,
>> >
>> > The final spec has had enum values assigned and been published on Khronos:
>> >
>> > http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
>> >
>> > Thanks to all who've provided input.
>
> May I also pull your attention to a detail with the existing spec and
> Mesa behaviour I am asking about in
> https://lists.freedesktop.org/archives/mesa-dev/2016-June/120249.html
> "What is EGL_EXT_image_dma_buf_import image orientation as a GL texture?"
> Doing a dmabuf import seems to imply an y-flip AFAICT.

I would have expected that *any* egl external image (dma-buf or
otherwise) should have native orientation rather than gl orientation.
It's somewhat useless otherwise.

I didn't read it carefully yet (would need caffeine first ;-)) but
EGL_KHR_image_base does say "This extension defines a new EGL resource
type that is suitable for sharing 2D arrays of image data between
client APIs" which to me implies native orientation.  So that just
sounds like a mesa bug somehow?

Do you just get that w/ i965?  I know some linaro folks have been
using this extension to import buffers from video decoder with
freedreno/gallium and no one mentioned the video being upside down.

BR,
-R


>
> Thanks,
> pq


Re: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - Final spec published!

2016-06-16 Thread Rob Clark
So, if we wanted to extend this to support the fourcc-modifiers that
we have on the kernel side for compressed/tiled/etc formats, what
would be the right approach?

A new version of the existing extension or a new
EGL_EXT_image_dma_buf_import2 extension, or ??

BR,
-R

On Mon, Feb 25, 2013 at 6:54 AM, Tom Cooksey <tom.cook...@arm.com> wrote:
> Hi All,
>
> The final spec has had enum values assigned and been published on Khronos:
>
> http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
>
> Thanks to all who've provided input.
>
>
> Cheers,
>
> Tom
>
>
>
>> -Original Message-
>> From: mesa-dev-bounces+tom.cooksey=arm@lists.freedesktop.org 
>> [mailto:mesa-dev-
>> bounces+tom.cooksey=arm@lists.freedesktop.org] On Behalf Of Tom Cooksey
>> Sent: 04 October 2012 13:10
>> To: mesa-...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; dri-
>> de...@lists.freedesktop.org; linux-media@vger.kernel.org
>> Subject: [Mesa-dev] [RFC] New dma_buf -> EGLImage EGL extension - New draft!
>>
>> Hi All,
>>
>> After receiving a fair bit of feedback (thanks!), I've updated the
>> EGL_EXT_image_dma_buf_import spec
>> and expanded it to resolve a number of the issues. Please find the latest 
>> draft below and let
>> me
>> know any additional feedback you might have, either on the lists or by 
>> private e-mail - I
>> don't mind
>> which.
>>
>> I think the only remaining issue now is if we need a mechanism whereby an 
>> application can
>> query
>> which drm_fourcc.h formats EGL supports or if just failing with 
>> EGL_BAD_MATCH when the
>> application
>> has used one EGL doesn't support is sufficient. Any thoughts?
>>
>>
>> Cheers,
>>
>> Tom
>>
>>
>> 8<
>>
>>
>> Name
>>
>> EXT_image_dma_buf_import
>>
>> Name Strings
>>
>> EGL_EXT_image_dma_buf_import
>>
>> Contributors
>>
>> Jesse Barker
>> Rob Clark
>> Tom Cooksey
>>
>> Contacts
>>
>> Jesse Barker (jesse 'dot' barker 'at' linaro 'dot' org)
>> Tom Cooksey (tom 'dot' cooksey 'at' arm 'dot' com)
>>
>> Status
>>
>> DRAFT
>>
>> Version
>>
>> Version 4, October 04, 2012
>>
>> Number
>>
>> EGL Extension ???
>>
>> Dependencies
>>
>> EGL 1.2 is required.
>>
>> EGL_KHR_image_base is required.
>>
>> The EGL implementation must be running on a Linux kernel supporting the
>> dma_buf buffer sharing mechanism.
>>
>> This extension is written against the wording of the EGL 1.2 
>> Specification.
>>
>> Overview
>>
>> This extension allows creating an EGLImage from a Linux dma_buf file
>> descriptor or multiple file descriptors in the case of multi-plane YUV
>> images.
>>
>> New Types
>>
>> None
>>
>> New Procedures and Functions
>>
>> None
>>
>> New Tokens
>>
>> Accepted by the <target> parameter of eglCreateImageKHR:
>>
>> EGL_LINUX_DMA_BUF_EXT
>>
>> Accepted as an attribute in the <attrib_list> parameter of
>> eglCreateImageKHR:
>>
>> EGL_LINUX_DRM_FOURCC_EXT
>> EGL_DMA_BUF_PLANE0_FD_EXT
>> EGL_DMA_BUF_PLANE0_OFFSET_EXT
>> EGL_DMA_BUF_PLANE0_PITCH_EXT
>> EGL_DMA_BUF_PLANE1_FD_EXT
>> EGL_DMA_BUF_PLANE1_OFFSET_EXT
>> EGL_DMA_BUF_PLANE1_PITCH_EXT
>> EGL_DMA_BUF_PLANE2_FD_EXT
>> EGL_DMA_BUF_PLANE2_OFFSET_EXT
>> EGL_DMA_BUF_PLANE2_PITCH_EXT
>> EGL_YUV_COLOR_SPACE_HINT_EXT
>> EGL_SAMPLE_RANGE_HINT_EXT
>> EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT
>> EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT
>>
>> Accepted as the value for the EGL_YUV_COLOR_SPACE_HINT_EXT attribute:
>>
>> EGL_ITU_REC601_EXT
>> EGL_ITU_REC709_EXT
>> EGL_ITU_REC2020_EXT
>>
>> Accepted as the value for the EGL_SAMPLE_RANGE_HINT_EXT attribute:
>>
>> EGL_YUV_FULL_RANGE_EXT
>> EGL_YUV_NARROW_RANGE_EXT
>>
>> Accepted as the value for the EGL_YUV_CHROMA_HORIZONTAL_SITING_HINT_EXT &
>> EGL_YUV_CHROMA_VERTICAL_SITING_HINT_EXT attributes:
>>
>> EGL_YUV_CHROMA_SITING_0_EXT
>> EGL_YUV_CHROMA_SITING_0_5_EXT
>>
>

Re: gstreamer: v4l2videodec plugin

2016-04-29 Thread Rob Clark
On Thu, Apr 28, 2016 at 7:33 AM, Stanimir Varbanov
<stanimir.varba...@linaro.org> wrote:
> On 04/15/2016 07:09 PM, Nicolas Dufresne wrote:
>> Le vendredi 15 avril 2016 à 11:58 -0400, Rob Clark a écrit :
>>> The issue is probably the YUV format, which we cannot really deal
>>> with
>>> properly in gallium..  it's a similar issue to multi-planar even if
>>> it
>>> is in a single buffer.
>>>
>>> The best way to handle this would be to import the same dmabuf fd
>>> twice, with appropriate offsets, to create one GL_RED eglimage for Y
>>> and one GL_RG eglimage for UV, and then combine them in shader in a
>>> similar way to how you'd handle separate Y and UV planes..
>>
>> That's the strategy we use in GStreamer, as very few GL stack support
>> implicit color conversions. For that to work you need to implement the
>> "offset" field in winsys_handle, that was added recently, and make sure
>> you have R8 and RG88 support (usually this is just mapping).
>
> Thanks,
>
> OK, I have made the relevant changes in Mesa and now I have image but
> the U and V components are swapped (the format is NV12, the UV plane is
> at the same buffer but at offset). Digging further and tracing gstreamer
> with apitrace I'm observing something weird.
>
> The gst import dmabuf with following call:
>
> eglCreateImageKHR(dpy = 0x7fa8013030, ctx = NULL, target =
> EGL_LINUX_DMA_BUF_EXT, buffer = NULL, attrib_list = {EGL_WIDTH, 640,
> EGL_HEIGHT, 360, EGL_LINUX_DRM_FOURCC_EXT, 943215175,
> EGL_DMA_BUF_PLANE0_FD_EXT, 29, EGL_DMA_BUF_PLANE0_OFFSET_EXT, 942080,
> EGL_DMA_BUF_PLANE0_PITCH_EXT, 1280, EGL_NONE}) = 0x7f980027d0
>
> the fourcc format is DRM_FORMAT_GR88 (943215175 decimal).
>
> after that:
>
> glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_RG8,
> width = 640, height = 360, border = 0, format = GL_RG, type =
> GL_UNSIGNED_BYTE, pixels = NULL)
>
> and finally on the fragment shader we have:
>
> yuv.x=texture2D(Ytex, texcoord * tex_scale0).r;
> yuv.yz=texture2D(UVtex, texcoord * tex_scale1).rg;
>
> I was expecting to see DRM_FORMAT_RG88 / GL_RG and shader sampling
> y <- r
> z <- g
>
> or DRM_FORMAT_GR88 / GL_RG and shader sampling
> y <- g
> z <- r

IIRC you are using gles?  Could you recompile glimagesink to use
desktop GL?  I'm wondering a bit, but just speculation since I don't
have a way to step through it, but the 'if (_mesa_is_gles())' case in
st_ChooseTextureFormat..  normally for gles the driver is more free to
choose the corresponding internal-format, which is maybe not the right
thing to do for textures which are imported eglimages.

(if recompiling mesa is easier, you could just change that to 'if (0)'
and see if it "fixes" things.. that ofc is not the proper fix, but it
would confirm whether this is what is going on..)

BR,
-R

> Also, browsing the code in Mesa for Intel i965 dri driver I found where
> the __DRI_IMAGE_FORMAT_GR88 becomes MESA_FORMAT_R8G8_UNORM [1].
>
> So I'm wondering, is that intentional?
>
> Depending on the answer I should make the same in the Gallium dri2 in
> dri2_from_dma_bufs().
>
> --
> regards,
> Stan
>
> [1]
> https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/common/dri_util.c#n878
>
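As a quick sanity check on the trace above: DRM fourcc codes are just four ASCII characters packed little-endian, and 943215175 decodes exactly to 'GR88'. A minimal decoder sketch:

```python
def fourcc_code(a, b, c, d):
    # Same packing as the kernel's fourcc_code() macro in drm_fourcc.h
    return ord(a) | ord(b) << 8 | ord(c) << 16 | ord(d) << 24

def fourcc_name(code):
    # Unpack the four bytes back into their ASCII characters
    return "".join(chr((code >> shift) & 0xFF) for shift in (0, 8, 16, 24))

assert fourcc_code('G', 'R', '8', '8') == 943215175  # DRM_FORMAT_GR88
assert fourcc_name(943215175) == "GR88"              # the value in the trace
```

So the question in the thread is purely about component order (GR88 vs RG88), not about a mis-decoded format.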


Re: gstreamer: v4l2videodec plugin

2016-04-18 Thread Rob Clark
On Fri, Apr 15, 2016 at 12:09 PM, Nicolas Dufresne <nico...@ndufresne.ca> wrote:
> Le vendredi 15 avril 2016 à 11:58 -0400, Rob Clark a écrit :
>> The issue is probably the YUV format, which we cannot really deal
>> with
>> properly in gallium..  it's a similar issue to multi-planar even if
>> it
>> is in a single buffer.
>>
>> The best way to handle this would be to import the same dmabuf fd
>> twice, with appropriate offsets, to create one GL_RED eglimage for Y
>> and one GL_RG eglimage for UV, and then combine them in shader in a
>> similar way to how you'd handle separate Y and UV planes..
>
> That's the strategy we use in GStreamer, as very few GL stack support
> implicit color conversions. For that to work you need to implement the
> "offset" field in winsys_handle, that was added recently, and make sure
> you have R8 and RG88 support (usually this is just mapping).

oh, heh, looks like nobody bothered to add this yet:

-
diff --git a/src/gallium/drivers/freedreno/freedreno_resource.c b/src/gallium/drivers/freedreno/freedreno_resource.c
index 9aded3b..fab78ab 100644
--- a/src/gallium/drivers/freedreno/freedreno_resource.c
+++ b/src/gallium/drivers/freedreno/freedreno_resource.c
@@ -669,6 +669,7 @@ fd_resource_from_handle(struct pipe_screen *pscreen,
 	rsc->base.vtbl = &fd_resource_vtbl;
 	rsc->cpp = util_format_get_blocksize(tmpl->format);
 	slice->pitch /= rsc->cpp;
+	slice->offset = handle->offset;
 
 	assert(rsc->cpp);
-


> cheers,
> Nicolas


Re: gstreamer: v4l2videodec plugin

2016-04-15 Thread Rob Clark
On Tue, Apr 12, 2016 at 4:57 AM, Stanimir Varbanov
 wrote:
> Hi Nicolas,
>
> On 04/11/2016 07:25 PM, Nicolas Dufresne wrote:
>> Le lundi 11 avril 2016 à 15:11 +0300, Stanimir Varbanov a écrit :
>>> adding gstreamer-devel
>>>
>>> On 04/11/2016 03:03 PM, Stanimir Varbanov wrote:

 Hi,

 I'm working on QCOM v4l2 video decoder/encoder driver and in order
 to
 test its functionalities I'm using gstreamer v4l2videodec plugin. I
 am
 able to use the v4l2videodec plugin with MMAP, now I want to try
 the
 dmabuf export from v4l2 and import dmabuf buffers to glimagesink. I
 upgraded gst to 1.7.91 so that I have the dmabuf support in
 glimagesink.
 Mesa version is 11.1.2.
>>
>> I'm very happy to see this report. So far, we only had report that this
>> element works on Freescale IMX.6 (CODA) and Exynos 4/5.
>
> In this context, I would be very happy to see v4l2videoenc merged soon :)
>
>>

 I'm using the following pipeline:

 GST_GL_PLATFORM=egl GST_GL_API=gles2 gst-launch-1.0 $GSTDEBUG
 $GSTFILESRC ! qtdemux name=m m.video_0 ! h264parse ! v4l2video32dec
 capture-io-mode=dmabuf ! glimagesink

 I stalled on this error:

 eglimagememory
 gsteglimagememory.c:473:gst_egl_image_memory_from_dmabuf:>>> llocator0>
 eglCreateImage failed: EGL_BAD_MATCH

 which in Mesa is:

 libEGL debug: EGL user error 0x3009 (EGL_BAD_MATCH) in
 dri2_create_image_khr_texture

 Do someone know how the dmabuf import is tested when the support
 has
 been added to glimagesink? Or some pointers how to continue with
 debugging?
>>
>> So far the DMABuf support in glimagesink has been tested on Intel/Mesa
>> and libMALI. There is work in progress in Gallium/Mesa, but until
>> recently there was no support for offset in imported buffer, which
>> would result in BAD_MATCH error. I cannot guaranty this is the exact
>> reason here, BAD_MATCH is used for a very wide variety of reason in
>> those extensions. The right place to dig into this issue would be
>> through the Mesa list and/or Mesa code. Find out what is missing for
>> you driver in Mesa and then I may help you further.
>
> I came down to these conditions
>
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/dri/dri2.c?h=11.2#n1063
>
> but I don't know how this is related. The gstreamer
> (gst_egl_image_memory_from_dmabuf) doesn't set this "level" so it will
> be zero.
>
>>
>> For reference, the import strategy we use in GStreamer was
>> inspired by Kodi (xbmc). It consists of importing each YUV plane
>> separately using R8 and RG88 textures and doing the color conversion
>> using shaders. Though, if the frame is allocated as a single DMABuf,
>> this requires using offsets to access the frame data, and that support
>
> Yep that is my case, the driver capture buffers have one plane, hence one
> dmabuf will be exported per buffer.
>
>> had only been recently added in Gallium base code and in Radeon driver
>> recently. I don't know if Freedreno, VC4 have that, and I know nouveau
>> don't.
>
> Rob, do we need to add something in Freedreno Gallium driver to handle
> dmabuf import?

The issue is probably the YUV format, which we cannot really deal with
properly in gallium..  it's a similar issue to multi-planar even if it
is in a single buffer.

The best way to handle this would be to import the same dmabuf fd
twice, with appropriate offsets, to create one GL_RED eglimage for Y
and one GL_RG eglimage for UV, and then combine them in shader in a
similar way to how you'd handle separate Y and UV planes..
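For NV12 the two imports can be derived from one buffer layout. A minimal sketch (a hypothetical helper, not an actual gallium/EGL API) of the offsets and sizes involved:

```python
def nv12_import_params(fd, width, height, pitch):
    """Hypothetical helper: attribute sets for importing one NV12 dma-buf
    twice -- a full-size R8/GL_RED image for Y and a half-size RG88/GL_RG
    image for the interleaved UV plane -- combined later in a shader."""
    y_plane = {
        "fd": fd, "fourcc": "R8  ",          # single channel: Y samples
        "width": width, "height": height,
        "offset": 0, "pitch": pitch,
    }
    uv_plane = {
        "fd": fd, "fourcc": "GR88",          # two channels: interleaved U/V
        "width": width // 2, "height": height // 2,
        # UV follows Y; real allocators may pad/align this, so prefer the
        # offset actually reported by the exporting driver when available
        "offset": pitch * height,
        "pitch": pitch,                      # NV12 UV pitch equals Y pitch
    }
    return y_plane, uv_plane

y, uv = nv12_import_params(fd=3, width=1280, height=720, pitch=1280)
assert uv["offset"] == 921600 and (uv["width"], uv["height"]) == (640, 360)
```

Note the caveat in the comments: the EGL trace elsewhere in the thread shows a UV offset of 942080 rather than 921600 for a 1280x720 frame, i.e. the decoder aligned the Y plane, which is exactly why the exported offset must be honored rather than recomputed.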

BR,
-R

> --
> regards,
> Stan


Re: [RFC] How implement Secure Data Path ?

2015-05-06 Thread Rob Clark
On Wed, May 6, 2015 at 4:35 AM, Daniel Vetter dan...@ffwll.ch wrote:
 On Tue, May 05, 2015 at 05:54:05PM +0100, One Thousand Gnomes wrote:
  First what is Secure Data Path ? SDP is a set of hardware features to 
  garanty
  that some memories regions could only be read and/or write by specific 
  hardware
  IPs. You can imagine it as a kind of memory firewall which grant/revoke
  accesses to memory per devices. Firewall configuration must be done in a 
  trusted
  environment: for ARM architecture we plan to use OP-TEE + a trusted
  application to do that.

 It's not just an ARM feature so any basis for this in the core code
 should be generic, whether its being enforced by ARM SDP, various Intel
 feature sets or even via a hypervisor.

  I have try 2 hacky approachs with dma_buf:
  - add a secure field in dma_buf structure and configure firewall in
dma_buf_{map/unmap}_attachment() functions.

 How is SDP not just another IOMMU. The only oddity here is that it
 happens to configure buffers the CPU can't touch and it has a control
 mechanism that is designed to cover big media corp type uses where the
 threat model is that the system owner is the enemy. Why does anything care
 about it being SDP, there are also generic cases this might be a useful
 optimisation (eg knowing the buffer isn't CPU touched so you can optimise
 cache flushing).

 The control mechanism is a device/platform detail as with any IOMMU. It
 doesn't matter who configures it or how, providing it happens.

 We do presumably need some small core DMA changes - anyone trying to map
 such a buffer into CPU space needs to get a warning or error but what
 else ?

  From a buffer allocation point of view I am also facing a problem: when
  v4l2 or drm/kms export buffers using dma_buf, they don't attach
  themselves to the buffer and never call dma_buf_{map/unmap}_attachment().
  This is not an issue within those frameworks, although it is how dma_buf
  exporters are supposed to work.

 Which could be addressed if need be.

 So if SDP is just another IOMMU feature, just as stuff like IMR is on
 some x86 devices, and hypervisor enforced protection is on assorted
 platforms why do we need a special way to do it ? Is there anything
 actually needed beyond being able to tell the existing DMA code that this
 buffer won't be CPU touched and wiring it into the DMA operations for the
 platform ?

 Iirc most of the dma api stuff gets unhappy when memory isn't struct page
 backed. In i915 we do use sg tables everywhere though (even for memory not
 backed by struct page, e.g. the stolen range the bios prereserves), but
 we fill those out manually.

 A possible generic design I see is to have a secure memory allocator
 device which does nothing else but hand out dma-bufs. With that we can
 hide the platform-specific allocation methods in there (some need to
 allocate from carveouts, others just need to mark the pages specifically).
 Also dma-buf has explicit methods for cpu access, which are allowed to
 fail. And using the dma-buf attach tracking we can also reject dma to
 devices which cannot access the secure memory. Given all that I think
 going through the dma-buf interface but with a special-purpose allocator
 seems to fit.

 I'm not sure whether a special iommu is a good idea otoh: I'd expect that
 for most devices the driver would need to decide about which iommu to pick
 (or maybe keep track of some special flags for an extended dma_map
 interface). At least looking at gpu drivers using iommus would require
 special code, whereas fully hiding all this behind the dma-buf interface
 should fit in much better.


jfwiw, I'd fully expect devices to be dealing with a mix of secure and
insecure buffers, so I'm also not really sure how the 'special iommu'
plan would play out..

I think the 'secure' allocator device sounds attractive from the PoV of
separating out platform nonsense.. not sure if it is exactly that easy,
since the importing device probably needs to set some special bits here
and there..
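For illustration, the attach-time gating in the "secure allocator" idea could look roughly like this userspace model. All names here are made up for the sketch; the real dma-buf attach callback would of course live in the exporter driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of a "secure allocator" exporter. These structs and the
 * attach hook are hypothetical; they only illustrate the idea of
 * rejecting DMA from devices outside the memory firewall. */
struct sec_device { bool behind_firewall; };
struct sec_buf    { bool secure; };

static int sec_buf_attach(const struct sec_buf *buf,
                          const struct sec_device *dev)
{
    /* A secure buffer may only ever be attached by a whitelisted
     * hardware IP; CPU access would fail similarly through the
     * (allowed-to-fail) cpu-access hooks. */
    if (buf->secure && !dev->behind_firewall)
        return -EACCES;
    return 0;
}
```

The point of routing this through attach tracking is that the rejection happens once, up front, instead of at every map.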

BR,
-R


 -Daniel
 --
 Daniel Vetter
 Software Engineer, Intel Corporation
 http://blog.ffwll.ch


Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-11 Thread Rob Clark
On Wed, Feb 11, 2015 at 6:12 AM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Wed, Feb 11, 2015 at 09:28:37AM +0100, Marek Szyprowski wrote:
 Hello,

 On 2015-01-27 09:25, Sumit Semwal wrote:
 Add some helpers to share the constraints of devices while attaching
 to the dmabuf buffer.
 
 At each attach, the constraints are calculated based on the following:
 - max_segment_size, max_segment_count, segment_boundary_mask from
 device_dma_parameters.
 
 In case the attaching device's constraints don't match up, attach() fails.
 
 At detach, the constraints are recalculated based on the remaining
 attached devices.
 
 Two helpers are added:
 - dma_buf_get_constraints - which gives the current constraints as 
 calculated
during each attach on the buffer till the time,
 - dma_buf_recalc_constraints - which recalculates the constraints for all
currently attached devices for the 'paranoid' ones amongst us.
 
 The idea of this patch is largely taken from Rob Clark's RFC at
 https://lkml.org/lkml/2012/7/19/285, and the comments received on it.
 
 Cc: Rob Clark robdcl...@gmail.com
 Signed-off-by: Sumit Semwal sumit.sem...@linaro.org

 The code looks okay, although it will probably work well only with
 typical cases like 'contiguous memory needed' or 'no constraints at all'
 (iommu).

 Which is a damn good reason to NAK it - by that admission, it's a half-baked
 idea.

 If all we want to know is whether the importer can accept only contiguous
 memory or not, make a flag to do that, and allow the exporter to test this
 flag.  Don't over-engineer this to make it _seem_ like it can do something
 that it actually totally fails with.

jfyi, I agree with that.. I think the flag is probably the right
approach to start with.  At the end of the day it *is* still just an
in-kernel API (and not something that ends up as userspace ABI) so
when we come up with the use case to make it more generic we can.  Vs.
making it look like something more generic when it isn't really yet.
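The constraint bookkeeping the patch describes (intersect the attachers' device_dma_parameters at each attach, recalculate at detach) can be modeled in plain C. This is only a sketch; the struct and function names are hypothetical, not the proposed kernel helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of per-attach constraint intersection; field names
 * mirror device_dma_parameters, but this is not kernel code. */
struct dma_constraints {
    uint64_t      max_segment_size;      /* bytes */
    uint32_t      max_segment_count;
    unsigned long segment_boundary_mask;
};

/* Fold one more attaching device into the accumulated constraints:
 * every field moves toward the most restrictive value seen so far. */
static void constraints_intersect(struct dma_constraints *acc,
                                  const struct dma_constraints *dev)
{
    if (dev->max_segment_size < acc->max_segment_size)
        acc->max_segment_size = dev->max_segment_size;
    if (dev->max_segment_count < acc->max_segment_count)
        acc->max_segment_count = dev->max_segment_count;
    /* for the usual (2^n - 1) masks, ANDing picks the stricter one */
    acc->segment_boundary_mask &= dev->segment_boundary_mask;
}
```

Recalculating at detach (the dma_buf_recalc_constraints case) is then just re-running this fold over the remaining attachments.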

 As I've already pointed out, there's a major problem if you have already
 had a less restrictive attachment which has an active mapping, and a new
 more restrictive attachment comes along later.

 It seems from Rob's descriptions that we also need another flag in the
 importer to indicate whether it wants to have a valid struct page in the
 scatter list, or whether it (correctly) uses the DMA accessors on the
 scatter list - so that exporters can reject importers which are buggy.

to be completely generic, we would really need a way for the device to
take over just the last iommu (in case there were multiple levels of
address translation)..

I'm not completely sure, but I *think* the other arm gpu's have their
own internal mmu for doing context switching, etc, so if there is an
additional iommu in front of them they may actually still want to use
the normal dma api's.  Someone please contradict me if I am wrong.  If
this ends up being an issue only for msm, then I'm completely ok with
the easier option of a less generic solution..

BR,
-R


 --
 FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
 according to speedtest.net.


Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-11 Thread Rob Clark
On Wed, Feb 11, 2015 at 7:56 AM, Daniel Vetter dan...@ffwll.ch wrote:
 On Wed, Feb 11, 2015 at 06:23:52AM -0500, Rob Clark wrote:
 On Wed, Feb 11, 2015 at 6:12 AM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  As I've already pointed out, there's a major problem if you have already
  had a less restrictive attachment which has an active mapping, and a new
  more restrictive attachment comes along later.
 
  It seems from Rob's descriptions that we also need another flag in the
  importer to indicate whether it wants to have a valid struct page in the
  scatter list, or whether it (correctly) uses the DMA accessors on the
  scatter list - so that exporters can reject importers which are buggy.

 to be completely generic, we would really need a way for the device to
 take over just the last iommu (in case there were multiple levels of
 address translation)..

 I still hold that if the dma api steals the iommu your gpu needs for
 context switching then that's a bug in the platform setup code. dma api
 really doesn't have any concept of switchable hw contexts. So trying to
 work around this brokeness by mandating it as a valid dma-buf use-case is
 totally backwards.

sure, my only point is that if I'm the odd man out, I can live with a
hack (ie. requiring drm/msm to be aware enough of the platform to know
if there is 1 level of address translation and frob'ing my 'struct
device' accordingly)... no point in a generic solution for one user.
I like to be practical.

BR,
-R

 -Daniel


Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-03 Thread Rob Clark
On Tue, Feb 3, 2015 at 9:37 AM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Tue, Feb 03, 2015 at 09:04:03AM -0500, Rob Clark wrote:
 Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to
 drop use of dma-mapping entirely (incl the current call to dma_map_sg,
 which I just need until we can use drm_cflush on arm), and
 attach/detach iommu domains directly to implement context switches.
 At that point, dma_addr_t really has no sensible meaning for me.

 So how do you intend to import from a subsystem which only gives you
 the dma_addr_t?

 If you aren't passing system memory, you have no struct page.  You can't
 fake up a struct page.  What this means is that struct scatterlist can't
 represent it any other way.

Tell the exporter to stop using carveouts, and give me proper memory
instead.. ;-)

Well, at least on these SoC's, I think the only valid use for carveout
memory is the bootloader splashscreen.  And I was planning on just
hanging on to that for myself for fbdev scanout buffer or other
internal (non shared) usage..

BR,
-R



Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-03 Thread Rob Clark
On Tue, Feb 3, 2015 at 9:41 AM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Tue, Feb 03, 2015 at 03:17:27PM +0100, Arnd Bergmann wrote:
 On Tuesday 03 February 2015 09:04:03 Rob Clark wrote:
  Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to
  drop use of dma-mapping entirely (incl the current call to dma_map_sg,
  which I just need until we can use drm_cflush on arm), and
  attach/detach iommu domains directly to implement context switches.
  At that point, dma_addr_t really has no sensible meaning for me.

 I think what you see here is a quite common hardware setup and we really
 lack the right abstraction for it at the moment. Everybody seems to
 work around it with a mix of the dma-mapping API and the iommu API.
 These are doing different things, and even though the dma-mapping API
 can be implemented on top of the iommu API, they are not really compatible.

 I'd go as far as saying that the DMA API on top of IOMMU is more
 intended to be for a system IOMMU for the bus in question, rather
 than a device-level IOMMU.

 If an IOMMU is part of a device, then the device should handle it
 (maybe via an abstraction) and not via the DMA API.  The DMA API should
 be handing the bus addresses to the device driver which the device's
 IOMMU would need to generate.  (In other words, in this circumstance,
 the DMA API shouldn't give you the device internal address.)

if the dma_addr_t becomes the address upstream of the iommu (in
practice, the phys addr), that would, I think, address my concerns
about dma_addr_t

BR,
-R



Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-03 Thread Rob Clark
On Tue, Feb 3, 2015 at 9:52 AM, Arnd Bergmann a...@arndb.de wrote:
 On Tuesday 03 February 2015 14:41:09 Russell King - ARM Linux wrote:
 On Tue, Feb 03, 2015 at 03:17:27PM +0100, Arnd Bergmann wrote:
  On Tuesday 03 February 2015 09:04:03 Rob Clark wrote:
   Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to
   drop use of dma-mapping entirely (incl the current call to dma_map_sg,
   which I just need until we can use drm_cflush on arm), and
   attach/detach iommu domains directly to implement context switches.
   At that point, dma_addr_t really has no sensible meaning for me.
 
  I think what you see here is a quite common hardware setup and we really
  lack the right abstraction for it at the moment. Everybody seems to
  work around it with a mix of the dma-mapping API and the iommu API.
  These are doing different things, and even though the dma-mapping API
  can be implemented on top of the iommu API, they are not really compatible.

 I'd go as far as saying that the DMA API on top of IOMMU is more
 intended to be for a system IOMMU for the bus in question, rather
 than a device-level IOMMU.

 If an IOMMU is part of a device, then the device should handle it
 (maybe via an abstraction) and not via the DMA API.  The DMA API should
 be handing the bus addresses to the device driver which the device's
 IOMMU would need to generate.  (In other words, in this circumstance,
 the DMA API shouldn't give you the device internal address.)

 Exactly. And the abstraction that people choose at the moment is the
 iommu API, for better or worse. It makes a lot of sense to use this
 API if the same iommu is used for other devices as well (which is
 the case on Tegra and probably a lot of others). Unfortunately the
 iommu API lacks support for cache management, and probably other things
 as well, because this was not an issue for the original use case
 (device assignment on KVM/x86).

 This could be done by adding explicit or implied cache management
 to the IOMMU mapping interfaces, or by extending the dma-mapping
 interfaces in a way that covers the use case of the device managing
 its own address space, in addition to the existing coherent and
 streaming interfaces.

I think for gpu's, we'd prefer explicit and less abstraction.. which is
probably the opposite of what every other driver would want.

In the end, my eventual goal is explicit control of tlb flush, and
control of my address space.  And in fact in some cases we are going
to want to use the gpu to bang on iommu registers to do context
switches and tlb flushes.  (Which is obviously not the first step..
and something that is fairly difficult to get right/secure.. but the
performance win seems significant so I'm not sure we can avoid it.)

BR,
-R


 Arnd


Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-03 Thread Rob Clark
On Tue, Feb 3, 2015 at 11:12 AM, Arnd Bergmann a...@arndb.de wrote:
 I agree for the case you are describing here. From what I understood
 from Rob was that he is looking at something more like:

 Fig 3
 CPU--L1cache--L2cache--Memory--IOMMU---iobus--device

 where the IOMMU controls one or more contexts per device, and is
 shared across GPU and non-GPU devices. Here, we need to use the
 dma-mapping interface to set up the IO page table for any device
 that is unable to address all of system RAM, and we can use it
 for purposes like isolation of the devices. There are also cases
 where using the IOMMU is not optional.


Actually, just to clarify, the IOMMU instance is specific to the GPU..
not shared with other devices.  Otherwise managing multiple contexts
would go quite badly..

But other devices have their own instance of the same IOMMU.. so same
driver could be used.

BR,
-R


Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-03 Thread Rob Clark
On Tue, Feb 3, 2015 at 7:28 AM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Tue, Feb 03, 2015 at 08:48:56AM +0100, Daniel Vetter wrote:
 On Mon, Feb 02, 2015 at 03:30:21PM -0500, Rob Clark wrote:
  On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:
   My initial thought is for dma-buf to not try to prevent something that
   an exporter can actually do.. I think the scenario you describe could
   be handled by two sg-lists, if the exporter was clever enough.
  
   That's already needed, each attachment has its own sg-list. After all
   there's no array of dma_addr_t in the sg tables, so you can't use one sg
   for more than one mapping. And due to different iommu different devices
   can easily end up with different addresses.
 
 
  Well, to be fair it may not be explicitly stated, but currently one
  should assume the dma_addr_t's in the dmabuf sglist are bogus.  With
  gpu's that implement per-process/context page tables, I'm not really
  sure that there is a sane way to actually do anything else..

 Hm, what does per-process/context page tables have to do here? At least on
 i915 we have two levels of page tables:
 - first level for vm/device isolation, used through dma api
 - 2nd level for per-gpu-context isolation and context switching, handled
   internally.

 Since atm the dma api doesn't have any concept of contexts or different
 pagetables, I don't see how you could use that at all.

 What I've found with *my* etnaviv drm implementation (not Christian's - I
 found it impossible to work with Christian, especially with the endless
 "msm doesn't do it that way, so we shouldn't" responses and his attitude
 towards cherry-picking my development work [*]) is that it's much easier to
 keep the GPU MMU local to the GPU and under the control of the DRM MM code,
 rather than attaching the IOMMU to the DMA API and handling it that way.

 There are several reasons for that:

 1. DRM has a better idea about when the memory needs to be mapped to the
GPU, and it can more effectively manage the GPU MMU.

 2. The GPU MMU may have TLBs which can only be flushed via a command in
the GPU command stream, so it's fundamentally necessary for the MMU to
be managed by the GPU driver so that it knows when (and how) to insert
the flushes.


If gpu mmu needs some/all updates to happen from command-stream then
probably better to handle it internally..

That is a slightly different scenario from msm, where we have many
instances of the same iommu[*] scattered through the SoC in front of
various different devices.

BR,
-R

[*] at least from the iommu register layout, the same driver is used for
all instances.. but maybe the tlb+walker are more tightly integrated into
the gpu - that is just speculation on implementation details based on
some paper I found along the way


 * - as a direct result of that, I've stopped all further development of
 etnaviv drm, and I'm intending to strip it out from my Xorg DDX driver
 as the etnaviv drm API which Christian wants is completely incompatible
 with the non-etnaviv drm, and that just creates far too much pain in the
 DDX driver.



Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-03 Thread Rob Clark
On Tue, Feb 3, 2015 at 2:48 AM, Daniel Vetter dan...@ffwll.ch wrote:
 On Mon, Feb 02, 2015 at 03:30:21PM -0500, Rob Clark wrote:
 On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:
  My initial thought is for dma-buf to not try to prevent something that
  an exporter can actually do.. I think the scenario you describe could
  be handled by two sg-lists, if the exporter was clever enough.
 
  That's already needed, each attachment has its own sg-list. After all
  there's no array of dma_addr_t in the sg tables, so you can't use one sg
  for more than one mapping. And due to different iommu different devices
  can easily end up with different addresses.


 Well, to be fair it may not be explicitly stated, but currently one
 should assume the dma_addr_t's in the dmabuf sglist are bogus.  With
 gpu's that implement per-process/context page tables, I'm not really
 sure that there is a sane way to actually do anything else..

 Hm, what does per-process/context page tables have to do here? At least on
 i915 we have two levels of page tables:
 - first level for vm/device isolation, used through dma api
 - 2nd level for per-gpu-context isolation and context switching, handled
   internally.

 Since atm the dma api doesn't have any concept of contexts or different
 pagetables, I don't see how you could use that at all.

Since I'm stuck w/ an iommu, instead of built in mmu, my plan was to
drop use of dma-mapping entirely (incl the current call to dma_map_sg,
which I just need until we can use drm_cflush on arm), and
attach/detach iommu domains directly to implement context switches.
At that point, dma_addr_t really has no sensible meaning for me.

BR,
-R

 -Daniel


Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-03 Thread Rob Clark
On Tue, Feb 3, 2015 at 11:58 AM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:

 Okay, but switching contexts is not something which the DMA API has
 any knowledge of (so it can't know which context to associate with
 which mapping.)  While it knows which device, it has no knowledge
 (nor is there any way for it to gain knowledge) about contexts.

 My personal view is that extending the DMA API in this way feels quite
 dirty - it's a violation of the DMA API design, which is to (a) demark
 the buffer ownership between CPU and DMA agent, and (b) to translate
 buffer locations into a cookie which device drivers can use to instruct
 their device to access that memory.  To see why, consider... that you
 map a buffer to a device in context A, and then you switch to context B,
 which means the dma_addr_t given previously is no longer valid.  You
 then try to unmap it... which is normally done using the (now no longer
 valid) dma_addr_t.

 It seems to me that to support this at DMA API level, we would need to
 completely revamp the DMA API, which IMHO isn't going to be nice.  (It
 would mean that we end up with three APIs - the original PCI DMA API,
 the existing DMA API, and some new DMA API.)

 Do we have any views on how common this feature is?


I can't think of cases outside of GPU's..  if it were more common I'd
be in favor of teaching dma api about multiple contexts, but right now
I think that would just amount to forcing a lot of churn on everyone
else for the benefit of GPU's.

IMHO it makes more sense for GPU drivers to bypass the dma api if they
need to.  Plus, sooner or later, someone will discover that with some
trick or optimization they can get moar fps, but the extra layer of
abstraction will just be getting in the way.

BR,
-R


Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-02 Thread Rob Clark
On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:
 My initial thought is for dma-buf to not try to prevent something that
 an exporter can actually do.. I think the scenario you describe could
 be handled by two sg-lists, if the exporter was clever enough.

 That's already needed, each attachment has its own sg-list. After all
 there's no array of dma_addr_t in the sg tables, so you can't use one sg
 for more than one mapping. And due to different iommu different devices
 can easily end up with different addresses.


Well, to be fair it may not be explicitly stated, but currently one
should assume the dma_addr_t's in the dmabuf sglist are bogus.  With
gpu's that implement per-process/context page tables, I'm not really
sure that there is a sane way to actually do anything else..

BR,
-R


Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-02-02 Thread Rob Clark
On Mon, Feb 2, 2015 at 4:46 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Mon, Feb 02, 2015 at 03:30:21PM -0500, Rob Clark wrote:
 On Mon, Feb 2, 2015 at 11:54 AM, Daniel Vetter dan...@ffwll.ch wrote:
  My initial thought is for dma-buf to not try to prevent something that
  an exporter can actually do.. I think the scenario you describe could
  be handled by two sg-lists, if the exporter was clever enough.
 
  That's already needed, each attachment has its own sg-list. After all
  there's no array of dma_addr_t in the sg tables, so you can't use one sg
  for more than one mapping. And due to different iommu different devices
  can easily end up with different addresses.


 Well, to be fair it may not be explicitly stated, but currently one
 should assume the dma_addr_t's in the dmabuf sglist are bogus.  With
 gpu's that implement per-process/context page tables, I'm not really
 sure that there is a sane way to actually do anything else..

 That's incorrect - and goes dead against the design of scatterlists.

yeah, a bit of an abuse, although I'm not sure I see a much better way
when a device vaddr depends on user context..

 Not only that, but it is entirely possible that you may get handed
 memory via dmabufs for which there are no struct page's associated
 with that memory - think about display systems which have their own
 video memory which is accessible to the GPU, but it isn't system
 memory.

well, I guess anyways when it comes to sharing buffers, it won't be
the vram placement of the bo that gets shared ;-)

BR,
-R

 In those circumstances, you have to use the dma_addr_t's and not the
 pages.



Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-01-29 Thread Rob Clark
On Thu, Jan 29, 2015 at 10:47 AM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Thu, Jan 29, 2015 at 09:00:11PM +0530, Sumit Semwal wrote:
 So, short answer is, it is left to the exporter to decide. The dma-buf
 framework should not even attempt to decide or enforce any of the
 above.

 At each dma_buf_attach(), there's a callback to the exporter, where
 the exporter can decide, if it intends to handle these kind of cases,
 on the best way forward.

 The exporter might, for example, decide to migrate backing storage,

 That's a decision which the exporter can not take.  Think about it...

 If subsystem Y has mapped the buffer, it could be accessing the buffer's
 backing storage at the same time that subsystem Z tries to attach to the
 buffer.

The *theory* is that Y is map/unmap'ing the buffer around each use, so
there will be some point where things could be migrated and remapped..
in practice, I am not sure that anyone is doing this yet.

Probably it would be reasonable, if a more restrictive subsystem tried
to attach after the buffer was already allocated and mapped in a way
that doesn't meet the new constraints, to return -EBUSY.

But from a quick look it seems like there needs to be a slight fixup
to not return 0 if calc_constraints() fails..

 Once the buffer has been exported to another user, the exporter has
 effectively lost control over mediating accesses to that buffer.

 All that it can do with the way the dma-buf API is today is to allocate
 a _different_ scatter list pointing at the same backing storage which
 satisfies the segment size and number of segments, etc.

 There's also another issue which you haven't addressed.  What if several
 attachments result in lowering max_segment_size and max_segment_count
 such that:

 max_segment_size * max_segment_count < dmabuf->size

 but individually, the attachments allow dmabuf->size to be represented
 as a scatterlist?

Quite possibly for some of these edge cases, some of the dma-buf
exporters are going to need to get more clever (ie. hand off different
scatterlists to different clients).  Although I think by far the two
common cases will be "I can support anything via an iommu/mmu" and "I
need phys contig".

But that isn't an issue w/ dma-buf itself, so much as it is an issue
w/ drivers.  I guess there would be more interest in fixing up drivers
when actual hw comes along that needs it..

BR,
-R

 If an exporter were to take notice of the max_segment_size and
 max_segment_count, the resulting buffer is basically unrepresentable
 as a scatterlist.

  Please consider the possible sequences of use (such as the scenario
  above) when creating or augmenting an API.
 

 I tried to think of all the scenarios I could, but if you still
 feel this approach doesn't help with your concerns, I'll graciously
 accept advice to improve it.

 See the new one above :)



Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-01-29 Thread Rob Clark
On Thu, Jan 29, 2015 at 2:26 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Thu, Jan 29, 2015 at 01:52:09PM -0500, Rob Clark wrote:
 Quite possibly for some of these edge cases, some of the dma-buf
 exporters are going to need to get more clever (ie. hand off
 different scatterlists to different clients).  Although I think by far
 the two common cases will be "I can support anything via an iommu/mmu"
 and "I need phys contig".

 But that isn't an issue w/ dma-buf itself, so much as it is an issue
 w/ drivers.  I guess there would be more interest in fixing up drivers
 when actual hw comes along that needs it..

 However, validating the attachments is the business of dma-buf.  This
 is actual infrastructure, which should ensure some kind of sanity such
 as the issues I've raised.


My initial thought is for dma-buf to not try to prevent something that
an exporter can actually do.. I think the scenario you describe could
be handled by two sg-lists, if the exporter was clever enough.

That all said, I think probably all the existing exporters cache the
sg-list.  And I can't think of any actual hw which would hit this
problem that can be solved by multiple sg-lists for the same physical
memory.  (And the constraint calculation kind of assumes the end
result will be a single sg-list.)  So it seems reasonable to me to
check that max_segment_count * max_segment_size is not smaller than
the buffer.
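That check amounts to a one-line predicate; modeled as a hypothetical helper (not kernel code), it asks whether the intersected constraints can still cover the whole buffer with a single scatterlist:

```c
#include <assert.h>
#include <stdint.h>

/* After the per-attachment constraints have been intersected, the
 * buffer is only representable as one sg-list if segment size times
 * segment count still covers the whole buffer. Hypothetical helper. */
static int constraints_representable(uint64_t buf_size,
                                     uint64_t max_segment_size,
                                     uint32_t max_segment_count)
{
    return max_segment_size * max_segment_count >= buf_size;
}
```

Running it at attach time and refusing the attach when it fails would catch the "union too small" condition up front instead of leaving it to each driver.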

If it was a less theoretical problem, I think I'd more inclined for a
way that the exporter could override the checks, or something along
those lines.

otoh, if the attachment is just not possible because the buffer has
already been allocated and mapped by someone with more relaxed
constraints.. then I think the driver should be the one returning the
error, since dma-buf doesn't know this.

 The whole "we can push it onto our users" approach is really not on - what that
 results in is the users ignoring most of the requirements and just doing
 their own thing, which ultimately ends up with the whole thing turning
 into a disgusting mess - one which becomes very difficult to fix later.

Ideally at some point, dma-mapping or some helpers would support
allocations matching constraints..  I think only actual gpu drivers
want to do crazy enough things that they'd want to bypass dma-mapping.
If everyone else can use dma-mapping and/or helpers then we make it
harder for drivers to do the wrong thing than to do the right thing.

 Now, if we're going to do the more clever thing you mention above,
 that rather negates the point of this two-part patch set, which is to
 provide the union of the DMA capabilities of all users.  A union in
 that case is no longer sane as we'd be tailoring the SG lists to each
 user.

It doesn't really negate.. a different sg list representing the same
physical memory cannot suddenly make the buffer physically contiguous
(from the perspective of memory)..

(unless we are not on the same page here, so to speak)

BR,
-R

 If we aren't going to do the more clever thing, then yes, we need this
 code to calculate that union, but we _also_ need it to do sanity checking
 right from the start, and refuse conditions which ultimately break the
 ability to make use of that union - in other words, when the union of
 the DMA capabilities means that the dmabuf can't be represented.

 Unless we do that, we'll just end up with random drivers interpreting
 what they want from the DMA capabilities, and we'll have some drivers
 exporting (eg) scatterlists which satisfy the maximum byte size of an
 element, but ignoring the maximum number of entries or vice versa, and
 that'll most probably hide the case of too small a union.

 It really doesn't make sense to do both either: that route is even more
 madness, because we'll end up with two classes of drivers - those which
 use the union approach, and those which don't.

 The KISS principle applies here.

 --
 FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
 according to speedtest.net.
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

2015-01-29 Thread Rob Clark
On Thu, Jan 29, 2015 at 5:31 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Thu, Jan 29, 2015 at 05:18:33PM -0500, Rob Clark wrote:
 On Thu, Jan 29, 2015 at 2:26 PM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  Now, if we're going to do the more clever thing you mention above,
  that rather negates the point of this two-part patch set, which is to
  provide the union of the DMA capabilities of all users.  A union in
  that case is no longer sane as we'd be tailoring the SG lists to each
  user.

 It doesn't really negate.. a different sg list representing the same
 physical memory cannot suddenly make the buffer physically contiguous
 (from the perspective of memory)..

 (unless we are not on the same page here, so to speak)

 If we are really only interested in the physically contiguous vs
 scattered differentiation, why can't this be just a simple flag?

I'd be fine with that..  I was trying to make it a bit less of a point
solution, but maybe trying to be too generic is not worth it..

There is apparently some hw which has iommu's but small # of tlb
entries, and would prefer partially contiguous buffers.  But that
isn't a hard constraint, and maybe shouldn't be solved w/
max_segment_count.  And I'm not sure how common that is.
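The "simple flag" idea above boils down to asking whether the exporter's segment list covers one physically contiguous range. A minimal userspace sketch of that test, with invented types (not the kernel's scatterlist API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for one sg-list entry */
struct segment {
	uint64_t phys;	/* physical start address */
	uint64_t len;	/* length in bytes */
};

/* An importer without an IOMMU only cares about this one property:
 * each segment must start exactly where the previous one ended. */
static bool segments_contiguous(const struct segment *segs, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (segs[i].phys != segs[i - 1].phys + segs[i - 1].len)
			return false;
	}
	return n > 0;
}
```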

 I think I know where you're coming from on that distinction - most
 GPUs can cope with their buffers being discontiguous in memory, but
 scanout and capture hardware tends to need contiguous buffers.

 My guess is that you're looking for some way that a GPU driver could
 allocate a buffer, which can then be imported into the scanout
 hardware - and when it is, the underlying backing store is converted
 to a contiguous buffer.  Is that the usage scenario you're thinking
 of?

Pretty much..  and maybe a few slight permutations on that involving
cameras / video codecs / etc.  But the really-really common case is
gpu (with mmu/iommu) + display (without).  Just solving this problem
would be a really good first step.

BR,
-R




Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-19 Thread Rob Clark
, 0 if the wait timed out, or the
 + * remaining timeout in jiffies on success.
 + */
 +long

 Shouldn't this be signed to be explicit?

 +fence_default_wait(struct fence *fence, bool intr, signed long timeout)
 +{
 + struct default_wait_cb cb;
 + unsigned long flags;
 + long ret = timeout;
 + bool was_set;
 +
 + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 + return timeout;
 +
 + spin_lock_irqsave(&fence->lock, flags);
 +
 + if (intr && signal_pending(current)) {
 + ret = -ERESTARTSYS;
 + goto out;
 + }
 +
 + was_set = test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
 +
 + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 + goto out;
 +
 + if (!was_set) {
 + trace_fence_enable_signal(fence);
 +
 + if (!fence->ops->enable_signaling(fence)) {
 + __fence_signal(fence);
 + goto out;
 + }
 + }
 +
 + cb.base.func = fence_default_wait_cb;
 + cb.task = current;
 + list_add(&cb.base.node, &fence->cb_list);
 +
 + while (!test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
 + if (intr)
 + __set_current_state(TASK_INTERRUPTIBLE);
 + else
 + __set_current_state(TASK_UNINTERRUPTIBLE);
 + spin_unlock_irqrestore(&fence->lock, flags);
 +
 + ret = schedule_timeout(ret);
 +
 + spin_lock_irqsave(&fence->lock, flags);
 + if (ret > 0 && intr && signal_pending(current))
 + ret = -ERESTARTSYS;
 + }
 +
 + if (!list_empty(&cb.base.node))
 + list_del(&cb.base.node);
 + __set_current_state(TASK_RUNNING);
 +
 +out:
 + spin_unlock_irqrestore(&fence->lock, flags);
 + return ret;
 +}
 +EXPORT_SYMBOL(fence_default_wait);
 +
 +/**
 + * __fence_init - Initialize a custom fence.
 + * @fence:   [in]the fence to initialize
 + * @ops: [in]the fence_ops for operations on this fence
 + * @lock:[in]the irqsafe spinlock to use for locking this fence
 + * @context: [in]the execution context this fence is run on
 + * @seqno:   [in]a linear increasing sequence number for this context
 + *
 + * Initializes an allocated fence, the caller doesn't have to keep its
 + * refcount after committing with this fence, but it will need to hold a
 + * refcount again if fence_ops.enable_signaling gets called. This can
 + be used for implementing other types of fence.
 + *
 + * context and seqno are used for easy comparison between fences, allowing
 + * to check which fence is later by simply using fence_later.
 + */
 +void
 +__fence_init(struct fence *fence, const struct fence_ops *ops,
 +  spinlock_t *lock, unsigned context, unsigned seqno)
 +{
 + BUG_ON(!lock);
 + BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
 +!ops->get_driver_name || !ops->get_timeline_name);
 +
 + kref_init(&fence->refcount);
 + fence->ops = ops;
 + INIT_LIST_HEAD(&fence->cb_list);
 + fence->lock = lock;
 + fence->context = context;
 + fence->seqno = seqno;
 + fence->flags = 0UL;
 +
 + trace_fence_init(fence);
 +}
 +EXPORT_SYMBOL(__fence_init);

 Again with the __ exported function...

 I don't even see a fence_init() function anywhere, why the __ ?


think of it as a 'protected' constructor.. only the derived fence
subclass should call.
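The "protected constructor" pattern being described can be sketched in plain userspace C, with invented names (this is not the fence API itself, just the shape of it):

```c
#include <stdlib.h>

/* Base object; drivers embed this in their own fence type. */
struct base_fence {
	const char *driver;
	unsigned context;
	unsigned seqno;
};

/* "protected": only a subclass constructor should call this */
static void __base_fence_init(struct base_fence *f, const char *driver,
			      unsigned context, unsigned seqno)
{
	f->driver = driver;
	f->context = context;
	f->seqno = seqno;
}

/* A derived fence embeds the base and initializes it itself;
 * external code only ever calls the subclass constructor. */
struct hw_fence {
	struct base_fence base;
	unsigned irq;	/* subclass-specific state */
};

static struct hw_fence *hw_fence_create(unsigned context, unsigned seqno)
{
	struct hw_fence *hf = calloc(1, sizeof(*hf));

	if (!hf)
		return NULL;
	__base_fence_init(&hf->base, "hw-demo", context, seqno);
	return hf;
}
```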


 diff --git a/include/linux/fence.h b/include/linux/fence.h
 new file mode 100644
 index ..65f2a01ee7e4
 --- /dev/null
 +++ b/include/linux/fence.h
 @@ -0,0 +1,333 @@
 +/*
 + * Fence mechanism for dma-buf to allow for asynchronous dma access
 + *
 + * Copyright (C) 2012 Canonical Ltd
 + * Copyright (C) 2012 Texas Instruments
 + *
 + * Authors:
 + * Rob Clark robdcl...@gmail.com
 + * Maarten Lankhorst maarten.lankho...@canonical.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License version 2 as published 
 by
 + * the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program.  If not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#ifndef __LINUX_FENCE_H
 +#define __LINUX_FENCE_H
 +
 +#include <linux/err.h>
 +#include <linux/wait.h>
 +#include <linux/list.h>
 +#include <linux/bitops.h>
 +#include <linux/kref.h>
 +#include <linux/sched.h>
 +#include <linux/printk.h>
 +
 +struct fence;
 +struct fence_ops;
 +struct fence_cb;
 +
 +/**
 + * struct fence - software synchronization primitive
 + * @refcount: refcount for this fence
 + * @ops: fence_ops associated with this fence
 + * @cb_list: list of all

Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-19 Thread Rob Clark
On Thu, Jun 19, 2014 at 1:00 PM, Greg KH gre...@linuxfoundation.org wrote:
 On Thu, Jun 19, 2014 at 10:00:18AM -0400, Rob Clark wrote:
 On Wed, Jun 18, 2014 at 9:13 PM, Greg KH gre...@linuxfoundation.org wrote:
  On Wed, Jun 18, 2014 at 12:36:54PM +0200, Maarten Lankhorst wrote:
  +#define CREATE_TRACE_POINTS
   +#include <trace/events/fence.h>
  +
  +EXPORT_TRACEPOINT_SYMBOL(fence_annotate_wait_on);
  +EXPORT_TRACEPOINT_SYMBOL(fence_emit);
 
  Are you really willing to live with these as tracepoints for forever?
  What is the use of them in debugging?  Was it just for debugging the
  fence code, or for something else?
 
  +/**
  + * fence_context_alloc - allocate an array of fence contexts
  + * @num: [in]amount of contexts to allocate
  + *
  + * This function will return the first index of the number of fences 
  allocated.
   + * The fence context is used for setting fence->context to a unique 
  number.
  + */
  +unsigned fence_context_alloc(unsigned num)
  +{
  + BUG_ON(!num);
   + return atomic_add_return(num, &fence_context_counter) - num;
  +}
  +EXPORT_SYMBOL(fence_context_alloc);
 
  EXPORT_SYMBOL_GPL()?  Same goes for all of the exports in here.
  Traditionally all of the driver core exports have been with this
  marking, any objection to making that change here as well?

 tbh, I prefer EXPORT_SYMBOL()..  well, I'd prefer even more if there
 wasn't even a need for EXPORT_SYMBOL_GPL(), but sadly it is a fact of
 life.  We already went through this debate once with dma-buf.  We
 aren't going to change $evil_vendor's mind about non-gpl modules.  The
 only result will be a more flugly convoluted solution (ie. use syncpt
 EXPORT_SYMBOL() on top of fence EXPORT_SYMBOL_GPL()) just as a
 workaround, with the result that no-one benefits.

 It has been proven that using _GPL() exports have caused companies to
 release their code properly over the years, so as these really are
 Linux-only apis, please change them to be marked this way, it helps
 everyone out in the end.

Well, maybe that is the true in some cases.  But it certainly didn't
work out that way for dma-buf.  And I think the end result is worse.

I don't really like coming down on the side of EXPORT_SYMBOL() instead
of EXPORT_SYMBOL_GPL(), but if we do use EXPORT_SYMBOL_GPL() then the
result will only be creative workarounds using the _GPL symbols
indirectly by whatever is available via EXPORT_SYMBOL().  I don't
really see how that will be better.

  +int __fence_signal(struct fence *fence)
  +{
  + struct fence_cb *cur, *tmp;
  + int ret = 0;
  +
  + if (WARN_ON(!fence))
  + return -EINVAL;
  +
   + if (!ktime_to_ns(fence->timestamp)) {
   + fence->timestamp = ktime_get();
   + smp_mb__before_atomic();
   + }
   +
   + if (test_and_set_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
   + ret = -EINVAL;
   +
   + /*
   +  * we might have raced with the unlocked fence_signal,
   +  * still run through all callbacks
   +  */
   + } else
   + trace_fence_signaled(fence);
   +
   + list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
   + list_del_init(&cur->node);
   + cur->func(fence, cur);
  + }
  + return ret;
  +}
  +EXPORT_SYMBOL(__fence_signal);
 
  Don't export a function with __ in front of it, you are saying that an
  internal function is somehow valid for everyone else to call?  Why?
  You aren't even documenting the function, so who knows how to use it?

 fwiw, the __ versions appear to mainly be concessions for android
 syncpt.  That is the only user outside of fence.c, and it should stay
 that way.

 How are you going to ensure this?  And where did you document it?
 Please fix this up, it's a horrid way to create a new api.

 If the android code needs to be fixed to fit into this model, then fix
 it.

heh, and in fact I was wrong about this.. the __ versions are actually
for when the lock is already held.  Maarten needs to rename (ie
_locked suffix) and add some API docs for this.
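The locked/unlocked convention mentioned here is a common kernel pattern; a minimal userspace sketch with pthreads and invented names (not the fence API):

```c
#include <pthread.h>
#include <stdbool.h>

struct demo_fence {
	pthread_mutex_t lock;
	bool signaled;
};

/* "_locked" variant: caller must already hold f->lock */
static int demo_fence_signal_locked(struct demo_fence *f)
{
	if (f->signaled)
		return -1;	/* already signaled */
	f->signaled = true;
	return 0;
}

/* Plain variant: takes the lock itself, then delegates.  Exporting
 * only this one (and keeping the _locked variant private) is the
 * API-hygiene point being made above. */
static int demo_fence_signal(struct demo_fence *f)
{
	int ret;

	pthread_mutex_lock(&f->lock);
	ret = demo_fence_signal_locked(f);
	pthread_mutex_unlock(&f->lock);
	return ret;
}
```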

  +void
  +__fence_init(struct fence *fence, const struct fence_ops *ops,
  +  spinlock_t *lock, unsigned context, unsigned seqno)
  +{
  + BUG_ON(!lock);
   + BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
   +!ops->get_driver_name || !ops->get_timeline_name);
   +
   + kref_init(&fence->refcount);
   + fence->ops = ops;
   + INIT_LIST_HEAD(&fence->cb_list);
   + fence->lock = lock;
   + fence->context = context;
   + fence->seqno = seqno;
   + fence->flags = 0UL;
  +
  + trace_fence_init(fence);
  +}
  +EXPORT_SYMBOL(__fence_init);
 
  Again with the __ exported function...
 
  I don't even see a fence_init() function anywhere, why the __ ?
 

 think of it as a 'protected' constructor.. only the derived fence
 subclass should call.

 Where do you say this?  Again, not a good reason, fix up the api please.

   + kref_get(&fence->refcount);
  +}
 
  Why

Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-19 Thread Rob Clark
On Thu, Jun 19, 2014 at 2:19 PM, Greg KH gre...@linuxfoundation.org wrote:
 On Thu, Jun 19, 2014 at 01:45:30PM -0400, Rob Clark wrote:
 On Thu, Jun 19, 2014 at 1:00 PM, Greg KH gre...@linuxfoundation.org wrote:
  On Thu, Jun 19, 2014 at 10:00:18AM -0400, Rob Clark wrote:
  On Wed, Jun 18, 2014 at 9:13 PM, Greg KH gre...@linuxfoundation.org 
  wrote:
   On Wed, Jun 18, 2014 at 12:36:54PM +0200, Maarten Lankhorst wrote:
   +#define CREATE_TRACE_POINTS
    +#include <trace/events/fence.h>
   +
   +EXPORT_TRACEPOINT_SYMBOL(fence_annotate_wait_on);
   +EXPORT_TRACEPOINT_SYMBOL(fence_emit);
  
   Are you really willing to live with these as tracepoints for forever?
   What is the use of them in debugging?  Was it just for debugging the
   fence code, or for something else?
  
   +/**
   + * fence_context_alloc - allocate an array of fence contexts
   + * @num: [in]amount of contexts to allocate
   + *
   + * This function will return the first index of the number of fences 
   allocated.
    + * The fence context is used for setting fence->context to a unique 
   number.
   + */
   +unsigned fence_context_alloc(unsigned num)
   +{
   + BUG_ON(!num);
    + return atomic_add_return(num, &fence_context_counter) - num;
   +}
   +EXPORT_SYMBOL(fence_context_alloc);
  
   EXPORT_SYMBOL_GPL()?  Same goes for all of the exports in here.
   Traditionally all of the driver core exports have been with this
   marking, any objection to making that change here as well?
 
  tbh, I prefer EXPORT_SYMBOL()..  well, I'd prefer even more if there
  wasn't even a need for EXPORT_SYMBOL_GPL(), but sadly it is a fact of
  life.  We already went through this debate once with dma-buf.  We
  aren't going to change $evil_vendor's mind about non-gpl modules.  The
  only result will be a more flugly convoluted solution (ie. use syncpt
  EXPORT_SYMBOL() on top of fence EXPORT_SYMBOL_GPL()) just as a
  workaround, with the result that no-one benefits.
 
  It has been proven that using _GPL() exports have caused companies to
  release their code properly over the years, so as these really are
  Linux-only apis, please change them to be marked this way, it helps
  everyone out in the end.

 Well, maybe that is the true in some cases.  But it certainly didn't
 work out that way for dma-buf.  And I think the end result is worse.

 I don't really like coming down on the side of EXPORT_SYMBOL() instead
 of EXPORT_SYMBOL_GPL(), but if we do use EXPORT_SYMBOL_GPL() then the
 result will only be creative workarounds using the _GPL symbols
 indirectly by whatever is available via EXPORT_SYMBOL().  I don't
 really see how that will be better.

 You are saying that you _know_ companies will violate our license, so
 you should just give up?  And how do you know people aren't working on
 preventing those indirect usages as well?  :)

Well, all I know is what happened with dmabuf.  This seems like the
exact same scenario (same vendor, same driver, same use-case).

Not really sure how we could completely prevent indirect usage, given
that drm core and many of the drivers are dual MIT/GPL.   (But ofc,
IANAL.)

 Sorry, I'm not going to give up here, again, it has proven to work in
 the past in changing the ways of _very_ large companies, why stop now?

In the general case, I would agree.  But in this specific case, I am
not very optimistic.

That said, it isn't really my loss if it is _GPL()..  I don't have to
use or support that particular driver.  But given that we have some
history from the same debate with dma-buf, I think it is pretty easy
to infer the result from making fence EXPORT_SYMBOL_GPL().

BR,
-R

 thanks,

 greg k-h


Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-19 Thread Rob Clark
On Thu, Jun 19, 2014 at 5:50 PM, Dave Airlie airl...@gmail.com wrote:
 On 20 June 2014 04:19, Greg KH gre...@linuxfoundation.org wrote:
 On Thu, Jun 19, 2014 at 01:45:30PM -0400, Rob Clark wrote:
 On Thu, Jun 19, 2014 at 1:00 PM, Greg KH gre...@linuxfoundation.org wrote:
  On Thu, Jun 19, 2014 at 10:00:18AM -0400, Rob Clark wrote:
  On Wed, Jun 18, 2014 at 9:13 PM, Greg KH gre...@linuxfoundation.org 
  wrote:
   On Wed, Jun 18, 2014 at 12:36:54PM +0200, Maarten Lankhorst wrote:
   +#define CREATE_TRACE_POINTS
    +#include <trace/events/fence.h>
   +
   +EXPORT_TRACEPOINT_SYMBOL(fence_annotate_wait_on);
   +EXPORT_TRACEPOINT_SYMBOL(fence_emit);
  
   Are you really willing to live with these as tracepoints for forever?
   What is the use of them in debugging?  Was it just for debugging the
   fence code, or for something else?
  
   +/**
   + * fence_context_alloc - allocate an array of fence contexts
   + * @num: [in]amount of contexts to allocate
   + *
   + * This function will return the first index of the number of fences 
   allocated.
    + * The fence context is used for setting fence->context to a unique 
   number.
   + */
   +unsigned fence_context_alloc(unsigned num)
   +{
   + BUG_ON(!num);
    + return atomic_add_return(num, &fence_context_counter) - num;
   +}
   +EXPORT_SYMBOL(fence_context_alloc);
  
   EXPORT_SYMBOL_GPL()?  Same goes for all of the exports in here.
   Traditionally all of the driver core exports have been with this
   marking, any objection to making that change here as well?
 
  tbh, I prefer EXPORT_SYMBOL()..  well, I'd prefer even more if there
  wasn't even a need for EXPORT_SYMBOL_GPL(), but sadly it is a fact of
  life.  We already went through this debate once with dma-buf.  We
  aren't going to change $evil_vendor's mind about non-gpl modules.  The
  only result will be a more flugly convoluted solution (ie. use syncpt
  EXPORT_SYMBOL() on top of fence EXPORT_SYMBOL_GPL()) just as a
  workaround, with the result that no-one benefits.
 
  It has been proven that using _GPL() exports have caused companies to
  release their code properly over the years, so as these really are
  Linux-only apis, please change them to be marked this way, it helps
  everyone out in the end.

 Well, maybe that is the true in some cases.  But it certainly didn't
 work out that way for dma-buf.  And I think the end result is worse.

 I don't really like coming down on the side of EXPORT_SYMBOL() instead
 of EXPORT_SYMBOL_GPL(), but if we do use EXPORT_SYMBOL_GPL() then the
 result will only be creative workarounds using the _GPL symbols
 indirectly by whatever is available via EXPORT_SYMBOL().  I don't
 really see how that will be better.

 You are saying that you _know_ companies will violate our license, so
 you should just give up?  And how do you know people aren't working on
 preventing those indirect usages as well?  :)

 Sorry, I'm not going to give up here, again, it has proven to work in
 the past in changing the ways of _very_ large companies, why stop now?

 I've found large companies shipping lots of hw putting pressure on
 other large/small companies seems to be only way this has ever
 happened, we'd like to cover that up and say it's some great GPL
 enforcement thing.

 To be honest, author's choice is how I'd treat this.

 Personally I think _GPL is broken by design, and that Linus's initial
 point for them has been so diluted by random lobby groups asking for
 every symbol to be _GPL that they are becoming effectively pointless
 now. I also dislike the fact that the lobby groups don't just bring
 violators to court. I'm also sure someone like the LF could have a
 nice income stream if Linus gave them permission to enforce his
 copyrights.

 But anyways, has someone checked that iOS or Windows don't have a
 fence interface? so we know that this is a Linux only interface and
 any works using it are derived? Say the nvidia driver isn't a derived
 work now, will using this interface magically translate it into a
 derived work, so we can go sue them? I don't think so.

I've no ideas about what the APIs are in windows, but windows has had
multi-gpu support for a *long* time, which implies some mechanism like
dmabuf and fence.. this isn't exactly an area where we are
trailblazing here.

BR,
-R


 But its up to Maarten and Rob, and if they say no _GPL then I don't
 think we should be overriding authors intents.

 Dave.


Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-18 Thread Rob Clark
;
 +
 + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 + return timeout;
 +
 + spin_lock_irqsave(&fence->lock, flags);
 +
 + if (intr && signal_pending(current)) {
 + ret = -ERESTARTSYS;
 + goto out;
 + }
 +
 + was_set = test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
 +
 + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 + goto out;
 +
 + if (!was_set) {
 + trace_fence_enable_signal(fence);
 +
 + if (!fence->ops->enable_signaling(fence)) {
 + __fence_signal(fence);
 + goto out;
 + }
 + }
 +
 + cb.base.func = fence_default_wait_cb;
 + cb.task = current;
 + list_add(&cb.base.node, &fence->cb_list);
 +
 + while (!test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
 + if (intr)
 + __set_current_state(TASK_INTERRUPTIBLE);
 + else
 + __set_current_state(TASK_UNINTERRUPTIBLE);
 + spin_unlock_irqrestore(&fence->lock, flags);
 +
 + ret = schedule_timeout(ret);
 +
 + spin_lock_irqsave(&fence->lock, flags);
 + if (ret > 0 && intr && signal_pending(current))
 + ret = -ERESTARTSYS;
 + }
 +
 + if (!list_empty(&cb.base.node))
 + list_del(&cb.base.node);
 + __set_current_state(TASK_RUNNING);
 +
 +out:
 + spin_unlock_irqrestore(&fence->lock, flags);
 + return ret;
 +}
 +EXPORT_SYMBOL(fence_default_wait);
 +
 +/**
 + * __fence_init - Initialize a custom fence.
 + * @fence:   [in]the fence to initialize
 + * @ops: [in]the fence_ops for operations on this fence
 + * @lock:[in]the irqsafe spinlock to use for locking this fence
 + * @context: [in]the execution context this fence is run on
 + * @seqno:   [in]a linear increasing sequence number for this context
 + *
 + * Initializes an allocated fence, the caller doesn't have to keep its
 + * refcount after committing with this fence, but it will need to hold a
 + * refcount again if fence_ops.enable_signaling gets called. This can
 + be used for implementing other types of fence.
 + *
 + * context and seqno are used for easy comparison between fences, allowing
 + * to check which fence is later by simply using fence_later.
 + */
 +void
 +__fence_init(struct fence *fence, const struct fence_ops *ops,
 +  spinlock_t *lock, unsigned context, unsigned seqno)
 +{
 + BUG_ON(!lock);
 + BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
 +!ops->get_driver_name || !ops->get_timeline_name);
 +
 + kref_init(&fence->refcount);
 + fence->ops = ops;
 + INIT_LIST_HEAD(&fence->cb_list);
 + fence->lock = lock;
 + fence->context = context;
 + fence->seqno = seqno;
 + fence->flags = 0UL;
 +
 + trace_fence_init(fence);
 +}
 +EXPORT_SYMBOL(__fence_init);

 Again with the __ exported function...

 I don't even see a fence_init() function anywhere, why the __ ?


 diff --git a/include/linux/fence.h b/include/linux/fence.h
 new file mode 100644
 index ..65f2a01ee7e4
 --- /dev/null
 +++ b/include/linux/fence.h
 @@ -0,0 +1,333 @@
 +/*
 + * Fence mechanism for dma-buf to allow for asynchronous dma access
 + *
 + * Copyright (C) 2012 Canonical Ltd
 + * Copyright (C) 2012 Texas Instruments
 + *
 + * Authors:
 + * Rob Clark robdcl...@gmail.com
 + * Maarten Lankhorst maarten.lankho...@canonical.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License version 2 as published 
 by
 + * the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program.  If not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#ifndef __LINUX_FENCE_H
 +#define __LINUX_FENCE_H
 +
 +#include <linux/err.h>
 +#include <linux/wait.h>
 +#include <linux/list.h>
 +#include <linux/bitops.h>
 +#include <linux/kref.h>
 +#include <linux/sched.h>
 +#include <linux/printk.h>
 +
 +struct fence;
 +struct fence_ops;
 +struct fence_cb;
 +
 +/**
 + * struct fence - software synchronization primitive
 + * @refcount: refcount for this fence
 + * @ops: fence_ops associated with this fence
 + * @cb_list: list of all callbacks to call
 + * @lock: spin_lock_irqsave used for locking
 + * @context: execution context this fence belongs to, returned by
 + *   fence_context_alloc()
 + * @seqno: the sequence number of this fence inside the execution context,
 + * can be compared to decide which fence would be signaled later.
 + * @flags: A mask of FENCE_FLAG_* defined below
 + * @timestamp: Timestamp when the fence was signaled

Re: [REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-18 Thread Rob Clark
 to bool. Rework dma_fence_wait. Added
 dma_fence_is_signaled and dma_fence_wait_timeout.
 s/dma// and change exports to non GPL. Added fence_is_signaled and
 fence_enable_sw_signaling calls, add ability to override default
 wait operation.
 v10: remove event_queue, use a custom list, export try_to_wake_up from
 scheduler. Remove fence lock and use a global spinlock instead,
 this should hopefully remove all the locking headaches I was having
 on trying to implement this. enable_signaling is called with this
 lock held.
 v11:
 Use atomic ops for flags, lifting the need for some spin_lock_irqsaves.
 However I kept the guarantee that after fence_signal returns, it is
 guaranteed that enable_signaling has either been called to completion,
 or will not be called any more.

 Add contexts and seqno to base fence implementation. This allows you
 to wait for less fences, by testing for seqno + signaled, and then only
 wait on the later fence.

 Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier.
 An CONFIG_DEBUG_FENCE will be added to turn off the FENCE_TRACE
 spam, and another runtime option can turn it off at runtime.
 v12:
 Add CONFIG_FENCE_TRACE. Add missing documentation for the fence-context
 and fence-seqno members.
 v13:
 Fixup CONFIG_FENCE_TRACE kconfig description.
 Move fence_context_alloc to fence.
 Simplify fence_later.
 Kill priv member to fence_cb.
 v14:
 Remove priv argument from fence_add_callback, oops!
 v15:
 Remove priv from documentation.
 Explicitly include linux/atomic.h.
 v16:
 Add trace events.
 Import changes required by android syncpoints.
 v17:
 Use wake_up_state instead of try_to_wake_up. (Colin Cross)
 Fix up commit description for seqno_fence. (Rob Clark)

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
 Signed-off-by: Thierry Reding thierry.red...@gmail.com #use 
 smp_mb__before_atomic()
 Reviewed-by: Rob Clark robdcl...@gmail.com
 ---
  Documentation/DocBook/device-drivers.tmpl |2
  drivers/base/Kconfig  |9 +
  drivers/base/Makefile |2
  drivers/base/fence.c  |  416 
 +
  include/linux/fence.h |  333 +++
  include/trace/events/fence.h  |  128 +
  6 files changed, 889 insertions(+), 1 deletion(-)
  create mode 100644 drivers/base/fence.c
  create mode 100644 include/linux/fence.h
  create mode 100644 include/trace/events/fence.h

 Who is going to sign up to maintain this code?  (hint, it's not me...)

that would be Sumit (dma-buf tree)..

probably we should move fence/reservation/dma-buf into drivers/dma-buf
(or something approximately like that)

BR,
-R


 thanks,

 greg k-h


Re: [PATCH 6/6] dma-buf: add poll support, v2

2014-02-17 Thread Rob Clark
On Mon, Feb 17, 2014 at 10:58 AM, Maarten Lankhorst
maarten.lankho...@canonical.com wrote:
 Thanks to Fengguang Wu for spotting a missing static cast.

 v2:
 - Kill unused variable need_shared.

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
 ---
  drivers/base/dma-buf.c  |  101 
 +++
  include/linux/dma-buf.h |   12 ++
  2 files changed, 113 insertions(+)

 diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
 index 85e792c2c909..77ea621ab59d 100644
 --- a/drivers/base/dma-buf.c
 +++ b/drivers/base/dma-buf.c
 @@ -30,6 +30,7 @@
  #include linux/export.h
  #include linux/debugfs.h
  #include linux/seq_file.h
 +#include linux/poll.h
  #include linux/reservation.h

  static inline int is_dma_buf_file(struct file *);
 @@ -52,6 +53,13 @@ static int dma_buf_release(struct inode *inode, struct 
 file *file)

 BUG_ON(dmabuf->vmapping_counter);

 +   /*
 +* Any fences that a dma-buf poll can wait on should be signaled
 +* before releasing dma-buf. This is the responsibility of each
 +* driver that uses the reservation objects.

minor nit.. but wouldn't hurt to mention in the comment that if you
hit this BUG_ON it is because someone isn't holding a ref to the
dmabuf while there is a pending fence

 +*/
 +   BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
 +
 dmabuf->ops->release(dmabuf);

 mutex_lock(&db_list.lock);
 @@ -108,10 +116,99 @@ static loff_t dma_buf_llseek(struct file *file, loff_t 
 offset, int whence)
 return base + offset;
  }

 +static void dma_buf_poll_cb(struct fence *fence, struct fence_cb *cb)
 +{
 +   struct dma_buf_poll_cb_t *dcb = (struct dma_buf_poll_cb_t *)cb;
 +   unsigned long flags;
 +
 +   spin_lock_irqsave(&dcb->poll->lock, flags);
 +   wake_up_locked_poll(dcb->poll, dcb->active);
 +   dcb->active = 0;
 +   spin_unlock_irqrestore(&dcb->poll->lock, flags);
 +}
 +
 +static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 +{
 +   struct dma_buf *dmabuf;
 +   struct reservation_object *resv;
 +   unsigned long events;
 +
 +   dmabuf = file->private_data;
 +   if (!dmabuf || !dmabuf->resv)
 +   return POLLERR;
 +
 +   resv = dmabuf->resv;
 +
 +   poll_wait(file, &dmabuf->poll, poll);
 +
 +   events = poll_requested_events(poll) & (POLLIN | POLLOUT);
 +   if (!events)
 +   return 0;
 +
 +   ww_mutex_lock(&resv->lock, NULL);
 +
 +   if (resv->fence_excl && (!(events & POLLOUT) ||
 +  resv->fence_shared_count == 0)) {
 +   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
 +   unsigned long pevents = POLLIN;
 +
 +   if (resv->fence_shared_count == 0)
 +   pevents |= POLLOUT;
 +
 +   spin_lock_irq(&dmabuf->poll.lock);
 +   if (dcb->active) {
 +   dcb->active |= pevents;
 +   events &= ~pevents;
 +   } else
 +   dcb->active = pevents;
 +   spin_unlock_irq(&dmabuf->poll.lock);
 +
 +   if (events & pevents) {
 +   if (!fence_add_callback(resv->fence_excl,
 +   &dcb->cb, dma_buf_poll_cb))
 +   events &= ~pevents;
 +   else
 +   // No callback queued, wake up any additional waiters.

couple spots w/ C++ comments, which I assume you didn't mean to leave?

Anyways, other than those minor things,

Reviewed-by: Rob Clark robdcl...@gmail.com

 +   dma_buf_poll_cb(NULL, &dcb->cb);
 +   }
 +   }
 +
 +   if ((events & POLLOUT) && resv->fence_shared_count > 0) {
 +   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
 +   int i;
 +
 +   /* Only queue a new callback if no event has fired yet */
 +   spin_lock_irq(&dmabuf->poll.lock);
 +   if (dcb->active)
 +   events &= ~POLLOUT;
 +   else
 +   dcb->active = POLLOUT;
 +   spin_unlock_irq(&dmabuf->poll.lock);
 +
 +   if (!(events & POLLOUT))
 +   goto out;
 +
 +   for (i = 0; i < resv->fence_shared_count; ++i)
 +   if (!fence_add_callback(resv->fence_shared[i],
 +   &dcb->cb, dma_buf_poll_cb)) {
 +   events &= ~POLLOUT;
 +   break;
 +   }
 +
 +   // No callback queued, wake up any additional waiters.
 +   if (i == resv->fence_shared_count)
 +   dma_buf_poll_cb(NULL, &dcb->cb);
 +   }
 +
 +out:
 +   ww_mutex_unlock(&resv->lock);
 +   return events;
 +}
 +
 +
  static const struct file_operations dma_buf_fops = {
 .release= dma_buf_release,
 .mmap

Re: [PATCH 3/6] dma-buf: use reservation objects

2014-02-17 Thread Rob Clark
 if maybe we just want to promote the 'struct
reservation_object' ptr into 'struct drm_gem_object' so we can have a
common get_prime_res_obj fxn for everyone using GEM?

Anyways, that only matters within drivers/gpu/drm so easy enough to
change it later.. so for the drm/fence/reservation/dmabuf bits:

Reviewed-by: Rob Clark robdcl...@gmail.com

 +
 +   return dma_buf_export(obj, &drm_gem_prime_dmabuf_ops, obj->size,
 + flags, robj);
  }
  EXPORT_SYMBOL(drm_gem_prime_export);

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c 
 b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
 index 59827cc5e770..b5e89f46326e 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
 @@ -187,7 +187,7 @@ struct dma_buf *exynos_dmabuf_prime_export(struct 
 drm_device *drm_dev,
 struct exynos_drm_gem_obj *exynos_gem_obj = to_exynos_gem_obj(obj);

 return dma_buf_export(obj, &exynos_dmabuf_ops,
 -   exynos_gem_obj->base.size, flags);
 +   exynos_gem_obj->base.size, flags, NULL);
  }

  struct drm_gem_object *exynos_dmabuf_prime_import(struct drm_device *drm_dev,
 diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
 b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
 index 9bb533e0d762..ea66f40e95b3 100644
 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
 +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
 @@ -233,7 +233,7 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
  struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
   struct drm_gem_object *gem_obj, int 
 flags)
  {
 -   return dma_buf_export(gem_obj, &i915_dmabuf_ops, gem_obj->size,
 flags);
 +   return dma_buf_export(gem_obj, &i915_dmabuf_ops, gem_obj->size,
 flags, NULL);
  }

  static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
 diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
 b/drivers/gpu/drm/nouveau/nouveau_drm.c
 index 78c8e7146d56..2a15c8e8d199 100644
 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
 +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
 @@ -816,6 +816,7 @@ driver = {
 .gem_prime_export = drm_gem_prime_export,
 .gem_prime_import = drm_gem_prime_import,
 .gem_prime_pin = nouveau_gem_prime_pin,
 +   .gem_prime_res_obj = nouveau_gem_prime_res_obj,
 .gem_prime_unpin = nouveau_gem_prime_unpin,
 .gem_prime_get_sg_table = nouveau_gem_prime_get_sg_table,
 .gem_prime_import_sg_table = nouveau_gem_prime_import_sg_table,
 diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.h 
 b/drivers/gpu/drm/nouveau/nouveau_gem.h
 index 7caca057bc38..ddab762d81fe 100644
 --- a/drivers/gpu/drm/nouveau/nouveau_gem.h
 +++ b/drivers/gpu/drm/nouveau/nouveau_gem.h
 @@ -35,6 +35,7 @@ extern int nouveau_gem_ioctl_info(struct drm_device *, void 
 *,
   struct drm_file *);

  extern int nouveau_gem_prime_pin(struct drm_gem_object *);
 +struct reservation_object *nouveau_gem_prime_res_obj(struct drm_gem_object 
 *);
  extern void nouveau_gem_prime_unpin(struct drm_gem_object *);
  extern struct sg_table *nouveau_gem_prime_get_sg_table(struct drm_gem_object 
 *);
  extern struct drm_gem_object *nouveau_gem_prime_import_sg_table(
 diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c 
 b/drivers/gpu/drm/nouveau/nouveau_prime.c
 index 51a2cb102b44..1f51008e4d26 100644
 --- a/drivers/gpu/drm/nouveau/nouveau_prime.c
 +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
 @@ -102,3 +102,10 @@ void nouveau_gem_prime_unpin(struct drm_gem_object *obj)

 nouveau_bo_unpin(nvbo);
  }
 +
 +struct reservation_object *nouveau_gem_prime_res_obj(struct drm_gem_object 
 *obj)
 +{
 +   struct nouveau_bo *nvbo = nouveau_gem_object(obj);
 +
 +   return nvbo->bo.resv;
 +}
 diff --git a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c 
 b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
 index 4fcca8d42796..a2dbfb1737b4 100644
 --- a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
 +++ b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c
 @@ -171,7 +171,7 @@ static struct dma_buf_ops omap_dmabuf_ops = {
  struct dma_buf *omap_gem_prime_export(struct drm_device *dev,
 struct drm_gem_object *obj, int flags)
  {
 -   return dma_buf_export(obj, &omap_dmabuf_ops, obj->size, flags);
 +   return dma_buf_export(obj, &omap_dmabuf_ops, obj->size, flags, NULL);
  }

  struct drm_gem_object *omap_gem_prime_import(struct drm_device *dev,
 diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
 b/drivers/gpu/drm/radeon/radeon_drv.c
 index 84a1bbb75f91..c15c1a1996fc 100644
 --- a/drivers/gpu/drm/radeon/radeon_drv.c
 +++ b/drivers/gpu/drm/radeon/radeon_drv.c
 @@ -128,6 +128,7 @@ struct drm_gem_object 
 *radeon_gem_prime_import_sg_table(struct drm_device *dev,
 struct sg_table *sg);
  int radeon_gem_prime_pin(struct drm_gem_object *obj);
  void radeon_gem_prime_unpin(struct drm_gem_object *obj

Re: [PATCH 5/6] reservation: add support for fences to enable cross-device synchronisation

2014-02-17 Thread Rob Clark
On Mon, Feb 17, 2014 at 10:58 AM, Maarten Lankhorst
maarten.lankho...@canonical.com wrote:
 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com

Reviewed-by: Rob Clark robdcl...@gmail.com


 ---
  include/linux/reservation.h |   18 +-
  1 file changed, 17 insertions(+), 1 deletion(-)

 diff --git a/include/linux/reservation.h b/include/linux/reservation.h
 index 813dae960ebd..92c4851b5a39 100644
 --- a/include/linux/reservation.h
 +++ b/include/linux/reservation.h
 @@ -6,7 +6,7 @@
   * Copyright (C) 2012 Texas Instruments
   *
   * Authors:
 - * Rob Clark rob.cl...@linaro.org
 + * Rob Clark robdcl...@gmail.com
   * Maarten Lankhorst maarten.lankho...@canonical.com
   * Thomas Hellstrom thellstrom-at-vmware-dot-com
   *
 @@ -40,22 +40,38 @@
  #define _LINUX_RESERVATION_H

  #include <linux/ww_mutex.h>
 +#include <linux/fence.h>

  extern struct ww_class reservation_ww_class;

  struct reservation_object {
 struct ww_mutex lock;
 +
 +   struct fence *fence_excl;
 +   struct fence **fence_shared;
 +   u32 fence_shared_count, fence_shared_max;
  };

  static inline void
  reservation_object_init(struct reservation_object *obj)
  {
 ww_mutex_init(&obj->lock, &reservation_ww_class);
 +
 +   obj->fence_shared_count = obj->fence_shared_max = 0;
 +   obj->fence_shared = NULL;
 +   obj->fence_excl = NULL;
  }

  static inline void
  reservation_object_fini(struct reservation_object *obj)
  {
 +   int i;
 +
 +   if (obj->fence_excl)
 +   fence_put(obj->fence_excl);
 +   for (i = 0; i < obj->fence_shared_count; ++i)
 +   fence_put(obj->fence_shared[i]);
 +
 ww_mutex_destroy(&obj->lock);
  }




Re: [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)

2014-02-17 Thread Rob Clark
On Mon, Feb 17, 2014 at 10:56 AM, Maarten Lankhorst
maarten.lankho...@canonical.com wrote:
 This type of fence can be used with hardware synchronization for simple
 hardware that can block execution until the condition
 (dma_buf[offset] - value) >= 0 has been met.

 A software fallback still has to be provided in case the fence is used
 with a device that doesn't support this mechanism. It is useful to expose
 this for graphics cards that have an op to support this.

 Some cards like i915 can export those, but don't have an option to wait,
 so they need the software fallback.

 I extended the original patch by Rob Clark.

 v1: Original
 v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
 not much was left of the file. Lots of documentation added.
 v3: Use fence_ops instead of custom callbacks. Moved to own file
 to avoid circular dependency between dma-buf.h and fence.h
 v4: Add spinlock pointer to seqno_fence_init

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com


Reviewed-by: Rob Clark robdcl...@gmail.com


 ---
  Documentation/DocBook/device-drivers.tmpl |1
  drivers/base/fence.c  |   50 +
  include/linux/seqno-fence.h   |  109 
 +
  3 files changed, 160 insertions(+)
  create mode 100644 include/linux/seqno-fence.h

 diff --git a/Documentation/DocBook/device-drivers.tmpl 
 b/Documentation/DocBook/device-drivers.tmpl
 index 7a0c9ddb4818..8c85c20942c2 100644
 --- a/Documentation/DocBook/device-drivers.tmpl
 +++ b/Documentation/DocBook/device-drivers.tmpl
 @@ -131,6 +131,7 @@ X!Edrivers/base/interface.c
  !Edrivers/base/dma-buf.c
  !Edrivers/base/fence.c
  !Iinclude/linux/fence.h
 +!Iinclude/linux/seqno-fence.h
  !Edrivers/base/reservation.c
  !Iinclude/linux/reservation.h
  !Edrivers/base/dma-coherent.c
 diff --git a/drivers/base/fence.c b/drivers/base/fence.c
 index 12df2bf62034..cd0937127a89 100644
 --- a/drivers/base/fence.c
 +++ b/drivers/base/fence.c
 @@ -25,6 +25,7 @@
  #include <linux/export.h>
  #include <linux/atomic.h>
  #include <linux/fence.h>
 +#include <linux/seqno-fence.h>

  #define CREATE_TRACE_POINTS
  #include <trace/events/fence.h>
 @@ -413,3 +414,52 @@ __fence_init(struct fence *fence, const struct fence_ops 
 *ops,
 trace_fence_init(fence);
  }
  EXPORT_SYMBOL(__fence_init);
 +
 +static const char *seqno_fence_get_driver_name(struct fence *fence) {
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->get_driver_name(fence);
 +}
 +
 +static const char *seqno_fence_get_timeline_name(struct fence *fence) {
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->get_timeline_name(fence);
 +}
 +
 +static bool seqno_enable_signaling(struct fence *fence)
 +{
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->enable_signaling(fence);
 +}
 +
 +static bool seqno_signaled(struct fence *fence)
 +{
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->signaled && seqno_fence->ops->signaled(fence);
 +}
 +
 +static void seqno_release(struct fence *fence)
 +{
 +   struct seqno_fence *f = to_seqno_fence(fence);
 +
 +   dma_buf_put(f->sync_buf);
 +   if (f->ops->release)
 +   f->ops->release(fence);
 +   else
 +   kfree(f);
 +}
 +
 +static long seqno_wait(struct fence *fence, bool intr, signed long timeout)
 +{
 +   struct seqno_fence *f = to_seqno_fence(fence);
 +   return f->ops->wait(fence, intr, timeout);
 +}
 +
 +const struct fence_ops seqno_fence_ops = {
 +   .get_driver_name = seqno_fence_get_driver_name,
 +   .get_timeline_name = seqno_fence_get_timeline_name,
 +   .enable_signaling = seqno_enable_signaling,
 +   .signaled = seqno_signaled,
 +   .wait = seqno_wait,
 +   .release = seqno_release,
 +};
 +EXPORT_SYMBOL(seqno_fence_ops);
 diff --git a/include/linux/seqno-fence.h b/include/linux/seqno-fence.h
 new file mode 100644
 index ..952f7909128c
 --- /dev/null
 +++ b/include/linux/seqno-fence.h
 @@ -0,0 +1,109 @@
 +/*
 + * seqno-fence, using a dma-buf to synchronize fencing
 + *
 + * Copyright (C) 2012 Texas Instruments
 + * Copyright (C) 2012 Canonical Ltd
 + * Authors:
 + * Rob Clark robdcl...@gmail.com
 + *   Maarten Lankhorst maarten.lankho...@canonical.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License version 2 as published 
 by
 + * the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program

Re: [PATCH 1/6] fence: dma-buf cross-device synchronization (v17)

2014-02-17 Thread Rob Clark
 dma_fence_is_signaled and dma_fence_wait_timeout.
 s/dma// and change exports to non GPL. Added fence_is_signaled and
 fence_enable_sw_signaling calls, add ability to override default
 wait operation.
 v10: remove event_queue, use a custom list, export try_to_wake_up from
 scheduler. Remove fence lock and use a global spinlock instead,
 this should hopefully remove all the locking headaches I was having
 on trying to implement this. enable_signaling is called with this
 lock held.
 v11:
 Use atomic ops for flags, lifting the need for some spin_lock_irqsaves.
 However I kept the guarantee that after fence_signal returns, it is
 guaranteed that enable_signaling has either been called to completion,
 or will not be called any more.

 Add contexts and seqno to base fence implementation. This allows you
 to wait for less fences, by testing for seqno + signaled, and then only
 wait on the later fence.

 Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier.
 A CONFIG_DEBUG_FENCE option will be added to turn off the FENCE_TRACE
 spam, and another runtime option can turn it off at runtime.
 v12:
 Add CONFIG_FENCE_TRACE. Add missing documentation for the fence-context
 and fence-seqno members.
 v13:
 Fixup CONFIG_FENCE_TRACE kconfig description.
 Move fence_context_alloc to fence.
 Simplify fence_later.
 Kill priv member to fence_cb.
 v14:
 Remove priv argument from fence_add_callback, oops!
 v15:
 Remove priv from documentation.
 Explicitly include linux/atomic.h.
 v16:
 Add trace events.
 Import changes required by android syncpoints.
 v17:
 Use wake_up_state instead of try_to_wake_up. (Colin Cross)
 Fix up commit description for seqno_fence. (Rob Clark)

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com

Reviewed-by: Rob Clark robdcl...@gmail.com


 ---
  Documentation/DocBook/device-drivers.tmpl |2
  drivers/base/Kconfig  |9 +
  drivers/base/Makefile |2
  drivers/base/fence.c  |  415 
 +
  include/linux/fence.h |  329 +++
  include/trace/events/fence.h  |  125 +
  6 files changed, 881 insertions(+), 1 deletion(-)
  create mode 100644 drivers/base/fence.c
  create mode 100644 include/linux/fence.h
  create mode 100644 include/trace/events/fence.h

 diff --git a/Documentation/DocBook/device-drivers.tmpl 
 b/Documentation/DocBook/device-drivers.tmpl
 index f5170082bdb3..7a0c9ddb4818 100644
 --- a/Documentation/DocBook/device-drivers.tmpl
 +++ b/Documentation/DocBook/device-drivers.tmpl
 @@ -129,6 +129,8 @@ X!Edrivers/base/interface.c
   </sect1>
   <sect1><title>Device Drivers DMA Management</title>
  !Edrivers/base/dma-buf.c
 +!Edrivers/base/fence.c
 +!Iinclude/linux/fence.h
  !Edrivers/base/reservation.c
  !Iinclude/linux/reservation.h
  !Edrivers/base/dma-coherent.c
 diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
 index ec36e7772e57..b50ad30151ae 100644
 --- a/drivers/base/Kconfig
 +++ b/drivers/base/Kconfig
 @@ -200,6 +200,15 @@ config DMA_SHARED_BUFFER
   APIs extension; the file's descriptor can then be passed on to other
   driver.

 +config FENCE_TRACE
 +   bool "Enable verbose FENCE_TRACE messages"
 +   depends on DMA_SHARED_BUFFER
 +   help
 + Enable the FENCE_TRACE printks. This will add extra
 + spam to the console log, but will make it easier to diagnose
 + lockup related problems for dma-buffers shared across multiple
 + devices.
 +
  config DMA_CMA
 bool "DMA Contiguous Memory Allocator"
 depends on HAVE_DMA_CONTIGUOUS && CMA
 diff --git a/drivers/base/Makefile b/drivers/base/Makefile
 index 04b314e0fa51..eb4864aee073 100644
 --- a/drivers/base/Makefile
 +++ b/drivers/base/Makefile
 @@ -10,7 +10,7 @@ obj-$(CONFIG_DMA_CMA) += dma-contiguous.o
  obj-y  += power/
  obj-$(CONFIG_HAS_DMA)  += dma-mapping.o
  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
 -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o reservation.o
 +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o reservation.o
  obj-$(CONFIG_ISA)  += isa.o
  obj-$(CONFIG_FW_LOADER)+= firmware_class.o
  obj-$(CONFIG_NUMA) += node.o
 diff --git a/drivers/base/fence.c b/drivers/base/fence.c
 new file mode 100644
 index ..12df2bf62034
 --- /dev/null
 +++ b/drivers/base/fence.c
 @@ -0,0 +1,415 @@
 +/*
 + * Fence mechanism for dma-buf and to allow for asynchronous dma access
 + *
 + * Copyright (C) 2012 Canonical Ltd
 + * Copyright (C) 2012 Texas Instruments
 + *
 + * Authors:
 + * Rob Clark robdcl...@gmail.com
 + * Maarten Lankhorst maarten.lankho...@canonical.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License version 2 as published

Re: [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)

2014-02-17 Thread Rob Clark
On Mon, Feb 17, 2014 at 11:56 AM, Christian König
deathsim...@vodafone.de wrote:
 Am 17.02.2014 16:56, schrieb Maarten Lankhorst:

 This type of fence can be used with hardware synchronization for simple
 hardware that can block execution until the condition
  (dma_buf[offset] - value) >= 0 has been met.


 Can't we make that just dma_buf[offset] != 0 instead? As far as I know
 this way it would match the definition M$ uses in their WDDM specification
 and so make it much more likely that hardware supports it.

well 'buf[offset] >= value' at least means the same slot can be used
for multiple operations (with increasing values of 'value').. not sure
if that is something people care about.

>=value seems to be possible with adreno and radeon.  I'm not really sure about
others (although I presume it is at least supported for nv desktop stuff).  For
hw that cannot do >=value, we can either have a different fence implementation
which uses the !=0 approach, or change the seqno-fence implementation later if
needed.  But if someone has hw that can do !=0 but not >=value, speak up now
;-)
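To make the slot-reuse point concrete, here is a userspace sketch (all names are illustrative, this is not the kernel implementation): with a '>= value' compare, one monotonically increasing slot serves a whole stream of fences, while '!= 0' can only ever answer one yes/no question per slot.

```c
#include <stdint.h>

/* One shared "fence page" slot, written by the signalling device. */
static uint32_t slot;

/* Device completes all work up to a given seqno by bumping the slot. */
static void device_complete(uint32_t seqno)
{
	slot = seqno;
}

/* ">= value" semantics: the fence with seqno N is signaled once
 * slot >= N, so one slot serves many fences with increasing seqnos. */
static int fence_signaled_gte(uint32_t seqno)
{
	return slot >= seqno;
}

/* "!= 0" semantics: a slot answers a single yes/no question,
 * so every fence needs its own slot. */
static int fence_signaled_nonzero(const uint32_t *own_slot)
{
	return *own_slot != 0;
}
```

After device_complete(2), fences with seqno 1 and 2 read as signaled and seqno 3 as pending; the !=0 variant would need three separate slots for the same stream.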


 Apart from that I still don't like the idea of leaking a drivers IRQ context
 outside of the driver, but without a proper GPU scheduler there probably
 isn't much alternative.

I guess it will not be an uncommon scenario for a gpu device to just need
to kick the display device to write a few registers for a page flip..
probably best not to schedule a worker just for this (unless the
signalled device otherwise needs to).  I think it is better in this
case to give the signalee some rope to hang themselves, and make it
the responsibility of the callback to kick things off to a worker if
needed.
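The split described here — only trivial work in the signaller's context, with the callback itself deciding when to bounce to a worker — can be sketched in userspace as follows (all names are illustrative; the real kernel entry point is fence_add_callback()):

```c
#include <stddef.h>

/* Illustrative sketch: a fence callback runs in the signaller's
 * (possibly IRQ) context, so it either does a trivial register kick
 * inline or defers heavy work to a worker. */

typedef void (*work_fn)(void *data);

/* Stand-in for a workqueue: one deferred item, picked up by a worker. */
static work_fn pending_fn;
static void *pending_data;

static void queue_work_stub(work_fn fn, void *data)
{
	pending_fn = fn;
	pending_data = data;
}

/* Cheap case: display just writes a few registers for a page flip. */
static int flips;
static void kick_pageflip(void *data)
{
	(void)data;
	flips++;
}

/* Expensive case: must not run in the signaller's context. */
static int cleanups;
static void expensive_cleanup(void *data)
{
	(void)data;
	cleanups++;
}

/* The fence callback: the signalee chooses inline vs. deferred. */
static void fence_signaled_cb(int heavy)
{
	if (heavy)
		queue_work_stub(expensive_cleanup, NULL);
	else
		kick_pageflip(NULL);
}

/* The worker, running later in process context. */
static void run_worker(void)
{
	if (pending_fn) {
		pending_fn(pending_data);
		pending_fn = NULL;
	}
}
```

The design choice is that the cost of a context switch is only paid by callbacks that actually need one, at the price of trusting each callback not to do heavy work in the signaller's context.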

BR,
-R

 Christian.


 A software fallback still has to be provided in case the fence is used
 with a device that doesn't support this mechanism. It is useful to expose
 this for graphics cards that have an op to support this.

 Some cards like i915 can export those, but don't have an option to wait,
 so they need the software fallback.

 I extended the original patch by Rob Clark.

 v1: Original
 v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
  not much was left of the file. Lots of documentation added.
 v3: Use fence_ops instead of custom callbacks. Moved to own file
  to avoid circular dependency between dma-buf.h and fence.h
 v4: Add spinlock pointer to seqno_fence_init

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
 ---
   Documentation/DocBook/device-drivers.tmpl |1
   drivers/base/fence.c  |   50 +
   include/linux/seqno-fence.h   |  109
 +
   3 files changed, 160 insertions(+)
   create mode 100644 include/linux/seqno-fence.h

 diff --git a/Documentation/DocBook/device-drivers.tmpl
 b/Documentation/DocBook/device-drivers.tmpl
 index 7a0c9ddb4818..8c85c20942c2 100644
 --- a/Documentation/DocBook/device-drivers.tmpl
 +++ b/Documentation/DocBook/device-drivers.tmpl
 @@ -131,6 +131,7 @@ X!Edrivers/base/interface.c
   !Edrivers/base/dma-buf.c
   !Edrivers/base/fence.c
   !Iinclude/linux/fence.h
 +!Iinclude/linux/seqno-fence.h
   !Edrivers/base/reservation.c
   !Iinclude/linux/reservation.h
   !Edrivers/base/dma-coherent.c
 diff --git a/drivers/base/fence.c b/drivers/base/fence.c
 index 12df2bf62034..cd0937127a89 100644
 --- a/drivers/base/fence.c
 +++ b/drivers/base/fence.c
 @@ -25,6 +25,7 @@
   #include <linux/export.h>
   #include <linux/atomic.h>
   #include <linux/fence.h>
 +#include <linux/seqno-fence.h>
  #define CREATE_TRACE_POINTS
   #include <trace/events/fence.h>
 @@ -413,3 +414,52 @@ __fence_init(struct fence *fence, const struct
 fence_ops *ops,
 trace_fence_init(fence);
   }
   EXPORT_SYMBOL(__fence_init);
 +
 +static const char *seqno_fence_get_driver_name(struct fence *fence) {
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->get_driver_name(fence);
 +}
 +
 +static const char *seqno_fence_get_timeline_name(struct fence *fence) {
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->get_timeline_name(fence);
 +}
 +
 +static bool seqno_enable_signaling(struct fence *fence)
 +{
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->enable_signaling(fence);
 +}
 +
 +static bool seqno_signaled(struct fence *fence)
 +{
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->signaled && seqno_fence->ops->signaled(fence);
 +}
 +
 +static void seqno_release(struct fence *fence)
 +{
 +   struct seqno_fence *f = to_seqno_fence(fence);
 +
 +   dma_buf_put(f->sync_buf);
 +   if (f->ops->release)
 +   f->ops->release(fence);
 +   else
 +   kfree(f);
 +}
 +
 +static long seqno_wait(struct fence *fence, bool intr, signed long timeout)
 +{
 +   struct seqno_fence *f = to_seqno_fence(fence);
 +   return f->ops->wait

Re: [PATCH 2/6] seqno-fence: Hardware dma-buf implementation of fencing (v4)

2014-02-17 Thread Rob Clark
On Mon, Feb 17, 2014 at 12:36 PM, Christian König
deathsim...@vodafone.de wrote:
 Am 17.02.2014 18:27, schrieb Rob Clark:

 On Mon, Feb 17, 2014 at 11:56 AM, Christian König
 deathsim...@vodafone.de wrote:

 Am 17.02.2014 16:56, schrieb Maarten Lankhorst:

 This type of fence can be used with hardware synchronization for simple
 hardware that can block execution until the condition
  (dma_buf[offset] - value) >= 0 has been met.


 Can't we make that just dma_buf[offset] != 0 instead? As far as I know
 this way it would match the definition M$ uses in their WDDM
 specification
 and so make it much more likely that hardware supports it.

 well 'buf[offset] = value' at least means the same slot can be used
 for multiple operations (with increasing values of 'value').. not sure
 if that is something people care about.

  >=value seems to be possible with adreno and radeon.  I'm not really sure
  about others (although I presume it is at least supported for nv desktop
  stuff).  For hw that cannot do >=value, we can either have a different fence
  implementation which uses the !=0 approach, or change the seqno-fence
  implementation later if needed.  But if someone has hw that can do !=0 but
  not >=value, speak up now ;-)


 Here! Radeon can only do >=value on the DMA and 3D engine, but not with UVD
 or VCE. And for the 3D engine it means draining the pipe, which isn't really
 a good idea.

hmm, ok.. forgot you have a few extra rings compared to me.  Is UVD
re-ordering from decode-order to display-order for you in hw?  If not,
I guess you need sw intervention anyways when a frame is done for
frame re-ordering, so maybe hw->hw sync doesn't really matter as much
as compared to gpu/3d->display.  For dma->3d interactions, seems like
you would care more about hw->hw sync, but I guess you aren't likely
to use GPU A to do a resolve blit for GPU B..

For 3D ring, I assume you probably want a CP_WAIT_FOR_IDLE before a
CP_MEM_WRITE to update fence value in memory (for the one signalling
the fence).  But why would you need that before a CP_WAIT_REG_MEM (for
the one waiting for the fence)?  I don't exactly have documentation
for adreno version of CP_WAIT_REG_{MEM,EQ,GTE}..  but PFP and ME
appear to be same instruction set as r600, so I'm pretty sure they
should have similar capabilities.. CP_WAIT_REG_MEM appears to be same
but with 32bit gpu addresses vs 64b.

BR,
-R

 Christian.



 Apart from that I still don't like the idea of leaking a drivers IRQ
 context
 outside of the driver, but without a proper GPU scheduler there probably
 isn't much alternative.

 I guess it will be not uncommon scenario for gpu device to just need
 to kick display device to write a few registers for a page flip..
 probably best not to schedule a worker just for this (unless the
 signalled device otherwise needs to).  I think it is better in this
 case to give the signalee some rope to hang themselves, and make it
 the responsibility of the callback to kick things off to a worker if
 needed.

 BR,
 -R

 Christian.

 A software fallback still has to be provided in case the fence is used
 with a device that doesn't support this mechanism. It is useful to
 expose
 this for graphics cards that have an op to support this.

 Some cards like i915 can export those, but don't have an option to wait,
 so they need the software fallback.

 I extended the original patch by Rob Clark.

 v1: Original
 v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
   not much was left of the file. Lots of documentation added.
 v3: Use fence_ops instead of custom callbacks. Moved to own file
   to avoid circular dependency between dma-buf.h and fence.h
 v4: Add spinlock pointer to seqno_fence_init

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
 ---
Documentation/DocBook/device-drivers.tmpl |1
drivers/base/fence.c  |   50 +
include/linux/seqno-fence.h   |  109
 +
3 files changed, 160 insertions(+)
create mode 100644 include/linux/seqno-fence.h

 diff --git a/Documentation/DocBook/device-drivers.tmpl
 b/Documentation/DocBook/device-drivers.tmpl
 index 7a0c9ddb4818..8c85c20942c2 100644
 --- a/Documentation/DocBook/device-drivers.tmpl
 +++ b/Documentation/DocBook/device-drivers.tmpl
 @@ -131,6 +131,7 @@ X!Edrivers/base/interface.c
!Edrivers/base/dma-buf.c
!Edrivers/base/fence.c
!Iinclude/linux/fence.h
 +!Iinclude/linux/seqno-fence.h
!Edrivers/base/reservation.c
!Iinclude/linux/reservation.h
!Edrivers/base/dma-coherent.c
 diff --git a/drivers/base/fence.c b/drivers/base/fence.c
 index 12df2bf62034..cd0937127a89 100644
 --- a/drivers/base/fence.c
 +++ b/drivers/base/fence.c
 @@ -25,6 +25,7 @@
#include <linux/export.h>
#include <linux/atomic.h>
#include <linux/fence.h>
 +#include <linux/seqno-fence.h>
  #define CREATE_TRACE_POINTS
#include <trace/events/fence.h>
 @@ -413,3 +414,52 @@ __fence_init(struct fence *fence, const struct

Re: [PATCH] dma-buf: avoid using IS_ERR_OR_NULL

2013-12-21 Thread Rob Clark
On Fri, Dec 20, 2013 at 7:43 PM, Colin Cross ccr...@android.com wrote:
 dma_buf_map_attachment and dma_buf_vmap can return NULL or
 ERR_PTR on a error.  This encourages a common buggy pattern in
 callers:
 sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
 if (IS_ERR_OR_NULL(sgt))
 return PTR_ERR(sgt);

 This causes the caller to return 0 on an error.  IS_ERR_OR_NULL
 is almost always a sign of poorly-defined error handling.

 This patch converts dma_buf_map_attachment to always return
 ERR_PTR, and fixes the callers that incorrectly handled NULL.
 There are a few more callers that were not checking for NULL
 at all, which would have dereferenced a NULL pointer later.
 There are also a few more callers that correctly handled NULL
 and ERR_PTR differently, I left those alone but they could also
 be modified to delete the NULL check.

 This patch also converts dma_buf_vmap to always return NULL.
 All the callers to dma_buf_vmap only check for NULL, and would
 have dereferenced an ERR_PTR and panic'd if one was ever
 returned. This is not consistent with the rest of the dma buf
 APIs, but matches the expectations of all of the callers.
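The buggy pattern above is easy to reproduce in userspace with a small emulation of the ERR_PTR helpers (the helpers below mirror the kernel's definitions; the exporter and caller are illustrative stand-ins):

```c
#include <errno.h>
#include <stddef.h>

/* Userspace emulation of the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* An exporter that (incorrectly, per this patch) reports failure
 * with NULL rather than ERR_PTR. */
static void *map_attachment_returning_null(void)
{
	return NULL;
}

/* The buggy caller pattern from the commit message: when the "error"
 * is NULL, PTR_ERR(NULL) is 0, so the error path returns success. */
static long buggy_caller(void)
{
	void *sgt = map_attachment_returning_null();

	if (IS_ERR_OR_NULL(sgt))
		return PTR_ERR(sgt);	/* 0 for NULL -- looks like success */
	return 1;			/* "mapped" */
}
```

buggy_caller() reports success here even though the mapping failed, which is exactly why the patch makes the API always return ERR_PTR.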

 Signed-off-by: Colin Cross ccr...@android.com
 ---
  drivers/base/dma-buf.c | 18 +++---
  drivers/gpu/drm/drm_prime.c|  2 +-
  drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |  2 +-
  drivers/media/v4l2-core/videobuf2-dma-contig.c |  2 +-
  4 files changed, 14 insertions(+), 10 deletions(-)

 diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
 index 1e16cbd61da2..cfe1d8bc7bb8 100644
 --- a/drivers/base/dma-buf.c
 +++ b/drivers/base/dma-buf.c
 @@ -251,9 +251,8 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
   * @dmabuf:[in]buffer to attach device to.
   * @dev:   [in]device to be attached.
   *
 - * Returns struct dma_buf_attachment * for this attachment; may return 
 negative
 - * error codes.
 - *
 + * Returns struct dma_buf_attachment * for this attachment; returns ERR_PTR 
 on
 + * error.
   */
  struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
   struct device *dev)
 @@ -319,9 +318,8 @@ EXPORT_SYMBOL_GPL(dma_buf_detach);
   * @attach:[in]attachment whose scatterlist is to be returned
   * @direction: [in]direction of DMA transfer
   *
 - * Returns sg_table containing the scatterlist to be returned; may return 
 NULL
 - * or ERR_PTR.
 - *
 + * Returns sg_table containing the scatterlist to be returned; returns 
 ERR_PTR
 + * on error.
   */
  struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 enum dma_data_direction direction)
 @@ -334,6 +332,8 @@ struct sg_table *dma_buf_map_attachment(struct 
 dma_buf_attachment *attach,
 return ERR_PTR(-EINVAL);

 sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
 +   if (!sg_table)
 +   sg_table = ERR_PTR(-ENOMEM);

 return sg_table;
  }
 @@ -544,6 +544,8 @@ EXPORT_SYMBOL_GPL(dma_buf_mmap);
   * These calls are optional in drivers. The intended use for them
   * is for mapping objects linear in kernel space for high use objects.
   * Please attempt to use kmap/kunmap before thinking about these interfaces.
 + *
 + * Returns NULL on error.
   */
  void *dma_buf_vmap(struct dma_buf *dmabuf)
  {
 @@ -566,7 +568,9 @@ void *dma_buf_vmap(struct dma_buf *dmabuf)
 	BUG_ON(dmabuf->vmap_ptr);

 	ptr = dmabuf->ops->vmap(dmabuf);
 -   if (IS_ERR_OR_NULL(ptr))
 +   if (WARN_ON_ONCE(IS_ERR(ptr)))

since vmap is optional, the WARN_ON might be a bit strong..  although
it would be a bit strange for an exporter to supply a vmap fxn which
always returned NULL, not sure about that.  Just thought I'd mention
it in case anyone else had an opinion about that.

But either way:

Reviewed-by: Rob Clark <robdcl...@gmail.com>


 +   ptr = NULL;
 +   if (!ptr)
 goto out_unlock;

 	dmabuf->vmap_ptr = ptr;
 diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
 index 56805c39c906..bb516fdd195d 100644
 --- a/drivers/gpu/drm/drm_prime.c
 +++ b/drivers/gpu/drm/drm_prime.c
 @@ -471,7 +471,7 @@ struct drm_gem_object *drm_gem_prime_import(struct drm_device *dev,
 get_dma_buf(dma_buf);

 sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
 -   if (IS_ERR_OR_NULL(sgt)) {
 +   if (IS_ERR(sgt)) {
 ret = PTR_ERR(sgt);
 goto fail_detach;
 }
 diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
 index 59827cc5e770..c786cd4f457b 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
 @@ -224,7 +224,7 @@ struct drm_gem_object *exynos_dmabuf_prime_import(struct drm_device *drm_dev,
 get_dma_buf(dma_buf);

 sgt

Re: [RFC PATCH] fence: dma-buf cross-device synchronization (v12)

2013-08-15 Thread Rob Clark
On Thu, Aug 15, 2013 at 7:16 AM, Maarten Lankhorst
maarten.lankho...@canonical.com wrote:
 On 12-08-13 17:43, Rob Clark wrote:
 On Mon, Jul 29, 2013 at 10:05 AM, Maarten Lankhorst
 maarten.lankho...@canonical.com wrote:
 +
[snip]
 +/**
 + * fence_add_callback - add a callback to be called when the fence
 + * is signaled
 + * @fence:	[in]	the fence to wait on
 + * @cb:	[in]	the callback to register
 + * @func:	[in]	the function to call
 + * @priv:	[in]	the argument to pass to function
 + *
 + * cb will be initialized by fence_add_callback, no initialization
 + * by the caller is required. Any number of callbacks can be registered
 + * to a fence, but a callback can only be registered to one fence at a time.
 + *
 + * Note that the callback can be called from an atomic context.  If
 + * fence is already signaled, this function will return -ENOENT (and
 + * *not* call the callback)
 + *
 + * Add a software callback to the fence. Same restrictions apply to
 + * refcount as it does to fence_wait, however the caller doesn't need to
 + * keep a refcount to fence afterwards: when software access is enabled,
 + * the creator of the fence is required to keep the fence alive until
 + * after it signals with fence_signal. The callback itself can be called
 + * from irq context.
 + *
 + */
 +int fence_add_callback(struct fence *fence, struct fence_cb *cb,
 +  fence_func_t func, void *priv)
 +{
 +   unsigned long flags;
 +   int ret = 0;
 +   bool was_set;
 +
 +   if (WARN_ON(!fence || !func))
 +   return -EINVAL;
 +
 +   if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 +   return -ENOENT;
 +
 +   spin_lock_irqsave(fence->lock, flags);
 +
 +   was_set = test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags);
 +
 +   if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 +   ret = -ENOENT;
 +   else if (!was_set && !fence->ops->enable_signaling(fence)) {
 +   __fence_signal(fence);
 +   ret = -ENOENT;
 +   }
 +
 +   if (!ret) {
 +   cb->func = func;
 +   cb->priv = priv;
 +   list_add_tail(&cb->node, &fence->cb_list);
 since the user is providing the 'struct fence_cb', why not drop the
 priv  func args, and have some cb-initialize macro, ie.

  INIT_FENCE_CB(&foo->fence, cbfxn);

 and I guess we can just drop priv and let the user embed fence in
 whatever structure they like.  Ie. make it look a bit how work_struct
 works.
 I don't mind killing priv. But an INIT_FENCE_CB macro is silly, when all it
 would do is set cb->func. So passing it as an argument to fence_add_callback
 is fine, unless you have a better reason to do so.

 INIT_WORK seems to have a bit more initialization than us; it seems work can
 be more complicated than callbacks, because callbacks can only be called once
 and work can be rescheduled multiple times.

yeah, INIT_WORK does more.. although maybe some day we want
INIT_FENCE_CB to do more (ie. if we add some debug features to help
catch misuse of fence/fence-cb's).  And if nothing else, having it
look a bit like other constructs that we have in the kernel seems
useful.  And with my point below, you'd want INIT_FENCE_CB to do a
INIT_LIST_HEAD(), so it is (very) slightly more than just setting the
fxn ptr.

 maybe also, if (!list_empty(&cb->node)) return -EBUSY?
 I think checking for list_empty(&cb->node) is a terrible idea. This is no
 different from any other list corruption, and it's a programming error. Not
 a runtime error. :-)

I was thinking for crtc and page-flip, embed the fence_cb in the crtc.
You should only use the cb once at a time, but in this case you might
want to re-use it for the next page flip.  Having something to catch
cb mis-use in this sort of scenario seems useful.

maybe how I am thinking to use fence_cb is not quite what you had in
mind.  I'm not sure.  I was trying to think how I could just directly
use fence/fence_cb in msm for everything (imported dmabuf or just
regular 'ol gem buffers).

 cb->node.next/prev may be NULL, which would fail with this check. The
 contents of cb->node are undefined before fence_add_callback is called.
 Calling fence_remove_callback on a fence that hasn't been added is undefined
 too. Calling fence_remove_callback works, but I'm thinking of changing the
 list_del_init to list_del, which would make calling fence_remove_callback
 twice a fatal error if CONFIG_DEBUG_LIST is enabled, and a possible memory
 corruption otherwise.
 ...
 +
[snip]
 +
 +/**
 + * fence context counter: each execution context should have its own
 + * fence context, this allows checking if fences belong to the same
 + * context or not. One device can have multiple separate contexts,
 + * and they're used if some engine can run independently of another.
 + */
 +extern atomic_t fence_context_counter;
 context-alloc should not be in the critical path.. I'd think probably

Re: [RFC PATCH] fence: dma-buf cross-device synchronization (v12)

2013-08-12 Thread Rob Clark
 default
 wait operation.
 v10: remove event_queue, use a custom list, export try_to_wake_up from
 scheduler. Remove fence lock and use a global spinlock instead,
 this should hopefully remove all the locking headaches I was having
 on trying to implement this. enable_signaling is called with this
 lock held.
 v11:
 Use atomic ops for flags, lifting the need for some spin_lock_irqsaves.
 However I kept the guarantee that after fence_signal returns, it is
 guaranteed that enable_signaling has either been called to completion,
 or will not be called any more.

 Add contexts and seqno to base fence implementation. This allows you
 to wait for less fences, by testing for seqno + signaled, and then only
 wait on the later fence.

 Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier.
 A CONFIG_DEBUG_FENCE will be added to turn off the FENCE_TRACE
 spam, and another runtime option can turn it off at runtime.
 v12:
 Add CONFIG_FENCE_TRACE. Add missing documentation for the fence-context
 and fence-seqno members.

 Signed-off-by: Maarten Lankhorst <maarten.lankho...@canonical.com>

Hi, few (mostly minor/superficial comments).. I didn't really spot
anything major (but then again, I think I've looked at all/most of the
earlier versions of this too)

Reviewed-by: Rob Clark <robdcl...@gmail.com>

 ---
  Documentation/DocBook/device-drivers.tmpl |2
  drivers/base/Kconfig  |   10 +
  drivers/base/Makefile |2
  drivers/base/fence.c  |  286 +++
  include/linux/fence.h |  365 +
  5 files changed, 664 insertions(+), 1 deletion(-)
  create mode 100644 drivers/base/fence.c
  create mode 100644 include/linux/fence.h

 diff --git a/Documentation/DocBook/device-drivers.tmpl 
 b/Documentation/DocBook/device-drivers.tmpl
 index cbfdf54..241f4c5 100644
 --- a/Documentation/DocBook/device-drivers.tmpl
 +++ b/Documentation/DocBook/device-drivers.tmpl
 @@ -126,6 +126,8 @@ X!Edrivers/base/interface.c
   /sect1
   sect1titleDevice Drivers DMA Management/title
  !Edrivers/base/dma-buf.c
 +!Edrivers/base/fence.c
 +!Iinclude/linux/fence.h
  !Edrivers/base/reservation.c
  !Iinclude/linux/reservation.h
  !Edrivers/base/dma-coherent.c
 diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
 index 5daa259..0ad35df 100644
 --- a/drivers/base/Kconfig
 +++ b/drivers/base/Kconfig
 @@ -200,6 +200,16 @@ config DMA_SHARED_BUFFER
   APIs extension; the file's descriptor can then be passed on to other
   driver.

 +config FENCE_TRACE
 	bool "Enable verbose FENCE_TRACE messages"
 +   default n
 +   depends on DMA_SHARED_BUFFER
 +   help
 + Enable the FENCE_TRACE printks. This will add extra
 + spam to the config log, but will make it easier to diagnose

s/config/console/ I guess?

 + lockup related problems for dma-buffers shared across multiple
 + devices.
 +
  config CMA
 	bool "Contiguous Memory Allocator"
 	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK
 diff --git a/drivers/base/Makefile b/drivers/base/Makefile
 index 48029aa..8a55cb9 100644
 --- a/drivers/base/Makefile
 +++ b/drivers/base/Makefile
 @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o
  obj-y  += power/
  obj-$(CONFIG_HAS_DMA)  += dma-mapping.o
  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
 -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o reservation.o
 +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o reservation.o
  obj-$(CONFIG_ISA)  += isa.o
  obj-$(CONFIG_FW_LOADER)+= firmware_class.o
  obj-$(CONFIG_NUMA) += node.o
 diff --git a/drivers/base/fence.c b/drivers/base/fence.c
 new file mode 100644
 index 000..28e5ffd
 --- /dev/null
 +++ b/drivers/base/fence.c
 @@ -0,0 +1,286 @@
 +/*
 + * Fence mechanism for dma-buf and to allow for asynchronous dma access
 + *
 + * Copyright (C) 2012 Canonical Ltd
 + * Copyright (C) 2012 Texas Instruments
 + *
 + * Authors:
 + * Rob Clark rob.cl...@linaro.org
 + * Maarten Lankhorst maarten.lankho...@canonical.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License version 2 as published by
 + * the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along with
 + * this program.  If not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#include <linux/slab.h>
 +#include <linux/export.h>
 +#include <linux/fence.h>
 +
 +atomic_t fence_context_counter = ATOMIC_INIT(0);
 +EXPORT_SYMBOL(fence_context_counter);
 +
 +int

Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-09 Thread Rob Clark
On Fri, Aug 9, 2013 at 12:15 PM, Tom Cooksey tom.cook...@arm.com wrote:

  Turning to DRM/KMS, it seems the supported formats of a plane can be
  queried using drm_mode_get_plane. However, there doesn't seem to be a
  way to query the supported formats of a crtc? If display HW only
  supports scanning out from a single buffer (like pl111 does), I think
  it won't have any planes and a fb can only be set on the crtc. In
  which case, how should user-space query which pixel formats that crtc
  supports?

 it is exposed for drm plane's.  What is missing is to expose the
 primary-plane associated with the crtc.

 Cool - so a patch which adds a way to query the what formats a crtc
 supports would be welcome?

well, I kinda think we want something that exposes the primary plane
of the crtc.. I'm thinking something roughly like:

-
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 53db7ce..c7ffca8 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -157,6 +157,12 @@ struct drm_mode_get_plane {
 struct drm_mode_get_plane_res {
__u64 plane_id_ptr;
__u32 count_planes;
+   /* The primary planes are in matching order to crtc_id_ptr in
+* drm_mode_card_res (and same length).  For crtc_id[n], its
+* primary plane is given by primary_plane_id[n].
+*/
+   __u32 count_primary_planes;
+   __u64 primary_plane_id_ptr;
 };

 #define DRM_MODE_ENCODER_NONE  0
-

then use the existing GETPLANE ioctl to query the capabilities

 What about a way to query the stride alignment constraints?

 Presumably using the drm_mode_get_property mechanism would be the
 right way to implement that?


I suppose you could try that.. typically that is in userspace,
however.  It seems like get_property would get messy quickly (ie. is
it a pitch alignment constraint, or stride alignment?  What if this is
different for different formats (in particular tiled)? etc)


  As with v4l2, DRM doesn't appear to have a way to query the stride
  constraints? Assuming there is a way to query the stride constraints,
  there also isn't a way to specify them when creating a buffer with
  DRM, though perhaps the existing pitch parameter of
  drm_mode_create_dumb could be used to allow user-space to pass in a
  minimum stride as well as receive the allocated stride?
 

 well, you really shouldn't be using create_dumb..  you should have a
 userspace piece that is specific to the drm driver, and knows how to
 use that driver's gem allocate ioctl.

 Sorry, why does this need a driver-specific allocation function? It's
 just a display controller driver and I just want to allocate a scan-
 out buffer - all I'm asking is for the display controller driver to
 use a minimum stride alignment so I can export the buffer and use
 another device to fill it with data.

Sure.. but userspace has more information readily available to make a
better choice.  For example, for omapdrm I'd do things differently
depending on whether I wanted to scan out that buffer (or a portion of
it) rotated.  This is something I know in the DDX driver, but not in
the kernel.  And it is quite likely something that is driver specific.
 Sure, we could add that to a generic "allocate me a buffer" ioctl.
But that doesn't really seem better, and it becomes a problem as soon
as we come across some hw that needs to know something different.  In
userspace, you have a lot more flexibility, since you don't really
need to commit to an API for life.

And to bring back the GStreamer argument (since that seems a fitting
example when you start talking about sharing buffers between many
devices, for example camera+codec+display), it would already be
negotiating format between v4l2src + fooencoder + displaysink.. the
pitch/stride is part of that format information.  If the display isn't
the one with the strictest requirements, we don't want the display
driver deciding what pitch to use.

 The whole point is to be able to allocate the buffer in such a way
 that another device can access it. So the driver _can't_ use a
 special, device specific format, nor can it allocate it from a
 private memory pool because doing so would preclude it from being
 shared with another device.

 That other device doesn't need to be a GPU wither, it could just as
 easily be a camera/ISP or video decoder.



   So presumably you're talking about a GPU driver being the exporter
   here? If so, how could the GPU driver do these kind of tricks on
   memory shared with another device?
 
  Yes, that is gpu-as-exporter.  If someone else is allocating
  buffers, it is up to them to do these tricks or not.  Probably
  there is a pretty good chance that if you aren't a GPU you don't
  need those sort of tricks for fast allocation of transient upload
  buffers, staging textures, temporary pixmaps, etc.  Ie. I don't
  really think a v4l camera or video decoder would benefit from that
  sort of optimization.
 
  Right - but none of 

Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-07 Thread Rob Clark
On Wed, Aug 7, 2013 at 12:23 AM, John Stultz john.stu...@linaro.org wrote:
 On Tue, Aug 6, 2013 at 5:15 AM, Rob Clark robdcl...@gmail.com wrote:
 well, let's divide things up into two categories:

 1) the arrangement and format of pixels.. ie. what userspace would
 need to know if it mmap's a buffer.  This includes pixel format,
 stride, etc.  This should be negotiated in userspace, it would be
 crazy to try to do this in the kernel.

 2) the physical placement of the pages.  Ie. whether it is contiguous
 or not.  Which bank the pages in the buffer are placed in, etc.  This
 is not visible to userspace.  This is the purpose of the attach step,
 so you know all the devices involved in sharing up front before
 allocating the backing pages.  (Or in the worst case, if you have a
 late attacher you at least know when no device is doing dma access
 to a buffer and can reallocate and move the buffer.)  A long time

 One concern I know the Android folks have expressed previously (and
 correct me if its no longer an objection), is that this attach time
 in-kernel constraint solving / moving or reallocating buffers is
 likely to hurt determinism.  If I understood, their perspective was
 that userland knows the device path the buffers will travel through,
 so why not leverage that knowledge, rather then having the kernel have
 to sort it out for itself after the fact.

If you know the device path, then attach the buffer at all the devices
before you start using it.  Problem solved.. kernel knows all devices
before pages need be allocated ;-)

BR,
-R
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-07 Thread Rob Clark
On Wed, Aug 7, 2013 at 1:33 PM, Tom Cooksey tom.cook...@arm.com wrote:

   Didn't you say that programmatically describing device placement
   constraints was an unbounded problem? I guess we would have to
   accept that it's not possible to describe all possible constraints
   and instead find a way to describe the common ones?
 
  well, the point I'm trying to make, is by dividing your constraints
  into two groups, one that impacts and is handled by userspace, and
  one that is in the kernel (ie. where the pages go), you cut down
  the number of permutations that the kernel has to care about
   considerably. And kernel already cares about, for example, what
  range of addresses that a device can dma to/from.  I think really
  the only thing missing is the max # of sglist entries (contiguous
  or not)
 
  I think it's more than physically contiguous or not.
 
  For example, it can be more efficient to use large page sizes on
  devices with IOMMUs to reduce TLB traffic. I think the size and even
  the availability of large pages varies between different IOMMUs.

 sure.. but I suppose if we can spiff out dma_params to express "I need
 contiguous", perhaps we can add some way to express "I prefer
 as-contiguous-as-possible".. either way, this is about where the pages
 are placed, and not about the layout of pixels within the page, so
 should be in kernel.  It's something that is missing, but I believe
 that it belongs in dma_params and hidden behind dma_alloc_*() for
 simple drivers.

 Thinking about it, isn't this more a property of the IOMMU? I mean,
 are there any cases where an IOMMU had a large page mode but you
 wouldn't want to use it? So when allocating the memory, you'd have to
 take into account not just the constraints of the devices themselves,
 but also of any IOMMUs any of the device sit behind?


perhaps yes.  But the device is associated w/ the iommu it is attached
to, so this shouldn't be a problem


  There's also the issue of buffer stride alignment. As I say, if the
  buffer is to be written by a tile-based GPU like Mali, it's more
  efficient if the buffer's stride is aligned to the max AXI bus burst
  length. Though I guess a buffer stride only makes sense as a concept
  when interpreting the data as a linear-layout 2D image, so perhaps
  belongs in user-space along with format negotiation?
 

 Yeah.. this isn't about where the pages go, but about the arrangement
 within a page.

 And, well, except for hw that supports the same tiling (or
 compressed-fb) in display+gpu, you probably aren't sharing tiled
 buffers.

 You'd only want to share a buffer between devices if those devices can
 understand the same pixel format. That pixel format can't be device-
 specific or opaque, it has to be explicit. I think drm_fourcc.h is
 what defines all the possible pixel formats. This is the enum I used
 in EGL_EXT_image_dma_buf_import at least. So if we get to the point
 where multiple devices can understand a tiled or compressed format, I
 assume we could just add that format to drm_fourcc.h and possibly
 v4l2's v4l2_mbus_pixelcode enum in v4l2-mediabus.h.

 For user-space to negotiate a common pixel format and now stride
 alignment, I guess it will obviously need a way to query what pixel
 formats a device supports and what its stride alignment requirements
 are.

 I don't know v4l2 very well, but it certainly seems the pixel format
 can be queried using V4L2_SUBDEV_FORMAT_TRY when attempting to set
 a particular format. I couldn't however find a way to retrieve a list
 of supported formats - it seems the mechanism is to try out each
 format in turn to determine if it is supported. Is that right?

it is exposed for drm plane's.  What is missing is to expose the
primary-plane associated with the crtc.

 There doesn't however seem a way to query what stride constraints a
 V4l2 device might have. Does HW abstracted by v4l2 typically have
 such constraints? If so, how can we query them such that a buffer
 allocated by a DRM driver can be imported into v4l2 and used with
 that HW?

 Turning to DRM/KMS, it seems the supported formats of a plane can be
 queried using drm_mode_get_plane. However, there doesn't seem to be a
 way to query the supported formats of a crtc? If display HW only
 supports scanning out from a single buffer (like pl111 does), I think
 it won't have any planes and a fb can only be set on the crtc. In
 which case, how should user-space query which pixel formats that crtc
 supports?

 Assuming user-space can query the supported formats and find a common
 one, it will need to allocate a buffer. Looks like
 drm_mode_create_dumb can do that, but it only takes a bpp parameter,
 there's no format parameter. I assume then that user-space defines
 the format and tells the DRM driver which format the buffer is in
 when creating the fb with drm_mode_fb_cmd2, which does take a format
 parameter? Is that right?

Right, the gem object has no inherent format, it is just some bytes.
The 

Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-06 Thread Rob Clark
On Tue, Aug 6, 2013 at 7:31 AM, Tom Cooksey tom.cook...@arm.com wrote:

  So in some respects, there is a constraint on how buffers which will
  be drawn to using the GPU are allocated. I don't really like the idea
  of teaching the display controller DRM driver about the GPU buffer
  constraints, even if they are fairly trivial like this. If the same
  display HW IP is being used on several SoCs, it seems wrong somehow
  to enforce those GPU constraints if some of those SoCs don't have a
  GPU.

 Well, I suppose you could get min_pitch_alignment from devicetree, or
 something like this..

 In the end, the easy solution is just to make the display allocate to
 the worst-case pitch alignment.  In the early days of dma-buf
 discussions, we kicked around the idea of negotiating or
 programatically describing the constraints, but that didn't really
 seem like a bounded problem.

 Yeah - I was around for some of those discussions and agree it's not
 really an easy problem to solve.



  We may also then have additional constraints when sharing buffers
  between the display HW and video decode or even camera ISP HW.
  Programmatically describing buffer allocation constraints is very
  difficult and I'm not sure you can actually do it - there's some
  pretty complex constraints out there! E.g. I believe there's a
  platform where Y and UV planes of the reference frame need to be in
  separate DRAM banks for real-time 1080p decode, or something like
  that?

 yes, this was discussed.  This is different from pitch/format/size
 constraints.. it is really just a placement constraint (ie. where do
  the physical pages go).  IIRC the conclusion was to use a dummy
  device with its own CMA pool for attaching the Y vs UV buffers.

  Anyway, I guess my point is that even if we solve how to allocate
  buffers which will be shared between the GPU and display HW such that
  both sets of constraints are satisfied, that may not be the end of
  the story.
 

 that was part of the reason to punt this problem to userspace ;-)

 In practice, the kernel drivers doesn't usually know too much about
 the dimensions/format/etc.. that is really userspace level knowledge.
 There are a few exceptions when the kernel needs to know how to setup
 GTT/etc for tiled buffers, but normally this sort of information is up
 at the next level up (userspace, and drm_framebuffer in case of
 scanout).  Userspace media frameworks like GStreamer already have a
 concept of format/caps negotiation.  For non-display-gpu sharing, I
 think this is probably where this sort of constraint negotiation
 should be handled.

 I agree that user-space will know which devices will access the buffer
 and thus can figure out at least a common pixel format. Though I'm not
 so sure userspace can figure out more low-level details like alignment
 and placement in physical memory, etc.

well, let's divide things up into two categories:

1) the arrangement and format of pixels.. ie. what userspace would
need to know if it mmap's a buffer.  This includes pixel format,
stride, etc.  This should be negotiated in userspace, it would be
crazy to try to do this in the kernel.

2) the physical placement of the pages.  Ie. whether it is contiguous
or not.  Which bank the pages in the buffer are placed in, etc.  This
is not visible to userspace.  This is the purpose of the attach step,
so you know all the devices involved in sharing up front before
allocating the backing pages.  (Or in the worst case, if you have a
late attacher you at least know when no device is doing dma access
to a buffer and can reallocate and move the buffer.)  A long time
back, I had a patch that added a field or two to 'struct
device_dma_parameters' so that it could be known if a device required
contiguous buffers.. looks like that never got merged, so I'd need to
dig that back up and resend it.  But the idea was to have the 'struct
device' encapsulate all the information that would be needed to
do-the-right-thing when it comes to placement.

 Anyway, assuming user-space can figure out how a buffer should be
 stored in memory, how does it indicate this to a kernel driver and
 actually allocate it? Which ioctl on which device does user-space
 call, with what parameters? Are you suggesting using something like
 ION which exposes the low-level details of how buffers are laid out in
 physical memory to userspace? If not, what?

no, userspace should not need to know this.  And having a central
driver that knows this for all the other drivers in the system doesn't
really solve anything and isn't really scalable.  At best you might
want, in some cases, a flag you can pass when allocating.  For
example, some of the drivers have a 'SCANOUT' flag that can be passed
when allocating a GEM buffer, as a hint to the kernel that 'if this hw
requires contig memory for scanout, allocate this buffer contig'.  But
really, when it comes to sharing buffers between devices, we want this
sort of information in dev->dma_params of the 

Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-06 Thread Rob Clark
On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach l.st...@pengutronix.de wrote:
 Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
 Hi Rob,

 +lkml

   On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey tom.cook...@arm.com
   wrote:
  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
also allocate buffers for the GPU. Still not sure how to
resolve this as we don't use DRM for our GPU driver.
   
any thoughts/plans about a DRM GPU driver?  Ideally long term
(esp. once the dma-fence stuff is in place), we'd have
gpu-specific drm (gpu-only, no kms) driver, and SoC/display
specific drm/kms driver, using prime/dmabuf to share between
the two.
   
The extra buffers we were allocating from armsoc DDX were really
being allocated through DRM/GEM so we could get an flink name
for them and pass a reference to them back to our GPU driver on
the client side. If it weren't for our need to access those
extra off-screen buffers with the GPU we wouldn't need to
allocate them with DRM at all. So, given they are really GPU
buffers, it does absolutely make sense to allocate them in a
different driver to the display driver.
   
However, to avoid unnecessary memcpys  related cache
maintenance ops, we'd also like the GPU to render into buffers
which are scanned out by the display controller. So let's say
we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
out buffers with the display's DRM driver but a custom ioctl
on the GPU's DRM driver to allocate non scanout, off-screen
buffers. Sounds great, but I don't think that really works
with DRI2. If we used two drivers to allocate buffers, which
of those drivers do we return in DRI2ConnectReply? Even if we
solve that somehow, GEM flink names are name-spaced to a
single device node (AFAIK). So when we do a DRI2GetBuffers,
how does the EGL in the client know which DRM device owns GEM
flink name 1234? We'd need some pretty dirty hacks.
  
   You would return the name of the display driver allocating the
   buffers.  On the client side you can use generic ioctls to go from
   flink -> handle -> dmabuf.  So the client side would end up opening
   both the display drm device and the gpu, but without needing to know
   too much about the display.
  
   I think the bit I was missing was that a GEM bo for a buffer imported
   using dma_buf/PRIME can still be flink'd. So the display controller's
   DRM driver allocates scan-out buffers via the DUMB buffer allocate
   ioctl. Those scan-out buffers than then be exported from the
   dispaly's DRM driver and imported into the GPU's DRM driver using
   PRIME. Once imported into the GPU's driver, we can use flink to get a
   name for that buffer within the GPU DRM driver's name-space to return
   to the DRI2 client. That same namespace is also what DRI2 back-
   buffers are allocated from, so I think that could work... Except...
 
  (and.. the general direction is that things will move more to just use
  dmabuf directly, ie. wayland or dri3)

 I agree, DRI2 is the only reason why we need a system-wide ID. I also
 prefer buffers to be passed around by dma_buf fd, but we still need to
 support DRI2 and will do for some time I expect.



Anyway, that latter case also gets quite difficult. The GPU
DRM driver would need to know the constraints of the display
controller when allocating buffers intended to be scanned out.
For example, pl111 typically isn't behind an IOMMU and so
requires physically contiguous memory. We'd have to teach the
GPU's DRM driver about the constraints of the display HW. Not
exactly a clean driver model. :-(
   
I'm still a little stuck on how to proceed, so any ideas
would greatly appreciated! My current train of thought is
having a kind of SoC-specific DRM driver which allocates
buffers for both display and GPU within a single GEM
namespace. That SoC-specific DRM driver could then know the
constraints of both the GPU and the display HW. We could then
use PRIME to export buffers allocated with the SoC DRM driver
and import them into the GPU and/or display DRM driver.
  
   Usually if the display drm driver is allocating the buffers that
   might be scanned out, it just needs to have minimal knowledge of
   the GPU (pitch alignment constraints).  I don't think we need a
   3rd device just to allocate buffers.
  
   While Mali can render to pretty much any buffer, there is a mild
   performance improvement to be had if the buffer stride is aligned to
   the AXI bus's max burst length when drawing to the buffer.
 
  I suspect the display controllers might frequently benefit if the
  pitch is aligned to AXI burst length too..

 If the display controller is going to be reading from linear memory
 I don't think it will make much difference - you'll just get an extra
 1-2 bus transactions per scanline. With a tile-based GPU like Mali,

Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-06 Thread Rob Clark
On Tue, Aug 6, 2013 at 10:03 AM, Tom Cooksey tom.cook...@arm.com wrote:
 Hi Rob,

   We may also then have additional constraints when sharing buffers
   between the display HW and video decode or even camera ISP HW.
   Programmatically describing buffer allocation constraints is very
   difficult and I'm not sure you can actually do it - there's some
   pretty complex constraints out there! E.g. I believe there's a
   platform where Y and UV planes of the reference frame need to be
   in separate DRAM banks for real-time 1080p decode, or something
   like that?
 
  yes, this was discussed.  This is different from pitch/format/size
  constraints.. it is really just a placement constraint (ie. where
  do the physical pages go).  IIRC the conclusion was to use a dummy
  device with its own CMA pool for attaching the Y vs UV buffers.
 
   Anyway, I guess my point is that even if we solve how to allocate
   buffers which will be shared between the GPU and display HW such
   that both sets of constraints are satisfied, that may not be the
   end of the story.
  
 
  that was part of the reason to punt this problem to userspace ;-)
 
  In practice, the kernel driver doesn't usually know too much about
  the dimensions/format/etc.. that is really userspace level
  knowledge. There are a few exceptions when the kernel needs to know
  how to setup GTT/etc for tiled buffers, but normally this sort of
  information is at the next level up (userspace, and
  drm_framebuffer in case of scanout).  Userspace media frameworks
  like GStreamer already have a concept of format/caps negotiation.
  For non-display-gpu sharing, I think this is probably where this
  sort of constraint negotiation should be handled.
 
  I agree that user-space will know which devices will access the
  buffer and thus can figure out at least a common pixel format.
  Though I'm not so sure userspace can figure out more low-level
  details like alignment and placement in physical memory, etc.
 

 well, let's divide things up into two categories:

 1) the arrangement and format of pixels.. ie. what userspace would
 need to know if it mmap's a buffer.  This includes pixel format,
 stride, etc.  This should be negotiated in userspace, it would be
 crazy to try to do this in the kernel.

 Absolutely. Pixel format has to be negotiated by user-space as in
 most cases, user-space can map the buffer and thus will need to
 know how to interpret the data.



 2) the physical placement of the pages.  Ie. whether it is contiguous
 or not.  Which bank the pages in the buffer are placed in, etc.  This
 is not visible to userspace.

 Seems sensible to me.


 ... This is the purpose of the attach step,
 so you know all the devices involved in sharing up front before
 allocating the backing pages. (Or in the worst case, if you have a
 late attacher you at least know when no device is doing dma access
 to a buffer and can reallocate and move the buffer.)  A long time
 back, I had a patch that added a field or two to 'struct
 device_dma_parameters' so that it could be known if a device required
 contiguous buffers.. looks like that never got merged, so I'd need to
 dig that back up and resend it.  But the idea was to have the 'struct
 device' encapsulate all the information that would be needed to
 do-the-right-thing when it comes to placement.

 As I understand it, it's up to the exporting device to allocate the
 memory backing the dma_buf buffer. I guess the latest possible point
 you can allocate the backing pages is when map_dma_buf is first
 called? At that point the exporter can iterate over the current set
 of attachments, programmatically determine all the constraints of
 all the attached drivers and attempt to allocate the backing pages
 in such a way as to satisfy all those constraints?

yes, this is the idea..  possibly some room for some helpers to help
out with this, but that is all under the hood from userspace
perspective
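The deferred-allocation idea above can be sketched in plain C. This is a userspace-style sketch with stand-in types, not actual kernel code: `struct buffer`, `struct attachment`, and the two allocators are hypothetical names standing in for the exporter's private bookkeeping, a CMA pool, and shmem respectively.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a dma-buf attachment's device constraint. */
struct attachment { bool dev_needs_contig; };

struct buffer {
    void *pages;               /* NULL until the first map */
    size_t size;
    struct attachment *att;
    size_t n_att;
};

/* Stand-ins for a CMA-style contiguous allocator vs. a scattered one. */
static void *alloc_contig(size_t sz)    { return malloc(sz); }
static void *alloc_scattered(size_t sz) { return malloc(sz); }

/* First map: only now is every attached device known, so walk the
 * attachments and pick backing storage all of them can handle. */
void *buffer_map(struct buffer *buf)
{
    if (!buf->pages) {
        bool contig = false;
        for (size_t i = 0; i < buf->n_att; i++)
            contig |= buf->att[i].dev_needs_contig;
        buf->pages = contig ? alloc_contig(buf->size)
                            : alloc_scattered(buf->size);
    }
    return buf->pages;   /* later maps reuse the same pages */
}
```

Subsequent maps return the already-chosen pages; the late-attacher case discussed above would additionally need to reallocate and copy when no device is mid-DMA.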

 Didn't you say that programmatically describing device placement
 constraints was an unbounded problem? I guess we would have to
 accept that it's not possible to describe all possible constraints
 and instead find a way to describe the common ones?

well, the point I'm trying to make, is by dividing your constraints
into two groups, one that impacts and is handled by userspace, and one
that is in the kernel (ie. where the pages go), you cut down the
number of permutations that the kernel has to care about considerably.
 And kernel already cares about, for example, what range of addresses
that a device can dma to/from.  I think really the only thing missing
is the max # of sglist entries (contiguous or not)
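The "divide and reduce" point can be made concrete. The fields below are hypothetical, roughly what this thread proposes adding to 'struct device_dma_parameters' (a contiguous flag and a max sglist entry count), alongside the DMA mask the kernel already tracks; the merge function is a sketch, not kernel API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device placement constraints. */
struct placement_params {
    bool needs_contig;                 /* e.g. pl111 with no IOMMU */
    unsigned int max_segments;         /* max sglist entries */
    unsigned long long dma_mask;       /* highest address reachable */
};

/* Reduce all attached devices to the strictest combination; the
 * exporter then allocates to this lowest common denominator. */
struct placement_params merge_placement(const struct placement_params *devs,
                                        size_t n)
{
    struct placement_params out = {
        .needs_contig = false,
        .max_segments = UINT_MAX,
        .dma_mask     = ~0ULL,
    };
    for (size_t i = 0; i < n; i++) {
        out.needs_contig |= devs[i].needs_contig;
        if (devs[i].max_segments < out.max_segments)
            out.max_segments = devs[i].max_segments;
        out.dma_mask &= devs[i].dma_mask;  /* AND of all-ones masks */
    }
    if (out.needs_contig)
        out.max_segments = 1;   /* contiguous implies one segment */
    return out;
}
```

Note that none of this touches pixel layout; it is purely about where pages go, which is why it can stay hidden behind the allocator.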

 One problem with this is it duplicates a lot of logic in each
 driver which can export a dma_buf buffer. Each exporter will need to
 do pretty much the same thing: iterate over all the attachments,
 determine all the constraints (assuming that can be done) and
 allocate pages such that the lowest-common-denominator 

Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-06 Thread Rob Clark
On Tue, Aug 6, 2013 at 10:36 AM, Lucas Stach l.st...@pengutronix.de wrote:
 Am Dienstag, den 06.08.2013, 10:14 -0400 schrieb Rob Clark:
 On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach l.st...@pengutronix.de wrote:
  Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
  Hi Rob,
 
  +lkml
 
On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey tom.cook...@arm.com
wrote:
   * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
 also allocate buffers for the GPU. Still not sure how to
 resolve this as we don't use DRM for our GPU driver.

 any thoughts/plans about a DRM GPU driver?  Ideally long term
 (esp. once the dma-fence stuff is in place), we'd have
 gpu-specific drm (gpu-only, no kms) driver, and SoC/display
 specific drm/kms driver, using prime/dmabuf to share between
 the two.

 The extra buffers we were allocating from armsoc DDX were really
 being allocated through DRM/GEM so we could get an flink name
 for them and pass a reference to them back to our GPU driver on
 the client side. If it weren't for our need to access those
 extra off-screen buffers with the GPU we wouldn't need to
 allocate them with DRM at all. So, given they are really GPU
 buffers, it does absolutely make sense to allocate them in a
 different driver to the display driver.

 However, to avoid unnecessary memcpys & related cache
 maintenance ops, we'd also like the GPU to render into buffers
 which are scanned out by the display controller. So let's say
 we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
 out buffers with the display's DRM driver but a custom ioctl
 on the GPU's DRM driver to allocate non scanout, off-screen
 buffers. Sounds great, but I don't think that really works
 with DRI2. If we used two drivers to allocate buffers, which
 of those drivers do we return in DRI2ConnectReply? Even if we
 solve that somehow, GEM flink names are name-spaced to a
 single device node (AFAIK). So when we do a DRI2GetBuffers,
 how does the EGL in the client know which DRM device owns GEM
 flink name 1234? We'd need some pretty dirty hacks.
   
You would return the name of the display driver allocating the
buffers.  On the client side you can use generic ioctls to go from
flink -> handle -> dmabuf.  So the client side would end up opening
both the display drm device and the gpu, but without needing to know
too much about the display.
   
I think the bit I was missing was that a GEM bo for a buffer imported
using dma_buf/PRIME can still be flink'd. So the display controller's
DRM driver allocates scan-out buffers via the DUMB buffer allocate
ioctl. Those scan-out buffers can then be exported from the
display's DRM driver and imported into the GPU's DRM driver using
PRIME. Once imported into the GPU's driver, we can use flink to get a
name for that buffer within the GPU DRM driver's name-space to return
to the DRI2 client. That same namespace is also what DRI2 back-
buffers are allocated from, so I think that could work... Except...
  
   (and.. the general direction is that things will move more to just use
   dmabuf directly, ie. wayland or dri3)
 
  I agree, DRI2 is the only reason why we need a system-wide ID. I also
  prefer buffers to be passed around by dma_buf fd, but we still need to
  support DRI2 and will do for some time I expect.
 
 
 
 Anyway, that latter case also gets quite difficult. The GPU
 DRM driver would need to know the constraints of the display
 controller when allocating buffers intended to be scanned out.
 For example, pl111 typically isn't behind an IOMMU and so
 requires physically contiguous memory. We'd have to teach the
 GPU's DRM driver about the constraints of the display HW. Not
 exactly a clean driver model. :-(

 I'm still a little stuck on how to proceed, so any ideas
 would be greatly appreciated! My current train of thought is
 having a kind of SoC-specific DRM driver which allocates
 buffers for both display and GPU within a single GEM
 namespace. That SoC-specific DRM driver could then know the
 constraints of both the GPU and the display HW. We could then
 use PRIME to export buffers allocated with the SoC DRM driver
 and import them into the GPU and/or display DRM driver.
   
Usually if the display drm driver is allocating the buffers that
might be scanned out, it just needs to have minimal knowledge of
the GPU (pitch alignment constraints).  I don't think we need a
3rd device just to allocate buffers.
   
While Mali can render to pretty much any buffer, there is a mild
performance improvement to be had if the buffer stride is aligned to
the AXI bus's max burst length when drawing to the buffer.
  
   I suspect the display controllers might frequently benefit if the
   pitch is aligned

Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-06 Thread Rob Clark
On Tue, Aug 6, 2013 at 1:38 PM, Tom Cooksey tom.cook...@arm.com wrote:

  ... This is the purpose of the attach step,
  so you know all the devices involved in sharing up front before
  allocating the backing pages. (Or in the worst case, if you have a
  late attacher you at least know when no device is doing dma access
  to a buffer and can reallocate and move the buffer.)  A long time
  back, I had a patch that added a field or two to 'struct
  device_dma_parameters' so that it could be known if a device
  required contiguous buffers.. looks like that never got merged, so
  I'd need to dig that back up and resend it.  But the idea was to
  have the 'struct device' encapsulate all the information that would
  be needed to do-the-right-thing when it comes to placement.
 
  As I understand it, it's up to the exporting device to allocate the
  memory backing the dma_buf buffer. I guess the latest possible point
  you can allocate the backing pages is when map_dma_buf is first
  called? At that point the exporter can iterate over the current set
  of attachments, programmatically determine all the constraints of
  all the attached drivers and attempt to allocate the backing pages
  in such a way as to satisfy all those constraints?

 yes, this is the idea..  possibly some room for some helpers to help
 out with this, but that is all under the hood from userspace
 perspective

  Didn't you say that programmatically describing device placement
  constraints was an unbounded problem? I guess we would have to
  accept that it's not possible to describe all possible constraints
  and instead find a way to describe the common ones?

 well, the point I'm trying to make, is by dividing your constraints
 into two groups, one that impacts and is handled by userspace, and one
 that is in the kernel (ie. where the pages go), you cut down the
 number of permutations that the kernel has to care about considerably.
  And kernel already cares about, for example, what range of addresses
 that a device can dma to/from.  I think really the only thing missing
 is the max # of sglist entries (contiguous or not)

 I think it's more than physically contiguous or not.

 For example, it can be more efficient to use large page sizes on
 devices with IOMMUs to reduce TLB traffic. I think the size and even
 the availability of large pages varies between different IOMMUs.

sure.. but I suppose if we can spiff out dma_params to express "I need
contiguous", perhaps we can add some way to express "I prefer
as-contiguous-as-possible".. either way, this is about where the pages
are placed, and not about the layout of pixels within the page, so
should be in kernel.  It's something that is missing, but I believe
that it belongs in dma_params and hidden behind dma_alloc_*() for
simple drivers.

 There's also the issue of buffer stride alignment. As I say, if the
 buffer is to be written by a tile-based GPU like Mali, it's more
 efficient if the buffer's stride is aligned to the max AXI bus burst
 length. Though I guess a buffer stride only makes sense as a concept
 when interpreting the data as a linear-layout 2D image, so perhaps
 belongs in user-space along with format negotiation?


Yeah.. this isn't about where the pages go, but about the arrangement
within a page.

And, well, except for hw that supports the same tiling (or
compressed-fb) in display+gpu, you probably aren't sharing tiled
buffers.
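The pitch alignment being discussed is a one-line computation. A minimal sketch, assuming the burst length is a power of two; the 64-byte value in the usage below is only an example, not Mali's or any controller's actual burst length.

```c
/* Round a scanline stride up to the bus burst length (assumed to be
 * a power of two).  width is in pixels, bytes_pp is bytes per pixel. */
static unsigned int align_stride(unsigned int width, unsigned int bytes_pp,
                                 unsigned int burst_bytes)
{
    unsigned int stride = width * bytes_pp;
    return (stride + burst_bytes - 1) & ~(burst_bytes - 1u);
}
```

For example, a 1366-pixel-wide XRGB8888 buffer has a natural stride of 5464 bytes; aligned to a 64-byte burst it becomes 5504.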


  One problem with this is it duplicates a lot of logic in each
  driver which can export a dma_buf buffer. Each exporter will need to
  do pretty much the same thing: iterate over all the attachments,
  determine all the constraints (assuming that can be done) and
  allocate pages such that the lowest-common-denominator is satisfied.
 
  Perhaps rather than duplicating that logic in every driver, we could
  Instead move allocation of the backing pages into dma_buf itself?
 

 I tend to think it is better to add helpers as we see common patterns
 emerge, which drivers can opt-in to using.  I don't think that we
 should move allocation into dma_buf itself, but it would perhaps be
 useful to have dma_alloc_*() variants that could allocate for multiple
 devices.

 A helper could work I guess, though I quite like the idea of having
 dma_alloc_*() variants which take a list of devices to allocate memory
 for.


 That would help for simple stuff, although I'd suspect
 eventually a GPU driver will move away from that.  (Since you probably
 want to play tricks w/ pools of pages that are pre-zero'd and in the
 correct cache state, use spare cycles on the gpu or dma engine to
 pre-zero uncached pages, and games like that.)

 So presumably you're talking about a GPU driver being the exporter
 here? If so, how could the GPU driver do these kind of tricks on
 memory shared with another device?


Yes, that is gpu-as-exporter.  If someone else is allocating buffers,
it is up to them to do these tricks or not.  Probably there is a
pretty good chance that if 

Re: [PATCH V2] drm/exynos: Add fallback option to get non physically contiguous memory for fb

2013-08-05 Thread Rob Clark
On Mon, Aug 5, 2013 at 5:44 AM, Vikas Sajjan vikas.saj...@linaro.org wrote:
 While trying to get boot-logo up on exynos5420 SMDK which has eDP panel
 connected with resolution 2560x1600, following error occurred even with
 IOMMU enabled:
 [0.88] [drm:lowlevel_buffer_allocate] *ERROR* failed to allocate buffer.
 [0.89] [drm] Initialized exynos 1.0.0 20110530 on minor 0

 To address the case where physically contiguous memory MAY NOT be a
 mandatory requirement for fb, the patch adds a feature to get non physically
 contiguous memory for fb if IOMMU is supported and if CONTIG memory allocation
 fails.


Reviewed-by: Rob Clark robdcl...@gmail.com


 Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
 Signed-off-by: Arun Kumar arun...@samsung.com
 ---
 changes since v1:
  - Modified to add the fallback patch if CONTIG alloc fails, as suggested
    by Rob Clark robdcl...@gmail.com and Tomasz Figa tomasz.f...@gmail.com.

  - changed the commit message.
 ---
  drivers/gpu/drm/exynos/exynos_drm_fbdev.c |   19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c 
 b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
 index 8e60bd6..9a4b886 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
 @@ -16,6 +16,7 @@
  #include <drm/drm_crtc.h>
  #include <drm/drm_fb_helper.h>
  #include <drm/drm_crtc_helper.h>
 +#include <drm/exynos_drm.h>

  #include exynos_drm_drv.h
  #include exynos_drm_fb.h
 @@ -165,11 +166,21 @@ static int exynos_drm_fbdev_create(struct drm_fb_helper 
 *helper,

 size = mode_cmd.pitches[0] * mode_cmd.height;

 -   /* 0 means to allocate physically continuous memory */
 -   exynos_gem_obj = exynos_drm_gem_create(dev, 0, size);
 +   exynos_gem_obj = exynos_drm_gem_create(dev, EXYNOS_BO_CONTIG, size);
 if (IS_ERR(exynos_gem_obj)) {
 -   ret = PTR_ERR(exynos_gem_obj);
 -   goto err_release_framebuffer;
 +   /*
 +* If IOMMU is supported then try to get buffer from
 +* non-continous memory area
 +*/
 +   if (is_drm_iommu_supported(dev))
 +   exynos_gem_obj = exynos_drm_gem_create(dev,
 +   EXYNOS_BO_NONCONTIG, size);
 +   if (IS_ERR(exynos_gem_obj)) {
 +   ret = PTR_ERR(exynos_gem_obj);
 +   goto err_release_framebuffer;
 +   }
 +   dev_warn(&pdev->dev, "exynos_gem_obj for FB is allocated with\n"
 +   "non physically continuous memory\n");
 }

 exynos_fbdev-exynos_gem_obj = exynos_gem_obj;
 --
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

2013-08-05 Thread Rob Clark
On Mon, Aug 5, 2013 at 1:10 PM, Tom Cooksey tom.cook...@arm.com wrote:
 Hi Rob,

 +linux-media, +linaro-mm-sig for discussion of video/camera
 buffer constraints...


 On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey tom.cook...@arm.com
 wrote:
* It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to also
  allocate buffers for the GPU. Still not sure how to resolve
  this as we don't use DRM for our GPU driver.
 
  any thoughts/plans about a DRM GPU driver?  Ideally long term (esp.
  once the dma-fence stuff is in place), we'd have gpu-specific drm
  (gpu-only, no kms) driver, and SoC/display specific drm/kms driver,
  using prime/dmabuf to share between the two.
 
  The extra buffers we were allocating from armsoc DDX were really
  being allocated through DRM/GEM so we could get an flink name
  for them and pass a reference to them back to our GPU driver on
  the client side. If it weren't for our need to access those
  extra off-screen buffers with the GPU we wouldn't need to
  allocate them with DRM at all. So, given they are really GPU
  buffers, it does absolutely make sense to allocate them in a
  different driver to the display driver.
 
  However, to avoid unnecessary memcpys & related cache
  maintenance ops, we'd also like the GPU to render into buffers
  which are scanned out by the display controller. So let's say
  we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
  out buffers with the display's DRM driver but a custom ioctl
  on the GPU's DRM driver to allocate non scanout, off-screen
  buffers. Sounds great, but I don't think that really works
  with DRI2. If we used two drivers to allocate buffers, which
  of those drivers do we return in DRI2ConnectReply? Even if we
  solve that somehow, GEM flink names are name-spaced to a
  single device node (AFAIK). So when we do a DRI2GetBuffers,
  how does the EGL in the client know which DRM device owns GEM
  flink name 1234? We'd need some pretty dirty hacks.

 You would return the name of the display driver allocating the
 buffers.  On the client side you can use generic ioctls to go from
 flink -> handle -> dmabuf.  So the client side would end up opening
 both the display drm device and the gpu, but without needing to know
 too much about the display.

 I think the bit I was missing was that a GEM bo for a buffer imported
 using dma_buf/PRIME can still be flink'd. So the display controller's
 DRM driver allocates scan-out buffers via the DUMB buffer allocate
 ioctl. Those scan-out buffers can then be exported from the
 display's DRM driver and imported into the GPU's DRM driver using
 PRIME. Once imported into the GPU's driver, we can use flink to get a
 name for that buffer within the GPU DRM driver's name-space to return
 to the DRI2 client. That same namespace is also what DRI2 back-buffers
 are allocated from, so I think that could work... Except...


(and.. the general direction is that things will move more to just use
dmabuf directly, ie. wayland or dri3)


  Anyway, that latter case also gets quite difficult. The GPU
  DRM driver would need to know the constraints of the display
  controller when allocating buffers intended to be scanned out.
  For example, pl111 typically isn't behind an IOMMU and so
  requires physically contiguous memory. We'd have to teach the
  GPU's DRM driver about the constraints of the display HW. Not
  exactly a clean driver model. :-(
 
  I'm still a little stuck on how to proceed, so any ideas
  would be greatly appreciated! My current train of thought is
  having a kind of SoC-specific DRM driver which allocates
  buffers for both display and GPU within a single GEM
  namespace. That SoC-specific DRM driver could then know the
  constraints of both the GPU and the display HW. We could then
  use PRIME to export buffers allocated with the SoC DRM driver
  and import them into the GPU and/or display DRM driver.

 Usually if the display drm driver is allocating the buffers that might
 be scanned out, it just needs to have minimal knowledge of the GPU
 (pitch alignment constraints).  I don't think we need a 3rd device
 just to allocate buffers.

 While Mali can render to pretty much any buffer, there is a mild
 performance improvement to be had if the buffer stride is aligned to
 the AXI bus's max burst length when drawing to the buffer.

I suspect the display controllers might frequently benefit if the
pitch is aligned to AXI burst length too..

 So in some respects, there is a constraint on how buffers which will
 be drawn to using the GPU are allocated. I don't really like the idea
 of teaching the display controller DRM driver about the GPU buffer
 constraints, even if they are fairly trivial like this. If the same
 display HW IP is being used on several SoCs, it seems wrong somehow
 to enforce those GPU constraints if some of those SoCs don't have a
 GPU.

Well, I suppose you could get min_pitch_alignment from devicetree, or
something like this..

In the end, the easy solution is just 

Re: [PATCH] drm/exynos: Add check for IOMMU while passing physically contiguous memory flag

2013-08-01 Thread Rob Clark
On Thu, Aug 1, 2013 at 7:20 PM, Tomasz Figa tomasz.f...@gmail.com wrote:
 Hi Vikas,

 On Thursday 01 of August 2013 16:49:32 Vikas Sajjan wrote:
 While trying to get boot-logo up on exynos5420 SMDK which has eDP panel
 connected with resolution 2560x1600, following error occurred even with
 IOMMU enabled:
 [0.88] [drm:lowlevel_buffer_allocate] *ERROR* failed to allocate
 buffer. [0.89] [drm] Initialized exynos 1.0.0 20110530 on minor 0

 This patch fixes the issue by adding a check for IOMMU.

 Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
 Signed-off-by: Arun Kumar arun...@samsung.com
 ---
  drivers/gpu/drm/exynos/exynos_drm_fbdev.c |9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
 b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c index 8e60bd6..2a8
 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
 @@ -16,6 +16,7 @@
  #include <drm/drm_crtc.h>
  #include <drm/drm_fb_helper.h>
  #include <drm/drm_crtc_helper.h>
 +#include <drm/exynos_drm.h>

  #include exynos_drm_drv.h
  #include exynos_drm_fb.h
 @@ -143,6 +144,7 @@ static int exynos_drm_fbdev_create(struct
 drm_fb_helper *helper, struct platform_device *pdev = dev->platformdev;
   unsigned long size;
   int ret;
 + unsigned int flag;

  DRM_DEBUG_KMS("surface width(%d), height(%d) and bpp(%d)\n",
  sizes->surface_width, sizes->surface_height,
 @@ -166,7 +168,12 @@ static int exynos_drm_fbdev_create(struct
 drm_fb_helper *helper, size = mode_cmd.pitches[0] * mode_cmd.height;

   /* 0 means to allocate physically continuous memory */
 - exynos_gem_obj = exynos_drm_gem_create(dev, 0, size);
 + if (!is_drm_iommu_supported(dev))
 + flag = 0;
 + else
 + flag = EXYNOS_BO_NONCONTIG;

 While noncontig memory might be used for devices that support IOMMU, there
 should be no problem with using contig memory for them, so this seems more
 like masking the original problem rather than tracking it down.

it is probably a good idea to not require contig memory when it is not
needed for performance or functionality (and if it is only
performance, then fallback gracefully to non-contig).. but yeah, would
be good to know if this is masking another issue all the same

BR,
-R

 Could you check why the allocation fails when requesting contiguous
 memory?

 Best regards,
 Tomasz



Re: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework

2013-06-25 Thread Rob Clark
On Tue, Jun 25, 2013 at 5:09 AM, Inki Dae daei...@gmail.com wrote:
 that
 should be the role of kernel memory management which of course needs
 synchronization btw A and B. But in no case this should be done using
 dma-buf. dma-buf is for sharing content btw different devices not
 sharing resources.


 hmm, is that true? And are you sure? Then what do you think about
 reservation? The reservation also uses dma-buf for the same reason, as far
 as I know: actually, we use reservation to use dma-buf. As you may know, a
 reservation object is allocated and initialized when a buffer object is
 exported to a dma-buf.

no, this is why the reservation object can be passed in when you
construction the dmabuf.  The fallback is for dmabuf to create it's
own, for compatibility and to make life easier for simple devices with
few buffers... but I think pretty much all drm drivers would embed the
reservation object in the gem buffer and pass it in when the dmabuf is
created.

It is pretty much imperative that synchronization works independently
of dmabuf, you really don't want to have two different cases to deal
with in your driver, one for synchronizing non-exported objects, and
one for synchronizing dmabuf objects.
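The embed-and-pass-in arrangement can be sketched with stand-in types (these are not the kernel structures; the names and the NULL-means-fallback convention are illustrative only). The point is that the driver embeds one reservation per GEM bo and hands that same object to the dmabuf at export, so fences attached through either handle land in one place.

```c
#include <stddef.h>

/* Stand-in for the reservation object. */
struct reservation { int dummy; };

struct dma_buf {
    struct reservation *resv;       /* shared with the exporter... */
    struct reservation fallback;    /* ...or self-owned for simple devices */
};

struct gem_bo {
    struct reservation resv;        /* one reservation per buffer object */
};

/* Export path: hand the bo's own reservation to the dmabuf.  Passing
 * resv == NULL models the compatibility fallback where the dmabuf
 * creates its own. */
void dma_buf_init(struct dma_buf *buf, struct reservation *resv)
{
    buf->resv = resv ? resv : &buf->fallback;
}
```

With this, the driver synchronizes on `bo->resv` regardless of whether the buffer was ever exported, which is exactly the single-code-path property argued for above.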

BR,
-R


Re: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Rob Clark
On Mon, May 27, 2013 at 11:56 PM, Inki Dae inki@samsung.com wrote:


 -Original Message-
 From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
 ow...@vger.kernel.org] On Behalf Of Rob Clark
 Sent: Tuesday, May 28, 2013 12:48 AM
 To: Inki Dae
 Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin
 Park; myungjoo.ham; DRI mailing list;
 linux-arm-ker...@lists.infradead.org;
 linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization

 On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com wrote:
  Hi all,
 
  I have removed the previous branch and added a new one with more cleanup.
  This time, the fence helper doesn't include user-side interfaces and
  cache operation code anymore, because we are not yet sure that coupling
  those two things (synchronizing caches and synchronizing buffer access
  between CPU and CPU, CPU and DMA, and DMA and DMA with fences) in the
  kernel is a good idea, and the existing user-side code also has problems
  with badly behaved or crashing userspace. So this could be discussed
  more later.
 
  The below is a new branch,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-fence-helper
 
  And fence helper codes,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
 
  And example codes for device driver,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
 
  I think the time is not yet ripe for RFC posting: maybe the existing dma
  fence and reservation code needs more review and additional work. So I'd
  be glad for somebody to give other opinions and advice in advance before
  RFC posting.

 thoughts from a *really* quick, pre-coffee, first look:
 * any sort of helper to simplify single-buffer sort of use-cases (v4l)
 probably wouldn't want to bake in assumption that seqno_fence is used.
 * I guess g2d is probably not actually a simple use case, since I
 expect you can submit blits involving multiple buffers :-P

 I don't think so. One or more buffers can be used: seqno_fence also has
 only one buffer. Actually, we have already applied this approach to most
 devices: multimedia, gpu and display controller. And this approach shows
 better performance and reduced power consumption compared to the
 traditional way. The g2d example is just to show you how to apply my
 approach to a device driver.

no, you need the ww-mutex / reservation stuff any time you have
multiple independent devices (or rings/contexts for hw that can
support multiple contexts) which can do operations with multiple
buffers.  So you could conceivably hit this w/ gpu + g2d if multiple
buffers were shared between the two.  vram migration and such
'desktop stuff' might make the problem worse, but just because you
don't have vram doesn't mean you don't have a problem with multiple
buffers.
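The multi-buffer locking referred to here follows the acquire/backoff pattern that was later documented for ww-mutexes. The fragment below is kernel-style pseudocode, not compilable as-is; `unlock_all_held()` is a hypothetical helper, and the restart is simplified (after `ww_mutex_lock_slow()` the contended lock is already held and must be skipped on the retry pass).

```c
struct ww_acquire_ctx ctx;

ww_acquire_init(&ctx, &reservation_ww_class);
retry:
list_for_each_entry(bo, &buffers, head) {
        ret = ww_mutex_lock(&bo->resv.lock, &ctx);
        if (ret == -EDEADLK) {
                /* We lost against an older context: drop everything we
                 * hold, sleep on the contended lock, then start over so
                 * the whole set is taken in a deadlock-free order. */
                unlock_all_held(&buffers, bo);
                ww_mutex_lock_slow(&bo->resv.lock, &ctx);
                goto retry;
        }
}
ww_acquire_done(&ctx);      /* all reservations held, submit the job */
```

This is what makes cross-device submissions involving several shared buffers (the gpu + g2d case above) safe without a global lock.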

 * otherwise, you probably don't want to depend on dmabuf, which is why
 reservation/fence is split out the way it is..  you want to be able to
 use a single reservation/fence mechanism within your driver without
 having to care about which buffers are exported to dmabuf's and which
 are not.  Creating a dmabuf for every GEM bo is too heavyweight.

 Right. But I think we should deal with this separately. Actually, we are
 trying to use reservation for gpu pipeline synchronization such as sgx sync
 object and this approach is used without dmabuf. In other words, some device
 can use only reservation for such pipeline synchronization and at the same
 time, fence helper or similar thing with dmabuf for buffer synchronization.

it is probably easier to approach from the reverse direction.. ie, get
reservation/synchronization right first, and then dmabuf.  (Well, that
isn't really a problem because Maarten's reservation/fence patches
support dmabuf from the beginning.)

BR,
-R


 I'm not entirely sure if reservation/fence could/should be made any
 simpler for multi-buffer users.  Probably the best thing to do is just
 get reservation/fence rolled out in a few drivers and see if some
 common patterns emerge.

 BR,
 -R

 
  Thanks,
  Inki Dae
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-fbdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Rob Clark
On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com wrote:
 Hi all,

 I have removed the previous branch and added a new one with more cleanup.
 This time, the fence helper doesn't include user-side interfaces or cache
 operation code anymore, because we are not yet sure that coupling those two
 things in the kernel (synchronizing caches and buffer access between CPU and
 CPU, CPU and DMA, and DMA and DMA with fences) is a good idea, and also the
 existing user-side code has problems with badly behaved or crashing
 userspace. So this could be discussed more later.

 The below is a new branch,

 https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-fence-helper

 And fence helper codes,

 https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005

 And example codes for device driver,

 https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae

 I think the time is not yet ripe for an RFC posting: maybe the existing dma
 fence and reservation need more review and additional work. So I'd be glad
 for somebody giving opinions and advice in advance before RFC posting.

thoughts from a *really* quick, pre-coffee, first look:
* any sort of helper to simplify single-buffer sort of use-cases (v4l)
probably wouldn't want to bake in assumption that seqno_fence is used.
* I guess g2d is probably not actually a simple use case, since I
expect you can submit blits involving multiple buffers :-P
* otherwise, you probably don't want to depend on dmabuf, which is why
reservation/fence is split out the way it is..  you want to be able to
use a single reservation/fence mechanism within your driver without
having to care about which buffers are exported to dmabuf's and which
are not.  Creating a dmabuf for every GEM bo is too heavyweight.

I'm not entirely sure if reservation/fence could/should be made any
simpler for multi-buffer users.  Probably the best thing to do is just
get reservation/fence rolled out in a few drivers and see if some
common patterns emerge.

BR,
-R


 Thanks,
 Inki Dae



Re: Introduce a new helper framework for buffer synchronization

2013-05-15 Thread Rob Clark
On Wed, May 15, 2013 at 1:19 AM, Inki Dae inki@samsung.com wrote:


 -Original Message-
 From: Rob Clark [mailto:robdcl...@gmail.com]
 Sent: Tuesday, May 14, 2013 10:39 PM
 To: Inki Dae
 Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
 Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization

 On Mon, May 13, 2013 at 10:52 PM, Inki Dae inki@samsung.com wrote:
  well, for cache management, I think it is a better idea.. I didn't
  really catch that this was the motivation from the initial patch, but
  maybe I read it too quickly.  But cache can be decoupled from
  synchronization, because CPU access is not asynchronous.  For
  userspace/CPU access to buffer, you should:
 
1) wait for buffer
2) prepare-access
3)  ... do whatever cpu access to buffer ...
4) finish-access
5) submit buffer for new dma-operation
 
 
 
  For data flow from CPU to DMA device,
  1) wait for buffer
  2) prepare-access (dma_buf_begin_cpu_access)
  3) cpu access to buffer
 
 
  For data flow from DMA device to CPU
  1) wait for buffer

 Right, but CPU access isn't asynchronous (from the point of view of
 the CPU), so there isn't really any wait step at this point.  And if
 you do want the CPU to be able to signal a fence from userspace for
 some reason, you probably want something file/fd based so the
 refcnting/cleanup when process dies doesn't leave some pending DMA
 action wedged.  But I don't really see the point of that complexity
 when the CPU access isn't asynchronous in the first place.


 Some detail was missing from my earlier comments; please see the sequence below.

 For data flow from CPU to DMA device and then from DMA device to CPU,
 1) wait for buffer - at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
 - including prepare-access (dma_buf_begin_cpu_access)
 2) cpu access to buffer
 3) wait for buffer - at device driver
 - but CPU is already accessing the buffer so blocked.
 4) signal - at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
 5) the thread, blocked at 3), is woken up by 4).
 - and then finish-access (dma_buf_end_cpu_access)

right, I understand you can have background threads, etc, in
userspace.  But there are already plenty of synchronization primitives
that can be used for cpu-cpu synchronization, either within the same
process or between multiple processes.  For cpu access, even if it is
handled by background threads/processes, I think it is better to use
the traditional pthreads or unix synchronization primitives.  They
have existed forever, they are well tested, and they work.

So while it seems nice and orthogonal/clean to couple cache and
synchronization and handle dma-cpu, cpu-cpu, and cpu-dma in the
same generic way, I think in practice we have to make things more
complex than they otherwise need to be to do this.  Otherwise I think
we'll be having problems with badly behaved or crashing userspace.

BR,
-R

 6) dma access to buffer
 7) wait for buffer - at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
 - but DMA is already accessing the buffer so blocked.
 8) signal - at device driver
 9) the thread, blocked at 7), is woken up by 8)
 - and then prepare-access (dma_buf_begin_cpu_access)
 10 cpu access to buffer

 Basically, 'wait for buffer' includes buffer synchronization, commit
 processing, and cache operations. Buffer synchronization means that the
 current thread should wait for other threads accessing a shared buffer
 until they have completed their access. And commit processing means that
 the current thread takes possession of the shared buffer, so any attempt
 by another thread to access the shared buffer blocks that thread. However,
 as I already mentioned, these user interfaces still seem rather ugly. So
 we need a better way.

 Give me more comments if there is my missing point :)

 Thanks,
 Inki Dae

 BR,
 -R


  2) finish-access (dma_buf_end_cpu_access)
  3) dma access to buffer
 
  1) and 2) are coupled with one function: we have implemented
  fence_helper_commit_reserve() for it.
 
  Cache control (cache clean or cache invalidate) is performed properly by
  checking the previous access type and the current access type.
  And below is the actual code for it,



Re: Introduce a new helper framework for buffer synchronization

2013-05-14 Thread Rob Clark
On Mon, May 13, 2013 at 10:52 PM, Inki Dae inki@samsung.com wrote:
 well, for cache management, I think it is a better idea.. I didn't
 really catch that this was the motivation from the initial patch, but
 maybe I read it too quickly.  But cache can be decoupled from
 synchronization, because CPU access is not asynchronous.  For
 userspace/CPU access to buffer, you should:

   1) wait for buffer
   2) prepare-access
   3)  ... do whatever cpu access to buffer ...
   4) finish-access
   5) submit buffer for new dma-operation



 For data flow from CPU to DMA device,
 1) wait for buffer
 2) prepare-access (dma_buf_begin_cpu_access)
 3) cpu access to buffer


 For data flow from DMA device to CPU
 1) wait for buffer

Right, but CPU access isn't asynchronous (from the point of view of
the CPU), so there isn't really any wait step at this point.  And if
you do want the CPU to be able to signal a fence from userspace for
some reason, you probably want something file/fd based so the
refcnting/cleanup when process dies doesn't leave some pending DMA
action wedged.  But I don't really see the point of that complexity
when the CPU access isn't asynchronous in the first place.

BR,
-R


 2) finish-access (dma_buf_end_cpu_access)
 3) dma access to buffer

 1) and 2) are coupled with one function: we have implemented
 fence_helper_commit_reserve() for it.

 Cache control (cache clean or cache invalidate) is performed properly by
 checking the previous access type and the current access type.
 And below is the actual code for it,


Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Rob Clark
On Mon, May 13, 2013 at 8:21 AM, Inki Dae inki@samsung.com wrote:

 In that case you still wouldn't give userspace control over the fences. I
 don't see any way that can end well.
 What if userspace never signals? What if userspace gets killed by oom
 killer. Who keeps track of that?


 In all cases, all kernel resources for a user fence will be released by the
 kernel once the fence times out: never signaling, or the process being
 killed by the oom killer, makes the fence time out. And if we use the mmap
 mechanism you mentioned before, I think user resources could also be freed
 properly.


I tend to agree w/ Maarten here.. there is no good reason for
userspace to be *signaling* fences.  The exception might be some blob
gpu drivers which don't have enough knowledge in the kernel to figure
out what to do.  (In which case you can add driver private ioctls for
that.. still not the right thing to do but at least you don't make a
public API out of it.)

BR,
-R


Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Rob Clark
On Mon, May 13, 2013 at 1:18 PM, Inki Dae inki@samsung.com wrote:


 2013/5/13 Rob Clark robdcl...@gmail.com

 On Mon, May 13, 2013 at 8:21 AM, Inki Dae inki@samsung.com wrote:
 
  In that case you still wouldn't give userspace control over the fences.
  I
  don't see any way that can end well.
  What if userspace never signals? What if userspace gets killed by oom
  killer. Who keeps track of that?
 
 
  In all cases, all kernel resources to user fence will be released by
  kernel
  once the fence is timed out: never signaling and process killing by oom
  killer makes the fence timed out. And if we use mmap mechanism you
  mentioned
  before, I think user resource could also be freed properly.


 I tend to agree w/ Maarten here.. there is no good reason for
 userspace to be *signaling* fences.  The exception might be some blob
 gpu drivers which don't have enough knowledge in the kernel to figure
 out what to do.  (In which case you can add driver private ioctls for
 that.. still not the right thing to do but at least you don't make a
 public API out of it.)


 Please do not care whether those are generic or not. Let's look at the
 following three things. First, cache operations. As you know, an ARM SoC
 has the ACP (Accelerator Coherency Port), which can be connected to a DMA
 engine or similar devices, and this port is used for cache coherency
 between the CPU cache and the DMA device. However, most devices on ARM
 based embedded systems don't use the ACP port, so they need proper cache
 operations before and after DMA or CPU access when using cacheable
 mappings. Actually, I see many Linux based platforms call cache control
 interfaces directly for that. I think the reason they do so is that the
 kernel isn't aware of when and how the CPU accessed memory.

I think we had kicked around the idea of giving dmabuf's a
prepare/finish ioctl quite some time back.  This is probably something
that should be at least a bit decoupled from fences.  (Possibly
'prepare' waits for dma access to complete, but not the other way
around.)

And I did implement in omapdrm support for simulating coherency via
page fault-in / shoot-down..  It is one option that makes it
completely transparent to userspace, although there is some
performance cost, so I suppose it depends a bit on your use-case.

 And second, a user process has to do many things when using memory shared
 with a DMA device. The user process has to understand how the DMA device
 is operated and when the interfaces for controlling the DMA device are
 called. Such things make the user application very complicated.

 And third, it's a performance optimization for multimedia and graphics
 devices. As I mentioned already, we should consider sequential processing
 for buffer sharing between the CPU and a DMA device. This means that the
 CPU has to stay idle until the DMA device is done, and vice versa.

 That is why I proposed such user interfaces. Of course, these interfaces
 might still be ugly: Maarten already pointed that out and I agree with
 him. But there must be a better way. Don't you think we need something
 similar? With such interfaces, cache control and buffer synchronization
 can be performed at the kernel level. Moreover, the user application
 doesn't need to handle DMA device control anymore. Therefore, one thread
 can access a shared buffer and another can control the DMA device with
 that shared buffer in parallel. We can really make the best use of CPU and
 DMA idle time; in other words, the best use of a multitasking OS, Linux.

 So could you please tell me whether there is any reason not to use a
 public API for it? I think we can add and use a public API if NECESSARY.

well, for cache management, I think it is a better idea.. I didn't
really catch that this was the motivation from the initial patch, but
maybe I read it too quickly.  But cache can be decoupled from
synchronization, because CPU access is not asynchronous.  For
userspace/CPU access to buffer, you should:

  1) wait for buffer
  2) prepare-access
  3)  ... do whatever cpu access to buffer ...
  4) finish-access
  5) submit buffer for new dma-operation

I suppose you could combine the syscall for #1 and #2.. not sure if
that is a good idea or not.  But you don't need to.  And there is
never really any need for userspace to signal a fence.

BR,
-R

 Thanks,
 Inki Dae


 BR,
 -R
 ___
 dri-devel mailing list
 dri-de...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH] drm/udl: avoid swiotlb for imported vmap buffers.

2013-05-06 Thread Rob Clark
On Mon, May 6, 2013 at 4:44 PM, Daniel Vetter dan...@ffwll.ch wrote:
 On Mon, May 6, 2013 at 9:56 PM, Dave Airlie airl...@gmail.com wrote:
 On Tue, May 7, 2013 at 1:59 AM, Daniel Vetter dan...@ffwll.ch wrote:
 On Mon, May 06, 2013 at 10:35:35AM +1000, Dave Airlie wrote:
 
  However if we don't set a dma mask on the usb device, the mapping
  ends up using swiotlb on machines that have it enabled, which
  is less than desireable.
 
  Signed-off-by: Dave Airlie airl...@redhat.com
 
  Fyi for everyone else who was not on irc when Dave and I discussed this:
  This really shouldn't be required and I think the real issue is that
  udl creates a dma_buf attachement (which is needed for device dma
  only), but only really wants to do cpu access through vmap/kmap. So
  not attaching the device should be good enough. Cc'ing a few more lists
  for better fyi ;-)

 Though I've looked at this a bit more, and since I want to be able to 
 expose
 shared objects as proper GEM objects from the import side I really
 need that list of pages.

 Hm, what does proper GEM object mean in the context of udl?

 One that appears the same as a GEM object created by userspace. i.e. mmap 
 works.

 Oh, we have an mmap interface in the dma_buf thing for that, and iirc
 Rob Clark even bothered to implement the gem->dma_buf mmap forwarding
 somewhere. And iirc android's ion-on-dma_buf stuff is even using the
 mmap interface stuff.

fwiw, what I did was dma_buf -> gem mmap fwding, ie. implement mmap
for the dmabuf object by fwd'ing it to my normal gem mmap code.  Might
be the opposite of what you are looking for here.  Although I suppose
the reverse could work too, I hadn't really thought about it yet.

BR,
-R

 Now for prime let's just ship this, dammit prevailed for now. But I
 still think that hiding the backing storage a bit better (with the
 eventual goal of supporting eviction with Maarten's fence/ww_mutex
 madness) feels like a worthy long-term goal.

 Cheers, Daniel
 --
 Daniel Vetter
 Software Engineer, Intel Corporation
 +41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [RFC v2 0/5] Common Display Framework

2013-02-02 Thread Rob Clark
On Fri, Feb 1, 2013 at 5:42 PM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 Hi Rahul,

 On Wednesday 09 January 2013 13:53:30 Rahul Sharma wrote:
 Hi Laurent,

 CDF will also be helpful in supporting Panels with integrated audio
 (HDMI/DP) if we can add audio related control operations to
 display_entity_control_ops. Video controls will be called by crtc in DRM/V4L
 and audio controls from Alsa.

 I knew that would come up at some point :-) I agree with you that adding audio
 support would be a very nice improvement, and I'm totally open to that, but I
 will concentrate on video, at least to start with. The first reason is that
 I'm not familiar enough with ALSA, and the second that there's only 24h per
 day :-)

 Please feel free, of course, to submit a proposal for audio support.

 Secondly, if I need to support get_modes operation in hdmi/dp panel, I need
 to implement edid parser inside the panel driver. It will be meaningful to
 add get_edid control operation for hdmi/dp.

 Even if EDID data is parsed in the panel driver, raw EDID will still need to
 be exported, so a get_edid control operation (or something similar) is
 definitely needed. There's no disagreement on this, I just haven't included
 that operation yet because my test hardware is purely panel-based.

one of (probably many) places where just keeping CDF (CDH? common
display helpers..) inside DRM makes life easier :-P

BR,
-R

 --
 Regards,

 Laurent Pinchart



Re: [PATCH 2/7] mutex: add support for reservation style locks

2013-01-31 Thread Rob Clark
On Wed, Jan 30, 2013 at 5:52 AM, Rob Clark robdcl...@gmail.com wrote:
 On Wed, Jan 30, 2013 at 5:08 AM, Daniel Vetter dan...@ffwll.ch wrote:
 On Wed, Jan 30, 2013 at 2:07 AM, Rob Clark robdcl...@gmail.com wrote:
 ==
 Basic problem statement:
 ------------------------
 GPU's do operations that commonly involve many buffers.  Those buffers
 can be shared across contexts/processes, exist in different memory
 domains (for example VRAM vs system memory), and so on.  And with
 PRIME / dmabuf, they can even be shared across devices.  So there are
 a handful of situations where the driver needs to wait for buffers to
 become ready.  If you think about this in terms of waiting on a buffer
 mutex for it to become available, this presents a problem because
 there is no way to guarantee that buffers appear in a execbuf/batch in
 the same order in all contexts.  That is directly under control of
 userspace, and a result of the sequence of GL calls that an
 application makes.  Which results in the potential for deadlock.  The
 problem gets more complex when you consider that the kernel may need
 to migrate the buffer(s) into VRAM before the GPU operates on the
 buffer(s), which may in turn require evicting some other buffers (and
 you don't want to evict other buffers which are already queued up to
 the GPU), but for a simplified understanding of the problem you can
 ignore this.

 The algorithm that TTM came up with for dealing with this problem is
 quite simple.  For each group of buffers (execbuf) that need to be
 locked, the caller would be assigned a unique reservation_id, from a
 global counter.  In case of deadlock in the process of locking all the
 buffers associated with a execbuf, the one with the lowest
 reservation_id wins, and the one with the higher reservation_id
 unlocks all of the buffers that it has already locked, and then tries
 again.

 Originally TTM implemented this algorithm on top of an event-queue and
 atomic-ops, but Maarten Lankhorst realized that by merging this with
 the mutex code we could take advantage of the existing mutex fast-path
 code and result in a simpler solution, and so ticket_mutex was born.
 (Well, there were also some additional complexities with the original
 implementation when you start adding in cross-device buffer sharing
 for PRIME.. Maarten could probably better explain.)

 I think the motivational writeup above is really nice, but the example
 code below is a bit wrong

 How it is used:
 ---------------

 A very simplified version:

   int submit_execbuf(execbuf)
   {
   /* acquiring locks, before queuing up to GPU: */
   seqno = assign_global_seqno();
   retry:
    for (buf in execbuf->buffers) {
    ret = mutex_reserve_lock(buf->lock, seqno);
   switch (ret) {
   case 0:
   /* we got the lock */
   break;
   case -EAGAIN:
   /* someone with a lower seqno, so unreserve and try again: */
    for (buf2 in reverse order starting before buf in
  execbuf->buffers)
    mutex_unreserve_unlock(buf2->lock);
   goto retry;
   default:
   goto err;
   }
   }

   /* now everything is good to go, submit job to GPU: */
   ...
   }

   int finish_execbuf(execbuf)
   {
   /* when GPU is finished: */
    for (buf in execbuf->buffers)
    mutex_unreserve_unlock(buf->lock);
   }
 ==

 Since gpu command submission is all async (hopefully at least) we
 don't unlock once it completes, but right away after the commands are
 submitted. Otherwise you wouldn't be able to submit new execbufs using
 the same buffer objects (and besides, holding locks while going back
 out to userspace is evil).

 right.. but I was trying to simplify the explanation for non-gpu
 folk.. maybe that was an over-simplification ;-)


Ok, a bit expanded version.. I meant to send this yesterday, but I forgot..


Basic problem statement:
------------------------

GPU's do operations that commonly involve many buffers.  Those buffers
can be shared across contexts/processes, exist in different memory
domains (for example VRAM vs system memory), and so on.  And with
PRIME / dmabuf, they can even be shared across devices.  So there are
a handful of situations where the driver needs to wait for buffers to
become ready.  If you think about this in terms of waiting on a buffer
mutex for it to become available, this presents a problem because
there is no way to guarantee that buffers appear in a execbuf/batch in
the same order in all contexts.  That is directly under control of
userspace, and a result of the sequence of GL calls that an application
makes.  Which results in the potential for deadlock.  The problem gets
more complex when you consider that the kernel may need to migrate the
buffer(s) into VRAM before the GPU operates on the buffer(s), which
may in turn require evicting some

Re: [PATCH 2/7] mutex: add support for reservation style locks

2013-01-30 Thread Rob Clark
On Wed, Jan 30, 2013 at 5:08 AM, Daniel Vetter dan...@ffwll.ch wrote:
 On Wed, Jan 30, 2013 at 2:07 AM, Rob Clark robdcl...@gmail.com wrote:
 ==
 Basic problem statement:
 ------------------------
 GPU's do operations that commonly involve many buffers.  Those buffers
 can be shared across contexts/processes, exist in different memory
 domains (for example VRAM vs system memory), and so on.  And with
 PRIME / dmabuf, they can even be shared across devices.  So there are
 a handful of situations where the driver needs to wait for buffers to
 become ready.  If you think about this in terms of waiting on a buffer
 mutex for it to become available, this presents a problem because
 there is no way to guarantee that buffers appear in a execbuf/batch in
 the same order in all contexts.  That is directly under control of
 userspace, and a result of the sequence of GL calls that an
 application makes.  Which results in the potential for deadlock.  The
 problem gets more complex when you consider that the kernel may need
 to migrate the buffer(s) into VRAM before the GPU operates on the
 buffer(s), which may in turn require evicting some other buffers (and
 you don't want to evict other buffers which are already queued up to
 the GPU), but for a simplified understanding of the problem you can
 ignore this.

 The algorithm that TTM came up with for dealing with this problem is
 quite simple.  For each group of buffers (execbuf) that need to be
 locked, the caller would be assigned a unique reservation_id, from a
 global counter.  In case of deadlock in the process of locking all the
 buffers associated with a execbuf, the one with the lowest
 reservation_id wins, and the one with the higher reservation_id
 unlocks all of the buffers that it has already locked, and then tries
 again.

 Originally TTM implemented this algorithm on top of an event-queue and
 atomic-ops, but Maarten Lankhorst realized that by merging this with
 the mutex code we could take advantage of the existing mutex fast-path
 code and result in a simpler solution, and so ticket_mutex was born.
 (Well, there were also some additional complexities with the original
 implementation when you start adding in cross-device buffer sharing
 for PRIME.. Maarten could probably better explain.)

 I think the motivational writeup above is really nice, but the example
 code below is a bit wrong

 How it is used:
 ---------------

 A very simplified version:

   int submit_execbuf(execbuf)
   {
   /* acquiring locks, before queuing up to GPU: */
   seqno = assign_global_seqno();
   retry:
    for (buf in execbuf->buffers) {
    ret = mutex_reserve_lock(buf->lock, seqno);
   switch (ret) {
   case 0:
   /* we got the lock */
   break;
   case -EAGAIN:
   /* someone with a lower seqno, so unreserve and try again: */
    for (buf2 in reverse order starting before buf in
  execbuf->buffers)
    mutex_unreserve_unlock(buf2->lock);
   goto retry;
   default:
   goto err;
   }
   }

   /* now everything is good to go, submit job to GPU: */
   ...
   }

   int finish_execbuf(execbuf)
   {
   /* when GPU is finished: */
    for (buf in execbuf->buffers)
    mutex_unreserve_unlock(buf->lock);
   }
 ==

 Since gpu command submission is all async (hopefully at least) we
 don't unlock once it completes, but right away after the commands are
 submitted. Otherwise you wouldn't be able to submit new execbufs using
 the same buffer objects (and besides, holding locks while going back
 out to userspace is evil).

right.. but I was trying to simplify the explanation for non-gpu
folk.. maybe that was an over-simplification ;-)

BR,
-R

 The trick is to add a fence object for async operation (essentially a
 waitqueue on steroids to support gpu-gpu direct signalling). And
 updating fences for a given execbuf needs to happen atomically for all
 buffers, for otherwise userspace could trick the kernel into creating
 a circular fence chain. This wouldn't deadlock the kernel, since
 everything is async, but it'll nicely deadlock the gpus involved.
 Hence why we need ticketing locks to get dma_buf fences off the
 ground.

 Maybe wait for Maarten's feedback, then update your motivational blurb a bit?

 Cheers, Daniel
 --
 Daniel Vetter
 Software Engineer, Intel Corporation
 +41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH 2/7] mutex: add support for reservation style locks

2013-01-29 Thread Rob Clark
On Tue, Jan 15, 2013 at 6:33 AM, Maarten Lankhorst
m.b.lankho...@gmail.com wrote:


Hi Maarten,

This is a nice looking extension to avoid re-implementing a mutex in
TTM/reservation code..  ofc, probably someone more familiar with mutex
code should probably review, but probably a bit of explanation about
what and why would be helpful.

 mutex_reserve_lock, and mutex_reserve_lock_interruptible:
   Lock a buffer with a reservation_id set. reservation_id must not be set
   to 0, since this is a special value that means no reservation_id.

   Normally if reservation_id is not set, or is older than the
   reservation_id that's currently set on the mutex, the behavior will be
   to wait normally.

   However, if the reservation_id is newer than the current reservation_id,
   -EAGAIN will be returned, and this function must unreserve all other
   mutexes and then redo a blocking lock with normal mutex calls to prevent
   a deadlock, then call mutex_locked_set_reservation on successful locking
   to set the reservation_id inside the lock.

It might be a bit more clear to write up how this works from the
perspective of the user of ticket_mutex, separately from the internal
implementation first, and then how it works internally?  Ie, the
mutex_set_reservation_fastpath() call is internal to the
implementation of ticket_mutex, but -EAGAIN is something the caller of
ticket_mutex shall deal with.  This might give a clearer picture of
how TTM / reservation uses this to prevent deadlock, so those less
familiar with TTM could better understand.

Well, here is an attempt to start a write-up, which should perhaps
eventually be folded into Documentation/ticket-mutex-design.txt.  But
hopefully a better explanation of the problem and the solution will
encourage some review of the ticket_mutex changes.

==
Basic problem statement:
------------------------
GPU's do operations that commonly involve many buffers.  Those buffers
can be shared across contexts/processes, exist in different memory
domains (for example VRAM vs system memory), and so on.  And with
PRIME / dmabuf, they can even be shared across devices.  So there are
a handful of situations where the driver needs to wait for buffers to
become ready.  If you think about this in terms of waiting on a buffer
mutex for it to become available, this presents a problem because
there is no way to guarantee that buffers appear in a execbuf/batch in
the same order in all contexts.  That is directly under control of
userspace, and a result of the sequence of GL calls that an
application makes.  Which results in the potential for deadlock.  The
problem gets more complex when you consider that the kernel may need
to migrate the buffer(s) into VRAM before the GPU operates on the
buffer(s), which may in turn require evicting some other buffers (and
you don't want to evict other buffers which are already queued up to
the GPU), but for a simplified understanding of the problem you can
ignore this.

The algorithm that TTM came up with for dealing with this problem is
quite simple.  For each group of buffers (execbuf) that need to be
locked, the caller would be assigned a unique reservation_id, from a
global counter.  In case of deadlock in the process of locking all the
buffers associated with a execbuf, the one with the lowest
reservation_id wins, and the one with the higher reservation_id
unlocks all of the buffers that it has already locked, and then tries
again.

Originally TTM implemented this algorithm on top of an event-queue and
atomic-ops, but Maarten Lankhorst realized that by merging this with
the mutex code we could take advantage of the existing mutex fast-path
code and result in a simpler solution, and so ticket_mutex was born.
(Well, there were also some additional complexities with the original
implementation when you start adding in cross-device buffer sharing
for PRIME.. Maarten could probably better explain.)

How it is used:
---------------

A very simplified version:

  int submit_execbuf(execbuf)
  {
      /* acquiring locks, before queuing up to GPU: */
      seqno = assign_global_seqno();
  retry:
      for (buf in execbuf->buffers) {
          ret = mutex_reserve_lock(buf->lock, seqno);
          switch (ret) {
          case 0:
              /* we got the lock */
              break;
          case -EAGAIN:
              /* someone with a lower seqno, so unreserve and try again: */
              for (buf2 in reverse order starting before buf in execbuf->buffers)
                  mutex_unreserve_unlock(buf2->lock);
              goto retry;
          default:
              goto err;
          }
      }

      /* now everything is good to go, submit job to GPU: */
      ...
  }

  int finish_execbuf(execbuf)
  {
      /* when GPU is finished: */
      for (buf in execbuf->buffers)
          mutex_unreserve_unlock(buf->lock);
  }
==

anyways, for the rest of the patch, I'm still going through the
mutex/ticket_mutex 

Re: [RFC v2 0/5] Common Display Framework

2013-01-08 Thread Rob Clark
On Tue, Jan 8, 2013 at 2:25 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 Hi Rob,

 On Thursday 27 December 2012 09:54:55 Rob Clark wrote:
 What I've done to avoid that so far is that the master device registers the
 drivers for it's output sub-devices before registering it's own device.

 I'm not sure to follow you here. The master device doesn't register anything,
 do you mean the master device driver ? If so, how does the master device
 driver register its own device ? Devices are not registered by their driver.

sorry, that should have read "master driver registers drivers for its
sub-devices"..

BR,
-R
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv16 5/7] fbmon: add of_videomode helpers

2013-01-07 Thread Rob Clark
On Mon, Jan 7, 2013 at 2:46 AM, Mohammed, Afzal af...@ti.com wrote:
 Hi Steffen,

 On Mon, Jan 07, 2013 at 13:36:48, Steffen Trumtrar wrote:
 On Mon, Jan 07, 2013 at 06:10:13AM +, Mohammed, Afzal wrote:

  This breaks DaVinci (da8xx_omapl_defconfig), following change was
  required to get it build if OF_VIDEOMODE or/and FB_MODE_HELPERS
  is not defined. There may be better solutions, following was the
  one that was used by me to test this series.

 I just did a quick make da8xx_omapl_defconfig && make and it builds just fine.
 On what version did you apply the series?
 At the moment I have the series sitting on 3.7. Didn't try any 3.8-rcx yet.
 But fixing this shouldn't be a problem.

 You are right, me idiot, error will happen only upon try to make use of
 of_get_fb_videomode() (defined in this patch) in the da8xx-fb driver
 (with da8xx_omapl_defconfig), to be exact upon adding,

 video: da8xx-fb: obtain fb_videomode info from dt of my patch series.

 The change as I mentioned or something similar would be required as
 any driver that is going to make use of of_get_fb_videomode() would
 break if CONFIG_OF_VIDEOMODE or CONFIG_FB_MODE_HELPERS is not defined.

Shouldn't the driver that depends on CONFIG_OF_VIDEOMODE and
CONFIG_FB_MODE_HELPERS, explicitly select them?  I don't really see
the point of having the static-inline fallbacks.

fwiw, using 'select' is what I was doing for lcd panel support for
lcdc/da8xx drm driver (which was using the of videomode helpers,
albeit a slightly earlier version of the patches):

https://github.com/robclark/kernel-omap4/commit/e2aef5f281348afaaaeaa132699efc2831aa8384

BR,
-R


 And testing was done over v3.8-rc2.

   +#if IS_ENABLED(CONFIG_OF_VIDEOMODE)
 
  As _OF_VIDEOMODE is a bool type CONFIG, isn't,
 
  #ifdef CONFIG_OF_VIDEOMODE
 
  sufficient ?
 

 Yes, that is right. But I think IS_ENABLED is the preferred way to do it, 
 isn't it?

 Now I realize it is.

 Regards
 Afzal


Re: [RFC v2 0/5] Common Display Framework

2012-12-27 Thread Rob Clark
On Mon, Dec 24, 2012 at 7:37 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 Hi Rob,

 On Tuesday 18 December 2012 00:21:32 Rob Clark wrote:
 On Mon, Dec 17, 2012 at 11:04 PM, Dave Airlie airl...@gmail.com wrote:
  Many developers showed interest in the first RFC, and I've had the
  opportunity to discuss it with most of them. I would like to thank (in
 no particular order) Tomi Valkeinen for all the time he spent helping me
  to draft v2, Marcus Lorentzon for his useful input during Linaro Connect
  Q4 2012, and Linaro for inviting me to Connect and providing a venue to
  discuss this topic.
 
  So this might be a bit off topic but this whole CDF triggered me
  looking at stuff I generally avoid:
 
  The biggest problem I'm having currently with the whole ARM graphics
  and output world is the proliferation of platform drivers for every
  little thing. The whole ordering of operations with respect to things
  like suspend/resume or dynamic power management is going to be a real
  nightmare if there are dependencies between the drivers. How do you
  enforce ordering of s/r operations between all the various components?

 I tend to think that sub-devices are useful just to have a way to probe hw
 which may or may not be there, since on ARM we often don't have any
 alternative.. but beyond that, suspend/resume, and other life-cycle aspects,
 they should really be treated as all one device. Especially to avoid
 undefined suspend/resume ordering.

 I tend to agree, except that I try to reuse the existing PM infrastructure
 when possible to avoid reinventing the wheel. So far handling suspend/resume
 ordering related to data busses in early suspend/late resume operations and
 allowing the Linux PM core to handle control busses using the Linux device
 tree worked pretty well.

 CDF or some sort of mechanism to share panel drivers between drivers is
 useful.  Keeping it within drm, is probably a good idea, if nothing else to
 simplify re-use of helper fxns (like avi-infoframe stuff, for example) and
 avoid dealing with merging changes across multiple trees. Treating them more
 like shared libraries and less like sub-devices which can be dynamically
 loaded/unloaded (ie. they should be not built as separate modules or
 suspend/resumed or probed/removed independently of the master driver) is a
 really good idea to avoid uncovering nasty synchronization issues later
 (remove vs modeset or pageflip) or surprising userspace in bad ways.

 We've tried that in V4L2 years ago and realized that the approach led to a
 dead-end, especially when OF/DT got involved. With DT-based device probing,
 I2C camera sensors started getting probed asynchronously to the main camera
 device, as they are children of the I2C bus master. We will have similar
 issues with I2C HDMI transmitters or panels, so we should be prepared for it.

What I've done to avoid that so far is that the master device
registers the drivers for its output sub-devices before registering
its own device.  At least this way I can control that they are probed
first.  Not the prettiest thing, but avoids even uglier problems.

 On PC hardware the I2C devices are connected to an I2C master provided by the
 GPU, but on embedded devices they are usually connected to an independent I2C
 master. We thus can't have a single self-contained driver that controls
 everything internally, and need to interface with the rest of the SoC drivers.

 I agree that probing/removing devices independently of the master driver can
 lead to bad surprises, which is why I want to establish clear rules in CDF
 regarding what can and can't be done with display entities. Reference counting
 will be one way to make sure that devices don't disappear all of a sudden.

That at least helps cover some issues.. although it doesn't really
help userspace confusion.

Anyways, with enough work perhaps all problems could be solved..
otoh, there are plenty of other important problems to solve in the
world of gpus and kms, so my preference is always not to needlessly
over-complicate CDF and instead leave some time for other things

BR,
-R

  The other thing I'd like you guys to do is kill the idea of fbdev and
  v4l drivers that are shared with the drm codebase, really just
  implement fbdev and v4l on top of the drm layer, some people might
  think this is some sort of maintainer thing, but really nothing else
  makes sense, and having these shared display frameworks just to avoid
  having using drm/kms drivers seems totally pointless. Fix the drm
  fbdev emulation if an fbdev interface is needed. But creating a fourth
  framework because our previous 3 frameworks didn't work out doesn't
  seem like a situation I want to get behind too much.

 yeah, let's not have multiple frameworks to do the same thing.. For fbdev,
 it is pretty clear that it is a dead end.  For v4l2 (subdev+mcf), it is
 perhaps bit more flexible when it comes to random arbitrary hw pipelines
 than kms.  But to take

Re: [RFC v2 0/5] Common Display Framework

2012-12-27 Thread Rob Clark
On Mon, Dec 24, 2012 at 11:09 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 On the topic of discussions, would anyone be interested in a
 BoF/brainstorming/whatever session during the FOSDEM ?

I will be at FOSDEM.. and from http://wiki.x.org/wiki/fosdem2013 it
looks like at least Daniel will be there.  If enough others are, it
could be a good idea.

BR,
-R


Re: [RFC v2 0/5] Common Display Framework

2012-12-27 Thread Rob Clark
On Mon, Dec 24, 2012 at 11:27 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 On Wednesday 19 December 2012 16:57:56 Jani Nikula wrote:
 It just seems to me that, at least from a DRM/KMS perspective, adding
 another layer (=CDF) for HDMI or DP (or legacy outputs) would be
 overengineering it. They are pretty well standardized, and I don't see there
 would be a need to write multiple display drivers for them. Each display
 controller has one, and can easily handle any chip specific requirements
 right there. It's my gut feeling that an additional framework would just get
 in the way. Perhaps there could be more common HDMI/DP helper style code in
 DRM to reduce overlap across KMS drivers, but that's another thing.

 So is the HDMI/DP drivers using CDF a more interesting idea from a non-DRM
 perspective? Or, put another way, is it more of an alternative to using DRM?
 Please enlighten me if there's some real benefit here that I fail to see!

 As Rob pointed out, you can have external HDMI/DP encoders, and even internal
 HDMI/DP encoder IPs can be shared between SoCs and SoC vendors. CDF aims at
 sharing a single driver between SoCs and boards for a given HDMI/DP encoder.

just fwiw, drm already has something a bit like this.. the i2c
encoder-slave.  With support for a couple external i2c encoders which
could in theory be shared between devices.

BR,
-R


Re: [RFC v2 0/5] Common Display Framework

2012-12-27 Thread Rob Clark
On Mon, Dec 24, 2012 at 11:35 AM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
 On Wednesday 19 December 2012 09:26:40 Rob Clark wrote:
 And, there are also external HDMI encoders (for example connected over
 i2c) that can also be shared between boards.  So I think there will be
 a number of cases where CDF is appropriate for HDMI drivers.  Although
 trying to keep this all independent of DRM (as opposed to just
 something similar to what drivers/gpu/i2c is today) seems a bit
 overkill for me.  Being able to use the helpers in drm and avoiding an
 extra layer of translation seems like the better option to me.  So my
 vote would be drivers/gpu/cdf.

 I don't think there will be any need for translation (except perhaps between
 the DRM mode structures and the common video mode structure that is being
 discussed). Add a drm_ prefix to the existing CDF functions and structures,
 and there you go :-)

well, and translation for any properties that we'd want to expose to
userspace, etc, etc.. I see there being a big potential for a lot of
needless glue

BR,
-R

 The reason why I'd like to keep CDF separate from DRM (or at least not
 requiring a drm_device) is that HDMI/DP encoders can be used by pure V4L2
 drivers.

  For DSI panels (or DSI-to-whatever bridges) it's of course another
  story. You typically need a panel specific driver. And here I see the
  main point of the whole CDF: decoupling display controllers and the
  panel drivers, and sharing panel (and converter chip) specific drivers
  across display controllers. Making it easy to write new drivers, as
  there would be a model to follow. I'm definitely in favour of coming up
  with some framework that would tackle that.

 --
 Regards,

 Laurent Pinchart



Re: [RFC v2 0/5] Common Display Framework

2012-12-27 Thread Rob Clark
On Thu, Dec 27, 2012 at 1:18 PM, Sascha Hauer s.ha...@pengutronix.de wrote:
 On Thu, Dec 27, 2012 at 09:54:55AM -0600, Rob Clark wrote:
 On Mon, Dec 24, 2012 at 7:37 AM, Laurent Pinchart
 laurent.pinch...@ideasonboard.com wrote:
  Hi Rob,
 
  On Tuesday 18 December 2012 00:21:32 Rob Clark wrote:
  On Mon, Dec 17, 2012 at 11:04 PM, Dave Airlie airl...@gmail.com wrote:
   Many developers showed interest in the first RFC, and I've had the
   opportunity to discuss it with most of them. I would like to thank (in
   no particular order) Tomi Valkeinen for all the time he spent helping me
   to draft v2, Marcus Lorentzon for his useful input during Linaro Connect
   Q4 2012, and Linaro for inviting me to Connect and providing a venue to
   discuss this topic.
  
   So this might be a bit off topic but this whole CDF triggered me
   looking at stuff I generally avoid:
  
   The biggest problem I'm having currently with the whole ARM graphics
   and output world is the proliferation of platform drivers for every
   little thing. The whole ordering of operations with respect to things
   like suspend/resume or dynamic power management is going to be a real
   nightmare if there are dependencies between the drivers. How do you
   enforce ordering of s/r operations between all the various components?
 
  I tend to think that sub-devices are useful just to have a way to probe hw
  which may or may not be there, since on ARM we often don't have any
  alternative.. but beyond that, suspend/resume, and other life-cycle 
  aspects,
  they should really be treated as all one device. Especially to avoid
  undefined suspend/resume ordering.
 
  I tend to agree, except that I try to reuse the existing PM infrastructure
  when possible to avoid reinventing the wheel. So far handling 
  suspend/resume
  ordering related to data busses in early suspend/late resume operations and
  allowing the Linux PM core to handle control busses using the Linux device
  tree worked pretty well.
 
  CDF or some sort of mechanism to share panel drivers between drivers is
  useful.  Keeping it within drm, is probably a good idea, if nothing else 
  to
  simplify re-use of helper fxns (like avi-infoframe stuff, for example) and
  avoid dealing with merging changes across multiple trees. Treating them 
  more
  like shared libraries and less like sub-devices which can be dynamically
  loaded/unloaded (ie. they should be not built as separate modules or
  suspend/resumed or probed/removed independently of the master driver) is a
  really good idea to avoid uncovering nasty synchronization issues later
  (remove vs modeset or pageflip) or surprising userspace in bad ways.
 
  We've tried that in V4L2 years ago and realized that the approach led to a
  dead-end, especially when OF/DT got involved. With DT-based device probing,
  I2C camera sensors started getting probed asynchronously to the main camera
  device, as they are children of the I2C bus master. We will have similar
  issues with I2C HDMI transmitters or panels, so we should be prepared for 
  it.

 What I've done to avoid that so far is that the master device
 registers the drivers for its output sub-devices before registering
 its own device.  At least this way I can control that they are probed
 first.  Not the prettiest thing, but avoids even uglier problems.

 This implies that the master driver knows all potential subdevices,
 something which is not true for SoCs which have external i2c encoders
 attached to unrelated i2c controllers.

well, it can be brute-forced..  ie. drm driver calls common
register_all_panels() fxn, which, well, registers all the
panel/display subdev's based on their corresponding CONFIG_FOO_PANEL
defines.  If you anyways aren't building the panels as separate
modules, that would work.  Maybe not the most *elegant* approach, but
simple and functional.

I guess it partly depends on the structure in devicetree.  If you are
assuming that the i2c encoder belongs inside the i2c bus, like:

  i2cN {
    foo-i2c-encoder {
      ...
    };
  };

and you are letting devicetree create the devices, then it doesn't
quite work.  I'm not entirely convinced you should do it that way.
Really any device like that is going to be hooked up to at least a
couple busses..  i2c, some sort of bus carrying pixel data, maybe some
gpio's, etc.  So maybe makes more sense for a virtual drm/kms bus, and
then use phandle stuff to link it to the various other busses it
needs:

  mydrmdev {
    foo-i2c-encoder {
       i2c = <&i2cN>;
       gpio = <&gpioM 2 3>;
       ...
    };
  };

ok, admittedly that is a bit different from other proposals about how
this all fits in devicetree.. but otoh, I'm not a huge believer in
letting something that is supposed to make life easier (DT), actually
make things harder or more complicated.  Plus this CDF stuff all needs
to also work on platforms not using OF/DT.

BR,
-R

 Sascha

 --
 Pengutronix e.K

Re: [PATCH] [RFC] dma-buf: implement vmap refcounting in the interface logic

2012-12-20 Thread Rob Clark
On Thu, Dec 20, 2012 at 7:14 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
 All drivers which implement this need to have some sort of refcount to
 allow concurrent vmap usage. Hence implement this in the dma-buf core.

 To protect against concurrent calls we need a lock, which potentially
 causes new funny locking inversions. But this shouldn't be a problem
 for exporters with statically allocated backing storage, and more
 dynamic drivers have decent issues already anyway.

 Inspired by some refactoring patches from Aaron Plattner, who
 implemented the same idea, but only for drm/prime drivers.

 v2: Check in dma_buf_release that no dangling vmaps are left.
 Suggested by Aaron Plattner. We might want to do similar checks for
 attachments, but that's for another patch. Also fix up ERR_PTR return
 for vmap.

 v3: Check whether the passed-in vmap address matches with the cached
 one for vunmap. Eventually we might want to remove that parameter -
 compared to the kmap functions there's no need for the vaddr for
 unmapping.  Suggested by Chris Wilson.

 v4: Fix a brown-paper-bag bug spotted by Aaron Plattner.

yeah, I think the locking does worry me a bit but hopefully should be
able to do something better when reservations land

Signed-off-by: Rob Clark r...@ti.com

 Cc: Aaron Plattner aplatt...@nvidia.com
 Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
 ---
  Documentation/dma-buf-sharing.txt |  6 +-
 drivers/base/dma-buf.c            | 43 ++-
  include/linux/dma-buf.h   |  4 +++-
  3 files changed, 46 insertions(+), 7 deletions(-)

 diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
 index 0188903..4966b1b 100644
 --- a/Documentation/dma-buf-sharing.txt
 +++ b/Documentation/dma-buf-sharing.txt
 @@ -302,7 +302,11 @@ Access to a dma_buf from the kernel context involves three steps:
void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)

 The vmap call can fail if there is no vmap support in the exporter, or if it
 -   runs out of vmalloc space. Fallback to kmap should be implemented.
 +   runs out of vmalloc space. Fallback to kmap should be implemented. Note that
 +   the dma-buf layer keeps a reference count for all vmap access and calls down
 +   into the exporter's vmap function only when no vmapping exists, and only
 +   unmaps it once. Protection against concurrent vmap/vunmap calls is provided
 +   by taking the dma_buf->lock mutex.

  3. Finish access

 diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
 index a3f79c4..26b68de 100644
 --- a/drivers/base/dma-buf.c
 +++ b/drivers/base/dma-buf.c
 @@ -39,6 +39,8 @@ static int dma_buf_release(struct inode *inode, struct file *file)

 	dmabuf = file->private_data;

 +	BUG_ON(dmabuf->vmapping_counter);
 +
 	dmabuf->ops->release(dmabuf);
 	kfree(dmabuf);
 	return 0;
 @@ -482,12 +484,34 @@ EXPORT_SYMBOL_GPL(dma_buf_mmap);
   */
  void *dma_buf_vmap(struct dma_buf *dmabuf)
  {
 +	void *ptr;
 +
 	if (WARN_ON(!dmabuf))
 		return NULL;

 -	if (dmabuf->ops->vmap)
 -		return dmabuf->ops->vmap(dmabuf);
 -	return NULL;
 +	if (!dmabuf->ops->vmap)
 +		return NULL;
 +
 +	mutex_lock(&dmabuf->lock);
 +	if (dmabuf->vmapping_counter) {
 +		dmabuf->vmapping_counter++;
 +		BUG_ON(!dmabuf->vmap_ptr);
 +		ptr = dmabuf->vmap_ptr;
 +		goto out_unlock;
 +	}
 +
 +	BUG_ON(dmabuf->vmap_ptr);
 +
 +	ptr = dmabuf->ops->vmap(dmabuf);
 +	if (IS_ERR_OR_NULL(ptr))
 +		goto out_unlock;
 +
 +	dmabuf->vmap_ptr = ptr;
 +	dmabuf->vmapping_counter = 1;
 +
 +out_unlock:
 +	mutex_unlock(&dmabuf->lock);
 +	return ptr;
  }
  EXPORT_SYMBOL_GPL(dma_buf_vmap);

 @@ -501,7 +525,16 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
 	if (WARN_ON(!dmabuf))
 		return;

 -	if (dmabuf->ops->vunmap)
 -		dmabuf->ops->vunmap(dmabuf, vaddr);
 +	BUG_ON(!dmabuf->vmap_ptr);
 +	BUG_ON(dmabuf->vmapping_counter == 0);
 +	BUG_ON(dmabuf->vmap_ptr != vaddr);
 +
 +	mutex_lock(&dmabuf->lock);
 +	if (--dmabuf->vmapping_counter == 0) {
 +		if (dmabuf->ops->vunmap)
 +			dmabuf->ops->vunmap(dmabuf, vaddr);
 +		dmabuf->vmap_ptr = NULL;
 +	}
 +	mutex_unlock(&dmabuf->lock);
  }
  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
 diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
 index bd2e52c..e3bf2f6 100644
 --- a/include/linux/dma-buf.h
 +++ b/include/linux/dma-buf.h
 @@ -119,8 +119,10 @@ struct dma_buf {
 	struct file *file;
 	struct list_head attachments;
 	const struct dma_buf_ops *ops;
 -	/* mutex to serialize list manipulation and attach/detach */
 +	/* mutex to serialize list manipulation, attach/detach and vmap/unmap */
 	struct mutex

Re: [PATCH] [RFC] dma-buf: implement vmap refcounting in the interface logic

2012-12-20 Thread Rob Clark
On Thu, Dec 20, 2012 at 10:50 AM, Rob Clark robdcl...@gmail.com wrote:
 On Thu, Dec 20, 2012 at 7:14 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
 All drivers which implement this need to have some sort of refcount to
 allow concurrent vmap usage. Hence implement this in the dma-buf core.

 To protect against concurrent calls we need a lock, which potentially
 causes new funny locking inversions. But this shouldn't be a problem
 for exporters with statically allocated backing storage, and more
 dynamic drivers have decent issues already anyway.

 Inspired by some refactoring patches from Aaron Plattner, who
 implemented the same idea, but only for drm/prime drivers.

 v2: Check in dma_buf_release that no dangling vmaps are left.
 Suggested by Aaron Plattner. We might want to do similar checks for
 attachments, but that's for another patch. Also fix up ERR_PTR return
 for vmap.

 v3: Check whether the passed-in vmap address matches with the cached
 one for vunmap. Eventually we might want to remove that parameter -
 compared to the kmap functions there's no need for the vaddr for
 unmapping.  Suggested by Chris Wilson.

 v4: Fix a brown-paper-bag bug spotted by Aaron Plattner.

 yeah, I think the locking does worry me a bit but hopefully should be
 able to do something better when reservations land

 Signed-off-by: Rob Clark r...@ti.com

or even,

Reviewed-by: Rob Clark r...@ti.com



 Cc: Aaron Plattner aplatt...@nvidia.com
 Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch

Re: [RFC v2 0/5] Common Display Framework

2012-12-19 Thread Rob Clark
On Wed, Dec 19, 2012 at 8:57 AM, Jani Nikula
jani.nik...@linux.intel.com wrote:

 Hi Laurent -

 On Tue, 18 Dec 2012, Laurent Pinchart laurent.pinch...@ideasonboard.com 
 wrote:
 Hi Jani,

 On Monday 17 December 2012 18:53:37 Jani Nikula wrote:
 I can see the need for a framework for DSI panels and such (in fact Tomi
 and I have talked about it like 2-3 years ago already!) but what is the
 story for HDMI and DP? In particular, what's the relationship between
 DRM and CDF here? Is there a world domination plan to switch the DRM
 drivers to use this framework too? ;) Do you have some rough plans how
 DRM and CDF should work together in general?

 There's always a world domination plan, isn't there ? :-)

 I certainly want CDF to be used by DRM (or more accurately KMS). That's what
 the C stands for, common refers to sharing panel and other display entity
 drivers between FBDEV, KMS and V4L2.

 I currently have no plan to expose CDF internals to userspace through the KMS
 API. We might have to do so later if the hardware complexity grows in such a
 way that finer control than what KMS provides needs to be exposed to
 userspace, but I don't think we're there yet. The CDF API will thus only be
 used internally in the kernel by display controller drivers. The KMS core
 might get functions to handle common display entity operations, but the bulk
 of the work will be in the display controller drivers to start with. We will
 then see what can be abstracted in KMS helper functions.

 Regarding HDMI and DP, I imagine HDMI and DP drivers that would use the CDF
 API. That's just a thought for now, I haven't tried to implement them, but it
 would be nice to handle HDMI screens and DPI/DBI/DSI panels in a generic way.

 Do you have thoughts to share on this topic ?

 It just seems to me that, at least from a DRM/KMS perspective, adding
 another layer (=CDF) for HDMI or DP (or legacy outputs) would be
 overengineering it. They are pretty well standardized, and I don't see
 there would be a need to write multiple display drivers for them. Each
 display controller has one, and can easily handle any chip specific
 requirements right there. It's my gut feeling that an additional
 framework would just get in the way. Perhaps there could be more common
 HDMI/DP helper style code in DRM to reduce overlap across KMS drivers,
 but that's another thing.

 So is the HDMI/DP drivers using CDF a more interesting idea from a
 non-DRM perspective? Or, put another way, is it more of an alternative
 to using DRM? Please enlighten me if there's some real benefit here that
 I fail to see!

fwiw, I think there are at least a couple cases where multiple SoC's
have the same HDMI IP block.

And, there are also external HDMI encoders (for example connected over
i2c) that can also be shared between boards.  So I think there will be
a number of cases where CDF is appropriate for HDMI drivers.  Although
trying to keep this all independent of DRM (as opposed to just
something similar to what drivers/gpu/i2c is today) seems a bit
overkill for me.  Being able to use the helpers in drm and avoiding an
extra layer of translation seems like the better option to me.  So my
vote would be drivers/gpu/cdf.

BR,
-R

 For DSI panels (or DSI-to-whatever bridges) it's of course another
 story. You typically need a panel specific driver. And here I see the
 main point of the whole CDF: decoupling display controllers and the
 panel drivers, and sharing panel (and converter chip) specific drivers
 across display controllers. Making it easy to write new drivers, as
 there would be a model to follow. I'm definitely in favour of coming up
 with some framework that would tackle that.


 BR,
 Jani.
 --
 To unsubscribe from this list: send the line unsubscribe linux-media in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/5] Common Display Framework

2012-12-19 Thread Rob Clark
On Wed, Dec 19, 2012 at 9:37 AM, Tomi Valkeinen tomi.valkei...@ti.com wrote:
 On 2012-12-19 17:26, Rob Clark wrote:

 And, there are also external HDMI encoders (for example connected over
 i2c) that can also be shared between boards.  So I think there will be
 a number of cases where CDF is appropriate for HDMI drivers.  Although
 trying to keep this all independent of DRM (as opposed to just
 something similar to what drivers/gpu/i2c is today) seems a bit
 overkill for me.  Being able to use the helpers in drm and avoiding an
 extra layer of translation seems like the better option to me.  So my
 vote would be drivers/gpu/cdf.

 Well, we need to think about that. I would like to keep CDF independent
 of DRM. I don't like tying different components/frameworks together if
 there's no real need for that.

 Also, something that Laurent mentioned in our face-to-face discussions:
 Some IPs/chips can be used for other purposes than with DRM.

 He had an example of a board, that (if I understood right) gets video
 signal from somewhere outside the board, processes the signal with some
 IPs/chips, and then outputs the signal. So there's no framebuffer, and
 the image is not stored anywhere. I think the framework used in these
 cases is always v4l2.

 The IPs/chips in the above model may be the exact same IPs/chips that
 are used with normal display. If the CDF was tied to DRM, using the
 same drivers for normal and these streaming cases would probably not be
 possible.

Well, maybe there is a way, but it really seems to be
over-complicating things unnecessarily to keep CDF independent of
DRM..  there will be a lot more traditional uses of CDF compared to
one crazy use-case.  So I don't really fancy making it more difficult
than it needs to be for everyone.

Probably the thing to do is take a step back and reconsider that one
crazy use-case.  For example, KMS doesn't enforce that the buffer
handle passed when you create a drm framebuffer object to scan out is
a GEM buffer.  So on that one crazy platform, maybe it makes sense to
have a DRM/KMS display driver that takes a handle to identify which
video stream is coming from the capture end of the pipeline.  Anyways,
that is just an off-the-top-of-my-head idea, probably there are other
options too.

BR,
-R

  Tomi




Re: [RFC v2 0/5] Common Display Framework

2012-12-17 Thread Rob Clark
On Mon, Dec 17, 2012 at 11:04 PM, Dave Airlie airl...@gmail.com wrote:

 Many developers showed interest in the first RFC, and I've had the 
 opportunity
 to discuss it with most of them. I would like to thank (in no particular
 order) Tomi Valkeinen for all the time he spent helping me to draft v2, 
 Marcus
 Lorentzon for his useful input during Linaro Connect Q4 2012, and Linaro for
 inviting me to Connect and providing a venue to discuss this topic.


 So this might be a bit off topic but this whole CDF triggered me
 looking at stuff I generally avoid:

 The biggest problem I'm having currently with the whole ARM graphics
 and output world is the proliferation of platform drivers for every
 little thing. The whole ordering of operations with respect to things
 like suspend/resume or dynamic power management is going to be a real
 nightmare if there are dependencies between the drivers. How do you
 enforce ordering of s/r operations between all the various components?

I tend to think that sub-devices are useful just to have a way to
probe hw which may or may not be there, since on ARM we often don't
have any alternative.. but beyond that, suspend/resume, and other
life-cycle aspects, they should really be treated as all one device.
Especially to avoid undefined suspend/resume ordering.

CDF or some sort of mechanism to share panel drivers between drivers
is useful.  Keeping it within drm, is probably a good idea, if nothing
else to simplify re-use of helper fxns (like avi-infoframe stuff, for
example) and avoid dealing with merging changes across multiple trees.
  Treating them more like shared libraries and less like sub-devices
which can be dynamically loaded/unloaded (ie. they should not be built
as separate modules or suspend/resumed or probed/removed independently
of the master driver) is a really good idea to avoid uncovering nasty
synchronization issues later (remove vs modeset or pageflip) or
surprising userspace in bad ways.

 The other thing I'd like you guys to do is kill the idea of fbdev and
 v4l drivers that are shared with the drm codebase, really just
 implement fbdev and v4l on top of the drm layer, some people might
 think this is some sort of maintainer thing, but really nothing else
 makes sense, and having these shared display frameworks just to avoid
 having using drm/kms drivers seems totally pointless. Fix the drm
 fbdev emulation if an fbdev interface is needed. But creating a fourth
 framework because our previous 3 frameworks didn't work out doesn't
 seem like a situation I want to get behind too much.

yeah, let's not have multiple frameworks to do the same thing.. For
fbdev, it is pretty clear that it is a dead end.  For v4l2
(subdev+mcf), it is perhaps bit more flexible when it comes to random
arbitrary hw pipelines than kms.  But to take advantage of that, your
userspace isn't going to be portable anyways, so you might as well use
driver specific properties/ioctls.  But I tend to think that is more
useful for cameras.  And from userspace perspective, kms planes are
less painful to use for output than v4l2, so lets stick to drm/kms for
output (and not try to add camera/capture support to kms).. k, thx

BR,
-R

 Dave.
 ___
 dri-devel mailing list
 dri-de...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Linaro-mm-sig] [PATCH] dma-buf: Add debugfs support

2012-12-14 Thread Rob Clark
On Fri, Dec 14, 2012 at 5:57 AM, Maarten Lankhorst
m.b.lankho...@gmail.com wrote:
 Op 14-12-12 10:36, sumit.sem...@ti.com schreef:
 From: Sumit Semwal sumit.sem...@linaro.org

 Add debugfs support to make it easier to print debug information
 about the dma-buf buffers.

 I like the idea, I don't know if it could be done in a free manner, but for 
 bonus points
 could we also have the dma-buf fd be obtainable that way from a debugfs entry?

 Doing so would allow me to 'steal' a dma-buf from an existing mapping easily, 
 and test against that.

 Also I think the name of the device and process that exported the dma-buf 
 would be useful
 to have as well, even if in case of the device that would mean changing the 
 api slightly to record it.

 I was thinking of having a directory structure like this:

 /sys/kernel/debug/dma_buf/stats

 and then for each dma-buf:

 /sys/kernel/debug/dma-buf/exporting_file.c/number-fd
 /sys/kernel/debug/dma-buf/exporting_file.c/number-attachments
 /sys/kernel/debug/dma-buf/exporting_file.c/number-info

 Opening the fd file would give you back the original fd, or fail with -EIO if 
 refcount was dropped to 0.

 Would something like this be doable? I don't know debugfs that well, but I 
 don't see why it wouldn't be,

yeah.. but sort of back-door's the security benefits of an anonymous fd..

BR,
-R

 ~Maarten



Re: [PATCH] dma-buf: Use EXPORT_SYMBOL

2012-10-11 Thread Rob Clark
On Thu, Oct 11, 2012 at 6:13 AM, Mauro Carvalho Chehab
mche...@redhat.com wrote:
 Em Thu, 11 Oct 2012 09:20:12 +0200
 Hans Verkuil hverk...@xs4all.nl escreveu:

  my understanding is
  that the drivers/media/ authors should also ack with this licensing
  (possible) change. I am one of the main contributors there. Alan also has
  copyrights there, and at other parts of the Linux Kernel, including the 
  driver's
  core, from where all Linux Kernel drivers are derivative work, including 
  this one.
 
  As Alan well said, many other core Linux Kernel authors very likely share
  this point of view.
 
  So, developers implicitly or explicitly copied in this thread that might be
  considering the usage of dmabuf on proprietary drivers should consider
  this email as a formal notification of my viewpoint: e. g. that I consider
  any attempt of using DMABUF or media core/drivers together with proprietary
  Kernelspace code as a possible GPL infringement.

 As long as dmabuf uses EXPORT_SYMBOL_GPL that is definitely correct. Does 
 your
 statement also hold if dmabuf would use EXPORT_SYMBOL? (Just asking)

 If you read the Kernel COPYING file, it is explicitly said there that the 
 Kernel
 is licensing with GPLv2. The _ONLY_ exception there is the allowance to use
 the kernel via normal syscalls:

NOTE! This copyright does *not* cover user programs that use 
 kernel
  services by normal system calls - this is merely considered normal 
 use
  of the kernel, and does *not* fall under the heading of derived 
 work.
  Also note that the GPL below is copyrighted by the Free Software
  Foundation, but the instance of code that it refers to (the Linux
  kernel) is copyrighted by me and others who actually wrote it.

 The usage of EXPORT_SYMBOL() is not covered there, so those symbols are also
 covered by GPLv2.

 As the usage of a kernel symbol by a proprietary driver is not explicitly
 listed there as a GPLv2 exception, the only concrete results of this patch is
 to spread FUD, as EXPORT_SYMBOL might generate some doubts on people that
 don't read the Kernel's COPYING file.

 With or without this patch, anyone with intellectual rights in the Kernel may
 go to court to warrant their rights against the infringing closed source 
 drivers.
 By not making it explicitly, you're only trying to fool people that using
 it might be allowed.

Maybe a dumb question (I'm a programmer, not a lawyer), but does it
change anything if we make the APIs related to *exporting* a dmabuf as
EXPORT_SYMBOL() and keep the APIs related to *importing* as
EXPORT_SYMBOL_GPL().  This at least avoids the non-GPL kernel module
from calling in to other driver code, while still allowing the non-GPL
driver to export a buffer that GPL drivers could use.

BR,
-R

 BTW, we should consider changing the control framework API to 
 EXPORT_SYMBOL_GPL.

 Agreed.

 The number of contributors to v4l2-ctrls.c is very limited, and I have no
 problem moving that to GPL. For me dmabuf is the rare exception where I 
 prefer
 EXPORT_SYMBOL to prevent the worse evil of forcing vendors to create 
 incompatible
 APIs. It's a sad but true that many GPU drivers are still closed source,
 particularly in the embedded world for which dmabuf was primarily designed.

 My understanding is that even the creation of incompatible Kernel API
 is a presumed GPL violation, as it is an attempt to circumvent the license.

 Basically, if vendors want to work with closed source, there are other options
 in the market. But if they want to work with Linux, they should be 
 contributing
 upstream, instead of doing proprietary blobs.

 Regards,
 Mauro


Re: [PATCH] dma-buf: Use EXPORT_SYMBOL

2012-10-10 Thread Rob Clark
On Wed, Oct 10, 2012 at 1:17 PM, Alan Cox a...@lxorguk.ukuu.org.uk wrote:
 On Wed, 10 Oct 2012 08:56:32 -0700
 Robert Morell rmor...@nvidia.com wrote:

 EXPORT_SYMBOL_GPL is intended to be used for an internal implementation
 issue, and not really an interface.  The dma-buf infrastructure is
 explicitly intended as an interface between modules/drivers, so it
 should use EXPORT_SYMBOL instead.

 NAK. This needs at the very least the approval of all rights holders for
 the files concerned and all code exposed by this change.

Well, for my contributions to dmabuf, I don't object.. and I think
because we are planning to use dma-buf in userspace for dri3 /
dri-next, I think that basically makes it a userspace facing kernel
infrastructure which would be required for open and proprietary
drivers alike.  So I don't see much alternative to making this
EXPORT_SYMBOL().  Of course, IANAL.

BR,
-R

 Also I'd note if you are trying to do this for the purpose of combining
 it with proprietary code then you are still in my view as a (and the view
 of many other) rights holder to the kernel likely to be in breach
 of the GPL requirements for a derivative work. You may consider that
 formal notification of my viewpoint. Your corporate legal team can
 explain to you why the fact you are now aware of my view is important to
 them.

 Alan


Re: [Linaro-mm-sig] [RFC] New dma_buf - EGLImage EGL extension

2012-10-03 Thread Rob Clark
On Tue, Oct 2, 2012 at 2:10 PM, Maarten Lankhorst
m.b.lankho...@gmail.com wrote:
 How do you want to deal with the case where Y' and CbCr are different 
 hardware buffers?
 Could some support for 2d arrays be added in case Y' and CbCr are separated 
 into top/bottom fields?
 How are semi-planar/planar formats handled that have a different width/height 
 for Y' and CbCr? (YUV420)

The API works (AFAIU) like drm addfb2 ioctl, take I420 for example,
you could either do:

  single buffer:
 fd0 = fd
 offset0 = 0
 pitch0 = width
 fd1 = fd
 offset1 = width * height
 pitch1 = width / 2
 fd2 = fd
 offset2 = offset1 + (width * height / 4)
 pitch2 = width / 2

  multiple buffers:
 offset0 = offset1 = offset2 = 0
 fd0 = fd_luma
 fd1 = fd_u
 fd2 = fd_v
 ... and so on

for interlaced/stereo.. is sticking our heads in sand an option?  :-P

You could get lots of permutations for data layout of fields between
interlaced and stereo.  One option might be to ignore and let the user
create two egl-images and deal with blending in the shader?

BR,
-R


[PATCH] dma-buf: might_sleep() in dma_buf_unmap_attachment()

2012-09-28 Thread Rob Clark
From: Rob Clark r...@ti.com

We never really clarified if unmap could be done in atomic context.
But since mapping might require sleeping, this implies mutex in use
to synchronize mapping/unmapping, so unmap could sleep as well.  Add
a might_sleep() to clarify this.

Signed-off-by: Rob Clark r...@ti.com
Acked-by: Daniel Vetter daniel.vet...@ffwll.ch
---
 drivers/base/dma-buf.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index c30f3e1..877eacb 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -298,6 +298,8 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
struct sg_table *sg_table,
enum dma_data_direction direction)
 {
+   might_sleep();
+
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
 
-- 
1.7.9.5



Re: [Linaro-mm-sig] [PATCH 2/4] dma-fence: dma-buf synchronization (v8 )

2012-08-11 Thread Rob Clark
 dma-buf.h
 All members are now initialized, so kmalloc can be used for
 allocating a dma-fence. More documentation added.

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com

 I like the design of this, and especially that it's rather simple ;-)

 A few comments to polish the interface, implementation and documentation a
 bit below.

 ---
  Documentation/DocBook/device-drivers.tmpl |2
  drivers/base/Makefile |2
  drivers/base/dma-fence.c  |  268 
 +
  include/linux/dma-fence.h |  124 +
  4 files changed, 395 insertions(+), 1 deletion(-)
  create mode 100644 drivers/base/dma-fence.c
  create mode 100644 include/linux/dma-fence.h

 diff --git a/Documentation/DocBook/device-drivers.tmpl 
 b/Documentation/DocBook/device-drivers.tmpl
 index 7514dbf..36252ac 100644
 --- a/Documentation/DocBook/device-drivers.tmpl
 +++ b/Documentation/DocBook/device-drivers.tmpl
 @@ -126,6 +126,8 @@ X!Edrivers/base/interface.c
  </sect1>
  <sect1><title>Device Drivers DMA Management</title>
  !Edrivers/base/dma-buf.c
 +!Edrivers/base/dma-fence.c
 +!Iinclude/linux/dma-fence.h
  !Edrivers/base/dma-coherent.c
  !Edrivers/base/dma-mapping.c
  </sect1>
 diff --git a/drivers/base/Makefile b/drivers/base/Makefile
 index 5aa2d70..6e9f217 100644
 --- a/drivers/base/Makefile
 +++ b/drivers/base/Makefile
 @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o
  obj-y+= power/
  obj-$(CONFIG_HAS_DMA)+= dma-mapping.o
  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
 -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
 +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o dma-fence.o
  obj-$(CONFIG_ISA)+= isa.o
  obj-$(CONFIG_FW_LOADER)  += firmware_class.o
  obj-$(CONFIG_NUMA)   += node.o
 diff --git a/drivers/base/dma-fence.c b/drivers/base/dma-fence.c
 new file mode 100644
 index 000..93448e4
 --- /dev/null
 +++ b/drivers/base/dma-fence.c
 @@ -0,0 +1,268 @@
 +/*
 + * Fence mechanism for dma-buf to allow for asynchronous dma access
 + *
 + * Copyright (C) 2012 Texas Instruments
 + * Author: Rob Clark rob.cl...@linaro.org
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License version 2 as published 
 by
 + * the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program.  If not, see http://www.gnu.org/licenses/.
 + */
 +
 +#include <linux/slab.h>
 +#include <linux/sched.h>
 +#include <linux/export.h>
 +#include <linux/dma-fence.h>
 +
 +/**
 + * dma_fence_signal - signal completion of a fence
 + * @fence: the fence to signal
 + *
 + * All registered callbacks will be called directly (synchronously) and
 + * all blocked waiters will be awoken. This should always be called on
 + * software only fences, or alternatively be called after
 + * dma_fence_ops::enable_signaling is called.

 I think we need to be clearer here about when dma_fence_signal can be called:
 - for a sw-only fence (i.e. created with dma_fence_create)
   dma_fence_signal _must_ be called under all circumstances.
 - for any other fences, dma_fence_signal may be called, but it _must_ be
   called once the -enable_signalling func has been called and return 0
   (i.e. success).
 - it may be called only _once_.

 + */
 +int dma_fence_signal(struct dma_fence *fence)
 +{
 + unsigned long flags;
 + int ret = -EINVAL;
 +
 + if (WARN_ON(!fence))
 + return -EINVAL;
 +
 + spin_lock_irqsave(&fence->event_queue.lock, flags);
 + if (!fence->signaled) {
 + fence->signaled = true;
 + __wake_up_locked_key(&fence->event_queue, TASK_NORMAL,
 +  &fence->event_queue);
 + ret = 0;
 + } else
 + WARN(1, "Already signaled");
 + spin_unlock_irqrestore(&fence->event_queue.lock, flags);
 +
 + return ret;
 +}
 +EXPORT_SYMBOL_GPL(dma_fence_signal);
 +
 +static void release_fence(struct kref *kref)
 +{
 + struct dma_fence *fence =
 + container_of(kref, struct dma_fence, refcount);
 +
 + BUG_ON(waitqueue_active(&fence->event_queue));
 +
 + if (fence->ops->release)
 + fence->ops->release(fence);
 +
 + kfree(fence);
 +}
 +
 +/**
 + * dma_fence_put - decreases refcount of the fence
 + * @fence:   [in]fence to reduce refcount of
 + */
 +void dma_fence_put(struct dma_fence *fence)
 +{
 + if (WARN_ON(!fence))
 + return;
 + kref_put(&fence->refcount, release_fence);
 +}
 +EXPORT_SYMBOL_GPL(dma_fence_put);
 +
 +/**
 + * dma_fence_get - increases refcount of the fence
 + * @fence

Re: [Linaro-mm-sig] [PATCH 1/4] dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER

2012-08-11 Thread Rob Clark
On Fri, Aug 10, 2012 at 2:32 PM, Daniel Vetter dan...@ffwll.ch wrote:
 On Fri, Aug 10, 2012 at 04:57:43PM +0200, Maarten Lankhorst wrote:
 Documentation says that code requiring dma-buf should add it to
 select, so inline fallbacks are not going to be used. A link error
 will make it obvious what went wrong, instead of silently doing
 nothing at runtime.

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com

 I've botched it more than once to update these when creating new dma-buf
 code. Hence

 Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch

yeah, I think the fallbacks date back to when it was a user
configurable option, rather than something select'd by drivers using
dmabuf, and we just never went back to clean up.  Let's drop the
fallbacks.

Reviewed-by: Rob Clark rob.cl...@linaro.org


 --
 Daniel Vetter
 Mail: dan...@ffwll.ch
 Mobile: +41 (0)79 365 57 48


Re: [Linaro-mm-sig] [PATCH 2/4] dma-fence: dma-buf synchronization (v8 )

2012-08-11 Thread Rob Clark
On Sat, Aug 11, 2012 at 2:22 PM, Daniel Vetter dan...@ffwll.ch wrote:
  +
  +/**
  + * dma_fence_wait - wait for a fence to be signaled
  + *
  + * @fence:   [in]The fence to wait on
  + * @intr:[in]if true, do an interruptible wait
  + * @timeout: [in]absolute time for timeout, in jiffies.
 
  I don't quite like this, I think we should keep the styl of all other
  wait_*_timeout functions and pass the arg as timeout in jiffies (and also
  the same return semantics). Otherwise well have funny code that needs to
  handle return values differently depending upon whether it waits upon a
  dma_fence or a native object (where it would us the wait_*_timeout
  functions directly).

 We did start out this way, but there was an ugly jiffies roll-over
 problem that was difficult to deal with properly.  Using an absolute
 time avoided the problem.

 Well, as-is the api works differently than all the other _timeout apis
 I've seen in the kernel, which makes it confusing. Also, I don't quite see
 what jiffies wraparound issue there is?

iirc, the problem was in dmabufmgr, in
dmabufmgr_wait_completed_cpu().. with an absolute timeout, it could
loop over all the fences without having to adjust the timeout for the
elapsed time.  Otherwise it had to adjust the timeout and keep track
of when the timeout elapsed without confusing itself via rollover.

BR,
-R


Re: [PATCHv2 3/9] v4l: add buffer exporting via dmabuf

2012-07-31 Thread Rob Clark
On Tue, Jul 31, 2012 at 7:11 AM, Hans Verkuil hverk...@xs4all.nl wrote:
  For that matter, wouldn't it be useful to support exporting a userptr 
  buffer
  at some point in the future?

 Shouldn't USERPTR usage be discouraged once we get dma-buf support ?

 Why? It's perfectly fine to use it and it's not going away.

 I'm not saying that we should support exporting a userptr buffer as a dmabuf 
 fd,
 but I'm just wondering if that is possible at all and how difficult it would 
 be.

it seems not terribly safe, since you don't really have much control
over where the memory comes from w/ userptr.  I'm more in favor of
discouraging usage of userptr.

BR,
-R


Re: [PATCHv2 3/9] v4l: add buffer exporting via dmabuf

2012-07-31 Thread Rob Clark
On Tue, Jul 31, 2012 at 8:39 AM, Rémi Denis-Courmont r...@remlab.net wrote:
 Le mardi 31 juillet 2012 14:56:14 Laurent Pinchart, vous avez écrit :
  For that matter, wouldn't it be useful to support exporting a userptr
  buffer at some point in the future?

 Shouldn't USERPTR usage be discouraged once we get dma-buf support ?

 USERPTR, where available, is currently the only way to perform zero-copy from
 kernel to userspace. READWRITE does not support zero-copy at all. MMAP only
 supports zero-copy if userspace knows a boundary on the number of concurrent
 buffers *and* the device can deal with that number of buffers; in general,
 MMAP requires memory copying.

hmm, this sounds like the problem is device pre-allocating buffers?
Anyways, last time I looked, the vb2 core supported changing dmabuf fd
each time you QBUF, in a similar way to what you can do w/ userptr.
So that seems to get you the advantages you miss w/ mmap without the
pitfalls of userptr.

 I am not sure DMABUF even supports transmitting data efficiently to userspace.
 In my understanding, it's meant for transmitting data between DSP's bypassing
 userspace entirely, in other words the exact opposite of what USERBUF does.

well, dmabuf's can be mmap'd.. so it is more a matter of where the
buffer gets allocated, malloc() or from some driver (v4l2 or other).
There are a *ton* of ways userspace allocated memory can go badly,
esp. if the hw has special requirements about memory (GFP_DMA32 in a
system w/ PAE/LPAE, certain ranges of memory, certain alignment of
memory, etc).

BR,
-R

 --
 Rémi Denis-Courmont
 http://www.remlab.net/
 http://fi.linkedin.com/in/remidenis


Re: [PATCH 2/2] dma-buf: add helpers for attacher dma-parms

2012-07-20 Thread Rob Clark
Fyi, Daniel Vetter had suggested on IRC that it would be cleaner to
have a single helper fxn that computes the most-restrictive union of all
attached devices' dma_parms.  Really this should include dma_mask and
coherent_dma_mask, I think.  But that touches a lot of other places in
the code.  If no one objects to the cleanup of moving
dma_mask/coherent_dma_mask into dma_parms, I'll do this first.

So anyways, don't consider this patch yet for inclusion, I'll make an
updated one based on dma_parms..

BR,
-R

On Thu, Jul 19, 2012 at 11:23 AM, Rob Clark rob.cl...@linaro.org wrote:
 From: Rob Clark r...@ti.com

 Add some helpers to iterate through all attachers and get the most
 restrictive segment size/count/boundary.

 Signed-off-by: Rob Clark r...@ti.com
 ---
  drivers/base/dma-buf.c  |   63 
 +++
  include/linux/dma-buf.h |   19 ++
  2 files changed, 82 insertions(+)

 diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
 index 24e88fe..757ee20 100644
 --- a/drivers/base/dma-buf.c
 +++ b/drivers/base/dma-buf.c
 @@ -192,6 +192,69 @@ void dma_buf_put(struct dma_buf *dmabuf)
  EXPORT_SYMBOL_GPL(dma_buf_put);

  /**
 + * dma_buf_max_seg_size - helper for exporters to get the minimum of
 + * all attached device's max segment size
 + */
 +unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
 +{
 +   struct dma_buf_attachment *attach;
 +   unsigned int max = (unsigned int)-1;
 +
 +   if (WARN_ON(!dmabuf))
 +   return 0;
 +
 +   mutex_lock(&dmabuf->lock);
 +   list_for_each_entry(attach, &dmabuf->attachments, node)
 +   max = min(max, dma_get_max_seg_size(attach->dev));
 +   mutex_unlock(&dmabuf->lock);
 +
 +   return max;
 +}
 +EXPORT_SYMBOL_GPL(dma_buf_max_seg_size);
 +
 +/**
 + * dma_buf_max_seg_count - helper for exporters to get the minimum of
 + * all attached device's max segment count
 + */
 +unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
 +{
 +   struct dma_buf_attachment *attach;
 +   unsigned int max = (unsigned int)-1;
 +
 +   if (WARN_ON(!dmabuf))
 +   return 0;
 +
 +   mutex_lock(&dmabuf->lock);
 +   list_for_each_entry(attach, &dmabuf->attachments, node)
 +   max = min(max, dma_get_max_seg_count(attach->dev));
 +   mutex_unlock(&dmabuf->lock);
 +
 +   return max;
 +}
 +EXPORT_SYMBOL_GPL(dma_buf_max_seg_count);
 +
 +/**
 + * dma_buf_get_seg_boundary - helper for exporters to get the most
 + * restrictive segment alignment of all the attached devices
 + */
 +unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
 +{
 +   struct dma_buf_attachment *attach;
 +   unsigned int mask = (unsigned int)-1;
 +
 +   if (WARN_ON(!dmabuf))
 +   return 0;
 +
 +   mutex_lock(&dmabuf->lock);
 +   list_for_each_entry(attach, &dmabuf->attachments, node)
 +   mask &= dma_get_seg_boundary(attach->dev);
 +   mutex_unlock(&dmabuf->lock);
 +
 +   return mask;
 +}
 +EXPORT_SYMBOL_GPL(dma_buf_get_seg_boundary);
 +
 +/**
   * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
   * calls attach() of dma_buf_ops to allow device-specific attach 
 functionality
   * @dmabuf:[in]buffer to attach device to.
 diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
 index eb48f38..9533b9b 100644
 --- a/include/linux/dma-buf.h
 +++ b/include/linux/dma-buf.h
 @@ -167,6 +167,10 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags);
  struct dma_buf *dma_buf_get(int fd);
  void dma_buf_put(struct dma_buf *dmabuf);

 +unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf);
 +unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf);
 +unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf);
 +
  struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 enum dma_data_direction);
  void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
 @@ -220,6 +224,21 @@ static inline void dma_buf_put(struct dma_buf *dmabuf)
 return;
  }

 +static inline unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
 +{
 +   return 0;
 +}
 +
 +static inline unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
 +{
 +   return 0;
 +}
 +
 +static inline unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
 +{
 +   return 0;
 +}
 +
  static inline struct sg_table *dma_buf_map_attachment(
 struct dma_buf_attachment *attach, enum dma_data_direction write)
  {
 --
 1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] dma-parms and helpers for dma-buf

2012-07-19 Thread Rob Clark
From: Rob Clark r...@ti.com

Re-sending first patch, with a wider audience.  Apparently I didn't
spam enough inboxes the first time.

And, at Daniel Vetter's suggestion, adding some helper functions in
dma-buf to get the most restrictive parameters of all the attached
devices.

Rob Clark (2):
  device: add dma_params-max_segment_count
  dma-buf: add helpers for attacher dma-parms

 drivers/base/dma-buf.c  |   63 +++
 include/linux/device.h  |1 +
 include/linux/dma-buf.h |   19 +
 include/linux/dma-mapping.h |   16 +++
 4 files changed, 99 insertions(+)

-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] dma-buf: add helpers for attacher dma-parms

2012-07-19 Thread Rob Clark
From: Rob Clark r...@ti.com

Add some helpers to iterate through all attachers and get the most
restrictive segment size/count/boundary.

Signed-off-by: Rob Clark r...@ti.com
---
 drivers/base/dma-buf.c  |   63 +++
 include/linux/dma-buf.h |   19 ++
 2 files changed, 82 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 24e88fe..757ee20 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -192,6 +192,69 @@ void dma_buf_put(struct dma_buf *dmabuf)
 EXPORT_SYMBOL_GPL(dma_buf_put);
 
 /**
+ * dma_buf_max_seg_size - helper for exporters to get the minimum of
+ * all attached devices' max segment sizes
+ */
+unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+   unsigned int max = (unsigned int)-1;
+
+   if (WARN_ON(!dmabuf))
+   return 0;
+
+   mutex_lock(&dmabuf->lock);
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   max = min(max, dma_get_max_seg_size(attach->dev));
+   mutex_unlock(&dmabuf->lock);
+
+   return max;
+}
+EXPORT_SYMBOL_GPL(dma_buf_max_seg_size);
+
+/**
+ * dma_buf_max_seg_count - helper for exporters to get the minimum of
+ * all attached devices' max segment counts
+ */
+unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+   unsigned int max = (unsigned int)-1;
+
+   if (WARN_ON(!dmabuf))
+   return 0;
+
+   mutex_lock(&dmabuf->lock);
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   max = min(max, dma_get_max_seg_count(attach->dev));
+   mutex_unlock(&dmabuf->lock);
+
+   return max;
+}
+EXPORT_SYMBOL_GPL(dma_buf_max_seg_count);
+
+/**
+ * dma_buf_get_seg_boundary - helper for exporters to get the most
+ * restrictive segment alignment of all the attached devices
+ */
+unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+   unsigned int mask = (unsigned int)-1;
+
+   if (WARN_ON(!dmabuf))
+   return 0;
+
+   mutex_lock(&dmabuf->lock);
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   mask &= dma_get_seg_boundary(attach->dev);
+   mutex_unlock(&dmabuf->lock);
+
+   return mask;
+}
+EXPORT_SYMBOL_GPL(dma_buf_get_seg_boundary);
+
+/**
  * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
  * calls attach() of dma_buf_ops to allow device-specific attach functionality
  * @dmabuf:[in]buffer to attach device to.
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index eb48f38..9533b9b 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -167,6 +167,10 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags);
 struct dma_buf *dma_buf_get(int fd);
 void dma_buf_put(struct dma_buf *dmabuf);
 
+unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf);
+unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf);
+unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf);
+
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
@@ -220,6 +224,21 @@ static inline void dma_buf_put(struct dma_buf *dmabuf)
return;
 }
 
+static inline unsigned int dma_buf_max_seg_size(struct dma_buf *dmabuf)
+{
+   return 0;
+}
+
+static inline unsigned int dma_buf_max_seg_count(struct dma_buf *dmabuf)
+{
+   return 0;
+}
+
+static inline unsigned int dma_buf_get_seg_boundary(struct dma_buf *dmabuf)
+{
+   return 0;
+}
+
 static inline struct sg_table *dma_buf_map_attachment(
struct dma_buf_attachment *attach, enum dma_data_direction write)
 {
-- 
1.7.9.5



[PATCH 1/2] device: add dma_params-max_segment_count

2012-07-19 Thread Rob Clark
From: Rob Clark r...@ti.com

For devices which have constraints about maximum number of segments
in an sglist.  For example, a device which could only deal with
contiguous buffers would set max_segment_count to 1.

The initial motivation is for devices sharing buffers via dma-buf,
to allow the buffer exporter to know the constraints of other
devices which have attached to the buffer.  The dma_mask and fields
in 'struct device_dma_parameters' tell the exporter everything else
that is needed, except whether the importer has constraints about
maximum number of segments.

Signed-off-by: Rob Clark r...@ti.com
---
 include/linux/device.h  |1 +
 include/linux/dma-mapping.h |   16 
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 161d962..3813735 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -568,6 +568,7 @@ struct device_dma_parameters {
 * sg limitations.
 */
unsigned int max_segment_size;
+   unsigned int max_segment_count;/* zero for unlimited */
unsigned long segment_boundary_mask;
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dfc099e..f380f79 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -111,6 +111,22 @@ static inline unsigned int dma_set_max_seg_size(struct device *dev,
return -EIO;
 }
 
+static inline unsigned int dma_get_max_seg_count(struct device *dev)
+{
+   return dev->dma_parms ? dev->dma_parms->max_segment_count : 0;
+}
+
+static inline int dma_set_max_seg_count(struct device *dev,
+   unsigned int count)
+{
+   if (dev->dma_parms) {
+   dev->dma_parms->max_segment_count = count;
+   return 0;
+   } else
+   return -EIO;
+}
+
+
 static inline unsigned long dma_get_seg_boundary(struct device *dev)
 {
	return dev->dma_parms ?
-- 
1.7.9.5



Re: [PATCH] Documentation: DocBook DRM framework documentation

2012-07-16 Thread Rob Clark
On Thu, Jul 12, 2012 at 7:00 PM, Laurent Pinchart
laurent.pinch...@ideasonboard.com wrote:
> Signed-off-by: Laurent Pinchart <laurent.pinch...@ideasonboard.com>
> ---
>  Documentation/DocBook/drm.tmpl | 2835 +++-
>  1 files changed, 2226 insertions(+), 609 deletions(-)
>
> Hi everybody,
>
> Here's the DRM kernel framework documentation previously posted to the
> dri-devel mailing list. The documentation has been reworked, converted to
> DocBook and merged with the existing DocBook DRM documentation stub. The
> result doesn't cover the whole DRM API but should hopefully be good enough
> for a start.
>
> I've done my best to follow a natural flow starting at initialization and
> covering the major DRM internal topics. As I'm not a native English speaker
> I'm not totally happy with the result, so if anyone wants to edit the text
> please feel free to do so. Review will as usual be appreciated, and acks will
> be even more welcome (I've been working on this document for longer than I
> feel comfortable with).

btw, thanks for this!

One minor typo below.. with that,

Reviewed-by: Rob Clark rob.cl...@linaro.org

> diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
> index 196b8b9..44a2c66 100644
[snip]
> +<sect2>
> +  <title>Output Polling</title>
> +  <synopsis>void (*output_poll_changed)(struct drm_device *dev);</synopsis>
> +  <para>
> +    This operation notifies the driver that the status of one or more
> +    connectors has changed. Drivers that use the fbdev helper can just call

s/fbdev/fb/

> +    the <function>drm_fb_helper_hotplug_event</function> function to handle
> +    this operation.



BR,
-R

