Re: [PATCH for v3.18 00/18] Backport CVE-2017-13166 fixes to Kernel 3.18

2018-03-29 Thread Inki Dae


On 2018-03-29 16:00, Greg KH wrote:
> On Thu, Mar 29, 2018 at 03:39:54PM +0900, Inki Dae wrote:
>> On 2018-03-29 13:25, Greg KH wrote:
>>> On Thu, Mar 29, 2018 at 08:22:08AM +0900, Inki Dae wrote:
>>>> Thanks a lot for doing this. :) There are still many users who use
>>>> Linux-3.18 for their products.
>>>
>>> For new products?  They really should not be.  The kernel is officially
>>
>> Not for new products. Old products are still using the Linux-3.18 kernel
>> without a kernel upgrade. For new products, most SoC vendors, including
>> us, will use Linux-4.x.
>> Actually, we are preparing a kernel upgrade (to Linux-4.14-LTS) for some
>> devices, even some old ones, and it is almost done.
> 
> That is great to hear.
> 
>>> What is keeping you on 3.18.y and not allowing you to move to a newer
>>> kernel version?
>>
>> We also want to move to the latest kernel version. However, there are cases
>> where we cannot upgrade the kernel.
>> When the SoC vendor never shares the firmware and the relevant data
>> sheets, we cannot upgrade the kernel. However, we still have to resolve
>> the security issues for the users of such a device.
> 
> It sounds like you need to be getting those security updates from those
> SoC vendors, as they are the ones you are paying for support for that

It's true, but some open source developers like me, who use a vendor kernel
without the vendor's support, will never get the security updates from them.
So if you merge CVE patches even though this kernel is already EOL, many
open source developers would be glad. :)

Thanks,
Inki Dae

> kernel version that they are forcing you to stay on.
> 
> good luck!
> 
> greg k-h
> 
> 
> 


Re: [PATCH for v3.18 00/18] Backport CVE-2017-13166 fixes to Kernel 3.18

2018-03-29 Thread Inki Dae


On 2018-03-29 13:25, Greg KH wrote:
> On Thu, Mar 29, 2018 at 08:22:08AM +0900, Inki Dae wrote:
>> Thanks a lot for doing this. :) There are still many users who use
>> Linux-3.18 for their products.
> 
> For new products?  They really should not be.  The kernel is officially

Not for new products. Old products are still using the Linux-3.18 kernel without
a kernel upgrade. For new products, most SoC vendors, including us, will use Linux-4.x.
Actually, we are preparing a kernel upgrade (to Linux-4.14-LTS) for some devices,
even some old ones, and it is almost done.

> end-of-life, but I'm keeping it alive for a short while longer just
> because too many people seem to still be using it.  However, they are
> not actually updating the kernel in their devices, so I don't think I
> will be doing many more new 3.18.y releases.
> 
> It's a problem when people ask for support, and then don't use the
> releases given to them :(
> 
> What is keeping you on 3.18.y and not allowing you to move to a newer
> kernel version?

We also want to move to the latest kernel version. However, there are cases where
we cannot upgrade the kernel.
When the SoC vendor never shares the firmware and the relevant data sheets, we
cannot upgrade the kernel. However, we still have to resolve the security issues
for the users of such a device.

Thanks,
Inki Dae

> 
> thanks,
> 
> greg k-h
> 
> 
> 


Re: [PATCH for v3.18 00/18] Backport CVE-2017-13166 fixes to Kernel 3.18

2018-03-28 Thread Inki Dae
Hi Mauro,

On 2018-03-29 03:12, Mauro Carvalho Chehab wrote:
> Hi Greg,
> 
> Those are the backports meant to solve CVE-2017-13166 on Kernel 3.18.
> 
> It contains two v4l2-ctrls fixes that are required to avoid crashes
> in the test application.
> 
> I wrote two patches myself for Kernel 3.18 in order to solve some
> issues specific to Kernel 3.18 which aren't needed upstream.
> One is actually a one-line change backport. The other one makes
> sure that both the 32-bit and 64-bit versions of some ioctl calls
> will return the same value for a reserved field.
> 
> I noticed an extra bug while testing it, but the bug also hits upstream,
> and should be backported all the way down all stable/LTS versions.
> So, I'll send it the usual way, after merging upstream.

Thanks a lot for doing this. :) There are still many users who use Linux-3.18
for their products.

Thanks,
Inki Dae

> 
> Regards,
> Mauro
> 
> 
> Daniel Mentz (2):
>   media: v4l2-compat-ioctl32: Copy v4l2_window->global_alpha
>   media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
> 
> Hans Verkuil (12):
>   media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
>   media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
>   media: v4l2-compat-ioctl32.c: fix the indentation
>   media: v4l2-compat-ioctl32.c: move 'helper' functions to
> __get/put_v4l2_format32
>   media: v4l2-compat-ioctl32.c: avoid sizeof(type)
>   media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
>   media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
>   media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
>   media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
>   media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
>   media: v4l2-compat-ioctl32.c: don't copy back the result for certain
> errors
>   media: v4l2-ctrls: fix sparse warning
> 
> Mauro Carvalho Chehab (2):
>   media: v4l2-compat-ioctl32: use compat_u64 for video standard
>   media: v4l2-compat-ioctl32: initialize a reserved field
> 
> Ricardo Ribalda (2):
>   vb2: V4L2_BUF_FLAG_DONE is set after DQBUF
>   media: media/v4l2-ctrls: volatiles should not generate CH_VALUE
> 
>  drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 1020 +++--
>  drivers/media/v4l2-core/v4l2-ctrls.c  |   96 ++-
>  drivers/media/v4l2-core/v4l2-ioctl.c  |5 +-
>  drivers/media/v4l2-core/videobuf2-core.c  |5 +
>  4 files changed, 691 insertions(+), 435 deletions(-)
> 


Re: [RFC 0/4] Exynos DRM: add Picture Processor extension

2017-05-10 Thread Inki Dae


On 2017-05-10 16:55, Daniel Vetter wrote:
> On Wed, May 10, 2017 at 03:27:02PM +0900, Inki Dae wrote:
>> Hi Tomasz,
>>
>> On 2017-05-10 14:38, Tomasz Figa wrote:
>>> Hi Everyone,
>>>
>>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki@samsung.com> wrote:
>>>>
>>>>
>>>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>>>> Hi Marek,
>>>>>
>>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>>> Hi Laurent,
>>>>>>
>>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>>> Hi Marek,
>>>>>>>
>>>>>>> (CC'ing Sakari Ailus)
>>>>>>>
>>>>>>> Thank you for the patches.
>>>>>>>
>>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> This is an updated proposal for extending the EXYNOS DRM API with generic
>>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>>> from one memory buffer to another. Typical memory-to-memory operations
>>>>>>>> are: rotation, scaling, colour space conversion or a mix of them. This is a
>>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>>> core":
>>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>>
>>>>>>>> In this proposal I moved all the code to the Exynos DRM driver, so now this
>>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>>> with the fbdev API.
>>>>>>>>
>>>>>>>> Here is a bit more information on what picture processors are:
>>>>>>>>
>>>>>>>> Embedded SoCs are known to have a number of hardware blocks which perform
>>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>>> offload the CPU from processing graphics or video data. One example use of
>>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>>> the target window size.
>>>>>>>>
>>>>>>>> The proposed API is heavily inspired by the atomic KMS approach - it is also
>>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>>> the picture processor (called pp for convenience). Such objects have a set of
>>>>>>>> standard DRM properties which describe the operation to be performed by the
>>>>>>>> respective hardware module. In the typical case those properties are a source
>>>>>>>> fb id and rectangle (x, y, width, height) and a destination fb id and
>>>>>>>> rectangle. Optionally a rotation property can also be specified if
>>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>>> userspace provides a set of properties and their values for a given fbproc
>>>>>>>> object in a similar way as objects and properties are provided for
>>>>>>>> performing an atomic page flip / mode setting.
>>>>>>>>
>>>>>>>> The proposed API consists of 3 new ioctls:
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>>   processors,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capa

Re: [RFC 0/4] Exynos DRM: add Picture Processor extension

2017-05-10 Thread Inki Dae


On 2017-05-10 15:38, Tomasz Figa wrote:
> On Wed, May 10, 2017 at 2:27 PM, Inki Dae <inki@samsung.com> wrote:
>> Hi Tomasz,
>>
>> On 2017-05-10 14:38, Tomasz Figa wrote:
>>> Hi Everyone,
>>>
>>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki@samsung.com> wrote:
>>>>
>>>>
>>>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>>>> Hi Marek,
>>>>>
>>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>>> Hi Laurent,
>>>>>>
>>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>>> Hi Marek,
>>>>>>>
>>>>>>> (CC'ing Sakari Ailus)
>>>>>>>
>>>>>>> Thank you for the patches.
>>>>>>>
>>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> This is an updated proposal for extending the EXYNOS DRM API with generic
>>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>>> from one memory buffer to another. Typical memory-to-memory operations
>>>>>>>> are: rotation, scaling, colour space conversion or a mix of them. This is a
>>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>>> core":
>>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>>
>>>>>>>> In this proposal I moved all the code to the Exynos DRM driver, so now this
>>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>>> with the fbdev API.
>>>>>>>>
>>>>>>>> Here is a bit more information on what picture processors are:
>>>>>>>>
>>>>>>>> Embedded SoCs are known to have a number of hardware blocks which perform
>>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>>> offload the CPU from processing graphics or video data. One example use of
>>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>>> the target window size.
>>>>>>>>
>>>>>>>> The proposed API is heavily inspired by the atomic KMS approach - it is also
>>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>>> the picture processor (called pp for convenience). Such objects have a set of
>>>>>>>> standard DRM properties which describe the operation to be performed by the
>>>>>>>> respective hardware module. In the typical case those properties are a source
>>>>>>>> fb id and rectangle (x, y, width, height) and a destination fb id and
>>>>>>>> rectangle. Optionally a rotation property can also be specified if
>>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>>> userspace provides a set of properties and their values for a given fbproc
>>>>>>>> object in a similar way as objects and properties are provided for
>>>>>>>> performing an atomic page flip / mode setting.
>>>>>>>>
>>>>>>>> The proposed API consists of 3 new ioctls:
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>>   processors,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_

Re: [RFC 0/4] Exynos DRM: add Picture Processor extension

2017-05-10 Thread Inki Dae
Hi Tomasz,

On 2017-05-10 14:38, Tomasz Figa wrote:
> Hi Everyone,
> 
> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki@samsung.com> wrote:
>>
>>
>>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>> Hi Marek,
>>>
>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>> Hi Laurent,
>>>>
>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>> Hi Marek,
>>>>>
>>>>> (CC'ing Sakari Ailus)
>>>>>
>>>>> Thank you for the patches.
>>>>>
>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> This is an updated proposal for extending the EXYNOS DRM API with generic
>>>>>> support for hardware modules, which can be used for processing image data
>>>>>> from one memory buffer to another. Typical memory-to-memory operations
>>>>>> are: rotation, scaling, colour space conversion or a mix of them. This is a
>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>> core":
>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>
>>>>>> In this proposal I moved all the code to the Exynos DRM driver, so now this
>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>> with the fbdev API.
>>>>>>
>>>>>> Here is a bit more information on what picture processors are:
>>>>>>
>>>>>> Embedded SoCs are known to have a number of hardware blocks which perform
>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>> offload the CPU from processing graphics or video data. One example use of
>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>> the target window size.
>>>>>>
>>>>>> The proposed API is heavily inspired by the atomic KMS approach - it is also
>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>> the picture processor (called pp for convenience). Such objects have a set of
>>>>>> standard DRM properties which describe the operation to be performed by the
>>>>>> respective hardware module. In the typical case those properties are a source
>>>>>> fb id and rectangle (x, y, width, height) and a destination fb id and
>>>>>> rectangle. Optionally a rotation property can also be specified if
>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>> userspace provides a set of properties and their values for a given fbproc
>>>>>> object in a similar way as objects and properties are provided for
>>>>>> performing an atomic page flip / mode setting.
>>>>>>
>>>>>> The proposed API consists of 3 new ioctls:
>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>   processors,
>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of a given picture
>>>>>>   processor,
>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform the operation described by a given
>>>>>>   property set.
>>>>>>
>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>> properties to add support for more advanced picture processing (for example
>>>>>> blending).
>>>>>>
>>>>>> This proposal aims to replace the Exynos DRM IPP (Image Post Processing)
>>>>>> subsystem. The IPP API is over-engineered in general, but not really extensible
>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>> operations to
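The "validate the whole property set at commit time" idea in the quoted proposal can be sketched in a few lines of plain C. This is a toy model only: the property names and `pp_commit()` below are invented for illustration and are not the proposed `DRM_IOCTL_EXYNOS_PP_*` kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical property IDs for a picture-processor object. */
enum pp_prop {
	PP_SRC_FB, PP_SRC_X, PP_SRC_Y, PP_SRC_W, PP_SRC_H,
	PP_DST_FB, PP_DST_X, PP_DST_Y, PP_DST_W, PP_DST_H,
	PP_ROTATION,		/* optional, hardware permitting */
	PP_PROP_COUNT
};

struct pp_prop_val {
	enum pp_prop prop;
	unsigned int value;
};

/* Atomic-style commit: the complete property set is validated in one
 * place, at commit time, instead of checking each parameter as it is
 * set (the incremental-validation problem discussed for V4L2). */
int pp_commit(const struct pp_prop_val *props, size_t n)
{
	int seen[PP_PROP_COUNT] = { 0 };
	size_t i;

	for (i = 0; i < n; i++)
		seen[props[i].prop] = 1;

	/* Source/destination fb id and rectangle are mandatory. */
	for (i = PP_SRC_FB; i <= PP_DST_H; i++)
		if (!seen[i])
			return -1;	/* a real driver would return -EINVAL */

	/* ...program the hardware from the validated set here... */
	return 0;
}
```

The point of the model is that a partially built configuration is never visible to the hardware: either the whole set passes validation in one step, or the commit is rejected.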

Re: [RFC 0/4] Exynos DRM: add Picture Processor extension

2017-05-09 Thread Inki Dae
operation:
>>  - typically it will be used by a compositing window manager; this means that
>>some parameters of the processing might change on each vblank (like
>>the destination rectangle, for example). This API allows such a change on each
>>operation without any additional cost. V4L2 requires reinitializing the
>>queues with a new configuration on such a change, which means that a bunch of
>>ioctls has to be called.
> 
> What do you mean by re-initialising the queue? Format, buffers or something
> else?
> 
> If you need a larger buffer than what you have already allocated, you'll
> need to re-allocate, V4L2 or not.
> 
> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
> implementing that and some work in videobuf2.
> 
> Another thing is that V4L2 is very stream oriented. For most devices that's
> fine as a lot of the parameters are not changeable during streaming,
> especially if the pipeline is handled by multiple drivers. That said, for
> devices that process data from memory to memory performing changes in the
> media bus formats and pipeline configuration is not very efficient
> currently, largely for the same reason.
> 
> The request API that people have been working for a bit different use cases
> isn't in mainline yet. It would allow more efficient per-request
> configuration than what is currently possible, but it has turned out to be
> far from trivial to implement.
> 
>>  - validating processing parameters in the V4L2 API is really complicated,
>>because the parameters (format, src rectangles, rotation) are being
>>set incrementally, so we have to either allow some impossible, transitional
>>configurations or complicate the configuration steps even more (like
>>calling some ioctls multiple times for both input and output). In the end
>>all parameters have to be validated again just before performing the
>>operation.
> 
> You have to validate the parameters in any case. In a MC pipeline this takes
> place when the stream is started.
> 
>>
>> 3. generic approach (to add it to DRM core) has been rejected:
>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> 
> For GPUs I generally understand the reasoning: there's a very limited number
> of users of this API --- primarily because it's not an application
> interface.
> 
> If you have a device that however falls under the scope of V4L2 (at least
> API-wise), does this continue to be the case? Will there be only one or two
> (or so) users for this API? Is it the case here?
> 
> Using a device specific interface definitely has some benefits: there's no
> need to think how would you generalise the interface for other similar
> devices. There's no need to consider backwards compatibility as it's not a
> requirement. The drawback is that the applications that need to support
> similar devices will bear the burden of having to support different APIs.
> 
> I don't mean to say that you should ram whatever under V4L2 / MC
> independently of how unworkable that might be, but there are also clear
> advantages in using a standardised interface such as V4L2.
> 
> V4L2 has a long history behind it and if it was designed today, I bet it
> would look quite different from what it is now.

It's true. There is definitely a benefit with V4L2 because V4L2 provides a
standard Linux ABI - which DRM, as of now, does not.

However, I think that is the only benefit we could get through V4L2. Using V4L2
makes the platform's software stack complicated - we have to open both a video
device node and a card device node to display an image on the screen while
scaling it or converting its color space, and we also need to export the DMA
buffer from one side and import it on the other side using DMABUF.

It may not be related to this, but V4L2 even has a performance problem - every
QBUF/DQBUF request performs a mapping/unmapping of the DMA buffer, as you
already know. :)
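To make the reallocation cost above concrete: the buffer size depends on both the format and the destination rectangle, so changing either can force a full V4L2 REQBUFS/mmap cycle. A minimal sketch of the size math for the two formats mentioned in this thread (NV12 and RGB32); the helper names are mine, not from any API:

```c
#include <assert.h>
#include <stddef.h>

/* NV12 carries a full-resolution luma (Y) plane plus a half-resolution
 * interleaved chroma (CbCr) plane, i.e. w*h*3/2 bytes. RGB32 is a
 * packed 4-bytes-per-pixel format. If a new destination rectangle or a
 * format change needs a larger buffer than what was allocated, a V4L2
 * m2m pipeline has to stop streaming and re-run REQBUFS/mmap, while a
 * property-based commit only needs new property values. */
static size_t nv12_size(size_t w, size_t h)
{
	return w * h * 3 / 2;	/* Y plane + CbCr plane */
}

static size_t rgb32_size(size_t w, size_t h)
{
	return w * h * 4;
}
```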

In addition, display subsystems on recent ARM SoCs tend to include pre/post
processing hardware in the display controller - OMAP, Exynos8895 and MSM, as
far as I know.


Thanks,
Inki Dae

> 
>>
>> 4. this API can be considered an extended 'blit' operation; other DRM drivers
>>(MGA, R128, VIA) already have ioctls for such an operation, so there is also
>>a place in DRM for it
> 
> Added LMML to cc.
> 


Re: [PATCH v2] dma-buf: Wait on the reservation object when sync'ing before CPU access

2016-12-18 Thread Inki Dae


On 2016-08-16 01:02, Daniel Vetter wrote:
> On Mon, Aug 15, 2016 at 04:42:18PM +0100, Chris Wilson wrote:
>> Rendering operations to the dma-buf are tracked implicitly via the
>> reservation_object (dmabuf->resv). This is used to allow poll() to
>> wait upon outstanding rendering (or just query the current status of
>> rendering). The dma-buf sync ioctl allows userspace to prepare the
>> dma-buf for CPU access, which should include waiting upon rendering.
>> (Some drivers may need to do more work to ensure that the dma-buf mmap
>> is coherent as well as complete.)
>>
>> v2: Always wait upon the reservation object implicitly. We choose to do
>> it after the native handler in case it can do so more efficiently.
>>
>> Testcase: igt/prime_vgem
>> Testcase: igt/gem_concurrent_blit # *vgem*
>> Signed-off-by: Chris Wilson 
>> Cc: Sumit Semwal 
>> Cc: Daniel Vetter 
>> Cc: Eric Anholt 
>> Cc: linux-media@vger.kernel.org
>> Cc: dri-de...@lists.freedesktop.org
>> Cc: linaro-mm-...@lists.linaro.org
>> Cc: linux-ker...@vger.kernel.org
>> ---
>>  drivers/dma-buf/dma-buf.c | 23 +++
>>  1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>> index ddaee60ae52a..cf04d249a6a4 100644
>> --- a/drivers/dma-buf/dma-buf.c
>> +++ b/drivers/dma-buf/dma-buf.c
>> @@ -586,6 +586,22 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
>> *attach,
>>  }
>>  EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
>>  
>> +static int __dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
>> +  enum dma_data_direction direction)
>> +{
>> +bool write = (direction == DMA_BIDIRECTIONAL ||
>> +  direction == DMA_TO_DEVICE);
>> +struct reservation_object *resv = dmabuf->resv;
>> +long ret;
>> +
>> +/* Wait on any implicit rendering fences */
>> +ret = reservation_object_wait_timeout_rcu(resv, write, true,
>> +  MAX_SCHEDULE_TIMEOUT);
>> +if (ret < 0)
>> +return ret;
>> +
>> +return 0;
>> +}
>>  
>>  /**
>>   * dma_buf_begin_cpu_access - Must be called before accessing a dma_buf 
>> from the
>> @@ -608,6 +624,13 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
>>  if (dmabuf->ops->begin_cpu_access)
>>  ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);
>>  
>> +/* Ensure that all fences are waited upon - but we first allow
>> + * the native handler the chance to do so more efficiently if it
>> + * chooses. A double invocation here will be reasonably cheap no-op.
>> + */
>> +if (ret == 0)
>> +ret = __dma_buf_begin_cpu_access(dmabuf, direction);
> 
> Not sure we should wait first and then flush or the other way round. But I
> don't think it'll matter for any current dma-buf exporter, so meh.
> 

Sorry for the late comment. I wonder whether there is a problem in case the GPU
or another DMA device tries to access this dma buffer after the
dma_buf_begin_cpu_access call. I think in this case they - GPU or DMA devices -
would make a mess of the dma buffer while the CPU is accessing it.

This patch is in mainline already, so if this is a real problem then I think we
should choose one of:
1. revert this patch from mainline
2. make sure other DMA devices are prevented from accessing the buffer while
the CPU is accessing it.

Thanks.
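The write test in the quoted `__dma_buf_begin_cpu_access()` helper is easy to check in isolation. A standalone model (the enum mirrors the kernel's `dma_data_direction` values; `wait_for_all_fences()` is my name for the patch's `write` flag):

```c
#include <assert.h>
#include <stdbool.h>

/* If the CPU access may write the buffer (bidirectional or to-device),
 * the patch passes write=true to reservation_object_wait_timeout_rcu()
 * and so waits for ALL fences on the reservation object; a read-only
 * access (from-device) only has to wait for the exclusive fence, i.e.
 * for pending device writes. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

static bool wait_for_all_fences(enum dma_data_direction direction)
{
	return direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE;
}
```

Regarding the concern above: as far as I understand, this interface only waits for fences pending at begin-time; userspace is expected to bracket its CPU access between the begin/end CPU-access calls (the `DMA_BUF_IOCTL_SYNC` ioctl with `DMA_BUF_SYNC_START`/`DMA_BUF_SYNC_END`) so the exporter gets a chance to serialize against later device access - it does not by itself block a device from touching the buffer in between.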

> Reviewed-by: Daniel Vetter 
> 
> Sumits, can you pls pick this one up and put into drm-misc?
> -Daniel
> 
>> +
>>  return ret;
>>  }
>>  EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
>> -- 
>> 2.8.1
>>
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] drm/exynos: Convert g2d_userptr_get_dma_addr() to use get_vaddr_frames()

2015-07-17 Thread Inki Dae
On 2015-07-17 19:20, Hans Verkuil wrote:
> On 07/13/2015 04:55 PM, Jan Kara wrote:
>> From: Jan Kara j...@suse.cz
>>
>> Convert g2d_userptr_get_dma_addr() to pin pages using get_vaddr_frames().
>> This removes the knowledge about vmas and mmap_sem locking from the exynos
>> driver. Also it fixes a problem that the function has been mapping user
>> provided addresses without holding mmap_sem.
>
> I'd like to see an Ack from one of the exynos drm driver maintainers before
> I merge this.
>
> Inki, Marek?

I already gave my Ack, but it seems that Jan missed it while updating.

Anyway,
Acked-by: Inki Dae inki@samsung.com

Thanks,
Inki Dae

>
> Regards,
>
>   Hans
>

>> Signed-off-by: Jan Kara j...@suse.cz
>> ---
>>  drivers/gpu/drm/exynos/Kconfig  |  1 +
>>  drivers/gpu/drm/exynos/exynos_drm_g2d.c | 91 ++-
>>  drivers/gpu/drm/exynos/exynos_drm_gem.c | 97 -
>>  3 files changed, 30 insertions(+), 159 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
>> index 43003c4ad80b..b364562dc6c1 100644
>> --- a/drivers/gpu/drm/exynos/Kconfig
>> +++ b/drivers/gpu/drm/exynos/Kconfig
>> @@ -77,6 +77,7 @@ config DRM_EXYNOS_VIDI
>>  config DRM_EXYNOS_G2D
>>  	bool "Exynos DRM G2D"
>>  	depends on DRM_EXYNOS && !VIDEO_SAMSUNG_S5P_G2D
>> +	select FRAME_VECTOR
>>  	help
>>  	  Choose this option if you want to use Exynos G2D for DRM.
>>
>> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
>> index 81a250830808..1d8d9a508373 100644
>> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
>> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
>> @@ -190,10 +190,8 @@ struct g2d_cmdlist_userptr {
>>  	dma_addr_t		dma_addr;
>>  	unsigned long		userptr;
>>  	unsigned long		size;
>> -	struct page		**pages;
>> -	unsigned int		npages;
>> +	struct frame_vector	*vec;
>>  	struct sg_table		*sgt;
>> -	struct vm_area_struct	*vma;
>>  	atomic_t		refcount;
>>  	bool			in_pool;
>>  	bool			out_of_list;
>> @@ -363,6 +361,7 @@ static void g2d_userptr_put_dma_addr(struct drm_device *drm_dev,
>>  {
>>  	struct g2d_cmdlist_userptr *g2d_userptr =
>>  			(struct g2d_cmdlist_userptr *)obj;
>> +	struct page **pages;
>>
>>  	if (!obj)
>>  		return;
>> @@ -382,19 +381,21 @@ out:
>>  	exynos_gem_unmap_sgt_from_dma(drm_dev, g2d_userptr->sgt,
>>  			DMA_BIDIRECTIONAL);
>>
>> -	exynos_gem_put_pages_to_userptr(g2d_userptr->pages,
>> -			g2d_userptr->npages,
>> -			g2d_userptr->vma);
>> +	pages = frame_vector_pages(g2d_userptr->vec);
>> +	if (!IS_ERR(pages)) {
>> +		int i;
>>
>> -	exynos_gem_put_vma(g2d_userptr->vma);
>> +		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
>> +			set_page_dirty_lock(pages[i]);
>> +	}
>> +	put_vaddr_frames(g2d_userptr->vec);
>> +	frame_vector_destroy(g2d_userptr->vec);
>>
>>  	if (!g2d_userptr->out_of_list)
>>  		list_del_init(&g2d_userptr->list);
>>
>>  	sg_free_table(g2d_userptr->sgt);
>>  	kfree(g2d_userptr->sgt);
>> -
>> -	drm_free_large(g2d_userptr->pages);
>>  	kfree(g2d_userptr);
>>  }
>>
>> @@ -408,9 +409,7 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct drm_device *drm_dev,
>>  	struct exynos_drm_g2d_private *g2d_priv = file_priv->g2d_priv;
>>  	struct g2d_cmdlist_userptr *g2d_userptr;
>>  	struct g2d_data *g2d;
>> -	struct page **pages;
>>  	struct sg_table *sgt;
>> -	struct vm_area_struct *vma;
>>  	unsigned long start, end;
>>  	unsigned int npages, offset;
>>  	int ret;
>> @@ -456,65 +455,38 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct drm_device *drm_dev,
>>  		return ERR_PTR(-ENOMEM);
>>
>>  	atomic_set(&g2d_userptr->refcount, 1);
>> +	g2d_userptr->size = size;
>>
>>  	start = userptr & PAGE_MASK;
>>  	offset = userptr & ~PAGE_MASK;
>>  	end = PAGE_ALIGN(userptr + size);
>>  	npages = (end - start) >> PAGE_SHIFT;
>> -	g2d_userptr->npages = npages;
>> -
>> -	pages = drm_calloc_large(npages, sizeof(struct page *));
>> -	if (!pages) {
>> -		DRM_ERROR("failed to allocate pages.\n");
>> -		ret = -ENOMEM;
>> +	g2d_userptr->vec = frame_vector_create(npages);
>> +	if (!g2d_userptr->vec)
>>  		goto err_free;
>> -	}
>>
>> -	down_read(&current->mm->mmap_sem);
>> -	vma = find_vma(current->mm, userptr);
>> -	if (!vma) {
>> -		up_read(&current->mm->mmap_sem);
>> -		DRM_ERROR("failed to get vm region.\n");
>> +	ret = get_vaddr_frames(start, npages, true, true, g2d_userptr->vec);
>> +	if (ret != npages) {
>> +		DRM_ERROR("failed to get user pages from userptr.\n");
>> +		if (ret < 0)
>> +			goto err_destroy_framevec;
>>  		ret = -EFAULT;
>> -		goto err_free_pages;
>> +		goto err_put_framevec;
>>  	}
>> -
>> -	if (vma->vm_end < userptr
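The user-pointer page arithmetic in `g2d_userptr_get_dma_addr()` above (start, offset, npages) can be checked standalone. A sketch assuming a 4 KiB page size; the `G2D_`-prefixed macros and `split_userptr()` are mine, extracted from the patch for illustration:

```c
#include <assert.h>

/* Kernel-style page macros, assumed PAGE_SIZE = 4 KiB. */
#define G2D_PAGE_SHIFT	12
#define G2D_PAGE_SIZE	(1UL << G2D_PAGE_SHIFT)
#define G2D_PAGE_MASK	(~(G2D_PAGE_SIZE - 1))
#define G2D_PAGE_ALIGN(x) (((x) + G2D_PAGE_SIZE - 1) & G2D_PAGE_MASK)

struct user_range {
	unsigned long start;	/* first page-aligned address to pin */
	unsigned long offset;	/* offset of userptr inside that page */
	unsigned long npages;	/* pages covering [userptr, userptr + size) */
};

/* Same computation as in the patch: round the user pointer down to a
 * page boundary, remember the in-page offset, and count the pages up
 * to the page-aligned end of the range. */
static struct user_range split_userptr(unsigned long userptr, unsigned long size)
{
	struct user_range r;

	r.start  = userptr & G2D_PAGE_MASK;
	r.offset = userptr & ~G2D_PAGE_MASK;
	r.npages = (G2D_PAGE_ALIGN(userptr + size) - r.start) >> G2D_PAGE_SHIFT;
	return r;
}
```

Note that an unaligned pointer can cost one extra page: two pages of payload starting 0x10 bytes into a page straddle three pages, which is exactly why npages is derived from the aligned end rather than from size alone.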

Re: [PATCH 9/9] drm/exynos: Convert g2d_userptr_get_dma_addr() to use get_vaddr_frames()

2015-07-17 Thread Inki Dae
On 2015-07-17 19:31, Hans Verkuil wrote:
> On 07/17/2015 12:29 PM, Inki Dae wrote:
>> On 2015-07-17 19:20, Hans Verkuil wrote:
>>> On 07/13/2015 04:55 PM, Jan Kara wrote:
>>>> From: Jan Kara j...@suse.cz
>>>>
>>>> Convert g2d_userptr_get_dma_addr() to pin pages using get_vaddr_frames().
>>>> This removes the knowledge about vmas and mmap_sem locking from the exynos
>>>> driver. Also it fixes a problem that the function has been mapping user
>>>> provided addresses without holding mmap_sem.
>>>
>>> I'd like to see an Ack from one of the exynos drm driver maintainers before
>>> I merge this.
>>>
>>> Inki, Marek?
>>
>> I already gave my Ack, but it seems that Jan missed it while updating.
>>
>> Anyway,
>> Acked-by: Inki Dae inki@samsung.com
>
> Thanks!

Oops, sorry. This patch would incur a build warning. Below is my comment.

>
> BTW, I didn't see your earlier Ack either. Was it posted to the linux-media
> list as well? It didn't turn up there.

I thought I had posted it, but I couldn't find the email in my mailbox, so I
may be mistaken.

>
> Regards,
>
>   Hans

>> Thanks,
>> Inki Dae
>>
>>> Regards,
>>>
>>> Hans


 Signed-off-by: Jan Kara j...@suse.cz
 ---
  drivers/gpu/drm/exynos/Kconfig  |  1 +
  drivers/gpu/drm/exynos/exynos_drm_g2d.c | 91 
 ++-
  drivers/gpu/drm/exynos/exynos_drm_gem.c | 97 
 -
  3 files changed, 30 insertions(+), 159 deletions(-)

 diff --git a/drivers/gpu/drm/exynos/Kconfig 
 b/drivers/gpu/drm/exynos/Kconfig
 index 43003c4ad80b..b364562dc6c1 100644
 --- a/drivers/gpu/drm/exynos/Kconfig
 +++ b/drivers/gpu/drm/exynos/Kconfig
 @@ -77,6 +77,7 @@ config DRM_EXYNOS_VIDI
  config DRM_EXYNOS_G2D
bool Exynos DRM G2D
depends on DRM_EXYNOS  !VIDEO_SAMSUNG_S5P_G2D
 +  select FRAME_VECTOR
help
  Choose this option if you want to use Exynos G2D for DRM.
  
 diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
 b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 index 81a250830808..1d8d9a508373 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 @@ -190,10 +190,8 @@ struct g2d_cmdlist_userptr {
dma_addr_t  dma_addr;
unsigned long   userptr;
unsigned long   size;
 -  struct page **pages;
 -  unsigned intnpages;
 +  struct frame_vector *vec;
struct sg_table *sgt;
 -  struct vm_area_struct   *vma;
atomic_trefcount;
boolin_pool;
boolout_of_list;
 @@ -363,6 +361,7 @@ static void g2d_userptr_put_dma_addr(struct drm_device 
 *drm_dev,
  {
struct g2d_cmdlist_userptr *g2d_userptr =
(struct g2d_cmdlist_userptr *)obj;
 +  struct page **pages;
  
if (!obj)
return;
 @@ -382,19 +381,21 @@ out:
exynos_gem_unmap_sgt_from_dma(drm_dev, g2d_userptr-sgt,
DMA_BIDIRECTIONAL);
  
 -  exynos_gem_put_pages_to_userptr(g2d_userptr-pages,
 -  g2d_userptr-npages,
 -  g2d_userptr-vma);
 +  pages = frame_vector_pages(g2d_userptr-vec);
 +  if (!IS_ERR(pages)) {
 +  int i;
  
 -  exynos_gem_put_vma(g2d_userptr-vma);
 +  for (i = 0; i  frame_vector_count(g2d_userptr-vec); i++)
 +  set_page_dirty_lock(pages[i]);
 +  }
 +  put_vaddr_frames(g2d_userptr-vec);
 +  frame_vector_destroy(g2d_userptr-vec);
  
if (!g2d_userptr-out_of_list)
list_del_init(g2d_userptr-list);
  
sg_free_table(g2d_userptr-sgt);
kfree(g2d_userptr-sgt);
 -
 -  drm_free_large(g2d_userptr-pages);
kfree(g2d_userptr);
  }
  
 @@ -408,9 +409,7 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
 drm_device *drm_dev,
	struct exynos_drm_g2d_private *g2d_priv = file_priv->g2d_priv;
struct g2d_cmdlist_userptr *g2d_userptr;
struct g2d_data *g2d;
 -  struct page **pages;
struct sg_table *sgt;
 -  struct vm_area_struct *vma;
unsigned long start, end;
unsigned int npages, offset;
int ret;
 @@ -456,65 +455,38 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
 drm_device *drm_dev,
return ERR_PTR(-ENOMEM);
  
	atomic_set(&g2d_userptr->refcount, 1);
+  g2d_userptr->size = size;
  
	start = userptr & PAGE_MASK;
	offset = userptr & ~PAGE_MASK;
	end = PAGE_ALIGN(userptr + size);
	npages = (end - start) >> PAGE_SHIFT;
-  g2d_userptr->npages = npages;
-
-  pages = drm_calloc_large(npages, sizeof(struct page *));
-  if (!pages) {
-  DRM_ERROR("failed to allocate pages.\n");
-  ret = -ENOMEM;
+  g2d_userptr->vec = frame_vector_create(npages);
+  if (!g2d_userptr->vec)

You would need ret = -EFAULT here. And below is a patch posted already,
http://www.spinics.net/lists/dri-devel/msg85321.html

ps. please, ignore the codes related to build error in the patch.

With the change, Acked-by: Inki Dae inki@samsung.com

Thanks,
Inki Dae

goto err_free;
 -  }
  
-  down_read(&current->mm->mmap_sem);

Re: [PATCH 9/9] drm/exynos: Convert g2d_userptr_get_dma_addr() to use get_vaddr_frames()

2015-05-14 Thread Inki Dae
Hi,

On 2015년 05월 13일 22:08, Jan Kara wrote:
 Convert g2d_userptr_get_dma_addr() to pin pages using get_vaddr_frames().
 This removes the knowledge about vmas and mmap_sem locking from exynos
 driver. Also it fixes a problem that the function has been mapping user
 provided address without holding mmap_sem.
 
 Signed-off-by: Jan Kara j...@suse.cz
 ---
  drivers/gpu/drm/exynos/exynos_drm_g2d.c | 89 ++
  drivers/gpu/drm/exynos/exynos_drm_gem.c | 97 
 -
  2 files changed, 29 insertions(+), 157 deletions(-)
 
 diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
 b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 index 81a250830808..265519c0fe2d 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 @@ -190,10 +190,8 @@ struct g2d_cmdlist_userptr {
   dma_addr_t  dma_addr;
   unsigned long   userptr;
   unsigned long   size;
 - struct page **pages;
 - unsigned intnpages;
 + struct frame_vector *vec;
   struct sg_table *sgt;
 - struct vm_area_struct   *vma;
   atomic_trefcount;
   boolin_pool;
   boolout_of_list;
 @@ -363,6 +361,7 @@ static void g2d_userptr_put_dma_addr(struct drm_device 
 *drm_dev,
  {
   struct g2d_cmdlist_userptr *g2d_userptr =
   (struct g2d_cmdlist_userptr *)obj;
 + struct page **pages;
  
   if (!obj)
   return;
 @@ -382,19 +381,21 @@ out:
	exynos_gem_unmap_sgt_from_dma(drm_dev, g2d_userptr->sgt,
   DMA_BIDIRECTIONAL);
  
- exynos_gem_put_pages_to_userptr(g2d_userptr->pages,
- g2d_userptr->npages,
- g2d_userptr->vma);
+ pages = frame_vector_pages(g2d_userptr->vec);
+ if (!IS_ERR(pages)) {
+ int i;

- exynos_gem_put_vma(g2d_userptr->vma);
+ for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
+ set_page_dirty_lock(pages[i]);
+ }
+ put_vaddr_frames(g2d_userptr->vec);
+ frame_vector_destroy(g2d_userptr->vec);
  
   if (!g2d_userptr->out_of_list)
   list_del_init(&g2d_userptr->list);

   sg_free_table(g2d_userptr->sgt);
   kfree(g2d_userptr->sgt);
-
- drm_free_large(g2d_userptr->pages);
   kfree(g2d_userptr);
  }
  
 @@ -413,6 +414,7 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
 drm_device *drm_dev,
   struct vm_area_struct *vma;
   unsigned long start, end;
   unsigned int npages, offset;
 + struct frame_vector *vec;
   int ret;
  
   if (!size) {
 @@ -456,65 +458,37 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
 drm_device *drm_dev,
   return ERR_PTR(-ENOMEM);
  
   atomic_set(&g2d_userptr->refcount, 1);
+ g2d_userptr->size = size;

   start = userptr & PAGE_MASK;
   offset = userptr & ~PAGE_MASK;
   end = PAGE_ALIGN(userptr + size);
   npages = (end - start) >> PAGE_SHIFT;
- g2d_userptr->npages = npages;
 -
 - pages = drm_calloc_large(npages, sizeof(struct page *));

The declaration of pages isn't needed anymore because you removed all of its uses.

 - if (!pages) {
- DRM_ERROR("failed to allocate pages.\n");
 - ret = -ENOMEM;
+ vec = g2d_userptr->vec = frame_vector_create(npages);

I think you can use g2d_userptr->vec directly, so the local variable vec doesn't seem to be needed.

 + if (!vec)
   goto err_free;
 - }
  
- down_read(&current->mm->mmap_sem);
- vma = find_vma(current->mm, userptr);

For vma, ditto.

Thanks,
Inki Dae

[--SNIP--]
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: exynos4 / g2d

2014-02-10 Thread Inki Dae
2014-02-10 17:44 GMT+09:00 Sachin Kamat sachin.ka...@linaro.org:
 +cc Joonyoung Shim

 Hi,

 On 10 February 2014 13:58, Tobias Jakobi tjak...@math.uni-bielefeld.de 
 wrote:
 Hello!


 Sachin Kamat wrote:
 +cc linux-media list and some related maintainers

 Hi,

 On 10 February 2014 00:22, Tobias Jakobi tjak...@math.uni-bielefeld.de 
 wrote:
 Hello!

 I noticed while here
 (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/arm/boot/dts/exynos4x12.dtsi?id=3a0d48f6f81459c874165ffb14b310c0b5bb0c58)
 the necessary entry for the dts was made, on the drm driver side
 (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/exynos/exynos_drm_g2d.c)
 this was never added.

 Shouldn't samsung,exynos4212-g2d go into exynos_g2d_match as well?
 The DRM version of G2D driver does not support Exynos4 based G2D IP
 yet. The support for this IP
 is available only in the V4L2 version of the driver. Please see the file:
 drivers/media/platform/s5p-g2d/g2d.c

 That doesn't make sense to me. From the initial commit message of the
 DRM code:
 The G2D is a 2D graphic accelerator that supports Bit Block Transfer.
 This G2D driver is exynos drm specific and supports only G2D(version
 4.1) of later Exynos series from Exynos4X12 because supporting DMA.
 (https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/gpu/drm/exynos/exynos_drm_g2d.c?id=d7f1642c90ab5eb2d7c48af0581c993094f97e1a)

 In fact, this doesn't even mention the Exynos5?!

 It does say later Exynos series from Exynos4X12 which technically
 includes Exynos5 and

Right, supported.

 does not include previous Exynos series SoCs like 4210, etc.
 Anyway, I haven't tested this driver on Exynos4 based platforms and
 hence cannot confirm if it
 supports 4x12 in the current form. I leave it to the original author
 and Inki to comment about it.


Just add "samsung,exynos4212-g2d" to exynos_g2d_match if you want to
use the g2d driver on the Exynos4212 SoC. We have already tested this
driver on Exynos4x12 SoCs as well; we just haven't posted the DT
support patch for the Exynos4x12 series yet.

Thanks,
Inki Dae

 --
 With warm regards,
 Sachin


[PATCH v9 0/2] Introduce buffer synchronization framework

2013-09-17 Thread Inki Dae
 the same issue. Web browser and Web app
are different process. The Web app can draw something in its own buffer using
CPU, and then the Web Browser can compose the buffer with its own back buffer.

Thus, in such cases, a shared buffer could be corrupted when one process
draws something into the buffer using the CPU while another process
composes that buffer with its own buffer using the GPU, without any
locking mechanism. That is why we need a user-land locking interface,
the fcntl system call.

And the last one is a deferred page flip issue: a rendered window buffer
can take about 32ms to be displayed on screen in the worst case,
assuming the GPU rendering itself completes within 16ms.
That can be incurred when compositing a pixmap buffer with a window buffer
using GPU and when vsync is just started. At this time, Xorg waits for
a vblank event to get a window buffer so 3d rendering will be delayed
up to about 16ms. As a result, the window buffer would be displayed in
about two vsyncs (about 32ms) and in turn, that would show slow
responsiveness.

For this, we could enhance the responsiveness with a locking mechanism by
skipping one vblank wait. I guess Android, Chrome OS, and other platforms
use their own locking mechanisms for similar reasons: the Android sync
driver, KDS, and DMA fences.

The below shows the deferred page flip issue in worst case:

   | - vsync signal
   |-- DRI2GetBuffers
   |
   |
   |
   | - vsync signal
   |-- Request gpu rendering
  time |
   |
   |-- Request page flip (deferred)
   | - vsync signal
   |-- Displayed on screen
   |
   |
   |
   | - vsync signal

Thanks,
Inki Dae

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl
[4] https://www.tizen.org/

Inki Dae (2):
  dmabuf-sync: Add a buffer synchronization framework
  dma-buf: Add user interfaces for dmabuf sync support

 Documentation/dma-buf-sync.txt |  286 
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |   86 
 drivers/base/dmabuf-sync.c |  951 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  257 +++
 7 files changed, 1604 insertions(+)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

-- 
1.7.9.5



[PATCH v9 1/2] dmabuf-sync: Add a buffer synchronization framework

2013-09-17 Thread Inki Dae
 dmabuf_sync *sync;

sync = dmabuf_sync_init(...);
...

dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_R);
...

And the below can be used as access types:
DMA_BUF_ACCESS_R - CPU will access a buffer for read.
DMA_BUF_ACCESS_W - CPU will access a buffer for read or write.
DMA_BUF_ACCESS_DMA_R - DMA will access a buffer for read
DMA_BUF_ACCESS_DMA_W - DMA will access a buffer for read or
write.

2. Mandatory resource releasing - a task cannot hold a lock indefinitely.
A task may never try to unlock a buffer after taking a lock to the buffer.
In this case, a timer handler for the corresponding sync object fires after
five (default) seconds, and the timed-out buffer is then unlocked by a
workqueue handler to avoid lockups and to reclaim the buffer's resources.
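The mandatory-release rule above can be illustrated with a small single-threaded C sketch: a lock record plus a watchdog step that force-unlocks once the hold time exceeds the timeout. (The names and the userspace setting are hypothetical; the actual patch implements this in the kernel with a timer handler and a workqueue, with proper locking.)

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* A lock record with a hold-time limit. */
struct timed_lock {
	bool   locked;
	time_t taken_at;
	int    timeout_sec;	/* five seconds by default in the patch */
};

static void timed_lock_take(struct timed_lock *l)
{
	l->locked = true;
	l->taken_at = time(NULL);
}

/* Watchdog step, run periodically: force-unlock a timed-out buffer so a
 * task that never unlocks cannot stall other users. Returns true if it
 * had to reap the lock. */
static bool timed_lock_reap(struct timed_lock *l)
{
	if (l->locked && time(NULL) - l->taken_at >= l->timeout_sec) {
		l->locked = false;	/* forced unlock */
		return true;
	}
	return false;
}
```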

The below is how to use interfaces for device driver:
1. Allocate and Initialize a sync object:
static void xxx_dmabuf_sync_free(void *priv)
{
struct xxx_context *ctx = priv;

if (!ctx)
return;

ctx->sync = NULL;
}
...

static struct dmabuf_sync_priv_ops driver_specific_ops = {
.free = xxx_dmabuf_sync_free,
};
...

struct dmabuf_sync *sync;

sync = dmabuf_sync_init("test sync", &driver_specific_ops, ctx);
...

2. Add a dmabuf to the sync object when setting up dma buffer relevant
   registers:
dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
...

3. Lock all dmabufs of the sync object before DMA or CPU accesses
   the dmabufs:
dmabuf_sync_lock(sync);
...

4. Now CPU or DMA can access all dmabufs locked in step 3.

5. Unlock all dmabufs added in a sync object after DMA or CPU access
   to these dmabufs is completed:
dmabuf_sync_unlock(sync);

   And call the following functions to release all resources,
dmabuf_sync_put_all(sync);
dmabuf_sync_fini(sync);

You can refer to actual example codes:
drm/exynos: add dmabuf sync support for g2d driver and
drm/exynos: add dmabuf sync support for kms framework from
https://git.kernel.org/cgit/linux/kernel/git/daeinki/
drm-exynos.git/log/?h=dmabuf-sync

And this framework includes fcntl[3] and select system call as interfaces
exported to user. As you know, user sees a buffer object as a dma-buf file
descriptor. fcntl() call with the file descriptor means to lock some buffer
region being managed by the dma-buf object. And select() call with the file
descriptor means to poll the completion event of CPU or DMA access to
the dma-buf.

The below is how to use interfaces for user application:

fcntl system call:

struct flock filelock;

1. Lock a dma buf:
filelock.l_type = F_WRLCK or F_RDLCK;

/* lock entire region to the dma buf. */
filelock.l_whence = SEEK_CUR;
filelock.l_start = 0;
filelock.l_len = 0;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
...
CPU access to the dma buf

2. Unlock a dma buf:
filelock.l_type = F_UNLCK;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);

close(dmabuf fd) call would also unlock the dma buf. And for more
detail, please refer to [3]

select system call:

fd_set wdfs or rdfs;

FD_ZERO(wdfs or rdfs);
FD_SET(fd, wdfs or rdfs);

select(fd + 1, &rdfs, NULL, NULL, NULL);
or
select(fd + 1, NULL, &wdfs, NULL, NULL);

Every time the select system call is made, the caller will wait for
the completion of DMA or CPU access to the shared buffer if someone
is accessing it. If no one is, the select system call returns at
once.
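The fcntl/select user-space sequence above can be sketched as one small C helper. A real dma-buf fd needs the proposed kernel patch and a driver stack, so this sketch runs the identical calls against an ordinary file descriptor (an illustrative assumption; POSIX record locks behave the same way there):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Steps 1..2 above: wait via select(), take a whole-region write lock
 * via fcntl(F_SETLKW), then unlock. Returns 0 on success, -1 on error. */
static int buf_lock_cycle(int fd)
{
	fd_set rdfs;
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	struct flock filelock;

	/* select() waits for completion of any access in progress;
	 * on an ordinary file it returns immediately. */
	FD_ZERO(&rdfs);
	FD_SET(fd, &rdfs);
	if (select(fd + 1, &rdfs, NULL, NULL, &tv) < 0)
		return -1;

	/* Blocking write lock over the entire region (l_len == 0). */
	memset(&filelock, 0, sizeof(filelock));
	filelock.l_type = F_WRLCK;
	filelock.l_whence = SEEK_SET;	/* commit text uses SEEK_CUR;
					 * equivalent here at offset 0 */
	filelock.l_start = 0;
	filelock.l_len = 0;
	if (fcntl(fd, F_SETLKW, &filelock) < 0)
		return -1;

	/* ... CPU access to the buffer would happen here ... */

	/* Unlock the same region. */
	filelock.l_type = F_UNLCK;
	return fcntl(fd, F_SETLK, &filelock);
}
```

With the patch applied, fd would simply be the dma-buf file descriptor exported by the driver.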

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  286 
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |5 +
 drivers/base/dmabuf-sync.c |  951 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  257 +++
 7 files changed, 1523 insertions(+)
 create mode 100644 Documentation/dma-buf-sync.txt

[PATCH v2 2/2] dma-buf: Add user interfaces for dmabuf sync support

2013-09-17 Thread Inki Dae
This patch adds lock and poll callbacks to dma buf file operations,
and these callbacks will be called by fcntl and select system calls.

fcntl and select system calls can be used to wait for the completion
of DMA or CPU access to a shared dmabuf. The difference between them is
that the fcntl system call takes a lock after the completion while the
select system call doesn't. So the fcntl system call is useful when a
task wants to access a shared dmabuf without any corruption, while
select is useful when a task just wants to wait for the completion.

Changelog v2:
- Add select system call support.
  . The purpose of this feature is to wait for the completion of DMA or
CPU access to a dmabuf without that caller locks the dmabuf again
after the completion.
That is useful when the caller wants to be aware of the completion of
DMA access to the dmabuf but doesn't use the interfaces of the DMA
device driver.

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 drivers/base/dma-buf.c |   81 
 1 file changed, 81 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 3985751..73234ba 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -29,6 +29,7 @@
 #include <linux/export.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
+#include <linux/poll.h>
 #include <linux/dmabuf-sync.h>
 
 static inline int is_dma_buf_file(struct file *);
@@ -106,10 +107,90 @@ static loff_t dma_buf_llseek(struct file *file, loff_t 
offset, int whence)
return base + offset;
 }
 
+static unsigned int dma_buf_poll(struct file *filp,
+   struct poll_table_struct *poll)
+{
+   struct dma_buf *dmabuf;
+   struct dmabuf_sync_reservation *robj;
+   int ret = 0;
+
+   if (!is_dma_buf_file(filp))
+   return POLLERR;
+
+   dmabuf = filp->private_data;
+   if (!dmabuf || !dmabuf->sync)
+   return POLLERR;
+
+   robj = dmabuf->sync;
+
+   mutex_lock(&robj->lock);
+
+   robj->polled = true;
+
+   /*
+    * CPU or DMA access to this buffer has been completed, and
+    * the blocked task has been waked up. Return poll event
+    * so that the task can get out of select().
+    */
+   if (robj->poll_event) {
+   robj->poll_event = false;
+   mutex_unlock(&robj->lock);
+   return POLLIN | POLLOUT;
+   }
+
+   /*
+    * There is no anyone accessing this buffer so just return.
+    */
+   if (!robj->locked) {
+   mutex_unlock(&robj->lock);
+   return POLLIN | POLLOUT;
+   }
+
+   poll_wait(filp, &robj->poll_wait, poll);
+
+   mutex_unlock(&robj->lock);
+
+   return ret;
+}
+
+static int dma_buf_lock(struct file *file, int cmd, struct file_lock *fl)
+{
+   struct dma_buf *dmabuf;
+   unsigned int type;
+   bool wait = false;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   dmabuf = file->private_data;
+
+   if ((fl->fl_type & F_UNLCK) == F_UNLCK) {
+   dmabuf_sync_single_unlock(dmabuf);
+   return 0;
+   }
+
+   /* convert flock type to dmabuf sync type. */
+   if ((fl->fl_type & F_WRLCK) == F_WRLCK)
+   type = DMA_BUF_ACCESS_W;
+   else if ((fl->fl_type & F_RDLCK) == F_RDLCK)
+   type = DMA_BUF_ACCESS_R;
+   else
+   return -EINVAL;
+
+   if (fl->fl_flags & FL_SLEEP)
+   wait = true;
+
+   /* TODO. the locking to certain region should also be considered. */
+
+   return dmabuf_sync_single_lock(dmabuf, type, wait);
+}
+
 static const struct file_operations dma_buf_fops = {
.release= dma_buf_release,
.mmap   = dma_buf_mmap_internal,
.llseek = dma_buf_llseek,
+   .poll   = dma_buf_poll,
+   .lock   = dma_buf_lock,
 };
 
 /*
-- 
1.7.9.5



[PATCH v8 0/2] Introduce buffer synchronization framework

2013-08-29 Thread Inki Dae
, and then the Web Browser can compose the buffer with its own back buffer.

Thus, in such cases, a shared buffer could be corrupted when one process
draws something into the buffer using the CPU while another process
composes that buffer with its own buffer using the GPU, without any
locking mechanism. That is why we need a user-land locking interface,
the fcntl system call.

And the last one is a deferred page flip issue: a rendered window buffer
can take about 32ms to be displayed on screen in the worst case,
assuming the GPU rendering itself completes within 16ms.
That can be incurred when compositing a pixmap buffer with a window buffer
using GPU and when vsync is just started. At this time, Xorg waits for
a vblank event to get a window buffer so 3d rendering will be delayed
up to about 16ms. As a result, the window buffer would be displayed in
about two vsyncs (about 32ms) and in turn, that would show slow
responsiveness.

For this, we could enhance the responsiveness with a locking mechanism by
skipping one vblank wait. I guess Android, Chrome OS, and other platforms
use their own locking mechanisms for similar reasons: the Android sync
driver, KDS, and DMA fences.

The below shows the deferred page flip issue in worst case:

   | - vsync signal
   |-- DRI2GetBuffers
   |
   |
   |
   | - vsync signal
   |-- Request gpu rendering
  time |
   |
   |-- Request page flip (deferred)
   | - vsync signal
   |-- Displayed on screen
   |
   |
   |
   | - vsync signal

Thanks,
Inki Dae

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl
[4] https://www.tizen.org/

Inki Dae (2):
  dmabuf-sync: Add a buffer synchronization framework
  dma-buf: Add user interfaces for dmabuf sync support

 Documentation/dma-buf-sync.txt |  286 
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |   85 
 drivers/base/dmabuf-sync.c |  943 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  257 +++
 7 files changed, 1595 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

-- 
1.7.5.4



[PATCH v2 2/2] dma-buf: Add user interfaces for dmabuf sync support

2013-08-29 Thread Inki Dae
This patch adds lock and poll callbacks to dma buf file operations,
and these callbacks will be called by fcntl and select system calls.

fcntl and select system calls can be used to wait for the completion
of DMA or CPU access to a shared dmabuf. The difference between them is
that the fcntl system call takes a lock after the completion while the
select system call doesn't. So the fcntl system call is useful when a
task wants to access a shared dmabuf without any corruption, while
select is useful when a task just wants to wait for the completion.

Changelog v2:
- Add select system call support.
  . The purpose of this feature is to wait for the completion of DMA or
CPU access to a dmabuf without that caller locks the dmabuf again
after the completion.
That is useful when the caller wants to be aware of the completion of
DMA access to the dmabuf but doesn't use the interfaces of the DMA
device driver.

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 drivers/base/dma-buf.c |   81 
 1 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index cc42a38..f961907 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -30,6 +30,7 @@
 #include <linux/export.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
+#include <linux/poll.h>
 #include <linux/dmabuf-sync.h>
 
 static inline int is_dma_buf_file(struct file *);
@@ -81,9 +82,89 @@ static int dma_buf_mmap_internal(struct file *file, struct 
vm_area_struct *vma)
 return dmabuf->ops->mmap(dmabuf, vma);
 }
 
+static unsigned int dma_buf_poll(struct file *filp,
+   struct poll_table_struct *poll)
+{
+   struct dma_buf *dmabuf;
+   struct dmabuf_sync_reservation *robj;
+   int ret = 0;
+
+   if (!is_dma_buf_file(filp))
+   return POLLERR;
+
+   dmabuf = filp->private_data;
+   if (!dmabuf || !dmabuf->sync)
+   return POLLERR;
+
+   robj = dmabuf->sync;
+
+   mutex_lock(&robj->lock);
+
+   robj->polled = true;
+
+   /*
+    * CPU or DMA access to this buffer has been completed, and
+    * the blocked task has been waked up. Return poll event
+    * so that the task can get out of select().
+    */
+   if (robj->poll_event) {
+   robj->poll_event = false;
+   mutex_unlock(&robj->lock);
+   return POLLIN | POLLOUT;
+   }
+
+   /*
+    * There is no anyone accessing this buffer so just return.
+    */
+   if (!robj->locked) {
+   mutex_unlock(&robj->lock);
+   return POLLIN | POLLOUT;
+   }
+
+   poll_wait(filp, &robj->poll_wait, poll);
+
+   mutex_unlock(&robj->lock);
+
+   return ret;
+}
+
+static int dma_buf_lock(struct file *file, int cmd, struct file_lock *fl)
+{
+   struct dma_buf *dmabuf;
+   unsigned int type;
+   bool wait = false;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   dmabuf = file->private_data;
+
+   if ((fl->fl_type & F_UNLCK) == F_UNLCK) {
+   dmabuf_sync_single_unlock(dmabuf);
+   return 0;
+   }
+
+   /* convert flock type to dmabuf sync type. */
+   if ((fl->fl_type & F_WRLCK) == F_WRLCK)
+   type = DMA_BUF_ACCESS_W;
+   else if ((fl->fl_type & F_RDLCK) == F_RDLCK)
+   type = DMA_BUF_ACCESS_R;
+   else
+   return -EINVAL;
+
+   if (fl->fl_flags & FL_SLEEP)
+   wait = true;
+
+   /* TODO. the locking to certain region should also be considered. */
+
+   return dmabuf_sync_single_lock(dmabuf, type, wait);
+}
+
 static const struct file_operations dma_buf_fops = {
.release= dma_buf_release,
.mmap   = dma_buf_mmap_internal,
+   .poll   = dma_buf_poll,
+   .lock   = dma_buf_lock,
 };
 
 /*
-- 
1.7.5.4



[PATCH v8 1/2] dmabuf-sync: Add a buffer synchronization framework

2013-08-29 Thread Inki Dae
);
...

And the below can be used as access types:
DMA_BUF_ACCESS_R - CPU will access a buffer for read.
DMA_BUF_ACCESS_W - CPU will access a buffer for read or write.
DMA_BUF_ACCESS_DMA_R - DMA will access a buffer for read
DMA_BUF_ACCESS_DMA_W - DMA will access a buffer for read or
write.

2. Mandatory resource releasing - a task cannot hold a lock indefinitely.
A task may never try to unlock a buffer after taking a lock to the buffer.
In this case, a timer handler for the corresponding sync object fires after
five (default) seconds, and the timed-out buffer is then unlocked by a
workqueue handler to avoid lockups and to reclaim the buffer's resources.

The below is how to use interfaces for device driver:
1. Allocate and Initialize a sync object:
static void xxx_dmabuf_sync_free(void *priv)
{
struct xxx_context *ctx = priv;

if (!ctx)
return;

ctx->sync = NULL;
}
...

static struct dmabuf_sync_priv_ops driver_specific_ops = {
.free = xxx_dmabuf_sync_free,
};
...

struct dmabuf_sync *sync;

sync = dmabuf_sync_init("test sync", &driver_specific_ops, ctx);
...

2. Add a dmabuf to the sync object when setting up dma buffer relevant
   registers:
dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
...

3. Lock all dmabufs of the sync object before DMA or CPU accesses
   the dmabufs:
dmabuf_sync_lock(sync);
...

4. Now CPU or DMA can access all dmabufs locked in step 3.

5. Unlock all dmabufs added in a sync object after DMA or CPU access
   to these dmabufs is completed:
dmabuf_sync_unlock(sync);

   And call the following functions to release all resources,
dmabuf_sync_put_all(sync);
dmabuf_sync_fini(sync);

You can refer to actual example codes:
drm/exynos: add dmabuf sync support for g2d driver and
drm/exynos: add dmabuf sync support for kms framework from
https://git.kernel.org/cgit/linux/kernel/git/daeinki/
drm-exynos.git/log/?h=dmabuf-sync

And this framework includes fcntl[3] and select system call as interfaces
exported to user. As you know, user sees a buffer object as a dma-buf file
descriptor. fcntl() call with the file descriptor means to lock some buffer
region being managed by the dma-buf object. And select() call with the file
descriptor means to poll the completion event of CPU or DMA access to
the dma-buf.

The below is how to use interfaces for user application:

fcntl system call:

struct flock filelock;

1. Lock a dma buf:
filelock.l_type = F_WRLCK or F_RDLCK;

/* lock entire region to the dma buf. */
filelock.l_whence = SEEK_CUR;
filelock.l_start = 0;
filelock.l_len = 0;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
...
CPU access to the dma buf

2. Unlock a dma buf:
filelock.l_type = F_UNLCK;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);

close(dmabuf fd) call would also unlock the dma buf. And for more
detail, please refer to [3]

select system call:

fd_set wdfs or rdfs;

FD_ZERO(wdfs or rdfs);
FD_SET(fd, wdfs or rdfs);

select(fd + 1, &rdfs, NULL, NULL, NULL);
or
select(fd + 1, NULL, &wdfs, NULL, NULL);

Every time the select system call is made, the caller will wait for
the completion of DMA or CPU access to the shared buffer if someone
is accessing it. If no one is, the select system call returns at
once.

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  286 
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |4 +
 drivers/base/dmabuf-sync.c |  945 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  257 +++
 7 files changed, 1516 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git

RE: [PATCH v2 2/2] dma-buf: Add user interfaces for dmabuf sync support

2013-08-22 Thread Inki Dae
Thanks for your comments,
Inki Dae

 -Original Message-
 From: David Herrmann [mailto:dh.herrm...@gmail.com]
 Sent: Wednesday, August 21, 2013 10:17 PM
 To: Inki Dae
 Cc: dri-de...@lists.freedesktop.org; linux-fb...@vger.kernel.org; linux-
 arm-ker...@lists.infradead.org; linux-media@vger.kernel.org; linaro-
 ker...@lists.linaro.org; Maarten Lankhorst; Sumit Semwal;
 kyungmin.p...@samsung.com; myungjoo@samsung.com
 Subject: Re: [PATCH v2 2/2] dma-buf: Add user interfaces for dmabuf sync
 support
 
 Hi
 
 On Wed, Aug 21, 2013 at 12:33 PM, Inki Dae inki@samsung.com wrote:
  This patch adds lock and poll callbacks to dma buf file operations,
  and these callbacks will be called by fcntl and select system calls.
 
  fcntl and select system calls can be used to wait for the completion
  of DMA or CPU access to a shared dmabuf. The difference of them is
  fcntl system call takes a lock after the completion but select system
  call doesn't. So in case of fcntl system call, it's useful when a task
  wants to access a shared dmabuf without any broken. On the other hand,
  it's useful when a task wants to just wait for the completion.
 
 1)
 So how is that supposed to work in user-space? I don't want to block
 on a buffer, but get notified once I can lock it. So I do:
   select(..dmabuf..)
 Once it is finished, I want to use it:
   flock(..dmabuf..)
 However, how can I guarantee the flock will not block? Some other
 process might have locked it in between. So I do a non-blocking
 flock() and if it fails I wait again?

s/flock/fcntl

Yes, it does if you want to do a non-blocking fcntl. The fcntl() call will
return -EAGAIN if some other process has taken the lock first, so the user
process can retry the lock or do other work. Since the user process called
fcntl() in non-blocking mode, I think it should consider two outcomes: the
fcntl() call could fail, or it could take the lock successfully. Doesn't
fcntl() on any other fd, not just a dmabuf, behave the same way?
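As a sketch of the retry pattern described here, a non-blocking caller could loop on F_SETLK until the lock is free. (The bounded retry count, the 1ms backoff, and exercising an ordinary fd instead of a dma-buf fd are all illustrative assumptions; POSIX record locks report contention the same way on any fd.)

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try to take a whole-region write lock without blocking (F_SETLK).
 * On contention fcntl() fails with EAGAIN (or EACCES on some systems);
 * back off briefly and retry a bounded number of times. Returns 0 once
 * the lock is held, -1 if it could not be acquired. */
static int buf_trylock(int fd, int max_retries)
{
	struct flock filelock;
	int i;

	memset(&filelock, 0, sizeof(filelock));
	filelock.l_type = F_WRLCK;
	filelock.l_whence = SEEK_SET;
	filelock.l_start = 0;
	filelock.l_len = 0;	/* 0 == entire region */

	for (i = 0; i <= max_retries; i++) {
		if (fcntl(fd, F_SETLK, &filelock) == 0)
			return 0;	/* lock taken */
		if (errno != EAGAIN && errno != EACCES)
			return -1;	/* real error, not contention */
		usleep(1000);		/* back off, then retry or do other work */
	}
	return -1;
}
```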

Looks ugly and un-predictable.
 

So I think this is reasonable. However, for select system call, I'm not sure
that this system call is needed yet. So I can remove it if unnecessary.

 2)
 What do I do if some user-space program holds a lock and dead-locks?
 

I think an fcntl call on any other fd could lead to the same situation, and
the lock will be released once the user-space program is killed, because
when a process is killed all of its file descriptors are closed.

 3)
 How do we do modesetting in atomic-context in the kernel? There is no
 way to lock the object. But this is required for panic-handlers and
 more importantly the kdb debugging hooks.
 Ok, I can live with that being racy, but would still be nice to be
 considered.

Yes, the lock must not be taken in atomic context. I will add enough
comments to make this clear.

 
 4)
 Why do we need locks? Aren't fences enough? That is, in which
 situation is a lock really needed?
 If we assume we have two writers A and B (DMA, CPU, GPU, whatever) and
 they have no synchronization on their own. What do we win by
 synchronizing their writes? Ok, yeah, we end up with either A or B and
 not a mixture of both. But if we cannot predict whether we get A or B,
 I don't know why we care at all? It's random, so a mixture would be
 fine, too, wouldn't it?

I don't think so. There are some cases where a mixture wouldn't be fine;
I will describe one below.

 
 So if user-space doesn't have any synchronization on its own, I don't
 see why we need an implicit sync on a dma-buf. Could you describe a
 more elaborate use-case?

Ok, first, I think I described that well enough through [PATCH 0/2]. For
this, you can refer to the link below,

Anyway, there are some cases that user-space process needs the
synchronization on its own. In case of Tizen platform[1], one is between X
Client and X Server; actually, Composite Manager. Other is between Web app
based on HTML5 and Web Browser.

Please assume that an X Client draws something in a window buffer using the
CPU, and then requests SWAP to the X Server. The X Server then notifies the
Composite Manager of a damage event, and the Composite Manager composes the
window buffer with its own back buffer using the GPU. In this case, the
Composite Manager calls eglSwapBuffers, which internally flushes GL commands
instead of finishing them, for better performance.

As you may know, flushing doesn't wait for the completion event from the GPU
driver. At the same time, the X Client could do other work and also draw
something in the same buffer again. At this point the buffer could be broken,
because the X Client can't be aware of when the GPU's access to the buffer has
completed without out-of-band handshaking, and such handshaking is quite a big
overhead. That is why we need a user-space locking interface, the fcntl
system call.

And also there is same issue in case of Web app: Web app draws something

RE: [PATCH 1/2] [RFC PATCH v6] dmabuf-sync: Add a buffer synchronization framework

2013-08-21 Thread Inki Dae

Thanks for the review,
Inki Dae

 -Original Message-
 From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
 ow...@vger.kernel.org] On Behalf Of Konrad Rzeszutek Wilk
 Sent: Wednesday, August 21, 2013 4:22 AM
 To: Inki Dae
 Cc: dri-de...@lists.freedesktop.org; linux-fb...@vger.kernel.org; linux-
 arm-ker...@lists.infradead.org; linux-media@vger.kernel.org; linaro-
 ker...@lists.linaro.org; kyungmin.p...@samsung.com;
 myungjoo@samsung.com
 Subject: Re: [PATCH 1/2] [RFC PATCH v6] dmabuf-sync: Add a buffer
 synchronization framework
 
 On Tue, Aug 13, 2013 at 06:19:35PM +0900, Inki Dae wrote:
  This patch adds a buffer synchronization framework based on DMA BUF[1]
  and and based on ww-mutexes[2] for lock mechanism.
 
  The purpose of this framework is to provide not only buffer access
 control
  to CPU and DMA but also easy-to-use interfaces for device drivers and
  user application. This framework can be used for all dma devices using
  system memory as dma buffer, especially for most ARM based SoCs.
 
  Changelog v6:
  - Fix sync lock to multiple reads.
  - Add select system call support.
. Wake up poll_wait when a dmabuf is unlocked.
  - Remove unnecessary the use of mutex lock.
  - Add private backend ops callbacks.
. This ops has one callback for device drivers to clean up their
  sync object resource when the sync object is freed. For this,
  device drivers should implement the free callback properly.
  - Update document file.
 
  Changelog v5:
  - Remove a dependency on reservation_object: the reservation_object is
 used
to hook up to ttm and dma-buf for easy sharing of reservations across
devices. However, the dmabuf sync can be used for all dma devices;
 v4l2
and drm based drivers, so doesn't need the reservation_object anymore.
With regard to this, it adds 'void *sync' to dma_buf structure.
  - All patches are rebased on mainline, Linux v3.10.
 
  Changelog v4:
  - Add user side interface for buffer synchronization mechanism and
 update
descriptions related to the user side interface.
 
  Changelog v3:
  - remove cache operation relevant codes and update document file.
 
  Changelog v2:
  - use atomic_add_unless to avoid potential bug.
  - add a macro for checking valid access type.
  - code clean.
 
  The mechanism of this framework has the following steps,
  1. Register dmabufs to a sync object - A task gets a new sync object
 and
  can add one or more dmabufs that the task wants to access.
  This registering should be performed when a device context or an
 event
  context such as a page flip event is created or before CPU accesses
a
 shared
  buffer.
 
  dma_buf_sync_get(a sync object, a dmabuf);
 
  2. Lock a sync object - A task tries to lock all dmabufs added in
its
 own
  sync object. Basically, the lock mechanism uses ww-mutex[1] to avoid
 dead
  lock issue and for race condition between CPU and CPU, CPU and DMA,
 and DMA
  and DMA. Taking a lock means that others cannot access all locked
 dmabufs
  until the task that locked the corresponding dmabufs, unlocks all
the
 locked
  dmabufs.
  This locking should be performed before DMA or CPU accesses these
 dmabufs.
 
  dma_buf_sync_lock(a sync object);
 
  3. Unlock a sync object - The task unlocks all dmabufs added in its
 own sync
  object. The unlock means that the DMA or CPU accesses to the dmabufs
 have
  been completed so that others may access them.
  This unlocking should be performed after DMA or CPU has completed
 accesses
  to the dmabufs.
 
  dma_buf_sync_unlock(a sync object);
 
  4. Unregister one or all dmabufs from a sync object - A task
 unregisters
  the given dmabufs from the sync object. This means that the task
 doesn't
  want to lock the dmabufs.
  The unregistering should be performed after DMA or CPU has completed
  accesses to the dmabufs or when dma_buf_sync_lock() is failed.
 
  dma_buf_sync_put(a sync object, a dmabuf);
  dma_buf_sync_put_all(a sync object);
 
  The described steps may be summarized as:
  get - lock - CPU or DMA access to a buffer/s - unlock - put
 
  This framework includes the following two features.
  1. read (shared) and write (exclusive) locks - A task is required to
 declare
  the access type when the task tries to register a dmabuf;
  READ, WRITE, READ DMA, or WRITE DMA.
 
  The below is example codes,
  struct dmabuf_sync *sync;
 
  sync = dmabuf_sync_init(...);
  ...
 
  dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_R);
  ...
 
  And the below can be used as access types:
  DMA_BUF_ACCESS_R - CPU will access a buffer for read.
  DMA_BUF_ACCESS_W - CPU will access a buffer for read or
 write.
  DMA_BUF_ACCESS_DMA_R - DMA will access a buffer for read
  DMA_BUF_ACCESS_DMA_W - DMA will access a buffer for read

[PATCH v7 0/2] Introduce buffer synchronization framework

2013-08-21 Thread Inki Dae
 3d rendering will be delayed
up to about 16ms. As a result, the window buffer would be displayed in
about two vsyncs (about 32ms) and in turn, that would show slow
responsiveness.

For this, we could enhance the responsiveness with locking
mechanism: skipping one vblank wait. I guess in the similar reason,
Android, Chrome OS, and other platforms are using their own locking
mechanisms; Android sync driver, KDS, and DMA fence.

The below shows the deferred page flip issue in worst case,

   | - vsync signal
   |-- DRI2GetBuffers
   |
   |
   |
   | - vsync signal
   |-- Request gpu rendering
  time |
   |
   |-- Request page flip (deferred)
   | - vsync signal
   |-- Displayed on screen
   |
   |
   |
   | - vsync signal

Thanks,
Inki Dae

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl

Inki Dae (2):
  dmabuf-sync: Add a buffer synchronization framework
  dma-buf: Add user interfaces for dmabuf sync support

 Documentation/dma-buf-sync.txt |  286 
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |   85 +
 drivers/base/dmabuf-sync.c |  706 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  236 ++
 7 files changed, 1337 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 1/2] dmabuf-sync: Add a buffer synchronization framework

2013-08-21 Thread Inki Dae
 driver:
1. Allocate and Initialize a sync object:
static void xxx_dmabuf_sync_free(void *priv)
{
struct xxx_context *ctx = priv;

if (!ctx)
return;

ctx->sync = NULL;
}
...

static struct dmabuf_sync_priv_ops driver_specific_ops = {
.free = xxx_dmabuf_sync_free,
};
...

struct dmabuf_sync *sync;

sync = dmabuf_sync_init("test sync", &driver_specific_ops, ctx);
...

2. Add a dmabuf to the sync object when setting up dma buffer relevant
   registers:
dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
...

3. Lock all dmabufs of the sync object before DMA or CPU accesses
   the dmabufs:
dmabuf_sync_lock(sync);
...

4. Now CPU or DMA can access all dmabufs locked in step 3.

5. Unlock all dmabufs added in a sync object after DMA or CPU access
   to these dmabufs is completed:
dmabuf_sync_unlock(sync);

   And call the following functions to release all resources,
dmabuf_sync_put_all(sync);
dmabuf_sync_fini(sync);

You can refer to actual example codes:
drm/exynos: add dmabuf sync support for g2d driver and
drm/exynos: add dmabuf sync support for kms framework from
https://git.kernel.org/cgit/linux/kernel/git/daeinki/
drm-exynos.git/log/?h=dmabuf-sync

And this framework includes the fcntl[3] and select system calls as interfaces
exported to user space. As you know, user space sees a buffer object as a
dma-buf file descriptor. An fcntl() call with the file descriptor locks some
buffer region being managed by the dma-buf object, and a select() call with
the file descriptor polls for the completion of CPU or DMA access to
the dma-buf.

The below is how to use interfaces for user application:

fcntl system call:

struct flock filelock;

1. Lock a dma buf:
filelock.l_type = F_WRLCK or F_RDLCK;

/* lock entire region to the dma buf. */
filelock.l_whence = SEEK_CUR;
filelock.l_start = 0;
filelock.l_len = 0;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
...
CPU access to the dma buf

2. Unlock a dma buf:
filelock.l_type = F_UNLCK;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);

A close(dmabuf fd) call would also unlock the dma buf. For more
detail, please refer to [3].

select system call:

fd_set wdfs or rdfs;

FD_ZERO(wdfs or rdfs);
FD_SET(fd, wdfs or rdfs);

select(fd + 1, rdfs, NULL, NULL, NULL);
or
select(fd + 1, NULL, wdfs, NULL, NULL);

Every time the select system call is called, the caller will wait for
the completion of DMA or CPU access to a shared buffer if someone is
accessing the shared buffer. If no one is, the select system call
returns at once.

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  286 
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |4 +
 drivers/base/dmabuf-sync.c |  706 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  236 ++
 7 files changed, 1256 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git a/Documentation/dma-buf-sync.txt b/Documentation/dma-buf-sync.txt
new file mode 100644
index 000..5945c8a
--- /dev/null
+++ b/Documentation/dma-buf-sync.txt
@@ -0,0 +1,286 @@
+DMA Buffer Synchronization Framework
+
+
+  Inki Dae
+  inki dot dae at samsung dot com
+  daeinki at gmail dot com
+
+This document is a guide for device-driver writers describing the DMA buffer
+synchronization API. This document also describes how to use the API to
+use buffer synchronization mechanism between DMA and DMA, CPU and DMA, and
+CPU and CPU.
+
+The DMA Buffer synchronization API provides buffer synchronization mechanism;
+i.e., buffer access control to CPU and DMA, and easy-to-use interfaces

RE: [PATCH v2 1/5] [media] exynos-mscl: Add new driver for M-Scaler

2013-08-20 Thread Inki Dae


 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Shaik Ameer Basha
 Sent: Tuesday, August 20, 2013 5:07 PM
 To: Inki Dae
 Cc: Shaik Ameer Basha; LMML; linux-samsung-...@vger.kernel.org;
 c...@samsung.com; Sylwester Nawrocki; posc...@google.com; Arun Kumar K
 Subject: Re: [PATCH v2 1/5] [media] exynos-mscl: Add new driver for M-
 Scaler
 
 Hi Inki Dae,
 
 Thanks for the review.
 
 
 On Mon, Aug 19, 2013 at 6:18 PM, Inki Dae inki@samsung.com wrote:
  Just quick review.
 
  -Original Message-
  From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
  ow...@vger.kernel.org] On Behalf Of Shaik Ameer Basha
  Sent: Monday, August 19, 2013 7:59 PM
  To: linux-media@vger.kernel.org; linux-samsung-...@vger.kernel.org
  Cc: s.nawro...@samsung.com; posc...@google.com; arun...@samsung.com;
  shaik.am...@samsung.com
  Subject: [PATCH v2 1/5] [media] exynos-mscl: Add new driver for M-
 Scaler
 
  This patch adds support for M-Scaler (M2M Scaler) device which is a
  new device for scaling, blending, color fill  and color space
  conversion on EXYNOS5 SoCs.
 
  This device supports the followings as key feature.
  input image format
  - YCbCr420 2P(UV/VU), 3P
  - YCbCr422 1P(YUYV/UYVY/YVYU), 2P(UV,VU), 3P
  - YCbCr444 2P(UV,VU), 3P
  - RGB565, ARGB1555, ARGB, ARGB, RGBA
  - Pre-multiplexed ARGB, L8A8 and L8
  output image format
  - YCbCr420 2P(UV/VU), 3P
  - YCbCr422 1P(YUYV/UYVY/YVYU), 2P(UV,VU), 3P
  - YCbCr444 2P(UV,VU), 3P
  - RGB565, ARGB1555, ARGB, ARGB, RGBA
  - Pre-multiplexed ARGB
  input rotation
  - 0/90/180/270 degree, X/Y/XY Flip
  scale ratio
  - 1/4 scale down to 16 scale up
  color space conversion
  - RGB to YUV / YUV to RGB
  Size
  - Input : 16x16 to 8192x8192
  - Output:   4x4 to 8192x8192
  alpha blending, color fill
 
  Signed-off-by: Shaik Ameer Basha shaik.am...@samsung.com
  ---
   drivers/media/platform/exynos-mscl/mscl-regs.c |  318
  
   drivers/media/platform/exynos-mscl/mscl-regs.h |  282
  +
   2 files changed, 600 insertions(+)
   create mode 100644 drivers/media/platform/exynos-mscl/mscl-regs.c
   create mode 100644 drivers/media/platform/exynos-mscl/mscl-regs.h
 
  diff --git a/drivers/media/platform/exynos-mscl/mscl-regs.c
  b/drivers/media/platform/exynos-mscl/mscl-regs.c
  new file mode 100644
  index 000..9354afc
  --- /dev/null
  +++ b/drivers/media/platform/exynos-mscl/mscl-regs.c
  @@ -0,0 +1,318 @@
  +/*
  + * Copyright (c) 2013 - 2014 Samsung Electronics Co., Ltd.
  + *   http://www.samsung.com
  + *
  + * Samsung EXYNOS5 SoC series M-Scaler driver
  + *
  + * This program is free software; you can redistribute it and/or
 modify
  + * it under the terms of the GNU General Public License as published
  + * by the Free Software Foundation, either version 2 of the License,
  + * or (at your option) any later version.
  + */
  +
  +#include linux/delay.h
  +#include linux/platform_device.h
  +
  +#include mscl-core.h
  +
  +void mscl_hw_set_sw_reset(struct mscl_dev *dev)
  +{
  + u32 cfg;
  +
   + cfg = readl(dev->regs + MSCL_CFG);
  + cfg |= MSCL_CFG_SOFT_RESET;
  +
   + writel(cfg, dev->regs + MSCL_CFG);
  +}
  +
  +int mscl_wait_reset(struct mscl_dev *dev)
  +{
  + unsigned long end = jiffies + msecs_to_jiffies(50);
 
  What does 50 mean?
 
  + u32 cfg, reset_done = 0;
  +
 
  Please describe why the below codes are needed.
 
 
 As per the Documentation,
 
   SOFT RESET: Writing 1 to this bit generates software reset. To
  check the completion of the reset, wait until this
  field becomes zero, then write an arbitrary value to any of the RW
  registers and read it back. If the read data matches the written data,
   it means SW reset succeeded. Otherwise, repeat write & read until
  matched.
 
 
  The below code tries to do the same (as per the user manual), and in the
  above msecs_to_jiffies(50), 50 is a 50 msec wait before
  checking whether the SOFT RESET is really done.
 
  Is it good to ignore these checks?
 

No, I mean that someone may want to understand your code, so please leave
enough comments for them.

Thanks,
Inki Dae

 
 
 
  + while (time_before(jiffies, end)) {
   + cfg = readl(dev->regs + MSCL_CFG);
   + if (!(cfg & MSCL_CFG_SOFT_RESET)) {
  + reset_done = 1;
  + break;
  + }
  + usleep_range(10, 20);
  + }
  +
  + /* write any value to r/w reg and read it back */
  + while (reset_done) {
  +
  + /* [TBD] need to define number of tries before returning
  +  * -EBUSY to the caller
  +  */
  +
  + writel(MSCL_CFG_SOFT_RESET_CHECK_VAL,
   + dev->regs + MSCL_CFG_SOFT_RESET_CHECK_REG

RE: [PATCH v2 0/5] Exynos5 M-Scaler Driver

2013-08-19 Thread Inki Dae


 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Shaik Ameer Basha
 Sent: Monday, August 19, 2013 7:59 PM
 To: linux-media@vger.kernel.org; linux-samsung-...@vger.kernel.org
 Cc: s.nawro...@samsung.com; posc...@google.com; arun...@samsung.com;
 shaik.am...@samsung.com
 Subject: [PATCH v2 0/5] Exynos5 M-Scaler Driver
 
 This patch adds support for M-Scaler (M2M Scaler) device which is a
 new device for scaling, blending, color fill  and color space
 conversion on EXYNOS5 SoCs.

Do all Exynos5 SoCs really have this IP? It seems that only Exynos5420 and
maybe Exynos5410 have this IP, NOT Exynos5250. Please check it again and
describe it clearly across the whole patch series.

Thanks,
Inki Dae

 
 This device supports the following as key features.
 input image format
 - YCbCr420 2P(UV/VU), 3P
 - YCbCr422 1P(YUYV/UYVY/YVYU), 2P(UV,VU), 3P
 - YCbCr444 2P(UV,VU), 3P
 - RGB565, ARGB1555, ARGB, ARGB, RGBA
 - Pre-multiplexed ARGB, L8A8 and L8
 output image format
 - YCbCr420 2P(UV/VU), 3P
 - YCbCr422 1P(YUYV/UYVY/YVYU), 2P(UV,VU), 3P
 - YCbCr444 2P(UV,VU), 3P
 - RGB565, ARGB1555, ARGB, ARGB, RGBA
 - Pre-multiplexed ARGB
 input rotation
 - 0/90/180/270 degree, X/Y/XY Flip
 scale ratio
 - 1/4 scale down to 16 scale up
 color space conversion
 - RGB to YUV / YUV to RGB
 Size
 - Input : 16x16 to 8192x8192
 - Output:   4x4 to 8192x8192
 alpha blending, color fill
 
 Rebased on:
 ---
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git:master
 
 Changes from v1:
 ---
 1] Split the previous single patch into multiple patches.
 2] Added DT binding documentation.
 3] Removed the unnecessary header file inclusions.
 4] Fix the condition check in mscl_prepare_address for swapping cb/cr
 addresses.
 
 Shaik Ameer Basha (5):
   [media] exynos-mscl: Add new driver for M-Scaler
   [media] exynos-mscl: Add core functionality for the M-Scaler driver
   [media] exynos-mscl: Add m2m functionality for the M-Scaler driver
   [media] exynos-mscl: Add DT bindings for M-Scaler driver
   [media] exynos-mscl: Add Makefile for M-Scaler driver
 
  .../devicetree/bindings/media/exynos5-mscl.txt |   34 +
  drivers/media/platform/Kconfig |8 +
  drivers/media/platform/Makefile|1 +
  drivers/media/platform/exynos-mscl/Makefile|3 +
  drivers/media/platform/exynos-mscl/mscl-core.c | 1312
 
  drivers/media/platform/exynos-mscl/mscl-core.h |  549 
  drivers/media/platform/exynos-mscl/mscl-m2m.c  |  763 
  drivers/media/platform/exynos-mscl/mscl-regs.c |  318 +
  drivers/media/platform/exynos-mscl/mscl-regs.h |  282 +
  9 files changed, 3270 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/media/exynos5-
 mscl.txt
  create mode 100644 drivers/media/platform/exynos-mscl/Makefile
  create mode 100644 drivers/media/platform/exynos-mscl/mscl-core.c
  create mode 100644 drivers/media/platform/exynos-mscl/mscl-core.h
  create mode 100644 drivers/media/platform/exynos-mscl/mscl-m2m.c
  create mode 100644 drivers/media/platform/exynos-mscl/mscl-regs.c
  create mode 100644 drivers/media/platform/exynos-mscl/mscl-regs.h
 
 --
 1.7.9.5
 



RE: [PATCH v2 4/5] [media] exynos-mscl: Add DT bindings for M-Scaler driver

2013-08-19 Thread Inki Dae


 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Shaik Ameer Basha
 Sent: Monday, August 19, 2013 7:59 PM
 To: linux-media@vger.kernel.org; linux-samsung-...@vger.kernel.org
 Cc: s.nawro...@samsung.com; posc...@google.com; arun...@samsung.com;
 shaik.am...@samsung.com
 Subject: [PATCH v2 4/5] [media] exynos-mscl: Add DT bindings for M-Scaler
 driver
 
 This patch adds the DT binding documentation for the exynos5
 based M-Scaler device driver.
 
 Signed-off-by: Shaik Ameer Basha shaik.am...@samsung.com
 ---
  .../devicetree/bindings/media/exynos5-mscl.txt |   34
 
  1 file changed, 34 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/media/exynos5-
 mscl.txt
 
 diff --git a/Documentation/devicetree/bindings/media/exynos5-mscl.txt
 b/Documentation/devicetree/bindings/media/exynos5-mscl.txt
 new file mode 100644
 index 000..5c9d1b1
 --- /dev/null
 +++ b/Documentation/devicetree/bindings/media/exynos5-mscl.txt
 @@ -0,0 +1,34 @@
 +* Samsung Exynos5 M-Scaler device
 +
 +M-Scaler is used for scaling, blending, color fill and color space
 +conversion on EXYNOS5 SoCs.
 +
 +Required properties:
  +- compatible: should be "samsung,exynos5-mscl"

If Exynos5410/5420 have the same IP,
"samsung,exynos5410-mscl" for M Scaler IP in Exynos5410/5420

Else,
compatible: should be one of the following:
(a) "samsung,exynos5410-mscl" for M Scaler IP in Exynos5410
(b) "samsung,exynos5420-mscl" for M Scaler IP in Exynos5420

 +- reg: should contain M-Scaler physical address location and length.
 +- interrupts: should contain M-Scaler interrupt number
 +- clocks: should contain the clock number according to CCF
  +- clock-names: should be "mscl"
 +
 +Example:
 +
 + mscl_0: mscl@0x1280 {
  + compatible = "samsung,exynos5-mscl";

"samsung,exynos5410-mscl";

  + reg = <0x1280 0x1000>;
  + interrupts = <0 220 0>;
  + clocks = <&clock 381>;
  + clock-names = "mscl";
 + };
 +
 +Aliases:
 +Each M-Scaler node should have a numbered alias in the aliases node,
 +in the form of msclN, N = 0...2. M-Scaler driver uses these aliases
 +to retrieve the device IDs using of_alias_get_id() call.
 +
 +Example:
 +
 +aliases {
  + mscl0 = &mscl_0;
  + mscl1 = &mscl_1;
  + mscl2 = &mscl_2;
 +};
 --
 1.7.9.5
 



[PATCH 1/2] [RFC PATCH v6] dmabuf-sync: Add a buffer synchronization framework

2013-08-13 Thread Inki Dae
;

ctx->sync = NULL;
}
...

static struct dmabuf_sync_priv_ops driver_specific_ops = {
.free = xxx_dmabuf_sync_free,
};
...

struct dmabuf_sync *sync;

sync = dmabuf_sync_init("test sync", &driver_specific_ops, ctx);
...

2. Add a dmabuf to the sync object when setting up dma buffer relevant
   registers:
dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
...

3. Lock all dmabufs of the sync object before DMA or CPU accesses
   the dmabufs:
dmabuf_sync_lock(sync);
...

4. Now CPU or DMA can access all dmabufs locked in step 3.

5. Unlock all dmabufs added in a sync object after DMA or CPU access
   to these dmabufs is completed:
dmabuf_sync_unlock(sync);

   And call the following functions to release all resources,
dmabuf_sync_put_all(sync);
dmabuf_sync_fini(sync);

You can refer to actual example codes:
drm/exynos: add dmabuf sync support for g2d driver and
drm/exynos: add dmabuf sync support for kms framework from
https://git.kernel.org/cgit/linux/kernel/git/daeinki/
drm-exynos.git/log/?h=dmabuf-sync

And this framework includes the fcntl system call[3] as an interface exported
to user space. As you know, user space sees a buffer object as a dma-buf file
descriptor, so an fcntl() call with the file descriptor locks some buffer
region being managed by the dma-buf object.

The below is how to use interfaces for user application:

fcntl system call:

struct flock filelock;

1. Lock a dma buf:
filelock.l_type = F_WRLCK or F_RDLCK;

/* lock entire region to the dma buf. */
filelock.l_whence = SEEK_CUR;
filelock.l_start = 0;
filelock.l_len = 0;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
...
CPU access to the dma buf

2. Unlock a dma buf:
filelock.l_type = F_UNLCK;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);

A close(dmabuf fd) call would also unlock the dma buf. For more
detail, please refer to [3].

select system call:

fd_set wdfs or rdfs;

FD_ZERO(wdfs or rdfs);
FD_SET(fd, wdfs or rdfs);

select(fd + 1, rdfs, NULL, NULL, NULL);
or
select(fd + 1, NULL, wdfs, NULL, NULL);

Every time the select system call is called, the caller will wait for
the completion of DMA or CPU access to a shared buffer if someone is
accessing (has locked) the shared buffer. If no one is, the select
system call returns at once.

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  285 +
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |4 +
 drivers/base/dmabuf-sync.c |  678 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  190 +++
 7 files changed, 1181 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git a/Documentation/dma-buf-sync.txt b/Documentation/dma-buf-sync.txt
new file mode 100644
index 000..8023d06
--- /dev/null
+++ b/Documentation/dma-buf-sync.txt
@@ -0,0 +1,285 @@
+DMA Buffer Synchronization Framework
+
+
+  Inki Dae
+  inki dot dae at samsung dot com
+  daeinki at gmail dot com
+
+This document is a guide for device-driver writers describing the DMA buffer
+synchronization API. This document also describes how to use the API to
+use buffer synchronization mechanism between DMA and DMA, CPU and DMA, and
+CPU and CPU.
+
+The DMA Buffer synchronization API provides buffer synchronization mechanism;
+i.e., buffer access control to CPU and DMA, and easy-to-use interfaces for
+device drivers and user application. And this API can be used for all dma
+devices using system memory as dma buffer, especially for most ARM based SoCs.
+
+
+Motivation
+--
+
+Buffer synchronization issue between DMA and DMA:
+   Sharing a buffer, a device cannot be aware of when the other device
+   will access the shared buffer: a device may

[RFC PATCH v6 0/2] Introduce buffer synchronization framework

2013-08-13 Thread Inki Dae
Hi all,

   This patch set introduces a buffer synchronization framework based
   on DMA BUF[1] and based on ww-mutexes[2] for lock mechanism, and
   may be final RFC.

   The purpose of this framework is to provide not only buffer access
   control to CPU and CPU, and CPU and DMA, and DMA and DMA but also
   easy-to-use interfaces for device drivers and user application.
   In addition, this patch set suggests a way of enhancing performance.

   For generic user mode interface, we have used fcntl and select system
   call[3]. As you know, user application sees a buffer object as a dma-buf
   file descriptor. So fcntl() call with the file descriptor means to lock
   some buffer region being managed by the dma-buf object. And select() call
   means to wait for the completion of CPU or DMA access to the dma-buf
   without locking. For more detail, you can refer to the dma-buf-sync.txt
   in Documentation/


   There are some cases we should use this buffer synchronization framework.
   One of which is to primarily enhance GPU rendering performance on Tizen
   platform in case of 3d app with compositing mode that 3d app draws
   something in off-screen buffer, and Web app.

   In case of 3d app with compositing mode which is not a full screen mode,
   the app calls glFlush to submit 3d commands to GPU driver instead of
   glFinish for more performance. The reason we call glFlush is that glFinish
   blocks the caller's task until the execution of the 3d commands is completed.
   Thus, that makes the GPU and CPU more idle. As a result, 3d rendering performance
   with glFinish is quite a bit lower than with glFlush. However, the use of glFlush
   has one issue: a buffer shared with the GPU could be broken when the CPU
   accesses the buffer right after glFlush, because the CPU cannot be aware of
   the completion of GPU access to the buffer. Of course, the app can be aware
   of that time using eglWaitGL but this function is valid only in case of the
   same process.

   In case of Tizen, there are some applications that one process draws
   something in its own off-screen buffer (pixmap buffer) using CPU, and other
   process gets a off-screen buffer (window buffer) from Xorg using
   DRI2GetBuffers, and then composites the pixmap buffer with the window buffer
   using GPU, and finally page flip.

   Web app based on HTML5 also has the same issue. Web browser and its web app
   are different process. The web app draws something in its own pixmap buffer,
   and then the web browser gets a window buffer from Xorg, and then composites
   the pixmap buffer with the window buffer. And finally, page flip.

   Thus, in such cases, a shared buffer could be broken as one process draws
   something in pixmap buffer using CPU, when other process composites the
   pixmap buffer with window buffer using GPU without any locking mechanism.
   That is why we need a user-land locking interface, the fcntl system call.

   And last one is a deferred page flip issue. This issue is that a window
   buffer rendered can be displayed on screen in about 32ms in worst case:
   assume that the gpu rendering is completed within 16ms.
   That can be incurred when compositing a pixmap buffer with a window buffer
   using GPU and when vsync is just started. At this time, Xorg waits for
   a vblank event to get a window buffer so 3d rendering will be delayed
   up to about 16ms. As a result, the window buffer would be displayed in
   about two vsyncs (about 32ms) and in turn, that would show slow
   responsiveness.

   For this, we could enhance the responsiveness with locking
   mechanism: skipping one vblank wait. I guess in the similar reason,
   Android, Chrome OS, and other platforms are using their own locking
   mechanisms; Android sync driver, KDS, and DMA fence.

   The below shows the deferred page flip issue in worst case,

   | - vsync signal
   |-- DRI2GetBuffers
   |
   |
   |
   | - vsync signal
   |-- Request gpu rendering
  time |
   |
   |-- Request page flip (deferred)
   | - vsync signal
   |-- Displayed on screen
   |
   |
   |
   | - vsync signal


Thanks,
Inki Dae


References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl


Inki Dae (2):
  [RFC PATCH v6] dmabuf-sync: Add a buffer synchronization framework
  [RFC PATCH v2] dma-buf: Add user interfaces for dmabuf sync support.

 Documentation/dma-buf-sync.txt |  285 +
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |   85 +
 drivers/base/dmabuf-sync.c |  678 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  191

[Resend][RFC PATCH v6 0/2] Introduce buffer synchronization framework

2013-08-13 Thread Inki Dae
Just adding more detailed descriptions.

Hi all,

   This patch set introduces a buffer synchronization framework based
   on DMA BUF[1] and based on ww-mutexes[2] for lock mechanism, and
   may be final RFC.

   The purpose of this framework is to provide not only buffer access
   control to CPU and CPU, and CPU and DMA, and DMA and DMA but also
   easy-to-use interfaces for device drivers and user application.
   In addition, this patch set suggests a way of enhancing performance.

   For the generic user-mode interface, we have used the fcntl and select
   system calls[3]. As you know, a user application sees a buffer object
   as a dma-buf file descriptor. So an fcntl() call on the file descriptor
   locks some buffer region managed by the dma-buf object, and a select()
   call waits for the completion of CPU or DMA access to the dma-buf
   without locking. For more detail, you can refer to dma-buf-sync.txt
   in Documentation/.
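   To make the user-mode interface above concrete, below is a minimal
   user-space sketch in C. This is an illustration, not code from the
   patch set: dmabuf_fd is assumed to be a dma-buf file descriptor
   exported by some driver, and the calls are exactly the standard
   fcntl()/select() interfaces the framework reuses, so the sketch also
   runs unchanged against a regular file.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Lock the whole buffer behind a dma-buf fd for CPU access.
 * type is F_WRLCK (write) or F_RDLCK (read); F_SETLKW blocks
 * until any conflicting access has completed. */
static int dmabuf_cpu_lock(int dmabuf_fd, short type)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_CUR;
	fl.l_start = 0;
	fl.l_len = 0;		/* length 0 covers the entire buffer */
	return fcntl(dmabuf_fd, F_SETLKW, &fl);
}

/* Drop the lock taken above so other CPU/DMA users may proceed. */
static int dmabuf_cpu_unlock(int dmabuf_fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_UNLCK;
	fl.l_whence = SEEK_CUR;
	return fcntl(dmabuf_fd, F_SETLK, &fl);
}

/* Wait (without locking) until the fd is ready for read access,
 * as described for the select() interface in the cover letter. */
static int dmabuf_wait_read(int dmabuf_fd)
{
	fd_set fds;

	FD_ZERO(&fds);
	FD_SET(dmabuf_fd, &fds);
	return select(dmabuf_fd + 1, &fds, NULL, NULL, NULL);
}
```

   A CPU write to the mapped buffer would then be bracketed by
   dmabuf_cpu_lock(fd, F_WRLCK) and dmabuf_cpu_unlock(fd).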


   There are some cases where we should use this buffer synchronization
   framework. The primary one is to enhance GPU rendering performance on
   the Tizen platform for a 3d app in compositing mode (i.e., a 3d app
   that draws into an off-screen buffer) and for a Web app.

   In case of a 3d app in compositing mode, which is not a full-screen
   mode, the app calls glFlush to submit 3d commands to the GPU driver
   instead of glFinish, for better performance: glFinish blocks the
   caller's task until the execution of the 3d commands is completed,
   which leaves both GPU and CPU more idle. As a result, 3d rendering
   performance with glFinish is quite a bit lower than with glFlush.
   However, glFlush has one issue: a buffer shared with the GPU could be
   corrupted if the CPU accesses the buffer right after glFlush, because
   the CPU cannot be aware of the completion of the GPU access to the
   buffer. Of course, the app could wait for that moment using eglWaitGL,
   but that function is valid only within the same process.

   The below summarizes how app's window is displayed on Tizen platform:
   1. X client requests a window buffer to Xorg.
   2. X client draws something in the window buffer using CPU.
   3. X client requests SWAP to Xorg.
   4. Xorg notifies a damage event to Composite Manager.
   5. Composite Manager gets the window buffer (front buffer) through
  DRI2GetBuffers.
   6. Composite Manager composes the window buffer and its own back buffer
  using GPU. At this time, eglSwapBuffers is called: internally, 3d
  commands are flushed to gpu driver.
   7. Composite Manager requests SWAP to Xorg.
   8. Xorg performs drm page flip. At this time, the window buffer is
  displayed on screen.

   A Web app based on HTML5 follows a similar procedure. The Web browser
   and its Web app are different processes. The Web app draws something
   in its own buffer, then the Web browser gets a window buffer from
   Xorg and composes the two buffers using the GPU.

   Thus, in such cases, a shared buffer could be corrupted when one
   process draws into it with the CPU while the Composite Manager is
   composing two buffers - the X client's front buffer and the Composite
   Manager's back buffer, or the Web app's front buffer and the Web
   browser's back buffer - with the GPU, without any locking mechanism.
   That is why we need a user-land locking interface: the fcntl system
   call.

   And the last one is a deferred page flip issue: a rendered window
   buffer may not be displayed on screen for about 32ms in the worst
   case, assuming the gpu rendering completes within 16ms. That can
   happen when compositing a pixmap buffer with a window buffer using
   the GPU just as a vsync period starts. At this time, Xorg waits for
   a vblank event to get the window buffer, so 3d rendering will be
   delayed by up to about 16ms. As a result, the window buffer would be
   displayed after about two vsyncs (about 32ms), which in turn would
   show slow responsiveness.

   For this, we could enhance the responsiveness with a locking
   mechanism: skipping one vblank wait. I guess that, for a similar
   reason, Android, Chrome OS, and other platforms use their own locking
   mechanisms: the Android sync driver, KDS, and DMA fence.

   The below shows the deferred page flip issue in worst case,

   | - vsync signal
   |-- DRI2GetBuffers
   |
   |
   |
   | - vsync signal
   |-- Request gpu rendering
  time |
   |
   |-- Request page flip (deferred)
   | - vsync signal
   |-- Displayed on screen
   |
   |
   |
   | - vsync signal


Thanks,
Inki Dae


References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl


Inki Dae (2):
  [RFC PATCH v6

About buffer sychronization mechanism and cache operation

2013-08-12 Thread Inki Dae
 synchronization mechanism that 
uses only Linux standard interfaces (dmabuf) including user land interfaces 
(fcntl and select system calls), and the dmabuf sync framework could meet it.


Thanks,
Inki Dae

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v5 1/2] dmabuf-sync: Introduce buffer synchronization framework

2013-07-12 Thread Inki Dae
:
dmabuf_sync_unlock(sync);

   And call the following functions to release all resources,
dmabuf_sync_put_all(sync);
dmabuf_sync_fini(sync);

You can refer to actual example codes:
drm/exynos: add dmabuf sync support for g2d driver and
drm/exynos: add dmabuf sync support for kms framework from
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/log/?h=dmabuf-sync
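For reference, a driver-side sketch of the whole sequence that the
truncated text above ends with. The exact signatures of
dmabuf_sync_init()/dmabuf_sync_get()/dmabuf_sync_lock() are assumptions
reconstructed from the function names quoted above and the referenced
example commits; refer to the actual patch for the real API.

```c
/* Hypothetical driver code: synchronize DMA access to two dma-bufs.
 * All dmabuf_sync_* signatures here are assumed, not taken verbatim
 * from the patch. */
static int example_dma_job(struct dma_buf *src, struct dma_buf *dst)
{
	struct dmabuf_sync *sync;
	int ret;

	sync = dmabuf_sync_init(NULL, "example");	/* create a sync object */
	if (IS_ERR(sync))
		return PTR_ERR(sync);

	/* register the buffers this job will access, with access types */
	dmabuf_sync_get(sync, src, DMA_BUF_ACCESS_R);
	dmabuf_sync_get(sync, dst, DMA_BUF_ACCESS_W);

	ret = dmabuf_sync_lock(sync);	/* blocks until the buffers are free */
	if (ret < 0)
		goto out;

	/* ... program the DMA engine and run the job ... */

	dmabuf_sync_unlock(sync);	/* release buffer access */
out:
	dmabuf_sync_put_all(sync);	/* drop references to the buffers */
	dmabuf_sync_fini(sync);		/* free the sync object */
	return ret;
}
```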

And this framework includes the fcntl system call[3] as an interface exported
to users. As you know, a user sees a buffer object as a dma-buf file
descriptor, so an fcntl() call on the file descriptor locks some buffer
region managed by the dma-buf object.

The below is how a user application uses the interfaces:
struct flock filelock;

1. Lock a dma buf:
filelock.l_type = F_WRLCK or F_RDLCK;

/* lock entire region to the dma buf. */
filelock.l_whence = SEEK_CUR;
filelock.l_start = 0;
filelock.l_len = 0;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
...
CPU access to the dma buf

2. Unlock a dma buf:
filelock.l_type = F_UNLCK;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);

A close(dmabuf fd) call would also unlock the dma buf. For more
detail, please refer to [3].

References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  290 +
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |4 +
 drivers/base/dmabuf-sync.c |  674 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  178 +++
 7 files changed, 1170 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git a/Documentation/dma-buf-sync.txt b/Documentation/dma-buf-sync.txt
new file mode 100644
index 000..4427759
--- /dev/null
+++ b/Documentation/dma-buf-sync.txt
@@ -0,0 +1,290 @@
+DMA Buffer Synchronization Framework
+
+
+  Inki Dae
+  inki dot dae at samsung dot com
+  daeinki at gmail dot com
+
+This document is a guide for device-driver writers describing the DMA buffer
+synchronization API. It also describes how to use the API to synchronize
+buffer access between DMA and DMA, CPU and DMA, and CPU and CPU.
+
+The DMA Buffer synchronization API provides a buffer synchronization
+mechanism, i.e., buffer access control for CPU and DMA, and easy-to-use
+interfaces for device drivers and user applications. This API can be used
+for all dma devices using system memory as a dma buffer, especially for
+most ARM based SoCs.
+
+
+Motivation
+--
+
+Buffer synchronization issue between DMA and DMA:
+   Sharing a buffer, a device cannot be aware of when the other device
+   will access the shared buffer: a device may access a buffer containing
+   wrong data if the device accesses the shared buffer while another
+   device is still accessing the shared buffer.
+   Therefore, a user process should wait for the completion of DMA
+   access by another device before a device tries to access the shared
+   buffer.
+
+Buffer synchronization issue between CPU and DMA:
+   A user process should take care when sending a buffer, filled by the
+   CPU, to a device driver that will access the buffer as an input
+   buffer while the CPU and DMA are sharing the buffer.
+   This means that the user process needs to understand how the device
+   driver works. Hence, the conventional mechanism not only makes the
+   user application complicated but also incurs performance overhead.
+
+Buffer synchronization issue between CPU and CPU:
+   In case two processes share one buffer (shared with DMA as well),
+   they may need some mechanism to allow process B to access the shared
+   buffer only after the completion of CPU access by process A.
+   Therefore, process B should wait, using that mechanism, for the
+   completion of CPU access by process A before trying to access the
+   shared buffer.
+
+What is the best way to solve these buffer synchronization issues?
+   We may need a common object that a device driver and a user process
+   notify the common object of when they try

[RFC PATCH v1 2/2] dma-buf: add lock callback for fcntl system call

2013-07-12 Thread Inki Dae
This patch adds lock callback to dma buf file operations,
and this callback will be called by fcntl system call.

With this patch, fcntl system call can be used for buffer
synchronization between CPU and CPU, and CPU and DMA in user mode.

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 drivers/base/dma-buf.c |   33 +
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 9a26981..e1b8583 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -80,9 +80,42 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 	return dmabuf->ops->mmap(dmabuf, vma);
 }
 
+static int dma_buf_lock(struct file *file, int cmd, struct file_lock *fl)
+{
+	struct dma_buf *dmabuf;
+	unsigned int type;
+	bool wait = false;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	if ((fl->fl_type & F_UNLCK) == F_UNLCK) {
+		dmabuf_sync_single_unlock(dmabuf);
+		return 0;
+	}
+
+	/* convert flock type to dmabuf sync type. */
+	if ((fl->fl_type & F_WRLCK) == F_WRLCK)
+		type = DMA_BUF_ACCESS_W;
+	else if ((fl->fl_type & F_RDLCK) == F_RDLCK)
+		type = DMA_BUF_ACCESS_R;
+	else
+		return -EINVAL;
+
+	if (fl->fl_flags & FL_SLEEP)
+		wait = true;
+
+	/* TODO. the locking to certain region should also be considered. */
+
+	return dmabuf_sync_single_lock(dmabuf, type, wait);
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_release,
 	.mmap		= dma_buf_mmap_internal,
+	.lock		= dma_buf_lock,
 };
 
 /*
-- 
1.7.5.4



[RFC PATCH 0/2 v5] Introduce buffer synchronization framework

2013-07-12 Thread Inki Dae
Hi all,

This patch set introduces a buffer synchronization framework based
on DMA BUF[1], using ww-mutexes[2] as the lock mechanism.

The purpose of this framework is to provide not only buffer access
control between CPU and CPU, CPU and DMA, and DMA and DMA, but also
easy-to-use interfaces for device drivers and user applications.
In addition, this patch set suggests a way of enhancing performance.

For the generic user-mode interface, we have used the fcntl system
call[3]. As you know, a user application sees a buffer object as a
dma-buf file descriptor, so an fcntl() call on the file descriptor
locks some buffer region managed by the dma-buf object.
For more detail, you can refer to the dma-buf-sync.txt in Documentation/

Moreover, we tried to find out how we could better utilize limited
hardware resources with a buffer synchronization mechanism, and finally
realized that it could enhance performance when using multiple threads:
DMA and CPU work independently, so the CPU can perform other work while
DMA is performing some work, and vice versa.

However, in the conventional way, that is not easy to do because the
DMA operation depends on the CPU operation, and vice versa.

Conventional way:
User Kernel
-
CPU writes something to src
send the src to driver-
 update DMA register
request DMA start(1)---
 DMA start
-completion signal(2)--
CPU accesses dst

(1) Request DMA start after the CPU access to src buffer is completed.
(2) Access dst buffer after DMA access to the dst buffer is completed.

On the other hand, what if there were something to control buffer access
between CPU and DMA? The below shows that case:

User(thread a)  User(thread b)Kernel
-
send a src to driver--
  update DMA register
lock the src
request DMA start(1)--
CPU access to src
unlock the srclock src and dst
  DMA start
-completion signal(2)-
lock dst  DMA completion
CPU access to dst unlock src and dst
unlock DST

(1) Try to start DMA operation while CPU is accessing the src buffer.
(2) Try CPU access to dst buffer while DMA is accessing the dst buffer.

This means that the CPU or DMA could do more work.

In the same way, we could reduce handshaking overhead between two
processes when those processes need to share a buffer. There may be
other cases where we could reduce overhead as well.
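The two-process case above can be demonstrated with plain POSIX record
locks, the same interface the framework exposes for dma-bufs. In this
illustration (not code from the patch set) a regular file stands in for
the dma-buf fd, since the locking semantics - per-process owners, write
locks conflicting - are the same.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the lock was taken, -1 on conflict (F_SETLK is the
 * non-blocking variant; F_SETLKW would block instead). */
static int try_write_lock(int fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;		/* l_start/l_len 0: whole file */
	return fcntl(fd, F_SETLK, &fl);
}

/* Process A holds a write lock on the shared "buffer"; process B's
 * non-blocking attempt to lock it must fail until A unlocks.
 * Returns 0 when that conflict was observed as expected. */
static int demo_two_process_conflict(const char *path)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);
	struct flock fl;
	pid_t pid;
	int status;

	if (fd < 0 || try_write_lock(fd) < 0)
		return -1;

	pid = fork();
	if (pid == 0) {		/* process B: its own fd, its own lock owner */
		int fd2 = open(path, O_RDWR);
		/* a conflict is expected while A still holds the lock */
		_exit(try_write_lock(fd2) < 0 ? 0 : 1);
	}
	waitpid(pid, &status, 0);

	memset(&fl, 0, sizeof(fl));	/* A is done: unlock the buffer */
	fl.l_type = F_UNLCK;
	fl.l_whence = SEEK_SET;
	fcntl(fd, F_SETLK, &fl);
	close(fd);

	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

With F_SETLKW instead, process B would simply sleep until process A
unlocks, which is the handshaking the diagrams above rely on.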


References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl


Inki Dae (2):
  [RFC PATCH v5 1/2] dmabuf-sync: Introduce buffer synchronization framework
  [RFC PATCH v1 2/2] dma-buf: add lock callback for fcntl system call

 Documentation/dma-buf-sync.txt |  290 +
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |   37 +++
 drivers/base/dmabuf-sync.c |  674 
 include/linux/dma-buf.h|   16 +
 include/linux/dmabuf-sync.h|  178 +++
 7 files changed, 1203 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

-- 
1.7.5.4



[RFC PATCH v4 0/2] Introduce buffer synchronization framework

2013-07-10 Thread Inki Dae
Hi all,

This patch set introduces a buffer synchronization framework based
on DMA BUF[1] and reservation[2] to use the dma-buf resource, using
ww-mutexes[3] as the lock mechanism.

The purpose of this framework is to provide not only buffer access
control between CPU and CPU, CPU and DMA, and DMA and DMA, but also
easy-to-use interfaces for device drivers and user applications.
In addition, this patch set suggests a way of enhancing performance.

To implement a generic user-mode interface, we have used the fcntl
system call[4]. As you know, a user application sees a buffer object as
a dma-buf file descriptor, so an fcntl() call on the file descriptor
locks some buffer region managed by the dma-buf object.
For more detail, you can refer to dma-buf-sync.txt in Documentation/.

Moreover, we tried to find out how we could better utilize limited
hardware resources with a buffer synchronization mechanism, and finally
realized that it could enhance performance when using multiple threads:
DMA and CPU work independently, so the CPU can perform other work while
DMA is performing some work, and vice versa.

However, in the conventional way, that is not easy to do because the
DMA operation depends on the CPU operation, and vice versa.

Conventional way:
User Kernel
-
CPU writes something to src
send the src to driver-
 update DMA register
request DMA start(1)---
 DMA start
-completion signal(2)--
CPU accesses dst

(1) Request DMA start after the CPU access to src buffer is completed.
(2) Access dst buffer after DMA access to the dst buffer is completed.

On the other hand, what if there were something to control buffer access
between CPU and DMA? The below shows that case:

User(thread a)  User(thread b)Kernel
-
send a src to driver--
  update DMA register
lock the src
request DMA start(1)--
CPU access to src
unlock the srclock src and dst
  DMA start
-completion signal(2)-
lock dst  DMA completion
CPU access to dst unlock src and dst
unlock DST

(1) Try to start DMA operation while CPU is accessing the src buffer.
(2) Try CPU access to dst buffer while DMA is accessing the dst buffer.

This means that the CPU or DMA could do more work.

In the same way, we could reduce handshaking overhead between two
processes when those processes need to share a buffer. There may be
other cases where we could reduce overhead as well.


References:
[1] http://lwn.net/Articles/470339/
[2] http://lwn.net/Articles/532616/
[3] https://patchwork.kernel.org/patch/2625361/
[4] http://linux.die.net/man/2/fcntl

Inki Dae (2):
  dmabuf-sync: Introduce buffer synchronization framework
  dma-buf: add lock callback for fcntl system call.

 Documentation/dma-buf-sync.txt |  283 +
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dma-buf.c |   34 ++
 drivers/base/dmabuf-sync.c |  661 
 include/linux/dma-buf.h|   14 +
 include/linux/dmabuf-sync.h|  132 
 include/linux/reservation.h|9 +
 8 files changed, 1141 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

-- 
1.7.5.4



[RFC PATCH v1 2/2] dma-buf: add lock callback for fcntl system call.

2013-07-10 Thread Inki Dae
This patch adds lock callback to dma buf file operations,
and this callback will be called by fcntl system call.

With this patch, fcntl system call can be used for buffer
synchronization between CPU and CPU, and CPU and DMA in user mode.

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 drivers/base/dma-buf.c |   34 ++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index fe39120..cd71447 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -31,6 +31,7 @@
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 #include <linux/reservation.h>
+#include <linux/dmabuf-sync.h>
 
 static inline int is_dma_buf_file(struct file *);
 
@@ -82,9 +83,42 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 	return dmabuf->ops->mmap(dmabuf, vma);
 }
 
+static int dma_buf_lock(struct file *file, int cmd, struct file_lock *fl)
+{
+	struct dma_buf *dmabuf;
+	unsigned int type;
+	bool wait = false;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	if ((fl->fl_type & F_UNLCK) == F_UNLCK) {
+		dmabuf_sync_single_unlock(dmabuf);
+		return 0;
+	}
+
+	/* convert flock type to dmabuf sync type. */
+	if ((fl->fl_type & F_WRLCK) == F_WRLCK)
+		type = DMA_BUF_ACCESS_W;
+	else if ((fl->fl_type & F_RDLCK) == F_RDLCK)
+		type = DMA_BUF_ACCESS_R;
+	else
+		return -EINVAL;
+
+	if (fl->fl_flags & FL_SLEEP)
+		wait = true;
+
+	/* TODO. the locking to certain region should also be considered. */
+
+	return dmabuf_sync_single_lock(dmabuf, type, wait);
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_release,
 	.mmap		= dma_buf_mmap_internal,
+	.lock		= dma_buf_lock,
 };
 
 /*
-- 
1.7.5.4



[RFC PATCH v4 1/2] dmabuf-sync: Introduce buffer synchronization framework

2013-07-10 Thread Inki Dae
=4030bdee9bab5841ad32faade528d04cc0c5fc94


https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dmabuf-sync&id=6ca548e9ea9e865592719ef6b1cde58366af9f5c

And this framework includes the fcntl system call[4] as an interface exported
to users. As you know, a user sees a buffer object as a dma-buf file
descriptor, so an fcntl() call on the file descriptor locks some buffer
region managed by the dma-buf object.

The below is how a user application uses the interface:
struct flock filelock;

1. Lock a dma buf:
filelock.l_type = F_WRLCK or F_RDLCK;

/* lock entire region to the dma buf. */
filelock.l_whence = SEEK_CUR;
filelock.l_start = 0;
filelock.l_len = 0;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
...
CPU access to the dma buf

2. Unlock a dma buf:
filelock.l_type = F_UNLCK;

fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);

A close(dmabuf fd) call would also unlock the dma buf. For more
detail, please refer to [4].


References:
[1] http://lwn.net/Articles/470339/
[2] http://lwn.net/Articles/532616/
[3] https://patchwork.kernel.org/patch/2625361/
[4] http://linux.die.net/man/2/fcntl

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  283 +
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dmabuf-sync.c |  661 
 include/linux/dma-buf.h|   14 +
 include/linux/dmabuf-sync.h|  132 
 include/linux/reservation.h|9 +
 7 files changed, 1107 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git a/Documentation/dma-buf-sync.txt b/Documentation/dma-buf-sync.txt
new file mode 100644
index 000..9d12d00
--- /dev/null
+++ b/Documentation/dma-buf-sync.txt
@@ -0,0 +1,283 @@
+DMA Buffer Synchronization Framework
+
+
+  Inki Dae
+  inki dot dae at samsung dot com
+  daeinki at gmail dot com
+
+This document is a guide for device-driver writers describing the DMA buffer
+synchronization API. It also describes how to use the API to synchronize
+buffer access between DMA and DMA, CPU and DMA, and CPU and CPU.
+
+The DMA Buffer synchronization API provides a buffer synchronization
+mechanism, i.e., buffer access control for CPU and DMA, and easy-to-use
+interfaces for device drivers and user applications. This API can be used
+for all dma devices using system memory as a dma buffer, especially for
+most ARM based SoCs.
+
+
+Motivation
+--
+
+Buffer synchronization issue between DMA and DMA:
+   Sharing a buffer, a device cannot be aware of when the other device
+   will access the shared buffer: a device may access a buffer containing
+   wrong data if the device accesses the shared buffer while another
+   device is still accessing the shared buffer.
+   Therefore, a user process should wait for the completion of DMA
+   access by another device before a device tries to access the shared
+   buffer.
+
+Buffer synchronization issue between CPU and DMA:
+   A user process should take care when sending a buffer, filled by the
+   CPU, to a device driver that will access the buffer as an input
+   buffer while the CPU and DMA are sharing the buffer.
+   This means that the user process needs to understand how the device
+   driver works. Hence, the conventional mechanism not only makes the
+   user application complicated but also incurs performance overhead.
+
+Buffer synchronization issue between CPU and CPU:
+   In case two processes share one buffer (shared with DMA as well),
+   they may need some mechanism to allow process B to access the shared
+   buffer only after the completion of CPU access by process A.
+   Therefore, process B should wait, using that mechanism, for the
+   completion of CPU access by process A before trying to access the
+   shared buffer.
+
+What is the best way to solve these buffer synchronization issues?
+   We may need a common object that a device driver and a user process
+   notify the common object of when they try to access a shared buffer.
+   That way we could decide when we have to allow or not to allow for CPU
+   or DMA to access the shared buffer through the common object.
+   If so, what could become the common object? Right, that's a dma-buf[1].
+   Now we have already been using the dma-buf

Re: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework

2013-06-26 Thread Inki Dae
2013/6/25 Jerome Glisse j.gli...@gmail.com:
 On Tue, Jun 25, 2013 at 10:17 AM, Inki Dae daei...@gmail.com wrote:
 2013/6/25 Rob Clark robdcl...@gmail.com:
 On Tue, Jun 25, 2013 at 5:09 AM, Inki Dae daei...@gmail.com wrote:
 that
 should be the role of kernel memory management which of course needs
 synchronization btw A and B. But in no case this should be done using
 dma-buf. dma-buf is for sharing content btw different devices not
 sharing resources.


 hmm, is that true? And are you sure? Then how do you think about
 reservation? the reservation also uses dma-buf with same reason as long
 as I
 know: actually, we use reservation to use dma-buf. As you may know, a
 reservation object is allocated and initialized when a buffer object is
 exported to a dma buf.

 no, this is why the reservation object can be passed in when you
 construction the dmabuf.

 Right, that way, we could use dma buf for buffer synchronization. I
 just wanted to ask for why Jerome said that dma-buf is for sharing
 content btw different devices not sharing resources.

 From memory, the motivation of dma-buf was to done for few use case,
 among them webcam capturing frame into a buffer and having gpu using
 it directly without memcpy, or one big gpu rendering a scene into a
 buffer that is then use by low power gpu for display ie it was done to
 allow different device to operate on same data using same backing
 memory.

 AFAICT you seem to want to use dma-buf to create scratch buffer, ie a
 process needs to use X amount of memory for an operation, it can
 release|free this memory once its done
 and a process B can the use
 this X memory for its own operation discarding content of process A.
 presume that next frame would have the sequence repeat, process A do
 something, then process B does its thing.
 So to me it sounds like you
 want to implement global scratch buffer using the dmabuf API and that
 sounds bad to me.

 I know most closed driver have several pool of memory, long lived
 object, short lived object and scratch space, then user space allocate
 from one of this pool and there is synchronization done by driver
 using driver specific API to reclaim memory.
 Of course this work
 nicely if you only talking about one logic block or at very least hw
 that have one memory controller.

 Now if you are thinking of doing scratch buffer for several different
 device and share the memory among then you need to be aware of
 security implication, most obvious being that you don't want process B
 being able to read process A scratch memory.
 I know the argument about
 it being graphic but one day this might become gpu code and it might
 be able to insert jump to malicious gpu code.


If you think so, there is definitely a misunderstanding. My approach is
similar to dma fence: it guarantees that a DMA cannot access a buffer
while another DMA is accessing the buffer. I guess some gpu drivers in
mainline have been using a specific mechanism for it. And as for the
part you commented on, please note that I just introduced a user-side
mechanism for buffer synchronization between CPU and CPU, and CPU and
DMA, in addition; it is not implemented yet, just planned.

Thanks,
Inki Dae

 Cheers,
 Jerome


Re: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework

2013-06-25 Thread Inki Dae
2013/6/25 Rob Clark robdcl...@gmail.com:
 On Tue, Jun 25, 2013 at 5:09 AM, Inki Dae daei...@gmail.com wrote:
 that
 should be the role of kernel memory management which of course needs
 synchronization btw A and B. But in no case this should be done using
 dma-buf. dma-buf is for sharing content btw different devices not
 sharing resources.


 hmm, is that true? And are you sure? Then how do you think about
 reservation? the reservation also uses dma-buf with same reason as long as I
 know: actually, we use reservation to use dma-buf. As you may know, a
 reservation object is allocated and initialized when a buffer object is
 exported to a dma buf.

 no, this is why the reservation object can be passed in when you
 construction the dmabuf.

Right, that way we could use dma buf for buffer synchronization. I
just wanted to ask why Jerome said that dma-buf is for sharing
content btw different devices, not sharing resources.

 The fallback is for dmabuf to create it's
 own, for compatibility and to make life easier for simple devices with
 few buffers... but I think pretty much all drm drivers would embed the
 reservation object in the gem buffer and pass it in when the dmabuf is
 created.

 It is pretty much imperative that synchronization works independently
 of dmabuf, you really don't want to have two different cases to deal
 with in your driver, one for synchronizing non-exported objects, and
 one for synchronizing dmabuf objects.


Now my approach concentrates on the most basic implementation: a buffer
synchronization mechanism between CPU and CPU, CPU and DMA, and DMA and
DMA. But I think the reservation could be used for other purposes, such
as pipeline synchronization independent of dmabuf, as you said.
Actually, I had already implemented a pipeline synchronization mechanism
using the reservation: in the case of the MALI-400 DDK, there was a
pipeline issue between gp and pp jobs, and we solved it using a pipeline
synchronization mechanism based on the reservation. So we could add more
features - those two different cases, dmabuf objects and non-exported
objects - at any time if needed, because we are using the reservation
object.

Thanks,
Inki Dae

 BR,
 -R


Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-21 Thread Inki Dae
2013/6/21 Lucas Stach l.st...@pengutronix.de:
 Hi Inki,

 please refrain from sending HTML Mails, it makes proper quoting without
 messing up the layout everywhere pretty hard.


Sorry about that. I should have used text mode.

 Am Freitag, den 21.06.2013, 20:01 +0900 schrieb Inki Dae:
 [...]

 Yeah, you'll some knowledge and understanding about the API
 you are
 working with to get things right. But I think it's not an
 unreasonable
 thing to expect the programmer working directly with kernel
 interfaces
 to read up on how things work.

 Second thing: I'll rather have *one* consistent API for every
 subsystem,
 even if they differ from each other than having to implement
 this
 syncpoint thing in every subsystem. Remember: a single execbuf
 in DRM
 might reference both GEM objects backed by dma-buf as well
 native SHM or
 CMA backed objects. The dma-buf-mgr proposal already allows
 you to
 handle dma-bufs much the same way during validation than
 native GEM
 objects.

 Actually, at first I had implemented a fence helper framework based on
 reservation and dma fence to provide an easy-to-use interface for
 device drivers. However, that was the wrong implementation: I had not
 only customized the dma fence but also not considered the deadlock
 issue. After that, I reimplemented it as dmabuf sync to solve the
 deadlock issue, and at that time I realized that we first need to
 concentrate on the most basic things: the fact that CPU and CPU, CPU
 and DMA, or DMA and DMA can access the same buffer; the fact that
 simple is best; and the fact that we need not only kernel-side but
 also user-side interfaces. After that, I collected what is common to
 all subsystems and devised this dmabuf sync framework. I'm not really
 a specialist in the desktop world, so a question: isn't execbuf used
 only for the GPU? The gpu has dedicated video memory (VRAM), so it
 needs a migration mechanism between system memory and the dedicated
 video memory, and also has to consider ordering issues while buffers
 are migrated.


 Yeah, execbuf is pretty GPU specific, but I don't see how this matters
 for this discussion. Also I don't see a big difference between embedded
 and desktop GPUs. Buffer migration is more of a detail here. Both take
 command stream that potentially reference other buffers, which might be
 native GEM or dma-buf backed objects. Both have to make sure the buffers
 are in the right domain (caches cleaned and address mappings set up) and
 are available for the desired operation, meaning you have to sync with
 other DMA engines and maybe also with CPU.

Yeah, right. Then, in the case of a desktop GPU, doesn't it need to do
something additional when a buffer is migrated from the system memory
domain to the video memory domain, or vice versa? I guess the members
below do a similar thing, and all other DMA devices would not need them:
struct fence {
  ...
  unsigned int context, seqno;
  ...
};

And,
   struct seqno_fence {
 ...
 uint32_t seqno_ofs;
 ...
   };


 The only case where sync isn't clearly defined right now by the current
 API entrypoints is when you access memory through the dma-buf fallback
 mmap support, which might happen with some software processing element
 in a video pipeline or something. I agree that we will need a userspace
 interface here, but I think this shouldn't be yet another sync object,
 but rather more a prepare/fini_cpu_access ioctl on the dma-buf which
 hooks into the existing dma-fence and reservation stuff.

I think we don't need additional ioctl commands for that; I am thinking
of reusing existing resources as much as possible. My idea is similar to
yours in using the reservation stuff, because my approach also has to use
the dma-buf resource. However, my idea is that a user process that wants
buffer synchronization with another sees a sync object as a file
descriptor, like dma-buf does. The below shows my idea in its simplest
form:

ioctl(dmabuf_fd, DMA_BUF_IOC_OPEN_SYNC, sync);

flock(sync-fd, LOCK_SH); - LOCK_SH means a shared lock.
CPU access for read
flock(sync-fd, LOCK_UN);

Or

flock(sync-fd, LOCK_EX); - LOCK_EX means an exclusive lock
CPU access for write
flock(sync-fd, LOCK_UN);

close(sync-fd);

As you know, that's similar to dmabuf export feature.

In addition, a more simple idea,
flock(dmabuf_fd, LOCK_SH/EX);
CPU access for read/write
flock(dmabuf_fd, LOCK_UN);

However, I'm not sure that the above examples would work well and be
free of problems: actually, I don't fully understand the flock mechanism
yet, so I'm looking into it.



 And to get back to my original point: if you have more than
 one task
 operating together on a buffer you absolutely need some kind
 of real IPC

RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-20 Thread Inki Dae


 -Original Message-
 From: Lucas Stach [mailto:l.st...@pengutronix.de]
 Sent: Thursday, June 20, 2013 4:47 PM
 To: Inki Dae
 Cc: 'Russell King - ARM Linux'; 'Inki Dae'; 'linux-fbdev'; 'YoungJun Cho';
 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
 framework
 
 Am Donnerstag, den 20.06.2013, 15:43 +0900 schrieb Inki Dae:
 
   -Original Message-
   From: dri-devel-bounces+inki.dae=samsung@lists.freedesktop.org
   [mailto:dri-devel-bounces+inki.dae=samsung@lists.freedesktop.org]
 On
   Behalf Of Russell King - ARM Linux
   Sent: Thursday, June 20, 2013 3:29 AM
   To: Inki Dae
   Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
 YoungJun
   Cho; linux-media@vger.kernel.org; linux-arm-ker...@lists.infradead.org
   Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer
 synchronization
   framework
  
   On Thu, Jun 20, 2013 at 12:10:04AM +0900, Inki Dae wrote:
On the other hand, the below shows how we could enhance the
 conventional
way with my approach (just example):
   
CPU - DMA,
ioctl(qbuf command)  ioctl(streamon)
  |   |
  |   |
qbuf  - dma_buf_sync_get   start streaming - syncpoint
   
dma_buf_sync_get just registers a sync buffer(dmabuf) to sync object.
   And
the syncpoint is performed by calling dma_buf_sync_lock(), and then
 DMA
accesses the sync buffer.
   
And DMA - CPU,
ioctl(dqbuf command)
  |
  |
dqbuf - nothing to do
   
Actual syncpoint is when DMA operation is completed (in interrupt
   handler):
the syncpoint is performed by calling dma_buf_sync_unlock().
Hence, my approach is to move the syncpoints to just before the dma
access as far as possible.
  
   What you've just described does *not* work on architectures such as
   ARMv7 which do speculative cache fetches from memory at any time that
   that memory is mapped with a cacheable status, and will lead to data
   corruption.
 
  I didn't explain that enough. Sorry about that. 'nothing to do' means
 that a
  dmabuf sync interface isn't called but existing functions are called. So
  this may be explained again:
  ioctl(dqbuf command)
  |
  |
  dqbuf - 1. dma_unmap_sg
  2. dma_buf_sync_unlock (syncpoint)
 
  The syncpoint I mentioned means lock mechanism; not doing cache
 operation.
 
  In addition, please see the below more detail examples.
 
  The conventional way (without dmabuf-sync) is:
  Task A
  
   1. CPU accesses buf
   2. Send the buf to Task B
   3. Wait for the buf from Task B
   4. go to 1
 
  Task B
  ---
  1. Wait for the buf from Task A
  2. qbuf the buf
  2.1 insert the buf to incoming queue
  3. stream on
  3.1 dma_map_sg if ready, and move the buf to ready queue
  3.2 get the buf from ready queue, and dma start.
  4. dqbuf
  4.1 dma_unmap_sg after dma operation completion
  4.2 move the buf to outgoing queue
  5. back the buf to Task A
  6. go to 1
 
  In case that two tasks share buffers, and data flow goes from Task A to
 Task
  B, we would need IPC operation to send and receive buffers properly
 between
  those two tasks every time CPU or DMA access to buffers is started or
  completed.
 
 
  With dmabuf-sync:
 
  Task A
  
   1. dma_buf_sync_lock - syncpoint (call by user side)
   2. CPU accesses buf
   3. dma_buf_sync_unlock - syncpoint (call by user side)
   4. Send the buf to Task B (just one time)
   5. go to 1
 
 
  Task B
  ---
  1. Wait for the buf from Task A (just one time)
  2. qbuf the buf
  2.1 insert the buf to incoming queue
  3. stream on
  3.1 dma_buf_sync_lock - syncpoint (call by kernel side)
  3.2 dma_map_sg if ready, and move the buf to ready queue
  3.3 get the buf from ready queue, and dma start.
  4. dqbuf
  4.1 dma_buf_sync_unlock - syncpoint (call by kernel side)
  4.2 dma_unmap_sg after dma operation completion
  4.3 move the buf to outgoing queue
  5. go to 1
 
  On the other hand, in case of using dmabuf-sync, as you can see in the
  above example, we would need the IPC operation just one time. That way,
  I think we could not only reduce performance overhead but also simplify
  the user application. Of course, this approach can be used for all DMA
  device drivers such as DRM. I'm not a specialist in the v4l2 world, so
  there may be a missing point.
 
 
 You already need some kind of IPC between the two tasks, as I suspect
 even in your example it wouldn't make much sense to queue the buffer
 over and over again in task B without task A writing

RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-20 Thread Inki Dae


 -Original Message-
 From: Lucas Stach [mailto:l.st...@pengutronix.de]
 Sent: Thursday, June 20, 2013 7:11 PM
 To: Inki Dae
 Cc: 'Russell King - ARM Linux'; 'Inki Dae'; 'linux-fbdev'; 'YoungJun Cho';
 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
 framework
 
 Am Donnerstag, den 20.06.2013, 17:24 +0900 schrieb Inki Dae:
 [...]
In addition, please see the below more detail examples.
   
The conventional way (without dmabuf-sync) is:
Task A

 1. CPU accesses buf
 2. Send the buf to Task B
 3. Wait for the buf from Task B
 4. go to 1
   
Task B
---
1. Wait for the buf from Task A
2. qbuf the buf
2.1 insert the buf to incoming queue
3. stream on
3.1 dma_map_sg if ready, and move the buf to ready queue
3.2 get the buf from ready queue, and dma start.
4. dqbuf
4.1 dma_unmap_sg after dma operation completion
4.2 move the buf to outgoing queue
5. back the buf to Task A
6. go to 1
   
In case that two tasks share buffers, and data flow goes from Task A
 to
   Task
B, we would need IPC operation to send and receive buffers properly
   between
those two tasks every time CPU or DMA access to buffers is started
 or
completed.
   
   
With dmabuf-sync:
   
Task A

 1. dma_buf_sync_lock - syncpoint (call by user side)
 2. CPU accesses buf
 3. dma_buf_sync_unlock - syncpoint (call by user side)
 4. Send the buf to Task B (just one time)
 5. go to 1
   
   
Task B
---
1. Wait for the buf from Task A (just one time)
2. qbuf the buf
2.1 insert the buf to incoming queue
3. stream on
3.1 dma_buf_sync_lock - syncpoint (call by kernel side)
3.2 dma_map_sg if ready, and move the buf to ready queue
3.3 get the buf from ready queue, and dma start.
4. dqbuf
4.1 dma_buf_sync_unlock - syncpoint (call by kernel side)
4.2 dma_unmap_sg after dma operation completion
4.3 move the buf to outgoing queue
5. go to 1
   
On the other hand, in case of using dmabuf-sync, as you can see the
   above
example, we would need IPC operation just one time. That way, I
 think we
could not only reduce performance overhead but also make user
   application
simplified. Of course, this approach can be used for all DMA device
   drivers
such as DRM. I'm not a specialist in v4l2 world so there may be
 missing
point.
   
  
   You already need some kind of IPC between the two tasks, as I suspect
   even in your example it wouldn't make much sense to queue the buffer
   over and over again in task B without task A writing anything to it.
 So
   task A has to signal task B there is new data in the buffer to be
   processed.
  
   There is no need to share the buffer over and over again just to get
 the
   two processes to work together on the same thing. Just share the fd
   between both and then do out-of-band completion signaling, as you need
   this anyway. Without this you'll end up with unpredictable behavior.
   Just because sync allows you to access the buffer doesn't mean it's
   valid for your use-case. Without completion signaling you could easily
   end up overwriting your data from task A multiple times before task B
   even tries to lock the buffer for processing.
  
   So the valid flow is (and this already works with the current APIs):
   Task ATask B
   ----
   CPU access buffer
--completion signal-
 qbuf (dragging buffer into
 device domain, flush caches,
 reserve buffer etc.)
   |
 wait for device operation to
 complete
   |
 dqbuf (dragging buffer back
 into CPU domain, invalidate
 caches, unreserve)
   -completion signal
   CPU access buffer
  
 
  Correct. In case that data flow goes from A to B, it needs some kind
  of IPC between the two tasks every time, as you said. Then, without
  dmabuf-sync, how do you think about the case where two tasks share the
  same buffer but both access the buffer (buf1) for write, and the data
  of the buffer (buf1) doesn't need to be shared?
 
 Sorry, I don't see the point you are trying to solve here. If you share

[RFC PATCH v3] dmabuf-sync: Introduce buffer synchronization framework

2013-06-19 Thread Inki Dae
/commit/?h=dmabuf-sync&id=6ca548e9ea9e865592719ef6b1cde58366af9f5c

[1] http://lwn.net/Articles/470339/
[2] http://lwn.net/Articles/532616/
[3] https://patchwork.kernel.org/patch/2625361/

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  199 
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dmabuf-sync.c |  501 
 include/linux/dma-buf.h|   14 ++
 include/linux/dmabuf-sync.h|  115 +
 include/linux/reservation.h|7 +
 7 files changed, 844 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git a/Documentation/dma-buf-sync.txt b/Documentation/dma-buf-sync.txt
new file mode 100644
index 000..134de7b
--- /dev/null
+++ b/Documentation/dma-buf-sync.txt
@@ -0,0 +1,199 @@
+DMA Buffer Synchronization Framework
+
+
+  Inki Dae
+  inki dot dae at samsung dot com
+  daeinki at gmail dot com
+
+This document is a guide for device-driver writers describing the DMA buffer
+synchronization API. This document also describes how to use the API to
+use buffer synchronization mechanism between DMA and DMA, CPU and DMA, and
+CPU and CPU.
+
+The DMA Buffer synchronization API provides buffer synchronization mechanism;
+i.e., buffer access control to CPU and DMA, and easy-to-use interfaces for
+device drivers and potentially user application (not implemented for user
+applications, yet). And this API can be used for all dma devices using system
+memory as dma buffer, especially for most ARM based SoCs.
+
+
+Motivation
+--
+
+Buffer synchronization issue between DMA and DMA:
+   Sharing a buffer, a device cannot be aware of when the other device
+   will access the shared buffer: a device may access a buffer containing
+   wrong data if the device accesses the shared buffer while another
+   device is still accessing the shared buffer.
+   Therefore, a user process should wait for the completion of DMA
+   access by another device before a device tries to access the shared
+   buffer.
+
+Buffer synchronization issue between CPU and DMA:
+   A user process has to take synchronization into account when it sends
+   a buffer, filled by CPU, to a device driver for the device driver to
+   access the buffer as an input buffer while CPU and DMA are sharing the
+   buffer.
+   This means that the user process needs to understand how the device
+   driver works. Hence, the conventional mechanism not only makes the
+   user application complicated but also incurs performance overhead.
+
+Buffer synchronization issue between CPU and CPU:
+   In case that two processes share one buffer; shared with DMA also,
+   they may need some mechanism to allow process B to access the shared
+   buffer after the completion of CPU access by process A.
+   Therefore, process B should wait for the completion of CPU access
+   by process A using the mechanism before trying to access the shared
+   buffer.
+
+What is the best way to solve these buffer synchronization issues?
+   We may need a common object that a device driver and a user process
+   notify the common object of when they try to access a shared buffer.
+   That way we could decide when we have to allow or not to allow for CPU
+   or DMA to access the shared buffer through the common object.
+   If so, what could become the common object? Right, that's a dma-buf[1].
+   Now we have already been using the dma-buf to share one buffer with
+   other drivers.
+
+
+Basic concept
+-
+
+The mechanism of this framework has the following steps,
+1. Register dmabufs to a sync object - A task gets a new sync object and
+can add one or more dmabufs that the task wants to access.
+This registering should be performed when a device context or an event
+context such as a page flip event is created or before CPU accesses a
+shared buffer.
+
+   dma_buf_sync_get(a sync object, a dmabuf);
+
+2. Lock a sync object - A task tries to lock all dmabufs added in its own
+sync object. Basically, the lock mechanism uses ww-mutexes[2] to avoid dead
+lock issue and for race condition between CPU and CPU, CPU and DMA, and DMA
+and DMA. Taking a lock means that others cannot access all locked dmabufs
+until the task that locked the corresponding dmabufs unlocks all the
+locked dmabufs.
+This locking should be performed before DMA or CPU accesses these dmabufs.
+
+   dma_buf_sync_lock(a sync object);
+
+3. Unlock a sync object

RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-19 Thread Inki Dae


 -Original Message-
 From: Lucas Stach [mailto:l.st...@pengutronix.de]
 Sent: Wednesday, June 19, 2013 7:22 PM
 To: Inki Dae
 Cc: 'Russell King - ARM Linux'; 'linux-fbdev'; 'Kyungmin Park'; 'DRI
 mailing list'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
 framework
 
 Am Mittwoch, den 19.06.2013, 14:45 +0900 schrieb Inki Dae:
 
   -Original Message-
   From: Lucas Stach [mailto:l.st...@pengutronix.de]
   Sent: Tuesday, June 18, 2013 6:47 PM
   To: Inki Dae
   Cc: 'Russell King - ARM Linux'; 'linux-fbdev'; 'Kyungmin Park'; 'DRI
   mailing list'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
   ker...@lists.infradead.org; linux-media@vger.kernel.org
   Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer
 synchronization
   framework
  
   Am Dienstag, den 18.06.2013, 18:04 +0900 schrieb Inki Dae:
   [...]
   
 a display device driver.  It shouldn't be used within a single
 driver
 as a means of passing buffers between userspace and kernel space.
   
What I try to do is not really such ugly thing. What I try to do is
 to
notify that, when CPU tries to access a buffer , to kernel side
 through
dmabuf interface. So it's not really to send the buffer to kernel.
   
Thanks,
Inki Dae
   
   The most basic question about why you are trying to implement this
 sort
   of thing in the dma_buf framework still stands.
  
   Once you imported a dma_buf into your DRM driver it's a GEM object and
   you can and should use the native DRM ioctls to prepare/end a CPU
 access
   to this BO. Then internally to your driver you can use the dma_buf
   reservation/fence stuff to provide the necessary cross-device sync.
  
 
  I don't really want this to be used only for DRM drivers. We really
  need it for all other DMA devices, i.e., v4l2 based drivers. That is
  what I try to do. And my approach uses reservation to use dma-buf
  resources but not the dma fence stuff anymore. However, I'm looking
  into the Radeon DRM driver to see why we need the dma fence stuff, and
  how we can use it if needed.
 
 
 Still I don't see the point why you need syncpoints above dma-buf. In
 both the DRM and the V4L2 world we have defined points in the API where
 a buffer is allowed to change domain from device to CPU and vice versa.
 
 In DRM if you want to access a buffer with the CPU you do a cpu_prepare.
 The buffer changes back to GPU domain once you do the execbuf
 validation, queue a pageflip to the buffer or similar things.
 
 In V4L2 the syncpoints for cache operations are the queue/dequeue API
 entry points. Those are also the exact points to synchronize with other
 hardware thus using dma-buf reserve/fence.


If so, what if we want to access a buffer with the CPU _in V4L2_? Should
we open a DRM device node and then do a cpu_prepare?

Thanks,
Inki Dae

 
 In all this I can't see any need for a new syncpoint primitive slapped
 on top of dma-buf.
 
 Regards,
 Lucas
 --
 Pengutronix e.K.   | Lucas Stach |
 Industrial Linux Solutions | http://www.pengutronix.de/  |
 Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
 Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |



RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-18 Thread Inki Dae


 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Tuesday, June 18, 2013 5:43 PM
 To: Inki Dae
 Cc: 'Maarten Lankhorst'; 'linux-fbdev'; 'Kyungmin Park'; 'DRI mailing
 list'; 'Rob Clark'; 'myungjoo.ham'; 'YoungJun Cho'; 'Daniel Vetter';
 linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
 framework
 
 On Tue, Jun 18, 2013 at 02:27:40PM +0900, Inki Dae wrote:
  So I'd like to ask for other DRM maintainers. How do you think about it?
 it
  seems like that Intel DRM (maintained by Daniel), OMAP DRM (maintained
 by
  Rob) and GEM CMA helper also have same issue Russell pointed out. I
 think
  not only the above approach but also the performance is very important.
 
 CMA uses coherent memory to back their buffers, though that might not be
 true of memory obtained from other drivers via dma_buf.  Plus, there is
 no support in the CMA helper for exporting or importng these buffers.
 

That's not so. Please see Dave's drm-next; dmabuf support for the CMA
helper has recently been merged there.

 I guess Intel i915 is only used on x86, which is a coherent platform and
 requires no cache maintanence for DMA.
 
 OMAP DRM does not support importing non-DRM buffers back into

Correct. Still TODO.

 DRM.  Moreover, it will suffer from the problems I described if any
 attempt is made to write to the buffer after it has been re-imported.
 
 Lastly, I should point out that the dma_buf stuff is really only useful
 when you need to export a dma buffer from one driver and import it into
 another driver - for example to pass data from a camera device driver to

Most people know that.

 a display device driver.  It shouldn't be used within a single driver
 as a means of passing buffers between userspace and kernel space.

What I am trying to do is not really such an ugly thing. It is to notify
the kernel side, through the dmabuf interface, when the CPU tries to
access a buffer. So it's not really about sending the buffer to the
kernel.

Thanks,
Inki Dae



RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-18 Thread Inki Dae


 -Original Message-
 From: Lucas Stach [mailto:l.st...@pengutronix.de]
 Sent: Tuesday, June 18, 2013 6:47 PM
 To: Inki Dae
 Cc: 'Russell King - ARM Linux'; 'linux-fbdev'; 'Kyungmin Park'; 'DRI
 mailing list'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
 framework
 
 Am Dienstag, den 18.06.2013, 18:04 +0900 schrieb Inki Dae:
 [...]
 
   a display device driver.  It shouldn't be used within a single driver
   as a means of passing buffers between userspace and kernel space.
 
  What I try to do is not really such ugly thing. What I try to do is to
  notify that, when CPU tries to access a buffer , to kernel side through
  dmabuf interface. So it's not really to send the buffer to kernel.
 
  Thanks,
  Inki Dae
 
 The most basic question about why you are trying to implement this sort
 of thing in the dma_buf framework still stands.
 
 Once you imported a dma_buf into your DRM driver it's a GEM object and
 you can and should use the native DRM ioctls to prepare/end a CPU access
 to this BO. Then internally to your driver you can use the dma_buf
 reservation/fence stuff to provide the necessary cross-device sync.
 

I don't really want this to be used only for DRM drivers. We really need it for 
all other DMA devices, i.e., v4l2 based drivers. That is what I try to do. And 
my approach uses reservation to use dma-buf resources but not the dma fence 
stuff anymore. However, I'm looking into the Radeon DRM driver to see why we 
need the dma fence stuff, and how we can use it if needed.

Thanks,
Inki Dae

 Regards,
 Lucas
 --
 Pengutronix e.K.   | Lucas Stach |
 Industrial Linux Solutions | http://www.pengutronix.de/  |
 Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
 Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |



[RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-17 Thread Inki Dae
/git/daeinki/drm-exynos.git/commit/?h=dmabuf-sync&id=6ca548e9ea9e865592719ef6b1cde58366af9f5c

The framework performs cache operations based on the previous and current access
types to the dmabufs after the locks to all dmabufs are taken:
Call dma_buf_begin_cpu_access() to invalidate cache if,
previous access type is DMA_BUF_ACCESS_WRITE | DMA and
current access type is DMA_BUF_ACCESS_READ

Call dma_buf_end_cpu_access() to clean cache if,
previous access type is DMA_BUF_ACCESS_WRITE and
current access type is DMA_BUF_ACCESS_READ | DMA

Such cache operations are invoked via dma-buf interfaces, so the dma-buf exporter
should implement the dmabuf->ops->begin_cpu_access/end_cpu_access callbacks.

[1] http://lwn.net/Articles/470339/
[2] http://lwn.net/Articles/532616/
[3] https://patchwork-mail1.kernel.org/patch/2625321/

Signed-off-by: Inki Dae inki@samsung.com
Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
---
 Documentation/dma-buf-sync.txt |  246 ++
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dmabuf-sync.c |  545 
 include/linux/dma-buf.h|   14 +
 include/linux/dmabuf-sync.h|  115 +
 include/linux/reservation.h|7 +
 7 files changed, 935 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git a/Documentation/dma-buf-sync.txt b/Documentation/dma-buf-sync.txt
new file mode 100644
index 000..e71b6f4
--- /dev/null
+++ b/Documentation/dma-buf-sync.txt
@@ -0,0 +1,246 @@
+DMA Buffer Synchronization Framework
+
+
+  Inki Dae
+  inki dot dae at samsung dot com
+  daeinki at gmail dot com
+
+This document is a guide for device-driver writers describing the DMA buffer
+synchronization API. This document also describes how to use the API to
+use buffer synchronization between CPU and CPU, CPU and DMA, and DMA and DMA.
+
+The DMA Buffer synchronization API provides buffer synchronization mechanism;
+i.e., buffer access control to CPU and DMA, cache operations, and easy-to-use
+interfaces for device drivers and potentially user application
+(not implemented for user applications, yet). And this API can be used for all
+dma devices using system memory as dma buffer, especially for most ARM based
+SoCs.
+
+
+Motivation
+--
+
+Sharing a buffer, a device cannot be aware of when the other device will access
+the shared buffer: a device may access a buffer containing wrong data if
+the device accesses the shared buffer while another device is still accessing
+the shared buffer. Therefore, a user process should wait for
+the completion of DMA access by another device before a device tries to access
+the shared buffer.
+
+Besides, there is the same issue when CPU and DMA are sharing a buffer; i.e.,
+a user process has to take synchronization into account when it sends a buffer
+to a device driver for the device driver to access the buffer as input.
+This means that a user process needs to understand how the device driver
+works. Hence, the conventional mechanism not only makes the user application
+complicated but also incurs performance overhead, because the conventional
+mechanism cannot control devices precisely without additional and complex
+implementations.
+
+In addition, in case of ARM based SoCs, most devices have no hardware cache
+consistency mechanisms between CPU and DMA devices because they do not use ACP
+(Accelerator Coherency Port). ACP can be connected to DMA engine or similar
+devices in order to keep cache coherency between CPU cache and DMA device.
+Thus, we need additional cache operations to have the devices operate properly;
+i.e., user applications should request cache operations from the kernel before
+DMA accesses the buffer and after the completion of buffer access by CPU, or
+vice versa.
+
+   buffer access by CPU - cache clean - buffer access by DMA
+
+Or,
+   buffer access by DMA - cache invalidate - buffer access by CPU
+
+The below shows why cache operations should be requested by user
+process,
+(Presume that CPU and DMA share a buffer and the buffer is mapped
+ with user space as cachable)
+
+   handle = drm_gem_alloc(size);
+   ...
+   va1 = drm_gem_mmap(handle);
+   va2 = malloc(size);
+   ...
+
+   while(conditions) {
+   memcpy(va1, some data, size);
+   ...
+   drm_xxx_set_dma_buffer(handle, ...);
+   ...
+
+   /* user need to request cache clean at here. */
+
+   /* blocked until dma operation is completed. */
+   drm_xxx_start_dma

RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-17 Thread Inki Dae


 -Original Message-
 From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
 Sent: Monday, June 17, 2013 8:35 PM
 To: Inki Dae
 Cc: dri-de...@lists.freedesktop.org; linux-fb...@vger.kernel.org; linux-
 arm-ker...@lists.infradead.org; linux-media@vger.kernel.org;
 dan...@ffwll.ch; robdcl...@gmail.com; kyungmin.p...@samsung.com;
 myungjoo@samsung.com; yj44@samsung.com
 Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
 framework
 
  On 17-06-13 13:15, Inki Dae wrote:
  This patch adds a buffer synchronization framework based on DMA BUF[1]
  and reservation[2] to use dma-buf resource, and based on ww-mutexes[3]
  for lock mechanism.
 
  The purpose of this framework is not only to couple cache operations,
  and buffer access control to CPU and DMA but also to provide easy-to-use
  interfaces for device drivers and potentially user application
  (not implemented for user applications, yet). And this framework can be
  used for all dma devices using system memory as dma buffer, especially
  for most ARM based SoCs.
 
  Changelog v2:
  - use atomic_add_unless to avoid a potential bug.
  - add a macro for checking valid access type.
  - code clean.
 
  The mechanism of this framework has the following steps,
  1. Register dmabufs to a sync object - A task gets a new sync object and
  can add one or more dmabufs that the task wants to access.
  This registering should be performed when a device context or an event
  context such as a page flip event is created or before CPU accesses a
  shared buffer.
 
  dma_buf_sync_get(a sync object, a dmabuf);
 
  2. Lock a sync object - A task tries to lock all dmabufs added in its own
  sync object. Basically, the lock mechanism uses ww-mutex[1] to avoid the
  deadlock issue and for race conditions between CPU and CPU, CPU and DMA,
  and DMA and DMA. Taking a lock means that others cannot access all locked
  dmabufs until the task that locked the corresponding dmabufs unlocks all
  the locked dmabufs.
  This locking should be performed before DMA or CPU accesses these dmabufs.
 
  dma_buf_sync_lock(a sync object);
 
  3. Unlock a sync object - The task unlocks all dmabufs added in its own
  sync object. The unlock means that the DMA or CPU accesses to the dmabufs
  have been completed so that others may access them.
  This unlocking should be performed after DMA or CPU has completed accesses
  to the dmabufs.
 
  dma_buf_sync_unlock(a sync object);
 
  4. Unregister one or all dmabufs from a sync object - A task unregisters
  the given dmabufs from the sync object. This means that the task doesn't
  want to lock the dmabufs.
  The unregistering should be performed after DMA or CPU has completed
  accesses to the dmabufs or when dma_buf_sync_lock() has failed.
 
  dma_buf_sync_put(a sync object, a dmabuf);
  dma_buf_sync_put_all(a sync object);
 
  The described steps may be summarized as:
  get - lock - CPU or DMA access to a buffer/s - unlock - put
 
  This framework includes the following two features.
  1. read (shared) and write (exclusive) locks - A task is required to
 declare
  the access type when the task tries to register a dmabuf;
  READ, WRITE, READ DMA, or WRITE DMA.
 
   Below is example code:
  struct dmabuf_sync *sync;
 
   sync = dmabuf_sync_init(NULL, "test sync");
 
  dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
  ...
 
  And the below can be used as access types:
  DMA_BUF_ACCESS_READ,
  - CPU will access a buffer for read.
  DMA_BUF_ACCESS_WRITE,
  - CPU will access a buffer for read or write.
  DMA_BUF_ACCESS_READ | DMA_BUF_ACCESS_DMA,
  - DMA will access a buffer for read
  DMA_BUF_ACCESS_WRITE | DMA_BUF_ACCESS_DMA,
  - DMA will access a buffer for read or write.
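Read as bitmask flags, the access types above could look like the sketch below. The flag names follow the patch text, but the values and the validity macro (mirroring the "macro for checking valid access type" mentioned in the v2 changelog) are illustrative assumptions, not the patch's code:

```c
/* Hypothetical encoding of the dmabuf-sync access types as bitmasks. */
enum dma_buf_access_type {
	DMA_BUF_ACCESS_READ  = 0x1,	/* CPU will read the buffer */
	DMA_BUF_ACCESS_WRITE = 0x2,	/* CPU will read or write the buffer */
	DMA_BUF_ACCESS_DMA   = 0x4,	/* modifier: access is by DMA, not CPU */
};

/* A valid type must request read or write access, and may carry only
 * the DMA modifier besides that. */
#define IS_VALID_DMA_BUF_ACCESS_TYPE(t) \
	((((t) & (DMA_BUF_ACCESS_READ | DMA_BUF_ACCESS_WRITE)) != 0) && \
	 (((t) & ~(DMA_BUF_ACCESS_READ | DMA_BUF_ACCESS_WRITE | \
		   DMA_BUF_ACCESS_DMA)) == 0))
```

With this encoding, `DMA_BUF_ACCESS_WRITE | DMA_BUF_ACCESS_DMA` is valid, while a bare `DMA_BUF_ACCESS_DMA` (no read/write bit) is rejected.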
 
  2. Mandatory resource releasing - a task cannot hold a lock
 indefinitely.
  A task may never try to unlock a buffer after taking a lock to the
 buffer.
   In this case, a timer handler for the corresponding sync object is
   called after five (default) seconds, and then the timed-out buffer is
   unlocked by a workqueue handler to avoid lockups and to force release
   of the buffer's resources.
 
  The below is how to use:
  1. Allocate and Initialize a sync object:
  struct dmabuf_sync *sync;
 
   sync = dmabuf_sync_init(NULL, "test sync");
  ...
 
  2. Add a dmabuf to the sync object when setting up dma buffer
 relevant
 registers:
  dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
  ...
 
  3. Lock all dmabufs of the sync object before DMA or CPU accesses
 the dmabufs:
  dmabuf_sync_lock(sync);
  ...
 
  4. Now CPU or DMA can access all dmabufs locked

RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

2013-06-17 Thread Inki Dae


 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Tuesday, June 18, 2013 3:21 AM
 To: Inki Dae
 Cc: Maarten Lankhorst; linux-fbdev; Kyungmin Park; DRI mailing list; Rob
 Clark; myungjoo.ham; YoungJun Cho; Daniel Vetter; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
 framework
 
 On Tue, Jun 18, 2013 at 02:19:04AM +0900, Inki Dae wrote:
  It seems like that all pages of the scatterlist should be mapped with
  DMA every time DMA operation  is started (or drm_xxx_set_src_dma_buffer
  function call), and the pages should be unmapped from DMA again every
  time DMA operation is completed: internally, including each cache
  operation.
 
 Correct.
 
  Isn't that big overhead?
 
 Yes, if you _have_ to do a cache operation to ensure that the DMA agent
 can see the data the CPU has written.
 
  And If there is no problem, we should accept such overhead?
 
 If there is no problem then why are we discussing this and why do we need
 this API extension? :)

Ok, that is another issue, separate from dmabuf-sync. It sounds reasonable to
me even though the overhead is big. Besides, it seems that most DRM drivers
have the same issue. Therefore, we may need to solve this issue like below:
- do not map a dmabuf with DMA. And just create/update buffer object
of importer.
- map the buffer with DMA every time DMA start or iommu page fault
occurs.
- unmap the buffer from DMA every time DMA operation is completed

With the above approach, the cache-operation portion of my approach,
dmabuf-sync, can be removed. However, I'm not sure that we really have to
use the above approach given its big overhead. Of course, if we don't use
the above approach, then user processes would need to do each cache operation
before a DMA operation is started and also after the DMA operation is
completed in some cases; e.g., when user space is mapped to physical memory
as cacheable, and CPU and DMA share the same buffer.

So I'd like to ask the other DRM maintainers: what do you think about it? It
seems that Intel DRM (maintained by Daniel), OMAP DRM (maintained by Rob) and
the GEM CMA helper also have the same issue Russell pointed out. I think
not only the above approach but also the performance is very important.

Thanks,
Inki Dae

 
  Actually, drm_gem_fd_to_handle() includes to map a
  given dmabuf with iommu table (just logical data) of the DMA. And then,
 the
  device address (or iova) already mapped will be set to buffer register
 of
  the DMA with drm_xxx_set_src/dst_dma_buffer(handle1, ...) call.
 
 Consider this with a PIPT cache:
 
    dma_map_sg() - at this point, the kernel addresses of these
    buffers are cleaned and invalidated for the DMA
 
   userspace writes to the buffer, the data sits in the CPU cache
   Because the cache is PIPT, this data becomes visible to the
   kernel as well.
 
   DMA is started, and it writes to the buffer
 
 Now, at this point, which is the correct data?  The data physically in the
  RAM which the DMA has written, or the data in the CPU cache.  It may be
  that the answer is - they both are, and the loss of either can be a potential
 data corruption issue - there is no way to tell which data should be
 kept but the system is in an inconsistent state and _one_ of them will
 have to be discarded.
 
   dma_unmap_sg()  - at this point, the kernel addresses of the
   buffers are _invalidated_ and any data in those
   cache lines is discarded
 
 Which also means that the data in userspace will also be discarded with
 PIPT caches.
 
 This is precisely why we have buffer rules associated with the DMA API,
 which are these:
 
   dma_map_sg()
   - the buffer transfers ownership from the CPU to the DMA agent.
   - the CPU may not alter the buffer in any way.
   while (cpu_wants_access) {
   dma_sync_sg_for_cpu()
   - the buffer transfers ownership from the DMA to the CPU.
   - the DMA may not alter the buffer in any way.
   dma_sync_sg_for_device()
   - the buffer transfers ownership from the CPU to the DMA
   - the CPU may not alter the buffer in any way.
   }
   dma_unmap_sg()
   - the buffer transfers ownership from the DMA to the CPU.
   - the DMA may not alter the buffer in any way.
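The ownership rules above can be read as a small state machine. The following is a userspace model of the legal transitions, purely for illustration; the names are made up for the model, and the real kernel calls operate on a struct device and scatterlists:

```c
/* Toy model of DMA API buffer ownership: who may touch the buffer
 * after each call, following the rules quoted above. */
enum owner { OWNER_CPU, OWNER_DEVICE };

struct buf_state {
	enum owner owner;
	int mapped;
};

/* dma_map_sg(): CPU -> device; the buffer must not already be mapped. */
int model_map(struct buf_state *b)
{
	if (b->mapped)
		return -1;
	b->mapped = 1;
	b->owner = OWNER_DEVICE;
	return 0;
}

/* dma_sync_sg_for_cpu(): device -> CPU, only while mapped. */
int model_sync_for_cpu(struct buf_state *b)
{
	if (!b->mapped || b->owner != OWNER_DEVICE)
		return -1;
	b->owner = OWNER_CPU;
	return 0;
}

/* dma_sync_sg_for_device(): CPU -> device, only while mapped. */
int model_sync_for_device(struct buf_state *b)
{
	if (!b->mapped || b->owner != OWNER_CPU)
		return -1;
	b->owner = OWNER_DEVICE;
	return 0;
}

/* dma_unmap_sg(): ownership returns to the CPU and the mapping ends. */
int model_unmap(struct buf_state *b)
{
	if (!b->mapped)
		return -1;
	b->mapped = 0;
	b->owner = OWNER_CPU;
	return 0;
}
```

Any access by the party that does not currently own the buffer (a `-1` transition here) corresponds to the data-corruption risk described in the email.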
 
 Any violation of that is likely to lead to data corruption.  Now, some
 may say that the DMA API is only about the kernel mapping.  Yes it is,
 because it takes no regard what so ever about what happens with the
 userspace mappings.  This is absolutely true when you have VIVT or
 aliasing VIPT caches.
 
 Now consider that with a PIPT cache, or a non-aliasing VIPT cache (which
 is exactly the same behaviourally from this aspect) any modification of
 a userspace mapping is precisely the same as modifying the kernel space
 mapping - and what

[RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework

2013-06-13 Thread Inki Dae
 performs cache operation based on the previous and current access
types to the dmabufs after the locks to all dmabufs are taken:
Call dma_buf_begin_cpu_access() to invalidate cache if,
previous access type is DMA_BUF_ACCESS_WRITE | DMA and
current access type is DMA_BUF_ACCESS_READ

Call dma_buf_end_cpu_access() to clean cache if,
previous access type is DMA_BUF_ACCESS_WRITE and
current access type is DMA_BUF_ACCESS_READ | DMA

Such cache operations are invoked via the dma-buf interfaces, so the dma-buf
exporter should implement the dmabuf->ops->begin_cpu_access/end_cpu_access
callbacks.
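The two rules above boil down to a small decision function. The sketch below is a userspace model of that decision; the flag values and the function itself are assumptions for illustration, not the patch's code:

```c
/* Illustrative model of the cache-operation decision described above. */
#define ACC_READ  0x1
#define ACC_WRITE 0x2
#define ACC_DMA   0x4

enum cache_op { CACHE_NONE, CACHE_INVALIDATE, CACHE_CLEAN };

/* Decide which cache operation is needed when access to a dmabuf
 * changes hands, given the previous and current access types. */
enum cache_op needed_cache_op(unsigned int prev, unsigned int cur)
{
	/* DMA wrote, CPU is about to read: invalidate stale CPU cache
	 * lines (dma_buf_begin_cpu_access). */
	if (prev == (ACC_WRITE | ACC_DMA) && cur == ACC_READ)
		return CACHE_INVALIDATE;

	/* CPU wrote, DMA is about to read: clean (flush) dirty CPU cache
	 * lines to memory (dma_buf_end_cpu_access). */
	if (prev == ACC_WRITE && cur == (ACC_READ | ACC_DMA))
		return CACHE_CLEAN;

	return CACHE_NONE;
}
```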

[1] http://lwn.net/Articles/470339/
[2] http://lwn.net/Articles/532616/
[3] https://patchwork-mail1.kernel.org/patch/2625321/

Signed-off-by: Inki Dae inki@samsung.com
---
 Documentation/dma-buf-sync.txt |  246 ++
 drivers/base/Kconfig   |7 +
 drivers/base/Makefile  |1 +
 drivers/base/dmabuf-sync.c |  555 
 include/linux/dma-buf.h|5 +
 include/linux/dmabuf-sync.h|  115 +
 include/linux/reservation.h|7 +
 7 files changed, 936 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sync.txt
 create mode 100644 drivers/base/dmabuf-sync.c
 create mode 100644 include/linux/dmabuf-sync.h

diff --git a/Documentation/dma-buf-sync.txt b/Documentation/dma-buf-sync.txt
new file mode 100644
index 000..e71b6f4
--- /dev/null
+++ b/Documentation/dma-buf-sync.txt
@@ -0,0 +1,246 @@
+DMA Buffer Synchronization Framework
+
+
+  Inki Dae
+  inki dot dae at samsung dot com
+  daeinki at gmail dot com
+
+This document is a guide for device-driver writers describing the DMA buffer
+synchronization API. This document also describes how to use the API to
+use buffer synchronization between CPU and CPU, CPU and DMA, and DMA and DMA.
+
+The DMA Buffer synchronization API provides a buffer synchronization mechanism;
+i.e., buffer access control for CPU and DMA, cache operations, and easy-to-use
+interfaces for device drivers and potentially user applications
+(not implemented for user applications, yet). And this API can be used for all
+DMA devices using system memory as a DMA buffer, especially for most ARM-based
+SoCs.
+
+
+Motivation
+--
+
+When sharing a buffer, a device cannot know when another device will access
+the shared buffer: a device may read wrong data if it accesses the shared
+buffer while another device is still accessing it. Therefore, a user process
+must wait for the completion of DMA access by another device before its
+device tries to access the shared buffer.
+
+Besides, the same issue exists when CPU and DMA share a buffer; i.e.,
+a user process must take this into account when it sends a buffer to a device
+driver for the device driver to access the buffer as input.
+This means that a user process needs to understand how the device driver
+works. Hence, the conventional mechanism not only makes user applications
+complicated but also incurs performance overhead, because the conventional
+mechanism cannot control devices precisely without additional and complex
+implementations.
+
+In addition, in the case of ARM-based SoCs, most devices have no hardware
+cache consistency mechanisms between CPU and DMA devices because they do not
+use ACP (Accelerator Coherency Port). ACP can be connected to a DMA engine or
+similar devices in order to keep cache coherency between the CPU cache and
+the DMA device. Thus, we need additional cache operations to have the devices
+operate properly; i.e., user applications should request cache operations
+from the kernel before DMA accesses the buffer and after the completion of
+buffer access by CPU, or vice versa.
+
+   buffer access by CPU - cache clean - buffer access by DMA
+
+Or,
+   buffer access by DMA - cache invalidate - buffer access by CPU
+
+The below shows why cache operations should be requested by the user
+process.
+(Presume that CPU and DMA share a buffer and the buffer is mapped
+ into user space as cacheable.)
+
+   handle = drm_gem_alloc(size);
+   ...
+	va1 = drm_gem_mmap(handle);
+   va2 = malloc(size);
+   ...
+
+   while(conditions) {
+   memcpy(va1, some data, size);
+   ...
+   drm_xxx_set_dma_buffer(handle, ...);
+   ...
+
+   /* user need to request cache clean at here. */
+
+   /* blocked until dma operation is completed. */
+   drm_xxx_start_dma(...);
+   ...
+
+   /* user need to request cache invalidate at here. */
+
+   memcpy(va2, va1, size);
+   }
+
+The issue arises: user processes may

RE: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework

2013-06-13 Thread Inki Dae

 +static void dmabuf_sync_timeout_worker(struct work_struct *work)
 +{
 +	struct dmabuf_sync *sync = container_of(work, struct dmabuf_sync, work);
 +	struct dmabuf_sync_object *sobj;
 +
 +	mutex_lock(&sync->lock);
 +
 +	list_for_each_entry(sobj, &sync->syncs, head) {
 +		if (WARN_ON(!sobj->robj))
 +			continue;
 +
 +		printk(KERN_WARNING "%s: timeout = 0x%x [type = %d, refcnt = %d, locked = %d]\n",
 +			sync->name, (u32)sobj->dmabuf,
 +			sobj->access_type,
 +			atomic_read(&sobj->robj->shared_cnt),
 +			sobj->robj->locked);
 +
 +		/* unlock only valid sync object. */
 +		if (!sobj->robj->locked)
 +			continue;
 +
 +		if (sobj->robj->shared &&
 +		    atomic_read(&sobj->robj->shared_cnt) > 1) {
 +			atomic_dec(&sobj->robj->shared_cnt);
 +			continue;
 +		}
 +
 +		ww_mutex_unlock(&sobj->robj->lock);
 +
 +		if (sobj->access_type & DMA_BUF_ACCESS_READ)
 +			printk(KERN_WARNING "%s: r-unlocked = 0x%x\n",
 +				sync->name, (u32)sobj->dmabuf);
 +		else
 +			printk(KERN_WARNING "%s: w-unlocked = 0x%x\n",
 +				sync->name, (u32)sobj->dmabuf);
 +
 +#if defined(CONFIG_DEBUG_FS)
 +		sync_debugfs_timeout_cnt++;
 +#endif

Oops, unnecessary codes. will remove them.

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization framework

2013-06-13 Thread Inki Dae
Hi Russell,

 -Original Message-
 From: Russell King - ARM Linux [mailto:li...@arm.linux.org.uk]
 Sent: Friday, June 14, 2013 2:26 AM
 To: Inki Dae
 Cc: maarten.lankho...@canonical.com; dan...@ffwll.ch; robdcl...@gmail.com;
 linux-fb...@vger.kernel.org; dri-de...@lists.freedesktop.org;
 kyungmin.p...@samsung.com; myungjoo@samsung.com; yj44@samsung.com;
 linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: [RFC PATCH] dmabuf-sync: Introduce buffer synchronization
 framework
 
 On Thu, Jun 13, 2013 at 05:28:08PM +0900, Inki Dae wrote:
  This patch adds a buffer synchronization framework based on DMA BUF[1]
  and reservation[2] to use dma-buf resource, and based on ww-mutexes[3]
  for lock mechanism.
 
  The purpose of this framework is not only to couple cache operations,
  and buffer access control to CPU and DMA but also to provide easy-to-use
  interfaces for device drivers and potentially user application
  (not implemented for user applications, yet). And this framework can be
  used for all dma devices using system memory as dma buffer, especially
  for most ARM based SoCs.
 
  The mechanism of this framework has the following steps,
  1. Register dmabufs to a sync object - A task gets a new sync object
 and
  can add one or more dmabufs that the task wants to access.
  This registering should be performed when a device context or an
 event
  context such as a page flip event is created or before CPU accesses
a
 shared
  buffer.
 
  dma_buf_sync_get(a sync object, a dmabuf);
 
  2. Lock a sync object - A task tries to lock all dmabufs added in
its
 own
  sync object. Basically, the lock mechanism uses ww-mutex[1] to avoid
 dead
  lock issue and for race condition between CPU and CPU, CPU and DMA,
 and DMA
  and DMA. Taking a lock means that others cannot access all locked
 dmabufs
  until the task that locked the corresponding dmabufs, unlocks all
the
 locked
  dmabufs.
  This locking should be performed before DMA or CPU accesses these
 dmabufs.
 
  dma_buf_sync_lock(a sync object);
 
  3. Unlock a sync object - The task unlocks all dmabufs added in its
 own sync
  object. The unlock means that the DMA or CPU accesses to the dmabufs
 have
  been completed so that others may access them.
  This unlocking should be performed after DMA or CPU has completed
 accesses
  to the dmabufs.
 
  dma_buf_sync_unlock(a sync object);
 
  4. Unregister one or all dmabufs from a sync object - A task
 unregisters
  the given dmabufs from the sync object. This means that the task
  doesn't
  want to lock the dmabufs.
  The unregistering should be performed after DMA or CPU has completed
   accesses to the dmabufs or when dma_buf_sync_lock() fails.
 
  dma_buf_sync_put(a sync object, a dmabuf);
  dma_buf_sync_put_all(a sync object);
 
  The described steps may be summarized as:
  get - lock - CPU or DMA access to a buffer/s - unlock - put
 
  This framework includes the following two features.
  1. read (shared) and write (exclusive) locks - A task is required to
 declare
  the access type when the task tries to register a dmabuf;
  READ, WRITE, READ DMA, or WRITE DMA.
 
  The below is example codes,
  struct dmabuf_sync *sync;
 
   sync = dmabuf_sync_init(NULL, "test sync");
 
  dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
  ...
 
  And the below can be used as access types:
  DMA_BUF_ACCESS_READ,
  - CPU will access a buffer for read.
  DMA_BUF_ACCESS_WRITE,
  - CPU will access a buffer for read or write.
  DMA_BUF_ACCESS_READ | DMA_BUF_ACCESS_DMA,
  - DMA will access a buffer for read
  DMA_BUF_ACCESS_WRITE | DMA_BUF_ACCESS_DMA,
  - DMA will access a buffer for read or write.
 
  2. Mandatory resource releasing - a task cannot hold a lock
 indefinitely.
  A task may never try to unlock a buffer after taking a lock to the
 buffer.
  In this case, a timer handler to the corresponding sync object is
 called
  in five (default) seconds and then the timed-out buffer is unlocked
 by work
  queue handler to avoid lockups and to enforce resources of the
buffer.
 
  The below is how to use:
  1. Allocate and Initialize a sync object:
  struct dmabuf_sync *sync;
 
   sync = dmabuf_sync_init(NULL, "test sync");
  ...
 
  2. Add a dmabuf to the sync object when setting up dma buffer
 relevant
 registers:
  dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_READ);
  ...
 
  3. Lock all dmabufs of the sync object before DMA or CPU accesses
 the dmabufs:
  dmabuf_sync_lock(sync);
  ...
 
  4. Now CPU or DMA can access all dmabufs locked in step 3.
 
  5. Unlock all dmabufs added in a sync object after DMA

Introduce a dmabuf sync framework for buffer synchronization

2013-06-07 Thread Inki Dae
 again.

And the below is my concerns and opinions,
A dma-buf has a reservation object when a buffer is exported as a dma-buf.
I'm not sure, but it seems that the reservation object is used for x86 GPUs
(having VRAM and a different domain) or similar devices. In the
embedded-system case, most DMA devices and the CPU share system memory, so I
think the reservation object should be considered for them also, because the
buffer synchronization mechanism should basically work based on dma-buf. For
this, I have added four members to reservation_object: shared_cnt and shared
for the read lock, accessed_type for cache operations, and locked for the
timeout case. However, some devices might need something specific to
themselves. So how about keeping only the common part in the
reservation_object structure? It seems that fence_excl, fence_shared, and so
on are not part of the common part.

Now, the wound/wait mutex doesn't provide read and write locks - when using
ww-mutexes for buffer synchronization, it seems that we need read and write
locks for better performance; a read access followed by another read access
doesn't need to be locked. For this, I have added the members shared_cnt and
shared to reservation_object, and this is just to show you how we can use a
read lock. However, I'm sure that there is a better/more generic way.
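The read (shared) / write (exclusive) semantics being asked for here can be sketched with a counter like the proposed shared_cnt. The toy model below is a single-threaded illustration only, not the patch's code, and all names are made up:

```c
/* Toy model of shared/exclusive locking with a reader counter. */
struct resv_model {
	int shared_cnt;		/* number of active readers */
	int exclusive;		/* 1 while a writer holds the lock */
};

/* Try to take the lock; returns 0 on success, -1 if the caller
 * would have to wait. */
int model_lock(struct resv_model *r, int write)
{
	if (r->exclusive)
		return -1;		/* a writer is active: everybody waits */
	if (write) {
		if (r->shared_cnt)
			return -1;	/* readers are active: the writer waits */
		r->exclusive = 1;
	} else {
		r->shared_cnt++;	/* read + read may proceed together */
	}
	return 0;
}

void model_unlock(struct resv_model *r, int write)
{
	if (write)
		r->exclusive = 0;
	else
		r->shared_cnt--;
}
```

The point of the model is the asymmetry: concurrent readers only bump shared_cnt, while a writer needs both the reader count at zero and no other writer.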

All of the above is just a quick implementation for buffer synchronization,
so it should be cleaned up more, and I might be missing something.
Please give me your advice and opinions.

Thanks,
Inki Dae



RE: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Inki Dae

Hi Daniel,

Thank you so much; that was very useful. :) Sorry, but could you give me more
comments on my comments below? There are still things confusing me. :(


 -Original Message-
 From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
 Vetter
 Sent: Tuesday, May 28, 2013 7:33 PM
 To: Inki Dae
 Cc: 'Rob Clark'; 'Maarten Lankhorst'; 'Daniel Vetter'; 'linux-fbdev';
 'YoungJun Cho'; 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list';
 linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote:
 
 
   -Original Message-
   From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
   ow...@vger.kernel.org] On Behalf Of Rob Clark
   Sent: Tuesday, May 28, 2013 12:48 AM
   To: Inki Dae
   Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho;
 Kyungmin
   Park; myungjoo.ham; DRI mailing list;
  linux-arm-ker...@lists.infradead.org;
   linux-media@vger.kernel.org
   Subject: Re: Introduce a new helper framework for buffer
 synchronization
  
   On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com
wrote:
Hi all,
   
I have been removed previous branch and added new one with more
 cleanup.
This time, the fence helper doesn't include user side interfaces and
   cache
operation relevant codes anymore because not only we are not sure
 that
coupling those two things, synchronizing caches and buffer access
   between
CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel
 side is
   a
good idea yet but also existing codes for user side have problems
 with
   badly
behaved or crashing userspace. So this could be more discussed
later.
   
The below is a new branch,
   
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
   exynos.git/?h=dma-f
ence-helper
   
And fence helper codes,
   
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
   exynos.git/commit/?
h=dma-fence-helperid=adcbc0fe7e285ce866e5816e5e21443dcce01005
   
And example codes for device driver,
   
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
   exynos.git/commit/?
h=dma-fence-helperid=d2ce7af23835789602a99d0ccef1f53cdd5caaae
   
I think the time is not yet ripe for RFC posting: maybe existing dma
   fence
and reservation need more review and addition work. So I'd glad for
   somebody
giving other opinions and advices in advance before RFC posting.
  
   thoughts from a *really* quick, pre-coffee, first look:
   * any sort of helper to simplify single-buffer sort of use-cases (v4l)
   probably wouldn't want to bake in assumption that seqno_fence is used.
   * I guess g2d is probably not actually a simple use case, since I
   expect you can submit blits involving multiple buffers :-P
 
  I don't think so. One and more buffers can be used: seqno_fence also has
  only one buffer. Actually, we have already applied this approach to most
  devices; multimedia, gpu and display controller. And this approach shows
  more performance; reduced power consumption against traditional way. And
 g2d
  example is just to show you how to apply my approach to device driver.
 
 Note that seqno_fence is an implementation pattern for a certain type of
 direct hw-hw synchronization which uses a shared dma_buf to exchange the
 sync cookie.

I'm afraid that I don't understand hw-hw synchronization. Does hw-hw
synchronization mean that the device has a hardware feature which supports
buffer synchronization internally? And what is the sync cookie?

 The dma_buf attached to the seqno_fence has _nothing_ to do
 with the dma_buf the fence actually coordinates access to.
 
  I think that confusion is a large reason for why Maarten & I don't
  understand what you want to achieve with your fence helpers. Currently
 they're using the seqno_fence, but totally not in a way the seqno_fence
 was meant to be used.
 
 Note that with the current code there is only a pointer from dma_bufs to
 the fence. The fence itself has _no_ pointer to the dma_buf it syncs. This
 shouldn't be a problem since the fence fastpath for already signalled
  fences is completely barrier & lock free (it's just a load+bit-test), and
 fences are meant to be embedded into whatever dma tracking structure you
 already have, so no overhead there. The only ugly part is the fence
 refcounting, but I don't think we can drop that.

The below is the proposed way:
a DMA device has to create a fence before accessing a shared buffer, and then
check whether any other DMA is accessing the shared buffer; if so, the DMA
device should be blocked. It then sets the fence on the reservation object of
the shared buffer and begins access to the shared buffer. Finally, the DMA
signals its own fence so that other blocked devices can be woken up. However,
if there was another DMA blocked before the
signaling

RE: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Inki Dae


 -Original Message-
 From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
 ow...@vger.kernel.org] On Behalf Of Rob Clark
 Sent: Tuesday, May 28, 2013 10:49 PM
 To: Inki Dae
 Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin
 Park; myungjoo.ham; DRI mailing list;
linux-arm-ker...@lists.infradead.org;
 linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Mon, May 27, 2013 at 11:56 PM, Inki Dae inki@samsung.com wrote:
 
 
  -Original Message-
  From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
  ow...@vger.kernel.org] On Behalf Of Rob Clark
  Sent: Tuesday, May 28, 2013 12:48 AM
  To: Inki Dae
  Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho;
 Kyungmin
  Park; myungjoo.ham; DRI mailing list;
  linux-arm-ker...@lists.infradead.org;
  linux-media@vger.kernel.org
  Subject: Re: Introduce a new helper framework for buffer
 synchronization
 
  On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com wrote:
   Hi all,
  
   I have been removed previous branch and added new one with more
 cleanup.
   This time, the fence helper doesn't include user side interfaces and
  cache
   operation relevant codes anymore because not only we are not sure
 that
   coupling those two things, synchronizing caches and buffer access
  between
   CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side
 is
  a
   good idea yet but also existing codes for user side have problems
 with
  badly
   behaved or crashing userspace. So this could be more discussed later.
  
   The below is a new branch,
  
   https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
  exynos.git/?h=dma-f
   ence-helper
  
   And fence helper codes,
  
   https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
  exynos.git/commit/?
   h=dma-fence-helperid=adcbc0fe7e285ce866e5816e5e21443dcce01005
  
   And example codes for device driver,
  
   https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
  exynos.git/commit/?
   h=dma-fence-helperid=d2ce7af23835789602a99d0ccef1f53cdd5caaae
  
   I think the time is not yet ripe for RFC posting: maybe existing dma
  fence
   and reservation need more review and addition work. So I'd glad for
  somebody
   giving other opinions and advices in advance before RFC posting.
 
  thoughts from a *really* quick, pre-coffee, first look:
  * any sort of helper to simplify single-buffer sort of use-cases (v4l)
  probably wouldn't want to bake in assumption that seqno_fence is used.
  * I guess g2d is probably not actually a simple use case, since I
  expect you can submit blits involving multiple buffers :-P
 
  I don't think so. One and more buffers can be used: seqno_fence also has
  only one buffer. Actually, we have already applied this approach to most
  devices; multimedia, gpu and display controller. And this approach shows
  more performance; reduced power consumption against traditional way. And
 g2d
  example is just to show you how to apply my approach to device driver.
 
 no, you need the ww-mutex / reservation stuff any time you have
 multiple independent devices (or rings/contexts for hw that can
 support multiple contexts) which can do operations with multiple
 buffers.

I think I have already used the reservation stuff in that way, except for
ww-mutex. And I'm not sure that an embedded system really needs ww-mutex.
If there is any such case, could you tell me about it? I really need more
advice and understanding. :)

Thanks,
Inki Dae

  So you could conceivably hit this w/ gpu + g2d if multiple
 buffers where shared between the two.  vram migration and such
 'desktop stuff' might make the problem worse, but just because you
 don't have vram doesn't mean you don't have a problem with multiple
 buffers.
 
  * otherwise, you probably don't want to depend on dmabuf, which is why
  reservation/fence is split out the way it is..  you want to be able to
  use a single reservation/fence mechanism within your driver without
  having to care about which buffers are exported to dmabuf's and which
  are not.  Creating a dmabuf for every GEM bo is too heavyweight.
 
  Right. But I think we should dealt with this separately. Actually, we
 are
  trying to use reservation for gpu pipe line synchronization such as sgx
 sync
  object and this approach is used without dmabuf. In order words, some
 device
  can use only reservation for such pipe line synchronization and at the
 same
  time, fence helper or similar thing with dmabuf for buffer
 synchronization.
 
 it is probably easier to approach from the reverse direction.. ie, get
 reservation/synchronization right first, and then dmabuf.  (Well, that
 isn't really a problem because Maarten's reservation/fence patches
 support dmabuf from the beginning.)
 
 BR,
 -R
 
 
  I'm not entirely sure if reservation/fence could/should be made any
  simpler for multi-buffer users.  Probably the best thing to do is just
  get reservation/fence

RE: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Inki Dae


 -Original Message-
 From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
 Daniel Vetter
 Sent: Wednesday, May 29, 2013 1:50 AM
 To: Inki Dae
 Cc: Rob Clark; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin
Park;
 myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org;
 linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Tue, May 28, 2013 at 4:50 PM, Inki Dae inki@samsung.com wrote:
  I think I already used reservation stuff any time in that way except
  ww-mutex. And I'm not sure that embedded system really needs ww-mutex.
 If
  there is any case,
  could you tell me the case? I really need more advice and
 understanding :)
 
 If you have only one driver, you can get away without ww_mutex.
 drm/i915 does it, all buffer state is protected by dev-struct_mutex.
 But as soon as you have multiple drivers sharing buffers with dma_buf
 things will blow up.
 
 Yep, current prime is broken and can lead to deadlocks.
 
 In practice it doesn't (yet) matter since only the X server does the
 sharing dance, and that one's single-threaded. Now you can claim that
 since you have all buffers pinned in embedded gfx anyway, you don't
 care. But both in desktop gfx and embedded gfx the real fun starts
 once you put fences into the mix and link them up with buffers, then
 every command submission risks that deadlock. Furthermore you can get
 unlucky and construct a circle of fences waiting on each another (only
 though if the fence singalling fires off the next batchbuffer
 asynchronously).

In our case, we have never experienced a deadlock yet, but it is still
possible to face a deadlock in case a process is sharing two buffers
with another process, like below:
Process A commits buffer A and waits for buffer B,
Process B commits buffer B and waits for buffer A

That is a deadlock, and it seems you are saying we can resolve the deadlock
issue with ww-mutexes. And it seems that we can replace our block-wakeup
mechanism with mutex locks for better performance.
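The classic way out of the A/B deadlock above is to make both parties take the two locks in one global order, which is what ww-mutexes generalize with their wound/wait backoff. The following is a minimal userspace sketch using plain pthread mutexes and address ordering, purely for illustration (the helper names are made up):

```c
#include <pthread.h>

/* Take two buffer locks in a single global order (here: by address),
 * so two tasks can never hold one each and wait on the other. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	/* Always lock the lower address first, regardless of the order
	 * the caller named the buffers in. */
	if (a > b) {
		pthread_mutex_t *t = a;
		a = b;
		b = t;
	}
	pthread_mutex_lock(a);
	pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	pthread_mutex_unlock(b);
}
```

With this rule, "Process A takes (A, B)" and "Process B takes (B, A)" both end up acquiring the same lock first, so one of them simply waits instead of deadlocking. ww-mutexes do the same job without a fixed order, by wounding/backing off one of the contenders.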

 
 To prevent such deadlocks you _absolutely_ need to lock _all_ buffers
 that take part in a command submission at once. To do that you either
 need a global lock (ugh) or ww_mutexes.
 
 So ww_mutexes are the fundamental ingredient of all this, if you don't
 see why you need them then everything piled on top is broken. I think
 until you've understood why exactly we need ww_mutexes there's not
 much point in discussing the finer issues of fences, reservation
 objects and how to integrate it with dma_bufs exactly.
 
 I'll try to clarify the motivating example in the ww_mutex
 documentation a bit, but I dunno how else I could explain this ...
 

I really don't want you to waste your time on me. I will try to apply
ww-mutexes (v4) to the proposed framework for more understanding.

Thanks for your advice. :)
Inki Dae

 Yours, Daniel
 --
 Daniel Vetter
 Software Engineer, Intel Corporation
 +41 (0) 79 365 57 48 - http://blog.ffwll.ch

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae
Hi all,

I have removed the previous branch and added a new one with more cleanup.
This time, the fence helper doesn't include the user-side interfaces and
cache-operation code anymore, because we are not yet sure that coupling those
two things (synchronizing caches and buffer access between CPU and CPU, CPU
and DMA, and DMA and DMA with fences) in the kernel is a good idea, and the
existing user-side code also has problems with badly behaved or crashing
userspace. So this could be discussed further later.

The below is a new branch,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-fence-helper

And fence helper codes,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005

And example codes for device driver,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae

I think the time is not yet ripe for an RFC posting: the existing dma fence
and reservation code may need more review and additional work. So I'd be glad
for opinions and advice in advance of the RFC posting.

Thanks,
Inki Dae



RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae


 -Original Message-
 From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
 Sent: Tuesday, May 28, 2013 12:23 AM
 To: Inki Dae
 Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin
 Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 Hey,
 
 Op 27-05-13 12:38, Inki Dae schreef:
  Hi all,
 
  I have been removed previous branch and added new one with more cleanup.
  This time, the fence helper doesn't include user side interfaces and
 cache
  operation relevant codes anymore because not only we are not sure that
  coupling those two things, synchronizing caches and buffer access
 between
  CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is
 a
  good idea yet but also existing codes for user side have problems with
 badly
  behaved or crashing userspace. So this could be more discussed later.
 
  The below is a new branch,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/?h=dma-f
  ence-helper
 
  And fence helper codes,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/commit/?
  h=dma-fence-helperid=adcbc0fe7e285ce866e5816e5e21443dcce01005
 
  And example codes for device driver,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/commit/?
  h=dma-fence-helperid=d2ce7af23835789602a99d0ccef1f53cdd5caaae
 
  I think the time is not yet ripe for RFC posting: maybe existing dma
 fence
  and reservation need more review and addition work. So I'd glad for
 somebody
  giving other opinions and advices in advance before RFC posting.
 
 NAK.
 
 For examples for how to handle locking properly, see Documentation/ww-
 mutex-design.txt in my recent tree.
 I could list what I believe is wrong with your implementation, but real
 problem is that the approach you're taking is wrong.

I just removed the ticket stubs to show you my approach as simply as
possible, and I wanted to show that we could use a buffer synchronization
mechanism without ticket stubs.

A question: can ww-mutexes be used for all devices? I guess this depends on
x86 gpus: a gpu has VRAM, which means a different memory domain.
And could you tell me why a shared fence may have only eight objects? I
think we could need more than eight objects for read access. Anyway, I surely
don't fully understand this yet, so I might be missing something.

Thanks,
Inki Dae

 

 ~Maarten



RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae


 -Original Message-
 From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
 ow...@vger.kernel.org] On Behalf Of Rob Clark
 Sent: Tuesday, May 28, 2013 12:48 AM
 To: Inki Dae
 Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin
 Park; myungjoo.ham; DRI mailing list;
linux-arm-ker...@lists.infradead.org;
 linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com wrote:
  Hi all,
 
  I have been removed previous branch and added new one with more cleanup.
  This time, the fence helper doesn't include user side interfaces and
 cache
  operation relevant codes anymore because not only we are not sure that
  coupling those two things, synchronizing caches and buffer access
 between
  CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is
 a
  good idea yet but also existing codes for user side have problems with
 badly
  behaved or crashing userspace. So this could be more discussed later.
 
  The below is a new branch,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/?h=dma-f
  ence-helper
 
  And fence helper codes,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/commit/?
  h=dma-fence-helperid=adcbc0fe7e285ce866e5816e5e21443dcce01005
 
  And example codes for device driver,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/commit/?
  h=dma-fence-helperid=d2ce7af23835789602a99d0ccef1f53cdd5caaae
 
  I think the time is not yet ripe for RFC posting: maybe existing dma
 fence
  and reservation need more review and addition work. So I'd glad for
 somebody
  giving other opinions and advices in advance before RFC posting.
 
 thoughts from a *really* quick, pre-coffee, first look:
 * any sort of helper to simplify single-buffer sort of use-cases (v4l)
 probably wouldn't want to bake in assumption that seqno_fence is used.
 * I guess g2d is probably not actually a simple use case, since I
 expect you can submit blits involving multiple buffers :-P

I don't think so. One or more buffers can be used: a seqno_fence also has
only one buffer. Actually, we have already applied this approach to most
devices: multimedia, gpu and display controller. And this approach shows
better performance and reduced power consumption compared to the traditional
way. The g2d example is just to show how to apply my approach to a device
driver.

 * otherwise, you probably don't want to depend on dmabuf, which is why
 reservation/fence is split out the way it is..  you want to be able to
 use a single reservation/fence mechanism within your driver without
 having to care about which buffers are exported to dmabuf's and which
 are not.  Creating a dmabuf for every GEM bo is too heavyweight.

Right. But I think we should deal with this separately. Actually, we are
trying to use reservation for gpu pipeline synchronization, such as the sgx
sync object, and that approach is used without dmabuf. In other words, a
device can use only reservation for such pipeline synchronization and, at the
same time, the fence helper (or something similar) with dmabuf for buffer
synchronization.

 
 I'm not entirely sure if reservation/fence could/should be made any
 simpler for multi-buffer users.  Probably the best thing to do is just
 get reservation/fence rolled out in a few drivers and see if some
 common patterns emerge.
 
 BR,
 -R
 
 
  Thanks,
  Inki Dae
 


RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae


 -Original Message-
 From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
 Daniel Vetter
 Sent: Tuesday, May 28, 2013 1:02 AM
 To: Rob Clark
 Cc: Inki Dae; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin Park;
 myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org;
 linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Mon, May 27, 2013 at 5:47 PM, Rob Clark robdcl...@gmail.com wrote:
  On Mon, May 27, 2013 at 6:38 AM, Inki Dae inki@samsung.com wrote:
  Hi all,
 
  I have been removed previous branch and added new one with more
cleanup.
  This time, the fence helper doesn't include user side interfaces and
 cache
  operation relevant codes anymore because not only we are not sure that
  coupling those two things, synchronizing caches and buffer access
 between
  CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side
 is a
  good idea yet but also existing codes for user side have problems with
 badly
  behaved or crashing userspace. So this could be more discussed later.
 
  The below is a new branch,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/?h=dma-f
  ence-helper
 
  And fence helper codes,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/commit/?
  h=dma-fence-helperid=adcbc0fe7e285ce866e5816e5e21443dcce01005
 
  And example codes for device driver,
 
  https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
 exynos.git/commit/?
  h=dma-fence-helperid=d2ce7af23835789602a99d0ccef1f53cdd5caaae
 
  I think the time is not yet ripe for RFC posting: maybe existing dma
 fence
  and reservation need more review and addition work. So I'd glad for
 somebody
  giving other opinions and advices in advance before RFC posting.
 
  thoughts from a *really* quick, pre-coffee, first look:
  * any sort of helper to simplify single-buffer sort of use-cases (v4l)
  probably wouldn't want to bake in assumption that seqno_fence is used.
 
 Yeah, which is why MaartenI discussed ideas already for what needs to
 be improved in the current dma-buf interface code to make this Just
 Work. At least as long as a driver doesn't want to add new fences,
 which would be especially useful for all kinds of gpu access.
 
  * I guess g2d is probably not actually a simple use case, since I
  expect you can submit blits involving multiple buffers :-P
 
 Yeah, on a quick read the current fence helper code seems to be a bit
 limited in scope.
 
  * otherwise, you probably don't want to depend on dmabuf, which is why
  reservation/fence is split out the way it is..  you want to be able to
  use a single reservation/fence mechanism within your driver without
  having to care about which buffers are exported to dmabuf's and which
  are not.  Creating a dmabuf for every GEM bo is too heavyweight.
 
 That's pretty much the reason that reservations are free-standing from
 dma_bufs. The idea is to embed them into the gem/ttm/v4l buffer
 object. Maarten also has some helpers to keep track of multi-buffer
 ww_mutex locking and fence attaching in his reservation helpers, but I
 think we should wait with those until we have drivers using them.
 
 For now I think the priority should be to get the basic stuff in and
 ttm as the first user established. Then we can go nuts later on.
 
  I'm not entirely sure if reservation/fence could/should be made any
  simpler for multi-buffer users.  Probably the best thing to do is just
  get reservation/fence rolled out in a few drivers and see if some
  common patterns emerge.
 
 I think we can make the 1 buffer per dma op (i.e. 1:1
 dma_buf-reservation : fence mapping) work fairly simple in dma_buf
 with maybe a dma_buf_attachment_start_dma/end_dma helpers. But there's
 also still the open that currently there's no way to flush cpu caches
 for dma access without unmapping the attachement (or resorting to


That is what I tried to address by adding user interfaces to dmabuf: coupling
cache synchronization and buffer access between CPU and CPU, CPU and DMA, and
DMA and DMA with fences on the kernel side. We need something to do between
mapping and unmapping the attachment.

 trick which might not work), so we have a few gaping holes in the
 interface already anyway.
 -Daniel
 --
 Daniel Vetter
 Software Engineer, Intel Corporation
 +41 (0) 79 365 57 48 - http://blog.ffwll.ch



RE: Introduce a new helper framework for buffer synchronization

2013-05-23 Thread Inki Dae
 -Original Message-
 From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
 Daniel Vetter
 Sent: Thursday, May 23, 2013 8:56 PM
 To: Inki Dae
 Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
 YoungJun Cho; linux-arm-ker...@lists.infradead.org; linux-
 me...@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Tue, May 21, 2013 at 11:22 AM, Inki Dae inki@samsung.com wrote:
  -Original Message-
  From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
  Vetter
  Sent: Tuesday, May 21, 2013 4:45 PM
  To: Inki Dae
  Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
  'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
  ker...@lists.infradead.org; linux-media@vger.kernel.org
  Subject: Re: Introduce a new helper framework for buffer
 synchronization
 
  On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
- Integration of fence syncing into dma_buf. Imo we should have a
  per-attachment mode which decides whether map/unmap (and the new
  sync)
  should wait for fences or whether the driver takes care of
 syncing
  through the new fence framework itself (for direct hw sync).
  
   I think it's a good idea to have per-attachment mode for buffer sync.
  But
   I'd like to say we make sure what is the purpose of map
   (dma_buf_map_attachment)first. This interface is used to get a sgt;
   containing pages to physical memory region, and map that region with
   device's iommu table. The important thing is that this interface is
  called
   just one time when user wants to share an allocated buffer with dma.
 But
  cpu
   will try to access the buffer every time as long as it wants.
 Therefore,
  we
   need cache control every time cpu and dma access the shared buffer:
  cache
   clean when cpu goes to dma and cache invalidate when dma goes to cpu.
  That
   is why I added new interfaces, DMA_BUF_GET_FENCE and
 DMA_BUF_PUT_FENCE,
  to
   dma buf framework. Of course, Those are ugly so we need a better way:
 I
  just
   wanted to show you that in such way, we can synchronize the shared
  buffer
   between cpu and dma. By any chance, is there any way that kernel can
 be
   aware of when cpu accesses the shared buffer or is there any point I
  didn't
   catch up?
 
  Well dma_buf_map/unmap_attachment should also flush/invalidate any
 caches,
  and with the current dma_buf spec those two functions are the _only_
 means
 
  I know that dma buf exporter should make sure that cache
 clean/invalidate
  are done when dma_buf_map/unmap_attachement is called. For this, already
 we
  do so. However, this function is called when dma buf import is requested
 by
  user to map a dmabuf fd with dma. This means that
 dma_buf_map_attachement()
  is called ONCE when user wants to share the dmabuf fd with dma.
Actually,
 in
  case of exynos drm, dma_map_sg_attrs(), performing cache operation
  internally, is called when dmabuf import is requested by user.
 
  you have to do so. Which strictly means that if you interleave device
 dma
  and cpu acccess you need to unmap/map every time.
 
  Which is obviously horribly inefficient, but a known gap in the dma_buf
 
  Right, and also this has big overhead.
 
  interface. Hence why I proposed to add dma_buf_sync_attachment similar
 to
  dma_sync_single_for_device:
 
  /**
   * dma_buf_sync_sg_attachment - sync caches for dma access
   * @attach: dma-buf attachment to sync
   * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
   * @direction: dma direction to sync for
   *
   * This function synchronizes caches for device dma through the given
   * dma-buf attachment when interleaving dma from different devices and
 the
   * cpu. Other device dma needs to be synced also by calls to this
   * function (or a pair of dma_buf_map/unmap_attachment calls), cpu
 access
   * needs to be synced with dma_buf_begin/end_cpu_access.
   */
  void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
struct sg_table *sgt,
enum dma_data_direction direction)
 
  Note that sync here only means to synchronize caches, not wait for
 any
  outstanding fences. This is simply to be consistent with the
 established
  lingo of the dma api. How the dma-buf fences fit into this is imo a
  different topic, but my idea is that all the cache coherency barriers
  (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
  dma_buf_begin/end_cpu_access) would automatically block for any
 attached
  fences (i.e. block for write fences when doing read-only access, block
 for
  all fences otherwise).
 
  As I mentioned already, kernel can't aware of when cpu accesses a shared
  buffer: user can access a shared buffer after mmap anytime and the
 shared
  buffer should be synchronized between cpu and dma. Therefore, the above
  cache coherency barriers should

RE: Introduce a new helper framework for buffer synchronization

2013-05-21 Thread Inki Dae


 -Original Message-
 From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
 Vetter
 Sent: Tuesday, May 21, 2013 6:31 AM
 To: Inki Dae
 Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
 YoungJun Cho; linux-arm-ker...@lists.infradead.org; linux-
 me...@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote:
  On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote:
   2013/5/15 Rob Clark robdcl...@gmail.com
  
On Wed, May 15, 2013 at 1:19 AM, Inki Dae inki@samsung.com
 wrote:


 -Original Message-
 From: Rob Clark [mailto:robdcl...@gmail.com]
 Sent: Tuesday, May 14, 2013 10:39 PM
 To: Inki Dae
 Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
 YoungJun
 Cho; linux-arm-ker...@lists.infradead.org; linux-
 me...@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer
 synchronization

 On Mon, May 13, 2013 at 10:52 PM, Inki Dae inki@samsung.com
wrote:
  well, for cache management, I think it is a better idea.. I
 didn't
  really catch that this was the motivation from the initial
 patch, but
  maybe I read it too quickly.  But cache can be decoupled from
  synchronization, because CPU access is not asynchronous.  For
  userspace/CPU access to buffer, you should:
 
1) wait for buffer
2) prepare-access
3)  ... do whatever cpu access to buffer ...
4) finish-access
5) submit buffer for new dma-operation
 
 
 
  For data flow from CPU to DMA device,
  1) wait for buffer
  2) prepare-access (dma_buf_begin_cpu_access)
  3) cpu access to buffer
 
 
  For data flow from DMA device to CPU
  1) wait for buffer

 Right, but CPU access isn't asynchronous (from the point of view
 of
 the CPU), so there isn't really any wait step at this point.  And
 if
 you do want the CPU to be able to signal a fence from userspace
 for
 some reason, you probably what something file/fd based so the
 refcnting/cleanup when process dies doesn't leave some pending
 DMA
 action wedged.  But I don't really see the point of that
 complexity
 when the CPU access isn't asynchronous in the first place.


 There was my missing comments, please see the below sequence.

 For data flow from CPU to DMA device and then from DMA device to
 CPU,
 1) wait for buffer - at user side - ioctl(fd,
 DMA_BUF_GET_FENCE, ...)
 - including prepare-access (dma_buf_begin_cpu_access)
 2) cpu access to buffer
 3) wait for buffer - at device driver
 - but CPU is already accessing the buffer so blocked.
 4) signal - at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
 5) the thread, blocked at 3), is waked up by 4).
 - and then finish-access (dma_buf_end_cpu_access)
   
right, I understand you can have background threads, etc, in
userspace.  But there are already plenty of synchronization
 primitives
that can be used for cpu-cpu synchronization, either within the
 same
process or between multiple processes.  For cpu access, even if it
 is
handled by background threads/processes, I think it is better to use
the traditional pthreads or unix synchronization primitives.  They
have existed forever, they are well tested, and they work.
   
So while it seems nice and orthogonal/clean to couple cache and
synchronization and handle dma-cpu and cpu-cpu and cpu-dma in the
same generic way, but I think in practice we have to make things
 more
complex than they otherwise need to be to do this.  Otherwise I
 think
we'll be having problems with badly behaved or crashing userspace.
   
   
   Right, this is very important. I think it's not esay to solve this
   issue. Aand at least for ARM based embedded system, such feature is
 useful
   to us; coupling cache operation and buffer synchronization. I'd like
 to
   collect other opinions and advices to solve this issue.
 
  Maybe we have a bit a misunderstanding here. The kernel really should
 take
  care of sync and cache coherency, and I agree that with the current
  dma_buf code (and the proposed fences) that part is still a bit hazy.
 But
  the kernel should not allow userspace to block access to a buffer until
  userspace is done. It should only sync with any oustanding fences and
  flush buffers before that userspace access happens.
 
  Then the kernel would again flush caches on the next dma access (which
  hopefully is only started once userspace completed access). Atm this
 isn't
  possible in an efficient way since the dma_buf api only exposes
 map/unmap
  attachment and not a function to just sync an existing mapping like
  dma_sync_single_for_device. I guess we should add a
  dma_buf_sync_attachment interface

RE: Introduce a new helper framework for buffer synchronization

2013-05-21 Thread Inki Dae


 -Original Message-
 From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
 Vetter
 Sent: Tuesday, May 21, 2013 4:45 PM
 To: Inki Dae
 Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
   - Integration of fence syncing into dma_buf. Imo we should have a
 per-attachment mode which decides whether map/unmap (and the new
 sync)
 should wait for fences or whether the driver takes care of syncing
 through the new fence framework itself (for direct hw sync).
 
  I think it's a good idea to have per-attachment mode for buffer sync.
 But
  I'd like to say we make sure what is the purpose of map
  (dma_buf_map_attachment)first. This interface is used to get a sgt;
  containing pages to physical memory region, and map that region with
  device's iommu table. The important thing is that this interface is
 called
  just one time when user wants to share an allocated buffer with dma. But
 cpu
  will try to access the buffer every time as long as it wants. Therefore,
 we
  need cache control every time cpu and dma access the shared buffer:
 cache
  clean when cpu goes to dma and cache invalidate when dma goes to cpu.
 That
  is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE,
 to
  dma buf framework. Of course, Those are ugly so we need a better way: I
 just
  wanted to show you that in such way, we can synchronize the shared
 buffer
  between cpu and dma. By any chance, is there any way that kernel can be
  aware of when cpu accesses the shared buffer or is there any point I
 didn't
  catch up?
 
 Well dma_buf_map/unmap_attachment should also flush/invalidate any caches,
 and with the current dma_buf spec those two functions are the _only_ means

I know that a dma-buf exporter should make sure that cache clean/invalidate
are done when dma_buf_map/unmap_attachment is called, and we already do so.
However, this function is called when a dma-buf import is requested by the
user to map a dmabuf fd with dma. This means that dma_buf_map_attachment()
is called ONCE, when the user wants to share the dmabuf fd with dma. Actually,
in the case of exynos drm, dma_map_sg_attrs(), which performs the cache
operation internally, is called when a dmabuf import is requested by the user.

 you have to do so. Which strictly means that if you interleave device dma
 and cpu acccess you need to unmap/map every time.
 
 Which is obviously horribly inefficient, but a known gap in the dma_buf

Right, and this also has a big overhead.

 interface. Hence why I proposed to add dma_buf_sync_attachment similar to
 dma_sync_single_for_device:
 
 /**
  * dma_buf_sync_sg_attachment - sync caches for dma access
  * @attach: dma-buf attachment to sync
  * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
  * @direction: dma direction to sync for
  *
  * This function synchronizes caches for device dma through the given
  * dma-buf attachment when interleaving dma from different devices and the
  * cpu. Other device dma needs to be synced also by calls to this
  * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access
  * needs to be synced with dma_buf_begin/end_cpu_access.
  */
 void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
   struct sg_table *sgt,
   enum dma_data_direction direction)
 
 Note that sync here only means to synchronize caches, not wait for any
 outstanding fences. This is simply to be consistent with the established
 lingo of the dma api. How the dma-buf fences fit into this is imo a
 different topic, but my idea is that all the cache coherency barriers
 (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
 dma_buf_begin/end_cpu_access) would automatically block for any attached
 fences (i.e. block for write fences when doing read-only access, block for
 all fences otherwise).

As I mentioned already, the kernel can't be aware of when the cpu accesses a
shared buffer: the user can access a shared buffer anytime after mmap, and the
shared buffer should be synchronized between cpu and dma. Therefore, the above
cache coherency barriers should be called every time cpu or dma tries to
access a shared buffer, checking before and after each cpu and dma access. And
that is exactly what the proposed way does. For this, you can refer to the
below link,

http://www.mail-archive.com/linux-media@vger.kernel.org/msg62124.html

My point is: how can the kernel be aware of when those cache coherency
barriers should be called to synchronize caches and buffer access between cpu
and dma?
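
If a helper like the proposed dma_buf_sync_sg_attachment() existed, the
per-operation bracketing would look roughly like this from a driver's point
of view (a sketch against the proposed interface quoted above, not an
existing API; the attach, sgt and dmabuf variables are assumed to come from
the one-time import):

```c
/* One-time, at import: map the attachment and build the sg table. */
sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);

/* Per dma operation: only a cache sync, no unmap/map cycle. */
dma_buf_sync_sg_attachment(attach, sgt, DMA_TO_DEVICE);  /* proposed API */
/* ... kick the device, let it read the buffer ... */

/* In-kernel cpu access is bracketed with the existing hooks. */
dma_buf_begin_cpu_access(dmabuf, 0, len, DMA_FROM_DEVICE);
/* ... cpu reads/writes the kernel mapping ... */
dma_buf_end_cpu_access(dmabuf, 0, len, DMA_FROM_DEVICE);
```

The bracketing itself is easy for in-kernel users; it is the mmap'ed
userspace access that has no such bracket the kernel can see.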

 
 Then we could do a new dma_buf_attach_flags interface for special cases
 (might also be useful for other things, similar to the recently added
 flags in the dma

RE: Introduce a new helper framework for buffer synchronization

2013-05-14 Thread Inki Dae


 -Original Message-
 From: Rob Clark [mailto:robdcl...@gmail.com]
 Sent: Tuesday, May 14, 2013 10:39 PM
 To: Inki Dae
 Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
 Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Mon, May 13, 2013 at 10:52 PM, Inki Dae inki@samsung.com wrote:
  well, for cache management, I think it is a better idea.. I didn't
  really catch that this was the motivation from the initial patch, but
  maybe I read it too quickly.  But cache can be decoupled from
  synchronization, because CPU access is not asynchronous.  For
  userspace/CPU access to buffer, you should:
 
1) wait for buffer
2) prepare-access
3)  ... do whatever cpu access to buffer ...
4) finish-access
5) submit buffer for new dma-operation
 
 
 
  For data flow from CPU to DMA device,
  1) wait for buffer
  2) prepare-access (dma_buf_begin_cpu_access)
  3) cpu access to buffer
 
 
  For data flow from DMA device to CPU
  1) wait for buffer
 
 Right, but CPU access isn't asynchronous (from the point of view of
 the CPU), so there isn't really any wait step at this point.  And if
 you do want the CPU to be able to signal a fence from userspace for
 some reason, you probably what something file/fd based so the
 refcnting/cleanup when process dies doesn't leave some pending DMA
 action wedged.  But I don't really see the point of that complexity
 when the CPU access isn't asynchronous in the first place.


My earlier mail was missing some comments; please see the sequence below.

For data flow from CPU to DMA device and then from DMA device to CPU,
1) wait for buffer - at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
- including prepare-access (dma_buf_begin_cpu_access)
2) cpu access to buffer
3) wait for buffer - at device driver
- but CPU is already accessing the buffer, so blocked.
4) signal - at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
5) the thread blocked at 3) is woken up by 4).
- and then finish-access (dma_buf_end_cpu_access)
6) dma access to buffer
7) wait for buffer - at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
- but DMA is already accessing the buffer, so blocked.
8) signal - at device driver
9) the thread blocked at 7) is woken up by 8).
- and then prepare-access (dma_buf_begin_cpu_access)
10) cpu access to buffer

Basically, 'wait for buffer' includes buffer synchronization, commit
processing, and cache operations. Buffer synchronization means that the
current thread should wait for other threads accessing a shared buffer until
their access completes. Commit processing means that the current thread takes
possession of the shared buffer, so any attempt by another thread to access
the shared buffer blocks that thread. However, as I already mentioned, these
user interfaces still seem ugly, so we need a better way.

Give me more comments if there is my missing point :)

Thanks,
Inki Dae

 BR,
 -R
 
 
  2) finish-access (dma_buf_end _cpu_access)
  3) dma access to buffer
 
  1) and 2) are coupled with one function: we have implemented
  fence_helper_commit_reserve() for it.
 
  Cache control(cache clean or cache invalidate) is performed properly
  checking previous access type and current access type.
  And the below is actual codes for it,



RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


 -Original Message-
 From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
 Sent: Monday, May 13, 2013 5:01 PM
 To: Inki Dae
 Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org; linux-fbdev;
 Kyungmin Park; myungjoo.ham; YoungJun Cho
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 Op 09-05-13 09:33, Inki Dae schreef:
  Hi all,
 
  This post introduces a new helper framework based on dma fence. And the
  purpose of this post is to collect other opinions and advices before RFC
  posting.
 
  First of all, this helper framework, called fence helper, is in progress
  yet so might not have enough comments in codes and also might need to be
  more cleaned up. Moreover, we might be missing some parts of the dma
 fence.
  However, I'd like to say that all things mentioned below has been tested
  with Linux platform and worked well.
 
  
 
  And tutorial for user process.
  just before cpu access
  struct dma_buf_fence *df;
 
  df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
  ioctl(fd, DMA_BUF_GET_FENCE, df);
 
  after memset or memcpy
  ioctl(fd, DMA_BUF_PUT_FENCE, df);
 NAK.
 
 Userspace doesn't need to trigger fences. It can do a buffer idle wait,
 and postpone submitting new commands until after it's done using the
 buffer.
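Maarten's alternative — wait for the buffer to go idle, then access it with no fence ioctls — maps naturally onto poll(2). A minimal sketch; note that poll support on dma-buf fds was only a suggestion at this point, so a pipe stands in for the pollable fd here:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Block until `fd` reports POLLIN ("buffer idle"), up to timeout_ms.
 * Returns 1 when it is safe to touch the buffer, 0 on timeout,
 * -1 on error. */
static int wait_for_idle(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n = poll(&p, 1, timeout_ms);

    if (n < 0)
        return -1;
    return (n == 1 && (p.revents & POLLIN)) ? 1 : 0;
}
```

After wait_for_idle() returns 1, user space does its memset/memcpy and only then submits new work; the kernel never hands a fence to user space.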

Hi Maarten,

It seems that you are saying user space should wait for a buffer the way KDS
does: KDS uses select() to postpone submitting new commands. But I think this
approach assumes that all data flows from a DMA device to the CPU. For
example, the CPU keeps polling for the completion of a buffer access by a DMA
device. This means the approach does not cover the opposite direction: CPU to
DMA device.

 Kernel space doesn't need the root hole you created by dereferencing a
 pointer passed from userspace.
 Your next exercise should be to write a security exploit from the api you
 created here. It's the only way to learn how to write safe code. Hint:
 df.ctx = mmap(..);
 

Also, I'm not yet sure our way is right, and that is why I posted. As you
mentioned, it seems that using mmap() is safer. But there is one issue that
confuses me. Regarding your hint, df.ctx = mmap(..), the issue is that dmabuf
mmap is already used to map a dmabuf into user space, and the dmabuf is a
physical memory region allocated by some allocator such as drm gem or ion.

I might be missing a point, so could you please give me more comments?

Thanks,
Inki Dae



 ~Maarten



RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


 -Original Message-
 From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
 Sent: Monday, May 13, 2013 6:52 PM
 To: Inki Dae
 Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org; 'linux-fbdev';
 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 Op 13-05-13 11:21, Inki Dae schreef:
 
  -Original Message-
  From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
  Sent: Monday, May 13, 2013 5:01 PM
  To: Inki Dae
  Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-
  ker...@lists.infradead.org; linux-media@vger.kernel.org; linux-fbdev;
  Kyungmin Park; myungjoo.ham; YoungJun Cho
  Subject: Re: Introduce a new helper framework for buffer
 synchronization
 
  Op 09-05-13 09:33, Inki Dae schreef:
  Hi all,
 
  This post introduces a new helper framework based on dma fence. And
 the
  purpose of this post is to collect other opinions and advices before
 RFC
  posting.
 
  First of all, this helper framework, called fence helper, is in
 progress
  yet so might not have enough comments in codes and also might need to
 be
  more cleaned up. Moreover, we might be missing some parts of the dma
  fence.
  However, I'd like to say that all things mentioned below has been
 tested
  with Linux platform and worked well.
  
 
  And tutorial for user process.
  just before cpu access
  struct dma_buf_fence *df;
 
   df->type = DMA_BUF_ACCESS_READ or
DMA_BUF_ACCESS_WRITE;
  ioctl(fd, DMA_BUF_GET_FENCE, df);
 
  after memset or memcpy
  ioctl(fd, DMA_BUF_PUT_FENCE, df);
  NAK.
 
  Userspace doesn't need to trigger fences. It can do a buffer idle wait,
  and postpone submitting new commands until after it's done using the
  buffer.
  Hi Maarten,
 
  It seems that you say user should wait for a buffer like KDS does: KDS
 uses
  select() to postpone submitting new commands. But I think this way
 assumes
  that every data flows a DMA device to a CPU. For example, a CPU should
 keep
  polling for the completion of a buffer access by a DMA device. This
 means
  that the this way isn't considered for data flow to opposite case; CPU
 to
  DMA device.
 Not really. You do both things the same way. You first wait for the bo to
 be idle, this could be implemented by adding poll support to the dma-buf
 fd.
 Then you either do your read or write. Since userspace is supposed to be
 the one controlling the bo it should stay idle at that point. If you have
 another thread queueing
  the buffer again before your thread is done, that's a bug in the
 application,
 and can be solved with userspace locking primitives. No need for the
 kernel to get involved.
 

Yes, that is how we have synchronized buffers between the CPU and DMA devices
until now, without a buffer synchronization mechanism. I thought it best that
the user not have to consider anything: the user can access a buffer
regardless of which DMA device controls it, and buffer synchronization is
performed at the kernel level. Moreover, I think we could improve graphics and
multimedia hardware performance because the hardware can do more work: one
thread accesses a shared buffer while another controls a DMA device with that
shared buffer in parallel. Thus we could avoid sequential processing, and that
is my intention. Don't you think we could improve hardware utilization this
way, or some other way? Of course, there could be a better way.

  Kernel space doesn't need the root hole you created by giving a
  dereferencing a pointer passed from userspace.
  Your next exercise should be to write a security exploit from the api
 you
  created here. It's the only way to learn how to write safe code. Hint:
  df.ctx = mmap(..);
 
  Also I'm not clear to use our way yet and that is why I posted. As you
  mentioned, it seems like that using mmap() is more safe. But there is
 one
  issue it makes me confusing. For your hint, df.ctx = mmap(..), the issue
 is
  that dmabuf mmap can be used to map a dmabuf with user space. And the
 dmabuf
  means a physical memory region allocated by some allocator such as drm
 gem
  or ion.
 
  There might be my missing point so could you please give me more
 comments?
 
 My point was that userspace could change df.ctx to some mmap'd memory,
 forcing the kernel to execute some code prepared by userspace.

Understood. I have to find a better way. For this, I'd like to listen
attentively to more opinions and advice.

Thanks for comments,
Inki Dae



RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


 -Original Message-
 From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
 ow...@vger.kernel.org] On Behalf Of Maarten Lankhorst
 Sent: Monday, May 13, 2013 8:41 PM
 To: Inki Dae
 Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org; 'linux-fbdev';
 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 Op 13-05-13 13:24, Inki Dae schreef:
  and can be solved with userspace locking primitives. No need for the
  kernel to get involved.
 
  Yes, that is how we have synchronized buffer between CPU and DMA device
  until now without buffer synchronization mechanism. I thought that it's
 best
  to make user not considering anything: user can access a buffer
 regardless
  of any DMA device controlling and the buffer synchronization is
 performed in
  kernel level. Moreover, I think we could optimize graphics and
 multimedia
  hardware performance because hardware can do more works: one thread
 accesses
  a shared buffer and the other controls DMA device with the shared buffer
 in
  parallel. Thus, we could avoid sequential processing and that is my
  intention. Aren't you think about that we could improve hardware
 utilization
  with such way or other? of course, there could be better way.
 
 If you don't want to block the hardware the only option is to allocate a
 temp bo and blit to/from it using the hardware.
 OpenGL already does that when you want to read back, because otherwise the
 entire pipeline can get stalled.
 The overhead of command submission for that shouldn't be high, if it is
 you should really try to optimize that first
 before coming up with this crazy scheme.
 

I have considered all devices that share buffers with the CPU: multimedia,
display controller, and graphics devices (including the GPU). And we could
provide easy-to-use user interfaces for buffer synchronization. Of course, the
proposed user interfaces may still be ugly, but at least I think we need
something for this.

 In that case you still wouldn't give userspace control over the fences. I
 don't see any way that can end well.
 What if userspace never signals? What if userspace gets killed by oom
 killer. Who keeps track of that?
 

In all cases, all kernel resources attached to a user fence will be released
by the kernel once the fence times out: never signaling, or the process being
killed by the OOM killer, makes the fence time out. And if we use the mmap
mechanism you mentioned before, I think user resources could also be freed
properly.

Thanks,
Inki Dae

 ~Maarten



RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


 -Original Message-
 From: Rob Clark [mailto:robdcl...@gmail.com]
 Sent: Tuesday, May 14, 2013 2:58 AM
 To: Inki Dae
 Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
 Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
 Subject: Re: Introduce a new helper framework for buffer synchronization
 
 On Mon, May 13, 2013 at 1:18 PM, Inki Dae inki@samsung.com wrote:
 
 
  2013/5/13 Rob Clark robdcl...@gmail.com
 
  On Mon, May 13, 2013 at 8:21 AM, Inki Dae inki@samsung.com wrote:
  
   In that case you still wouldn't give userspace control over the
 fences.
   I
   don't see any way that can end well.
   What if userspace never signals? What if userspace gets killed by
 oom
   killer. Who keeps track of that?
  
  
   In all cases, all kernel resources to user fence will be released by
   kernel
   once the fence is timed out: never signaling and process killing by
 oom
   killer makes the fence timed out. And if we use mmap mechanism you
   mentioned
   before, I think user resource could also be freed properly.
 
 
  I tend to agree w/ Maarten here.. there is no good reason for
  userspace to be *signaling* fences.  The exception might be some blob
  gpu drivers which don't have enough knowledge in the kernel to figure
  out what to do.  (In which case you can add driver private ioctls for
  that.. still not the right thing to do but at least you don't make a
  public API out of it.)
 
 
  Please do not care whether those are generic or not. Let's see the
 following
  three things. First, it's cache operation. As you know, ARM SoC has ACP
  (Accelerator Coherency Port) and can be connected to DMA engine or
 similar
  devices. And this port is used for cache coherency between CPU cache and
 DMA
  device. However, most devices on ARM based embedded systems don't use
 the
  ACP port. So they need proper cache operation before and after of DMA or
 CPU
  access in case of using cachable mapping. Actually, I see many Linux
 based
  platforms call cache control interfaces directly for that. I think the
  reason, they do so, is that kernel isn't aware of when and how CPU
 accessed
  memory.
 
 I think we had kicked around the idea of giving dmabuf's a
 prepare/finish ioctl quite some time back.  This is probably something
 that should be at least a bit decoupled from fences.  (Possibly
 'prepare' waits for dma access to complete, but not the other way
 around.)
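Both sides describe the same underlying rule: on a cachable mapping without hardware coherency (no ACP), the needed cache maintenance is a pure function of the previous access and the next access. A sketch of that decision table; the enum names here are invented for illustration, not kernel identifiers:

```c
#include <assert.h>

enum access_type { ACC_NONE, ACC_CPU_READ, ACC_CPU_WRITE,
                   ACC_DMA_READ, ACC_DMA_WRITE };
enum cache_op   { CACHE_NOP, CACHE_CLEAN, CACHE_INVALIDATE };

/* Dirty CPU writes must be cleaned (written back) before a device touches
 * the buffer; stale CPU cache lines must be invalidated before the CPU
 * reads data a device wrote. Everything else needs no maintenance.
 * (Clean before ACC_DMA_WRITE is the conservative choice, since the
 * device may only partially overwrite the buffer.) */
static enum cache_op cache_op_for(enum access_type prev,
                                  enum access_type next)
{
    if (prev == ACC_CPU_WRITE &&
        (next == ACC_DMA_READ || next == ACC_DMA_WRITE))
        return CACHE_CLEAN;
    if (prev == ACC_DMA_WRITE &&
        (next == ACC_CPU_READ || next == ACC_CPU_WRITE))
        return CACHE_INVALIDATE;
    return CACHE_NOP;
}
```

This is the kind of bookkeeping the kernel can only do if it is told when and how the CPU accessed the memory, which is the motivation for a prepare/finish interface.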
 
 And I did implement in omapdrm support for simulating coherency via
 page fault-in / shoot-down..  It is one option that makes it
 completely transparent to userspace, although there is some
  performance cost, so I suppose it depends a bit on your use-case.
 
  And second, user process has to do so many things in case of using
 shared
  memory with DMA device. User process should understand how DMA device is
  operated and when interfaces for controling the DMA device are called.
 Such
  things would make user application so complicated.
 
  And third, it's performance optimization to multimedia and graphics
 devices.
  As I mentioned already, we should consider sequential processing for
 buffer
   sharing between CPU and DMA device. This means that the CPU should stay
   idle until the DMA device is done, and vice versa.
 
  That is why I proposed such user interfaces. Of course, these interfaces
  might be so ugly yet: for this, Maarten pointed already out and I agree
 with
  him. But there must be another better way. Aren't you think we need
 similar
  thing? With such interfaces, cache control and buffer synchronization
 can be
   performed at the kernel level. Moreover, the user application doesn't need
  to
   consider DMA device control anymore. Therefore, one thread can
 access a
  shared buffer and the other can control DMA device with the shared
 buffer in
  parallel. We can really make the best use of CPU and DMA idle time. In
 other
  words, we can really make the best use of multi tasking OS, Linux.
 
  So could you please tell me about that there is any reason we don't use
  public API for it? I think we can add and use public API if NECESSARY.
 
 well, for cache management, I think it is a better idea.. I didn't
 really catch that this was the motivation from the initial patch, but
 maybe I read it too quickly.  But cache can be decoupled from
 synchronization, because CPU access is not asynchronous.  For
 userspace/CPU access to buffer, you should:
 
   1) wait for buffer
   2) prepare-access
   3)  ... do whatever cpu access to buffer ...
   4) finish-access
   5) submit buffer for new dma-operation
 


For data flow from CPU to DMA device,
1) wait for buffer
2) prepare-access (dma_buf_begin_cpu_access)
3) cpu access to buffer


For data flow from DMA device to CPU
1) wait for buffer
2) finish-access (dma_buf_end_cpu_access)
3) dma access to buffer

1) and 2) are coupled with one function: we have implemented
fence_helper_commit_reserve() for it.

Cache control(cache clean or cache

RE: [PATCH v3] drm/exynos: enable FIMD clocks

2013-04-21 Thread Inki Dae
Hi, Mr. Vikas

Please fix the typos below that Viresh pointed out, and address my comments.

 -Original Message-
 From: Viresh Kumar [mailto:viresh.ku...@linaro.org]
 Sent: Monday, April 01, 2013 5:51 PM
 To: Vikas Sajjan
 Cc: dri-de...@lists.freedesktop.org; linux-samsung-...@vger.kernel.org;
 jy0922.s...@samsung.com; inki@samsung.com; kgene@samsung.com;
 linaro-ker...@lists.linaro.org; linux-media@vger.kernel.org
 Subject: Re: [PATCH v3] drm/exynos: enable FIMD clocks
 
 On 1 April 2013 14:13, Vikas Sajjan vikas.saj...@linaro.org wrote:
  While migrating to common clock framework (CCF), found that the FIMD
 clocks
 
 s/found/we found/
 
  were pulled down by the CCF.
  If CCF finds any clock(s) which has NOT been claimed by any of the
  drivers, then such clock(s) are PULLed low by CCF.
 
  By calling clk_prepare_enable() for FIMD clocks fixes the issue.
 
 s/By calling/Calling/
 
 and
 
 s/the/this
 
  this patch also replaces clk_disable() with clk_disable_unprepare()
 
 s/this/This
 
  during exit.
 
 Sorry but your log doesn't say what you are doing. You are just adding
 relevant calls to clk_prepare/unprepare() before calling
 clk_enable/disable.
 
  Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
  ---
  Changes since v2:
  - moved clk_prepare_enable() and clk_disable_unprepare() from
  fimd_probe() to fimd_clock() as suggested by Inki Dae
 inki@samsung.com
  Changes since v1:
  - added error checking for clk_prepare_enable() and also
replaced
  clk_disable() with clk_disable_unprepare() during exit.
  ---
   drivers/gpu/drm/exynos/exynos_drm_fimd.c |   14 +++---
   1 file changed, 7 insertions(+), 7 deletions(-)
 
  diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
  index 9537761..f2400c8 100644
  --- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
  +++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
  @@ -799,18 +799,18 @@ static int fimd_clock(struct fimd_context *ctx,
 bool enable)
  if (enable) {
  int ret;
 
   -   ret = clk_enable(ctx->bus_clk);
   +   ret = clk_prepare_enable(ctx->bus_clk);
   if (ret < 0)
   return ret;
  
   -   ret = clk_enable(ctx->lcd_clk);
   +   ret = clk_prepare_enable(ctx->lcd_clk);
   if (ret < 0) {
   -   clk_disable(ctx->bus_clk);
   +   clk_disable_unprepare(ctx->bus_clk);
   return ret;
   }
   } else {
   -   clk_disable(ctx->lcd_clk);
   -   clk_disable(ctx->bus_clk);
   +   clk_disable_unprepare(ctx->lcd_clk);
   +   clk_disable_unprepare(ctx->bus_clk);
  }
 
  return 0;
  @@ -981,8 +981,8 @@ static int fimd_remove(struct platform_device *pdev)
  if (ctx-suspended)
  goto out;
 
   -   clk_disable(ctx->lcd_clk);
   -   clk_disable(ctx->bus_clk);
   +   clk_disable_unprepare(ctx->lcd_clk);
   +   clk_disable_unprepare(ctx->bus_clk);

Just remove the above code. It seems that clk_disable (also
clk_disable_unprepare) isn't needed because it will be done by
pm_runtime_put_sync. Please re-post it (probably patch v5?).

Thanks,
Inki Dae

 
  You are doing things at the right place but I have a suggestion. Are you
  doing anything in your clk_prepare() at least for this device? Probably not.
 
 If not, then its better to call clk_prepare/unprepare only once at
 probe/remove
 and keep clk_enable/disable calls as is.
 
 --
 viresh
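Viresh's suggestion rests on the common clock framework's split between clk_prepare() (may sleep, done once per user) and clk_enable() (atomic-safe, only legal on a prepared clock). A stand-alone mock of that contract — the function names mirror the kernel API, but this is not kernel code — shows why preparing once at probe and toggling enable/disable at runtime is sufficient:

```c
#include <assert.h>

/* Minimal model of the CCF contract: enable is only legal while the
 * clock is prepared. */
struct mock_clk {
    int prepare_count;
    int enable_count;
};

static int clk_prepare(struct mock_clk *c)
{
    c->prepare_count++;   /* may sleep in the real API */
    return 0;
}

static void clk_unprepare(struct mock_clk *c)
{
    c->prepare_count--;
}

static int clk_enable(struct mock_clk *c)
{
    if (c->prepare_count <= 0)
        return -1;        /* enabling an unprepared clock is a bug */
    c->enable_count++;    /* atomic-safe in the real API */
    return 0;
}

static void clk_disable(struct mock_clk *c)
{
    c->enable_count--;
}
```

With this split, probe calls clk_prepare() once, the runtime suspend/resume paths use only clk_enable()/clk_disable(), and remove ends with clk_unprepare(); clk_prepare_enable() is just the convenience combination used when both halves live in the same (sleepable) path.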



Re: [PATCH v2] drm/exynos: enable FIMD clocks

2013-03-27 Thread Inki Dae
2013/3/20 Vikas Sajjan vikas.saj...@linaro.org:
 While migrating to common clock framework (CCF), found that the FIMD clocks
 were pulled down by the CCF.
 If CCF finds any clock(s) which has NOT been claimed by any of the
 drivers, then such clock(s) are PULLed low by CCF.

 By calling clk_prepare_enable() for FIMD clocks fixes the issue.

 this patch also replaces clk_disable() with clk_disable_unprepare()
 during exit.

 Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
 ---
 Changes since v1:
 - added error checking for clk_prepare_enable() and also replaced
 clk_disable() with clk_disable_unprepare() during exit.
 ---
  drivers/gpu/drm/exynos/exynos_drm_fimd.c |   17 +++--
  1 file changed, 15 insertions(+), 2 deletions(-)

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c 
 b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 index 9537761..014d750 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 @@ -934,6 +934,19 @@ static int fimd_probe(struct platform_device *pdev)
 return ret;
 }

 +   ret = clk_prepare_enable(ctx->lcd_clk);
 +   if (ret) {
 +   dev_err(dev, "failed to enable 'sclk_fimd' clock\n");
 +   return ret;
 +   }
 +
 +   ret = clk_prepare_enable(ctx->bus_clk);
 +   if (ret) {
 +   clk_disable_unprepare(ctx->lcd_clk);
 +   dev_err(dev, "failed to enable 'fimd' clock\n");
 +   return ret;
 +   }
 +

Please remove the above two clk_prepare_enable function calls and use
them in fimd_clock() instead of clk_enable/disable(). When probed,
fimd clock will be enabled by runtime pm.

Thanks,
Inki Dae

 ctx->vidcon0 = pdata->vidcon0;
 ctx->vidcon1 = pdata->vidcon1;
 ctx->default_win = pdata->default_win;
 @@ -981,8 +994,8 @@ static int fimd_remove(struct platform_device *pdev)
 if (ctx-suspended)
 goto out;

 -   clk_disable(ctx->lcd_clk);
 -   clk_disable(ctx->bus_clk);
 +   clk_disable_unprepare(ctx->lcd_clk);
 +   clk_disable_unprepare(ctx->bus_clk);

 pm_runtime_set_suspended(dev);
 pm_runtime_put_sync(dev);
 --
 1.7.9.5



RE: [PATCH v12 2/2] drm/exynos: enable OF_VIDEOMODE and FB_MODE_HELPERS for exynos drm fimd

2013-03-07 Thread Inki Dae


 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Vikas Sajjan
 Sent: Thursday, March 07, 2013 4:40 PM
 To: dri-de...@lists.freedesktop.org
 Cc: linux-media@vger.kernel.org; kgene@samsung.com;
 inki@samsung.com; l.kris...@samsung.com; jo...@samsung.com; linaro-
 ker...@lists.linaro.org
 Subject: [PATCH v12 2/2] drm/exynos: enable OF_VIDEOMODE and
 FB_MODE_HELPERS for exynos drm fimd
 
 patch adds select OF_VIDEOMODE and select FB_MODE_HELPERS when
 EXYNOS_DRM_FIMD config is selected.
 
 Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
 ---
  drivers/gpu/drm/exynos/Kconfig |2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/drivers/gpu/drm/exynos/Kconfig
 b/drivers/gpu/drm/exynos/Kconfig
 index 046bcda..bb25130 100644
 --- a/drivers/gpu/drm/exynos/Kconfig
 +++ b/drivers/gpu/drm/exynos/Kconfig
 @@ -25,6 +25,8 @@ config DRM_EXYNOS_DMABUF
  config DRM_EXYNOS_FIMD
   bool "Exynos DRM FIMD"
   depends on DRM_EXYNOS && !FB_S3C && !ARCH_MULTIPLATFORM

Again, you missed the 'OF' dependency. At least, let's be sure to do build
testing before posting. :)

Thanks,
Inki Dae

 + select OF_VIDEOMODE
 + select FB_MODE_HELPERS
   help
 Choose this option if you want to use Exynos FIMD for DRM.
 
 --
 1.7.9.5
 



Re: [PATCH v10 1/2] video: drm: exynos: Add display-timing node parsing using video helper function

2013-03-06 Thread Inki Dae
2013/3/1 Vikas Sajjan vikas.saj...@linaro.org:
 Add support for parsing the display-timing node using video helper
 function.

 The DT node parsing is done only if 'dev.of_node'
 exists and the NON-DT logic is still maintained under the 'else' part.

 Signed-off-by: Leela Krishna Amudala l.kris...@samsung.com
 Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
 Acked-by: Joonyoung Shim jy0922.s...@samsung.com
 ---
  drivers/gpu/drm/exynos/exynos_drm_fimd.c |   24 
  1 file changed, 20 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c 
 b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 index 9537761..e323cf9 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 @@ -20,6 +20,7 @@
 #include <linux/of_device.h>
 #include <linux/pm_runtime.h>

+#include <video/of_display_timing.h>
 #include <video/samsung_fimd.h>
 #include <drm/exynos_drm.h>

 @@ -883,10 +884,25 @@ static int fimd_probe(struct platform_device *pdev)

 DRM_DEBUG_KMS("%s\n", __FILE__);

 -   pdata = pdev->dev.platform_data;
 -   if (!pdata) {
 -   dev_err(dev, "no platform data specified\n");
 -   return -EINVAL;
 +   if (pdev->dev.of_node) {
 +   pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
 +   if (!pdata) {
 +   DRM_ERROR("memory allocation for pdata failed\n");
 +   return -ENOMEM;
 +   }
 +
 +   ret = of_get_fb_videomode(dev->of_node, &pdata->panel.timing,
 +   OF_USE_NATIVE_MODE);

Add select OF_VIDEOMODE and select FB_MODE_HELPERS to
drivers/gpu/drm/exynos/Kconfig. When EXYNOS_DRM_FIMD config is
selected, these two configs should also be selected.

Thanks,
Inki Dae

 +   if (ret) {
 +   DRM_ERROR("failed: of_get_fb_videomode() : %d\n", ret);
 +   return ret;
 +   }
 +   } else {
 +   pdata = pdev->dev.platform_data;
 +   if (!pdata) {
 +   DRM_ERROR("no platform data specified\n");
 +   return -EINVAL;
 +   }
 }

  panel = &pdata->panel;
 --
 1.7.9.5



RE: [PATCH] drm/exynos: modify the compatible string for exynos fimd

2013-03-06 Thread Inki Dae
Already merged. :)

 -Original Message-
 From: Vikas Sajjan [mailto:vikas.saj...@linaro.org]
 Sent: Thursday, March 07, 2013 4:09 PM
 To: InKi Dae
 Cc: dri-de...@lists.freedesktop.org; linux-media@vger.kernel.org;
 kgene@samsung.com; Joonyoung Shim; sunil joshi
 Subject: Re: [PATCH] drm/exynos: modify the compatible string for exynos
 fimd
 
 Hi Mr Inki Dae,
 
 On 28 February 2013 08:12, Joonyoung Shim jy0922.s...@samsung.com wrote:
  On 02/27/2013 07:32 PM, Vikas Sajjan wrote:
 
  modified compatible string for exynos4 fimd as exynos4210-fimd and
  exynos5 fimd as exynos5250-fimd to stick to the rule that compatible
  value should be named after first specific SoC model in which this
  particular IP version was included as discussed at
  https://patchwork.kernel.org/patch/2144861/
 
  Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
  ---
drivers/gpu/drm/exynos/exynos_drm_fimd.c |4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
 
  diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
  b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
  index 9537761..433ed35 100644
  --- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
  +++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
  @@ -109,9 +109,9 @@ struct fimd_context {
  #ifdef CONFIG_OF
static const struct of_device_id fimd_driver_dt_match[] = {
   -   { .compatible = "samsung,exynos4-fimd",
   +   { .compatible = "samsung,exynos4210-fimd",
 .data = exynos4_fimd_driver_data },
   -   { .compatible = "samsung,exynos5-fimd",
   +   { .compatible = "samsung,exynos5250-fimd",
.data = exynos5_fimd_driver_data },
  {},
};
 
 
  Acked-by: Joonyoung Shim jy0922.s...@samsung.com
 
  Can you please apply this patch?
 
 
  Thanks.
 
 
 
 --
 Thanks and Regards
  Vikas Sajjan



RE: [PATCH v6 1/1] video: drm: exynos: Add display-timing node parsing using video helper function

2013-02-20 Thread Inki Dae


 -Original Message-
 From: Vikas Sajjan [mailto:vikas.saj...@linaro.org]
 Sent: Friday, February 15, 2013 3:43 PM
 To: dri-de...@lists.freedesktop.org
 Cc: linux-media@vger.kernel.org; kgene@samsung.com;
 inki@samsung.com; l.kris...@samsung.com; patc...@linaro.org
 Subject: [PATCH v6 1/1] video: drm: exynos: Add display-timing node
 parsing using video helper function
 
 Add support for parsing the display-timing node using video helper
 function.
 
 The DT node parsing and pinctrl selection is done only if 'dev.of_node'
 exists and the NON-DT logic is still maintained under the 'else' part.
 
 Signed-off-by: Leela Krishna Amudala l.kris...@samsung.com
 Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
 ---
  drivers/gpu/drm/exynos/exynos_drm_fimd.c |   37
 ++
  1 file changed, 33 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 index 9537761..8b2c0ff 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 @@ -19,7 +19,9 @@
 #include <linux/clk.h>
 #include <linux/of_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/pinctrl/consumer.h>
 
+#include <video/of_display_timing.h>
 #include <video/samsung_fimd.h>
 #include <drm/exynos_drm.h>
 
 @@ -877,16 +879,43 @@ static int fimd_probe(struct platform_device *pdev)
   struct exynos_drm_subdrv *subdrv;
   struct exynos_drm_fimd_pdata *pdata;
   struct exynos_drm_panel_info *panel;
 + struct fb_videomode *fbmode;
 + struct pinctrl *pctrl;
   struct resource *res;
   int win;
   int ret = -EINVAL;
 
   DRM_DEBUG_KMS("%s\n", __FILE__);
 
 - pdata = pdev->dev.platform_data;
 - if (!pdata) {
 - dev_err(dev, "no platform data specified\n");
 - return -EINVAL;
 + if (pdev->dev.of_node) {
 + pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
 + if (!pdata) {
 + DRM_ERROR("memory allocation for pdata failed\n");
 + return -ENOMEM;
 + }
 +
 + fbmode = &pdata->panel.timing;
 + ret = of_get_fb_videomode(dev->of_node, fbmode,
 + OF_USE_NATIVE_MODE);
 + if (ret) {
 + DRM_ERROR("failed: of_get_fb_videomode()\n"
 + "with return value: %d\n", ret);
 + return ret;
 + }
 +
 + pctrl = devm_pinctrl_get_select_default(dev);

Why does it need pinctrl? And even if it is needed, I think this should be
separated into another patch.

Thanks,
Inki Dae

 + if (IS_ERR_OR_NULL(pctrl)) {
 + DRM_ERROR("failed: devm_pinctrl_get_select_default()\n"
 + "with return value: %d\n", PTR_RET(pctrl));
 + return PTR_RET(pctrl);
 + }
 +
 + } else {
 + pdata = pdev->dev.platform_data;
 + if (!pdata) {
 + DRM_ERROR("no platform data specified\n");
 + return -EINVAL;
 + }
   }
 
   panel = &pdata->panel;
 --
 1.7.9.5



Re: [PATCH v5 1/1] video: drm: exynos: Add display-timing node parsing using video helper function

2013-02-14 Thread Inki Dae
2013/2/6 Vikas Sajjan vikas.saj...@linaro.org:
 Add support for parsing the display-timing node using video helper
 function.

 The DT node parsing and pinctrl selection is done only if 'dev.of_node'
 exists and the NON-DT logic is still maintained under the 'else' part.

 Signed-off-by: Leela Krishna Amudala l.kris...@samsung.com
 Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
 ---
  drivers/gpu/drm/exynos/exynos_drm_fimd.c |   41 
 +++---
  1 file changed, 37 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c 
 b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 index bf0d9ba..978e866 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c
 @@ -19,6 +19,7 @@
 #include <linux/clk.h>
 #include <linux/of_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/pinctrl/consumer.h>

 #include <video/samsung_fimd.h>
 #include <drm/exynos_drm.h>
 @@ -905,16 +906,48 @@ static int __devinit fimd_probe(struct platform_device 
 *pdev)
 struct exynos_drm_subdrv *subdrv;
 struct exynos_drm_fimd_pdata *pdata;
 struct exynos_drm_panel_info *panel;
 +   struct fb_videomode *fbmode;
 +   struct pinctrl *pctrl;
 struct resource *res;
 int win;
 int ret = -EINVAL;

 DRM_DEBUG_KMS("%s\n", __FILE__);

 -   pdata = pdev->dev.platform_data;
 -   if (!pdata) {
 -   dev_err(dev, "no platform data specified\n");
 -   return -EINVAL;
 +   if (pdev->dev.of_node) {
 +   pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
 +   if (!pdata) {
 +   DRM_ERROR("memory allocation for pdata failed\n");
 +   return -ENOMEM;
 +   }
 +
 +   fbmode = devm_kzalloc(dev, sizeof(*fbmode), GFP_KERNEL);
 +   if (!fbmode) {
 +   DRM_ERROR("memory allocation for fbmode failed\n");
 +   return -ENOMEM;
 +   }

It doesn't need to allocate fbmode.

 +
 +   ret = of_get_fb_videomode(dev->of_node, fbmode, -1);

What is -1? Use OF_USE_NATIVE_MODE instead (including
of_display_timing.h) and just change the above code like below,

   fbmode = &pdata->panel.timing;
   ret = of_get_fb_videomode(dev->of_node, fbmode,
OF_USE_NATIVE_MODE);

 +   if (ret) {
 +   DRM_ERROR("failed: of_get_fb_videomode()\n"
 +   "with return value: %d\n", ret);
 +   return ret;
 +   }
 +   pdata->panel.timing = (struct fb_videomode) *fbmode;

remove the above line.

 +
 +   pctrl = devm_pinctrl_get_select_default(dev);
 +   if (IS_ERR_OR_NULL(pctrl)) {
 +   DRM_ERROR("failed: devm_pinctrl_get_select_default()\n"
 +   "with return value: %d\n", PTR_RET(pctrl));
 +   return PTR_RET(pctrl);
 +   }
 +
 +   } else {
 +   pdata = pdev->dev.platform_data;
 +   if (!pdata) {
 +   DRM_ERROR("no platform data specified\n");
 +   return -EINVAL;
 +   }
 }

 panel = &pdata->panel;
 --
 1.7.9.5

 ___
 dri-devel mailing list
 dri-de...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-12 Thread Inki Dae
Applied and will go to -next.
And please post the document(in
Documentation/devicetree/bindings/gpu/) for it later.

Thanks,
Inki Dae

2013/2/6 Sachin Kamat sachin.ka...@linaro.org:
 From: Ajay Kumar ajaykumar...@samsung.com

 This patch adds device tree match table for Exynos G2D controller.

 Signed-off-by: Ajay Kumar ajaykumar...@samsung.com
 Signed-off-by: Sachin Kamat sachin.ka...@linaro.org
 ---
 Patch based on exynos-drm-fixes branch of Inki Dae's tree:
 git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git

 Changes since v1:
 Modified the compatible string as per the discussions at [1].
 [1] https://patchwork1.kernel.org/patch/2045821/
 ---
  drivers/gpu/drm/exynos/exynos_drm_g2d.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
 b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 index ddcfb5d..0fcfbe4 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 @@ -19,6 +19,7 @@
  #include <linux/workqueue.h>
  #include <linux/dma-mapping.h>
  #include <linux/dma-attrs.h>
 +#include <linux/of.h>
 
  #include <drm/drmP.h>
  #include <drm/exynos_drm.h>
 @@ -1240,6 +1241,14 @@ static int g2d_resume(struct device *dev)

  static SIMPLE_DEV_PM_OPS(g2d_pm_ops, g2d_suspend, g2d_resume);

 +#ifdef CONFIG_OF
 +static const struct of_device_id exynos_g2d_match[] = {
 +   { .compatible = "samsung,exynos5250-g2d" },
 +   {},
 +};
 +MODULE_DEVICE_TABLE(of, exynos_g2d_match);
 +#endif
 +
  struct platform_driver g2d_driver = {
 .probe  = g2d_probe,
 .remove = g2d_remove,
 @@ -1247,5 +1256,6 @@ struct platform_driver g2d_driver = {
 .name   = "s5p-g2d",
 .owner  = THIS_MODULE,
 .pm = g2d_pm_ops,
 +   .of_match_table = of_match_ptr(exynos_g2d_match),
 },
  };
 --
 1.7.4.1



Re: [PATCH v2 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-12 Thread Inki Dae
2013/2/12 Sylwester Nawrocki s.nawro...@samsung.com:
 On 02/12/2013 02:17 PM, Inki Dae wrote:
 Applied and will go to -next.
 And please post the document(in
 Documentation/devicetree/bindings/gpu/) for it later.

 There is already some old patch applied in the devicetree/next tree:

 http://git.secretlab.ca/?p=linux.git;a=commitdiff;h=09495dda6a62c74b13412a63528093910ef80edd

 I guess there is now an incremental patch needed for this.


I think that this patch should be reverted because the compatible
string of this document isn't generic and also the document file
should be moved into proper place(.../bindings/gpu/).

So Mr. Grant, could you please revert the below patch?
of/exynos_g2d: Add Bindings for exynos G2D driver
commit: 09495dda6a62c74b13412a63528093910ef80edd

This document should be modified correctly and re-posted. For this, we
have already reached an agreement with the other Exynos maintainers.

Thanks,
Inki Dae


 Regards,
 Sylwester





RE: [PATCH v2 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-06 Thread Inki Dae


 -Original Message-
 From: Sachin Kamat [mailto:sachin.ka...@linaro.org]
 Sent: Wednesday, February 06, 2013 5:03 PM
 To: Inki Dae
 Cc: linux-media@vger.kernel.org; dri-de...@lists.freedesktop.org;
 devicetree-disc...@lists.ozlabs.org; k.deb...@samsung.com;
 s.nawro...@samsung.com; kgene@samsung.com; patc...@linaro.org; Ajay
 Kumar
 Subject: Re: [PATCH v2 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
 On 6 February 2013 13:02, Inki Dae inki@samsung.com wrote:
 
  Looks good to me but please add document for it.
 
 Yes. I will. I was planning to send the bindings document patch along
 with the dt patches (adding node entries to dts files).
 Sylwester had suggested adding this to
 Documentation/devicetree/bindings/media/ which contains other media
 IPs.

I think that it's better to go to gpu than media, and we can divide Exynos
IPs into the below categories,

Media : mfc
GPU : g2d, g3d, fimc, gsc
Video : fimd, hdmi, eDP, MIPI-DSI

And I think that the device tree describes hardware, so possibly all
documents in .../bindings/drm/exynos/* should be moved to a proper place also.
Please give me any opinions.

Thanks,
Inki Dae

 
 
  To other guys,
  And is there anyone who knows where this document should be added?
  I'm not sure that the g2d document should be placed in
  Documentation/devicetree/bindings/gpu, media, drm/exynos or arm/exynos.
 At
  least, this document should be shared with the g2d hw relevant drivers
 such
  as v4l2 and drm. So is .../bindings/gpu proper place?
 
 
 
 --
 With warm regards,
 Sachin



RE: [PATCH v2 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-06 Thread Inki Dae


 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Sylwester Nawrocki
 Sent: Wednesday, February 06, 2013 8:24 PM
 To: Inki Dae
 Cc: 'Sachin Kamat'; linux-media@vger.kernel.org; dri-
 de...@lists.freedesktop.org; devicetree-disc...@lists.ozlabs.org;
 k.deb...@samsung.com; kgene@samsung.com; patc...@linaro.org; 'Ajay
 Kumar'; kyungmin.p...@samsung.com; sw0312@samsung.com;
 jy0922.s...@samsung.com
 Subject: Re: [PATCH v2 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
 On 02/06/2013 09:51 AM, Inki Dae wrote:
 [...]
  I think that it's better to go to gpu than media and we can divide
 Exynos
  IPs into the bellow categories,
 
  Media : mfc
  GPU : g2d, g3d, fimc, gsc
 
 Heh, nice try! :) GPU and FIMC ? FIMC is a camera subsystem (hence 'C'
 in the acronym), so what it has really to do with GPU ? All right, this IP
 has really two functions: camera capture and video post-processing
 (colorspace conversion, scaling), but the main feature is camera capture
 (fimc-lite is a camera capture interface IP only).
 
 Also, Exynos5 GScaler is used as a DMA engine for camera capture data
 pipelines, so it will be used by a camera capture driver as well. It
 really belongs to Media and GPU, as this is a multifunctional
 device (similarly to FIMC).
 
 So I propose following classification, which seems less inaccurate:
 
 GPU:   g2d, g3d
 Media: mfc, fimc, fimc-lite, fimc-is, mipi-csis, gsc
 Video: fimd, hdmi, eDP, mipi-dsim
 

Ok, it seems that your propose is better. :)

To Sachin,
Please add g2d document to .../bindings/gpu

To Rahul,
Could you please move .../drm/exynos/* to .../bindings/video? Probably you
need to rename the files there to exynos*.txt

If there are no other opinions, let's start  :)

Thanks,
Inki Dae

 I have already a DT bindings description prepared for fimc [1].
 (probably it needs to be rephrased a bit not to refer to the linux
 device model). I put it in Documentation/devicetree/bindings/media/soc,
 but likely there is no need for the 'soc' subdirectory...
 
  Video : fimd, hdmi, eDP, MIPI-DSI
 
  And I think that the device-tree describes hardware so possibly, all
  documents in .../bindings/drm/exynos/* should be moved to proper place
 also.
  Please give  me any opinions.
 
 Yes, I agree. If possible, it would be nice to have some Linux API
 agnostic locations.
 
 [1] goo.gl/eTGOl
 
 --
 
 Thanks,
 Sylwester


RE: [PATCH v2 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-05 Thread Inki Dae
 -Original Message-
 From: Sachin Kamat [mailto:sachin.ka...@linaro.org]
 Sent: Wednesday, February 06, 2013 2:30 PM
 To: linux-media@vger.kernel.org; dri-de...@lists.freedesktop.org;
 devicetree-disc...@lists.ozlabs.org
 Cc: k.deb...@samsung.com; sachin.ka...@linaro.org; inki@samsung.com;
 s.nawro...@samsung.com; kgene@samsung.com; patc...@linaro.org; Ajay
 Kumar
 Subject: [PATCH v2 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
 From: Ajay Kumar ajaykumar...@samsung.com
 
 This patch adds device tree match table for Exynos G2D controller.
 
 Signed-off-by: Ajay Kumar ajaykumar...@samsung.com
 Signed-off-by: Sachin Kamat sachin.ka...@linaro.org
 ---
 Patch based on exynos-drm-fixes branch of Inki Dae's tree:
 git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
 
 Changes since v1:
 Modified the compatible string as per the discussions at [1].
 [1] https://patchwork1.kernel.org/patch/2045821/
 ---
  drivers/gpu/drm/exynos/exynos_drm_g2d.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 index ddcfb5d..0fcfbe4 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 @@ -19,6 +19,7 @@
   #include <linux/workqueue.h>
   #include <linux/dma-mapping.h>
   #include <linux/dma-attrs.h>
  +#include <linux/of.h>
  
   #include <drm/drmP.h>
   #include <drm/exynos_drm.h>
 @@ -1240,6 +1241,14 @@ static int g2d_resume(struct device *dev)
 
  static SIMPLE_DEV_PM_OPS(g2d_pm_ops, g2d_suspend, g2d_resume);
 
 +#ifdef CONFIG_OF
 +static const struct of_device_id exynos_g2d_match[] = {
 + { .compatible = "samsung,exynos5250-g2d" },

Looks good to me but please add document for it.

To other guys,
And is there anyone who knows where this document should be added?
I'm not sure whether the g2d document should be placed in
Documentation/devicetree/bindings/gpu, media, drm/exynos or arm/exynos. At
least, this document should be shared with the g2d hw relevant drivers such
as v4l2 and drm. So is .../bindings/gpu the proper place?

Thanks,
Inki Dae

 + {},
 +};
 +MODULE_DEVICE_TABLE(of, exynos_g2d_match);
 +#endif
 +
  struct platform_driver g2d_driver = {
   .probe  = g2d_probe,
   .remove = g2d_remove,
 @@ -1247,5 +1256,6 @@ struct platform_driver g2d_driver = {
   .name   = "s5p-g2d",
   .owner  = THIS_MODULE,
   .pm = g2d_pm_ops,
 + .of_match_table = of_match_ptr(exynos_g2d_match),
   },
  };
 --
 1.7.4.1



Re: [PATCH 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-04 Thread Inki Dae
2013/2/4 Sachin Kamat sachin.ka...@linaro.org:
 On 1 February 2013 18:28, Inki Dae daei...@gmail.com wrote:




 On Feb 1, 2013, at 8:52 PM, Inki Dae inki@samsung.com wrote:



 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Sachin Kamat
 Sent: Friday, February 01, 2013 8:40 PM
 To: Inki Dae
 Cc: Sylwester Nawrocki; Kukjin Kim; Sylwester Nawrocki; linux-
 me...@vger.kernel.org; dri-de...@lists.freedesktop.org; devicetree-
 disc...@lists.ozlabs.org; patc...@linaro.org
 Subject: Re: [PATCH 2/2] drm/exynos: Add device tree based discovery
 support for G2D

 On 1 February 2013 17:02, Inki Dae inki@samsung.com wrote:

  How about using like below?
  compatible = "samsung,exynos4x12-fimg-2d" /* for Exynos4212,
  Exynos4412 */
  It looks odd to use "samsung,exynos4212-fimg-2d" saying that this ip is
  for exynos4212 and exynos4412.

 AFAIK, compatible strings are not supposed to have any wildcard
 characters.
 Compatible string should suggest the first SoC that contained this IP.
 Hence IMO 4212 is OK.


 Oops, one more thing. AFAIK Exynos4210 also has the fimg-2d ip. In this case, we 
 should use "samsung,exynos4210-fimg-2d" as the compatible string and add it to 
 exynos4210.dtsi?

 Exynos4210 has same g2d IP (v3.0) as C110 or V210; so the same
 comptible string will be used for this one too.

 And please check if exynos4212 and 4412 SoCs have same fimg-2d ip. If it's 
 different, we might need to add ip version property or compatible string to 
 each dtsi file to identify the ip version.

 AFAIK, they both have the same IP (v4.1).


Ok, let's use the below,

For exynos4210 SoC,
compatible = "samsung,exynos4210-g2d"

For exynos4x12 SoCs,
compatible = "samsung,exynos4212-g2d"

For exynos5250, 5410 (In case of Exynos5440, I'm not sure that the SoC
has the same ip)
compatible = "samsung,exynos5250-g2d"
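For illustration, the naming scheme above would look roughly like this in the SoC .dtsi files. Only the compatible strings come from this thread; the node names, unit addresses, and reg values below are hypothetical placeholders.

```dts
/* exynos4210.dtsi -- hypothetical example */
g2d@12800000 {
	compatible = "samsung,exynos4210-g2d";
	reg = <0x12800000 0x1000>;
};

/* exynos4x12.dtsi -- hypothetical example */
g2d@10800000 {
	compatible = "samsung,exynos4212-g2d";
	reg = <0x10800000 0x1000>;
};
```

The driver's of_device_id table then carries one entry per compatible string it supports, so a 4210 node never silently binds against a 4212-only driver.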

To other guys,
The device tree is used by not only v4l2 side but also drm side so we
should reach an arrangement. So please give me ack if you agree to my
opinion. Otherwise please, give me your opinions.

Thanks,
Inki Dae



 Sorry but give me your opinions.

 Thanks,
 Inki Dae



 Got it. Please post it again.


 --
 With warm regards,
 Sachin



 --
 With warm regards,
 Sachin


Re: [Linaro-mm-sig] [PATCH 6/7] reservation: cross-device reservation support

2013-02-03 Thread Inki Dae
 +/**
 + * ticket_commit - commit a reservation with a new fence
 + * @ticket:[in]the reservation_ticket returned by
 + * ticket_reserve
 + * @entries:   [in]a linked list of struct reservation_entry
 + * @fence: [in]the fence that indicates completion
 + *
 + * This function will call reservation_ticket_fini, no need
 + * to do it manually.
 + *
 + * This function should be called after a hardware command submission is
 + * completed successfully. The fence is used to indicate completion of
 + * those commands.
 + */
 +void
 +ticket_commit(struct reservation_ticket *ticket,
 + struct list_head *entries, struct fence *fence)
 +{
 +   struct list_head *cur;
 +
 +   if (list_empty(entries))
 +   return;
 +
 +   if (WARN_ON(!fence)) {
 +   ticket_backoff(ticket, entries);
 +   return;
 +   }
 +
 +   list_for_each(cur, entries) {
 +   struct reservation_object *bo;
 +   bool shared;
 +
 +   reservation_entry_get(cur, &bo, &shared);
 +
 +   if (!shared) {
 +   int i;
 +   for (i = 0; i < bo->fence_shared_count; ++i) {
 +   fence_put(bo->fence_shared[i]);
 +   bo->fence_shared[i] = NULL;
 +   }
 +   bo->fence_shared_count = 0;
 +   if (bo->fence_excl)
 +   fence_put(bo->fence_excl);
 +
 +   bo->fence_excl = fence;
 +   } else {
 +   if (WARN_ON(bo->fence_shared_count >=
 +   ARRAY_SIZE(bo->fence_shared))) {
 +   mutex_unreserve_unlock(bo->lock);
 +   continue;
 +   }
 +
 +   bo->fence_shared[bo->fence_shared_count++] = fence;
 +   }

Hi,

I got some questions about fence_excl and fence_shared. In the above
code, if bo->fence_excl is not NULL then it puts bo->fence_excl and
sets a new fence to it. This seems to mean that whoever committed the
new fence wants to access the given dmabuf exclusively, even if
someone else is already accessing it.

On the other hand, in the fence_shared case, someone wants to access
that dmabuf non-exclusively, so the given dmabuf could be accessed by
two or more devices. So I guess that fence_excl could be used for
write access (which may need buffer sync, i.e. blocking) and
fence_shared for read access (which may not need buffer sync). I'm not
sure that I understand these two things correctly, so could you please
give me more comments on them?

Thanks,
Inki Dae

 +   fence_get(fence);
 +
 +   mutex_unreserve_unlock(bo->lock);
 +   }
 +   reservation_ticket_fini(ticket);
 +}
 +EXPORT_SYMBOL(ticket_commit);


RE: [PATCH 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-01 Thread Inki Dae


 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Sachin Kamat
 Sent: Friday, February 01, 2013 8:13 PM
 To: Sylwester Nawrocki
 Cc: Inki Dae; Kukjin Kim; Sylwester Nawrocki; linux-media@vger.kernel.org;
 dri-de...@lists.freedesktop.org; devicetree-disc...@lists.ozlabs.org;
 patc...@linaro.org
 Subject: Re: [PATCH 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
  In any case please let me know the final preferred one so that I can
  update the code send the revised patches.
 
  The version with SoC name embedded in it seems most reliable and correct
  to me.
 
   compatible = "samsung,exynos3110-fimg-2d" /* for Exynos3110 (S5PC110,
  S5PV210), Exynos4210 */
   compatible = "samsung,exynos4212-fimg-2d" /* for Exynos4212, Exynos4412 */
 
 Looks good to me.
 
 Inki, Kukjin, please let us know your opinion so that we can freeze
 this. Also please suggest the SoC name for Exynos5 (5250?).
 

How about using like below?
compatible = "samsung,exynos4x12-fimg-2d" /* for Exynos4212,
Exynos4412 */

It looks odd to use "samsung,exynos4212-fimg-2d" saying that this ip is for
exynos4212 and exynos4412.


 --
 With warm regards,
 Sachin


RE: [PATCH 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-01 Thread Inki Dae


 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Sachin Kamat
 Sent: Friday, February 01, 2013 8:40 PM
 To: Inki Dae
 Cc: Sylwester Nawrocki; Kukjin Kim; Sylwester Nawrocki; linux-
 me...@vger.kernel.org; dri-de...@lists.freedesktop.org; devicetree-
 disc...@lists.ozlabs.org; patc...@linaro.org
 Subject: Re: [PATCH 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
 On 1 February 2013 17:02, Inki Dae inki@samsung.com wrote:
 
   How about using like below?
   compatible = "samsung,exynos4x12-fimg-2d" /* for Exynos4212,
   Exynos4412 */
   It looks odd to use "samsung,exynos4212-fimg-2d" saying that this ip is
   for exynos4212 and exynos4412.
 
 AFAIK, compatible strings are not supposed to have any wildcard
characters.
 Compatible string should suggest the first SoC that contained this IP.
 Hence IMO 4212 is OK.
 

Got it. Please post it again.

 
 --
 With warm regards,
 Sachin


Re: [PATCH 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-02-01 Thread Inki Dae




On Feb 1, 2013, at 8:52 PM, Inki Dae inki@samsung.com wrote:

 
 
 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Sachin Kamat
 Sent: Friday, February 01, 2013 8:40 PM
 To: Inki Dae
 Cc: Sylwester Nawrocki; Kukjin Kim; Sylwester Nawrocki; linux-
 me...@vger.kernel.org; dri-de...@lists.freedesktop.org; devicetree-
 disc...@lists.ozlabs.org; patc...@linaro.org
 Subject: Re: [PATCH 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
 On 1 February 2013 17:02, Inki Dae inki@samsung.com wrote:
 
  How about using like below?
  compatible = "samsung,exynos4x12-fimg-2d" /* for Exynos4212,
  Exynos4412 */
  It looks odd to use "samsung,exynos4212-fimg-2d" saying that this ip is
  for exynos4212 and exynos4412.
 
 AFAIK, compatible strings are not supposed to have any wildcard
 characters.
 Compatible string should suggest the first SoC that contained this IP.
 Hence IMO 4212 is OK.
 

Oops, one more thing. AFAIK Exynos4210 also has the fimg-2d ip. In this case, we 
should use "samsung,exynos4210-fimg-2d" as the compatible string and add it to 
exynos4210.dtsi?
And please check if the exynos4212 and 4412 SoCs have the same fimg-2d ip. If it's 
different, we might need to add an ip version property or compatible string to 
each dtsi file to identify the ip version.

Sorry but give me your opinions.

Thanks,
Inki Dae


 
 Got it. Please post it again.
 
 
 --
 With warm regards,
 Sachin


Re: [Linaro-mm-sig] [PATCH 4/7] fence: dma-buf cross-device synchronization (v11)

2013-01-31 Thread Inki Dae
Hi,

below is my opinion.

 +struct fence;
 +struct fence_ops;
 +struct fence_cb;
 +
 +/**
 + * struct fence - software synchronization primitive
 + * @refcount: refcount for this fence
 + * @ops: fence_ops associated with this fence
 + * @cb_list: list of all callbacks to call
 + * @lock: spin_lock_irqsave used for locking
 + * @priv: fence specific private data
 + * @flags: A mask of FENCE_FLAG_* defined below
 + *
 + * the flags member must be manipulated and read using the appropriate
 + * atomic ops (bit_*), so taking the spinlock will not be needed most
 + * of the time.
 + *
 + * FENCE_FLAG_SIGNALED_BIT - fence is already signaled
 + * FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called*
 + * FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the
 + * implementer of the fence for its own purposes. Can be used in different
 + * ways by different fence implementers, so do not rely on this.
 + *
 + * *) Since atomic bitops are used, this is not guaranteed to be the case.
 + * Particularly, if the bit was set, but fence_signal was called right
 + * before this bit was set, it would have been able to set the
 + * FENCE_FLAG_SIGNALED_BIT, before enable_signaling was called.
 + * Adding a check for FENCE_FLAG_SIGNALED_BIT after setting
 + * FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that
 + * after fence_signal was called, any enable_signaling call will have either
 + * been completed, or never called at all.
 + */
 +struct fence {
 +   struct kref refcount;
 +   const struct fence_ops *ops;
 +   struct list_head cb_list;
 +   spinlock_t *lock;
 +   unsigned context, seqno;
 +   unsigned long flags;
 +};
 +
 +enum fence_flag_bits {
 +   FENCE_FLAG_SIGNALED_BIT,
 +   FENCE_FLAG_ENABLE_SIGNAL_BIT,
 +   FENCE_FLAG_USER_BITS, /* must always be last member */
 +};
 +

It seems like this fence framework needs read/write flags.
In the case of two read operations, one might wait for the other. But
the other is just a read operation, so we don't need to wait for it.
Shouldn't the fence-wait request be ignored? In this case, I think it's
enough to consider only the write operation.

For this, you could add the following,

enum fence_flag_bits {
...
FENCE_FLAG_ACCESS_READ_BIT,
FENCE_FLAG_ACCESS_WRITE_BIT,
...
};

And the producer could call fence_init() like below,
__fence_init(..., FENCE_FLAG_ACCESS_WRITE_BIT, ...);

With this, fence->flags has FENCE_FLAG_ACCESS_WRITE_BIT for a write
operation, and then the other sides (read or write operations) would wait for
the write operation to complete.
And likewise a consumer calls that function with FENCE_FLAG_ACCESS_READ_BIT
so that other consumers could ignore the fence-wait for any read
operations.

Thanks,
Inki Dae


Re: [Linaro-mm-sig] [PATCH 4/7] fence: dma-buf cross-device synchronization (v11)

2013-01-31 Thread Inki Dae
2013/1/31 Daniel Vetter dan...@ffwll.ch:
 On Thu, Jan 31, 2013 at 06:32:15PM +0900, Inki Dae wrote:
 Hi,

 below is my opinion.

  +struct fence;
  +struct fence_ops;
  +struct fence_cb;
  +
  +/**
  + * struct fence - software synchronization primitive
  + * @refcount: refcount for this fence
  + * @ops: fence_ops associated with this fence
  + * @cb_list: list of all callbacks to call
  + * @lock: spin_lock_irqsave used for locking
  + * @priv: fence specific private data
  + * @flags: A mask of FENCE_FLAG_* defined below
  + *
  + * the flags member must be manipulated and read using the appropriate
  + * atomic ops (bit_*), so taking the spinlock will not be needed most
  + * of the time.
  + *
  + * FENCE_FLAG_SIGNALED_BIT - fence is already signaled
  + * FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called*
  + * FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the
  + * implementer of the fence for its own purposes. Can be used in different
  + * ways by different fence implementers, so do not rely on this.
  + *
  + * *) Since atomic bitops are used, this is not guaranteed to be the case.
  + * Particularly, if the bit was set, but fence_signal was called right
  + * before this bit was set, it would have been able to set the
  + * FENCE_FLAG_SIGNALED_BIT, before enable_signaling was called.
  + * Adding a check for FENCE_FLAG_SIGNALED_BIT after setting
  + * FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that
  + * after fence_signal was called, any enable_signaling call will have 
  either
  + * been completed, or never called at all.
  + */
  +struct fence {
  +   struct kref refcount;
  +   const struct fence_ops *ops;
  +   struct list_head cb_list;
  +   spinlock_t *lock;
  +   unsigned context, seqno;
  +   unsigned long flags;
  +};
  +
  +enum fence_flag_bits {
  +   FENCE_FLAG_SIGNALED_BIT,
  +   FENCE_FLAG_ENABLE_SIGNAL_BIT,
  +   FENCE_FLAG_USER_BITS, /* must always be last member */
  +};
  +

 It seems like that this fence framework need to add read/write flags.
  In the case of two read operations, one might wait for the other. But
  the other is just a read operation, so we don't need to wait for it.
  Shouldn't the fence-wait request be ignored? In this case, I think it's
  enough to consider only the write operation.

 For this, you could add the following,

 enum fence_flag_bits {
 ...
 FENCE_FLAG_ACCESS_READ_BIT,
 FENCE_FLAG_ACCESS_WRITE_BIT,
 ...
 };

 And the producer could call fence_init() like below,
 __fence_init(..., FENCE_FLAG_ACCESS_WRITE_BIT,...);

  With this, fence->flags has FENCE_FLAG_ACCESS_WRITE_BIT as write
  operation and then other sides (read or write operation) would wait for
  the write operation completion.
  And also consumer calls that function with FENCE_FLAG_ACCESS_READ_BIT
  so that other consumers could ignore the fence-wait to any read
  operations.

 Fences here match more to the sync-points concept from the android stuff.
 The idea is that they only signal when a hw operation completes.

 Synchronization integration happens at the dma_buf level, where you can
 specify whether the new operation you're doing is exclusive (which means
 that you need to wait for all previous operations to complete), i.e. a
 write. Or whether the operation is non-excluses (i.e. just reading) in
 which case you only need to wait for any still outstanding exclusive
 fences attached to the dma_buf. But you _can_ attach more than one
 non-exclusive fence to a dma_buf at the same time, and so e.g. read a
 buffer objects from different engines concurrently.

 There's been some talk whether we also need a non-exclusive write
 attachment (i.e. allow multiple concurrent writers), but I don't yet fully
 understand the use-case.

 In short the proposed patches can do what you want to do, it's just that
 read/write access isn't part of the fences, but how you attach fences to
 dma_bufs.


Thanks for comments, Maarten and Daniel.

I think I understand your comments, but I don't fully understand the
dma-fence mechanism yet, so I'd appreciate some more advice on it. In
our case, I'm applying dma-fence to the mali (3d gpu) driver as
producer and the exynos drm (display controller) driver as consumer.

And the sequence is as the following:
In case of producer,
1. call fence_wait to wait for the dma access completion of others.
2. And then the producer creates a fence and a new reservation entry.
3. And then it sets the given dmabuf's resv(reservation_object) to the
new reservation entry.
4. And then it adds the reservation entry to entries list.
5. And then it sets the fence to all dmabufs of the entries list.
Actually, this work is to set the fence to the reservation_object of
each dmabuf.
6. And then the producer's dma start.
7. Finally, when the dma start is completed, we get the entries list
from a 3d job command(in case of mali core, pp job) and call
fence_signal

RE: [PATCH 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-01-31 Thread Inki Dae
Hi Kukjin,

 -Original Message-
 From: linux-media-ow...@vger.kernel.org [mailto:linux-media-
 ow...@vger.kernel.org] On Behalf Of Kukjin Kim
 Sent: Friday, February 01, 2013 9:15 AM
 To: 'Sylwester Nawrocki'; 'Inki Dae'
 Cc: 'Sachin Kamat'; linux-media@vger.kernel.org; dri-
 de...@lists.freedesktop.org; devicetree-disc...@lists.ozlabs.org;
 patc...@linaro.org; s.nawro...@samsung.com
 Subject: RE: [PATCH 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
 Sylwester Nawrocki wrote:
 
  Hi Inki,
 
 Hi Sylwester and Inki,
 
  On 01/31/2013 02:30 AM, Inki Dae wrote:
   -Original Message-
   From: Sylwester Nawrocki [mailto:sylvester.nawro...@gmail.com]
   Sent: Thursday, January 31, 2013 5:51 AM
   To: Inki Dae
   Cc: Sachin Kamat; linux-media@vger.kernel.org; dri-
   de...@lists.freedesktop.org; devicetree-disc...@lists.ozlabs.org;
   patc...@linaro.org; s.nawro...@samsung.com
   Subject: Re: [PATCH 2/2] drm/exynos: Add device tree based discovery
   support for G2D
  
   On 01/30/2013 09:50 AM, Inki Dae wrote:
   +static const struct of_device_id exynos_g2d_match[] = {
    +   { .compatible = "samsung,g2d-v41" },
  
    Not only Exynos5 but also Exynos4 has the G2D GPU, and the DRM-based
    G2D driver should support all Exynos SoCs. How about using
    "samsung,exynos5-g2d" instead and adding a new property 'version' to
    identify the IP version more reliably? With this, we could know both
    the SoC and its G2D IP version. The version property could have '0x14'
    or other values. And please add descriptions to the DT document.
  
   Err no. Are you suggesting using samsung,exynos5-g2d compatible
  string
   for Exynos4 specific IPs ? This would not be correct, and you still
 can
  
    I assumed the version 'v41' is the IP for the Exynos5 SoC. So if this
    version means an Exynos4 SoC, then it should be "samsung,exynos4-g2d".
 
  Yes, v3.0 is implemented in the S5PC110 (Exynos3110) SoCs and
Exynos4210,
  V4.1 can be found in Exynos4212 and Exynos4412, if I'm not mistaken.
 
  So we could have:
 
   compatible = "samsung,exynos-g2d-3.0" /* for Exynos3110, Exynos4210 */
   compatible = "samsung,exynos-g2d-4.1" /* for Exynos4212, Exynos4412 */
 
  In my opinion, this is better than the latter because, as I said, using
  the IP version to identify the hardware is clearer.
 
 One more, how about following?
 
  compatible = "samsung,g2d-3.0"
  compatible = "samsung,g2d-4.1"
 

I think the compatible string should be considered case by case.

For example,
if compatible = "samsung,g2d-3.0" is added to exynos4210.dtsi, it'd be
reasonable. But what if that compatible string were added to exynos4.dtsi?
That case doesn't cover the Exynos4412 SoC, whose G2D is v4.1.

So shouldn't the compatible string at least include the SoC name, so that
it can be added to the proper dtsi file? And I'm not sure yet how the IP
version should be dealt with :( Is it really enough to know the IP version
implicitly (i.e. the exynos4412 string implicitly means that its G2D IP
version is v4.1, so its device driver refers to the necessary data through
of_device_id's data)?


 I think just "g2d" is enough. For example, we are using it for MFC like
 the following: compatible = "samsung,mfc-v6"
 
  or alternatively
 
   compatible = "samsung,exynos3110-g2d" /* for Exynos3110, Exynos4210 */
   compatible = "samsung,exynos4212-g2d" /* for Exynos4212, Exynos4412 */
 

So, IMO, this one is better than the first.

Thanks,
Inki Dae

 Thanks.
 
 - Kukjin
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-media in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-01-30 Thread Inki Dae
2013/1/25 Sachin Kamat sachin.ka...@linaro.org:
 From: Ajay Kumar ajaykumar...@samsung.com

 This patch adds device tree match table for Exynos G2D controller.

 Signed-off-by: Ajay Kumar ajaykumar...@samsung.com
 Signed-off-by: Sachin Kamat sachin.ka...@linaro.org
 ---
  drivers/gpu/drm/exynos/exynos_drm_g2d.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)

 diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
 b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 index ddcfb5d..d24b170 100644
 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
 @@ -19,6 +19,7 @@
  #include <linux/workqueue.h>
  #include <linux/dma-mapping.h>
  #include <linux/dma-attrs.h>
 +#include <linux/of.h>

  #include <drm/drmP.h>
  #include <drm/exynos_drm.h>
 @@ -1240,6 +1241,14 @@ static int g2d_resume(struct device *dev)

  static SIMPLE_DEV_PM_OPS(g2d_pm_ops, g2d_suspend, g2d_resume);

 +#ifdef CONFIG_OF
 +static const struct of_device_id exynos_g2d_match[] = {
 +   { .compatible = "samsung,g2d-v41" },

Not only Exynos5 but also Exynos4 has the G2D GPU, and the DRM-based G2D
driver should support all Exynos SoCs. How about using
"samsung,exynos5-g2d" instead and adding a new property 'version' to
identify the IP version more reliably? With this, we could know both the
SoC and its G2D IP version. The version property could have '0x14' or
other values. And please add descriptions to the DT document.

 +   {},
 +};
 +MODULE_DEVICE_TABLE(of, exynos_g2d_match);
 +#endif
 +
  struct platform_driver g2d_driver = {
 .probe  = g2d_probe,
 .remove = g2d_remove,
 @@ -1247,5 +1256,6 @@ struct platform_driver g2d_driver = {
 .name   = "s5p-g2d",
 .owner  = THIS_MODULE,
 .pm = g2d_pm_ops,
 +   .of_match_table = of_match_ptr(exynos_g2d_match),
 },
  };
 --
 1.7.4.1

 ___
 dri-devel mailing list
 dri-de...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 2/2] drm/exynos: Add device tree based discovery support for G2D

2013-01-30 Thread Inki Dae


 -Original Message-
 From: Sylwester Nawrocki [mailto:sylvester.nawro...@gmail.com]
 Sent: Thursday, January 31, 2013 5:51 AM
 To: Inki Dae
 Cc: Sachin Kamat; linux-media@vger.kernel.org; dri-
 de...@lists.freedesktop.org; devicetree-disc...@lists.ozlabs.org;
 patc...@linaro.org; s.nawro...@samsung.com
 Subject: Re: [PATCH 2/2] drm/exynos: Add device tree based discovery
 support for G2D
 
 On 01/30/2013 09:50 AM, Inki Dae wrote:
  +static const struct of_device_id exynos_g2d_match[] = {
  +   { .compatible = "samsung,g2d-v41" },
 
   Not only Exynos5 but also Exynos4 has the G2D GPU, and the DRM-based
   G2D driver should support all Exynos SoCs. How about using
   "samsung,exynos5-g2d" instead and adding a new property 'version' to
   identify the IP version more reliably? With this, we could know both
   the SoC and its G2D IP version. The version property could have '0x14'
   or other values. And please add descriptions to the DT document.
 
 Err no. Are you suggesting using samsung,exynos5-g2d compatible string
 for Exynos4 specific IPs ? This would not be correct, and you still can

I assumed the version 'v41' is the IP for the Exynos5 SoC. So if this
version means an Exynos4 SoC, then it should be "samsung,exynos4-g2d".

 match the driver with multiple different revisions of the IP and associate
 any required driver's private data with each corresponding compatible
 property.
 

Right, and for why I prefer to use a version property instead of an
embedded version string, you can refer to my comment in the
"drm/exynos: Get HDMI version from device tree" email thread.

 Perhaps it would make more sense to include the SoCs name in the
 compatible
 string, e.g. samsung,exynos-g2d-v41, but appending revision of the IP
 seems acceptable to me. The revisions appear to be well documented and
 it's
 more or less clear which one corresponds to which SoC.
 
 --
 
 Thanks,
 Sylwester



Re: [Linaro-mm-sig] [PATCH 5/7] seqno-fence: Hardware dma-buf implementation of fencing (v4)

2013-01-16 Thread Inki Dae
2013/1/16 Maarten Lankhorst maarten.lankho...@canonical.com:
 Op 16-01-13 07:28, Inki Dae schreef:
 2013/1/15 Maarten Lankhorst m.b.lankho...@gmail.com:
 This type of fence can be used with hardware synchronization for simple
 hardware that can block execution until the condition
 (dma_buf[offset] - value) = 0 has been met.

 A software fallback still has to be provided in case the fence is used
 with a device that doesn't support this mechanism. It is useful to expose
 this for graphics cards that have an op to support this.

 Some cards like i915 can export those, but don't have an option to wait,
 so they need the software fallback.

 I extended the original patch by Rob Clark.

 v1: Original
 v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
 not much was left of the file. Lots of documentation added.
 v3: Use fence_ops instead of custom callbacks. Moved to own file
 to avoid circular dependency between dma-buf.h and fence.h
 v4: Add spinlock pointer to seqno_fence_init

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
 ---
  Documentation/DocBook/device-drivers.tmpl |   1 +
  drivers/base/fence.c  |  38 +++
  include/linux/seqno-fence.h   | 105 
 ++
  3 files changed, 144 insertions(+)
  create mode 100644 include/linux/seqno-fence.h

 diff --git a/Documentation/DocBook/device-drivers.tmpl 
 b/Documentation/DocBook/device-drivers.tmpl
 index 6f53fc0..ad14396 100644
 --- a/Documentation/DocBook/device-drivers.tmpl
 +++ b/Documentation/DocBook/device-drivers.tmpl
 @@ -128,6 +128,7 @@ X!Edrivers/base/interface.c
  !Edrivers/base/dma-buf.c
  !Edrivers/base/fence.c
  !Iinclude/linux/fence.h
 +!Iinclude/linux/seqno-fence.h
  !Edrivers/base/dma-coherent.c
  !Edrivers/base/dma-mapping.c
   /sect1
 diff --git a/drivers/base/fence.c b/drivers/base/fence.c
 index 28e5ffd..1d3f29c 100644
 --- a/drivers/base/fence.c
 +++ b/drivers/base/fence.c
 @@ -24,6 +24,7 @@
  #include <linux/slab.h>
  #include <linux/export.h>
  #include <linux/fence.h>
 +#include <linux/seqno-fence.h>

  atomic_t fence_context_counter = ATOMIC_INIT(0);
  EXPORT_SYMBOL(fence_context_counter);
 @@ -284,3 +285,40 @@ out:
 return ret;
  }
  EXPORT_SYMBOL(fence_default_wait);
 +
 +static bool seqno_enable_signaling(struct fence *fence)
 +{
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->enable_signaling(fence);
 +}
 +
 +static bool seqno_signaled(struct fence *fence)
 +{
 +   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
 +   return seqno_fence->ops->signaled &&
 +  seqno_fence->ops->signaled(fence);
 +}
 +
 +static void seqno_release(struct fence *fence)
 +{
 +   struct seqno_fence *f = to_seqno_fence(fence);
 +
 +   dma_buf_put(f->sync_buf);
 +   if (f->ops->release)
 +   f->ops->release(fence);
 +   else
 +   kfree(f);
 +}
 +
 +static long seqno_wait(struct fence *fence, bool intr, signed long timeout)
 +{
 +   struct seqno_fence *f = to_seqno_fence(fence);
 +   return f->ops->wait(fence, intr, timeout);
 +}
 +
 +const struct fence_ops seqno_fence_ops = {
 +   .enable_signaling = seqno_enable_signaling,
 +   .signaled = seqno_signaled,
 +   .wait = seqno_wait,
 +   .release = seqno_release
 +};
 +EXPORT_SYMBOL_GPL(seqno_fence_ops);
 diff --git a/include/linux/seqno-fence.h b/include/linux/seqno-fence.h
 new file mode 100644
 index 000..603adc0
 --- /dev/null
 +++ b/include/linux/seqno-fence.h
 @@ -0,0 +1,105 @@
 +/*
 + * seqno-fence, using a dma-buf to synchronize fencing
 + *
 + * Copyright (C) 2012 Texas Instruments
 + * Copyright (C) 2012 Canonical Ltd
 + * Authors:
 + *   Rob Clark rob.cl...@linaro.org
 + *   Maarten Lankhorst maarten.lankho...@canonical.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License version 2 as 
 published by
 + * the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License 
 for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program.  If not, see http://www.gnu.org/licenses/.
 + */
 +
 +#ifndef __LINUX_SEQNO_FENCE_H
 +#define __LINUX_SEQNO_FENCE_H
 +
 +#include <linux/fence.h>
 +#include <linux/dma-buf.h>
 +
 +struct seqno_fence {
 +   struct fence base;
 +
 +   const struct fence_ops *ops;
 +   struct dma_buf *sync_buf;
 +   uint32_t seqno_ofs;
 +};
 Hi maarten,

 I'm applying dma-fence v11 and seqno-fence v4 to exynos drm and have
 some proposals.

 The above seqno_fence structure has only one dmabuf. Shouldn't it have
 multiple dmabufs? For example, in the case of a drm driver, when a
 pageflip is requested, one

Re: [Linaro-mm-sig] [PATCH 5/7] seqno-fence: Hardware dma-buf implementation of fencing (v4)

2013-01-15 Thread Inki Dae
 be
synchronized with other devices. Below is a simple structure for it:

struct seqno_fence_dmabuf {
struct list_headlist;
intid;
struct dmabuf  *sync_buf;
uint32_t   seqno_ops;
uint32_t   seqno;
};

The member id could be used to identify which device the sync_buf is going
to be accessed by. In the drm case, one framebuffer could be accessed by
more than one device: one is the display controller and another is the
HDMI controller. So id would hold the crtc number.

And seqno_fence structure could be defined like below,

struct seqno_fence {
struct list_headsync_buf_list;
struct fence  base;
const struct fence_ops *ops;
};

In addition, I have implemented a fence-helper framework for SW sync as a
WIP, and below are the interfaces for it:

struct fence_helper {
struct list_headentries;
struct reservation_ticket   ticket;
struct seqno_fence  *sf;
spinlock_t lock;
void  *priv;
};

int fence_helper_init(struct fence_helper *fh, void *priv, void
(*release)(struct fence *fence));
- This function is called at driver open so that each process-unique
context gets a new seqno_fence instance. It just calls seqno_fence_init,
initializes the entries list, and sets a device-specific fence release
callback.

bool fence_helper_check_sync_buf(struct fence_helper *fh, struct
dma_buf *sync_buf, int id);
- This function is called before DMA is started and checks whether the
same sync_bufs have already been committed to the reservation_object,
bo->fence_shared[n]. And id could be used to identify which device the
sync_buf is going to be accessed by.

int fence_helper_add_sync_buf(struct fence_helper *fh, struct dma_buf
*sync_buf, int id);
- This function is called if fence_helper_check_sync_buf() returns true.
It adds the sync_buf, wrapped in a seqno_fence_dmabuf structure, to the
seqno_fence's sync_buf_list. With this call, one seqno_fence instance can
have multiple sync_bufs. At this point, a reference to the sync_buf is
taken.

void fence_helper_del_sync_buf(struct fence_helper *fh, int id);
- This function is called if some operation fails after the
fence_helper_add_sync_buf call, to release the relevant resources.

int fence_helper_init_reservation_entry(struct fence_helper *fh,
struct dma_buf *dmabuf, bool shared, int id);
- This function is called after the fence_helper_add_sync_buf call and
calls reservation_entry_init to set the reservation object of the sync_buf
on a new reservation_entry object. The new reservation_entry is then added
to fh->entries to track all sync_bufs this device is going to access.

void fence_helper_fini_reservation_entry(struct fence_helper *fh, int id);
- This function is called if some operation fails after the
fence_helper_init_reservation_entry call, to release the relevant
resources.

int fence_helper_ticket_reserve(struct fence_helper *fh, int id);
- This function is called after the fence_helper_init_reservation_entry
call and calls ticket_reserve to reserve a ticket (locking each
reservation entry in fh->entries).

void fence_helper_ticket_commit(struct fence_helper *fh, int id);
- This function is called after fence_helper_ticket_reserve() to commit
this device's fence to the reservation_objects of all sync_bufs, and it
unlocks each reservation entry in fh->entries. After that, once other
devices try to access these buffers, they will be blocked.

int fence_helper_wait(struct fence_helper *fh, struct dma_buf *dmabuf,
bool intr);
- This function is called before fence_helper_add_sync_buf() to wait for a
signal from other devices.

int fence_helper_signal(struct fence_helper *fh, int id);
- This function is called by the device's interrupt handler, or wherever
DMA access to the buffer has completed. It calls fence_signal() with each
fence registered with each reservation object in fh->entries to notify the
other devices of DMA completion. At that point, the blocked devices wake
up and proceed to their next step.

In more detail, this function also does the following:
- deletes each reservation entry in fh->entries;
- releases each seqno_fence_dmabuf object in the seqno_fence's
sync_buf_list and calls dma_buf_put() to drop the reference to the dmabuf.


The fence-helper framework is still a WIP, so I may be missing some
points. If you are OK with it, I'd like to post it as an RFC.

Thanks,
Inki Dae


 +
 +extern const struct fence_ops seqno_fence_ops;
 +
 +/**
 + * to_seqno_fence - cast a fence to a seqno_fence
 + * @fence: fence to cast to a seqno_fence
 + *
 + * Returns NULL if the fence is not a seqno_fence,
 + * or the seqno_fence otherwise.
 + */
 +static inline struct seqno_fence *
 +to_seqno_fence(struct fence *fence)
 +{
 +   if (fence->ops != &seqno_fence_ops)
 +   return NULL;
 +
 +   return container_of(fence, struct seqno_fence, base);
 +}

Re: [PATCH 2/2] [RFC] video: display: Adding frame related ops to MIPI DSI video source struct

2013-01-09 Thread Inki Dae
2013/1/10 Laurent Pinchart laurent.pinch...@ideasonboard.com:
 Hi Vikas,

 Thank you for the patch.

 On Friday 04 January 2013 10:24:04 Vikas Sajjan wrote:
 On 3 January 2013 16:29, Tomasz Figa t.f...@samsung.com wrote:
  On Wednesday 02 of January 2013 18:47:22 Vikas C Sajjan wrote:
  From: Vikas Sajjan vikas.saj...@linaro.org
 
  Signed-off-by: Vikas Sajjan vikas.saj...@linaro.org
  ---
 
   include/video/display.h |6 ++
   1 file changed, 6 insertions(+)
 
  diff --git a/include/video/display.h b/include/video/display.h
  index b639fd0..fb2f437 100644
  --- a/include/video/display.h
  +++ b/include/video/display.h
  @@ -117,6 +117,12 @@ struct dsi_video_source_ops {
 
void (*enable_hs)(struct video_source *src, bool enable);
 
  + /* frame related */
  + int (*get_frame_done)(struct video_source *src);
  + int (*clear_frame_done)(struct video_source *src);
  + int (*set_early_blank_mode)(struct video_source *src, int power);
  + int (*set_blank_mode)(struct video_source *src, int power);
  +
 
  I'm not sure if all those extra ops are needed in any way.
 
  Looking and Exynos MIPI DSIM driver, set_blank_mode is handling only
  FB_BLANK_UNBLANK status, which basically equals to the already existing
  enable operation, while set_early_blank mode handles only
  FB_BLANK_POWERDOWN, being equal to disable callback.

 Right, exynos_mipi_dsi_blank_mode() only supports FB_BLANK_UNBLANK as
 of now, but FB_BLANK_NORMAL will be supported in the future.
 Even if not for Exynos, I think it will be needed for other SoCs which
 support FB_BLANK_UNBLANK and FB_BLANK_NORMAL.

 Could you please explain in a bit more detail what the
 set_early_blank_mode and set_blank_mode operations do?

  Both get_frame_done and clear_frame_done do not look at anything used at
  the moment and if frame done status monitoring will be ever needed, I
  think a better way should be implemented.

 You are right; as of now the Exynos MIPI DSI panels are NOT using these
 callbacks, but as you mentioned, we will need frame done status
 monitoring anyway, so I included these callbacks here. I will check
 whether we can implement a better method.

 Do you expect the entity drivers (and in particular the panel drivers) to
 require frame done notification ? If so, could you explain your use case(s) ?


Hi Laurent,

As you know, there are two types of MIPI-DSI based LCD panels: RGB mode
and CPU mode. A CPU-mode LCD panel has its own framebuffer internally, and
the image in that framebuffer is transferred onto the LCD panel at 60Hz by
the panel itself. But there is something we should consider here: with CPU
mode, the display controller doesn't transfer image data to the MIPI-DSI
controller by itself. So we have to set the display controller's trigger
bit to 1 to do that, and also check whether transmission of the data in
the framebuffer to the LCD panel is done, to avoid tearing and a conflict
issue (A) between read and write operations, like below:

lcd_panel_frame_done_interrupt_handler()
{
...
if (mipi-dsi frame done)
trigger display controller;
...
}

A. the issue that the LCD panel can access its own framebuffer while new
data from the MIPI-DSI controller is being written into that framebuffer.

But I think there might be a better way to avoid such things.

Thanks,
Inki Dae

 --
 Regards,

 Laurent Pinchart



Re: [RFC 0/5] Generic panel framework

2012-10-20 Thread Inki Dae
 above
 I'll buy you a beer at the next conference we will both attend. If you think
 the proposed solution is too complex, or too simple, I'm all ears. I
 personally already feel that we might need something even more generic to
 support other kinds of external devices connected to display controllers, such
 as external DSI to HDMI converters for instance. Some kind of video entity
 exposing abstract operations like the panels do would make sense, in which
 case panels would inherit from that video entity.

 Speaking of conferences, I will attend the KS/LPC in San Diego in a bit more
 than a week, and would be happy to discuss this topic face to face there.

 Laurent Pinchart (5):
   video: Add generic display panel core
   video: panel: Add dummy panel support
   video: panel: Add MIPI DBI bus support
   video: panel: Add R61505 panel support
   video: panel: Add R61517 panel support

How about using a 'buses' directory instead of 'panel' and adding
'panel' under that, like below?
video/buses: display bus frameworks such as MIPI-DBI/DSI and eDP are
placed here.
video/buses/panel: panel drivers based on the display bus drivers are
placed here.

I think MIPI-DBI (Display Bus Interface), DSI (Display Serial Interface)
and eDP are the bus interfaces for display controllers such as DISPC
(OMAP SoC) and FIMD (Exynos SoC).

Thanks,
Inki Dae


  drivers/video/Kconfig  |1 +
  drivers/video/Makefile |1 +
  drivers/video/panel/Kconfig|   37 +++
  drivers/video/panel/Makefile   |5 +
  drivers/video/panel/panel-dbi.c|  217 +++
  drivers/video/panel/panel-dummy.c  |  103 +++
  drivers/video/panel/panel-r61505.c |  520 
 
  drivers/video/panel/panel-r61517.c |  408 
  drivers/video/panel/panel.c|  269 +++
  include/video/panel-dbi.h  |   92 +++
  include/video/panel-dummy.h|   25 ++
  include/video/panel-r61505.h   |   27 ++
  include/video/panel-r61517.h   |   28 ++
  include/video/panel.h  |  111 
  14 files changed, 1844 insertions(+), 0 deletions(-)
  create mode 100644 drivers/video/panel/Kconfig
  create mode 100644 drivers/video/panel/Makefile
  create mode 100644 drivers/video/panel/panel-dbi.c
  create mode 100644 drivers/video/panel/panel-dummy.c
  create mode 100644 drivers/video/panel/panel-r61505.c
  create mode 100644 drivers/video/panel/panel-r61517.c
  create mode 100644 drivers/video/panel/panel.c
  create mode 100644 include/video/panel-dbi.h
  create mode 100644 include/video/panel-dummy.h
  create mode 100644 include/video/panel-r61505.h
  create mode 100644 include/video/panel-r61517.h
  create mode 100644 include/video/panel.h

 --
 Regards,

 Laurent Pinchart



Re: [RFC 0/5] Generic panel framework

2012-10-20 Thread Inki Dae
Hi Tomi,

2012/8/17 Tomi Valkeinen tomi.valkei...@ti.com:
 Hi,

 On Fri, 2012-08-17 at 02:49 +0200, Laurent Pinchart wrote:

 I will appreciate all reviews, comments, criticisms, ideas, remarks, ... If

 Oookay, where to start... ;)

 A few cosmetic/general comments first.

 I find the file naming a bit strange. You have panel.c, which is the
 core framework, panel-dbi.c, which is the DBI bus, panel-r61517.c, which
 is driver for r61517 panel...

 Perhaps something in this direction (in order): panel-core.c,
 mipi-dbi-bus.c, panel-r61517.c? And we probably end up with quite a lot
 of panel drivers, perhaps we should already divide these into separate
 directories, and then we wouldn't need to prefix each panel with
 panel- at all.

 ---

 Should we aim for DT only solution from the start? DT is the direction
 we are going, and I feel the older platform data stuff would be
 deprecated soon.

 ---

 Something missing from the intro is how this whole thing should be used.
 It doesn't help if we know how to turn on the panel, we also need to
 display something on it =). So I think some kind of diagram/example of
 how, say, drm would use this thing, and also how the SoC specific DBI
 bus driver would be done, would clarify things.

 ---

 We have discussed face to face about the different hardware setups and
 scenarios that we should support, but I'll list some of them here for
 others:

 1) We need to support chains of external display chips and panels. A
 simple example is a chip that takes DSI in, and outputs DPI. In that
 case we'd have a chain of SoC - DSI2DPI - DPI panel.

 In final products I think two external devices is the maximum (at least
 I've never seen three devices in a row), but in theory and in
 development environments the chain can be arbitrarily long. Also the
 connections are not necessarily 1-to-1, but a device can take one input
 while it has two outputs, or a device can take two inputs.

 Now, I think two external devices is a must requirement. I'm not sure if
 supporting more is an important requirement. However, if we support two
 devices, it could be that it's trivial to change the framework to
 support n devices.

 2) Panels and display chips are all but standard. They very often have
 their own sequences how to do things, have bugs, or implement some
 feature in slightly different way than some other panel. This is why the
 panel driver should be able to control or define the way things happen.

 As an example, Sharp LQ043T1DG01 panel
 (www.sharpsme.com/download/LQ043T1DG01-SP-072106pdf). It is enabled with
 the following sequence:

 - Enable VCC and AVDD regulators
 - Wait min 50ms
 - Enable full video stream (pck, syncs, pixels) from SoC
 - Wait min 0.5ms
 - Set DISP GPIO, which turns on the display panel

 Here we could split the enabling of panel to two parts, prepare (in this
 case starts regulators and waits 50ms) and finish (wait 0.5ms and set
 DISP GPIO), and the upper layer would start the video stream in between.

 I realize this could be done with the PANEL_ENABLE_* levels in your RFC,
 but I don't think the concepts quite match:

 - PANEL_ENABLE_BLANK level is needed for smart panels, as we need to
 configure them and send the initial frame at that operating level. With

Does "smart panel" mean the command mode way (the same as CPU mode)? Such
a panel includes a framebuffer internally and needs triggering from the
display controller to update a new frame onto that internal framebuffer. I
think we also need this trigger interface.

Thanks,
Inki Dae


 dummy panels there's really no such level, there's just one enable
 sequence that is always done right away.

 - I find waiting at the beginning of a function very ugly (what are we
 waiting for?) and we'd need that when changing the panel to
 PANEL_ENABLE_ON level.

 - It's still limited if the panel is a stranger one (see following
 example).

 Consider the following theoretical panel enable example, taken to absurd
 level just to show the general problem:

 - Enable regulators
 - Enable video stream
 - Wait 50ms
 - Disable video stream
 - Set enable GPIO
 - Enable video stream

 This one would be rather impossible with the upper layer handling the
 enabling of the video stream. Thus I see that the panel driver needs to
 control the sequences, and the Sharp panel driver's enable would look
 something like:

 regulator_enable(...);
 sleep();
 dpi_enable_video();
 sleep();
 gpio_set(..);

 Note that even with this model we still need the PANEL_ENABLE levels you
 have.

 ---

 I'm not sure I understand the panel unload problem you mentioned. Nobody
 should have direct references to the panel functions, so there shouldn't
 be any automatic references that would prevent module unloading. So when
 the user does rmmod panel-mypanel, the panel driver's remove will be
 called. It'll unregister itself from the panel framework, which causes
 notifications and the display driver will stop using the panel. After
 that nobody has pointers to the panel

Re: [RFC 0/5] Generic panel framework

2012-10-20 Thread Inki Dae
Correcting some typos, sorry for that.

2012/10/20 Inki Dae inki@samsung.com:
 Hi Laurent. sorry for being late.

 2012/8/17 Laurent Pinchart laurent.pinch...@ideasonboard.com:
 Hi everybody,

 While working on DT bindings for the Renesas Mobile SoC display controller
 (a.k.a. LCDC) I quickly realized that display panel implementation based on
 board code callbacks would need to be replaced by a driver-based panel
 framework.

 Several driver-based panel support solution already exist in the kernel.

 - The LCD device class is implemented in drivers/video/backlight/lcd.c and
   exposes a kernel API in include/linux/lcd.h. That API is tied to the FBDEV
   API for historical reason, uses board code callback for reset and power
   management, and doesn't include support for standard features available in
   today's smart panels.

 - OMAP2+ based systems use custom panel drivers available in
   drivers/video/omap2/displays. Those drivers are based on OMAP DSS (display
   controller) specific APIs.

 - Similarly, Exynos based systems use custom panel drivers available in
   drivers/video/exynos. Only a single driver (s6e8ax0) is currently 
 available.
   That driver is based on Exynos display controller specific APIs and on the
   LCD device class API.

 I've brought up the issue with Tomi Valkeinen (OMAP DSS maintainer) and 
 Marcus
 Lorentzon (working on panel support for ST/Linaro), and we agreed that a
 generic panel framework for display devices is needed. These patches 
 implement
 a first proof of concept.

 One of the main reasons for creating a new panel framework instead of adding
 missing features to the LCD framework is to avoid being tied to the FBDEV
 framework. Panels will be used by DRM drivers as well, and their API should
   thus be subsystem-agnostic. Note that the panel framework uses the
   fb_videomode structure in its API; this will be replaced by a common video
 mode structure shared across subsystems (there's only so many hours per day).

 Panels, as used in these patches, are defined as physical devices combining a
 matrix of pixels and a controller capable of driving that matrix.

 Panel physical devices are registered as children of the control bus the 
 panel
 controller is connected to (depending on the panel type, we can find platform
 devices for dummy panels with no control bus, or I2C, SPI, DBI, DSI, ...
 devices). The generic panel framework matches registered panel devices with
 panel drivers and calls the panel driver's probe() method, as done by other
 device classes in the kernel. The driver probe() method is responsible for
 instantiating a struct panel instance and registering it with the generic
 panel framework.

 Display drivers are panel consumers. They register a panel notifier with the
 framework, which then calls the notifier when a matching panel is registered.
 The reason for this asynchronous mode of operation, compared to how drivers
 acquire regulator or clock resources, is that the panel can use resources
 provided by the display driver. For instance a panel can be a child of the 
 DBI
 or DSI bus controlled by the display device, or use a clock provided by that
 device. We can't defer the display device probe until the panel is registered
 and also defer the panel device probe until the display is registered. As
 most display drivers need to handle output device hotplug (HDMI monitors for
 instance), handling panels through a notification system seemed to be the
 easiest solution.

 Note that this brings a different issue after registration, as display and
 panel drivers would take a reference to each other. Those circular references
 would make driver unloading impossible. I haven't found a good solution for
 that problem yet (hence the RFC state of those patches), and I would
 appreciate your input here. This might also be a hint that the framework
 design is wrong to start with. I guess I can't get everything right on the
 first try ;-)

 Getting hold of the panel is the most complex part. Once done, display 
 drivers
 can call abstract operations provided by panel drivers to control the panel
 operation. These patches implement three of those operations (enable, start
 transfer and get modes). More operations will be needed, and those three
 operations will likely get modified during review. Most of the panels on
 devices I own are dumb panels with no control bus, and are thus not the best
 candidates to design a framework that needs to take complex panels' needs 
 into
 account.

 In addition to the generic panel core, I've implemented MIPI DBI (Display Bus
 Interface, a parallel bus for panels that supports read/write transfers of
 commands and data) bus support, as well as three panel drivers (dummy panels
 with no control bus, and Renesas R61505- and R61517-based panels, both using
 DBI as their control bus). Only the dummy panel driver has been tested as I
 lack hardware for the two other drivers.

 I will appreciate all reviews, comments and criticisms.

RE: [PATCH v1 01/14] media: s5p-hdmi: add HPD GPIO to platform data

2012-10-04 Thread Inki Dae
Hello Media guys,

This is dependent on the exynos drm patch set to be merged to mainline, so if
there is no problem then please give me an ack so that I can merge this patch
with the exynos drm patch set.

Thanks,
Inki Dae

 -Original Message-
 From: RAHUL SHARMA [mailto:rahul.sha...@samsung.com]
 Sent: Thursday, October 04, 2012 4:40 PM
 To: Tomasz Stanislawski; Kyungmin Park; linux-arm-
 ker...@lists.infradead.org; linux-media@vger.kernel.org
 Cc: In-Ki Dae; SUNIL JOSHI; r.sh.o...@gmail.com
 Subject: Re: [PATCH v1 01/14] media: s5p-hdmi: add HPD GPIO to platform
 data
 
 Hi Mr. Tomasz, Mr. Park, list,
 
 The first patch in the following set belongs to s5p-media, the rest to
 exynos-drm. Please review the media patch so that it can be merged for mainline.
 
 regards,
 Rahul Sharma
 
 On Thu, Oct 4, 2012 at 9:12 PM, Rahul Sharma rahul.sha...@samsung.com
 wrote:
  From: Tomasz Stanislawski t.stanisl...@samsung.com
 
  This patch extends s5p-hdmi platform data by a GPIO identifier for
  Hot-Plug-Detection pin.
 
  Signed-off-by: Tomasz Stanislawski t.stanisl...@samsung.com
  Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
  ---
   include/media/s5p_hdmi.h |2 ++
   1 files changed, 2 insertions(+), 0 deletions(-)
 
  diff --git a/include/media/s5p_hdmi.h b/include/media/s5p_hdmi.h
  index 361a751..181642b 100644
  --- a/include/media/s5p_hdmi.h
  +++ b/include/media/s5p_hdmi.h
  @@ -20,6 +20,7 @@ struct i2c_board_info;
* @hdmiphy_info: template for HDMIPHY I2C device
* @mhl_bus: controller id for MHL control bus
* @mhl_info: template for MHL I2C device
  + * @hpd_gpio: GPIO for Hot-Plug-Detect pin
*
* NULL pointer for *_info fields indicates that
* the corresponding chip is not present
  @@ -29,6 +30,7 @@ struct s5p_hdmi_platform_data {
  struct i2c_board_info *hdmiphy_info;
  int mhl_bus;
  struct i2c_board_info *mhl_info;
  +   int hpd_gpio;
   };
 
   #endif /* S5P_HDMI_H */
  --
  1.7.0.4
 
 
  ___
  linux-arm-kernel mailing list
  linux-arm-ker...@lists.infradead.org
  http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dma-buf: mmap support

2012-04-24 Thread InKi Dae
Hi,


 +static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct 
 *vma)
 +{
 +       struct dma_buf *dmabuf;
 +
 +       if (!is_dma_buf_file(file))
 +               return -EINVAL;
 +
 +       dmabuf = file->private_data;
 +
 +       /* check for overflowing the buffer's size */
 +       if (vma->vm_pgoff + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) >
 +           dmabuf->size >> PAGE_SHIFT)

is this condition right? your intention is for checking buffer's size
is valid or not. by the way why is vma->vm_pgoff added to vm region
size?

 +               return -EINVAL;
 +
 +       return dmabuf->ops->mmap(dmabuf, vma);
 +}
 +
  static const struct file_operations dma_buf_fops = {
        .release        = dma_buf_release,
 +       .mmap           = dma_buf_mmap_internal,
  };

  /*
 @@ -82,7 +100,8 @@ struct dma_buf *dma_buf_export(void *priv, const struct 
 dma_buf_ops *ops,
                           || !ops->unmap_dma_buf
                           || !ops->release
                           || !ops->kmap_atomic
 -                         || !ops->kmap)) {
 +                         || !ops->kmap
 +                         || !ops->mmap)) {
                return ERR_PTR(-EINVAL);
        }

 @@ -406,3 +425,46 @@ void dma_buf_kunmap(struct dma_buf *dmabuf, unsigned 
 long page_num,
                dmabuf->ops->kunmap(dmabuf, page_num, vaddr);
  }
  EXPORT_SYMBOL_GPL(dma_buf_kunmap);
 +
 +
 +/**
 + * dma_buf_mmap - Setup up a userspace mmap with the given vma
 + * @dma_buf:   [in]    buffer that should back the vma
 + * @vma:       [in]    vma for the mmap
 + * @pgoff:     [in]    offset in pages where this mmap should start within 
 the
 + *                     dma-buf buffer.
 + *
 + * This function adjusts the passed in vma so that it points at the file of
 + * the dma_buf operation. It also adjusts the starting pgoff and does bounds
 + * checking on the size of the vma. Then it calls the exporter's mmap
 + * function to set up the mapping.
 + *
 + * Can return negative error values, returns 0 on success.
 + */
 +int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
 +                unsigned long pgoff)
 +{
 +       if (WARN_ON(!dmabuf || !vma))
 +               return -EINVAL;
 +
 +       /* check for offset overflow */
 +       if (pgoff + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) < pgoff)

ditto. isn't it checked whether page offset to be mmaped is placed
within vm region or not with the condition, if (((vma->vm_end -
vma->vm_start) >> PAGE_SHIFT) > pgoff)?

 +               return -EOVERFLOW;
 +
 +       /* check for overflowing the buffer's size */
 +       if (pgoff + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) >
 +           dmabuf->size >> PAGE_SHIFT)
 +               return -EINVAL;
 +
 +       /* readjust the vma */
 +       if (vma->vm_file)
 +               fput(vma->vm_file);
 +
 +       vma->vm_file = dmabuf->file;
 +       get_file(vma->vm_file);
 +
 +       vma->vm_pgoff = pgoff;
 +
 +       return dmabuf->ops->mmap(dmabuf, vma);
 +}
 +EXPORT_SYMBOL_GPL(dma_buf_mmap);
 diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
 index 3efbfc2..1f78d15 100644
 --- a/include/linux/dma-buf.h
 +++ b/include/linux/dma-buf.h
 @@ -61,6 +61,10 @@ struct dma_buf_attachment;
  *                This Callback must not sleep.
  * @kmap: maps a page from the buffer into kernel address space.
  * @kunmap: [optional] unmaps a page from the buffer.
 + * @mmap: used to expose the backing storage to userspace. Note that the
 + *       mapping needs to be coherent - if the exporter doesn't directly
 + *       support this, it needs to fake coherency by shooting down any ptes
 + *       when transitioning away from the cpu domain.
  */
  struct dma_buf_ops {
        int (*attach)(struct dma_buf *, struct device *,
 @@ -92,6 +96,8 @@ struct dma_buf_ops {
        void (*kunmap_atomic)(struct dma_buf *, unsigned long, void *);
        void *(*kmap)(struct dma_buf *, unsigned long);
        void (*kunmap)(struct dma_buf *, unsigned long, void *);
 +
 +       int (*mmap)(struct dma_buf *, struct vm_area_struct *vma);
  };

  /**
 @@ -167,6 +173,9 @@ void *dma_buf_kmap_atomic(struct dma_buf *, unsigned 
 long);
  void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
  void *dma_buf_kmap(struct dma_buf *, unsigned long);
  void dma_buf_kunmap(struct dma_buf *, unsigned long, void *);
 +
 +int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
 +                unsigned long);
  #else

  static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf 
 *dmabuf,
 @@ -248,6 +257,13 @@ static inline void dma_buf_kunmap(struct dma_buf *dmabuf,
                                  unsigned long pnum, void *vaddr)
  {
  }
 +
 +static inline int dma_buf_mmap(struct dma_buf *dmabuf,
 +                              struct vm_area_struct *vma,
 +                              unsigned long pgoff)
 +{
 +       return -ENODEV;
 +}
  #endif /* CONFIG_DMA_SHARED_BUFFER */

  #endif /* __DMA_BUF_H__ */
 --
 

Re: [PATCH] dma-buf: mmap support

2012-04-24 Thread InKi Dae
2012/4/25, Daniel Vetter dan...@ffwll.ch:
 On Wed, Apr 25, 2012 at 01:37:51AM +0900, InKi Dae wrote:
 Hi,

 
  +static int dma_buf_mmap_internal(struct file *file, struct
  vm_area_struct *vma)
  +{
  +   struct dma_buf *dmabuf;
  +
  +   if (!is_dma_buf_file(file))
  +   return -EINVAL;
  +
  +   dmabuf = file->private_data;
  +
  +   /* check for overflowing the buffer's size */
  +   if (vma->vm_pgoff + ((vma->vm_end - vma->vm_start) >>
  +       PAGE_SHIFT) > dmabuf->size >> PAGE_SHIFT)

 is this condition right? your intention is for checking buffer's size
 is valid or not. by the way why is vma->vm_pgoff added to vm region
 size?

 This check here is to ensure that userspace cannot mmap beyond the end of
 the dma_buf object. vm_pgoff is the offset userspace passed in at mmap
 time and hence needs to be added. Note that vm_end and vm_start are in
 bytes, whereas vm_pgoff is in pages.


You're right, the vma region size is decided by the user-requested size
passed in through the mmap syscall, so the user should set the size and
vm_pgoff appropriately. That was my missing point. Still, if any part of
the dmabuf buffer region has already been mmapped, and the user then
requests another mmap of the same dmabuf region, isn't there a problem?
I mean that dmabuf->size always keeps the same value, since any memory
region allocated by any allocator such as gem, ump and so on is exported
to dmabuf as a whole. So on a second mmap request, dmabuf->size wouldn't
be a reasonable bound, because part of the dmabuf buffer region was
already mmapped by the first request. For example, if the dmabuf size is
1MB and a 512KB region of it was mmapped by the first request, then on
the second request your code would still check the user-requested size
against dmabuf->size, which is still 1MB, meaning any size between 512KB
and 1MB would pass. It's just my concern, and there could be a point I'm
missing.

Thanks,
Inki Dae.

  +   return -EINVAL;
  +
  +   return dmabuf->ops->mmap(dmabuf, vma);
  +}
  +
   static const struct file_operations dma_buf_fops = {
 .release= dma_buf_release,
  +   .mmap   = dma_buf_mmap_internal,
   };
 
   /*
  @@ -82,7 +100,8 @@ struct dma_buf *dma_buf_export(void *priv, const
  struct dma_buf_ops *ops,
    || !ops->unmap_dma_buf
    || !ops->release
    || !ops->kmap_atomic
   - || !ops->kmap)) {
   + || !ops->kmap
   + || !ops->mmap)) {
 return ERR_PTR(-EINVAL);
 }
 
  @@ -406,3 +425,46 @@ void dma_buf_kunmap(struct dma_buf *dmabuf,
  unsigned long page_num,
  dmabuf->ops->kunmap(dmabuf, page_num, vaddr);
   }
   EXPORT_SYMBOL_GPL(dma_buf_kunmap);
  +
  +
  +/**
  + * dma_buf_mmap - Setup up a userspace mmap with the given vma
  + * @dma_buf:   [in]buffer that should back the vma
  + * @vma:   [in]vma for the mmap
  + * @pgoff: [in]offset in pages where this mmap should start
  within the
  + * dma-buf buffer.
  + *
  + * This function adjusts the passed in vma so that it points at the
  + * file of the dma_buf operation. It also adjusts the starting pgoff
  + * and does bounds checking on the size of the vma. Then it calls the
  + * exporter's mmap function to set up the mapping.
  + *
  + * Can return negative error values, returns 0 on success.
  + */
  +int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
  +unsigned long pgoff)
  +{
  +   if (WARN_ON(!dmabuf || !vma))
  +   return -EINVAL;
  +
  +   /* check for offset overflow */
  +   if (pgoff + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) < pgoff)

 ditto. isn't it checked whether page offset to be mmaped is placed
 within vm region or not with the condition, if (((vma->vm_end -
 vma->vm_start) >> PAGE_SHIFT) > pgoff)?

 Nope, this check only checks for overflow. The pgoff is the offset within
 the dma_buf object. E.g. a drm driver splits up its mmap space into pieces,
 which map to individual buffers. If userspace just mmaps parts of such a
 buffer, the importer can pass the offset in pgoff. But I expect this to be
 0 for almost all cases.

 Note that we don't need this overflow check in the internal mmap function
 because do_mmap will do it for us. But here the importer potentially sets
 a completely different pgoff, so we need to do it. dma_buf documentation
 also mentions this (and that importers do not have to do these checks).

 Yours, Daniel


  +   return -EOVERFLOW;
  +
  +   /* check for overflowing the buffer's size */
  +   if (pgoff + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) >
  +       dmabuf->size >> PAGE_SHIFT)
  +   return -EINVAL;
  +
  +   /* readjust the vma */
  +   if (vma->vm_file

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-10 Thread InKi Dae
2012/1/10 InKi Dae daei...@gmail.com:
 2012/1/10 Semwal, Sumit sumit.sem...@ti.com:
 On Tue, Jan 10, 2012 at 7:44 AM, Rob Clark r...@ti.com wrote:
 On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae daei...@gmail.com wrote:
 2012/1/10 Rob Clark r...@ti.com:
 at least with no IOMMU, the memory information(containing physical
 memory address) would be copied to vb2_xx_buf object if drm gem
 exported its own buffer and vb2 wants to use that buffer at this time,
 sg table is used to share that buffer. and the problem I pointed out
 is that this buffer(also physical memory region) could be released by
 vb2 framework(as you know, vb2_xx_buf object and the memory region for
 buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
 so some problems would be induced once drm gem tries to release or
 access that buffer. and I have tried to resolve this issue adding
 get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
 good way. maybe there would be better way.
 Hi Inki,
 As also mentioned in the documentation patch, importer (the user of
 the buffer) - in this case for current RFC patches on
 v4l2-as-a-user[1] vb2 framework - shouldn't release the backing memory
 of the buffer directly - it should only use the dma-buf callbacks in
 the right sequence to let the exporter know that it is done using this
 buffer, so the exporter can release it if allowed and needed.

 thank you for your comments.:) and below are some tables about dmabuf
 operations with ideal use and these tables indicate reference count of
 when buffer is created, shared and released. so if there are any
 problems, please let me know. P.S. these are just simple cases so
 there would be others.


 in case of using only drm gem and dmabuf,

 operations                       gem refcount    file f_count   buf refcount
 
 1. gem create                   A:1                                   A:0
 2. export(handle A -> fd)    A:2                A:1              A:0
 3. import(fd -> handle B)    A:2, B:1         A:2              A:1
 4. file close(A)                  A:2, B:1         A:1              A:1
 5. gem close(A)                A:1, B:1         A:1              A:1
 6. gem close(B)                A:1, B:0         A:1              A:0
 7. file close(A)                  A:0                A:0
 ---
 3. handle B shares the buf of handle A.
 6. release handle B but its buf.
 7. release gem handle A and dmabuf of file A and also physical memory region.


 and in case of using drm gem, vb2 and dmabuf,

 operations                  gem, vb2 refcount    file f_count   buf refcount
 
 1. gem create                   A:1                 A:0
   (GEM side)
 2. export(handle A -> fd)    A:2                 A:1              A:0
   (GEM side)
 3. import(fd -> handle B)    A:2, B:1          A:2              A:1
   (VB2 side)
 4. file close(A)                  A:2, B:1          A:1              A:1
   (VB2 side)
 5. vb2 close(B)                 A:2, B:0          A:1              A:0
   (VB2 side)
 6. gem close(A)                A:1                A:1              A:0
   (GEM side)
 7. file close(A)                  A:0                A:0
   (GEM side)
 
 3. vb2 handle B is shared with the buf of gem handle A.
 5. release vb2 handle B and decrease refcount of the buf pointed by it.
 7. release gem handle A and dmabuf of file A and also physical memory region.


Ah, sorry, it seems that file close shouldn't be called because
file->f_count of the file would be dropped by the importer and if f_count
is 0 then the file would be released by fput() so I'm not sure but
again:

in case of using only drm gem and dmabuf,

operations                  gem refcount     file f_count   buf refcount
------------------------------------------------------------------------
1. gem create(A)            A:1                             A:0
2. export(handle A -> fd)   A:2              A:1            A:0
3. import(fd -> handle B)   A:2, B:1         A:2            A:1
4. gem close(B)             A:2, B:release   A:1            A:0
5. gem close(A)             A:1              A:1            A:0
6. gem close(A)             A:release        A:release      A:release
------------------------------------------------------------------------

and in case of using drm gem, vb2 and dmabuf,

operations  gem, vb2 refcountfile f_count   buf refcount

1. gem create   A:1 A:0

Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-10 Thread InKi Dae
2012/1/10 Rob Clark r...@ti.com:
 On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae daei...@gmail.com wrote:
 2012/1/10 Rob Clark r...@ti.com:
 On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae daei...@gmail.com wrote:
 note : in case of sharing a buffer between v4l2 and drm driver, the
 memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
 vb2_xx_buf through sg table. in this case, only memory info is used to
 share, not some objects.

 which v4l2/vb2 patches are you looking at?  The patches I was using,
 vb2 holds a reference to the 'struct dma_buf *' internally, not just
 keeping the sg_table


 yes, not keeping the sg_table. I mean... see a example below please.

 static void vb2_dma_contig_map_dmabuf(void *mem_priv)
 {
    struct sg_table *sg;
     ...
     sg = dma_buf_map_attachment(buf->db_attach, dir);
     ...
     buf->dma_addr = sg_dma_address(sg->sgl);
     ...
 }

 at least with no IOMMU, the memory information(containing physical
 memory address) would be copied to vb2_xx_buf object if drm gem
 exported its own buffer and vb2 wants to use that buffer at this time,
 sg table is used to share that buffer. and the problem I pointed out
 is that this buffer(also physical memory region) could be released by
 vb2 framework(as you know, vb2_xx_buf object and the memory region for
  buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
 so some problems would be induced once drm gem tries to release or
 access that buffer. and I have tried to resolve this issue adding
 get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
 good way. maybe there would be better way.

 the exporter (in this case your driver's drm/gem bits) shouldn't
 release that mapping / sgtable until the importer (in this case v4l2)
 calls dma_buf_unmap fxn..

 It would be an error if the importer did a dma_buf_put() without first
 calling dma_buf_unmap_attachment() (if currently mapped) and then
 dma_buf_detach() (if currently attached).  Perhaps somewhere there
 should be some sanity checking debug code which could be enabled to do
 a WARN_ON() if the importer does the wrong thing.  It shouldn't really
 be part of the API, I don't think, but it actually does seem like a
 good thing, esp. as new drivers start trying to use dmabuf, to have
 some debug options which could be enabled.

 It is entirely possible that something was missed on the vb2 patches,
 but the way it is intended to work is like this:
 https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92

 where it does a detach() before the dma_buf_put(), and the vb2-contig
 backend checks here that it is also unmapped():
 https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251


I think that we also used same concept as your. for this, you can
refer to Dave's repository below and see the drm_prime_gem_destroy
function.
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-prime-dmabufid=7cb374d6642e838e0e4836042e057e6d9139dcad

but when it comes to releasing resources, I mistakely understood some
parts of dmabuf concept so thank you for Rob and Sumit. that is very
useful.

 BR,
 -R

 Thanks.

 BR,
 -R


Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread InKi Dae
2012/1/9 Daniel Vetter dan...@ffwll.ch:
 On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
 I has test dmabuf based drm gem module for exynos and I found one problem.
 you can refer to this test repository:
 http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf

 at this repository, I added some exception codes for resource release
 in addition to Dave's patch sets.

 let's suppose we use dmabuf based vb2 and drm gem with physically
 continuous memory(no IOMMU) and we try to share allocated buffer
 between them(v4l2 and drm driver).

 1. request memory allocation through drm gem interface.
 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
 gem object.
 - internally, private gem based dmabuf moudle calls drm_buf_export()
 to register allocated gem object to fd.
 3. request qbuf with the fd(got from 2) and DMABUF type to set the
 buffer to v4l2 based device.
 - internally, vb2 plug in module gets a buffer to the fd and then
 calls dmabuf->ops->map_dmabuf() callback to get the sg table
 containing physical memory info to the gem object. and then the
 physical memory info would be copied to vb2_xx_buf object.
 for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
 this repository:
 git://github.com/robclark/kernel-omap4.git drmplane-dmabuf

 after that, if v4l2 driver want to release vb2_xx_buf object with
 allocated memory region by user request, how should we do?. refcount
 to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
 object is released videobuf2 framework don't know who is using the
 physical memory region. so this physical memory region is released and
 when drm driver tries to access the region or to release it also, a
 problem would be induced.

 for this problem, I added get_shared_cnt() callback to dma-buf.h but
 I'm not sure that this is good way. maybe there may be better way.
 if there is any missing point, please let me know.

 The dma_buf object needs to hold a reference on the underlying
 (necessarily reference-counted) buffer object when the exporter creates
 the dma_buf handle. This reference should then get dropped in the
 exporter's dma_buf->ops->release() function, which is only getting called
 when the last reference to the dma_buf disappears.


when the exporter creates the dma_buf handle (for example, gem -> fd),
I think the refcount of the gem object should be increased at this
point, and decreased again by dma_buf->ops->release(), because when the
dma_buf is created and dma_buf_export() is called, this dma_buf refers
to the gem object once. And in the importer case (fd -> gem),
file->f_count of the dma_buf is increased, and then when this gem
object is released by user request such as drm close or
drm_gem_close_ioctl, dma_buf_put() should be called by
dma_buf->ops->detach() to decrease file->f_count again, because the gem
object refers to the dma_buf. For this, you can refer to my test
repository mentioned above. But the problem is that when a buffer is
released by one side, the other side can't know whether the buffer was
already released or not.
note: in case of sharing a buffer between the v4l2 and drm drivers, the
memory info would be copied from vb2_xx_buf to xx_gem or from xx_gem to
vb2_xx_buf through the sg table. in this case, only the memory info is
shared, not the objects themselves.

 If this doesn't work like that currently, we have a bug, and exporting the
 reference count or something similar can't fix that.

 Yours, Daniel

 PS: Please cut down the original mail when replying, otherwise it's pretty
 hard to find your response ;-)

Ok, got it. thanks. :)

 --
 Daniel Vetter
 Mail: dan...@ffwll.ch
 Mobile: +41 (0)79 365 57 48

