[RFC 0/2] New feature: Framebuffer processors

2016-08-30 Thread Inki Dae


On 2016-08-25 21:14, Daniel Vetter wrote:
> On Thu, Aug 25, 2016 at 08:45:25PM +0900, Inki Dae wrote:
>>
>>
>> On 2016-08-25 17:42, Daniel Vetter wrote:
>>> On Thu, Aug 25, 2016 at 05:06:55PM +0900, Inki Dae wrote:


 On 2016-08-24 20:57, Daniel Vetter wrote:
> On Wed, Aug 24, 2016 at 08:44:24PM +0900, Inki Dae wrote:
>> Hi,
>>
>> On 2016-08-23 18:41, Daniel Stone wrote:
>>> Hi,
>>>
>>> On 22 August 2016 at 16:23, Rob Clark  wrote:
 I guess a lot comes down to 'how long before hw designers bolt a CP to
 the thing'..  at that point, I think you especially don't want a
 per-blit kernel interface.
>>>
>>> Regardless of whether or not we want it, we already _have_ it, in the
>>> form of V4L2 M2M. There are already a few IP blocks working on that, I
>>> believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
>>> anyway ...
>>
>> So we are trying this. We have experienced that using V4L2 and DRM together on
>> Linux Platform makes it too complicated, and also integrated DRM with
>> M2M such as 2D and Post processor makes it simplified.  So We have been
>> trying to move existing V4L2 based drivers into DRM excepting HW Video
>> Codec - called it MFC - and Camera sensor and relevant things.
>> I think now V4L2 and DRM frameworks may make many engineers confusing
>> because there are the same devices which can be controlled by V4L2 and
>> DRM frameworks - maybe we would need more efforts like Laurent did with
>> Live source[1] in the future.
>
> Can you pls explain in more detail where working with both v4l and drm
> drivers and making them cooperate using dma-bufs poses problems? We should
> definitely fix that.

 I think it would be most Linux Platforms - Android, Chrome and Tizen -
 which would use OpenMAX/GStreamer for Multimedia and X or
 Wayland/SurfaceFlinger for Display.
>>>
>>> Yes, that's the use case. Where is the problem in making this happen? v4l
>>> can import dma-bufs, drm can export them, and there's plenty devices
>>> shipping (afaik) that make use of exact this pipeline. Can you pls explain
>>> what problem you've hit trying to make this work on exynos?
>>
>> No problem but just make it complicated as I mentioned above - the
>> stream operations - S_FMT, REQBUFS, QUERYBUF, QBUF, STREAMON and DQBUF
>> of V4L2 would never be simple as DRM.  Do you think M2M device should be
>> controlled by V4L2 interfaces? and even 2D accelerator? As long as I
>> know, The Graphics card on Desktop has all devices such as 2D/3D GPU, HW
>> Video codec and Display controller, and these devices are controlled by
>> DRM interfaces. So we - ARM Exynos - are trying to move these things to
>> DRM world and also trying to implement more convenient interfaces like
>> Marek did.
> 
> This is a misconception, there's nothing in the drm world requiring that
> everything is under the same drm device. All the work we've done over the
> past years (dma-buf, reservations, fence, prime, changes in X.org and
> wayland) are _all_ to make it possible to have a gfx device consisting of
> multiple drm/v4l/whatever else nodes. Especially for a SoC moving back to
> fake-integrating stuff really isn't a good idea I think.

Yes, not all devices. As I mentioned already - excepting the HW video codec and
camera device, we have moved a post-processor driver from V4L2 to DRM and are
now trying to standardize the post-processor interfaces.
I know that the display controllers of several SoCs include a post-processing
function which can scale an image up or down, rotate it, or convert its pixel
format to another one, and these devices are controlled by DRM.

Do you think these post-processor devices should be controlled by V4L2?

Thanks,
Inki Dae

> 
> And wrt drm being simpler than v4l - I don't think drm is any simpler, at
> least if you look at some of the more feature-full render drivers.
> -Daniel
> 


[RFC 0/2] New feature: Framebuffer processors

2016-08-25 Thread Inki Dae


On 2016-08-25 17:42, Daniel Vetter wrote:
> On Thu, Aug 25, 2016 at 05:06:55PM +0900, Inki Dae wrote:
>>
>>
>> On 2016-08-24 20:57, Daniel Vetter wrote:
>>> On Wed, Aug 24, 2016 at 08:44:24PM +0900, Inki Dae wrote:
 Hi,

 On 2016-08-23 18:41, Daniel Stone wrote:
> Hi,
>
> On 22 August 2016 at 16:23, Rob Clark  wrote:
>> I guess a lot comes down to 'how long before hw designers bolt a CP to
>> the thing'..  at that point, I think you especially don't want a
>> per-blit kernel interface.
>
> Regardless of whether or not we want it, we already _have_ it, in the
> form of V4L2 M2M. There are already a few IP blocks working on that, I
> believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
> anyway ...

 So we are trying this. We have experienced that using V4L2 and DRM together on
 Linux Platform makes it too complicated, and also integrated DRM with
 M2M such as 2D and Post processor makes it simplified.  So We have been
 trying to move existing V4L2 based drivers into DRM excepting HW Video
 Codec - called it MFC - and Camera sensor and relevant things.
 I think now V4L2 and DRM frameworks may make many engineers confusing
 because there are the same devices which can be controlled by V4L2 and
 DRM frameworks - maybe we would need more efforts like Laurent did with
 Live source[1] in the future.
>>>
>>> Can you pls explain in more detail where working with both v4l and drm
>>> drivers and making them cooperate using dma-bufs poses problems? We should
>>> definitely fix that.
>>
>> I think it would be most Linux Platforms - Android, Chrome and Tizen -
>> which would use OpenMAX/GStreamer for Multimedia and X or
>> Wayland/SurfaceFlinger for Display.
> 
> Yes, that's the use case. Where is the problem in making this happen? v4l
> can import dma-bufs, drm can export them, and there's plenty devices
> shipping (afaik) that make use of exact this pipeline. Can you pls explain
> what problem you've hit trying to make this work on exynos?

No real problem, it just makes things more complicated, as I mentioned above - the
streaming operations of V4L2 - S_FMT, REQBUFS, QUERYBUF, QBUF, STREAMON and
DQBUF - will never be as simple as DRM.
Do you think an M2M device should be controlled through V4L2 interfaces? And even
a 2D accelerator? As far as I know, a desktop graphics card contains all of these
devices - 2D/3D GPU, HW video codec and display controller - and they are all
controlled through DRM interfaces. So we - ARM Exynos - are trying to move these
things to the DRM world and also trying to implement more convenient interfaces,
as Marek did.
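
For reference, the V4L2 mem2mem sequence mentioned above looks roughly like the
following minimal sketch for the source (OUTPUT) queue; the device path,
resolution and NV12 format are illustrative assumptions and error handling is
omitted. In a real m2m pipeline the same REQBUFS/QBUF/STREAMON steps are then
repeated for the CAPTURE queue before DQBUF returns the processed buffer.

    /* Minimal sketch of the V4L2 mem2mem ioctl sequence referred to above.
     * "/dev/video0", 1920x1080 and NV12 are assumptions for illustration. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    static void m2m_output_queue_setup(void)
    {
        int fd = open("/dev/video0", O_RDWR);

        struct v4l2_format fmt;
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;   /* source side of the m2m device */
        fmt.fmt.pix_mp.width = 1920;
        fmt.fmt.pix_mp.height = 1080;
        fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12;
        ioctl(fd, VIDIOC_S_FMT, &fmt);                  /* S_FMT */

        struct v4l2_requestbuffers req;
        memset(&req, 0, sizeof(req));
        req.count = 1;
        req.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        req.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_REQBUFS, &req);                /* REQBUFS */

        struct v4l2_plane planes[VIDEO_MAX_PLANES];
        struct v4l2_buffer buf;
        memset(planes, 0, sizeof(planes));
        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        buf.memory = V4L2_MEMORY_MMAP;
        buf.index = 0;
        buf.m.planes = planes;
        buf.length = VIDEO_MAX_PLANES;
        ioctl(fd, VIDIOC_QUERYBUF, &buf);               /* QUERYBUF */
        ioctl(fd, VIDIOC_QBUF, &buf);                   /* QBUF */

        enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        ioctl(fd, VIDIOC_STREAMON, &type);              /* STREAMON */
        /* ... the CAPTURE queue is set up the same way, then: */
        ioctl(fd, VIDIOC_DQBUF, &buf);                  /* DQBUF */

        close(fd);
    }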

Thanks,
Inki Dae

> -Daniel
> 


[RFC 0/2] New feature: Framebuffer processors

2016-08-25 Thread Inki Dae


On 2016-08-24 20:57, Daniel Vetter wrote:
> On Wed, Aug 24, 2016 at 08:44:24PM +0900, Inki Dae wrote:
>> Hi,
>>
>> On 2016-08-23 18:41, Daniel Stone wrote:
>>> Hi,
>>>
>>> On 22 August 2016 at 16:23, Rob Clark  wrote:
 I guess a lot comes down to 'how long before hw designers bolt a CP to
 the thing'..  at that point, I think you especially don't want a
 per-blit kernel interface.
>>>
>>> Regardless of whether or not we want it, we already _have_ it, in the
>>> form of V4L2 M2M. There are already a few IP blocks working on that, I
>>> believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
>>> anyway ...
>>
>> So we are trying this. We have experienced that using V4L2 and DRM together on
>> Linux Platform makes it too complicated, and also integrated DRM with
>> M2M such as 2D and Post processor makes it simplified.  So We have been
>> trying to move existing V4L2 based drivers into DRM excepting HW Video
>> Codec - called it MFC - and Camera sensor and relevant things.
>> I think now V4L2 and DRM frameworks may make many engineers confusing
>> because there are the same devices which can be controlled by V4L2 and
>> DRM frameworks - maybe we would need more efforts like Laurent did with
>> Live source[1] in the future.
> 
> Can you pls explain in more detail where working with both v4l and drm
> drivers and making them cooperate using dma-bufs poses problems? We should
> definitely fix that.

I think it would be most Linux platforms - Android, Chrome and Tizen - which
use OpenMAX/GStreamer for multimedia and X or Wayland/SurfaceFlinger for
display.

Thanks,
Inki Dae

> -Daniel
> 


[RFC 0/2] New feature: Framebuffer processors

2016-08-25 Thread Daniel Vetter
On Thu, Aug 25, 2016 at 08:45:25PM +0900, Inki Dae wrote:
> 
> 
> > On 2016-08-25 17:42, Daniel Vetter wrote:
> > On Thu, Aug 25, 2016 at 05:06:55PM +0900, Inki Dae wrote:
> >>
> >>
> >> On 2016-08-24 20:57, Daniel Vetter wrote:
> >>> On Wed, Aug 24, 2016 at 08:44:24PM +0900, Inki Dae wrote:
>  Hi,
> 
>  On 2016-08-23 18:41, Daniel Stone wrote:
> > Hi,
> >
> > On 22 August 2016 at 16:23, Rob Clark  wrote:
> >> I guess a lot comes down to 'how long before hw designers bolt a CP to
> >> the thing'..  at that point, I think you especially don't want a
> >> per-blit kernel interface.
> >
> > Regardless of whether or not we want it, we already _have_ it, in the
> > form of V4L2 M2M. There are already a few IP blocks working on that, I
> > believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
> > anyway ...
> 
>  So we are trying this. We have experienced that using V4L2 and DRM together on
>  Linux Platform makes it too complicated, and also integrated DRM with
>  M2M such as 2D and Post processor makes it simplified.  So We have been
>  trying to move existing V4L2 based drivers into DRM excepting HW Video
>  Codec - called it MFC - and Camera sensor and relevant things.
>  I think now V4L2 and DRM frameworks may make many engineers confusing
>  because there are the same devices which can be controlled by V4L2 and
>  DRM frameworks - maybe we would need more efforts like Laurent did with
>  Live source[1] in the future.
> >>>
> >>> Can you pls explain in more detail where working with both v4l and drm
> >>> drivers and making them cooperate using dma-bufs poses problems? We should
> >>> definitely fix that.
> >>
> >> I think it would be most Linux Platforms - Android, Chrome and Tizen -
> >> which would use OpenMAX/GStreamer for Multimedia and X or
> >> Wayland/SurfaceFlinger for Display.
> >
> > Yes, that's the use case. Where is the problem in making this happen? v4l
> > can import dma-bufs, drm can export them, and there's plenty devices
> > shipping (afaik) that make use of exact this pipeline. Can you pls explain
> > what problem you've hit trying to make this work on exynos?
> 
> No problem but just make it complicated as I mentioned above - the
> stream operations - S_FMT, REQBUFS, QUERYBUF, QBUF, STREAMON and DQBUF
> of V4L2 would never be simple as DRM.  Do you think M2M device should be
> controlled by V4L2 interfaces? and even 2D accelerator? As long as I
> know, The Graphics card on Desktop has all devices such as 2D/3D GPU, HW
> Video codec and Display controller, and these devices are controlled by
> DRM interfaces. So we - ARM Exynos - are trying to move these things to
> DRM world and also trying to implement more convenient interfaces like
> Marek did.

This is a misconception: there's nothing in the drm world requiring that
everything sits under the same drm device. All the work we've done over the
past years (dma-buf, reservations, fences, prime, changes in X.org and
wayland) was _all_ done to make it possible to have a gfx device consisting of
multiple drm/v4l/whatever else nodes. Especially for a SoC, moving back to
fake-integrating everything really isn't a good idea, I think.

And wrt drm being simpler than v4l - I don't think drm is any simpler, at
least if you look at some of the more feature-rich render drivers.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[RFC 0/2] New feature: Framebuffer processors

2016-08-25 Thread Daniel Vetter
On Thu, Aug 25, 2016 at 05:06:55PM +0900, Inki Dae wrote:
> 
> 
> > On 2016-08-24 20:57, Daniel Vetter wrote:
> > On Wed, Aug 24, 2016 at 08:44:24PM +0900, Inki Dae wrote:
> >> Hi,
> >>
> >> On 2016-08-23 18:41, Daniel Stone wrote:
> >>> Hi,
> >>>
> >>> On 22 August 2016 at 16:23, Rob Clark  wrote:
>  I guess a lot comes down to 'how long before hw designers bolt a CP to
>  the thing'..  at that point, I think you especially don't want a
>  per-blit kernel interface.
> >>>
> >>> Regardless of whether or not we want it, we already _have_ it, in the
> >>> form of V4L2 M2M. There are already a few IP blocks working on that, I
> >>> believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
> >>> anyway ...
> >>
> >> So we are trying this. We have experienced that using V4L2 and DRM together on
> >> Linux Platform makes it too complicated, and also integrated DRM with
> >> M2M such as 2D and Post processor makes it simplified.  So We have been
> >> trying to move existing V4L2 based drivers into DRM excepting HW Video
> >> Codec - called it MFC - and Camera sensor and relevant things.
> >> I think now V4L2 and DRM frameworks may make many engineers confusing
> >> because there are the same devices which can be controlled by V4L2 and
> >> DRM frameworks - maybe we would need more efforts like Laurent did with
> >> Live source[1] in the future.
> >
> > Can you pls explain in more detail where working with both v4l and drm
> > drivers and making them cooperate using dma-bufs poses problems? We should
> > definitely fix that.
> 
> I think it would be most Linux Platforms - Android, Chrome and Tizen -
> which would use OpenMAX/GStreamer for Multimedia and X or
> Wayland/SurfaceFlinger for Display.

Yes, that's the use case. Where is the problem in making this happen? v4l
can import dma-bufs, drm can export them, and there are plenty of devices
shipping (afaik) that make use of exactly this pipeline. Can you pls explain
what problem you've hit trying to make this work on exynos?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
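
As a rough illustration of the interop path described above, a buffer allocated
on the DRM side can be handed to a V4L2 device more or less like the sketch
below. The gem_handle, the single-plane OUTPUT queue and the already-open
drm_fd/v4l_fd are assumptions for illustration; the V4L2 queue is assumed to be
configured for V4L2_MEMORY_DMABUF and error handling is omitted.

    /* Sketch: export a GEM buffer as a dma-buf fd via PRIME and queue it on a
     * V4L2 device. Values and queue configuration are illustrative only. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <xf86drm.h>
    #include <linux/videodev2.h>

    static void share_drm_buffer_with_v4l2(int drm_fd, uint32_t gem_handle, int v4l_fd)
    {
        int dmabuf_fd = -1;

        /* DRM side: PRIME export of the GEM handle to a dma-buf fd. */
        drmPrimeHandleToFD(drm_fd, gem_handle, DRM_CLOEXEC, &dmabuf_fd);

        /* V4L2 side: import the same memory by queueing it as a DMABUF. */
        struct v4l2_buffer buf;
        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        buf.memory = V4L2_MEMORY_DMABUF;
        buf.index = 0;
        buf.m.fd = dmabuf_fd;
        ioctl(v4l_fd, VIDIOC_QBUF, &buf);
    }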


[RFC 0/2] New feature: Framebuffer processors

2016-08-24 Thread Inki Dae
Hi,

On 2016-08-23 18:41, Daniel Stone wrote:
> Hi,
> 
> On 22 August 2016 at 16:23, Rob Clark  wrote:
>> I guess a lot comes down to 'how long before hw designers bolt a CP to
>> the thing'..  at that point, I think you especially don't want a
>> per-blit kernel interface.
> 
> Regardless of whether or not we want it, we already _have_ it, in the
> form of V4L2 M2M. There are already a few IP blocks working on that, I
> believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
> anyway ...

So we are trying this. We have experienced that using V4L2 and DRM together on a
Linux platform makes things too complicated, while integrating the M2M devices
such as the 2D engine and post-processor into DRM keeps them simple.
So we have been trying to move the existing V4L2-based drivers into DRM, except
for the HW video codec - called MFC - and the camera sensor and related parts.
I think the coexistence of the V4L2 and DRM frameworks may confuse many engineers,
because the same devices can be controlled by either framework - maybe we will
need more efforts like Laurent's Live source[1] in the future.

Anyway, sad to say, it seems the other maintainers will NAK this patch series
because such an attempt already failed a long time ago - they have learned that
this kind of thing doesn't work well. So it seems we have to keep this only in
Exynos DRM.


[1] https://lwn.net/Articles/640290/

Thanks,
Inki Dae

> 
> Cheers,
> Daniel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" 
> in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


[RFC 0/2] New feature: Framebuffer processors

2016-08-24 Thread Daniel Vetter
On Wed, Aug 24, 2016 at 08:44:24PM +0900, Inki Dae wrote:
> Hi,
> 
> On 2016-08-23 18:41, Daniel Stone wrote:
> > Hi,
> > 
> > On 22 August 2016 at 16:23, Rob Clark  wrote:
> >> I guess a lot comes down to 'how long before hw designers bolt a CP to
> >> the thing'..  at that point, I think you especially don't want a
> >> per-blit kernel interface.
> > 
> > Regardless of whether or not we want it, we already _have_ it, in the
> > form of V4L2 M2M. There are already a few IP blocks working on that, I
> > believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
> > anyway ...
> 
> So we are trying this. We have experienced that using V4L2 and DRM together on
> Linux Platform makes it too complicated, and also integrated DRM with
> M2M such as 2D and Post processor makes it simplified.  So We have been
> trying to move existing V4L2 based drivers into DRM excepting HW Video
> Codec - called it MFC - and Camera sensor and relevant things.
> I think now V4L2 and DRM frameworks may make many engineers confusing
> because there are the same devices which can be controlled by V4L2 and
> DRM frameworks - maybe we would need more efforts like Laurent did with
> Live source[1] in the future.

Can you pls explain in more detail where working with both v4l and drm
drivers and making them cooperate using dma-bufs poses problems? We should
definitely fix that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[RFC 0/2] New feature: Framebuffer processors

2016-08-23 Thread Daniel Stone
Hi,

On 22 August 2016 at 16:23, Rob Clark  wrote:
> I guess a lot comes down to 'how long before hw designers bolt a CP to
> the thing'..  at that point, I think you especially don't want a
> per-blit kernel interface.

Regardless of whether or not we want it, we already _have_ it, in the
form of V4L2 M2M. There are already a few IP blocks working on that, I
believe. If V4L2 <-> KMS interop is painful, well, we need to fix that
anyway ...

Cheers,
Daniel


[RFC 0/2] New feature: Framebuffer processors

2016-08-23 Thread Dave Airlie
On 22 August 2016 at 19:44, Marek Szyprowski  
wrote:
> Dear all,
>
> This is the initial proposal for extending DRM API with generic support for
> hardware modules, which can be used for processing image data from the one
> memory buffer to another. Typical memory-to-memory operations are:
> rotation, scaling, colour space conversion or mix of them. In this proposal
> I named such hardware modules a framebuffer processors.
>
> Embedded SoCs are known to have a number of hardware blocks, which perform
> such operations. They can be used in parallel to the main GPU module to
> offload CPU from processing graphics or video data. One example use of
> such modules is implementing video overlay, which usually requires color
> space conversion from NV12 (or similar) to RGB32 color space and scaling to
> target window size.
>
> Till now there was no generic, hardware independent API for performing such
> operations. Exynos DRM driver has its own custom extension called IPP
> (Image Post Processing), but frankly speaking, it is over-engineered and not
> really used in open-source. I didn't identify similar API in other DRM
> drivers, besides those which expose complete support for the whole GPU.

So I'm with the others in that it's a road we've travelled and learned from:
generic accel APIs don't work very well long term.

What will happen is that the next generation Exynos will have a command queue
for its dma engine or whatever, and someone will shoehorn that into this API
because this API exists, even if it isn't suitable.

What are the requirements for having this API? What userspace feature is driving
it - compositors? Toolkit rendering?

Dave.


[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Benjamin Gaignard
In the STM SoC we have a hardware block doing scaling/colorspace conversion;
we have decided to use the v4l2 mem2mem API for it:
https://linuxtv.org/downloads/v4l-dvb-apis/selection-api.html

the code is here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/media/platform/sti/bdisp?id=refs/tags/v4.8-rc3

Regards,
Benjamin


2016-08-22 17:23 GMT+02:00 Rob Clark :
> On Mon, Aug 22, 2016 at 5:59 AM, Christian König
>  wrote:
>> Am 22.08.2016 um 11:44 schrieb Marek Szyprowski:
>>>
>>> Dear all,
>>>
>>> This is the initial proposal for extending DRM API with generic support
>>> for
>>> hardware modules, which can be used for processing image data from the one
>>> memory buffer to another. Typical memory-to-memory operations are:
>>> rotation, scaling, colour space conversion or mix of them. In this
>>> proposal
>>> I named such hardware modules a framebuffer processors.
>>>
>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>> such operations. They can be used in parallel to the main GPU module to
>>> offload CPU from processing graphics or video data. One example use of
>>> such modules is implementing video overlay, which usually requires color
>>> space conversion from NV12 (or similar) to RGB32 color space and scaling
>>> to
>>> target window size.
>>>
>>> Till now there was no generic, hardware independent API for performing
>>> such
>>> operations. Exynos DRM driver has its own custom extension called IPP
>>> (Image Post Processing), but frankly speaking, it is over-engineered and
>>> not
>>> really used in open-source. I didn't identify similar API in other DRM
>>> drivers, besides those which expose complete support for the whole GPU.
>>
>>
>> Well there are good reasons why we don't have hardware independent command
>> submission.
>>
>> We already tried approaches like this and they didn't work very well and
>> are generally a pain to get right.
>>
>> So my feeling goes into the direction of a NAK, especially since you didn't
>> explain in this mail why there is need for a common API.
>
> I guess a lot comes down to 'how long before hw designers bolt a CP to
> the thing'..  at that point, I think you especially don't want a
> per-blit kernel interface.
>
> But either way, if userspace needs/wants a generic 2d blitter API, it
> is probably best to start out with defining a common userspace level
> API.  That gets you a lot more flexibility to throw it away and start
> again once you realize you've painted yourself into a corner.  And it
> is something that could be implemented on top of real gpu's in
> addition to things that look more like a mem2mem crtc.
>
> Given the length of time kernel uapi must be supported, vs how fast hw
> evolves, I'm leaning towards NAK as well.
>
> BR,
> -R
>
>
>> Regards,
>> Christian.
>>
>>
>>>
>>> However, the need for common API has been already mentioned on the
>>> mailing
>>> list. Here are some example threads:
>>> 1. "RFC: hardware accelerated bitblt using dma engine"
>>> http://www.spinics.net/lists/dri-devel/msg114250.html
>>> 2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing)
>>> subsystem"
>>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
>>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html
>>>
>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>> based on DRM objects and their properties. A new DRM object is introduced:
>>> framebuffer processor (called fbproc for convenience). Such fbproc objects
>>> have a set of standard DRM properties, which describes the operation to be
>>> performed by respective hardware module. In typical case those properties
>>> are a source fb id and rectangle (x, y, width, height) and destination fb
>>> id and rectangle. Optionally a rotation property can be also specified if
>>> supported by the given hardware. To perform an operation on image data,
>>> userspace provides a set of properties and their values for given fbproc
>>> object in a similar way as object and properties are provided for
>>> performing atomic page flip / mode setting.
>>>
>>> The proposed API consists of the 3 new ioctls:
>>> - DRM_IOCTL_MODE_GETFBPROCRESOURCES: to enumerate all available fbproc
>>>objects,
>>> - DRM_IOCTL_MODE_GETFBPROC: to query capabilities of given fbproc object,
>>> - DRM_IOCTL_MODE_FBPROC: to perform operation described by given property
>>>set.
>>>
>>> The proposed API is extensible. Drivers can attach their own, custom
>>> properties to add support for more advanced picture processing (for
>>> example
>>> blending).
>>>
>>> Please note that this API is intended to be used for simple
>>> memory-to-memory
>>> image processing hardware not the full-blown GPU blitters, which usually
>>> have more features. Typically blitters provides much more operations
>>> beside
>>> simple pixel copying and operate best if its command queue is controlled
>>> from
>>> respective dedicated code in userspace.

[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Daniel Vetter
On Mon, Aug 22, 2016 at 11:59:24AM +0200, Christian König wrote:
> Am 22.08.2016 um 11:44 schrieb Marek Szyprowski:
> > Dear all,
> > 
> > This is the initial proposal for extending DRM API with generic support for
> > hardware modules, which can be used for processing image data from the one
> > memory buffer to another. Typical memory-to-memory operations are:
> > rotation, scaling, colour space conversion or mix of them. In this proposal
> > I named such hardware modules a framebuffer processors.
> > 
> > Embedded SoCs are known to have a number of hardware blocks, which perform
> > such operations. They can be used in parallel to the main GPU module to
> > offload CPU from processing graphics or video data. One example use of
> > such modules is implementing video overlay, which usually requires color
> > space conversion from NV12 (or similar) to RGB32 color space and scaling to
> > target window size.
> > 
> > Till now there was no generic, hardware independent API for performing such
> > operations. Exynos DRM driver has its own custom extension called IPP
> > (Image Post Processing), but frankly speaking, it is over-engineered and not
> > really used in open-source. I didn't identify similar API in other DRM
> > drivers, besides those which expose complete support for the whole GPU.
> 
> Well there are good reasons why we don't have hardware independent command
> submission.
> 
> We already tried approaches like this and they didn't work very well and
> are generally a pain to get right.
> 
> So my feeling goes into the direction of a NAK, especially since you didn't
> explain in this mail why there is need for a common API.

We've had an earlier RFC thread, and I made it clear there already that
this will face a steep uphill battle. I don't really see any explanation
here why this is not an exact copy of the ideas we've shown to not work
10+ years ago, hence I concur on this NACK.

Make this a driver-private thing, operating on gem objects (and yes, that
means you get to reinvent the metadata, which is imo a good thing since it
avoids encumbering kms with this blitter use-case). And if that
interface then indeed proves useful for multiple blitter IP blocks, we can use
it for that in a generic fashion. And if it shows up in different
display/render/gpu blocks we can reuse the driver using dma-buf/prime
sharing. So there's really no downside, except that your ioctl won't be
blessed as official in any way, which imo is a Good Thing.

Or this all turns out to be a mistake (which I expect it to be) and we can
quietly bury it again since it's just a little driver.

Trying to push this will lead to 1+ years of frustration and most likely
still not succeed.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > However, the need for common API has been already mentioned on the mailing
> > list. Here are some example threads:
> > 1. "RFC: hardware accelerated bitblt using dma engine"
> > http://www.spinics.net/lists/dri-devel/msg114250.html
> > 2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing) 
> > subsystem"
> > https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
> > https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html
> > 
> > The proposed API is heavily inspired by atomic KMS approach - it is also
> > based on DRM objects and their properties. A new DRM object is introduced:
> > framebuffer processor (called fbproc for convenience). Such fbproc objects
> > have a set of standard DRM properties, which describes the operation to be
> > performed by respective hardware module. In typical case those properties
> > are a source fb id and rectangle (x, y, width, height) and destination fb
> > id and rectangle. Optionally a rotation property can be also specified if
> > supported by the given hardware. To perform an operation on image data,
> > userspace provides a set of properties and their values for given fbproc
> > object in a similar way as object and properties are provided for
> > performing atomic page flip / mode setting.
> > 
> > The proposed API consists of the 3 new ioctls:
> > - DRM_IOCTL_MODE_GETFBPROCRESOURCES: to enumerate all available fbproc
> >objects,
> > - DRM_IOCTL_MODE_GETFBPROC: to query capabilities of given fbproc object,
> > - DRM_IOCTL_MODE_FBPROC: to perform operation described by given property
> >set.
> > 
> > The proposed API is extensible. Drivers can attach their own, custom
> > properties to add support for more advanced picture processing (for example
> > blending).
> > 
> > Please note that this API is intended to be used for simple memory-to-memory
> > image processing hardware not the full-blown GPU blitters, which usually
> > have more features. Typically blitters provides much more operations beside
> > simple pixel copying and operate best if its command queue is controlled 
> > from
> > respective dedicated code in userspace.
> > 
> > The patchset consist of 4 parts:
> > 1. generic code for DRM core for handling fbproc objects and ioctls

[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Tobias Jakobi
Hello Marek,

Marek Szyprowski wrote:
> Dear Tobias
> 
> 
> On 2016-08-22 12:07, Tobias Jakobi wrote:
>> Hey Marek,
>>
>> I had a quick look at the series and I really like the new approach.
>>
>> I was wondering about the following though. If I understand this
>> correctly I can only perform m2m operations on buffers which are
>> registered as framebuffers. Is it possible to weaken that requirement
>> such that arbitrary GEM objects can be used as input and output?
> 
> Thanks for your comment.
> 
> I'm open for discussion if the API should be based on framebuffers or
> GEM objects.
> 
> Initially I thought that GEM objects would be enough, but later I
> noticed that in such case user would need to provide at least width,
> height, stride, start offset and pixel format - all parameters that
> are already used to create framebuffer object. Operating on GEM
> buffers will also make support for images composed from multiple
> buffers (like separate GEM objects for luma/chroma parts in case of
> planar formats) a bit harder. Same about already introduced API for
> fb-modifiers. I just don't want to duplicate all of it in fbproc API.
Yes, that makes perfect sense. Passing the buffer parameters
(geometry, pixel format, etc.) each time is probably not a good (or
efficient) idea.


I'm still wondering if there can't arise a situation where I simply
can't register a buffer as framebuffer.

I'm specifically looking at internal_framebuffer_create() and the driver
specific fb_create(). Now internal_framebuffer_create() itself only does
some minimal checking, but e.g. it does check against certain
minimum/maximum geometry parameters. How do we know that these
parameters match the ones for the block that performs the m2m operations?
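For reference, the core check being referred to is roughly of the following
shape - this is a paraphrased sketch of the framebuffer registration path, not
a verbatim copy of the kernel source - and only when it passes does the
driver-specific hook get called:

    /* Paraphrased sketch of the geometry check the DRM core performs when
     * userspace registers a framebuffer; field names follow drm_mode_config. */
    struct drm_mode_config *config = &dev->mode_config;

    if (r->width < config->min_width || r->width > config->max_width)
            return ERR_PTR(-EINVAL);        /* framebuffer too narrow/wide */
    if (r->height < config->min_height || r->height > config->max_height)
            return ERR_PTR(-EINVAL);        /* framebuffer too short/tall */

    /* only then is the driver-specific hook invoked */
    fb = dev->mode_config.funcs->fb_create(dev, file_priv, r);
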

I could imagine that a block could perform scaling operations on buffers
with a much larger geometry than the core allows to be bound as
framebuffers. So if I have such a buffer with a large geometry I might
want to scale it down so that I can display it. But that's not possible,
since I can't even bind the src as an fb.

Does that make sense?


With best wishes,
Tobias


> Operating on framebuffer objects also helps to reduce errors in
> userspace. One can already queue the result of processing to the
> display hardware and this way avoid common issues related to debugging
> why the processed image is not displayed correctly due to incorrectly
> defined pitch/fourcc/start offset/etc. This is however not really a
> strong advantage of framebuffers.
> 
> 
>>
>> Anyway, great work!
>>
>> With best wishes,
>> Tobias
>>
>>
>> Marek Szyprowski wrote:
>>> Dear all,
>>>
>>> This is the initial proposal for extending DRM API with generic
>>> support for
>>> hardware modules, which can be used for processing image data from
>>> the one
>>> memory buffer to another. Typical memory-to-memory operations are:
>>> rotation, scaling, colour space conversion or mix of them. In this
>>> proposal
>>> I named such hardware modules a framebuffer processors.
>>>
>>> Embedded SoCs are known to have a number of hardware blocks, which
>>> perform
>>> such operations. They can be used in parallel to the main GPU module to
>>> offload CPU from processing graphics or video data. One example use of
>>> such modules is implementing video overlay, which usually requires color
>>> space conversion from NV12 (or similar) to RGB32 color space and
>>> scaling to
>>> target window size.
>>>
>>> Till now there was no generic, hardware independent API for
>>> performing such
>>> operations. Exynos DRM driver has its own custom extension called IPP
>>> (Image Post Processing), but frankly speaking, it is over-engineered
>>> and not
>>> really used in open-source. I didn't identify similar API in other DRM
>>> drivers, besides those which expose complete support for the whole GPU.
>>>
>>> However, the need for common API has been already mentioned on the
>>> mailing
>>> list. Here are some example threads:
>>> 1. "RFC: hardware accelerated bitblt using dma engine"
>>> http://www.spinics.net/lists/dri-devel/msg114250.html
>>> 2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing)
>>> subsystem"
>>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
>>>
>>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html
>>>
>>>
>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>> based on DRM objects and their properties. A new DRM object is
>>> introduced:
>>> framebuffer processor (called fbproc for convenience). Such fbproc
>>> objects
>>> have a set of standard DRM properties, which describes the operation
>>> to be
>>> performed by respective hardware module. In typical case those
>>> properties
>>> are a source fb id and rectangle (x, y, width, height) and
>>> destination fb
>>> id and rectangle. Optionally a rotation property can be also
>>> specified if
>>> supported by the given hardware. To perform an operation on image data,
>>> userspace 

[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Marek Szyprowski
Dear Tobias


On 2016-08-22 12:07, Tobias Jakobi wrote:
> Hey Marek,
>
> I had a quick look at the series and I really like the new approach.
>
> I was wondering about the following though. If I understand this
> correctly I can only perform m2m operations on buffers which are
> registered as framebuffers. Is it possible to weaken that requirement
> such that arbitrary GEM objects can be used as input and output?

Thanks for your comment.

I'm open for discussion if the API should be based on framebuffers or
GEM objects.

Initially I thought that GEM objects would be enough, but later I
noticed that in that case the user would need to provide at least the width,
height, stride, start offset and pixel format - all parameters that
are already used to create a framebuffer object. Operating on GEM
buffers would also make support for images composed from multiple
buffers (like separate GEM objects for the luma/chroma parts in case of
planar formats) a bit harder. The same goes for the already introduced API for
fb-modifiers. I just don't want to duplicate all of it in the fbproc API.
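
For example, a two-plane NV12 buffer ends up carrying all of those parameters
when it is registered as a framebuffer - a rough sketch, where the GEM handle,
geometry and pitch values are made up for illustration and error handling is
omitted:

    /* Sketch: registering a single GEM buffer holding both NV12 planes as a
     * DRM framebuffer; values are illustrative only. */
    #include <stdint.h>
    #include <drm_fourcc.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static uint32_t add_nv12_fb(int drm_fd, uint32_t gem_handle,
                                uint32_t width, uint32_t height, uint32_t pitch)
    {
        uint32_t fb_id = 0;
        uint32_t handles[4] = { gem_handle, gem_handle };   /* luma + chroma planes */
        uint32_t pitches[4] = { pitch, pitch };
        uint32_t offsets[4] = { 0, pitch * height };        /* chroma follows luma */

        /* width, height, format, pitches and offsets all live in the fb
         * object, so an fbproc operation would not need to repeat them */
        drmModeAddFB2(drm_fd, width, height, DRM_FORMAT_NV12,
                      handles, pitches, offsets, &fb_id, 0);
        return fb_id;
    }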

Operating on framebuffer objects also helps to reduce errors in
userspace. One can already queue the result of processing to the
display hardware and this way avoid common issues related to debugging
why the processed image is not displayed correctly due to incorrectly
defined pitch/fourcc/start offset/etc. This is however not really a
strong advantage of framebuffers.


>
> Anyway, great work!
>
> With best wishes,
> Tobias
>
>
> Marek Szyprowski wrote:
>> Dear all,
>>
>> This is the initial proposal for extending DRM API with generic support for
>> hardware modules, which can be used for processing image data from the one
>> memory buffer to another. Typical memory-to-memory operations are:
>> rotation, scaling, colour space conversion or mix of them. In this proposal
>> I named such hardware modules a framebuffer processors.
>>
>> Embedded SoCs are known to have a number of hardware blocks, which perform
>> such operations. They can be used in parallel to the main GPU module to
>> offload CPU from processing graphics or video data. One example use of
>> such modules is implementing video overlay, which usually requires color
>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>> target window size.
>>
>> Till now there was no generic, hardware independent API for performing such
>> operations. Exynos DRM driver has its own custom extension called IPP
>> (Image Post Processing), but frankly speaking, it is over-engineered and not
>> really used in open-source. I didn't identify similar API in other DRM
>> drivers, besides those which expose complete support for the whole GPU.
>>
>> However, the need for common API has been already mentioned on the mailing
>> list. Here are some example threads:
>> 1. "RFC: hardware accelerated bitblt using dma engine"
>> http://www.spinics.net/lists/dri-devel/msg114250.html
>> 2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing) 
>> subsystem"
>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html
>>
>> The proposed API is heavily inspired by atomic KMS approach - it is also
>> based on DRM objects and their properties. A new DRM object is introduced:
>> framebuffer processor (called fbproc for convenience). Such fbproc objects
>> have a set of standard DRM properties, which describes the operation to be
>> performed by respective hardware module. In typical case those properties
>> are a source fb id and rectangle (x, y, width, height) and destination fb
>> id and rectangle. Optionally a rotation property can be also specified if
>> supported by the given hardware. To perform an operation on image data,
>> userspace provides a set of properties and their values for given fbproc
>> object in a similar way as object and properties are provided for
>> performing atomic page flip / mode setting.
>>
>> The proposed API consists of the 3 new ioctls:
>> - DRM_IOCTL_MODE_GETFBPROCRESOURCES: to enumerate all available fbproc
>>objects,
>> - DRM_IOCTL_MODE_GETFBPROC: to query capabilities of given fbproc object,
>> - DRM_IOCTL_MODE_FBPROC: to perform operation described by given property
>>set.
>>
>> The proposed API is extensible. Drivers can attach their own, custom
>> properties to add support for more advanced picture processing (for example
>> blending).
>>
>> Please note that this API is intended to be used for simple memory-to-memory
>> image processing hardware not the full-blown GPU blitters, which usually
>> have more features. Typically blitters provides much more operations beside
>> simple pixel copying and operate best if its command queue is controlled from
>> respective dedicated code in userspace.
>>
>> The patchset consist of 4 parts:
>> 1. generic code for DRM core for handling fbproc objects and ioctls
>> 2. example, quick conversion of Exynos Rotator driver to fbproc API

[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Tobias Jakobi
Hey Marek,

I had a quick look at the series and I really like the new approach.

I was wondering about the following though. If I understand this
correctly I can only perform m2m operations on buffers which are
registered as framebuffers. Is it possible to weaken that requirement
such that arbitrary GEM objects can be used as input and output?

Anyway, great work!

With best wishes,
Tobias


Marek Szyprowski wrote:
> Dear all,
> 
> This is the initial proposal for extending DRM API with generic support for
> hardware modules, which can be used for processing image data from the one
> memory buffer to another. Typical memory-to-memory operations are:
> rotation, scaling, colour space conversion or mix of them. In this proposal
> I named such hardware modules a framebuffer processors.
> 
> Embedded SoCs are known to have a number of hardware blocks, which perform
> such operations. They can be used in parallel to the main GPU module to
> offload CPU from processing graphics or video data. One example use of
> such modules is implementing video overlay, which usually requires color
> space conversion from NV12 (or similar) to RGB32 color space and scaling to
> target window size.
> 
> Till now there was no generic, hardware independent API for performing such
> operations. Exynos DRM driver has its own custom extension called IPP
> (Image Post Processing), but frankly speaking, it is over-engineered and not
> really used in open-source. I didn't identify similar API in other DRM
> drivers, besides those which expose complete support for the whole GPU.
> 
> However, the need for common API has been already mentioned on the mailing
> list. Here are some example threads:
> 1. "RFC: hardware accelerated bitblt using dma engine"
> http://www.spinics.net/lists/dri-devel/msg114250.html
> 2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing) 
> subsystem"
> https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
> https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html
> 
> The proposed API is heavily inspired by atomic KMS approach - it is also
> based on DRM objects and their properties. A new DRM object is introduced:
> framebuffer processor (called fbproc for convenience). Such fbproc objects
> have a set of standard DRM properties, which describes the operation to be
> performed by respective hardware module. In typical case those properties
> are a source fb id and rectangle (x, y, width, height) and destination fb
> id and rectangle. Optionally a rotation property can be also specified if
> supported by the given hardware. To perform an operation on image data,
> userspace provides a set of properties and their values for given fbproc
> object in a similar way as object and properties are provided for
> performing atomic page flip / mode setting.
> 
> The proposed API consists of the 3 new ioctls:
> - DRM_IOCTL_MODE_GETFBPROCRESOURCES: to enumerate all available fbproc
>   objects,
> - DRM_IOCTL_MODE_GETFBPROC: to query capabilities of given fbproc object,
> - DRM_IOCTL_MODE_FBPROC: to perform operation described by given property
>   set.
> 
> The proposed API is extensible. Drivers can attach their own, custom
> properties to add support for more advanced picture processing (for example
> blending).
> 
> Please note that this API is intended to be used for simple memory-to-memory
> image processing hardware not the full-blown GPU blitters, which usually
> have more features. Typically blitters provides much more operations beside
> simple pixel copying and operate best if its command queue is controlled from
> respective dedicated code in userspace.
> 
> The patchset consist of 4 parts:
> 1. generic code for DRM core for handling fbproc objects and ioctls
> 2. example, quick conversion of Exynos Rotator driver to fbproc API
> 3. libdrm extensions for handling fbproc objects
> 4. simple example of userspace code for performing 180 degree rotation of the
>framebuffer
> 
> Patches were tested on Exynos 4412-based Odroid U3 board, on top
> of Linux v4.8-rc1 kernel.
> 
> TODO:
> 1. agree on the API shape
> 2. add more documentation, especially to the kernel docs
> 3. add more userspace examples
> 
> Best regards
> Marek Szyprowski
> Samsung R&D Institute Poland
> 
> 
> Marek Szyprowski (2):
>   drm: add support for framebuffer processor objects
>   drm/exynos: register rotator as fbproc instead of custom ipp framework
> 
>  drivers/gpu/drm/Makefile|   3 +-
>  drivers/gpu/drm/drm_atomic.c|   5 +
>  drivers/gpu/drm/drm_crtc.c  |   6 +
>  drivers/gpu/drm/drm_crtc_internal.h |  12 +
>  drivers/gpu/drm/drm_fbproc.c| 754 
> 
>  drivers/gpu/drm/drm_ioctl.c |   3 +
>  drivers/gpu/drm/exynos/Kconfig  |   1 -
>  drivers/gpu/drm/exynos/exynos_drm_drv.c |   3 +-
>  drivers/gpu/drm/exynos/exynos_drm_rotator.c | 353 +++--

[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Christian König
Am 22.08.2016 um 11:44 schrieb Marek Szyprowski:
> Dear all,
>
> This is the initial proposal for extending DRM API with generic support for
> hardware modules, which can be used for processing image data from the one
> memory buffer to another. Typical memory-to-memory operations are:
> rotation, scaling, colour space conversion or mix of them. In this proposal
> I named such hardware modules a framebuffer processors.
>
> Embedded SoCs are known to have a number of hardware blocks, which perform
> such operations. They can be used in parallel to the main GPU module to
> offload CPU from processing graphics or video data. One example use of
> such modules is implementing video overlay, which usually requires color
> space conversion from NV12 (or similar) to RGB32 color space and scaling to
> target window size.
>
> Till now there was no generic, hardware independent API for performing such
> operations. Exynos DRM driver has its own custom extension called IPP
> (Image Post Processing), but frankly speaking, it is over-engineered and not
> really used in open-source. I didn't identify similar API in other DRM
> drivers, besides those which expose complete support for the whole GPU.

Well, there are good reasons why we don't have hardware-independent
command submission.

We already tried approaches like this; they didn't work very well
and are generally a pain to get right.

So my feeling goes in the direction of a NAK, especially since you
didn't explain in this mail why there is a need for a common API.

Regards,
Christian.

>
> However, the need for common API has been already mentioned on the mailing
> list. Here are some example threads:
> 1. "RFC: hardware accelerated bitblt using dma engine"
> http://www.spinics.net/lists/dri-devel/msg114250.html
> 2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing) 
> subsystem"
> https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
> https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html
>
> The proposed API is heavily inspired by atomic KMS approach - it is also
> based on DRM objects and their properties. A new DRM object is introduced:
> framebuffer processor (called fbproc for convenience). Such fbproc objects
> have a set of standard DRM properties, which describes the operation to be
> performed by respective hardware module. In typical case those properties
> are a source fb id and rectangle (x, y, width, height) and destination fb
> id and rectangle. Optionally a rotation property can be also specified if
> supported by the given hardware. To perform an operation on image data,
> userspace provides a set of properties and their values for given fbproc
> object in a similar way as object and properties are provided for
> performing atomic page flip / mode setting.
>
> The proposed API consists of the 3 new ioctls:
> - DRM_IOCTL_MODE_GETFBPROCRESOURCES: to enumerate all available fbproc
>objects,
> - DRM_IOCTL_MODE_GETFBPROC: to query capabilities of given fbproc object,
> - DRM_IOCTL_MODE_FBPROC: to perform operation described by given property
>set.
>
> The proposed API is extensible. Drivers can attach their own, custom
> properties to add support for more advanced picture processing (for example
> blending).
>
> Please note that this API is intended to be used for simple memory-to-memory
> image processing hardware not the full-blown GPU blitters, which usually
> have more features. Typically blitters provides much more operations beside
> simple pixel copying and operate best if its command queue is controlled from
> respective dedicated code in userspace.
>
> The patchset consist of 4 parts:
> 1. generic code for DRM core for handling fbproc objects and ioctls
> 2. example, quick conversion of Exynos Rotator driver to fbproc API
> 3. libdrm extensions for handling fbproc objects
> 4. simple example of userspace code for performing 180 degree rotation of the
> framebuffer
>
> Patches were tested on Exynos 4412-based Odroid U3 board, on top
> of Linux v4.8-rc1 kernel.
>
> TODO:
> 1. agree on the API shape
> 2. add more documentation, especially to the kernel docs
> 3. add more userspace examples
>
> Best regards
> Marek Szyprowski
> Samsung R&D Institute Poland
>
>
> Marek Szyprowski (2):
>drm: add support for framebuffer processor objects
>drm/exynos: register rotator as fbproc instead of custom ipp framework
>
>   drivers/gpu/drm/Makefile|   3 +-
>   drivers/gpu/drm/drm_atomic.c|   5 +
>   drivers/gpu/drm/drm_crtc.c  |   6 +
>   drivers/gpu/drm/drm_crtc_internal.h |  12 +
>   drivers/gpu/drm/drm_fbproc.c| 754 
> 
>   drivers/gpu/drm/drm_ioctl.c |   3 +
>   drivers/gpu/drm/exynos/Kconfig  |   1 -
>   drivers/gpu/drm/exynos/exynos_drm_drv.c |   3 +-
>   drivers/gpu/drm/exynos/exynos_drm_rotator.c | 353 +++--
>   

[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Marek Szyprowski
Dear all,

This is the initial proposal for extending the DRM API with generic support for
hardware modules which can be used for processing image data from one
memory buffer to another. Typical memory-to-memory operations are
rotation, scaling, colour space conversion or a mix of them. In this proposal
I call such hardware modules framebuffer processors.

Embedded SoCs are known to have a number of hardware blocks which perform
such operations. They can be used in parallel to the main GPU module to
offload the CPU from processing graphics or video data. One example use of
such modules is implementing video overlay, which usually requires color
space conversion from NV12 (or similar) to the RGB32 color space and scaling to
the target window size.

Until now there has been no generic, hardware-independent API for performing such
operations. The Exynos DRM driver has its own custom extension called IPP
(Image Post Processing), but frankly speaking, it is over-engineered and not
really used in open source. I didn't identify a similar API in other DRM
drivers, besides those which expose complete support for the whole GPU.

However, the need for a common API has already been mentioned on the mailing
list. Here are some example threads:
1. "RFC: hardware accelerated bitblt using dma engine"
http://www.spinics.net/lists/dri-devel/msg114250.html
2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing) subsystem"
https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html

The proposed API is heavily inspired by the atomic KMS approach - it is also
based on DRM objects and their properties. A new DRM object is introduced:
the framebuffer processor (called fbproc for convenience). Such fbproc objects
have a set of standard DRM properties which describe the operation to be
performed by the respective hardware module. In the typical case those properties
are a source fb id and rectangle (x, y, width, height) and a destination fb
id and rectangle. Optionally a rotation property can also be specified if
supported by the given hardware. To perform an operation on image data,
userspace provides a set of properties and their values for a given fbproc
object in a similar way as objects and properties are provided for
performing an atomic page flip / mode setting.

The proposed API consists of the 3 new ioctls:
- DRM_IOCTL_MODE_GETFBPROCRESOURCES: to enumerate all available fbproc
  objects,
- DRM_IOCTL_MODE_GETFBPROC: to query capabilities of given fbproc object,
- DRM_IOCTL_MODE_FBPROC: to perform operation described by given property
  set.
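
To give a feel for the intended usage, a userspace call sequence might look
roughly like the sketch below. Note that the structure layouts and property
names here are placeholders invented purely for illustration - the real uapi
types, property ids and the DRM_IOCTL_MODE_FBPROC definition come from the
patches themselves, so this sketch only compiles against those headers.

    /* Hypothetical sketch only - layouts and names below are illustrative
     * placeholders, not the actual uapi introduced by this patchset. */
    #include <stdint.h>
    #include <sys/ioctl.h>

    struct fbproc_prop {            /* placeholder property id/value pair */
        uint32_t prop_id;
        uint64_t value;
    };

    struct fbproc_cmd {             /* placeholder argument for the FBPROC ioctl */
        uint32_t fbproc_id;         /* object id from GETFBPROCRESOURCES */
        uint32_t count_props;
        uint64_t props_ptr;         /* user pointer to an array of fbproc_prop */
    };

    /* Rotate src_fb into dst_fb on a given fbproc object. The property ids
     * would be discovered via the usual DRM property queries; the rotation
     * value encoding is whatever the patchset defines. */
    static void fbproc_rotate_180(int drm_fd, uint32_t fbproc_id,
                                  uint32_t prop_src_fb, uint32_t src_fb_id,
                                  uint32_t prop_dst_fb, uint32_t dst_fb_id,
                                  uint32_t prop_rotation)
    {
        struct fbproc_prop props[] = {
            { prop_src_fb,   src_fb_id },
            { prop_dst_fb,   dst_fb_id },
            { prop_rotation, 180 },
        };
        struct fbproc_cmd cmd = {
            .fbproc_id   = fbproc_id,
            .count_props = 3,
            .props_ptr   = (uint64_t)(uintptr_t)props,
        };

        /* DRM_IOCTL_MODE_FBPROC is provided by the patchset's headers. */
        ioctl(drm_fd, DRM_IOCTL_MODE_FBPROC, &cmd);
    }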

The proposed API is extensible. Drivers can attach their own, custom
properties to add support for more advanced picture processing (for example
blending).

Please note that this API is intended to be used for simple memory-to-memory
image processing hardware, not the full-blown GPU blitters, which usually
have more features. Typically blitters provide many more operations besides
simple pixel copying and operate best if their command queue is controlled from
dedicated code in userspace.

The patchset consists of 4 parts:
1. generic code for DRM core for handling fbproc objects and ioctls
2. example, quick conversion of Exynos Rotator driver to fbproc API
3. libdrm extensions for handling fbproc objects
4. simple example of userspace code for performing 180 degree rotation of the
   framebuffer

Patches were tested on Exynos 4412-based Odroid U3 board, on top
of Linux v4.8-rc1 kernel.

TODO:
1. agree on the API shape
2. add more documentation, especially to the kernel docs
3. add more userspace examples

Best regards
Marek Szyprowski
Samsung R&D Institute Poland


Marek Szyprowski (2):
  drm: add support for framebuffer processor objects
  drm/exynos: register rotator as fbproc instead of custom ipp framework

 drivers/gpu/drm/Makefile|   3 +-
 drivers/gpu/drm/drm_atomic.c|   5 +
 drivers/gpu/drm/drm_crtc.c  |   6 +
 drivers/gpu/drm/drm_crtc_internal.h |  12 +
 drivers/gpu/drm/drm_fbproc.c| 754 
 drivers/gpu/drm/drm_ioctl.c |   3 +
 drivers/gpu/drm/exynos/Kconfig  |   1 -
 drivers/gpu/drm/exynos/exynos_drm_drv.c |   3 +-
 drivers/gpu/drm/exynos/exynos_drm_rotator.c | 353 +++--
 drivers/gpu/drm/exynos/exynos_drm_rotator.h |  19 -
 include/drm/drmP.h  |  10 +
 include/drm/drm_crtc.h  | 211 
 include/drm/drm_irq.h   |  14 +
 include/uapi/drm/drm.h  |  13 +
 include/uapi/drm/drm_mode.h |  39 ++
 15 files changed, 1263 insertions(+), 183 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_fbproc.c
 delete mode 100644 drivers/gpu/drm/exynos/exynos_drm_rotator.h

-- 
1.9.1



[RFC 0/2] New feature: Framebuffer processors

2016-08-22 Thread Rob Clark
On Mon, Aug 22, 2016 at 5:59 AM, Christian König
 wrote:
> Am 22.08.2016 um 11:44 schrieb Marek Szyprowski:
>>
>> Dear all,
>>
>> This is the initial proposal for extending DRM API with generic support
>> for
>> hardware modules, which can be used for processing image data from the one
>> memory buffer to another. Typical memory-to-memory operations are:
>> rotation, scaling, colour space conversion or mix of them. In this
>> proposal
>> I named such hardware modules a framebuffer processors.
>>
>> Embedded SoCs are known to have a number of hardware blocks, which perform
>> such operations. They can be used in parallel to the main GPU module to
>> offload CPU from processing graphics or video data. One example use of
>> such modules is implementing video overlay, which usually requires color
>> space conversion from NV12 (or similar) to RGB32 color space and scaling
>> to
>> target window size.
>>
>> Till now there was no generic, hardware independent API for performing
>> such
>> operations. Exynos DRM driver has its own custom extension called IPP
>> (Image Post Processing), but frankly speaking, it is over-engineered and
>> not
>> really used in open-source. I didn't identify similar API in other DRM
>> drivers, besides those which expose complete support for the whole GPU.
>
>
> Well there are good reasons why we don't have hardware independent command
> submission.
>
> We already tried approaches like this and they didn't work very well and
> are generally a pain to get right.
>
> So my feeling goes into the direction of a NAK, especially since you didn't
> explain in this mail why there is need for a common API.

I guess a lot comes down to 'how long before hw designers bolt a CP to
the thing'..  at that point, I think you especially don't want a
per-blit kernel interface.

But either way, if userspace needs/wants a generic 2d blitter API, it
is probably best to start out with defining a common userspace level
API.  That gets you a lot more flexibility to throw it away and start
again once you realize you've painted yourself into a corner.  And it
is something that could be implemented on top of real GPUs in
addition to things that look more like a mem2mem crtc.

Given the length of time kernel uapi must be supported, vs how fast hw
evolves, I'm leaning towards NAK as well.

BR,
-R


> Regards,
> Christian.
>
>
>>
>> However, the need for common API has been already mentioned on the
>> mailing
>> list. Here are some example threads:
>> 1. "RFC: hardware accelerated bitblt using dma engine"
>> http://www.spinics.net/lists/dri-devel/msg114250.html
>> 2. "[PATCH 00/25] Exynos DRM: new life of IPP (Image Post Processing)
>> subsystem"
>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094115.html
>> https://lists.freedesktop.org/archives/dri-devel/2015-November/094533.html
>>
>> The proposed API is heavily inspired by atomic KMS approach - it is also
>> based on DRM objects and their properties. A new DRM object is introduced:
>> framebuffer processor (called fbproc for convenience). Such fbproc objects
>> have a set of standard DRM properties, which describes the operation to be
>> performed by respective hardware module. In typical case those properties
>> are a source fb id and rectangle (x, y, width, height) and destination fb
>> id and rectangle. Optionally a rotation property can be also specified if
>> supported by the given hardware. To perform an operation on image data,
>> userspace provides a set of properties and their values for given fbproc
>> object in a similar way as object and properties are provided for
>> performing atomic page flip / mode setting.
>>
>> The proposed API consists of the 3 new ioctls:
>> - DRM_IOCTL_MODE_GETFBPROCRESOURCES: to enumerate all available fbproc
>>objects,
>> - DRM_IOCTL_MODE_GETFBPROC: to query capabilities of given fbproc object,
>> - DRM_IOCTL_MODE_FBPROC: to perform operation described by given property
>>set.
>>
>> The proposed API is extensible. Drivers can attach their own, custom
>> properties to add support for more advanced picture processing (for
>> example
>> blending).
>>
>> Please note that this API is intended to be used for simple
>> memory-to-memory
>> image processing hardware not the full-blown GPU blitters, which usually
>> have more features. Typically blitters provides much more operations
>> beside
>> simple pixel copying and operate best if its command queue is controlled
>> from
>> respective dedicated code in userspace.
>>
>> The patchset consist of 4 parts:
>> 1. generic code for DRM core for handling fbproc objects and ioctls
>> 2. example, quick conversion of Exynos Rotator driver to fbproc API
>> 3. libdrm extensions for handling fbproc objects
>> 4. simple example of userspace code for performing 180 degree rotation of
>> the
>> framebuffer
>>
>> Patches were tested on Exynos 4412-based Odroid U3 board, on top
>> of Linux v4.8-rc1 kernel.
>>
>> TODO:
>> 1. agree on the API shape
>> 2. add