Re: [FFmpeg-devel] [PATCH V2] lavf: add transpose_opencl filter

2018-12-04 Thread Mark Thompson
On 04/12/2018 07:31, Song, Ruiling wrote:
>> -Original Message-
>> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
>> Mark Thompson
>> Sent: Monday, December 3, 2018 8:10 AM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH V2] lavf: add transpose_opencl filter
>>
>> On 28/11/2018 02:27, Ruiling Song wrote:
>>> Signed-off-by: Ruiling Song 
>>> ---
>>>  configure |   1 +
>>>  libavfilter/Makefile  |   1 +
>>>  libavfilter/allfilters.c  |   1 +
>>>  libavfilter/opencl/transpose.cl   |  35 +
>>>  libavfilter/opencl_source.h   |   1 +
>>>  libavfilter/transpose.h   |  34 +
>>>  libavfilter/vf_transpose.c|  14 +-
>>>  libavfilter/vf_transpose_opencl.c | 288
>> ++
>>>  8 files changed, 362 insertions(+), 13 deletions(-)
>>>  create mode 100644 libavfilter/opencl/transpose.cl
>>>  create mode 100644 libavfilter/transpose.h
>>>  create mode 100644 libavfilter/vf_transpose_opencl.c
>>
>> Testing the passthrough option here reveals a slightly unfortunate 
>> interaction
>> with mapping - if this is the only filter in use, then not doing a redundant 
>> copy
>> can fall over.
>>
>> For example, on Rockchip (Mali) decoding with rkmpp then using:
>>
>> -vf
>> hwmap=derive_device=opencl,transpose_opencl=dir=clock:passthrough=landsc
>> ape,hwdownload,format=nv12
>>
>> fails at the download in the passthrough case because it doesn't allow the 
>> read
>> (the extension does explicitly document this constraint -
>> <https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_import_m
>> emory.txt>).
>>
>> VAAPI has a similar problem with a decode followed by:
>>
>> -vf
>> hwmap=derive_device=opencl,transpose_opencl,hwmap=derive_device=vaapi:r
>> everse=1
>>
>> because the reverse mapping tries to replace the inlink hw_frames_ctx in a 
>> way
>> which doesn't actually work.
>>
>> All of these cases do of course work if anything else is in the way - any 
>> additional
>> opencl filter on either side makes it work.  I think it's fine to ignore 
>> this (after all,
>> the hwmap immediately followed by hwdownload case can already fail in the
>> same way), but any thoughts you have on making that better are welcome.
> I also noticed that when I did testing. Currently have no idea on how to fix 
> it.
> But I do have interest to look for a better fix for this issue.
> Right now I am still struggling to understand the source code of hwmap.
> I didn't figure out how the hwmap will be used to map from software to 
> hardware format.
> That is the piece of code starting from line 200 in vf_hwmap.c
> https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_hwmap.c#L200
> Could you show me some example command that would go into this branch?

It's the non-unmap case of the second mode in 
<http://ffmpeg.org/ffmpeg-filters.html#hwmap>.  An API which offers software 
mapping can provide a mapped frame to the previous component to use as its 
output, which may then be able to avoid a redundant copy that would happen if 
hwupload were used.

For a slightly artificial example where the difference due to the removed copy 
is very visible, compare:

$ ./ffmpeg_g -y -init_hw_device vaapi=v:/dev/dri/renderD128 -filter_hw_device v 
-filter_complex 
'haldclutsrc=level=8:rate=30,format=rgb0,hwupload,scale_vaapi=format=nv12' -c:v 
h264_vaapi -frames:v 1 out.mp4
frame=1 fps=1089
$ ./ffmpeg_g -y -init_hw_device vaapi=v:/dev/dri/renderD128 -filter_hw_device v 
-filter_complex 
'haldclutsrc=level=8:rate=30,format=rgb0,hwmap,scale_vaapi=format=nv12' -c:v 
h264_vaapi -frames:v 1 out.mp4
frame=1 fps=1391

Thanks,

- Mark
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH V2] lavf: add transpose_opencl filter

2018-12-03 Thread Song, Ruiling


> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Monday, December 3, 2018 8:10 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH V2] lavf: add transpose_opencl filter
> 
> On 28/11/2018 02:27, Ruiling Song wrote:
> > Signed-off-by: Ruiling Song 
> > ---
> >  configure |   1 +
> >  libavfilter/Makefile  |   1 +
> >  libavfilter/allfilters.c  |   1 +
> >  libavfilter/opencl/transpose.cl   |  35 +
> >  libavfilter/opencl_source.h   |   1 +
> >  libavfilter/transpose.h   |  34 +
> >  libavfilter/vf_transpose.c|  14 +-
> >  libavfilter/vf_transpose_opencl.c | 288
> ++
> >  8 files changed, 362 insertions(+), 13 deletions(-)
> >  create mode 100644 libavfilter/opencl/transpose.cl
> >  create mode 100644 libavfilter/transpose.h
> >  create mode 100644 libavfilter/vf_transpose_opencl.c
> 
> Testing the passthrough option here reveals a slightly unfortunate interaction
> with mapping - if this is the only filter in use, then not doing a redundant 
> copy
> can fall over.
> 
> For example, on Rockchip (Mali) decoding with rkmpp then using:
> 
> -vf
> hwmap=derive_device=opencl,transpose_opencl=dir=clock:passthrough=landsc
> ape,hwdownload,format=nv12
> 
> fails at the download in the passthrough case because it doesn't allow the 
> read
> (the extension does explicitly document this constraint -
> <https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_import_m
> emory.txt>).
> 
> VAAPI has a similar problem with a decode followed by:
> 
> -vf
> hwmap=derive_device=opencl,transpose_opencl,hwmap=derive_device=vaapi:r
> everse=1
> 
> because the reverse mapping tries to replace the inlink hw_frames_ctx in a way
> which doesn't actually work.
> 
> All of these cases do of course work if anything else is in the way - any 
> additional
> opencl filter on either side makes it work.  I think it's fine to ignore this 
> (after all,
> the hwmap immediately followed by hwdownload case can already fail in the
> same way), but any thoughts you have on making that better are welcome.
I also noticed that when I did testing. Currently have no idea on how to fix it.
But I do have interest to look for a better fix for this issue.
Right now I am still struggling to understand the source code of hwmap.
I didn't figure out how the hwmap will be used to map from software to hardware 
format.
That is the piece of code starting from line 200 in vf_hwmap.c
https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_hwmap.c#L200
Could you show me some example command that would go into this branch?

Thanks!
Ruiling
> 
> 
> >> Does the dependency on dir have any effect on speed here?  Any call is only
> ever
> >> going to use one side of each of the dir cases, so it feels like it might 
> >> be nicer
> to
> >> hard-code that so they aren't included in the compiled code at all.
> > For such memory bound OpenCL kernel, some little more arithmetic operation
> would not affect the overall performance.
> > I did some more testing, and see no obvious performance difference for
> different 'dir' parameter. So I just keep it as now.
> 
> That makes sense, thank you for checking.
> 
> 
> So, LGTM and applied.
> 
> Thanks,
> 
> - Mark
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH V2] lavf: add transpose_opencl filter

2018-12-02 Thread Mark Thompson
On 28/11/2018 02:27, Ruiling Song wrote:
> Signed-off-by: Ruiling Song 
> ---
>  configure |   1 +
>  libavfilter/Makefile  |   1 +
>  libavfilter/allfilters.c  |   1 +
>  libavfilter/opencl/transpose.cl   |  35 +
>  libavfilter/opencl_source.h   |   1 +
>  libavfilter/transpose.h   |  34 +
>  libavfilter/vf_transpose.c|  14 +-
>  libavfilter/vf_transpose_opencl.c | 288 
> ++
>  8 files changed, 362 insertions(+), 13 deletions(-)
>  create mode 100644 libavfilter/opencl/transpose.cl
>  create mode 100644 libavfilter/transpose.h
>  create mode 100644 libavfilter/vf_transpose_opencl.c

Testing the passthrough option here reveals a slightly unfortunate interaction 
with mapping - if this is the only filter in use, then not doing a redundant 
copy can fall over.

For example, on Rockchip (Mali) decoding with rkmpp then using:

-vf 
hwmap=derive_device=opencl,transpose_opencl=dir=clock:passthrough=landscape,hwdownload,format=nv12

fails at the download in the passthrough case because it doesn't allow the read 
(the extension does explicitly document this constraint - 
).

VAAPI has a similar problem with a decode followed by:

-vf 
hwmap=derive_device=opencl,transpose_opencl,hwmap=derive_device=vaapi:reverse=1

because the reverse mapping tries to replace the inlink hw_frames_ctx in a way 
which doesn't actually work.

All of these cases do of course work if anything else is in the way - any 
additional opencl filter on either side makes it work.  I think it's fine to 
ignore this (after all, the hwmap immediately followed by hwdownload case can 
already fail in the same way), but any thoughts you have on making that better 
are welcome.


>> Does the dependency on dir have any effect on speed here?  Any call is only 
>> ever
>> going to use one side of each of the dir cases, so it feels like it might be 
>> nicer to
>> hard-code that so they aren't included in the compiled code at all.
> For such memory bound OpenCL kernel, some little more arithmetic operation 
> would not affect the overall performance.
> I did some more testing, and see no obvious performance difference for 
> different 'dir' parameter. So I just keep it as now.

That makes sense, thank you for checking.


So, LGTM and applied.

Thanks,

- Mark
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH V2] lavf: add transpose_opencl filter

2018-11-27 Thread Ruiling Song
Signed-off-by: Ruiling Song 
---
 configure |   1 +
 libavfilter/Makefile  |   1 +
 libavfilter/allfilters.c  |   1 +
 libavfilter/opencl/transpose.cl   |  35 +
 libavfilter/opencl_source.h   |   1 +
 libavfilter/transpose.h   |  34 +
 libavfilter/vf_transpose.c|  14 +-
 libavfilter/vf_transpose_opencl.c | 288 ++
 8 files changed, 362 insertions(+), 13 deletions(-)
 create mode 100644 libavfilter/opencl/transpose.cl
 create mode 100644 libavfilter/transpose.h
 create mode 100644 libavfilter/vf_transpose_opencl.c

diff --git a/configure b/configure
index b4f944c..dcb3f5f 100755
--- a/configure
+++ b/configure
@@ -3479,6 +3479,7 @@ tinterlace_merge_test_deps="tinterlace_filter"
 tinterlace_pad_test_deps="tinterlace_filter"
 tonemap_filter_deps="const_nan"
 tonemap_opencl_filter_deps="opencl const_nan"
+transpose_opencl_filter_deps="opencl"
 unsharp_opencl_filter_deps="opencl"
 uspp_filter_deps="gpl avcodec"
 vaguedenoiser_filter_deps="gpl"
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 1895fa2..6e26581 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -393,6 +393,7 @@ OBJS-$(CONFIG_TONEMAP_OPENCL_FILTER) += 
vf_tonemap_opencl.o colorspace.o
 OBJS-$(CONFIG_TPAD_FILTER)   += vf_tpad.o
 OBJS-$(CONFIG_TRANSPOSE_FILTER)  += vf_transpose.o
 OBJS-$(CONFIG_TRANSPOSE_NPP_FILTER)  += vf_transpose_npp.o cuda_check.o
+OBJS-$(CONFIG_TRANSPOSE_OPENCL_FILTER)   += vf_transpose_opencl.o opencl.o 
opencl/transpose.o
 OBJS-$(CONFIG_TRIM_FILTER)   += trim.o
 OBJS-$(CONFIG_UNPREMULTIPLY_FILTER)  += vf_premultiply.o framesync.o
 OBJS-$(CONFIG_UNSHARP_FILTER)+= vf_unsharp.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 837c99e..a600069 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -372,6 +372,7 @@ extern AVFilter ff_vf_tonemap_opencl;
 extern AVFilter ff_vf_tpad;
 extern AVFilter ff_vf_transpose;
 extern AVFilter ff_vf_transpose_npp;
+extern AVFilter ff_vf_transpose_opencl;
 extern AVFilter ff_vf_trim;
 extern AVFilter ff_vf_unpremultiply;
 extern AVFilter ff_vf_unsharp;
diff --git a/libavfilter/opencl/transpose.cl b/libavfilter/opencl/transpose.cl
new file mode 100644
index 000..e6388ab
--- /dev/null
+++ b/libavfilter/opencl/transpose.cl
@@ -0,0 +1,35 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+kernel void transpose(__write_only image2d_t dst,
+  __read_only image2d_t src,
+  int dir) {
+const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
+   CLK_ADDRESS_CLAMP_TO_EDGE   |
+   CLK_FILTER_NEAREST);
+
+int2 size = get_image_dim(dst);
+int x = get_global_id(0);
+int y = get_global_id(1);
+
+int xin = (dir & 2) ? (size.y - 1 - y) : y;
+int yin = (dir & 1) ? (size.x - 1 - x) : x;
+float4 data = read_imagef(src, sampler, (int2)(xin, yin));
+
+if (x < size.x && y < size.y)
+write_imagef(dst, (int2)(x, y), data);
+}
diff --git a/libavfilter/opencl_source.h b/libavfilter/opencl_source.h
index 2f67d89..4118138 100644
--- a/libavfilter/opencl_source.h
+++ b/libavfilter/opencl_source.h
@@ -25,6 +25,7 @@ extern const char *ff_opencl_source_convolution;
 extern const char *ff_opencl_source_neighbor;
 extern const char *ff_opencl_source_overlay;
 extern const char *ff_opencl_source_tonemap;
+extern const char *ff_opencl_source_transpose;
 extern const char *ff_opencl_source_unsharp;
 
 #endif /* AVFILTER_OPENCL_SOURCE_H */
diff --git a/libavfilter/transpose.h b/libavfilter/transpose.h
new file mode 100644
index 000..d4bb4da
--- /dev/null
+++ b/libavfilter/transpose.h
@@ -0,0 +1,34 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied