Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-14 Thread Song, Ruiling


> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Sunday, April 14, 2019 1:23 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> 
> On 12/04/2019 08:38, Song, Ruiling wrote:
> >>>> +#define RELEASE_KERNEL(k)                                     \
> >>>> +do {                                                          \
> >>>> +    if (k) {                                                  \
> >>>> +        cle = clReleaseKernel(k);                             \
> >>>> +        if (cle != CL_SUCCESS)                                \
> >>>> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "  \
> >>>> +                   "kernel: %d.\n", cle);                     \
> >>>> +    }                                                         \
> >>>> +} while(0)
> >>>
> >>> This appears multiple times here and also in other filters.  Maybe it
> >>> should be a macro in opencl.h like CL_SET_KERNEL_ARG?
> > Hi Mark,
> >
> > I have been rethinking this: can we simply call clReleaseKernel() without
> > checking the input or the returned error code?
> > The OpenCL spec requires implementations to check the input argument, so
> > I think we can drop the NULL check.
> 
> I'm not sure that's true?  The spec allows a CL_INVALID_KERNEL error, but
> doesn't offer any clear indication of when it should be returned (NULL is
> distinguished in other cases, but not here).  Random pointers certainly do
> crash implementations, so they aren't interpreting it as a requirement to
> validate the pointer generally (against some list in the context, say).
Yes, it seems the spec does not clearly say anything about a NULL pointer check.
Because a NULL check is cheap, I assumed every well-written OpenCL driver
would do it.
Maybe you are right; I am not so sure now :(
So we can keep the check as before. I have added a macro for this. Please
take a look at V2 when you have time.
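
For illustration only, a shared helper along these lines might look like the sketch below. It is not the V2 code: the mock_* names are hypothetical stand-ins for cl_kernel, cl_int, CL_SUCCESS and clReleaseKernel() so that it builds without the OpenCL headers, and clearing the handle after release is an extra assumption that the original macro does not make.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without OpenCL headers;
 * in the filter these would be cl_kernel, cl_int, CL_SUCCESS and
 * clReleaseKernel(). */
typedef struct mock_kernel *mock_kernel_t;
typedef int mock_int;
#define MOCK_SUCCESS 0

static int release_calls; /* counts how often the mock release ran */

static mock_int mock_release_kernel(mock_kernel_t k)
{
    (void)k;
    release_calls++;
    return MOCK_SUCCESS;
}

/* Skip NULL handles, log failures, and clear the handle so that a
 * second release of the same variable becomes a no-op (assumption). */
#define RELEASE_KERNEL_SKETCH(k)                                      \
do {                                                                  \
    if (k) {                                                          \
        mock_int err_ = mock_release_kernel(k);                       \
        if (err_ != MOCK_SUCCESS)                                     \
            fprintf(stderr, "Failed to release kernel: %d.\n", err_); \
        (k) = NULL;                                                   \
    }                                                                 \
} while (0)
```

Calling RELEASE_KERNEL_SKETCH() on a NULL handle never reaches the release function, which is the behaviour the explicit check is meant to guarantee regardless of what the driver does.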

Thanks!
Ruiling
> 
> The standard ICD loader does have a null check returning CL_INVALID_KERNEL,
> but there is no requirement that it is used rather than linking to a
> particular ICD directly.
> 
> > Since we are destroying the objects anyway, is it still useful to care
> > about the returned error code?
> 
> Probably not, I agree.
> 
> - Mark
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-13 Thread Mark Thompson
On 12/04/2019 08:38, Song, Ruiling wrote:
>>>> +#define RELEASE_KERNEL(k)                                     \
>>>> +do {                                                          \
>>>> +    if (k) {                                                  \
>>>> +        cle = clReleaseKernel(k);                             \
>>>> +        if (cle != CL_SUCCESS)                                \
>>>> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "  \
>>>> +                   "kernel: %d.\n", cle);                     \
>>>> +    }                                                         \
>>>> +} while(0)
>>>
>>> This appears multiple times here and also in other filters.  Maybe it
>>> should be a macro in opencl.h like CL_SET_KERNEL_ARG?
> Hi Mark,
> 
> I have been rethinking this: can we simply call clReleaseKernel() without
> checking the input or the returned error code?
> The OpenCL spec requires implementations to check the input argument, so
> I think we can drop the NULL check.

I'm not sure that's true?  The spec allows a CL_INVALID_KERNEL error, but 
doesn't offer any clear indication of when it should be returned (NULL is 
distinguished in other cases, but not here).  Random pointers certainly do 
crash implementations, so they aren't interpreting it as a requirement to 
validate the pointer generally (against some list in the context, say).

The standard ICD loader does have a null check returning CL_INVALID_KERNEL, but 
there is no requirement that it is used rather than linking to a particular ICD 
directly.

> Since we are destroying the objects anyway, is it still useful to care
> about the returned error code?

Probably not, I agree.

- Mark

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-12 Thread Song, Ruiling
> > > +#define RELEASE_KERNEL(k)                                     \
> > > +do {                                                          \
> > > +    if (k) {                                                  \
> > > +        cle = clReleaseKernel(k);                             \
> > > +        if (cle != CL_SUCCESS)                                \
> > > +            av_log(avctx, AV_LOG_ERROR, "Failed to release "  \
> > > +                   "kernel: %d.\n", cle);                     \
> > > +    }                                                         \
> > > +} while(0)
> >
> > This appears multiple times here and also in other filters.  Maybe it
> > should be a macro in opencl.h like CL_SET_KERNEL_ARG?
Hi Mark,

I have been rethinking this: can we simply call clReleaseKernel() without
checking the input or the returned error code?
The OpenCL spec requires implementations to check the input argument, so
I think we can drop the NULL check.
Since we are destroying the objects anyway, is it still useful to care
about the returned error code?

Thanks!
Ruiling

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-10 Thread Song, Ruiling


> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Carl Eugen Hoyos
> Sent: Tuesday, April 9, 2019 9:21 PM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> 
> 2019-04-09 4:54 GMT+02:00, Song, Ruiling :
> 
> >> > +kernel void vert_sum(__global uint4 *ii,
> >> > +                     int width,
> >> > +                     int height)
> >> > +{
> >> > +    int x = get_global_id(0);
> >> > +    uint4 sum = 0;
> >> > +    for (int i = 0; i < height; i++) {
> >> > +        ii[i * width + x] += sum;
> >> > +        sum = ii[i * width + x];
> >>
> >> This looks like it might be able to overflow in extreme cases?
> >>
> >> 3840 * 2160 * (1 - 0)^2 * 255 * 255 = 539,343,360,000 which
> >> is a long way out of range for a 32-bit int.  That requires
> >> impossible input (all pixels differing by the most extreme
> >> value), but something like a chequerboard might be of the
> >> same order?
> > Yes, this is a dilemma for me. The filter is already
> > computationally expensive.
> > To fix the overflow, we would have to use 64-bit integers for
> > the integral image, and I think most GPUs are not good at 64-bit
> > integer arithmetic. Maybe we can try that later.
> > So I would prefer to stay with 32-bit integers for now.
> 
> Can the overflow be detected at runtime?
Will add the check.
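
One way such a runtime check could work, sketched in plain C rather than OpenCL C: since unsigned addition wraps, a wrapped sum is smaller than the value it started from. The function names here are hypothetical and this is not the check from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds v to *sum with 32-bit wraparound and reports whether the
 * addition wrapped. Unsigned overflow is well defined in C, so a
 * wrapped result is always smaller than the original value. */
static bool add_u32_checked(uint32_t *sum, uint32_t v)
{
    uint32_t r = *sum + v;
    bool wrapped = r < *sum;
    *sum = r;
    return wrapped;
}

/* Running sum over one row of squared differences. Returns true if any
 * step overflowed 32 bits; the (possibly wrapped) total goes to *out. */
static bool row_sum_overflows(const uint32_t *sq_diff, int n, uint32_t *out)
{
    uint32_t sum = 0;
    bool overflow = false;
    for (int i = 0; i < n; i++)
        overflow |= add_u32_checked(&sum, sq_diff[i]);
    *out = sum;
    return overflow;
}
```

A caller could log a warning once when the flag is set instead of silently producing a wrong integral image.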
> 
> Could the user choose between 32 and 64 bit calculation?
I may mark this as TODO.
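
As a rough CPU reference for what a 64-bit path would compute (an illustrative sketch, not part of the patch; integral_image_u64 is a hypothetical name), the two passes of the quoted kernels amount to a running sum along rows followed by a running sum down columns:

```c
#include <assert.h>
#include <stdint.h>

/* In-place 2D prefix sum over a width*height buffer. On entry each cell
 * holds its own squared difference; on return it holds the sum of all
 * cells above and to the left of it, inclusive. */
static void integral_image_u64(uint64_t *ii, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++)        /* horizontal pass */
        for (int x = 1; x < width; x++)
            ii[y * width + x] += ii[y * width + x - 1];
    for (int x = 0; x < width; x++)         /* vertical pass */
        for (int y = 1; y < height; y++)
            ii[y * width + x] += ii[(y - 1) * width + x];
}
```

With uint64_t accumulators the sums cannot wrap for any realistic frame size, at the cost of twice the memory traffic per cell compared with 32-bit values.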
> 
> Carl Eugen

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-09 Thread Carl Eugen Hoyos
2019-04-09 4:54 GMT+02:00, Song, Ruiling :

>> > +kernel void vert_sum(__global uint4 *ii,
>> > +                     int width,
>> > +                     int height)
>> > +{
>> > +    int x = get_global_id(0);
>> > +    uint4 sum = 0;
>> > +    for (int i = 0; i < height; i++) {
>> > +        ii[i * width + x] += sum;
>> > +        sum = ii[i * width + x];
>>
>> This looks like it might be able to overflow in extreme cases?
>>
>> 3840 * 2160 * (1 - 0)^2 * 255 * 255 = 539,343,360,000 which
>> is a long way out of range for a 32-bit int.  That requires
>> impossible input (all pixels differing by the most extreme
>> value), but something like a chequerboard might be of the
>> same order?
> Yes, this is a dilemma for me. The filter is already
> computationally expensive.
> To fix the overflow, we would have to use 64-bit integers for
> the integral image, and I think most GPUs are not good at 64-bit
> integer arithmetic. Maybe we can try that later.
> So I would prefer to stay with 32-bit integers for now.

Can the overflow be detected at runtime?

Could the user choose between 32 and 64 bit calculation?
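
The figure quoted above can be checked with a few lines of C. This is only a sketch of a static worst-case bound (hypothetical helper names); it assumes every pixel contributes the maximum squared difference of 255 * 255, which real input cannot do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest value a cell of the integral image could reach if every
 * pixel contributed the maximum squared difference of 255 * 255. */
static uint64_t worst_case_sum(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 255u * 255u;
}

/* True if that bound fits in 32 bits, i.e. no frame of this size
 * could overflow a 32-bit integral image. */
static bool fits_in_u32(uint32_t width, uint32_t height)
{
    return worst_case_sum(width, height) <= UINT32_MAX;
}
```

For 3840x2160 the bound is 539,343,360,000, more than 100 times UINT32_MAX, while a 256x256 frame stays within 32 bits even in this worst case. A bound like this could be one input for deciding between a 32-bit and a 64-bit path.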

Carl Eugen

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-08 Thread Song, Ruiling
Thanks for the valuable comments!

> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Tuesday, April 9, 2019 4:26 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> 
> On 01/04/2019 08:52, Ruiling Song wrote:
> > Signed-off-by: Ruiling Song 
> > ---
> > This filter runs about 2x faster on integrated GPU than nlmeans on my
> > Skylake CPU.
> > Anybody like to give some comments?
> 
> Nice!
> 
> >  configure   |   1 +
> >  doc/filters.texi|   4 +
> >  libavfilter/Makefile|   1 +
> >  libavfilter/allfilters.c|   1 +
> >  libavfilter/opencl/nlmeans.cl   | 108 +
> >  libavfilter/opencl_source.h |   1 +
> >  libavfilter/vf_nlmeans_opencl.c | 390 
> >  7 files changed, 506 insertions(+)
> >  create mode 100644 libavfilter/opencl/nlmeans.cl
> >  create mode 100644 libavfilter/vf_nlmeans_opencl.c
> >
> > diff --git a/configure b/configure
> > index f6123f53e5..a233512491 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3460,6 +3460,7 @@ mpdecimate_filter_select="pixelutils"
> >  minterpolate_filter_select="scene_sad"
> >  mptestsrc_filter_deps="gpl"
> >  negate_filter_deps="lut_filter"
> > +nlmeans_opencl_filter_deps="opencl"
> >  nnedi_filter_deps="gpl"
> >  ocr_filter_deps="libtesseract"
> >  ocv_filter_deps="libopencv"
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index 867607d870..21c2c1a4b5 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -19030,6 +19030,10 @@ Apply erosion filter with threshold0 set to 30, threshold1 set 40, threshold2 se
> >  @end example
> >  @end itemize
> >
> > +@section nlmeans_opencl
> > +
> > +Non-local Means denoise filter through OpenCL, this filter accepts same options as @ref{nlmeans}.
> > +
> >  @section overlay_opencl
> >
> >  Overlay one video on top of another.
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index fef6ec5c55..92039bfdcf 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -291,6 +291,7 @@ OBJS-$(CONFIG_MIX_FILTER)+= vf_mix.o
> >  OBJS-$(CONFIG_MPDECIMATE_FILTER) += vf_mpdecimate.o
> >  OBJS-$(CONFIG_NEGATE_FILTER) += vf_lut.o
> >  OBJS-$(CONFIG_NLMEANS_FILTER)+= vf_nlmeans.o
> > +OBJS-$(CONFIG_NLMEANS_OPENCL_FILTER) += vf_nlmeans_opencl.o opencl.o opencl/nlmeans.o
> >  OBJS-$(CONFIG_NNEDI_FILTER)  += vf_nnedi.o
> >  OBJS-$(CONFIG_NOFORMAT_FILTER)   += vf_format.o
> >  OBJS-$(CONFIG_NOISE_FILTER)  += vf_noise.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index c51ae0f3c7..2a6390c92d 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -277,6 +277,7 @@ extern AVFilter ff_vf_mix;
> >  extern AVFilter ff_vf_mpdecimate;
> >  extern AVFilter ff_vf_negate;
> >  extern AVFilter ff_vf_nlmeans;
> > +extern AVFilter ff_vf_nlmeans_opencl;
> >  extern AVFilter ff_vf_nnedi;
> >  extern AVFilter ff_vf_noformat;
> >  extern AVFilter ff_vf_noise;
> > diff --git a/libavfilter/opencl/nlmeans.cl b/libavfilter/opencl/nlmeans.cl
> > new file mode 100644
> > index 00..dcb04834ca
> > --- /dev/null
> > +++ b/libavfilter/opencl/nlmeans.cl
> > @@ -0,0 +1,108 @@
> > +/*
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +const sa

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-08 Thread Mark Thompson
On 01/04/2019 08:52, Ruiling Song wrote:
> Signed-off-by: Ruiling Song 
> ---
> This filter runs about 2x faster on integrated GPU than nlmeans on my Skylake 
> CPU.
> Anybody like to give some comments?

Nice!

>  configure   |   1 +
>  doc/filters.texi|   4 +
>  libavfilter/Makefile|   1 +
>  libavfilter/allfilters.c|   1 +
>  libavfilter/opencl/nlmeans.cl   | 108 +
>  libavfilter/opencl_source.h |   1 +
>  libavfilter/vf_nlmeans_opencl.c | 390 
>  7 files changed, 506 insertions(+)
>  create mode 100644 libavfilter/opencl/nlmeans.cl
>  create mode 100644 libavfilter/vf_nlmeans_opencl.c
> 
> diff --git a/configure b/configure
> index f6123f53e5..a233512491 100755
> --- a/configure
> +++ b/configure
> @@ -3460,6 +3460,7 @@ mpdecimate_filter_select="pixelutils"
>  minterpolate_filter_select="scene_sad"
>  mptestsrc_filter_deps="gpl"
>  negate_filter_deps="lut_filter"
> +nlmeans_opencl_filter_deps="opencl"
>  nnedi_filter_deps="gpl"
>  ocr_filter_deps="libtesseract"
>  ocv_filter_deps="libopencv"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 867607d870..21c2c1a4b5 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -19030,6 +19030,10 @@ Apply erosion filter with threshold0 set to 30, threshold1 set 40, threshold2 se
>  @end example
>  @end itemize
>  
> +@section nlmeans_opencl
> +
> +Non-local Means denoise filter through OpenCL, this filter accepts same options as @ref{nlmeans}.
> +
>  @section overlay_opencl
>  
>  Overlay one video on top of another.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index fef6ec5c55..92039bfdcf 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -291,6 +291,7 @@ OBJS-$(CONFIG_MIX_FILTER)+= vf_mix.o
>  OBJS-$(CONFIG_MPDECIMATE_FILTER) += vf_mpdecimate.o
>  OBJS-$(CONFIG_NEGATE_FILTER) += vf_lut.o
>  OBJS-$(CONFIG_NLMEANS_FILTER)+= vf_nlmeans.o
> +OBJS-$(CONFIG_NLMEANS_OPENCL_FILTER) += vf_nlmeans_opencl.o opencl.o opencl/nlmeans.o
>  OBJS-$(CONFIG_NNEDI_FILTER)  += vf_nnedi.o
>  OBJS-$(CONFIG_NOFORMAT_FILTER)   += vf_format.o
>  OBJS-$(CONFIG_NOISE_FILTER)  += vf_noise.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index c51ae0f3c7..2a6390c92d 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -277,6 +277,7 @@ extern AVFilter ff_vf_mix;
>  extern AVFilter ff_vf_mpdecimate;
>  extern AVFilter ff_vf_negate;
>  extern AVFilter ff_vf_nlmeans;
> +extern AVFilter ff_vf_nlmeans_opencl;
>  extern AVFilter ff_vf_nnedi;
>  extern AVFilter ff_vf_noformat;
>  extern AVFilter ff_vf_noise;
> diff --git a/libavfilter/opencl/nlmeans.cl b/libavfilter/opencl/nlmeans.cl
> new file mode 100644
> index 00..dcb04834ca
> --- /dev/null
> +++ b/libavfilter/opencl/nlmeans.cl
> @@ -0,0 +1,108 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
> +                           CLK_ADDRESS_CLAMP_TO_EDGE   |
> +                           CLK_FILTER_NEAREST);
> +
> +kernel void horiz_sum(__global uint4 *ii,
> +                      __read_only image2d_t src,
> +                      int width,
> +                      int height,
> +                      int4 dx,
> +                      int4 dy)
> +{
> +
> +    int y = get_global_id(0);
> +    int work_size = get_global_size(0);
> +
> +    uint4 sum = (uint4)(0);
> +    float4 s2;
> +    for (int i = 0; i < width; i++) {
> +        float s1 = read_imagef(src, sampler, (int2)(i, y)).x;
> +        s2.x = read_imagef(src, sampler, (int2)(i+dx.x, y+dy.x)).x;
> +        s2.y = read_imagef(src, sampler, (int2)(i+dx.y, y+dy.y)).x;
> +        s2.z = read_imagef(src, sampler, (int2)(i+dx.z, y+dy.z)).x;
> +        s2.w = read_imagef(src, sampler, (int2)(i+dx.w, y+dy.w)).x;
> +        sum += convert_uint4((s1-s2)*(s1-s2) * 255*255);
> +        ii[y * width + i] = sum;
> +    }
> +}
> +
> +kernel void vert_sum(__global uint4 *ii,
> +                     int width,
> +                     int height)
> +{
> +int x = 

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-07 Thread Song, Ruiling


> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> myp...@gmail.com
> Sent: Monday, April 8, 2019 9:37 AM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> 
> On Mon, Apr 8, 2019 at 9:33 AM Song, Ruiling  wrote:
> >
> > > -Original Message-
> > > From: Song, Ruiling
> > > Sent: Monday, April 1, 2019 3:53 PM
> > > To: ffmpeg-devel@ffmpeg.org
> > > Cc: Song, Ruiling 
> > > Subject: [PATCH] lavfi: add nlmeans_opencl filter
> > >
> > > Signed-off-by: Ruiling Song 
> > > ---
> > > > This filter runs about 2x faster on integrated GPU than nlmeans on my
> > > > Skylake CPU.
> > > Anybody like to give some comments?
> >
> > Ping?
> >
> Tested and verified in i5-8265U

Thanks for testing. Comments on the code itself are also welcome.
The performance depends heavily on the research-window parameters and on
the hardware.
You can adjust the parameters to trade off speed against quality.
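
As a rough way to reason about this trade-off, assume the cost grows with the number of candidate offsets in an r x r research window (an assumption for illustration, not a measurement; the helper names are hypothetical):

```c
/* Number of candidate offsets in an r x r research window,
 * excluding the zero offset (the pixel compared with itself). */
static long research_offsets(int r)
{
    return (long)r * r - 1;
}

/* Estimated work ratio between two research window sizes, assuming
 * cost is proportional to the number of offsets. */
static double relative_cost(int r_a, int r_b)
{
    return (double)research_offsets(r_a) / (double)research_offsets(r_b);
}
```

Under that assumption, going from r=15 to r=7 cuts the number of offsets from 224 to 48, roughly a 4.7x reduction in work.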

Thanks!
Ruiling
> 
> OpenCL CPU/pocl 1.2fps with 1080P input
> OpenCL GPU/intel NEO 1.2 fps with 1080P input

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-07 Thread myp...@gmail.com
On Mon, Apr 8, 2019 at 9:33 AM Song, Ruiling  wrote:
>
> > -Original Message-
> > From: Song, Ruiling
> > Sent: Monday, April 1, 2019 3:53 PM
> > To: ffmpeg-devel@ffmpeg.org
> > Cc: Song, Ruiling 
> > Subject: [PATCH] lavfi: add nlmeans_opencl filter
> >
> > Signed-off-by: Ruiling Song 
> > ---
> > This filter runs about 2x faster on integrated GPU than nlmeans on my
> > Skylake CPU.
> > Anybody like to give some comments?
>
> Ping?
>
Tested and verified on an i5-8265U:

OpenCL CPU/pocl: 1.2 fps with 1080p input
OpenCL GPU/Intel NEO: 1.2 fps with 1080p input

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-07 Thread Song, Ruiling
> -Original Message-
> From: Song, Ruiling
> Sent: Monday, April 1, 2019 3:53 PM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Song, Ruiling 
> Subject: [PATCH] lavfi: add nlmeans_opencl filter
> 
> Signed-off-by: Ruiling Song 
> ---
> This filter runs about 2x faster on integrated GPU than nlmeans on my Skylake
> CPU.
> Anybody like to give some comments?

Ping?


Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-01 Thread Song, Ruiling



> Can you supply some detailed performance data?

On my i7-6770HQ, nlmeans takes 1.2 s to process one 1080p frame,
and nlmeans_opencl takes 500 ms per frame.
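
Put as throughput, those timings work out as follows (a trivial sketch; the helper names are made up for this example):

```c
/* Frames per second for a given per-frame processing time in seconds. */
static double fps_from_frame_time(double seconds_per_frame)
{
    return 1.0 / seconds_per_frame;
}

/* How many times faster the accelerated path is than the baseline. */
static double speedup(double baseline_s, double accelerated_s)
{
    return baseline_s / accelerated_s;
}
```

With 1.2 s and 0.5 s per frame this gives about 0.83 fps versus 2 fps, a speedup of 2.4x.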

Ruiling

Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter

2019-04-01 Thread myp...@gmail.com
On Mon, Apr 1, 2019 at 3:53 PM Ruiling Song  wrote:

> Signed-off-by: Ruiling Song 
> ---
> This filter runs about 2x faster on integrated GPU than nlmeans on my
> Skylake CPU.
> Anybody like to give some comments?
>
> Ruiling
>
>  configure   |   1 +
>  doc/filters.texi|   4 +
>  libavfilter/Makefile|   1 +
>  libavfilter/allfilters.c|   1 +
>  libavfilter/opencl/nlmeans.cl   | 108 +
>  libavfilter/opencl_source.h |   1 +
>  libavfilter/vf_nlmeans_opencl.c | 390 
>  7 files changed, 506 insertions(+)
>  create mode 100644 libavfilter/opencl/nlmeans.cl
>  create mode 100644 libavfilter/vf_nlmeans_opencl.c
>
> diff --git a/configure b/configure
> index f6123f53e5..a233512491 100755
> --- a/configure
> +++ b/configure
> @@ -3460,6 +3460,7 @@ mpdecimate_filter_select="pixelutils"
>  minterpolate_filter_select="scene_sad"
>  mptestsrc_filter_deps="gpl"
>  negate_filter_deps="lut_filter"
> +nlmeans_opencl_filter_deps="opencl"
>  nnedi_filter_deps="gpl"
>  ocr_filter_deps="libtesseract"
>  ocv_filter_deps="libopencv"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 867607d870..21c2c1a4b5 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -19030,6 +19030,10 @@ Apply erosion filter with threshold0 set to 30, threshold1 set 40, threshold2 se
>  @end example
>  @end itemize
>
> +@section nlmeans_opencl
> +
> +Non-local Means denoise filter through OpenCL, this filter accepts same options as @ref{nlmeans}.
> +
>  @section overlay_opencl
>
>  Overlay one video on top of another.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index fef6ec5c55..92039bfdcf 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -291,6 +291,7 @@ OBJS-$(CONFIG_MIX_FILTER)+= vf_mix.o
>  OBJS-$(CONFIG_MPDECIMATE_FILTER) += vf_mpdecimate.o
>  OBJS-$(CONFIG_NEGATE_FILTER) += vf_lut.o
>  OBJS-$(CONFIG_NLMEANS_FILTER)+= vf_nlmeans.o
> +OBJS-$(CONFIG_NLMEANS_OPENCL_FILTER) += vf_nlmeans_opencl.o opencl.o opencl/nlmeans.o
>  OBJS-$(CONFIG_NNEDI_FILTER)  += vf_nnedi.o
>  OBJS-$(CONFIG_NOFORMAT_FILTER)   += vf_format.o
>  OBJS-$(CONFIG_NOISE_FILTER)  += vf_noise.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index c51ae0f3c7..2a6390c92d 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -277,6 +277,7 @@ extern AVFilter ff_vf_mix;
>  extern AVFilter ff_vf_mpdecimate;
>  extern AVFilter ff_vf_negate;
>  extern AVFilter ff_vf_nlmeans;
> +extern AVFilter ff_vf_nlmeans_opencl;
>  extern AVFilter ff_vf_nnedi;
>  extern AVFilter ff_vf_noformat;
>  extern AVFilter ff_vf_noise;
> diff --git a/libavfilter/opencl/nlmeans.cl b/libavfilter/opencl/nlmeans.cl
> new file mode 100644
> index 00..dcb04834ca
> --- /dev/null
> +++ b/libavfilter/opencl/nlmeans.cl
> @@ -0,0 +1,108 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
> +                           CLK_ADDRESS_CLAMP_TO_EDGE   |
> +                           CLK_FILTER_NEAREST);
> +
> +kernel void horiz_sum(__global uint4 *ii,
> +                      __read_only image2d_t src,
> +                      int width,
> +                      int height,
> +                      int4 dx,
> +                      int4 dy)
> +{
> +
> +    int y = get_global_id(0);
> +    int work_size = get_global_size(0);
> +
> +    uint4 sum = (uint4)(0);
> +    float4 s2;
> +    for (int i = 0; i < width; i++) {
> +        float s1 = read_imagef(src, sampler, (int2)(i, y)).x;
> +        s2.x = read_imagef(src, sampler, (int2)(i+dx.x, y+dy.x)).x;
> +        s2.y = read_imagef(src, sampler, (int2)(i+dx.y, y+dy.y)).x;
> +        s2.z = read_imagef(src, sampler, (int2)(i+dx.z, y+dy.z)).x;
> +        s2.w = read_imagef(src, sampler, (int2)(i+dx.w, y+dy.w)).x;
> +        sum += convert_uint4((s1-s2)*(s1-s2) * 255*255);
> +        ii[y * width + i] = sum;
> +    }
> +}
> +
> +kernel void vert_sum(__global uint4 *ii,
> +                     int width,
> +                     int height)
> +{
> +