Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Sunday, April 14, 2019 1:23 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
>
> On 12/04/2019 08:38, Song, Ruiling wrote:
> >>>> +#define RELEASE_KERNEL(k)                                     \
> >>>> +do {                                                          \
> >>>> +    if (k) {                                                  \
> >>>> +        cle = clReleaseKernel(k);                             \
> >>>> +        if (cle != CL_SUCCESS)                                \
> >>>> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "  \
> >>>> +                   "kernel: %d.\n", cle);                     \
> >>>> +    }                                                         \
> >>>> +} while(0)
> >>>
> >>> This appears multiple times here and also in other filters.  Maybe it
> >>> should be a macro in opencl.h like CL_SET_KERNEL_ARG?
> > Hi Mark,
> >
> > I have been rethinking this problem: can we simply call clReleaseKernel()
> > without checking the input or the error code?
> > The OpenCL spec requires implementations to check the input argument, so
> > I think we can drop the if-null check.
>
> I'm not sure that's true?  The spec allows a CL_INVALID_KERNEL error, but
> doesn't offer any clear indication of when it should be returned (NULL is
> distinguished in other cases, but not here).  Random pointers certainly do
> crash implementations, so they aren't interpreting it as a requirement to
> validate the pointer generally (against some list in the context, say).

Yes, it seems the spec does not say anything clear about the null pointer
check. Because a null pointer check is cheap, I thought every well-written
OpenCL driver would do it. Maybe you are right; I am not quite sure now :(
So we can keep the check as before. I have added the macro to do this.
Please take a look at V2 when you have time.

Thanks!
Ruiling

>
> The standard ICD loader does have a null check returning CL_INVALID_KERNEL,
> but there is no requirement that it is used rather than linking to a
> particular ICD directly.
>
> > As we are destroying the objects, is it still useful to care about the
> > error code returned?
>
> Probably not, I agree.
>
> - Mark
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
On 12/04/2019 08:38, Song, Ruiling wrote:
>>>> +#define RELEASE_KERNEL(k)                                     \
>>>> +do {                                                          \
>>>> +    if (k) {                                                  \
>>>> +        cle = clReleaseKernel(k);                             \
>>>> +        if (cle != CL_SUCCESS)                                \
>>>> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "  \
>>>> +                   "kernel: %d.\n", cle);                     \
>>>> +    }                                                         \
>>>> +} while(0)
>>>
>>> This appears multiple times here and also in other filters.  Maybe it
>>> should be a macro in opencl.h like CL_SET_KERNEL_ARG?
> Hi Mark,
>
> I have been rethinking this problem: can we simply call clReleaseKernel()
> without checking the input or the error code?
> The OpenCL spec requires implementations to check the input argument, so
> I think we can drop the if-null check.

I'm not sure that's true?  The spec allows a CL_INVALID_KERNEL error, but
doesn't offer any clear indication of when it should be returned (NULL is
distinguished in other cases, but not here).  Random pointers certainly do
crash implementations, so they aren't interpreting it as a requirement to
validate the pointer generally (against some list in the context, say).

The standard ICD loader does have a null check returning CL_INVALID_KERNEL,
but there is no requirement that it is used rather than linking to a
particular ICD directly.

> As we are destroying the objects, is it still useful to care about the
> error code returned?

Probably not, I agree.

- Mark
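To make the null-check discussion concrete, here is a minimal sketch of what a shared release macro could look like. It is not the macro from V2 of the patch. To stay buildable without an OpenCL SDK it uses stand-ins (fake_kernel, fake_release_kernel, RELEASE_KERNEL_SKETCH), all of which are hypothetical names and not part of OpenCL or FFmpeg. Resetting the handle to NULL after release is an extra assumption, not something proposed in the thread.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for cl_kernel / clReleaseKernel(), so this sketch
 * compiles without OpenCL headers. They are NOT real OpenCL API. */
typedef struct fake_kernel { int refcount; } fake_kernel;
typedef int fake_status;
#define FAKE_SUCCESS        0
#define FAKE_INVALID_KERNEL (-48)

static int release_calls = 0;   /* counts how often the release ran */

static fake_status fake_release_kernel(fake_kernel *k)
{
    release_calls++;
    if (k->refcount <= 0)
        return FAKE_INVALID_KERNEL;
    k->refcount--;
    return FAKE_SUCCESS;
}

/* Null-guarded release: the guard lives in the caller, so nothing depends on
 * whether a driver or ICD loader validates the handle itself. */
#define RELEASE_KERNEL_SKETCH(k)                                          \
    do {                                                                  \
        if (k) {                                                          \
            fake_status err_ = fake_release_kernel(k);                    \
            if (err_ != FAKE_SUCCESS)                                     \
                fprintf(stderr, "Failed to release kernel: %d.\n", err_); \
            (k) = NULL; /* assumption: makes a second release a no-op */  \
        }                                                                 \
    } while (0)
```

Keeping the guard on the caller side matches Mark's point that a NULL check in the ICD loader cannot be relied on when an application links to a particular ICD directly.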
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> > > +#define RELEASE_KERNEL(k)                                     \
> > > +do {                                                          \
> > > +    if (k) {                                                  \
> > > +        cle = clReleaseKernel(k);                             \
> > > +        if (cle != CL_SUCCESS)                                \
> > > +            av_log(avctx, AV_LOG_ERROR, "Failed to release "  \
> > > +                   "kernel: %d.\n", cle);                     \
> > > +    }                                                         \
> > > +} while(0)
> >
> > This appears multiple times here and also in other filters.  Maybe it
> > should be a macro in opencl.h like CL_SET_KERNEL_ARG?

Hi Mark,

I have been rethinking this problem: can we simply call clReleaseKernel()
without checking the input or the error code?
The OpenCL spec requires implementations to check the input argument, so
I think we can drop the if-null check.
As we are destroying the objects, is it still useful to care about the
error code returned?

Thanks!
Ruiling
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Carl Eugen Hoyos
> Sent: Tuesday, April 9, 2019 9:21 PM
> To: FFmpeg development discussions and patches
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
>
> 2019-04-09 4:54 GMT+02:00, Song, Ruiling :
>
> >> > +kernel void vert_sum(__global uint4 *ii,
> >> > +                     int width,
> >> > +                     int height)
> >> > +{
> >> > +    int x = get_global_id(0);
> >> > +    uint4 sum = 0;
> >> > +    for (int i = 0; i < height; i++) {
> >> > +        ii[i * width + x] += sum;
> >> > +        sum = ii[i * width + x];
> >>
> >> This looks like it might be able to overflow in extreme cases?
> >>
> >> 3840 * 2160 * (1 - 0)^2 * 255 * 255 = 539,343,360,000 which
> >> is a long way out of range for a 32-bit int.  That requires
> >> impossible input (all pixels differing by the most extreme
> >> value), but something like a chequerboard might be of the
> >> same order?
> > Yes, this is a dilemma for me. Generally the filter has a
> > high computation cost.
> > To fix the overflow, we have to use 64-bit integers for the
> > integral image. Most GPUs are not good at 64-bit integer
> > calculation, I think. Maybe we can try later.
> > So I would prefer to stay with 32-bit integers for a while.
>
> Can the overflow be detected at runtime?

Will add the check.

>
> Could the user choose between 32 and 64 bit calculation?

I may mark this as TODO.

>
> Carl Eugen
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
2019-04-09 4:54 GMT+02:00, Song, Ruiling :

>> > +kernel void vert_sum(__global uint4 *ii,
>> > +                     int width,
>> > +                     int height)
>> > +{
>> > +    int x = get_global_id(0);
>> > +    uint4 sum = 0;
>> > +    for (int i = 0; i < height; i++) {
>> > +        ii[i * width + x] += sum;
>> > +        sum = ii[i * width + x];
>>
>> This looks like it might be able to overflow in extreme cases?
>>
>> 3840 * 2160 * (1 - 0)^2 * 255 * 255 = 539,343,360,000 which
>> is a long way out of range for a 32-bit int.  That requires
>> impossible input (all pixels differing by the most extreme
>> value), but something like a chequerboard might be of the
>> same order?
> Yes, this is a dilemma for me. Generally the filter has a
> high computation cost.
> To fix the overflow, we have to use 64-bit integers for the
> integral image. Most GPUs are not good at 64-bit integer
> calculation, I think. Maybe we can try later.
> So I would prefer to stay with 32-bit integers for a while.

Can the overflow be detected at runtime?

Could the user choose between 32 and 64 bit calculation?

Carl Eugen
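The worst-case figure quoted above can be turned into a simple host-side check. The sketch below is only one possible reading of "detect the overflow at runtime": it is a conservative bound computed from the frame size, not a check of the actual pixel data, and it is not taken from any version of the patch. The names MAX_SQ_DIFF, worst_case_integral_sum and integral_may_overflow_u32 are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Largest per-pixel term: a full-range 8-bit difference, squared (65025). */
#define MAX_SQ_DIFF (255u * 255u)

/* Upper bound on the bottom-right entry of the integral image, computed in
 * 64 bits. Assumes frame dimensions small enough that this product itself
 * fits in 64 bits, which holds for any realistic video size. */
static uint64_t worst_case_integral_sum(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * MAX_SQ_DIFF;
}

/* True if a 32-bit integral image could wrap for this frame size. */
static bool integral_may_overflow_u32(uint32_t width, uint32_t height)
{
    return worst_case_integral_sum(width, height) > UINT32_MAX;
}
```

For 3840x2160 the bound reproduces the 539,343,360,000 from the review. Under this bound a 32-bit accumulator is only guaranteed safe up to 66,051 pixels, so a check of this kind would flag almost every real frame; a check on the actual sums would be far less pessimistic.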
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
Thanks for the valuable comments!

> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Tuesday, April 9, 2019 4:26 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
>
> On 01/04/2019 08:52, Ruiling Song wrote:
> > Signed-off-by: Ruiling Song
> > ---
> > This filter runs about 2x faster on integrated GPU than nlmeans on my
> > Skylake CPU.
> > Anybody like to give some comments?
>
> Nice!
>
> >  configure                       |   1 +
> >  doc/filters.texi                |   4 +
> >  libavfilter/Makefile            |   1 +
> >  libavfilter/allfilters.c        |   1 +
> >  libavfilter/opencl/nlmeans.cl   | 108 +
> >  libavfilter/opencl_source.h     |   1 +
> >  libavfilter/vf_nlmeans_opencl.c | 390
> >  7 files changed, 506 insertions(+)
> >  create mode 100644 libavfilter/opencl/nlmeans.cl
> >  create mode 100644 libavfilter/vf_nlmeans_opencl.c
> >
> > diff --git a/configure b/configure
> > index f6123f53e5..a233512491 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3460,6 +3460,7 @@ mpdecimate_filter_select="pixelutils"
> >  minterpolate_filter_select="scene_sad"
> >  mptestsrc_filter_deps="gpl"
> >  negate_filter_deps="lut_filter"
> > +nlmeans_opencl_filter_deps="opencl"
> >  nnedi_filter_deps="gpl"
> >  ocr_filter_deps="libtesseract"
> >  ocv_filter_deps="libopencv"
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index 867607d870..21c2c1a4b5 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -19030,6 +19030,10 @@ Apply erosion filter with threshold0 set to 30, threshold1 set 40, threshold2 se
> >  @end example
> >  @end itemize
> >
> > +@section nlmeans_opencl
> > +
> > +Non-local Means denoise filter through OpenCL, this filter accepts same options as @ref{nlmeans}.
> > +
> >  @section overlay_opencl
> >
> >  Overlay one video on top of another.
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index fef6ec5c55..92039bfdcf 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -291,6 +291,7 @@ OBJS-$(CONFIG_MIX_FILTER) += vf_mix.o
> >  OBJS-$(CONFIG_MPDECIMATE_FILTER) += vf_mpdecimate.o
> >  OBJS-$(CONFIG_NEGATE_FILTER) += vf_lut.o
> >  OBJS-$(CONFIG_NLMEANS_FILTER) += vf_nlmeans.o
> > +OBJS-$(CONFIG_NLMEANS_OPENCL_FILTER) += vf_nlmeans_opencl.o opencl.o opencl/nlmeans.o
> >  OBJS-$(CONFIG_NNEDI_FILTER) += vf_nnedi.o
> >  OBJS-$(CONFIG_NOFORMAT_FILTER) += vf_format.o
> >  OBJS-$(CONFIG_NOISE_FILTER) += vf_noise.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index c51ae0f3c7..2a6390c92d 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -277,6 +277,7 @@ extern AVFilter ff_vf_mix;
> >  extern AVFilter ff_vf_mpdecimate;
> >  extern AVFilter ff_vf_negate;
> >  extern AVFilter ff_vf_nlmeans;
> > +extern AVFilter ff_vf_nlmeans_opencl;
> >  extern AVFilter ff_vf_nnedi;
> >  extern AVFilter ff_vf_noformat;
> >  extern AVFilter ff_vf_noise;
> > diff --git a/libavfilter/opencl/nlmeans.cl b/libavfilter/opencl/nlmeans.cl
> > new file mode 100644
> > index 00..dcb04834ca
> > --- /dev/null
> > +++ b/libavfilter/opencl/nlmeans.cl
> > @@ -0,0 +1,108 @@
> > +/*
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +const sa
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
On 01/04/2019 08:52, Ruiling Song wrote:
> Signed-off-by: Ruiling Song
> ---
> This filter runs about 2x faster on integrated GPU than nlmeans on my
> Skylake CPU.
> Anybody like to give some comments?

Nice!

>  configure                       |   1 +
>  doc/filters.texi                |   4 +
>  libavfilter/Makefile            |   1 +
>  libavfilter/allfilters.c        |   1 +
>  libavfilter/opencl/nlmeans.cl   | 108 +
>  libavfilter/opencl_source.h     |   1 +
>  libavfilter/vf_nlmeans_opencl.c | 390
>  7 files changed, 506 insertions(+)
>  create mode 100644 libavfilter/opencl/nlmeans.cl
>  create mode 100644 libavfilter/vf_nlmeans_opencl.c
>
> diff --git a/configure b/configure
> index f6123f53e5..a233512491 100755
> --- a/configure
> +++ b/configure
> @@ -3460,6 +3460,7 @@ mpdecimate_filter_select="pixelutils"
>  minterpolate_filter_select="scene_sad"
>  mptestsrc_filter_deps="gpl"
>  negate_filter_deps="lut_filter"
> +nlmeans_opencl_filter_deps="opencl"
>  nnedi_filter_deps="gpl"
>  ocr_filter_deps="libtesseract"
>  ocv_filter_deps="libopencv"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 867607d870..21c2c1a4b5 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -19030,6 +19030,10 @@ Apply erosion filter with threshold0 set to 30, threshold1 set 40, threshold2 se
>  @end example
>  @end itemize
>
> +@section nlmeans_opencl
> +
> +Non-local Means denoise filter through OpenCL, this filter accepts same options as @ref{nlmeans}.
> +
>  @section overlay_opencl
>
>  Overlay one video on top of another.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index fef6ec5c55..92039bfdcf 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -291,6 +291,7 @@ OBJS-$(CONFIG_MIX_FILTER) += vf_mix.o
>  OBJS-$(CONFIG_MPDECIMATE_FILTER) += vf_mpdecimate.o
>  OBJS-$(CONFIG_NEGATE_FILTER) += vf_lut.o
>  OBJS-$(CONFIG_NLMEANS_FILTER) += vf_nlmeans.o
> +OBJS-$(CONFIG_NLMEANS_OPENCL_FILTER) += vf_nlmeans_opencl.o opencl.o opencl/nlmeans.o
>  OBJS-$(CONFIG_NNEDI_FILTER) += vf_nnedi.o
>  OBJS-$(CONFIG_NOFORMAT_FILTER) += vf_format.o
>  OBJS-$(CONFIG_NOISE_FILTER) += vf_noise.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index c51ae0f3c7..2a6390c92d 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -277,6 +277,7 @@ extern AVFilter ff_vf_mix;
>  extern AVFilter ff_vf_mpdecimate;
>  extern AVFilter ff_vf_negate;
>  extern AVFilter ff_vf_nlmeans;
> +extern AVFilter ff_vf_nlmeans_opencl;
>  extern AVFilter ff_vf_nnedi;
>  extern AVFilter ff_vf_noformat;
>  extern AVFilter ff_vf_noise;
> diff --git a/libavfilter/opencl/nlmeans.cl b/libavfilter/opencl/nlmeans.cl
> new file mode 100644
> index 00..dcb04834ca
> --- /dev/null
> +++ b/libavfilter/opencl/nlmeans.cl
> @@ -0,0 +1,108 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
> +                           CLK_ADDRESS_CLAMP_TO_EDGE   |
> +                           CLK_FILTER_NEAREST);
> +
> +kernel void horiz_sum(__global uint4 *ii,
> +                      __read_only image2d_t src,
> +                      int width,
> +                      int height,
> +                      int4 dx,
> +                      int4 dy)
> +{
> +
> +    int y = get_global_id(0);
> +    int work_size = get_global_size(0);
> +
> +    uint4 sum = (uint4)(0);
> +    float4 s2;
> +    for (int i = 0; i < width; i++) {
> +        float s1 = read_imagef(src, sampler, (int2)(i, y)).x;
> +        s2.x = read_imagef(src, sampler, (int2)(i+dx.x, y+dy.x)).x;
> +        s2.y = read_imagef(src, sampler, (int2)(i+dx.y, y+dy.y)).x;
> +        s2.z = read_imagef(src, sampler, (int2)(i+dx.z, y+dy.z)).x;
> +        s2.w = read_imagef(src, sampler, (int2)(i+dx.w, y+dy.w)).x;
> +        sum += convert_uint4((s1-s2)*(s1-s2) * 255*255);
> +        ii[y * width + i] = sum;
> +    }
> +}
> +
> +kernel void vert_sum(__global uint4 *ii,
> +                     int width,
> +                     int height)
> +{
> +    int x =
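The horiz_sum kernel, together with the vert_sum loop quoted elsewhere in this thread, builds an integral image in two passes: a running sum along each row, then a running sum down each column. A scalar CPU sketch of the same idea may make that easier to follow. It is a simplification, not the patch's code: the kernels handle four (dx, dy) offsets at once in a uint4, while this version handles one, and it takes the already-scaled squared differences as input instead of sampling an image. The names horiz_sum_cpu and vert_sum_cpu are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Pass 1: running sum along each row (CPU analogue of horiz_sum).
 * sq_diff holds one already-scaled squared difference per pixel. */
static void horiz_sum_cpu(uint32_t *ii, const uint32_t *sq_diff,
                          int width, int height)
{
    for (int y = 0; y < height; y++) {
        uint32_t sum = 0;
        for (int x = 0; x < width; x++) {
            sum += sq_diff[(size_t)y * width + x];
            ii[(size_t)y * width + x] = sum;
        }
    }
}

/* Pass 2: running sum down each column, same loop body as vert_sum.
 * After this pass, ii[y * width + x] is the sum over the rectangle
 * from (0, 0) to (x, y) inclusive. */
static void vert_sum_cpu(uint32_t *ii, int width, int height)
{
    for (int x = 0; x < width; x++) {
        uint32_t sum = 0;
        for (int y = 0; y < height; y++) {
            ii[(size_t)y * width + x] += sum;
            sum = ii[(size_t)y * width + x];
        }
    }
}
```

As in the kernels, the accumulator is 32 bits wide, so the sums wrap silently on overflow; that is the concern raised in the review.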
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> myp...@gmail.com
> Sent: Monday, April 8, 2019 9:37 AM
> To: FFmpeg development discussions and patches
> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
>
> On Mon, Apr 8, 2019 at 9:33 AM Song, Ruiling wrote:
> >
> > > -----Original Message-----
> > > From: Song, Ruiling
> > > Sent: Monday, April 1, 2019 3:53 PM
> > > To: ffmpeg-devel@ffmpeg.org
> > > Cc: Song, Ruiling
> > > Subject: [PATCH] lavfi: add nlmeans_opencl filter
> > >
> > > Signed-off-by: Ruiling Song
> > > ---
> > > This filter runs about 2x faster on integrated GPU than nlmeans on my
> > > Skylake CPU.
> > > Anybody like to give some comments?
> >
> > Ping?
> >
> Tested and verified in i5-8265U

Thanks for the testing. Comments about the code itself are also welcome.
The performance data depends heavily on the research-window parameters and
on the hardware. You can play with the parameters to trade off speed
against quality.

Thanks!
Ruiling

>
> OpenCL CPU/pocl 1.2fps with 1080P input
> OpenCL GPU/intel NEO 1.2 fps with 1080P input
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
On Mon, Apr 8, 2019 at 9:33 AM Song, Ruiling wrote:
>
> > -----Original Message-----
> > From: Song, Ruiling
> > Sent: Monday, April 1, 2019 3:53 PM
> > To: ffmpeg-devel@ffmpeg.org
> > Cc: Song, Ruiling
> > Subject: [PATCH] lavfi: add nlmeans_opencl filter
> >
> > Signed-off-by: Ruiling Song
> > ---
> > This filter runs about 2x faster on integrated GPU than nlmeans on my
> > Skylake CPU.
> > Anybody like to give some comments?
>
> Ping?
>
Tested and verified in i5-8265U

OpenCL CPU/pocl 1.2fps with 1080P input
OpenCL GPU/intel NEO 1.2 fps with 1080P input
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> -----Original Message-----
> From: Song, Ruiling
> Sent: Monday, April 1, 2019 3:53 PM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Song, Ruiling
> Subject: [PATCH] lavfi: add nlmeans_opencl filter
>
> Signed-off-by: Ruiling Song
> ---
> This filter runs about 2x faster on integrated GPU than nlmeans on my
> Skylake CPU.
> Anybody like to give some comments?

Ping?
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
> Can you supply some detailed performance data?

On my i7-6770HQ, nlmeans takes 1.2 s to process one 1080p frame, and
nlmeans_opencl takes 500 ms to process one frame.

Ruiling
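For reference, the two timings above convert to a speedup and a frame rate as follows. This is plain arithmetic on the numbers in this message; the helper names speedup and frames_per_second are made up for the example.

```c
/* Ratio of per-frame times: how many times faster the candidate is. */
static double speedup(double baseline_seconds, double candidate_seconds)
{
    return baseline_seconds / candidate_seconds;
}

/* Throughput implied by a per-frame processing time. */
static double frames_per_second(double seconds_per_frame)
{
    return 1.0 / seconds_per_frame;
}
```

With 1.2 s and 0.5 s per frame this gives a speedup of 2.4, in line with the "about 2x" in the patch description, and frame rates of roughly 0.83 fps for nlmeans and 2 fps for nlmeans_opencl on that machine.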
Re: [FFmpeg-devel] [PATCH] lavfi: add nlmeans_opencl filter
On Mon, Apr 1, 2019 at 3:53 PM Ruiling Song wrote:

> Signed-off-by: Ruiling Song
> ---
> This filter runs about 2x faster on integrated GPU than nlmeans on my
> Skylake CPU.
> Anybody like to give some comments?
>
> Ruiling
>
>  configure                       |   1 +
>  doc/filters.texi                |   4 +
>  libavfilter/Makefile            |   1 +
>  libavfilter/allfilters.c        |   1 +
>  libavfilter/opencl/nlmeans.cl   | 108 +
>  libavfilter/opencl_source.h     |   1 +
>  libavfilter/vf_nlmeans_opencl.c | 390
>  7 files changed, 506 insertions(+)
>  create mode 100644 libavfilter/opencl/nlmeans.cl
>  create mode 100644 libavfilter/vf_nlmeans_opencl.c
>
> diff --git a/configure b/configure
> index f6123f53e5..a233512491 100755
> --- a/configure
> +++ b/configure
> @@ -3460,6 +3460,7 @@ mpdecimate_filter_select="pixelutils"
>  minterpolate_filter_select="scene_sad"
>  mptestsrc_filter_deps="gpl"
>  negate_filter_deps="lut_filter"
> +nlmeans_opencl_filter_deps="opencl"
>  nnedi_filter_deps="gpl"
>  ocr_filter_deps="libtesseract"
>  ocv_filter_deps="libopencv"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 867607d870..21c2c1a4b5 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -19030,6 +19030,10 @@ Apply erosion filter with threshold0 set to 30, threshold1 set 40, threshold2 se
>  @end example
>  @end itemize
>
> +@section nlmeans_opencl
> +
> +Non-local Means denoise filter through OpenCL, this filter accepts same options as @ref{nlmeans}.
> +
>  @section overlay_opencl
>
>  Overlay one video on top of another.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index fef6ec5c55..92039bfdcf 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -291,6 +291,7 @@ OBJS-$(CONFIG_MIX_FILTER) += vf_mix.o
>  OBJS-$(CONFIG_MPDECIMATE_FILTER) += vf_mpdecimate.o
>  OBJS-$(CONFIG_NEGATE_FILTER) += vf_lut.o
>  OBJS-$(CONFIG_NLMEANS_FILTER) += vf_nlmeans.o
> +OBJS-$(CONFIG_NLMEANS_OPENCL_FILTER) += vf_nlmeans_opencl.o opencl.o opencl/nlmeans.o
>  OBJS-$(CONFIG_NNEDI_FILTER) += vf_nnedi.o
>  OBJS-$(CONFIG_NOFORMAT_FILTER) += vf_format.o
>  OBJS-$(CONFIG_NOISE_FILTER) += vf_noise.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index c51ae0f3c7..2a6390c92d 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -277,6 +277,7 @@ extern AVFilter ff_vf_mix;
>  extern AVFilter ff_vf_mpdecimate;
>  extern AVFilter ff_vf_negate;
>  extern AVFilter ff_vf_nlmeans;
> +extern AVFilter ff_vf_nlmeans_opencl;
>  extern AVFilter ff_vf_nnedi;
>  extern AVFilter ff_vf_noformat;
>  extern AVFilter ff_vf_noise;
> diff --git a/libavfilter/opencl/nlmeans.cl b/libavfilter/opencl/nlmeans.cl
> new file mode 100644
> index 00..dcb04834ca
> --- /dev/null
> +++ b/libavfilter/opencl/nlmeans.cl
> @@ -0,0 +1,108 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
> +                           CLK_ADDRESS_CLAMP_TO_EDGE   |
> +                           CLK_FILTER_NEAREST);
> +
> +kernel void horiz_sum(__global uint4 *ii,
> +                      __read_only image2d_t src,
> +                      int width,
> +                      int height,
> +                      int4 dx,
> +                      int4 dy)
> +{
> +
> +    int y = get_global_id(0);
> +    int work_size = get_global_size(0);
> +
> +    uint4 sum = (uint4)(0);
> +    float4 s2;
> +    for (int i = 0; i < width; i++) {
> +        float s1 = read_imagef(src, sampler, (int2)(i, y)).x;
> +        s2.x = read_imagef(src, sampler, (int2)(i+dx.x, y+dy.x)).x;
> +        s2.y = read_imagef(src, sampler, (int2)(i+dx.y, y+dy.y)).x;
> +        s2.z = read_imagef(src, sampler, (int2)(i+dx.z, y+dy.z)).x;
> +        s2.w = read_imagef(src, sampler, (int2)(i+dx.w, y+dy.w)).x;
> +        sum += convert_uint4((s1-s2)*(s1-s2) * 255*255);
> +        ii[y * width + i] = sum;
> +    }
> +}
> +
> +kernel void vert_sum(__global uint4 *ii,
> +                     int width,
> +                     int height)
> +{
> +