Re: [FFmpeg-devel] [PATCH] avfilter/vf_gblur: add x86 SIMD optimizations

2019-06-01 Thread Song, Ruiling
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf
> Of Carl Eugen Hoyos
> Sent: Saturday, June 1, 2019 6:12 AM
> To: FFmpeg development discussions and patches  de...@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH] avfilter/vf_gblur: add x86 SIMD
> optimizations
> 
> On Thu, May 30, 2019 at 05:46, Ruiling Song wrote:
> >
> > For details of the implementation, please refer to the comment
> > inlined in the assembly code.
> 
> This sentence sounds unneeded to me.
> 
> > It improves the horizontal pass
> > performance about 100% under single thread.
> 
> I am not a native speaker but I wonder what a "100% speed
> improvement" could mean...
It means the horizontal pass runs twice as fast, i.e. a 50% reduction in running time.
For example, one horizontal pass per frame previously took 12 ms; it now takes 6 ms.
Any comments on the assembly code?
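To make the two phrasings concrete: throughput is inversely proportional to running time, so a 100% throughput gain and a 50% time reduction describe the same change. A small C sketch using the 12 ms and 6 ms figures from the reply (the helper names are made up for illustration):

```c
#include <assert.h>

/* Throughput is proportional to 1/time, so moving from old_ms to new_ms
 * per frame raises throughput by (old_ms / new_ms - 1) * 100 percent. */
static double throughput_gain_pct(double old_ms, double new_ms)
{
    assert(old_ms > 0.0 && new_ms > 0.0);
    return (old_ms / new_ms - 1.0) * 100.0;
}

/* The same change expressed as a reduction in running time. */
static double time_reduction_pct(double old_ms, double new_ms)
{
    assert(old_ms > 0.0 && new_ms > 0.0);
    return (1.0 - new_ms / old_ms) * 100.0;
}
```

With these helpers, throughput_gain_pct(12.0, 6.0) yields 100.0 and time_reduction_pct(12.0, 6.0) yields 50.0; both values are exact in double precision.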

> 
> > Tested overall performance using the commands (AVX2 enabled):
> > ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
> > ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
> > For single thread, the fps improves from 43 to 60, about 40%.
> > For multi-thread, the fps improves from 110 to 130, about 20%.
> 
> Carl Eugen
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] avfilter/vf_gblur: add x86 SIMD optimizations

2019-05-31 Thread Carl Eugen Hoyos
On Thu, May 30, 2019 at 05:46, Ruiling Song wrote:
>
> For details of the implementation, please refer to the comment
> inlined in the assembly code.

This sentence sounds unneeded to me.

> It improves the horizontal pass
> performance about 100% under single thread.

I am not a native speaker but I wonder what a "100% speed
improvement" could mean...

> Tested overall performance using the commands (AVX2 enabled):
> ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
> ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
> For single thread, the fps improves from 43 to 60, about 40%.
> For multi-thread, the fps improves from 110 to 130, about 20%.

Carl Eugen
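The fps figures in the patch description can be sanity-checked the same way. The sketch below (hypothetical helper names) converts fps to milliseconds per frame and computes the relative gains. The last helper is only a back-of-envelope estimate and rests on an assumption the thread does not make explicitly: that the horizontal pass is the only part that changed between the two single-threaded runs and that it became exactly twice as fast.

```c
#include <assert.h>

/* Milliseconds spent per frame at a given frame rate. */
static double fps_to_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Relative frame-rate gain in percent. */
static double fps_gain_pct(double old_fps, double new_fps)
{
    assert(old_fps > 0.0 && new_fps > 0.0);
    return (new_fps / old_fps - 1.0) * 100.0;
}

/* Back-of-envelope estimate (assumption: only the horizontal pass changed).
 * If that pass took h ms and became `speedup` times faster, the time saved
 * per frame is h * (1 - 1 / speedup); solve for h. */
static double implied_old_pass_ms(double old_fps, double new_fps, double speedup)
{
    assert(speedup > 1.0);
    double saved_ms = fps_to_ms(old_fps) - fps_to_ms(new_fps);
    return saved_ms / (1.0 - 1.0 / speedup);
}
```

Under that assumption, the single-threaded numbers (43 to 60 fps, a gain of roughly 39.5%) imply a horizontal pass of about 13 ms per frame before the change, in the same range as the 12 ms quoted in the reply; the multi-threaded gain (110 to 130 fps) works out to about 18%.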

Re: [FFmpeg-devel] [PATCH] avfilter/vf_gblur: add x86 SIMD optimizations

2019-05-30 Thread Song, Ruiling


> -----Original Message-----
> From: Paul B Mahol [mailto:one...@gmail.com]
> Sent: Thursday, May 30, 2019 3:24 PM
> To: FFmpeg development discussions and patches  de...@ffmpeg.org>
> Cc: Song, Ruiling 
> Subject: Re: [FFmpeg-devel] [PATCH] avfilter/vf_gblur: add x86 SIMD
> optimizations
> 
> On 5/30/19, Ruiling Song  wrote:
> > For details of the implementation, please refer to the comment
> > inlined in the assembly code. It improves the horizontal pass
> > performance about 100% under single thread.
> >
> > Tested overall performance using the commands (AVX2 enabled):
> > ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
> > ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
> > For single thread, the fps improves from 43 to 60, about 40%.
> > For multi-thread, the fps improves from 110 to 130, about 20%.
> >
> > Signed-off-by: Ruiling Song 
> > ---
> >  libavfilter/gblur.h             |  54 ++
> >  libavfilter/vf_gblur.c          |  66 +---
> >  libavfilter/x86/Makefile        |   2 +
> >  libavfilter/x86/vf_gblur.asm    | 182
> >  libavfilter/x86/vf_gblur_init.c |  36 +++
> >  5 files changed, 302 insertions(+), 38 deletions(-)
> >  create mode 100644 libavfilter/gblur.h
> >  create mode 100644 libavfilter/x86/vf_gblur.asm
> >  create mode 100644 libavfilter/x86/vf_gblur_init.c

[...]
> > diff --git a/libavfilter/vf_gblur.c b/libavfilter/vf_gblur.c
> > index b91a8c074a..4e876bca05 100644
> > --- a/libavfilter/vf_gblur.c
> > +++ b/libavfilter/vf_gblur.c
> > @@ -30,29 +30,11 @@
> >  #include "libavutil/pixdesc.h"
> >  #include "avfilter.h"
> >  #include "formats.h"
> > +#include "gblur.h"
> >  #include "internal.h"
> >  #include "video.h"
> > +#include 
> 
> Is this header really needed?
Oh, this is not needed; I forgot to remove it after experimenting with SSE
intrinsics.
I will remove it. Thanks!

Ruiling

Re: [FFmpeg-devel] [PATCH] avfilter/vf_gblur: add x86 SIMD optimizations

2019-05-30 Thread Paul B Mahol
On 5/30/19, Ruiling Song  wrote:
> For details of the implementation, please refer to the comment
> inlined in the assembly code. It improves the horizontal pass
> performance about 100% under single thread.
>
> Tested overall performance using the commands (AVX2 enabled):
> ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
> ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
> For single thread, the fps improves from 43 to 60, about 40%.
> For multi-thread, the fps improves from 110 to 130, about 20%.
>
> Signed-off-by: Ruiling Song 
> ---
>  libavfilter/gblur.h             |  54 ++
>  libavfilter/vf_gblur.c          |  66 +---
>  libavfilter/x86/Makefile        |   2 +
>  libavfilter/x86/vf_gblur.asm    | 182
>  libavfilter/x86/vf_gblur_init.c |  36 +++
>  5 files changed, 302 insertions(+), 38 deletions(-)
>  create mode 100644 libavfilter/gblur.h
>  create mode 100644 libavfilter/x86/vf_gblur.asm
>  create mode 100644 libavfilter/x86/vf_gblur_init.c
>
> diff --git a/libavfilter/gblur.h b/libavfilter/gblur.h
> new file mode 100644
> index 00..97217044d0
> --- /dev/null
> +++ b/libavfilter/gblur.h
> @@ -0,0 +1,54 @@
> +/*
> + * Copyright (c) 2011 Pascal Getreuer
> + * Copyright (c) 2016 Paul B Mahol
> + *
> + * Redistribution and use in source and binary forms, with or without modification,
> + * are permitted provided that the following conditions are met:
> + *
> + *  * Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + *  * Redistributions in binary form must reproduce the above
> + *copyright notice, this list of conditions and the following
> + *disclaimer in the documentation and/or other materials provided
> + *with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * HOLDER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
> + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
> + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
> + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
> + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef AVFILTER_GBLUR_H
> +#define AVFILTER_GBLUR_H
> +#include "avfilter.h"
> +
> +typedef struct GBlurContext {
> +    const AVClass *class;
> +
> +    float sigma;
> +    float sigmaV;
> +    int steps;
> +    int planes;
> +
> +    int depth;
> +    int planewidth[4];
> +    int planeheight[4];
> +    float *buffer;
> +    float boundaryscale;
> +    float boundaryscaleV;
> +    float postscale;
> +    float postscaleV;
> +    float nu;
> +    float nuV;
> +    int nb_planes;
> +    void (*horiz_slice)(float *buffer, int width, int height, int steps, float nu, float bscale);
> +} GBlurContext;
> +void ff_gblur_init_x86(GBlurContext *s);
> +#endif
> diff --git a/libavfilter/vf_gblur.c b/libavfilter/vf_gblur.c
> index b91a8c074a..4e876bca05 100644
> --- a/libavfilter/vf_gblur.c
> +++ b/libavfilter/vf_gblur.c
> @@ -30,29 +30,11 @@
>  #include "libavutil/pixdesc.h"
>  #include "avfilter.h"
>  #include "formats.h"
> +#include "gblur.h"
>  #include "internal.h"
>  #include "video.h"
> +#include 

Is this header really needed?
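The gblur.h hunk above turns the horizontal pass into a horiz_slice function pointer and declares ff_gblur_init_x86(), the usual FFmpeg arrangement in which generic code installs a portable C routine and an architecture-specific init overrides it when the CPU supports a faster version. A minimal standalone sketch of that pattern follows. All names in it are hypothetical; the CPU flags are passed in as a plain int so the sketch stays testable, whereas FFmpeg code normally obtains them from av_get_cpu_flags(), and the "AVX2" entry merely forwards to the C version in place of real assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the horiz_slice pointer declared in gblur.h. */
typedef void (*horiz_slice_fn)(float *buffer, int width, int height,
                               int steps, float nu, float bscale);

/* Hypothetical stand-in for the one GBlurContext field used here. */
typedef struct SketchContext {
    horiz_slice_fn horiz_slice;
} SketchContext;

#define SKETCH_CPU_FLAG_AVX2 0x1   /* hypothetical flag bit */

/* Portable reference: recursive blur applied to each row in turn. */
static void sketch_horiz_slice_c(float *buffer, int width, int height,
                                 int steps, float nu, float bscale)
{
    for (int y = 0; y < height; y++) {
        float *ptr = buffer + (size_t)width * y;
        for (int step = 0; step < steps; step++) {
            int x;
            ptr[0] *= bscale;
            for (x = 1; x < width; x++)
                ptr[x] += nu * ptr[x - 1];
            x = width - 1;
            ptr[x] *= bscale;
            for (; x > 0; x--)
                ptr[x - 1] += nu * ptr[x];
        }
    }
}

/* Placeholder for an assembly routine; forwards to C so the sketch links. */
static void sketch_horiz_slice_avx2(float *buffer, int width, int height,
                                    int steps, float nu, float bscale)
{
    sketch_horiz_slice_c(buffer, width, height, steps, nu, bscale);
}

/* Architecture-specific init: overrides only when the feature is present. */
static void sketch_init_x86(SketchContext *s, int cpu_flags)
{
    if (cpu_flags & SKETCH_CPU_FLAG_AVX2)
        s->horiz_slice = sketch_horiz_slice_avx2;
}

/* Generic init: install the C fallback first, then let x86 override it. */
static void sketch_init(SketchContext *s, int cpu_flags)
{
    s->horiz_slice = sketch_horiz_slice_c;
    sketch_init_x86(s, cpu_flags);
}
```

Keeping the C routine as the default means the filter still works on CPUs or builds without the SIMD path; the init function only swaps the pointer.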

>
> -typedef struct GBlurContext {
> -    const AVClass *class;
> -
> -    float sigma;
> -    float sigmaV;
> -    int steps;
> -    int planes;
> -
> -    int depth;
> -    int planewidth[4];
> -    int planeheight[4];
> -    float *buffer;
> -    float boundaryscale;
> -    float boundaryscaleV;
> -    float postscale;
> -    float postscaleV;
> -    float nu;
> -    float nuV;
> -    int nb_planes;
> -} GBlurContext;
>
>  #define OFFSET(x) offsetof(GBlurContext, x)
>  #define FLAGS AV_OPT_FLAG_VIDEO_PARAM|AV_OPT_FLAG_FILTERING_PARAM
> @@ -72,39 +54,44 @@ typedef struct ThreadData {
>      int width;
>  } ThreadData;
>
> -static int filter_horizontally(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
> +static void horiz_slice_c(float *buffer, int width, int height, int steps,
> +                          float nu, float bscale)
>  {
> -    GBlurContext *s = ctx->priv;
> -    ThreadData *td = arg;
> -    const int height = td->height;
> -    const int width = td->width;
> -    const int slice_start = (height *  jobnr   ) / nb_jobs;
> -    const int slice_end   = (height * (jobnr+1)) / nb_jobs;
> -    const float boundaryscale = s->boundaryscale;
> -    const int steps = s->steps;
> -    const float nu = s->nu;
> -    float *buffer = s->buffer;
> -    int