Re: [FFmpeg-devel] [PATCH V3 1/2] avfilter/vf_gblur: add x86 SIMD optimizations

2019-06-12 Thread Song, Ruiling
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf
> Of Adam Sampson
> Sent: Wednesday, June 12, 2019 8:21 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH V3 1/2] avfilter/vf_gblur: add x86 SIMD
> optimizations
> 
> Hi Ruiling,
> 
> Ruiling Song  writes:
> 
> This breaks the build for me on x86-32 -- the asm helpers in
> vf_gblur.asm are only defined on x86-64, but vf_gblur_init.c expects
> them to exist on both architectures.
> 
> ld: libavfilter/libavfilter.so: undefined reference to `ff_horiz_slice_avx2'
> ld: libavfilter/libavfilter.so: undefined reference to `ff_horiz_slice_sse4'
> collect2: error: ld returned 1 exit status
> 
> Adding "#if ARCH_X86_64" conditionals to vf_gblur_init.c fixes it.
Thank you for reporting this, and sorry for the breakage. Thanks to James for fixing it.

> 
> Thanks,
> 
> --
> Adam Sampson  <http://offog.org/>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH V3 1/2] avfilter/vf_gblur: add x86 SIMD optimizations

2019-06-12 Thread Adam Sampson
Hi Ruiling,

Ruiling Song  writes:

> The horizontal pass runs about 2x faster with the patch
> in single-threaded mode.
[...]
> +++ b/libavfilter/x86/vf_gblur.asm
[...]
> +%if ARCH_X86_64
> +INIT_XMM sse4
> +HORIZ_SLICE
> +
> +INIT_XMM avx2
> +HORIZ_SLICE
> +%endif
[...]
> +++ b/libavfilter/x86/vf_gblur_init.c
[...]
> +void ff_horiz_slice_sse4(float *ptr, int width, int height, int steps, float nu, float bscale);
> +void ff_horiz_slice_avx2(float *ptr, int width, int height, int steps, float nu, float bscale);

This breaks the build for me on x86-32 -- the asm helpers in
vf_gblur.asm are only defined on x86-64, but vf_gblur_init.c expects
them to exist on both architectures.

ld: libavfilter/libavfilter.so: undefined reference to `ff_horiz_slice_avx2'
ld: libavfilter/libavfilter.so: undefined reference to `ff_horiz_slice_sse4'
collect2: error: ld returned 1 exit status

Adding "#if ARCH_X86_64" conditionals to vf_gblur_init.c fixes it.
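
For illustration, the guard could look roughly like the sketch below. It is a
standalone approximation, not the committed fix: the ARCH_X86_64 default of 0,
the has_sse4/has_avx2 parameters and the select_horiz_slice helper are
assumptions made so that it compiles without FFmpeg's config.h or the asm
objects. Only the ff_horiz_slice_* prototypes and the horiz_slice_c name come
from the patch.

```c
/* Assumption: FFmpeg's config.h normally defines ARCH_X86_64 as 0 or 1.
 * Default to 0 here so the sketch links without the asm objects. */
#ifndef ARCH_X86_64
#define ARCH_X86_64 0
#endif

typedef void (*horiz_slice_fn)(float *ptr, int width, int height,
                               int steps, float nu, float bscale);

/* Stand-in for the C fallback from vf_gblur.c (body omitted in this sketch). */
static void horiz_slice_c(float *ptr, int width, int height,
                          int steps, float nu, float bscale)
{
    (void)ptr; (void)width; (void)height;
    (void)steps; (void)nu; (void)bscale;
}

#if ARCH_X86_64
/* Declared only where vf_gblur.asm actually assembles these symbols. */
void ff_horiz_slice_sse4(float *ptr, int width, int height,
                         int steps, float nu, float bscale);
void ff_horiz_slice_avx2(float *ptr, int width, int height,
                         int steps, float nu, float bscale);
#endif

/* Hypothetical helper: pick the fastest available implementation.
 * has_sse4/has_avx2 stand in for the real CPU-flag checks. */
static horiz_slice_fn select_horiz_slice(int has_sse4, int has_avx2)
{
    horiz_slice_fn fn = horiz_slice_c;
#if ARCH_X86_64
    if (has_sse4)
        fn = ff_horiz_slice_sse4;
    if (has_avx2)
        fn = ff_horiz_slice_avx2;
#else
    (void)has_sse4; (void)has_avx2;
#endif
    return fn;
}
```

With the guard in place, a build where ARCH_X86_64 is 0 never references the
asm symbols, so the linker has nothing left unresolved and the C fallback is
used.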

Thanks,

-- 
Adam Sampson  

[FFmpeg-devel] [PATCH V3 1/2] avfilter/vf_gblur: add x86 SIMD optimizations

2019-06-05 Thread Ruiling Song
The horizontal pass runs about 2x faster with the patch
in single-threaded mode.

Tested overall performance with the following commands (AVX2 enabled):
./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
For single thread, the fps improves from 43 to 60, about 40%.
For multi-thread, the fps improves from 110 to 130, about 20%.
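
As a quick check of the rounded percentages, the relative gain can be computed
as below. This is only a sketch of the arithmetic; speedup_percent is a
hypothetical helper, not part of the patch.

```c
/* Relative throughput gain, in percent, when fps goes from
 * `before` to `after`. Hypothetical helper for illustration only. */
static double speedup_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

speedup_percent(43, 60) is roughly 39.5 and speedup_percent(110, 130) is
roughly 18.2, consistent with the rounded figures above.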

v2:
Fix a bug when steps is not one.
v3:
Fix a bug where the upper half of a 64-bit register used to pass
an 'int' argument may contain garbage.
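
To illustrate the v3 issue: on x86-64 a 32-bit 'int' argument arrives in a
64-bit register, and only the low 32 bits are meaningful. The sketch below
models that in C. int_arg_from_reg is a hypothetical helper, and sign-extending
the low half is one common remedy rather than necessarily what the asm in this
patch does.

```c
#include <stdint.h>

/* Hypothetical helper: recover an 'int' argument from a 64-bit register
 * image whose upper 32 bits are unspecified. This mirrors what a
 * sign-extending move does in assembly. */
static int64_t int_arg_from_reg(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;   /* discard the upper half */
    /* The conversion to int32_t wraps on two's-complement targets
     * (implementation-defined before C23). */
    return (int64_t)(int32_t)low;
}
```

For example, a register holding 0xDEADBEEF00000780 still yields 1920 (0x780),
whereas using the full 64-bit value as a width or loop bound would be wildly
wrong.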

Signed-off-by: Ruiling Song 
---
 libavfilter/gblur.h |  55 ++
 libavfilter/vf_gblur.c  |  71 ++--
 libavfilter/x86/Makefile|   2 +
 libavfilter/x86/vf_gblur.asm| 185 
 libavfilter/x86/vf_gblur_init.c |  36 +++
 5 files changed, 310 insertions(+), 39 deletions(-)
 create mode 100644 libavfilter/gblur.h
 create mode 100644 libavfilter/x86/vf_gblur.asm
 create mode 100644 libavfilter/x86/vf_gblur_init.c

diff --git a/libavfilter/gblur.h b/libavfilter/gblur.h
new file mode 100644
index 00..87129801de
--- /dev/null
+++ b/libavfilter/gblur.h
@@ -0,0 +1,55 @@
+/*
+ * Copyright (c) 2011 Pascal Getreuer
+ * Copyright (c) 2016 Paul B Mahol
+ *
+ * Redistribution and use in source and binary forms, with or without modification,
+ * are permitted provided that the following conditions are met:
+ *
+ *  * Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  * Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials provided
+ *    with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef AVFILTER_GBLUR_H
+#define AVFILTER_GBLUR_H
+#include "avfilter.h"
+
+typedef struct GBlurContext {
+const AVClass *class;
+
+float sigma;
+float sigmaV;
+int steps;
+int planes;
+
+int depth;
+int planewidth[4];
+int planeheight[4];
+float *buffer;
+float boundaryscale;
+float boundaryscaleV;
+float postscale;
+float postscaleV;
+float nu;
+float nuV;
+int nb_planes;
+void (*horiz_slice)(float *buffer, int width, int height, int steps, float nu, float bscale);
+} GBlurContext;
+void ff_gblur_init(GBlurContext *s);
+void ff_gblur_init_x86(GBlurContext *s);
+#endif
diff --git a/libavfilter/vf_gblur.c b/libavfilter/vf_gblur.c
index b91a8c074a..e71b33da80 100644
--- a/libavfilter/vf_gblur.c
+++ b/libavfilter/vf_gblur.c
@@ -30,30 +30,10 @@
 #include "libavutil/pixdesc.h"
 #include "avfilter.h"
 #include "formats.h"
+#include "gblur.h"
 #include "internal.h"
 #include "video.h"
 
-typedef struct GBlurContext {
-const AVClass *class;
-
-float sigma;
-float sigmaV;
-int steps;
-int planes;
-
-int depth;
-int planewidth[4];
-int planeheight[4];
-float *buffer;
-float boundaryscale;
-float boundaryscaleV;
-float postscale;
-float postscaleV;
-float nu;
-float nuV;
-int nb_planes;
-} GBlurContext;
-
 #define OFFSET(x) offsetof(GBlurContext, x)
 #define FLAGS AV_OPT_FLAG_VIDEO_PARAM|AV_OPT_FLAG_FILTERING_PARAM
 
@@ -72,39 +52,44 @@ typedef struct ThreadData {
 int width;
 } ThreadData;
 
-static int filter_horizontally(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
+static void horiz_slice_c(float *buffer, int width, int height, int steps,
+  float nu, float bscale)
 {
-GBlurContext *s = ctx->priv;
-ThreadData *td = arg;
-const int height = td->height;
-const int width = td->width;
-const int slice_start = (height *  jobnr   ) / nb_jobs;
-const int slice_end   = (height * (jobnr+1)) / nb_jobs;
-const float boundaryscale = s->boundaryscale;
-const int steps = s->steps;
-const float nu = s->nu;
-float *buffer = s->buffer;
-int y, x, step;
+int step, x, y;
 float *ptr;
-
-/* Filter horizontally along each row */
-for (y = slice_start; y < slice_end; y++) {
+for (y = 0; y < height; y++) {
 for (step = 0; step < steps; step++) {
 ptr = buffer +