2018-01-17 21:13 GMT+01:00 Martin Vignali:
Hello,
New patch attached, with modifications to average, grain extract, multiply, screen and grain merge.
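For reference, at opacity 1 the five modes touched by this patch reduce to simple per-pixel formulas. The sketch below writes them out in scalar C, following our reading of the C reference in libavfilter/vf_blend.c. The names (`clip_u8`, `blend_average`, `blend_row`, etc.) are hypothetical, and the SIMD versions only have to reproduce these results, not this code.

```c
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int x)
{
    return (uint8_t)(x < 0 ? 0 : x > 255 ? 255 : x);
}

/* Per-pixel formulas at opacity 1 (a = top, b = bottom), matching our
 * reading of the 8-bit C reference; integer division truncates. */
static uint8_t blend_average(int a, int b)      { return (uint8_t)((a + b) / 2); }
static uint8_t blend_grainextract(int a, int b) { return clip_u8(a - b + 128); }
static uint8_t blend_grainmerge(int a, int b)   { return clip_u8(a + b - 128); }
static uint8_t blend_multiply(int a, int b)     { return (uint8_t)(a * b / 255); }
static uint8_t blend_screen(int a, int b)
{
    return (uint8_t)(255 - (255 - a) * (255 - b) / 255);
}

typedef uint8_t (*blend_fn)(int a, int b);

/* One row of `width` pixels: the loop that the SSE2/AVX2 versions vectorize. */
static void blend_row(const uint8_t *top, const uint8_t *bottom,
                      uint8_t *dst, int width, blend_fn fn)
{
    for (int x = 0; x < width; x++)
        dst[x] = fn(top[x], bottom[x]);
}
```

Grain extract and grain merge are the only two that need clamping; average, multiply and screen stay in 0..255 by construction.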
-- blend Average --
Previous patch:
average_c: 15605.4
average_sse2: 1205.9
average_avx2: 772.4
New patch:
average_c: 15604.4
average_sse2: 490.9
average_avx2: 265.2
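Relative to the previous patch, SSE2 goes from 1205.9 to 490.9 (about 2.5x faster) and AVX2 from 772.4 to 265.2 (about 2.9x faster), while the C timing is unchanged apart from noise. One known way to get a truncating 8-bit average without unpacking to 16-bit words is the identity floor((a + b) / 2) = ((a + b + 1) >> 1) - ((a ^ b) & 1), whose first term is exactly what pavgb computes. This is not necessarily what the patch does. The scalar model below (hypothetical helper names) only checks the identity exhaustively.

```c
#include <stdint.h>

/* Scalar model of one element of PAVGB: average rounded up. */
static uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Truncating average kept in 8 bits: PAVGB rounds up exactly when a + b is
 * odd, i.e. when the low bits of a and b differ, so subtract that bit. */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(pavgb_lane(a, b) - ((a ^ b) & 1));
}

/* Compare against the reference (a + b) / 2 for all 65536 input pairs;
 * returns the number of mismatches. */
static int count_avg_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (avg_floor_u8((uint8_t)a, (uint8_t)b) != (a + b) / 2)
                bad++;
    return bad;
}
```

In SIMD form this would be pavgb, pxor, pand with a vector of 1s, and psubb, all of which stay in the byte domain and process a full register of pixels per instruction.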
With 3 operands:
using
On Tue, Jan 16, 2018 at 11:33 PM, Martin Vignali wrote:
> BLEND_INIT grainextract, 4
You could also try doing twice as much work per iteration, which might be
more efficient, especially with AVX2, since it avoids cross-lane
shuffles. This applies to some of the other functions as well.
E.g.
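The reason twice as much work per iteration avoids cross-lane shuffles is that, on AVX2, vpunpcklbw/vpunpckhbw and vpackuswb all operate within each 128-bit lane. The scalar model below (hypothetical types and helpers, not FFmpeg code) shows that unpacking a full 32-byte register into low and high word halves and packing them back restores the original byte order. Zero-extending only 16 bytes per iteration (as vpmovzxbw does) and packing leaves the results split across lanes, so a vpermq would be needed before the store.

```c
#include <stdint.h>
#include <string.h>

/* A 256-bit register modelled as two independent 128-bit lanes. */
typedef struct { uint8_t b[32]; } ymm_u8;
typedef struct { int16_t w[16]; } ymm_i16;

/* vpunpcklbw (hi = 0) / vpunpckhbw (hi = 1) against zero: each lane
 * zero-extends its own low or high 8 bytes to 8 words. */
static ymm_i16 unpack_bw(const ymm_u8 *v, int hi)
{
    ymm_i16 r;
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++)
            r.w[lane * 8 + i] = v->b[lane * 16 + hi * 8 + i];
    return r;
}

/* Signed 16-bit to unsigned 8-bit saturation, per element. */
static uint8_t sat_u8(int16_t x)
{
    return (uint8_t)(x < 0 ? 0 : x > 255 ? 255 : x);
}

/* vpackuswb x, y: within each lane, 8 words of x followed by 8 words of y. */
static ymm_u8 pack_uswb(const ymm_i16 *x, const ymm_i16 *y)
{
    ymm_u8 r;
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++) {
            r.b[lane * 16 + i]     = sat_u8(x->w[lane * 8 + i]);
            r.b[lane * 16 + 8 + i] = sat_u8(y->w[lane * 8 + i]);
        }
    return r;
}

/* Full width: unpack all 32 bytes (16-bit blend math would go in between)
 * and pack back. Returns 1 if byte order is restored with no permute. */
static int full_width_roundtrip_ok(const uint8_t src[32])
{
    ymm_u8 v;
    memcpy(v.b, src, 32);
    ymm_i16 lo = unpack_bw(&v, 0);
    ymm_i16 hi = unpack_bw(&v, 1);
    ymm_u8 out = pack_uswb(&lo, &hi);
    return memcmp(out.b, src, 32) == 0;
}

/* Half width: 16 bytes zero-extended in order (lane 0 = bytes 0..7, lane 1 =
 * bytes 8..15), packed with itself. Returns 1 if the results land at offsets
 * 0..7 and 16..23 instead of 0..15, i.e. a vpermq is needed before a 16-byte
 * store. Assumes src[0..7] differs from src[8..15]. */
static int half_width_needs_permute(const uint8_t src[16])
{
    ymm_i16 x;
    for (int i = 0; i < 16; i++)
        x.w[i] = src[i];
    ymm_u8 out = pack_uswb(&x, &x);
    int contiguous = memcmp(out.b, src, 16) == 0;
    int split = memcmp(out.b, src, 8) == 0 && memcmp(out.b + 16, src + 8, 8) == 0;
    return !contiguous && split;
}
```

The in-lane pack exactly undoes the in-lane unpacks, which is why the full-width loop needs no permute.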
2018-01-16 23:00 GMT+01:00 James Darnley:
On 2018-01-16 22:26, Martin Vignali wrote:
> diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
> index d7cd996842..9db2d90e57 100644
> --- a/libavutil/x86/x86util.asm
> +++ b/libavutil/x86/x86util.asm
> @@ -335,7 +335,7 @@
> %endmacro
>
> %macro ABS2 4
> -%if cpuflag(ssse3)
>
Hello,
following Henrik Gramner's comments (in the discussion "avfilter/x86/vf_blend :
add avx2 version for 8b func (WIP)"),
attached is a new patch adding an AVX2 version of each 8-bit function (except divide):
001 : avutil : add ABS2 for avx2
002 : avfilter : add AVX2 version
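As we read x86util.asm, ABS2 takes the absolute value of two registers of signed 16-bit words at once, using pabsw when SSSE3 is available. A mode such as difference (|A - B|) needs exactly this after widening bytes to words; how the patch actually wires it up is not shown here. The sketch below is a hypothetical scalar model, not the macro itself, and checks the widened path against a byte-domain alternative built from two saturating subtractions (psubusb) and an OR, which needs no widening.

```c
#include <stdint.h>

/* Scalar model of ABS2: word-wise |x| on two "registers" at once. Inputs
 * here are byte differences in -255..255, so INT16_MIN never occurs. */
static void abs2_model(int16_t *x, int16_t *y, int n)
{
    for (int i = 0; i < n; i++) {
        x[i] = (int16_t)(x[i] < 0 ? -x[i] : x[i]);
        y[i] = (int16_t)(y[i] < 0 ? -y[i] : y[i]);
    }
}

/* difference via widening: 16 pixels split into two 8-word halves (as
 * punpcklbw/punpckhbw would on SSE), subtracted, passed through ABS2 and
 * narrowed. Results are 0..255, so packuswb would never saturate. */
static void difference_words(const uint8_t top[16], const uint8_t bot[16],
                             uint8_t dst[16])
{
    int16_t lo[8], hi[8];
    for (int i = 0; i < 8; i++) {
        lo[i] = (int16_t)(top[i] - bot[i]);
        hi[i] = (int16_t)(top[i + 8] - bot[i + 8]);
    }
    abs2_model(lo, hi, 8);
    for (int i = 0; i < 8; i++) {
        dst[i]     = (uint8_t)lo[i];
        dst[i + 8] = (uint8_t)hi[i];
    }
}

/* Byte-domain alternative: |a - b| = sat(a - b) | sat(b - a); one of the
 * two saturating terms is always 0. */
static uint8_t difference_bytes(uint8_t a, uint8_t b)
{
    uint8_t ab = (uint8_t)(a > b ? a - b : 0);
    uint8_t ba = (uint8_t)(b > a ? b - a : 0);
    return (uint8_t)(ab | ba);
}
```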
For most of the functions, the AVX2 version is a