Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-28 Thread Martin Vignali
2018-01-17 21:13 GMT+01:00 Martin Vignali :
> Hello,
>
> New patch attached, with modifications to average, grain extract, multiply, screen and grain merge.
>
> -- blend Average --
> Prev patch :
> average_c: 15605.4
> average_sse2: 1205.9
> average_avx2: 772.4
>

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-17 Thread Martin Vignali
Hello,

New patch attached, with modifications to average, grain extract, multiply, screen and grain merge.

-- blend Average --
Prev patch :
average_c: 15605.4
average_sse2: 1205.9
average_avx2: 772.4

New patch :
average_c: 15604.4
average_sse2: 490.9
average_avx2: 265.2

With 3 operand : using
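To put the figures above in perspective, the ratios can be worked out with a small helper. This is only a sketch: the `speedup` function is a hypothetical name, not part of the patch, and it assumes the benchmark numbers are timings where lower is better.

```c
/* Hypothetical helper: ratio of an old timing to a new one.
 * Assumes both values are in the same unit and that lower is better. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Figures quoted in the message above (blend Average):
 *   speedup(1205.9, 490.9)   sse2, prev patch vs new patch: about 2.46x
 *   speedup(772.4, 265.2)    avx2, prev patch vs new patch: about 2.91x
 *   speedup(15604.4, 490.9)  C vs new sse2:                 about 31.8x
 *   speedup(15604.4, 265.2)  C vs new avx2:                 about 58.8x
 */
```

With the new patch, the AVX2 version is roughly 1.85x faster than the SSE2 version (490.9 / 265.2).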

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-17 Thread Henrik Gramner
On Tue, Jan 16, 2018 at 11:33 PM, Martin Vignali wrote:
> BLEND_INIT grainextract, 4

You could also try doing twice as much per iteration, which might be more efficient, especially in AVX2 since it avoids cross-lane shuffles. This applies to some of the other functions as well. E.g.
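The example in the message is cut off, so the following is not Henrik's code. It is a portable C model, under stated assumptions, of why handling a full 32-byte vector per step can avoid a cross-lane shuffle: the AVX2 byte unpack and pack instructions work within each 128-bit lane, so unpacking low/high halves and packing them back restores the original byte order. The grain extract formula `clip(128 + top - bottom)` is assumed from the C reference in vf_blend, and all function names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Clamp to the unsigned 8-bit range, as an unsigned-saturating pack would. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Model of an in-lane byte-to-word unpack against zero on a 32-byte vector:
 * takes the low (high == 0) or high (high == 1) 8 bytes of EACH 16-byte lane. */
static void unpack_bytes(const uint8_t src[32], int16_t dst[16], int high)
{
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++)
            dst[lane * 8 + i] = src[lane * 16 + (high ? 8 : 0) + i];
}

/* Model of an in-lane word-to-byte pack with unsigned saturation:
 * each output lane holds 8 bytes from lo followed by 8 bytes from hi. */
static void pack_words(const int16_t lo[16], const int16_t hi[16], uint8_t dst[32])
{
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++) {
            dst[lane * 16 + i]     = clip_u8(lo[lane * 8 + i]);
            dst[lane * 16 + 8 + i] = clip_u8(hi[lane * 8 + i]);
        }
}

/* 32 pixels per step: unpack low and high halves, compute, pack back.
 * No cross-lane permute is needed because unpack and pack mirror each other. */
static void grainextract32_model(const uint8_t top[32], const uint8_t bottom[32],
                                 uint8_t dst[32])
{
    int16_t tl[16], th[16], bl[16], bh[16];

    unpack_bytes(top, tl, 0);
    unpack_bytes(top, th, 1);
    unpack_bytes(bottom, bl, 0);
    unpack_bytes(bottom, bh, 1);

    for (int i = 0; i < 16; i++) {
        tl[i] = (int16_t)(128 + tl[i] - bl[i]);
        th[i] = (int16_t)(128 + th[i] - bh[i]);
    }
    pack_words(tl, th, dst);
}

/* Scalar reference (assumed formula) used to check the model. */
static void grainextract_ref(const uint8_t *top, const uint8_t *bottom,
                             uint8_t *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8(128 + top[i] - bottom[i]);
}
```

By contrast, widening only 16 source bytes into one 32-byte register and packing the result leaves the output split across lanes, which is where an extra permute would come in. Whether the wider loop is actually faster needs to be measured.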

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-16 Thread Martin Vignali
2018-01-16 23:00 GMT+01:00 James Darnley :
> On 2018-01-16 22:26, Martin Vignali wrote:
> > diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
> > index d7cd996842..9db2d90e57 100644
> > --- a/libavutil/x86/x86util.asm
> > +++ b/libavutil/x86/x86util.asm
>

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-16 Thread James Darnley
On 2018-01-16 22:26, Martin Vignali wrote:
> diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
> index d7cd996842..9db2d90e57 100644
> --- a/libavutil/x86/x86util.asm
> +++ b/libavutil/x86/x86util.asm
> @@ -335,7 +335,7 @@
>  %endmacro
>
>  %macro ABS2 4
> -%if cpuflag(ssse3)
>

[FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-16 Thread Martin Vignali
Hello,

Following Henrik Gramner's comments (in the discussion "avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)"), attached are new patches adding an AVX2 version of each 8b func (except divide).

001 : avutil : add ABS2 for avx2
002 : avfilter : add AVX2 version for most of the func, the AVX2 is a