Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-19 Thread Martin Vignali
Pushed, using vpermq (faster for me),
and with "%if HAVE_AVX2_EXTERNAL" added around INIT_YMM...

Thanks

Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-14 Thread Martin Vignali
2017-12-13 17:18 GMT+01:00 Henrik Gramner :

> On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali wrote:
> > +vpermq  m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at load
> > +vpermq  m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load
>
> Would doing 2x 128-bit movu + 2x vinserti128 be faster?

Hello,

It seems to be slower for me (patch attached; maybe I did something wrong).

With vpermq:
hflip_byte_c: 29.2
hflip_byte_ssse3: 28.4
hflip_byte_avx2: 20.2
hflip_short_c: 29.2
hflip_short_ssse3: 28.4
hflip_short_avx2: 20.2

With movu + vinserti128:
hflip_byte_c: 29.2
hflip_byte_ssse3: 28.2
hflip_byte_avx2: 22.7
hflip_short_c: 29.7
hflip_short_ssse3: 28.2
hflip_short_avx2: 21.7
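
To put the two AVX2 results side by side, here is a small C sketch of the arithmetic. It is only an illustration: the helper name percent_slower is hypothetical, and it assumes the checkasm figures are timings where lower is better.

```c
#include <assert.h>

/* Relative change of timing b versus timing a, in percent.
 * A positive result means b is slower than a.
 * Hypothetical helper, not part of checkasm or the patch. */
double percent_slower(double a, double b)
{
    assert(a > 0.0);
    return (b - a) / a * 100.0;
}
```

With the numbers above, percent_slower(20.2, 22.7) comes out at roughly 12% for hflip_byte_avx2 and percent_slower(20.2, 21.7) at roughly 7% for hflip_short_avx2, so the movu + vinserti128 variant is the slower one in this run.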

Martin


0001-avfilter-x86-vf_hflip-merge-hflip-byte-and-hflip-sho.patch
Description: Binary data


0002-avfilter-x86-vf_hflip-add-avx2-version-for-hflip_byt.patch
Description: Binary data


Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-13 Thread Henrik Gramner
On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali wrote:
> +vpermq  m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at load
> +vpermq  m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load

Would doing 2x 128-bit movu + 2x vinserti128 be faster?
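
For readers comparing the two variants, the following is a scalar C model of what each one computes for a single 256-bit register. It is a sketch, not code from the patch: the function names are hypothetical, and it models only the data movement (vpermq with immediate 0x4e versus a 128-bit movu of the upper half followed by vinserti128 of the lower half into lane 1), not performance.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "vpermq dst, src, imm8": each 2-bit field of imm8 selects
 * which source qword is copied to the matching destination qword. */
void model_vpermq(uint64_t dst[4], const uint64_t src[4], uint8_t imm8)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Model of the alternative: a 128-bit load of the upper half of memory
 * into the low lane, then an insert of the lower half into the high lane. */
void model_movu_vinserti128(uint64_t dst[4], const uint64_t src[4])
{
    uint64_t tmp[4];
    memcpy(&tmp[0], &src[2], 16); /* movu xm, [mem + 16]          */
    memcpy(&tmp[2], &src[0], 16); /* vinserti128 m, m, [mem], 1   */
    memcpy(dst, tmp, sizeof(tmp));
}

/* Check that both models produce the same register contents for src. */
void check_lane_swap_models(const uint64_t src[4])
{
    uint64_t a[4], b[4];
    model_vpermq(a, src, 0x4e);
    model_movu_vinserti128(b, src);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

The immediate 0x4e is 01 00 11 10 in binary, so the destination qwords are taken from source qwords 2, 3, 0, 1: the two 128-bit lanes are swapped. Both variants therefore yield the same result, and the question is only which one is faster on a given CPU.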


[FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-13 Thread Martin Vignali
Hello,

Attached are patches that merge the byte and short hflip asm functions into
a single macro and add an AVX2 version.

Checkasm results (Kaby Lake, x86_64, macOS 10.12):
hflip_byte_c: 30.9
hflip_byte_ssse3: 30.4
hflip_byte_avx2: 21.9
hflip_short_c: 31.6
hflip_short_ssse3: 30.4
hflip_short_avx2: 22.4
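
As a rough illustration of why the byte and short versions can share one macro, here is a scalar C sketch in which both are the same loop and differ only in element size. This is not the FFmpeg C reference: the names hflip_ref, hflip_byte_ref and hflip_short_ref are hypothetical, and the sketch assumes src points at the first element of the row and that src and dst do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy w elements of elem_size bytes each from src to dst in reverse
 * element order. The bytes inside each element keep their order. */
void hflip_ref(const uint8_t *src, uint8_t *dst, int w, size_t elem_size)
{
    assert(w >= 0 && elem_size > 0);
    for (int j = 0; j < w; j++)
        memcpy(dst + (size_t)j * elem_size,
               src + (size_t)(w - 1 - j) * elem_size,
               elem_size);
}

/* 8-bit samples: one byte per element. */
void hflip_byte_ref(const uint8_t *src, uint8_t *dst, int w)
{
    hflip_ref(src, dst, w, sizeof(uint8_t));
}

/* 16-bit samples: two bytes per element. */
void hflip_short_ref(const uint16_t *src, uint16_t *dst, int w)
{
    hflip_ref((const uint8_t *)src, (uint8_t *)dst, w, sizeof(uint16_t));
}
```

For example, flipping the bytes {1, 2, 3, 4, 5} gives {5, 4, 3, 2, 1}, and flipping the shorts {0x0102, 0x0304, 0x0506} gives {0x0506, 0x0304, 0x0102}.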


Martin


0001-avfilter-x86-vf_hflip-merge-hflip-byte-and-hflip-sho.patch
Description: Binary data


0002-avfilter-x86-vf_hflip-add-avx2-version-for-hflip_byt.patch
Description: Binary data


0003-avfilter-x86-vf_hflip-indent-after-previous-commit.patch
Description: Binary data