Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2
Pushed, using vpermq (faster for me), with "%if HAVE_AVX2_EXTERNAL" added around INIT YMM...

Thanks

Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2
2017-12-13 17:18 GMT+01:00 Henrik Gramner:

> On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali wrote:
> > +vpermq m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at load
> > +vpermq m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load
>
> Would doing 2x 128-bit movu + 2x vinserti128 be faster?

Hello,

It seems to be slower for me (patch attached; maybe I did something wrong).

With vpermq:
hflip_byte_c: 29.2
hflip_byte_ssse3: 28.4
hflip_byte_avx2: 20.2
hflip_short_c: 29.2
hflip_short_ssse3: 28.4
hflip_short_avx2: 20.2

With movu + vinserti128:
hflip_byte_c: 29.2
hflip_byte_ssse3: 28.2
hflip_byte_avx2: 22.7
hflip_short_c: 29.7
hflip_short_ssse3: 28.2
hflip_short_avx2: 21.7

Martin

Attachments:
0001-avfilter-x86-vf_hflip-merge-hflip-byte-and-hflip-sho.patch
0002-avfilter-x86-vf_hflip-add-avx2-version-for-hflip_byt.patch
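For readers following the vpermq versus movu + vinserti128 question, here is a minimal scalar sketch in C of what the two load strategies compute. It is not taken from the patches: the function names (model_vpermq, model_two_loads, model_reverse_in_lanes, check_hflip_model) are hypothetical, and the in-lane byte reversal step is an assumption about how the rest of the flip is done. The one thing it relies on is the documented vpermq behaviour: each destination qword i is selected by bits [2i+1:2i] of the immediate, so 0x4e swaps the two 128-bit halves.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vpermq: destination qword i is the source qword
 * selected by bits [2i+1:2i] of the immediate. */
static void model_vpermq(uint64_t dst[4], const uint64_t src[4], uint8_t imm)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Scalar model of the alternative: two 16-byte loads, each placed in
 * the opposite half of the 32-byte register. */
static void model_two_loads(uint8_t dst[32], const uint8_t src[32])
{
    memcpy(dst,      src + 16, 16); /* low lane  <- high 16 bytes */
    memcpy(dst + 16, src,      16); /* high lane <- low 16 bytes  */
}

/* Assumed second step: reverse the bytes inside each 16-byte lane. */
static void model_reverse_in_lanes(uint8_t v[32])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 8; i++) {
            uint8_t t = v[lane * 16 + i];
            v[lane * 16 + i] = v[lane * 16 + 15 - i];
            v[lane * 16 + 15 - i] = t;
        }
    }
}

/* Returns 1 if both load strategies give the same register contents and
 * lane swap + in-lane reversal equals a full 32-byte reversal. */
int check_hflip_model(void)
{
    uint8_t src[32], a[32], b[32];
    uint64_t q[4], r[4];

    for (int i = 0; i < 32; i++)
        src[i] = (uint8_t)i;

    memcpy(q, src, 32);
    model_vpermq(r, q, 0x4e);
    memcpy(a, r, 32);

    model_two_loads(b, src);
    if (memcmp(a, b, 32) != 0)
        return 0;

    model_reverse_in_lanes(a);
    for (int i = 0; i < 32; i++)
        if (a[i] != src[31 - i])
            return 0;
    return 1;
}
```

Both strategies produce the same register contents, so the choice between them is purely a matter of speed, which is what the checkasm numbers above compare.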
Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2
On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali wrote:
> +vpermq m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at load
> +vpermq m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load

Would doing 2x 128-bit movu + 2x vinserti128 be faster?
[FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2
Hello,

Attached are patches that merge the byte and short hflip asm functions into a macro and add an AVX2 version.

Checkasm results (Kaby Lake, x86_64, macOS 10.12):
hflip_byte_c: 30.9
hflip_byte_ssse3: 30.4
hflip_byte_avx2: 21.9
hflip_short_c: 31.6
hflip_short_ssse3: 30.4
hflip_short_avx2: 22.4

Martin

Attachments:
0001-avfilter-x86-vf_hflip-merge-hflip-byte-and-hflip-sho.patch
0002-avfilter-x86-vf_hflip-add-avx2-version-for-hflip_byt.patch
0003-avfilter-x86-vf_hflip-indent-after-previous-commit.patch