Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
Checkasm result (osx) for your last patch:

hflip_byte_c: 28.5
hflip_byte_ssse3: 29.0
hflip_short_c: 277.7
hflip_short_ssse3: 65.0

If you add a "cmp xq, wq" after the SIMD loop, you can be faster than C (clang) when width is a multiple of mmsize*2:

hflip_byte_c: 28.5
hflip_byte_ssse3: 27.5

see below
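The effect of a "cmp xq, wq" check after the SIMD loop can be illustrated in plain C. This is only a sketch and not the patch's code: the name `hflip_unrolled`, the `MMSIZE` constant, and the convention that `src` points at the start of the row are assumptions made for illustration. The main loop handles 2 * MMSIZE bytes per iteration, and the function returns early when nothing is left, which is exactly the case where the width is a multiple of mmsize*2.

```c
#include <assert.h>
#include <stdint.h>

#define MMSIZE 16 /* assumed XMM register width in bytes */

/*
 * Hypothetical helper, not taken from the patch: writes the w bytes of src
 * to dst in reverse order. Returns how many bytes the scalar tail handled.
 */
static int hflip_unrolled(const uint8_t *src, uint8_t *dst, int w)
{
    int x = 0;

    assert(w >= 0);

    /* Main loop: two MMSIZE-byte blocks per iteration (stand-in for SIMD). */
    for (; x + 2 * MMSIZE <= w; x += 2 * MMSIZE)
        for (int i = 0; i < 2 * MMSIZE; i++)
            dst[x + i] = src[w - 1 - x - i];

    /* Early exit: the C analogue of "cmp xq, wq" followed by a jump. */
    if (x == w)
        return 0;

    /* Scalar tail for the remaining w - x bytes. */
    int tail = w - x;
    for (; x < w; x++)
        dst[x] = src[w - 1 - x];
    return tail;
}
```

With a width of 64 the tail is skipped entirely; with a width of 70 the last 6 bytes go through the scalar loop.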

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread James Almer
On 12/3/2017 5:50 PM, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38
> libavfilter/vf_hflip.c | 133 ++--
> libavfilter/x86/Makefile | 2 +
>

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
I modified the checkasm test to test various widths:

    if (check_func(s.flip_line[0], "hflip_%s", report_name)) {
        for (i = 1; i < w; i++) {
            call_ref(src, dst_ref, i);
            call_new(src, dst_new, i);
            if (memcmp(dst_ref, dst_new, WIDTH)) {
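Why testing several widths matters can be shown with a small standalone C sketch. It does not use the checkasm API; `hflip_ref`, `hflip_blocked`, `hflip_no_tail`, and `first_failing_width` are hypothetical names, and `src` is assumed to point at the start of the row. A candidate that forgets the scalar tail still passes at width 256 but fails at the first width that is not a multiple of the block size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH 256
#define BLOCK 16 /* assumed SIMD block size in bytes */

typedef void (*hflip_fn)(const uint8_t *src, uint8_t *dst, int w);

/* Reference: straightforward byte reversal. */
static void hflip_ref(const uint8_t *src, uint8_t *dst, int w)
{
    for (int j = 0; j < w; j++)
        dst[j] = src[w - 1 - j];
}

/* Candidate: whole blocks first, then a scalar tail. */
static void hflip_blocked(const uint8_t *src, uint8_t *dst, int w)
{
    int x = 0;
    for (; x + BLOCK <= w; x += BLOCK)
        for (int i = 0; i < BLOCK; i++)
            dst[x + i] = src[w - 1 - x - i];
    for (; x < w; x++)
        dst[x] = src[w - 1 - x];
}

/* Deliberately broken candidate: the tail is missing. */
static void hflip_no_tail(const uint8_t *src, uint8_t *dst, int w)
{
    for (int x = 0; x + BLOCK <= w; x += BLOCK)
        for (int i = 0; i < BLOCK; i++)
            dst[x + i] = src[w - 1 - x - i];
}

/* Returns the first width in 1..WIDTH where cand differs from ref, or 0. */
static int first_failing_width(hflip_fn ref, hflip_fn cand)
{
    uint8_t src[WIDTH], dst_ref[WIDTH], dst_new[WIDTH];

    assert(ref && cand);
    for (int i = 0; i < WIDTH; i++)
        src[i] = (uint8_t)(i * 7 + 3); /* never zero, so a skipped write shows up */

    for (int w = 1; w <= WIDTH; w++) {
        memset(dst_ref, 0, WIDTH);
        memset(dst_new, 0, WIDTH);
        ref(src, dst_ref, w);
        cand(src, dst_new, w);
        if (memcmp(dst_ref, dst_new, WIDTH))
            return w;
    }
    return 0;
}
```

Calling `first_failing_width(hflip_ref, hflip_no_tail)` reports a failure at the first width tested, whereas a single comparison at width 256 would not notice the missing tail.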

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Paul B Mahol
On 12/3/17, Paul B Mahol wrote:
> On 12/3/17, Martin Vignali wrote:
>> Maybe the problem comes from the skip part:
>>
>> +INIT_XMM ssse3
>>> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
>>> +mova m0, [pb_flip_byte]
>>> +mov xq, 0
>>>

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Paul B Mahol
On 12/3/17, Martin Vignali wrote:
> Maybe the problem comes from the skip part:
>
> +INIT_XMM ssse3
>> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
>> +mova m0, [pb_flip_byte]
>> +mov xq, 0
>> +mov wd, dword wm
>> +sub wq, 2 * mmsize
>>

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Paul B Mahol
On 12/3/17, Martin Vignali wrote:
> 2017-12-03 20:36 GMT+01:00 Paul B Mahol:
>
>> On 12/3/17, Martin Vignali wrote:
>> >>
>> >> In any case, if clang or gcc can generate better code, then the hand
>> >> written version needs

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
Maybe the problem comes from the skip part:

+INIT_XMM ssse3
> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
> +mova m0, [pb_flip_byte]
> +mov xq, 0
> +mov wd, dword wm
> +sub wq, 2 * mmsize
> +cmp wq, mmsize
> +jl .skip
> +
> +.loop0:
> +

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
2017-12-03 20:36 GMT+01:00 Paul B Mahol:
> On 12/3/17, Martin Vignali wrote:
> >>
> >> In any case, if clang or gcc can generate better code, then the hand
> >> written version needs to be optimized to be as fast or faster.
> >>
> >>
> >>
> > Quick

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Paul B Mahol
On 12/3/17, Martin Vignali wrote:
>>
>> In any case, if clang or gcc can generate better code, then the hand
>> written version needs to be optimized to be as fast or faster.
>>
>>
>>
> Quick test: passes checkasm (but probably only because width = 256)
> hflip_byte_c:

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
>
> In any case, if clang or gcc can generate better code, then the hand
> written version needs to be optimized to be as fast or faster.
>
>
>
Quick test: passes checkasm (but probably only because width = 256)

hflip_byte_c: 26.4
hflip_byte_ssse3: 20.4

INIT_XMM ssse3
cglobal hflip_byte, 3, 5, 2,

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Paul B Mahol
On 12/3/17, Paul B Mahol wrote:
> On 12/3/17, Paul B Mahol wrote:
>> Signed-off-by: Paul B Mahol
>> ---
>> libavfilter/hflip.h | 38
>> libavfilter/vf_hflip.c | 133
>>

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Paul B Mahol
On 12/3/17, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38
> libavfilter/vf_hflip.c | 133 ++--
> libavfilter/x86/Makefile | 2 +
>

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread James Almer
On 12/3/2017 3:55 PM, Martin Vignali wrote:
> in O2 or O3: clang -S -O3 test_asm_gen.c
>
> If I understand correctly, same idea as Paul's patch
> but processing two xmm in the main loop
>
> .section __TEXT,__text,regular,pure_instructions
> .macosx_version_min 10, 12
> .section

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
> Can you post a disassembly of hflip_byte_c?
>
>
>
in O1: clang -S -O1 test_asm_gen.c

.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 12
.globl _hflip_byte_c
.p2align 4, 0x90
_hflip_byte_c: ## @hflip_byte_c

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread James Almer
On 12/3/2017 3:09 PM, Martin Vignali wrote:
>> 2017-12-03 17:46 GMT+01:00 Paul B Mahol:
>>
>>> On 12/3/17, Martin Vignali wrote:
>>>> Hello,
>>>>
>>>> Maybe you can use a macro for the byte and short versions,
>>>> only a few lines are different in each

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
> 2017-12-03 17:46 GMT+01:00 Paul B Mahol:
>
>> On 12/3/17, Martin Vignali wrote:
>> > Hello,
>> >
>> > Maybe you can use a macro for the byte and short versions,
>> > only a few lines are different in each version
>>
>> Sure, feel free to send patches.
>>
>>

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
2017-12-03 17:46 GMT+01:00 Paul B Mahol:
> On 12/3/17, Martin Vignali wrote:
> > Hello,
> >
> > Maybe you can use a macro for the byte and short versions,
> > only a few lines are different in each version
>
> Sure, feel free to send patches.
>
> I'm not

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Paul B Mahol
On 12/3/17, Martin Vignali wrote:
> Hello,
>
> Maybe you can use a macro for the byte and short versions,
> only a few lines are different in each version

Sure, feel free to send patches.

I'm not very macro proficient.

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
Hello,

Maybe you can use a macro for the byte and short versions,
only a few lines are different in each version

Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-02 Thread Paul B Mahol
On 12/2/17, Martin Vignali wrote:
>> +
>> +%include "libavutil/x86/x86util.asm"
>> +
>> +SECTION_RODATA
>> +
>> +pb_flip_byte: times 16 db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
>> +pb_flip_short: times 16 db 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1
>> +
>>
>
> times 16 ?

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-02 Thread Martin Vignali
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +pb_flip_byte: times 16 db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
> +pb_flip_short: times 16 db 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1
> +
>

times 16 ?

Martin
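For readers following the "times 16 ?" question: in NASM, `times 16 db <16 values>` emits the 16-byte sequence sixteen times (256 bytes), while a single XMM load only reads 16 bytes. What one 16-byte row of each mask does can be sketched in C. This is an illustration only: `shuffle16` is a hypothetical scalar emulation of the SSSE3 `pshufb` byte shuffle for one register, and the array names `flip_byte_mask` and `flip_short_mask` are assumptions; only the mask values come from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte row of each mask, values copied from the quoted patch. */
static const uint8_t flip_byte_mask[16] =
    { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
static const uint8_t flip_short_mask[16] =
    { 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 };

/*
 * Hypothetical scalar emulation of pshufb on a single 16-byte register:
 * each output byte is the source byte selected by the low 4 bits of the
 * mask byte, or zero when the mask byte has its top bit set.
 */
static void shuffle16(const uint8_t src[16], const uint8_t mask[16],
                      uint8_t dst[16])
{
    assert(src != dst); /* in-place use would read already overwritten bytes */
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}
```

The byte mask reverses all 16 bytes. The short mask reverses the order of the eight 16-bit elements while keeping the two bytes inside each element in their original order.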

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-01 Thread James Almer
On 12/1/2017 7:02 PM, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38 +
> libavfilter/vf_hflip.c | 30 ++--
> libavfilter/x86/Makefile | 2 ++
> libavfilter/x86/vf_hflip.asm

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-01 Thread James Almer
On 12/1/2017 11:13 PM, Michael Niedermayer wrote:
> On Fri, Dec 01, 2017 at 11:02:43PM +0100, Paul B Mahol wrote:
>> Signed-off-by: Paul B Mahol
>> ---
>> libavfilter/hflip.h | 38 +
>> libavfilter/vf_hflip.c | 30

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-01 Thread Michael Niedermayer
On Fri, Dec 01, 2017 at 11:02:43PM +0100, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38 +
> libavfilter/vf_hflip.c | 30 ++--
> libavfilter/x86/Makefile | 2 ++
>