Checkasm results (OS X) for your last patch:
hflip_byte_c: 28.5
hflip_byte_ssse3: 29.0
hflip_short_c: 277.7
hflip_short_ssse3: 65.0
If you add a "cmp xq, wq" after the SIMD loop,
you can be faster than C (clang) when the width is a multiple of mmsize*2:
hflip_byte_c: 28.5
hflip_byte_ssse3: 27.5
see below
On 12/3/2017 5:50 PM, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38
> libavfilter/vf_hflip.c | 133
> ++--
> libavfilter/x86/Makefile| 2 +
> libavfilter/x86/vf_hflip.asm| 1
Signed-off-by: Paul B Mahol
---
libavfilter/hflip.h | 38
libavfilter/vf_hflip.c | 133 ++--
libavfilter/x86/Makefile| 2 +
libavfilter/x86/vf_hflip.asm| 102 ++
libavfilter/x86/vf_hfl
I modified the checkasm test to test various widths:
if (check_func(s.flip_line[0], "hflip_%s", report_name)) {
for (i = 1; i < w; i++) {
call_ref(src, dst_ref, i);
call_new(src, dst_new, i);
if (memcmp(dst_ref, dst_new, WIDTH)) {
printf("FA
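The idea in the truncated snippet above, comparing the reference and the new output for every width instead of a single one, can be sketched in plain C as follows. This is only an illustration under stated assumptions: flip_ref, flip_blocked, flip_no_tail and first_bad_width are hypothetical names, the (src, dst, w) signature is assumed for the sketch, and none of this is the actual checkasm code or the hflip implementation from the patch.

```c
#include <stdint.h>
#include <string.h>

#define WIDTH 256

/* Hypothetical signature for this sketch only. */
typedef void (*flip_fn)(const uint8_t *src, uint8_t *dst, int w);

/* Reference: plain byte reversal of the first w bytes. */
static void flip_ref(const uint8_t *src, uint8_t *dst, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = src[w - 1 - i];
}

/* Candidate: 16-byte blocks followed by a scalar tail,
 * mimicking a SIMD main loop plus remainder handling. */
static void flip_blocked(const uint8_t *src, uint8_t *dst, int w)
{
    int x = 0;
    for (; x + 16 <= w; x += 16)
        for (int k = 0; k < 16; k++)
            dst[x + k] = src[w - 1 - x - k];
    for (; x < w; x++)
        dst[x] = src[w - 1 - x];
}

/* Deliberately broken candidate: same block loop, but no tail.
 * It is only correct when w is a multiple of 16 (e.g. w = 256). */
static void flip_no_tail(const uint8_t *src, uint8_t *dst, int w)
{
    for (int x = 0; x + 16 <= w; x += 16)
        for (int k = 0; k < 16; k++)
            dst[x + k] = src[w - 1 - x - k];
}

/* Returns 0 if ref and cand agree for every width 1..WIDTH,
 * otherwise the first width at which the outputs differ.
 * The whole buffer is compared, so writes past w are caught too. */
static int first_bad_width(flip_fn ref, flip_fn cand)
{
    uint8_t src[WIDTH], dst_ref[WIDTH], dst_new[WIDTH];

    for (int i = 0; i < WIDTH; i++)
        src[i] = (uint8_t)(i * 7 + 1);

    for (int w = 1; w <= WIDTH; w++) {
        memset(dst_ref, 0, WIDTH);
        memset(dst_new, 0, WIDTH);
        ref(src, dst_ref, w);
        cand(src, dst_new, w);
        if (memcmp(dst_ref, dst_new, WIDTH))
            return w;
    }
    return 0;
}
```

Calling first_bad_width(flip_ref, flip_no_tail) reports a failure at a small width, while a test that only ever uses w = 256 would not notice the missing tail handling.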
On 12/3/17, Paul B Mahol wrote:
> On 12/3/17, Martin Vignali wrote:
>> Maybe the problem come from the skip part :
>>
>> +INIT_XMM ssse3
>>> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
>>> +mova m0, [pb_flip_byte]
>>> +mov xq, 0
>>> +mov wd, dword wm
>>> +sub wq
On 12/3/17, Martin Vignali wrote:
> Maybe the problem come from the skip part :
>
> +INIT_XMM ssse3
>> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
>> +mova m0, [pb_flip_byte]
>> +mov xq, 0
>> +mov wd, dword wm
>> +sub wq, 2 * mmsize
>> +cmp wq, mmsize
>>
On 12/3/17, Martin Vignali wrote:
> 2017-12-03 20:36 GMT+01:00 Paul B Mahol :
>
>> On 12/3/17, Martin Vignali wrote:
>> >>
>> >> In any case, if clang or gcc can generate better code, then the hand
>> >> written version needs to be optimized to be as fast or faster.
>> >>
>> >>
>> >>
>> > Quick t
Maybe the problem comes from the skip part:
+INIT_XMM ssse3
> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
> +mova m0, [pb_flip_byte]
> +mov xq, 0
> +mov wd, dword wm
> +sub wq, 2 * mmsize
> +cmp wq, mmsize
> +jl .skip
> +
> +.loop0:
> +neg
2017-12-03 20:36 GMT+01:00 Paul B Mahol :
> On 12/3/17, Martin Vignali wrote:
> >>
> >> In any case, if clang or gcc can generate better code, then the hand
> >> written version needs to be optimized to be as fast or faster.
> >>
> >>
> >>
> > Quick test : pass checkasm (but probably only because
On 12/3/17, Martin Vignali wrote:
>>
>> In any case, if clang or gcc can generate better code, then the hand
>> written version needs to be optimized to be as fast or faster.
>>
>>
>>
> Quick test : pass checkasm (but probably only because width = 256)
> hflip_byte_c: 26.4
> hflip_byte_ssse3: 20.4
>
> In any case, if clang or gcc can generate better code, then the hand
> written version needs to be optimized to be as fast or faster.
>
>
>
Quick test : pass checkasm (but probably only because width = 256)
hflip_byte_c: 26.4
hflip_byte_ssse3: 20.4
INIT_XMM ssse3
cglobal hflip_byte, 3, 5, 2,
Signed-off-by: Paul B Mahol
---
libavfilter/hflip.h | 38
libavfilter/vf_hflip.c | 133 ++--
libavfilter/x86/Makefile| 2 +
libavfilter/x86/vf_hflip.asm| 98 +
libavfilter/x86/vf_hfli
On 12/3/17, Paul B Mahol wrote:
> On 12/3/17, Paul B Mahol wrote:
>> Signed-off-by: Paul B Mahol
>> ---
>> libavfilter/hflip.h | 38
>> libavfilter/vf_hflip.c | 133
>> ++--
>> libavfilter/x86/Makefile| 2 +
>> lib
On 12/3/17, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38
> libavfilter/vf_hflip.c | 133
> ++--
> libavfilter/x86/Makefile| 2 +
> libavfilter/x86/vf_hflip.asm| 98 +++
On 12/3/2017 3:55 PM, Martin Vignali wrote:
> in O2 or O3 : clang -S -O3 test_asm_gen.c
>
> If I understand correctly, this is the same idea as Paul's patch,
> but processing two xmm registers in the main loop
>
> .section __TEXT,__text,regular,pure_instructions
> .macosx_version_min 10, 12
> .section
> Can you post a disassembly of hflip_byte_c?
>
>
> in O1 : clang -S -O1 test_asm_gen.c
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 12
.globl _hflip_byte_c
.p2align 4, 0x90
_hflip_byte_c: ## @hflip_byte_c
.cfi_start
On 12/3/2017 3:09 PM, Martin Vignali wrote:
>> 2017-12-03 17:46 GMT+01:00 Paul B Mahol :
>>
>>> On 12/3/17, Martin Vignali wrote:
>>>> Hello,
>>>> Maybe you can use a macro for byte and short version,
>>>> only few lines are different in each version
>>>
>>> Sure, feel free to send patches.
>
> 2017-12-03 17:46 GMT+01:00 Paul B Mahol :
>
>> On 12/3/17, Martin Vignali wrote:
>> > Hello,
>> >
>> > Maybe you can use a macro for byte and short version,
>> > only few lines are different in each version
>>
>> Sure, feel free to send patches.
>>
>> I'm not very macro proficient.
>>
>
> Ok, i
2017-12-03 17:46 GMT+01:00 Paul B Mahol :
> On 12/3/17, Martin Vignali wrote:
> > Hello,
> >
> > Maybe you can use a macro for byte and short version,
> > only few lines are different in each version
>
> Sure, feel free to send patches.
>
> I'm not very macro proficient.
>
OK, I will take a look.
On 12/3/17, Martin Vignali wrote:
> Hello,
>
> Maybe you can use a macro for byte and short version,
> only few lines are different in each version
Sure, feel free to send patches.
I'm not very macro proficient.
Hello,
Maybe you can use a macro for the byte and short versions;
only a few lines differ between them.
Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
On 12/2/17, Martin Vignali wrote:
>> +
>> +%include "libavutil/x86/x86util.asm"
>> +
>> +SECTION_RODATA
>> +
>> +pb_flip_byte: times 16 db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
>> +pb_flip_short: times 16 db 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1
>> +
>>
>
> times 16 ?
Removed.
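For readers following the "times 16" remark: each table already lists 16 bytes, which is one full XMM register, so repeating it 16 times would only emit 256 bytes of identical data. The sketch below models what the two masks do when used with a byte shuffle such as pshufb. It is a scalar C illustration, not the assembly from the patch; shuffle16 is a hypothetical helper, and the arrays simply copy the mask values quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mask values as quoted in the patch, one 16-byte copy each. */
static const uint8_t pb_flip_byte[16]  = {15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0};
static const uint8_t pb_flip_short[16] = {14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1};

/* Scalar model of a 16-byte pshufb for masks with the high bit clear:
 * dst[i] = src[mask[i] & 15]. A temporary is used so that dst may
 * alias src. */
static void shuffle16(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t mask[16])
{
    uint8_t tmp[16];

    for (int i = 0; i < 16; i++) {
        /* pshufb zeroes a lane when the mask's high bit is set;
         * this model does not cover that case. */
        assert(!(mask[i] & 0x80));
        tmp[i] = src[mask[i] & 15];
    }
    memcpy(dst, tmp, 16);
}
```

With pb_flip_byte the 16 bytes come out in reverse order; with pb_flip_short the eight 16-bit samples are reversed while the two bytes inside each sample keep their order.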
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +pb_flip_byte: times 16 db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
> +pb_flip_short: times 16 db 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1
> +
>
times 16 ?
Martin
Signed-off-by: Paul B Mahol
---
libavfilter/hflip.h | 38
libavfilter/vf_hflip.c | 131 ++--
libavfilter/x86/Makefile| 2 +
libavfilter/x86/vf_hflip.asm| 92
libavfilter/x86/vf_hflip
On 12/1/2017 7:02 PM, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38 +
> libavfilter/vf_hflip.c | 30 ++--
> libavfilter/x86/Makefile| 2 ++
> libavfilter/x86/vf_hflip.asm| 61
> +++
On 12/1/2017 11:13 PM, Michael Niedermayer wrote:
> On Fri, Dec 01, 2017 at 11:02:43PM +0100, Paul B Mahol wrote:
>> Signed-off-by: Paul B Mahol
>> ---
>> libavfilter/hflip.h | 38 +
>> libavfilter/vf_hflip.c | 30 ++--
>> libavfilter/x
On Fri, Dec 01, 2017 at 11:02:43PM +0100, Paul B Mahol wrote:
> Signed-off-by: Paul B Mahol
> ---
> libavfilter/hflip.h | 38 +
> libavfilter/vf_hflip.c | 30 ++--
> libavfilter/x86/Makefile| 2 ++
> libavfilter/x86/vf_hflip.as
Signed-off-by: Paul B Mahol
---
libavfilter/hflip.h | 38 +
libavfilter/vf_hflip.c | 30 ++--
libavfilter/x86/Makefile| 2 ++
libavfilter/x86/vf_hflip.asm| 61 +
libavfilter/x86/vf_hf