Re: [FFmpeg-devel] [PATCH] x86/exrdsp: optimize ff_reorder_pixels_avx2()

2017-09-20 Thread Martin Vignali
>
> I also had a hard time noticing a difference with
> start/stop_timer, and it's clear why, seeing how little difference this
> change truly makes.
>
> You can build the above tool with "make checkasm", and the executable
> will be in the tests/checkasm folder. The results tend to be less
> variable, and it's better at detecting small differences between functions.

Thanks for the explanation, and for the checkasm patch for exrdsp (I missed it
before).
I get results similar to yours using checkasm.

Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] x86/exrdsp: optimize ff_reorder_pixels_avx2()

2017-09-18 Thread James Almer
On 9/18/2017 4:35 AM, Martin Vignali wrote:
> 2017-09-18 3:52 GMT+02:00 James Almer :
> 
>> From: Henrik Gramner 
>>
>> Tested with "checkasm --test=exrdsp -bench"
>>
>> Before:
>> reorder_pixels_c: 5187.8
>> reorder_pixels_sse2: 377.0
>> reorder_pixels_avx2: 331.3
>>
>> After:
>> reorder_pixels_c: 5181.5
>> reorder_pixels_sse2: 377.0
>> reorder_pixels_avx2: 313.8
>>
> I don't have the same result using a start/stop timer,
> but your testing approach is probably better than mine.

I also had a hard time noticing a difference with
start/stop_timer, and it's clear why, seeing how little difference this
change truly makes.
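
To put a number on that, the avx2 figures quoted above (331.3 before, 313.8 after) work out to a reduction of about 5.3%. A minimal C sketch of the arithmetic, where `percent_faster` is a hypothetical helper name and not part of FFmpeg or checkasm:

```c
#include <stdio.h>

/* Relative reduction in a measured time, as a percentage of the
 * "before" value. Hypothetical helper, for illustration only. */
static double percent_faster(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Print the reduction for one pair of checkasm numbers. */
static void report(const char *name, double before, double after)
{
    printf("%s: %.1f -> %.1f (%.2f%% less)\n",
           name, before, after, percent_faster(before, after));
}
```

Calling `report("reorder_pixels_avx2", 331.3, 313.8)` prints a reduction of roughly 5.28%, which is small enough to get lost in the run-to-run noise of a hand-placed timer.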

You can build the above tool with "make checkasm", and the executable
will be in the tests/checkasm folder. The results tend to be less
variable, and it's better at detecting small differences between functions.

> And since you both think it's a better way to do it, it's OK with me!
> 
> Thanks
> 
> Martin


Re: [FFmpeg-devel] [PATCH] x86/exrdsp: optimize ff_reorder_pixels_avx2()

2017-09-18 Thread Martin Vignali
2017-09-18 3:52 GMT+02:00 James Almer :

> From: Henrik Gramner 
>
> Tested with "checkasm --test=exrdsp -bench"
>
> Before:
> reorder_pixels_c: 5187.8
> reorder_pixels_sse2: 377.0
> reorder_pixels_avx2: 331.3
>
> After:
> reorder_pixels_c: 5181.5
> reorder_pixels_sse2: 377.0
> reorder_pixels_avx2: 313.8
>

I don't have the same result using a start/stop timer,
but your testing approach is probably better than mine.
And since you both think it's a better way to do it, it's OK with me!

Thanks

Martin


[FFmpeg-devel] [PATCH] x86/exrdsp: optimize ff_reorder_pixels_avx2()

2017-09-17 Thread James Almer
From: Henrik Gramner 

Tested with "checkasm --test=exrdsp -bench"

Before:
reorder_pixels_c: 5187.8
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 331.3

After:
reorder_pixels_c: 5181.5
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 313.8

Signed-off-by: James Almer 
---
 libavcodec/x86/exrdsp.asm | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/libavcodec/x86/exrdsp.asm b/libavcodec/x86/exrdsp.asm
index b91a7be20d..06c629e59e 100644
--- a/libavcodec/x86/exrdsp.asm
+++ b/libavcodec/x86/exrdsp.asm
@@ -39,16 +39,15 @@ cglobal reorder_pixels, 3,4,3, dst, src1, size, src2
     neg        sizeq                      ; size = offset for dst, src1, src2
 .loop:
 
-%if cpuflag(avx2)
-    vpermq     m0, [src1q + sizeq], 0xd8  ; load first part
-    vpermq     m1, [src2q + sizeq], 0xd8  ; load second part
-%else
     mova       m0, [src1q+sizeq]          ; load first part
     movu       m1, [src2q+sizeq]          ; load second part
-%endif
     SBUTTERFLY bw, 0, 1, 2                ; interleaved
-    mova [dstq+2*sizeq       ], m0        ; copy to dst
-    mova [dstq+2*sizeq+mmsize], m1
+    mova [dstq+2*sizeq   ], xm0           ; copy to dst
+    mova [dstq+2*sizeq+16], xm1
+%if cpuflag(avx2)
+    vperm2i128 m0, m0, m1, q0301
+    mova [dstq+2*sizeq+32], m0
+%endif
     add        sizeq, mmsize
     jl .loop
     RET
-- 
2.14.1
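
For readers following the store order in the patch, here is a small C model of one AVX2 loop iteration. It is a sketch under assumptions, not FFmpeg code: it assumes reorder_pixels interleaves the bytes of the two source halves (as the "load first part", "load second part" and "interleaved" comments suggest), and that the byte unpack behind SBUTTERFLY works on each 128-bit lane separately, as the 256-bit punpcklbw/punpckhbw instructions do. All function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: dst[2*i] = src1[i], dst[2*i+1] = src2[i]. */
static void interleave_scalar(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = src1[i];
        dst[2 * i + 1] = src2[i];
    }
}

/* Model of a 256-bit byte unpack: each 128-bit lane is interleaved
 * on its own. high = 0 takes the low 8 bytes of each lane
 * (punpcklbw), high = 1 the high 8 bytes (punpckhbw). */
static void unpack_bytes_per_lane(uint8_t out[32], const uint8_t a[32],
                                  const uint8_t b[32], int high)
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 8; i++) {
            int s = lane * 16 + (high ? 8 : 0) + i;
            out[lane * 16 + 2 * i]     = a[s];
            out[lane * 16 + 2 * i + 1] = b[s];
        }
    }
}

/* Model of one AVX2 iteration after the patch: 32 bytes from each
 * source in, 64 interleaved bytes out. */
static void interleave_avx2_model(uint8_t dst[64], const uint8_t src1[32],
                                  const uint8_t src2[32])
{
    uint8_t m0[32], m1[32], hi[32];

    unpack_bytes_per_lane(m0, src1, src2, 0);  /* low unpack  -> m0 */
    unpack_bytes_per_lane(m1, src1, src2, 1);  /* high unpack -> m1 */

    memcpy(dst,      m0, 16);      /* mova [dst+ 0], xm0 */
    memcpy(dst + 16, m1, 16);      /* mova [dst+16], xm1 */

    /* vperm2i128 m0, m0, m1, q0301: gather the high lanes of m0, m1 */
    memcpy(hi,      m0 + 16, 16);
    memcpy(hi + 16, m1 + 16, 16);
    memcpy(dst + 32, hi, 32);      /* mova [dst+32], m0 */
}
```

In this model the low lanes of m0 and m1 already hold the first 32 output bytes in order, so they can be stored directly as xmm halves, and a single cross-lane permute collects the remaining 32 bytes. The removed code instead reordered the quadwords of both inputs with vpermq before unpacking.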
