Re: [FFmpeg-devel] libavcodec/blockdsp : add AVX version

2017-10-03 Thread James Almer
On 10/3/2017 4:47 PM, Martin Vignali wrote:
> Hello,
>
>
>> I used GCC 7.2. clear_blocks_mmx is slower than c for me as well, but
>> not the rest.
>> Your compiler seems to have done a much better job than mine. Is it
>> Clang? Does it somehow have vectorization enabled perhaps? Because
>>

Re: [FFmpeg-devel] libavcodec/blockdsp : add AVX version

2017-10-03 Thread Ronald S. Bultje
Hi,

On Tue, Oct 3, 2017 at 3:47 PM, Martin Vignali wrote:
> 2017-10-02 4:05 GMT+02:00 Ronald S. Bultje:
> > On Sun, Oct 1, 2017 at 7:46 PM, Martin Vignali wrote:
> > > I also modify several decoder/encoder, in

Re: [FFmpeg-devel] libavcodec/blockdsp : add AVX version

2017-10-03 Thread Martin Vignali
Hello,

> I used GCC 7.2. clear_blocks_mmx is slower than c for me as well, but
> not the rest.
> Your compiler seems to have done a much better job than mine. Is it
> Clang? Does it somehow have vectorization enabled perhaps? Because
> that's not supposed to happen.
>
>
Yes it's Clang 8.1 I put

Re: [FFmpeg-devel] libavcodec/blockdsp : add AVX version

2017-10-01 Thread Ronald S. Bultje
Hi,

On Sun, Oct 1, 2017 at 7:46 PM, Martin Vignali wrote:
> I also modify several decoder/encoder, in order to fix the DECLARE_ALIGNED
> from 16 to 32
>
How did you decide which ones to change?

Ronald
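For readers following the DECLARE_ALIGNED change from 16 to 32: aligned 256-bit AVX loads and stores require the address to be a multiple of 32 bytes, so a buffer declared with only 16-byte alignment may or may not qualify depending on where it happens to land. The sketch below illustrates that check in plain C11. `SKETCH_DECLARE_ALIGNED`, `sketch_is_aligned` and `sketch_block_ok_for_avx` are hypothetical names made up for this illustration. The macro only imitates the calling shape of FFmpeg's `DECLARE_ALIGNED` and is not its actual definition.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for the shape of DECLARE_ALIGNED(n, type, name),
 * written with C11 alignas so that this sketch compiles on its own. */
#define SKETCH_DECLARE_ALIGNED(n, t, v) alignas(n) t v

/* Returns 1 if ptr is a multiple of `align` bytes, 0 otherwise. */
static int sketch_is_aligned(const void *ptr, uintptr_t align)
{
    return ((uintptr_t)ptr % align) == 0;
}

/* Returns 1 if the block could safely be passed to code that uses
 * 32-byte aligned (256-bit) loads and stores, 0 otherwise. */
static int sketch_block_ok_for_avx(const int16_t *block)
{
    return sketch_is_aligned(block, 32);
}
```

A buffer declared as `SKETCH_DECLARE_ALIGNED(32, int16_t, block)[64];` always passes the check, while a pointer 16 bytes into that same buffer is 16-byte aligned but fails it. That is why callers of a 32-byte aligned routine need their declarations raised from 16 to 32.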

Re: [FFmpeg-devel] libavcodec/blockdsp : add AVX version

2017-10-01 Thread James Almer
On 10/1/2017 8:46 PM, Martin Vignali wrote:
> Hello,
>
> After taking a look on blockdsp
> ./tests/checkasm/checkasm --test=blockdsp --bench
>
> the result of clear_blocks is slower on my computer than the C version
> except if we add an avx version
>
> In attach patch to add avx version
> for

[FFmpeg-devel] libavcodec/blockdsp : add AVX version

2017-10-01 Thread Martin Vignali
Hello,

After taking a look on blockdsp
./tests/checkasm/checkasm --test=blockdsp --bench

the result of clear_blocks is slower on my computer than the C version
except if we add an avx version

In attach patch to add avx version
for clear_block and clear_blocks

result : (Kaby Lake, Mac os 10.12)
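As context for what is being benchmarked, here is a rough sketch of what a plain C version of these two routines could look like. It assumes a block is 64 `int16_t` coefficients and that `clear_blocks` covers six consecutive blocks. Those sizes, and the `sketch_` names, are assumptions for illustration and are not taken from the patch. Because the whole operation reduces to a fixed-size `memset`, a compiler that vectorizes it well can match or beat older hand-written SIMD, which would be consistent with the C version outperforming MMX in the measurements discussed in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Assumed sizes for this sketch: 64 coefficients per block, 6 blocks. */
enum { SKETCH_BLOCK_COEFFS = 64, SKETCH_NUM_BLOCKS = 6 };

/* Zero a single block of coefficients. */
static void sketch_clear_block(int16_t *block)
{
    memset(block, 0, sizeof(int16_t) * SKETCH_BLOCK_COEFFS);
}

/* Zero SKETCH_NUM_BLOCKS consecutive blocks in one call. */
static void sketch_clear_blocks(int16_t *blocks)
{
    memset(blocks, 0,
           sizeof(int16_t) * SKETCH_BLOCK_COEFFS * SKETCH_NUM_BLOCKS);
}
```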