Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-25 Thread James Darnley
On 2018-07-19 17:23, Rostislav Pehlivanov wrote:
> Could you provide standard overall transform results using START/STOP_TIMER
> rather than overall decoding speed?

Ask and ye shall receive.

> haar horizontal compose
> sse2: 3.67x faster (45248±108.1 vs. 12328±21.1 decicycles) compared with
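For readers unfamiliar with the "horizontal compose" step being benchmarked, here is a minimal Python sketch of integer Haar lifting on one row. It assumes Dirac-style rounding (high = odd - even, low = even + ((high + 1) >> 1)) and a deinterleaved layout with the low-pass half stored before the high-pass half. The function names are hypothetical; this is neither FFmpeg's C template nor the SIMD code from the patch set.

```python
def haar_forward(row):
    """Forward integer Haar lifting (assumed Dirac-style rounding).

    Returns the subbands deinterleaved: low-pass half, then high-pass half.
    """
    low, high = [], []
    for even, odd in zip(row[0::2], row[1::2]):
        h = odd - even                      # predict: high-pass = odd - even
        low.append(even + ((h + 1) >> 1))   # update: low-pass = even + rounded h / 2
        high.append(h)
    return low + high


def haar_compose(coeffs):
    """Inverse ("horizontal compose") step: undo the lifting and re-interleave."""
    w2 = len(coeffs) // 2
    out = []
    for l, h in zip(coeffs[:w2], coeffs[w2:]):
        even = l - ((h + 1) >> 1)           # undo the update step
        odd = h + even                      # undo the predict step
        out += [even, odd]
    return out


row = [512, 520, 1023, 0, 7, 9, 300, 301]  # arbitrary 10-bit sample values
assert haar_compose(haar_forward(row)) == row
```

Because each inverse step subtracts exactly what the forward step added, the round trip is lossless for any integers, which is what allows a SIMD version to be checked bit-exactly against the C one.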

Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-19 Thread Rostislav Pehlivanov
On 19 July 2018 at 16:29, James Darnley wrote:
> On 2018-07-19 17:23, Rostislav Pehlivanov wrote:
> >
> > Could you provide standard overall transform results using START/STOP_TIMER
> > rather than overall decoding speed?
> > Coefficients sizes and therefore golomb unpacking speed changes with
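The point about coefficient sizes can be made concrete by counting bits. A minimal sketch, assuming order-0 exp-Golomb codes (Dirac uses an interleaved layout of the same length) with one sign bit after a nonzero magnitude; the function names are hypothetical and the code only counts bits, it does not decode a bitstream.

```python
def exp_golomb_bits(value):
    """Length in bits of the order-0 exp-Golomb code for a non-negative integer.

    The code carries value + 1 in binary (n bits) plus n - 1 marker bits;
    the plain and interleaved layouts have the same total length, 2n - 1.
    """
    n = (value + 1).bit_length()
    return 2 * n - 1


def signed_coeff_bits(coeff):
    """Assumed signed coding: magnitude code, then one sign bit if nonzero."""
    return exp_golomb_bits(abs(coeff)) + (1 if coeff else 0)


# Bigger coefficients cost more bits, so more work for the Golomb unpacker.
for c in (0, 1, -3, 100, 1000):
    print(c, signed_coeff_bits(c))  # 1, 4, 6, 14 and 20 bits respectively
```

Since different wavelet filters leave coefficients of different magnitudes, whole-decoder fps mixes transform cost with entropy-decoding cost, which is why isolated transform timings were requested.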

Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-19 Thread James Darnley
On 2018-07-19 17:23, Rostislav Pehlivanov wrote:
>
> Could you provide standard overall transform results using START/STOP_TIMER
> rather than overall decoding speed?
> Coefficients sizes and therefore golomb unpacking speed changes with
> respect to the transform so potentially there could be som
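The idea behind the requested measurement is to time the transform kernel on its own, over many runs, rather than the whole decoder. A loose Python analogue is sketched below, assuming nanosecond wall-clock timing instead of the CPU cycle counts that FFmpeg's START_TIMER/STOP_TIMER macros report (in decicycles). The helper names are hypothetical, and taking the median to damp outliers is a choice of this sketch, not a description of how the FFmpeg macros aggregate.

```python
import statistics
import time


def time_kernel(fn, *args, runs=1000):
    """Time only fn(*args) over many runs and return the median in nanoseconds.

    Timing a single kernel in isolation keeps entropy decoding and the rest
    of the decoder out of the measurement.
    """
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        fn(*args)
        samples.append(time.perf_counter_ns() - t0)
    return statistics.median(samples)


def speedup(reference, optimized):
    """Speed-up factor: reference time divided by optimized time (same units)."""
    return reference / optimized


# The decicycle counts quoted for the Haar horizontal compose (slower vs. faster):
print(f"{speedup(45248, 12328):.2f}x")  # 3.67x
```

The same ratio works for any unit (decicycles, nanoseconds) as long as both measurements use it.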

Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-19 Thread Rostislav Pehlivanov
On 19 July 2018 at 15:52, James Darnley wrote:
> I tested the speed gains by using ffmpeg to decode a 720p yuv422p10 file
> encoded with the relevant transform. The summary is below.
>
> Haar
> C: 119fps
> SSE2: 204fps
> AVX: 206fps
> AVX2: 221fps
>
> 5_3
> C: 94fps
> SSE2: 118fps
> AV

[FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-19 Thread James Darnley
I tested the speed gains by using ffmpeg to decode a 720p yuv422p10 file encoded with the relevant transform. The summary is below.

Haar
C: 119fps
SSE2: 204fps
AVX: 206fps
AVX2: 221fps

5_3
C: 94fps
SSE2: 118fps
AVX2: 121fps

9_7
C: 84fps
SSE2: 111fps
AVX2: 115fps

Is the AVX worth it
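To put the fps summary above into relative terms, here is a small sketch that divides each SIMD result by the C baseline. The numbers are copied from the summary; the dictionary layout and helper name are only for illustration.

```python
# Decoding speed in fps for a 720p yuv422p10 file, copied from the summary above.
FPS = {
    "haar": {"C": 119, "SSE2": 204, "AVX": 206, "AVX2": 221},
    "5_3":  {"C": 94,  "SSE2": 118, "AVX2": 121},
    "9_7":  {"C": 84,  "SSE2": 111, "AVX2": 115},
}


def speedups(results, baseline="C"):
    """Speed-up of each implementation relative to the baseline (ratio of fps)."""
    base = results[baseline]
    return {name: fps / base for name, fps in results.items() if name != baseline}


for transform, results in FPS.items():
    print(transform, {k: round(v, 2) for k, v in speedups(results).items()})

# Marginal gain of AVX over SSE2 for Haar: 206 / 204, roughly 1%.
avx_over_sse2 = FPS["haar"]["AVX"] / FPS["haar"]["SSE2"]
```

This gives about 1.71x/1.73x/1.86x for Haar (SSE2/AVX/AVX2), 1.26x/1.29x for 5_3 and 1.32x/1.37x for 9_7, with AVX only about 1% ahead of SSE2 on Haar. Because these are whole-decoder figures, the transform's own speed-up is diluted by everything else the decoder does, so isolated kernel timings say more about whether the AVX version is worth keeping.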