Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
On 2016-12-02 00:31, Carl Eugen Hoyos wrote:
> 2016-12-01 17:57 GMT+01:00 James Darnley:
>> Yorkfield:
>> - mmx2: 2.44x faster (278 vs. 114 cycles)
>> - sse2: 3.35x faster (278 vs. 83 cycles)
>>
>> Skylake:
>> - mmx2: 1.69x faster (169 vs. 100 cycles)
>> - sse2: 2.34x faster (169 vs. 72 cycles)
>
> Is it expected (or possible) that the speed impact is so
> different for different Intel hardware?

Yes. Intel's Core branded processors introduced a much better
micro-architecture (the generation after the Yorkfield) which will cause
the scalar C code to be quite a bit faster. The SIMD on the other hand
was already so quick it didn't gain much. (At least I think I remember
this being the story.)

>> - avx: 2.32x faster (169 vs. 73 cycles)
>
> Don't you agree that if this is true (I don't know if it is)
> the patch should not be applied as is?

I do agree and I wouldn't (deliberately) apply anything that made the
decoder slower, or not as fast as it could be.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
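The explanation above can be checked against the quoted cycle counts. The following is only a rough sketch: the function and table names are made up for illustration, and the inputs are the rounded cycle counts from the thread, so a recomputed ratio can differ from the quoted one in the last digit (169 / 72 is about 2.347, for example). It compares how much the C reference and each SIMD version gained between the two CPUs.

```python
# Cycle counts quoted in the thread, per CPU: C reference and SIMD versions.
# The dictionary layout and function names are illustrative only.
CYCLES = {
    "Yorkfield": {"c": 278, "mmx2": 114, "sse2": 83},
    "Skylake": {"c": 169, "mmx2": 100, "sse2": 72, "avx": 73},
}


def speedup(slow_cycles: float, fast_cycles: float) -> float:
    """Return how many times faster the second measurement is than the first."""
    return slow_cycles / fast_cycles


def gain_between_cpus(version: str, old_cpu: str, new_cpu: str) -> float:
    """Return how much one code version gained from the older CPU to the newer one."""
    return speedup(CYCLES[old_cpu][version], CYCLES[new_cpu][version])


if __name__ == "__main__":
    # SIMD versus the C reference on each CPU.
    for cpu, counts in CYCLES.items():
        for version, cycles in counts.items():
            if version != "c":
                print(f"{cpu} {version}: {speedup(counts['c'], cycles):.2f}x")
    # Gain of each version when moving from Yorkfield to Skylake.
    for version in ("c", "mmx2", "sse2"):
        gain = gain_between_cpus(version, "Yorkfield", "Skylake")
        print(f"{version}: {gain:.2f}x fewer cycles on Skylake")
```

By these numbers the C reference gains far more between the two CPUs than either SIMD version does, which is why the relative speedups shrink on Skylake even though every version got faster in absolute terms.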
Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
On 2016-12-01 23:16, Michael Niedermayer wrote:
> On Thu, Dec 01, 2016 at 05:57:44PM +0100, James Darnley wrote:
>> Yorkfield:
>> - mmx2: 2.44x faster (278 vs. 114 cycles)
>> - sse2: 3.35x faster (278 vs. 83 cycles)
>>
>> Skylake:
>> - mmx2: 1.69x faster (169 vs. 100 cycles)
>> - sse2: 2.34x faster (169 vs. 72 cycles)
>> - avx: 2.32x faster (169 vs. 73 cycles)
>> ---
>>  libavcodec/x86/h264_deblock_10bit.asm | 118 ++
>>  libavcodec/x86/h264dsp_init.c         |   9 +++
>>  2 files changed, 127 insertions(+)
>
> breaks build on linux x86-32
>
> YASM libavcodec/x86/h264_deblock_10bit.o
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: undefined symbol `bpl' (first use)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: (Each undefined symbol is reported only once.)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode

Ah. I shouldn't do clever things like trying to use the byte-sized
registers. It isn't needed and causes problems like this.

Changed locally. Also changed in the 4:2:0 chroma intra patch.
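For context on the build failure: in 32-bit mode only four general-purpose registers have an addressable low byte (al, cl, dl, bl). The low bytes of esi, edi, ebp and esp (sil, dil, bpl, spl) need a REX prefix and so exist only in 64-bit mode, which is why the assembler treats `bpl' as an undefined symbol on x86-32. A small sketch of that rule follows; the table and the low_byte() helper are hypothetical and are not part of yasm or x86inc.

```python
# Low-byte name for each general-purpose register, keyed by its 32-bit name.
# Hypothetical lookup table for illustration; not taken from any assembler.
LOW_BYTE = {
    "eax": "al", "ecx": "cl", "edx": "dl", "ebx": "bl",
    "esi": "sil", "edi": "dil", "ebp": "bpl", "esp": "spl",
}

# Byte registers that can be encoded without a REX prefix (usable in 32-bit mode).
LEGACY_BYTE_REGS = {"al", "cl", "dl", "bl"}


def low_byte(reg: str, bits: int):
    """Return the low-byte register name, or None if it cannot be encoded.

    `bits` is the target mode, 32 or 64.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    name = LOW_BYTE[reg]
    if bits == 64 or name in LEGACY_BYTE_REGS:
        return name
    return None


if __name__ == "__main__":
    for reg in LOW_BYTE:
        print(reg, low_byte(reg, 32), low_byte(reg, 64))
```

Working on the full-width register instead, as the reply above says was done locally, sidesteps the difference between the two modes.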
Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
2016-12-01 17:57 GMT+01:00 James Darnley:
> Yorkfield:
> - mmx2: 2.44x faster (278 vs. 114 cycles)
> - sse2: 3.35x faster (278 vs. 83 cycles)
>
> Skylake:
> - mmx2: 1.69x faster (169 vs. 100 cycles)
> - sse2: 2.34x faster (169 vs. 72 cycles)

Is it expected (or possible) that the speed impact is so
different for different Intel hardware?

> - avx: 2.32x faster (169 vs. 73 cycles)

Don't you agree that if this is true (I don't know if it is)
the patch should not be applied as is?

Carl Eugen
Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter
On Thu, Dec 01, 2016 at 05:57:44PM +0100, James Darnley wrote:
> Yorkfield:
> - mmx2: 2.44x faster (278 vs. 114 cycles)
> - sse2: 3.35x faster (278 vs. 83 cycles)
>
> Skylake:
> - mmx2: 1.69x faster (169 vs. 100 cycles)
> - sse2: 2.34x faster (169 vs. 72 cycles)
> - avx: 2.32x faster (169 vs. 73 cycles)
> ---
>  libavcodec/x86/h264_deblock_10bit.asm | 118 ++
>  libavcodec/x86/h264dsp_init.c         |   9 +++
>  2 files changed, 127 insertions(+)

breaks build on linux x86-32

YASM libavcodec/x86/h264_deblock_10bit.o
src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode
src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: undefined symbol `bpl' (first use)
src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: (Each undefined symbol is reported only once.)
src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope