Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-01 Thread James Darnley
On 2016-12-02 00:31, Carl Eugen Hoyos wrote:
> 2016-12-01 17:57 GMT+01:00 James Darnley:
>> Yorkfield:
>>  - mmx2: 2.44x faster (278 vs. 114 cycles)
>>  - sse2: 3.35x faster (278 vs.  83 cycles)
>>
>> Skylake:
>>  - mmx2: 1.69x faster (169 vs. 100 cycles)
>>  - sse2: 2.34x faster (169 vs.  72 cycles)
> 
> Is it expected (or possible) that the speed impact is so
> different for different Intel hardware?

Yes.  Intel's Core i-branded processors (the generation after Yorkfield)
introduced a much better micro-architecture, which makes the scalar C
code quite a bit faster.  The SIMD code, on the other hand, was already
so fast that it didn't gain much.

(At least, that is how I remember the story.)
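To make the arithmetic behind this explicit: each speedup figure is the C cycle count divided by the SIMD cycle count, so if the C code gets much faster on a newer core while the SIMD code barely changes, the ratio shrinks. A minimal C sketch using only the cycle counts quoted in this thread (the helper names `speedup` and `compare_cpus` are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of one version over another: how many times fewer cycles
 * the faster one takes. */
static double speedup(double slow_cycles, double fast_cycles)
{
    assert(fast_cycles > 0);
    return slow_cycles / fast_cycles;
}

/* Print how much each version improved from Yorkfield to Skylake,
 * using the cycle counts from the benchmark in this thread. */
static void compare_cpus(void)
{
    printf("C:    %.2fx fewer cycles\n", speedup(278, 169));
    printf("mmx2: %.2fx fewer cycles\n", speedup(114, 100));
    printf("sse2: %.2fx fewer cycles\n", speedup(83, 72));
}
```

Calling `compare_cpus()` shows the C code improving by about 1.64x between the two CPUs, against roughly 1.14x (mmx2) and 1.15x (sse2) for the SIMD versions. Recomputing 169/72 gives about 2.35 rather than the posted 2.34x, presumably because the cycle counts in the commit message are rounded.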

>>  - avx:  2.32x faster (169 vs.  73 cycles)
> 
> Don't you agree that if this is true (I don't know if it is)
> the patch should not be applied as is?

I do agree, and I wouldn't (deliberately) apply anything that made the
decoder slower, or not as fast as it could be.
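Carl's objection follows from how FFmpeg's x86 init files typically select functions: each instruction set is checked in ascending order and later assignments overwrite earlier ones, so on an AVX-capable Skylake an AVX version taking 73 cycles would silently replace the 72-cycle SSE2 one. The sketch below is only loosely modelled on that pattern; every name in it (`HypDSPContext`, `hyp_dsp_init`, the `HYP_CPU_*` flags and the stub filters) is hypothetical, not FFmpeg's real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU feature bits (FFmpeg's real flags live in libavutil/cpu.h). */
enum {
    HYP_CPU_MMXEXT = 1 << 0,
    HYP_CPU_SSE2   = 1 << 1,
    HYP_CPU_AVX    = 1 << 2,
};

typedef const char *(*filter_fn)(void);

/* Stubs standing in for the C and SIMD chroma deblock functions;
 * each just reports which version it is. */
static const char *filter_c(void)      { return "c"; }
static const char *filter_mmxext(void) { return "mmxext"; }
static const char *filter_sse2(void)   { return "sse2"; }
static const char *filter_avx(void)    { return "avx"; }

typedef struct HypDSPContext {
    filter_fn h_loop_filter_chroma;
} HypDSPContext;

/* Pick an implementation for the given CPU. Each later check overwrites
 * the previous choice, so the newest supported instruction set wins.
 * `have_avx_version` models whether the AVX function was kept in the patch. */
static void hyp_dsp_init(HypDSPContext *c, int cpu_flags, int have_avx_version)
{
    assert(c != NULL);
    c->h_loop_filter_chroma = filter_c;
    if (cpu_flags & HYP_CPU_MMXEXT)
        c->h_loop_filter_chroma = filter_mmxext;
    if (cpu_flags & HYP_CPU_SSE2)
        c->h_loop_filter_chroma = filter_sse2;
    if (have_avx_version && (cpu_flags & HYP_CPU_AVX))
        c->h_loop_filter_chroma = filter_avx;
}
```

With all three flags set, `hyp_dsp_init(&c, flags, 1)` selects `filter_avx`; passing 0 for `have_avx_version` falls back to `filter_sse2`, which on the quoted Skylake numbers is the (marginally) faster choice anyway.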

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-01 Thread James Darnley
On 2016-12-01 23:16, Michael Niedermayer wrote:
> On Thu, Dec 01, 2016 at 05:57:44PM +0100, James Darnley wrote:
>> Yorkfield:
>>  - mmx2: 2.44x faster (278 vs. 114 cycles)
>>  - sse2: 3.35x faster (278 vs.  83 cycles)
>>
>> Skylake:
>>  - mmx2: 1.69x faster (169 vs. 100 cycles)
>>  - sse2: 2.34x faster (169 vs.  72 cycles)
>>  - avx:  2.32x faster (169 vs.  73 cycles)
>> ---
>>  libavcodec/x86/h264_deblock_10bit.asm | 118 ++
>>  libavcodec/x86/h264dsp_init.c         |   9 +++
>>  2 files changed, 127 insertions(+)
> 
> breaks build on linux x86-32
> 
> YASM    libavcodec/x86/h264_deblock_10bit.o
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: undefined symbol `bpl' (first use)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: error:  (Each undefined symbol is reported only once.)
> src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode

Ah.  I shouldn't do clever things like trying to use the byte-sized
registers.  It isn't needed and causes problems like this.  Changed
locally.  Also changed in the 4:2:0 chroma intra patch.
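The error arises because `bpl`, the low byte of `rbp`, can only be encoded with a REX prefix, which exists only in 64-bit mode. On x86-32 the only low-byte registers are `al`, `cl`, `dl` and `bl` (register numbers 4 to 7 select `ah`, `ch`, `dh` and `bh` instead), and `r8b` to `r15b` do not exist at all. A small, purely illustrative C model of that rule, not tied to FFmpeg's x86inc.asm macros (`low_byte_reg` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* Low-byte register names, indexed by general-purpose register number. */
static const char *const low8_names[16] = {
    "al",  "cl",  "dl",   "bl",   "spl",  "bpl",  "sil",  "dil",
    "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b",
};

/* Return the name of the low-byte form of register `reg`, or NULL if that
 * form cannot be encoded in the given mode. */
static const char *low_byte_reg(int reg, int is_64bit)
{
    assert(reg >= 0 && reg < 16);
    /* Without REX (x86-32), numbers 4-7 mean ah/ch/dh/bh and 8-15 don't exist. */
    if (!is_64bit && reg >= 4)
        return NULL;
    return low8_names[reg];
}
```

So `low_byte_reg(5, 1)` yields "bpl" while `low_byte_reg(5, 0)` yields NULL, which is why code meant to assemble for both targets has to avoid byte forms of any register beyond the first four.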


Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-01 Thread Carl Eugen Hoyos
2016-12-01 17:57 GMT+01:00 James Darnley:
> Yorkfield:
>  - mmx2: 2.44x faster (278 vs. 114 cycles)
>  - sse2: 3.35x faster (278 vs.  83 cycles)
>
> Skylake:
>  - mmx2: 1.69x faster (169 vs. 100 cycles)
>  - sse2: 2.34x faster (169 vs.  72 cycles)

Is it expected (or possible) that the speed impact is so
different for different Intel hardware?

>  - avx:  2.32x faster (169 vs.  73 cycles)

Don't you agree that if this is true (I don't know if it is)
the patch should not be applied as is?

Carl Eugen


Re: [FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-01 Thread Michael Niedermayer
On Thu, Dec 01, 2016 at 05:57:44PM +0100, James Darnley wrote:
> Yorkfield:
>  - mmx2: 2.44x faster (278 vs. 114 cycles)
>  - sse2: 3.35x faster (278 vs.  83 cycles)
> 
> Skylake:
>  - mmx2: 1.69x faster (169 vs. 100 cycles)
>  - sse2: 2.34x faster (169 vs.  72 cycles)
>  - avx:  2.32x faster (169 vs.  73 cycles)
> ---
>  libavcodec/x86/h264_deblock_10bit.asm | 118 ++
>  libavcodec/x86/h264dsp_init.c         |   9 +++
>  2 files changed, 127 insertions(+)

breaks build on linux x86-32

YASM    libavcodec/x86/h264_deblock_10bit.o
src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode
src/libavcodec/x86/h264_deblock_10bit.asm:1039: error: undefined symbol `bpl' (first use)
src/libavcodec/x86/h264_deblock_10bit.asm:1039: error:  (Each undefined symbol is reported only once.)
src/libavcodec/x86/h264_deblock_10bit.asm:1039: warning: `bpl' is a register in 64-bit mode

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope

