Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-12-02 Thread James Darnley
On 2017-11-27 17:50, Henrik Gramner wrote:
> On Sun, Nov 26, 2017 at 11:51 PM, James Darnley wrote:
>> -pd_0_int_min: times 2 dd 0, -2147483648
>> -pq_int_min: times 2 dq -2147483648
>> -pq_int_max: times 2 dq 2147483647
>> +pd_0_int_min: times 4 dd 0, -2147483648
>> +pq_int_min: tim

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-27 Thread Henrik Gramner
>> Using 128-bit broadcasts is preferable over duplicating the constants
>> to 256-bit unless there's a good reason for doing so since it wastes
>> less cache and is faster on AMD CPU:s.
>
> What would that reason be? Afaik broadcasts are expensive, since they
> both load from memory then splat dat
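
The two options under discussion (storing the constant duplicated to 256 bits, or storing 128 bits and broadcasting it when loading) produce the same register contents; they differ only in how much constant data sits in memory and in the cost of the load. A minimal scalar sketch in C, assuming `pq_int_min` is two (or four) sign-extended 64-bit lanes as the `times N dq` lines suggest. The `_128`/`_256` suffixes and both helper functions are hypothetical names used for illustration only; they model the data layout and say nothing about which load is faster on a given CPU.

```c
#include <stdint.h>
#include <string.h>

/* 128-bit form, modelled on `times 2 dq -2147483648` (16 bytes of data). */
static const int64_t pq_int_min_128[2] = { INT32_MIN, INT32_MIN };

/* 256-bit form, modelled on `times 4 dq -2147483648` (32 bytes of data). */
static const int64_t pq_int_min_256[4] = {
    INT32_MIN, INT32_MIN, INT32_MIN, INT32_MIN
};

/* Hypothetical helper: models a 128-bit broadcast load by copying the
 * same 16 bytes into the low and the high half of a 256-bit value. */
static void broadcast128_to_256(int64_t dst[4], const int64_t src[2])
{
    memcpy(dst,     src, 2 * sizeof(int64_t)); /* low 128 bits  */
    memcpy(dst + 2, src, 2 * sizeof(int64_t)); /* high 128 bits */
}

/* Hypothetical helper: models a plain full-width 256-bit load. */
static void load256(int64_t dst[4], const int64_t src[4])
{
    memcpy(dst, src, 4 * sizeof(int64_t));
}
```

Calling `broadcast128_to_256` on the 16-byte table and `load256` on the 32-byte table fills the destination with the same four lanes, so the choice between them is purely about cache footprint and load cost.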

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-27 Thread James Almer
On 11/27/2017 1:50 PM, Henrik Gramner wrote:
> On Sun, Nov 26, 2017 at 11:51 PM, James Darnley wrote:
>> -pd_0_int_min: times 2 dd 0, -2147483648
>> -pq_int_min: times 2 dq -2147483648
>> -pq_int_max: times 2 dq 2147483647
>> +pd_0_int_min: times 4 dd 0, -2147483648
>> +pq_int_min: t

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-27 Thread Henrik Gramner
On Sun, Nov 26, 2017 at 11:51 PM, James Darnley wrote:
> -pd_0_int_min: times 2 dd 0, -2147483648
> -pq_int_min: times 2 dq -2147483648
> -pq_int_max: times 2 dq 2147483647
> +pd_0_int_min: times 4 dd 0, -2147483648
> +pq_int_min: times 4 dq -2147483648
> +pq_int_max: times 4 dq 21

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Almer
On 11/26/2017 8:13 PM, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley wrote:
>
>> When compared to the SSE4.2 version, runtime is reduced by 1 to 26%. The
>> function itself is around 2 times faster.
>> ---
>> libavcodec/x86/flac_dsp_gpl.asm | 56 +

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Almer
On 11/26/2017 7:51 PM, James Darnley wrote:
> When compared to the SSE4.2 version, runtime is reduced by 1 to 26%. The
> function itself is around 2 times faster.
> ---
> libavcodec/x86/flac_dsp_gpl.asm | 56 +++--
> libavcodec/x86/flacdsp_init.c | 5 +++-

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
On 2017-11-27 00:13, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley wrote:
>> @@ -123,7 +123,10 @@ RET
>> %endmacro
>>
>> %macro PMINSQ 3
>> -pcmpgtq %3, %2, %1
>> +mova %3, %2
>> +; We cannot use the 3-operand format because the memory location canno
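
For readers unfamiliar with why a `PMINSQ` macro exists at all: neither SSE4.2 nor AVX2 provides a packed signed 64-bit minimum instruction, so it is usually emulated with a 64-bit greater-than compare (`pcmpgtq`) followed by a mask-based select. The quoted hunk shows only the compare; the select step below is an assumption about the rest of the macro, not a transcription of it. A rough one-lane model in C, with hypothetical function names:

```c
#include <stdint.h>

/* Models one lane of pcmpgtq: all bits set if a > b (signed), else zero. */
static uint64_t cmpgt_mask64(int64_t a, int64_t b)
{
    return a > b ? UINT64_MAX : 0;
}

/* Signed 64-bit minimum built from the compare mask and bitwise ops,
 * i.e. the and / andnot / or pattern commonly used after pcmpgtq.
 * Assumption: the real macro does something equivalent per lane. */
static int64_t min_s64_via_mask(int64_t a, int64_t b)
{
    uint64_t mask = cmpgt_mask64(b, a);          /* set where a is smaller */
    uint64_t r = ((uint64_t)a & mask)            /* keep a where b > a     */
               | ((uint64_t)b & ~mask);          /* otherwise keep b       */
    return (int64_t)r;                           /* reinterpret as signed  */
}
```

For example, `min_s64_via_mask(2147483648LL, 2147483647)` selects the second argument, which is the kind of comparison needed when limiting a 64-bit value to `pq_int_max`.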

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread Rostislav Pehlivanov
On 26 November 2017 at 22:51, James Darnley wrote:
> When compared to the SSE4.2 version, runtime is reduced by 1 to 26%. The
> function itself is around 2 times faster.
> ---
> libavcodec/x86/flac_dsp_gpl.asm | 56 +++--
> libavcodec/x86/flacdsp_init.c

[FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
When compared to the SSE4.2 version, runtime is reduced by 1 to 26%. The
function itself is around 2 times faster.
---
 libavcodec/x86/flac_dsp_gpl.asm | 56 +++--
 libavcodec/x86/flacdsp_init.c | 5 +++-
 2 files changed, 47 insertions(+), 14 deletions(-)

dif