Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-30 Thread Ivan Kalvachev
On 6/26/17, Henrik Gramner wrote:
> On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev wrote:
>> +%define HADDPS_IS_FAST 0
>> +%define PHADDD_IS_FAST 0
> [...]
>> +haddps %1, %1
>> +haddps %1, %1
> [...]
>> + phaddd

Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-25 Thread Henrik Gramner
On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
[...]
> +haddps %1, %1
> +haddps %1, %1
[...]
> + phaddd xmm%1,xmm%1
> + phaddd xmm%1,xmm%1
You can safely

Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-25 Thread Michael Niedermayer
On Sat, Jun 24, 2017 at 11:39:03PM +0300, Ivan Kalvachev wrote:
[...]
> diff --git a/libavcodec/x86/opus_pvq_search.asm b/libavcodec/x86/opus_pvq_search.asm
> new file mode 100644
> index 00..36b679b75e
> --- /dev/null
> +++ b/libavcodec/x86/opus_pvq_search.asm
> @@ -0,0 +1,628 @@
> +;

[FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-24 Thread Ivan Kalvachev
This is the second version of my work. Nobody posted any benchmarks, so the old code remains for this round too. The proper PIC handling code is included. Small cosmetics, e.g. using tmpY, to separate (semantically) from the output outY. Now the tmpX buffer is fixed at 256*sizeof(float) size