
Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-18 Thread Rostislav Pehlivanov

On 18 October 2016 at 21:04, Michael Niedermayer 
wrote:

> On Tue, Oct 18, 2016 at 05:33:13PM +0100, Rostislav Pehlivanov wrote:
> > On 18 October 2016 at 16:32, James Almer  wrote:
> >
> > > On 10/18/2016 12:07 PM, Rostislav Pehlivanov wrote:
> > > > diff --git a/libavcodec/aacenc.c b/libavcodec/aacenc.c
> > > > index ee3cbf8..622f0ba 100644
> > > > --- a/libavcodec/aacenc.c
> > > > +++ b/libavcodec/aacenc.c
> > > > @@ -1033,6 +1033,12 @@ static av_cold int
> aac_encode_init(AVCodecContext
> > > *avctx)
> > > >  ff_lpc_init(&s->lpc, 2*avctx->frame_size, TNS_MAX_ORDER,
> > > FF_LPC_TYPE_LEVINSON);
> > > >  s->random_state = 0x1f2e3d4c;
> > > >
> > > > +s->abs_pow34   = &abs_pow34_v;
> > > > +s->quant_bands = &quantize_bands;
> > >
> > > No need for & in these.
> > >
> > > > +
> > > > +if (ARCH_X86)
> > > > +ff_aac_dsp_init_x86(s);
> > > > +
> > > >  if (HAVE_MIPSDSP)
> > > >  ff_aac_coder_init_mips(s);
> > >
> > > [...]
> > >
> > > > diff --git a/libavcodec/x86/aacencdsp.asm
> b/libavcodec/x86/aacencdsp.asm
> > > > new file mode 100644
> > > > index 000..dd7b022
> > > > --- /dev/null
> > > > +++ b/libavcodec/x86/aacencdsp.asm
> > > > @@ -0,0 +1,88 @@
> > > > +;**
> > > 
> > > > +;* SIMD optimized AAC encoder DSP functions
> > > > +;*
> > > > +;* Copyright (C) 2016 Rostislav Pehlivanov 
> > > > +;*
> > > > +;* This file is part of FFmpeg.
> > > > +;*
> > > > +;* FFmpeg is free software; you can redistribute it and/or
> > > > +;* modify it under the terms of the GNU Lesser General Public
> > > > +;* License as published by the Free Software Foundation; either
> > > > +;* version 2.1 of the License, or (at your option) any later
> version.
> > > > +;*
> > > > +;* FFmpeg is distributed in the hope that it will be useful,
> > > > +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > +;* Lesser General Public License for more details.
> > > > +;*
> > > > +;* You should have received a copy of the GNU Lesser General Public
> > > > +;* License along with FFmpeg; if not, write to the Free Software
> > > > +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> > > 02110-1301 USA
> > > > +;**
> > > 
> > > > +
> > > > +%include "libavutil/x86/x86util.asm"
> > > > +
> > > > +SECTION_RODATA
> > > > +
> > > > +float_abs_mask: times 4 dd 0x7fffffff
> > > > +
> > > > +SECTION .text
> > > > +
> > > > +;**
> *
> > > > +;void ff_abs_pow34(float *out, const float *in, const int size);
> > > > +;**
> *
> > > > +INIT_XMM sse
> > > > +cglobal abs_pow34, 3, 3, 3, out, in, size
> > > > +mova   m2, [float_abs_mask]
> > > > +shl    sizeq, 2
> > > > +add    inq, sizeq
> > > > +add    outq, sizeq
> > > > +neg    sizeq
> > > > +.loop:
> > > > +movaps m0, [inq+sizeq]
> > > > +andps  m0, m2
> > >
> > > Remove the movaps and do
> > >
> > > andps  m0, m2, [inq+sizeq]
> > >
> > > Instead. Sorry i didn't notice this last time.
> > >
> > > > +sqrtps m1, m0
> > > > +mulps  m0, m1
> > > > +sqrtps m0, m0
> > > > +mova   [outq+sizeq], m0
> > > > +add    sizeq, mmsize
> > > > +jl     .loop
> > > > +RET
> > > > +
> > > > +;**
> *
> > > > +;void ff_aac_quantize_bands(int *out, const float *in, const float
> > > *scaled,
> > > > +;   int size, int is_signed, int maxval,
> const
> > > float Q34,
> > > > +;   const float rounding)
> > > > +;**
> *
> > > > +INIT_XMM sse2
> > > > +cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size,
> is_signed,
> > > maxval, Q34, rounding
> > > > +%if UNIX64 == 0
> > > > +movss m0, Q34m
> > > > +movss m1, roundingm
> > > > +cvtsi2ss  m3, maxvald
> > > > +%else
> > > > +cvtsi2ss  m3, dword maxvalm
> > > > +%endif
> > >
> > > The other way around. Unix64 is the one that has maxval on a reg
> regardless
> > > of how you init the function, whereas win64 and any x86_32 target have
> it
> > > on stack.
> > >
> > > > +shufps m0, m0, 0
> > > > +shufps m1, m1, 0
> > > > +shufps m3, m3, 0
> > > > +shl   is_signedd, 31
> > > > +movd  m4, is_signedd
> > > > +shufps m4, m4, 0
> > > > +shl   sized,   2
> > > > +add   inq, sizeq
> > > > +add   outq, sizeq
> > > > +add   scaledq, sizeq
> > > > +neg   sizeq
> > > > +.loop:
> > > > +mulps m2, m0, [scaledq+sizeq]
> > > > +addps m2, m1
> > > > +minps m2, m3
> > >
> > > > +movaps m5, [inq+size

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-18 Thread Michael Niedermayer

On Tue, Oct 18, 2016 at 05:33:13PM +0100, Rostislav Pehlivanov wrote:
> On 18 October 2016 at 16:32, James Almer  wrote:
> 
> > On 10/18/2016 12:07 PM, Rostislav Pehlivanov wrote:
> > > diff --git a/libavcodec/aacenc.c b/libavcodec/aacenc.c
> > > index ee3cbf8..622f0ba 100644
> > > --- a/libavcodec/aacenc.c
> > > +++ b/libavcodec/aacenc.c
> > > @@ -1033,6 +1033,12 @@ static av_cold int aac_encode_init(AVCodecContext
> > *avctx)
> > >  ff_lpc_init(&s->lpc, 2*avctx->frame_size, TNS_MAX_ORDER,
> > FF_LPC_TYPE_LEVINSON);
> > >  s->random_state = 0x1f2e3d4c;
> > >
> > > +s->abs_pow34   = &abs_pow34_v;
> > > +s->quant_bands = &quantize_bands;
> >
> > No need for & in these.
> >
> > > +
> > > +if (ARCH_X86)
> > > +ff_aac_dsp_init_x86(s);
> > > +
> > >  if (HAVE_MIPSDSP)
> > >  ff_aac_coder_init_mips(s);
> >
> > [...]
> >
> > > diff --git a/libavcodec/x86/aacencdsp.asm b/libavcodec/x86/aacencdsp.asm
> > > new file mode 100644
> > > index 000..dd7b022
> > > --- /dev/null
> > > +++ b/libavcodec/x86/aacencdsp.asm
> > > @@ -0,0 +1,88 @@
> > > +;**
> > 
> > > +;* SIMD optimized AAC encoder DSP functions
> > > +;*
> > > +;* Copyright (C) 2016 Rostislav Pehlivanov 
> > > +;*
> > > +;* This file is part of FFmpeg.
> > > +;*
> > > +;* FFmpeg is free software; you can redistribute it and/or
> > > +;* modify it under the terms of the GNU Lesser General Public
> > > +;* License as published by the Free Software Foundation; either
> > > +;* version 2.1 of the License, or (at your option) any later version.
> > > +;*
> > > +;* FFmpeg is distributed in the hope that it will be useful,
> > > +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +;* Lesser General Public License for more details.
> > > +;*
> > > +;* You should have received a copy of the GNU Lesser General Public
> > > +;* License along with FFmpeg; if not, write to the Free Software
> > > +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> > 02110-1301 USA
> > > +;**
> > 
> > > +
> > > +%include "libavutil/x86/x86util.asm"
> > > +
> > > +SECTION_RODATA
> > > +
> > > +float_abs_mask: times 4 dd 0x7fffffff
> > > +
> > > +SECTION .text
> > > +
> > > +;***
> > > +;void ff_abs_pow34(float *out, const float *in, const int size);
> > > +;***
> > > +INIT_XMM sse
> > > +cglobal abs_pow34, 3, 3, 3, out, in, size
> > > +mova   m2, [float_abs_mask]
> > > +shl    sizeq, 2
> > > +add    inq, sizeq
> > > +add    outq, sizeq
> > > +neg    sizeq
> > > +.loop:
> > > +movaps m0, [inq+sizeq]
> > > +andps  m0, m2
> >
> > Remove the movaps and do
> >
> > andps  m0, m2, [inq+sizeq]
> >
> > Instead. Sorry i didn't notice this last time.
> >
> > > +sqrtps m1, m0
> > > +mulps  m0, m1
> > > +sqrtps m0, m0
> > > +mova   [outq+sizeq], m0
> > > +add    sizeq, mmsize
> > > +jl     .loop
> > > +RET
> > > +
> > > +;***
> > > +;void ff_aac_quantize_bands(int *out, const float *in, const float
> > *scaled,
> > > +;   int size, int is_signed, int maxval, const
> > float Q34,
> > > +;   const float rounding)
> > > +;***
> > > +INIT_XMM sse2
> > > +cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed,
> > maxval, Q34, rounding
> > > +%if UNIX64 == 0
> > > +movss m0, Q34m
> > > +movss m1, roundingm
> > > +cvtsi2ss  m3, maxvald
> > > +%else
> > > +cvtsi2ss  m3, dword maxvalm
> > > +%endif
> >
> > The other way around. Unix64 is the one that has maxval on a reg regardless
> > of how you init the function, whereas win64 and any x86_32 target have it
> > on stack.
> >
> > > +shufps m0, m0, 0
> > > +shufps m1, m1, 0
> > > +shufps m3, m3, 0
> > > +shl   is_signedd, 31
> > > +movd  m4, is_signedd
> > > +shufps m4, m4, 0
> > > +shl   sized,   2
> > > +add   inq, sizeq
> > > +add   outq, sizeq
> > > +add   scaledq, sizeq
> > > +neg   sizeq
> > > +.loop:
> > > +mulps m2, m0, [scaledq+sizeq]
> > > +addps m2, m1
> > > +minps m2, m3
> >
> > > +movaps m5, [inq+sizeq]
> > > +andps m5, m4
> >
> > Same as in abs_pow34, remove movaps and do
> >
> > andps m5, m4, [inq+sizeq]
> >
> > > +orps  m2, m5
> > > +cvttps2dq m2, m2
> > > +mova  [outq+sizeq], m2
> > > +add   sizeq, mmsize
> > > +jl   .loop
> > > +RET
> > > diff --git a/libavcodec/x86/aacencdsp_init.c b/liba

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-18 Thread Rostislav Pehlivanov

On 18 October 2016 at 16:32, James Almer  wrote:

> On 10/18/2016 12:07 PM, Rostislav Pehlivanov wrote:
> > diff --git a/libavcodec/aacenc.c b/libavcodec/aacenc.c
> > index ee3cbf8..622f0ba 100644
> > --- a/libavcodec/aacenc.c
> > +++ b/libavcodec/aacenc.c
> > @@ -1033,6 +1033,12 @@ static av_cold int aac_encode_init(AVCodecContext
> *avctx)
> >  ff_lpc_init(&s->lpc, 2*avctx->frame_size, TNS_MAX_ORDER,
> FF_LPC_TYPE_LEVINSON);
> >  s->random_state = 0x1f2e3d4c;
> >
> > +s->abs_pow34   = &abs_pow34_v;
> > +s->quant_bands = &quantize_bands;
>
> No need for & in these.
>
> > +
> > +if (ARCH_X86)
> > +ff_aac_dsp_init_x86(s);
> > +
> >  if (HAVE_MIPSDSP)
> >  ff_aac_coder_init_mips(s);
>
> [...]
>
> > diff --git a/libavcodec/x86/aacencdsp.asm b/libavcodec/x86/aacencdsp.asm
> > new file mode 100644
> > index 000..dd7b022
> > --- /dev/null
> > +++ b/libavcodec/x86/aacencdsp.asm
> > @@ -0,0 +1,88 @@
> > +;**
> 
> > +;* SIMD optimized AAC encoder DSP functions
> > +;*
> > +;* Copyright (C) 2016 Rostislav Pehlivanov 
> > +;*
> > +;* This file is part of FFmpeg.
> > +;*
> > +;* FFmpeg is free software; you can redistribute it and/or
> > +;* modify it under the terms of the GNU Lesser General Public
> > +;* License as published by the Free Software Foundation; either
> > +;* version 2.1 of the License, or (at your option) any later version.
> > +;*
> > +;* FFmpeg is distributed in the hope that it will be useful,
> > +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +;* Lesser General Public License for more details.
> > +;*
> > +;* You should have received a copy of the GNU Lesser General Public
> > +;* License along with FFmpeg; if not, write to the Free Software
> > +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> 02110-1301 USA
> > +;**
> 
> > +
> > +%include "libavutil/x86/x86util.asm"
> > +
> > +SECTION_RODATA
> > +
> > +float_abs_mask: times 4 dd 0x7fffffff
> > +
> > +SECTION .text
> > +
> > +;***
> > +;void ff_abs_pow34(float *out, const float *in, const int size);
> > +;***
> > +INIT_XMM sse
> > +cglobal abs_pow34, 3, 3, 3, out, in, size
> > +mova   m2, [float_abs_mask]
> > +shl    sizeq, 2
> > +add    inq, sizeq
> > +add    outq, sizeq
> > +neg    sizeq
> > +.loop:
> > +movaps m0, [inq+sizeq]
> > +andps  m0, m2
>
> Remove the movaps and do
>
> andps  m0, m2, [inq+sizeq]
>
> Instead. Sorry i didn't notice this last time.
>
> > +sqrtps m1, m0
> > +mulps  m0, m1
> > +sqrtps m0, m0
> > +mova   [outq+sizeq], m0
> > +add    sizeq, mmsize
> > +jl     .loop
> > +RET
> > +
> > +;***
> > +;void ff_aac_quantize_bands(int *out, const float *in, const float
> *scaled,
> > +;   int size, int is_signed, int maxval, const
> float Q34,
> > +;   const float rounding)
> > +;***
> > +INIT_XMM sse2
> > +cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed,
> maxval, Q34, rounding
> > +%if UNIX64 == 0
> > +movss m0, Q34m
> > +movss m1, roundingm
> > +cvtsi2ss  m3, maxvald
> > +%else
> > +cvtsi2ss  m3, dword maxvalm
> > +%endif
>
> The other way around. Unix64 is the one that has maxval on a reg regardless
> of how you init the function, whereas win64 and any x86_32 target have it
> on stack.
>
> > +shufps m0, m0, 0
> > +shufps m1, m1, 0
> > +shufps m3, m3, 0
> > +shl   is_signedd, 31
> > +movd  m4, is_signedd
> > +shufps m4, m4, 0
> > +shl   sized,   2
> > +add   inq, sizeq
> > +add   outq, sizeq
> > +add   scaledq, sizeq
> > +neg   sizeq
> > +.loop:
> > +mulps m2, m0, [scaledq+sizeq]
> > +addps m2, m1
> > +minps m2, m3
>
> > +movaps m5, [inq+sizeq]
> > +andps m5, m4
>
> Same as in abs_pow34, remove movaps and do
>
> andps m5, m4, [inq+sizeq]
>
> > +orps  m2, m5
> > +cvttps2dq m2, m2
> > +mova  [outq+sizeq], m2
> > +add   sizeq, mmsize
> > +jl   .loop
> > +RET
> > diff --git a/libavcodec/x86/aacencdsp_init.c b/libavcodec/x86/aacencdsp_
> init.c
> > new file mode 100644
> > index 000..aefaa15
> > --- /dev/null
> > +++ b/libavcodec/x86/aacencdsp_init.c
> > @@ -0,0 +1,43 @@
> > +/*
> > + * AAC encoder assembly optimizations
> > + * Copyright (C) 2016 Rostislav Pehlivanov 
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redi

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-18 Thread James Almer

On 10/18/2016 12:07 PM, Rostislav Pehlivanov wrote:
> diff --git a/libavcodec/aacenc.c b/libavcodec/aacenc.c
> index ee3cbf8..622f0ba 100644
> --- a/libavcodec/aacenc.c
> +++ b/libavcodec/aacenc.c
> @@ -1033,6 +1033,12 @@ static av_cold int aac_encode_init(AVCodecContext 
> *avctx)
>  ff_lpc_init(&s->lpc, 2*avctx->frame_size, TNS_MAX_ORDER, 
> FF_LPC_TYPE_LEVINSON);
>  s->random_state = 0x1f2e3d4c;
>  
> +s->abs_pow34   = &abs_pow34_v;
> +s->quant_bands = &quantize_bands;

No need for & in these.
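
To illustrate the point about `&`: in C a function name used as a value already converts to a pointer to that function, so both spellings store the same address. Below is a minimal sketch, assuming a cut-down context struct. `DemoEncContext` and every `demo_*` name are hypothetical, and the fallback only clears the sign so the sketch needs nothing beyond the standard headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the encoder context; only one hook is modelled. */
typedef struct DemoEncContext {
    void (*abs_pow34)(float *out, const float *in, const int size);
} DemoEncContext;

/* Placeholder fallback: stores |in[i]| only. The real abs_pow34_v also
 * raises the magnitude to the power 3/4. */
static void demo_abs_stub(float *out, const float *in, const int size)
{
    assert(out != NULL && in != NULL && size >= 0);
    for (int i = 0; i < size; i++)
        out[i] = in[i] < 0.0f ? -in[i] : in[i];
}

/* With the address-of operator, as written in the patch. */
static void demo_init_with_ampersand(DemoEncContext *s)
{
    s->abs_pow34 = &demo_abs_stub;
}

/* Without it: the function designator decays to the same pointer. */
static void demo_init_without_ampersand(DemoEncContext *s)
{
    s->abs_pow34 = demo_abs_stub;
}
```

Either initializer leaves the hook pointing at the same function, so dropping the `&` is purely a style cleanup.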

> +
> +if (ARCH_X86)
> +ff_aac_dsp_init_x86(s);
> +
>  if (HAVE_MIPSDSP)
>  ff_aac_coder_init_mips(s);

[...]

> diff --git a/libavcodec/x86/aacencdsp.asm b/libavcodec/x86/aacencdsp.asm
> new file mode 100644
> index 000..dd7b022
> --- /dev/null
> +++ b/libavcodec/x86/aacencdsp.asm
> @@ -0,0 +1,88 @@
> +;**
> +;* SIMD optimized AAC encoder DSP functions
> +;*
> +;* Copyright (C) 2016 Rostislav Pehlivanov 
> +;*
> +;* This file is part of FFmpeg.
> +;*
> +;* FFmpeg is free software; you can redistribute it and/or
> +;* modify it under the terms of the GNU Lesser General Public
> +;* License as published by the Free Software Foundation; either
> +;* version 2.1 of the License, or (at your option) any later version.
> +;*
> +;* FFmpeg is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;* Lesser General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU Lesser General Public
> +;* License along with FFmpeg; if not, write to the Free Software
> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
> USA
> +;**
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +float_abs_mask: times 4 dd 0x7fffffff
> +
> +SECTION .text
> +
> +;***
> +;void ff_abs_pow34(float *out, const float *in, const int size);
> +;***
> +INIT_XMM sse
> +cglobal abs_pow34, 3, 3, 3, out, in, size
> +mova   m2, [float_abs_mask]
> +shl    sizeq, 2
> +add    inq, sizeq
> +add    outq, sizeq
> +neg    sizeq
> +.loop:
> +movaps m0, [inq+sizeq]
> +andps  m0, m2

Remove the movaps and do

andps  m0, m2, [inq+sizeq]

instead. Sorry, I didn't notice this last time.

> +sqrtps m1, m0
> +mulps  m0, m1
> +sqrtps m0, m0
> +mova   [outq+sizeq], m0
> +add    sizeq, mmsize
> +jl     .loop
> +RET
> +
> +;***
> +;void ff_aac_quantize_bands(int *out, const float *in, const float *scaled,
> +;   int size, int is_signed, int maxval, const float 
> Q34,
> +;   const float rounding)
> +;***
> +INIT_XMM sse2
> +cglobal aac_quantize_bands, 5, 5, 6, out, in, scaled, size, is_signed, 
> maxval, Q34, rounding
> +%if UNIX64 == 0
> +movss m0, Q34m
> +movss m1, roundingm
> +cvtsi2ss  m3, maxvald
> +%else
> +cvtsi2ss  m3, dword maxvalm
> +%endif

It's the other way around. Unix64 is the target that has maxval in a register
regardless of how you init the function, whereas Win64 and every x86_32 target
have it on the stack.
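
To make the register-versus-stack split concrete, here is a simplified model of where each argument of this signature lands. It assumes the System V AMD64 convention for Unix64 (six integer and eight XMM argument registers, counted separately), the Microsoft x64 convention for Win64 (four positional slots shared by both kinds) and cdecl for x86_32. It handles only scalar integer, pointer and float arguments, and all `Demo*`/`demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_ABI_UNIX64, DEMO_ABI_WIN64, DEMO_ABI_X86_32 } DemoAbi;
typedef enum { DEMO_ARG_INT, DEMO_ARG_FLOAT } DemoArgKind;   /* pointers count as INT */
typedef enum { DEMO_LOC_GPR, DEMO_LOC_XMM, DEMO_LOC_STACK } DemoLoc;

/* Argument kinds of ff_aac_quantize_bands in declaration order:
 * out, in, scaled, size, is_signed, maxval, Q34, rounding. */
static const DemoArgKind demo_quant_bands_args[8] = {
    DEMO_ARG_INT, DEMO_ARG_INT, DEMO_ARG_INT, DEMO_ARG_INT,
    DEMO_ARG_INT, DEMO_ARG_INT, DEMO_ARG_FLOAT, DEMO_ARG_FLOAT
};

static void demo_classify_args(DemoAbi abi, const DemoArgKind *kinds,
                               size_t n, DemoLoc *locs)
{
    size_t gpr_used = 0, xmm_used = 0;

    assert(kinds != NULL && locs != NULL);
    for (size_t i = 0; i < n; i++) {
        const int is_int = kinds[i] == DEMO_ARG_INT;

        locs[i] = DEMO_LOC_STACK;            /* default, and always so for cdecl */
        if (abi == DEMO_ABI_UNIX64) {
            if (is_int && gpr_used < 6) {
                gpr_used++;
                locs[i] = DEMO_LOC_GPR;
            } else if (!is_int && xmm_used < 8) {
                xmm_used++;
                locs[i] = DEMO_LOC_XMM;
            }
        } else if (abi == DEMO_ABI_WIN64) {
            if (i < 4)                       /* slot index is positional */
                locs[i] = is_int ? DEMO_LOC_GPR : DEMO_LOC_XMM;
        }
    }
}
```

Under this model maxval is the sixth integer argument, so it sits in a register on Unix64 only. Q34 and rounding arrive in XMM registers on Unix64 and on the stack elsewhere, which is why the `%if UNIX64 == 0` branch is the one that has to load all three from memory.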

> +shufps m0, m0, 0
> +shufps m1, m1, 0
> +shufps m3, m3, 0
> +shl   is_signedd, 31
> +movd  m4, is_signedd
> +shufps m4, m4, 0
> +shl   sized,   2
> +add   inq, sizeq
> +add   outq, sizeq
> +add   scaledq, sizeq
> +neg   sizeq
> +.loop:
> +mulps m2, m0, [scaledq+sizeq]
> +addps m2, m1
> +minps m2, m3

> +movaps m5, [inq+sizeq]
> +andps m5, m4

Same as in abs_pow34, remove movaps and do

andps m5, m4, [inq+sizeq]
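
For readers following the loop, a scalar C sketch of what the quantization computes per element, read off the instruction sequence in the patch: scale and round, clamp to maxval, copy the sign bit of the input when is_signed is set, then truncate toward zero. The `demo_*` name is hypothetical, and the sketch assumes is_signed is either 0 or 1, as the `shl is_signedd, 31` in the asm does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void demo_quantize_bands(int *out, const float *in, const float *scaled,
                                int size, int is_signed, int maxval,
                                float Q34, float rounding)
{
    /* is_signed << 31 yields a mask that selects only the sign bit, or nothing. */
    const uint32_t sign_mask = is_signed ? UINT32_C(0x80000000) : 0;

    assert(size >= 0);
    for (int i = 0; i < size; i++) {
        float q = scaled[i] * Q34 + rounding;     /* mulps + addps */
        uint32_t qbits, inbits;

        if (q > (float)maxval)                    /* minps */
            q = (float)maxval;

        memcpy(&qbits, &q, sizeof(qbits));
        memcpy(&inbits, &in[i], sizeof(inbits));
        qbits |= inbits & sign_mask;              /* andps + orps */
        memcpy(&q, &qbits, sizeof(q));

        out[i] = (int)q;                          /* cvttps2dq truncates toward zero */
    }
}
```

Because the clamp happens before the sign is applied, a negative input saturates at -maxval rather than at an asymmetric bound.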

> +orps  m2, m5
> +cvttps2dq m2, m2
> +mova  [outq+sizeq], m2
> +add   sizeq, mmsize
> +jl   .loop
> +RET
> diff --git a/libavcodec/x86/aacencdsp_init.c b/libavcodec/x86/aacencdsp_init.c
> new file mode 100644
> index 000..aefaa15
> --- /dev/null
> +++ b/libavcodec/x86/aacencdsp_init.c
> @@ -0,0 +1,43 @@
> +/*
> + * AAC encoder assembly optimizations
> + * Copyright (C) 2016 Rostislav Pehlivanov 
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without ev

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-18 Thread Rostislav Pehlivanov

On 18 October 2016 at 14:51, Michael Niedermayer 
wrote:

> On Tue, Oct 18, 2016 at 09:02:19AM +0100, Rostislav Pehlivanov wrote:
> > On 17 October 2016 at 23:43, Michael Niedermayer  >
> > wrote:
> >
> > > On Mon, Oct 17, 2016 at 10:24:48PM +0100, Rostislav Pehlivanov wrote:
> > > > Should fix segfaults on x86-32
> > > >
> > > > Performance improvements:
> > > >
> > > > quant_bands:
> > > > with: 681 decicycles in quant_bands, 8388453 runs,155 skips
> > > > without: 1190 decicycles in quant_bands, 8388386 runs,222 skips
> > > > Around 42% for the function
> > > >
> > > > Twoloop coder:
> > > >
> > > > abs_pow34:
> > > > with/without: 7.82s/8.17s
> > > > Around 4% for the entire encoder
> > > >
> > > > Both:
> > > > with/without: 7.15s/8.17s
> > > > Around 12% for the entire encoder
> > > >
> > > > Fast coder:
> > > >
> > > > abs_pow34:
> > > > with/without: 3.40s/3.77s
> > > > Around 10% for the entire encoder
> > > >
> > > > Both:
> > > > with/without: 3.02s/3.77s
> > > > Around 20% faster for the entire encoder
> > > >
> > > > Signed-off-by: Rostislav Pehlivanov 
> > > > ---
> > > >  libavcodec/aaccoder.c| 27 +++--
> > > >  libavcodec/aaccoder_trellis.h|  2 +-
> > > >  libavcodec/aaccoder_twoloop.h|  2 +-
> > > >  libavcodec/aacenc.c  |  4 ++
> > > >  libavcodec/aacenc.h  |  6 +++
> > > >  libavcodec/aacenc_is.c   |  6 +--
> > > >  libavcodec/aacenc_ltp.c  |  4 +-
> > > >  libavcodec/aacenc_pred.c |  6 +--
> > > >  libavcodec/aacenc_quantization.h |  4 +-
> > > >  libavcodec/aacenc_utils.h|  4 +-
> > > >  libavcodec/x86/Makefile  |  2 +
> > > >  libavcodec/x86/aacencdsp.asm | 87 ++
> > > ++
> > > >  libavcodec/x86/aacencdsp_init.c  | 43 
> > > >  13 files changed, 170 insertions(+), 27 deletions(-)
> > > >  create mode 100644 libavcodec/x86/aacencdsp.asm
> > > >  create mode 100644 libavcodec/x86/aacencdsp_init.c
> > >
> > > fate passes on linux32/64 x86, mingw32/64 x86
> > >
> > > build fails on arm:
> > >
> > > libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> > > ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> > > `ff_aac_dsp_init_x86'
> > > collect2: ld returned 1 exit status
> > > make: *** [ffserver_g] Error 1
> > > make: *** Waiting for unfinished jobs
> > > libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> > > ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> > > `ff_aac_dsp_init_x86'
> > > collect2: ld returned 1 exit status
> > > make: *** [ffprobe_g] Error 1
> > > libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> > > ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> > > `ff_aac_dsp_init_x86'
> > > collect2: ld returned 1 exit status
> > > make: *** [ffmpeg_g] Error 1
> > >
> > > [...]
> > > --
> > > > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> > > >
> > > > While the State exists there can be no freedom; when there is freedom there
> > > > will be no State. -- Vladimir Lenin
> > >
> > > ___
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel@ffmpeg.org
> > > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > >
> > >
> > Attaching a new version with the fixes from James Almer which should also
> > fix non-x86 compilation
>
> >  aaccoder.c|   27 +++
> >  aaccoder_trellis.h|2 -
> >  aaccoder_twoloop.h|2 -
> >  aacenc.c  |4 ++
> >  aacenc.h  |6 +++
> >  aacenc_is.c   |6 +--
> >  aacenc_ltp.c  |4 +-
> >  aacenc_pred.c |6 +--
> >  aacenc_quantization.h |4 +-
> >  aacenc_utils.h|2 -
> >  x86/Makefile  |2 +
> >  x86/aacencdsp.asm |   88 ++
> 
> >  x86/aacencdsp_init.c  |   43 
> >  13 files changed, 170 insertions(+), 26 deletions(-)
> > 84d67e14dbd62ef958a52a4027a8dff22f7480b6  0001-aacenc-add-SIMD-
> optimizations-for-abs_pow34-and-quan.patch
> > From d92003e23d82bc40fd85712538983209a7704248 Mon Sep 17 00:00:00 2001
> > From: Rostislav Pehlivanov 
> > Date: Sat, 8 Oct 2016 15:59:14 +0100
> > Subject: [PATCH] aacenc: add SIMD optimizations for abs_pow34 and
> quantization
> >
> > Performance improvements:
> >
> > quant_bands:
> > with: 681 decicycles in quant_bands, 8388453 runs,155 skips
> > without: 1190 decicycles in quant_bands, 8388386 runs,222 skips
> > Around 42% for the function
> >
> > Twoloop coder:
> >
> > abs_pow34:
> > with/without: 7.82s/8.17s
> > Around 4% for the entire encoder
> >
> > Both:
> > with/without: 7.15s/8.17s
> > Around 12% for the entire encoder
> >
> > Fast coder:
> >
> > abs_pow34:
> > with/without: 3.40s/3.77s
> > Around 10% for the entire encoder
> >
> > Both:
> > with/without: 3.02s/3.77s
> > Around 20% faste

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-18 Thread Michael Niedermayer

On Tue, Oct 18, 2016 at 09:02:19AM +0100, Rostislav Pehlivanov wrote:
> On 17 October 2016 at 23:43, Michael Niedermayer 
> wrote:
> 
> > On Mon, Oct 17, 2016 at 10:24:48PM +0100, Rostislav Pehlivanov wrote:
> > > Should fix segfaults on x86-32
> > >
> > > Performance improvements:
> > >
> > > quant_bands:
> > > with: 681 decicycles in quant_bands, 8388453 runs,155 skips
> > > without: 1190 decicycles in quant_bands, 8388386 runs,222 skips
> > > Around 42% for the function
> > >
> > > Twoloop coder:
> > >
> > > abs_pow34:
> > > with/without: 7.82s/8.17s
> > > Around 4% for the entire encoder
> > >
> > > Both:
> > > with/without: 7.15s/8.17s
> > > Around 12% for the entire encoder
> > >
> > > Fast coder:
> > >
> > > abs_pow34:
> > > with/without: 3.40s/3.77s
> > > Around 10% for the entire encoder
> > >
> > > Both:
> > > with/without: 3.02s/3.77s
> > > Around 20% faster for the entire encoder
> > >
> > > Signed-off-by: Rostislav Pehlivanov 
> > > ---
> > >  libavcodec/aaccoder.c| 27 +++--
> > >  libavcodec/aaccoder_trellis.h|  2 +-
> > >  libavcodec/aaccoder_twoloop.h|  2 +-
> > >  libavcodec/aacenc.c  |  4 ++
> > >  libavcodec/aacenc.h  |  6 +++
> > >  libavcodec/aacenc_is.c   |  6 +--
> > >  libavcodec/aacenc_ltp.c  |  4 +-
> > >  libavcodec/aacenc_pred.c |  6 +--
> > >  libavcodec/aacenc_quantization.h |  4 +-
> > >  libavcodec/aacenc_utils.h|  4 +-
> > >  libavcodec/x86/Makefile  |  2 +
> > >  libavcodec/x86/aacencdsp.asm | 87 ++
> > ++
> > >  libavcodec/x86/aacencdsp_init.c  | 43 
> > >  13 files changed, 170 insertions(+), 27 deletions(-)
> > >  create mode 100644 libavcodec/x86/aacencdsp.asm
> > >  create mode 100644 libavcodec/x86/aacencdsp_init.c
> >
> > fate passes on linux32/64 x86, mingw32/64 x86
> >
> > build fails on arm:
> >
> > libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> > ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> > `ff_aac_dsp_init_x86'
> > collect2: ld returned 1 exit status
> > make: *** [ffserver_g] Error 1
> > make: *** Waiting for unfinished jobs
> > libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> > ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> > `ff_aac_dsp_init_x86'
> > collect2: ld returned 1 exit status
> > make: *** [ffprobe_g] Error 1
> > libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> > ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> > `ff_aac_dsp_init_x86'
> > collect2: ld returned 1 exit status
> > make: *** [ffmpeg_g] Error 1
> >
> > [...]
> >
> Attaching a new version with the fixes from James Almer which should also
> fix non-x86 compilation

>  aaccoder.c|   27 +++
>  aaccoder_trellis.h|2 -
>  aaccoder_twoloop.h|2 -
>  aacenc.c  |4 ++
>  aacenc.h  |6 +++
>  aacenc_is.c   |6 +--
>  aacenc_ltp.c  |4 +-
>  aacenc_pred.c |6 +--
>  aacenc_quantization.h |4 +-
>  aacenc_utils.h|2 -
>  x86/Makefile  |2 +
>  x86/aacencdsp.asm |   88 
> ++
>  x86/aacencdsp_init.c  |   43 
>  13 files changed, 170 insertions(+), 26 deletions(-)
> 84d67e14dbd62ef958a52a4027a8dff22f7480b6  
> 0001-aacenc-add-SIMD-optimizations-for-abs_pow34-and-quan.patch
> From d92003e23d82bc40fd85712538983209a7704248 Mon Sep 17 00:00:00 2001
> From: Rostislav Pehlivanov 
> Date: Sat, 8 Oct 2016 15:59:14 +0100
> Subject: [PATCH] aacenc: add SIMD optimizations for abs_pow34 and quantization
> 
> Performance improvements:
> 
> quant_bands:
> with: 681 decicycles in quant_bands, 8388453 runs,155 skips
> without: 1190 decicycles in quant_bands, 8388386 runs,222 skips
> Around 42% for the function
> 
> Twoloop coder:
> 
> abs_pow34:
> with/without: 7.82s/8.17s
> Around 4% for the entire encoder
> 
> Both:
> with/without: 7.15s/8.17s
> Around 12% for the entire encoder
> 
> Fast coder:
> 
> abs_pow34:
> with/without: 3.40s/3.77s
> Around 10% for the entire encoder
> 
> Both:
> with/without: 3.02s/3.77s
> Around 20% faster for the entire encoder
> 
> Signed-off-by: Rostislav Pehlivanov 
> ---
>  libavcodec/aaccoder.c| 27 ++--
>  libavcodec/aaccoder_trellis.h|  2 +-
>  libavcodec/aaccoder_twoloop.h|  2 +-
>  libavcodec/aacenc.c  |  4 ++
>  libavcodec/aacenc.h  |  6 +++
>  libavcodec/aacenc_is.c

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-18 Thread Rostislav Pehlivanov

On 17 October 2016 at 23:43, Michael Niedermayer 
wrote:

> On Mon, Oct 17, 2016 at 10:24:48PM +0100, Rostislav Pehlivanov wrote:
> > Should fix segfaults on x86-32
> >
> > Performance improvements:
> >
> > quant_bands:
> > with: 681 decicycles in quant_bands, 8388453 runs,155 skips
> > without: 1190 decicycles in quant_bands, 8388386 runs,222 skips
> > Around 42% for the function
> >
> > Twoloop coder:
> >
> > abs_pow34:
> > with/without: 7.82s/8.17s
> > Around 4% for the entire encoder
> >
> > Both:
> > with/without: 7.15s/8.17s
> > Around 12% for the entire encoder
> >
> > Fast coder:
> >
> > abs_pow34:
> > with/without: 3.40s/3.77s
> > Around 10% for the entire encoder
> >
> > Both:
> > with/without: 3.02s/3.77s
> > Around 20% faster for the entire encoder
> >
> > Signed-off-by: Rostislav Pehlivanov 
> > ---
> >  libavcodec/aaccoder.c| 27 +++--
> >  libavcodec/aaccoder_trellis.h|  2 +-
> >  libavcodec/aaccoder_twoloop.h|  2 +-
> >  libavcodec/aacenc.c  |  4 ++
> >  libavcodec/aacenc.h  |  6 +++
> >  libavcodec/aacenc_is.c   |  6 +--
> >  libavcodec/aacenc_ltp.c  |  4 +-
> >  libavcodec/aacenc_pred.c |  6 +--
> >  libavcodec/aacenc_quantization.h |  4 +-
> >  libavcodec/aacenc_utils.h|  4 +-
> >  libavcodec/x86/Makefile  |  2 +
> >  libavcodec/x86/aacencdsp.asm | 87 ++
> ++
> >  libavcodec/x86/aacencdsp_init.c  | 43 
> >  13 files changed, 170 insertions(+), 27 deletions(-)
> >  create mode 100644 libavcodec/x86/aacencdsp.asm
> >  create mode 100644 libavcodec/x86/aacencdsp_init.c
>
> fate passes on linux32/64 x86, mingw32/64 x86
>
> build fails on arm:
>
> libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> `ff_aac_dsp_init_x86'
> collect2: ld returned 1 exit status
> make: *** [ffserver_g] Error 1
> make: *** Waiting for unfinished jobs
> libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> `ff_aac_dsp_init_x86'
> collect2: ld returned 1 exit status
> make: *** [ffprobe_g] Error 1
> libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
> ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to
> `ff_aac_dsp_init_x86'
> collect2: ld returned 1 exit status
> make: *** [ffmpeg_g] Error 1
>
> [...]
>
Attaching a new version with the fixes from James Almer which should also
fix non-x86 compilation
From d92003e23d82bc40fd85712538983209a7704248 Mon Sep 17 00:00:00 2001
From: Rostislav Pehlivanov 
Date: Sat, 8 Oct 2016 15:59:14 +0100
Subject: [PATCH] aacenc: add SIMD optimizations for abs_pow34 and quantization

Performance improvements:

quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs,155 skips
without: 1190 decicycles in quant_bands, 8388386 runs,222 skips
Around 42% for the function

Twoloop coder:

abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder

Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder

Fast coder:

abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder

Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder

Signed-off-by: Rostislav Pehlivanov 
---
 libavcodec/aaccoder.c| 27 ++--
 libavcodec/aaccoder_trellis.h|  2 +-
 libavcodec/aaccoder_twoloop.h|  2 +-
 libavcodec/aacenc.c  |  4 ++
 libavcodec/aacenc.h  |  6 +++
 libavcodec/aacenc_is.c   |  6 +--
 libavcodec/aacenc_ltp.c  |  4 +-
 libavcodec/aacenc_pred.c |  6 +--
 libavcodec/aacenc_quantization.h |  4 +-
 libavcodec/aacenc_utils.h|  2 +-
 libavcodec/x86/Makefile  |  2 +
 libavcodec/x86/aacencdsp.asm | 88 
 libavcodec/x86/aacencdsp_init.c  | 43 
 13 files changed, 170 insertions(+), 26 deletions(-)
 create mode 100644 libavcodec/x86/aacencdsp.asm
 create mode 100644 libavcodec/x86/aacencdsp_init.c

diff --git a/libavcodec/aaccoder.c b/libavcodec/aaccoder.c
index 35787e8..9f3b4ed 100644
--- a/libavcodec/aaccoder.c
+++ b/libavcodec/aaccoder.c
@@ -88,7 +88,7 @@ static void encode_window_bands_info(AACEncContext *s, SingleChannelElement *sce
 float next_minrd = INFINITY;
 int next_mincb = 0;
 
-abs_pow34_v(s->scoefs, sce->coeffs, 1024);
+s->abs_pow34(s->scoefs, sce->coeffs, 1024);
 start = win*128;
 for (cb = 0; cb < CB_TOT_ALL; cb++) {
 path[0][cb].cost = 0.0f;
@@ -299

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-17 Thread James Almer

On 10/17/2016 6:24 PM, Rostislav Pehlivanov wrote:
> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
> index ff9188a..f5cf77d 100644
> --- a/libavcodec/aacenc_utils.h
> +++ b/libavcodec/aacenc_utils.h
> @@ -37,7 +37,7 @@
>  #define ROUND_TO_ZERO 0.1054f
>  #define C_QUANT 0.4054f
>  
> -static inline void abs_pow34_v(float *out, const float *in, const int size)
> +static inline void abs_pow34_v(float *out, const float *in, const int64_t size)

Why int64_t? There's no need for values that big...

>  {
>  int i;
>  for (i = 0; i < size; i++) {

...And it's certainly not correct, seeing this for loop here.

> diff --git a/libavcodec/x86/aacencdsp.asm b/libavcodec/x86/aacencdsp.asm
> new file mode 100644
> index 0000000..ff4019f
> --- /dev/null
> +++ b/libavcodec/x86/aacencdsp.asm
> @@ -0,0 +1,87 @@
> +;**
> +;* SIMD optimized AAC encoder DSP functions
> +;*
> +;* Copyright (C) 2016 Rostislav Pehlivanov 
> +;*
> +;* This file is part of FFmpeg.
> +;*
> +;* FFmpeg is free software; you can redistribute it and/or
> +;* modify it under the terms of the GNU Lesser General Public
> +;* License as published by the Free Software Foundation; either
> +;* version 2.1 of the License, or (at your option) any later version.
> +;*
> +;* FFmpeg is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;* Lesser General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU Lesser General Public
> +;* License along with FFmpeg; if not, write to the Free Software
> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> +;**
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +float_abs_mask:  times 4 dd 0x7fffffff
> +
> +SECTION .text
> +
> +;***
> +;void ff_abs_pow34_sse(float *out, const float *in, const int64_t size);
> +;***
> +INIT_XMM sse
> +cglobal abs_pow34, 3, 3, 3, out, in, size
> +mova   m2, [float_abs_mask]
> +shl    sizeq, 2
> +add    inq, sizeq
> +add    outq, sizeq
> +neg    sizeq
> +.loop:
> +mova   m0, [inq+sizeq]
> +andps  m0, m2
> +sqrtps m1, m0
> +mulps  m0, m1
> +sqrtps m0, m0
> +mova   [outq+sizeq], m0
> +add    sizeq, mmsize
> +jl     .loop
> +RET
> +
> +;***
> +;void ff_aac_quantize_bands_sse2(int *out, const float *in, const float *scaled,
> +;int size, int is_signed, int maxval, const float Q34,
> +;const float rounding)
> +;***
> +INIT_XMM sse2
> +cglobal aac_quantize_bands, 6, 6, 6, out, in, scaled, size, is_signed, maxval, Q34, rounding

5, 5, 6

No need to load maxval into a gpr on x86_32 and win64 when cvtsi2ss
can also load it directly from memory.

> +%if UNIX64 == 0
> +movss m0, Q34m
> +movss m1, roundingm
> +%endif
> +SPLATD    m0
> +SPLATD    m1

shufps m0, m0, 0
shufps m1, m1, 0

On sse2, SPLATD will expand to pshufd, so better stay in float domain.

If you add an AVX/FMA3 version of this function, you could instead use
vbroadcastss in them, and save yourself the movss.
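
As a rough illustration of the shufps broadcast suggested above, here is a small C sketch using SSE intrinsics. The helper name `splat4` is hypothetical, and the scalar fallback exists only so the sketch builds on targets without SSE. `_mm_shuffle_ps(v, v, 0)` corresponds to `shufps v, v, 0` and copies lane 0 into all four lanes.

```c
#if defined(__SSE__)
#include <xmmintrin.h>

/* Broadcast x to all four lanes using a float-domain shuffle. */
static void splat4(float out[4], float x)
{
    __m128 v = _mm_set_ss(x);     /* x in lane 0, remaining lanes zero */
    v = _mm_shuffle_ps(v, v, 0);  /* shufps v, v, 0: replicate lane 0 */
    _mm_storeu_ps(out, v);        /* unaligned store keeps the sketch simple */
}
#else
/* Fallback for targets without SSE so the sketch still builds. */
static void splat4(float out[4], float x)
{
    out[0] = out[1] = out[2] = out[3] = x;
}
#endif
```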

> +cvtsi2ss  m3, maxvald
> +SPLATD    m3

%if UNIX64
cvtsi2ss  m3, maxvald
%else
cvtsi2ss  m3, dword maxvalm
%endif
shufps    m3, m3, 0

After you made the function 5, 5, 6.

> +shl   is_signedd, 31
> +movd  m4, is_signedd
> +SPLATD    m4

Use shufps here since the instruction using this register in the loop
should be a float one.

> +shl   sizeq,   2

Even if it doesn't come from stack on win64, better use "sized" anyway.

> +add   inq, sizeq
> +add   outq,sizeq
> +add   scaledq, sizeq
> +neg   sizeq
> +.loop:
> +mova  m2, [scaledq+sizeq]
> +mulps m2, m0

mulps m2, m0, [scaledq+sizeq]

> +addps m2, m1

You could combine the mulps and addps into a single fmaddps if you add
an FMA3 version. Something like

movaps  m2, [scaledq+sizeq]
fmaddps m2, m2, m0, m1

The movaps is needed because, unlike FMA4's fmaddps which is
non-destructive, FMA3 needs dst to be the same as one of the three src
registers.

> +minps m2, m3
> +mova  m5, [inq+sizeq]

movaps m5, [inq+sizeq]

> +pand  m5, m4

andps m5, m4

> +orps  m2, m5
> +cvttps2dq m2, m2
> +mova  [outq+sizeq], m2
> +add   sizeq, mmsize
> +jl   .loop
> +RET
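
To make the quoted loop easier to follow, here is a scalar C model of what each iteration computes. The parameter names follow the quoted prototype; the function name `quantize_bands_model` and the body are an illustration derived from the assembly above, not the project's C implementation.

```c
#include <math.h>

/* Scalar model of the quoted SSE2 loop, one element per iteration. */
static void quantize_bands_model(int *out, const float *in, const float *scaled,
                                 int size, int is_signed, int maxval,
                                 const float Q34, const float rounding)
{
    int i;
    for (i = 0; i < size; i++) {
        /* mulps + addps: scale, then add the rounding offset */
        float q = scaled[i] * Q34 + rounding;
        /* minps: clamp to maxval */
        if (q > (float)maxval)
            q = (float)maxval;
        /* and + orps: force the sign bit when is_signed is set and
         * in[i] has its sign bit set */
        if (is_signed && signbit(in[i]))
            q = -fabsf(q);
        /* cvttps2dq: convert to int, truncating toward zero */
        out[i] = (int)q;
    }
}
```

For example, with `Q34 = 2.0f`, `rounding = 0.4054f` and `maxval = 7`, a scaled value of 10.0 clamps to 7, and becomes -7 when `is_signed` is set and the matching input is negative.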

Can't you make these function also process eight floats p

Re: [FFmpeg-devel] [PATCH v3] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-17 Thread Michael Niedermayer

On Mon, Oct 17, 2016 at 10:24:48PM +0100, Rostislav Pehlivanov wrote:
> Should fix segfaults on x86-32
> 
> Performance improvements:
> 
> quant_bands:
> with: 681 decicycles in quant_bands, 8388453 runs,155 skips
> without: 1190 decicycles in quant_bands, 8388386 runs,222 skips
> Around 42% for the function
> 
> Twoloop coder:
> 
> abs_pow34:
> with/without: 7.82s/8.17s
> Around 4% for the entire encoder
> 
> Both:
> with/without: 7.15s/8.17s
> Around 12% for the entire encoder
> 
> Fast coder:
> 
> abs_pow34:
> with/without: 3.40s/3.77s
> Around 10% for the entire encoder
> 
> Both:
> with/without: 3.02s/3.77s
> Around 20% faster for the entire encoder
> 
> Signed-off-by: Rostislav Pehlivanov 
> ---
>  libavcodec/aaccoder.c| 27 +++--
>  libavcodec/aaccoder_trellis.h|  2 +-
>  libavcodec/aaccoder_twoloop.h|  2 +-
>  libavcodec/aacenc.c  |  4 ++
>  libavcodec/aacenc.h  |  6 +++
>  libavcodec/aacenc_is.c   |  6 +--
>  libavcodec/aacenc_ltp.c  |  4 +-
>  libavcodec/aacenc_pred.c |  6 +--
>  libavcodec/aacenc_quantization.h |  4 +-
>  libavcodec/aacenc_utils.h|  4 +-
>  libavcodec/x86/Makefile  |  2 +
>  libavcodec/x86/aacencdsp.asm | 87
>  libavcodec/x86/aacencdsp_init.c  | 43 
>  13 files changed, 170 insertions(+), 27 deletions(-)
>  create mode 100644 libavcodec/x86/aacencdsp.asm
>  create mode 100644 libavcodec/x86/aacencdsp_init.c

fate passes on linux32/64 x86, mingw32/64 x86

build fails on arm:

libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to `ff_aac_dsp_init_x86'
collect2: ld returned 1 exit status
make: *** [ffserver_g] Error 1
make: *** Waiting for unfinished jobs
libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to `ff_aac_dsp_init_x86'
collect2: ld returned 1 exit status
make: *** [ffprobe_g] Error 1
libavcodec/libavcodec.a(aacenc.o): In function `aac_encode_init':
ffmpeg/arm/src/libavcodec/aacenc.c:1038: undefined reference to `ff_aac_dsp_init_x86'
collect2: ld returned 1 exit status
make: *** [ffmpeg_g] Error 1
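
These errors are what a linker reports when `aac_encode_init` references `ff_aac_dsp_init_x86` in a build that does not compile the x86 objects. Below is a minimal sketch of one way to keep such a call out of non-x86 builds. `DemoDSPContext`, `abs_v_c`, `demo_dsp_init` and `demo_dsp_init_x86` are hypothetical names, and the fallback `ARCH_X86` define stands in for what a configure step would normally provide.

```c
#ifndef ARCH_X86
#define ARCH_X86 0  /* assumption: normally set by the build configuration */
#endif

/* Hypothetical DSP context holding one function pointer. */
typedef struct DemoDSPContext {
    void (*abs_v)(float *out, const float *in, int size);
} DemoDSPContext;

/* Portable C fallback, always compiled. */
static void abs_v_c(float *out, const float *in, int size)
{
    int i;
    for (i = 0; i < size; i++)
        out[i] = in[i] < 0.0f ? -in[i] : in[i];
}

#if ARCH_X86
/* Declared and referenced only when x86 objects are part of the build. */
void demo_dsp_init_x86(DemoDSPContext *c);
#endif

static void demo_dsp_init(DemoDSPContext *c)
{
    c->abs_v = abs_v_c;    /* C default first */
#if ARCH_X86
    demo_dsp_init_x86(c);  /* may replace pointers with SIMD versions */
#endif
}
```

A later revision quoted in this thread wraps the call in a plain `if (ARCH_X86)` instead; that form relies on the compiler discarding the dead branch, whereas the preprocessor guard in this sketch removes the reference before compilation.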

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin


___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
