date:20190316

Re: [FFmpeg-devel] [PATCH] lavf/mp3dec: increase probe score of buffers entirely composed of valid packets

2019-03-16 Thread Michael Niedermayer

On Sat, Mar 16, 2019 at 05:24:46AM -0500, Rodger Combs wrote:
> Fixes some files misdetecting as MPEG PS
> ---
>  libavformat/mp3dec.c | 4 ++--

If MPEG PS misdetects a file, I would first try to fix the MPEG PS
probe so that it does not misdetect it and instead produces a lower score
for whatever makes this file not MPEG PS.

But I have not seen the file, so this is just the general direction
I would look into ...

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-16 Thread Mike Stoner

Hello,
I re-sent my AVX2 patch for v210 unpacking.  My first attempt didn't get picked 
up by the Patchwork list for some reason.

I installed Linux on a Broadwell laptop to utilize James Darnley's checkasm 
patch for v210 decode.  The results are below.  

AVX2 gets a nice boost from replacing SHUFPS instructions with VPBLENDD, which 
has more flexible port bindings.  VBLENDPS could also be substituted and is 
available from SSE4.1 onward; however, I found that only the AVX2 code received 
any measurable gain from that change.

Any further comments are greatly appreciated.  

Thanks,
Mike


Tested on Broadwell CPU, Ubuntu 18.10 x86_64

~/FFmpeg$ tests/checkasm/checkasm --bench --test=v210dec
benchmarking with native FFmpeg timers
nop: 94.1
checkasm: using random seed 3963743306
SSSE3:
 - v210dec.v210_unpack [OK]
AVX:
 - v210dec.v210_unpack [OK]
AVX2:
 - v210dec.v210_unpack [OK]
checkasm: all 3 tests passed
v210_unpack_c: 1625.2
v210_unpack_ssse3: 604.2
v210_unpack_avx: 592.2
v210_unpack_avx2: 422.2
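
To put the checkasm timings above in relative terms, each one can be divided into a baseline timing. The following is only a small sketch: the helper names `speedup` and `print_v210_speedups` are hypothetical, and the numbers are copied from the benchmark output above.

```c
#include <stdio.h>

/* Ratio of a baseline timing to an optimized timing (higher means faster). */
static double speedup(double baseline, double optimized)
{
    return baseline / optimized;
}

/* Print relative speedups for the v210_unpack timings reported above. */
static void print_v210_speedups(void)
{
    const double c = 1625.2, ssse3 = 604.2, avx = 592.2, avx2 = 422.2;

    printf("ssse3 vs c:   %.2fx\n", speedup(c, ssse3));
    printf("avx   vs c:   %.2fx\n", speedup(c, avx));
    printf("avx2  vs c:   %.2fx\n", speedup(c, avx2));
    printf("avx2  vs avx: %.2fx\n", speedup(avx, avx2));
}
```

On these numbers the AVX2 version comes out at roughly 1.4x the speed of the AVX version.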

[FFmpeg-devel] [PATCH] avcodec/dxtory: Check slice sizes before allocating image

2019-03-16 Thread Michael Niedermayer

Fixes: Timeout (26sec -> 2sec)
Fixes: 13612/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXTORY_fuzzer-5676845977042944

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavcodec/dxtory.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libavcodec/dxtory.c b/libavcodec/dxtory.c
index 285ca38efb..7024b315d1 100644
--- a/libavcodec/dxtory.c
+++ b/libavcodec/dxtory.c
@@ -272,10 +272,11 @@ static int dxtory_decode_v2(AVCodecContext *avctx, 
AVFrame *pic,
 setup_lru_func setup_lru,
 enum AVPixelFormat fmt)
 {
-GetByteContext gb;
+GetByteContext gb, gb_check;
 GetBitContext  gb2;
 int nslices, slice, line = 0;
 uint32_t off, slice_size;
+uint64_t off_check;
 uint8_t lru[3][8];
 int ret;
 
@@ -283,6 +284,13 @@ static int dxtory_decode_v2(AVCodecContext *avctx, AVFrame 
*pic,
 if (ret < 0)
 return ret;
 
+off_check = off;
+gb_check = gb;
+for (slice = 0; slice < nslices; slice++)
+off_check += bytestream2_get_le32(&gb_check);
+if (off_check - avctx->discard_damaged_percentage*off_check/100 > src_size)
+return AVERROR_INVALIDDATA;
+
 avctx->pix_fmt = fmt;
 if ((ret = ff_get_buffer(avctx, pic, 0)) < 0)
 return ret;
-- 
2.21.0
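
The idea of the check in this patch (add up the declared slice sizes and reject the packet before any image is allocated) can be sketched outside the decoder roughly as follows. This is an illustration under assumptions, not the decoder's code: `read_le32` and `slice_sizes_plausible` are hypothetical names, and the size table is assumed to be a plain array of little-endian 32-bit values.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return 1 if the declared slice sizes are plausible for a packet of
 * src_size bytes, 0 otherwise.
 *   table           - nslices little-endian 32-bit sizes (assumed layout)
 *   first_off       - offset of the first slice's data within the packet
 *   damaged_percent - share of missing data to tolerate, 0..100
 */
static int slice_sizes_plausible(const uint8_t *table, int nslices,
                                 uint32_t first_off, size_t src_size,
                                 int damaged_percent)
{
    uint64_t total = first_off;   /* 64-bit so the running sum cannot wrap */

    for (int i = 0; i < nslices; i++)
        total += read_le32(table + 4 * i);

    /* Tolerate up to damaged_percent of the declared bytes being absent. */
    return total - (uint64_t)damaged_percent * total / 100 <= src_size;
}
```

Summing into a 64-bit accumulator mirrors the `uint64_t off_check` in the diff: many 32-bit sizes can be added without overflow before the single comparison against the packet size.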


[FFmpeg-devel] [PATCH 5/5] aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis

2019-03-16 Thread Lynne

153372 UNITS in postfilter_c,   65536 runs,  0 skips
73164 UNITS in postfilter_neon,   65536 runs,  0 skips -> 2.1x speedup

80591 UNITS in deemphasis_c,  131072 runs,  0 skips
43969 UNITS in deemphasis_neon,  131072 runs,  0 skips -> 1.83x speedup

Total decoder speedup: ~15% on a Raspberry Pi 3 (from 28.1x to 33.5x realtime)

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}

Unlike the x86 version, duplication is used instead of pslldq so
the structure and tables are different.
Same approach tested on x86 (3x pslldq -> vbroadcastss + shufps + pslldq)
had the same performance, so 3x pslldq was kept as vbroadcastss has a higher
latency.
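
The unrolling above follows from expanding the one-tap recurrence y[n] = x[n] + c1*y[n-1] four samples at a time. A minimal sketch that puts both forms side by side so they can be compared numerically is shown here; `EMPH_COEFF` is a placeholder value standing in for `CELT_EMPH_COEFF`, and both function names are hypothetical.

```c
#include <assert.h>

/* Placeholder standing in for CELT_EMPH_COEFF (assumption, not FFmpeg's value). */
#define EMPH_COEFF 0.85f

/* Reference recurrence: y[n] = x[n] + c1 * y[n-1], with y[-1] = state. */
static float deemphasis_scalar(float *y, const float *x, float state, int len)
{
    for (int i = 0; i < len; i++)
        state = y[i] = x[i] + EMPH_COEFF * state;
    return state;
}

/* Same filter with the recurrence expanded four samples at a time. */
static float deemphasis_unrolled(float *y, const float *x, float state, int len)
{
    const float c1 = EMPH_COEFF, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;

    assert(len % 4 == 0);  /* this sketch only handles whole groups of four */
    for (int i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];

        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}
```

Within a group of four, each output depends only on the inputs and the carried-in state, so the four results can be computed in parallel. The two versions agree up to floating-point rounding, since the unrolled form reassociates the multiplications.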

>From c90f1a0b04b691670e8a357449a3c3ce09e5ef51 Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Fri, 15 Mar 2019 14:37:31 +
Subject: [PATCH 5/5] aarch64/opusdsp: implement NEON accelerated postfilter
 and deemphasis

153372 UNITS in postfilter_c,   65536 runs,  0 skips
73164 UNITS in postfilter_neon,   65536 runs,  0 skips -> 2.1x speedup

80591 UNITS in deemphasis_c,  131072 runs,  0 skips
43969 UNITS in deemphasis_neon,  131072 runs,  0 skips -> 1.83x speedup

Total decoder speedup: ~15% on a Raspberry Pi 3 (from 28.1x to 33.5x realtime)

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}

Unlike the x86 version, duplication is used instead of pslldq so
the structure and tables are different.
Same approach tested on x86 (3x pslldq -> vbroadcastss + shufps + pslldq)
had the same performance, so 3x pslldq was kept as vbroadcastss has a higher
latency.
---
 libavcodec/aarch64/Makefile   |   2 +
 libavcodec/aarch64/opusdsp_init.c |  35 +
 libavcodec/aarch64/opusdsp_neon.S | 113 ++
 libavcodec/opusdsp.c  |   3 +
 libavcodec/opusdsp.h  |   1 +
 5 files changed, 154 insertions(+)
 create mode 100644 libavcodec/aarch64/opusdsp_init.c
 create mode 100644 libavcodec/aarch64/opusdsp_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 8bc8bc528c..00f93bf59f 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -15,6 +15,7 @@ OBJS-$(CONFIG_VP8DSP)   += aarch64/vp8dsp_init_aarch64.o
 OBJS-$(CONFIG_AAC_DECODER)  += aarch64/aacpsdsp_init_aarch64.o \
aarch64/sbrdsp_init_aarch64.o
 OBJS-$(CONFIG_DCA_DECODER)  += aarch64/synth_filter_init.o
+OBJS-$(CONFIG_OPUS_DECODER) += aarch64/opusdsp_init.o
 OBJS-$(CONFIG_RV40_DECODER) += aarch64/rv40dsp_init_aarch64.o
 OBJS-$(CONFIG_VC1DSP)   += aarch64/vc1dsp_init_aarch64.o
 OBJS-$(CONFIG_VORBIS_DECODER)   += aarch64/vorbisdsp_init.o
@@ -49,6 +50,7 @@ NEON-OBJS-$(CONFIG_VP8DSP)  += aarch64/vp8dsp_neon.o
 # decoders/encoders
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_neon.o
 NEON-OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_neon.o
+NEON-OBJS-$(CONFIG_OPUS_DECODER)+= aarch64/opusdsp_neon.o
 NEON-OBJS-$(CONFIG_VORBIS_DECODER)  += aarch64/vorbisdsp_neon.o
 NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o   \
aarch64/vp9itxfm_neon.o \
diff --git a/libavcodec/aarch64/opusdsp_init.c b/libavcodec/aarch64/opusdsp_init.c
new file mode 100644
index 0000000000..cc6a1b672d
--- /dev/null
+++ b/libavcodec/aarch64/opusdsp_init.c
@@ -0,0 +1,35 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#include "libavutil/aarch64/cpu.h"

[FFmpeg-devel] [PATCH 4/5] x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis

2019-03-16 Thread Lynne

58893 decicycles in deemphasis_c,  130548 runs,    524 skips
9475 decicycles in deemphasis_fma3,  130686 runs,    386 skips -> 6.21x speedup

24866 decicycles in postfilter_c,   65386 runs,    150 skips
5268 decicycles in postfilter_fma3,   65505 runs, 31 skips -> 4.72x speedup

Total decoder speedup: ~14%

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}

>From b4be0e7019f16ec567f39da50d1ea35ce5ddf45a Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Fri, 15 Mar 2019 14:43:04 +
Subject: [PATCH 4/5] x86/opusdsp: implement FMA3 accelerated postfilter and
 deemphasis

58893 decicycles in deemphasis_c,  130548 runs,    524 skips
9475 decicycles in deemphasis_fma3,  130686 runs,    386 skips -> 6.21x speedup

24866 decicycles in postfilter_c,   65386 runs,    150 skips
5268 decicycles in postfilter_fma3,   65505 runs, 31 skips -> 4.72x speedup

Total decoder speedup: ~14%

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}
---
 libavcodec/opusdsp.c  |   3 +
 libavcodec/opusdsp.h  |   2 +
 libavcodec/x86/Makefile   |   2 +
 libavcodec/x86/opusdsp.asm| 114 ++
 libavcodec/x86/opusdsp_init.c |  35 +++
 5 files changed, 156 insertions(+)
 create mode 100644 libavcodec/x86/opusdsp.asm
 create mode 100644 libavcodec/x86/opusdsp_init.c

diff --git a/libavcodec/opusdsp.c b/libavcodec/opusdsp.c
index 615e7d6816..17e819f977 100644
--- a/libavcodec/opusdsp.c
+++ b/libavcodec/opusdsp.c
@@ -58,4 +58,7 @@ av_cold void ff_opus_dsp_init(OpusDSP *ctx)
 {
 ctx->postfilter = postfilter_c;
 ctx->deemphasis = deemphasis_c;
+
+if (ARCH_X86)
+ff_opus_dsp_init_x86(ctx);
 }
diff --git a/libavcodec/opusdsp.h b/libavcodec/opusdsp.h
index 74adfe6859..e8b8cf40a9 100644
--- a/libavcodec/opusdsp.h
+++ b/libavcodec/opusdsp.h
@@ -30,4 +30,6 @@ typedef struct OpusDSP {
 
 void ff_opus_dsp_init(OpusDSP *ctx);
 
+void ff_opus_dsp_init_x86(OpusDSP *ctx);
+
 #endif /* AVCODEC_OPUS_DSP_H */
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 2697431781..f63f7cfed3 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -53,6 +53,7 @@ OBJS-$(CONFIG_CAVS_DECODER)+= x86/cavsdsp.o
 OBJS-$(CONFIG_DCA_DECODER) += x86/dcadsp_init.o x86/synth_filter_init.o
 OBJS-$(CONFIG_DNXHD_ENCODER)   += x86/dnxhdenc_init.o
 OBJS-$(CONFIG_EXR_DECODER) += x86/exrdsp_init.o
+OBJS-$(CONFIG_OPUS_DECODER)+= x86/opusdsp_init.o
 OBJS-$(CONFIG_OPUS_ENCODER)+= x86/celt_pvq_init.o
 OBJS-$(CONFIG_HEVC_DECODER)+= x86/hevcdsp_init.o
 OBJS-$(CONFIG_JPEG2000_DECODER)+= x86/jpeg2000dsp_init.o
@@ -126,6 +127,7 @@ X86ASM-OBJS-$(CONFIG_MDCT15)   += x86/mdct15.o
 X86ASM-OBJS-$(CONFIG_ME_CMP)   += x86/me_cmp.o
 X86ASM-OBJS-$(CONFIG_MPEGAUDIODSP) += x86/imdct36.o
 X86ASM-OBJS-$(CONFIG_MPEGVIDEOENC) += x86/mpegvideoencdsp.o
+X86ASM-OBJS-$(CONFIG_OPUS_DECODER) += x86/opusdsp.o
 X86ASM-OBJS-$(CONFIG_OPUS_ENCODER) += x86/celt_pvq_search.o
 X86ASM-OBJS-$(CONFIG_PIXBLOCKDSP)  += x86/pixblockdsp.o
 X86ASM-OBJS-$(CONFIG_QPELDSP)  += x86/qpeldsp.o \
diff --git a/libavcodec/x86/opusdsp.asm b/libavcodec/x86/opusdsp.asm
new file mode 100644
index 0000000000..ed65614e06
--- /dev/null
+++ b/libavcodec/x86/opusdsp.asm
@@ -0,0 +1,114 @@
+;**
+;* Opus SIMD functions
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

[FFmpeg-devel] [PATCH 3/5] opusdsp: create and move deemphasis and postfiltering from opus_celt

2019-03-16 Thread Lynne

Nothing else changed.
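
For orientation, the shape of the move can be sketched as a small context of function pointers plus a plain C postfilter taking the period and gains as arguments (the signature used at the new call site in the diff below). The names `SketchDSP` and `sketch_dsp_init` are hypothetical stand-ins, and the filter body restates the loop removed from opus_celt.c; this is not the contents of the new opusdsp.c.

```c
/* Hypothetical miniature of a DSP context: one function pointer per routine. */
typedef struct SketchDSP {
    void (*postfilter)(float *data, int period, const float *gains, int len);
} SketchDSP;

/* 5-tap symmetric comb filter centred one pitch period behind each sample. */
static void postfilter_c(float *data, int period, const float *gains, int len)
{
    const float g0 = gains[0], g1 = gains[1], g2 = gains[2];
    float x4 = data[-period - 2];
    float x3 = data[-period - 1];
    float x2 = data[-period];
    float x1 = data[-period + 1];

    for (int i = 0; i < len; i++) {
        const float x0 = data[i - period + 2];

        data[i] += g0 * x2 + g1 * (x1 + x3) + g2 * (x0 + x4);
        /* slide the five-sample window forward by one */
        x4 = x3;
        x3 = x2;
        x2 = x1;
        x1 = x0;
    }
}

/* Install the C routine; an arch-specific init could overwrite the pointer. */
static void sketch_dsp_init(SketchDSP *ctx)
{
    ctx->postfilter = postfilter_c;
}
```

Routing the call through a pointer is what lets later patches in the series swap in SIMD versions without touching the caller.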

>From 5bf6dc93339a231bd82f09cc6a2b9becb5af0f4a Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Fri, 15 Mar 2019 14:35:03 +
Subject: [PATCH 3/5] opusdsp: create and move deemphasis and postfiltering
 from opus_celt

---
 libavcodec/Makefile|  3 ++-
 libavcodec/opus_celt.c | 53 
 libavcodec/opus_celt.h |  3 ++-
 libavcodec/opusdsp.c   | 61 ++
 libavcodec/opusdsp.h   | 33 +++
 5 files changed, 109 insertions(+), 44 deletions(-)
 create mode 100644 libavcodec/opusdsp.c
 create mode 100644 libavcodec/opusdsp.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 15c43a8a6a..d13eef8a20 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -495,7 +495,8 @@ OBJS-$(CONFIG_NELLYMOSER_ENCODER)  += nellymoserenc.o nellymoser.o
 OBJS-$(CONFIG_NUV_DECODER) += nuv.o rtjpeg.o
 OBJS-$(CONFIG_ON2AVC_DECODER)  += on2avc.o on2avcdata.o
 OBJS-$(CONFIG_OPUS_DECODER)+= opusdec.o opus.o opus_celt.o opus_rc.o \
-  opus_pvq.o opus_silk.o opustab.o vorbis_data.o
+  opus_pvq.o opus_silk.o opustab.o vorbis_data.o \
+  opusdsp.o
 OBJS-$(CONFIG_OPUS_ENCODER)+= opusenc.o opus.o opus_rc.o opustab.o opus_pvq.o \
   opusenc_psy.o
 OBJS-$(CONFIG_PAF_AUDIO_DECODER)   += pafaudio.o
diff --git a/libavcodec/opus_celt.c b/libavcodec/opus_celt.c
index 115dd8c63e..4655172b09 100644
--- a/libavcodec/opus_celt.c
+++ b/libavcodec/opus_celt.c
@@ -202,40 +202,10 @@ static void celt_postfilter_apply_transition(CeltBlock *block, float *data)
 }
 }
 
-static void celt_postfilter_apply(CeltBlock *block, float *data, int len)
-{
-const int T = block->pf_period;
-float g0, g1, g2;
-float x0, x1, x2, x3, x4;
-int i;
-
-if (block->pf_gains[0] == 0.0 || len <= 0)
-return;
-
-g0 = block->pf_gains[0];
-g1 = block->pf_gains[1];
-g2 = block->pf_gains[2];
-
-x4 = data[-T - 2];
-x3 = data[-T - 1];
-x2 = data[-T];
-x1 = data[-T + 1];
-
-for (i = 0; i < len; i++) {
-x0 = data[i - T + 2];
-data[i] += g0 * x2+
-   g1 * (x1 + x3) +
-   g2 * (x0 + x4);
-x4 = x3;
-x3 = x2;
-x2 = x1;
-x1 = x0;
-}
-}
-
 static void celt_postfilter(CeltFrame *f, CeltBlock *block)
 {
 int len = f->blocksize * f->blocks;
+const int filter_len = len - 2 * CELT_OVERLAP;
 
 celt_postfilter_apply_transition(block, block->buf + 1024);
 
@@ -247,8 +217,11 @@ static void celt_postfilter(CeltFrame *f, CeltBlock *block)
 
 if (len > CELT_OVERLAP) {
 celt_postfilter_apply_transition(block, block->buf + 1024 + CELT_OVERLAP);
-celt_postfilter_apply(block, block->buf + 1024 + 2 * CELT_OVERLAP,
-  len - 2 * CELT_OVERLAP);
+
+if (block->pf_gains[0] > FLT_EPSILON && filter_len > 0)
+f->opusdsp.postfilter(block->buf + 1024 + 2 * CELT_OVERLAP,
+  block->pf_period, block->pf_gains,
+  filter_len);
 
 block->pf_period_old = block->pf_period;
 memcpy(block->pf_gains_old, block->pf_gains, sizeof(block->pf_gains));
@@ -462,7 +435,6 @@ int ff_celt_decode_frame(CeltFrame *f, OpusRangeCoder *rc,
 /* transform and output for each output channel */
 for (i = 0; i < f->output_channels; i++) {
 CeltBlock *block = &f->block[i];
-float m = block->emph_coeff;
 
 /* iMDCT and overlap-add */
 for (j = 0; j < f->blocks; j++) {
@@ -480,14 +452,10 @@ int ff_celt_decode_frame(CeltFrame *f, OpusRangeCoder *rc,
 /* postfilter */
 celt_postfilter(f, block);
 
-/* deemphasis and output scaling */
-for (j = 0; j < frame_size; j++) {
-const float tmp = block->buf[1024 - frame_size + j] + m;
-m = tmp * CELT_EMPH_COEFF;
-output[i][j] = tmp;
-}
-
-block->emph_coeff = m;
+/* deemphasis */
+block->emph_coeff = f->opusdsp.deemphasis(output[i],
+  &block->buf[1024 - frame_size],
+  block->emph_coeff, frame_size);
 }
 
 if (channels == 1)
@@ -596,6 +564,7 @@ int ff_celt_init(AVCodecContext *avctx, CeltFrame **f, int output_channels,
 goto fail;
 }
 
+ff_opus_dsp_init(&frm->opusdsp);
 ff_celt_flush(frm);
 
 *f = frm;
diff --git a/libavcodec/opus_celt.h b/libavcodec/opus_celt.h
index 9289a1867a..7c1c5316b9 100644
--- a/libavcodec/opus_celt.h
+++ b/libavcodec/opus_celt.h
@@ -28,6 +28,7 @@
 
 #include "opus.h"
 #include "opus_pvq.h"
+#include "opusdsp.h"
 
 #include "mdct15.h"
 #include "libavutil/float_dsp.h"
@@ -40,7 +41,6 @@
 #define

Re: [FFmpeg-devel] [PATCH 1/5] x86/opus_dsp: rename to celt_pvq

2019-03-16 Thread Lynne

16 Mar 2019, 16:26 by d...@lynne.ee:

> Its only used in the encoder and in CELT's PVQ.
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org 
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel 
> 
>

Sorry, forgot to attach patch.
>From bac8095e35e1ab0095edb6e63fa804127f3d417a Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Fri, 15 Mar 2019 14:29:44 +
Subject: [PATCH 1/5] x86/opus_dsp: rename to celt_pvq

It's only used in the encoder and in CELT's PVQ.
---
 libavcodec/opus_pvq.c   | 2 +-
 libavcodec/opus_pvq.h   | 2 +-
 libavcodec/x86/Makefile | 6 +++---
 libavcodec/x86/{opus_dsp_init.c => celt_pvq_init.c} | 2 +-
 libavcodec/x86/{opus_pvq_search.asm => celt_pvq_search.asm} | 0
 5 files changed, 6 insertions(+), 6 deletions(-)
 rename libavcodec/x86/{opus_dsp_init.c => celt_pvq_init.c} (97%)
 rename libavcodec/x86/{opus_pvq_search.asm => celt_pvq_search.asm} (100%)

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 0dbf14184d..c3119a03f1 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -904,7 +904,7 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq, int encode)
 s->quant_band = encode ? pvq_encode_band : pvq_decode_band;
 
 if (ARCH_X86)
-ff_opus_dsp_init_x86(s);
+ff_celt_pvq_init_x86(s);
 
 *pvq = s;
 
diff --git a/libavcodec/opus_pvq.h b/libavcodec/opus_pvq.h
index e2f01a01b5..52f9a4e6d4 100644
--- a/libavcodec/opus_pvq.h
+++ b/libavcodec/opus_pvq.h
@@ -40,7 +40,7 @@ struct CeltPVQ {
 QUANT_FN(*quant_band);
 };
 
-void ff_opus_dsp_init_x86(struct CeltPVQ *s);
+void ff_celt_pvq_init_x86(struct CeltPVQ *s);
 
 int  ff_celt_pvq_init(struct CeltPVQ **pvq, int encode);
 void ff_celt_pvq_uninit(struct CeltPVQ **pvq);
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 2350c8bbee..3bfba94ec2 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -53,8 +53,8 @@ OBJS-$(CONFIG_CAVS_DECODER)+= x86/cavsdsp.o
 OBJS-$(CONFIG_DCA_DECODER) += x86/dcadsp_init.o x86/synth_filter_init.o
 OBJS-$(CONFIG_DNXHD_ENCODER)   += x86/dnxhdenc_init.o
 OBJS-$(CONFIG_EXR_DECODER) += x86/exrdsp_init.o
-OBJS-$(CONFIG_OPUS_DECODER)+= x86/opus_dsp_init.o
-OBJS-$(CONFIG_OPUS_ENCODER)+= x86/opus_dsp_init.o
+OBJS-$(CONFIG_OPUS_DECODER)+= x86/celt_pvq_init.o
+OBJS-$(CONFIG_OPUS_ENCODER)+= x86/celt_pvq_init.o
 OBJS-$(CONFIG_HEVC_DECODER)+= x86/hevcdsp_init.o
 OBJS-$(CONFIG_JPEG2000_DECODER)+= x86/jpeg2000dsp_init.o
 OBJS-$(CONFIG_MLP_DECODER) += x86/mlpdsp_init.o
@@ -127,7 +127,7 @@ X86ASM-OBJS-$(CONFIG_MDCT15)   += x86/mdct15.o
 X86ASM-OBJS-$(CONFIG_ME_CMP)   += x86/me_cmp.o
 X86ASM-OBJS-$(CONFIG_MPEGAUDIODSP) += x86/imdct36.o
 X86ASM-OBJS-$(CONFIG_MPEGVIDEOENC) += x86/mpegvideoencdsp.o
-X86ASM-OBJS-$(CONFIG_OPUS_ENCODER) += x86/opus_pvq_search.o
+X86ASM-OBJS-$(CONFIG_OPUS_ENCODER) += x86/celt_pvq_search.o
 X86ASM-OBJS-$(CONFIG_PIXBLOCKDSP)  += x86/pixblockdsp.o
 X86ASM-OBJS-$(CONFIG_QPELDSP)  += x86/qpeldsp.o \
   x86/fpel.o\
diff --git a/libavcodec/x86/opus_dsp_init.c b/libavcodec/x86/celt_pvq_init.c
similarity index 97%
rename from libavcodec/x86/opus_dsp_init.c
rename to libavcodec/x86/celt_pvq_init.c
index a9f8a96159..3890a9cb9f 100644
--- a/libavcodec/x86/opus_dsp_init.c
+++ b/libavcodec/x86/celt_pvq_init.c
@@ -28,7 +28,7 @@ extern float ff_pvq_search_approx_sse2(float *X, int *y, int K, int N);
 extern float ff_pvq_search_approx_sse4(float *X, int *y, int K, int N);
 extern float ff_pvq_search_exact_avx  (float *X, int *y, int K, int N);
 
-av_cold void ff_opus_dsp_init_x86(CeltPVQ *s)
+av_cold void ff_celt_pvq_init_x86(CeltPVQ *s)
 {
 int cpu_flags = av_get_cpu_flags();
 
diff --git a/libavcodec/x86/opus_pvq_search.asm b/libavcodec/x86/celt_pvq_search.asm
similarity index 100%
rename from libavcodec/x86/opus_pvq_search.asm
rename to libavcodec/x86/celt_pvq_search.asm
-- 
2.21.0


[FFmpeg-devel] [PATCH 1/5] x86/opus_dsp: rename to celt_pvq

2019-03-16 Thread Lynne

It's only used in the encoder and in CELT's PVQ.


[FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-16 Thread Michael Stoner

Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck
AVX2 is 1.4x faster than AVX
---
 libavcodec/v210dec.c   | 10 +-
 libavcodec/x86/v210-init.c |  8 +
 libavcodec/x86/v210.asm| 72 +-
 3 files changed, 73 insertions(+), 17 deletions(-)

diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index ddc5dbe8be..26954c0df3 100644
--- a/libavcodec/v210dec.c
+++ b/libavcodec/v210dec.c
@@ -119,7 +119,7 @@ static int decode_frame(AVCodecContext *avctx, void *data, 
int *got_frame,
 const uint32_t *src = (const uint32_t*)psrc;
 uint32_t val;
 
-w = (avctx->width / 6) * 6;
+w = (avctx->width / 12) * 12;
 s->unpack_frame(src, y, u, v, w);
 
 y += w;
@@ -127,6 +127,14 @@ static int decode_frame(AVCodecContext *avctx, void *data, 
int *got_frame,
 v += w >> 1;
 src += (w << 1) / 3;
 
+if (w < avctx->width - 5) {
+   READ_PIXELS(u, y, v);
+   READ_PIXELS(y, u, y);
+   READ_PIXELS(v, y, u);
+   READ_PIXELS(y, v, y);
+w += 6;
+}
+
 if (w < avctx->width - 1) {
 READ_PIXELS(u, y, v);
 
diff --git a/libavcodec/x86/v210-init.c b/libavcodec/x86/v210-init.c
index d64dbca1a8..cb9a6cbd6a 100644
--- a/libavcodec/x86/v210-init.c
+++ b/libavcodec/x86/v210-init.c
@@ -21,9 +21,11 @@
 
 extern void ff_v210_planar_unpack_unaligned_ssse3(const uint32_t *src, 
uint16_t *y, uint16_t *u, uint16_t *v, int width);
 extern void ff_v210_planar_unpack_unaligned_avx(const uint32_t *src, uint16_t 
*y, uint16_t *u, uint16_t *v, int width);
+extern void ff_v210_planar_unpack_unaligned_avx2(const uint32_t *src, uint16_t 
*y, uint16_t *u, uint16_t *v, int width);
 
 extern void ff_v210_planar_unpack_aligned_ssse3(const uint32_t *src, uint16_t 
*y, uint16_t *u, uint16_t *v, int width);
 extern void ff_v210_planar_unpack_aligned_avx(const uint32_t *src, uint16_t 
*y, uint16_t *u, uint16_t *v, int width);
+extern void ff_v210_planar_unpack_aligned_avx2(const uint32_t *src, uint16_t 
*y, uint16_t *u, uint16_t *v, int width);
 
 av_cold void ff_v210_x86_init(V210DecContext *s)
 {
@@ -36,6 +38,9 @@ av_cold void ff_v210_x86_init(V210DecContext *s)
 
 if (HAVE_AVX_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX)
 s->unpack_frame = ff_v210_planar_unpack_aligned_avx;
+
+if (HAVE_AVX2_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX2)
+s->unpack_frame = ff_v210_planar_unpack_aligned_avx2;
 }
 else {
 if (cpu_flags & AV_CPU_FLAG_SSSE3)
@@ -43,6 +48,9 @@ av_cold void ff_v210_x86_init(V210DecContext *s)
 
 if (HAVE_AVX_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX)
 s->unpack_frame = ff_v210_planar_unpack_unaligned_avx;
+
+if (HAVE_AVX2_EXTERNAL && cpu_flags & AV_CPU_FLAG_AVX2)
+s->unpack_frame = ff_v210_planar_unpack_unaligned_avx2;
 }
 #endif
 }
diff --git a/libavcodec/x86/v210.asm b/libavcodec/x86/v210.asm
index c24c765e5b..706712313d 100644
--- a/libavcodec/x86/v210.asm
+++ b/libavcodec/x86/v210.asm
@@ -22,9 +22,14 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
+SECTION_RODATA 32
+
+; for AVX2 version only
+v210_luma_permute: dd 0,1,2,4,5,6,7,7  ; 32-byte alignment required
+v210_chroma_shuf2: db 0,1,2,3,4,5,8,9,10,11,12,13,-1,-1,-1,-1
+v210_luma_shuf_avx2: db 0,1,4,5,6,7,8,9,12,13,14,15,-1,-1,-1,-1
+v210_chroma_shuf_avx2: db 0,1,4,5,10,11,-1,-1,2,3,8,9,12,13,-1,-1
 
-v210_mask: times 4 dd 0x3ff
 v210_mult: dw 64,4,64,4,64,4,64,4
 v210_luma_shuf: db 8,9,0,1,2,3,12,13,4,5,6,7,-1,-1,-1,-1
 v210_chroma_shuf: db 0,1,8,9,6,7,-1,-1,2,3,4,5,12,13,-1,-1
@@ -34,40 +39,65 @@ SECTION .text
 %macro v210_planar_unpack 1
 
 ; v210_planar_unpack(const uint32_t *src, uint16_t *y, uint16_t *u, uint16_t 
*v, int width)
-cglobal v210_planar_unpack_%1, 5, 5, 7
+cglobal v210_planar_unpack_%1, 5, 5, 8
 movsxdifnidn r4, r4d
 lear1, [r1+2*r4]
 addr2, r4
 addr3, r4
 negr4
 
-mova   m3, [v210_mult]
-mova   m4, [v210_mask]
-mova   m5, [v210_luma_shuf]
-mova   m6, [v210_chroma_shuf]
+VBROADCASTI128   m3, [v210_mult]
+VBROADCASTI128   m5, [v210_chroma_shuf]
+
+%if cpuflag(avx2)
+VBROADCASTI128   m4, [v210_luma_shuf_avx2]
+VBROADCASTI128   m5, [v210_chroma_shuf_avx2]
+mova m6, [v210_luma_permute]
+VBROADCASTI128   m7, [v210_chroma_shuf2]
+%else
+VBROADCASTI128   m4, [v210_luma_shuf]
+VBROADCASTI128   m5, [v210_chroma_shuf]
+%endif
+
 .loop:
 %ifidn %1, unaligned
-movu   m0, [r0]
+movu   m0, [r0]; yB v5 yA  u5 y9 v4  y8 u4 y7  v3 y6 u3  y5 v2 y4  u2 
y3 v1  y2 u1 y1  v0 y0 u0
 %else
 mova   m0, [r0]
 %endif
 
 pmullw m1, m0, m3
-psrld  m0, 10
-psrlw  m1, 6  ; u0 v0 y1 y2 v1 u2 y4 y5
-pand   m0, m4 ; y0 __ u1 __ y3 __ v2 __
+pslld  m0, 12
+psrlw  m1, 6   ; yB yA u5 v4 y8 y7 v3 u3 y5 y4 u2 v1 
y2 y1

[FFmpeg-devel] avcodec/proresenc (aw and ks) : use correct bitstream version in frame header

2019-03-16 Thread Martin Vignali

Hello

The attached patches change the bitstream version written in the ProRes
frame header (based on RDD 36):
0 for 422
1 for 444, with or without alpha

001: fix for prores_aw (updates the 444 FATE test)
002: fix for prores_ks (no FATE update, because 444 is not tested for the
ks encoder)

FATE passes for me (x86_64 macOS).

Martin


0002-avcodec-proresenc_ks-use-correct-bitstream-version-i.patch
Description: Binary data


0001-avcodec-proresenc_aw-use-correct-bitstream-version-i.patch
Description: Binary data

Re: [FFmpeg-devel] [PATCH] lavf/mp3dec: increase probe score of buffers entirely composed of valid packets

2019-03-16 Thread Carl Eugen Hoyos

On Sat, 16 Mar 2019 at 11:25, Rodger Combs wrote:
>
> Fixes some files misdetecting as MPEG PS

Please provide such a sample.

Thank you, Carl Eugen

[FFmpeg-devel] [PATCH] lavf/mp3dec: increase probe score of buffers entirely composed of valid packets

2019-03-16 Thread Rodger Combs

Fixes some files misdetecting as MPEG PS
---
 libavformat/mp3dec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavformat/mp3dec.c b/libavformat/mp3dec.c
index ef884934e1..81da0c6090 100644
--- a/libavformat/mp3dec.c
+++ b/libavformat/mp3dec.c
@@ -100,13 +100,13 @@ static int mp3_read_probe(AVProbeData *p)
 max_framesizes = FFMAX(max_framesizes, framesizes);
 if(buf == buf0) {
 first_frames= frames;
-if (buf2 == end + sizeof(uint32_t))
+if (buf2 >= end + sizeof(uint32_t))
 whole_used = 1;
 }
 }
 // keep this in sync with ac3 probe, both need to avoid
 // issues with MPEG-files!
-if   (first_frames>=7) return AVPROBE_SCORE_EXTENSION + 1;
+if   (first_frames>=7) return AVPROBE_SCORE_EXTENSION + 1 + whole_used * 
FFMIN(first_frames / 2, 5);
 else if(max_frames>200 && p->buf_size < 2*max_framesizes)return 
AVPROBE_SCORE_EXTENSION;
 else if(max_frames>=4 && p->buf_size < 2*max_framesizes) return 
AVPROBE_SCORE_EXTENSION / 2;
 else if(ff_id3v2_match(buf0, ID3v2_DEFAULT_MAGIC) && 
2*ff_id3v2_tag_len(buf0) >= p->buf_size)
-- 
2.20.1
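
The effect of the changed return expression can be worked through with a small stand-alone function. This is a sketch only: `mp3_score_after_patch` is a hypothetical name, `SCORE_EXTENSION` mirrors the value of `AVPROBE_SCORE_EXTENSION` (50), and only the `first_frames >= 7` branch is modelled.

```c
/* Mirrors the value of AVPROBE_SCORE_EXTENSION in libavformat/avformat.h. */
#define SCORE_EXTENSION 50

/* Score returned by the first_frames >= 7 branch after the patch. */
static int mp3_score_after_patch(int first_frames, int whole_used)
{
    int bonus = first_frames / 2;

    if (bonus > 5)
        bonus = 5;                       /* FFMIN(first_frames / 2, 5) */
    return SCORE_EXTENSION + 1 + (whole_used ? bonus : 0);
}
```

With the whole buffer consumed by valid frames, the score rises from 51 to between 54 (7 frames) and 56 (10 or more frames); otherwise it stays at 51.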


[FFmpeg-devel] [PATCH 2/2] lavf/movdec: use a more appropriate error code for bad trun atoms

2019-03-16 Thread Rodger Combs

---
 libavformat/mov.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 0dfee2e7c4..9ed109962f 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -4785,7 +4785,7 @@ static int mov_read_trun(MOVContext *c, AVIOContext *pb, 
MOVAtom atom)
 av_log(c->fc, AV_LOG_ERROR, "Failed to add index entry\n");
 }
 if (entries <= 0)
-return -1;
+return AVERROR_INVALIDDATA;
 
 requested_size = (st->nb_index_entries + entries) * sizeof(AVIndexEntry);
 new_entries = av_fast_realloc(st->index_entries,
-- 
2.20.1


[FFmpeg-devel] [PATCH 1/2] lavf/movdec: fix demuxing files with 0-entry trun atoms

2019-03-16 Thread Rodger Combs

Regressed in 4a9d32baca3af0d1831f9556a922c7ab5b426b10
---
 libavformat/mov.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index a7d444b0ee..0dfee2e7c4 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -4739,6 +4739,8 @@ static int mov_read_trun(MOVContext *c, AVIOContext *pb, 
MOVAtom atom)
 flags = avio_rb24(pb);
 entries = avio_rb32(pb);
 av_log(c->fc, AV_LOG_TRACE, "flags 0x%x entries %u\n", flags, entries);
+if (entries == 0)
+return 0;
 
 if ((uint64_t)entries+sc->ctts_count >= UINT_MAX/sizeof(*sc->ctts_data))
 return AVERROR_INVALIDDATA;
-- 
2.20.1


[FFmpeg-devel] [PATCH] avcodec/truemotion2: Fix integer overflow in tm2_null_res_block()

2019-03-16 Thread Michael Niedermayer

Fixes: signed integer overflow: 638592 - -2122219136 cannot be represented in type 'int'
Fixes: 13441/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEMOTION2_fuzzer-5732769815068672

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavcodec/truemotion2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/truemotion2.c b/libavcodec/truemotion2.c
index 4d27f0cbfc..18719dae1c 100644
--- a/libavcodec/truemotion2.c
+++ b/libavcodec/truemotion2.c
@@ -600,7 +600,8 @@ static inline void tm2_null_res_block(TM2Context *ctx, 
AVFrame *pic, int bx, int
 {
 int i;
 int ct;
-int left, right, diff;
+unsigned left, right;
+int diff;
 int deltas[16];
 TM2_INIT_POINTERS();
 
-- 
2.21.0
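
The type change relies on a general property of C: arithmetic on unsigned operands wraps modulo UINT_MAX + 1, whereas overflow on signed operands is undefined behaviour. A minimal illustration with a hypothetical helper is given here; it is not taken from the decoder, and the values used with it are made up rather than drawn from the fuzzer report.

```c
#include <limits.h>

/* Subtraction on unsigned operands wraps around instead of being undefined. */
static unsigned wrapping_sub(unsigned a, unsigned b)
{
    return a - b;
}
```

For example, `wrapping_sub(5u, 7u)` yields `UINT_MAX - 1`. Storing such a result back into an `int`, as the patched function does with `diff`, is implementation-defined rather than undefined when the value does not fit.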


Re: [FFmpeg-devel] Bug in YUV decoder

2019-03-16 Thread Reto Kromer

Ben Hutchinson wrote:

>What is top posting? I'll try to avoid it if I know what it is.

Then please "google" it. Thanks!

https://ffmpeg.org/contact.html#MailingLists
https://en.wikipedia.org/wiki/Posting_style#Top-posting
