Re: [FFmpeg-devel] [PATCH v2 2/4] swscale/x86: add sse4 {lum, chr}ConvertRange
Hi, On Tue, Jun 11, 2024 at 8:42 PM James Almer wrote: > > On 6/11/2024 3:26 PM, Michael Niedermayer wrote: > > On Tue, Jun 11, 2024 at 02:28:56PM +0200, Ramiro Polla wrote: > >> chrRangeFromJpeg_8_c: 28.7 > >> chrRangeFromJpeg_8_sse4: 16.2 > >> chrRangeFromJpeg_24_c: 152.7 > >> chrRangeFromJpeg_24_sse4: 29.7 > >> chrRangeFromJpeg_128_c: 366.5 > >> chrRangeFromJpeg_128_sse4: 233.0 > >> chrRangeFromJpeg_144_c: 408.0 > >> chrRangeFromJpeg_144_sse4: 182.5 > >> chrRangeFromJpeg_256_c: 698.7 > >> chrRangeFromJpeg_256_sse4: 325.5 > >> chrRangeFromJpeg_512_c: 1348.7 > >> chrRangeFromJpeg_512_sse4: 660.2 > >> chrRangeToJpeg_8_c: 37.7 > >> chrRangeToJpeg_8_sse4: 16.2 > >> chrRangeToJpeg_24_c: 115.7 > >> chrRangeToJpeg_24_sse4: 36.2 > >> chrRangeToJpeg_128_c: 631.2 > >> chrRangeToJpeg_128_sse4: 163.7 > >> chrRangeToJpeg_144_c: 710.7 > >> chrRangeToJpeg_144_sse4: 183.0 > >> chrRangeToJpeg_256_c: 1253.0 > >> chrRangeToJpeg_256_sse4: 343.5 > >> chrRangeToJpeg_512_c: 2491.2 > >> chrRangeToJpeg_512_sse4: 654.2 > >> lumRangeFromJpeg_8_c: 11.7 > >> lumRangeFromJpeg_8_sse4: 10.5 > >> lumRangeFromJpeg_24_c: 38.5 > >> lumRangeFromJpeg_24_sse4: 19.0 > >> lumRangeFromJpeg_128_c: 237.5 > >> lumRangeFromJpeg_128_sse4: 79.2 > >> lumRangeFromJpeg_144_c: 255.7 > >> lumRangeFromJpeg_144_sse4: 90.5 > >> lumRangeFromJpeg_256_c: 441.5 > >> lumRangeFromJpeg_256_sse4: 161.7 > >> lumRangeFromJpeg_512_c: 879.0 > >> lumRangeFromJpeg_512_sse4: 333.2 > >> lumRangeToJpeg_8_c: 20.0 > >> lumRangeToJpeg_8_sse4: 11.7 > >> lumRangeToJpeg_24_c: 61.5 > >> lumRangeToJpeg_24_sse4: 17.7 > >> lumRangeToJpeg_128_c: 357.5 > >> lumRangeToJpeg_128_sse4: 80.0 > >> lumRangeToJpeg_144_c: 371.5 > >> lumRangeToJpeg_144_sse4: 93.2 > >> lumRangeToJpeg_256_c: 651.5 > >> lumRangeToJpeg_256_sse4: 164.5 > >> lumRangeToJpeg_512_c: 1279.0 > >> lumRangeToJpeg_512_sse4: 333.7 > >> --- > >> libswscale/swscale_internal.h| 1 + > >> libswscale/utils.c | 2 + > >> libswscale/x86/Makefile | 1 + > >> libswscale/x86/range_convert.asm | 130 
+++ > >> libswscale/x86/swscale.c | 36 + > >> 5 files changed, 170 insertions(+) > >> create mode 100644 libswscale/x86/range_convert.asm > > > > breaks x86-32 build > > > > LDffmpeg_g > > /usr/lib/gcc-cross/i686-linux-gnu/7/../../../../i686-linux-gnu/bin/ld: > > libswscale/libswscale.a(utils.o): in function `sws_setColorspaceDetails': > > ffmpeg/linux32/src/libswscale/utils.c:1086: undefined reference to > > `ff_sws_init_range_convert_x86' > > collect2: error: ld returned 1 exit status > > make: *** [Makefile:139: ffmpeg_g] Error 1 > > > > thx > > The functions are wrapped in ARCH_X86_64 checks for seemingly no reason, > so they should be removed in the next iteration. Fixed. James walked me through on IRC to optimize and improve the functions in a way that they work both with sse2 and avx2. New patch attached. From 9e49e72f6766e96cc06bec869fb776fff4c477bf Mon Sep 17 00:00:00 2001 From: Ramiro Polla Date: Thu, 6 Jun 2024 18:33:34 +0200 Subject: [PATCH] swscale/x86: add sse2 and avx2 {lum,chr}ConvertRange chrRangeFromJpeg_8_c: 22.3 chrRangeFromJpeg_8_sse2: 13.3 chrRangeFromJpeg_8_avx2: 13.3 chrRangeFromJpeg_24_c: 72.8 chrRangeFromJpeg_24_sse2: 22.3 chrRangeFromJpeg_24_avx2: 17.5 chrRangeFromJpeg_128_c: 345.5 chrRangeFromJpeg_128_sse2: 106.0 chrRangeFromJpeg_128_avx2: 57.8 chrRangeFromJpeg_144_c: 380.5 chrRangeFromJpeg_144_sse2: 118.5 chrRangeFromJpeg_144_avx2: 62.3 chrRangeFromJpeg_256_c: 646.3 chrRangeFromJpeg_256_sse2: 218.8 chrRangeFromJpeg_256_avx2: 109.0 chrRangeFromJpeg_512_c: 1461.5 chrRangeFromJpeg_512_sse2: 426.5 chrRangeFromJpeg_512_avx2: 211.5 chrRangeToJpeg_8_c: 37.8 chrRangeToJpeg_8_sse2: 10.5 chrRangeToJpeg_8_avx2: 14.0 chrRangeToJpeg_24_c: 114.3 chrRangeToJpeg_24_sse2: 23.5 chrRangeToJpeg_24_avx2: 16.3 chrRangeToJpeg_128_c: 633.5 chrRangeToJpeg_128_sse2: 107.5 chrRangeToJpeg_128_avx2: 55.0 chrRangeToJpeg_144_c: 758.3 chrRangeToJpeg_144_sse2: 132.0 chrRangeToJpeg_144_avx2: 64.5 chrRangeToJpeg_256_c: 1345.0 chrRangeToJpeg_256_sse2: 218.0 
chrRangeToJpeg_256_avx2: 105.3 chrRangeToJpeg_512_c: 2524.0 chrRangeToJpeg_512_sse2: 417.0 chrRangeToJpeg_512_avx2: 218.8 lumRangeFromJpeg_8_c: 11.8 lumRangeFromJpeg_8_sse2: 11.0 lumRangeFromJpeg_8_avx2: 10.3 lumRangeFromJpeg_24_c: 38.5 lumRangeFromJpeg_24_sse2: 15.5 lumRangeFromJpeg_24_avx2: 12.5 lumR
Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange
On Mon, Jun 10, 2024 at 1:56 PM Martin Storsjö wrote: > On Fri, 7 Jun 2024, Ramiro Polla wrote: > > > chrRangeFromJpeg_8_c: 28.5 > > chrRangeFromJpeg_8_neon: 21.2 > > chrRangeFromJpeg_24_c: 81.2 > > chrRangeFromJpeg_24_neon: 34.7 > > chrRangeFromJpeg_128_c: 425.2 > > chrRangeFromJpeg_128_neon: 162.0 > > chrRangeFromJpeg_144_c: 480.2 > > chrRangeFromJpeg_144_neon: 180.2 > > chrRangeFromJpeg_256_c: 838.2 > > chrRangeFromJpeg_256_neon: 318.0 > > chrRangeFromJpeg_512_c: 1698.2 > > chrRangeFromJpeg_512_neon: 630.0 > > chrRangeToJpeg_8_c: 56.0 > > chrRangeToJpeg_8_neon: 23.5 > > chrRangeToJpeg_24_c: 147.7 > > chrRangeToJpeg_24_neon: 38.2 > > chrRangeToJpeg_128_c: 760.2 > > chrRangeToJpeg_128_neon: 182.5 > > chrRangeToJpeg_144_c: 857.7 > > chrRangeToJpeg_144_neon: 204.5 > > chrRangeToJpeg_256_c: 1504.2 > > chrRangeToJpeg_256_neon: 358.5 > > chrRangeToJpeg_512_c: 3025.7 > > chrRangeToJpeg_512_neon: 710.5 > > lumRangeFromJpeg_8_c: 24.0 > > lumRangeFromJpeg_8_neon: 18.2 > > lumRangeFromJpeg_24_c: 64.0 > > lumRangeFromJpeg_24_neon: 22.2 > > lumRangeFromJpeg_128_c: 289.2 > > lumRangeFromJpeg_128_neon: 79.2 > > lumRangeFromJpeg_144_c: 334.7 > > lumRangeFromJpeg_144_neon: 87.7 > > lumRangeFromJpeg_256_c: 579.5 > > lumRangeFromJpeg_256_neon: 152.0 > > lumRangeFromJpeg_512_c: 1208.0 > > lumRangeFromJpeg_512_neon: 299.0 > > lumRangeToJpeg_8_c: 30.0 > > lumRangeToJpeg_8_neon: 19.0 > > lumRangeToJpeg_24_c: 82.2 > > lumRangeToJpeg_24_neon: 24.0 > > lumRangeToJpeg_128_c: 440.7 > > lumRangeToJpeg_128_neon: 90.5 > > lumRangeToJpeg_144_c: 502.0 > > lumRangeToJpeg_144_neon: 102.2 > > lumRangeToJpeg_256_c: 893.7 > > lumRangeToJpeg_256_neon: 178.0 > > lumRangeToJpeg_512_c: 1793.7 > > lumRangeToJpeg_512_neon: 355.0 > > --- > > libswscale/aarch64/Makefile | 1 + > > libswscale/aarch64/range_convert_neon.S | 103 > > libswscale/aarch64/swscale.c| 21 + > > libswscale/swscale_internal.h | 1 + > > libswscale/utils.c | 4 +- > > 5 files changed, 129 insertions(+), 1 deletion(-) > > create mode 100644 
libswscale/aarch64/range_convert_neon.S > > > > diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile > > index da1d909561..6923827f82 100644 > > --- a/libswscale/aarch64/Makefile > > +++ b/libswscale/aarch64/Makefile > > @@ -4,5 +4,6 @@ OBJS+= aarch64/rgb2rgb.o\ > > > > NEON-OBJS += aarch64/hscale.o \ > >aarch64/output.o \ > > + aarch64/range_convert_neon.o \ > >aarch64/rgb2rgb_neon.o \ > >aarch64/yuv2rgb_neon.o \ > > diff --git a/libswscale/aarch64/range_convert_neon.S > > b/libswscale/aarch64/range_convert_neon.S > > new file mode 100644 > > index 00..5e104971f0 > > --- /dev/null > > +++ b/libswscale/aarch64/range_convert_neon.S > > @@ -0,0 +1,103 @@ > > +/* > > + * Copyright (c) 2024 Ramiro Polla > > + * > > + * This file is part of FFmpeg. > > + * > > + * FFmpeg is free software; you can redistribute it and/or > > + * modify it under the terms of the GNU Lesser General Public > > + * License as published by the Free Software Foundation; either > > + * version 2.1 of the License, or (at your option) any later version. > > + * > > + * FFmpeg is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + * Lesser General Public License for more details. > > + * > > + * You should have received a copy of the GNU Lesser General Public > > + * License along with FFmpeg; if not, write to the Free Software > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > > 02110-1301 USA > > + */ > > + > > +#include "libavutil/aarch64/asm.S" > > + > > +.macro lumConvertRange name max mult offset shift > > We usually use commas between the macro arguments here. Apparently it > doesn't make any difference for any of the tools we support, but it would > be nice for consistency. 
(When invoking macros, commas between arguments > are optional for most platforms, but not when targeting Apple platforms, > so being strict with consistent use of commas is generally good.) Fixed in the new patchset. > > +const offset_\name, align=
[FFmpeg-devel] [PATCH v2 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange
chrRangeFromJpeg_8_c: 29.2 chrRangeFromJpeg_8_neon: 19.5 chrRangeFromJpeg_24_c: 80.5 chrRangeFromJpeg_24_neon: 34.0 chrRangeFromJpeg_128_c: 413.7 chrRangeFromJpeg_128_neon: 156.0 chrRangeFromJpeg_144_c: 471.0 chrRangeFromJpeg_144_neon: 174.2 chrRangeFromJpeg_256_c: 842.0 chrRangeFromJpeg_256_neon: 305.5 chrRangeFromJpeg_512_c: 1699.0 chrRangeFromJpeg_512_neon: 608.0 chrRangeToJpeg_8_c: 51.7 chrRangeToJpeg_8_neon: 22.7 chrRangeToJpeg_24_c: 149.7 chrRangeToJpeg_24_neon: 38.0 chrRangeToJpeg_128_c: 761.7 chrRangeToJpeg_128_neon: 176.7 chrRangeToJpeg_144_c: 866.2 chrRangeToJpeg_144_neon: 198.7 chrRangeToJpeg_256_c: 1516.5 chrRangeToJpeg_256_neon: 348.7 chrRangeToJpeg_512_c: 3067.2 chrRangeToJpeg_512_neon: 692.7 lumRangeFromJpeg_8_c: 24.0 lumRangeFromJpeg_8_neon: 17.0 lumRangeFromJpeg_24_c: 56.7 lumRangeFromJpeg_24_neon: 21.0 lumRangeFromJpeg_128_c: 294.5 lumRangeFromJpeg_128_neon: 76.7 lumRangeFromJpeg_144_c: 332.5 lumRangeFromJpeg_144_neon: 86.7 lumRangeFromJpeg_256_c: 586.0 lumRangeFromJpeg_256_neon: 152.2 lumRangeFromJpeg_512_c: 1190.0 lumRangeFromJpeg_512_neon: 298.0 lumRangeToJpeg_8_c: 31.7 lumRangeToJpeg_8_neon: 19.5 lumRangeToJpeg_24_c: 83.5 lumRangeToJpeg_24_neon: 24.2 lumRangeToJpeg_128_c: 440.5 lumRangeToJpeg_128_neon: 91.0 lumRangeToJpeg_144_c: 504.2 lumRangeToJpeg_144_neon: 101.0 lumRangeToJpeg_256_c: 879.7 lumRangeToJpeg_256_neon: 177.2 lumRangeToJpeg_512_c: 1794.2 lumRangeToJpeg_512_neon: 354.0 --- libswscale/aarch64/Makefile | 1 + libswscale/aarch64/range_convert_neon.S | 99 + libswscale/aarch64/swscale.c| 21 ++ libswscale/swscale_internal.h | 1 + libswscale/utils.c | 4 +- 5 files changed, 125 insertions(+), 1 deletion(-) create mode 100644 libswscale/aarch64/range_convert_neon.S diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile index adfd90a1b6..37ad960619 100644 --- a/libswscale/aarch64/Makefile +++ b/libswscale/aarch64/Makefile @@ -5,5 +5,6 @@ OBJS+= aarch64/rgb2rgb.o\ NEON-OBJS += aarch64/hscale.o \ aarch64/input.o \ 
aarch64/output.o \ + aarch64/range_convert_neon.o \ aarch64/rgb2rgb_neon.o \ aarch64/yuv2rgb_neon.o \ diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S new file mode 100644 index 00..ea56dc2e32 --- /dev/null +++ b/libswscale/aarch64/range_convert_neon.S @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2024 Ramiro Polla + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/aarch64/asm.S" + +.macro lumConvertRange name, max, mult, offset, shift +function ff_\name, export=1 +.if \max != 0 +mov w3, #\max +dup v24.8h, w3 +.endif +mov w3, #\mult +dup v25.4s, w3 +movzw3, \offset & 0x +movkw3, (\offset >> 16) & 0x, lsl #16 +dup v26.4s, w3 +1: +ld1 {v0.8h}, [x0] +.if \max != 0 +sminv0.8h, v0.8h, v24.8h +.endif +mov v16.16b, v26.16b +mov v18.16b, v26.16b +sxtlv20.4s, v0.4h +sxtl2 v22.4s, v0.8h +mla v16.4s, v20.4s, v25.4s +mla v18.4s, v22.4s, v25.4s +shrnv0.4h, v16.4s, #\shift +shrn2 v0.8h, v18.4s, #\shift +subsw1, w1, #8 +st1 {v0.8h}, [x0], #16 +b.gt1b +ret +endfunc +.endm + +.macro chrConvertRange name, max, mult, offset, shift +function ff_\name, export=1 +.if \max != 0 +mov w3, #\max +dup v24.8h, w3 +.endif +mov w3, #\mult +dup v25.4s, w3 +movzw3, \offset & 0x +movkw3, (\offset >> 16) & 0x, lsl #16 +dup 
v26.4s, w3 +1: +ld1 {v0.8h}, [x0] +ld1 {v1.8h}, [x1] +.if \max != 0 +smin
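The movz/movk pair in the macros above builds the 32-bit offset in w3 from two 16-bit immediates. A minimal C sketch of that split, assuming the truncated masks in the listing are 0xffff (the width of a movz/movk immediate). The function names are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

/* Low 16 bits of the offset: the immediate a movz would load. */
static uint16_t offset_lo16(int32_t offset)
{
    return (uint16_t)((uint32_t)offset & 0xffff);
}

/* High 16 bits: the immediate a "movk ..., lsl #16" would insert. */
static uint16_t offset_hi16(int32_t offset)
{
    return (uint16_t)(((uint32_t)offset >> 16) & 0xffff);
}

/* Recombine the halves the way movz followed by movk fills the register.
 * The cast back to int32_t assumes a two's complement target. */
static int32_t offset_from_halves(uint16_t lo, uint16_t hi)
{
    return (int32_t)(((uint32_t)hi << 16) | lo);
}
```

Splitting this way works for negative offsets as well, since the halves are taken from the two's complement bit pattern rather than from the signed value.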
[FFmpeg-devel] [PATCH v2 3/4] swscale/x86: add avx2 {lum, chr}ConvertRange
chrRangeFromJpeg_8_c: 24.1 chrRangeFromJpeg_8_sse4: 16.1 chrRangeFromJpeg_8_avx2: 19.9 chrRangeFromJpeg_24_c: 72.6 chrRangeFromJpeg_24_sse4: 34.6 chrRangeFromJpeg_24_avx2: 30.9 chrRangeFromJpeg_128_c: 341.1 chrRangeFromJpeg_128_sse4: 160.9 chrRangeFromJpeg_128_avx2: 94.1 chrRangeFromJpeg_144_c: 381.9 chrRangeFromJpeg_144_sse4: 183.6 chrRangeFromJpeg_144_avx2: 108.9 chrRangeFromJpeg_256_c: 646.1 chrRangeFromJpeg_256_sse4: 320.4 chrRangeFromJpeg_256_avx2: 190.6 chrRangeFromJpeg_512_c: 1255.9 chrRangeFromJpeg_512_sse4: 654.1 chrRangeFromJpeg_512_avx2: 392.4 chrRangeToJpeg_8_c: 36.9 chrRangeToJpeg_8_sse4: 13.9 chrRangeToJpeg_8_avx2: 20.6 chrRangeToJpeg_24_c: 113.4 chrRangeToJpeg_24_sse4: 29.6 chrRangeToJpeg_24_avx2: 28.9 chrRangeToJpeg_128_c: 632.1 chrRangeToJpeg_128_sse4: 162.4 chrRangeToJpeg_128_avx2: 94.6 chrRangeToJpeg_144_c: 709.9 chrRangeToJpeg_144_sse4: 183.9 chrRangeToJpeg_144_avx2: 108.1 chrRangeToJpeg_256_c: 2672.9 chrRangeToJpeg_256_sse4: 334.4 chrRangeToJpeg_256_avx2: 190.6 chrRangeToJpeg_512_c: 2500.9 chrRangeToJpeg_512_sse4: 654.1 chrRangeToJpeg_512_avx2: 379.6 lumRangeFromJpeg_8_c: 10.9 lumRangeFromJpeg_8_sse4: 12.4 lumRangeFromJpeg_8_avx2: 17.6 lumRangeFromJpeg_24_c: 38.4 lumRangeFromJpeg_24_sse4: 16.9 lumRangeFromJpeg_24_avx2: 20.6 lumRangeFromJpeg_128_c: 233.6 lumRangeFromJpeg_128_sse4: 79.9 lumRangeFromJpeg_128_avx2: 51.6 lumRangeFromJpeg_144_c: 263.9 lumRangeFromJpeg_144_sse4: 90.1 lumRangeFromJpeg_144_avx2: 57.6 lumRangeFromJpeg_256_c: 436.9 lumRangeFromJpeg_256_sse4: 162.1 lumRangeFromJpeg_256_avx2: 100.6 lumRangeFromJpeg_512_c: 878.4 lumRangeFromJpeg_512_sse4: 335.1 lumRangeFromJpeg_512_avx2: 199.4 lumRangeToJpeg_8_c: 19.1 lumRangeToJpeg_8_sse4: 11.6 lumRangeToJpeg_8_avx2: 17.6 lumRangeToJpeg_24_c: 56.9 lumRangeToJpeg_24_sse4: 17.6 lumRangeToJpeg_24_avx2: 21.4 lumRangeToJpeg_128_c: 335.9 lumRangeToJpeg_128_sse4: 79.1 lumRangeToJpeg_128_avx2: 48.9 lumRangeToJpeg_144_c: 372.9 lumRangeToJpeg_144_sse4: 91.6 lumRangeToJpeg_144_avx2: 55.4 
lumRangeToJpeg_256_c: 651.9 lumRangeToJpeg_256_sse4: 163.6 lumRangeToJpeg_256_avx2: 99.1 lumRangeToJpeg_512_c: 1289.9 lumRangeToJpeg_512_sse4: 333.6 lumRangeToJpeg_512_avx2: 211.1 --- libswscale/x86/range_convert.asm | 46 ++-- libswscale/x86/swscale.c | 5 +++- 2 files changed, 42 insertions(+), 9 deletions(-) diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm index 13983a386b..54c2f64769 100644 --- a/libswscale/x86/range_convert.asm +++ b/libswscale/x86/range_convert.asm @@ -22,20 +22,20 @@ SECTION_RODATA -chr_to_mult:times 4 dd 4663 -chr_to_offset: times 4 dd -9289992 +chr_to_mult:times 8 dd 4663 +chr_to_offset: times 8 dd -9289992 %define chr_to_shift 12 -chr_from_mult: times 4 dd 1799 -chr_from_offset:times 4 dd 4081085 +chr_from_mult: times 8 dd 1799 +chr_from_offset:times 8 dd 4081085 %define chr_from_shift 11 -lum_to_mult:times 4 dd 19077 -lum_to_offset: times 4 dd -39057361 +lum_to_mult:times 8 dd 19077 +lum_to_offset: times 8 dd -39057361 %define lum_to_shift 14 -lum_from_mult: times 4 dd 14071 -lum_from_offset:times 4 dd 33561947 +lum_from_mult: times 8 dd 14071 +lum_from_offset:times 8 dd 33561947 %define lum_from_shift 14 SECTION .text @@ -66,10 +66,19 @@ cglobal %1, 2, 3, 3, dst, width, x padddm1, m5 psradm0, %4 psradm1, %4 +%if mmsize == 16 packssdw m0, m0 packssdw m1, m1 movq[dstq+xq*2], m0 movq[dstq+xq*2+mmsize/2], m1 +%else +vextracti128xm7, ym0, 1 +packssdwxm0, xm7 +vextracti128xm7, ym1, 1 +packssdwxm1, xm7 +movdqu [dstq+xq*2], xm0 +movdqu [dstq+xq*2+mmsize/2], xm1 +%endif add xq, mmsize / 2 cmp xd, widthd jl .loop @@ -107,6 +116,7 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x psradm1, %4 psradm2, %4 psradm3, %4 +%if mmsize == 16 packssdw m0, m0 packssdw m1, m1 packssdw m2, m2 @@ -115,6 +125,20 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x movq [dstUq+xq*2+mmsize/2], m1 movq [dstVq+xq*2], m2 movq [dstVq+xq*2+mmsize/2], m3 +%else +vextracti128xm7, ym0, 1 +packssdwxm0, xm7 +vextracti128xm7, ym1, 1 +packssdwxm1, xm7 
+vextracti128xm7, ym2, 1 +packssdwxm2, xm7 +vextracti128xm7, ym3, 1 +packssdwxm3, xm7 +movdqu [dstUq+xq*2], xm0 +movdqu [dstUq+xq*2+mmsize/2], xm1 +movdqu [dstVq+xq*2], xm2 +movdqu [dstVq+xq*2+mmsize/2], xm3 +%endif add xq, mmsize / 2 cmp xd, widthd jl .loop @@ -127,4 +151,10 @@ LUMCONVERTRANGE lumRangeToJpeg, lum_to_mult, lum_to_offset, lum_to_shift CHRCONVERTRANGE chrRangeToJpeg, chr_to_mult, chr_to_offset, chr_to_shift
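For reference, the mult/offset/shift constants in the SECTION_RODATA hunk above describe a per-sample multiply, add and shift. A scalar C sketch of that operation follows, assuming the clamp behaves like the ".if \max != 0" guard in the NEON macro. The function names and the max parameter are hypothetical, and this is not the actual libswscale C code.

```c
#include <stdint.h>

/*
 * Scalar model of one range-conversion pass over 16-bit samples:
 *     dst[i] = (min(dst[i], max) * mult + offset) >> shift
 * A max of 0 skips the clamp. The right shift of a negative intermediate
 * is assumed to be arithmetic.
 */
static void convert_range_sketch(int16_t *dst, int width,
                                 int max, int mult, int offset, int shift)
{
    for (int i = 0; i < width; i++) {
        int32_t v = dst[i];
        if (max != 0 && v > max)
            v = max;
        dst[i] = (int16_t)((v * mult + offset) >> shift);
    }
}

/* Constants below are copied from range_convert.asm as quoted in the patch. */
static void lum_from_jpeg_sketch(int16_t *dst, int width)
{
    convert_range_sketch(dst, width, 0, 14071, 33561947, 14);
}

/* max is left to the caller: the clamp value is not visible in the hunk. */
static void lum_to_jpeg_sketch(int16_t *dst, int width, int max)
{
    convert_range_sketch(dst, width, max, 19077, -39057361, 14);
}

static void chr_from_jpeg_sketch(int16_t *dstU, int16_t *dstV, int width)
{
    convert_range_sketch(dstU, width, 0, 1799, 4081085, 11);
    convert_range_sketch(dstV, width, 0, 1799, 4081085, 11);
}
```

With samples stored as 8-bit values shifted left by 7 (as the checkasm test does), full-range luma 0 and 255 land on 16 << 7 and 235 << 7, and the chroma midpoint 128 << 7 maps to itself.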
[FFmpeg-devel] [PATCH v2 2/4] swscale/x86: add sse4 {lum, chr}ConvertRange
chrRangeFromJpeg_8_c: 28.7 chrRangeFromJpeg_8_sse4: 16.2 chrRangeFromJpeg_24_c: 152.7 chrRangeFromJpeg_24_sse4: 29.7 chrRangeFromJpeg_128_c: 366.5 chrRangeFromJpeg_128_sse4: 233.0 chrRangeFromJpeg_144_c: 408.0 chrRangeFromJpeg_144_sse4: 182.5 chrRangeFromJpeg_256_c: 698.7 chrRangeFromJpeg_256_sse4: 325.5 chrRangeFromJpeg_512_c: 1348.7 chrRangeFromJpeg_512_sse4: 660.2 chrRangeToJpeg_8_c: 37.7 chrRangeToJpeg_8_sse4: 16.2 chrRangeToJpeg_24_c: 115.7 chrRangeToJpeg_24_sse4: 36.2 chrRangeToJpeg_128_c: 631.2 chrRangeToJpeg_128_sse4: 163.7 chrRangeToJpeg_144_c: 710.7 chrRangeToJpeg_144_sse4: 183.0 chrRangeToJpeg_256_c: 1253.0 chrRangeToJpeg_256_sse4: 343.5 chrRangeToJpeg_512_c: 2491.2 chrRangeToJpeg_512_sse4: 654.2 lumRangeFromJpeg_8_c: 11.7 lumRangeFromJpeg_8_sse4: 10.5 lumRangeFromJpeg_24_c: 38.5 lumRangeFromJpeg_24_sse4: 19.0 lumRangeFromJpeg_128_c: 237.5 lumRangeFromJpeg_128_sse4: 79.2 lumRangeFromJpeg_144_c: 255.7 lumRangeFromJpeg_144_sse4: 90.5 lumRangeFromJpeg_256_c: 441.5 lumRangeFromJpeg_256_sse4: 161.7 lumRangeFromJpeg_512_c: 879.0 lumRangeFromJpeg_512_sse4: 333.2 lumRangeToJpeg_8_c: 20.0 lumRangeToJpeg_8_sse4: 11.7 lumRangeToJpeg_24_c: 61.5 lumRangeToJpeg_24_sse4: 17.7 lumRangeToJpeg_128_c: 357.5 lumRangeToJpeg_128_sse4: 80.0 lumRangeToJpeg_144_c: 371.5 lumRangeToJpeg_144_sse4: 93.2 lumRangeToJpeg_256_c: 651.5 lumRangeToJpeg_256_sse4: 164.5 lumRangeToJpeg_512_c: 1279.0 lumRangeToJpeg_512_sse4: 333.7 --- libswscale/swscale_internal.h| 1 + libswscale/utils.c | 2 + libswscale/x86/Makefile | 1 + libswscale/x86/range_convert.asm | 130 +++ libswscale/x86/swscale.c | 36 + 5 files changed, 170 insertions(+) create mode 100644 libswscale/x86/range_convert.asm diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index 5007dd422f..d5e7b5e71c 100644 --- a/libswscale/swscale_internal.h +++ b/libswscale/swscale_internal.h @@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY); av_cold void ff_sws_init_range_convert(SwsContext *c); 
av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c); +av_cold void ff_sws_init_range_convert_x86(SwsContext *c); SwsFunc ff_yuv2rgb_init_x86(SwsContext *c); SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c); diff --git a/libswscale/utils.c b/libswscale/utils.c index 476a24fea5..8dfa57b5ff 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4], ff_sws_init_range_convert(c); #if ARCH_LOONGARCH64 ff_sws_init_range_convert_loongarch(c); +#elif ARCH_X86 +ff_sws_init_range_convert_x86(c); #endif } diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile index 68391494be..f00154941d 100644 --- a/libswscale/x86/Makefile +++ b/libswscale/x86/Makefile @@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o \ x86/output.o \ x86/scale.o \ x86/scale_avx2.o \ + x86/range_convert.o \ x86/rgb_2_rgb.o \ x86/yuv_2_rgb.o \ x86/yuv2yuvX.o \ diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm new file mode 100644 index 00..13983a386b --- /dev/null +++ b/libswscale/x86/range_convert.asm @@ -0,0 +1,130 @@ +;** +;* Copyright (c) 2024 Ramiro Polla +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify it under the terms of the GNU Lesser General Public +;* License as published by the Free Software Foundation; either +;* version 2.1 of the License, or (at your option) any later version. +;* +;* FFmpeg is distributed in the hope that it will be useful, +;* but WITHOUT ANY WARRANTY; without even the implied warranty of +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;* Lesser General Public License for more details. 
+;* +;* You should have received a copy of the GNU Lesser General Public +;* License along with FFmpeg; if not, write to the Free Software +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +;** + +%include "libavutil/x86/x86util.asm" + +SECTION_RODATA + +chr_to_mult:times 4 dd 4663 +chr_to_offset: times 4 dd -9289992 +%define chr_to_shift 12 + +chr_from_mult: times 4 dd 1799 +chr_from_offset:times 4 dd 4081085 +%define chr_from_shift 11 + +lum_to_mult:times
[FFmpeg-devel] [PATCH v2 1/4] checkasm: add tests for {lum, chr}ConvertRange
--- tests/checkasm/Makefile | 2 +- tests/checkasm/checkasm.c | 1 + tests/checkasm/checkasm.h | 1 + tests/checkasm/sw_range_convert.c | 134 ++ 4 files changed, 137 insertions(+), 1 deletion(-) create mode 100644 tests/checkasm/sw_range_convert.c diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 6eb94d10d5..f20732b37a 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -63,7 +63,7 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes) # swscale tests -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o +SWSCALEOBJS += sw_gbrp.o sw_range_convert.o sw_rgb.o sw_scale.o CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 2329e2e1bc..56232ab1e0 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -251,6 +251,7 @@ static const struct { #endif #if CONFIG_SWSCALE { "sw_gbrp", checkasm_check_sw_gbrp }, +{ "sw_range_convert", checkasm_check_sw_range_convert }, { "sw_rgb", checkasm_check_sw_rgb }, { "sw_scale", checkasm_check_sw_scale }, #endif diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 211d7f52e6..e544007b67 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -119,6 +119,7 @@ void checkasm_check_rv40dsp(void); void checkasm_check_svq1enc(void); void checkasm_check_synth_filter(void); void checkasm_check_sw_gbrp(void); +void checkasm_check_sw_range_convert(void); void checkasm_check_sw_rgb(void); void checkasm_check_sw_scale(void); void checkasm_check_takdsp(void); diff --git a/tests/checkasm/sw_range_convert.c b/tests/checkasm/sw_range_convert.c new file mode 100644 index 00..08029103d1 --- /dev/null +++ b/tests/checkasm/sw_range_convert.c @@ -0,0 +1,134 @@ +/* + * This file is part of FFmpeg. 
+ * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include + +#include "libavutil/common.h" +#include "libavutil/intreadwrite.h" +#include "libavutil/mem.h" +#include "libavutil/mem_internal.h" + +#include "libswscale/swscale.h" +#include "libswscale/swscale_internal.h" + +#include "checkasm.h" + +static void check_lumConvertRange(int from) +{ +const char *func_str = from ? "lumRangeFromJpeg" : "lumRangeToJpeg"; +#define LARGEST_INPUT_SIZE 512 +#define INPUT_SIZES 6 +static const int input_sizes[] = {8, 24, 128, 144, 256, 512}; +struct SwsContext *ctx; + +LOCAL_ALIGNED_32(int16_t, dst0, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dst1, [LARGEST_INPUT_SIZE]); + +declare_func(void, int16_t *dst, int width); + +ctx = sws_alloc_context(); +if (sws_init_context(ctx, NULL, NULL) < 0) +fail(); + +ctx->srcFormat = from ? AV_PIX_FMT_YUVJ444P : AV_PIX_FMT_YUV444P; +ctx->dstFormat = from ? 
AV_PIX_FMT_YUV444P : AV_PIX_FMT_YUVJ444P; +ctx->srcRange = from; +ctx->dstRange = !from; + +for (int dstWi = 0; dstWi < INPUT_SIZES; dstWi++) { +int width = input_sizes[dstWi]; +for (int i = 0; i < width; i++) { +uint8_t r = rnd(); +dst0[i] = (int16_t) r << 7; +dst1[i] = (int16_t) r << 7; +} +ff_sws_init_scale(ctx); +if (check_func(ctx->lumConvertRange, "%s_%d", func_str, width)) { +call_ref(dst0, width); +call_new(dst1, width); +if (memcmp(dst0, dst1, width * sizeof(int16_t))) +fail(); +bench_new(dst1, width); +} +} + +sws_freeContext(ctx); +} +#undef LARGEST_INPUT_SIZE +#undef INPUT_SIZES + +static void check_chrConvertRange(int from) +{ +const char *func_str = from ? "chrRangeFromJpeg" : "chrRangeToJpeg"; +#define LARGEST_INPUT_SIZE 512 +#define INPUT_SIZES 6 +static const int input_sizes[] = {8, 24, 128, 144, 256, 512}; +struct SwsContext *ctx; + +LOCAL_ALIGNED_32(int16_t, dstU0, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dstV0, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dstU1, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dstV1, [LARGEST_INPUT_SIZE]); + +declare_func(void, int16_t *dstU, int16_t *dstV, int width); + +
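The check above boils down to: fill two buffers with the same 8-bit values shifted left by 7, run the reference and the new implementation in place, and compare with memcmp. A plain C sketch of that flow without the checkasm macros follows. compare_impls, sketch_rnd and the sample implementations are hypothetical stand-ins (sketch_rnd replaces checkasm's rnd()), so this shows the shape of the test rather than the harness itself.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_WIDTH 512

typedef void (*lum_convert_fn)(int16_t *dst, int width);

/* Small deterministic generator standing in for checkasm's rnd(). */
static uint32_t sketch_rnd(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 24;
}

/* Returns 0 when both implementations agree on every tested width. */
static int compare_impls(lum_convert_fn ref, lum_convert_fn new_fn)
{
    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
    int16_t dst0[SKETCH_MAX_WIDTH], dst1[SKETCH_MAX_WIDTH];
    uint32_t state = 1;

    for (size_t n = 0; n < sizeof(input_sizes) / sizeof(input_sizes[0]); n++) {
        int width = input_sizes[n];
        for (int i = 0; i < width; i++) {
            uint8_t r = (uint8_t)sketch_rnd(&state);
            dst0[i] = (int16_t)(r << 7);
            dst1[i] = (int16_t)(r << 7);
        }
        ref(dst0, width);
        new_fn(dst1, width);
        if (memcmp(dst0, dst1, (size_t)width * sizeof(int16_t)))
            return -1;
    }
    return 0;
}

/* Reference: full range to limited range, constants from the x86 patch. */
static void sketch_ref(int16_t *dst, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = (int16_t)((dst[i] * 14071 + 33561947) >> 14);
}

/* Same arithmetic, different traversal order: should match the reference. */
static void sketch_backwards(int16_t *dst, int width)
{
    for (int i = width - 1; i >= 0; i--)
        dst[i] = (int16_t)((dst[i] * 14071 + 33561947) >> 14);
}

/* Deliberately wrong variant, to show the comparison catching a mismatch. */
static void sketch_off_by_one(int16_t *dst, int width)
{
    sketch_ref(dst, width);
    for (int i = 0; i < width; i++)
        dst[i] += 1;
}
```

The widths mirror input_sizes in the test, which mixes sizes that are and are not multiples of the wider vector lengths.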
Re: [FFmpeg-devel] [PATCH 1/2] ffplay: add -scaling_quality option for SDL
Hi,

On Mon, Jun 10, 2024 at 9:04 PM Marton Balint wrote:
> On Tue, 4 Jun 2024, Ramiro Polla wrote:
> > On Thu, May 30, 2024 at 11:36 PM Ramiro Polla wrote:
> >>
> >> ---
> >>  doc/ffplay.texi  | 2 ++
> >>  fftools/ffplay.c | 6 +++++-
> >>  2 files changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/doc/ffplay.texi b/doc/ffplay.texi
> >> index 93f77eeece..60f883e159 100644
> >> --- a/doc/ffplay.texi
> >> +++ b/doc/ffplay.texi
> >> @@ -72,6 +72,8 @@ as 100.
> >>  Force format.
> >>  @item -window_title @var{title}
> >>  Set window title (default is the input filename).
> >> +@item -scaling_quality @var{value}
> >> +Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear").
> >>  @item -left @var{title}
> >>  Set the x position for the left of the window (default is a centered window).
> >>  @item -top @var{title}
> >> diff --git a/fftools/ffplay.c b/fftools/ffplay.c
> >> index b9d11eecee..75d2bec777 100644
> >> --- a/fftools/ffplay.c
> >> +++ b/fftools/ffplay.c
> >> @@ -351,6 +351,7 @@ static int filter_nbthreads = 0;
> >>  static int enable_vulkan = 0;
> >>  static char *vulkan_params = NULL;
> >>  static const char *hwaccel = NULL;
> >> +static const char *scaling_quality = NULL;
> >>
> >>  /* current context */
> >>  static int is_full_screen;
> >> @@ -3683,6 +3684,7 @@ static const OptionDef options[] = {
> >>      { "framedrop", OPT_TYPE_BOOL, OPT_EXPERT, { &framedrop }, "drop frames when cpu is too slow", "" },
> >>      { "infbuf", OPT_TYPE_BOOL, OPT_EXPERT, { &infinite_buffer }, "don't limit the input buffer size (useful with realtime streams)", "" },
> >>      { "window_title", OPT_TYPE_STRING, 0, { &window_title }, "set window title", "window title" },
> >> +    { "scaling_quality", OPT_TYPE_STRING, OPT_EXPERT, { &scaling_quality }, "set SDL_HINT_RENDER_SCALE_QUALITY value (default=linear)", "value" },
> >>      { "left", OPT_TYPE_INT, OPT_EXPERT, { &screen_left }, "set the x position for the left of the window", "x pos" },
> >>      { "top", OPT_TYPE_INT, OPT_EXPERT, { &screen_top }, "set the y position for the top of the window", "y pos" },
> >>      { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { .func_arg = opt_add_vfilter }, "set video filters", "filter_graph" },
> >> @@ -3831,7 +3833,9 @@ int main(int argc, char **argv)
> >>          }
> >>      }
> >>      window = SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags);
> >> -    SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "linear");
> >> +    if (!scaling_quality)
> >> +        scaling_quality = "linear";
> >> +    SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality);
> >>      if (!window) {
> >>          av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", SDL_GetError());
> >>          do_exit(NULL);
> >> --
> >> 2.39.2
> >>
> >
> > Can anyone comment on this? I had a few doubts on this patch:
> > - does the option name properly convey its functionality?
> > - is the documentation too terse?
> > - should we include the accepted values in the documentation, even
> >   though they are sdl-specific?
>
> What is the benefit of having such option? I don't really see a strong use
> case for it. Also you want to propagate the scaling quality to placebo
> backend as well? Does it actually make sense to do that?

I use this option to set scaling quality to "nearest" when I want the display to be pixelated in fullscreen.

I haven't thought about the placebo backend. I'll have a look when I get the time.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
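The defaulting logic in the patch can be pulled out into a small helper, which also gives a place to check the value before it reaches SDL. A minimal sketch follows. Both function names are hypothetical, and the list of known values is an assumption based on what SDL2 documents for SDL_HINT_RENDER_SCALE_QUALITY ("0"/"nearest", "1"/"linear", "2"/"best"). The patch itself passes the string through without checking it.

```c
#include <stddef.h>
#include <string.h>

/* NULL means the option was not given: fall back to "linear",
 * matching the behaviour before the patch. */
static const char *resolve_scaling_quality(const char *opt)
{
    return opt ? opt : "linear";
}

/* Returns 1 if the string is one of the values assumed to be accepted by
 * SDL2 for the scale quality hint, 0 otherwise. */
static int is_known_scaling_quality(const char *value)
{
    static const char *const known[] = {
        "0", "nearest", "1", "linear", "2", "best"
    };

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
        if (!strcmp(value, known[i]))
            return 1;
    return 0;
}
```

The result of resolve_scaling_quality() is what would be handed to SDL_SetHint(); a check like is_known_scaling_quality() could back a warning, which bears on the question of whether the accepted values belong in the documentation.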
Re: [FFmpeg-devel] [PATCH] libavcodec/mjpeg: keep last_dc value unclipped
On Fri, Jun 7, 2024 at 9:35 PM Andreas Rheinhardt wrote:
> Ramiro Polla:
> > Do av_clip_int16(val) _after_ copying the value to last_dc.
> >
> > Related commits: c28f648b19d and dffae122d0f
> > Related ticket: 4683
> > ---
> >  libavcodec/mjpegdec.c    | 3 +--
> >  tests/ref/fate/jpg-12bpp | 2 +-
> >  2 files changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
> > index 1481a7f285..7daec649bc 100644
> > --- a/libavcodec/mjpegdec.c
> > +++ b/libavcodec/mjpegdec.c
> > @@ -843,9 +843,8 @@ static int decode_block(MJpegDecodeContext *s, int16_t *block, int component,
> >          return AVERROR_INVALIDDATA;
> >      }
> >      val = val * (unsigned)quant_matrix[0] + s->last_dc[component];
> > -    val = av_clip_int16(val);
> >      s->last_dc[component] = val;
> > -    block[0] = val;
> > +    block[0] = av_clip_int16(val);
> >      /* AC coefs */
> >      i = 0;
> >      {OPEN_READER(re, &s->gb);
> > diff --git a/tests/ref/fate/jpg-12bpp b/tests/ref/fate/jpg-12bpp
> > index b3c662d587..9b039a92c6 100644
> > --- a/tests/ref/fate/jpg-12bpp
> > +++ b/tests/ref/fate/jpg-12bpp
> > @@ -3,4 +3,4 @@
> >  #codec_id 0: rawvideo
> >  #dimensions 0: 999x749
> >  #sar 0: 1/1
> > -0,          0,          0,        1,  1496502, 0xd91deb4b
> > +0,          0,          0,        1,  1496502, 0x44efc0af
>
> Is the change for the fate-sample supposed to be an improvement or what
> is the rationale for this? (Is this change mandated by the spec?)

As far as I can tell the only sample we have that triggers this is buggy anyways, so it's not something spec-related. It seems more correct to me to clip the values that overflow only for the block, and not propagate the differences from the clipping to the next dc values.

This change comes from another project where I decouple the bitstream reading from the processing. Currently we have this comment in MJpegDecodeContext:

    int last_dc[MAX_COMPONENTS]; /* last DEQUANTIZED dc (XXX: am I right to do that ?) */

What I do is keep the last quantized dc values as they were read from the bitstream, but then I have to add the dc shift for every block. Since it incurs one extra add per block, I'm not submitting the entire patch, but only this chunk.
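To make the difference concrete, here is a small C sketch of DC prediction with the clip applied before versus after the predictor is updated. It is only an illustration of the two orderings: the function names are hypothetical, the input array stands for already dequantized DC differences, and the predictor is assumed to start at 0.

```c
#include <stdint.h>

/* Same saturation as av_clip_int16(), written out to stay self-contained. */
static int clip_int16(int v)
{
    if (v < INT16_MIN) return INT16_MIN;
    if (v > INT16_MAX) return INT16_MAX;
    return v;
}

/* Old order: the clipped value becomes the predictor, so one overflow
 * shifts every following DC value of the component. */
static void dc_clip_before_store(const int *diffs, int n, int16_t *out)
{
    int last_dc = 0;
    for (int i = 0; i < n; i++) {
        int val = clip_int16(diffs[i] + last_dc);
        last_dc = val;
        out[i]  = (int16_t)val;
    }
}

/* Patched order: the predictor keeps the unclipped sum, only the value
 * written to the block is saturated. */
static void dc_clip_after_store(const int *diffs, int n, int16_t *out)
{
    int last_dc = 0;
    for (int i = 0; i < n; i++) {
        int val = diffs[i] + last_dc;
        last_dc = val;
        out[i]  = (int16_t)clip_int16(val);
    }
}
```

With differences {30000, 10000, -10000}, both versions saturate the second block at 32767, but the third block comes out as 22767 with the old order and 30000 with the patched one, because the overflow is no longer carried into the predictor.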
Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
On Fri, Jun 7, 2024 at 9:27 PM Andreas Rheinhardt wrote:
>
> Ramiro Polla:
> >  # swscale tests
> > -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> > +SWSCALEOBJS += sw_gbrp.o \
> > +               sw_rgb.o \
> > +               sw_scale.o \
> >
> >  CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
>
> We typically only use a new line if the old line is full.

There's currently a mix of everything in the Makefiles. One object per
line, multiple objects per line, mix of one or multiple objects per
line in the same statement, aligned and unaligned += between lines,
aligned and unaligned \ at the end of the lines, some have \ at the
last line, some don't...

I personally prefer += one object per line and no \ at the end of the
line everywhere. It makes the code look consistent and the patches are
cleaner and easier to understand. But I don't maintain this, so I have
no strong opinion in this case.

This patch was meant to simplify the next commit (checkasm: add tests
for {lum,chr}ConvertRange), but I can drop it if you prefer.
[FFmpeg-devel] [PATCH] libavcodec/mjpeg: keep last_dc value unclipped
Do av_clip_int16(val) _after_ copying the value to last_dc.

Related commits: c28f648b19d and dffae122d0f
Related ticket: 4683
---
 libavcodec/mjpegdec.c    | 3 +--
 tests/ref/fate/jpg-12bpp | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
index 1481a7f285..7daec649bc 100644
--- a/libavcodec/mjpegdec.c
+++ b/libavcodec/mjpegdec.c
@@ -843,9 +843,8 @@ static int decode_block(MJpegDecodeContext *s, int16_t *block, int component,
         return AVERROR_INVALIDDATA;
     }
     val = val * (unsigned)quant_matrix[0] + s->last_dc[component];
-    val = av_clip_int16(val);
     s->last_dc[component] = val;
-    block[0] = val;
+    block[0] = av_clip_int16(val);
     /* AC coefs */
     i = 0;
     {OPEN_READER(re, &s->gb);
diff --git a/tests/ref/fate/jpg-12bpp b/tests/ref/fate/jpg-12bpp
index b3c662d587..9b039a92c6 100644
--- a/tests/ref/fate/jpg-12bpp
+++ b/tests/ref/fate/jpg-12bpp
@@ -3,4 +3,4 @@
 #codec_id 0: rawvideo
 #dimensions 0: 999x749
 #sar 0: 1/1
-0, 0, 0, 1, 1496502, 0xd91deb4b
+0, 0, 0, 1, 1496502, 0x44efc0af
--
2.30.2
Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
On Fri, Jun 7, 2024 at 8:46 PM Andreas Rheinhardt wrote:
>
> Ramiro Polla:
> > ---
> >  tests/checkasm/Makefile | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
> > index 6eb94d10d5..3ce152e818 100644
> > --- a/tests/checkasm/Makefile
> > +++ b/tests/checkasm/Makefile
> > @@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o
> >  CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
> >
> >  # swscale tests
> > -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> > +SWSCALEOBJS += sw_gbrp.o
> > +SWSCALEOBJS += sw_rgb.o
> > +SWSCALEOBJS += sw_scale.o
> >
> >  CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
> >
>
> We use the multiple-objects in a line style in all Makefiles.

Then we should change the following:
libswscale/arm/Makefile (NEON_OBJS)
tests/checkasm/Makefile (AVUTILOBJS)
libavfilter/dnn/Makefile (OBJS-$(CONFIG_DNN))

New patch attached.

From 4965ece9648be5da6e93b6bfa319b6a5fe92aee6 Mon Sep 17 00:00:00 2001
From: Ramiro Polla
Date: Thu, 6 Jun 2024 15:40:03 +0200
Subject: [PATCH] tests/checkasm: cosmetics, one object per line in Makefile

---
 tests/checkasm/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6eb94d10d5..c2a41d7f7b 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o
 CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)

 # swscale tests
-SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
+SWSCALEOBJS += sw_gbrp.o \
+               sw_rgb.o \
+               sw_scale.o \

 CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
--
2.30.2
Re: [FFmpeg-devel] [PATCH] swscale/x86: add avx2 {lum, chr}ConvertRange
On Fri, Jun 7, 2024 at 7:38 PM Ramiro Polla wrote: > > chrRangeFromJpeg_8_c: 49.4 > chrRangeFromJpeg_8_sse4: 15.9 > chrRangeFromJpeg_8_avx2: 22.6 > chrRangeFromJpeg_24_c: 129.4 > chrRangeFromJpeg_24_sse4: 32.1 > chrRangeFromJpeg_24_avx2: 35.1 > chrRangeFromJpeg_128_c: 534.6 > chrRangeFromJpeg_128_sse4: 165.6 > chrRangeFromJpeg_128_avx2: 100.4 > chrRangeFromJpeg_144_c: 735.6 > chrRangeFromJpeg_144_sse4: 185.1 > chrRangeFromJpeg_144_avx2: 109.4 > chrRangeFromJpeg_256_c: 634.6 > chrRangeFromJpeg_256_sse4: 323.6 > chrRangeFromJpeg_256_avx2: 192.6 > chrRangeFromJpeg_512_c: 1242.4 > chrRangeFromJpeg_512_sse4: 662.1 > chrRangeFromJpeg_512_avx2: 409.1 > chrRangeToJpeg_8_c: 39.6 > chrRangeToJpeg_8_sse4: 15.9 > chrRangeToJpeg_8_avx2: 25.4 > chrRangeToJpeg_24_c: 118.9 > chrRangeToJpeg_24_sse4: 32.9 > chrRangeToJpeg_24_avx2: 30.1 > chrRangeToJpeg_128_c: 636.9 > chrRangeToJpeg_128_sse4: 164.4 > chrRangeToJpeg_128_avx2: 96.6 > chrRangeToJpeg_144_c: 716.4 > chrRangeToJpeg_144_sse4: 187.1 > chrRangeToJpeg_144_avx2: 109.4 > chrRangeToJpeg_256_c: 1258.6 > chrRangeToJpeg_256_sse4: 326.1 > chrRangeToJpeg_256_avx2: 193.9 > chrRangeToJpeg_512_c: 2489.4 > chrRangeToJpeg_512_sse4: 662.1 > chrRangeToJpeg_512_avx2: 382.4 > lumRangeFromJpeg_8_c: 13.6 > lumRangeFromJpeg_8_sse4: 14.4 > lumRangeFromJpeg_8_avx2: 19.6 > lumRangeFromJpeg_24_c: 38.9 > lumRangeFromJpeg_24_sse4: 18.9 > lumRangeFromJpeg_24_avx2: 23.9 > lumRangeFromJpeg_128_c: 239.4 > lumRangeFromJpeg_128_sse4: 81.9 > lumRangeFromJpeg_128_avx2: 51.6 > lumRangeFromJpeg_144_c: 285.1 > lumRangeFromJpeg_144_sse4: 92.1 > lumRangeFromJpeg_144_avx2: 59.6 > lumRangeFromJpeg_256_c: 857.1 > lumRangeFromJpeg_256_sse4: 164.4 > lumRangeFromJpeg_256_avx2: 101.9 > lumRangeFromJpeg_512_c: 1028.6 > lumRangeFromJpeg_512_sse4: 335.6 > lumRangeFromJpeg_512_avx2: 201.4 > lumRangeToJpeg_8_c: 20.4 > lumRangeToJpeg_8_sse4: 14.4 > lumRangeToJpeg_8_avx2: 20.4 > lumRangeToJpeg_24_c: 58.1 > lumRangeToJpeg_24_sse4: 18.9 > lumRangeToJpeg_24_avx2: 22.6 > 
lumRangeToJpeg_128_c: 327.4
> lumRangeToJpeg_128_sse4: 83.4
> lumRangeToJpeg_128_avx2: 53.6
> lumRangeToJpeg_144_c: 375.6
> lumRangeToJpeg_144_sse4: 93.9
> lumRangeToJpeg_144_avx2: 58.9
> lumRangeToJpeg_256_c: 649.6
> lumRangeToJpeg_256_sse4: 162.1
> lumRangeToJpeg_256_avx2: 101.9
> lumRangeToJpeg_512_c: 1289.1
> lumRangeToJpeg_512_sse4: 335.6
> lumRangeToJpeg_512_avx2: 201.4
> ---
>  libswscale/x86/range_convert.asm | 46 ++--
>  libswscale/x86/swscale.c         | 5 +++-
>  2 files changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
> index 13983a386b..54c2f64769 100644
> --- a/libswscale/x86/range_convert.asm
> +++ b/libswscale/x86/range_convert.asm
[...]
> @@ -66,10 +66,19 @@ cglobal %1, 2, 3, 3, dst, width, x
>      paddd        m1, m5
>      psrad        m0, %4
>      psrad        m1, %4
> +%if mmsize == 16
>      packssdw     m0, m0
>      packssdw     m1, m1
>      movq         [dstq+xq*2], m0
>      movq         [dstq+xq*2+mmsize/2], m1
> +%else
> +    vextracti128 xm7, ym0, 1
> +    packssdw     xm0, xm7
> +    vextracti128 xm7, ym1, 1
> +    packssdw     xm1, xm7
> +    movdqu       [dstq+xq*2], xm0
> +    movdqu       [dstq+xq*2+mmsize/2], xm1
> +%endif
>      add          xq, mmsize / 2
>      cmp          xd, widthd
>      jl .loop

Is there a cleaner way to do this packing in avx2 (or a macro to have
the same code as non-avx2)? Also is there some cleaner way to move half
the register into memory (instead of having to ifdef between mmsize)?
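On the packing question above: one widely used AVX2 idiom, which is not
taken from this patch, is a single 256-bit packssdw of two registers
followed by vpermq with selector 0xD8 to undo the per-lane
interleaving. The plain-C model below only illustrates the lane
arithmetic of that idiom. The function names are hypothetical and this
is not swscale code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed saturation of a 32-bit value to 16 bits, as packssdw applies
 * to every element. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Model of a 256-bit packssdw: each 128-bit lane is packed on its own,
 * so the result order is a[0..3] b[0..3] a[4..7] b[4..7]. */
static void model_packssdw256(const int32_t a[8], const int32_t b[8],
                              int16_t out[16])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 4; i++) {
            out[lane * 8 + i]     = sat16(a[lane * 4 + i]);
            out[lane * 8 + 4 + i] = sat16(b[lane * 4 + i]);
        }
    }
}

/* Model of vpermq with selector 0xD8: output qwords 0..3 are taken
 * from input qwords 0, 2, 1, 3. One qword holds four int16_t. */
static void model_vpermq_d8(const int16_t in[16], int16_t out[16])
{
    static const int src[4] = { 0, 2, 1, 3 };
    assert(in != out); /* this model does not support in-place use */
    for (int q = 0; q < 4; q++)
        memcpy(out + q * 4, in + src[q] * 4, 4 * sizeof(int16_t));
}
```

After both steps the 16 output words are the eight saturated values of
the first register followed by the eight of the second, so one full
register store could replace the two half stores. Whether this is
faster than the vextracti128 variant would have to be measured.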
[FFmpeg-devel] [PATCH] swscale/x86: add avx2 {lum, chr}ConvertRange
chrRangeFromJpeg_8_c: 49.4 chrRangeFromJpeg_8_sse4: 15.9 chrRangeFromJpeg_8_avx2: 22.6 chrRangeFromJpeg_24_c: 129.4 chrRangeFromJpeg_24_sse4: 32.1 chrRangeFromJpeg_24_avx2: 35.1 chrRangeFromJpeg_128_c: 534.6 chrRangeFromJpeg_128_sse4: 165.6 chrRangeFromJpeg_128_avx2: 100.4 chrRangeFromJpeg_144_c: 735.6 chrRangeFromJpeg_144_sse4: 185.1 chrRangeFromJpeg_144_avx2: 109.4 chrRangeFromJpeg_256_c: 634.6 chrRangeFromJpeg_256_sse4: 323.6 chrRangeFromJpeg_256_avx2: 192.6 chrRangeFromJpeg_512_c: 1242.4 chrRangeFromJpeg_512_sse4: 662.1 chrRangeFromJpeg_512_avx2: 409.1 chrRangeToJpeg_8_c: 39.6 chrRangeToJpeg_8_sse4: 15.9 chrRangeToJpeg_8_avx2: 25.4 chrRangeToJpeg_24_c: 118.9 chrRangeToJpeg_24_sse4: 32.9 chrRangeToJpeg_24_avx2: 30.1 chrRangeToJpeg_128_c: 636.9 chrRangeToJpeg_128_sse4: 164.4 chrRangeToJpeg_128_avx2: 96.6 chrRangeToJpeg_144_c: 716.4 chrRangeToJpeg_144_sse4: 187.1 chrRangeToJpeg_144_avx2: 109.4 chrRangeToJpeg_256_c: 1258.6 chrRangeToJpeg_256_sse4: 326.1 chrRangeToJpeg_256_avx2: 193.9 chrRangeToJpeg_512_c: 2489.4 chrRangeToJpeg_512_sse4: 662.1 chrRangeToJpeg_512_avx2: 382.4 lumRangeFromJpeg_8_c: 13.6 lumRangeFromJpeg_8_sse4: 14.4 lumRangeFromJpeg_8_avx2: 19.6 lumRangeFromJpeg_24_c: 38.9 lumRangeFromJpeg_24_sse4: 18.9 lumRangeFromJpeg_24_avx2: 23.9 lumRangeFromJpeg_128_c: 239.4 lumRangeFromJpeg_128_sse4: 81.9 lumRangeFromJpeg_128_avx2: 51.6 lumRangeFromJpeg_144_c: 285.1 lumRangeFromJpeg_144_sse4: 92.1 lumRangeFromJpeg_144_avx2: 59.6 lumRangeFromJpeg_256_c: 857.1 lumRangeFromJpeg_256_sse4: 164.4 lumRangeFromJpeg_256_avx2: 101.9 lumRangeFromJpeg_512_c: 1028.6 lumRangeFromJpeg_512_sse4: 335.6 lumRangeFromJpeg_512_avx2: 201.4 lumRangeToJpeg_8_c: 20.4 lumRangeToJpeg_8_sse4: 14.4 lumRangeToJpeg_8_avx2: 20.4 lumRangeToJpeg_24_c: 58.1 lumRangeToJpeg_24_sse4: 18.9 lumRangeToJpeg_24_avx2: 22.6 lumRangeToJpeg_128_c: 327.4 lumRangeToJpeg_128_sse4: 83.4 lumRangeToJpeg_128_avx2: 53.6 lumRangeToJpeg_144_c: 375.6 lumRangeToJpeg_144_sse4: 93.9 lumRangeToJpeg_144_avx2: 58.9 
lumRangeToJpeg_256_c: 649.6 lumRangeToJpeg_256_sse4: 162.1 lumRangeToJpeg_256_avx2: 101.9 lumRangeToJpeg_512_c: 1289.1 lumRangeToJpeg_512_sse4: 335.6 lumRangeToJpeg_512_avx2: 201.4 --- libswscale/x86/range_convert.asm | 46 ++-- libswscale/x86/swscale.c | 5 +++- 2 files changed, 42 insertions(+), 9 deletions(-) diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm index 13983a386b..54c2f64769 100644 --- a/libswscale/x86/range_convert.asm +++ b/libswscale/x86/range_convert.asm @@ -22,20 +22,20 @@ SECTION_RODATA -chr_to_mult:times 4 dd 4663 -chr_to_offset: times 4 dd -9289992 +chr_to_mult:times 8 dd 4663 +chr_to_offset: times 8 dd -9289992 %define chr_to_shift 12 -chr_from_mult: times 4 dd 1799 -chr_from_offset:times 4 dd 4081085 +chr_from_mult: times 8 dd 1799 +chr_from_offset:times 8 dd 4081085 %define chr_from_shift 11 -lum_to_mult:times 4 dd 19077 -lum_to_offset: times 4 dd -39057361 +lum_to_mult:times 8 dd 19077 +lum_to_offset: times 8 dd -39057361 %define lum_to_shift 14 -lum_from_mult: times 4 dd 14071 -lum_from_offset:times 4 dd 33561947 +lum_from_mult: times 8 dd 14071 +lum_from_offset:times 8 dd 33561947 %define lum_from_shift 14 SECTION .text @@ -66,10 +66,19 @@ cglobal %1, 2, 3, 3, dst, width, x padddm1, m5 psradm0, %4 psradm1, %4 +%if mmsize == 16 packssdw m0, m0 packssdw m1, m1 movq[dstq+xq*2], m0 movq[dstq+xq*2+mmsize/2], m1 +%else +vextracti128xm7, ym0, 1 +packssdwxm0, xm7 +vextracti128xm7, ym1, 1 +packssdwxm1, xm7 +movdqu [dstq+xq*2], xm0 +movdqu [dstq+xq*2+mmsize/2], xm1 +%endif add xq, mmsize / 2 cmp xd, widthd jl .loop @@ -107,6 +116,7 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x psradm1, %4 psradm2, %4 psradm3, %4 +%if mmsize == 16 packssdw m0, m0 packssdw m1, m1 packssdw m2, m2 @@ -115,6 +125,20 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x movq [dstUq+xq*2+mmsize/2], m1 movq [dstVq+xq*2], m2 movq [dstVq+xq*2+mmsize/2], m3 +%else +vextracti128xm7, ym0, 1 +packssdwxm0, xm7 +vextracti128xm7, ym1, 1 +packssdwxm1, xm7 
+vextracti128xm7, ym2, 1 +packssdwxm2, xm7 +vextracti128xm7, ym3, 1 +packssdwxm3, xm7 +movdqu [dstUq+xq*2], xm0 +movdqu [dstUq+xq*2+mmsize/2], xm1 +movdqu [dstVq+xq*2], xm2 +movdqu [dstVq+xq*2+mmsize/2], xm3 +%endif add xq, mmsize / 2 cmp xd, widthd jl .loop @@ -127,4 +151,10 @@ LUMCONVERTRANGE lumRangeToJpeg, lum_to_mult, lum_to_offset, lum_to_shift CHRCONVERTRANGE chrRangeToJpeg, chr_to_mult, chr_to_offset,
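For readers who want to see what the mult/offset/shift constants in the
patch above compute, here is a scalar sketch in C. It assumes each
conversion is a multiply-add followed by an arithmetic right shift, and
that samples are 15-bit intermediates (an 8-bit value shifted left by
7). The struct and function names are hypothetical; only the numbers
come from the SECTION_RODATA block in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* (mult, offset, shift) triples copied from the patch constants. */
typedef struct RangeCoeffs {
    int mult;
    int offset;
    int shift;
} RangeCoeffs;

static const RangeCoeffs lum_to   = { 19077, -39057361, 14 };
static const RangeCoeffs lum_from = { 14071,  33561947, 14 };
static const RangeCoeffs chr_to   = {  4663,  -9289992, 12 };
static const RangeCoeffs chr_from = {  1799,   4081085, 11 };

/* Assumed scalar form of one sample conversion. The right shift of a
 * negative intermediate relies on arithmetic shifting, which mainstream
 * compilers provide but the C standard leaves implementation-defined. */
static int convert_sample(int x, RangeCoeffs k)
{
    assert(x >= 0 && x <= INT16_MAX);
    return (x * k.mult + k.offset) >> k.shift;
}
```

Under these assumptions limited-range luma 16 and 235 map to full-range
0 and 255 and back, for example convert_sample(235 << 7, lum_to) gives
255 << 7. Results above INT16_MAX are possible for large inputs in the
"to" direction, so a caller still has to clamp or saturate.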
Re: [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 {lum, chr}ConvertRange
On Fri, Jun 7, 2024 at 4:05 PM Ramiro Polla wrote: > > chrRangeFromJpeg_8_c: 19.9 > chrRangeFromJpeg_8_sse4: 16.2 > chrRangeFromJpeg_24_c: 60.7 > chrRangeFromJpeg_24_sse4: 28.9 > chrRangeFromJpeg_128_c: 325.7 > chrRangeFromJpeg_128_sse4: 160.2 > chrRangeFromJpeg_144_c: 364.2 > chrRangeFromJpeg_144_sse4: 194.9 > chrRangeFromJpeg_256_c: 630.7 > chrRangeFromJpeg_256_sse4: 337.4 > chrRangeFromJpeg_512_c: 1240.4 > chrRangeFromJpeg_512_sse4: 668.4 > chrRangeToJpeg_8_c: 37.7 > chrRangeToJpeg_8_sse4: 19.7 > chrRangeToJpeg_24_c: 114.7 > chrRangeToJpeg_24_sse4: 30.2 > chrRangeToJpeg_128_c: 636.4 > chrRangeToJpeg_128_sse4: 161.7 > chrRangeToJpeg_144_c: 715.7 > chrRangeToJpeg_144_sse4: 272.9 > chrRangeToJpeg_256_c: 1256.7 > chrRangeToJpeg_256_sse4: 341.9 > chrRangeToJpeg_512_c: 2498.7 > chrRangeToJpeg_512_sse4: 668.4 > lumRangeFromJpeg_8_c: 11.7 > lumRangeFromJpeg_8_sse4: 12.4 > lumRangeFromJpeg_24_c: 36.9 > lumRangeFromJpeg_24_sse4: 17.7 > lumRangeFromJpeg_128_c: 228.4 > lumRangeFromJpeg_128_sse4: 85.2 > lumRangeFromJpeg_144_c: 272.9 > lumRangeFromJpeg_144_sse4: 96.9 > lumRangeFromJpeg_256_c: 463.4 > lumRangeFromJpeg_256_sse4: 183.9 > lumRangeFromJpeg_512_c: 879.9 > lumRangeFromJpeg_512_sse4: 355.9 > lumRangeToJpeg_8_c: 17.7 > lumRangeToJpeg_8_sse4: 15.4 > lumRangeToJpeg_24_c: 56.2 > lumRangeToJpeg_24_sse4: 18.4 > lumRangeToJpeg_128_c: 331.4 > lumRangeToJpeg_128_sse4: 84.4 > lumRangeToJpeg_144_c: 375.2 > lumRangeToJpeg_144_sse4: 96.9 > lumRangeToJpeg_256_c: 649.7 > lumRangeToJpeg_256_sse4: 184.4 > lumRangeToJpeg_512_c: 1281.9 > lumRangeToJpeg_512_sse4: 355.9 > --- > libswscale/swscale_internal.h| 1 + > libswscale/utils.c | 2 + > libswscale/x86/Makefile | 1 + > libswscale/x86/range_convert.asm | 100 +++ > libswscale/x86/swscale.c | 36 +++ > 5 files changed, 140 insertions(+) > create mode 100644 libswscale/x86/range_convert.asm > > diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h > index d4b0c3cee2..92f6105443 100644 > --- 
a/libswscale/swscale_internal.h > +++ b/libswscale/swscale_internal.h > @@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY); > > av_cold void ff_sws_init_range_convert(SwsContext *c); > av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c); > +av_cold void ff_sws_init_range_convert_x86(SwsContext *c); > > SwsFunc ff_yuv2rgb_init_x86(SwsContext *c); > SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c); > diff --git a/libswscale/utils.c b/libswscale/utils.c > index 476a24fea5..8dfa57b5ff 100644 > --- a/libswscale/utils.c > +++ b/libswscale/utils.c > @@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, > const int inv_table[4], > ff_sws_init_range_convert(c); > #if ARCH_LOONGARCH64 > ff_sws_init_range_convert_loongarch(c); > +#elif ARCH_X86 > +ff_sws_init_range_convert_x86(c); > #endif > } > > diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile > index 68391494be..f00154941d 100644 > --- a/libswscale/x86/Makefile > +++ b/libswscale/x86/Makefile > @@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o > \ > x86/output.o \ > x86/scale.o \ > x86/scale_avx2.o > \ > + x86/range_convert.o \ > x86/rgb_2_rgb.o \ > x86/yuv_2_rgb.o \ > x86/yuv2yuvX.o \ > diff --git a/libswscale/x86/range_convert.asm > b/libswscale/x86/range_convert.asm > new file mode 100644 > index 00..333265fb65 > --- /dev/null > +++ b/libswscale/x86/range_convert.asm > @@ -0,0 +1,100 @@ > +;** > +;* Copyright (c) 2024 Ramiro Polla > +;* > +;* This file is part of FFmpeg. > +;* > +;* FFmpeg is free software; you can redistribute it and/or > +;* modify it under the terms of the GNU Lesser General Public > +;* License as published by the Free Software Foundation; either > +;* version 2.1 of the License, or (at your option) any later version. > +;* > +;* FFmpeg is distributed in the hope that it will be useful, > +;* but WITHOUT ANY WARRANTY; without even the implied warranty of > +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU > +;* Lesser General Publi
[FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange
chrRangeFromJpeg_8_c: 28.5 chrRangeFromJpeg_8_neon: 21.2 chrRangeFromJpeg_24_c: 81.2 chrRangeFromJpeg_24_neon: 34.7 chrRangeFromJpeg_128_c: 425.2 chrRangeFromJpeg_128_neon: 162.0 chrRangeFromJpeg_144_c: 480.2 chrRangeFromJpeg_144_neon: 180.2 chrRangeFromJpeg_256_c: 838.2 chrRangeFromJpeg_256_neon: 318.0 chrRangeFromJpeg_512_c: 1698.2 chrRangeFromJpeg_512_neon: 630.0 chrRangeToJpeg_8_c: 56.0 chrRangeToJpeg_8_neon: 23.5 chrRangeToJpeg_24_c: 147.7 chrRangeToJpeg_24_neon: 38.2 chrRangeToJpeg_128_c: 760.2 chrRangeToJpeg_128_neon: 182.5 chrRangeToJpeg_144_c: 857.7 chrRangeToJpeg_144_neon: 204.5 chrRangeToJpeg_256_c: 1504.2 chrRangeToJpeg_256_neon: 358.5 chrRangeToJpeg_512_c: 3025.7 chrRangeToJpeg_512_neon: 710.5 lumRangeFromJpeg_8_c: 24.0 lumRangeFromJpeg_8_neon: 18.2 lumRangeFromJpeg_24_c: 64.0 lumRangeFromJpeg_24_neon: 22.2 lumRangeFromJpeg_128_c: 289.2 lumRangeFromJpeg_128_neon: 79.2 lumRangeFromJpeg_144_c: 334.7 lumRangeFromJpeg_144_neon: 87.7 lumRangeFromJpeg_256_c: 579.5 lumRangeFromJpeg_256_neon: 152.0 lumRangeFromJpeg_512_c: 1208.0 lumRangeFromJpeg_512_neon: 299.0 lumRangeToJpeg_8_c: 30.0 lumRangeToJpeg_8_neon: 19.0 lumRangeToJpeg_24_c: 82.2 lumRangeToJpeg_24_neon: 24.0 lumRangeToJpeg_128_c: 440.7 lumRangeToJpeg_128_neon: 90.5 lumRangeToJpeg_144_c: 502.0 lumRangeToJpeg_144_neon: 102.2 lumRangeToJpeg_256_c: 893.7 lumRangeToJpeg_256_neon: 178.0 lumRangeToJpeg_512_c: 1793.7 lumRangeToJpeg_512_neon: 355.0 --- libswscale/aarch64/Makefile | 1 + libswscale/aarch64/range_convert_neon.S | 103 libswscale/aarch64/swscale.c| 21 + libswscale/swscale_internal.h | 1 + libswscale/utils.c | 4 +- 5 files changed, 129 insertions(+), 1 deletion(-) create mode 100644 libswscale/aarch64/range_convert_neon.S diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile index da1d909561..6923827f82 100644 --- a/libswscale/aarch64/Makefile +++ b/libswscale/aarch64/Makefile @@ -4,5 +4,6 @@ OBJS+= aarch64/rgb2rgb.o\ NEON-OBJS += aarch64/hscale.o \ aarch64/output.o \ + 
aarch64/range_convert_neon.o \ aarch64/rgb2rgb_neon.o \ aarch64/yuv2rgb_neon.o \ diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S new file mode 100644 index 00..5e104971f0 --- /dev/null +++ b/libswscale/aarch64/range_convert_neon.S @@ -0,0 +1,103 @@ +/* + * Copyright (c) 2024 Ramiro Polla + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/aarch64/asm.S" + +.macro lumConvertRange name max mult offset shift +const offset_\name, align=4 +.word \offset, \offset, \offset, \offset +endconst +function ff_\name, export=1 +.if \max != 0 +mov w3, #\max +dup v24.8h, w3 +.endif +mov w3, #\mult +dup v25.4s, w3 +movrel x3, offset_\name +ld1 {v26.4s}, [x3] +1: +ld1 {v0.8h}, [x0] +.if \max != 0 +sminv0.8h, v0.8h, v24.8h +.endif +mov v16.16b, v26.16b +mov v18.16b, v26.16b +sxtlv20.4s, v0.4h +sxtl2 v22.4s, v0.8h +mla v16.4s, v20.4s, v25.4s +mla v18.4s, v22.4s, v25.4s +shrnv0.4h, v16.4s, #\shift +shrn2 v0.8h, v18.4s, #\shift +subsw1, w1, #8 +st1 {v0.8h}, [x0], #16 +b.gt1b +ret +endfunc +.endm + +.macro chrConvertRange name max mult offset shift +const offset_\name, align=4 +.word \offset, \offset, \offset, \offset +endconst +function ff_\name, export=1 +.if \max != 0 +mov w3, #\max +dup v24.8h, w3 
+.endif +mov w3, #\mult +dup v25.4s, w3 +movrel x3, offset_\name +ld1 {v26.4s}, [x3] +1: +ld1 {v0.8h}, [x0] +ld1 {v1.8h}, [x1] +.if \max != 0 +sminv0.8h, v0.8h, v24.8h +
[FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 {lum, chr}ConvertRange
chrRangeFromJpeg_8_c: 19.9 chrRangeFromJpeg_8_sse4: 16.2 chrRangeFromJpeg_24_c: 60.7 chrRangeFromJpeg_24_sse4: 28.9 chrRangeFromJpeg_128_c: 325.7 chrRangeFromJpeg_128_sse4: 160.2 chrRangeFromJpeg_144_c: 364.2 chrRangeFromJpeg_144_sse4: 194.9 chrRangeFromJpeg_256_c: 630.7 chrRangeFromJpeg_256_sse4: 337.4 chrRangeFromJpeg_512_c: 1240.4 chrRangeFromJpeg_512_sse4: 668.4 chrRangeToJpeg_8_c: 37.7 chrRangeToJpeg_8_sse4: 19.7 chrRangeToJpeg_24_c: 114.7 chrRangeToJpeg_24_sse4: 30.2 chrRangeToJpeg_128_c: 636.4 chrRangeToJpeg_128_sse4: 161.7 chrRangeToJpeg_144_c: 715.7 chrRangeToJpeg_144_sse4: 272.9 chrRangeToJpeg_256_c: 1256.7 chrRangeToJpeg_256_sse4: 341.9 chrRangeToJpeg_512_c: 2498.7 chrRangeToJpeg_512_sse4: 668.4 lumRangeFromJpeg_8_c: 11.7 lumRangeFromJpeg_8_sse4: 12.4 lumRangeFromJpeg_24_c: 36.9 lumRangeFromJpeg_24_sse4: 17.7 lumRangeFromJpeg_128_c: 228.4 lumRangeFromJpeg_128_sse4: 85.2 lumRangeFromJpeg_144_c: 272.9 lumRangeFromJpeg_144_sse4: 96.9 lumRangeFromJpeg_256_c: 463.4 lumRangeFromJpeg_256_sse4: 183.9 lumRangeFromJpeg_512_c: 879.9 lumRangeFromJpeg_512_sse4: 355.9 lumRangeToJpeg_8_c: 17.7 lumRangeToJpeg_8_sse4: 15.4 lumRangeToJpeg_24_c: 56.2 lumRangeToJpeg_24_sse4: 18.4 lumRangeToJpeg_128_c: 331.4 lumRangeToJpeg_128_sse4: 84.4 lumRangeToJpeg_144_c: 375.2 lumRangeToJpeg_144_sse4: 96.9 lumRangeToJpeg_256_c: 649.7 lumRangeToJpeg_256_sse4: 184.4 lumRangeToJpeg_512_c: 1281.9 lumRangeToJpeg_512_sse4: 355.9 --- libswscale/swscale_internal.h| 1 + libswscale/utils.c | 2 + libswscale/x86/Makefile | 1 + libswscale/x86/range_convert.asm | 100 +++ libswscale/x86/swscale.c | 36 +++ 5 files changed, 140 insertions(+) create mode 100644 libswscale/x86/range_convert.asm diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index d4b0c3cee2..92f6105443 100644 --- a/libswscale/swscale_internal.h +++ b/libswscale/swscale_internal.h @@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY); av_cold void ff_sws_init_range_convert(SwsContext *c); 
av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c); +av_cold void ff_sws_init_range_convert_x86(SwsContext *c); SwsFunc ff_yuv2rgb_init_x86(SwsContext *c); SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c); diff --git a/libswscale/utils.c b/libswscale/utils.c index 476a24fea5..8dfa57b5ff 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4], ff_sws_init_range_convert(c); #if ARCH_LOONGARCH64 ff_sws_init_range_convert_loongarch(c); +#elif ARCH_X86 +ff_sws_init_range_convert_x86(c); #endif } diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile index 68391494be..f00154941d 100644 --- a/libswscale/x86/Makefile +++ b/libswscale/x86/Makefile @@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o \ x86/output.o \ x86/scale.o \ x86/scale_avx2.o \ + x86/range_convert.o \ x86/rgb_2_rgb.o \ x86/yuv_2_rgb.o \ x86/yuv2yuvX.o \ diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm new file mode 100644 index 00..333265fb65 --- /dev/null +++ b/libswscale/x86/range_convert.asm @@ -0,0 +1,100 @@ +;** +;* Copyright (c) 2024 Ramiro Polla +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify it under the terms of the GNU Lesser General Public +;* License as published by the Free Software Foundation; either +;* version 2.1 of the License, or (at your option) any later version. +;* +;* FFmpeg is distributed in the hope that it will be useful, +;* but WITHOUT ANY WARRANTY; without even the implied warranty of +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;* Lesser General Public License for more details. 
+;* +;* You should have received a copy of the GNU Lesser General Public +;* License along with FFmpeg; if not, write to the Free Software +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +;** + +%include "libavutil/x86/x86util.asm" + +; NOTE: there is no need to clamp the input when converting to jpeg range +; (like we do in the C code) because packssdw will saturate the output. + +;- +; lumConvertRange
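The NOTE in the asm header above can be checked with a small scalar
model. The sketch below compares clamping the input before the
multiply against saturating the 32-bit result afterwards, which is what
packssdw does. The function names are hypothetical, and the bound 30189
is derived here as the largest input whose result still fits in 16
bits; that it matches the clamp in the C code is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturation to 16 bits, as packssdw applies per element. */
static int16_t sat16(int v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* C-style path: clamp the input so the result cannot exceed 32767. */
static int16_t lum_to_jpeg_clamped(int x)
{
    assert(x >= 0 && x <= INT16_MAX);
    if (x > 30189)
        x = 30189;
    return (int16_t)((x * 19077 - 39057361) >> 14);
}

/* SIMD-style path: no input clamp, saturate the 32-bit result. */
static int16_t lum_to_jpeg_saturated(int x)
{
    assert(x >= 0 && x <= INT16_MAX);
    return sat16((x * 19077 - 39057361) >> 14);
}
```

For every non-negative 16-bit input the two functions return the same
value, which is the property the NOTE relies on. Negative inputs are
outside what this sketch covers.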
[FFmpeg-devel] [PATCH 2/4] checkasm: add tests for {lum, chr}ConvertRange
--- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 1 + tests/checkasm/checkasm.h | 1 + tests/checkasm/sw_range_convert.c | 134 ++ 4 files changed, 137 insertions(+) create mode 100644 tests/checkasm/sw_range_convert.c diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 3ce152e818..e4ec6a27ec 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -64,6 +64,7 @@ CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes) # swscale tests SWSCALEOBJS += sw_gbrp.o +SWSCALEOBJS += sw_range_convert.o SWSCALEOBJS += sw_rgb.o SWSCALEOBJS += sw_scale.o diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index d7aa2a9c09..d2b50c023a 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -248,6 +248,7 @@ static const struct { #endif #if CONFIG_SWSCALE { "sw_gbrp", checkasm_check_sw_gbrp }, +{ "sw_range_convert", checkasm_check_sw_range_convert }, { "sw_rgb", checkasm_check_sw_rgb }, { "sw_scale", checkasm_check_sw_scale }, #endif diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 211d7f52e6..e544007b67 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -119,6 +119,7 @@ void checkasm_check_rv40dsp(void); void checkasm_check_svq1enc(void); void checkasm_check_synth_filter(void); void checkasm_check_sw_gbrp(void); +void checkasm_check_sw_range_convert(void); void checkasm_check_sw_rgb(void); void checkasm_check_sw_scale(void); void checkasm_check_takdsp(void); diff --git a/tests/checkasm/sw_range_convert.c b/tests/checkasm/sw_range_convert.c new file mode 100644 index 00..6d7e22ad40 --- /dev/null +++ b/tests/checkasm/sw_range_convert.c @@ -0,0 +1,134 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include + +#include "libavutil/common.h" +#include "libavutil/intreadwrite.h" +#include "libavutil/mem.h" +#include "libavutil/mem_internal.h" + +#include "libswscale/swscale.h" +#include "libswscale/swscale_internal.h" + +#include "checkasm.h" + +static void check_lumConvertRange(int from) +{ +const char *func_str = from ? "lumRangeFromJpeg" : "lumRangeToJpeg"; +#define LARGEST_INPUT_SIZE 512 +#define INPUT_SIZES 6 +static const int input_sizes[] = {8, 24, 128, 144, 256, 512}; +struct SwsContext *ctx; + +LOCAL_ALIGNED_32(int16_t, dst0, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dst1, [LARGEST_INPUT_SIZE]); + +declare_func(void, int16_t *dst, int width); + +ctx = sws_alloc_context(); +if (sws_init_context(ctx, NULL, NULL) < 0) +fail(); + +ctx->srcFormat = from ? AV_PIX_FMT_YUVJ444P : AV_PIX_FMT_YUV444P; +ctx->dstFormat = from ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_YUVJ444P; +ctx->srcRange = from; +ctx->dstRange = !from; + +for (int dstWi = 0; dstWi < INPUT_SIZES; dstWi++) { +int width = input_sizes[dstWi]; +for (int i = 0; i < width; i++) { +uint8_t r = rnd(); +dst0[i] = (int16_t) r << 7; +dst1[i] = (int16_t) r << 7; +} +ff_sws_init_scale(ctx); +if (check_func(ctx->lumConvertRange, "%s_%d", func_str, width)) { +call_ref(dst0, width); +call_new(dst1, width); +if (memcmp(dst0, dst1, width * sizeof(int16_t))) +fail(); +bench_new(dst1, width); +} +} + +sws_freeContext(ctx); +} +#undef LARGEST_INPUT_SIZE +#undef INPUT_SIZES + +static void check_chrConvertRange(int from) +{ +const char *func_str = from ? 
"chrRangeFromJpeg" : "chrRangeToJpeg"; +#define LARGEST_INPUT_SIZE 512 +#define INPUT_SIZES 6 +static const int input_sizes[] = {8, 24, 128, 144, 256, 512}; +struct SwsContext *ctx; + +LOCAL_ALIGNED_32(int16_t, dstU0, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dstV0, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dstU1, [LARGEST_INPUT_SIZE]); +LOCAL_ALIGNED_32(int16_t, dstV1, [LARGEST_INPUT_SIZE]); + +declare_func(void, int16_t *dstU, int16_t *dstV, int width); + +ctx = sws_alloc_context(); +if (sws_init_context(ctx, NULL, NULL) <
[FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
---
 tests/checkasm/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6eb94d10d5..3ce152e818 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o
 CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)

 # swscale tests
-SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
+SWSCALEOBJS += sw_gbrp.o
+SWSCALEOBJS += sw_rgb.o
+SWSCALEOBJS += sw_scale.o

 CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
--
2.30.2
Re: [FFmpeg-devel] [PATCH] libswscale/x86/yuv_2_rgb: fix some comments
On Tue, Jun 4, 2024 at 3:15 PM Ramiro Polla wrote:
>
> ---
>  libswscale/x86/yuv_2_rgb.asm | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/libswscale/x86/yuv_2_rgb.asm b/libswscale/x86/yuv_2_rgb.asm
> index e3470fd9ad..a1f9134e08 100644
> --- a/libswscale/x86/yuv_2_rgb.asm
> +++ b/libswscale/x86/yuv_2_rgb.asm
> @@ -195,15 +195,15 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
>      mova m5, m7
>      paddsw m3, m0 ; B1 B3 B5 B7 ...
>      paddsw m5, m1 ; R1 R3 R5 R7 ...
> -    paddsw m7, m2 ; G1 G3 G4 G7 ...
> +    paddsw m7, m2 ; G1 G3 G5 G7 ...
>      paddsw m0, m6 ; B0 B2 B4 B6 ...
>      paddsw m1, m6 ; R0 R2 R4 R6 ...
>      paddsw m2, m6 ; G0 G2 G4 G6 ...
>
>  %if %3 == 24 ; PACK RGB24
>  %define depth 3
> -    packuswb m0, m3 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
> -    packuswb m1, m5 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
> +    packuswb m0, m3 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
> +    packuswb m1, m5 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
>      packuswb m2, m7 ; G0 G2 G4 G6 ... G1 G3 G5 G7 ...
>      mova m3, m_red
>      mova m6, m_blue
> @@ -248,7 +248,7 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
>      psrlq m5, 32
>      movd [imageq + 20], m2 ; -- -- G7 B7
>      movd [imageq + 18], m5 ; R6 G6 B6 R7
> -%endif ; mmsize = 8
> +%endif ; cpuflag
>  %else ; mmsize == 16
>      pshufb m3, [rgb24_shuf1] ; r0 g0 r6 g6 r12 g12 r2 g2 r8 g8 r14 g14 r4 g4 r10 g10
>      pshufb m6, [rgb24_shuf2] ; b10 r11 b0 r1 b6 r7 b12 r13 b2 r3 b8 r9 b14 r15 b4 r5
> --
> 2.30.2
>

I'll apply tomorrow.
Re: [FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)
On Tue, Jun 4, 2024 at 2:54 PM Ramiro Polla wrote: > On Thu, May 30, 2024 at 11:24 PM Sean McGovern wrote: > > On Thu, May 30, 2024 at 5:20 PM Ramiro Polla wrote: > > > > > > --- > > > libavcodec/libxvid.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c > > > index b9ac39429d..a490f16b3f 100644 > > > --- a/libavcodec/libxvid.c > > > +++ b/libavcodec/libxvid.c > > > @@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext > > > *avctx) > > > > > > /* Decide how we should decide blocks */ > > > switch (avctx->mb_decision) { > > > -case 2: > > > +case FF_MB_DECISION_RD: > > > x->vop_flags |= XVID_VOP_MODEDECISION_RD; > > > x->me_flags |= XVID_ME_HALFPELREFINE8_RD| > > > XVID_ME_QUARTERPELREFINE8_RD | > > > XVID_ME_EXTSEARCH_RD | > > > XVID_ME_CHECKPREDICTION_RD; > > > -case 1: > > > +case FF_MB_DECISION_BITS: > > > if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD)) > > > x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD; > > > x->me_flags |= XVID_ME_HALFPELREFINE16_RD | > > > -- > > > 2.30.2 > > > > > > ___ > > > ffmpeg-devel mailing list > > > ffmpeg-devel@ffmpeg.org > > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > > > To unsubscribe, visit link above, or email > > > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". > > > > This gets a +1 from me. > > I'll apply tomorrow. Pushed. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avcodec/mpegvideo_enc: give magic number a name
On Wed, Jun 5, 2024 at 1:51 AM Michael Niedermayer wrote:
> On Tue, Jun 04, 2024 at 03:05:35PM +0200, Ramiro Polla wrote:
> > ---
> >  libavcodec/mpegvideo_enc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> LGTM

Pushed.
[FFmpeg-devel] [PATCH] libswscale/x86/yuv_2_rgb: fix some comments
--- libswscale/x86/yuv_2_rgb.asm | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libswscale/x86/yuv_2_rgb.asm b/libswscale/x86/yuv_2_rgb.asm index e3470fd9ad..a1f9134e08 100644 --- a/libswscale/x86/yuv_2_rgb.asm +++ b/libswscale/x86/yuv_2_rgb.asm @@ -195,15 +195,15 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters mova m5, m7 paddsw m3, m0 ; B1 B3 B5 B7 ... paddsw m5, m1 ; R1 R3 R5 R7 ... -paddsw m7, m2 ; G1 G3 G4 G7 ... +paddsw m7, m2 ; G1 G3 G5 G7 ... paddsw m0, m6 ; B0 B2 B4 B6 ... paddsw m1, m6 ; R0 R2 R4 R6 ... paddsw m2, m6 ; G0 G2 G4 G6 ... %if %3 == 24 ; PACK RGB24 %define depth 3 -packuswb m0, m3 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ... -packuswb m1, m5 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ... +packuswb m0, m3 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ... +packuswb m1, m5 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ... packuswb m2, m7 ; G0 G2 G4 G6 ... G1 G3 G5 G7 ... mova m3, m_red mova m6, m_blue @@ -248,7 +248,7 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters psrlq m5, 32 movd [imageq + 20], m2 ; -- -- G7 B7 movd [imageq + 18], m5 ; R6 G6 B6 R7 -%endif ; mmsize = 8 +%endif ; cpuflag %else ; mmsize == 16 pshufb m3, [rgb24_shuf1] ; r0 g0 r6 g6 r12 g12 r2 g2 r8 g8 r14 g14 r4 g4 r10 g10 pshufb m6, [rgb24_shuf2] ; b10 r11 b0 r1 b6 r7 b12 r13 b2 r3 b8 r9 b14 r15 b4 r5 -- 2.30.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH] avcodec/mpegvideo_enc: give magic number a name
---
 libavcodec/mpegvideo_enc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 73a9082265..82bab43e14 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -562,7 +562,7 @@ av_cold int ff_mpv_encode_init(AVCodecContext *avctx)

     if ((s->mpv_flags & FF_MPV_FLAG_QP_RD) &&
         avctx->mb_decision != FF_MB_DECISION_RD) {
-        av_log(avctx, AV_LOG_ERROR, "QP RD needs mbd=2\n");
+        av_log(avctx, AV_LOG_ERROR, "QP RD needs mbd=rd\n");
         return AVERROR(EINVAL);
     }

--
2.30.2
Re: [FFmpeg-devel] [PATCH 1/2] ffplay: add -scaling_quality option for SDL
On Thu, May 30, 2024 at 11:36 PM Ramiro Polla wrote: > > --- > doc/ffplay.texi | 2 ++ > fftools/ffplay.c | 6 +- > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/doc/ffplay.texi b/doc/ffplay.texi > index 93f77eeece..60f883e159 100644 > --- a/doc/ffplay.texi > +++ b/doc/ffplay.texi > @@ -72,6 +72,8 @@ as 100. > Force format. > @item -window_title @var{title} > Set window title (default is the input filename). > +@item -scaling_quality @var{value} > +Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear"). > @item -left @var{title} > Set the x position for the left of the window (default is a centered window). > @item -top @var{title} > diff --git a/fftools/ffplay.c b/fftools/ffplay.c > index b9d11eecee..75d2bec777 100644 > --- a/fftools/ffplay.c > +++ b/fftools/ffplay.c > @@ -351,6 +351,7 @@ static int filter_nbthreads = 0; > static int enable_vulkan = 0; > static char *vulkan_params = NULL; > static const char *hwaccel = NULL; > +static const char *scaling_quality = NULL; > > /* current context */ > static int is_full_screen; > @@ -3683,6 +3684,7 @@ static const OptionDef options[] = { > { "framedrop", OPT_TYPE_BOOL, OPT_EXPERT, { }, > "drop frames when cpu is too slow", "" }, > { "infbuf", OPT_TYPE_BOOL, OPT_EXPERT, { _buffer > }, "don't limit the input buffer size (useful with realtime streams)", "" }, > { "window_title", OPT_TYPE_STRING, 0, { _title }, > "set window title", "window title" }, > +{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { _quality > }, "set SDL_HINT_RENDER_SCALE_QUALITY value (default=linear)", "value" }, > { "left", OPT_TYPE_INT,OPT_EXPERT, { _left }, > "set the x position for the left of the window", "x pos" }, > { "top",OPT_TYPE_INT,OPT_EXPERT, { _top }, > "set the y position for the top of the window", "y pos" }, > { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { > .func_arg = opt_add_vfilter }, "set video filters", "filter_graph" }, > @@ -3831,7 +3833,9 @@ int main(int argc, char **argv) > } > } > window = 
SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, > SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags); > -SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "linear"); > +if (!scaling_quality) > +scaling_quality = "linear"; > +SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality); > if (!window) { > av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", > SDL_GetError()); > do_exit(NULL); > -- > 2.39.2 > Can anyone comment on this? I had a few doubts on this patch: - does the option name properly convey its functionality? - is the documentation too terse? - should we include the accepted values in the documentation, even though they are sdl-specific? ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)
On Thu, May 30, 2024 at 11:24 PM Sean McGovern wrote: > On Thu, May 30, 2024 at 5:20 PM Ramiro Polla wrote: > > > > --- > > libavcodec/libxvid.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c > > index b9ac39429d..a490f16b3f 100644 > > --- a/libavcodec/libxvid.c > > +++ b/libavcodec/libxvid.c > > @@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext > > *avctx) > > > > /* Decide how we should decide blocks */ > > switch (avctx->mb_decision) { > > -case 2: > > +case FF_MB_DECISION_RD: > > x->vop_flags |= XVID_VOP_MODEDECISION_RD; > > x->me_flags |= XVID_ME_HALFPELREFINE8_RD| > > XVID_ME_QUARTERPELREFINE8_RD | > > XVID_ME_EXTSEARCH_RD | > > XVID_ME_CHECKPREDICTION_RD; > > -case 1: > > +case FF_MB_DECISION_BITS: > > if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD)) > > x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD; > > x->me_flags |= XVID_ME_HALFPELREFINE16_RD | > > -- > > 2.30.2 > > > > ___ > > ffmpeg-devel mailing list > > ffmpeg-devel@ffmpeg.org > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > > > To unsubscribe, visit link above, or email > > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". > > This gets a +1 from me. I'll apply tomorrow. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 2/2] ffplay: set default scaling_quality to "best" instead of "linear"
These values are aliases in SDL, but "best" is a more intuitive name. --- doc/ffplay.texi | 2 +- fftools/ffplay.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/ffplay.texi b/doc/ffplay.texi index 60f883e159..e7ff62ae16 100644 --- a/doc/ffplay.texi +++ b/doc/ffplay.texi @@ -73,7 +73,7 @@ Force format. @item -window_title @var{title} Set window title (default is the input filename). @item -scaling_quality @var{value} -Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear"). +Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "best"). @item -left @var{title} Set the x position for the left of the window (default is a centered window). @item -top @var{title} diff --git a/fftools/ffplay.c b/fftools/ffplay.c index 75d2bec777..6575ad14a7 100644 --- a/fftools/ffplay.c +++ b/fftools/ffplay.c @@ -3684,7 +3684,7 @@ static const OptionDef options[] = { { "framedrop", OPT_TYPE_BOOL, OPT_EXPERT, { }, "drop frames when cpu is too slow", "" }, { "infbuf", OPT_TYPE_BOOL, OPT_EXPERT, { _buffer }, "don't limit the input buffer size (useful with realtime streams)", "" }, { "window_title", OPT_TYPE_STRING, 0, { _title }, "set window title", "window title" }, -{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { _quality }, "set SDL_HINT_RENDER_SCALE_QUALITY value (default=linear)", "value" }, +{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { _quality }, "set SDL_HINT_RENDER_SCALE_QUALITY value (default=best)", "value" }, { "left", OPT_TYPE_INT,OPT_EXPERT, { _left }, "set the x position for the left of the window", "x pos" }, { "top",OPT_TYPE_INT,OPT_EXPERT, { _top }, "set the y position for the top of the window", "y pos" }, { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { .func_arg = opt_add_vfilter }, "set video filters", "filter_graph" }, @@ -3834,7 +3834,7 @@ int main(int argc, char **argv) } window = SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags); if (!scaling_quality) 
-scaling_quality = "linear"; +scaling_quality = "best"; SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality); if (!window) { av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", SDL_GetError()); -- 2.39.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 1/2] ffplay: add -scaling_quality option for SDL
--- doc/ffplay.texi | 2 ++ fftools/ffplay.c | 6 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/doc/ffplay.texi b/doc/ffplay.texi index 93f77eeece..60f883e159 100644 --- a/doc/ffplay.texi +++ b/doc/ffplay.texi @@ -72,6 +72,8 @@ as 100. Force format. @item -window_title @var{title} Set window title (default is the input filename). +@item -scaling_quality @var{value} +Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear"). @item -left @var{title} Set the x position for the left of the window (default is a centered window). @item -top @var{title} diff --git a/fftools/ffplay.c b/fftools/ffplay.c index b9d11eecee..75d2bec777 100644 --- a/fftools/ffplay.c +++ b/fftools/ffplay.c @@ -351,6 +351,7 @@ static int filter_nbthreads = 0; static int enable_vulkan = 0; static char *vulkan_params = NULL; static const char *hwaccel = NULL; +static const char *scaling_quality = NULL; /* current context */ static int is_full_screen; @@ -3683,6 +3684,7 @@ static const OptionDef options[] = { { "framedrop", OPT_TYPE_BOOL, OPT_EXPERT, { }, "drop frames when cpu is too slow", "" }, { "infbuf", OPT_TYPE_BOOL, OPT_EXPERT, { _buffer }, "don't limit the input buffer size (useful with realtime streams)", "" }, { "window_title", OPT_TYPE_STRING, 0, { _title }, "set window title", "window title" }, +{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { _quality }, "set SDL_HINT_RENDER_SCALE_QUALITY value (default=linear)", "value" }, { "left", OPT_TYPE_INT,OPT_EXPERT, { _left }, "set the x position for the left of the window", "x pos" }, { "top",OPT_TYPE_INT,OPT_EXPERT, { _top }, "set the y position for the top of the window", "y pos" }, { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { .func_arg = opt_add_vfilter }, "set video filters", "filter_graph" }, @@ -3831,7 +3833,9 @@ int main(int argc, char **argv) } } window = SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags); 
-SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "linear"); +if (!scaling_quality) +scaling_quality = "linear"; +SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality); if (!window) { av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", SDL_GetError()); do_exit(NULL); -- 2.39.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
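The fallback logic in the patch above (use the user's value if given, otherwise "linear") can be sketched in isolation. This is only an illustration: the helper name `ex_resolve_scaling_quality` is hypothetical and not part of ffplay, and the SDL call is left out so the snippet builds without SDL.

```c
/* Hypothetical helper, not part of ffplay: returns the string that
 * would be handed to SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, ...).
 * A NULL argument means the -scaling_quality option was not given. */
static const char *ex_resolve_scaling_quality(const char *user_value)
{
    /* Same default as in patch 1/2 of this series. */
    return user_value ? user_value : "linear";
}
```

Because the option is stored as a plain string, any value is passed through untouched; whether it is accepted is left to SDL.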
[FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)
---
 libavcodec/libxvid.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c
index b9ac39429d..a490f16b3f 100644
--- a/libavcodec/libxvid.c
+++ b/libavcodec/libxvid.c
@@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext *avctx)

     /* Decide how we should decide blocks */
     switch (avctx->mb_decision) {
-    case 2:
+    case FF_MB_DECISION_RD:
         x->vop_flags |= XVID_VOP_MODEDECISION_RD;
         x->me_flags  |= XVID_ME_HALFPELREFINE8_RD    |
                         XVID_ME_QUARTERPELREFINE8_RD |
                         XVID_ME_EXTSEARCH_RD         |
                         XVID_ME_CHECKPREDICTION_RD;
-    case 1:
+    case FF_MB_DECISION_BITS:
         if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD))
             x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD;
         x->me_flags |= XVID_ME_HALFPELREFINE16_RD |
--
2.30.2
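The switch touched by this patch relies on fall-through from the RD case into the BITS case. A reduced sketch of that control flow follows. The `EX_` names and flag values are made up for illustration; only the numeric values 1 and 2 come from the literals the patch replaces.

```c
/* Hypothetical stand-ins for FF_MB_DECISION_BITS / FF_MB_DECISION_RD. */
enum ExMbDecision {
    EX_MB_DECISION_SIMPLE = 0,
    EX_MB_DECISION_BITS   = 1,
    EX_MB_DECISION_RD     = 2,
};

/* Illustrative flag bits, unrelated to the real XVID_* constants. */
#define EX_FLAG_RD      (1u << 0)
#define EX_FLAG_FAST_RD (1u << 1)
#define EX_FLAG_HALFPEL (1u << 2)

static unsigned ex_decide_flags(int mb_decision)
{
    unsigned flags = 0;

    switch (mb_decision) {
    case EX_MB_DECISION_RD:
        flags |= EX_FLAG_RD;
        /* fall through: RD also wants everything BITS enables */
    case EX_MB_DECISION_BITS:
        if (!(flags & EX_FLAG_RD))
            flags |= EX_FLAG_FAST_RD;  /* only when RD was not selected */
        flags |= EX_FLAG_HALFPEL;
        break;
    default:
        break;
    }
    return flags;
}
```

With named cases the fall-through is easier to follow: RD mode yields the RD and half-pel flags, BITS mode yields the fast-RD and half-pel flags, and any other mode yields none.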
Re: [FFmpeg-devel] [PATCH v3 1/2] checkasm: add test for fdct
On Mon, May 13, 2024 at 6:49 PM James Almer wrote:
> On 5/6/2024 2:49 PM, Rémi Denis-Courmont wrote:
> > On Monday, May 6, 2024, 20:18:11 EEST, Ramiro Polla wrote:
> >> I'll send a similar patch to fix checkasm/idctdsp after this is merged.
> >
> > The idctdsp test does not actually test the iDCT, but only the trivial-ish
> > add/put helpers, so it does not care about the context. You're welcome to
> > fix it anyway of course.
>
> I personally find it ugly how we're storing a whole AVCodecContext on
> stack in these tests just to pass two ints to an init function.
> Maybe we can make said values be input parameters for these instead of a
> pointer to avctx.

It could make sense for fdct, but for idct we need a few more
parameters (bits_per_raw_sample, codec_id, flags, idct_algo, lowres).
That would make the function calls much longer, and in that case I'd
prefer just keeping AVCodecContext. Or having an input parameter
structure for each *dsp context, but that seems a bit overkill.
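For readers following the thread, the "input parameter structure" alternative mentioned above might look roughly like the sketch below. Everything here is hypothetical: the type names, the init function, and the toy logic inside it are assumptions for illustration and do not exist in FFmpeg. Only the five field names come from the message.

```c
/* Hypothetical parameter block carrying just the fields listed in the
 * message above, instead of a whole AVCodecContext. */
typedef struct ExIDCTParams {
    int bits_per_raw_sample;
    int codec_id;
    int flags;
    int idct_algo;
    int lowres;
} ExIDCTParams;

/* Hypothetical, heavily reduced dsp context. */
typedef struct ExIDCTContext {
    int high_bit_depth;  /* toy: set when more than 8 bits per sample */
    int block_size;      /* toy: 8 >> lowres */
} ExIDCTContext;

/* Hypothetical init taking the small struct rather than avctx.
 * The logic is a placeholder, not FFmpeg's real selection code. */
static void ex_idctdsp_init(ExIDCTContext *c, const ExIDCTParams *p)
{
    c->high_bit_depth = p->bits_per_raw_sample > 8;
    c->block_size     = 8 >> p->lowres;
}
```

A test would then fill only the fields it cares about with a designated initializer, at the cost of one extra type per dsp context, which is the overhead the message calls "a bit overkill".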
Re: [FFmpeg-devel] [PATCH v3 0/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
On Wed, Apr 17, 2024 at 10:49 PM Martin Storsjö wrote:
> On Wed, 17 Apr 2024, Ramiro Polla wrote:
> > This patch set adds fdct to checkasm and neon-optimized fdct for aarch64.
> >
> > Ramiro Polla (2):
> >   checkasm: add test for fdct
> >   lavc/aarch64/fdct: add neon-optimized fdct for aarch64
> >
> >  libavcodec/aarch64/Makefile               |   2 +
> >  libavcodec/aarch64/fdct.h                 |  26 ++
> >  libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
> >  libavcodec/aarch64/fdctdsp_neon.S         | 368 ++
> >  libavcodec/avcodec.h                      |   1 +
> >  libavcodec/fdctdsp.c                      |   4 +-
> >  libavcodec/fdctdsp.h                      |   2 +
> >  libavcodec/options_table.h                |   1 +
> >  libavcodec/tests/aarch64/dct.c            |   2 +
> >  tests/checkasm/Makefile                   |   1 +
> >  tests/checkasm/checkasm.c                 |   3 +
> >  tests/checkasm/checkasm.h                 |   1 +
> >  tests/checkasm/fdctdsp.c                  |  68
> >  tests/fate/checkasm.mak                   |   1 +
> >  14 files changed, 518 insertions(+), 1 deletion(-)
> >  create mode 100644 libavcodec/aarch64/fdct.h
> >  create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
> >  create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
> >  create mode 100644 tests/checkasm/fdctdsp.c
>
> LGTM, thanks!

Pushed.
Re: [FFmpeg-devel] [PATCH] lavc/aarch64: fix include for cpu.h
On Mon, May 13, 2024 at 12:15 PM Martin Storsjö wrote:
> On Sat, 11 May 2024, Ramiro Polla wrote:
> > On Sun, Jan 21, 2024 at 10:57 PM Ramiro Polla wrote:
> >>
> >> ---
> >>  libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c b/libavcodec/aarch64/idctdsp_init_aarch64.c
> >> index eec21aa5a2..8efd5f5323 100644
> >> --- a/libavcodec/aarch64/idctdsp_init_aarch64.c
> >> +++ b/libavcodec/aarch64/idctdsp_init_aarch64.c
> >> @@ -22,7 +22,7 @@
> >>
> >>  #include "libavutil/attributes.h"
> >>  #include "libavutil/cpu.h"
> >> -#include "libavutil/arm/cpu.h"
> >> +#include "libavutil/aarch64/cpu.h"
> >>  #include "libavcodec/avcodec.h"
> >>  #include "libavcodec/idctdsp.h"
> >>  #include "idct.h"
> >> --
> >> 2.30.2
> >>
> >
> > I'll apply if there are no objections.
>
> LGTM

Thanks. Pushed.
Re: [FFmpeg-devel] [PATCH v4 1/2] checkasm: add test for fdct
On Sat, May 11, 2024 at 10:32 AM Ramiro Polla wrote:
> On Mon, May 6, 2024 at 7:46 PM Rémi Denis-Courmont wrote:
[...]
> > No objections from me, but it would be nice and seemingly trivial to add 9
> > and 10 bits while at it.
[...]
> I'll add checks for the 9 and 10 bits later.

Apparently we have no assembly versions of 9 and 10 bits fdct, so
there's not much point in adding it to checkasm for the time being.
Re: [FFmpeg-devel] [PATCH v4 1/2] checkasm: add test for fdct
On Mon, May 6, 2024 at 7:46 PM Rémi Denis-Courmont wrote: > Le maanantaina 6. toukokuuta 2024, 20.18.39 EEST Ramiro Polla a écrit : > > Reviewed-by: Martin Storsjö > > Reviewed-by: Rémi Denis-Courmont > > --- > > tests/checkasm/Makefile | 1 + > > tests/checkasm/checkasm.c | 3 ++ > > tests/checkasm/checkasm.h | 1 + > > tests/checkasm/fdctdsp.c | 71 +++ > > tests/fate/checkasm.mak | 1 + > > 5 files changed, 77 insertions(+) > > create mode 100644 tests/checkasm/fdctdsp.c > > > > diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile > > index 3e40aba2c3..b5bb885201 100644 > > --- a/tests/checkasm/Makefile > > +++ b/tests/checkasm/Makefile > > @@ -4,6 +4,7 @@ AVCODECOBJS-$(CONFIG_AC3DSP)+= ac3dsp.o > > AVCODECOBJS-$(CONFIG_AUDIODSP) += audiodsp.o > > AVCODECOBJS-$(CONFIG_BLOCKDSP) += blockdsp.o > > AVCODECOBJS-$(CONFIG_BSWAPDSP) += bswapdsp.o > > +AVCODECOBJS-$(CONFIG_FDCTDSP) += fdctdsp.o > > AVCODECOBJS-$(CONFIG_FMTCONVERT)+= fmtconvert.o > > AVCODECOBJS-$(CONFIG_G722DSP) += g722dsp.o > > AVCODECOBJS-$(CONFIG_H264CHROMA)+= h264chroma.o > > diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c > > index 9be32fc16e..e5d39e2116 100644 > > --- a/tests/checkasm/checkasm.c > > +++ b/tests/checkasm/checkasm.c > > @@ -106,6 +106,9 @@ static const struct { > > #if CONFIG_EXR_DECODER > > { "exrdsp", checkasm_check_exrdsp }, > > #endif > > +#if CONFIG_FDCTDSP > > +{ "fdctdsp", checkasm_check_fdctdsp }, > > +#endif > > #if CONFIG_FLAC_DECODER > > { "flacdsp", checkasm_check_flacdsp }, > > #endif > > diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h > > index 173360af60..8807a37a43 100644 > > --- a/tests/checkasm/checkasm.h > > +++ b/tests/checkasm/checkasm.h > > @@ -85,6 +85,7 @@ void checkasm_check_blockdsp(void); > > void checkasm_check_bswapdsp(void); > > void checkasm_check_colorspace(void); > > void checkasm_check_exrdsp(void); > > +void checkasm_check_fdctdsp(void); > > void checkasm_check_fixed_dsp(void); > > void 
checkasm_check_flacdsp(void); > > void checkasm_check_float_dsp(void); > > diff --git a/tests/checkasm/fdctdsp.c b/tests/checkasm/fdctdsp.c > > new file mode 100644 > > index 00..c640a00656 > > --- /dev/null > > +++ b/tests/checkasm/fdctdsp.c > > @@ -0,0 +1,71 @@ > > +/* > > + * This file is part of FFmpeg. > > + * > > + * FFmpeg is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation; either version 2 of the License, or > > + * (at your option) any later version. > > + * > > + * FFmpeg is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public License along > > + * with FFmpeg; if not, write to the Free Software Foundation, Inc., > > + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > > + */ > > + > > +#include > > + > > +#include "checkasm.h" > > + > > +#include "libavcodec/avcodec.h" > > +#include "libavcodec/fdctdsp.h" > > + > > +#include "libavutil/common.h" > > +#include "libavutil/internal.h" > > +#include "libavutil/mem_internal.h" > > + > > +static int int16_cmp_off_by_n(const int16_t *ref, const int16_t *test, > > size_t n, int accuracy) +{ > > +for (size_t i = 0; i < n; i++) { > > +if (abs(ref[i] - test[i]) > accuracy) > > +return 1; > > +} > > +return 0; > > +} > > + > > +static void check_fdct(void) > > +{ > > +LOCAL_ALIGNED_16(int16_t, block0, [64]); > > +LOCAL_ALIGNED_16(int16_t, block1, [64]); > > + > > +AVCodecContext avctx = { > > +.bits_per_raw_sample = 8, > > +.dct_algo = FF_DCT_AUTO, > > +}; > > +FDCTDSPContext h; > > + > > +ff_fdctdsp_init(, ); > > + > > +if (check_func(h.fdct, "fdct")) { > > +declare_func(v
Re: [FFmpeg-devel] [PATCH] lavc/aarch64: fix include for cpu.h
On Sun, Jan 21, 2024 at 10:57 PM Ramiro Polla wrote:
>
> ---
>  libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c b/libavcodec/aarch64/idctdsp_init_aarch64.c
> index eec21aa5a2..8efd5f5323 100644
> --- a/libavcodec/aarch64/idctdsp_init_aarch64.c
> +++ b/libavcodec/aarch64/idctdsp_init_aarch64.c
> @@ -22,7 +22,7 @@
>
>  #include "libavutil/attributes.h"
>  #include "libavutil/cpu.h"
> -#include "libavutil/arm/cpu.h"
> +#include "libavutil/aarch64/cpu.h"
>  #include "libavcodec/avcodec.h"
>  #include "libavcodec/idctdsp.h"
>  #include "idct.h"
> --
> 2.30.2
>

I'll apply if there are no objections.
Re: [FFmpeg-devel] [PATCH v2 1/2] libavcodec/mpegvideo_enc: fix multi-threaded motion estimation rounding for mpeg4
On Thu, May 9, 2024 at 2:44 AM Michael Niedermayer wrote:
> On Wed, May 08, 2024 at 05:19:49PM +0200, Ramiro Polla wrote:
> > ff_init_me() was being called after ff_update_duplicate_context(),
> > which caused the propagation of the initialization to other thread
> > contexts to be delayed by one frame.
> >
> > In the case of mpeg4 (or flipflop_rounding), this would make the
> > hpel_put functions differ between the first thread (which would be
> > correctly initialized) and the other threads (which would be stale
> > from the previous frame).
> > ---
> >  libavcodec/mpegvideo_enc.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
>
> have you confirmed the actual used rounding matches after this
> encoder & decoder side ?

Yes, I just rechecked it. It used to be wrong (only the first slice
would use the correct hpel/qpel functions in the encoder according to
the no_rounding flag in the bitstream).

> if yes then this should be ok

Thanks. Pushed.
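The ordering problem described in the commit message can be reproduced with a toy model. This is a sketch under stated assumptions: `ExContext`, `ex_init_me()` and `ex_update_duplicate()` are hypothetical stand-ins for the encoder context, ff_init_me() and ff_update_duplicate_context(), and a single integer stands in for the selected hpel function set.

```c
/* Toy per-thread context. */
typedef struct ExContext {
    int no_rounding;   /* per-frame flag, flips between frames for mpeg4 */
    int hpel_variant;  /* derived state chosen by the init step */
} ExContext;

/* Stand-in for ff_init_me(): derive state from the per-frame flag. */
static void ex_init_me(ExContext *c)
{
    c->hpel_variant = c->no_rounding ? 1 : 0;
}

/* Stand-in for ff_update_duplicate_context(): copy main -> worker. */
static void ex_update_duplicate(ExContext *dst, const ExContext *src)
{
    *dst = *src;
}

/* Processes one frame; returns 1 if the worker ends up with the same
 * derived state as the main context, 0 if the worker is stale. */
static int ex_encode_frame(ExContext *main_ctx, ExContext *worker,
                           int no_rounding, int init_first)
{
    main_ctx->no_rounding = no_rounding;
    if (init_first) {              /* order after the patch */
        ex_init_me(main_ctx);
        ex_update_duplicate(worker, main_ctx);
    } else {                       /* order before the patch */
        ex_update_duplicate(worker, main_ctx);
        ex_init_me(main_ctx);
    }
    return worker->hpel_variant == main_ctx->hpel_variant;
}
```

In the old order the worker only receives the derived state on the next frame's copy, so it mismatches whenever the flag changes between frames. Initializing before the copy keeps the two in sync.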
Re: [FFmpeg-devel] [PATCH v2 2/2] libavcodec/motion_est: fix penalty_factor for b frames
On Wed, May 8, 2024 at 11:47 PM Michael Niedermayer wrote:
> On Wed, May 08, 2024 at 05:19:50PM +0200, Ramiro Polla wrote:
> > In direct_search() and ff_estimate_b_frame_motion(), penalty_factor
> > would be used before being initialized in estimate_motion_b(). Also,
> > the initialization would happen more than once unnecessarily.
> > ---
> >  libavcodec/motion_est.c                  | 15 ---
> >  tests/ref/vsynth/vsynth1-mpeg4-thread    |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg4-adap      |  8
> >  tests/ref/vsynth/vsynth2-mpeg4-qprd      |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg4-thread    |  6 +++---
> >  tests/ref/vsynth/vsynth_lena-mpeg4-rc    |  4 ++--
> >  7 files changed, 26 insertions(+), 25 deletions(-)
>
> probably ok

Thanks. Pushed.
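The "used before being initialized" issue in the commit message comes down to caching a field in a local before the code that sets it has run. A minimal sketch, with hypothetical names and a made-up penalty formula (the real one is get_penalty_factor()):

```c
/* Toy motion-estimation context. */
typedef struct ExMEContext {
    int mb_penalty_factor;
} ExMEContext;

/* Stand-in for the per-macroblock penalty setup; formula is made up. */
static void ex_init_penalties(ExMEContext *c, int lambda)
{
    c->mb_penalty_factor = lambda * 2;
}

/* Old shape: the local snapshot is taken before initialization, so
 * it holds whatever an earlier call (or nothing at all) left behind. */
static int ex_cost_before_patch(ExMEContext *c, int lambda)
{
    const int penalty_factor = c->mb_penalty_factor; /* read too early */
    ex_init_penalties(c, lambda);
    return 3 * penalty_factor;
}

/* New shape: initialize once up front, then read the field directly. */
static int ex_cost_after_patch(ExMEContext *c, int lambda)
{
    ex_init_penalties(c, lambda);
    return 3 * c->mb_penalty_factor;
}
```

Starting from a zeroed context with lambda 5, the first variant computes its cost from the stale 0 while the second uses the freshly computed 10. A change of this kind in the penalties actually applied would account for the updated vsynth reference checksums.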
[FFmpeg-devel] [PATCH v2 2/2] libavcodec/motion_est: fix penalty_factor for b frames
In direct_search() and ff_estimate_b_frame_motion(), penalty_factor would be used before being initialized in estimate_motion_b(). Also, the initialization would happen more than once unnecessarily. --- libavcodec/motion_est.c | 15 --- tests/ref/vsynth/vsynth1-mpeg4-thread| 6 +++--- tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd | 6 +++--- tests/ref/vsynth/vsynth2-mpeg4-adap | 8 tests/ref/vsynth/vsynth2-mpeg4-qprd | 6 +++--- tests/ref/vsynth/vsynth2-mpeg4-thread| 6 +++--- tests/ref/vsynth/vsynth_lena-mpeg4-rc| 4 ++-- 7 files changed, 26 insertions(+), 25 deletions(-) diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c index df9d1befa8..fb569ede8a 100644 --- a/libavcodec/motion_est.c +++ b/libavcodec/motion_est.c @@ -1127,9 +1127,6 @@ static int estimate_motion_b(MpegEncContext *s, int mb_x, int mb_y, const uint8_t * const mv_penalty = c->mv_penalty[f_code] + MAX_DMV; int mv_scale; -c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp); -c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp); -c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp); c->current_mv_penalty= mv_penalty; get_limits(s, 16*mb_x, 16*mb_y); @@ -1495,7 +1492,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s, int mb_x, int mb_y) { MotionEstContext * const c= >me; -const int penalty_factor= c->mb_penalty_factor; int fmin, bmin, dmin, fbmin, bimin, fimin; int type=0; const int xy = mb_y*s->mb_stride + mb_x; @@ -1517,22 +1513,27 @@ void ff_estimate_b_frame_motion(MpegEncContext * s, return; } +c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp); +c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp); +c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp); + if (s->codec_id == AV_CODEC_ID_MPEG4) dmin= direct_search(s, mb_x, mb_y); else dmin= INT_MAX; + // FIXME penalty stuff for non-MPEG-4 c->skip=0; fmin = 
estimate_motion_b(s, mb_x, mb_y, s->b_forw_mv_table, 0, s->f_code) + - 3 * penalty_factor; + 3 * c->mb_penalty_factor; c->skip=0; bmin = estimate_motion_b(s, mb_x, mb_y, s->b_back_mv_table, 2, s->b_code) + - 2 * penalty_factor; + 2 * c->mb_penalty_factor; ff_dlog(s, " %d %d ", s->b_forw_mv_table[xy][0], s->b_forw_mv_table[xy][1]); c->skip=0; -fbmin= bidir_refine(s, mb_x, mb_y) + penalty_factor; +fbmin= bidir_refine(s, mb_x, mb_y) + c->mb_penalty_factor; ff_dlog(s, "%d %d %d %d\n", dmin, fmin, bmin, fbmin); if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) { diff --git a/tests/ref/vsynth/vsynth1-mpeg4-thread b/tests/ref/vsynth/vsynth1-mpeg4-thread index 6b69fb4c12..6b110c49fb 100644 --- a/tests/ref/vsynth/vsynth1-mpeg4-thread +++ b/tests/ref/vsynth/vsynth1-mpeg4-thread @@ -1,4 +1,4 @@ -369ace2f9613261af869efd9fbb3c149 *tests/data/fate/vsynth1-mpeg4-thread.avi -774754 tests/data/fate/vsynth1-mpeg4-thread.avi -9aa327a244d5179acf7fe64dc1459bff *tests/data/fate/vsynth1-mpeg4-thread.out.rawvideo +7761391e354266976a9e0155eff983dd *tests/data/fate/vsynth1-mpeg4-thread.avi +774752 tests/data/fate/vsynth1-mpeg4-thread.avi +bbdbe9af4f5b106b847595bf3040699f *tests/data/fate/vsynth1-mpeg4-thread.out.rawvideo stddev: 10.13 PSNR: 28.02 MAXDIFF: 183 bytes: 7603200/ 7603200 diff --git a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd index 16de39edfc..f5bbecfcb2 100644 --- a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd +++ b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd @@ -1,4 +1,4 @@ -907a30295ed8323780eee08e606af0ab *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video -269722 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video -d2d9793bf8f3427b5cc17a1be78ddd64 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo +f612ea89aa79a7f7b93a8acf332705c4 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video +269723 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video +88e17886e6383755829d7da519fd5e79 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo stddev:5.54 
PSNR: 33.25 MAXDIFF: 94 bytes: 7603200/ 7603200 diff --git a/tests/ref/vsynth/vsynth2-mpeg4-adap b/tests/ref/vsynth/vsynth2-mpeg4-adap index 35b2b6aac9..e058cd1ce3 100644 --- a/tests/ref/vsynth/vsynth2-mpeg4-adap +++ b/tests/ref/vsynth/vsynth2-mpeg4-adap @@ -1,4 +1,4 @@ -06a397fe43dab7b6cf56870410fbbbaf *tests/data/fate/vsynth2-mpeg4-adap.avi -203000 tests/data/fate/vsynth2-mpeg4-adap.avi -686565d42d8ba5aea790824b04fa0a18 *tests/data/fate/vsynth2-mpeg4-adap.out.rawvideo -stddev:4.55 PSNR: 34.95 MAXDIFF: 84 bytes: 7603200/ 7603200 +9465ef120d560537d8fcfb5564782e01 *tests/data/fate/vsynth2-mpeg4-adap.avi +203004
[FFmpeg-devel] [PATCH v2 1/2] libavcodec/mpegvideo_enc: fix multi-threaded motion estimation rounding for mpeg4
ff_init_me() was being called after ff_update_duplicate_context(),
which caused the propagation of the initialization to other thread
contexts to be delayed by one frame.

In the case of mpeg4 (or flipflop_rounding), this would make the
hpel_put functions differ between the first thread (which would be
correctly initialized) and the other threads (which would be stale
from the previous frame).
---
 libavcodec/mpegvideo_enc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 2a75973ac4..b601a1a9e4 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -3623,6 +3623,9 @@ static int encode_picture(MpegEncContext *s)
         s->q_chroma_intra_matrix16 = s->q_intra_matrix16;
     }

+    if(ff_init_me(s)<0)
+        return -1;
+
     s->mb_intra=0; //for the rate distortion & bit compare functions
     for(i=1; i<context_count; i++){
         ret = ff_update_duplicate_context(s->thread_context[i], s);
@@ -3630,9 +3633,6 @@ static int encode_picture(MpegEncContext *s)
             return ret;
     }

-    if(ff_init_me(s)<0)
-        return -1;
-
     /* Estimate motion for every MB */
     if(s->pict_type != AV_PICTURE_TYPE_I){
         s->lambda = (s->lambda * s->me_penalty_compensation + 128) >> 8;
--
2.30.2
[FFmpeg-devel] [PATCH v4 1/2] checkasm: add test for fdct
Reviewed-by: Martin Storsjö
Reviewed-by: Rémi Denis-Courmont
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/fdctdsp.c  | 71 +++
 tests/fate/checkasm.mak   |  1 +
 5 files changed, 77 insertions(+)
 create mode 100644 tests/checkasm/fdctdsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 3e40aba2c3..b5bb885201 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -4,6 +4,7 @@ AVCODECOBJS-$(CONFIG_AC3DSP)            += ac3dsp.o
 AVCODECOBJS-$(CONFIG_AUDIODSP)          += audiodsp.o
 AVCODECOBJS-$(CONFIG_BLOCKDSP)          += blockdsp.o
 AVCODECOBJS-$(CONFIG_BSWAPDSP)          += bswapdsp.o
+AVCODECOBJS-$(CONFIG_FDCTDSP)           += fdctdsp.o
 AVCODECOBJS-$(CONFIG_FMTCONVERT)        += fmtconvert.o
 AVCODECOBJS-$(CONFIG_G722DSP)           += g722dsp.o
 AVCODECOBJS-$(CONFIG_H264CHROMA)        += h264chroma.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 9be32fc16e..e5d39e2116 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -106,6 +106,9 @@ static const struct {
     #if CONFIG_EXR_DECODER
         { "exrdsp", checkasm_check_exrdsp },
     #endif
+    #if CONFIG_FDCTDSP
+        { "fdctdsp", checkasm_check_fdctdsp },
+    #endif
     #if CONFIG_FLAC_DECODER
         { "flacdsp", checkasm_check_flacdsp },
     #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 173360af60..8807a37a43 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -85,6 +85,7 @@ void checkasm_check_blockdsp(void);
 void checkasm_check_bswapdsp(void);
 void checkasm_check_colorspace(void);
 void checkasm_check_exrdsp(void);
+void checkasm_check_fdctdsp(void);
 void checkasm_check_fixed_dsp(void);
 void checkasm_check_flacdsp(void);
 void checkasm_check_float_dsp(void);
diff --git a/tests/checkasm/fdctdsp.c b/tests/checkasm/fdctdsp.c
new file mode 100644
index 0000000000..c640a00656
--- /dev/null
+++ b/tests/checkasm/fdctdsp.c
@@ -0,0 +1,71 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+
+#include "checkasm.h"
+
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/mem_internal.h"
+
+static int int16_cmp_off_by_n(const int16_t *ref, const int16_t *test, size_t n, int accuracy)
+{
+    for (size_t i = 0; i < n; i++) {
+        if (abs(ref[i] - test[i]) > accuracy)
+            return 1;
+    }
+    return 0;
+}
+
+static void check_fdct(void)
+{
+    LOCAL_ALIGNED_16(int16_t, block0, [64]);
+    LOCAL_ALIGNED_16(int16_t, block1, [64]);
+
+    AVCodecContext avctx = {
+        .bits_per_raw_sample = 8,
+        .dct_algo = FF_DCT_AUTO,
+    };
+    FDCTDSPContext h;
+
+    ff_fdctdsp_init(&h, &avctx);
+
+    if (check_func(h.fdct, "fdct")) {
+        declare_func(void, int16_t *);
+        for (int i = 0; i < 64; i++) {
+            uint8_t r = rnd();
+            block0[i] = r;
+            block1[i] = r;
+        }
+        call_ref(block0);
+        call_new(block1);
+        if (int16_cmp_off_by_n(block0, block1, 64, 2))
+            fail();
+        bench_new(block1);
+    }
+}
+
+void checkasm_check_fdctdsp(void)
+{
+    check_fdct();
+    report("fdctdsp");
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index 4a8e312da9..9b5e2b0d98 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -8,6 +8,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp \
                 fate-checkasm-blockdsp \
                 fate-checkasm-bswapdsp \
                 fate-checkasm-exrdsp \
+                fate-checkasm-fdctdsp \
                 fate-checkasm-fixed_dsp \
                 fate-checkasm-flacdsp \
                 fate-checkasm-float_dsp \
-- 
2.30.2
Re: [FFmpeg-devel] [PATCH v3 1/2] checkasm: add test for fdct
On Thu, May 2, 2024 at 8:05 PM Rémi Denis-Courmont wrote:
> On Wednesday, 17 April 2024, 21:01:37 EEST, Ramiro Polla wrote:
[...]
> > +static void check_fdct(void)
> > +{
> > +    LOCAL_ALIGNED_16(int16_t, block0, [64]);
> > +    LOCAL_ALIGNED_16(int16_t, block1, [64]);
> > +
> > +    AVCodecContext avctx = { 0 };
>
> AFAICT, that is not a legal context for ff_fdctdsp_init(), which expects
> bits_per_raw_sample to be one of 8, 9 or 10. It would also be good manners to
> initialise dct_algo.

Thanks for spotting it. New patch coming up in a while. I'll send a
similar patch to fix checkasm/idctdsp after this is merged.
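The review point above can be illustrated with plain C. The sketch below is
hypothetical: FakeCodecContext, FAKE_DCT_AUTO and context_is_valid() are
illustrative names that only mimic the two fields under discussion; they are
not the real AVCodecContext or any check performed by FFmpeg. It shows that
"{ 0 }" leaves bits_per_raw_sample at 0, outside the 8, 9 or 10 range
mentioned in the review, while a designated initializer sets the named fields
and still zeroes everything else.

```c
#include <stdbool.h>

enum { FAKE_DCT_AUTO = 0 };   /* hypothetical placeholder value */

/* Hypothetical context with only the fields relevant to the discussion. */
typedef struct FakeCodecContext {
    int bits_per_raw_sample;
    int dct_algo;
    int other_field;          /* stands in for every field left unnamed */
} FakeCodecContext;

/* Assumed validity rule, taken from the review comment: 8, 9 or 10 bits. */
static bool context_is_valid(const FakeCodecContext *c)
{
    return c->bits_per_raw_sample == 8 ||
           c->bits_per_raw_sample == 9 ||
           c->bits_per_raw_sample == 10;
}

/* All fields zero, including bits_per_raw_sample. */
static FakeCodecContext make_zeroed(void)
{
    FakeCodecContext c = { 0 };
    return c;
}

/* Named fields set explicitly; unnamed fields are still zero-initialized. */
static FakeCodecContext make_explicit(void)
{
    FakeCodecContext c = {
        .bits_per_raw_sample = 8,
        .dct_algo            = FAKE_DCT_AUTO,
    };
    return c;
}
```

Under these assumptions, the context from make_zeroed() fails
context_is_valid(), while the one from make_explicit() passes and keeps
other_field at zero.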
[FFmpeg-devel] [PATCH v3 2/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
The code is imported from libjpeg-turbo-3.0.1. The neon registers used
have been changed to avoid modifying v8-v15.

Reviewed-by: Martin Storsjö
---
 libavcodec/aarch64/Makefile               |   2 +
 libavcodec/aarch64/fdct.h                 |  26 ++
 libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
 libavcodec/aarch64/fdctdsp_neon.S         | 368 ++
 libavcodec/avcodec.h                      |   1 +
 libavcodec/fdctdsp.c                      |   4 +-
 libavcodec/fdctdsp.h                      |   2 +
 libavcodec/options_table.h                |   1 +
 libavcodec/tests/aarch64/dct.c            |   2 +
 9 files changed, 444 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/fdct.h
 create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/fdctdsp_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 95ad4dd202..a3256bb1cc 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -1,5 +1,6 @@
 # subsystems
 OBJS-$(CONFIG_AC3DSP)                   += aarch64/ac3dsp_init_aarch64.o
+OBJS-$(CONFIG_FDCTDSP)                  += aarch64/fdctdsp_init_aarch64.o
 OBJS-$(CONFIG_FMTCONVERT)               += aarch64/fmtconvert_init.o
 OBJS-$(CONFIG_H264CHROMA)               += aarch64/h264chroma_init_aarch64.o
 OBJS-$(CONFIG_H264DSP)                  += aarch64/h264dsp_init_aarch64.o
@@ -37,6 +38,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP)           += aarch64/videodsp.o
 # subsystems
 NEON-OBJS-$(CONFIG_AAC_DECODER)         += aarch64/sbrdsp_neon.o
 NEON-OBJS-$(CONFIG_AC3DSP)              += aarch64/ac3dsp_neon.o
+NEON-OBJS-$(CONFIG_FDCTDSP)             += aarch64/fdctdsp_neon.o
 NEON-OBJS-$(CONFIG_FMTCONVERT)          += aarch64/fmtconvert_neon.o
 NEON-OBJS-$(CONFIG_H264CHROMA)          += aarch64/h264cmc_neon.o
 NEON-OBJS-$(CONFIG_H264DSP)             += aarch64/h264dsp_neon.o \
diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h
new file mode 100644
index 0000000000..0901b53a83
--- /dev/null
+++ b/libavcodec/aarch64/fdct.h
@@ -0,0 +1,26 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_AARCH64_FDCT_H
+#define AVCODEC_AARCH64_FDCT_H
+
+#include <stdint.h>
+
+void ff_fdct_neon(int16_t *block);
+
+#endif /* AVCODEC_AARCH64_FDCT_H */
diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c b/libavcodec/aarch64/fdctdsp_init_aarch64.c
new file mode 100644
index 0000000000..59d91bc8fc
--- /dev/null
+++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c
@@ -0,0 +1,39 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/aarch64/cpu.h"
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+#include "fdct.h"
+
+av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext *avctx,
+                                     unsigned high_bit_depth)
+{
+    int cpu_flags = av_get_cpu_flags();
+
+    if (have_neon(cpu_flags)) {
+        if (!high_bit_depth) {
+            if (avctx->dct_algo == FF_DCT_AUTO ||
+                avctx->dct_algo == FF_DCT_NEON) {
+                c->fdct = ff_fdct_neon;
+            }
+        }
+    }
+}
diff --git a/libavcodec/aarch64/fdctdsp_neon.S b/libavcodec/aarch64/fdctdsp_neon.S
new file mode 100644
index 0000000000..53fa4debe5
--- /dev/null
+++ b/libavcodec/aarch64/fdctdsp_neon.S
@@ -0,0 +1,368 @@
+/*
+ * Armv8 Neon optimizations for libjpeg-turbo
+ *
+ * Copyright (C) 2009-2011, Nokia Corporation and/or
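The selection logic in ff_fdctdsp_init_aarch64() in the patch above follows a
runtime dispatch pattern: start from a portable default, then override the
function pointer only when the CPU supports the optimized routine, the bit
depth is not high, and the requested algorithm allows it. The sketch below
restates that pattern in isolation. Every name in it (DSPContext, dsp_init(),
ALGO_*, CPU_FLAG_SIMD, fdct_c(), fdct_simd()) is a hypothetical stand-in and
not FFmpeg API, and the two "fdct" bodies only write a marker value so the
chosen path can be observed; they do not compute a DCT.

```c
#include <stdint.h>

/* Hypothetical stand-ins; none of these names are FFmpeg API. */
enum { ALGO_AUTO = 0, ALGO_C = 1, ALGO_SIMD = 2 };
#define CPU_FLAG_SIMD 0x1

typedef struct DSPContext {
    void (*fdct)(int16_t *block);
} DSPContext;

/* Marker implementations: they only record which path was selected. */
static void fdct_c(int16_t *block)    { block[0] = 1; }
static void fdct_simd(int16_t *block) { block[0] = 2; }

static void dsp_init(DSPContext *c, int cpu_flags, int algo, int high_bit_depth)
{
    c->fdct = fdct_c;   /* portable default */

    /* Override only if the CPU, the bit depth and the requested
     * algorithm all permit the optimized routine. */
    if ((cpu_flags & CPU_FLAG_SIMD) && !high_bit_depth &&
        (algo == ALGO_AUTO || algo == ALGO_SIMD))
        c->fdct = fdct_simd;
}
```

Keeping the default assignment unconditional means that every rejected
combination (no SIMD support, high bit depth, or an explicitly requested
other algorithm) falls back to the portable routine without extra branches.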
[FFmpeg-devel] [PATCH v3 1/2] checkasm: add test for fdct
Reviewed-by: Martin Storsjö
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/fdctdsp.c  | 68 +++
 tests/fate/checkasm.mak   |  1 +
 5 files changed, 74 insertions(+)
 create mode 100644 tests/checkasm/fdctdsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 2673e1d098..70a6120c70 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -4,6 +4,7 @@ AVCODECOBJS-$(CONFIG_AC3DSP)            += ac3dsp.o
 AVCODECOBJS-$(CONFIG_AUDIODSP)          += audiodsp.o
 AVCODECOBJS-$(CONFIG_BLOCKDSP)          += blockdsp.o
 AVCODECOBJS-$(CONFIG_BSWAPDSP)          += bswapdsp.o
+AVCODECOBJS-$(CONFIG_FDCTDSP)           += fdctdsp.o
 AVCODECOBJS-$(CONFIG_FMTCONVERT)        += fmtconvert.o
 AVCODECOBJS-$(CONFIG_G722DSP)           += g722dsp.o
 AVCODECOBJS-$(CONFIG_H264CHROMA)        += h264chroma.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 8be6cb0f55..92c3a30ad3 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -106,6 +106,9 @@ static const struct {
     #if CONFIG_EXR_DECODER
         { "exrdsp", checkasm_check_exrdsp },
     #endif
+    #if CONFIG_FDCTDSP
+        { "fdctdsp", checkasm_check_fdctdsp },
+    #endif
     #if CONFIG_FLAC_DECODER
         { "flacdsp", checkasm_check_flacdsp },
     #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index f90920dee7..d3e8f9a37a 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -85,6 +85,7 @@ void checkasm_check_blockdsp(void);
 void checkasm_check_bswapdsp(void);
 void checkasm_check_colorspace(void);
 void checkasm_check_exrdsp(void);
+void checkasm_check_fdctdsp(void);
 void checkasm_check_fixed_dsp(void);
 void checkasm_check_flacdsp(void);
 void checkasm_check_float_dsp(void);
diff --git a/tests/checkasm/fdctdsp.c b/tests/checkasm/fdctdsp.c
new file mode 100644
index 0000000000..68a9b5e435
--- /dev/null
+++ b/tests/checkasm/fdctdsp.c
@@ -0,0 +1,68 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+
+#include "checkasm.h"
+
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/mem_internal.h"
+
+static int int16_cmp_off_by_n(const int16_t *ref, const int16_t *test, size_t n, int accuracy)
+{
+    for (size_t i = 0; i < n; i++) {
+        if (abs(ref[i] - test[i]) > accuracy)
+            return 1;
+    }
+    return 0;
+}
+
+static void check_fdct(void)
+{
+    LOCAL_ALIGNED_16(int16_t, block0, [64]);
+    LOCAL_ALIGNED_16(int16_t, block1, [64]);
+
+    AVCodecContext avctx = { 0 };
+    FDCTDSPContext h;
+
+    ff_fdctdsp_init(&h, &avctx);
+
+    if (check_func(h.fdct, "fdct")) {
+        declare_func(void, int16_t *);
+        for (int i = 0; i < 64; i++) {
+            uint8_t r = rnd();
+            block0[i] = r;
+            block1[i] = r;
+        }
+        call_ref(block0);
+        call_new(block1);
+        if (int16_cmp_off_by_n(block0, block1, 64, 2))
+            fail();
+        bench_new(block1);
+    }
+}
+
+void checkasm_check_fdctdsp(void)
+{
+    check_fdct();
+    report("fdctdsp");
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index 3b5b867a97..10a42f2f9d 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -8,6 +8,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp \
                 fate-checkasm-blockdsp \
                 fate-checkasm-bswapdsp \
                 fate-checkasm-exrdsp \
+                fate-checkasm-fdctdsp \
                 fate-checkasm-fixed_dsp \
                 fate-checkasm-flacdsp \
                 fate-checkasm-float_dsp \
-- 
2.30.2
[FFmpeg-devel] [PATCH v3 0/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
This patch set adds a checkasm test for fdct and a neon-optimized fdct for
aarch64.

Ramiro Polla (2):
  checkasm: add test for fdct
  lavc/aarch64/fdct: add neon-optimized fdct for aarch64

 libavcodec/aarch64/Makefile               |   2 +
 libavcodec/aarch64/fdct.h                 |  26 ++
 libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
 libavcodec/aarch64/fdctdsp_neon.S         | 368 ++
 libavcodec/avcodec.h                      |   1 +
 libavcodec/fdctdsp.c                      |   4 +-
 libavcodec/fdctdsp.h                      |   2 +
 libavcodec/options_table.h                |   1 +
 libavcodec/tests/aarch64/dct.c            |   2 +
 tests/checkasm/Makefile                   |   1 +
 tests/checkasm/checkasm.c                 |   3 +
 tests/checkasm/checkasm.h                 |   1 +
 tests/checkasm/fdctdsp.c                  |  68
 tests/fate/checkasm.mak                   |   1 +
 14 files changed, 518 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/fdct.h
 create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
 create mode 100644 tests/checkasm/fdctdsp.c

-- 
2.30.2
[FFmpeg-devel] [PATCH v2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
The code is imported from libjpeg-turbo-3.0.1. The neon registers used have been changed to avoid modifying v8-v15. --- libavcodec/aarch64/Makefile | 2 + libavcodec/aarch64/fdct.h | 26 ++ libavcodec/aarch64/fdctdsp_init_aarch64.c | 39 +++ libavcodec/aarch64/fdctdsp_neon.S | 368 ++ libavcodec/avcodec.h | 1 + libavcodec/fdctdsp.c | 4 +- libavcodec/fdctdsp.h | 2 + libavcodec/options_table.h| 1 + libavcodec/tests/aarch64/dct.c| 2 + tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 + tests/checkasm/checkasm.h | 1 + tests/checkasm/fdctdsp.c | 68 tests/fate/checkasm.mak | 1 + 14 files changed, 518 insertions(+), 1 deletion(-) create mode 100644 libavcodec/aarch64/fdct.h create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c create mode 100644 libavcodec/aarch64/fdctdsp_neon.S create mode 100644 tests/checkasm/fdctdsp.c diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile index 95ad4dd202..a3256bb1cc 100644 --- a/libavcodec/aarch64/Makefile +++ b/libavcodec/aarch64/Makefile @@ -1,5 +1,6 @@ # subsystems OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_init_aarch64.o +OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_init_aarch64.o OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o @@ -37,6 +38,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP) += aarch64/videodsp.o # subsystems NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o NEON-OBJS-$(CONFIG_AC3DSP) += aarch64/ac3dsp_neon.o +NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o NEON-OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_neon.o NEON-OBJS-$(CONFIG_H264CHROMA) += aarch64/h264cmc_neon.o NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o \ diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h new file mode 100644 index 00..0901b53a83 --- /dev/null +++ b/libavcodec/aarch64/fdct.h @@ -0,0 +1,26 @@ +/* + * This file is part of FFmpeg. 
+ * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVCODEC_AARCH64_FDCT_H +#define AVCODEC_AARCH64_FDCT_H + +#include + +void ff_fdct_neon(int16_t *block); + +#endif /* AVCODEC_AARCH64_FDCT_H */ diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c b/libavcodec/aarch64/fdctdsp_init_aarch64.c new file mode 100644 index 00..59d91bc8fc --- /dev/null +++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c @@ -0,0 +1,39 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/attributes.h" +#include "libavutil/cpu.h" +#include "libavutil/aarch64/cpu.h" +#include "libavcodec/avcodec.h" +#include "libavcodec/fdctdsp.h" +#include "fdct.h" + +av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext *avctx, + unsigned high_bit_depth) +{ +int cpu_flags = av_get_cpu_flags(); + +if (have_neon(cpu_flags)) { +if (!high_bit_depth) { +if (avctx->dct_algo == FF_DCT_AUTO || +avctx->dct_algo == FF_DCT_NEON) { +c->fdct = ff_fdct_neon; +} +} +} +} diff --git
Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
Hi,

On Wed, Feb 14, 2024 at 10:42 AM Martin Storsjö wrote:
> On Sun, 4 Feb 2024, Ramiro Polla wrote:
>
> > The code is imported from libjpeg-turbo-3.0.1. The neon registers used
> > have been changed to avoid modifying v8-v15.
> > ---
>
> I don't remember if we have any extra routines we need to do if importing
> foreign code with a differing license. The license here seems fine in any
> case though.

I think the license should be ok (based on the "Patches/Committing"
section in developer.texi).

> This seems to work fine in all my test environments. And thanks for making
> sure it doesn't use v8-v15!
>
> I'm not so familiar with these DSP functions, whether it is norm to add a
> new constant like FF_DCT_NEON, but I guess it seems to match the pattern
> of the existing code.

I don't know either, so I just tried to match the existing code :)

> I presume the main case that tests this is "make fate-dct8x8", which
> builds and executes libavcodec/tests/dct? How much work would it be to
> integrate testing of these routines into checkasm? That way we could rest
> assured that the assembly passes all such ABI checks that we do there,
> including what registers must not be clobbered.

I added checkasm for fdct. It's especially useful to make sure there is
no overflow in the DC coefficient.

> The assembly uses a different indentation width than the rest of our
> assembly. I recently spent some effort on cleaning that up so that our
> code is mostly consistent, so I'd prefer not to add new code that deviates
> from it. It primarily looks like you'd need to add 4 spaces at the start
> of each line.
>
> I've used a script for mostly automatically reindenting our arm assembly,
> you can grab it at https://martin.st/temp/ffmpeg-asm-indent.pl, run it as
> "cat file.S | ./ffmpeg-asm-indent.pl > tmp; mv tmp file.S". It's not 100%
> accurate, but mostly gets you there, but it's good to manually check it
> afterwards as well.

I fixed the indentation and tweaked a few more cosmetics in the
comments.

Thank you for the review and the help on IRC! I'll send v2 shortly.

Ramiro
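The remark above about overflow in the DC coefficient can be made concrete
with a small bound check. The sketch below rests on a stated assumption that
is not confirmed anywhere in this thread: that the DC output of the transform
equals the plain sum of the 64 input samples, and that the inputs are
unsigned 8-bit values, as in the checkasm test. The helper names dc_sum() and
fits_int16() are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Sum of all 64 samples, accumulated in a wide type so it cannot wrap.
 * Assumption: the DC term of the transform equals this plain sum. */
static int32_t dc_sum(const uint8_t block[64])
{
    int32_t sum = 0;
    for (int i = 0; i < 64; i++)
        sum += block[i];
    return sum;
}

/* Nonzero if the value can be stored in an int16_t coefficient. */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

Under that assumption, the worst case is a block where every sample is 255,
which gives 64 * 255 = 16320 and still fits in an int16_t. A transform with a
larger output scale, or wider intermediate sums, would have less headroom,
which is the kind of case a test on extreme inputs is meant to catch.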
Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
ping On Sun, Feb 4, 2024 at 3:42 PM Ramiro Polla wrote: > > The code is imported from libjpeg-turbo-3.0.1. The neon registers used > have been changed to avoid modifying v8-v15. > --- > libavcodec/aarch64/Makefile | 2 + > libavcodec/aarch64/fdct.h | 26 ++ > libavcodec/aarch64/fdctdsp_init_aarch64.c | 39 +++ > libavcodec/aarch64/fdctdsp_neon.S | 369 ++ > libavcodec/avcodec.h | 1 + > libavcodec/fdctdsp.c | 4 +- > libavcodec/fdctdsp.h | 2 + > libavcodec/options_table.h| 1 + > libavcodec/tests/aarch64/dct.c| 2 + > 9 files changed, 445 insertions(+), 1 deletion(-) > create mode 100644 libavcodec/aarch64/fdct.h > create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c > create mode 100644 libavcodec/aarch64/fdctdsp_neon.S > > diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile > index beb6a02f5f..eebccbe4a5 100644 > --- a/libavcodec/aarch64/Makefile > +++ b/libavcodec/aarch64/Makefile > @@ -1,4 +1,5 @@ > # subsystems > +OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_init_aarch64.o > OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o > OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o > OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o > @@ -35,6 +36,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP) += > aarch64/videodsp.o > > # subsystems > NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o > +NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o > NEON-OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_neon.o > NEON-OBJS-$(CONFIG_H264CHROMA) += aarch64/h264cmc_neon.o > NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o > \ > diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h > new file mode 100644 > index 00..0901b53a83 > --- /dev/null > +++ b/libavcodec/aarch64/fdct.h > @@ -0,0 +1,26 @@ > +/* > + * This file is part of FFmpeg. 
> + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 > USA > + */ > + > +#ifndef AVCODEC_AARCH64_FDCT_H > +#define AVCODEC_AARCH64_FDCT_H > + > +#include > + > +void ff_fdct_neon(int16_t *block); > + > +#endif /* AVCODEC_AARCH64_FDCT_H */ > diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c > b/libavcodec/aarch64/fdctdsp_init_aarch64.c > new file mode 100644 > index 00..59d91bc8fc > --- /dev/null > +++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c > @@ -0,0 +1,39 @@ > +/* > + * This file is part of FFmpeg. > + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. 
> + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 > USA > + */ > + > +#include "libavutil/attributes.h" > +#include "libavutil/cpu.h" > +#include "libavutil/aarch64/cpu.h" > +#include "libavcodec/avcodec.h" > +#include "libavcodec/fdctdsp.h" > +#include "fdct.h" > + > +av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext > *avctx, > + unsigned high_bit_depth) > +{ > +int cpu_flags = av_get_cpu_flags(); > + > +if (have_neon(cpu_flags)) { > +if (
[FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
The code is imported from libjpeg-turbo-3.0.1. The neon registers used have been changed to avoid modifying v8-v15. --- libavcodec/aarch64/Makefile | 2 + libavcodec/aarch64/fdct.h | 26 ++ libavcodec/aarch64/fdctdsp_init_aarch64.c | 39 +++ libavcodec/aarch64/fdctdsp_neon.S | 369 ++ libavcodec/avcodec.h | 1 + libavcodec/fdctdsp.c | 4 +- libavcodec/fdctdsp.h | 2 + libavcodec/options_table.h| 1 + libavcodec/tests/aarch64/dct.c| 2 + 9 files changed, 445 insertions(+), 1 deletion(-) create mode 100644 libavcodec/aarch64/fdct.h create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c create mode 100644 libavcodec/aarch64/fdctdsp_neon.S diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile index beb6a02f5f..eebccbe4a5 100644 --- a/libavcodec/aarch64/Makefile +++ b/libavcodec/aarch64/Makefile @@ -1,4 +1,5 @@ # subsystems +OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_init_aarch64.o OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_init.o OBJS-$(CONFIG_H264CHROMA) += aarch64/h264chroma_init_aarch64.o OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_init_aarch64.o @@ -35,6 +36,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP) += aarch64/videodsp.o # subsystems NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o +NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o NEON-OBJS-$(CONFIG_FMTCONVERT) += aarch64/fmtconvert_neon.o NEON-OBJS-$(CONFIG_H264CHROMA) += aarch64/h264cmc_neon.o NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o \ diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h new file mode 100644 index 00..0901b53a83 --- /dev/null +++ b/libavcodec/aarch64/fdct.h @@ -0,0 +1,26 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVCODEC_AARCH64_FDCT_H +#define AVCODEC_AARCH64_FDCT_H + +#include + +void ff_fdct_neon(int16_t *block); + +#endif /* AVCODEC_AARCH64_FDCT_H */ diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c b/libavcodec/aarch64/fdctdsp_init_aarch64.c new file mode 100644 index 00..59d91bc8fc --- /dev/null +++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c @@ -0,0 +1,39 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/attributes.h" +#include "libavutil/cpu.h" +#include "libavutil/aarch64/cpu.h" +#include "libavcodec/avcodec.h" +#include "libavcodec/fdctdsp.h" +#include "fdct.h" + +av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext *avctx, + unsigned high_bit_depth) +{ +int cpu_flags = av_get_cpu_flags(); + +if (have_neon(cpu_flags)) { +if (!high_bit_depth) { +if (avctx->dct_algo == FF_DCT_AUTO || +avctx->dct_algo == FF_DCT_NEON) { +c->fdct = ff_fdct_neon; +} +} +} +} diff --git a/libavcodec/aarch64/fdctdsp_neon.S b/libavcodec/aarch64/fdctdsp_neon.S new file mode 100644 index 00..978c8d3002 --- /dev/null +++ b/libavcodec/aarch64/fdctdsp_neon.S @@ -0,0 +1,369 @@ +/* + * Armv8 Neon optimizations for libjpeg-turbo + * + * Copyright (C) 2009-2011, Nokia Corporation and/or its subsidiary(-ies). + * All Rights Reserved. + * Author: Siarhei Siamashka + * Copyright (C) 2013-2014, Linaro Limited. All Rights
[FFmpeg-devel] [PATCH] lavc/aarch64: fix include for cpu.h
---
 libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c b/libavcodec/aarch64/idctdsp_init_aarch64.c
index eec21aa5a2..8efd5f5323 100644
--- a/libavcodec/aarch64/idctdsp_init_aarch64.c
+++ b/libavcodec/aarch64/idctdsp_init_aarch64.c
@@ -22,7 +22,7 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
-#include "libavutil/arm/cpu.h"
+#include "libavutil/aarch64/cpu.h"
 #include "libavcodec/avcodec.h"
 #include "libavcodec/idctdsp.h"
 #include "idct.h"
-- 
2.30.2
Re: [FFmpeg-devel] [PATCH 2/2] avcodec/motion_est: fix penalty_factor for b frames
On Mon, Mar 23, 2020 at 10:42 PM Michael Niedermayer wrote:
> On Sun, Mar 22, 2020 at 04:55:25PM +0100, Ramiro Polla wrote:
> > In ff_estimate_b_frame_motion(), penalty_factor would be used before
> > being initialized in estimate_motion_b(). Also, the initialization
> > would happen more than once unnecessarily.
> > ---
> >  libavcodec/motion_est.c                  | 15 ---
> >  tests/ref/vsynth/vsynth2-mpeg2-422       |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg4-adap      |  6 +++---
> >  4 files changed, 17 insertions(+), 16 deletions(-)
> >
> > diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
> > index 02c75fd470..1feb46cec3 100644
> > --- a/libavcodec/motion_est.c
> > +++ b/libavcodec/motion_est.c
> > @@ -1123,9 +1123,6 @@ static int estimate_motion_b(MpegEncContext *s, int
> > mb_x, int mb_y,
> >      uint8_t * const mv_penalty= c->mv_penalty[f_code] + MAX_DMV;
> >      int mv_scale;
> >
> > -    c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2,
> > c->avctx->me_cmp);
> > -    c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2,
> > c->avctx->me_sub_cmp);
> > -    c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2,
> > c->avctx->mb_cmp);
> >      c->current_mv_penalty= mv_penalty;
> >
> >      get_limits(s, 16*mb_x, 16*mb_y);
> >
>
> > @@ -1491,7 +1488,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
> >                               int mb_x, int mb_y)
> >  {
> >      MotionEstContext * const c= &s->me;
> > -    const int penalty_factor= c->mb_penalty_factor;
> >      int fmin, bmin, dmin, fbmin, bimin, fimin;
> >      int type=0;
> >      const int xy = mb_y*s->mb_stride + mb_x;
> > @@ -1517,18 +1513,23 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
> >          dmin= direct_search(s, mb_x, mb_y);
> >      else
> >          dmin= INT_MAX;
> > +
> > +    c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2,
> > c->avctx->me_cmp);
> > +    c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2,
> > c->avctx->me_sub_cmp);
> > +    c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2,
> > c->avctx->mb_cmp);
>
> If mb_penalty_factor isnt correct in this before this maybe isnt enough
> as the direct_search() uses mb_penalty_factor

Fixed. New patch attached.

From 8feded1143715b064c8556a460feb86394b86acd Mon Sep 17 00:00:00 2001
From: Ramiro Polla
Date: Sun, 22 Mar 2020 16:45:05 +0100
Subject: [PATCH] avcodec/motion_est: fix penalty_factor for b frames

In direct_search() and ff_estimate_b_frame_motion(), penalty_factor
would be used before being initialized in estimate_motion_b(). Also, the
initialization would happen more than once unnecessarily.
---
 libavcodec/motion_est.c                  | 15 ---
 tests/ref/vsynth/vsynth1-mpeg4-thread    |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg2-422       |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-adap      |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-qprd      |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-thread    |  6 +++---
 tests/ref/vsynth/vsynth_lena-mpeg4-rc    |  4 ++--
 8 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 02c75fd470..520a57d4d9 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -1123,9 +1123,6 @@ static int estimate_motion_b(MpegEncContext *s, int mb_x, int mb_y,
     uint8_t * const mv_penalty= c->mv_penalty[f_code] + MAX_DMV;
     int mv_scale;
 
-    c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
-    c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp);
-    c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp);
     c->current_mv_penalty= mv_penalty;
 
     get_limits(s, 16*mb_x, 16*mb_y);
@@ -1491,7 +1488,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
                              int mb_x, int mb_y)
 {
     MotionEstContext * const c= &s->me;
-    const int penalty_factor= c->mb_penalty_factor;
     int fmin, bmin, dmin, fbmin, bimin, fimin;
     int type=0;
     const int xy = mb_y*s->mb_stride + mb_x;
@@ -1513,22 +1509,27 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
         return;
     }
 
+    c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
+    c->sub_penalty_factor
Re: [FFmpeg-devel] [PATCH] MAINTAINERS: add my gpg fingerprint
Hi Michael, On Mon, Mar 23, 2020 at 8:44 PM Michael Niedermayer wrote: > > On Mon, Mar 23, 2020 at 04:11:04AM +0100, Ramiro Polla wrote: > > --- > > MAINTAINERS | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index f9810d5594..9238a1a762 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -614,6 +614,7 @@ Nikolay Aleksandrov 8978 1D8C FB71 588E 4B27 > > EAA8 C4F0 B5FC E011 13B1 > > Panagiotis Issaris6571 13A3 33D9 3726 F728 AA98 F643 B12E ECF3 > > E029 > > Peter RossA907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 > > DD6B > > Philip Langdale 5DC5 8D66 5FBA 3A43 18EC 045E F8D6 B194 6A75 > > 682E > > +Ramiro Polla 1E0D 3820 ACCB 36AF 97B4 F18C 648E 2B0A E905 > > E26A > > Reimar Doeffinger C61D 16E5 9E2C D10C 8958 38A4 0899 A2B9 06D4 > > D9C7 > > Reinhard Tartler 9300 5DC2 7E87 6C37 ED7B CA9A 9808 3544 9453 > > 48A4 > > Reynaldo H. Verdejo Pinochet 6E27 CD34 170C C78E 4D4F 5F40 C18E 077F 3114 > > 452A > > iam unable to find a matching key on the keyserver > > gpg --search-key "ramiro polla" > gpg: searching for "ramiro polla" from hkp server keys.gnupg.net > (1) Ramiro Polla > 2048 bit RSA key 9B6C5700, created: 2014-09-23 > (2) Ramiro Polla > 1024 bit DSA key 25E635F9, created: 2010-01-08 > pub 2048R/9B6C5700 2014-09-23 > Key fingerprint = 7859 C65B 751B 1179 792E DAE8 8E95 8B2F 9B6C 5700 > uid Ramiro Polla > sub 2048R/CAF28B6D 2014-09-23 Sorry, I have very little experience with this. That's an old key, but I found it now. New patch attached with the 2014 fingerprint (also attached the patch signed with that key just because why not...). Hopefully this is good now. 
0001-MAINTAINERS-add-my-gpg-fingerprint.patch.sig
Description: PGP signature

From 3c1603bd0c698fc9957c5bda718c176e6084ee2c Mon Sep 17 00:00:00 2001
From: Ramiro Polla
Date: Mon, 23 Mar 2020 04:02:25 +0100
Subject: [PATCH] MAINTAINERS: add my gpg fingerprint

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f9810d5594..e19d1ee586 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -614,6 +614,7 @@ Nikolay Aleksandrov           8978 1D8C FB71 588E 4B27 EAA8 C4F0 B5FC E011 13B1
 Panagiotis Issaris            6571 13A3 33D9 3726 F728 AA98 F643 B12E ECF3 E029
 Peter Ross                    A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B
 Philip Langdale               5DC5 8D66 5FBA 3A43 18EC 045E F8D6 B194 6A75 682E
+Ramiro Polla                  7859 C65B 751B 1179 792E DAE8 8E95 8B2F 9B6C 5700
 Reimar Doeffinger             C61D 16E5 9E2C D10C 8958 38A4 0899 A2B9 06D4 D9C7
 Reinhard Tartler              9300 5DC2 7E87 6C37 ED7B CA9A 9808 3544 9453 48A4
 Reynaldo H. Verdejo Pinochet  6E27 CD34 170C C78E 4D4F 5F40 C18E 077F 3114 452A
-- 
2.11.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH] MAINTAINERS: add my gpg fingerprint
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f9810d5594..9238a1a762 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -614,6 +614,7 @@ Nikolay Aleksandrov           8978 1D8C FB71 588E 4B27 EAA8 C4F0 B5FC E011 13B1
 Panagiotis Issaris            6571 13A3 33D9 3726 F728 AA98 F643 B12E ECF3 E029
 Peter Ross                    A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B
 Philip Langdale               5DC5 8D66 5FBA 3A43 18EC 045E F8D6 B194 6A75 682E
+Ramiro Polla                  1E0D 3820 ACCB 36AF 97B4 F18C 648E 2B0A E905 E26A
 Reimar Doeffinger             C61D 16E5 9E2C D10C 8958 38A4 0899 A2B9 06D4 D9C7
 Reinhard Tartler              9300 5DC2 7E87 6C37 ED7B CA9A 9808 3544 9453 48A4
 Reynaldo H. Verdejo Pinochet  6E27 CD34 170C C78E 4D4F 5F40 C18E 077F 3114 452A
-- 
2.11.0
[FFmpeg-devel] [PATCH 1/2] avcodec/mpegvideo_enc: fix multi-threaded motion estimation rounding for mpeg4
ff_init_me() was being called after ff_update_duplicate_context(), which
caused the propagation of the initialization to other thread contexts to be
delayed by one frame. In the case of mpeg4 (or flipflop_rounding), this would
make the hpel_put functions differ between the first thread (which would be
correctly initialized) and the other threads (which would be stale from the
previous frame).
---
 libavcodec/mpegvideo_enc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index b2eb9cf318..8c2672f76a 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -3699,6 +3699,9 @@ static int encode_picture(MpegEncContext *s, int picture_number)
         s->q_chroma_intra_matrix16 = s->q_intra_matrix16;
     }
 
+    if(ff_init_me(s)<0)
+        return -1;
+
     s->mb_intra=0; //for the rate distortion & bit compare functions
     for(i=1; i<context_count; i++){
         ret = ff_update_duplicate_context(s->thread_context[i], s);
@@ -3706,9 +3709,6 @@ static int encode_picture(MpegEncContext *s, int picture_number)
             return ret;
     }
 
-    if(ff_init_me(s)<0)
-        return -1;
-
     /* Estimate motion for every MB */
     if(s->pict_type != AV_PICTURE_TYPE_I){
         s->lambda = (s->lambda * s->me_penalty_compensation + 128) >> 8;
-- 
2.11.0
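The ordering problem described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyCtx, toy_init_me() and toy_update_duplicate() are hypothetical stand-ins, not the real MpegEncContext, ff_init_me() or ff_update_duplicate_context(), and a single integer stands in for whatever state selects the hpel_put functions.

```c
/* Hypothetical stand-in for a per-thread encoder context. */
typedef struct ToyCtx {
    int rounding; /* stands in for state that selects the hpel_put functions */
} ToyCtx;

/* Stand-in for the per-frame motion estimation init on the main context. */
static void toy_init_me(ToyCtx *main_ctx, int frame_rounding)
{
    main_ctx->rounding = frame_rounding;
}

/* Stand-in for propagating the main context to a worker thread context. */
static void toy_update_duplicate(ToyCtx *worker, const ToyCtx *main_ctx)
{
    *worker = *main_ctx;
}

/* Old order: copy first, init afterwards. The worker keeps the state that
 * the main context had before this frame's init. */
static int worker_rounding_copy_then_init(ToyCtx *main_ctx, ToyCtx *worker,
                                          int frame_rounding)
{
    toy_update_duplicate(worker, main_ctx);
    toy_init_me(main_ctx, frame_rounding);
    return worker->rounding;
}

/* New order: init first, then copy. The worker sees this frame's state. */
static int worker_rounding_init_then_copy(ToyCtx *main_ctx, ToyCtx *worker,
                                          int frame_rounding)
{
    toy_init_me(main_ctx, frame_rounding);
    toy_update_duplicate(worker, main_ctx);
    return worker->rounding;
}
```

With both contexts starting at 0 and a frame value of 1, the first helper leaves the worker at 0 while the main context is at 1, which is the one-frame lag the message describes. The second helper leaves both at 1.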
[FFmpeg-devel] [PATCH 2/2] avcodec/motion_est: fix penalty_factor for b frames
In ff_estimate_b_frame_motion(), penalty_factor would be used before being initialized in estimate_motion_b(). Also, the initialization would happen more than once unnecessarily. --- libavcodec/motion_est.c | 15 --- tests/ref/vsynth/vsynth2-mpeg2-422 | 6 +++--- tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd | 6 +++--- tests/ref/vsynth/vsynth2-mpeg4-adap | 6 +++--- 4 files changed, 17 insertions(+), 16 deletions(-) diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c index 02c75fd470..1feb46cec3 100644 --- a/libavcodec/motion_est.c +++ b/libavcodec/motion_est.c @@ -1123,9 +1123,6 @@ static int estimate_motion_b(MpegEncContext *s, int mb_x, int mb_y, uint8_t * const mv_penalty= c->mv_penalty[f_code] + MAX_DMV; int mv_scale; -c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp); -c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp); -c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp); c->current_mv_penalty= mv_penalty; get_limits(s, 16*mb_x, 16*mb_y); @@ -1491,7 +1488,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s, int mb_x, int mb_y) { MotionEstContext * const c= >me; -const int penalty_factor= c->mb_penalty_factor; int fmin, bmin, dmin, fbmin, bimin, fimin; int type=0; const int xy = mb_y*s->mb_stride + mb_x; @@ -1517,18 +1513,23 @@ void ff_estimate_b_frame_motion(MpegEncContext * s, dmin= direct_search(s, mb_x, mb_y); else dmin= INT_MAX; + +c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp); +c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp); +c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp); + // FIXME penalty stuff for non-MPEG-4 c->skip=0; fmin = estimate_motion_b(s, mb_x, mb_y, s->b_forw_mv_table, 0, s->f_code) + - 3 * penalty_factor; + 3 * c->mb_penalty_factor; c->skip=0; bmin = estimate_motion_b(s, mb_x, mb_y, s->b_back_mv_table, 2, s->b_code) + - 2 * 
penalty_factor; + 2 * c->mb_penalty_factor; ff_dlog(s, " %d %d ", s->b_forw_mv_table[xy][0], s->b_forw_mv_table[xy][1]); c->skip=0; -fbmin= bidir_refine(s, mb_x, mb_y) + penalty_factor; +fbmin= bidir_refine(s, mb_x, mb_y) + c->mb_penalty_factor; ff_dlog(s, "%d %d %d %d\n", dmin, fmin, bmin, fbmin); if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) { diff --git a/tests/ref/vsynth/vsynth2-mpeg2-422 b/tests/ref/vsynth/vsynth2-mpeg2-422 index ec7244f9f9..e945a4cc0e 100644 --- a/tests/ref/vsynth/vsynth2-mpeg2-422 +++ b/tests/ref/vsynth/vsynth2-mpeg2-422 @@ -1,4 +1,4 @@ -b2fa9b73c3547191ecc01b8163abd4e5 *tests/data/fate/vsynth2-mpeg2-422.mpeg2video -379164 tests/data/fate/vsynth2-mpeg2-422.mpeg2video -704f6a96f93c2409219bd48b74169041 *tests/data/fate/vsynth2-mpeg2-422.out.rawvideo +6fc8dc1d76379e459051ca393101c090 *tests/data/fate/vsynth2-mpeg2-422.mpeg2video +379173 tests/data/fate/vsynth2-mpeg2-422.mpeg2video +9199d5aaa1709d2584e21e58d76d44fb *tests/data/fate/vsynth2-mpeg2-422.out.rawvideo stddev:4.17 PSNR: 35.73 MAXDIFF: 70 bytes: 7603200/ 7603200 diff --git a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd index 16de39edfc..f5bbecfcb2 100644 --- a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd +++ b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd @@ -1,4 +1,4 @@ -907a30295ed8323780eee08e606af0ab *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video -269722 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video -d2d9793bf8f3427b5cc17a1be78ddd64 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo +f612ea89aa79a7f7b93a8acf332705c4 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video +269723 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video +88e17886e6383755829d7da519fd5e79 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo stddev:5.54 PSNR: 33.25 MAXDIFF: 94 bytes: 7603200/ 7603200 diff --git a/tests/ref/vsynth/vsynth2-mpeg4-adap b/tests/ref/vsynth/vsynth2-mpeg4-adap index a3223f6363..1ae0a65e4f 100644 --- a/tests/ref/vsynth/vsynth2-mpeg4-adap +++ 
b/tests/ref/vsynth/vsynth2-mpeg4-adap @@ -1,4 +1,4 @@ -4bff98da2342836476da817428594403 *tests/data/fate/vsynth2-mpeg4-adap.avi -213508 tests/data/fate/vsynth2-mpeg4-adap.avi -0c709f2b81f4593eaa29490332c2cb39 *tests/data/fate/vsynth2-mpeg4-adap.out.rawvideo +fcb79c0dcc00b306b79c354e589b6b69 *tests/data/fate/vsynth2-mpeg4-adap.avi +213526 tests/data/fate/vsynth2-mpeg4-adap.avi +71a34a48a81485f938d2c60a3d34ed39 *tests/data/fate/vsynth2-mpeg4-adap.out.rawvideo stddev:4.87 PSNR: 34.36 MAXDIFF: 86 bytes: 7603200/ 7603200 -- 2.11.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
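For readers skimming the diff, the use-before-initialization pattern it removes can be reduced to a few lines. This is a hedged sketch, not the FFmpeg code: ToyME, toy_penalty_factor(), toy_estimate(), toy_fmin_old() and toy_fmin_new() are hypothetical names, the shift by 7 is an arbitrary choice for the toy, and the constant search cost of 100 is made up.

```c
/* Hypothetical, simplified context; not the real MotionEstContext. */
typedef struct ToyME {
    int mb_penalty_factor;
} ToyME;

/* Stand-in for get_penalty_factor(); the real function also depends on
 * the selected comparison function. */
static int toy_penalty_factor(int lambda)
{
    return lambda >> 7;
}

/* Stand-in for estimate_motion_b(). With init_inside set it behaves like
 * the pre-patch code and (re)initializes the factor on every call. */
static int toy_estimate(ToyME *c, int lambda, int init_inside)
{
    if (init_inside)
        c->mb_penalty_factor = toy_penalty_factor(lambda);
    return 100; /* pretend search cost */
}

/* Old flow: the factor is copied into a local before anything has set it
 * for this frame, so a stale value is added to the score. */
static int toy_fmin_old(ToyME *c, int lambda)
{
    const int penalty_factor = c->mb_penalty_factor; /* read too early */
    return toy_estimate(c, lambda, 1) + 3 * penalty_factor;
}

/* New flow: initialize once up front, then read from the context. */
static int toy_fmin_new(ToyME *c, int lambda)
{
    c->mb_penalty_factor = toy_penalty_factor(lambda);
    return toy_estimate(c, lambda, 0) + 3 * c->mb_penalty_factor;
}
```

Starting from a zeroed context with lambda 256, the old flow returns 100 (the penalty term is lost), while the new flow returns 106. A change of this kind in the scores is presumably why the patch also has to update vsynth reference files.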
Re: [FFmpeg-devel] [PATCH 1/2] avcodec/get_bits: cosmetics
On Tue, Nov 5, 2019 at 2:35 PM Michael Niedermayer wrote:
> On Tue, Nov 05, 2019 at 11:13:49AM +0100, Ramiro Polla wrote:
> >  libavcodec/get_bits.h | 8
> >  1 file changed, 4 insertions(+), 4 deletions(-)
>
> LGTM

ping
Re: [FFmpeg-devel] [PATCH 2/2] avcodec/wmadec: cosmetics
On Tue, Nov 5, 2019 at 2:35 PM Michael Niedermayer wrote:
> On Tue, Nov 05, 2019 at 11:13:50AM +0100, Ramiro Polla wrote:
> >  libavcodec/wmadec.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
>
> LGTM

ping
[FFmpeg-devel] [PATCH 1/2] avcodec/get_bits: cosmetics
--- libavcodec/get_bits.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h index c4ab607744..66fb877599 100644 --- a/libavcodec/get_bits.h +++ b/libavcodec/get_bits.h @@ -234,9 +234,9 @@ static inline void refill_32(GetBitContext *s, int is_le) #endif if (is_le) -s->cache = (uint64_t)AV_RL32(s->buffer + (s->index >> 3)) << s->bits_left | s->cache; +s->cache = (uint64_t)AV_RL32(s->buffer + (s->index >> 3)) << s->bits_left | s->cache; else -s->cache = s->cache | (uint64_t)AV_RB32(s->buffer + (s->index >> 3)) << (32 - s->bits_left); +s->cache = s->cache | (uint64_t)AV_RB32(s->buffer + (s->index >> 3)) << (32 - s->bits_left); s->index += 32; s->bits_left += 32; } @@ -249,9 +249,9 @@ static inline void refill_64(GetBitContext *s, int is_le) #endif if (is_le) -s->cache = AV_RL64(s->buffer + (s->index >> 3)); +s->cache = AV_RL64(s->buffer + (s->index >> 3)); else -s->cache = AV_RB64(s->buffer + (s->index >> 3)); +s->cache = AV_RB64(s->buffer + (s->index >> 3)); s->index += 64; s->bits_left = 64; } -- 2.11.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 2/2] avcodec/wmadec: cosmetics
---
 libavcodec/wmadec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/wmadec.c b/libavcodec/wmadec.c
index 78b51e5871..e7886262f3 100644
--- a/libavcodec/wmadec.c
+++ b/libavcodec/wmadec.c
@@ -889,11 +889,11 @@ static int wma_decode_superframe(AVCodecContext *avctx, void *data,
             q = s->last_superframe + s->last_superframe_len;
             len = bit_offset;
             while (len > 7) {
-                *q++ = (get_bits) (&s->gb, 8);
+                *q++ = get_bits(&s->gb, 8);
                 len -= 8;
             }
             if (len > 0)
-                *q++ = (get_bits) (&s->gb, len) << (8 - len);
+                *q++ = get_bits(&s->gb, len) << (8 - len);
             memset(q, 0, AV_INPUT_BUFFER_PADDING_SIZE);
 
             /* XXX: bit_offset bits into last frame */
-- 
2.11.0
Re: [FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames
On Sun, Jun 17, 2018 at 6:23 AM Ramiro Polla wrote: > On Sun, Jun 10, 2018 at 2:32 AM, Michael Niedermayer > wrote: > > On Sat, Jun 09, 2018 at 05:09:13PM +0200, Ramiro Polla wrote: > >> On Thu, May 10, 2018 at 11:01 PM, Michael Niedermayer > >> wrote: > >> > On Wed, May 09, 2018 at 08:44:25PM +0200, Ramiro Polla wrote: > >> >> This option prevents the mpv encoders from using intra macroblocks in > >> >> predictive frames. > >> >> > >> >> It is useful for glitch artists to generate input material. This option > >> >> allows them to split and merge two video files while maintaining fluid > >> >> motion from the second video without having intra macroblocks restoring > >> >> chunks of the first video. > >> > > >> > maybe a continuous variable like snows intra_penalty could achieve this > >> > too but give more flexibility in doing it also just partially if wanted > >> > >> I like this idea better. I wanted a simple way to be able to entirely > >> disable intra macroblocks, but "-intra_penalty max" could cause an > >> overflow, so I set the max value to INT_MAX/2. > >> > >> New patch attached. > > > > LGTM > > > > a fate test may also make sense > > I sent a new patch set that includes a fate test. The patchset with test that I had sent involved some changes to ffprobe/fate that weren't good. I gave up trying to add tests in a clean way. Here's just the previous LGTM'd patch, rebased against git master. Ramiro From e30abdeed04aa36e9f80ea54c891ee32b888d95c Mon Sep 17 00:00:00 2001 From: Ramiro Polla Date: Wed, 23 Oct 2019 21:12:32 +0200 Subject: [PATCH] mpegvideo_enc: add intra_penalty option for p frames This option allows more control over the use of intra macroblocks in predictive frames. By using '-intra_penalty max', intra macroblocks are never used in predictive frames. It is useful for glitch artists to generate input material. 
This option allows them to split and merge two video files while maintaining fluid motion from the second video without having intra macroblocks restoring chunks of the first video. --- libavcodec/motion_est.c| 10 +- libavcodec/motion_est.h| 2 +- libavcodec/mpegvideo.h | 3 +++ libavcodec/mpegvideo_enc.c | 6 +++--- libavcodec/svq1enc.c | 2 +- 5 files changed, 13 insertions(+), 10 deletions(-) diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c index 759eea479d..02c75fd470 100644 --- a/libavcodec/motion_est.c +++ b/libavcodec/motion_est.c @@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s, int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20; c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score); -if (vard*2 + 200*256 > varc) +if (vard*2 + 200*256 > varc && !s->intra_penalty) mb_type|= CANDIDATE_MB_TYPE_INTRA; if (varc*2 + 200*256 > vard || s->qscale > 24){ //if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){ @@ -1040,7 +1040,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s, intra_score= s->mecc.mb_cmp[0](s, c->scratchpad, pix, s->linesize, 16); } -intra_score += c->mb_penalty_factor*16; +intra_score += c->mb_penalty_factor*16 + s->intra_penalty; if(intra_score < dmin){ mb_type= CANDIDATE_MB_TYPE_INTRA; @@ -1648,7 +1648,7 @@ int ff_get_best_fcode(MpegEncContext * s, int16_t (*mv_table)[2], int type) } } -void ff_fix_long_p_mvs(MpegEncContext * s) +void ff_fix_long_p_mvs(MpegEncContext * s, int type) { MotionEstContext * const c= >me; const int f_code= s->f_code; @@ -1682,8 +1682,8 @@ void ff_fix_long_p_mvs(MpegEncContext * s) if( mx >=range || mx <-range || my >=range || my <-range){ s->mb_type[i] &= ~CANDIDATE_MB_TYPE_INTER4V; -s->mb_type[i] |= CANDIDATE_MB_TYPE_INTRA; -s->current_picture.mb_type[i] = CANDIDATE_MB_TYPE_INTRA; +s->mb_type[i] |= type; +s->current_picture.mb_type[i] = type; } } } diff --git a/libavcodec/motion_est.h b/libavcodec/motion_est.h index 3b3a8d7341..817220f340 100644 --- 
a/libavcodec/motion_est.h +++ b/libavcodec/motion_est.h @@ -127,7 +127,7 @@ int ff_get_mb_score(struct MpegEncContext *s, int mx, int my, int src_index, int ff_get_best_fcode(struct MpegEncContext *s, int16_t (*mv_table)[2], int type); -void ff_fix_long_p_mvs(struct MpegEncCo
[FFmpeg-devel] [PATCH 4/4] mpegvideo_enc: add intra_penalty option for p frames
This option allows more control over the use of intra macroblocks in predictive frames. By using '-intra_penalty max', intra macroblocks are never used in predictive frames. It is useful for glitch artists to generate input material. This option allows them to split and merge two video files while maintaining fluid motion from the second video without having intra macroblocks restoring chunks of the first video. --- libavcodec/motion_est.c | 10 libavcodec/motion_est.h | 2 +- libavcodec/mpegvideo.h| 3 +++ libavcodec/mpegvideo_enc.c| 6 ++--- libavcodec/svq1enc.c | 2 +- tests/fate-run.sh | 8 +++ tests/fate/mpeg4.mak | 5 tests/fate/seek.mak | 1 + tests/fate/vcodec.mak | 4 tests/ref/fate/mpeg4-nopimb | 1 + tests/ref/seek/vsynth_lena-mpeg4-nopimb | 40 +++ tests/ref/vsynth/vsynth1-mpeg4-nopimb | 4 tests/ref/vsynth/vsynth2-mpeg4-nopimb | 4 tests/ref/vsynth/vsynth3-mpeg4-nopimb | 4 tests/ref/vsynth/vsynth_lena-mpeg4-nopimb | 4 15 files changed, 88 insertions(+), 10 deletions(-) create mode 100644 tests/ref/fate/mpeg4-nopimb create mode 100644 tests/ref/seek/vsynth_lena-mpeg4-nopimb create mode 100644 tests/ref/vsynth/vsynth1-mpeg4-nopimb create mode 100644 tests/ref/vsynth/vsynth2-mpeg4-nopimb create mode 100644 tests/ref/vsynth/vsynth3-mpeg4-nopimb create mode 100644 tests/ref/vsynth/vsynth_lena-mpeg4-nopimb diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c index 8b5ce2117a..fa750e39ec 100644 --- a/libavcodec/motion_est.c +++ b/libavcodec/motion_est.c @@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s, int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20; c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score); -if (vard*2 + 200*256 > varc) +if (vard*2 + 200*256 > varc && !s->intra_penalty) mb_type|= CANDIDATE_MB_TYPE_INTRA; if (varc*2 + 200*256 > vard || s->qscale > 24){ //if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){ @@ -1040,7 +1040,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s, intra_score= 
s->mecc.mb_cmp[0](s, c->scratchpad, pix, s->linesize, 16); } -intra_score += c->mb_penalty_factor*16; +intra_score += c->mb_penalty_factor*16 + s->intra_penalty; if(intra_score < dmin){ mb_type= CANDIDATE_MB_TYPE_INTRA; @@ -1648,7 +1648,7 @@ int ff_get_best_fcode(MpegEncContext * s, int16_t (*mv_table)[2], int type) } } -void ff_fix_long_p_mvs(MpegEncContext * s) +void ff_fix_long_p_mvs(MpegEncContext * s, int type) { MotionEstContext * const c= >me; const int f_code= s->f_code; @@ -1682,8 +1682,8 @@ void ff_fix_long_p_mvs(MpegEncContext * s) if( mx >=range || mx <-range || my >=range || my <-range){ s->mb_type[i] &= ~CANDIDATE_MB_TYPE_INTER4V; -s->mb_type[i] |= CANDIDATE_MB_TYPE_INTRA; -s->current_picture.mb_type[i] = CANDIDATE_MB_TYPE_INTRA; +s->mb_type[i] |= type; +s->current_picture.mb_type[i] = type; } } } diff --git a/libavcodec/motion_est.h b/libavcodec/motion_est.h index 3b3a8d7341..817220f340 100644 --- a/libavcodec/motion_est.h +++ b/libavcodec/motion_est.h @@ -127,7 +127,7 @@ int ff_get_mb_score(struct MpegEncContext *s, int mx, int my, int src_index, int ff_get_best_fcode(struct MpegEncContext *s, int16_t (*mv_table)[2], int type); -void ff_fix_long_p_mvs(struct MpegEncContext *s); +void ff_fix_long_p_mvs(struct MpegEncContext *s, int type); void ff_fix_long_mvs(struct MpegEncContext *s, uint8_t *field_select_table, int field_select, int16_t (*mv_table)[2], int f_code, int type, int truncate); diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h index e16deb64e7..7eda962ba7 100644 --- a/libavcodec/mpegvideo.h +++ b/libavcodec/mpegvideo.h @@ -577,6 +577,8 @@ typedef struct MpegEncContext { int scenechange_threshold; int noise_reduction; + +int intra_penalty; } MpegEncContext; /* mpegvideo_enc common options */ @@ -661,6 +663,7 @@ FF_MPV_OPT_CMP_FUNC, \ {"ps", "RTP payload size in bytes", FF_MPV_OFFSET(rtp_payload_size), AV_OPT_TYPE_INT, {.i64 = 0 }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS }, \ {"mepc", "Motion estimation bitrate penalty compensation 
(1.0 = 256)", FF_MPV_OFFSET(me_penalty_compensation), AV_OPT_TYPE_INT, {.i64 = 256 }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS }, \ {"mepre", "pre motion
[FFmpeg-devel] [PATCH 2/4] mpegutils: split debug function that prints mb_type so it may be used by ffprobe
--- libavcodec/mpegutils.c | 115 + libavcodec/mpegutils.h | 7 +++ 2 files changed, 76 insertions(+), 46 deletions(-) diff --git a/libavcodec/mpegutils.c b/libavcodec/mpegutils.c index 0fbe5f8c9d..12c2468797 100644 --- a/libavcodec/mpegutils.c +++ b/libavcodec/mpegutils.c @@ -100,6 +100,72 @@ void ff_draw_horiz_band(AVCodecContext *avctx, } } +int ff_mb_type_str(char *str, int size, int mb_type) +{ +char *ptr = str; + +if (size <= 0) +return 0; + +if (--size <= 0) +goto end; + +// Type & MV direction +if (IS_PCM(mb_type)) +*ptr++ = 'P'; +else if (IS_INTRA(mb_type) && IS_ACPRED(mb_type)) +*ptr++ = 'A'; +else if (IS_INTRA4x4(mb_type)) +*ptr++ = 'i'; +else if (IS_INTRA16x16(mb_type)) +*ptr++ = 'I'; +else if (IS_DIRECT(mb_type) && IS_SKIP(mb_type)) +*ptr++ = 'd'; +else if (IS_DIRECT(mb_type)) +*ptr++ = 'D'; +else if (IS_GMC(mb_type) && IS_SKIP(mb_type)) +*ptr++ = 'g'; +else if (IS_GMC(mb_type)) +*ptr++ = 'G'; +else if (IS_SKIP(mb_type)) +*ptr++ = 'S'; +else if (!USES_LIST(mb_type, 1)) +*ptr++ = '>'; +else if (!USES_LIST(mb_type, 0)) +*ptr++ = '<'; +else { +av_assert2(USES_LIST(mb_type, 0) && USES_LIST(mb_type, 1)); +*ptr++ = 'X'; +} + +if (--size <= 0) +goto end; + +// segmentation +if (IS_8X8(mb_type)) +*ptr++ = '+'; +else if (IS_16X8(mb_type)) +*ptr++ = '-'; +else if (IS_8X16(mb_type)) +*ptr++ = '|'; +else if (IS_INTRA(mb_type) || IS_16X16(mb_type)) +*ptr++ = ' '; +else +*ptr++ = '?'; + +if (--size <= 0) +goto end; + +if (IS_INTERLACED(mb_type)) +*ptr++ = '='; +else +*ptr++ = ' '; + +end: +*ptr = '\0'; +return ptr - str; +} + void ff_print_debug_info2(AVCodecContext *avctx, AVFrame *pict, uint8_t *mbskip_table, uint32_t *mbtype_table, int8_t *qscale_table, int16_t (*motion_val[2])[2], int *low_delay, @@ -231,52 +297,9 @@ void ff_print_debug_info2(AVCodecContext *avctx, AVFrame *pict, uint8_t *mbskip_ qscale_table[x + y * mb_stride]); } if (avctx->debug & FF_DEBUG_MB_TYPE) { -int mb_type = mbtype_table[x + y * mb_stride]; -// Type & MV direction -if (IS_PCM(mb_type)) 
-av_log(avctx, AV_LOG_DEBUG, "P"); -else if (IS_INTRA(mb_type) && IS_ACPRED(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "A"); -else if (IS_INTRA4x4(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "i"); -else if (IS_INTRA16x16(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "I"); -else if (IS_DIRECT(mb_type) && IS_SKIP(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "d"); -else if (IS_DIRECT(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "D"); -else if (IS_GMC(mb_type) && IS_SKIP(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "g"); -else if (IS_GMC(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "G"); -else if (IS_SKIP(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "S"); -else if (!USES_LIST(mb_type, 1)) -av_log(avctx, AV_LOG_DEBUG, ">"); -else if (!USES_LIST(mb_type, 0)) -av_log(avctx, AV_LOG_DEBUG, "<"); -else { -av_assert2(USES_LIST(mb_type, 0) && USES_LIST(mb_type, 1)); -av_log(avctx, AV_LOG_DEBUG, "X"); -} - -// segmentation -if (IS_8X8(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "+"); -else if (IS_16X8(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "-"); -else if (IS_8X16(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "|"); -else if (IS_INTRA(mb_type) || IS_16X16(mb_type)) -av_log(avctx, AV_LOG_DEBUG, " "); -else -av_log(avctx, AV_LOG_DEBUG, "?"); - - -if (IS_INTERLACED(mb_type)) -av_log(avctx, AV_LOG_DEBUG, "="); -else -av_log(avctx, AV_LOG_DEBUG, " "); +char str[4]; +ff_mb_type_str(str, sizeof(str), mbtype_table[x + y * mb_stride]); +av_log(avctx, AV_LOG_DEBUG, str); } } av_log(avctx, AV_LOG_DEBUG, "\n"); diff --git
[FFmpeg-devel] [PATCH 3/4] ffprobe: print mb_types frame side data
--- fftools/ffprobe.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/fftools/ffprobe.c b/fftools/ffprobe.c index 544786ec72..5bd14ebfdb 100644 --- a/fftools/ffprobe.c +++ b/fftools/ffprobe.c @@ -30,6 +30,7 @@ #include "libavformat/avformat.h" #include "libavcodec/avcodec.h" +#include "libavcodec/mpegutils.h" #include "libavutil/avassert.h" #include "libavutil/avstring.h" #include "libavutil/bprint.h" @@ -,6 +2223,24 @@ static void show_frame(WriterContext *w, AVFrame *frame, AVStream *stream, AVContentLightMetadata *metadata = (AVContentLightMetadata *)sd->data; print_int("max_content", metadata->MaxCLL); print_int("max_average", metadata->MaxFALL); +} else if (sd->type == AV_FRAME_DATA_MB_TYPES) { +uint32_t *mb_types = (uint32_t *)sd->data; +int mb_height = *mb_types++; +int mb_width = *mb_types++; +int size = mb_height * mb_width * 3 + 1; +char *str = av_malloc(size); +int mb_y, mb_x; +print_int("mb_height", mb_height); +print_int("mb_width", mb_width); +if (str) { +char *ptr = str; +const char *end = str + size; +for (mb_y = 0; mb_y < mb_height; mb_y++) +for (mb_x = 0; mb_x < mb_width; mb_x++) +ptr += ff_mb_type_str(ptr, end - str, *mb_types++); +print_str("mb_types", str); +av_free(str); +} } else if (sd->type == AV_FRAME_DATA_ICC_PROFILE) { AVDictionaryEntry *tag = av_dict_get(sd->metadata, "name", NULL, AV_DICT_MATCH_CASE); if (tag) -- 2.11.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
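One detail in the loop above: the size passed to ff_mb_type_str() is end - str, i.e. the full buffer size on every call, rather than the space left after ptr. With the buffer sized at 3 characters per macroblock plus a terminator this should not overflow, but a more defensive variant would pass the remaining space. The following is a sketch of that variant only; toy_cell_str() and toy_join_cells() are hypothetical helpers, not the functions from the patch.

```c
/* Hypothetical cell writer: writes up to 3 characters plus a terminating
 * NUL into dst (capacity size) and returns the number of characters
 * written, not counting the NUL. Not the real ff_mb_type_str(). */
static int toy_cell_str(char *dst, int size, char a, char b, char c)
{
    char *p = dst;

    if (size <= 0)
        return 0;
    size--; /* reserve room for the NUL */
    if (size > 0) { *p++ = a; size--; }
    if (size > 0) { *p++ = b; size--; }
    if (size > 0) { *p++ = c; size--; }
    *p = '\0';
    return (int)(p - dst);
}

/* Appends one cell per entry, always passing the space that is actually
 * left (end - ptr) so a short buffer truncates instead of overflowing. */
static int toy_join_cells(char *buf, int size, int count)
{
    char *ptr = buf;
    const char *end = buf + size;

    if (size <= 0)
        return 0;
    buf[0] = '\0';
    for (int i = 0; i < count; i++)
        ptr += toy_cell_str(ptr, (int)(end - ptr), 'I', ' ', ' ');
    return (int)(ptr - buf);
}
```

With a 7-byte buffer and two cells the result is the full 6-character string. With a 5-byte buffer the second cell is cut short and the string stays NUL-terminated inside the buffer.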
Re: [FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames
On Sun, Jun 10, 2018 at 2:32 AM, Michael Niedermayer wrote:
> On Sat, Jun 09, 2018 at 05:09:13PM +0200, Ramiro Polla wrote:
>> On Thu, May 10, 2018 at 11:01 PM, Michael Niedermayer wrote:
>> > On Wed, May 09, 2018 at 08:44:25PM +0200, Ramiro Polla wrote:
>> >> This option prevents the mpv encoders from using intra macroblocks in
>> >> predictive frames.
>> >>
>> >> It is useful for glitch artists to generate input material. This option
>> >> allows them to split and merge two video files while maintaining fluid
>> >> motion from the second video without having intra macroblocks restoring
>> >> chunks of the first video.
>> >
>> > maybe a continuous variable like snows intra_penalty could achieve this
>> > too but give more flexibility in doing it also just partially if wanted
>>
>> I like this idea better. I wanted a simple way to be able to entirely
>> disable intra macroblocks, but "-intra_penalty max" could cause an
>> overflow, so I set the max value to INT_MAX/2.
>>
>> New patch attached.
>
> LGTM
>
> a fate test may also make sense

I sent a new patch set that includes a fate test.
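The overflow concern behind the INT_MAX/2 cap can be shown with a small standalone sketch. It assumes the penalty is simply added to the intra score before comparing against the best inter score, as in the patch from this thread; toy_intra_wins() and TOY_INTRA_PENALTY_MAX are hypothetical names, and the bound stated in the comment is an assumption of the sketch, not something the encoder is known to guarantee.

```c
#include <limits.h>

/* Hypothetical cap mirroring the INT_MAX/2 limit mentioned in the thread. */
#define TOY_INTRA_PENALTY_MAX (INT_MAX / 2)

/* Returns 1 if the intra candidate wins, 0 if the inter candidate wins.
 * The sum is free of signed overflow only while
 * raw_intra_score + base_penalty stays below INT_MAX / 2; that bound is
 * an assumption of this sketch. */
static int toy_intra_wins(int raw_intra_score, int base_penalty,
                          int intra_penalty, int inter_score)
{
    int intra_score = raw_intra_score + base_penalty + intra_penalty;
    return intra_score < inter_score;
}
```

With a penalty of 0 the decision is unchanged. With the capped maximum the intra score becomes so large that intra is never picked, without the sum wrapping around as it could if the penalty were allowed to reach INT_MAX.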
[FFmpeg-devel] [PATCH 1/4] lavu/frame: add mb_types side data
---
 libavcodec/avcodec.h       |  4 ++++
 libavcodec/mpegutils.c     | 20 ++++++++++++++++++++
 libavcodec/options_table.h |  1 +
 libavutil/frame.c          |  1 +
 libavutil/frame.h          |  9 +++++++++
 5 files changed, 35 insertions(+)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index c90166deb6..7fe4fc9347 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -929,6 +929,10 @@ typedef struct RcOverride{
  */
 #define AV_CODEC_FLAG2_SHOW_ALL (1 << 22)
 /**
+ * Export macroblock types through frame side data
+ */
+#define AV_CODEC_FLAG2_EXPORT_MB_TYPES (1 << 27)
+/**
  * Export motion vectors through frame side data
  */
 #define AV_CODEC_FLAG2_EXPORT_MVS (1 << 28)
diff --git a/libavcodec/mpegutils.c b/libavcodec/mpegutils.c
index 3f94540616..0fbe5f8c9d 100644
--- a/libavcodec/mpegutils.c
+++ b/libavcodec/mpegutils.c
@@ -188,6 +188,26 @@ void ff_print_debug_info2(AVCodecContext *avctx, AVFrame *pict, uint8_t *mbskip_
         av_freep(&mvs);
     }

+    if ((avctx->flags2 & AV_CODEC_FLAG2_EXPORT_MB_TYPES) && mbtype_table) {
+        int size = (2 + mb_height * mb_width) * sizeof(uint32_t);
+        int mb_x, mb_y;
+
+        AVFrameSideData *sd;
+        uint32_t *out;
+
+        sd = av_frame_new_side_data(pict, AV_FRAME_DATA_MB_TYPES, size);
+        if (!sd)
+            return;
+
+        out = (uint32_t *) sd->data;
+        *out++ = mb_height;
+        *out++ = mb_width;
+
+        for (mb_y = 0; mb_y < mb_height; mb_y++)
+            for (mb_x = 0; mb_x < mb_width; mb_x++)
+                *out++ = mbtype_table[mb_x + mb_y * mb_stride];
+    }
+
     /* TODO: export all the following to make them accessible for users (and filters) */
     if (avctx->hwaccel || !mbtype_table)
         return;
diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
index 099261e168..25c84de321 100644
--- a/libavcodec/options_table.h
+++ b/libavcodec/options_table.h
@@ -76,6 +76,7 @@ static const AVOption avcodec_options[] = {
 {"export_mvs", "export motion vectors through frame side data", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_EXPORT_MVS}, INT_MIN, INT_MAX, V|D, "flags2"},
 {"skip_manual", "do not skip samples and export skip information as frame side data", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_SKIP_MANUAL}, INT_MIN, INT_MAX, V|D, "flags2"},
 {"ass_ro_flush_noop", "do not reset ASS ReadOrder field on flush", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_RO_FLUSH_NOOP}, INT_MIN, INT_MAX, S|D, "flags2"},
+{"export_mb_types", "export macroblock types through frame side data", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_EXPORT_MB_TYPES}, INT_MIN, INT_MAX, V|D, "flags2"},
 {"time_base", NULL, OFFSET(time_base), AV_OPT_TYPE_RATIONAL, {.dbl = 0}, 0, INT_MAX},
 {"g", "set the group of picture (GOP) size", OFFSET(gop_size), AV_OPT_TYPE_INT, {.i64 = 12 }, INT_MIN, INT_MAX, V|E},
 {"ar", "set audio sampling rate (in Hz)", OFFSET(sample_rate), AV_OPT_TYPE_INT, {.i64 = DEFAULT }, 0, INT_MAX, A|D|E},
diff --git a/libavutil/frame.c b/libavutil/frame.c
index deb9b6f334..577d4f6e6d 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -834,6 +834,7 @@ const char *av_frame_side_data_name(enum AVFrameSideDataType type)
     case AV_FRAME_DATA_ICC_PROFILE:                 return "ICC profile";
     case AV_FRAME_DATA_QP_TABLE_PROPERTIES:         return "QP table properties";
     case AV_FRAME_DATA_QP_TABLE_DATA:               return "QP table data";
+    case AV_FRAME_DATA_MB_TYPES:                    return "Macroblock types";
     }
     return NULL;
 }
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 9d57d6ce66..ce1231b03b 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -158,6 +158,15 @@ enum AVFrameSideDataType {
      */
     AV_FRAME_DATA_QP_TABLE_DATA,
 #endif
+
+    /**
+     * Macroblock types exported by some codecs (on demand through the
+     * export_mb_types flag set in the libavcodec AVCodecContext flags2 option).
+     * The data is composed by a header consisting of uint32_t mb_height and
+     * uint32_t mb_width, followed by a uint32_t mb_types[mb_height][mb_width]
+     * array.
+     */
+    AV_FRAME_DATA_MB_TYPES,
 };

 enum AVActiveFormatDescription {
-- 
2.11.0
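For anyone consuming this side data, here is a minimal sketch of reading the layout described in the frame.h comment above (two uint32_t header fields, then a row-major uint32_t array with mb_width entries per row). MBTypesView, mb_types_parse and mb_types_get are hypothetical names, not part of the patch or of FFmpeg. The buffer is assumed to come from the side data's data and size fields, and values are read in native byte order because the patch writes them that way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view over an AV_FRAME_DATA_MB_TYPES payload (not an FFmpeg type). */
typedef struct MBTypesView {
    uint32_t mb_height;
    uint32_t mb_width;
    const uint8_t *types;   /* mb_height * mb_width uint32_t values, row-major */
} MBTypesView;

/* Returns 0 on success, -1 if the buffer cannot hold what its header claims. */
static int mb_types_parse(const uint8_t *data, size_t size, MBTypesView *view)
{
    uint32_t hdr[2];
    uint64_t count;

    if (!data || size < sizeof(hdr))
        return -1;
    memcpy(hdr, data, sizeof(hdr));          /* memcpy avoids unaligned reads */
    count = (uint64_t) hdr[0] * hdr[1];      /* 64-bit product cannot wrap */
    if (count > (size - sizeof(hdr)) / sizeof(uint32_t))
        return -1;
    view->mb_height = hdr[0];
    view->mb_width  = hdr[1];
    view->types     = data + sizeof(hdr);
    return 0;
}

/* Returns the type of the macroblock at column mb_x, row mb_y. */
static uint32_t mb_types_get(const MBTypesView *view, uint32_t mb_x, uint32_t mb_y)
{
    uint32_t v;

    assert(mb_x < view->mb_width && mb_y < view->mb_height);
    memcpy(&v, view->types + ((size_t) mb_y * view->mb_width + mb_x) * sizeof(v),
           sizeof(v));
    return v;
}
```

The size check is there because the header is self-describing: a consumer that trusts mb_height and mb_width without comparing them against the payload size could read past the end of the buffer.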
Re: [FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames
Hi Michael,

On Thu, May 10, 2018 at 11:01 PM, Michael Niedermayer wrote:
> On Wed, May 09, 2018 at 08:44:25PM +0200, Ramiro Polla wrote:
>> This option prevents the mpv encoders from using intra macroblocks in
>> predictive frames.
>>
>> It is useful for glitch artists to generate input material. This option
>> allows them to split and merge two video files while maintaining fluid
>> motion from the second video without having intra macroblocks restoring
>> chunks of the first video.
>
> maybe a continuous variable like snows intra_penalty could achieve this
> too but give more flexibility in doing it also just partially if wanted

I like this idea better. I wanted a simple way to be able to entirely
disable intra macroblocks, but "-intra_penalty max" could cause an
overflow, so I set the max value to INT_MAX/2.

New patch attached.

From d2c1da02c28be5519f0ba84aa22f519a296a6d04 Mon Sep 17 00:00:00 2001
From: Ramiro Polla
Date: Sat, 9 Jun 2018 17:00:26 +0200
Subject: [PATCH] mpegvideo_enc: add intra_penalty option for p frames

This option allows more control over the use of intra macroblocks in
predictive frames. By using '-intra_penalty max', intra macroblocks are
never used in predictive frames.

It is useful for glitch artists to generate input material. This option
allows them to split and merge two video files while maintaining fluid
motion from the second video without having intra macroblocks restoring
chunks of the first video.
---
 libavcodec/motion_est.c    | 10 +++++-----
 libavcodec/motion_est.h    |  2 +-
 libavcodec/mpegvideo.h     |  3 +++
 libavcodec/mpegvideo_enc.c |  6 +++---
 libavcodec/svq1enc.c       |  2 +-
 5 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 8b5ce2117a..fa750e39ec 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
         int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20;
         c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score);

-        if (vard*2 + 200*256 > varc)
+        if (vard*2 + 200*256 > varc && !s->intra_penalty)
             mb_type|= CANDIDATE_MB_TYPE_INTRA;
         if (varc*2 + 200*256 > vard || s->qscale > 24){
 //        if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){
@@ -1040,7 +1040,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,

             intra_score= s->mecc.mb_cmp[0](s, c->scratchpad, pix, s->linesize, 16);
         }
-        intra_score += c->mb_penalty_factor*16;
+        intra_score += c->mb_penalty_factor*16 + s->intra_penalty;

         if(intra_score < dmin){
             mb_type= CANDIDATE_MB_TYPE_INTRA;
@@ -1648,7 +1648,7 @@ int ff_get_best_fcode(MpegEncContext * s, int16_t (*mv_table)[2], int type)
     }
 }

-void ff_fix_long_p_mvs(MpegEncContext * s)
+void ff_fix_long_p_mvs(MpegEncContext * s, int type)
 {
     MotionEstContext * const c= &s->me;
     const int f_code= s->f_code;
@@ -1682,8 +1682,8 @@ void ff_fix_long_p_mvs(MpegEncContext * s)
                         if(   mx >=range || mx <-range
                            || my >=range || my <-range){
                             s->mb_type[i] &= ~CANDIDATE_MB_TYPE_INTER4V;
-                            s->mb_type[i] |= CANDIDATE_MB_TYPE_INTRA;
-                            s->current_picture.mb_type[i] = CANDIDATE_MB_TYPE_INTRA;
+                            s->mb_type[i] |= type;
+                            s->current_picture.mb_type[i] = type;
                         }
                     }
                 }
diff --git a/libavcodec/motion_est.h b/libavcodec/motion_est.h
index 3b3a8d7341..817220f340 100644
--- a/libavcodec/motion_est.h
+++ b/libavcodec/motion_est.h
@@ -127,7 +127,7 @@ int ff_get_mb_score(struct MpegEncContext *s, int mx, int my, int src_index,
 int ff_get_best_fcode(struct MpegEncContext *s,
                       int16_t (*mv_table)[2], int type);

-void ff_fix_long_p_mvs(struct MpegEncContext *s);
+void ff_fix_long_p_mvs(struct MpegEncContext *s, int type);
 void ff_fix_long_mvs(struct MpegEncContext *s, uint8_t *field_select_table,
                      int field_select, int16_t (*mv_table)[2], int f_code,
                      int type, int truncate);
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index e16deb64e7..7eda962ba7 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -577,6 +577,8 @@ typedef struct MpegEncContext {
     int scenechange_threshold;

     int noise_reduction;
+
+    int intra_penalty;
 } MpegEncContext;

 /* mpegvideo_enc common options */
@@ -661,6 +663,7 @@ FF_MPV_OPT_CMP_FUNC, \
 {"ps", "RTP payload size in bytes", FF_MPV_OFFSET(rtp_payload_size), AV_OPT_TYPE_INT, {
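To spell out the overflow reasoning in the reply above: the penalty is added to intra_score, which is a plain int, so a penalty of INT_MAX would overflow as soon as the score is positive, while a cap of INT_MAX/2 leaves the other half of the range as headroom. The sketch below only illustrates that arithmetic. INTRA_PENALTY_MAX, add_intra_penalty and intra_wins are hypothetical names, and the assumption that the score is non-negative and no larger than the remaining headroom is mine, not something the patch states.

```c
#include <assert.h>
#include <limits.h>

/* Cap described in the message; the macro name is hypothetical. */
#define INTRA_PENALTY_MAX (INT_MAX / 2)

/* Adds the penalty to a non-negative intra score without overflowing int,
 * provided the score itself stays within the remaining headroom. */
static int add_intra_penalty(int intra_score, int intra_penalty)
{
    assert(intra_penalty >= 0 && intra_penalty <= INTRA_PENALTY_MAX);
    assert(intra_score >= 0 && intra_score <= INT_MAX - INTRA_PENALTY_MAX);
    return intra_score + intra_penalty;
}

/* Mirrors the shape of the `intra_score < dmin` comparison in the patch. */
static int intra_wins(int intra_score, int intra_penalty, int dmin)
{
    return add_intra_penalty(intra_score, intra_penalty) < dmin;
}
```

With the maximum penalty, the biased score is at least INT_MAX/2, so intra can only win against an inter cost that is itself above that value.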
[FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames
This option prevents the mpv encoders from using intra macroblocks in
predictive frames.

It is useful for glitch artists to generate input material. This option
allows them to split and merge two video files while maintaining fluid
motion from the second video without having intra macroblocks restoring
chunks of the first video.
---
 libavcodec/motion_est.c    | 4 ++--
 libavcodec/mpegvideo.h     | 2 ++
 libavcodec/mpegvideo_enc.c | 5 +++--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 8b5ce2117a..827e2282f7 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
         int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20;
         c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score);

-        if (vard*2 + 200*256 > varc)
+        if (vard*2 + 200*256 > varc && !(s->mpv_flags & FF_MPV_FLAG_NOPIMB))
             mb_type|= CANDIDATE_MB_TYPE_INTRA;
         if (varc*2 + 200*256 > vard || s->qscale > 24){
 //        if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){
@@ -1042,7 +1042,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
         }
         intra_score += c->mb_penalty_factor*16;

-        if(intra_score < dmin){
+        if(intra_score < dmin && !(s->mpv_flags & FF_MPV_FLAG_NOPIMB)){
             mb_type= CANDIDATE_MB_TYPE_INTRA;
             s->current_picture.mb_type[mb_y*s->mb_stride + mb_x] = CANDIDATE_MB_TYPE_INTRA; //FIXME cleanup
         }else
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index e16deb64e7..b7ac2c7b48 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -586,6 +586,7 @@ typedef struct MpegEncContext {
 #define FF_MPV_FLAG_CBP_RD 0x0008
 #define FF_MPV_FLAG_NAQ 0x0010
 #define FF_MPV_FLAG_MV0 0x0020
+#define FF_MPV_FLAG_NOPIMB 0x0040

 #define FF_MPV_OPT_CMP_FUNC \
 { "sad","Sum of absolute differences, fast", 0, AV_OPT_TYPE_CONST, {.i64 = FF_CMP_SAD }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS, "cmp_func" }, \
@@ -617,6 +618,7 @@ FF_MPV_OPT_CMP_FUNC, \
 { "cbp_rd", "use rate distortion optimization for CBP", 0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_CBP_RD }, 0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
 { "naq","normalize adaptive quantization", 0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_NAQ },0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
 { "mv0","always try a mb with mv=<0,0>", 0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_MV0 },0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
+{ "nopimb", "do not use intra mbs for predictive frames",0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_NOPIMB }, 0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
 { "luma_elim_threshold", "single coefficient elimination threshold for luminance (negative values also consider dc coefficient)",\
 FF_MPV_OFFSET(luma_elim_threshold), AV_OPT_TYPE_INT, { .i64 = 0 }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS },\
 { "chroma_elim_threshold", "single coefficient elimination threshold for chrominance (negative values also consider dc coefficient)",\
diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 9fdab31a25..e41a8f40cf 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -3752,6 +3752,7 @@ static int encode_picture(MpegEncContext *s, int picture_number)

     if(!s->umvplus){
         if(s->pict_type==AV_PICTURE_TYPE_P || s->pict_type==AV_PICTURE_TYPE_S) {
+            int truncate = s->mpv_flags & FF_MPV_FLAG_NOPIMB;
             s->f_code= ff_get_best_fcode(s, s->p_mv_table, CANDIDATE_MB_TYPE_INTER);

             if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) {
@@ -3762,13 +3763,13 @@ static int encode_picture(MpegEncContext *s, int picture_number)
             }

             ff_fix_long_p_mvs(s);
-            ff_fix_long_mvs(s, NULL, 0, s->p_mv_table, s->f_code, CANDIDATE_MB_TYPE_INTER, 0);
+            ff_fix_long_mvs(s, NULL, 0, s->p_mv_table, s->f_code, CANDIDATE_MB_TYPE_INTER, truncate);
             if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) {
                 int j;
                 for(i=0; i<2; i++){
                     for(j=0; j<2; j++)
                         ff_fix_long_mvs(s, s->p_field_select_table[i], j,
-                                        s->p_field_mv_table[i][j], s->f_code, CANDIDATE_MB_TYPE_INTER_I, 0);
+                                        s->p_field_mv_table[i][j], s->f_code, CANDIDATE_MB_TYPE_INTER_I, truncate);
                 }
             }
         }
-- 
2.11.0
Re: [FFmpeg-devel] [PATCH 2/3] Fix Decklink for Mac
2015-01-11 15:38 GMT+01:00 Georg Lippitsch georg.lippit...@gmx.at:
> ---
>  libavdevice/decklink_common.cpp | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
> index 07e1651..82b8bdb 100644
> --- a/libavdevice/decklink_common.cpp
> +++ b/libavdevice/decklink_common.cpp
> @@ -70,6 +70,16 @@ static char *dup_wchar_to_utf8(wchar_t *w)
>  #define DECKLINK_STR    OLECHAR *
>  #define DECKLINK_STRDUP dup_wchar_to_utf8
>  #define DECKLINK_FREE(s) SysFreeString(s)
> +#elif __APPLE__
> +static char *dup_cfstring_to_utf8(CFStringRef w)
> +{
> +    char s[256];
> +    CFStringGetCString(w, s, 255, kCFStringEncodingUTF8);
> +    return av_strdup(s);
> +}

Is it not possible to get the string's real length? You could also try
using CFStringGetCStringPtr() first.

> +#define DECKLINK_STR    const __CFString *
> +#define DECKLINK_STRDUP dup_cfstring_to_utf8
> +#define DECKLINK_FREE(s) free((void *) s)
>  #else
>  #define DECKLINK_STR    const char *
>  #define DECKLINK_STRDUP av_strdup
> -- 
> 1.8.4.5
Re: [FFmpeg-devel] Patch for device list error in decklink_common.cpp
On 03.12.2014 12:06, Jon bae wrote:
> Thanks Ramiro for the correction! Here is the new patch. (Is it better to
> post directly the patch, or is ok as a attachment?)

Attachment is better. But please avoid top-posting in this mailing-list.

> 2014-12-02 22:19 GMT+01:00 Ramiro Polla ramiro.po...@gmail.com:
>> On 02.12.2014 20:30, Jon bae wrote:
>>> Here is the other patch for decklink_common.cpp. It fix the error:
>>>
>>> COM initialization failed
>>> [decklink @ 02e5b520] Could not create DeckLink iterator
>>> dummy: Immediate exit request
>>>
>>> From 203eba2fad14dd6d84552d6c22899792e80b53bb Mon Sep 17 00:00:00 2001
>>> From: Jonathan Baecker jonba...@gmail.com
>>> Date: Tue, 2 Dec 2014 20:12:38 +0100
>>> Subject: [PATCH 2/2] device list error in decklink_common
>>>
>>> Signed-off-by: Jonathan Baecker jonba...@gmail.com
>>> ---
>>>  libavdevice/decklink_common.cpp | 24 ++++++++++++++----------
>>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
>>> index 8eff910..8f7e32a 100644
>>> --- a/libavdevice/decklink_common.cpp
>>> +++ b/libavdevice/decklink_common.cpp
>>> @@ -42,16 +42,20 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance (void)
>>>  {
>>>      IDeckLinkIterator *iter;
>>>
>>> -    if (CoInitialize(NULL) != S_OK) {
>>> -        av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
>>> -        return NULL;
>>> -    }
>>> -
>>> -    if (CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL,
>>> -                         IID_IDeckLinkIterator, (void**) &iter) != S_OK) {
>>> -        av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
>>> -        return NULL;
>>> -    }
>>> +    HRESULT result;
>>> +    /* Initialize COM on this thread */
>>> +    result = CoInitialize(NULL);
>>> +    if (FAILED(result)) {
>>> +        av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
>>> +        return NULL;
>>> +    }
>>> +
>>> +    /* Create an IDeckLinkIterator object to enumerate all DeckLink cards in the system */
>>> +    result = CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL, IID_IDeckLinkIterator, (void**)&iter);
>>> +    if (FAILED(result)) {
>>> +        av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
>>> +        return NULL;
>>> +    }
>>>
>>>      return iter;
>>>  }
>>> -- 
>>> 2.2.0
>>
>> This code is Copyright (c) Blackmagic Design. Try just changing the
>> check for CoInitialize(NULL) from != S_OK to < 0.
>
> From 3c3d5dda659fe30c68a81b0a711cb09bcb5be443 Mon Sep 17 00:00:00 2001
> From: Jonathan Baecker jonba...@gmail.com
> Date: Wed, 3 Dec 2014 12:03:12 +0100
> Subject: [PATCH] fix COM initialization failed
>
> Signed-off-by: Jonathan Baecker jonba...@gmail.com
> ---
>  libavdevice/decklink_common.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
> index 6899bd2..4252552 100644
> --- a/libavdevice/decklink_common.cpp
> +++ b/libavdevice/decklink_common.cpp
> @@ -42,13 +42,13 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance(void)
>  {
>      IDeckLinkIterator *iter;
>
> -    if (CoInitialize(NULL) != S_OK) {
> +    if (CoInitialize(NULL) < 0) {
>          av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
>          return NULL;
>      }
>
>      if (CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL,
> -                         IID_IDeckLinkIterator, (void**) iter) != S_OK) {
> +                         IID_IDeckLinkIterator, (void**) &iter) < 0) {
>          av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
>          return NULL;
>      }

The CoCreateInstance check doesn't need to be changed.
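Some background on why the comparison matters, as a sketch rather than a description of the DeckLink code. The thread does not say which value CoInitialize() returned, but one likely cause is S_FALSE, which the COM documentation lists as a success code meaning the library was already initialized on the calling thread. A check against S_OK rejects that value, while a sign check (which is what FAILED() does) accepts it. The hresult_t type, the HR_* constants and the function names below are hypothetical stand-ins so the example builds without Windows headers; only the numeric values mirror the real S_OK, S_FALSE and E_FAIL.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Windows definitions so the sketch builds anywhere;
 * the numeric values mirror S_OK, S_FALSE and E_FAIL. */
typedef int32_t hresult_t;
#define HR_S_OK    ((hresult_t) 0)
#define HR_S_FALSE ((hresult_t) 1)            /* success, e.g. COM already initialized */
#define HR_E_FAIL  ((hresult_t) -2147467259)  /* 0x80004005, sign bit set */

/* Original check: anything other than S_OK is treated as an error. */
static int strict_check_passes(hresult_t hr)
{
    return hr == HR_S_OK;
}

/* Suggested check: only negative values (what FAILED() tests) are errors. */
static int sign_check_passes(hresult_t hr)
{
    return hr >= 0;
}

/* Usage example: the two checks disagree only on S_FALSE. */
static void hresult_checks_demo(void)
{
    assert(strict_check_passes(HR_S_OK)     && sign_check_passes(HR_S_OK));
    assert(!strict_check_passes(HR_S_FALSE) && sign_check_passes(HR_S_FALSE));
    assert(!strict_check_passes(HR_E_FAIL)  && !sign_check_passes(HR_E_FAIL));
}
```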
Re: [FFmpeg-devel] Patch for device list error in decklink_common.cpp
On 03.12.2014 16:44, Jon bae wrote:
> Ok finally... Here now only the first line changed. Sorry for the mess,
> I 'm not the right person for that.
>
> From 2cddda59076b2ac5a539f7016c0aa1883d37c6d8 Mon Sep 17 00:00:00 2001
> From: Jonathan Baecker jonba...@gmail.com
> Date: Wed, 3 Dec 2014 16:41:41 +0100
> Subject: [PATCH] fix COM initialization failed
>
> Signed-off-by: Jonathan Baecker jonba...@gmail.com
> ---
>  libavdevice/decklink_common.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
> index 6899bd2..07e1651 100644
> --- a/libavdevice/decklink_common.cpp
> +++ b/libavdevice/decklink_common.cpp
> @@ -42,7 +42,7 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance(void)
>  {
>      IDeckLinkIterator *iter;
>
> -    if (CoInitialize(NULL) != S_OK) {
> +    if (CoInitialize(NULL) < 0) {
>          av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
>          return NULL;
>      }

LGTM. Thanks for submitting the patches!
Re: [FFmpeg-devel] Patch for device list error in decklink_common.cpp
On 02.12.2014 20:30, Jon bae wrote:
> Here is the other patch for decklink_common.cpp. It fix the error:
>
> COM initialization failed
> [decklink @ 02e5b520] Could not create DeckLink iterator
> dummy: Immediate exit request
>
> From 203eba2fad14dd6d84552d6c22899792e80b53bb Mon Sep 17 00:00:00 2001
> From: Jonathan Baecker jonba...@gmail.com
> Date: Tue, 2 Dec 2014 20:12:38 +0100
> Subject: [PATCH 2/2] device list error in decklink_common
>
> Signed-off-by: Jonathan Baecker jonba...@gmail.com
> ---
>  libavdevice/decklink_common.cpp | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
> index 8eff910..8f7e32a 100644
> --- a/libavdevice/decklink_common.cpp
> +++ b/libavdevice/decklink_common.cpp
> @@ -42,16 +42,20 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance(void)
>  {
>      IDeckLinkIterator *iter;
>
> -    if (CoInitialize(NULL) != S_OK) {
> -        av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
> -        return NULL;
> -    }
> -
> -    if (CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL,
> -                         IID_IDeckLinkIterator, (void**) &iter) != S_OK) {
> -        av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
> -        return NULL;
> -    }
> +    HRESULT result;
> +    /* Initialize COM on this thread */
> +    result = CoInitialize(NULL);
> +    if (FAILED(result)) {
> +        av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
> +        return NULL;
> +    }
> +
> +    /* Create an IDeckLinkIterator object to enumerate all DeckLink cards in the system */
> +    result = CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL, IID_IDeckLinkIterator, (void**)&iter);
> +    if (FAILED(result)) {
> +        av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
> +        return NULL;
> +    }
>
>      return iter;
>  }
> -- 
> 2.2.0

This code is Copyright (c) Blackmagic Design. Try just changing the check
for CoInitialize(NULL) from != S_OK to < 0.
Re: [FFmpeg-devel] Patch for heap corruption run time error in decklink_common.cpp
On 02.12.2014 20:28, Jon bae wrote:
> Ok here a second run, I try to follow the instruction from Carl Eugen.
> This is the first patch for decklink_common.cpp. It fix this error:
>
> Unhandled exception at 0x76FA4102 (ntdll.dll) in ffmpeg.exe: 0xC0000374: A
> heap has been corrupted (parameters: 0x7701B4B0).
>
> From e9bc8e910f515af4030054df3e6feb308f3208aa Mon Sep 17 00:00:00 2001
> From: Jonathan Baecker jonba...@gmail.com
> Date: Tue, 2 Dec 2014 20:10:41 +0100
> Subject: [PATCH 1/2] heap corruption run time error in decklink_common
>
> Signed-off-by: Jonathan Baecker jonba...@gmail.com
> ---
>  libavdevice/decklink_common.cpp | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
> index 9a9e44b..8eff910 100644
> --- a/libavdevice/decklink_common.cpp
> +++ b/libavdevice/decklink_common.cpp
> @@ -69,9 +69,12 @@ static char *dup_wchar_to_utf8(wchar_t *w)
>  }
>  #define DECKLINK_STR    OLECHAR *
>  #define DECKLINK_STRDUP dup_wchar_to_utf8
> +#define DECKLINK_FREE(s) SysFreeString(s)
>  #else
>  #define DECKLINK_STR    const char *
>  #define DECKLINK_STRDUP av_strdup
> +/* free() is needed for a string returned by the DeckLink SDL. */
> +#define DECKLINK_FREE(s) free((void *) s)
>  #endif
>
>  HRESULT ff_decklink_get_display_name(IDeckLink *This, const char **displayName)
> @@ -81,8 +84,7 @@ HRESULT ff_decklink_get_display_name(IDeckLink *This, const char **displayName)
>      if (hr != S_OK)
>          return hr;
>      *displayName = DECKLINK_STRDUP(tmpDisplayName);
> -    /* free() is needed for a string returned by the DeckLink SDL. */
> -    free((void *) tmpDisplayName);
> +    DECKLINK_FREE(tmpDisplayName);
>      return hr;
>  }
>

LGTM
Re: [FFmpeg-devel] Fwd: OPW Qualification Task: Validate MLP Bitstream
Greeshma,

2014-10-31 15:41 GMT+01:00 greeshma greeshmabalaba...@gmail.com:
> I have first added experimental encoder mlpenc.c from
> https://github.com/ramiropolla/mlpenc an updated changes according to the
> recent commits in FFmpeg

That code is supposed to be sent for review after the end of OPW, not
before =) The qualification task is to update it to the current FFmpeg
codebase (for example the DSPContext changes and the encode function
changes). There's still a long way to go before submitting this code for
review.

Ramiro