from:"Ramiro Polla"

Re: [FFmpeg-devel] [PATCH v2 2/4] swscale/x86: add sse4 {lum, chr}ConvertRange

2024-06-12 Thread Ramiro Polla

Hi,

On Tue, Jun 11, 2024 at 8:42 PM James Almer  wrote:
>
> On 6/11/2024 3:26 PM, Michael Niedermayer wrote:
> > On Tue, Jun 11, 2024 at 02:28:56PM +0200, Ramiro Polla wrote:
> >> chrRangeFromJpeg_8_c: 28.7
> >> chrRangeFromJpeg_8_sse4: 16.2
> >> chrRangeFromJpeg_24_c: 152.7
> >> chrRangeFromJpeg_24_sse4: 29.7
> >> chrRangeFromJpeg_128_c: 366.5
> >> chrRangeFromJpeg_128_sse4: 233.0
> >> chrRangeFromJpeg_144_c: 408.0
> >> chrRangeFromJpeg_144_sse4: 182.5
> >> chrRangeFromJpeg_256_c: 698.7
> >> chrRangeFromJpeg_256_sse4: 325.5
> >> chrRangeFromJpeg_512_c: 1348.7
> >> chrRangeFromJpeg_512_sse4: 660.2
> >> chrRangeToJpeg_8_c: 37.7
> >> chrRangeToJpeg_8_sse4: 16.2
> >> chrRangeToJpeg_24_c: 115.7
> >> chrRangeToJpeg_24_sse4: 36.2
> >> chrRangeToJpeg_128_c: 631.2
> >> chrRangeToJpeg_128_sse4: 163.7
> >> chrRangeToJpeg_144_c: 710.7
> >> chrRangeToJpeg_144_sse4: 183.0
> >> chrRangeToJpeg_256_c: 1253.0
> >> chrRangeToJpeg_256_sse4: 343.5
> >> chrRangeToJpeg_512_c: 2491.2
> >> chrRangeToJpeg_512_sse4: 654.2
> >> lumRangeFromJpeg_8_c: 11.7
> >> lumRangeFromJpeg_8_sse4: 10.5
> >> lumRangeFromJpeg_24_c: 38.5
> >> lumRangeFromJpeg_24_sse4: 19.0
> >> lumRangeFromJpeg_128_c: 237.5
> >> lumRangeFromJpeg_128_sse4: 79.2
> >> lumRangeFromJpeg_144_c: 255.7
> >> lumRangeFromJpeg_144_sse4: 90.5
> >> lumRangeFromJpeg_256_c: 441.5
> >> lumRangeFromJpeg_256_sse4: 161.7
> >> lumRangeFromJpeg_512_c: 879.0
> >> lumRangeFromJpeg_512_sse4: 333.2
> >> lumRangeToJpeg_8_c: 20.0
> >> lumRangeToJpeg_8_sse4: 11.7
> >> lumRangeToJpeg_24_c: 61.5
> >> lumRangeToJpeg_24_sse4: 17.7
> >> lumRangeToJpeg_128_c: 357.5
> >> lumRangeToJpeg_128_sse4: 80.0
> >> lumRangeToJpeg_144_c: 371.5
> >> lumRangeToJpeg_144_sse4: 93.2
> >> lumRangeToJpeg_256_c: 651.5
> >> lumRangeToJpeg_256_sse4: 164.5
> >> lumRangeToJpeg_512_c: 1279.0
> >> lumRangeToJpeg_512_sse4: 333.7
> >> ---
> >>   libswscale/swscale_internal.h|   1 +
> >>   libswscale/utils.c   |   2 +
> >>   libswscale/x86/Makefile  |   1 +
> >>   libswscale/x86/range_convert.asm | 130 +++
> >>   libswscale/x86/swscale.c |  36 +
> >>   5 files changed, 170 insertions(+)
> >>   create mode 100644 libswscale/x86/range_convert.asm
> >
> > breaks x86-32 build
> >
> > LDffmpeg_g
> > /usr/lib/gcc-cross/i686-linux-gnu/7/../../../../i686-linux-gnu/bin/ld: 
> > libswscale/libswscale.a(utils.o): in function `sws_setColorspaceDetails':
> > ffmpeg/linux32/src/libswscale/utils.c:1086: undefined reference to 
> > `ff_sws_init_range_convert_x86'
> > collect2: error: ld returned 1 exit status
> > make: *** [Makefile:139: ffmpeg_g] Error 1
> >
> > thx
>
> The functions are wrapped in ARCH_X86_64 checks for seemingly no reason,
> so they should be removed in the next iteration.

Fixed.

James walked me through optimizing and improving the functions on IRC
so that they work with both sse2 and avx2. New patch attached.
From 9e49e72f6766e96cc06bec869fb776fff4c477bf Mon Sep 17 00:00:00 2001
From: Ramiro Polla 
Date: Thu, 6 Jun 2024 18:33:34 +0200
Subject: [PATCH] swscale/x86: add sse2 and avx2 {lum,chr}ConvertRange

chrRangeFromJpeg_8_c: 22.3
chrRangeFromJpeg_8_sse2: 13.3
chrRangeFromJpeg_8_avx2: 13.3
chrRangeFromJpeg_24_c: 72.8
chrRangeFromJpeg_24_sse2: 22.3
chrRangeFromJpeg_24_avx2: 17.5
chrRangeFromJpeg_128_c: 345.5
chrRangeFromJpeg_128_sse2: 106.0
chrRangeFromJpeg_128_avx2: 57.8
chrRangeFromJpeg_144_c: 380.5
chrRangeFromJpeg_144_sse2: 118.5
chrRangeFromJpeg_144_avx2: 62.3
chrRangeFromJpeg_256_c: 646.3
chrRangeFromJpeg_256_sse2: 218.8
chrRangeFromJpeg_256_avx2: 109.0
chrRangeFromJpeg_512_c: 1461.5
chrRangeFromJpeg_512_sse2: 426.5
chrRangeFromJpeg_512_avx2: 211.5
chrRangeToJpeg_8_c: 37.8
chrRangeToJpeg_8_sse2: 10.5
chrRangeToJpeg_8_avx2: 14.0
chrRangeToJpeg_24_c: 114.3
chrRangeToJpeg_24_sse2: 23.5
chrRangeToJpeg_24_avx2: 16.3
chrRangeToJpeg_128_c: 633.5
chrRangeToJpeg_128_sse2: 107.5
chrRangeToJpeg_128_avx2: 55.0
chrRangeToJpeg_144_c: 758.3
chrRangeToJpeg_144_sse2: 132.0
chrRangeToJpeg_144_avx2: 64.5
chrRangeToJpeg_256_c: 1345.0
chrRangeToJpeg_256_sse2: 218.0
chrRangeToJpeg_256_avx2: 105.3
chrRangeToJpeg_512_c: 2524.0
chrRangeToJpeg_512_sse2: 417.0
chrRangeToJpeg_512_avx2: 218.8
lumRangeFromJpeg_8_c: 11.8
lumRangeFromJpeg_8_sse2: 11.0
lumRangeFromJpeg_8_avx2: 10.3
lumRangeFromJpeg_24_c: 38.5
lumRangeFromJpeg_24_sse2: 15.5
lumRangeFromJpeg_24_avx2: 12.5
lumR

Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange

2024-06-11 Thread Ramiro Polla

On Mon, Jun 10, 2024 at 1:56 PM Martin Storsjö  wrote:
> On Fri, 7 Jun 2024, Ramiro Polla wrote:
>
> > chrRangeFromJpeg_8_c: 28.5
> > chrRangeFromJpeg_8_neon: 21.2
> > chrRangeFromJpeg_24_c: 81.2
> > chrRangeFromJpeg_24_neon: 34.7
> > chrRangeFromJpeg_128_c: 425.2
> > chrRangeFromJpeg_128_neon: 162.0
> > chrRangeFromJpeg_144_c: 480.2
> > chrRangeFromJpeg_144_neon: 180.2
> > chrRangeFromJpeg_256_c: 838.2
> > chrRangeFromJpeg_256_neon: 318.0
> > chrRangeFromJpeg_512_c: 1698.2
> > chrRangeFromJpeg_512_neon: 630.0
> > chrRangeToJpeg_8_c: 56.0
> > chrRangeToJpeg_8_neon: 23.5
> > chrRangeToJpeg_24_c: 147.7
> > chrRangeToJpeg_24_neon: 38.2
> > chrRangeToJpeg_128_c: 760.2
> > chrRangeToJpeg_128_neon: 182.5
> > chrRangeToJpeg_144_c: 857.7
> > chrRangeToJpeg_144_neon: 204.5
> > chrRangeToJpeg_256_c: 1504.2
> > chrRangeToJpeg_256_neon: 358.5
> > chrRangeToJpeg_512_c: 3025.7
> > chrRangeToJpeg_512_neon: 710.5
> > lumRangeFromJpeg_8_c: 24.0
> > lumRangeFromJpeg_8_neon: 18.2
> > lumRangeFromJpeg_24_c: 64.0
> > lumRangeFromJpeg_24_neon: 22.2
> > lumRangeFromJpeg_128_c: 289.2
> > lumRangeFromJpeg_128_neon: 79.2
> > lumRangeFromJpeg_144_c: 334.7
> > lumRangeFromJpeg_144_neon: 87.7
> > lumRangeFromJpeg_256_c: 579.5
> > lumRangeFromJpeg_256_neon: 152.0
> > lumRangeFromJpeg_512_c: 1208.0
> > lumRangeFromJpeg_512_neon: 299.0
> > lumRangeToJpeg_8_c: 30.0
> > lumRangeToJpeg_8_neon: 19.0
> > lumRangeToJpeg_24_c: 82.2
> > lumRangeToJpeg_24_neon: 24.0
> > lumRangeToJpeg_128_c: 440.7
> > lumRangeToJpeg_128_neon: 90.5
> > lumRangeToJpeg_144_c: 502.0
> > lumRangeToJpeg_144_neon: 102.2
> > lumRangeToJpeg_256_c: 893.7
> > lumRangeToJpeg_256_neon: 178.0
> > lumRangeToJpeg_512_c: 1793.7
> > lumRangeToJpeg_512_neon: 355.0
> > ---
> > libswscale/aarch64/Makefile |   1 +
> > libswscale/aarch64/range_convert_neon.S | 103 
> > libswscale/aarch64/swscale.c|  21 +
> > libswscale/swscale_internal.h   |   1 +
> > libswscale/utils.c  |   4 +-
> > 5 files changed, 129 insertions(+), 1 deletion(-)
> > create mode 100644 libswscale/aarch64/range_convert_neon.S
> >
> > diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
> > index da1d909561..6923827f82 100644
> > --- a/libswscale/aarch64/Makefile
> > +++ b/libswscale/aarch64/Makefile
> > @@ -4,5 +4,6 @@ OBJS+= aarch64/rgb2rgb.o\
> >
> > NEON-OBJS   += aarch64/hscale.o \
> >aarch64/output.o \
> > +   aarch64/range_convert_neon.o \
> >aarch64/rgb2rgb_neon.o   \
> >aarch64/yuv2rgb_neon.o   \
> > diff --git a/libswscale/aarch64/range_convert_neon.S 
> > b/libswscale/aarch64/range_convert_neon.S
> > new file mode 100644
> > index 00..5e104971f0
> > --- /dev/null
> > +++ b/libswscale/aarch64/range_convert_neon.S
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Copyright (c) 2024 Ramiro Polla
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 
> > 02110-1301 USA
> > + */
> > +
> > +#include "libavutil/aarch64/asm.S"
> > +
> > +.macro lumConvertRange name max mult offset shift
>
> We usually use commas between the macro arguments here. Apparently it
> doesn't make any difference for any of the tools we support, but it would
> be nice for consistency. (When invoking macros, commas between arguments
> are optional for most platforms, but not when targeting Apple platforms,
> so being strict with consistent use of commas is generally good.)

Fixed in the new patchset.

> > +const offset_\name, align=

[FFmpeg-devel] [PATCH v2 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange

2024-06-11 Thread Ramiro Polla

chrRangeFromJpeg_8_c: 29.2
chrRangeFromJpeg_8_neon: 19.5
chrRangeFromJpeg_24_c: 80.5
chrRangeFromJpeg_24_neon: 34.0
chrRangeFromJpeg_128_c: 413.7
chrRangeFromJpeg_128_neon: 156.0
chrRangeFromJpeg_144_c: 471.0
chrRangeFromJpeg_144_neon: 174.2
chrRangeFromJpeg_256_c: 842.0
chrRangeFromJpeg_256_neon: 305.5
chrRangeFromJpeg_512_c: 1699.0
chrRangeFromJpeg_512_neon: 608.0
chrRangeToJpeg_8_c: 51.7
chrRangeToJpeg_8_neon: 22.7
chrRangeToJpeg_24_c: 149.7
chrRangeToJpeg_24_neon: 38.0
chrRangeToJpeg_128_c: 761.7
chrRangeToJpeg_128_neon: 176.7
chrRangeToJpeg_144_c: 866.2
chrRangeToJpeg_144_neon: 198.7
chrRangeToJpeg_256_c: 1516.5
chrRangeToJpeg_256_neon: 348.7
chrRangeToJpeg_512_c: 3067.2
chrRangeToJpeg_512_neon: 692.7
lumRangeFromJpeg_8_c: 24.0
lumRangeFromJpeg_8_neon: 17.0
lumRangeFromJpeg_24_c: 56.7
lumRangeFromJpeg_24_neon: 21.0
lumRangeFromJpeg_128_c: 294.5
lumRangeFromJpeg_128_neon: 76.7
lumRangeFromJpeg_144_c: 332.5
lumRangeFromJpeg_144_neon: 86.7
lumRangeFromJpeg_256_c: 586.0
lumRangeFromJpeg_256_neon: 152.2
lumRangeFromJpeg_512_c: 1190.0
lumRangeFromJpeg_512_neon: 298.0
lumRangeToJpeg_8_c: 31.7
lumRangeToJpeg_8_neon: 19.5
lumRangeToJpeg_24_c: 83.5
lumRangeToJpeg_24_neon: 24.2
lumRangeToJpeg_128_c: 440.5
lumRangeToJpeg_128_neon: 91.0
lumRangeToJpeg_144_c: 504.2
lumRangeToJpeg_144_neon: 101.0
lumRangeToJpeg_256_c: 879.7
lumRangeToJpeg_256_neon: 177.2
lumRangeToJpeg_512_c: 1794.2
lumRangeToJpeg_512_neon: 354.0
---
 libswscale/aarch64/Makefile |  1 +
 libswscale/aarch64/range_convert_neon.S | 99 +
 libswscale/aarch64/swscale.c| 21 ++
 libswscale/swscale_internal.h   |  1 +
 libswscale/utils.c  |  4 +-
 5 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 libswscale/aarch64/range_convert_neon.S

diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
index adfd90a1b6..37ad960619 100644
--- a/libswscale/aarch64/Makefile
+++ b/libswscale/aarch64/Makefile
@@ -5,5 +5,6 @@ OBJS+= aarch64/rgb2rgb.o\
 NEON-OBJS   += aarch64/hscale.o \
aarch64/input.o  \
aarch64/output.o \
+   aarch64/range_convert_neon.o \
aarch64/rgb2rgb_neon.o   \
aarch64/yuv2rgb_neon.o   \
diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S
new file mode 100644
index 00..ea56dc2e32
--- /dev/null
+++ b/libswscale/aarch64/range_convert_neon.S
@@ -0,0 +1,99 @@
+/*
+ * Copyright (c) 2024 Ramiro Polla
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro lumConvertRange name, max, mult, offset, shift
+function ff_\name, export=1
+.if \max != 0
+        mov             w3, #\max
+        dup             v24.8h, w3
+.endif
+        mov             w3, #\mult
+        dup             v25.4s, w3
+        movz            w3, \offset & 0xffff
+        movk            w3, (\offset >> 16) & 0xffff, lsl #16
+        dup             v26.4s, w3
+1:
+        ld1             {v0.8h}, [x0]
+.if \max != 0
+        smin            v0.8h, v0.8h, v24.8h
+.endif
+        mov             v16.16b, v26.16b
+        mov             v18.16b, v26.16b
+        sxtl            v20.4s, v0.4h
+        sxtl2           v22.4s, v0.8h
+        mla             v16.4s, v20.4s, v25.4s
+        mla             v18.4s, v22.4s, v25.4s
+        shrn            v0.4h, v16.4s, #\shift
+        shrn2           v0.8h, v18.4s, #\shift
+        subs            w1, w1, #8
+        st1             {v0.8h}, [x0], #16
+        b.gt            1b
+        ret
+endfunc
+.endm
+
+.macro chrConvertRange name, max, mult, offset, shift
+function ff_\name, export=1
+.if \max != 0
+        mov             w3, #\max
+        dup             v24.8h, w3
+.endif
+        mov             w3, #\mult
+        dup             v25.4s, w3
+        movz            w3, \offset & 0xffff
+        movk            w3, (\offset >> 16) & 0xffff, lsl #16
+        dup             v26.4s, w3
+1:
+        ld1             {v0.8h}, [x0]
+        ld1             {v1.8h}, [x1]
+.if \max != 0
+smin
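
As a side note on the movz/movk pair in the macros above: the 32-bit
offset is loaded into w3 in two 16-bit halves. Below is a minimal C
sketch of that split, assuming 16-bit masks (0xffff), which is what a
movz/movk pair with lsl #16 requires. The helper names are hypothetical
and not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper: low 16 bits, the part a movz immediate would carry. */
static uint16_t low_half(int32_t offset)
{
    return (uint16_t)((uint32_t)offset & 0xffff);
}

/* Hypothetical helper: high 16 bits, the part a movk immediate with
 * lsl #16 would carry. */
static uint16_t high_half(int32_t offset)
{
    return (uint16_t)(((uint32_t)offset >> 16) & 0xffff);
}

/* Recombine the halves the way movz followed by movk fills a 32-bit
 * register. The cast back to int32_t assumes two's complement
 * wrap-around, which holds on the targets discussed here. */
static int32_t rebuild(uint16_t lo, uint16_t hi)
{
    return (int32_t)(((uint32_t)hi << 16) | lo);
}
```

With the chroma to-JPEG offset from this thread (-9289992), the halves
recombine to the original value, so negative offsets survive the split.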

[FFmpeg-devel] [PATCH v2 3/4] swscale/x86: add avx2 {lum, chr}ConvertRange

2024-06-11 Thread Ramiro Polla

chrRangeFromJpeg_8_c: 24.1
chrRangeFromJpeg_8_sse4: 16.1
chrRangeFromJpeg_8_avx2: 19.9
chrRangeFromJpeg_24_c: 72.6
chrRangeFromJpeg_24_sse4: 34.6
chrRangeFromJpeg_24_avx2: 30.9
chrRangeFromJpeg_128_c: 341.1
chrRangeFromJpeg_128_sse4: 160.9
chrRangeFromJpeg_128_avx2: 94.1
chrRangeFromJpeg_144_c: 381.9
chrRangeFromJpeg_144_sse4: 183.6
chrRangeFromJpeg_144_avx2: 108.9
chrRangeFromJpeg_256_c: 646.1
chrRangeFromJpeg_256_sse4: 320.4
chrRangeFromJpeg_256_avx2: 190.6
chrRangeFromJpeg_512_c: 1255.9
chrRangeFromJpeg_512_sse4: 654.1
chrRangeFromJpeg_512_avx2: 392.4
chrRangeToJpeg_8_c: 36.9
chrRangeToJpeg_8_sse4: 13.9
chrRangeToJpeg_8_avx2: 20.6
chrRangeToJpeg_24_c: 113.4
chrRangeToJpeg_24_sse4: 29.6
chrRangeToJpeg_24_avx2: 28.9
chrRangeToJpeg_128_c: 632.1
chrRangeToJpeg_128_sse4: 162.4
chrRangeToJpeg_128_avx2: 94.6
chrRangeToJpeg_144_c: 709.9
chrRangeToJpeg_144_sse4: 183.9
chrRangeToJpeg_144_avx2: 108.1
chrRangeToJpeg_256_c: 2672.9
chrRangeToJpeg_256_sse4: 334.4
chrRangeToJpeg_256_avx2: 190.6
chrRangeToJpeg_512_c: 2500.9
chrRangeToJpeg_512_sse4: 654.1
chrRangeToJpeg_512_avx2: 379.6
lumRangeFromJpeg_8_c: 10.9
lumRangeFromJpeg_8_sse4: 12.4
lumRangeFromJpeg_8_avx2: 17.6
lumRangeFromJpeg_24_c: 38.4
lumRangeFromJpeg_24_sse4: 16.9
lumRangeFromJpeg_24_avx2: 20.6
lumRangeFromJpeg_128_c: 233.6
lumRangeFromJpeg_128_sse4: 79.9
lumRangeFromJpeg_128_avx2: 51.6
lumRangeFromJpeg_144_c: 263.9
lumRangeFromJpeg_144_sse4: 90.1
lumRangeFromJpeg_144_avx2: 57.6
lumRangeFromJpeg_256_c: 436.9
lumRangeFromJpeg_256_sse4: 162.1
lumRangeFromJpeg_256_avx2: 100.6
lumRangeFromJpeg_512_c: 878.4
lumRangeFromJpeg_512_sse4: 335.1
lumRangeFromJpeg_512_avx2: 199.4
lumRangeToJpeg_8_c: 19.1
lumRangeToJpeg_8_sse4: 11.6
lumRangeToJpeg_8_avx2: 17.6
lumRangeToJpeg_24_c: 56.9
lumRangeToJpeg_24_sse4: 17.6
lumRangeToJpeg_24_avx2: 21.4
lumRangeToJpeg_128_c: 335.9
lumRangeToJpeg_128_sse4: 79.1
lumRangeToJpeg_128_avx2: 48.9
lumRangeToJpeg_144_c: 372.9
lumRangeToJpeg_144_sse4: 91.6
lumRangeToJpeg_144_avx2: 55.4
lumRangeToJpeg_256_c: 651.9
lumRangeToJpeg_256_sse4: 163.6
lumRangeToJpeg_256_avx2: 99.1
lumRangeToJpeg_512_c: 1289.9
lumRangeToJpeg_512_sse4: 333.6
lumRangeToJpeg_512_avx2: 211.1
---
 libswscale/x86/range_convert.asm | 46 ++--
 libswscale/x86/swscale.c |  5 +++-
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
index 13983a386b..54c2f64769 100644
--- a/libswscale/x86/range_convert.asm
+++ b/libswscale/x86/range_convert.asm
@@ -22,20 +22,20 @@
 
 SECTION_RODATA
 
-chr_to_mult:        times 4 dd 4663
-chr_to_offset:      times 4 dd -9289992
+chr_to_mult:        times 8 dd 4663
+chr_to_offset:      times 8 dd -9289992
 %define chr_to_shift 12
 
-chr_from_mult:      times 4 dd 1799
-chr_from_offset:    times 4 dd 4081085
+chr_from_mult:      times 8 dd 1799
+chr_from_offset:    times 8 dd 4081085
 %define chr_from_shift 11
 
-lum_to_mult:        times 4 dd 19077
-lum_to_offset:      times 4 dd -39057361
+lum_to_mult:        times 8 dd 19077
+lum_to_offset:      times 8 dd -39057361
 %define lum_to_shift 14
 
-lum_from_mult:      times 4 dd 14071
-lum_from_offset:    times 4 dd 33561947
+lum_from_mult:      times 8 dd 14071
+lum_from_offset:    times 8 dd 33561947
 %define lum_from_shift 14
 
 SECTION .text
@@ -66,10 +66,19 @@ cglobal %1, 2, 3, 3, dst, width, x
     paddd        m1, m5
     psrad        m0, %4
     psrad        m1, %4
+%if mmsize == 16
     packssdw     m0, m0
     packssdw     m1, m1
     movq         [dstq+xq*2], m0
     movq         [dstq+xq*2+mmsize/2], m1
+%else
+    vextracti128 xm7, ym0, 1
+    packssdw     xm0, xm7
+    vextracti128 xm7, ym1, 1
+    packssdw     xm1, xm7
+    movdqu       [dstq+xq*2], xm0
+    movdqu       [dstq+xq*2+mmsize/2], xm1
+%endif
     add          xq, mmsize / 2
     cmp          xd, widthd
     jl           .loop
@@ -107,6 +116,7 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x
     psrad        m1, %4
     psrad        m2, %4
     psrad        m3, %4
+%if mmsize == 16
     packssdw     m0, m0
     packssdw     m1, m1
     packssdw     m2, m2
@@ -115,6 +125,20 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x
     movq         [dstUq+xq*2+mmsize/2], m1
     movq         [dstVq+xq*2], m2
     movq         [dstVq+xq*2+mmsize/2], m3
+%else
+    vextracti128 xm7, ym0, 1
+    packssdw     xm0, xm7
+    vextracti128 xm7, ym1, 1
+    packssdw     xm1, xm7
+    vextracti128 xm7, ym2, 1
+    packssdw     xm2, xm7
+    vextracti128 xm7, ym3, 1
+    packssdw     xm3, xm7
+    movdqu       [dstUq+xq*2], xm0
+    movdqu       [dstUq+xq*2+mmsize/2], xm1
+    movdqu       [dstVq+xq*2], xm2
+    movdqu       [dstVq+xq*2+mmsize/2], xm3
+%endif
     add          xq, mmsize / 2
     cmp          xd, widthd
     jl           .loop
@@ -127,4 +151,10 @@ LUMCONVERTRANGE lumRangeToJpeg,   lum_to_mult,   lum_to_offset,   lum_to_shift
 CHRCONVERTRANGE chrRangeToJpeg,   chr_to_mult,   chr_to_offset,   chr_to_shift
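
One plausible reading of why the AVX2 branch above extracts the upper
128 bits before packing: the 256-bit form of packssdw works on each
128-bit lane separately, so packing a ymm register with itself does not
leave the eight results in order in the low half. The scalar C
emulation below is only a sketch of that behaviour. The function names
are hypothetical and nothing here is taken from the patch itself.

```c
#include <stdint.h>

/* Signed saturation from 32 to 16 bits, as packssdw does per element. */
static int16_t sat16(int32_t v)
{
    return v > 32767 ? 32767 : v < -32768 ? -32768 : (int16_t)v;
}

/* Emulates a 256-bit packssdw whose two sources are the same register
 * holding 8 dwords. Each 128-bit lane is packed on its own: 4 words
 * from the first source, then 4 words from the second source. */
static void pack_lanewise_same(const int32_t src[8], int16_t dst[16])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 4; i++) {
            dst[lane * 8 + i]     = sat16(src[lane * 4 + i]); /* first source  */
            dst[lane * 8 + 4 + i] = sat16(src[lane * 4 + i]); /* second source */
        }
    }
}

/* Emulates the extract-then-pack sequence: the upper lane is moved to
 * its own 128-bit register and a 128-bit packssdw combines low and
 * high, giving the 8 results in their original order. */
static void pack_extract(const int32_t src[8], int16_t dst[8])
{
    for (int i = 0; i < 4; i++) {
        dst[i]     = sat16(src[i]);     /* low lane  */
        dst[4 + i] = sat16(src[4 + i]); /* high lane */
    }
}
```

For an input of eight ascending values, pack_lanewise_same repeats the
first four values inside the low lane, while pack_extract keeps all
eight in sequence, which is the layout a 16-byte store needs.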

[FFmpeg-devel] [PATCH v2 2/4] swscale/x86: add sse4 {lum, chr}ConvertRange

2024-06-11 Thread Ramiro Polla

chrRangeFromJpeg_8_c: 28.7
chrRangeFromJpeg_8_sse4: 16.2
chrRangeFromJpeg_24_c: 152.7
chrRangeFromJpeg_24_sse4: 29.7
chrRangeFromJpeg_128_c: 366.5
chrRangeFromJpeg_128_sse4: 233.0
chrRangeFromJpeg_144_c: 408.0
chrRangeFromJpeg_144_sse4: 182.5
chrRangeFromJpeg_256_c: 698.7
chrRangeFromJpeg_256_sse4: 325.5
chrRangeFromJpeg_512_c: 1348.7
chrRangeFromJpeg_512_sse4: 660.2
chrRangeToJpeg_8_c: 37.7
chrRangeToJpeg_8_sse4: 16.2
chrRangeToJpeg_24_c: 115.7
chrRangeToJpeg_24_sse4: 36.2
chrRangeToJpeg_128_c: 631.2
chrRangeToJpeg_128_sse4: 163.7
chrRangeToJpeg_144_c: 710.7
chrRangeToJpeg_144_sse4: 183.0
chrRangeToJpeg_256_c: 1253.0
chrRangeToJpeg_256_sse4: 343.5
chrRangeToJpeg_512_c: 2491.2
chrRangeToJpeg_512_sse4: 654.2
lumRangeFromJpeg_8_c: 11.7
lumRangeFromJpeg_8_sse4: 10.5
lumRangeFromJpeg_24_c: 38.5
lumRangeFromJpeg_24_sse4: 19.0
lumRangeFromJpeg_128_c: 237.5
lumRangeFromJpeg_128_sse4: 79.2
lumRangeFromJpeg_144_c: 255.7
lumRangeFromJpeg_144_sse4: 90.5
lumRangeFromJpeg_256_c: 441.5
lumRangeFromJpeg_256_sse4: 161.7
lumRangeFromJpeg_512_c: 879.0
lumRangeFromJpeg_512_sse4: 333.2
lumRangeToJpeg_8_c: 20.0
lumRangeToJpeg_8_sse4: 11.7
lumRangeToJpeg_24_c: 61.5
lumRangeToJpeg_24_sse4: 17.7
lumRangeToJpeg_128_c: 357.5
lumRangeToJpeg_128_sse4: 80.0
lumRangeToJpeg_144_c: 371.5
lumRangeToJpeg_144_sse4: 93.2
lumRangeToJpeg_256_c: 651.5
lumRangeToJpeg_256_sse4: 164.5
lumRangeToJpeg_512_c: 1279.0
lumRangeToJpeg_512_sse4: 333.7
---
 libswscale/swscale_internal.h|   1 +
 libswscale/utils.c   |   2 +
 libswscale/x86/Makefile  |   1 +
 libswscale/x86/range_convert.asm | 130 +++
 libswscale/x86/swscale.c |  36 +
 5 files changed, 170 insertions(+)
 create mode 100644 libswscale/x86/range_convert.asm

diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index 5007dd422f..d5e7b5e71c 100644
--- a/libswscale/swscale_internal.h
+++ b/libswscale/swscale_internal.h
@@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY);
 
 av_cold void ff_sws_init_range_convert(SwsContext *c);
 av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c);
+av_cold void ff_sws_init_range_convert_x86(SwsContext *c);
 
 SwsFunc ff_yuv2rgb_init_x86(SwsContext *c);
 SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c);
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 476a24fea5..8dfa57b5ff 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4],
 ff_sws_init_range_convert(c);
 #if ARCH_LOONGARCH64
 ff_sws_init_range_convert_loongarch(c);
+#elif ARCH_X86
+ff_sws_init_range_convert_x86(c);
 #endif
 }
 
diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile
index 68391494be..f00154941d 100644
--- a/libswscale/x86/Makefile
+++ b/libswscale/x86/Makefile
@@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o \
x86/output.o \
x86/scale.o  \
x86/scale_avx2.o  \
+   x86/range_convert.o  \
x86/rgb_2_rgb.o  \
x86/yuv_2_rgb.o  \
x86/yuv2yuvX.o   \
diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
new file mode 100644
index 00..13983a386b
--- /dev/null
+++ b/libswscale/x86/range_convert.asm
@@ -0,0 +1,130 @@
+;**
+;* Copyright (c) 2024 Ramiro Polla
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;**
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION_RODATA
+
+chr_to_mult:        times 4 dd 4663
+chr_to_offset:      times 4 dd -9289992
+%define chr_to_shift 12
+
+chr_from_mult:      times 4 dd 1799
+chr_from_offset:    times 4 dd 4081085
+%define chr_from_shift 11
+
+lum_to_mult:times
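
For readers following along, the mult/offset/shift triples quoted in
the patches in this thread describe a fixed-point affine map applied to
each intermediate sample. Below is a minimal C sketch, assuming the
scalar form (x * mult + offset) >> shift. The helper name convert_range
is hypothetical, and the clamp to a maximum that the to-JPEG variants
apply before multiplying (the max argument of the aarch64 macro) is
deliberately left out.

```c
#include <stdint.h>

/* Hypothetical helper: fixed-point affine map (x * mult + offset) >> shift.
 * Assumes an arithmetic right shift for negative intermediates, as
 * common compilers provide. No input clamping is done here. */
static int16_t convert_range(int16_t x, int32_t mult, int32_t offset, int shift)
{
    return (int16_t)((x * mult + offset) >> shift);
}

/* Constants copied from the SECTION_RODATA tables quoted in this thread. */
static int16_t chr_to_jpeg(int16_t x)
{
    return convert_range(x, 4663, -9289992, 12);
}

static int16_t chr_from_jpeg(int16_t x)
{
    return convert_range(x, 1799, 4081085, 11);
}

static int16_t lum_to_jpeg(int16_t x)
{
    return convert_range(x, 19077, -39057361, 14);
}

static int16_t lum_from_jpeg(int16_t x)
{
    return convert_range(x, 14071, 33561947, 14);
}
```

With samples stored as 8-bit values shifted left by 7 (as the checkasm
test in this series generates them), limited-range luma 16 and 235 map
to full-range 0 and 255 at the same scale, and the from-JPEG direction
maps them back.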

[FFmpeg-devel] [PATCH v2 1/4] checkasm: add tests for {lum, chr}ConvertRange

2024-06-11 Thread Ramiro Polla

---
 tests/checkasm/Makefile   |   2 +-
 tests/checkasm/checkasm.c |   1 +
 tests/checkasm/checkasm.h |   1 +
 tests/checkasm/sw_range_convert.c | 134 ++
 4 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 tests/checkasm/sw_range_convert.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6eb94d10d5..f20732b37a 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -63,7 +63,7 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER)  += vf_convolution.o
 CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
 
 # swscale tests
-SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
+SWSCALEOBJS += sw_gbrp.o sw_range_convert.o sw_rgb.o sw_scale.o
 
 CHECKASMOBJS-$(CONFIG_SWSCALE)  += $(SWSCALEOBJS)
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 2329e2e1bc..56232ab1e0 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -251,6 +251,7 @@ static const struct {
 #endif
 #if CONFIG_SWSCALE
 { "sw_gbrp", checkasm_check_sw_gbrp },
+{ "sw_range_convert", checkasm_check_sw_range_convert },
 { "sw_rgb", checkasm_check_sw_rgb },
 { "sw_scale", checkasm_check_sw_scale },
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 211d7f52e6..e544007b67 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -119,6 +119,7 @@ void checkasm_check_rv40dsp(void);
 void checkasm_check_svq1enc(void);
 void checkasm_check_synth_filter(void);
 void checkasm_check_sw_gbrp(void);
+void checkasm_check_sw_range_convert(void);
 void checkasm_check_sw_rgb(void);
 void checkasm_check_sw_scale(void);
 void checkasm_check_takdsp(void);
diff --git a/tests/checkasm/sw_range_convert.c b/tests/checkasm/sw_range_convert.c
new file mode 100644
index 00..08029103d1
--- /dev/null
+++ b/tests/checkasm/sw_range_convert.c
@@ -0,0 +1,134 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+
+#include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem.h"
+#include "libavutil/mem_internal.h"
+
+#include "libswscale/swscale.h"
+#include "libswscale/swscale_internal.h"
+
+#include "checkasm.h"
+
+static void check_lumConvertRange(int from)
+{
+    const char *func_str = from ? "lumRangeFromJpeg" : "lumRangeToJpeg";
+#define LARGEST_INPUT_SIZE 512
+#define INPUT_SIZES 6
+    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    struct SwsContext *ctx;
+
+    LOCAL_ALIGNED_32(int16_t, dst0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dst1, [LARGEST_INPUT_SIZE]);
+
+    declare_func(void, int16_t *dst, int width);
+
+    ctx = sws_alloc_context();
+    if (sws_init_context(ctx, NULL, NULL) < 0)
+        fail();
+
+    ctx->srcFormat = from ? AV_PIX_FMT_YUVJ444P : AV_PIX_FMT_YUV444P;
+    ctx->dstFormat = from ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_YUVJ444P;
+    ctx->srcRange = from;
+    ctx->dstRange = !from;
+
+    for (int dstWi = 0; dstWi < INPUT_SIZES; dstWi++) {
+        int width = input_sizes[dstWi];
+        for (int i = 0; i < width; i++) {
+            uint8_t r = rnd();
+            dst0[i] = (int16_t) r << 7;
+            dst1[i] = (int16_t) r << 7;
+        }
+        ff_sws_init_scale(ctx);
+        if (check_func(ctx->lumConvertRange, "%s_%d", func_str, width)) {
+            call_ref(dst0, width);
+            call_new(dst1, width);
+            if (memcmp(dst0, dst1, width * sizeof(int16_t)))
+                fail();
+            bench_new(dst1, width);
+        }
+    }
+
+    sws_freeContext(ctx);
+}
+#undef LARGEST_INPUT_SIZE
+#undef INPUT_SIZES
+
+static void check_chrConvertRange(int from)
+{
+    const char *func_str = from ? "chrRangeFromJpeg" : "chrRangeToJpeg";
+#define LARGEST_INPUT_SIZE 512
+#define INPUT_SIZES 6
+    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    struct SwsContext *ctx;
+
+    LOCAL_ALIGNED_32(int16_t, dstU0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dstV0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dstU1, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dstV1, [LARGEST_INPUT_SIZE]);
+
+    declare_func(void, int16_t *dstU, int16_t *dstV, int width);
+
+

Re: [FFmpeg-devel] [PATCH 1/2] ffplay: add -scaling_quality option for SDL

2024-06-11 Thread Ramiro Polla

Hi,

On Mon, Jun 10, 2024 at 9:04 PM Marton Balint  wrote:
> On Tue, 4 Jun 2024, Ramiro Polla wrote:
> > On Thu, May 30, 2024 at 11:36 PM Ramiro Polla  
> > wrote:
> >>
> >> ---
> >>  doc/ffplay.texi  | 2 ++
> >>  fftools/ffplay.c | 6 +-
> >>  2 files changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/doc/ffplay.texi b/doc/ffplay.texi
> >> index 93f77eeece..60f883e159 100644
> >> --- a/doc/ffplay.texi
> >> +++ b/doc/ffplay.texi
> >> @@ -72,6 +72,8 @@ as 100.
> >>  Force format.
> >>  @item -window_title @var{title}
> >>  Set window title (default is the input filename).
> >> +@item -scaling_quality @var{value}
> >> +Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear").
> >>  @item -left @var{title}
> >>  Set the x position for the left of the window (default is a centered 
> >> window).
> >>  @item -top @var{title}
> >> diff --git a/fftools/ffplay.c b/fftools/ffplay.c
> >> index b9d11eecee..75d2bec777 100644
> >> --- a/fftools/ffplay.c
> >> +++ b/fftools/ffplay.c
> >> @@ -351,6 +351,7 @@ static int filter_nbthreads = 0;
> >>  static int enable_vulkan = 0;
> >>  static char *vulkan_params = NULL;
> >>  static const char *hwaccel = NULL;
> >> +static const char *scaling_quality = NULL;
> >>
> >>  /* current context */
> >>  static int is_full_screen;
> >> @@ -3683,6 +3684,7 @@ static const OptionDef options[] = {
> >>  { "framedrop",  OPT_TYPE_BOOL,   OPT_EXPERT, {  }, 
> >> "drop frames when cpu is too slow", "" },
> >>  { "infbuf", OPT_TYPE_BOOL,   OPT_EXPERT, { 
> >> _buffer }, "don't limit the input buffer size (useful with 
> >> realtime streams)", "" },
> >>  { "window_title",   OPT_TYPE_STRING,  0, { _title 
> >> }, "set window title", "window title" },
> >> +{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { 
> >> &scaling_quality }, "set SDL_HINT_RENDER_SCALE_QUALITY value 
> >> (default=linear)", "value" },
> >>  { "left",   OPT_TYPE_INT,OPT_EXPERT, { _left 
> >> }, "set the x position for the left of the window", "x pos" },
> >>  { "top",OPT_TYPE_INT,OPT_EXPERT, { _top }, 
> >> "set the y position for the top of the window", "y pos" },
> >>  { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { 
> >> .func_arg = opt_add_vfilter }, "set video filters", "filter_graph" },
> >> @@ -3831,7 +3833,9 @@ int main(int argc, char **argv)
> >>  }
> >>  }
> >>  window = SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, 
> >> SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags);
> >> -SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "linear");
> >> +if (!scaling_quality)
> >> +scaling_quality = "linear";
> >> +SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality);
> >>  if (!window) {
> >>  av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", 
> >> SDL_GetError());
> >>  do_exit(NULL);
> >> --
> >> 2.39.2
> >>
> >
> > Can anyone comment on this? I had a few doubts on this patch:
> > - does the option name properly convey its functionality?
> > - is the documentation too terse?
> > - should we include the accepted values in the documentation, even
> > though they are sdl-specific?
>
> What is the benefit of having such an option? I don't really see a strong use
> case for it. Also, do you want to propagate the scaling quality to the placebo
> backend as well? Does it actually make sense to do that?

I use this option to set scaling quality to "nearest" when I want the
display to be pixelated in fullscreen.

I haven't thought about the placebo backend. I'll have a look when I
get the time.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] libavcodec/mjpeg: keep last_dc value unclipped

2024-06-07 Thread Ramiro Polla

On Fri, Jun 7, 2024 at 9:35 PM Andreas Rheinhardt
 wrote:
> Ramiro Polla:
> > Do av_clip_int16(val) _after_ copying the value to last_dc.
> >
> > Related commits: c28f648b19d and dffae122d0f
> > Related ticket: 4683
> > ---
> >  libavcodec/mjpegdec.c| 3 +--
> >  tests/ref/fate/jpg-12bpp | 2 +-
> >  2 files changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
> > index 1481a7f285..7daec649bc 100644
> > --- a/libavcodec/mjpegdec.c
> > +++ b/libavcodec/mjpegdec.c
> > @@ -843,9 +843,8 @@ static int decode_block(MJpegDecodeContext *s, int16_t 
> > *block, int component,
> >  return AVERROR_INVALIDDATA;
> >  }
> >  val = val * (unsigned)quant_matrix[0] + s->last_dc[component];
> > -val = av_clip_int16(val);
> >  s->last_dc[component] = val;
> > -block[0] = val;
> > +block[0] = av_clip_int16(val);
> >  /* AC coefs */
> >  i = 0;
> > >  {OPEN_READER(re, &s->gb);
> > diff --git a/tests/ref/fate/jpg-12bpp b/tests/ref/fate/jpg-12bpp
> > index b3c662d587..9b039a92c6 100644
> > --- a/tests/ref/fate/jpg-12bpp
> > +++ b/tests/ref/fate/jpg-12bpp
> > @@ -3,4 +3,4 @@
> >  #codec_id 0: rawvideo
> >  #dimensions 0: 999x749
> >  #sar 0: 1/1
> > -0,  0,  0,1,  1496502, 0xd91deb4b
> > +0,  0,  0,1,  1496502, 0x44efc0af
>
> Is the change for the fate-sample supposed to be an improvement or what
> is the rationale for this? (Is this change mandated by the spec?)

As far as I can tell the only sample we have that triggers this is
buggy anyways, so it's not something spec-related. It seems more
correct to me to clip the values that overflow only for the block, and
not propagate the differences from the clipping to the next dc values.

This change comes from another project where I decouple the bitstream
reading from the processing. Currently we have this comment in
MJpegDecodeContext:
int last_dc[MAX_COMPONENTS]; /* last DEQUANTIZED dc (XXX: am I right
to do that ?) */

What I do is keep the last quantized dc values as they were read from
the bitstream, but then I have to add the dc shift for every block.
Since it incurs one extra add per block, I'm not submitting the entire
patch, but only this chunk.
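
For illustration, the behavioural difference can be sketched in plain C.
This is a minimal model, not the decoder code: clip_int16() stands in for
av_clip_int16(), and dc_clip_before_store() / dc_clip_after_store() are
hypothetical names for the old and the patched DC handling. With an
overflowing DC value, only the second variant lets the following block
see the unclipped predictor.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the int16_t range (stand-in for av_clip_int16()). */
static int clip_int16(int v)
{
    if (v < INT16_MIN) return INT16_MIN;
    if (v > INT16_MAX) return INT16_MAX;
    return v;
}

/* Old order: the clipped value becomes the predictor for the next block.
 * diff is the decoded DC difference, q the DC quantizer (hypothetical
 * parameters for this sketch). */
static int16_t dc_clip_before_store(int diff, int q, int *last_dc)
{
    int val;
    assert(q > 0);
    val = diff * q + *last_dc;
    val = clip_int16(val);
    *last_dc = val;
    return (int16_t)val;
}

/* Patched order: the predictor keeps the unclipped value and only the
 * coefficient written to the block is clipped. */
static int16_t dc_clip_after_store(int diff, int q, int *last_dc)
{
    int val;
    assert(q > 0);
    val = diff * q + *last_dc;
    *last_dc = val;
    return (int16_t)clip_int16(val);
}
```

Feeding DC differences of +40000 and then -10000 (quantizer 1, predictor
starting at 0) gives 32767 and 22767 with the old order, but 32767 and
30000 with the patched order.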

Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile

2024-06-07 Thread Ramiro Polla

On Fri, Jun 7, 2024 at 9:27 PM Andreas Rheinhardt wrote:
>
> Ramiro Polla:
> >  # swscale tests
> > -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> > +SWSCALEOBJS += sw_gbrp.o \
> > +   sw_rgb.o \
> > +   sw_scale.o \
> >
> >  CHECKASMOBJS-$(CONFIG_SWSCALE)  += $(SWSCALEOBJS)
>
> We typically only use a new line if the old line is full.

There's currently a mix of everything in the Makefiles. One object per
line, multiple objects per line, mix of one or multiple objects per
line in the same statement, aligned and unaligned += between lines,
aligned and unaligned \ at the end of the lines, some have \ at the
last line, some don't...

I personally prefer += one object per line and no \ at the end of the
line everywhere. It makes the code look consistent and the patches are
cleaner and easier to understand. But I don't maintain this, so I have
no strong opinion in this case.

This patch was meant to simplify the next commit (checkasm: add tests
for {lum,chr}ConvertRange), but I can drop it if you prefer.

[FFmpeg-devel] [PATCH] libavcodec/mjpeg: keep last_dc value unclipped

2024-06-07 Thread Ramiro Polla

Do av_clip_int16(val) _after_ copying the value to last_dc.

Related commits: c28f648b19d and dffae122d0f
Related ticket: 4683
---
 libavcodec/mjpegdec.c| 3 +--
 tests/ref/fate/jpg-12bpp | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
index 1481a7f285..7daec649bc 100644
--- a/libavcodec/mjpegdec.c
+++ b/libavcodec/mjpegdec.c
@@ -843,9 +843,8 @@ static int decode_block(MJpegDecodeContext *s, int16_t 
*block, int component,
 return AVERROR_INVALIDDATA;
 }
 val = val * (unsigned)quant_matrix[0] + s->last_dc[component];
-val = av_clip_int16(val);
 s->last_dc[component] = val;
-block[0] = val;
+block[0] = av_clip_int16(val);
 /* AC coefs */
 i = 0;
 {OPEN_READER(re, &s->gb);
diff --git a/tests/ref/fate/jpg-12bpp b/tests/ref/fate/jpg-12bpp
index b3c662d587..9b039a92c6 100644
--- a/tests/ref/fate/jpg-12bpp
+++ b/tests/ref/fate/jpg-12bpp
@@ -3,4 +3,4 @@
 #codec_id 0: rawvideo
 #dimensions 0: 999x749
 #sar 0: 1/1
-0,  0,  0,1,  1496502, 0xd91deb4b
+0,  0,  0,1,  1496502, 0x44efc0af
-- 
2.30.2


Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile

2024-06-07 Thread Ramiro Polla

On Fri, Jun 7, 2024 at 8:46 PM Andreas Rheinhardt wrote:
>
> Ramiro Polla:
> > ---
> >  tests/checkasm/Makefile | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
> > index 6eb94d10d5..3ce152e818 100644
> > --- a/tests/checkasm/Makefile
> > +++ b/tests/checkasm/Makefile
> > @@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER)  += 
> > vf_convolution.o
> >  CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
> >
> >  # swscale tests
> > -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> > +SWSCALEOBJS += sw_gbrp.o
> > +SWSCALEOBJS += sw_rgb.o
> > +SWSCALEOBJS += sw_scale.o
> >
> >  CHECKASMOBJS-$(CONFIG_SWSCALE)  += $(SWSCALEOBJS)
> >
>
> We use the multiple-objects in a line style in all Makefiles.

Then we should change the following:
libswscale/arm/Makefile (NEON_OBJS)
tests/checkasm/Makefile (AVUTILOBJS)
libavfilter/dnn/Makefile (OBJS-$(CONFIG_DNN))

New patch attached.
From 4965ece9648be5da6e93b6bfa319b6a5fe92aee6 Mon Sep 17 00:00:00 2001
From: Ramiro Polla 
Date: Thu, 6 Jun 2024 15:40:03 +0200
Subject: [PATCH] tests/checkasm: cosmetics, one object per line in Makefile

---
 tests/checkasm/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6eb94d10d5..c2a41d7f7b 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER)  += vf_convolution.o
 CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
 
 # swscale tests
-SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
+SWSCALEOBJS += sw_gbrp.o \
+   sw_rgb.o \
+   sw_scale.o \
 
 CHECKASMOBJS-$(CONFIG_SWSCALE)  += $(SWSCALEOBJS)
 
-- 
2.30.2


Re: [FFmpeg-devel] [PATCH] swscale/x86: add avx2 {lum, chr}ConvertRange

2024-06-07 Thread Ramiro Polla

On Fri, Jun 7, 2024 at 7:38 PM Ramiro Polla  wrote:
>
> chrRangeFromJpeg_8_c: 49.4
> chrRangeFromJpeg_8_sse4: 15.9
> chrRangeFromJpeg_8_avx2: 22.6
> chrRangeFromJpeg_24_c: 129.4
> chrRangeFromJpeg_24_sse4: 32.1
> chrRangeFromJpeg_24_avx2: 35.1
> chrRangeFromJpeg_128_c: 534.6
> chrRangeFromJpeg_128_sse4: 165.6
> chrRangeFromJpeg_128_avx2: 100.4
> chrRangeFromJpeg_144_c: 735.6
> chrRangeFromJpeg_144_sse4: 185.1
> chrRangeFromJpeg_144_avx2: 109.4
> chrRangeFromJpeg_256_c: 634.6
> chrRangeFromJpeg_256_sse4: 323.6
> chrRangeFromJpeg_256_avx2: 192.6
> chrRangeFromJpeg_512_c: 1242.4
> chrRangeFromJpeg_512_sse4: 662.1
> chrRangeFromJpeg_512_avx2: 409.1
> chrRangeToJpeg_8_c: 39.6
> chrRangeToJpeg_8_sse4: 15.9
> chrRangeToJpeg_8_avx2: 25.4
> chrRangeToJpeg_24_c: 118.9
> chrRangeToJpeg_24_sse4: 32.9
> chrRangeToJpeg_24_avx2: 30.1
> chrRangeToJpeg_128_c: 636.9
> chrRangeToJpeg_128_sse4: 164.4
> chrRangeToJpeg_128_avx2: 96.6
> chrRangeToJpeg_144_c: 716.4
> chrRangeToJpeg_144_sse4: 187.1
> chrRangeToJpeg_144_avx2: 109.4
> chrRangeToJpeg_256_c: 1258.6
> chrRangeToJpeg_256_sse4: 326.1
> chrRangeToJpeg_256_avx2: 193.9
> chrRangeToJpeg_512_c: 2489.4
> chrRangeToJpeg_512_sse4: 662.1
> chrRangeToJpeg_512_avx2: 382.4
> lumRangeFromJpeg_8_c: 13.6
> lumRangeFromJpeg_8_sse4: 14.4
> lumRangeFromJpeg_8_avx2: 19.6
> lumRangeFromJpeg_24_c: 38.9
> lumRangeFromJpeg_24_sse4: 18.9
> lumRangeFromJpeg_24_avx2: 23.9
> lumRangeFromJpeg_128_c: 239.4
> lumRangeFromJpeg_128_sse4: 81.9
> lumRangeFromJpeg_128_avx2: 51.6
> lumRangeFromJpeg_144_c: 285.1
> lumRangeFromJpeg_144_sse4: 92.1
> lumRangeFromJpeg_144_avx2: 59.6
> lumRangeFromJpeg_256_c: 857.1
> lumRangeFromJpeg_256_sse4: 164.4
> lumRangeFromJpeg_256_avx2: 101.9
> lumRangeFromJpeg_512_c: 1028.6
> lumRangeFromJpeg_512_sse4: 335.6
> lumRangeFromJpeg_512_avx2: 201.4
> lumRangeToJpeg_8_c: 20.4
> lumRangeToJpeg_8_sse4: 14.4
> lumRangeToJpeg_8_avx2: 20.4
> lumRangeToJpeg_24_c: 58.1
> lumRangeToJpeg_24_sse4: 18.9
> lumRangeToJpeg_24_avx2: 22.6
> lumRangeToJpeg_128_c: 327.4
> lumRangeToJpeg_128_sse4: 83.4
> lumRangeToJpeg_128_avx2: 53.6
> lumRangeToJpeg_144_c: 375.6
> lumRangeToJpeg_144_sse4: 93.9
> lumRangeToJpeg_144_avx2: 58.9
> lumRangeToJpeg_256_c: 649.6
> lumRangeToJpeg_256_sse4: 162.1
> lumRangeToJpeg_256_avx2: 101.9
> lumRangeToJpeg_512_c: 1289.1
> lumRangeToJpeg_512_sse4: 335.6
> lumRangeToJpeg_512_avx2: 201.4
> ---
>  libswscale/x86/range_convert.asm | 46 ++--
>  libswscale/x86/swscale.c |  5 +++-
>  2 files changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/libswscale/x86/range_convert.asm 
> b/libswscale/x86/range_convert.asm
> index 13983a386b..54c2f64769 100644
> --- a/libswscale/x86/range_convert.asm
> +++ b/libswscale/x86/range_convert.asm
[...]
> @@ -66,10 +66,19 @@ cglobal %1, 2, 3, 3, dst, width, x
>  paddd m1, m5
>  psrad m0, %4
>  psrad m1, %4
> +%if mmsize == 16
>  packssdw m0, m0
>  packssdw m1, m1
>  movq [dstq+xq*2], m0
>  movq [dstq+xq*2+mmsize/2], m1
> +%else
> +vextracti128 xm7, ym0, 1
> +packssdw xm0, xm7
> +vextracti128 xm7, ym1, 1
> +packssdw xm1, xm7
> +movdqu  [dstq+xq*2], xm0
> +movdqu  [dstq+xq*2+mmsize/2], xm1
> +%endif
>  add  xq, mmsize / 2
>  cmp  xd, widthd
>  jl .loop

Is there a cleaner way to do this packing in avx2 (or a macro to have
the same code as non-avx2)? Also is there some cleaner way to move
half the register into memory (instead of having to ifdef between
mmsize)?
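
As background for the question above, here is a scalar C model of the
pack step. It assumes that the 256-bit form of packssdw packs each
128-bit lane independently, which is how the instruction is documented.
The function names are made up for this sketch. It shows why a plain
256-bit pack does not leave the eight results of one register in order,
and why extracting the upper half first does.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturation from 32 to 16 bits, as packssdw does per element. */
static int16_t sat16(int32_t v)
{
    return v < INT16_MIN ? INT16_MIN : v > INT16_MAX ? INT16_MAX : (int16_t)v;
}

/* Model of 128-bit "packssdw a, b": a fills the low half, b the high half. */
static void pack128(int16_t out[8], const int32_t a[4], const int32_t b[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat16(a[i]);
        out[i + 4] = sat16(b[i]);
    }
}

/* Model of 256-bit "vpackssdw a, b": two independent 128-bit packs. */
static void pack256_lanewise(int16_t out[16], const int32_t a[8],
                             const int32_t b[8])
{
    pack128(out,     a,     b);     /* low lane  */
    pack128(out + 8, a + 4, b + 4); /* high lane */
}

/* Model of the patch: vextracti128 takes the upper four dwords, then a
 * 128-bit pack combines the lower and upper halves of the same register. */
static void pack_extract(int16_t out[8], const int32_t a[8])
{
    assert(out && a);
    pack128(out, a, a + 4);
}
```

With a = {0..7} and b = {8..15}, the lane-wise model yields
a0..a3, b0..b3, a4..a7, b4..b7, while pack_extract() yields a0..a7 in
order. A commonly used alternative is a lane-wise vpackssdw followed by
a vpermq that swaps the two middle quadwords; that variant is not
modelled here.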

[FFmpeg-devel] [PATCH] swscale/x86: add avx2 {lum, chr}ConvertRange

2024-06-07 Thread Ramiro Polla

chrRangeFromJpeg_8_c: 49.4
chrRangeFromJpeg_8_sse4: 15.9
chrRangeFromJpeg_8_avx2: 22.6
chrRangeFromJpeg_24_c: 129.4
chrRangeFromJpeg_24_sse4: 32.1
chrRangeFromJpeg_24_avx2: 35.1
chrRangeFromJpeg_128_c: 534.6
chrRangeFromJpeg_128_sse4: 165.6
chrRangeFromJpeg_128_avx2: 100.4
chrRangeFromJpeg_144_c: 735.6
chrRangeFromJpeg_144_sse4: 185.1
chrRangeFromJpeg_144_avx2: 109.4
chrRangeFromJpeg_256_c: 634.6
chrRangeFromJpeg_256_sse4: 323.6
chrRangeFromJpeg_256_avx2: 192.6
chrRangeFromJpeg_512_c: 1242.4
chrRangeFromJpeg_512_sse4: 662.1
chrRangeFromJpeg_512_avx2: 409.1
chrRangeToJpeg_8_c: 39.6
chrRangeToJpeg_8_sse4: 15.9
chrRangeToJpeg_8_avx2: 25.4
chrRangeToJpeg_24_c: 118.9
chrRangeToJpeg_24_sse4: 32.9
chrRangeToJpeg_24_avx2: 30.1
chrRangeToJpeg_128_c: 636.9
chrRangeToJpeg_128_sse4: 164.4
chrRangeToJpeg_128_avx2: 96.6
chrRangeToJpeg_144_c: 716.4
chrRangeToJpeg_144_sse4: 187.1
chrRangeToJpeg_144_avx2: 109.4
chrRangeToJpeg_256_c: 1258.6
chrRangeToJpeg_256_sse4: 326.1
chrRangeToJpeg_256_avx2: 193.9
chrRangeToJpeg_512_c: 2489.4
chrRangeToJpeg_512_sse4: 662.1
chrRangeToJpeg_512_avx2: 382.4
lumRangeFromJpeg_8_c: 13.6
lumRangeFromJpeg_8_sse4: 14.4
lumRangeFromJpeg_8_avx2: 19.6
lumRangeFromJpeg_24_c: 38.9
lumRangeFromJpeg_24_sse4: 18.9
lumRangeFromJpeg_24_avx2: 23.9
lumRangeFromJpeg_128_c: 239.4
lumRangeFromJpeg_128_sse4: 81.9
lumRangeFromJpeg_128_avx2: 51.6
lumRangeFromJpeg_144_c: 285.1
lumRangeFromJpeg_144_sse4: 92.1
lumRangeFromJpeg_144_avx2: 59.6
lumRangeFromJpeg_256_c: 857.1
lumRangeFromJpeg_256_sse4: 164.4
lumRangeFromJpeg_256_avx2: 101.9
lumRangeFromJpeg_512_c: 1028.6
lumRangeFromJpeg_512_sse4: 335.6
lumRangeFromJpeg_512_avx2: 201.4
lumRangeToJpeg_8_c: 20.4
lumRangeToJpeg_8_sse4: 14.4
lumRangeToJpeg_8_avx2: 20.4
lumRangeToJpeg_24_c: 58.1
lumRangeToJpeg_24_sse4: 18.9
lumRangeToJpeg_24_avx2: 22.6
lumRangeToJpeg_128_c: 327.4
lumRangeToJpeg_128_sse4: 83.4
lumRangeToJpeg_128_avx2: 53.6
lumRangeToJpeg_144_c: 375.6
lumRangeToJpeg_144_sse4: 93.9
lumRangeToJpeg_144_avx2: 58.9
lumRangeToJpeg_256_c: 649.6
lumRangeToJpeg_256_sse4: 162.1
lumRangeToJpeg_256_avx2: 101.9
lumRangeToJpeg_512_c: 1289.1
lumRangeToJpeg_512_sse4: 335.6
lumRangeToJpeg_512_avx2: 201.4
---
 libswscale/x86/range_convert.asm | 46 ++--
 libswscale/x86/swscale.c |  5 +++-
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
index 13983a386b..54c2f64769 100644
--- a/libswscale/x86/range_convert.asm
+++ b/libswscale/x86/range_convert.asm
@@ -22,20 +22,20 @@
 
 SECTION_RODATA
 
-chr_to_mult:        times 4 dd 4663
-chr_to_offset:      times 4 dd -9289992
+chr_to_mult:        times 8 dd 4663
+chr_to_offset:      times 8 dd -9289992
 %define chr_to_shift 12
 
-chr_from_mult:      times 4 dd 1799
-chr_from_offset:    times 4 dd 4081085
+chr_from_mult:      times 8 dd 1799
+chr_from_offset:    times 8 dd 4081085
 %define chr_from_shift 11
 
-lum_to_mult:        times 4 dd 19077
-lum_to_offset:      times 4 dd -39057361
+lum_to_mult:        times 8 dd 19077
+lum_to_offset:      times 8 dd -39057361
 %define lum_to_shift 14
 
-lum_from_mult:      times 4 dd 14071
-lum_from_offset:    times 4 dd 33561947
+lum_from_mult:      times 8 dd 14071
+lum_from_offset:    times 8 dd 33561947
 %define lum_from_shift 14
 
 SECTION .text
@@ -66,10 +66,19 @@ cglobal %1, 2, 3, 3, dst, width, x
 paddd m1, m5
 psrad m0, %4
 psrad m1, %4
+%if mmsize == 16
 packssdw m0, m0
 packssdw m1, m1
 movq [dstq+xq*2], m0
 movq [dstq+xq*2+mmsize/2], m1
+%else
+vextracti128 xm7, ym0, 1
+packssdw xm0, xm7
+vextracti128 xm7, ym1, 1
+packssdw xm1, xm7
+movdqu  [dstq+xq*2], xm0
+movdqu  [dstq+xq*2+mmsize/2], xm1
+%endif
 add  xq, mmsize / 2
 cmp  xd, widthd
 jl .loop
@@ -107,6 +116,7 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x
 psrad m1, %4
 psrad m2, %4
 psrad m3, %4
+%if mmsize == 16
 packssdw m0, m0
 packssdw m1, m1
 packssdw m2, m2
@@ -115,6 +125,20 @@ cglobal %1, 3, 4, 4, dstU, dstV, width, x
 movq   [dstUq+xq*2+mmsize/2], m1
 movq   [dstVq+xq*2], m2
 movq   [dstVq+xq*2+mmsize/2], m3
+%else
+vextracti128 xm7, ym0, 1
+packssdw xm0, xm7
+vextracti128 xm7, ym1, 1
+packssdw xm1, xm7
+vextracti128 xm7, ym2, 1
+packssdw xm2, xm7
+vextracti128 xm7, ym3, 1
+packssdw xm3, xm7
+movdqu [dstUq+xq*2], xm0
+movdqu [dstUq+xq*2+mmsize/2], xm1
+movdqu [dstVq+xq*2], xm2
+movdqu [dstVq+xq*2+mmsize/2], xm3
+%endif
 add  xq, mmsize / 2
 cmp  xd, widthd
 jl .loop
@@ -127,4 +151,10 @@ LUMCONVERTRANGE lumRangeToJpeg,   lum_to_mult,   
lum_to_offset,   lum_to_shift
 CHRCONVERTRANGE chrRangeToJpeg,   chr_to_mult,   chr_to_offset,
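
The constants in SECTION_RODATA above map onto a simple per-sample
formula. The following is a scalar C sketch, assuming the samples are the
15-bit intermediates used by swscale (an 8-bit value shifted left by 7,
as in the checkasm test) and that the final pack saturates to int16_t.
The helper names are hypothetical; only the multipliers, offsets and
shifts are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturation from 32 to 16 bits, as packssdw does per element. */
static int16_t sat16(int32_t v)
{
    return v < INT16_MIN ? INT16_MIN : v > INT16_MAX ? INT16_MAX : (int16_t)v;
}

/* out = sat16((x * mult + offset) >> shift)
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes an arithmetic shift, matching psrad. */
static int16_t convert_range(int16_t x, int32_t mult, int32_t offset, int shift)
{
    assert(shift > 0 && shift < 31);
    return sat16((x * mult + offset) >> shift);
}

static int16_t lum_to_jpeg(int16_t x)   { return convert_range(x, 19077, -39057361, 14); }
static int16_t lum_from_jpeg(int16_t x) { return convert_range(x, 14071,  33561947, 14); }
static int16_t chr_to_jpeg(int16_t x)   { return convert_range(x,  4663,  -9289992, 12); }
static int16_t chr_from_jpeg(int16_t x) { return convert_range(x,  1799,   4081085, 11); }
```

With these constants, limited-range luma 16 << 7 maps to 0 and 235 << 7
maps to 255 << 7, and lum_from_jpeg() maps 0 and 255 << 7 back to
16 << 7 and 235 << 7.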

Re: [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 {lum, chr}ConvertRange

2024-06-07 Thread Ramiro Polla

On Fri, Jun 7, 2024 at 4:05 PM Ramiro Polla  wrote:
>
> chrRangeFromJpeg_8_c: 19.9
> chrRangeFromJpeg_8_sse4: 16.2
> chrRangeFromJpeg_24_c: 60.7
> chrRangeFromJpeg_24_sse4: 28.9
> chrRangeFromJpeg_128_c: 325.7
> chrRangeFromJpeg_128_sse4: 160.2
> chrRangeFromJpeg_144_c: 364.2
> chrRangeFromJpeg_144_sse4: 194.9
> chrRangeFromJpeg_256_c: 630.7
> chrRangeFromJpeg_256_sse4: 337.4
> chrRangeFromJpeg_512_c: 1240.4
> chrRangeFromJpeg_512_sse4: 668.4
> chrRangeToJpeg_8_c: 37.7
> chrRangeToJpeg_8_sse4: 19.7
> chrRangeToJpeg_24_c: 114.7
> chrRangeToJpeg_24_sse4: 30.2
> chrRangeToJpeg_128_c: 636.4
> chrRangeToJpeg_128_sse4: 161.7
> chrRangeToJpeg_144_c: 715.7
> chrRangeToJpeg_144_sse4: 272.9
> chrRangeToJpeg_256_c: 1256.7
> chrRangeToJpeg_256_sse4: 341.9
> chrRangeToJpeg_512_c: 2498.7
> chrRangeToJpeg_512_sse4: 668.4
> lumRangeFromJpeg_8_c: 11.7
> lumRangeFromJpeg_8_sse4: 12.4
> lumRangeFromJpeg_24_c: 36.9
> lumRangeFromJpeg_24_sse4: 17.7
> lumRangeFromJpeg_128_c: 228.4
> lumRangeFromJpeg_128_sse4: 85.2
> lumRangeFromJpeg_144_c: 272.9
> lumRangeFromJpeg_144_sse4: 96.9
> lumRangeFromJpeg_256_c: 463.4
> lumRangeFromJpeg_256_sse4: 183.9
> lumRangeFromJpeg_512_c: 879.9
> lumRangeFromJpeg_512_sse4: 355.9
> lumRangeToJpeg_8_c: 17.7
> lumRangeToJpeg_8_sse4: 15.4
> lumRangeToJpeg_24_c: 56.2
> lumRangeToJpeg_24_sse4: 18.4
> lumRangeToJpeg_128_c: 331.4
> lumRangeToJpeg_128_sse4: 84.4
> lumRangeToJpeg_144_c: 375.2
> lumRangeToJpeg_144_sse4: 96.9
> lumRangeToJpeg_256_c: 649.7
> lumRangeToJpeg_256_sse4: 184.4
> lumRangeToJpeg_512_c: 1281.9
> lumRangeToJpeg_512_sse4: 355.9
> ---
>  libswscale/swscale_internal.h|   1 +
>  libswscale/utils.c   |   2 +
>  libswscale/x86/Makefile  |   1 +
>  libswscale/x86/range_convert.asm | 100 +++
>  libswscale/x86/swscale.c |  36 +++
>  5 files changed, 140 insertions(+)
>  create mode 100644 libswscale/x86/range_convert.asm
>
> diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
> index d4b0c3cee2..92f6105443 100644
> --- a/libswscale/swscale_internal.h
> +++ b/libswscale/swscale_internal.h
> @@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY);
>
>  av_cold void ff_sws_init_range_convert(SwsContext *c);
>  av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c);
> +av_cold void ff_sws_init_range_convert_x86(SwsContext *c);
>
>  SwsFunc ff_yuv2rgb_init_x86(SwsContext *c);
>  SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c);
> diff --git a/libswscale/utils.c b/libswscale/utils.c
> index 476a24fea5..8dfa57b5ff 100644
> --- a/libswscale/utils.c
> +++ b/libswscale/utils.c
> @@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, 
> const int inv_table[4],
>  ff_sws_init_range_convert(c);
>  #if ARCH_LOONGARCH64
>  ff_sws_init_range_convert_loongarch(c);
> +#elif ARCH_X86
> +ff_sws_init_range_convert_x86(c);
>  #endif
>  }
>
> diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile
> index 68391494be..f00154941d 100644
> --- a/libswscale/x86/Makefile
> +++ b/libswscale/x86/Makefile
> @@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o  
> \
> x86/output.o \
> x86/scale.o  \
> x86/scale_avx2.o  
> \
> +   x86/range_convert.o  \
> x86/rgb_2_rgb.o  \
> x86/yuv_2_rgb.o  \
> x86/yuv2yuvX.o   \
> diff --git a/libswscale/x86/range_convert.asm 
> b/libswscale/x86/range_convert.asm
> new file mode 100644
> index 00..333265fb65
> --- /dev/null
> +++ b/libswscale/x86/range_convert.asm
> @@ -0,0 +1,100 @@
> +;**
> +;* Copyright (c) 2024 Ramiro Polla
> +;*
> +;* This file is part of FFmpeg.
> +;*
> +;* FFmpeg is free software; you can redistribute it and/or
> +;* modify it under the terms of the GNU Lesser General Public
> +;* License as published by the Free Software Foundation; either
> +;* version 2.1 of the License, or (at your option) any later version.
> +;*
> +;* FFmpeg is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;* Lesser General Publi

[FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange

2024-06-07 Thread Ramiro Polla

chrRangeFromJpeg_8_c: 28.5
chrRangeFromJpeg_8_neon: 21.2
chrRangeFromJpeg_24_c: 81.2
chrRangeFromJpeg_24_neon: 34.7
chrRangeFromJpeg_128_c: 425.2
chrRangeFromJpeg_128_neon: 162.0
chrRangeFromJpeg_144_c: 480.2
chrRangeFromJpeg_144_neon: 180.2
chrRangeFromJpeg_256_c: 838.2
chrRangeFromJpeg_256_neon: 318.0
chrRangeFromJpeg_512_c: 1698.2
chrRangeFromJpeg_512_neon: 630.0
chrRangeToJpeg_8_c: 56.0
chrRangeToJpeg_8_neon: 23.5
chrRangeToJpeg_24_c: 147.7
chrRangeToJpeg_24_neon: 38.2
chrRangeToJpeg_128_c: 760.2
chrRangeToJpeg_128_neon: 182.5
chrRangeToJpeg_144_c: 857.7
chrRangeToJpeg_144_neon: 204.5
chrRangeToJpeg_256_c: 1504.2
chrRangeToJpeg_256_neon: 358.5
chrRangeToJpeg_512_c: 3025.7
chrRangeToJpeg_512_neon: 710.5
lumRangeFromJpeg_8_c: 24.0
lumRangeFromJpeg_8_neon: 18.2
lumRangeFromJpeg_24_c: 64.0
lumRangeFromJpeg_24_neon: 22.2
lumRangeFromJpeg_128_c: 289.2
lumRangeFromJpeg_128_neon: 79.2
lumRangeFromJpeg_144_c: 334.7
lumRangeFromJpeg_144_neon: 87.7
lumRangeFromJpeg_256_c: 579.5
lumRangeFromJpeg_256_neon: 152.0
lumRangeFromJpeg_512_c: 1208.0
lumRangeFromJpeg_512_neon: 299.0
lumRangeToJpeg_8_c: 30.0
lumRangeToJpeg_8_neon: 19.0
lumRangeToJpeg_24_c: 82.2
lumRangeToJpeg_24_neon: 24.0
lumRangeToJpeg_128_c: 440.7
lumRangeToJpeg_128_neon: 90.5
lumRangeToJpeg_144_c: 502.0
lumRangeToJpeg_144_neon: 102.2
lumRangeToJpeg_256_c: 893.7
lumRangeToJpeg_256_neon: 178.0
lumRangeToJpeg_512_c: 1793.7
lumRangeToJpeg_512_neon: 355.0
---
 libswscale/aarch64/Makefile |   1 +
 libswscale/aarch64/range_convert_neon.S | 103 
 libswscale/aarch64/swscale.c|  21 +
 libswscale/swscale_internal.h   |   1 +
 libswscale/utils.c  |   4 +-
 5 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 libswscale/aarch64/range_convert_neon.S

diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
index da1d909561..6923827f82 100644
--- a/libswscale/aarch64/Makefile
+++ b/libswscale/aarch64/Makefile
@@ -4,5 +4,6 @@ OBJS+= aarch64/rgb2rgb.o\
 
 NEON-OBJS   += aarch64/hscale.o \
aarch64/output.o \
+   aarch64/range_convert_neon.o \
aarch64/rgb2rgb_neon.o   \
aarch64/yuv2rgb_neon.o   \
diff --git a/libswscale/aarch64/range_convert_neon.S 
b/libswscale/aarch64/range_convert_neon.S
new file mode 100644
index 00..5e104971f0
--- /dev/null
+++ b/libswscale/aarch64/range_convert_neon.S
@@ -0,0 +1,103 @@
+/*
+ * Copyright (c) 2024 Ramiro Polla
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro lumConvertRange name max mult offset shift
+const offset_\name, align=4
+.word \offset, \offset, \offset, \offset
+endconst
+function ff_\name, export=1
+.if \max != 0
+mov w3, #\max
+dup v24.8h, w3
+.endif
+mov w3, #\mult
+dup v25.4s, w3
+movrel  x3, offset_\name
+ld1 {v26.4s}, [x3]
+1:
+ld1 {v0.8h}, [x0]
+.if \max != 0
+smin    v0.8h, v0.8h, v24.8h
+.endif
+mov v16.16b, v26.16b
+mov v18.16b, v26.16b
+sxtl    v20.4s, v0.4h
+sxtl2   v22.4s, v0.8h
+mla v16.4s, v20.4s, v25.4s
+mla v18.4s, v22.4s, v25.4s
+shrn    v0.4h, v16.4s, #\shift
+shrn2   v0.8h, v18.4s, #\shift
+subs    w1, w1, #8
+st1 {v0.8h}, [x0], #16
+b.gt    1b
+ret
+endfunc
+.endm
+
+.macro chrConvertRange name max mult offset shift
+const offset_\name, align=4
+.word \offset, \offset, \offset, \offset
+endconst
+function ff_\name, export=1
+.if \max != 0
+mov w3, #\max
+dup v24.8h, w3
+.endif
+mov w3, #\mult
+dup v25.4s, w3
+movrel  x3, offset_\name
+ld1 {v26.4s}, [x3]
+1:
+ld1 {v0.8h}, [x0]
+ld1 {v1.8h}, [x1]
+.if \max != 0
+smin    v0.8h, v0.8h, v24.8h
+

[FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 {lum, chr}ConvertRange

2024-06-07 Thread Ramiro Polla

chrRangeFromJpeg_8_c: 19.9
chrRangeFromJpeg_8_sse4: 16.2
chrRangeFromJpeg_24_c: 60.7
chrRangeFromJpeg_24_sse4: 28.9
chrRangeFromJpeg_128_c: 325.7
chrRangeFromJpeg_128_sse4: 160.2
chrRangeFromJpeg_144_c: 364.2
chrRangeFromJpeg_144_sse4: 194.9
chrRangeFromJpeg_256_c: 630.7
chrRangeFromJpeg_256_sse4: 337.4
chrRangeFromJpeg_512_c: 1240.4
chrRangeFromJpeg_512_sse4: 668.4
chrRangeToJpeg_8_c: 37.7
chrRangeToJpeg_8_sse4: 19.7
chrRangeToJpeg_24_c: 114.7
chrRangeToJpeg_24_sse4: 30.2
chrRangeToJpeg_128_c: 636.4
chrRangeToJpeg_128_sse4: 161.7
chrRangeToJpeg_144_c: 715.7
chrRangeToJpeg_144_sse4: 272.9
chrRangeToJpeg_256_c: 1256.7
chrRangeToJpeg_256_sse4: 341.9
chrRangeToJpeg_512_c: 2498.7
chrRangeToJpeg_512_sse4: 668.4
lumRangeFromJpeg_8_c: 11.7
lumRangeFromJpeg_8_sse4: 12.4
lumRangeFromJpeg_24_c: 36.9
lumRangeFromJpeg_24_sse4: 17.7
lumRangeFromJpeg_128_c: 228.4
lumRangeFromJpeg_128_sse4: 85.2
lumRangeFromJpeg_144_c: 272.9
lumRangeFromJpeg_144_sse4: 96.9
lumRangeFromJpeg_256_c: 463.4
lumRangeFromJpeg_256_sse4: 183.9
lumRangeFromJpeg_512_c: 879.9
lumRangeFromJpeg_512_sse4: 355.9
lumRangeToJpeg_8_c: 17.7
lumRangeToJpeg_8_sse4: 15.4
lumRangeToJpeg_24_c: 56.2
lumRangeToJpeg_24_sse4: 18.4
lumRangeToJpeg_128_c: 331.4
lumRangeToJpeg_128_sse4: 84.4
lumRangeToJpeg_144_c: 375.2
lumRangeToJpeg_144_sse4: 96.9
lumRangeToJpeg_256_c: 649.7
lumRangeToJpeg_256_sse4: 184.4
lumRangeToJpeg_512_c: 1281.9
lumRangeToJpeg_512_sse4: 355.9
---
 libswscale/swscale_internal.h|   1 +
 libswscale/utils.c   |   2 +
 libswscale/x86/Makefile  |   1 +
 libswscale/x86/range_convert.asm | 100 +++
 libswscale/x86/swscale.c |  36 +++
 5 files changed, 140 insertions(+)
 create mode 100644 libswscale/x86/range_convert.asm

diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index d4b0c3cee2..92f6105443 100644
--- a/libswscale/swscale_internal.h
+++ b/libswscale/swscale_internal.h
@@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY);
 
 av_cold void ff_sws_init_range_convert(SwsContext *c);
 av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c);
+av_cold void ff_sws_init_range_convert_x86(SwsContext *c);
 
 SwsFunc ff_yuv2rgb_init_x86(SwsContext *c);
 SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c);
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 476a24fea5..8dfa57b5ff 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, const 
int inv_table[4],
 ff_sws_init_range_convert(c);
 #if ARCH_LOONGARCH64
 ff_sws_init_range_convert_loongarch(c);
+#elif ARCH_X86
+ff_sws_init_range_convert_x86(c);
 #endif
 }
 
diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile
index 68391494be..f00154941d 100644
--- a/libswscale/x86/Makefile
+++ b/libswscale/x86/Makefile
@@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o
  \
x86/output.o \
x86/scale.o  \
x86/scale_avx2.o  \
+   x86/range_convert.o  \
x86/rgb_2_rgb.o  \
x86/yuv_2_rgb.o  \
x86/yuv2yuvX.o   \
diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
new file mode 100644
index 00..333265fb65
--- /dev/null
+++ b/libswscale/x86/range_convert.asm
@@ -0,0 +1,100 @@
+;**
+;* Copyright (c) 2024 Ramiro Polla
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;**
+
+%include "libavutil/x86/x86util.asm"
+
+; NOTE: there is no need to clamp the input when converting to jpeg range
+;   (like we do in the C code) because packssdw will saturate the output.
+
+;-
+; lumConvertRange

[FFmpeg-devel] [PATCH 2/4] checkasm: add tests for {lum, chr}ConvertRange

2024-06-07 Thread Ramiro Polla

---
 tests/checkasm/Makefile   |   1 +
 tests/checkasm/checkasm.c |   1 +
 tests/checkasm/checkasm.h |   1 +
 tests/checkasm/sw_range_convert.c | 134 ++
 4 files changed, 137 insertions(+)
 create mode 100644 tests/checkasm/sw_range_convert.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 3ce152e818..e4ec6a27ec 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -64,6 +64,7 @@ CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
 
 # swscale tests
 SWSCALEOBJS += sw_gbrp.o
+SWSCALEOBJS += sw_range_convert.o
 SWSCALEOBJS += sw_rgb.o
 SWSCALEOBJS += sw_scale.o
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index d7aa2a9c09..d2b50c023a 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -248,6 +248,7 @@ static const struct {
 #endif
 #if CONFIG_SWSCALE
 { "sw_gbrp", checkasm_check_sw_gbrp },
+{ "sw_range_convert", checkasm_check_sw_range_convert },
 { "sw_rgb", checkasm_check_sw_rgb },
 { "sw_scale", checkasm_check_sw_scale },
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 211d7f52e6..e544007b67 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -119,6 +119,7 @@ void checkasm_check_rv40dsp(void);
 void checkasm_check_svq1enc(void);
 void checkasm_check_synth_filter(void);
 void checkasm_check_sw_gbrp(void);
+void checkasm_check_sw_range_convert(void);
 void checkasm_check_sw_rgb(void);
 void checkasm_check_sw_scale(void);
 void checkasm_check_takdsp(void);
diff --git a/tests/checkasm/sw_range_convert.c 
b/tests/checkasm/sw_range_convert.c
new file mode 100644
index 00..6d7e22ad40
--- /dev/null
+++ b/tests/checkasm/sw_range_convert.c
@@ -0,0 +1,134 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+
+#include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem.h"
+#include "libavutil/mem_internal.h"
+
+#include "libswscale/swscale.h"
+#include "libswscale/swscale_internal.h"
+
+#include "checkasm.h"
+
+static void check_lumConvertRange(int from)
+{
+const char *func_str = from ? "lumRangeFromJpeg" : "lumRangeToJpeg";
+#define LARGEST_INPUT_SIZE 512
+#define INPUT_SIZES 6
+static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+struct SwsContext *ctx;
+
+LOCAL_ALIGNED_32(int16_t, dst0, [LARGEST_INPUT_SIZE]);
+LOCAL_ALIGNED_32(int16_t, dst1, [LARGEST_INPUT_SIZE]);
+
+declare_func(void, int16_t *dst, int width);
+
+ctx = sws_alloc_context();
+if (sws_init_context(ctx, NULL, NULL) < 0)
+fail();
+
+ctx->srcFormat = from ? AV_PIX_FMT_YUVJ444P : AV_PIX_FMT_YUV444P;
+ctx->dstFormat = from ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_YUVJ444P;
+ctx->srcRange = from;
+ctx->dstRange = !from;
+
+for (int dstWi = 0; dstWi < INPUT_SIZES; dstWi++) {
+int width = input_sizes[dstWi];
+for (int i = 0; i < width; i++) {
+uint8_t r = rnd();
+dst0[i] = (int16_t) r << 7;
+dst1[i] = (int16_t) r << 7;
+}
+ff_sws_init_scale(ctx);
+if (check_func(ctx->lumConvertRange, "%s_%d", func_str, width)) {
+call_ref(dst0, width);
+call_new(dst1, width);
+if (memcmp(dst0, dst1, width * sizeof(int16_t)))
+fail();
+bench_new(dst1, width);
+}
+}
+
+sws_freeContext(ctx);
+}
+#undef LARGEST_INPUT_SIZE
+#undef INPUT_SIZES
+
+static void check_chrConvertRange(int from)
+{
+const char *func_str = from ? "chrRangeFromJpeg" : "chrRangeToJpeg";
+#define LARGEST_INPUT_SIZE 512
+#define INPUT_SIZES 6
+static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+struct SwsContext *ctx;
+
+LOCAL_ALIGNED_32(int16_t, dstU0, [LARGEST_INPUT_SIZE]);
+LOCAL_ALIGNED_32(int16_t, dstV0, [LARGEST_INPUT_SIZE]);
+LOCAL_ALIGNED_32(int16_t, dstU1, [LARGEST_INPUT_SIZE]);
+LOCAL_ALIGNED_32(int16_t, dstV1, [LARGEST_INPUT_SIZE]);
+
+declare_func(void, int16_t *dstU, int16_t *dstV, int width);
+
+ctx = sws_alloc_context();
+if (sws_init_context(ctx, NULL, NULL) <

[FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile

2024-06-07 Thread Ramiro Polla

---
 tests/checkasm/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6eb94d10d5..3ce152e818 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER)  += vf_convolution.o
 CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
 
 # swscale tests
-SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
+SWSCALEOBJS += sw_gbrp.o
+SWSCALEOBJS += sw_rgb.o
+SWSCALEOBJS += sw_scale.o
 
 CHECKASMOBJS-$(CONFIG_SWSCALE)  += $(SWSCALEOBJS)
 
-- 
2.30.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] libswscale/x86/yuv_2_rgb: fix some comments

2024-06-05 Thread Ramiro Polla

On Tue, Jun 4, 2024 at 3:15 PM Ramiro Polla  wrote:
>
> ---
>  libswscale/x86/yuv_2_rgb.asm | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/libswscale/x86/yuv_2_rgb.asm b/libswscale/x86/yuv_2_rgb.asm
> index e3470fd9ad..a1f9134e08 100644
> --- a/libswscale/x86/yuv_2_rgb.asm
> +++ b/libswscale/x86/yuv_2_rgb.asm
> @@ -195,15 +195,15 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, 
> parameters
>  mova m5, m7
>  paddsw m3, m0 ; B1 B3 B5 B7 ...
>  paddsw m5, m1 ; R1 R3 R5 R7 ...
> -paddsw m7, m2 ; G1 G3 G4 G7 ...
> +paddsw m7, m2 ; G1 G3 G5 G7 ...
>  paddsw m0, m6 ; B0 B2 B4 B6 ...
>  paddsw m1, m6 ; R0 R2 R4 R6 ...
>  paddsw m2, m6 ; G0 G2 G4 G6 ...
>
>  %if %3 == 24 ; PACK RGB24
>  %define depth 3
> -packuswb m0, m3 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
> -packuswb m1, m5 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
> +packuswb m0, m3 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
> +packuswb m1, m5 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
>  packuswb m2, m7 ; G0 G2 G4 G6 ... G1 G3 G5 G7 ...
>  mova m3, m_red
>  mova m6, m_blue
> @@ -248,7 +248,7 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
>  psrlq m5, 32
>  movd [imageq + 20], m2 ; -- -- G7 B7
>  movd [imageq + 18], m5 ; R6 G6 B6 R7
> -%endif ; mmsize = 8
> +%endif ; cpuflag
>  %else ; mmsize == 16
>  pshufb m3, [rgb24_shuf1] ; r0  g0  r6  g6  r12 g12 r2  g2  r8  g8  r14 
> g14 r4  g4  r10 g10
>  pshufb m6, [rgb24_shuf2] ; b10 r11 b0  r1  b6  r7  b12 r13 b2  r3  b8  
> r9  b14 r15 b4  r5
> --
> 2.30.2
>

I'll apply tomorrow.

Re: [FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)

2024-06-05 Thread Ramiro Polla

On Tue, Jun 4, 2024 at 2:54 PM Ramiro Polla  wrote:
> On Thu, May 30, 2024 at 11:24 PM Sean McGovern  wrote:
> > On Thu, May 30, 2024 at 5:20 PM Ramiro Polla  wrote:
> > >
> > > ---
> > >  libavcodec/libxvid.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c
> > > index b9ac39429d..a490f16b3f 100644
> > > --- a/libavcodec/libxvid.c
> > > +++ b/libavcodec/libxvid.c
> > > @@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext 
> > > *avctx)
> > >
> > >  /* Decide how we should decide blocks */
> > >  switch (avctx->mb_decision) {
> > > -case 2:
> > > +case FF_MB_DECISION_RD:
> > >  x->vop_flags |=  XVID_VOP_MODEDECISION_RD;
> > >  x->me_flags  |=  XVID_ME_HALFPELREFINE8_RD|
> > >   XVID_ME_QUARTERPELREFINE8_RD |
> > >   XVID_ME_EXTSEARCH_RD |
> > >   XVID_ME_CHECKPREDICTION_RD;
> > > -case 1:
> > > +case FF_MB_DECISION_BITS:
> > >  if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD))
> > >  x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD;
> > >  x->me_flags |= XVID_ME_HALFPELREFINE16_RD |
> > > --
> > > 2.30.2
> > >
> >
> > This gets a +1 from me.
>
> I'll apply tomorrow.

Pushed.

Re: [FFmpeg-devel] [PATCH] avcodec/mpegvideo_enc: give magic number a name

2024-06-05 Thread Ramiro Polla

On Wed, Jun 5, 2024 at 1:51 AM Michael Niedermayer
 wrote:
> On Tue, Jun 04, 2024 at 03:05:35PM +0200, Ramiro Polla wrote:
> > ---
> >  libavcodec/mpegvideo_enc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> LGTM

Pushed.

[FFmpeg-devel] [PATCH] libswscale/x86/yuv_2_rgb: fix some comments

2024-06-04 Thread Ramiro Polla

---
 libswscale/x86/yuv_2_rgb.asm | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libswscale/x86/yuv_2_rgb.asm b/libswscale/x86/yuv_2_rgb.asm
index e3470fd9ad..a1f9134e08 100644
--- a/libswscale/x86/yuv_2_rgb.asm
+++ b/libswscale/x86/yuv_2_rgb.asm
@@ -195,15 +195,15 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
 mova m5, m7
 paddsw m3, m0 ; B1 B3 B5 B7 ...
 paddsw m5, m1 ; R1 R3 R5 R7 ...
-paddsw m7, m2 ; G1 G3 G4 G7 ...
+paddsw m7, m2 ; G1 G3 G5 G7 ...
 paddsw m0, m6 ; B0 B2 B4 B6 ...
 paddsw m1, m6 ; R0 R2 R4 R6 ...
 paddsw m2, m6 ; G0 G2 G4 G6 ...
 
 %if %3 == 24 ; PACK RGB24
 %define depth 3
-packuswb m0, m3 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
-packuswb m1, m5 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
+packuswb m0, m3 ; B0 B2 B4 B6 ... B1 B3 B5 B7 ...
+packuswb m1, m5 ; R0 R2 R4 R6 ... R1 R3 R5 R7 ...
 packuswb m2, m7 ; G0 G2 G4 G6 ... G1 G3 G5 G7 ...
 mova m3, m_red
 mova m6, m_blue
@@ -248,7 +248,7 @@ cglobal %1_420_%2%3, GPR_num, GPR_num, reg_num, parameters
 psrlq m5, 32
 movd [imageq + 20], m2 ; -- -- G7 B7
 movd [imageq + 18], m5 ; R6 G6 B6 R7
-%endif ; mmsize = 8
+%endif ; cpuflag
 %else ; mmsize == 16
 pshufb m3, [rgb24_shuf1] ; r0  g0  r6  g6  r12 g12 r2  g2  r8  g8  r14 g14 
r4  g4  r10 g10
 pshufb m6, [rgb24_shuf2] ; b10 r11 b0  r1  b6  r7  b12 r13 b2  r3  b8  r9  
b14 r15 b4  r5
-- 
2.30.2


[FFmpeg-devel] [PATCH] avcodec/mpegvideo_enc: give magic number a name

2024-06-04 Thread Ramiro Polla

---
 libavcodec/mpegvideo_enc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 73a9082265..82bab43e14 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -562,7 +562,7 @@ av_cold int ff_mpv_encode_init(AVCodecContext *avctx)
 
 if ((s->mpv_flags & FF_MPV_FLAG_QP_RD) &&
 avctx->mb_decision != FF_MB_DECISION_RD) {
-av_log(avctx, AV_LOG_ERROR, "QP RD needs mbd=2\n");
+av_log(avctx, AV_LOG_ERROR, "QP RD needs mbd=rd\n");
 return AVERROR(EINVAL);
 }
 
-- 
2.30.2


Re: [FFmpeg-devel] [PATCH 1/2] ffplay: add -scaling_quality option for SDL

2024-06-04 Thread Ramiro Polla

On Thu, May 30, 2024 at 11:36 PM Ramiro Polla  wrote:
>
> ---
>  doc/ffplay.texi  | 2 ++
>  fftools/ffplay.c | 6 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/doc/ffplay.texi b/doc/ffplay.texi
> index 93f77eeece..60f883e159 100644
> --- a/doc/ffplay.texi
> +++ b/doc/ffplay.texi
> @@ -72,6 +72,8 @@ as 100.
>  Force format.
>  @item -window_title @var{title}
>  Set window title (default is the input filename).
> +@item -scaling_quality @var{value}
> +Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear").
>  @item -left @var{title}
>  Set the x position for the left of the window (default is a centered window).
>  @item -top @var{title}
> diff --git a/fftools/ffplay.c b/fftools/ffplay.c
> index b9d11eecee..75d2bec777 100644
> --- a/fftools/ffplay.c
> +++ b/fftools/ffplay.c
> @@ -351,6 +351,7 @@ static int filter_nbthreads = 0;
>  static int enable_vulkan = 0;
>  static char *vulkan_params = NULL;
>  static const char *hwaccel = NULL;
> +static const char *scaling_quality = NULL;
>
>  /* current context */
>  static int is_full_screen;
> @@ -3683,6 +3684,7 @@ static const OptionDef options[] = {
>  { "framedrop",  OPT_TYPE_BOOL,   OPT_EXPERT, { &framedrop }, 
> "drop frames when cpu is too slow", "" },
>  { "infbuf", OPT_TYPE_BOOL,   OPT_EXPERT, { &infinite_buffer 
> }, "don't limit the input buffer size (useful with realtime streams)", "" },
>  { "window_title",   OPT_TYPE_STRING,  0, { &window_title }, 
> "set window title", "window title" },
> +{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { &scaling_quality 
> }, "set SDL_HINT_RENDER_SCALE_QUALITY value (default=linear)", "value" },
>  { "left",   OPT_TYPE_INT,OPT_EXPERT, { &screen_left }, 
> "set the x position for the left of the window", "x pos" },
>  { "top",OPT_TYPE_INT,OPT_EXPERT, { &screen_top }, 
> "set the y position for the top of the window", "y pos" },
>  { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { 
> .func_arg = opt_add_vfilter }, "set video filters", "filter_graph" },
> @@ -3831,7 +3833,9 @@ int main(int argc, char **argv)
>  }
>  }
>  window = SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, 
> SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags);
> -SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "linear");
> +if (!scaling_quality)
> +scaling_quality = "linear";
> +SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality);
>  if (!window) {
>  av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", 
> SDL_GetError());
>  do_exit(NULL);
> --
> 2.39.2
>

Can anyone comment on this? I had a few doubts on this patch:
- does the option name properly convey its functionality?
- is the documentation too terse?
- should we include the accepted values in the documentation, even
though they are sdl-specific?
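
On the last point, SDL2 documents three settings for this hint, each with a numeric and a named spelling: "0"/"nearest", "1"/"linear" and "2"/"best". A minimal sketch of how those values could be listed or checked follows; the helper and table names are hypothetical and not part of ffplay or SDL, and the exact-match comparison is a simplification.

```c
#include <assert.h>
#include <string.h>

/* Spellings documented by SDL2 for SDL_HINT_RENDER_SCALE_QUALITY. */
static const char *const toy_scale_quality_values[] = {
    "0", "nearest",
    "1", "linear",
    "2", "best",
};

/* Hypothetical helper: returns 1 if value is one of the documented
 * spellings, 0 otherwise (including for NULL). */
static int toy_is_known_scale_quality(const char *value)
{
    size_t n = sizeof(toy_scale_quality_values) /
               sizeof(toy_scale_quality_values[0]);

    if (!value)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (!strcmp(value, toy_scale_quality_values[i]))
            return 1;
    return 0;
}
```

The patch itself passes the string straight to SDL_SetHint() without any check, so a table like this would only matter for the documentation or for a warning message.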

Re: [FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)

2024-06-04 Thread Ramiro Polla

On Thu, May 30, 2024 at 11:24 PM Sean McGovern  wrote:
> On Thu, May 30, 2024 at 5:20 PM Ramiro Polla  wrote:
> >
> > ---
> >  libavcodec/libxvid.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c
> > index b9ac39429d..a490f16b3f 100644
> > --- a/libavcodec/libxvid.c
> > +++ b/libavcodec/libxvid.c
> > @@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext 
> > *avctx)
> >
> >  /* Decide how we should decide blocks */
> >  switch (avctx->mb_decision) {
> > -case 2:
> > +case FF_MB_DECISION_RD:
> >  x->vop_flags |=  XVID_VOP_MODEDECISION_RD;
> >  x->me_flags  |=  XVID_ME_HALFPELREFINE8_RD|
> >   XVID_ME_QUARTERPELREFINE8_RD |
> >   XVID_ME_EXTSEARCH_RD |
> >   XVID_ME_CHECKPREDICTION_RD;
> > -case 1:
> > +case FF_MB_DECISION_BITS:
> >  if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD))
> >  x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD;
> >  x->me_flags |= XVID_ME_HALFPELREFINE16_RD |
> > --
> > 2.30.2
> >
>
> This gets a +1 from me.

I'll apply tomorrow.

[FFmpeg-devel] [PATCH 2/2] ffplay: set default scaling_quality to "best" instead of "linear"

2024-05-30 Thread Ramiro Polla

These values are aliases in SDL, but "best" is a more intuitive name.
---
 doc/ffplay.texi  | 2 +-
 fftools/ffplay.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/ffplay.texi b/doc/ffplay.texi
index 60f883e159..e7ff62ae16 100644
--- a/doc/ffplay.texi
+++ b/doc/ffplay.texi
@@ -73,7 +73,7 @@ Force format.
 @item -window_title @var{title}
 Set window title (default is the input filename).
 @item -scaling_quality @var{value}
-Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear").
+Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "best").
 @item -left @var{title}
 Set the x position for the left of the window (default is a centered window).
 @item -top @var{title}
diff --git a/fftools/ffplay.c b/fftools/ffplay.c
index 75d2bec777..6575ad14a7 100644
--- a/fftools/ffplay.c
+++ b/fftools/ffplay.c
@@ -3684,7 +3684,7 @@ static const OptionDef options[] = {
 { "framedrop",  OPT_TYPE_BOOL,   OPT_EXPERT, { &framedrop }, "drop 
frames when cpu is too slow", "" },
 { "infbuf", OPT_TYPE_BOOL,   OPT_EXPERT, { &infinite_buffer }, 
"don't limit the input buffer size (useful with realtime streams)", "" },
 { "window_title",   OPT_TYPE_STRING,  0, { &window_title }, 
"set window title", "window title" },
-{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { &scaling_quality }, 
"set SDL_HINT_RENDER_SCALE_QUALITY value (default=linear)", "value" },
+{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { &scaling_quality }, 
"set SDL_HINT_RENDER_SCALE_QUALITY value (default=best)", "value" },
 { "left",   OPT_TYPE_INT,OPT_EXPERT, { &screen_left }, 
"set the x position for the left of the window", "x pos" },
 { "top",OPT_TYPE_INT,OPT_EXPERT, { &screen_top }, "set 
the y position for the top of the window", "y pos" },
 { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { 
.func_arg = opt_add_vfilter }, "set video filters", "filter_graph" },
@@ -3834,7 +3834,7 @@ int main(int argc, char **argv)
 }
 window = SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, 
SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags);
 if (!scaling_quality)
-scaling_quality = "linear";
+scaling_quality = "best";
 SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality);
 if (!window) {
 av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", 
SDL_GetError());
-- 
2.39.2


[FFmpeg-devel] [PATCH 1/2] ffplay: add -scaling_quality option for SDL

2024-05-30 Thread Ramiro Polla

---
 doc/ffplay.texi  | 2 ++
 fftools/ffplay.c | 6 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/ffplay.texi b/doc/ffplay.texi
index 93f77eeece..60f883e159 100644
--- a/doc/ffplay.texi
+++ b/doc/ffplay.texi
@@ -72,6 +72,8 @@ as 100.
 Force format.
 @item -window_title @var{title}
 Set window title (default is the input filename).
+@item -scaling_quality @var{value}
+Set SDL_HINT_RENDER_SCALE_QUALITY value (default is "linear").
 @item -left @var{title}
 Set the x position for the left of the window (default is a centered window).
 @item -top @var{title}
diff --git a/fftools/ffplay.c b/fftools/ffplay.c
index b9d11eecee..75d2bec777 100644
--- a/fftools/ffplay.c
+++ b/fftools/ffplay.c
@@ -351,6 +351,7 @@ static int filter_nbthreads = 0;
 static int enable_vulkan = 0;
 static char *vulkan_params = NULL;
 static const char *hwaccel = NULL;
+static const char *scaling_quality = NULL;
 
 /* current context */
 static int is_full_screen;
@@ -3683,6 +3684,7 @@ static const OptionDef options[] = {
 { "framedrop",  OPT_TYPE_BOOL,   OPT_EXPERT, { &framedrop }, "drop 
frames when cpu is too slow", "" },
 { "infbuf", OPT_TYPE_BOOL,   OPT_EXPERT, { &infinite_buffer }, 
"don't limit the input buffer size (useful with realtime streams)", "" },
 { "window_title",   OPT_TYPE_STRING,  0, { &window_title }, 
"set window title", "window title" },
+{ "scaling_quality",OPT_TYPE_STRING, OPT_EXPERT, { &scaling_quality }, 
"set SDL_HINT_RENDER_SCALE_QUALITY value (default=linear)", "value" },
 { "left",   OPT_TYPE_INT,OPT_EXPERT, { &screen_left }, 
"set the x position for the left of the window", "x pos" },
 { "top",OPT_TYPE_INT,OPT_EXPERT, { &screen_top }, "set 
the y position for the top of the window", "y pos" },
 { "vf", OPT_TYPE_FUNC, OPT_FUNC_ARG | OPT_EXPERT, { 
.func_arg = opt_add_vfilter }, "set video filters", "filter_graph" },
@@ -3831,7 +3833,9 @@ int main(int argc, char **argv)
 }
 }
 window = SDL_CreateWindow(program_name, SDL_WINDOWPOS_UNDEFINED, 
SDL_WINDOWPOS_UNDEFINED, default_width, default_height, flags);
-SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "linear");
+if (!scaling_quality)
+scaling_quality = "linear";
+SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, scaling_quality);
 if (!window) {
 av_log(NULL, AV_LOG_FATAL, "Failed to create window: %s", 
SDL_GetError());
 do_exit(NULL);
-- 
2.39.2


[FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)

2024-05-30 Thread Ramiro Polla

---
 libavcodec/libxvid.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c
index b9ac39429d..a490f16b3f 100644
--- a/libavcodec/libxvid.c
+++ b/libavcodec/libxvid.c
@@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext *avctx)
 
 /* Decide how we should decide blocks */
 switch (avctx->mb_decision) {
-case 2:
+case FF_MB_DECISION_RD:
 x->vop_flags |=  XVID_VOP_MODEDECISION_RD;
 x->me_flags  |=  XVID_ME_HALFPELREFINE8_RD|
  XVID_ME_QUARTERPELREFINE8_RD |
  XVID_ME_EXTSEARCH_RD |
  XVID_ME_CHECKPREDICTION_RD;
-case 1:
+case FF_MB_DECISION_BITS:
 if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD))
 x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD;
 x->me_flags |= XVID_ME_HALFPELREFINE16_RD |
-- 
2.30.2


Re: [FFmpeg-devel] [PATCH v3 1/2] checkasm: add test for fdct

2024-05-13 Thread Ramiro Polla

On Mon, May 13, 2024 at 6:49 PM James Almer  wrote:
> On 5/6/2024 2:49 PM, Rémi Denis-Courmont wrote:
> > On Monday, 6 May 2024, at 20.18.11 EEST, Ramiro Polla wrote:
> >> I'll send a similar patch to fix checkasm/idctdsp after this is merged.
> >
> > The idctdsp test does not actually test the iDCT, but only the trivial-ish
> > add/put helpers, so it does not care about the context. You're welcome to
> > fix it anyway of course.
>
> I personally find it ugly how we're storing a whole AVCodecContext on
> stack in these tests just to pass two ints to an init function.
> Maybe we can make said values be input parameters for these instead of a
> pointer to avctx.

It could make sense for fdct, but for idct we need a few more
parameters (bits_per_raw_sample, codec_id, flags, idct_algo, lowres).
That would make the function calls much longer, and in that case I'd
prefer just keeping AVCodecContext.
Or having an input parameter structure for each *dsp context, but that
seems a bit overkill.
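
To make the last option concrete, here is a minimal sketch of what a per-dsp input parameter structure could look like for idct. All names (IDCTInitParams, ToyIDCTContext, toy_idctdsp_init) are hypothetical and not part of FFmpeg, and the init body is a simplified stand-in rather than the real idctdsp logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter bundle holding only the fields an idct init
 * would read; the field names mirror the AVCodecContext members listed
 * above. */
typedef struct IDCTInitParams {
    int bits_per_raw_sample;
    int codec_id;
    int flags;
    int idct_algo;
    int lowres;
} IDCTInitParams;

/* Hypothetical stand-in for a dsp context. */
typedef struct ToyIDCTContext {
    int high_bit_depth;
    int block_size;
} ToyIDCTContext;

/* Illustrative init that takes the small struct instead of a whole
 * codec context. */
static void toy_idctdsp_init(ToyIDCTContext *c, const IDCTInitParams *p)
{
    assert(c != NULL && p != NULL);
    c->high_bit_depth = p->bits_per_raw_sample > 8;
    c->block_size     = 8 >> p->lowres; /* lowres 0..3 -> 8, 4, 2, 1 */
}
```

A test would then fill only the fields it cares about with designated initializers, but every *dsp context would need its own structure like this.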

Re: [FFmpeg-devel] [PATCH v3 0/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-05-13 Thread Ramiro Polla

On Wed, Apr 17, 2024 at 10:49 PM Martin Storsjö  wrote:
> On Wed, 17 Apr 2024, Ramiro Polla wrote:
> > This patch set adds fdct to checkasm and neon-optimized fdct for aarch64.
> >
> > Ramiro Polla (2):
> >  checkasm: add test for fdct
> >  lavc/aarch64/fdct: add neon-optimized fdct for aarch64
> >
> > libavcodec/aarch64/Makefile   |   2 +
> > libavcodec/aarch64/fdct.h |  26 ++
> > libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
> > libavcodec/aarch64/fdctdsp_neon.S | 368 ++
> > libavcodec/avcodec.h  |   1 +
> > libavcodec/fdctdsp.c  |   4 +-
> > libavcodec/fdctdsp.h  |   2 +
> > libavcodec/options_table.h|   1 +
> > libavcodec/tests/aarch64/dct.c|   2 +
> > tests/checkasm/Makefile   |   1 +
> > tests/checkasm/checkasm.c |   3 +
> > tests/checkasm/checkasm.h |   1 +
> > tests/checkasm/fdctdsp.c  |  68 
> > tests/fate/checkasm.mak   |   1 +
> > 14 files changed, 518 insertions(+), 1 deletion(-)
> > create mode 100644 libavcodec/aarch64/fdct.h
> > create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
> > create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
> > create mode 100644 tests/checkasm/fdctdsp.c
>
> LGTM, thanks!

Pushed.

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: fix include for cpu.h

2024-05-13 Thread Ramiro Polla

On Mon, May 13, 2024 at 12:15 PM Martin Storsjö  wrote:
> On Sat, 11 May 2024, Ramiro Polla wrote:
> > On Sun, Jan 21, 2024 at 10:57 PM Ramiro Polla  
> > wrote:
> >>
> >> ---
> >>  libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c 
> >> b/libavcodec/aarch64/idctdsp_init_aarch64.c
> >> index eec21aa5a2..8efd5f5323 100644
> >> --- a/libavcodec/aarch64/idctdsp_init_aarch64.c
> >> +++ b/libavcodec/aarch64/idctdsp_init_aarch64.c
> >> @@ -22,7 +22,7 @@
> >>
> >>  #include "libavutil/attributes.h"
> >>  #include "libavutil/cpu.h"
> >> -#include "libavutil/arm/cpu.h"
> >> +#include "libavutil/aarch64/cpu.h"
> >>  #include "libavcodec/avcodec.h"
> >>  #include "libavcodec/idctdsp.h"
> >>  #include "idct.h"
> >> --
> >> 2.30.2
> >>
> >
> > I'll apply if there are no objections.
>
> LGTM

Thanks. Pushed.

Re: [FFmpeg-devel] [PATCH v4 1/2] checkasm: add test for fdct

2024-05-11 Thread Ramiro Polla

On Sat, May 11, 2024 at 10:32 AM Ramiro Polla  wrote:
> On Mon, May 6, 2024 at 7:46 PM Rémi Denis-Courmont  wrote:
[...]
> > No objections from me, but it would be nice and seemingly trivial to add 9
> > and 10 bits while at it.
[...]
> I'll add checks for the 9 and 10 bits later.

Apparently we have no assembly versions of 9 and 10 bits fdct, so
there's not much point in adding it to checkasm for the time being.

Re: [FFmpeg-devel] [PATCH v4 1/2] checkasm: add test for fdct

2024-05-11 Thread Ramiro Polla

On Mon, May 6, 2024 at 7:46 PM Rémi Denis-Courmont  wrote:
> On Monday, 6 May 2024, at 20.18.39 EEST, Ramiro Polla wrote:
> > Reviewed-by: Martin Storsjö 
> > Reviewed-by: Rémi Denis-Courmont 
> > ---
> >  tests/checkasm/Makefile   |  1 +
> >  tests/checkasm/checkasm.c |  3 ++
> >  tests/checkasm/checkasm.h |  1 +
> >  tests/checkasm/fdctdsp.c  | 71 +++
> >  tests/fate/checkasm.mak   |  1 +
> >  5 files changed, 77 insertions(+)
> >  create mode 100644 tests/checkasm/fdctdsp.c
> >
> > diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
> > index 3e40aba2c3..b5bb885201 100644
> > --- a/tests/checkasm/Makefile
> > +++ b/tests/checkasm/Makefile
> > @@ -4,6 +4,7 @@ AVCODECOBJS-$(CONFIG_AC3DSP)+= ac3dsp.o
> >  AVCODECOBJS-$(CONFIG_AUDIODSP)  += audiodsp.o
> >  AVCODECOBJS-$(CONFIG_BLOCKDSP)  += blockdsp.o
> >  AVCODECOBJS-$(CONFIG_BSWAPDSP)  += bswapdsp.o
> > +AVCODECOBJS-$(CONFIG_FDCTDSP)   += fdctdsp.o
> >  AVCODECOBJS-$(CONFIG_FMTCONVERT)+= fmtconvert.o
> >  AVCODECOBJS-$(CONFIG_G722DSP)   += g722dsp.o
> >  AVCODECOBJS-$(CONFIG_H264CHROMA)+= h264chroma.o
> > diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
> > index 9be32fc16e..e5d39e2116 100644
> > --- a/tests/checkasm/checkasm.c
> > +++ b/tests/checkasm/checkasm.c
> > @@ -106,6 +106,9 @@ static const struct {
> >  #if CONFIG_EXR_DECODER
> >  { "exrdsp", checkasm_check_exrdsp },
> >  #endif
> > +#if CONFIG_FDCTDSP
> > +{ "fdctdsp", checkasm_check_fdctdsp },
> > +#endif
> >  #if CONFIG_FLAC_DECODER
> >  { "flacdsp", checkasm_check_flacdsp },
> >  #endif
> > diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
> > index 173360af60..8807a37a43 100644
> > --- a/tests/checkasm/checkasm.h
> > +++ b/tests/checkasm/checkasm.h
> > @@ -85,6 +85,7 @@ void checkasm_check_blockdsp(void);
> >  void checkasm_check_bswapdsp(void);
> >  void checkasm_check_colorspace(void);
> >  void checkasm_check_exrdsp(void);
> > +void checkasm_check_fdctdsp(void);
> >  void checkasm_check_fixed_dsp(void);
> >  void checkasm_check_flacdsp(void);
> >  void checkasm_check_float_dsp(void);
> > diff --git a/tests/checkasm/fdctdsp.c b/tests/checkasm/fdctdsp.c
> > new file mode 100644
> > index 00..c640a00656
> > --- /dev/null
> > +++ b/tests/checkasm/fdctdsp.c
> > @@ -0,0 +1,71 @@
> > +/*
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along
> > + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
> > + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> > + */
> > +
> > +#include 
> > +
> > +#include "checkasm.h"
> > +
> > +#include "libavcodec/avcodec.h"
> > +#include "libavcodec/fdctdsp.h"
> > +
> > +#include "libavutil/common.h"
> > +#include "libavutil/internal.h"
> > +#include "libavutil/mem_internal.h"
> > +
> > +static int int16_cmp_off_by_n(const int16_t *ref, const int16_t *test,
> > +  size_t n, int accuracy)
> > +{
> > +for (size_t i = 0; i < n; i++) {
> > +if (abs(ref[i] - test[i]) > accuracy)
> > +return 1;
> > +}
> > +return 0;
> > +}
> > +
> > +static void check_fdct(void)
> > +{
> > +LOCAL_ALIGNED_16(int16_t, block0, [64]);
> > +LOCAL_ALIGNED_16(int16_t, block1, [64]);
> > +
> > +AVCodecContext avctx = {
> > +.bits_per_raw_sample = 8,
> > +.dct_algo = FF_DCT_AUTO,
> > +};
> > +FDCTDSPContext h;
> > +
> > +ff_fdctdsp_init(&h, &avctx);
> > +
> > +if (check_func(h.fdct, "fdct")) {
> > +declare_func(v

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: fix include for cpu.h

2024-05-11 Thread Ramiro Polla

On Sun, Jan 21, 2024 at 10:57 PM Ramiro Polla  wrote:
>
> ---
>  libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c 
> b/libavcodec/aarch64/idctdsp_init_aarch64.c
> index eec21aa5a2..8efd5f5323 100644
> --- a/libavcodec/aarch64/idctdsp_init_aarch64.c
> +++ b/libavcodec/aarch64/idctdsp_init_aarch64.c
> @@ -22,7 +22,7 @@
>
>  #include "libavutil/attributes.h"
>  #include "libavutil/cpu.h"
> -#include "libavutil/arm/cpu.h"
> +#include "libavutil/aarch64/cpu.h"
>  #include "libavcodec/avcodec.h"
>  #include "libavcodec/idctdsp.h"
>  #include "idct.h"
> --
> 2.30.2
>

I'll apply if there are no objections.

Re: [FFmpeg-devel] [PATCH v2 1/2] libavcodec/mpegvideo_enc: fix multi-threaded motion estimation rounding for mpeg4

2024-05-11 Thread Ramiro Polla

On Thu, May 9, 2024 at 2:44 AM Michael Niedermayer
 wrote:
> On Wed, May 08, 2024 at 05:19:49PM +0200, Ramiro Polla wrote:
> > ff_init_me() was being called after ff_update_duplicate_context(),
> > which caused the propagation of the initialization to other thread
> > contexts to be delayed by one frame.
> >
> > In the case of mpeg4 (or flipflop_rounding), this would make the
> > hpel_put functions differ between the first thread (which would be
> > correctly initialized) and the other threads (which would be stale
> > from the previous frame).
> > ---
> >  libavcodec/mpegvideo_enc.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
>
> have you confirmed the actual used rounding matches after this
> encoder & decoder side ?

Yes, I just rechecked it. It used to be wrong (only the first slice
would use the correct hpel/qpel functions in the encoder according to
the no_rounding flag in the bitstream).

> if yes then this should be ok

Thanks. Pushed.
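
For anyone reading the archive later, the ordering issue from the quoted commit message can be reduced to a toy model. This is only a sketch under simplifying assumptions: ToyCtx and the toy_* functions are hypothetical stand-ins for the per-thread contexts, ff_init_me() and ff_update_duplicate_context(), and "rounding" stands for the hpel/qpel function selection.

```c
#include <assert.h>

#define TOY_THREADS 3

/* Hypothetical per-thread context. */
typedef struct ToyCtx {
    int rounding;
} ToyCtx;

/* Stand-in for the per-frame init done on the main context. */
static void toy_init_frame(ToyCtx *main_ctx, int frame_rounding)
{
    main_ctx->rounding = frame_rounding;
}

/* Stand-in for propagating the main context to the other threads. */
static void toy_update_duplicates(ToyCtx *ctx, int n)
{
    assert(n >= 1);
    for (int i = 1; i < n; i++)
        ctx[i] = ctx[0];
}

/* Sets up one frame and returns how many thread contexts disagree with
 * the main context afterwards. */
static int toy_setup_frame(ToyCtx *ctx, int n, int frame_rounding,
                           int init_first)
{
    int stale = 0;

    if (init_first) {
        /* fixed order: init, then propagate */
        toy_init_frame(&ctx[0], frame_rounding);
        toy_update_duplicates(ctx, n);
    } else {
        /* old order: propagate, then init (copies lag by one frame) */
        toy_update_duplicates(ctx, n);
        toy_init_frame(&ctx[0], frame_rounding);
    }
    for (int i = 1; i < n; i++)
        stale += ctx[i].rounding != ctx[0].rounding;
    return stale;
}
```

With the old order every context except the first keeps the previous frame's value, which matches the observation that only the first slice used the correct functions.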

Re: [FFmpeg-devel] [PATCH v2 2/2] libavcodec/motion_est: fix penalty_factor for b frames

2024-05-11 Thread Ramiro Polla

On Wed, May 8, 2024 at 11:47 PM Michael Niedermayer
 wrote:
> On Wed, May 08, 2024 at 05:19:50PM +0200, Ramiro Polla wrote:
> > In direct_search() and ff_estimate_b_frame_motion(), penalty_factor
> > would be used before being initialized in estimate_motion_b(). Also,
> > the initialization would happen more than once unnecessarily.
> > ---
> >  libavcodec/motion_est.c  | 15 ---
> >  tests/ref/vsynth/vsynth1-mpeg4-thread|  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg4-adap  |  8 
> >  tests/ref/vsynth/vsynth2-mpeg4-qprd  |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg4-thread|  6 +++---
> >  tests/ref/vsynth/vsynth_lena-mpeg4-rc|  4 ++--
> >  7 files changed, 26 insertions(+), 25 deletions(-)
>
> probably ok

Thanks. Pushed.
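
As a note for the archive, the use-before-initialization described in the quoted commit message can be sketched as a toy model. All names here (ToyME, toy_penalty, toy_estimate, toy_old_flow, toy_new_flow) are hypothetical, and the costs are arbitrary numbers chosen for illustration, not what motion_est.c computes.

```c
#include <assert.h>

/* Hypothetical stand-in for the motion estimation context. */
typedef struct ToyME {
    int mb_penalty_factor;
} ToyME;

/* Stand-in for get_penalty_factor(): any function of lambda will do. */
static int toy_penalty(int lambda)
{
    assert(lambda >= 0);
    return lambda * 2;
}

/* Stand-in for estimate_motion_b(). In the old flow this is where the
 * factor was (re)initialized. Returns a fixed cost for illustration. */
static int toy_estimate(ToyME *c, int lambda, int init_inside)
{
    if (init_inside)
        c->mb_penalty_factor = toy_penalty(lambda);
    return 100;
}

/* Old flow: the factor is copied into a local before the callee has
 * initialized it, so the stale value is what gets used. */
static int toy_old_flow(ToyME *c, int lambda)
{
    const int penalty_factor = c->mb_penalty_factor;
    return toy_estimate(c, lambda, 1) + 3 * penalty_factor;
}

/* New flow: initialize once up front, then read from the context. */
static int toy_new_flow(ToyME *c, int lambda)
{
    c->mb_penalty_factor = toy_penalty(lambda);
    return toy_estimate(c, lambda, 0) + 3 * c->mb_penalty_factor;
}
```

Starting from a leftover factor of 1 and lambda 5, the old flow adds 3 * 1 to the cost while the new flow adds 3 * 10, which is the kind of difference that changes the vsynth reference checksums.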

[FFmpeg-devel] [PATCH v2 2/2] libavcodec/motion_est: fix penalty_factor for b frames

2024-05-08 Thread Ramiro Polla

In direct_search() and ff_estimate_b_frame_motion(), penalty_factor
would be used before being initialized in estimate_motion_b(). Also,
the initialization would happen more than once unnecessarily.
---
 libavcodec/motion_est.c  | 15 ---
 tests/ref/vsynth/vsynth1-mpeg4-thread|  6 +++---
 tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-adap  |  8 
 tests/ref/vsynth/vsynth2-mpeg4-qprd  |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-thread|  6 +++---
 tests/ref/vsynth/vsynth_lena-mpeg4-rc|  4 ++--
 7 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index df9d1befa8..fb569ede8a 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -1127,9 +1127,6 @@ static int estimate_motion_b(MpegEncContext *s, int mb_x, 
int mb_y,
 const uint8_t * const mv_penalty = c->mv_penalty[f_code] + MAX_DMV;
 int mv_scale;
 
-c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, 
c->avctx->me_cmp);
-c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, 
c->avctx->me_sub_cmp);
-c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, 
c->avctx->mb_cmp);
 c->current_mv_penalty= mv_penalty;
 
 get_limits(s, 16*mb_x, 16*mb_y);
@@ -1495,7 +1492,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
  int mb_x, int mb_y)
 {
 MotionEstContext * const c= &s->me;
-const int penalty_factor= c->mb_penalty_factor;
 int fmin, bmin, dmin, fbmin, bimin, fimin;
 int type=0;
 const int xy = mb_y*s->mb_stride + mb_x;
@@ -1517,22 +1513,27 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
 return;
 }
 
+c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
+c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp);
+c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp);
+
 if (s->codec_id == AV_CODEC_ID_MPEG4)
 dmin= direct_search(s, mb_x, mb_y);
 else
 dmin= INT_MAX;
+
 // FIXME penalty stuff for non-MPEG-4
 c->skip=0;
 fmin = estimate_motion_b(s, mb_x, mb_y, s->b_forw_mv_table, 0, s->f_code) +
-   3 * penalty_factor;
+   3 * c->mb_penalty_factor;
 
 c->skip=0;
 bmin = estimate_motion_b(s, mb_x, mb_y, s->b_back_mv_table, 2, s->b_code) +
-   2 * penalty_factor;
+   2 * c->mb_penalty_factor;
 ff_dlog(s, " %d %d ", s->b_forw_mv_table[xy][0], s->b_forw_mv_table[xy][1]);
 
 c->skip=0;
-fbmin= bidir_refine(s, mb_x, mb_y) + penalty_factor;
+fbmin= bidir_refine(s, mb_x, mb_y) + c->mb_penalty_factor;
 ff_dlog(s, "%d %d %d %d\n", dmin, fmin, bmin, fbmin);
 
 if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) {
diff --git a/tests/ref/vsynth/vsynth1-mpeg4-thread b/tests/ref/vsynth/vsynth1-mpeg4-thread
index 6b69fb4c12..6b110c49fb 100644
--- a/tests/ref/vsynth/vsynth1-mpeg4-thread
+++ b/tests/ref/vsynth/vsynth1-mpeg4-thread
@@ -1,4 +1,4 @@
-369ace2f9613261af869efd9fbb3c149 *tests/data/fate/vsynth1-mpeg4-thread.avi
-774754 tests/data/fate/vsynth1-mpeg4-thread.avi
-9aa327a244d5179acf7fe64dc1459bff *tests/data/fate/vsynth1-mpeg4-thread.out.rawvideo
+7761391e354266976a9e0155eff983dd *tests/data/fate/vsynth1-mpeg4-thread.avi
+774752 tests/data/fate/vsynth1-mpeg4-thread.avi
+bbdbe9af4f5b106b847595bf3040699f *tests/data/fate/vsynth1-mpeg4-thread.out.rawvideo
 stddev:   10.13 PSNR: 28.02 MAXDIFF:  183 bytes:  7603200/  7603200
diff --git a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd
index 16de39edfc..f5bbecfcb2 100644
--- a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd
+++ b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd
@@ -1,4 +1,4 @@
-907a30295ed8323780eee08e606af0ab *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
-269722 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
-d2d9793bf8f3427b5cc17a1be78ddd64 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo
+f612ea89aa79a7f7b93a8acf332705c4 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
+269723 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
+88e17886e6383755829d7da519fd5e79 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo
 stddev:5.54 PSNR: 33.25 MAXDIFF:   94 bytes:  7603200/  7603200
diff --git a/tests/ref/vsynth/vsynth2-mpeg4-adap b/tests/ref/vsynth/vsynth2-mpeg4-adap
index 35b2b6aac9..e058cd1ce3 100644
--- a/tests/ref/vsynth/vsynth2-mpeg4-adap
+++ b/tests/ref/vsynth/vsynth2-mpeg4-adap
@@ -1,4 +1,4 @@
-06a397fe43dab7b6cf56870410fbbbaf *tests/data/fate/vsynth2-mpeg4-adap.avi
-203000 tests/data/fate/vsynth2-mpeg4-adap.avi
-686565d42d8ba5aea790824b04fa0a18 *tests/data/fate/vsynth2-mpeg4-adap.out.rawvideo
-stddev:4.55 PSNR: 34.95 MAXDIFF:   84 bytes:  7603200/  7603200
+9465ef120d560537d8fcfb5564782e01 *tests/data/fate/vsynth2-mpeg4-adap.avi
+203004

[FFmpeg-devel] [PATCH v2 1/2] libavcodec/mpegvideo_enc: fix multi-threaded motion estimation rounding for mpeg4

2024-05-08 Thread Ramiro Polla

ff_init_me() was being called after ff_update_duplicate_context(),
which caused the propagation of the initialization to other thread
contexts to be delayed by one frame.

In the case of mpeg4 (or flipflop_rounding), this would make the
hpel_put functions differ between the first thread (which would be
correctly initialized) and the other threads (which would be stale
from the previous frame).
---
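A reduced model of the ordering issue, for illustration only. `Enc`, `init_me()`, `update_duplicate()` and `setup_frame()` are hypothetical stand-ins for the encoder context, ff_init_me() and ff_update_duplicate_context(); the `rounding` field simply flips every frame to mimic flipflop rounding.

```c
#include <assert.h>

#define NUM_THREADS 3

/* Hypothetical, heavily reduced stand-in for an encoder context. */
typedef struct Enc {
    int rounding;   /* per-frame state derived by init_me() */
} Enc;

/* Hypothetical stand-in for ff_init_me(): derive per-frame state. */
static void init_me(Enc *main_ctx, int frame)
{
    main_ctx->rounding = frame & 1;   /* flips every frame */
}

/* Hypothetical stand-in for ff_update_duplicate_context(). */
static void update_duplicate(Enc *dst, const Enc *src)
{
    *dst = *src;
}

/* Sets up one frame and returns 1 if every thread context agrees
 * with the main context (ctx[0]), 0 otherwise. */
static int setup_frame(Enc ctx[NUM_THREADS], int frame, int init_first)
{
    if (init_first)
        init_me(&ctx[0], frame);          /* order after this patch */
    for (int i = 1; i < NUM_THREADS; i++)
        update_duplicate(&ctx[i], &ctx[0]);
    if (!init_first)
        init_me(&ctx[0], frame);          /* order before this patch */

    for (int i = 1; i < NUM_THREADS; i++)
        if (ctx[i].rounding != ctx[0].rounding)
            return 0;
    return 1;
}
```

In this model, copying before initializing leaves the worker contexts one frame behind whenever the per-frame state changes, while initializing first keeps all contexts in agreement.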
 libavcodec/mpegvideo_enc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 2a75973ac4..b601a1a9e4 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -3623,6 +3623,9 @@ static int encode_picture(MpegEncContext *s)
 s->q_chroma_intra_matrix16 = s->q_intra_matrix16;
 }
 
+if(ff_init_me(s)<0)
+return -1;
+
 s->mb_intra=0; //for the rate distortion & bit compare functions
 for(i=1; i<context_count; i++){
 ret = ff_update_duplicate_context(s->thread_context[i], s);
@@ -3630,9 +3633,6 @@ static int encode_picture(MpegEncContext *s)
 return ret;
 }
 
-if(ff_init_me(s)<0)
-return -1;
-
 /* Estimate motion for every MB */
 if(s->pict_type != AV_PICTURE_TYPE_I){
 s->lambda  = (s->lambda  * s->me_penalty_compensation + 128) >> 8;
-- 
2.30.2

[FFmpeg-devel] [PATCH v4 1/2] checkasm: add test for fdct

2024-05-06 Thread Ramiro Polla

Reviewed-by: Martin Storsjö 
Reviewed-by: Rémi Denis-Courmont 
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/fdctdsp.c  | 71 +++
 tests/fate/checkasm.mak   |  1 +
 5 files changed, 77 insertions(+)
 create mode 100644 tests/checkasm/fdctdsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 3e40aba2c3..b5bb885201 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -4,6 +4,7 @@ AVCODECOBJS-$(CONFIG_AC3DSP)+= ac3dsp.o
 AVCODECOBJS-$(CONFIG_AUDIODSP)  += audiodsp.o
 AVCODECOBJS-$(CONFIG_BLOCKDSP)  += blockdsp.o
 AVCODECOBJS-$(CONFIG_BSWAPDSP)  += bswapdsp.o
+AVCODECOBJS-$(CONFIG_FDCTDSP)   += fdctdsp.o
 AVCODECOBJS-$(CONFIG_FMTCONVERT)+= fmtconvert.o
 AVCODECOBJS-$(CONFIG_G722DSP)   += g722dsp.o
 AVCODECOBJS-$(CONFIG_H264CHROMA)+= h264chroma.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 9be32fc16e..e5d39e2116 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -106,6 +106,9 @@ static const struct {
 #if CONFIG_EXR_DECODER
 { "exrdsp", checkasm_check_exrdsp },
 #endif
+#if CONFIG_FDCTDSP
+{ "fdctdsp", checkasm_check_fdctdsp },
+#endif
 #if CONFIG_FLAC_DECODER
 { "flacdsp", checkasm_check_flacdsp },
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 173360af60..8807a37a43 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -85,6 +85,7 @@ void checkasm_check_blockdsp(void);
 void checkasm_check_bswapdsp(void);
 void checkasm_check_colorspace(void);
 void checkasm_check_exrdsp(void);
+void checkasm_check_fdctdsp(void);
 void checkasm_check_fixed_dsp(void);
 void checkasm_check_flacdsp(void);
 void checkasm_check_float_dsp(void);
diff --git a/tests/checkasm/fdctdsp.c b/tests/checkasm/fdctdsp.c
new file mode 100644
index 00..c640a00656
--- /dev/null
+++ b/tests/checkasm/fdctdsp.c
@@ -0,0 +1,71 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+
+#include "checkasm.h"
+
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/mem_internal.h"
+
+static int int16_cmp_off_by_n(const int16_t *ref, const int16_t *test, size_t n, int accuracy)
+{
+for (size_t i = 0; i < n; i++) {
+if (abs(ref[i] - test[i]) > accuracy)
+return 1;
+}
+return 0;
+}
+
+static void check_fdct(void)
+{
+LOCAL_ALIGNED_16(int16_t, block0, [64]);
+LOCAL_ALIGNED_16(int16_t, block1, [64]);
+
+AVCodecContext avctx = {
+.bits_per_raw_sample = 8,
+.dct_algo = FF_DCT_AUTO,
+};
+FDCTDSPContext h;
+
+ff_fdctdsp_init(&h, &avctx);
+
+if (check_func(h.fdct, "fdct")) {
+declare_func(void, int16_t *);
+for (int i = 0; i < 64; i++) {
+uint8_t r = rnd();
+block0[i] = r;
+block1[i] = r;
+}
+call_ref(block0);
+call_new(block1);
+if (int16_cmp_off_by_n(block0, block1, 64, 2))
+fail();
+bench_new(block1);
+}
+}
+
+void checkasm_check_fdctdsp(void)
+{
+check_fdct();
+report("fdctdsp");
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index 4a8e312da9..9b5e2b0d98 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -8,6 +8,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp \
 fate-checkasm-blockdsp  \
 fate-checkasm-bswapdsp  \
 fate-checkasm-exrdsp\
+fate-checkasm-fdctdsp   \
 fate-checkasm-fixed_dsp \
 fate-checkasm-flacdsp   \
 fate-checkasm-float_dsp \
-- 
2.30.2

Re: [FFmpeg-devel] [PATCH v3 1/2] checkasm: add test for fdct

2024-05-06 Thread Ramiro Polla

On Thu, May 2, 2024 at 8:05 PM Rémi Denis-Courmont  wrote:
> On Wednesday, 17 April 2024, 21:01:37 EEST, Ramiro Polla wrote:
[...]
> > +static void check_fdct(void)
> > +{
> > +LOCAL_ALIGNED_16(int16_t, block0, [64]);
> > +LOCAL_ALIGNED_16(int16_t, block1, [64]);
> > +
> > +AVCodecContext avctx = { 0 };
>
> AFAICT, that is not a legal context for ff_fdctdsp_init(), which expects
> bits_per_raw_sample to be one of 8, 9 or 10. It would also be good manners to
> initialise dct_algo.

Thanks for spotting it. New patch coming up in a while.

I'll send a similar patch to fix checkasm/idctdsp after this is merged.

[FFmpeg-devel] [PATCH v3 2/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-04-17 Thread Ramiro Polla

The code is imported from libjpeg-turbo-3.0.1. The neon registers used
have been changed to avoid modifying v8-v15.

Reviewed-by: Martin Storsjö 
---
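For context on the register change: under the AArch64 procedure call standard (AAPCS64), a callee must preserve the low 64 bits of v8-v15, while v0-v7 and v16-v31 may be clobbered freely. The two helpers below are a hypothetical illustration of that rule and are not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if SIMD register v<n> is callee-saved
 * (low 64 bits) under AAPCS64. */
static bool simd_reg_is_callee_saved(int n)
{
    return n >= 8 && n <= 15;
}

/* Hypothetical helper: true if a routine that writes the listed
 * registers needs no save/restore code for them. */
static bool needs_no_spill(const int *regs, int count)
{
    for (int i = 0; i < count; i++)
        if (simd_reg_is_callee_saved(regs[i]))
            return false;
    return true;
}
```

A routine that stays out of v8-v15 can skip the save/restore of those registers entirely, which is the point of the renumbering mentioned in the commit message.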
 libavcodec/aarch64/Makefile   |   2 +
 libavcodec/aarch64/fdct.h |  26 ++
 libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
 libavcodec/aarch64/fdctdsp_neon.S | 368 ++
 libavcodec/avcodec.h  |   1 +
 libavcodec/fdctdsp.c  |   4 +-
 libavcodec/fdctdsp.h  |   2 +
 libavcodec/options_table.h|   1 +
 libavcodec/tests/aarch64/dct.c|   2 +
 9 files changed, 444 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/fdct.h
 create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/fdctdsp_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 95ad4dd202..a3256bb1cc 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -1,5 +1,6 @@
 # subsystems
 OBJS-$(CONFIG_AC3DSP)   += aarch64/ac3dsp_init_aarch64.o
+OBJS-$(CONFIG_FDCTDSP)  += aarch64/fdctdsp_init_aarch64.o
 OBJS-$(CONFIG_FMTCONVERT)   += aarch64/fmtconvert_init.o
 OBJS-$(CONFIG_H264CHROMA)   += aarch64/h264chroma_init_aarch64.o
 OBJS-$(CONFIG_H264DSP)  += aarch64/h264dsp_init_aarch64.o
@@ -37,6 +38,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP)   += aarch64/videodsp.o
 # subsystems
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o
 NEON-OBJS-$(CONFIG_AC3DSP)  += aarch64/ac3dsp_neon.o
+NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o
 NEON-OBJS-$(CONFIG_FMTCONVERT)  += aarch64/fmtconvert_neon.o
 NEON-OBJS-$(CONFIG_H264CHROMA)  += aarch64/h264cmc_neon.o
 NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o \
diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h
new file mode 100644
index 00..0901b53a83
--- /dev/null
+++ b/libavcodec/aarch64/fdct.h
@@ -0,0 +1,26 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_AARCH64_FDCT_H
+#define AVCODEC_AARCH64_FDCT_H
+
+#include <stdint.h>
+
+void ff_fdct_neon(int16_t *block);
+
+#endif /* AVCODEC_AARCH64_FDCT_H */
diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c b/libavcodec/aarch64/fdctdsp_init_aarch64.c
new file mode 100644
index 00..59d91bc8fc
--- /dev/null
+++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c
@@ -0,0 +1,39 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/aarch64/cpu.h"
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+#include "fdct.h"
+
+av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext *avctx,
+ unsigned high_bit_depth)
+{
+int cpu_flags = av_get_cpu_flags();
+
+if (have_neon(cpu_flags)) {
+if (!high_bit_depth) {
+if (avctx->dct_algo == FF_DCT_AUTO ||
+avctx->dct_algo == FF_DCT_NEON) {
+c->fdct = ff_fdct_neon;
+}
+}
+}
+}
diff --git a/libavcodec/aarch64/fdctdsp_neon.S b/libavcodec/aarch64/fdctdsp_neon.S
new file mode 100644
index 00..53fa4debe5
--- /dev/null
+++ b/libavcodec/aarch64/fdctdsp_neon.S
@@ -0,0 +1,368 @@
+/*
+ * Armv8 Neon optimizations for libjpeg-turbo
+ *
+ * Copyright (C) 2009-2011, Nokia Corporation and/or

[FFmpeg-devel] [PATCH v3 1/2] checkasm: add test for fdct

2024-04-17 Thread Ramiro Polla

Reviewed-by: Martin Storsjö 
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/fdctdsp.c  | 68 +++
 tests/fate/checkasm.mak   |  1 +
 5 files changed, 74 insertions(+)
 create mode 100644 tests/checkasm/fdctdsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 2673e1d098..70a6120c70 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -4,6 +4,7 @@ AVCODECOBJS-$(CONFIG_AC3DSP)+= ac3dsp.o
 AVCODECOBJS-$(CONFIG_AUDIODSP)  += audiodsp.o
 AVCODECOBJS-$(CONFIG_BLOCKDSP)  += blockdsp.o
 AVCODECOBJS-$(CONFIG_BSWAPDSP)  += bswapdsp.o
+AVCODECOBJS-$(CONFIG_FDCTDSP)   += fdctdsp.o
 AVCODECOBJS-$(CONFIG_FMTCONVERT)+= fmtconvert.o
 AVCODECOBJS-$(CONFIG_G722DSP)   += g722dsp.o
 AVCODECOBJS-$(CONFIG_H264CHROMA)+= h264chroma.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 8be6cb0f55..92c3a30ad3 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -106,6 +106,9 @@ static const struct {
 #if CONFIG_EXR_DECODER
 { "exrdsp", checkasm_check_exrdsp },
 #endif
+#if CONFIG_FDCTDSP
+{ "fdctdsp", checkasm_check_fdctdsp },
+#endif
 #if CONFIG_FLAC_DECODER
 { "flacdsp", checkasm_check_flacdsp },
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index f90920dee7..d3e8f9a37a 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -85,6 +85,7 @@ void checkasm_check_blockdsp(void);
 void checkasm_check_bswapdsp(void);
 void checkasm_check_colorspace(void);
 void checkasm_check_exrdsp(void);
+void checkasm_check_fdctdsp(void);
 void checkasm_check_fixed_dsp(void);
 void checkasm_check_flacdsp(void);
 void checkasm_check_float_dsp(void);
diff --git a/tests/checkasm/fdctdsp.c b/tests/checkasm/fdctdsp.c
new file mode 100644
index 00..68a9b5e435
--- /dev/null
+++ b/tests/checkasm/fdctdsp.c
@@ -0,0 +1,68 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+
+#include "checkasm.h"
+
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/mem_internal.h"
+
+static int int16_cmp_off_by_n(const int16_t *ref, const int16_t *test, size_t n, int accuracy)
+{
+for (size_t i = 0; i < n; i++) {
+if (abs(ref[i] - test[i]) > accuracy)
+return 1;
+}
+return 0;
+}
+
+static void check_fdct(void)
+{
+LOCAL_ALIGNED_16(int16_t, block0, [64]);
+LOCAL_ALIGNED_16(int16_t, block1, [64]);
+
+AVCodecContext avctx = { 0 };
+FDCTDSPContext h;
+
+ff_fdctdsp_init(&h, &avctx);
+
+if (check_func(h.fdct, "fdct")) {
+declare_func(void, int16_t *);
+for (int i = 0; i < 64; i++) {
+uint8_t r = rnd();
+block0[i] = r;
+block1[i] = r;
+}
+call_ref(block0);
+call_new(block1);
+if (int16_cmp_off_by_n(block0, block1, 64, 2))
+fail();
+bench_new(block1);
+}
+}
+
+void checkasm_check_fdctdsp(void)
+{
+check_fdct();
+report("fdctdsp");
+}
diff --git a/tests/fate/checkasm.mak b/tests/fate/checkasm.mak
index 3b5b867a97..10a42f2f9d 100644
--- a/tests/fate/checkasm.mak
+++ b/tests/fate/checkasm.mak
@@ -8,6 +8,7 @@ FATE_CHECKASM = fate-checkasm-aacencdsp \
 fate-checkasm-blockdsp  \
 fate-checkasm-bswapdsp  \
 fate-checkasm-exrdsp\
+fate-checkasm-fdctdsp   \
 fate-checkasm-fixed_dsp \
 fate-checkasm-flacdsp   \
 fate-checkasm-float_dsp \
-- 
2.30.2

[FFmpeg-devel] [PATCH v3 0/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-04-17 Thread Ramiro Polla

This patch set adds fdct to checkasm and neon-optimized fdct for aarch64.

Ramiro Polla (2):
  checkasm: add test for fdct
  lavc/aarch64/fdct: add neon-optimized fdct for aarch64

 libavcodec/aarch64/Makefile   |   2 +
 libavcodec/aarch64/fdct.h |  26 ++
 libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
 libavcodec/aarch64/fdctdsp_neon.S | 368 ++
 libavcodec/avcodec.h  |   1 +
 libavcodec/fdctdsp.c  |   4 +-
 libavcodec/fdctdsp.h  |   2 +
 libavcodec/options_table.h|   1 +
 libavcodec/tests/aarch64/dct.c|   2 +
 tests/checkasm/Makefile   |   1 +
 tests/checkasm/checkasm.c |   3 +
 tests/checkasm/checkasm.h |   1 +
 tests/checkasm/fdctdsp.c  |  68 
 tests/fate/checkasm.mak   |   1 +
 14 files changed, 518 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/fdct.h
 create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
 create mode 100644 tests/checkasm/fdctdsp.c

-- 
2.30.2

[FFmpeg-devel] [PATCH v2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-04-16 Thread Ramiro Polla

The code is imported from libjpeg-turbo-3.0.1. The neon registers used
have been changed to avoid modifying v8-v15.
---
 libavcodec/aarch64/Makefile   |   2 +
 libavcodec/aarch64/fdct.h |  26 ++
 libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
 libavcodec/aarch64/fdctdsp_neon.S | 368 ++
 libavcodec/avcodec.h  |   1 +
 libavcodec/fdctdsp.c  |   4 +-
 libavcodec/fdctdsp.h  |   2 +
 libavcodec/options_table.h|   1 +
 libavcodec/tests/aarch64/dct.c|   2 +
 tests/checkasm/Makefile   |   1 +
 tests/checkasm/checkasm.c |   3 +
 tests/checkasm/checkasm.h |   1 +
 tests/checkasm/fdctdsp.c  |  68 
 tests/fate/checkasm.mak   |   1 +
 14 files changed, 518 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/fdct.h
 create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
 create mode 100644 tests/checkasm/fdctdsp.c

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 95ad4dd202..a3256bb1cc 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -1,5 +1,6 @@
 # subsystems
 OBJS-$(CONFIG_AC3DSP)   += aarch64/ac3dsp_init_aarch64.o
+OBJS-$(CONFIG_FDCTDSP)  += aarch64/fdctdsp_init_aarch64.o
 OBJS-$(CONFIG_FMTCONVERT)   += aarch64/fmtconvert_init.o
 OBJS-$(CONFIG_H264CHROMA)   += aarch64/h264chroma_init_aarch64.o
 OBJS-$(CONFIG_H264DSP)  += aarch64/h264dsp_init_aarch64.o
@@ -37,6 +38,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP)   += aarch64/videodsp.o
 # subsystems
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o
 NEON-OBJS-$(CONFIG_AC3DSP)  += aarch64/ac3dsp_neon.o
+NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o
 NEON-OBJS-$(CONFIG_FMTCONVERT)  += aarch64/fmtconvert_neon.o
 NEON-OBJS-$(CONFIG_H264CHROMA)  += aarch64/h264cmc_neon.o
 NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o \
diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h
new file mode 100644
index 00..0901b53a83
--- /dev/null
+++ b/libavcodec/aarch64/fdct.h
@@ -0,0 +1,26 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_AARCH64_FDCT_H
+#define AVCODEC_AARCH64_FDCT_H
+
+#include <stdint.h>
+
+void ff_fdct_neon(int16_t *block);
+
+#endif /* AVCODEC_AARCH64_FDCT_H */
diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c b/libavcodec/aarch64/fdctdsp_init_aarch64.c
new file mode 100644
index 00..59d91bc8fc
--- /dev/null
+++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c
@@ -0,0 +1,39 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/aarch64/cpu.h"
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+#include "fdct.h"
+
+av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext *avctx,
+ unsigned high_bit_depth)
+{
+int cpu_flags = av_get_cpu_flags();
+
+if (have_neon(cpu_flags)) {
+if (!high_bit_depth) {
+if (avctx->dct_algo == FF_DCT_AUTO ||
+avctx->dct_algo == FF_DCT_NEON) {
+c->fdct = ff_fdct_neon;
+}
+}
+}
+}
diff --git

Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-04-16 Thread Ramiro Polla

Hi,

On Wed, Feb 14, 2024 at 10:42 AM Martin Storsjö  wrote:
> On Sun, 4 Feb 2024, Ramiro Polla wrote:
>
> > The code is imported from libjpeg-turbo-3.0.1. The neon registers used
> > have been changed to avoid modifying v8-v15.
> > ---
>
> I don't remember if we have any extra routines we need to do if importing
> foreign code with a differing license. The license here seems fine in any
> case though.

I think the license should be ok (based on the "Patches/Committing"
section in developer.texi).

> This seems to work fine in all my test environments. And thanks for making
> sure it doesn't use v8-v15!
>
> I'm not so familiar with these DSP functions, whether it is the norm to add a
> new constant like FF_DCT_NEON, but I guess it seems to match the pattern
> of the existing code.

I don't know either, so I just tried to match the existing code :)

> I presume the main case that tests this is "make fate-dct8x8", which
> builds and executes libavcodec/tests/dct? How much work would it be to
> integrate testing of these routines into checkasm? That way we could rest
> assured that the assembly passes all such ABI checks that we do there,
> including what registers must not be clobbered.

I added checkasm for fdct. It's especially useful to make sure there
is no overflow in the DC coefficient.
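
As a rough illustration of why the DC term is the one to watch: it is proportional to the sum of all 64 inputs, so it grows fastest. The sketch below assumes DC = scale * sum of samples, where `scale` is a hypothetical parameter (the real factor depends on the fdct implementation), and uses the 0..255 input range that the checkasm test feeds in.

```c
#include <assert.h>
#include <stdint.h>

/* Largest DC magnitude for an 8x8 block, assuming
 * DC = scale * (sum of the 64 samples). `scale` is hypothetical. */
static long max_dc(int max_sample, int scale)
{
    return 64L * max_sample * scale;
}

/* Returns 1 if that worst-case DC value still fits in int16_t. */
static int dc_fits_int16(int max_sample, int scale)
{
    return max_dc(max_sample, scale) <= INT16_MAX;
}
```

Under these assumptions, 8-bit input leaves headroom at scale 1 or 2 (16320 and 32640), but a scale of 4 would already exceed the int16_t range.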

> The assembly uses a different indentation width than the rest of our
> assembly. I recently spent some effort on cleaning that up so that our
> code is mostly consistent, so I'd prefer not to add new code that deviates
> from it. It primarily looks like you'd need to add 4 spaces at the start
> of each line.
>
> I've used a script for mostly automatically reindenting our arm assembly,
> you can grab it at https://martin.st/temp/ffmpeg-asm-indent.pl, run it as
> "cat file.S | ./ffmpeg-asm-indent.pl > tmp; mv tmp file.S". It's not 100%
> accurate, but mostly gets you there, but it's good to manually check it
> afterwards as well.

I fixed the indentation and tweaked a few more cosmetics in the comments.

Thank you for the review and the help on IRC! I'll send v2 shortly.

Ramiro

Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-03-06 Thread Ramiro Polla

ping

On Sun, Feb 4, 2024 at 3:42 PM Ramiro Polla  wrote:
>
> The code is imported from libjpeg-turbo-3.0.1. The neon registers used
> have been changed to avoid modifying v8-v15.
> ---
>  libavcodec/aarch64/Makefile   |   2 +
>  libavcodec/aarch64/fdct.h |  26 ++
>  libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
>  libavcodec/aarch64/fdctdsp_neon.S | 369 ++
>  libavcodec/avcodec.h  |   1 +
>  libavcodec/fdctdsp.c  |   4 +-
>  libavcodec/fdctdsp.h  |   2 +
>  libavcodec/options_table.h|   1 +
>  libavcodec/tests/aarch64/dct.c|   2 +
>  9 files changed, 445 insertions(+), 1 deletion(-)
>  create mode 100644 libavcodec/aarch64/fdct.h
>  create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
>  create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
>
> diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
> index beb6a02f5f..eebccbe4a5 100644
> --- a/libavcodec/aarch64/Makefile
> +++ b/libavcodec/aarch64/Makefile
> @@ -1,4 +1,5 @@
>  # subsystems
> +OBJS-$(CONFIG_FDCTDSP)  += aarch64/fdctdsp_init_aarch64.o
>  OBJS-$(CONFIG_FMTCONVERT)   += aarch64/fmtconvert_init.o
>  OBJS-$(CONFIG_H264CHROMA)   += aarch64/h264chroma_init_aarch64.o
>  OBJS-$(CONFIG_H264DSP)  += aarch64/h264dsp_init_aarch64.o
> @@ -35,6 +36,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP)   += 
> aarch64/videodsp.o
>
>  # subsystems
>  NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o
> +NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o
>  NEON-OBJS-$(CONFIG_FMTCONVERT)  += aarch64/fmtconvert_neon.o
>  NEON-OBJS-$(CONFIG_H264CHROMA)  += aarch64/h264cmc_neon.o
>  NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o
>   \
> diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h
> new file mode 100644
> index 00..0901b53a83
> --- /dev/null
> +++ b/libavcodec/aarch64/fdct.h
> @@ -0,0 +1,26 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
> USA
> + */
> +
> +#ifndef AVCODEC_AARCH64_FDCT_H
> +#define AVCODEC_AARCH64_FDCT_H
> +
> +#include <stdint.h>
> +
> +void ff_fdct_neon(int16_t *block);
> +
> +#endif /* AVCODEC_AARCH64_FDCT_H */
> diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c 
> b/libavcodec/aarch64/fdctdsp_init_aarch64.c
> new file mode 100644
> index 00..59d91bc8fc
> --- /dev/null
> +++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c
> @@ -0,0 +1,39 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
> USA
> + */
> +
> +#include "libavutil/attributes.h"
> +#include "libavutil/cpu.h"
> +#include "libavutil/aarch64/cpu.h"
> +#include "libavcodec/avcodec.h"
> +#include "libavcodec/fdctdsp.h"
> +#include "fdct.h"
> +
> +av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext 
> *avctx,
> + unsigned high_bit_depth)
> +{
> +int cpu_flags = av_get_cpu_flags();
> +
> +if (have_neon(cpu_flags)) {
> +if (

[FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-02-04 Thread Ramiro Polla

The code is imported from libjpeg-turbo-3.0.1. The neon registers used
have been changed to avoid modifying v8-v15.
---
 libavcodec/aarch64/Makefile   |   2 +
 libavcodec/aarch64/fdct.h |  26 ++
 libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
 libavcodec/aarch64/fdctdsp_neon.S | 369 ++
 libavcodec/avcodec.h  |   1 +
 libavcodec/fdctdsp.c  |   4 +-
 libavcodec/fdctdsp.h  |   2 +
 libavcodec/options_table.h|   1 +
 libavcodec/tests/aarch64/dct.c|   2 +
 9 files changed, 445 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/fdct.h
 create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/fdctdsp_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index beb6a02f5f..eebccbe4a5 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -1,4 +1,5 @@
 # subsystems
+OBJS-$(CONFIG_FDCTDSP)  += aarch64/fdctdsp_init_aarch64.o
 OBJS-$(CONFIG_FMTCONVERT)   += aarch64/fmtconvert_init.o
 OBJS-$(CONFIG_H264CHROMA)   += aarch64/h264chroma_init_aarch64.o
 OBJS-$(CONFIG_H264DSP)  += aarch64/h264dsp_init_aarch64.o
@@ -35,6 +36,7 @@ ARMV8-OBJS-$(CONFIG_VIDEODSP)   += aarch64/videodsp.o
 
 # subsystems
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/sbrdsp_neon.o
+NEON-OBJS-$(CONFIG_FDCTDSP) += aarch64/fdctdsp_neon.o
 NEON-OBJS-$(CONFIG_FMTCONVERT)  += aarch64/fmtconvert_neon.o
 NEON-OBJS-$(CONFIG_H264CHROMA)  += aarch64/h264cmc_neon.o
 NEON-OBJS-$(CONFIG_H264DSP) += aarch64/h264dsp_neon.o \
diff --git a/libavcodec/aarch64/fdct.h b/libavcodec/aarch64/fdct.h
new file mode 100644
index 00..0901b53a83
--- /dev/null
+++ b/libavcodec/aarch64/fdct.h
@@ -0,0 +1,26 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_AARCH64_FDCT_H
+#define AVCODEC_AARCH64_FDCT_H
+
+#include <stdint.h>
+
+void ff_fdct_neon(int16_t *block);
+
+#endif /* AVCODEC_AARCH64_FDCT_H */
diff --git a/libavcodec/aarch64/fdctdsp_init_aarch64.c b/libavcodec/aarch64/fdctdsp_init_aarch64.c
new file mode 100644
index 00..59d91bc8fc
--- /dev/null
+++ b/libavcodec/aarch64/fdctdsp_init_aarch64.c
@@ -0,0 +1,39 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/aarch64/cpu.h"
+#include "libavcodec/avcodec.h"
+#include "libavcodec/fdctdsp.h"
+#include "fdct.h"
+
+av_cold void ff_fdctdsp_init_aarch64(FDCTDSPContext *c, AVCodecContext *avctx,
+ unsigned high_bit_depth)
+{
+int cpu_flags = av_get_cpu_flags();
+
+if (have_neon(cpu_flags)) {
+if (!high_bit_depth) {
+if (avctx->dct_algo == FF_DCT_AUTO ||
+avctx->dct_algo == FF_DCT_NEON) {
+c->fdct = ff_fdct_neon;
+}
+}
+}
+}
diff --git a/libavcodec/aarch64/fdctdsp_neon.S b/libavcodec/aarch64/fdctdsp_neon.S
new file mode 100644
index 00..978c8d3002
--- /dev/null
+++ b/libavcodec/aarch64/fdctdsp_neon.S
@@ -0,0 +1,369 @@
+/*
+ * Armv8 Neon optimizations for libjpeg-turbo
+ *
+ * Copyright (C) 2009-2011, Nokia Corporation and/or its subsidiary(-ies).
+ *  All Rights Reserved.
+ * Author:  Siarhei Siamashka 
+ * Copyright (C) 2013-2014, Linaro Limited.  All Rights

[FFmpeg-devel] [PATCH] lavc/aarch64: fix include for cpu.h

2024-01-21 Thread Ramiro Polla

---
 libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c b/libavcodec/aarch64/idctdsp_init_aarch64.c
index eec21aa5a2..8efd5f5323 100644
--- a/libavcodec/aarch64/idctdsp_init_aarch64.c
+++ b/libavcodec/aarch64/idctdsp_init_aarch64.c
@@ -22,7 +22,7 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
-#include "libavutil/arm/cpu.h"
+#include "libavutil/aarch64/cpu.h"
 #include "libavcodec/avcodec.h"
 #include "libavcodec/idctdsp.h"
 #include "idct.h"
-- 
2.30.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 2/2] avcodec/motion_est: fix penalty_factor for b frames

2020-03-23 Thread Ramiro Polla

On Mon, Mar 23, 2020 at 10:42 PM Michael Niedermayer
 wrote:
> On Sun, Mar 22, 2020 at 04:55:25PM +0100, Ramiro Polla wrote:
> > In ff_estimate_b_frame_motion(), penalty_factor would be used before
> > being initialized in estimate_motion_b(). Also, the initialization
> > would happen more than once unnecessarily.
> > ---
> >  libavcodec/motion_est.c  | 15 ---
> >  tests/ref/vsynth/vsynth2-mpeg2-422   |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
> >  tests/ref/vsynth/vsynth2-mpeg4-adap  |  6 +++---
> >  4 files changed, 17 insertions(+), 16 deletions(-)
> >
> > diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
> > index 02c75fd470..1feb46cec3 100644
> > --- a/libavcodec/motion_est.c
> > +++ b/libavcodec/motion_est.c
> > @@ -1123,9 +1123,6 @@ static int estimate_motion_b(MpegEncContext *s, int mb_x, int mb_y,
> >  uint8_t * const mv_penalty= c->mv_penalty[f_code] + MAX_DMV;
> >  int mv_scale;
> >
> > -c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
> > -c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp);
> > -c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp);
> >  c->current_mv_penalty= mv_penalty;
> >
> >  get_limits(s, 16*mb_x, 16*mb_y);
>
>
>
> > @@ -1491,7 +1488,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
> >   int mb_x, int mb_y)
> >  {
> >  MotionEstContext * const c= &s->me;
> > -const int penalty_factor= c->mb_penalty_factor;
> >  int fmin, bmin, dmin, fbmin, bimin, fimin;
> >  int type=0;
> >  const int xy = mb_y*s->mb_stride + mb_x;
> > @@ -1517,18 +1513,23 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
> >  dmin= direct_search(s, mb_x, mb_y);
> >  else
> >  dmin= INT_MAX;
> > +
> > +c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
> > +c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp);
> > +c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp);
>
> If mb_penalty_factor isn't correct before this point, this may not be enough,
> as direct_search() uses mb_penalty_factor

Fixed.

New patch attached.
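
The ordering problem discussed above can be sketched in isolation. This is a minimal illustration, not the FFmpeg code: SketchMEContext, sketch_get_penalty_factor() and sketch_search() are hypothetical stand-ins for MotionEstContext, get_penalty_factor() and direct_search(), and the arithmetic is invented only to make the effect visible.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for MotionEstContext. */
typedef struct SketchMEContext {
    int mb_penalty_factor;
} SketchMEContext;

/* Hypothetical stand-in for get_penalty_factor(); the real function
 * also takes lambda2 and a comparison-function type. */
static int sketch_get_penalty_factor(int lambda)
{
    return lambda * 2;
}

/* Stand-in for a search that reads the factor, as direct_search() does. */
static int sketch_search(const SketchMEContext *c)
{
    return 100 + c->mb_penalty_factor;
}

/* Old order: the search runs before the factor is refreshed, so it
 * sees whatever value the context held on entry. */
static int sketch_old_order(SketchMEContext *c, int lambda)
{
    int score = sketch_search(c);
    c->mb_penalty_factor = sketch_get_penalty_factor(lambda);
    return score;
}

/* New order: refresh the factor once up front, then search. */
static int sketch_new_order(SketchMEContext *c, int lambda)
{
    c->mb_penalty_factor = sketch_get_penalty_factor(lambda);
    return sketch_search(c);
}
```

With a zeroed context and lambda = 8, sketch_old_order() scores with the factor still at 0, while sketch_new_order() scores with the refreshed factor of 16. Moving the initialization ahead of every consumer is the design choice the updated patch takes.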
From 8feded1143715b064c8556a460feb86394b86acd Mon Sep 17 00:00:00 2001
From: Ramiro Polla 
Date: Sun, 22 Mar 2020 16:45:05 +0100
Subject: [PATCH] avcodec/motion_est: fix penalty_factor for b frames

In direct_search() and ff_estimate_b_frame_motion(), penalty_factor
would be used before being initialized in estimate_motion_b(). Also,
the initialization would happen more than once unnecessarily.
---
 libavcodec/motion_est.c  | 15 ---
 tests/ref/vsynth/vsynth1-mpeg4-thread|  6 +++---
 tests/ref/vsynth/vsynth2-mpeg2-422   |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-adap  |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-qprd  |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-thread|  6 +++---
 tests/ref/vsynth/vsynth_lena-mpeg4-rc|  4 ++--
 8 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 02c75fd470..520a57d4d9 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -1123,9 +1123,6 @@ static int estimate_motion_b(MpegEncContext *s, int mb_x, int mb_y,
 uint8_t * const mv_penalty= c->mv_penalty[f_code] + MAX_DMV;
 int mv_scale;
 
-c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
-c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp);
-c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp);
 c->current_mv_penalty= mv_penalty;
 
 get_limits(s, 16*mb_x, 16*mb_y);
@@ -1491,7 +1488,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
  int mb_x, int mb_y)
 {
 MotionEstContext * const c= &s->me;
-const int penalty_factor= c->mb_penalty_factor;
 int fmin, bmin, dmin, fbmin, bimin, fimin;
 int type=0;
 const int xy = mb_y*s->mb_stride + mb_x;
@@ -1513,22 +1509,27 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
 return;
 }
 
+c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
+c->sub_penalty_factor

Re: [FFmpeg-devel] [PATCH] MAINTAINERS: add my gpg fingerprint

2020-03-23 Thread Ramiro Polla

Hi Michael,

On Mon, Mar 23, 2020 at 8:44 PM Michael Niedermayer
 wrote:
>
> On Mon, Mar 23, 2020 at 04:11:04AM +0100, Ramiro Polla wrote:
> > ---
> >  MAINTAINERS | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index f9810d5594..9238a1a762 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -614,6 +614,7 @@ Nikolay Aleksandrov           8978 1D8C FB71 588E 4B27 EAA8 C4F0 B5FC E011 13B1
> >  Panagiotis Issaris            6571 13A3 33D9 3726 F728 AA98 F643 B12E ECF3 E029
> >  Peter Ross                    A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B
> >  Philip Langdale               5DC5 8D66 5FBA 3A43 18EC 045E F8D6 B194 6A75 682E
> > +Ramiro Polla                  1E0D 3820 ACCB 36AF 97B4 F18C 648E 2B0A E905 E26A
> >  Reimar Doeffinger             C61D 16E5 9E2C D10C 8958 38A4 0899 A2B9 06D4 D9C7
> >  Reinhard Tartler              9300 5DC2 7E87 6C37 ED7B CA9A 9808 3544 9453 48A4
> >  Reynaldo H. Verdejo Pinochet  6E27 CD34 170C C78E 4D4F 5F40 C18E 077F 3114 452A
>
> I am unable to find a matching key on the keyserver
>
> gpg --search-key "ramiro polla"
> gpg: searching for "ramiro polla" from hkp server keys.gnupg.net
> (1) Ramiro Polla 
>   2048 bit RSA key 9B6C5700, created: 2014-09-23
> (2) Ramiro Polla 
>       1024 bit DSA key 25E635F9, created: 2010-01-08

> pub   2048R/9B6C5700 2014-09-23
>   Key fingerprint = 7859 C65B 751B 1179 792E  DAE8 8E95 8B2F 9B6C 5700
> uid  Ramiro Polla 
> sub   2048R/CAF28B6D 2014-09-23

Sorry, I have very little experience with this. That's an old key, but
I found it now.

New patch attached with the 2014 fingerprint (also attached the patch
signed with that key just because why not...).

Hopefully this is good now.


0001-MAINTAINERS-add-my-gpg-fingerprint.patch.sig
Description: PGP signature
From 3c1603bd0c698fc9957c5bda718c176e6084ee2c Mon Sep 17 00:00:00 2001
From: Ramiro Polla 
Date: Mon, 23 Mar 2020 04:02:25 +0100
Subject: [PATCH] MAINTAINERS: add my gpg fingerprint

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f9810d5594..e19d1ee586 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -614,6 +614,7 @@ Nikolay Aleksandrov           8978 1D8C FB71 588E 4B27 EAA8 C4F0 B5FC E011 13B1
 Panagiotis Issaris            6571 13A3 33D9 3726 F728 AA98 F643 B12E ECF3 E029
 Peter Ross                    A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B
 Philip Langdale               5DC5 8D66 5FBA 3A43 18EC 045E F8D6 B194 6A75 682E
+Ramiro Polla                  7859 C65B 751B 1179 792E DAE8 8E95 8B2F 9B6C 5700
 Reimar Doeffinger             C61D 16E5 9E2C D10C 8958 38A4 0899 A2B9 06D4 D9C7
 Reinhard Tartler              9300 5DC2 7E87 6C37 ED7B CA9A 9808 3544 9453 48A4
 Reynaldo H. Verdejo Pinochet  6E27 CD34 170C C78E 4D4F 5F40 C18E 077F 3114 452A
-- 
2.11.0

[FFmpeg-devel] [PATCH] MAINTAINERS: add my gpg fingerprint

2020-03-22 Thread Ramiro Polla

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f9810d5594..9238a1a762 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -614,6 +614,7 @@ Nikolay Aleksandrov           8978 1D8C FB71 588E 4B27 EAA8 C4F0 B5FC E011 13B1
 Panagiotis Issaris            6571 13A3 33D9 3726 F728 AA98 F643 B12E ECF3 E029
 Peter Ross                    A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B
 Philip Langdale               5DC5 8D66 5FBA 3A43 18EC 045E F8D6 B194 6A75 682E
+Ramiro Polla                  1E0D 3820 ACCB 36AF 97B4 F18C 648E 2B0A E905 E26A
 Reimar Doeffinger             C61D 16E5 9E2C D10C 8958 38A4 0899 A2B9 06D4 D9C7
 Reinhard Tartler              9300 5DC2 7E87 6C37 ED7B CA9A 9808 3544 9453 48A4
 Reynaldo H. Verdejo Pinochet  6E27 CD34 170C C78E 4D4F 5F40 C18E 077F 3114 452A
-- 
2.11.0

[FFmpeg-devel] [PATCH 1/2] avcodec/mpegvideo_enc: fix multi-threaded motion estimation rounding for mpeg4

2020-03-22 Thread Ramiro Polla

ff_init_me() was being called after ff_update_duplicate_context(),
which caused the propagation of the initialization to other thread
contexts to be delayed by one frame.

In the case of mpeg4 (or flipflop_rounding), this would make the
hpel_put functions differ between the first thread (which would be
correctly initialized) and the other threads (which would be stale
from the previous frame).
---
 libavcodec/mpegvideo_enc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index b2eb9cf318..8c2672f76a 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -3699,6 +3699,9 @@ static int encode_picture(MpegEncContext *s, int picture_number)
 s->q_chroma_intra_matrix16 = s->q_intra_matrix16;
 }
 
+if(ff_init_me(s)<0)
+return -1;
+
 s->mb_intra=0; //for the rate distortion & bit compare functions
 for(i=1; i<context_count; i++){
 ret = ff_update_duplicate_context(s->thread_context[i], s);
@@ -3706,9 +3709,6 @@ static int encode_picture(MpegEncContext *s, int picture_number)
 return ret;
 }
 
-if(ff_init_me(s)<0)
-return -1;
-
 /* Estimate motion for every MB */
 if(s->pict_type != AV_PICTURE_TYPE_I){
 s->lambda  = (s->lambda  * s->me_penalty_compensation + 128) >> 8;
-- 
2.11.0
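
The one-frame delay described in the commit message above can be reproduced with a reduced model. This is only a sketch under stated assumptions: SketchContext, sketch_init_me() and sketch_update_duplicate() are hypothetical stand-ins for MpegEncContext, ff_init_me() and ff_update_duplicate_context(), and a single integer field stands in for the state that the real initialization sets up.

```c
#include <assert.h>

#define SKETCH_MAX_THREADS 4

/* Hypothetical, reduced context: one field that initialization must set. */
typedef struct SketchContext {
    int rounding; /* stands in for state set up by ff_init_me() */
} SketchContext;

/* Stand-in for ff_init_me(): sets per-frame state in the main context. */
static void sketch_init_me(SketchContext *s, int frame_rounding)
{
    s->rounding = frame_rounding;
}

/* Stand-in for ff_update_duplicate_context(): copies main to a thread. */
static void sketch_update_duplicate(SketchContext *dst,
                                    const SketchContext *src)
{
    *dst = *src;
}

/* Sets up one frame and returns how many thread contexts disagree
 * with the main context (ctx[0]) afterwards. */
static int sketch_setup_frame(SketchContext *ctx, int n,
                              int frame_rounding, int init_first)
{
    int i, mismatches = 0;

    if (init_first)
        sketch_init_me(&ctx[0], frame_rounding);
    for (i = 1; i < n; i++)
        sketch_update_duplicate(&ctx[i], &ctx[0]);
    if (!init_first)
        sketch_init_me(&ctx[0], frame_rounding);

    for (i = 1; i < n; i++)
        mismatches += ctx[i].rounding != ctx[0].rounding;
    return mismatches;
}
```

Calling sketch_setup_frame() with init_first = 0 and a value that flips every frame leaves all three worker contexts one frame behind the main context, while init_first = 1 keeps them in sync. That mirrors why the patch moves the initialization ahead of the propagation loop.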

[FFmpeg-devel] [PATCH 2/2] avcodec/motion_est: fix penalty_factor for b frames

2020-03-22 Thread Ramiro Polla

In ff_estimate_b_frame_motion(), penalty_factor would be used before
being initialized in estimate_motion_b(). Also, the initialization
would happen more than once unnecessarily.
---
 libavcodec/motion_est.c  | 15 ---
 tests/ref/vsynth/vsynth2-mpeg2-422   |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd |  6 +++---
 tests/ref/vsynth/vsynth2-mpeg4-adap  |  6 +++---
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 02c75fd470..1feb46cec3 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -1123,9 +1123,6 @@ static int estimate_motion_b(MpegEncContext *s, int mb_x, int mb_y,
 uint8_t * const mv_penalty= c->mv_penalty[f_code] + MAX_DMV;
 int mv_scale;
 
-c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
-c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp);
-c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp);
 c->current_mv_penalty= mv_penalty;
 
 get_limits(s, 16*mb_x, 16*mb_y);
@@ -1491,7 +1488,6 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
  int mb_x, int mb_y)
 {
 MotionEstContext * const c= &s->me;
-const int penalty_factor= c->mb_penalty_factor;
 int fmin, bmin, dmin, fbmin, bimin, fimin;
 int type=0;
 const int xy = mb_y*s->mb_stride + mb_x;
@@ -1517,18 +1513,23 @@ void ff_estimate_b_frame_motion(MpegEncContext * s,
 dmin= direct_search(s, mb_x, mb_y);
 else
 dmin= INT_MAX;
+
+c->penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_cmp);
+c->sub_penalty_factor= get_penalty_factor(s->lambda, s->lambda2, c->avctx->me_sub_cmp);
+c->mb_penalty_factor = get_penalty_factor(s->lambda, s->lambda2, c->avctx->mb_cmp);
+
 // FIXME penalty stuff for non-MPEG-4
 c->skip=0;
 fmin = estimate_motion_b(s, mb_x, mb_y, s->b_forw_mv_table, 0, s->f_code) +
-   3 * penalty_factor;
+   3 * c->mb_penalty_factor;
 
 c->skip=0;
 bmin = estimate_motion_b(s, mb_x, mb_y, s->b_back_mv_table, 2, s->b_code) +
-   2 * penalty_factor;
+   2 * c->mb_penalty_factor;
 ff_dlog(s, " %d %d ", s->b_forw_mv_table[xy][0], s->b_forw_mv_table[xy][1]);
 
 c->skip=0;
-fbmin= bidir_refine(s, mb_x, mb_y) + penalty_factor;
+fbmin= bidir_refine(s, mb_x, mb_y) + c->mb_penalty_factor;
 ff_dlog(s, "%d %d %d %d\n", dmin, fmin, bmin, fbmin);
 
 if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) {
diff --git a/tests/ref/vsynth/vsynth2-mpeg2-422 b/tests/ref/vsynth/vsynth2-mpeg2-422
index ec7244f9f9..e945a4cc0e 100644
--- a/tests/ref/vsynth/vsynth2-mpeg2-422
+++ b/tests/ref/vsynth/vsynth2-mpeg2-422
@@ -1,4 +1,4 @@
-b2fa9b73c3547191ecc01b8163abd4e5 *tests/data/fate/vsynth2-mpeg2-422.mpeg2video
-379164 tests/data/fate/vsynth2-mpeg2-422.mpeg2video
-704f6a96f93c2409219bd48b74169041 *tests/data/fate/vsynth2-mpeg2-422.out.rawvideo
+6fc8dc1d76379e459051ca393101c090 *tests/data/fate/vsynth2-mpeg2-422.mpeg2video
+379173 tests/data/fate/vsynth2-mpeg2-422.mpeg2video
+9199d5aaa1709d2584e21e58d76d44fb *tests/data/fate/vsynth2-mpeg2-422.out.rawvideo
 stddev:4.17 PSNR: 35.73 MAXDIFF:   70 bytes:  7603200/  7603200
diff --git a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd
index 16de39edfc..f5bbecfcb2 100644
--- a/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd
+++ b/tests/ref/vsynth/vsynth2-mpeg2-ivlc-qprd
@@ -1,4 +1,4 @@
-907a30295ed8323780eee08e606af0ab *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
-269722 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
-d2d9793bf8f3427b5cc17a1be78ddd64 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo
+f612ea89aa79a7f7b93a8acf332705c4 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
+269723 tests/data/fate/vsynth2-mpeg2-ivlc-qprd.mpeg2video
+88e17886e6383755829d7da519fd5e79 *tests/data/fate/vsynth2-mpeg2-ivlc-qprd.out.rawvideo
 stddev:5.54 PSNR: 33.25 MAXDIFF:   94 bytes:  7603200/  7603200
diff --git a/tests/ref/vsynth/vsynth2-mpeg4-adap b/tests/ref/vsynth/vsynth2-mpeg4-adap
index a3223f6363..1ae0a65e4f 100644
--- a/tests/ref/vsynth/vsynth2-mpeg4-adap
+++ b/tests/ref/vsynth/vsynth2-mpeg4-adap
@@ -1,4 +1,4 @@
-4bff98da2342836476da817428594403 *tests/data/fate/vsynth2-mpeg4-adap.avi
-213508 tests/data/fate/vsynth2-mpeg4-adap.avi
-0c709f2b81f4593eaa29490332c2cb39 *tests/data/fate/vsynth2-mpeg4-adap.out.rawvideo
+fcb79c0dcc00b306b79c354e589b6b69 *tests/data/fate/vsynth2-mpeg4-adap.avi
+213526 tests/data/fate/vsynth2-mpeg4-adap.avi
+71a34a48a81485f938d2c60a3d34ed39 *tests/data/fate/vsynth2-mpeg4-adap.out.rawvideo
 stddev:4.87 PSNR: 34.36 MAXDIFF:   86 bytes:  7603200/  7603200
-- 
2.11.0

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/get_bits: cosmetics

2020-03-22 Thread Ramiro Polla

On Tue, Nov 5, 2019 at 2:35 PM Michael Niedermayer
 wrote:
> On Tue, Nov 05, 2019 at 11:13:49AM +0100, Ramiro Polla wrote:
> >  libavcodec/get_bits.h | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
>
> LGTM

ping

Re: [FFmpeg-devel] [PATCH 2/2] avcodec/wmadec: cosmetics

2020-03-22 Thread Ramiro Polla

On Tue, Nov 5, 2019 at 2:35 PM Michael Niedermayer
 wrote:
> On Tue, Nov 05, 2019 at 11:13:50AM +0100, Ramiro Polla wrote:
> >  libavcodec/wmadec.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
>
> LGTM

ping

[FFmpeg-devel] [PATCH 1/2] avcodec/get_bits: cosmetics

2019-11-05 Thread Ramiro Polla

---
 libavcodec/get_bits.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
index c4ab607744..66fb877599 100644
--- a/libavcodec/get_bits.h
+++ b/libavcodec/get_bits.h
@@ -234,9 +234,9 @@ static inline void refill_32(GetBitContext *s, int is_le)
 #endif
 
 if (is_le)
-s->cache   = (uint64_t)AV_RL32(s->buffer + (s->index >> 3)) << s->bits_left | s->cache;
+s->cache = (uint64_t)AV_RL32(s->buffer + (s->index >> 3)) << s->bits_left | s->cache;
 else
-s->cache   = s->cache | (uint64_t)AV_RB32(s->buffer + (s->index >> 3)) << (32 - s->bits_left);
+s->cache = s->cache | (uint64_t)AV_RB32(s->buffer + (s->index >> 3)) << (32 - s->bits_left);
 s->index += 32;
 s->bits_left += 32;
 }
@@ -249,9 +249,9 @@ static inline void refill_64(GetBitContext *s, int is_le)
 #endif
 
 if (is_le)
-s->cache = AV_RL64(s->buffer + (s->index >> 3));
+s->cache = AV_RL64(s->buffer + (s->index >> 3));
 else
-s->cache = AV_RB64(s->buffer + (s->index >> 3));
+s->cache = AV_RB64(s->buffer + (s->index >> 3));
 s->index += 64;
 s->bits_left = 64;
 }
-- 
2.11.0

[FFmpeg-devel] [PATCH 2/2] avcodec/wmadec: cosmetics

2019-11-05 Thread Ramiro Polla

---
 libavcodec/wmadec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/wmadec.c b/libavcodec/wmadec.c
index 78b51e5871..e7886262f3 100644
--- a/libavcodec/wmadec.c
+++ b/libavcodec/wmadec.c
@@ -889,11 +889,11 @@ static int wma_decode_superframe(AVCodecContext *avctx, void *data,
 q   = s->last_superframe + s->last_superframe_len;
 len = bit_offset;
 while (len > 7) {
-*q++ = (get_bits) (&s->gb, 8);
+*q++ = get_bits(&s->gb, 8);
 len -= 8;
 }
 if (len > 0)
-*q++ = (get_bits) (&s->gb, len) << (8 - len);
+*q++ = get_bits(&s->gb, len) << (8 - len);
 memset(q, 0, AV_INPUT_BUFFER_PADDING_SIZE);
 
 /* XXX: bit_offset bits into last frame */
-- 
2.11.0

Re: [FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames

2019-10-23 Thread Ramiro Polla

On Sun, Jun 17, 2018 at 6:23 AM Ramiro Polla  wrote:
> On Sun, Jun 10, 2018 at 2:32 AM, Michael Niedermayer
>  wrote:
> > On Sat, Jun 09, 2018 at 05:09:13PM +0200, Ramiro Polla wrote:
> >> On Thu, May 10, 2018 at 11:01 PM, Michael Niedermayer
> >>  wrote:
> >> > On Wed, May 09, 2018 at 08:44:25PM +0200, Ramiro Polla wrote:
> >> >> This option prevents the mpv encoders from using intra macroblocks in
> >> >> predictive frames.
> >> >>
> >> >> It is useful for glitch artists to generate input material. This option
> >> >> allows them to split and merge two video files while maintaining fluid
> >> >> motion from the second video without having intra macroblocks restoring
> >> >> chunks of the first video.
> >> >
> >> > maybe a continuous variable like snows intra_penalty could achieve this
> >> > too but give more flexibility in doing it also just partially if wanted
> >>
> >> I like this idea better. I wanted a simple way to be able to entirely
> >> disable intra macroblocks, but "-intra_penalty max" could cause an
> >> overflow, so I set the max value to INT_MAX/2.
> >>
> >> New patch attached.
> >
> > LGTM
> >
> > a fate test may also make sense
>
> I sent a new patch set that includes a fate test.

The patchset with test that I had sent involved some changes to
ffprobe/fate that weren't good. I gave up trying to add tests in a
clean way.

Here's just the previous LGTM'd patch, rebased against git master.

Ramiro
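
How the penalty biases the intra/inter decision, and why the maximum is capped at INT_MAX/2 as mentioned earlier in the thread, can be sketched as follows. The function and macro names are hypothetical; only the scoring expression follows the patch, and the sketch assumes the unpenalized score stays well below INT_MAX/2 so that the signed addition cannot overflow.

```c
#include <assert.h>
#include <limits.h>

/* Upper bound for the option as described in the thread: INT_MAX/2.
 * The macro name is an assumption made for this sketch. */
#define SKETCH_INTRA_PENALTY_MAX (INT_MAX / 2)

/* Returns 1 if the intra candidate beats the best inter score dmin.
 * Mirrors: intra_score += c->mb_penalty_factor*16 + s->intra_penalty; */
static int sketch_pick_intra(int intra_score, int mb_penalty_factor,
                             int intra_penalty, int dmin)
{
    intra_score += mb_penalty_factor * 16 + intra_penalty;
    return intra_score < dmin;
}
```

With a penalty of 0 the comparison behaves as before the patch. With SKETCH_INTRA_PENALTY_MAX the intra score exceeds any realistic dmin, so intra is never selected on this path, and the sum still fits in an int because half of the range is left as headroom.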
From e30abdeed04aa36e9f80ea54c891ee32b888d95c Mon Sep 17 00:00:00 2001
From: Ramiro Polla 
Date: Wed, 23 Oct 2019 21:12:32 +0200
Subject: [PATCH] mpegvideo_enc: add intra_penalty option for p frames

This option allows more control over the use of intra macroblocks in
predictive frames.

By using '-intra_penalty max', intra macroblocks are never used in
predictive frames.

It is useful for glitch artists to generate input material. This option
allows them to split and merge two video files while maintaining fluid
motion from the second video without having intra macroblocks restoring
chunks of the first video.
---
 libavcodec/motion_est.c| 10 +-
 libavcodec/motion_est.h|  2 +-
 libavcodec/mpegvideo.h |  3 +++
 libavcodec/mpegvideo_enc.c |  6 +++---
 libavcodec/svq1enc.c   |  2 +-
 5 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 759eea479d..02c75fd470 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20;
 c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score);
 
-if (vard*2 + 200*256 > varc)
+if (vard*2 + 200*256 > varc && !s->intra_penalty)
 mb_type|= CANDIDATE_MB_TYPE_INTRA;
 if (varc*2 + 200*256 > vard || s->qscale > 24){
 //if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){
@@ -1040,7 +1040,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 
 intra_score= s->mecc.mb_cmp[0](s, c->scratchpad, pix, s->linesize, 16);
 }
-intra_score += c->mb_penalty_factor*16;
+intra_score += c->mb_penalty_factor*16 + s->intra_penalty;
 
 if(intra_score < dmin){
 mb_type= CANDIDATE_MB_TYPE_INTRA;
@@ -1648,7 +1648,7 @@ int ff_get_best_fcode(MpegEncContext * s, int16_t (*mv_table)[2], int type)
 }
 }
 
-void ff_fix_long_p_mvs(MpegEncContext * s)
+void ff_fix_long_p_mvs(MpegEncContext * s, int type)
 {
 MotionEstContext * const c= &s->me;
 const int f_code= s->f_code;
@@ -1682,8 +1682,8 @@ void ff_fix_long_p_mvs(MpegEncContext * s)
 if(   mx >=range || mx <-range
|| my >=range || my <-range){
 s->mb_type[i] &= ~CANDIDATE_MB_TYPE_INTER4V;
-s->mb_type[i] |= CANDIDATE_MB_TYPE_INTRA;
-s->current_picture.mb_type[i] = CANDIDATE_MB_TYPE_INTRA;
+s->mb_type[i] |= type;
+s->current_picture.mb_type[i] = type;
 }
 }
 }
diff --git a/libavcodec/motion_est.h b/libavcodec/motion_est.h
index 3b3a8d7341..817220f340 100644
--- a/libavcodec/motion_est.h
+++ b/libavcodec/motion_est.h
@@ -127,7 +127,7 @@ int ff_get_mb_score(struct MpegEncContext *s, int mx, int my, int src_index,
 int ff_get_best_fcode(struct MpegEncContext *s,
   int16_t (*mv_table)[2], int type);
 
-void ff_fix_long_p_mvs(struct MpegEncCo

[FFmpeg-devel] [PATCH 4/4] mpegvideo_enc: add intra_penalty option for p frames

2018-06-16 Thread Ramiro Polla

This option allows more control over the use of intra macroblocks in
predictive frames.

By using '-intra_penalty max', intra macroblocks are never used in
predictive frames.

It is useful for glitch artists to generate input material. This option
allows them to split and merge two video files while maintaining fluid
motion from the second video without having intra macroblocks restoring
chunks of the first video.
---
 libavcodec/motion_est.c   | 10 
 libavcodec/motion_est.h   |  2 +-
 libavcodec/mpegvideo.h|  3 +++
 libavcodec/mpegvideo_enc.c|  6 ++---
 libavcodec/svq1enc.c  |  2 +-
 tests/fate-run.sh |  8 +++
 tests/fate/mpeg4.mak  |  5 
 tests/fate/seek.mak   |  1 +
 tests/fate/vcodec.mak |  4 
 tests/ref/fate/mpeg4-nopimb   |  1 +
 tests/ref/seek/vsynth_lena-mpeg4-nopimb   | 40 +++
 tests/ref/vsynth/vsynth1-mpeg4-nopimb |  4 
 tests/ref/vsynth/vsynth2-mpeg4-nopimb |  4 
 tests/ref/vsynth/vsynth3-mpeg4-nopimb |  4 
 tests/ref/vsynth/vsynth_lena-mpeg4-nopimb |  4 
 15 files changed, 88 insertions(+), 10 deletions(-)
 create mode 100644 tests/ref/fate/mpeg4-nopimb
 create mode 100644 tests/ref/seek/vsynth_lena-mpeg4-nopimb
 create mode 100644 tests/ref/vsynth/vsynth1-mpeg4-nopimb
 create mode 100644 tests/ref/vsynth/vsynth2-mpeg4-nopimb
 create mode 100644 tests/ref/vsynth/vsynth3-mpeg4-nopimb
 create mode 100644 tests/ref/vsynth/vsynth_lena-mpeg4-nopimb

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 8b5ce2117a..fa750e39ec 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20;
 c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score);
 
-if (vard*2 + 200*256 > varc)
+if (vard*2 + 200*256 > varc && !s->intra_penalty)
 mb_type|= CANDIDATE_MB_TYPE_INTRA;
 if (varc*2 + 200*256 > vard || s->qscale > 24){
 //if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){
@@ -1040,7 +1040,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 
 intra_score= s->mecc.mb_cmp[0](s, c->scratchpad, pix, s->linesize, 16);
 }
-intra_score += c->mb_penalty_factor*16;
+intra_score += c->mb_penalty_factor*16 + s->intra_penalty;
 
 if(intra_score < dmin){
 mb_type= CANDIDATE_MB_TYPE_INTRA;
@@ -1648,7 +1648,7 @@ int ff_get_best_fcode(MpegEncContext * s, int16_t (*mv_table)[2], int type)
 }
 }
 
-void ff_fix_long_p_mvs(MpegEncContext * s)
+void ff_fix_long_p_mvs(MpegEncContext * s, int type)
 {
 MotionEstContext * const c= &s->me;
 const int f_code= s->f_code;
@@ -1682,8 +1682,8 @@ void ff_fix_long_p_mvs(MpegEncContext * s)
 if(   mx >=range || mx <-range
|| my >=range || my <-range){
 s->mb_type[i] &= ~CANDIDATE_MB_TYPE_INTER4V;
-s->mb_type[i] |= CANDIDATE_MB_TYPE_INTRA;
-s->current_picture.mb_type[i] = CANDIDATE_MB_TYPE_INTRA;
+s->mb_type[i] |= type;
+s->current_picture.mb_type[i] = type;
 }
 }
 }
diff --git a/libavcodec/motion_est.h b/libavcodec/motion_est.h
index 3b3a8d7341..817220f340 100644
--- a/libavcodec/motion_est.h
+++ b/libavcodec/motion_est.h
@@ -127,7 +127,7 @@ int ff_get_mb_score(struct MpegEncContext *s, int mx, int my, int src_index,
 int ff_get_best_fcode(struct MpegEncContext *s,
   int16_t (*mv_table)[2], int type);
 
-void ff_fix_long_p_mvs(struct MpegEncContext *s);
+void ff_fix_long_p_mvs(struct MpegEncContext *s, int type);
 void ff_fix_long_mvs(struct MpegEncContext *s, uint8_t *field_select_table,
  int field_select, int16_t (*mv_table)[2], int f_code,
  int type, int truncate);
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index e16deb64e7..7eda962ba7 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -577,6 +577,8 @@ typedef struct MpegEncContext {
 
 int scenechange_threshold;
 int noise_reduction;
+
+int intra_penalty;
 } MpegEncContext;
 
 /* mpegvideo_enc common options */
@@ -661,6 +663,7 @@ FF_MPV_OPT_CMP_FUNC, \
 {"ps", "RTP payload size in bytes", FF_MPV_OFFSET(rtp_payload_size), AV_OPT_TYPE_INT, {.i64 = 0 }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS }, \
 {"mepc", "Motion estimation bitrate penalty compensation (1.0 = 256)", FF_MPV_OFFSET(me_penalty_compensation), AV_OPT_TYPE_INT, {.i64 = 256 }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS }, \
 {"mepre", "pre motion

[FFmpeg-devel] [PATCH 2/4] mpegutils: split debug function that prints mb_type so it may be used by ffprobe

2018-06-16 Thread Ramiro Polla

---
 libavcodec/mpegutils.c | 115 +
 libavcodec/mpegutils.h |   7 +++
 2 files changed, 76 insertions(+), 46 deletions(-)

diff --git a/libavcodec/mpegutils.c b/libavcodec/mpegutils.c
index 0fbe5f8c9d..12c2468797 100644
--- a/libavcodec/mpegutils.c
+++ b/libavcodec/mpegutils.c
@@ -100,6 +100,72 @@ void ff_draw_horiz_band(AVCodecContext *avctx,
 }
 }
 
+int ff_mb_type_str(char *str, int size, int mb_type)
+{
+char *ptr = str;
+
+if (size <= 0)
+return 0;
+
+if (--size <= 0)
+goto end;
+
+// Type & MV direction
+if (IS_PCM(mb_type))
+*ptr++ = 'P';
+else if (IS_INTRA(mb_type) && IS_ACPRED(mb_type))
+*ptr++ = 'A';
+else if (IS_INTRA4x4(mb_type))
+*ptr++ = 'i';
+else if (IS_INTRA16x16(mb_type))
+*ptr++ = 'I';
+else if (IS_DIRECT(mb_type) && IS_SKIP(mb_type))
+*ptr++ = 'd';
+else if (IS_DIRECT(mb_type))
+*ptr++ = 'D';
+else if (IS_GMC(mb_type) && IS_SKIP(mb_type))
+*ptr++ = 'g';
+else if (IS_GMC(mb_type))
+*ptr++ = 'G';
+else if (IS_SKIP(mb_type))
+*ptr++ = 'S';
+else if (!USES_LIST(mb_type, 1))
+*ptr++ = '>';
+else if (!USES_LIST(mb_type, 0))
+*ptr++ = '<';
+else {
+av_assert2(USES_LIST(mb_type, 0) && USES_LIST(mb_type, 1));
+*ptr++ = 'X';
+}
+
+if (--size <= 0)
+goto end;
+
+// segmentation
+if (IS_8X8(mb_type))
+*ptr++ = '+';
+else if (IS_16X8(mb_type))
+*ptr++ = '-';
+else if (IS_8X16(mb_type))
+*ptr++ = '|';
+else if (IS_INTRA(mb_type) || IS_16X16(mb_type))
+*ptr++ = ' ';
+else
+*ptr++ = '?';
+
+if (--size <= 0)
+goto end;
+
+if (IS_INTERLACED(mb_type))
+*ptr++ = '=';
+else
+*ptr++ = ' ';
+
+end:
+*ptr = '\0';
+return ptr - str;
+}
+
 void ff_print_debug_info2(AVCodecContext *avctx, AVFrame *pict, uint8_t *mbskip_table,
  uint32_t *mbtype_table, int8_t *qscale_table, int16_t (*motion_val[2])[2],
  int *low_delay,
@@ -231,52 +297,9 @@ void ff_print_debug_info2(AVCodecContext *avctx, AVFrame *pict, uint8_t *mbskip_
qscale_table[x + y * mb_stride]);
 }
 if (avctx->debug & FF_DEBUG_MB_TYPE) {
-int mb_type = mbtype_table[x + y * mb_stride];
-// Type & MV direction
-if (IS_PCM(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "P");
-else if (IS_INTRA(mb_type) && IS_ACPRED(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "A");
-else if (IS_INTRA4x4(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "i");
-else if (IS_INTRA16x16(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "I");
-else if (IS_DIRECT(mb_type) && IS_SKIP(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "d");
-else if (IS_DIRECT(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "D");
-else if (IS_GMC(mb_type) && IS_SKIP(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "g");
-else if (IS_GMC(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "G");
-else if (IS_SKIP(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "S");
-else if (!USES_LIST(mb_type, 1))
-av_log(avctx, AV_LOG_DEBUG, ">");
-else if (!USES_LIST(mb_type, 0))
-av_log(avctx, AV_LOG_DEBUG, "<");
-else {
-av_assert2(USES_LIST(mb_type, 0) && USES_LIST(mb_type, 1));
-av_log(avctx, AV_LOG_DEBUG, "X");
-}
-
-// segmentation
-if (IS_8X8(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "+");
-else if (IS_16X8(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "-");
-else if (IS_8X16(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "|");
-else if (IS_INTRA(mb_type) || IS_16X16(mb_type))
-av_log(avctx, AV_LOG_DEBUG, " ");
-else
-av_log(avctx, AV_LOG_DEBUG, "?");
-
-
-if (IS_INTERLACED(mb_type))
-av_log(avctx, AV_LOG_DEBUG, "=");
-else
-av_log(avctx, AV_LOG_DEBUG, " ");
+char str[4];
+ff_mb_type_str(str, sizeof(str), mbtype_table[x + y * mb_stride]);
+av_log(avctx, AV_LOG_DEBUG, str);
 }
 }
 av_log(avctx, AV_LOG_DEBUG, "\n");
diff --git
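
The helper added in the patch above writes at most three characters plus a terminating NUL, decrements the remaining size before every write, and returns the number of characters written. A minimal standalone sketch of that bounded-writer pattern follows. The flag bits (`DEMO_INTRA`, `DEMO_SKIP`, `DEMO_8X8`, `DEMO_INTERLACED`) and the name `demo_mb_type_str` are hypothetical stand-ins for illustration; they are not the `IS_*()` macros or values used by libavcodec.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only (not libavcodec's values). */
#define DEMO_INTRA      0x1u
#define DEMO_SKIP       0x2u
#define DEMO_8X8        0x4u
#define DEMO_INTERLACED 0x8u

/* Writes up to 3 characters plus a terminating NUL into str[0..size-1].
 * Returns the number of characters written, not counting the NUL. */
static int demo_mb_type_str(char *str, int size, unsigned mb_type)
{
    char *ptr = str;

    assert(str != NULL);
    if (size <= 0)
        return 0;               /* no room even for the terminator */

    if (--size <= 0)            /* keep one byte in reserve for '\0' */
        goto end;
    *ptr++ = (mb_type & DEMO_INTRA) ? 'I' :
             (mb_type & DEMO_SKIP)  ? 'S' : '>';

    if (--size <= 0)
        goto end;
    *ptr++ = (mb_type & DEMO_8X8) ? '+' : ' ';

    if (--size <= 0)
        goto end;
    *ptr++ = (mb_type & DEMO_INTERLACED) ? '=' : ' ';

end:
    *ptr = '\0';
    return (int)(ptr - str);
}
```

With a 4-byte buffer all three characters fit; with a smaller buffer the output is truncated but still NUL-terminated, which is why the caller in the patch can use `char str[4]`.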

[FFmpeg-devel] [PATCH 3/4] ffprobe: print mb_types frame side data

2018-06-16 Thread Ramiro Polla

---
 fftools/ffprobe.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/fftools/ffprobe.c b/fftools/ffprobe.c
index 544786ec72..5bd14ebfdb 100644
--- a/fftools/ffprobe.c
+++ b/fftools/ffprobe.c
@@ -30,6 +30,7 @@
 
 #include "libavformat/avformat.h"
 #include "libavcodec/avcodec.h"
+#include "libavcodec/mpegutils.h"
 #include "libavutil/avassert.h"
 #include "libavutil/avstring.h"
 #include "libavutil/bprint.h"
@@ -,6 +2223,24 @@ static void show_frame(WriterContext *w, AVFrame *frame, AVStream *stream,
 AVContentLightMetadata *metadata = (AVContentLightMetadata *)sd->data;
 print_int("max_content", metadata->MaxCLL);
 print_int("max_average", metadata->MaxFALL);
+} else if (sd->type == AV_FRAME_DATA_MB_TYPES) {
+uint32_t *mb_types = (uint32_t *)sd->data;
+int mb_height = *mb_types++;
+int mb_width = *mb_types++;
+int size = mb_height * mb_width * 3 + 1;
+char *str = av_malloc(size);
+int mb_y, mb_x;
+print_int("mb_height", mb_height);
+print_int("mb_width", mb_width);
+if (str) {
+char *ptr = str;
+const char *end = str + size;
+for (mb_y = 0; mb_y < mb_height; mb_y++)
+for (mb_x = 0; mb_x < mb_width; mb_x++)
+ptr += ff_mb_type_str(ptr, end - str, *mb_types++);
+print_str("mb_types", str);
+av_free(str);
+}
 } else if (sd->type == AV_FRAME_DATA_ICC_PROFILE) {
 AVDictionaryEntry *tag = av_dict_get(sd->metadata, "name", NULL, AV_DICT_MATCH_CASE);
 if (tag)
-- 
2.11.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
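
The ffprobe hunk above sizes its buffer at three characters per macroblock plus one terminator and appends one short string per macroblock. A minimal sketch of that accumulation loop is given below. It is not the patch's code: `demo_append` and `demo_join_cells` are hypothetical names, plain `malloc`/`free` replace `av_malloc`/`av_free`, and each call is handed the remaining space (`end - ptr`) rather than the `end - str` seen in the hunk, as a more defensive variant.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical bounded copy: writes at most size-1 characters of src plus a
 * NUL into dst and returns the number of characters written. */
static int demo_append(char *dst, int size, const char *src)
{
    int n = 0;

    if (size <= 0)
        return 0;
    while (n < size - 1 && src[n]) {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return n;
}

/* Joins `count` cells of up to three characters each into one newly
 * allocated string. The caller releases it with free().
 * Returns NULL if the allocation fails. */
static char *demo_join_cells(const char *const *cells, int count)
{
    int size, i;
    char *str, *ptr;
    const char *end;

    assert(count >= 0 && count <= (INT_MAX - 1) / 3);
    size = count * 3 + 1;
    str = malloc((size_t)size);
    if (!str)
        return NULL;

    ptr = str;
    end = str + size;
    *ptr = '\0';
    for (i = 0; i < count; i++)
        ptr += demo_append(ptr, (int)(end - ptr), cells[i]);
    return str;
}
```

Because every cell is at most three characters long, the `count * 3 + 1` buffer is never exhausted here; passing the remaining space only matters if a writer could ever emit more than expected.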

Re: [FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames

2018-06-16 Thread Ramiro Polla

On Sun, Jun 10, 2018 at 2:32 AM, Michael Niedermayer
 wrote:
> On Sat, Jun 09, 2018 at 05:09:13PM +0200, Ramiro Polla wrote:
>> On Thu, May 10, 2018 at 11:01 PM, Michael Niedermayer
>>  wrote:
>> > On Wed, May 09, 2018 at 08:44:25PM +0200, Ramiro Polla wrote:
>> >> This option prevents the mpv encoders from using intra macroblocks in
>> >> predictive frames.
>> >>
>> >> It is useful for glitch artists to generate input material. This option
>> >> allows them to split and merge two video files while maintaining fluid
>> >> motion from the second video without having intra macroblocks restoring
>> >> chunks of the first video.
>> >
>> > maybe a continuous variable like snows intra_penalty could achieve this
>> > too but give more flexibility in doing it also just partially if wanted
>>
>> I like this idea better. I wanted a simple way to be able to entirely
>> disable intra macroblocks, but "-intra_penalty max" could cause an
>> overflow, so I set the max value to INT_MAX/2.
>>
>> New patch attached.
>
> LGTM
>
> a fate test may also make sense

I sent a new patch set that includes a fate test.
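
On the INT_MAX/2 cap mentioned in the quoted discussion: the penalty is added to a signed `int` score, so a penalty of `INT_MAX` would overflow as soon as the score is positive, while `INT_MAX/2` leaves headroom. The sketch below only illustrates that arithmetic. `demo_sum_fits_int` and `DEMO_PENALTY_MAX` are hypothetical names, and it assumes `long long` is wider than `int`.

```c
#include <limits.h>

/* Hypothetical cap mirroring the INT_MAX/2 limit discussed in the thread. */
#define DEMO_PENALTY_MAX (INT_MAX / 2)

/* Returns 1 if base_score + penalty is representable as an int, else 0.
 * The sum is computed in long long so the check itself cannot overflow. */
static int demo_sum_fits_int(int base_score, int penalty)
{
    long long sum = (long long)base_score + penalty;

    return sum >= INT_MIN && sum <= INT_MAX;
}
```

For any score up to `INT_MAX/2` the capped penalty still yields a representable sum, whereas an uncapped `INT_MAX` penalty does not for any positive score.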

[FFmpeg-devel] [PATCH 1/4] lavu/frame: add mb_types side data

2018-06-16 Thread Ramiro Polla

---
 libavcodec/avcodec.h   |  4 
 libavcodec/mpegutils.c | 20 
 libavcodec/options_table.h |  1 +
 libavutil/frame.c  |  1 +
 libavutil/frame.h  |  9 +
 5 files changed, 35 insertions(+)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index c90166deb6..7fe4fc9347 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -929,6 +929,10 @@ typedef struct RcOverride{
  */
 #define AV_CODEC_FLAG2_SHOW_ALL   (1 << 22)
 /**
+ * Export macroblock types through frame side data
+ */
+#define AV_CODEC_FLAG2_EXPORT_MB_TYPES (1 << 27)
+/**
  * Export motion vectors through frame side data
  */
 #define AV_CODEC_FLAG2_EXPORT_MVS (1 << 28)
diff --git a/libavcodec/mpegutils.c b/libavcodec/mpegutils.c
index 3f94540616..0fbe5f8c9d 100644
--- a/libavcodec/mpegutils.c
+++ b/libavcodec/mpegutils.c
@@ -188,6 +188,26 @@ void ff_print_debug_info2(AVCodecContext *avctx, AVFrame *pict, uint8_t *mbskip_
 av_freep(&mvs);
 }
 
+if ((avctx->flags2 & AV_CODEC_FLAG2_EXPORT_MB_TYPES) && mbtype_table) {
+int size = (2 + mb_height * mb_width) * sizeof(uint32_t);
+int mb_x, mb_y;
+
+AVFrameSideData *sd;
+uint32_t *out;
+
+sd = av_frame_new_side_data(pict, AV_FRAME_DATA_MB_TYPES, size);
+if (!sd)
+return;
+
+out = (uint32_t *) sd->data;
+*out++ = mb_height;
+*out++ = mb_width;
+
+for (mb_y = 0; mb_y < mb_height; mb_y++)
+for (mb_x = 0; mb_x < mb_width; mb_x++)
+*out++ = mbtype_table[mb_x + mb_y * mb_stride];
+}
+
 /* TODO: export all the following to make them accessible for users (and filters) */
 if (avctx->hwaccel || !mbtype_table)
 return;
diff --git a/libavcodec/options_table.h b/libavcodec/options_table.h
index 099261e168..25c84de321 100644
--- a/libavcodec/options_table.h
+++ b/libavcodec/options_table.h
@@ -76,6 +76,7 @@ static const AVOption avcodec_options[] = {
 {"export_mvs", "export motion vectors through frame side data", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_EXPORT_MVS}, INT_MIN, INT_MAX, V|D, "flags2"},
 {"skip_manual", "do not skip samples and export skip information as frame side data", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_SKIP_MANUAL}, INT_MIN, INT_MAX, V|D, "flags2"},
 {"ass_ro_flush_noop", "do not reset ASS ReadOrder field on flush", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_RO_FLUSH_NOOP}, INT_MIN, INT_MAX, S|D, "flags2"},
+{"export_mb_types", "export macroblock types through frame side data", 0, AV_OPT_TYPE_CONST, {.i64 = AV_CODEC_FLAG2_EXPORT_MB_TYPES}, INT_MIN, INT_MAX, V|D, "flags2"},
 {"time_base", NULL, OFFSET(time_base), AV_OPT_TYPE_RATIONAL, {.dbl = 0}, 0, INT_MAX},
 {"g", "set the group of picture (GOP) size", OFFSET(gop_size), AV_OPT_TYPE_INT, {.i64 = 12 }, INT_MIN, INT_MAX, V|E},
 {"ar", "set audio sampling rate (in Hz)", OFFSET(sample_rate), AV_OPT_TYPE_INT, {.i64 = DEFAULT }, 0, INT_MAX, A|D|E},
diff --git a/libavutil/frame.c b/libavutil/frame.c
index deb9b6f334..577d4f6e6d 100644
--- a/libavutil/frame.c
+++ b/libavutil/frame.c
@@ -834,6 +834,7 @@ const char *av_frame_side_data_name(enum AVFrameSideDataType type)
 case AV_FRAME_DATA_ICC_PROFILE: return "ICC profile";
 case AV_FRAME_DATA_QP_TABLE_PROPERTIES: return "QP table properties";
 case AV_FRAME_DATA_QP_TABLE_DATA:   return "QP table data";
+case AV_FRAME_DATA_MB_TYPES:return "Macroblock types";
 }
 return NULL;
 }
diff --git a/libavutil/frame.h b/libavutil/frame.h
index 9d57d6ce66..ce1231b03b 100644
--- a/libavutil/frame.h
+++ b/libavutil/frame.h
@@ -158,6 +158,15 @@ enum AVFrameSideDataType {
  */
 AV_FRAME_DATA_QP_TABLE_DATA,
 #endif
+
+/**
+ * Macroblock types exported by some codecs (on demand through the
+ * export_mb_types flag set in the libavcodec AVCodecContext flags2 option).
+ * The data is composed by a header consisting of uint32_t mb_height and
+ * uint32_t mb_width, followed by a uint32_t mb_types[mb_height][mb_width]
+ * array.
+ */
+AV_FRAME_DATA_MB_TYPES,
 };
 
 enum AVActiveFormatDescription {
-- 
2.11.0
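
The doc comment in the patch above describes the proposed side-data layout: a `uint32_t mb_height`, a `uint32_t mb_width`, then `mb_height * mb_width` `uint32_t` macroblock types in row-major order. A minimal sketch of a reader for that layout follows. It works on a plain byte buffer instead of an `AVFrameSideData`, the names `DemoMbTypes`, `demo_parse_mb_types` and `demo_mb_type_at` are hypothetical, and it assumes the values are stored in native byte order, as the exporting loop in the patch suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view over a buffer laid out as described in the patch. */
typedef struct DemoMbTypes {
    uint32_t mb_height;
    uint32_t mb_width;
    const uint8_t *types;  /* mb_height * mb_width uint32_t values, row-major */
} DemoMbTypes;

/* Validates the buffer size against the header before exposing the payload.
 * Returns 0 on success, -1 if the buffer is too small for what it claims. */
static int demo_parse_mb_types(const uint8_t *data, size_t size, DemoMbTypes *out)
{
    uint32_t hdr[2];
    uint64_t count;

    assert(data != NULL && out != NULL);
    if (size < sizeof(hdr))
        return -1;
    memcpy(hdr, data, sizeof(hdr));          /* avoids unaligned reads */

    count = (uint64_t)hdr[0] * hdr[1];       /* 64-bit product cannot wrap */
    if (count > (size - sizeof(hdr)) / sizeof(uint32_t))
        return -1;

    out->mb_height = hdr[0];
    out->mb_width  = hdr[1];
    out->types     = data + sizeof(hdr);
    return 0;
}

/* Returns the type of the macroblock at row mb_y, column mb_x. */
static uint32_t demo_mb_type_at(const DemoMbTypes *t, uint32_t mb_y, uint32_t mb_x)
{
    uint32_t v;

    assert(mb_y < t->mb_height && mb_x < t->mb_width);
    memcpy(&v, t->types + ((size_t)mb_y * t->mb_width + mb_x) * sizeof(v), sizeof(v));
    return v;
}
```

Checking the element count against the buffer size before indexing is the design point: a consumer should not trust the header dimensions on their own.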


Re: [FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames

2018-06-09 Thread Ramiro Polla

Hi Michael,

On Thu, May 10, 2018 at 11:01 PM, Michael Niedermayer
 wrote:
> On Wed, May 09, 2018 at 08:44:25PM +0200, Ramiro Polla wrote:
>> This option prevents the mpv encoders from using intra macroblocks in
>> predictive frames.
>>
>> It is useful for glitch artists to generate input material. This option
>> allows them to split and merge two video files while maintaining fluid
>> motion from the second video without having intra macroblocks restoring
>> chunks of the first video.
>
> maybe a continuous variable like snows intra_penalty could achieve this
> too but give more flexibility in doing it also just partially if wanted

I like this idea better. I wanted a simple way to be able to entirely
disable intra macroblocks, but "-intra_penalty max" could cause an
overflow, so I set the max value to INT_MAX/2.

New patch attached.
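
The penalty idea described above can be sketched as a biased comparison: the intra score is raised by the penalty before it is compared with the best inter score, so a larger penalty makes intra coding progressively less likely. This is an illustration only, not the encoder's code. `DemoMbMode` and `demo_choose_mode` are hypothetical names, and the sum is widened to `long long` (assumed wider than `int`) so the sketch itself cannot overflow.

```c
#include <assert.h>

enum DemoMbMode { DEMO_MODE_INTER, DEMO_MODE_INTRA };

/* Picks intra only if its score, after adding the penalty, is strictly
 * lower than the best inter score. */
static enum DemoMbMode demo_choose_mode(int intra_score, int best_inter_score,
                                        int intra_penalty)
{
    long long biased;

    assert(intra_penalty >= 0);
    biased = (long long)intra_score + intra_penalty;
    return biased < best_inter_score ? DEMO_MODE_INTRA : DEMO_MODE_INTER;
}
```

With a penalty of zero the comparison is unchanged; with a very large penalty inter wins for any realistic pair of scores, which is the "disable intra macroblocks" behaviour the thread is after.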
From d2c1da02c28be5519f0ba84aa22f519a296a6d04 Mon Sep 17 00:00:00 2001
From: Ramiro Polla 
Date: Sat, 9 Jun 2018 17:00:26 +0200
Subject: [PATCH] mpegvideo_enc: add intra_penalty option for p frames

This option allows more control over the use of intra macroblocks in
predictive frames.

By using '-intra_penalty max', intra macroblocks are never used in
predictive frames.

It is useful for glitch artists to generate input material. This option
allows them to split and merge two video files while maintaining fluid
motion from the second video without having intra macroblocks restoring
chunks of the first video.
---
 libavcodec/motion_est.c| 10 +-
 libavcodec/motion_est.h|  2 +-
 libavcodec/mpegvideo.h |  3 +++
 libavcodec/mpegvideo_enc.c |  6 +++---
 libavcodec/svq1enc.c   |  2 +-
 5 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 8b5ce2117a..fa750e39ec 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20;
 c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score);
 
-if (vard*2 + 200*256 > varc)
+if (vard*2 + 200*256 > varc && !s->intra_penalty)
 mb_type|= CANDIDATE_MB_TYPE_INTRA;
 if (varc*2 + 200*256 > vard || s->qscale > 24){
 //if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){
@@ -1040,7 +1040,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 
 intra_score= s->mecc.mb_cmp[0](s, c->scratchpad, pix, s->linesize, 16);
 }
-intra_score += c->mb_penalty_factor*16;
+intra_score += c->mb_penalty_factor*16 + s->intra_penalty;
 
 if(intra_score < dmin){
 mb_type= CANDIDATE_MB_TYPE_INTRA;
@@ -1648,7 +1648,7 @@ int ff_get_best_fcode(MpegEncContext * s, int16_t (*mv_table)[2], int type)
 }
 }
 
-void ff_fix_long_p_mvs(MpegEncContext * s)
+void ff_fix_long_p_mvs(MpegEncContext * s, int type)
 {
 MotionEstContext * const c= &s->me;
 const int f_code= s->f_code;
@@ -1682,8 +1682,8 @@ void ff_fix_long_p_mvs(MpegEncContext * s)
 if(   mx >=range || mx <-range
|| my >=range || my <-range){
 s->mb_type[i] &= ~CANDIDATE_MB_TYPE_INTER4V;
-s->mb_type[i] |= CANDIDATE_MB_TYPE_INTRA;
-s->current_picture.mb_type[i] = CANDIDATE_MB_TYPE_INTRA;
+s->mb_type[i] |= type;
+s->current_picture.mb_type[i] = type;
 }
 }
 }
diff --git a/libavcodec/motion_est.h b/libavcodec/motion_est.h
index 3b3a8d7341..817220f340 100644
--- a/libavcodec/motion_est.h
+++ b/libavcodec/motion_est.h
@@ -127,7 +127,7 @@ int ff_get_mb_score(struct MpegEncContext *s, int mx, int my, int src_index,
 int ff_get_best_fcode(struct MpegEncContext *s,
   int16_t (*mv_table)[2], int type);
 
-void ff_fix_long_p_mvs(struct MpegEncContext *s);
+void ff_fix_long_p_mvs(struct MpegEncContext *s, int type);
 void ff_fix_long_mvs(struct MpegEncContext *s, uint8_t *field_select_table,
  int field_select, int16_t (*mv_table)[2], int f_code,
  int type, int truncate);
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index e16deb64e7..7eda962ba7 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -577,6 +577,8 @@ typedef struct MpegEncContext {
 
 int scenechange_threshold;
 int noise_reduction;
+
+int intra_penalty;
 } MpegEncContext;
 
 /* mpegvideo_enc common options */
@@ -661,6 +663,7 @@ FF_MPV_OPT_CMP_FUNC, \
 {"ps", "RTP payload size in bytes", FF_MPV_OFFSET(rtp_payload_size), AV_OPT_TYPE_INT, {

[FFmpeg-devel] [PATCH] mpegvideo_enc: add option to disable intra mbs in p frames

2018-05-09 Thread Ramiro Polla

This option prevents the mpv encoders from using intra macroblocks in
predictive frames.

It is useful for glitch artists to generate input material. This option
allows them to split and merge two video files while maintaining fluid
motion from the second video without having intra macroblocks restoring
chunks of the first video.
---
 libavcodec/motion_est.c| 4 ++--
 libavcodec/mpegvideo.h | 2 ++
 libavcodec/mpegvideo_enc.c | 5 +++--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/libavcodec/motion_est.c b/libavcodec/motion_est.c
index 8b5ce2117a..827e2282f7 100644
--- a/libavcodec/motion_est.c
+++ b/libavcodec/motion_est.c
@@ -971,7 +971,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 int i_score= varc-500+(s->lambda2>>FF_LAMBDA_SHIFT)*20;
 c->scene_change_score+= ff_sqrt(p_score) - ff_sqrt(i_score);
 
-if (vard*2 + 200*256 > varc)
+if (vard*2 + 200*256 > varc && !(s->mpv_flags & FF_MPV_FLAG_NOPIMB))
 mb_type|= CANDIDATE_MB_TYPE_INTRA;
 if (varc*2 + 200*256 > vard || s->qscale > 24){
 //if (varc*2 + 200*256 + 50*(s->lambda2>>FF_LAMBDA_SHIFT) > vard){
@@ -1042,7 +1042,7 @@ void ff_estimate_p_frame_motion(MpegEncContext * s,
 }
 intra_score += c->mb_penalty_factor*16;
 
-if(intra_score < dmin){
+if(intra_score < dmin && !(s->mpv_flags & FF_MPV_FLAG_NOPIMB)){
 mb_type= CANDIDATE_MB_TYPE_INTRA;
 s->current_picture.mb_type[mb_y*s->mb_stride + mb_x] = CANDIDATE_MB_TYPE_INTRA; //FIXME cleanup
 }else
diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h
index e16deb64e7..b7ac2c7b48 100644
--- a/libavcodec/mpegvideo.h
+++ b/libavcodec/mpegvideo.h
@@ -586,6 +586,7 @@ typedef struct MpegEncContext {
 #define FF_MPV_FLAG_CBP_RD   0x0008
 #define FF_MPV_FLAG_NAQ  0x0010
 #define FF_MPV_FLAG_MV0  0x0020
+#define FF_MPV_FLAG_NOPIMB   0x0040
 
 #define FF_MPV_OPT_CMP_FUNC \
 { "sad", "Sum of absolute differences, fast", 0, AV_OPT_TYPE_CONST, {.i64 = FF_CMP_SAD }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS, "cmp_func" }, \
@@ -617,6 +618,7 @@ FF_MPV_OPT_CMP_FUNC, \
 { "cbp_rd", "use rate distortion optimization for CBP", 0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_CBP_RD }, 0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
 { "naq", "normalize adaptive quantization", 0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_NAQ }, 0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
 { "mv0", "always try a mb with mv=<0,0>", 0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_MV0 }, 0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
+{ "nopimb", "do not use intra mbs for predictive frames", 0, AV_OPT_TYPE_CONST, { .i64 = FF_MPV_FLAG_NOPIMB }, 0, 0, FF_MPV_OPT_FLAGS, "mpv_flags" },\
 { "luma_elim_threshold",   "single coefficient elimination threshold for luminance (negative values also consider dc coefficient)",\
   FF_MPV_OFFSET(luma_elim_threshold), AV_OPT_TYPE_INT, { .i64 = 0 }, INT_MIN, INT_MAX, FF_MPV_OPT_FLAGS },\
 { "chroma_elim_threshold", "single coefficient elimination threshold for chrominance (negative values also consider dc coefficient)",\
diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c
index 9fdab31a25..e41a8f40cf 100644
--- a/libavcodec/mpegvideo_enc.c
+++ b/libavcodec/mpegvideo_enc.c
@@ -3752,6 +3752,7 @@ static int encode_picture(MpegEncContext *s, int picture_number)
 
 if(!s->umvplus){
 if(s->pict_type==AV_PICTURE_TYPE_P || s->pict_type==AV_PICTURE_TYPE_S) {
+int truncate = s->mpv_flags & FF_MPV_FLAG_NOPIMB;
 s->f_code= ff_get_best_fcode(s, s->p_mv_table, CANDIDATE_MB_TYPE_INTER);
 
 if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) {
@@ -3762,13 +3763,13 @@ static int encode_picture(MpegEncContext *s, int picture_number)
 }
 
 ff_fix_long_p_mvs(s);
-ff_fix_long_mvs(s, NULL, 0, s->p_mv_table, s->f_code, CANDIDATE_MB_TYPE_INTER, 0);
+ff_fix_long_mvs(s, NULL, 0, s->p_mv_table, s->f_code, CANDIDATE_MB_TYPE_INTER, truncate);
 if (s->avctx->flags & AV_CODEC_FLAG_INTERLACED_ME) {
 int j;
 for(i=0; i<2; i++){
 for(j=0; j<2; j++)
 ff_fix_long_mvs(s, s->p_field_select_table[i], j,
-s->p_field_mv_table[i][j], s->f_code, CANDIDATE_MB_TYPE_INTER_I, 0);
+s->p_field_mv_table[i][j], s->f_code, CANDIDATE_MB_TYPE_INTER_I, truncate);
 }
 }
 }
-- 
2.11.0


Re: [FFmpeg-devel] [PATCH 2/3] Fix Decklink for Mac

2015-01-11 Thread Ramiro Polla

2015-01-11 15:38 GMT+01:00 Georg Lippitsch georg.lippit...@gmx.at:
> ---
>  libavdevice/decklink_common.cpp | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
> index 07e1651..82b8bdb 100644
> --- a/libavdevice/decklink_common.cpp
> +++ b/libavdevice/decklink_common.cpp
> @@ -70,6 +70,16 @@ static char *dup_wchar_to_utf8(wchar_t *w)
>  #define DECKLINK_STR OLECHAR *
>  #define DECKLINK_STRDUP dup_wchar_to_utf8
>  #define DECKLINK_FREE(s) SysFreeString(s)
> +#elif __APPLE__
> +static char *dup_cfstring_to_utf8(CFStringRef w)
> +{
> +char s[256];
> +CFStringGetCString(w, s, 255, kCFStringEncodingUTF8);
> +return av_strdup(s);
> +}

Is it not possible to get the string's real length? You could also try
using CFStringGetCStringPtr() first.

> +#define DECKLINK_STR const __CFString *
> +#define DECKLINK_STRDUP dup_cfstring_to_utf8
> +#define DECKLINK_FREE(s) free((void *) s)
>  #else
>  #define DECKLINK_STR const char *
>  #define DECKLINK_STRDUP av_strdup
> --
> 1.8.4.5

Re: [FFmpeg-devel] Patch for device list error in decklink_common.cpp

2014-12-03 Thread Ramiro Polla



On 03.12.2014 12:06, Jon bae wrote:

Thanks Ramiro for the correction!
Here is the new patch. (Is it better to post directly the patch, or is ok
as a attachment?)


Attachment is better. But please avoid top-posting in this mailing-list.


2014-12-02 22:19 GMT+01:00 Ramiro Polla ramiro.po...@gmail.com:

On 02.12.2014 20:30, Jon bae wrote:

Here is the other patch for decklink_common.cpp. It fix the error:

 COM initialization failed
 [decklink @ 02e5b520] Could not create DeckLink iterator
 dummy: Immediate exit request



  From 203eba2fad14dd6d84552d6c22899792e80b53bb Mon Sep 17 00:00:00 2001

From: Jonathan Baecker jonba...@gmail.com
Date: Tue, 2 Dec 2014 20:12:38 +0100
Subject: [PATCH 2/2] device list error in decklink_common

Signed-off-by: Jonathan Baecker jonba...@gmail.com
---
  libavdevice/decklink_common.cpp | 24 ++--
  1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/libavdevice/decklink_common.cpp
b/libavdevice/decklink_common.cpp
index 8eff910..8f7e32a 100644
--- a/libavdevice/decklink_common.cpp
+++ b/libavdevice/decklink_common.cpp
@@ -42,16 +42,20 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance
(void)
  {
  IDeckLinkIterator *iter;

-if (CoInitialize(NULL) != S_OK) {
-av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
-return NULL;
-}
-
-if (CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL,
- IID_IDeckLinkIterator, (void**) &iter) != S_OK) {
-av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
-return NULL;
-}
+HRESULT result;
+/* Initialize COM on this thread */
+result = CoInitialize(NULL);
+if (FAILED(result)) {
+av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
+return NULL;
+}
+
+/* Create an IDeckLinkIterator object to enumerate all DeckLink cards in the system */
+result = CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL, IID_IDeckLinkIterator, (void**)&iter);
+if (FAILED(result)) {
+av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
+return NULL;
+}

  return iter;
  }
--
2.2.0



This code is Copyright (c) Blackmagic Design. Try just changing the check
for CoInitialize(NULL) from != S_OK to < 0.




From 3c3d5dda659fe30c68a81b0a711cb09bcb5be443 Mon Sep 17 00:00:00 2001
From: Jonathan Baecker jonba...@gmail.com
Date: Wed, 3 Dec 2014 12:03:12 +0100
Subject: [PATCH] fix COM initialization failed

Signed-off-by: Jonathan Baecker jonba...@gmail.com
---
 libavdevice/decklink_common.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
index 6899bd2..4252552 100644
--- a/libavdevice/decklink_common.cpp
+++ b/libavdevice/decklink_common.cpp
@@ -42,13 +42,13 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance(void)
 {
 IDeckLinkIterator *iter;

-if (CoInitialize(NULL) != S_OK) {
+if (CoInitialize(NULL) < 0) {
 av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
 return NULL;
 }



 if (CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL,
- IID_IDeckLinkIterator, (void**) &iter) != S_OK) {
+ IID_IDeckLinkIterator, (void**) &iter) < 0) {
 av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
 return NULL;
 }


The CoCreateInstance check doesn't need to be changed.

Re: [FFmpeg-devel] Patch for device list error in decklink_common.cpp

2014-12-03 Thread Ramiro Polla



On 03.12.2014 16:44, Jon bae wrote:

Ok finally... Here now only the first line changed. Sorry for the mess,
I'm not the right person for that.



From 2cddda59076b2ac5a539f7016c0aa1883d37c6d8 Mon Sep 17 00:00:00 2001
From: Jonathan Baecker jonba...@gmail.com
Date: Wed, 3 Dec 2014 16:41:41 +0100
Subject: [PATCH] fix COM initialization failed

Signed-off-by: Jonathan Baecker jonba...@gmail.com
---
 libavdevice/decklink_common.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
index 6899bd2..07e1651 100644
--- a/libavdevice/decklink_common.cpp
+++ b/libavdevice/decklink_common.cpp
@@ -42,7 +42,7 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance(void)
 {
 IDeckLinkIterator *iter;

-if (CoInitialize(NULL) != S_OK) {
+if (CoInitialize(NULL) < 0) {
 av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
 return NULL;
 }


LGTM. Thanks for submitting the patches!

Re: [FFmpeg-devel] Patch for device list error in decklink_common.cpp

2014-12-02 Thread Ramiro Polla



On 02.12.2014 20:30, Jon bae wrote:

Here is the other patch for decklink_common.cpp. It fix the error:

COM initialization failed
[decklink @ 02e5b520] Could not create DeckLink iterator
dummy: Immediate exit request



From 203eba2fad14dd6d84552d6c22899792e80b53bb Mon Sep 17 00:00:00 2001
From: Jonathan Baecker jonba...@gmail.com
Date: Tue, 2 Dec 2014 20:12:38 +0100
Subject: [PATCH 2/2] device list error in decklink_common

Signed-off-by: Jonathan Baecker jonba...@gmail.com
---
 libavdevice/decklink_common.cpp | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
index 8eff910..8f7e32a 100644
--- a/libavdevice/decklink_common.cpp
+++ b/libavdevice/decklink_common.cpp
@@ -42,16 +42,20 @@ IDeckLinkIterator *CreateDeckLinkIteratorInstance(void)
 {
 IDeckLinkIterator *iter;

-if (CoInitialize(NULL) != S_OK) {
-av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
-return NULL;
-}
-
-if (CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL,
- IID_IDeckLinkIterator, (void**) &iter) != S_OK) {
-av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
-return NULL;
-}
+HRESULT result;
+/* Initialize COM on this thread */
+result = CoInitialize(NULL);
+if (FAILED(result)) {
+av_log(NULL, AV_LOG_ERROR, "COM initialization failed.\n");
+return NULL;
+}
+
+/* Create an IDeckLinkIterator object to enumerate all DeckLink cards in the system */
+result = CoCreateInstance(CLSID_CDeckLinkIterator, NULL, CLSCTX_ALL, IID_IDeckLinkIterator, (void**)&iter);
+if (FAILED(result)) {
+av_log(NULL, AV_LOG_ERROR, "DeckLink drivers not installed.\n");
+return NULL;
+}

 return iter;
 }
--
2.2.0


This code is Copyright (c) Blackmagic Design. Try just changing the 
check for CoInitialize(NULL) from != S_OK to < 0.
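
The reason a `< 0` check behaves differently from `!= S_OK` is that an HRESULT has more than one success value: `CoInitialize()` returns `S_FALSE` when COM is already initialized on the calling thread, and only negative values indicate failure (which is what the Windows `FAILED()` macro tests). The sketch below reproduces that distinction with portable stand-ins so it builds anywhere. The `demo_*` and `DEMO_*` names are hypothetical; on Windows the real definitions come from the SDK headers.

```c
#include <stdint.h>

/* Portable stand-ins for the Windows HRESULT type and a few of its values. */
typedef int32_t demo_hresult;
#define DEMO_S_OK    ((demo_hresult)0)
#define DEMO_S_FALSE ((demo_hresult)1)            /* success, e.g. already initialized */
#define DEMO_E_FAIL  ((demo_hresult)-2147467259)  /* 0x80004005 as a signed value */

/* Equivalent of the FAILED() test: only negative values are errors. */
static int demo_failed(demo_hresult hr)
{
    return hr < 0;
}

/* The stricter comparison used before the fix: anything but S_OK is an error. */
static int demo_strict_check_fails(demo_hresult hr)
{
    return hr != DEMO_S_OK;
}
```

With these definitions `DEMO_S_FALSE` passes `demo_failed()` but trips the strict comparison, which matches the "COM initialization failed" symptom reported in this thread when COM had already been initialized.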


Re: [FFmpeg-devel] Patch for heap corruption run time error in decklink_common.cpp

2014-12-02 Thread Ramiro Polla



On 02.12.2014 20:28, Jon bae wrote:

Ok here a second run, I try to follow the instruction from Carl Eugen.
This is the first patch for decklink_common.cpp. It fix this error:

Unhandled exception at 0x76FA4102 (ntdll.dll) in ffmpeg.exe:
0xC374: A heap has been corrupted (parameters: 0x7701B4B0).



From e9bc8e910f515af4030054df3e6feb308f3208aa Mon Sep 17 00:00:00 2001
From: Jonathan Baecker jonba...@gmail.com
Date: Tue, 2 Dec 2014 20:10:41 +0100
Subject: [PATCH 1/2] heap corruption run time error in decklink_common

Signed-off-by: Jonathan Baecker jonba...@gmail.com
---
 libavdevice/decklink_common.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libavdevice/decklink_common.cpp b/libavdevice/decklink_common.cpp
index 9a9e44b..8eff910 100644
--- a/libavdevice/decklink_common.cpp
+++ b/libavdevice/decklink_common.cpp
@@ -69,9 +69,12 @@ static char *dup_wchar_to_utf8(wchar_t *w)
 }
 #define DECKLINK_STR OLECHAR *
 #define DECKLINK_STRDUP dup_wchar_to_utf8
+#define DECKLINK_FREE(s) SysFreeString(s)
 #else
 #define DECKLINK_STR const char *
 #define DECKLINK_STRDUP av_strdup
+/* free() is needed for a string returned by the DeckLink SDL. */
+#define DECKLINK_FREE(s) free((void *) s)
 #endif

 HRESULT ff_decklink_get_display_name(IDeckLink *This, const char **displayName)
@@ -81,8 +84,7 @@ HRESULT ff_decklink_get_display_name(IDeckLink *This, const char **displayName)
 if (hr != S_OK)
 return hr;
 *displayName = DECKLINK_STRDUP(tmpDisplayName);
-/* free() is needed for a string returned by the DeckLink SDL. */
-free((void *) tmpDisplayName);
+DECKLINK_FREE(tmpDisplayName);
 return hr;
 }


LGTM

Re: [FFmpeg-devel] Fwd: OPW Qualification Task: Validate MLP Bitstream

2014-10-31 Thread Ramiro Polla

Greeshma,

2014-10-31 15:41 GMT+01:00 greeshma greeshmabalaba...@gmail.com:
 I have first added experimental encoder mlpenc.c from
 https://github.com/ramiropolla/mlpenc an updated changes according to the
 recent commits in FFmpeg

That code is supposed to be sent for review after the end of OPW, not before =)

The qualification task is to update it to the current FFmpeg codebase
(for example the DSPContext changes and the encode function changes).
There's still a long way to go before submitting this code for review.

Ramiro
