[FFmpeg-devel] [PATCH 1/3] lavc/alacdsp: RISC-V V decorrelate_stereo

2022-10-04 Thread remi
From: Rémi Denis-Courmont To avoid data dependencies, this does the following unroll, which requires one extra but probably free addition: coeff = (b * left_weight) >> decorr_shift; b += a; a -= coeff; b -= coeff; swap(a, b); --- libavcodec/alacdsp.c| 4 ++-
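The reordering described in this snippet can be checked with a small scalar sketch. The function names below are hypothetical and are not taken from the patch. The "dependent" variant is only an assumed straightforward formulation written for comparison, and the final swap is modelled by storing the two results in exchanged positions.

```c
#include <assert.h>
#include <stdint.h>

/* Dependent form (assumed for comparison): b cannot be updated until the
 * new value of a is known. */
static void decorrelate_dependent(int32_t *a, int32_t *b,
                                  int left_weight, int decorr_shift)
{
    int32_t x = *a, y = *b;

    x -= (y * left_weight) >> decorr_shift;
    y += x;
    *a = y; /* results are stored swapped */
    *b = x;
}

/* Unrolled form from the snippet: once coeff is known, "b += a" and
 * "a -= coeff" no longer depend on each other, at the cost of one extra
 * subtraction. */
static void decorrelate_unrolled(int32_t *a, int32_t *b,
                                 int left_weight, int decorr_shift)
{
    int32_t x = *a, y = *b;
    int32_t coeff = (y * left_weight) >> decorr_shift;

    y += x;
    x -= coeff;
    y -= coeff;
    *a = y; /* swap(a, b) */
    *b = x;
}

/* Compares both forms over a small range of inputs. */
static void decorrelate_self_check(void)
{
    for (int32_t a0 = -8; a0 <= 8; a0++) {
        for (int32_t b0 = -8; b0 <= 8; b0++) {
            int32_t a1 = a0, b1 = b0, a2 = a0, b2 = b0;

            decorrelate_dependent(&a1, &b1, 3, 1);
            decorrelate_unrolled(&a2, &b2, 3, 1);
            assert(a1 == a2 && b1 == b2);
        }
    }
}
```

Both variants compute the same pair of values; the unrolled one simply exposes two independent operations after the multiply and shift.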

[FFmpeg-devel] [PATCH 3/3] lavc/alacdsp: RISC-V V append_extra_bits[1]

2022-10-04 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/alacdsp_init.c | 5 + libavcodec/riscv/alacdsp_rvv.S | 27 +++ 2 files changed, 32 insertions(+) diff --git a/libavcodec/riscv/alacdsp_init.c b/libavcodec/riscv/alacdsp_init.c index 37688be67b..fa8a7c8129 100644 ---

[FFmpeg-devel] [PATCH 2/3] lavc/alacdsp: RISC-V V append_extra_bits[0]

2022-10-04 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/alacdsp_init.c | 8 +++- libavcodec/riscv/alacdsp_rvv.S | 18 ++ 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/alacdsp_init.c b/libavcodec/riscv/alacdsp_init.c index 9ddebaa60b..37688be67b

[FFmpeg-devel] [PATCH] riscv: fix scalar product initialisation

2022-10-03 Thread remi
From: Rémi Denis-Courmont 'VSETVLI xd, x0, ...' has rather nonobvious semantics: - If xd is x0, then it preserves the current vector length. - If xd is not x0, it sets the vector length to the supported maximum. Also somewhat confusingly, while VMV.X.S always does its thing regardless of the
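The two cases listed in this snippet can be written down as a simplified model of how VSETVLI picks the new vector length. This is a sketch only: the function and parameter names are hypothetical, and for a non-x0 source register it uses min(AVL, VLMAX), which is one of the choices the RISC-V Vector specification permits rather than the only one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the vector length resulting from
 * "VSETVLI rd, rs1, vtype". */
static size_t model_vsetvli_vl(bool rd_is_x0, bool rs1_is_x0,
                               size_t avl, size_t cur_vl, size_t vlmax)
{
    if (rs1_is_x0) {
        /* rd == x0: keep the current vector length (only valid if VLMAX
         * does not change with the new vtype).
         * rd != x0: request the supported maximum. */
        return rd_is_x0 ? cur_vl : vlmax;
    }
    /* Ordinary case: clamp the requested length (assumed policy). */
    return avl < vlmax ? avl : vlmax;
}

static void model_vsetvli_self_check(void)
{
    /* "vsetvli x0, x0, ..." preserves the current length. */
    assert(model_vsetvli_vl(true, true, 0, 5, 16) == 5);
    /* "vsetvli t0, x0, ..." sets the length to VLMAX. */
    assert(model_vsetvli_vl(false, true, 0, 5, 16) == 16);
    /* "vsetvli t0, a0, ..." clamps the requested length. */
    assert(model_vsetvli_vl(false, false, 7, 5, 16) == 7);
    assert(model_vsetvli_vl(false, false, 100, 5, 16) == 16);
}
```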

[FFmpeg-devel] [PATCH 4/4] lavc/bswapdsp: RISC-V V bswap16_buf

2022-10-02 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/bswapdsp_init.c | 5 - libavcodec/riscv/bswapdsp_rvv.S | 17 + 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/bswapdsp_init.c b/libavcodec/riscv/bswapdsp_init.c index c17b6b75bb..abe84ec1f7

[FFmpeg-devel] [PATCH 3/4] lavc/bswapdsp: RISC-V V bswap_buf

2022-10-02 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/Makefile| 1 + libavcodec/riscv/bswapdsp_init.c | 7 - libavcodec/riscv/bswapdsp_rvv.S | 45 3 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 libavcodec/riscv/bswapdsp_rvv.S diff

[FFmpeg-devel] [PATCH 2/4] lavc/bswapdsp: RISC-V B bswap_buf

2022-10-02 Thread remi
From: Rémi Denis-Courmont Simply taking the Zbb REV8 instruction into use in a simple loop gives some significant savings: bswap_buf_c: 1081.0 bswap_buf_rvb_b: 771.0 But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with just one additional shift, and one fewer load, effectively
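The pseudo-SIMD idea mentioned here can be illustrated in portable C. This is a sketch under assumptions: `rev8_64` is a hypothetical stand-in for the 64-bit REV8 instruction, `bswap32_pair` is a hypothetical name, and the 64-bit value is assembled from two words the way a single little-endian 64-bit load would deliver them. Reversing all eight bytes swaps the bytes inside each 32-bit word and also exchanges the two words, so one shift is enough to put the results back in order.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a 64-bit REV8: reverses the order of all 8 bytes. */
static uint64_t rev8_64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) |
        ((x >> 8) & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) |
        ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Byte-swaps two 32-bit words with a single 64-bit byte reversal. */
static void bswap32_pair(uint32_t dst[2], const uint32_t src[2])
{
    /* Models one little-endian 64-bit load of src[0] and src[1]. */
    uint64_t x = (uint64_t)src[0] | ((uint64_t)src[1] << 32);
    uint64_t y = rev8_64(x);

    dst[0] = (uint32_t)(y >> 32); /* swapped src[0] ends up in the top half */
    dst[1] = (uint32_t)y;         /* swapped src[1] ends up in the low half */
}

static void bswap32_pair_self_check(void)
{
    const uint32_t src[2] = { 0x11223344u, 0xAABBCCDDu };
    uint32_t dst[2];

    bswap32_pair(dst, src);
    assert(dst[0] == 0x44332211u);
    assert(dst[1] == 0xDDCCBBAAu);
}
```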

[FFmpeg-devel] [PATCH 1/4] lavu/riscv: CPU flag for the Zbb extension

2022-10-02 Thread remi
From: Rémi Denis-Courmont Unfortunately, it is common, and will remain so, that the Bit manipulations are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline,

[FFmpeg-devel] [PATCH 3/3] lavc/opusdsp: RISC-V V (256-bit vectors) postfilter

2022-10-01 Thread remi
From: Rémi Denis-Courmont This adds a variant of the postfilter for use with 256-bit vectors (or larger). Since the function requires 160-bit logical vectors, we can cut the group multiplier down to just one. The different vector type is passed via register. Unfortunately, there is no VSETIVL

[FFmpeg-devel] [PATCH 1/3] lavc/opusdsp: RISC-V V postfilter

2022-10-01 Thread remi
From: Rémi Denis-Courmont This is optimised for a vector size of 128-bit. Or maybe it would be more accurate to state that this is not properly optimised for larger vector sizes, as they would work just fine with a smaller vector group multiplier. --- libavcodec/opusdsp.c| 2 ++

[FFmpeg-devel] [PATCH 2/3] lavu/riscv: helper macro for VTYPE encoding

2022-10-01 Thread remi
From: Rémi Denis-Courmont In most cases, the vector type (VTYPE) for the RISC-V Vector extension is supplied as an immediate value, with either of the VSETVLI or VSETIVLI instructions. There is however a third instruction VSETVL which takes the vector type from a general purpose register. That
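When the vector type has to be placed in a general purpose register for VSETVL, its value must be computed by hand. The sketch below shows the bit layout of the low VTYPE bits as defined by version 1.0 of the RISC-V Vector specification (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). The function name and parameters are hypothetical; this is not the macro added by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes a VTYPE value.
 * sew_bits:   element width in bits (8, 16, 32 or 64)
 * vlmul_code: raw 3-bit group multiplier field
 *             (0 = m1, 1 = m2, 2 = m4, 3 = m8, 5 = mf8, 6 = mf4, 7 = mf2)
 * ta, ma:     non-zero for tail-agnostic / mask-agnostic */
static uint32_t vtype_encode(unsigned sew_bits, unsigned vlmul_code,
                             int ta, int ma)
{
    unsigned vsew = 0;

    /* SEW = 8 << vsew, so find the matching exponent. */
    while ((8u << vsew) < sew_bits)
        vsew++;

    return (vlmul_code & 7u) | (vsew << 3) |
           ((ta ? 1u : 0u) << 6) | ((ma ? 1u : 0u) << 7);
}

static void vtype_encode_self_check(void)
{
    assert(vtype_encode(8, 0, 0, 0) == 0x00u);  /* e8,  m1, tu, mu */
    assert(vtype_encode(32, 0, 1, 1) == 0xD0u); /* e32, m1, ta, ma */
    assert(vtype_encode(64, 3, 1, 0) == 0x5Bu); /* e64, m8, ta, mu */
}
```

The resulting value would be loaded into a register and passed as the second source operand of VSETVL.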

[FFmpeg-devel] [PATCH] lavc/opusdsp: RISC-V F deemphasis

2022-09-29 Thread remi
From: Rémi Denis-Courmont This saves almost exactly 25% on SiFive U74. deemphasis_c: 11536.2 deemphasis_rvf: 8654.2 --- libavcodec/opusdsp.c| 2 ++ libavcodec/opusdsp.h| 1 + libavcodec/riscv/Makefile | 2 ++ libavcodec/riscv/opusdsp_init.c | 36

[FFmpeg-devel] [PATCH 3/3] sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2

2022-09-28 Thread remi
From: Rémi Denis-Courmont This is currently 64-bit only because the stack spilling code would not assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory). This could be added later in the unlikely event that someone wants it. --- libswscale/riscv/rgb2rgb.c | 10 +++

[FFmpeg-devel] [PATCH 2/3] sws/rgb2rgb: RISC-V V interleaveBytes

2022-09-28 Thread remi
From: Rémi Denis-Courmont --- libswscale/riscv/rgb2rgb.c | 4 libswscale/riscv/rgb2rgb_rvv.S | 26 ++ 2 files changed, 30 insertions(+) diff --git a/libswscale/riscv/rgb2rgb.c b/libswscale/riscv/rgb2rgb.c index 5654154494..32c1546827 100644 ---

[FFmpeg-devel] [PATCH 1/3] sws/rgb2rgb: RISC-V V shuffle_bytes_xxxx functions

2022-09-28 Thread remi
From: Rémi Denis-Courmont --- libswscale/rgb2rgb.c | 2 + libswscale/rgb2rgb.h | 1 + libswscale/riscv/Makefile | 2 + libswscale/riscv/rgb2rgb.c | 47 libswscale/riscv/rgb2rgb_rvv.S | 78 ++ 5 files changed,

[FFmpeg-devel] [PATCH 6/7] lavc/pixblockdsp: RISC-V V 16-bit get_pixels & get_pixels_unaligned

2022-09-27 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/pixblockdsp_init.c | 6 +- libavcodec/riscv/pixblockdsp_rvv.S | 7 +++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/pixblockdsp_init.c b/libavcodec/riscv/pixblockdsp_init.c index 69dbd18918..bbda381c12

[FFmpeg-devel] [PATCH 7/7] lavc/pixblockdsp: RISC-V diff_pixels & diff_pixels_unaligned

2022-09-27 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/pixblockdsp_init.c | 4 libavcodec/riscv/pixblockdsp_rvv.S | 16 2 files changed, 20 insertions(+) diff --git a/libavcodec/riscv/pixblockdsp_init.c b/libavcodec/riscv/pixblockdsp_init.c index bbda381c12..aa39a8a665 100644

[FFmpeg-devel] [PATCH 5/7] lavc/pixblockdsp: RISC-V V 8-bit get_pixels & get_pixels_unaligned

2022-09-27 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/Makefile | 1 + libavcodec/riscv/pixblockdsp_init.c | 12 ++ libavcodec/riscv/pixblockdsp_rvv.S | 37 + 3 files changed, 50 insertions(+) create mode 100644 libavcodec/riscv/pixblockdsp_rvv.S diff

[FFmpeg-devel] [PATCH 3/7] lavc/idctdsp: RISC-V V add_pixels_clamped function

2022-09-27 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/idctdsp_init.c | 6 +- libavcodec/riscv/idctdsp_rvv.S | 16 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/idctdsp_init.c b/libavcodec/riscv/idctdsp_init.c index 1a6add80da..58b8a6c97a 100644

[FFmpeg-devel] [PATCH 4/7] lavc/idctdsp: RISC-V V put_signed_pixels_clamped function

2022-09-27 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/idctdsp_init.c | 3 +++ libavcodec/riscv/idctdsp_rvv.S | 21 + 2 files changed, 24 insertions(+) diff --git a/libavcodec/riscv/idctdsp_init.c b/libavcodec/riscv/idctdsp_init.c index 58b8a6c97a..e6e616a555 100644 ---

[FFmpeg-devel] [PATCH 2/7] lavc/idctdsp: RISC-V V put_pixels_clamped function

2022-09-27 Thread remi
From: Rémi Denis-Courmont --- libavcodec/idctdsp.c| 2 ++ libavcodec/idctdsp.h| 2 ++ libavcodec/riscv/Makefile | 2 ++ libavcodec/riscv/idctdsp_init.c | 41 +++ libavcodec/riscv/idctdsp_rvv.S | 43 +

[FFmpeg-devel] [PATCH 1/7] lavu/riscv: helper to read the vector length

2022-09-27 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/cpu.h | 45 +++ 1 file changed, 45 insertions(+) create mode 100644 libavutil/riscv/cpu.h diff --git a/libavutil/riscv/cpu.h b/libavutil/riscv/cpu.h new file mode 100644 index 00..56035f8556 ---

[FFmpeg-devel] [PATCH] checkasm: test packed YUYV to planar YUV 4:2:2

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- tests/checkasm/sw_rgb.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/sw_rgb.c b/tests/checkasm/sw_rgb.c index 7cd815e5be..da401e8201 100644 --- a/tests/checkasm/sw_rgb.c +++ b/tests/checkasm/sw_rgb.c @@ -68,7 +68,7 @@

[FFmpeg-devel] [PATCH 06/31] configure: probe RISC-V Vector extension

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- Makefile | 2 +- configure| 15 +++ ffbuild/arch.mak | 2 ++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 61f79e27ae..1fb742f390 100644 --- a/Makefile +++ b/Makefile @@ -91,7 +91,7 @@

[FFmpeg-devel] [PATCH 07/31] lavu/riscv: fallback macros for SH{1, 2, 3}ADD

2022-09-26 Thread remi
From: Rémi Denis-Courmont Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility. --- libavutil/riscv/asm.S | 19 +++ 1 file changed, 19 insertions(+) diff --git a/libavutil/riscv/asm.S

[FFmpeg-devel] [PATCH 08/31] lavu/floatdsp: RISC-V V vector_fmul_scalar

2022-09-26 Thread remi
From: Rémi Denis-Courmont This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes. --- libavutil/float_dsp.c| 2 ++ libavutil/float_dsp.h| 1 + libavutil/riscv/Makefile | 4 +++-

[FFmpeg-devel] [PATCH 31/31] lavc/aacpsdsp: RISC-V V stereo_interpolate[0]

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 4 +++ libavcodec/riscv/aacpsdsp_rvv.S | 56 2 files changed, 60 insertions(+) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index c2201ffb6a..f42baf4251

[FFmpeg-devel] [PATCH 29/31] lavc/aacpsdsp: RISC-V V hybrid_analysis_ileave

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 5 + libavcodec/riscv/aacpsdsp_rvv.S | 35 2 files changed, 40 insertions(+) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 09f16f1041..1d36f89f6e

[FFmpeg-devel] [PATCH 28/31] lavc/aacpsdsp: RISC-V V hybrid_analysis

2022-09-26 Thread remi
From: Rémi Denis-Courmont This starts with one-time initialisation of the 26 constant factors like 08edacc248bce3f8946d75e97188d189c74a6de6. That is done with the scalar instruction set. While the formula can readily be vectored, the gains would (probably) be more than lost in transferring the

[FFmpeg-devel] [PATCH 27/31] lavc/aacpsdsp: RISC-V V mul_pair_single

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 6 +- libavcodec/riscv/aacpsdsp_rvv.S | 17 + 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 83f6d9b16b..21fd5b8470

[FFmpeg-devel] [PATCH 30/31] lavc/aacpsdsp: RISC-V V hybrid_synthesis_deint

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 6 +- libavcodec/riscv/aacpsdsp_rvv.S | 35 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index

[FFmpeg-devel] [PATCH 23/31] lavc/fmtconvert: RISC-V V int32_to_float_fmul_scalar

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/fmtconvert.c| 2 ++ libavcodec/fmtconvert.h| 1 + libavcodec/riscv/Makefile | 2 ++ libavcodec/riscv/fmtconvert_init.c | 39 ++ libavcodec/riscv/fmtconvert_rvv.S | 39

[FFmpeg-devel] [PATCH 26/31] lavc/aacpsdsp: RISC-V V add_squares

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/aacpsdsp.h| 1 + libavcodec/aacpsdsp_template.c | 2 ++ libavcodec/riscv/Makefile| 2 ++ libavcodec/riscv/aacpsdsp_init.c | 37 libavcodec/riscv/aacpsdsp_rvv.S | 37

[FFmpeg-devel] [PATCH 21/31] lavc/audiodsp: RISC-V V vector_clipf

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/audiodsp_init.c | 3 +++ libavcodec/riscv/audiodsp_rvv.S | 17 + 2 files changed, 20 insertions(+) diff --git a/libavcodec/riscv/audiodsp_init.c b/libavcodec/riscv/audiodsp_init.c index ac06848a82..9c9265531d 100644 ---

[FFmpeg-devel] [PATCH 25/31] lavc/vorbisdsp: RISC-V V inverse_coupling

2022-09-26 Thread remi
From: Rémi Denis-Courmont This uses the following vectorisation: for (i = 0; i < blocksize; i++) { ang[i] = mag[i] - copysignf(fmaxf(ang[i], 0.f), mag[i]); mag[i] = mag[i] - copysignf(fminf(ang[i], 0.f), mag[i]); } --- libavcodec/riscv/Makefile | 2 ++

[FFmpeg-devel] [PATCH 22/31] lavc/audiodsp: RISC-V V scalarproduct_int16

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/audiodsp_init.c | 5 - libavcodec/riscv/audiodsp_rvv.S | 19 +++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/audiodsp_init.c b/libavcodec/riscv/audiodsp_init.c index 9c9265531d..32c3c6794d

[FFmpeg-devel] [PATCH 24/31] lavc/fmtconvert: RISC-V V int32_to_float_fmul_array8

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/fmtconvert_init.c | 7 ++- libavcodec/riscv/fmtconvert_rvv.S | 28 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/fmtconvert_init.c b/libavcodec/riscv/fmtconvert_init.c index

[FFmpeg-devel] [PATCH 19/31] lavu/fixeddsp: RISC-V V butterflies_fixed

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/fixed_dsp.c| 4 +++- libavutil/fixed_dsp.h| 1 + libavutil/riscv/Makefile | 4 +++- libavutil/riscv/fixed_dsp_init.c | 38 ++ libavutil/riscv/fixed_dsp_rvv.S | 40

[FFmpeg-devel] [PATCH 15/31] lavu/floatdsp: RISC-V V butterflies_float

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 20 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 8982436647..a1cd180cdc 100644 ---

[FFmpeg-devel] [PATCH 14/31] lavu/floatdsp: RISC-V V vector_fmul_add

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index a559bbb32b..8982436647 100644 ---

[FFmpeg-devel] [PATCH 20/31] lavc/audiodsp: RISC-V V vector_clip_int32

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/Makefile| 1 + libavcodec/riscv/audiodsp_init.c | 9 libavcodec/riscv/audiodsp_rvv.S | 36 3 files changed, 46 insertions(+) create mode 100644 libavcodec/riscv/audiodsp_rvv.S diff --git

[FFmpeg-devel] [PATCH 18/31] lavu/floatdsp: RISC-V V scalarproduct_float

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 20 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 44a505308d..e61f887862 100644 ---

[FFmpeg-devel] [PATCH 05/31] lavu/cpu: CPU flags for the RISC-V Vector extension

2022-09-26 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d:

[FFmpeg-devel] [PATCH 17/31] lavu/floatdsp: RISC-V V vector_fmul_window

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 33 2 files changed, 36 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index b99e3080c9..44a505308d

[FFmpeg-devel] [PATCH 16/31] lavu/floatdsp: RISC-V V vector_fmul_reverse

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 21 + 2 files changed, 24 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index a1cd180cdc..b99e3080c9 100644 ---

[FFmpeg-devel] [PATCH 13/31] lavu/floatdsp: RISC-V V vector_dmac_scalar

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 21 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 9e19413d5d..a559bbb32b 100644 ---

[FFmpeg-devel] [PATCH 12/31] lavu/floatdsp: RISC-V V vector_fmac_scalar

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 29114dfb82..9e19413d5d 100644 ---

[FFmpeg-devel] [PATCH 11/31] lavu/floatdsp: RISC-V V vector_dmul

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 +- libavutil/riscv/float_dsp_rvv.S | 17 + 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 2482094ab4..29114dfb82

[FFmpeg-devel] [PATCH 10/31] lavu/floatdsp: RISC-V V vector_fmul

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 +- libavutil/riscv/float_dsp_rvv.S | 17 + 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 3386139d49..2482094ab4

[FFmpeg-devel] [PATCH 04/31] lavc/pixblockdsp: RISC-V I get_pixels

2022-09-26 Thread remi
From: Rémi Denis-Courmont Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech): get_pixels_c: 180.0 get_pixels_rvi: 136.7 --- libavcodec/pixblockdsp.c| 2 + libavcodec/pixblockdsp.h| 2 + libavcodec/riscv/Makefile | 2 +

[FFmpeg-devel] [PATCH 09/31] lavu/floatdsp: RISC-V V vector_dmul_scalar

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 ++ libavutil/riscv/float_dsp_rvv.S | 17 + 2 files changed, 23 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index f4299049b0..3386139d49 100644 ---

[FFmpeg-devel] [PATCH 03/31] lavc/audiodsp: RISC-V F vector_clipf

2022-09-26 Thread remi
From: Rémi Denis-Courmont RV64G supports MIN & MAX instructions natively only on floating point registers, not general purpose ones. The latter would require the Zbb extension. Due to that, it is actually faster to perform the clipping "properly" in FPU. Benchmarks on SiFive U74-MC (courtesy of

[FFmpeg-devel] [PATCH 02/31] lavu/riscv: initial common header for assembler macros

2022-09-26 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 77 +++ 1 file changed, 77 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..dbd97f40a4 ---

[FFmpeg-devel] [PATCH 01/31] lavu/cpu: detect RISC-V base extensions

2022-09-26 Thread remi
From: Rémi Denis-Courmont This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But

[FFmpeg-devel] [PATCH 31/31] lavc/aacpsdsp: RISC-V V stereo_interpolate[0]

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 4 ++ libavcodec/riscv/aacpsdsp_rvv.S | 65 2 files changed, 69 insertions(+) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 20b1a12741..58a4c61121 100644

[FFmpeg-devel] [PATCH 30/31] lavc/aacpsdsp: RISC-V V hybrid_synthesis_deint

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 3 +++ libavcodec/riscv/aacpsdsp_rvv.S | 35 2 files changed, 38 insertions(+) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 76f55502ee..20b1a12741

[FFmpeg-devel] [PATCH 29/31] lavc/aacpsdsp: RISC-V V hybrid_analysis_ileave

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 14 + libavcodec/riscv/aacpsdsp_rvv.S | 35 2 files changed, 45 insertions(+), 4 deletions(-) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index

[FFmpeg-devel] [PATCH 28/31] lavc/aacpsdsp: RISC-V V hybrid_analysis

2022-09-25 Thread remi
From: Rémi Denis-Courmont This starts with one-time initialisation of the 26 constant factors like 08edacc248bce3f8946d75e97188d189c74a6de6. That is done with the scalar instruction set. While the formula can readily be vectored, the gains would (probably) be more than lost in transferring the

[FFmpeg-devel] [PATCH 25/31] lavc/vorbisdsp: RISC-V V inverse_coupling

2022-09-25 Thread remi
From: Rémi Denis-Courmont This uses the following vectorisation: for (i = 0; i < blocksize; i++) { ang[i] = mag[i] - copysignf(fmaxf(ang[i], 0.f), mag[i]); mag[i] = mag[i] - copysignf(fminf(ang[i], 0.f), mag[i]); } --- libavcodec/riscv/Makefile | 2 ++

[FFmpeg-devel] [PATCH 27/31] lavc/aacpsdsp: RISC-V V mul_pair_single

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 6 +- libavcodec/riscv/aacpsdsp_rvv.S | 17 + 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 525fc9aa38..90c9c501c3

[FFmpeg-devel] [PATCH 23/31] lavc/fmtconvert: RISC-V V int32_to_float_fmul_scalar

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/fmtconvert.c| 2 ++ libavcodec/fmtconvert.h| 1 + libavcodec/riscv/Makefile | 2 ++ libavcodec/riscv/fmtconvert_init.c | 39 ++ libavcodec/riscv/fmtconvert_rvv.S | 39

[FFmpeg-devel] [PATCH 21/31] lavc/audiodsp: RISC-V V vector_clipf

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/audiodsp_init.c | 7 ++- libavcodec/riscv/audiodsp_rvv.S | 17 + 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/audiodsp_init.c b/libavcodec/riscv/audiodsp_init.c index ce8b60ee52..ddd561484f

[FFmpeg-devel] [PATCH 26/31] lavc/aacpsdsp: RISC-V V add_squares

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/aacpsdsp.h| 1 + libavcodec/aacpsdsp_template.c | 2 ++ libavcodec/riscv/Makefile| 2 ++ libavcodec/riscv/aacpsdsp_init.c | 37 libavcodec/riscv/aacpsdsp_rvv.S | 37

[FFmpeg-devel] [PATCH 24/31] lavc/fmtconvert: RISC-V V int32_to_float_fmul_array8

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/fmtconvert_init.c | 7 ++- libavcodec/riscv/fmtconvert_rvv.S | 28 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/fmtconvert_init.c b/libavcodec/riscv/fmtconvert_init.c index

[FFmpeg-devel] [PATCH 22/31] lavc/audiodsp: RISC-V V scalarproduct_int16

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/audiodsp_init.c | 2 ++ libavcodec/riscv/audiodsp_rvv.S | 19 +++ 2 files changed, 21 insertions(+) diff --git a/libavcodec/riscv/audiodsp_init.c b/libavcodec/riscv/audiodsp_init.c index ddd561484f..6f38b7bc83 100644 ---

[FFmpeg-devel] [PATCH 20/31] lavc/audiodsp: RISC-V V vector_clip_int32

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/Makefile| 1 + libavcodec/riscv/audiodsp_init.c | 9 libavcodec/riscv/audiodsp_rvv.S | 36 3 files changed, 46 insertions(+) create mode 100644 libavcodec/riscv/audiodsp_rvv.S diff --git

[FFmpeg-devel] [PATCH 17/31] lavu/floatdsp: RISC-V V vector_fmul_window

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 33 2 files changed, 36 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 9b8fd9942b..dacd81c08b

[FFmpeg-devel] [PATCH 19/31] lavu/fixeddsp: RISC-V V butterflies_fixed

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/fixed_dsp.c| 4 +++- libavutil/fixed_dsp.h| 1 + libavutil/riscv/Makefile | 4 +++- libavutil/riscv/fixed_dsp_init.c | 38 ++ libavutil/riscv/fixed_dsp_rvv.S | 40

[FFmpeg-devel] [PATCH 16/31] lavu/floatdsp: RISC-V V vector_fmul_reverse

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 21 + 2 files changed, 24 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index f164b1308f..9b8fd9942b 100644 ---

[FFmpeg-devel] [PATCH 18/31] lavu/floatdsp: RISC-V V scalarproduct_float

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 20 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index dacd81c08b..cc9b7e83dc 100644 ---

[FFmpeg-devel] [PATCH 14/31] lavu/floatdsp: RISC-V V vector_fmul_add

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index d17d0f66c5..2ddd2050f7 100644 ---

[FFmpeg-devel] [PATCH 15/31] lavu/floatdsp: RISC-V V butterflies_float

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 20 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 2ddd2050f7..f164b1308f 100644 ---

[FFmpeg-devel] [PATCH 06/31] configure: probe RISC-V Vector extension

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- Makefile | 2 +- configure| 15 +++ ffbuild/arch.mak | 2 ++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 61f79e27ae..1fb742f390 100644 --- a/Makefile +++ b/Makefile @@ -91,7 +91,7 @@

[FFmpeg-devel] [PATCH 10/31] lavu/floatdsp: RISC-V V vector_fmul

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 17 + 2 files changed, 20 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index b829c0f736..60b79bd59e 100644 ---

[FFmpeg-devel] [PATCH 08/31] lavu/floatdsp: RISC-V V vector_fmul_scalar

2022-09-25 Thread remi
From: Rémi Denis-Courmont This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes. --- libavutil/float_dsp.c| 2 ++ libavutil/float_dsp.h| 1 + libavutil/riscv/Makefile | 4 +++-

[FFmpeg-devel] [PATCH 13/31] lavu/floatdsp: RISC-V V vector_dmac_scalar

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 18 ++ 2 files changed, 21 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index c2d93e0cd7..d17d0f66c5 100644 ---

[FFmpeg-devel] [PATCH 11/31] lavu/floatdsp: RISC-V V vector_dmul

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 6 +- libavutil/riscv/float_dsp_rvv.S | 17 + 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 60b79bd59e..6027a67b46

[FFmpeg-devel] [PATCH 07/31] lavu/riscv: fallback macros for SH{1, 2, 3}ADD

2022-09-25 Thread remi
From: Rémi Denis-Courmont Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility. --- libavutil/riscv/asm.S | 19 +++ 1 file changed, 19 insertions(+) diff --git a/libavutil/riscv/asm.S

[FFmpeg-devel] [PATCH 09/31] lavu/floatdsp: RISC-V V vector_dmul_scalar

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 9 - libavutil/riscv/float_dsp_rvv.S | 17 + 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index de567c50d2..b829c0f736

[FFmpeg-devel] [PATCH 12/31] lavu/floatdsp: RISC-V V vector_fmac_scalar

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 19 +++ 2 files changed, 22 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 6027a67b46..c2d93e0cd7 100644 ---

[FFmpeg-devel] [PATCH 05/31] lavu/cpu: CPU flags for the RISC-V Vector extension

2022-09-25 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d:

[FFmpeg-devel] [PATCH 04/31] lavc/pixblockdsp: RISC-V I get_pixels

2022-09-25 Thread remi
From: Rémi Denis-Courmont Benchmarks on SiFive U74-MC (courtesy of Shanghai StarFive Tech): get_pixels_c: 180.0 get_pixels_rvi: 136.7 --- libavcodec/pixblockdsp.c| 2 + libavcodec/pixblockdsp.h| 2 + libavcodec/riscv/Makefile | 2 +

[FFmpeg-devel] [PATCH 02/31] lavu/riscv: initial common header for assembler macros

2022-09-25 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/asm.S | 77 +++ 1 file changed, 77 insertions(+) create mode 100644 libavutil/riscv/asm.S diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S new file mode 100644 index 00..dbd97f40a4 ---

[FFmpeg-devel] [PATCH 03/31] lavc/audiodsp: RISC-V F vector_clipf

2022-09-25 Thread remi
From: Rémi Denis-Courmont RV64G supports MIN & MAX instructions natively only on floating point registers, not general purpose ones. The latter would require the Zbb extension. Due to that, it is actually faster to perform the clipping "properly" in FPU. Benchmarks on SiFive U74-MC (courtesy of

[FFmpeg-devel] [PATCH 01/31] lavu/cpu: detect RISC-V base extensions

2022-09-25 Thread remi
From: Rémi Denis-Courmont This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But

[FFmpeg-devel] [PATCH 29/29] lavc/aacpsdsp: RISC-V V hybrid_synthesis_deint

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 3 +++ libavcodec/riscv/aacpsdsp_rvv.S | 37 2 files changed, 40 insertions(+) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 76f55502ee..20b1a12741

[FFmpeg-devel] [PATCH 26/29] lavc/aacpsdsp: RISC-V V mul_pair_single

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 6 +- libavcodec/riscv/aacpsdsp_rvv.S | 19 +++ 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index 525fc9aa38..90c9c501c3

[FFmpeg-devel] [PATCH 21/29] lavc/audiodsp: RISC-V V scalarproduct_int16

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/audiodsp_init.c | 2 ++ libavcodec/riscv/audiodsp_rvv.S | 20 2 files changed, 22 insertions(+) diff --git a/libavcodec/riscv/audiodsp_init.c b/libavcodec/riscv/audiodsp_init.c index ddd561484f..6f38b7bc83 100644 ---

[FFmpeg-devel] [PATCH 20/29] lavc/audiodsp: RISC-V V vector_clipf

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/audiodsp_init.c | 7 ++- libavcodec/riscv/audiodsp_rvv.S | 18 ++ 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/audiodsp_init.c b/libavcodec/riscv/audiodsp_init.c index ce8b60ee52..ddd561484f

[FFmpeg-devel] [PATCH 28/29] lavc/aacpsdsp: RISC-V V hybrid_analysis_ileave

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/aacpsdsp_init.c | 14 libavcodec/riscv/aacpsdsp_rvv.S | 37 2 files changed, 47 insertions(+), 4 deletions(-) diff --git a/libavcodec/riscv/aacpsdsp_init.c b/libavcodec/riscv/aacpsdsp_init.c index

[FFmpeg-devel] [PATCH 27/29] lavc/aacpsdsp: RISC-V V hybrid_analysis

2022-09-22 Thread remi
From: Rémi Denis-Courmont This starts with one-time initialisation of the 26 constant factors like 08edacc248bce3f8946d75e97188d189c74a6de6. That is done with the scalar instruction set. While the formula can readily be vectored, the gains would (probably) be more than lost in transferring the

[FFmpeg-devel] [PATCH 25/29] lavc/aacpsdsp: RISC-V V add_squares

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/aacpsdsp.h| 1 + libavcodec/aacpsdsp_template.c | 2 ++ libavcodec/riscv/Makefile| 2 ++ libavcodec/riscv/aacpsdsp_init.c | 37 ++ libavcodec/riscv/aacpsdsp_rvv.S | 39

[FFmpeg-devel] [PATCH 24/29] lavc/vorbisdsp: RISC-V V inverse_coupling

2022-09-22 Thread remi
From: Rémi Denis-Courmont This uses the following vectorisation: for (i = 0; i < blocksize; i++) { ang[i] = mag[i] - copysignf(fmaxf(ang[i], 0.f), mag[i]); mag[i] = mag[i] - copysignf(fminf(ang[i], 0.f), mag[i]); } --- libavcodec/riscv/Makefile | 2 ++

[FFmpeg-devel] [PATCH 22/29] lavc/fmtconvert: RISC-V V int32_to_float_fmul_scalar

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/fmtconvert.c| 2 ++ libavcodec/fmtconvert.h| 1 + libavcodec/riscv/Makefile | 2 ++ libavcodec/riscv/fmtconvert_init.c | 39 + libavcodec/riscv/fmtconvert_rvv.S | 40

[FFmpeg-devel] [PATCH 23/29] lavc/fmtconvert: RISC-V V int32_to_float_fmul_array8

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/fmtconvert_init.c | 7 ++- libavcodec/riscv/fmtconvert_rvv.S | 29 + 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/libavcodec/riscv/fmtconvert_init.c b/libavcodec/riscv/fmtconvert_init.c index

[FFmpeg-devel] [PATCH 05/29] lavu/cpu: CPU flags for the RISC-V Vector extension

2022-09-22 Thread remi
From: Rémi Denis-Courmont RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d:

[FFmpeg-devel] [PATCH 19/29] lavc/audiodsp: RISC-V V vector_clip_int32

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavcodec/riscv/Makefile| 1 + libavcodec/riscv/audiodsp_init.c | 9 libavcodec/riscv/audiodsp_rvv.S | 37 3 files changed, 47 insertions(+) create mode 100644 libavcodec/riscv/audiodsp_rvv.S diff --git

[FFmpeg-devel] [PATCH 17/29] lavu/floatdsp: RISC-V V scalarproduct_float

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 2 ++ libavutil/riscv/float_dsp_rvv.S | 21 + 2 files changed, 23 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index dacd81c08b..cc9b7e83dc 100644 ---

[FFmpeg-devel] [PATCH 18/29] lavu/fixeddsp: RISC-V V butterflies_fixed

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavutil/fixed_dsp.c| 4 +++- libavutil/fixed_dsp.h| 1 + libavutil/riscv/Makefile | 4 +++- libavutil/riscv/fixed_dsp_init.c | 38 + libavutil/riscv/fixed_dsp_rvv.S | 41

[FFmpeg-devel] [PATCH 16/29] lavu/floatdsp: RISC-V V vector_fmul_window

2022-09-22 Thread remi
From: Rémi Denis-Courmont --- libavutil/riscv/float_dsp_init.c | 3 +++ libavutil/riscv/float_dsp_rvv.S | 35 2 files changed, 38 insertions(+) diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c index 9b8fd9942b..dacd81c08b
