[FFmpeg-devel] [PATCH] mfenc: Fix double frees on init errors

2023-01-31 Thread Martin Storsjö
(from mf_close) is a no-op in that case. Signed-off-by: Martin Storsjö --- libavcodec/mf_utils.c | 6 ++++-- libavcodec/mfenc.c | 2 ++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/libavcodec/mf_utils.c b/libavcodec/mf_utils.c index 48e3a63efc..50b9fdb2c4 100644

Re: [FFmpeg-devel] [PATCH] mov: Reduce the verbosity of the warning about fragmented MP4 vs advanced edit lists

2023-01-31 Thread Martin Storsjö
On Mon, 23 Jan 2023, Derek Buitenhuis wrote: On 1/17/2023 9:31 AM, Martin Storsjö wrote: Only warn if the advanced_editlist option is enabled (it is enabled by default though) so we don't print one warning for each track, and demote the warning to AV_LOG_LEVEL_VERBOSE; this message does get

Re: [FFmpeg-devel] [PATCH] lavu/video_enc_params: Avoid relying on an undefined C construct

2023-01-31 Thread Martin Storsjö
On Wed, 18 Jan 2023, Anton Khirnov wrote: Quoting Martin Storsjö (2023-01-15 23:47:41) The construct of using offsetof on a (potentially anonymous) struct defined within the offsetof expression, while supported by all current compilers, has been declared explicitly undefined by the C standards
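
A minimal sketch of the construct described above, next to a portable alternative. The type names `Header`, `Block` and `HeaderAndBlock` are hypothetical and are not taken from libavutil/video_enc_params.c; the actual patch may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types, for illustration only. */
typedef struct Header { int32_t count; } Header;
typedef struct Block  { int64_t value; } Block;

/*
 * The construct the message calls undefined defines a struct type inside
 * the offsetof expression itself:
 *
 *     offsetof(struct { Header h; Block b; }, b)
 *
 * Naming the type first gives the same layout computation while only
 * passing an already declared type to offsetof.
 */
typedef struct HeaderAndBlock {
    Header h;
    Block  b;
} HeaderAndBlock;

/* Offset at which a Block can be placed after a Header. */
size_t block_offset(void)
{
    return offsetof(HeaderAndBlock, b);
}
```

The named struct is never instantiated here; it exists only so that the compiler computes the padding between the two members.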

Re: [FFmpeg-devel] [PATCH] lavfi/vf_ssim360: Fix compilation with MSVC

2023-01-28 Thread Martin Storsjö
On Sat, 28 Jan 2023, Martin Storsjö wrote: Don't use "static const" for compile time float constants, but use defines. This fixes the following error: src/libavfilter/vf_ssim360.c(549): error C2099: initializer is not a constant Signed-off-by: Martin Storsjö --- libavfilter/vf_ssi

[FFmpeg-devel] [PATCH] lavfi/vf_ssim360: Fix compilation with MSVC

2023-01-28 Thread Martin Storsjö
Don't use "static const" for compile time float constants, but use defines. This fixes the following error: src/libavfilter/vf_ssim360.c(549): error C2099: initializer is not a constant Signed-off-by: Martin Storsjö --- libavfilter/vf_ssim360.c | 6 +++--- 1 file changed, 3 insert
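
A small sketch of why the change helps, assuming the failing line initialized one file-scope object from another. In C, a `static const` object is not a constant expression, so a compiler may reject it in a static initializer (MSVC reports C2099), while a macro expands to a literal. The names `DEFAULT_SCALE` and `half_scale` are hypothetical and do not come from vf_ssim360.c.

```c
/* A macro expands to a literal, which is a constant expression. */
#define DEFAULT_SCALE 0.5f

/*
 * The rejected form would look like this:
 *
 *     static const float default_scale = 0.5f;
 *     static const float half_scale    = default_scale / 2;  // C2099 on MSVC
 *
 * Using the macro keeps the initializer a constant expression.
 */
static const float half_scale = DEFAULT_SCALE / 2;

/* Accessor so the value can be inspected from other code. */
float get_half_scale(void)
{
    return half_scale;
}
```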

Re: [FFmpeg-devel] [PATCH] avcodec/mfenc: fix double-free on init failure

2023-01-21 Thread Martin Storsjö
On Fri, 20 Jan 2023, Cameron Gutman wrote: mfenc sets FF_CODEC_CAP_INIT_CLEANUP, so calling mf_close() on failure inside mf_init() results in a double-free. Signed-off-by: Cameron Gutman --- libavcodec/mfenc.c | 1 - 1 file changed, 1 deletion(-) diff --git a/libavcodec/mfenc.c
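
The interaction can be sketched with a toy model. Everything below (`Ctx`, `ctx_init`, `ctx_close`, `open_codec`) is hypothetical and only mimics the behaviour described in the message: when a codec declares that cleanup runs after a failed init, the caller invokes the close callback itself, so init must not call it as well.

```c
#include <stdlib.h>

/* Hypothetical codec context, for illustration only. */
typedef struct Ctx {
    void *buf;
    int   close_calls;   /* counts how often cleanup ran */
} Ctx;

static void ctx_close(Ctx *c)
{
    free(c->buf);
    c->buf = NULL;       /* resetting the pointer makes a second call harmless */
    c->close_calls++;
}

static int ctx_init(Ctx *c, int fail)
{
    c->buf = malloc(16);
    if (!c->buf || fail)
        return -1;       /* no ctx_close() here: the caller cleans up */
    return 0;
}

/* Stand-in for a framework that always cleans up after a failed init. */
int open_codec(Ctx *c, int fail)
{
    int ret = ctx_init(c, fail);
    if (ret < 0)
        ctx_close(c);
    return ret;
}
```

If `ctx_init` also called `ctx_close` on failure and the pointer were not reset, the buffer would be freed twice.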

[FFmpeg-devel] [PATCH] mov: Reduce the verbosity of the warning about fragmented MP4 vs advanced edit lists

2023-01-17 Thread Martin Storsjö
Only warn if the advanced_editlist option is enabled (it is enabled by default though) so we don't print one warning for each track, and demote the warning to AV_LOG_LEVEL_VERBOSE; this message does get generated whenever parsing a fragmented MP4 file, regardless of whether the file actually uses

[FFmpeg-devel] [PATCH] lavu/video_enc_params: Avoid relying on an undefined C construct

2023-01-15 Thread Martin Storsjö
] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm [2] https://github.com/llvm/llvm-project/commit/e327b52766ed497e4779f4e652b9ad237dfda8e6 [3] https://reviews.llvm.org/D133574#4053647 Signed-off-by: Martin Storsjö --- libavutil/video_enc_params.c | 10 +- 1 file changed, 5

Re: [FFmpeg-devel] [PATCH] arm32/neon: Avoid using bge/beq for function calls

2023-01-14 Thread Martin Storsjö
Hi Rui, On Sat, 14 Jan 2023, Rui Ueyama wrote: On Sat, 7 Jan 2023, Rui Ueyama wrote: It looks like compiler-generated code always uses `b`, `bl` or `blx` instructions for function calls. These instructions have a 24-bit immediate and therefore can jump anywhere between PC +- 16 MiB. This

Re: [FFmpeg-devel] [PATCH] arm32/neon: Avoid using bge/beq for function calls

2023-01-09 Thread Martin Storsjö
On Mon, 9 Jan 2023, Martin Storsjö wrote: Hi Rui, Long time no see! On Sat, 7 Jan 2023, Rui Ueyama wrote: It looks like compiler-generated code always uses `b`, `bl` or `blx` instructions for function calls. These instructions have a 24-bit immediate and therefore can jump anywhere between

Re: [FFmpeg-devel] [PATCH] arm32/neon: Avoid using bge/beq for function calls

2023-01-09 Thread Martin Storsjö
Hi Rui, Long time no see! On Sat, 7 Jan 2023, Rui Ueyama wrote: It looks like compiler-generated code always uses `b`, `bl` or `blx` instructions for function calls. These instructions have a 24-bit immediate and therefore can jump anywhere between PC +- 16 MiB. This hand-written assembly

Re: [FFmpeg-devel] [PATCH] fate: Mark the tiff-zip-* tests as requiring zlib

2022-11-17 Thread Martin Storsjö
On Thu, 17 Nov 2022, Martin Storsjö wrote: Signed-off-by: Martin Storsjö --- tests/fate/image.mak | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/fate/image.mak b/tests/fate/image.mak index 167c8ccf2c..42dd90feaa 100644 --- a/tests/fate/image.mak +++ b/tests/fate

[FFmpeg-devel] [PATCH] fate: Mark the tiff-zip-* tests as requiring zlib

2022-11-17 Thread Martin Storsjö
Signed-off-by: Martin Storsjö --- tests/fate/image.mak | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/fate/image.mak b/tests/fate/image.mak index 167c8ccf2c..42dd90feaa 100644 --- a/tests/fate/image.mak +++ b/tests/fate/image.mak @@ -513,12 +513,13 @@ fate-tiff

Re: [FFmpeg-devel] [PATCH] avutil/tx: use llrintf() to convert a float into a 64 bit integer

2022-11-08 Thread Martin Storsjö
On Tue, 8 Nov 2022, James Almer wrote: Should fix fate failures on Windows x86 targets, where long is 32 bits. Signed-off-by: James Almer --- libavutil/tx_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/tx_priv.h b/libavutil/tx_priv.h index

Re: [FFmpeg-devel] [PATCH 2/2] swscale/output: Bias 16bps output calculations to improve non overflowing range for GBRP16/GBRPF32

2022-11-02 Thread Martin Storsjö
On Wed, 2 Nov 2022, Michael Niedermayer wrote: On Wed, Nov 02, 2022 at 10:16:57PM +0100, Andreas Rheinhardt wrote: Michael Niedermayer: On Wed, Nov 02, 2022 at 10:02:39PM +0100, Michael Niedermayer wrote: Fixes: integer overflow Signed-off-by: Michael Niedermayer --- libswscale/output.c

Re: [FFmpeg-devel] [PATCH 0/3] sw_scale: Provide neon implementation for hscale

2022-11-01 Thread Martin Storsjö
On Fri, 28 Oct 2022, Hubert Mazur wrote: This patchset contains arm64 neon implementation of hscale functions. Fixed minor style issues and declared C function wrappers as static. This patchset does not contain the patch for checkasm tool, as the previous one did. The reason behind it was failing

Re: [FFmpeg-devel] [PATCH] swscale: aarch64: Fix yuv2rgb with negative strides

2022-10-27 Thread Martin Storsjö
On Tue, 25 Oct 2022, Martin Storsjö wrote: Treat the 32 bit stride registers as signed. Alternatively, we could make the stride arguments ptrdiff_t instead of int, and changing all of the assembly to operate on these registers with their full 64 bit width, but that's a much larger and more
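
The effect of treating a 32-bit stride as signed or unsigned can be modelled in C. This is only a sketch of the arithmetic, not the assembly from the patch; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening a 32-bit stride as if it were unsigned: wrong for negative values. */
int64_t extend_zero(int32_t stride)
{
    return (int64_t)(uint32_t)stride;
}

/* Widening with sign extension: negative strides stay negative. */
int64_t extend_sign(int32_t stride)
{
    return (int64_t)stride;
}

/* Step to the next row; the stride is sign-extended before the pointer add. */
const uint8_t *next_row(const uint8_t *row, int stride)
{
    return row + (ptrdiff_t)stride;
}
```

With a stride of -64, zero extension yields 4294967232 instead of -64, so the row pointer would move far forward rather than one row back.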

Re: [FFmpeg-devel] [PATCH] configure: Remove a leftover comment about MSVC C99 support

2022-10-27 Thread Martin Storsjö
On Wed, 19 Oct 2022, Martin Storsjö wrote: Support for building with older versions of MSVC (with the c99wrap/c99conv frontend) was removed in ce943dd6acbfdfc40223c0fb24d4cad438e6499c. Signed-off-by: Martin Storsjö --- configure | 6 ------ 1 file changed, 6 deletions(-) diff --git

[FFmpeg-devel] [PATCH] swscale: aarch64: Fix yuv2rgb with negative strides

2022-10-25 Thread Martin Storsjö
operation, which would clamp the intermediates to 32 bit still). Fixes: https://trac.ffmpeg.org/ticket/9985 Signed-off-by: Martin Storsjö --- libswscale/aarch64/yuv2rgb_neon.S | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libswscale/aarch64/yuv2rgb_neon.S b/libswscale

Re: [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19

2022-10-24 Thread Martin Storsjö
On Mon, 17 Oct 2022, Hubert Mazur wrote: Provide arm64 neon optimized implementations for hscale16To19 with filter sizes 4, 8 and X4. The tests and benchmarks run on AWS Graviton 2 instances. The results from a checkasm tool are shown below. hscale_16_to_19__fs_4_dstW_512_c: 6216.0

Re: [FFmpeg-devel] [PATCH 1/4] sw_scale: Add specializations for hscale 8 to 19

2022-10-24 Thread Martin Storsjö
On Mon, 17 Oct 2022, Hubert Mazur wrote: Add arm64 neon implementations for hscale 8 to 19 with filter sizes 4, 4X and 8. Both implementations are based on very similar ones dedicated to hscale 8 to 15. The major changes refer to saving the data - instead of writing the result as int16_t it is

Re: [FFmpeg-devel] [PATCH v3] lavc/aarch64: add hevc horizontal qpel/uni/bi

2022-10-24 Thread Martin Storsjö
On Tue, 11 Oct 2022, J. Dekker wrote: checkasm benchmark on Ampere Altra (Neoverse N1): put_hevc_qpel_bi_h4_8_c: 170.7 put_hevc_qpel_bi_h4_8_neon: 64.5 put_hevc_qpel_bi_h6_8_c: 373.7 put_hevc_qpel_bi_h6_8_neon: 130.2 put_hevc_qpel_bi_h8_8_c: 662.0 put_hevc_qpel_bi_h8_8_neon: 138.5

[FFmpeg-devel] [PATCH] configure: Remove a leftover comment about MSVC C99 support

2022-10-19 Thread Martin Storsjö
Support for building with older versions of MSVC (with the c99wrap/c99conv frontend) was removed in ce943dd6acbfdfc40223c0fb24d4cad438e6499c. Signed-off-by: Martin Storsjö --- configure | 6 ------ 1 file changed, 6 deletions(-) diff --git a/configure b/configure index 6712d045d9..ed52212f93

Re: [FFmpeg-devel] [PATCH] aarch64: Implement stack spilling in a consistent way.

2022-10-10 Thread Martin Storsjö
On Sun, 9 Oct 2022, reimar.doeffin...@gmx.de wrote: From: Reimar Döffinger Currently it is done in several different ways, which might cause needless dependencies or in case of tx_float_neon.S is incorrect. Signed-off-by: Reimar Döffinger --- libavcodec/aarch64/fft_neon.S | 3 +-

Re: [FFmpeg-devel] [PATCH v2 0/7] arm64 neon implementation for 8bits functions

2022-10-04 Thread Martin Storsjö
: Provide neon implementation of nsse8 lavc/aarch64: Provide optimized implementation of vsse8 for arm64. lavc/aarch64: Add neon implementation for vsse_intra8 Martin Storsjö (3): aarch64: me_cmp: Improve scheduling in ff_pix_abs8_y2_neon aarch64: me_cmp: Fix up the prologue

[FFmpeg-devel] [PATCH] libavcodec: Fix a comment typo

2022-10-03 Thread Martin Storsjö
Signed-off-by: Martin Storsjö --- libavcodec/packet.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/packet.h b/libavcodec/packet.h index 404d520071..f28e7e7011 100644 --- a/libavcodec/packet.h +++ b/libavcodec/packet.h @@ -161,7 +161,7 @@ enum

Re: [FFmpeg-devel] [PATCH] arm: vc1dsp: Canonicalize the syntax for aligned NEON loads/stores

2022-09-29 Thread Martin Storsjö
On Wed, 28 Sep 2022, Martin Storsjö wrote: This hopefully should fix building with older toolchains, hopefully fixing the fate failures on http://fate.ffmpeg.org/history.cgi?slot=armel5tej-qemu-debian-gcc4.4. Signed-off-by: Martin Storsjö --- libavcodec/arm/vc1dsp_neon.S | 40

[FFmpeg-devel] [PATCH] riscv: Fix linking without RVV; change #ifdef into #if

2022-09-28 Thread Martin Storsjö
--- This should hopefully fix the current build failures at http://fate.ffmpeg.org/history.cgi?slot=riscv64-linux-gnu-clang-14. --- libavcodec/riscv/fmtconvert_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/riscv/fmtconvert_init.c

[FFmpeg-devel] [PATCH 2/2] aarch64: me_cmp: Avoid using the non-unrolled codepath for the minimum unroll size

2022-09-28 Thread Martin Storsjö
Signed-off-by: Martin Storsjö --- libavcodec/aarch64/me_cmp_neon.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index 832a7cb22d..c710358ab7 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b

[FFmpeg-devel] [PATCH 1/2] aarch64: me_cmp: Avoid redundant loads in ff_pix_abs16_y2_neon

2022-09-28 Thread Martin Storsjö
This avoids one redundant load per row; pix3 from the previous iteration can be used as pix2 in the next one. Before: Cortex A53 A72 A73 pix_abs_0_2_neon: 138.0 59.7 48.0 After: pix_abs_0_2_neon: 109.7 50.2 39.5 Signed-off-by: Martin Storsjö --- libavcodec/aarch64
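
The row reuse can be modelled in scalar C. This is a sketch only: the function name `pix_abs16_y2_sketch` is hypothetical, the rounding average `(a + b + 1) >> 1` is assumed, and the real change is in NEON assembly where the "rows" live in vector registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum of absolute differences between pix1 and the vertical average of
 * two adjacent rows of pix2, for a block 16 pixels wide and h rows high.
 * pix2 must provide h + 1 readable rows.
 */
int pix_abs16_y2_sketch(const uint8_t *pix1, const uint8_t *pix2,
                        ptrdiff_t stride, int h)
{
    uint8_t above[16], below[16];
    int sum = 0;

    memcpy(above, pix2, 16);            /* the first row is loaded once */
    for (int y = 0; y < h; y++) {
        pix2 += stride;
        memcpy(below, pix2, 16);        /* the only pix2 load per iteration */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - ((above[x] + below[x] + 1) >> 1));
        memcpy(above, below, 16);       /* lower row becomes the upper row */
        pix1 += stride;
    }
    return sum;
}
```

Without the reuse, each iteration would load both the upper and the lower row from memory, although the upper row was already loaded as the lower row one iteration earlier.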

Re: [FFmpeg-devel] [PATCH 4/4] lavc/aarch64: Add neon implementation for vsse_intra8

2022-09-28 Thread Martin Storsjö
On Mon, 26 Sep 2022, Grzegorz Bernacki wrote: Provide optimized implementation for vsse_intra8 for arm64. Performance tests are shown below. - vsse_5_c: 87.7 - vsse_5_neon: 26.2 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. --- libavcodec/aarch64/me_cmp_init_aarch64.c |

Re: [FFmpeg-devel] [PATCH 3/4] lavc/aarch64: Provide optimized implementation of vsse8 for arm64.

2022-09-28 Thread Martin Storsjö
On Mon, 26 Sep 2022, Grzegorz Bernacki wrote: Provide optimized implementation of vsse8 for arm64. Performance comparison tests are shown below. - vsse_1_c: 141.5 - vsse_1_neon: 32.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Grzegorz Bernacki ---

Re: [FFmpeg-devel] [PATCH 2/4] lavc/aarch64: Provide neon implementation of nsse8

2022-09-28 Thread Martin Storsjö
On Mon, 26 Sep 2022, Grzegorz Bernacki wrote: Add vectorized implementation of nsse8 function. Performance comparison tests are shown below. - nsse_1_c: 256.0 - nsse_1_neon: 82.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Grzegorz Bernacki ---

Re: [FFmpeg-devel] [PATCH 1/4] lavc/aarch64: Add neon implementation for pix_abs8 functions.

2022-09-28 Thread Martin Storsjö
On Mon, 26 Sep 2022, Grzegorz Bernacki wrote: Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below: pix_abs_1_1_c: 162.5 pix_abs_1_1_neon: 27.0 pix_abs_1_2_c: 174.0 pix_abs_1_2_neon: 23.5 pix_abs_1_3_c: 203.2 pix_abs_1_3_neon: 34.7

Re: [FFmpeg-devel] [PATCH] riscv: Use the correct path for including asm.S

2022-09-28 Thread Martin Storsjö
On Wed, 28 Sep 2022, Rémi Denis-Courmont wrote: On 28 September 2022 10:13:57 GMT+03:00, "Martin Storsjö" wrote: Signed-off-by: Martin Storsjö --- This should hopefully fix the compile failures on fate, http://fate.ffmpeg.org/report.cgi?time=20220927222508&slot=riscv64-linux-

[FFmpeg-devel] [PATCH] arm: vc1dsp: Canonicalize the syntax for aligned NEON loads/stores

2022-09-28 Thread Martin Storsjö
This hopefully should fix building with older toolchains, hopefully fixing the fate failures on http://fate.ffmpeg.org/history.cgi?slot=armel5tej-qemu-debian-gcc4.4. Signed-off-by: Martin Storsjö --- libavcodec/arm/vc1dsp_neon.S | 40 ++-- 1 file changed, 20

[FFmpeg-devel] [PATCH] riscv: Use the correct path for including asm.S

2022-09-28 Thread Martin Storsjö
Signed-off-by: Martin Storsjö --- This should hopefully fix the compile failures on fate, http://fate.ffmpeg.org/report.cgi?time=20220927222508&slot=riscv64-linux-gnu-gcc-12 and http://fate.ffmpeg.org/report.cgi?time=20220927225014&slot=riscv64-linux-gnu-clang-14. --- libavcodec/riscv/fmtconvert_rvv.S

Re: [FFmpeg-devel] Patchwork issues

2022-09-26 Thread Martin Storsjö
On Mon, 26 Sep 2022, Marvin Scholz wrote: As I am not sure who else to email about this, I'll just post it here. I tried to register for Patchwork, however I got an error when registering. I tried again and was told the account already exists, I tried to reset the password for the account but

Re: [FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx

2022-09-25 Thread Martin Storsjö
On Sat, 24 Sep 2022, Lynne wrote: What about ac3dsp then - that one seems like it's fairly optimized for arm? Haven't touched them, they're still being used. Unfortunately, for AC3, the full MDCT optimizations in lavc do make a difference and the overall decoder becomes 15% slower with this

Re: [FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx

2022-09-24 Thread Martin Storsjö
On Sat, 24 Sep 2022, Hendrik Leppkes wrote: On Sat, Sep 24, 2022 at 9:26 PM Hendrik Leppkes wrote: On Sat, Sep 24, 2022 at 8:43 PM Martin Storsjö wrote: > > On Sat, 24 Sep 2022, Lynne wrote: > > > This commit changes both the encoder and decoder to use the new lavu/tx code

Re: [FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx

2022-09-24 Thread Martin Storsjö
On Sat, 24 Sep 2022, Lynne wrote: This commit changes both the encoder and decoder to use the new lavu/tx code, which has faster C transforms and more assembly optimizations. What's the case of e.g. 32 bit arm - that does have a bunch of fft and mdct assembly, but is that something that ends

Re: [FFmpeg-devel] [PATCH 0/3] Provide neon implementations

2022-09-21 Thread Martin Storsjö
On Tue, 20 Sep 2022, Hubert Mazur wrote: This fixes issues addressed in previous patchset: - move sub instruction in vsad8_intra, - remove unnecessary mov instructions, - remove single lane extraction in loop and place it at the end. Removing mov instructions from pix_median_abs functions

Re: [FFmpeg-devel] [PATCH 3/3] lavc/aarch64: Add neon implementation for pix_median_abs8

2022-09-16 Thread Martin Storsjö
On Tue, 13 Sep 2022, Hubert Mazur wrote: Provide optimized implementation for pix_median_abs16 function. Forgot to update this part of the commit message here too. Performance comparison tests are shown below. - median_sad_1_c: 273.7 - median_sad_1_neon: 98.2 Benchmarks and tests run with

Re: [FFmpeg-devel] [PATCH 2/3] lavc/aarch64: Add neon implementation for vsad8_intra

2022-09-16 Thread Martin Storsjö
On Tue, 13 Sep 2022, Hubert Mazur wrote: Provide optimized implementation for pix_median_abs16 function. You forgot to update this part of the commit message. Performance comparison tests are shown below. - vsad_5_c: 94.7 - vsad_5_neon: 20.7 Benchmarks and tests run with checkasm tool

Re: [FFmpeg-devel] [PATCH 1/3] lavc/aarch64: Add neon implementation for pix_median_abs16

2022-09-16 Thread Martin Storsjö
On Tue, 13 Sep 2022, Hubert Mazur wrote: Provide optimized implementation for pix_median_abs16 function. Performance comparison tests are shown below. - median_sad_0_c: 722.0 - median_sad_0_neon: 144.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur

Re: [FFmpeg-devel] [PATCH v2] avcodec/arm/sbcenc: avoid callee preserved vfp registers

2022-09-12 Thread Martin Storsjö
On Sun, 25 Aug 2019, James Cowgill wrote: When compiling FFmpeg with GCC-9, some very random segfaults were observed in code which had previously called down into the SBC encoder NEON assembly routines. This was caused by these functions clobbering some of the vfp callee saved registers (d8 -

Re: [FFmpeg-devel] [PATCH 0/5] Provide optimized neon implementation

2022-09-09 Thread Martin Storsjö
On Thu, 8 Sep 2022, Hubert Mazur wrote: Fix minor issues in the patches. Regarding vsse16 I didn't change saba & umlal to sub & smlal. It doesn't affect the performance, so left it as it was. The majority of changes refer to nsse16: - fixed indentation (thanks for pointing out), - applied the

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: Provide neon implementation of nsse16

2022-09-07 Thread Martin Storsjö
On Tue, 6 Sep 2022, Hubert Mazur wrote: Add vectorized implementation of nsse16 function. Performance comparison tests are shown below. - nsse_0_c: 707.0 - nsse_0_neon: 120.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: Add neon implementation of vsse16

2022-09-07 Thread Martin Storsjö
On Tue, 6 Sep 2022, Hubert Mazur wrote: Provide optimized implementation of vsse16 for arm64. Performance comparison tests are shown below. - vsse_0_c: 254.4 - vsse_0_neon: 64.7 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 0/5] Provide optimized neon implementation

2022-09-07 Thread Martin Storsjö
On Tue, 6 Sep 2022, Hubert Mazur wrote: Provide optimized implementations for me_cmp functions. This set of patches fixes all issues addressed in previous review. Major changes: - Remove redundant loads since the data can be reused. - Improve style. - Fix issues with unrecognized symbols.

Re: [FFmpeg-devel] [PATCH] slicethread: Limit the automatic number of threads to 16

2022-09-06 Thread Martin Storsjö
On Tue, 6 Sep 2022, Lukas Fellechner wrote: There are really two separate issues here: 1. Running out of address space in 32-bit processes It probably makes sense to limit auto threads to 16, but it should only be done in 32-bit processes. FWIW, this was my first approach, until Andreas

Re: [FFmpeg-devel] [PATCH 1/2] x86/tx_float: add support for calling assembly functions from assembly

2022-09-06 Thread Martin Storsjö
On Tue, 6 Sep 2022, Mattias Wadman wrote: On Sat, Sep 3, 2022 at 3:41 AM Lynne wrote: Needed for the next patch. We get this for the extremely small cost of a branch on _ns functions, which wouldn't be used anyway with assembly. Patch attached. Hi, I have issues building on macOS

[FFmpeg-devel] [PATCH v2] x86/tx_float: Fix building for platforms with a symbol prefix

2022-09-06 Thread Martin Storsjö
This fixes building for x86 macOS (both i386 and x86_64) and i386 windows. --- v2: Add mangle() in a couple more places, that weren't noticed on i386 windows. --- libavutil/x86/tx_float.asm | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/libavutil/x86/tx_float.asm

[FFmpeg-devel] [PATCH] x86/tx_float: Fix building for platforms with a symbol prefix

2022-09-06 Thread Martin Storsjö
This fixes building for e.g. i386 windows. --- libavutil/x86/tx_float.asm | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/libavutil/x86/tx_float.asm b/libavutil/x86/tx_float.asm index 1b9131e7fa..ace19788a6 100644 --- a/libavutil/x86/tx_float.asm +++

Re: [FFmpeg-devel] [PATCH] slicethread: Limit the automatic number of threads to 16

2022-09-05 Thread Martin Storsjö
On Mon, 5 Sep 2022, Martin Storsjö wrote: This matches a similar cap on the number of automatic threads in libavcodec/pthread_slice.c. On systems with lots of cores, this does speed things up in general (measurable on the level of the runtime of running "make fate"), and fixes a c

[FFmpeg-devel] [PATCH] slicethread: Limit the automatic number of threads to 16

2022-09-05 Thread Martin Storsjö
This matches a similar cap on the number of automatic threads in libavcodec/pthread_slice.c. On systems with lots of cores, this does speed things up in general (measurable on the level of the runtime of running "make fate"), and fixes a couple fate failures in 32 bit mode on such machines (where
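
A sketch of the kind of cap described above. The function name, the `MAX_AUTO_THREADS` macro, and the rule that an explicit request bypasses the cap are assumptions for illustration; only the limit of 16 for automatically chosen thread counts comes from the patch subject.

```c
/* Cap for automatically selected thread counts (value from the patch subject). */
#define MAX_AUTO_THREADS 16

/*
 * requested > 0: the caller asked for a specific count, used as-is
 *                (assumption: the cap applies only to automatic selection).
 * requested <= 0: derive the count from the CPU count, capped.
 */
int pick_thread_count(int requested, int cpu_count)
{
    if (requested > 0)
        return requested;
    if (cpu_count < 1)
        cpu_count = 1;
    return cpu_count > MAX_AUTO_THREADS ? MAX_AUTO_THREADS : cpu_count;
}
```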

Re: [FFmpeg-devel] [PATCH] cpu: Limit the number of auto threads in 32 bit builds

2022-09-05 Thread Martin Storsjö
On Mon, 5 Sep 2022, Andreas Rheinhardt wrote: Martin Storsjö: Limit the returned value from av_cpu_count to sensible amounts in 32 bit builds. This chosen limit, 64, is somewhat arbitrary - a 32 bit process is capable of creating much more than 64 threads. But in many cases, multiple parts

[FFmpeg-devel] [PATCH] cpu: Limit the number of auto threads in 32 bit builds

2022-09-05 Thread Martin Storsjö
Limit the returned value from av_cpu_count to sensible amounts in 32 bit builds. This chosen limit, 64, is somewhat arbitrary - a 32 bit process is capable of creating much more than 64 threads. But in many cases, multiple parts of the encoding pipeline (decoder, filters, encoders) all create a

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: Provide neon implementation of nsse16

2022-09-04 Thread Martin Storsjö
On Mon, 22 Aug 2022, Hubert Mazur wrote: Add vectorized implementation of nsse16 function. Performance comparison tests are shown below. - nsse_0_c: 707.0 - nsse_0_neon: 120.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 4/5] lavc/aarch64: Add neon implementation for vsse_intra16

2022-09-04 Thread Martin Storsjö
On Mon, 22 Aug 2022, Hubert Mazur wrote: Provide optimized implementation for vsse_intra16 for arm64. Performance tests are shown below. - vsse_4_c: 153.7 - vsse_4_neon: 34.2 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: Add neon implementation for vsad_intra16

2022-09-04 Thread Martin Storsjö
On Mon, 22 Aug 2022, Hubert Mazur wrote: Provide optimized implementation for vsad_intra16 function for arm64. Performance comparison tests are shown below. - vsad_4_c: 177.2 - vsad_4_neon: 24.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur

Re: [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: Add neon implementation of vsse16

2022-09-04 Thread Martin Storsjö
On Mon, 22 Aug 2022, Hubert Mazur wrote: Provide optimized implementation of vsse16 for arm64. Performance comparison tests are shown below. - vsse_0_c: 254.4 - vsse_0_neon: 64.7 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 2/2] arm: relax byte-swap assembler constraints

2022-09-03 Thread Martin Storsjö
On Sat, 3 Sep 2022, r...@remlab.net wrote: From: Rémi Denis-Courmont There are no particular reasons to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this

Re: [FFmpeg-devel] [PATCH] avcodec/mathops: Set hidden visibility where advantageous

2022-09-03 Thread Martin Storsjö
On Sat, 3 Sep 2022, Andreas Rheinhardt wrote: It is advantageous for ff_crop_tab, as the base pointer used to access this table is not the first element of it. But the real base pointer is still at a constant offset from the code/the GOT and can therefore be accessed relative to the instruction

Re: [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: Add neon implementation for vsad16

2022-09-02 Thread Martin Storsjö
On Mon, 22 Aug 2022, Hubert Mazur wrote: Provide optimized implementation of vsad16 function for arm64. Performance comparison tests are shown below. - vsad_0_c: 285.0 - vsad_0_neon: 42.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: Provide neon implementation of nsse16

2022-09-02 Thread Martin Storsjö
On Mon, 22 Aug 2022, Hubert Mazur wrote: Add vectorized implementation of nsse16 function. Performance comparison tests are shown below. - nsse_0_c: 707.0 - nsse_0_neon: 120.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 2/2] arm: rv40dsp: Change stride parameters to ptrdiff_t

2022-09-02 Thread Martin Storsjö
On Tue, 9 Aug 2022, Martin Storsjö wrote: These were missed when h264_chroma_mc_func was changed in e4a94d8b36c48d95a7d412c40d7b558422ff659c. Signed-off-by: Martin Storsjö --- libavcodec/arm/rv40dsp_init_arm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) OK'd by Andreas

Re: [FFmpeg-devel] [PATCH v2] arm: Check the build time constants in av_clip_*intp2

2022-09-02 Thread Martin Storsjö
On Fri, 26 Aug 2022, Martin Storsjö wrote: This fixes building for arm targets with optimizations disabled. --- libavutil/arm/intmath.h | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/libavutil/arm/intmath.h b/libavutil/arm/intmath.h index 5311a7d52b

Re: [FFmpeg-devel] [PATCH v4] libavcodec: Set hidden visibility on global symbols accessed from AArch64 assembly

2022-09-02 Thread Martin Storsjö
On Sat, 27 Aug 2022, Martin Storsjö wrote: The AArch64 assembly accesses those symbols directly, without indirection via e.g. the GOT on ELF. In order for this not to require text relocations, those symbols need to be resolved fully at link time, i.e. those symbols can't be interposable
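
A sketch of marking a global as hidden so that references to it are resolved at link time. The macro `HIDDEN_SYM`, the table `example_table` and the accessor are hypothetical; the patch defines its own attribute macro in a separate header. The `visibility("hidden")` attribute itself is a GCC/Clang feature for ELF-style targets.

```c
/* Hypothetical macro; expands to nothing where the attribute is unavailable. */
#if defined(__GNUC__) && !defined(_WIN32)
#define HIDDEN_SYM __attribute__((visibility("hidden")))
#else
#define HIDDEN_SYM
#endif

/*
 * Declaration as a shared header would carry it: the symbol is global
 * within the library but cannot be interposed from outside it.
 */
HIDDEN_SYM extern const unsigned char example_table[4];

const unsigned char example_table[4] = { 1, 2, 3, 4 };

/* Plain C access; assembly could reference the symbol directly as well. */
int example_lookup(int i)
{
    return example_table[i & 3];
}
```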

[FFmpeg-devel] [PATCH v4] libavcodec: Set hidden visibility on global symbols accessed from AArch64 assembly

2022-08-27 Thread Martin Storsjö
that are accessed from AArch64 assembly as hidden, so that they are resolved fully at link time even without the version script and -Wl,-Bsymbolic. Signed-off-by: Martin Storsjö --- v4: Moved the attribute definition to a new, standalone header (which only depends on libavutil/attributes.h

[FFmpeg-devel] [PATCH v2] arm: Check the build time constants in av_clip_*intp2

2022-08-26 Thread Martin Storsjö
This fixes building for arm targets with optimizations disabled. --- libavutil/arm/intmath.h | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/libavutil/arm/intmath.h b/libavutil/arm/intmath.h index 5311a7d52b..f19b21e98d 100644 ---

[FFmpeg-devel] [PATCH] arm: Skip certain inline assembly functions if built without optimizations

2022-08-26 Thread Martin Storsjö
These inline assembly functions rely on being inlined into the caller, so that the parameter "int p" can be a known assembly time constant, instead of a variable parameter. __OPTIMIZE__ is a built-in define which is set by both GCC and Clang (the two main compilers supporting our inline assembly)
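
The guard can be sketched as follows. All names are hypothetical, and no inline assembly is included: the comment only marks where a variant that needs `p` as a compile-time immediate would be defined, so both build modes end up using the same C fallback here.

```c
/* Plain C fallback: clip a to the signed range [-(1 << p), (1 << p) - 1]. */
static int clip_intp2_generic(int a, int p)
{
    int max = (1 << p) - 1;
    int min = -max - 1;
    return a > max ? max : (a < min ? min : a);
}

#if defined(__OPTIMIZE__)
/*
 * Optimized build: calls get inlined, so an inline assembly variant that
 * needs p as an immediate operand could be defined here. This sketch has
 * no such variant and only records that the slot is enabled.
 */
#define CLIP_INTP2_ASM_SLOT 1
#else
/* Unoptimized build: p stays a runtime variable, so the slot is skipped. */
#define CLIP_INTP2_ASM_SLOT 0
#endif

int clip_intp2(int a, int p)
{
    return clip_intp2_generic(a, p);
}

/* Reports which branch of the guard was compiled in. */
int clip_intp2_asm_slot_enabled(void)
{
    return CLIP_INTP2_ASM_SLOT;
}
```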

Re: [FFmpeg-devel] [PATCH] lavu/tx: implement aarch64 NEON SIMD

2022-08-25 Thread Martin Storsjö
On Sun, 14 Aug 2022, Lynne wrote: The fastest fast Fourier transform in not just the west, but the world, now for the most popular toy ISA. On a high level, it follows the design of the AVX2 version closely, with the exception that the input is slightly less permuted as we don't have to do

Re: [FFmpeg-devel] [PATCH v2] checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX

2022-08-19 Thread Martin Storsjö
On Thu, 18 Aug 2022, Alan Kelly wrote: Thanks Martin for doing this. On Thu, Aug 18, 2022 at 10:16 AM Martin Storsjö wrote: This avoids triggering overflows in the filters, and avoids stray test failures in the approximate functions on x86; due to rounding

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: Add neon implementation for pix_abs8

2022-08-18 Thread Martin Storsjö
On Tue, 16 Aug 2022, Hubert Mazur wrote: Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below. - pix_abs_1_0_c: 101.2 - pix_abs_1_0_neon: 22.5 - sad_1_c: 101.2 - sad_1_neon: 22.5 Benchmarks and tests are run with checkasm tool on AWS

Re: [FFmpeg-devel] [PATCH 4/5] lavc/aarch64: Add neon implementation for sse8

2022-08-18 Thread Martin Storsjö
On Tue, 16 Aug 2022, Hubert Mazur wrote: Provide optimized implementation of sse8 function for arm64. Performance comparison tests are shown below. - sse_1_c: 130.7 - sse_1_neon: 29.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: Add neon implementation for pix_abs16_y2

2022-08-18 Thread Martin Storsjö
On Tue, 16 Aug 2022, Hubert Mazur wrote: Provide optimized implementation of pix_abs16_y2 function for arm64. Performance comparison tests are shown below. pix_abs_0_2_c: 317.2 pix_abs_0_2_neon: 37.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur

Re: [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: Add neon implementation for sse4

2022-08-18 Thread Martin Storsjö
On Tue, 16 Aug 2022, Hubert Mazur wrote: Provide neon implementation for sse4 function. Performance comparison tests are shown below. - sse_2_c: 80.7 - sse_2_neon: 31.0 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: Add neon implementation for sse16

2022-08-18 Thread Martin Storsjö
On Tue, 16 Aug 2022, Hubert Mazur wrote: Provide neon implementation for sse16 function. Performance comparison tests are shown below. - sse_0_c: 268.2 - sse_0_neon: 43.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur ---

Re: [FFmpeg-devel] [PATCH 0/5] Provide neon implementation for me_cmp functions

2022-08-18 Thread Martin Storsjö
On Tue, 16 Aug 2022, Hubert Mazur wrote: Add arm64 neon implementation for functions from motion estimation family. All of them were tested and benchmarked using checkasm tool. The rare code paths, e.g. when filter_size % 4 != 0 were also tested. Instructions were manually deinterleaved to

[FFmpeg-devel] [PATCH v2] checkasm: sw_scale: Produce more realistic test filter coefficients for yuv2yuvX

2022-08-18 Thread Martin Storsjö
This avoids triggering overflows in the filters, and avoids stray test failures in the approximate functions on x86; due to rounding differences, one implementation might overflow while another one doesn't. Signed-off-by: Martin Storsjö --- FWIW, this modification runs successfully with over

Re: [FFmpeg-devel] [PATCH 2/2] checkasm: sw_scale: Reduce range of test data in the yuv2yuvX test to get closer to real data

2022-08-18 Thread Martin Storsjö
On Wed, 17 Aug 2022, Ronald S. Bultje wrote: On Wed, Aug 17, 2022 at 4:32 PM Martin Storsjö wrote: This avoids overflows on some inputs in the x86 case, where the assembly version would clip/overflow differently from the C reference function. This doesn't seem

[FFmpeg-devel] [PATCH 2/2] checkasm: sw_scale: Reduce range of test data in the yuv2yuvX test to get closer to real data

2022-08-17 Thread Martin Storsjö
more realistic output pixel values, instead of having essentially all pixels clipped to either 0 or 255. Signed-off-by: Martin Storsjö --- tests/checkasm/sw_scale.c | 8 1 file changed, 8 insertions(+) diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c index d72506ed86

[FFmpeg-devel] [PATCH 1/2] checkasm: sw_scale: Fix the difference printing for approximate functions

2022-08-17 Thread Martin Storsjö
Don't stop directly at the first differing pixel, but find the one that differs by more than the expected accuracy. Also print the failing value in check_yuv2yuvX. Signed-off-by: Martin Storsjö --- tests/checkasm/sw_scale.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions
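The idea described here, reporting the first pixel that is actually out of tolerance rather than the first pixel that differs at all, can be sketched as below. The helper name `first_out_of_tolerance` and its parameters are hypothetical and are not taken from tests/checkasm/sw_scale.c.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not the code from the patch: return the index of
 * the first element where ref and test differ by more than `accuracy`,
 * or -1 if every element is within tolerance. */
static int first_out_of_tolerance(const uint8_t *ref, const uint8_t *test,
                                  size_t n, int accuracy)
{
    for (size_t i = 0; i < n; i++) {
        if (abs((int)ref[i] - (int)test[i]) > accuracy)
            return (int)i;
    }
    return -1;
}
```

For an approximate implementation, small deviations are expected, so printing the first differing pixel could point at a value that is still acceptable; scanning with the tolerance applied points at the value that made the test fail.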

Re: [FFmpeg-devel] [PATCH] avcodec/me_cmp: Remove now incorrect av_assert2()

2022-08-17 Thread Martin Storsjö
On Wed, 17 Aug 2022, Andreas Rheinhardt wrote: Since d69d12a5b9236b9d2f1fd247ea452f84cdd1aaf9 these av_assert2() (or more exactly, the ones in hadamard8_diff8x8_c() and hadamard8_intra8x8_c()) are hit. So just remove all of these asserts. (If the test were improved to know which functions

Re: [FFmpeg-devel] [PATCH v3] lavc/aarch64: hevc_add_res add 12bit variants

2022-08-16 Thread Martin Storsjö
On Tue, 16 Aug 2022, J. Dekker wrote: hevc_add_res_4x4_12_c: 46.0 hevc_add_res_4x4_12_neon: 18.7 hevc_add_res_8x8_12_c: 194.7 hevc_add_res_8x8_12_neon: 25.2 hevc_add_res_16x16_12_c: 716.0 hevc_add_res_16x16_12_neon: 69.7 hevc_add_res_32x32_12_c: 3820.7 hevc_add_res_32x32_12_neon: 261.0
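As a rough illustration of what a 12-bit add-residual function does, the sketch below adds a block of signed residuals to 12-bit pixels and clips the result to the range 0..4095. The name `add_residual_12bit` is hypothetical, and the stride is given in pixels here as a simplification; this is not the FFmpeg C reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: dst holds 12-bit pixels stored in uint16_t,
 * res holds size*size signed residuals stored row by row.
 * stride_px is the destination row stride in pixels (an assumption made
 * for this sketch). */
static void add_residual_12bit(uint16_t *dst, const int16_t *res,
                               ptrdiff_t stride_px, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + res[x];
            if (v < 0)
                v = 0;          /* clip to the bottom of the 12-bit range */
            else if (v > 4095)
                v = 4095;       /* clip to the top of the 12-bit range */
            dst[x] = (uint16_t)v;
        }
        dst += stride_px;
        res += size;
    }
}
```

The same loop shape applies to the 4x4, 8x8, 16x16 and 32x32 sizes listed in the benchmark names.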

Re: [FFmpeg-devel] [PATCH v2] lavc/aarch64: hevc_add_res add 12bit variants

2022-08-16 Thread Martin Storsjö
On Tue, 16 Aug 2022, J. Dekker wrote: hevc_add_res_4x4_12_c: 46.0 hevc_add_res_4x4_12_neon: 18.7 hevc_add_res_8x8_12_c: 194.7 hevc_add_res_8x8_12_neon: 25.2 hevc_add_res_16x16_12_c: 716.0 hevc_add_res_16x16_12_neon: 69.7 hevc_add_res_32x32_12_c: 3820.7 hevc_add_res_32x32_12_neon: 261.0

Re: [FFmpeg-devel] [PATCH 2/2] RFC: checkasm: motion: Test different h parameters

2022-08-16 Thread Martin Storsjö
On Thu, 4 Aug 2022, Martin Storsjö wrote: On Wed, 13 Jul 2022, Martin Storsjö wrote: Previously, the checkasm test always passed h=8, so no other cases were tested. Out of the me_cmp functions, in practice, some functions are hardcoded to always assume a 8x8 block (ignoring the h parameter

Re: [FFmpeg-devel] [PATCH v3 0/3] checkasm: updated tests for sw_scale

2022-08-16 Thread Martin Storsjö
On Sat, 13 Aug 2022, Swinney, Jonathan wrote: We don't generally use stdbool in ffmpeg, even if it's C99 - just use a plain int and 0/1. Updated this. Other than that, the checkasm changes look fine (I coauthored part of them - and your cleanup of my WIP patch looks good!). Yes, thank you

Re: [FFmpeg-devel] [PATCH v2] libswscale/aarch64: add another hscale specialization

2022-08-16 Thread Martin Storsjö
On Sat, 13 Aug 2022, Swinney, Jonathan wrote: This specialization handles the case where filtersize is 4 mod 8, e.g. 12, 20, etc. Aarch64 was previously using the c function for this case. This implementation speeds up that case significantly. hscale_8_to_15__fs_12_dstW_512_c: 6234.1
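The condition "filter size is 4 mod 8" can be written as a one-line predicate. The function name below is hypothetical; the snippet does not show how the actual dispatch in libswscale is written, so this only illustrates the arithmetic.

```c
/* Hypothetical predicate: true for filter sizes such as 12, 20, 28,
 * i.e. sizes that leave a remainder of 4 when divided by 8.
 * Note that 4 itself also satisfies the arithmetic condition. */
static int filter_size_is_4_mod_8(int filter_size)
{
    return filter_size % 8 == 4;
}
```

For example, 12 and 20 match, while 8 and 16 do not.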

[FFmpeg-devel] [PATCH 1/2] arm: vc1sdp: Change stride parameters to ptrdiff_t

2022-08-09 Thread Martin Storsjö
This was missed in db54426975e124e98e5130ad01316cb7afd60630. Signed-off-by: Martin Storsjö --- In practice, ptrdiff_t and int are the same type on arm, so these didn't cause any warnings and haven't been caught due to that. --- libavcodec/arm/vc1dsp_init_neon.c | 12 ++-- 1 file changed
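To illustrate why stride parameters are declared as ptrdiff_t, the sketch below walks down a column of a pixel buffer using a signed stride. The names `column_sum_fn` and `sum_column` are hypothetical and are not part of the vc1dsp code; the point is only that the function pointer type and the function definition agree on `ptrdiff_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function pointer type taking the stride as ptrdiff_t. */
typedef unsigned (*column_sum_fn)(const uint8_t *src, ptrdiff_t stride, int h);

/* Sum the first pixel of h consecutive rows. A negative stride walks
 * the buffer from the bottom row upwards. */
static unsigned sum_column(const uint8_t *src, ptrdiff_t stride, int h)
{
    unsigned sum = 0;
    for (int i = 0; i < h; i++) {
        sum += src[0];
        src += stride;
    }
    return sum;
}
```

If the definition used `int` while the pointer type used `ptrdiff_t`, the mismatch would go unnoticed on targets where the two types are identical, which matches the remark in the snippet about 32-bit arm.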

[FFmpeg-devel] [PATCH 2/2] arm: rv40dsp: Change stride parameters to ptrdiff_t

2022-08-09 Thread Martin Storsjö
These were missed when h264_chroma_mc_func was changed in e4a94d8b36c48d95a7d412c40d7b558422ff659c. Signed-off-by: Martin Storsjö --- libavcodec/arm/rv40dsp_init_arm.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libavcodec/arm/rv40dsp_init_arm.c b/libavcodec/arm

Re: [FFmpeg-devel] [PATCH 1/2] lavc/aarch64: new 8-bit hevc 16x16 idct

2022-08-09 Thread Martin Storsjö
On Thu, 23 Jun 2022, J. Dekker wrote: old: hevc_idct_16x16_8_c: 5366.2 hevc_idct_16x16_8_neon: 1493.2 new: hevc_idct_16x16_8_c: 5363.2 hevc_idct_16x16_8_neon: 943.5 Co-developed-by: Rafal Dabrowa Signed-off-by: J. Dekker --- libavcodec/aarch64/hevcdsp_idct_neon.S| 666

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: add hevc chroma loop filter 8-12bit

2022-08-09 Thread Martin Storsjö
On Thu, 23 Jun 2022, J. Dekker wrote: Signed-off-by: J. Dekker --- libavcodec/aarch64/Makefile | 3 +- libavcodec/aarch64/hevcdsp_deblock_neon.S | 168 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 14 ++ 3 files changed, 184 insertions(+), 1 deletion(-)

Re: [FFmpeg-devel] [PATCH 3/3] lavc/aarch64: hevc_add_res add 12bit variants

2022-08-09 Thread Martin Storsjö
On Tue, 9 Aug 2022, Martin Storsjö wrote: On Thu, 23 Jun 2022, J. Dekker wrote: hevc_add_res_4x4_12_c: 46.0 hevc_add_res_4x4_12_neon: 18.7 hevc_add_res_8x8_12_c: 194.7 hevc_add_res_8x8_12_neon: 25.2 hevc_add_res_16x16_12_c: 716.0 hevc_add_res_16x16_12_neon: 69.7 hevc_add_res_32x32_12_c

Re: [FFmpeg-devel] [PATCH 3/3] lavc/aarch64: hevc_add_res add 12bit variants

2022-08-09 Thread Martin Storsjö
On Thu, 23 Jun 2022, J. Dekker wrote: hevc_add_res_4x4_12_c: 46.0 hevc_add_res_4x4_12_neon: 18.7 hevc_add_res_8x8_12_c: 194.7 hevc_add_res_8x8_12_neon: 25.2 hevc_add_res_16x16_12_c: 716.0 hevc_add_res_16x16_12_neon: 69.7 hevc_add_res_32x32_12_c: 3820.7 hevc_add_res_32x32_12_neon: 261.0

Re: [FFmpeg-devel] [PATCH 2/3] lavc/aarch64: reformat add_res funcs

2022-08-09 Thread Martin Storsjö
On Thu, 23 Jun 2022, J. Dekker wrote: Signed-off-by: J. Dekker --- libavcodec/aarch64/hevcdsp_idct_neon.S | 216 - 1 file changed, 108 insertions(+), 108 deletions(-) LGTM, thanks! // Martin

Re: [FFmpeg-devel] [PATCH 1/3] checkasm/hevc_add_res: add 12bit test

2022-08-09 Thread Martin Storsjö
On Thu, 23 Jun 2022, J. Dekker wrote: Signed-off-by: J. Dekker --- tests/checkasm/hevc_add_res.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/tests/checkasm/hevc_add_res.c b/tests/checkasm/hevc_add_res.c index 0c896adaca..f17d121939 100644 ---

Re: [FFmpeg-devel] [PATCH] checkasm: Silence warnings about unused return value from read()

2022-08-08 Thread Martin Storsjö
On Fri, 5 Aug 2022, Martin Storsjö wrote: On Wed, 27 Jul 2022, Andreas Rheinhardt wrote: Swinney, Jonathan: This patch looks good to me. I would appreciate its merging. } while (0) #define PERF_STOP(t) do { \ +int ret

Re: [FFmpeg-devel] [PATCH] swscale/output: fix reading chroma values when generating vuya output

2022-08-08 Thread Martin Storsjö
On Mon, 8 Aug 2022, James Almer wrote: Signed-off-by: James Almer --- libswscale/output.c | 4 ++-- tests/ref/fate/filter-pixdesc-vuya | 2 +- tests/ref/fate/filter-pixfmts-copy | 2 +- tests/ref/fate/filter-pixfmts-crop | 2 +-
