Re: [FFmpeg-devel] [PATCH 3/3] lavc/aarch64: add hevc sao edge 8x8

2022-04-28 Thread Martin Storsjö
On Thu, 28 Apr 2022, J. Dekker wrote: bench on AWS Graviton: hevc_sao_edge_8x8_8_c: 516.0 hevc_sao_edge_8x8_8_neon: 81.0 Signed-off-by: J. Dekker --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 3 ++ libavcodec/aarch64/hevcdsp_sao_neon.S | 51 +++ 2 files changed, 54

Re: [FFmpeg-devel] [PATCH] avcodec/openh264: return (DE|EN)CODER_NOT_FOUND if version check fails

2022-04-27 Thread Martin Storsjö
On Wed, 20 Apr 2022, Martin Storsjö wrote: On Fri, 18 Feb 2022, Andreas Schneider wrote: Signed-off-by: Andreas Schneider --- libavcodec/libopenh264dec.c | 2 +- libavcodec/libopenh264enc.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/libopenh264dec.c b

Re: [FFmpeg-devel] [PATCH] arm64: Fix wrong BTI landing pad

2022-04-26 Thread Martin Storsjö
On Mon, 25 Apr 2022, Andre Kempe wrote: This patch fixes a wrong type of BTI landing pad when branching to functions instantiated via the fft*_neon macro. Although the previously employed paciasp instruction serves as a landing pad, for the ways that this function is invoked it is the wrong

Re: [FFmpeg-devel] [PATCH v11 1/6] libavutil/wchar_filename.h: Add whcartoutf8, wchartoansi and utf8toansi

2022-04-25 Thread Martin Storsjö
On Mon, 25 Apr 2022, Hendrik Leppkes wrote: On Mon, Apr 25, 2022 at 1:12 PM Soft Works wrote: From my point of view: ffmpeg is already working pretty well in handling long file paths (also with Unicode characters) when pre-fixing paths with \\?\, and this is working on all Windows versions

Re: [FFmpeg-devel] [PATCH] swscale: aarch64: Optimize the final summation in the hscale routine

2022-04-22 Thread Martin Storsjö
On Thu, 21 Apr 2022, Swinney, Jonathan wrote: Thanks for making this improvement. I will rebase my patches on your change. I also measured the performance on AWS Graviton 2 and 3. I added the numbers to your table. Before: Cortex A53 A72 A73 Graviton 2

[FFmpeg-devel] av_fopen_utf8 and cross-DLL CRT object sharing issue on Windows

2022-04-20 Thread Martin Storsjö
Hi, I just became aware of the av_fopen_utf8 function - which was introduced to fix path name translations on Windows - actually has a notable design flaw. Background: On Windows, a process can contain more than one C runtime (CRT); the system comes with two shared ones (UCRT and

Re: [FFmpeg-devel] [PATCH] avcodec/openh264: return (DE|EN)CODER_NOT_FOUND if version check fails

2022-04-20 Thread Martin Storsjö
On Fri, 18 Feb 2022, Andreas Schneider wrote: Signed-off-by: Andreas Schneider --- libavcodec/libopenh264dec.c | 2 +- libavcodec/libopenh264enc.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/libopenh264dec.c b/libavcodec/libopenh264dec.c index

Re: [FFmpeg-devel] [PATCH v9 6/6] fftools: Use UTF-8 on Windows

2022-04-20 Thread Martin Storsjö
On Fri, 15 Apr 2022, Nil Admirari wrote: --- fftools/fftools.manifest | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fftools/fftools.manifest b/fftools/fftools.manifest index 30b7d8fe..d1ac1e4e 100644 --- a/fftools/fftools.manifest +++ b/fftools/fftools.manifest @@ -3,8

Re: [FFmpeg-devel] [PATCH v9 5/6] fftools: Enable long path support on Windows (fixes #8885)

2022-04-20 Thread Martin Storsjö
On Fri, 15 Apr 2022, Nil Admirari wrote: --- fftools/Makefile | 5 + fftools/fftools.manifest | 10 ++ fftools/manifest.rc | 3 +++ 3 files changed, 18 insertions(+) create mode 100644 fftools/fftools.manifest create mode 100644 fftools/manifest.rc I think the change

Re: [FFmpeg-devel] [PATCH v9 4/6] fftools/cmdutils.c: Remove MAX_PATH limit and replace fopen with av_fopen_utf8

2022-04-20 Thread Martin Storsjö
On Fri, 15 Apr 2022, Nil Admirari wrote: --- fftools/cmdutils.c | 38 +- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/fftools/cmdutils.c b/fftools/cmdutils.c index 5d7cdc3e..a66dbb22 100644 --- a/fftools/cmdutils.c +++ b/fftools/cmdutils.c @@

Re: [FFmpeg-devel] [PATCH v9 3/6] compat/w32dlfcn.h: Remove MAX_PATH limit and replace LoadLibraryExA with LoadLibraryExW

2022-04-20 Thread Martin Storsjö
On Fri, 15 Apr 2022, Nil Admirari wrote: --- compat/w32dlfcn.h | 78 ++- 1 file changed, 64 insertions(+), 14 deletions(-) diff --git a/compat/w32dlfcn.h b/compat/w32dlfcn.h index 52a94efa..0f41f50b 100644 --- a/compat/w32dlfcn.h +++

Re: [FFmpeg-devel] [PATCH v9 2/6] libavformat/avisynth.c: Remove MAX_PATH limit

2022-04-20 Thread Martin Storsjö
On Fri, 15 Apr 2022, Nil Admirari wrote: --- libavformat/avisynth.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/libavformat/avisynth.c b/libavformat/avisynth.c index 8ba2bdea..f7bea8c3 100644 --- a/libavformat/avisynth.c +++ b/libavformat/avisynth.c @@ -34,6

Re: [FFmpeg-devel] [PATCH v9 1/6] libavutil/wchar_filename.h: Add whcartoutf8, wchartoansi and utf8toansi

2022-04-20 Thread Martin Storsjö
On Fri, 15 Apr 2022, Nil Admirari wrote: These functions are going to be used in libavformat/avisynth.c and fftools/cmdutils.c to remove MAX_PATH limit. --- libavutil/wchar_filename.h | 51 ++ 1 file changed, 51 insertions(+) I looked through this patchset now,

[FFmpeg-devel] [PATCH] swscale: aarch64: Optimize the final summation in the hscale routine

2022-04-20 Thread Martin Storsjö
, around 3-8% for the smaller filter sizes. Inspired by a patch by Jonathan Swinney . Signed-off-by: Martin Storsjö --- I'll go ahead and apply this patch within a few days if there's no opposition, as it should be a fairly uncontroversial change. --- libswscale/aarch64/hscale.S | 14 +++--- 1

Re: [FFmpeg-devel] [PATCH 1/2] swscale/aarch64: add hscale specializations

2022-04-20 Thread Martin Storsjö
On Sun, 17 Apr 2022, Martin Storsjö wrote: On Fri, 15 Apr 2022, Swinney, Jonathan wrote: This patch adds specializations for hscale for filterSize == 4 and 8 and converts the existing implementation for the X8 version. For the old code, now used for the X8 version, it improves the efficiency

Re: [FFmpeg-devel] [PATCH 2/2] swscale/aarch64: add vscale specializations

2022-04-19 Thread Martin Storsjö
On Fri, 15 Apr 2022, Swinney, Jonathan wrote: This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance. | (seconds) | c6g | | | | | - | - | - | |

Re: [FFmpeg-devel] [PATCH 1/1] librtmp: use AVBPrint instead of char *

2022-04-19 Thread Martin Storsjö
On Tue, 19 Apr 2022, Marton Balint wrote: On Sat, 16 Apr 2022, Martin Storsjö wrote: On Fri, 15 Apr 2022, Tristan Matthews wrote: This avoids having to do one pass to calculate the full length to allocate followed by a second pass to actually append values. --- libavformat/librtmp.c

Re: [FFmpeg-devel] [FFmpeg-cvslog] doc: install css files along html docs

2022-04-19 Thread Martin Storsjö
On Mon, 18 Apr 2022, Timo Rothenpieler wrote: ffmpeg | branch: master | Timo Rothenpieler | Thu Apr 7 20:11:24 2022 +0200| [d5687236aba6fd31dd4369c290df9a5b1192e43e] | committer: Timo Rothenpieler doc: install css files along html docs

Re: [FFmpeg-devel] [PATCH 2/2] swscale/aarch64: add vscale specializations

2022-04-16 Thread Martin Storsjö
On Fri, 15 Apr 2022, Swinney, Jonathan wrote: This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance. | (seconds) | c6g | | | | | - | - | - | |

Re: [FFmpeg-devel] [PATCH 1/2] swscale/aarch64: add hscale specializations

2022-04-16 Thread Martin Storsjö
On Fri, 15 Apr 2022, Swinney, Jonathan wrote: This patch adds specializations for hscale for filterSize == 4 and 8 and converts the existing implementation for the X8 version. For the old code, now used for the X8 version, it improves the efficiency of the final summations by reducing 11

Re: [FFmpeg-devel] [PATCH v2 0/1] lavc/aarch64: add some neon pix_abs functions

2022-04-16 Thread Martin Storsjö
On Fri, 15 Apr 2022, Martin Storsjö wrote: On Thu, 14 Apr 2022, Swinney, Jonathan wrote: Thanks Martin for the review. I made some updates according to the suggestions you made. I added a checkasm function, but I'm new to the test framework, so it may need some work still. Thanks

Re: [FFmpeg-devel] [PATCH 1/1] librtmp: use AVBPrint instead of char *

2022-04-16 Thread Martin Storsjö
On Fri, 15 Apr 2022, Tristan Matthews wrote: This avoids having to do one pass to calculate the full length to allocate followed by a second pass to actually append values. --- libavformat/librtmp.c | 124 +++--- 1 file changed, 33 insertions(+), 91

Re: [FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions

2022-04-15 Thread Martin Storsjö
On Thu, 14 Apr 2022, Swinney, Jonathan wrote: - ff_pix_abs16_neon - ff_pix_abs16_xy2_neon In direct micro benchmarks of these ff functions versus their C implementations, these functions performed as follows on AWS Graviton 2: ff_pix_abs16_neon: c: benchmark ran 10 iterations in 0.955383

Re: [FFmpeg-devel] [PATCH v2 0/1] lavc/aarch64: add some neon pix_abs functions

2022-04-15 Thread Martin Storsjö
On Thu, 14 Apr 2022, Swinney, Jonathan wrote: Thanks Martin for the review. I made some updates according to the suggestions you made. I added a checkasm function, but I'm new to the test framework, so it may need some work still. Thanks for putting in the effort to make a test - that

Re: [FFmpeg-devel] [PATCH v1] avformat/ipfsgateway: define PATH_MAX

2022-04-14 Thread Martin Storsjö
On Thu, 14 Apr 2022, Mark Gaiser wrote: On Thu, Apr 14, 2022 at 10:25 AM Martin Storsjö wrote: On Wed, 13 Apr 2022, Mark Gaiser wrote: > On Wed, Apr 13, 2022 at 5:21 PM Mark Gaiser wrote: > >> PATH_MAX is posix. Some compilers (MSVC) don't define this >> thus

Re: [FFmpeg-devel] [PATCH v1] avformat/ipfsgateway: define PATH_MAX

2022-04-14 Thread Martin Storsjö
On Wed, 13 Apr 2022, Mark Gaiser wrote: On Wed, Apr 13, 2022 at 5:21 PM Mark Gaiser wrote: PATH_MAX is posix. Some compilers (MSVC) don't define this thus failing to compile the ipfsgateway file. Defining it fixes the compile. Signed-off-by: Mark Gaiser --- libavformat/ipfsgateway.c | 6
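For readers unfamiliar with the problem described in this snippet: a common way to cope with a toolchain that does not provide PATH_MAX is a guarded fallback definition. The sketch below only illustrates that general pattern. It is not taken from the patch, and the fallback value 4096 is an assumption chosen for the example.

```c
#include <limits.h>   /* defines PATH_MAX on many POSIX systems */

/* Fallback for toolchains that do not define PATH_MAX.
 * The value 4096 is an assumption for this sketch, not the patch's value. */
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical buffer sized with the (possibly fallback) constant. */
static char example_path_buffer[PATH_MAX];
```

A fixed-size buffer like this still limits path length, so dynamically sized strings may be preferable where that matters.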

Re: [FFmpeg-devel] [PATCH 1/1] librtmp: use AVBPrint instead of char *

2022-04-13 Thread Martin Storsjö
On Wed, 13 Apr 2022, Marton Balint wrote: On Wed, 13 Apr 2022, Martin Storsjö wrote: On Mon, 11 Apr 2022, Tristan Matthews wrote: This avoids having to do one pass to calculate the full length to allocate followed by a second pass to actually append values. --- libavformat/librtmp.c

Re: [FFmpeg-devel] [PATCH 1/1] librtmp: use AVBPrint instead of char *

2022-04-13 Thread Martin Storsjö
On Mon, 11 Apr 2022, Tristan Matthews wrote: This avoids having to do one pass to calculate the full length to allocate followed by a second pass to actually append values. --- libavformat/librtmp.c | 123 +++--- 1 file changed, 32 insertions(+), 91

Re: [FFmpeg-devel] [PATCH 4/4] fate/oma: Use REMUX where appropriate

2022-04-13 Thread Martin Storsjö
On Tue, 12 Apr 2022, Andreas Rheinhardt wrote: Simplifies the checks. Signed-off-by: Andreas Rheinhardt --- tests/fate/oma.mak | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/tests/fate/oma.mak b/tests/fate/oma.mak index a088feff21..7e2020b7d0 100644 ---

Re: [FFmpeg-devel] [PATCH 3/4] fate/subtitles: Use REMUX where appropriate

2022-04-13 Thread Martin Storsjö
On Tue, 12 Apr 2022, Andreas Rheinhardt wrote: It also adds the missing dependencies on the file and pipe protocols and the framecrc muxer. Signed-off-by: Andreas Rheinhardt --- tests/fate/subtitles.mak | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/fate/subtitles.mak

Re: [FFmpeg-devel] [PATCH 2/4] fate/image: Use TRANSCODE where appropriate

2022-04-13 Thread Martin Storsjö
On Tue, 12 Apr 2022, Andreas Rheinhardt wrote: This also adds previously forgotten requirements. E.g. fate-jpg-icc actually depends on the png decoder, so that it should not be run when e.g. zlib is disabled, yet it happens, see

Re: [FFmpeg-devel] [PATCH 1/4] tests/Makefile: Add auxiliary functions for transcode and stream_remux

2022-04-13 Thread Martin Storsjö
On Tue, 12 Apr 2022, Andreas Rheinhardt wrote: Tests using the transcode and stream_remux functions have some common requirements (namely the file and pipe protocols as well as the framecrc muxer) and also other commonalities: They create a file and read it immediately afterwards, so that they

Re: [FFmpeg-devel] [PATCH v3 00/10] avcodec/vc1: Arm optimisations

2022-04-01 Thread Martin Storsjö
On Fri, 1 Apr 2022, Martin Storsjö wrote: On Thu, 31 Mar 2022, Ben Avison wrote: The VC1 decoder was missing lots of important fast paths for Arm, especially for 64-bit Arm. This submission fills in implementations for all functions where a fast path already existed and the fallback C

Re: [FFmpeg-devel] [PATCH v3 00/10] avcodec/vc1: Arm optimisations

2022-03-31 Thread Martin Storsjö
On Thu, 31 Mar 2022, Ben Avison wrote: The VC1 decoder was missing lots of important fast paths for Arm, especially for 64-bit Arm. This submission fills in implementations for all functions where a fast path already existed and the fallback C implementation was taking 1% or more of the

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-31 Thread Martin Storsjö
On Thu, 31 Mar 2022, Ben Avison wrote: On 30/03/2022 15:14, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: +// Clamp 16-bit signed block coefficients to signed 8-bit (biased by 128) +// On entry: +//   x0 -> array of 64x 16-bit coefficients +//   x1 -> 8-bit results +/

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-31 Thread Martin Storsjö
On Thu, 31 Mar 2022, Ben Avison wrote: On 30/03/2022 14:49, Martin Storsjö wrote: Looks generally reasonable. Is it possible to factorize out the individual transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 4x4 functions) without too much loss? There is a close

Re: [FFmpeg-devel] [PATCH 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths

2022-03-31 Thread Martin Storsjö
On Thu, 31 Mar 2022, Ben Avison wrote: On 30/03/2022 13:35, Martin Storsjö wrote: Overall, the code looks sensible to me. Would it make sense to share the core of the filter between the horizontal/vertical cases with e.g. a macro? (I didn't check in detail if there's much differences

Re: [FFmpeg-devel] [PATCH 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-31 Thread Martin Storsjö
On Thu, 31 Mar 2022, Ben Avison wrote: On 29/03/2022 21:37, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: As with the rest of the checkasm tests - please unmacro most things where possible (except for the RANDOMIZE_* macros, those are ok to keep macroed if you want

Re: [FFmpeg-devel] [PATCH 10/10] avcodec/vc1: Arm 32-bit NEON unescape fast path

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_unescape_buffer_c: 918624.7 vc1dsp.vc1_unescape_buffer_neon: 142958.0 Signed-off-by: Ben Avison --- libavcodec/arm/vc1dsp_init_neon.c | 61 +++ libavcodec/arm/vc1dsp_neon.S

Re: [FFmpeg-devel] [PATCH 09/10] avcodec/vc1: Arm 64-bit NEON unescape fast path

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_unescape_buffer_c: 655617.7 vc1dsp.vc1_unescape_buffer_neon: 118237.0 Signed-off-by: Ben Avison --- libavcodec/aarch64/vc1dsp_init_aarch64.c | 61

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. idctdsp.add_pixels_clamped_c: 323.0 idctdsp.add_pixels_clamped_neon: 41.5 idctdsp.put_pixels_clamped_c: 243.0 idctdsp.put_pixels_clamped_neon: 30.0 idctdsp.put_signed_pixels_clamped_c: 225.7

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-30 Thread Martin Storsjö
On Wed, 30 Mar 2022, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_inv_trans_4x4_c: 158.2 vc1dsp.vc1_inv_trans_4x4_neon: 65.7 vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5 vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_inv_trans_4x4_c: 158.2 vc1dsp.vc1_inv_trans_4x4_neon: 65.7 vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5 vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5 vc1dsp.vc1_inv_trans_4x8_c: 335.2

Re: [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C version can still outperform the NEON version in specific cases. The balance between different code paths is stream-dependent, but in practice the best case happens about 5% of the

Re: [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C version can still outperform the NEON version in specific cases. The balance between different code paths is stream-dependent, but in practice the best case happens about 5% of the

Re: [FFmpeg-devel] [PATCH 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C version can still outperform the NEON version in specific cases. The balance between different code paths is stream-dependent, but in practice the best case happens about 5% of the

Re: [FFmpeg-devel] [PATCH] test: tiny_ssim: Don't include config.h

2022-03-30 Thread Martin Storsjö
On Sun, 27 Mar 2022, Martin Storsjö wrote: tiny_ssim is built for the build host, not for the target platform. Therefore, it mustn't include the config.h header, which is set up specifically for the target platform and compiler. This fixes cross building for older WinStore platforms, where

Re: [FFmpeg-devel] [PATCH] vc1dsp: Change remaining stride parameters to ptrdiff_t

2022-03-30 Thread Martin Storsjö
On Tue, 29 Mar 2022, Ben Avison wrote: On 29/03/2022 13:44, Martin Storsjö wrote: The existing x86 assembly for loop filters uses the stride as a full register without clearing/sign extending the upper half of the registers on x86_64. This avoids crashes if the caller would have passed
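As background to the stride discussion in this snippet: when a stride is declared as a 32-bit int, assembly that reads the whole 64-bit register may see unspecified upper bits, whereas a ptrdiff_t parameter is pointer-width and signed. The C sketch below only illustrates the signature style. The helper name sum_column is hypothetical and is not part of vc1dsp.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the first pixel of each of `rows` rows.
 * `stride` is ptrdiff_t, so it is as wide as a pointer and may be
 * negative (for example when walking an image bottom-up). */
static int sum_column(const uint8_t *src, ptrdiff_t stride, int rows)
{
    int sum = 0;
    for (ptrdiff_t i = 0; i < rows; i++)
        sum += src[i * stride];   /* signed, pointer-width index arithmetic */
    return sum;
}
```

With a negative stride the caller passes a pointer to the last row, and the index arithmetic stays within the buffer.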

[FFmpeg-devel] [PATCH v2] vc1dsp: Change remaining stride parameters to ptrdiff_t

2022-03-29 Thread Martin Storsjö
Signed-off-by: Martin Storsjö --- Updated function signatures in the mips code too, updated the left_stride/right_stride parameters in the vc1_h_s_overlap function too, updated the comments in the x86 assembly. --- libavcodec/mips/vc1dsp_mips.h| 20 ++-- libavcodec/mips/vc1dsp_mmi.c

Re: [FFmpeg-devel] [PATCH 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: void ff_vc1dsp_init(VC1DSPContext* c); diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c index 0823ccad31..0ab5892403 100644 --- a/tests/checkasm/vc1dsp.c +++ b/tests/checkasm/vc1dsp.c @@ -286,6 +286,20 @@ static matrix

Re: [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

2022-03-29 Thread Martin Storsjö
On Tue, 29 Mar 2022, Ben Avison wrote: Thirdly - the added test also occasionally fails for the other existing functions (armv6, neon) and the newly added aarch64 neon version. If you have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition will overflow, as there's no operation
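The overflow mentioned in this snippet can be reproduced in plain C. The sketch below is a scalar model written for illustration, not the NEON or armv6 code under discussion, and both function names are hypothetical. It compares an add-and-clamp done in a wide type with one whose 16-bit sum wraps around before clamping.

```c
#include <stdint.h>

/* Reference behaviour: add in a wide type, then clamp to 0..255. */
static uint8_t add_clamp_ref(int16_t coeff, uint8_t pixel)
{
    int sum = (int)coeff + (int)pixel;       /* cannot overflow in int */
    if (sum < 0)   return 0;
    if (sum > 255) return 255;
    return (uint8_t)sum;
}

/* Model of a 16-bit lane: the pixel is widened to 16 bits and the
 * addition wraps modulo 2^16 before the result is clamped. */
static uint8_t add_clamp_wrap16(int16_t coeff, uint8_t pixel)
{
    uint16_t wrapped = (uint16_t)((uint16_t)coeff + (uint16_t)pixel);
    /* Reinterpret the 16-bit pattern as a signed value. */
    int sum = wrapped >= 0x8000 ? (int)wrapped - 0x10000 : (int)wrapped;
    if (sum < 0)   return 0;
    if (sum > 255) return 255;
    return (uint8_t)sum;
}
```

For coeff = 32767 and pixel = 255 the reference returns 255, while the wrapping model returns 0 because 33022 does not fit in a signed 16-bit lane. For small inputs the two agree.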

Re: [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

2022-03-29 Thread Martin Storsjö
On Tue, 29 Mar 2022, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely that anyone is still using this, so I haven't put

Re: [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely that anyone is still using this, so I haven't put in the effort to debug it. I had a look at this

[FFmpeg-devel] [PATCH] vc1dsp: Change remaining stride parameters to ptrdiff_t

2022-03-29 Thread Martin Storsjö
Signed-off-by: Martin Storsjö --- libavcodec/vc1dsp.c | 20 ++-- libavcodec/vc1dsp.h | 16 libavcodec/x86/vc1dsp_init.c | 16 3 files changed, 26 insertions(+), 26 deletions(-) diff --git a/libavcodec/vc1dsp.c b/libavcodec/vc1dsp.c index

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Note that the benchmarking results for these functions are highly dependent upon the input data. Therefore, each function is benchmarked twice, corresponding to the best and worst case complexity of the reference C implementation. The performance of a real

Re: [FFmpeg-devel] [PATCH 02/10] checkasm: Add vc1dsp inverse transform tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: This test deliberately doesn't exercise the full range of inputs described in the committee draft VC-1 standard. It says: input coefficients in frequency domain, D, satisfy -2048 <= D < 2047 intermediate coefficients, E, satisfy -4096 <=

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Note that the benchmarking results for these functions are highly dependent upon the input data. Therefore, each function is benchmarked twice, corresponding to the best and worst case complexity of the reference C implementation. The performance of a real

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-29 Thread Martin Storsjö
On Mon, 28 Mar 2022, Ben Avison wrote: On 25/03/2022 22:53, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: +#define CHECK_LOOP_FILTER(func) \ +    do

[FFmpeg-devel] [PATCH] test: tiny_ssim: Don't include config.h

2022-03-26 Thread Martin Storsjö
NULL". Signed-off-by: Martin Storsjö --- tests/tiny_ssim.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tests/tiny_ssim.c b/tests/tiny_ssim.c index 08f8e92a03..9740652288 100644 --- a/tests/tiny_ssim.c +++ b/tests/tiny_ssim.c @@ -27,7 +27,6 @@ * overlapped 8x8 block sums, rather than th

Re: [FFmpeg-devel] [GAS-PP PATCH] Handle the aarch64 tbnz intruction in the same way as tbz, for armasm64

2022-03-25 Thread Martin Storsjö
On Mon, 21 Mar 2022, Martin Storsjö wrote: --- I'll apply in a couple days if there's no comments. --- gas-preprocessor.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Pushed. // Martin

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-25 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Note that the benchmarking results for these functions are highly dependent upon the input data. Therefore, each function is benchmarked twice, corresponding to the best and worst case complexity of the reference C implementation. The performance of a real

Re: [FFmpeg-devel] [PATCH] rtpenc_vp8: Use 15-bit PictureIDs

2022-03-25 Thread Martin Storsjö
On Tue, 22 Mar 2022, ke...@muxable.com wrote: From: Kevin Wang 7-bit PictureIDs are not supported by WebRTC: https://groups.google.com/g/discuss-webrtc/c/333-L02vuWA In practice, 15-bit PictureIDs offer better compatibility. Signed-off-by: Kevin Wang --- libavformat/rtpenc_vp8.c | 3 ++- 1

Re: [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

2022-03-25 Thread Martin Storsjö
On Fri, 25 Mar 2022, Lynne wrote: 25 Mar 2022, 19:52 by bavi...@riscosopen.org: +@ VC-1 in-loop deblocking filter for 4 pixel pairs at boundary of vertically-neighbouring blocks +@ On entry: +@ r0 -> top-left pel of lower block +@ r1 = row stride, bytes +@ r2 = PQUANT bitstream

Re: [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations

2022-03-21 Thread Martin Storsjö
On Mon, 21 Mar 2022, Ben Avison wrote: On 19/03/2022 23:06, Martin Storsjö wrote: As you are writing assembly for these functions, I would very much appreciate if you could add checkasm tests for all the functions you're implementing. I see that there exists a test for the blockdsp functions

Re: [FFmpeg-devel] [PATCH 6/6] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-21 Thread Martin Storsjö
On Mon, 21 Mar 2022, Ben Avison wrote: On 18/03/2022 19:10, Andreas Rheinhardt wrote: Ben Avison: +static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst) +{ +/* Dealing with starting and stopping, and removing escape bytes, are + * comparatively less

[FFmpeg-devel] [GAS-PP PATCH] Handle the aarch64 tbnz intruction in the same way as tbz, for armasm64

2022-03-21 Thread Martin Storsjö
--- I'll apply in a couple days if there's no comments. --- gas-preprocessor.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl index 67b130e..59c93c1 100755 --- a/gas-preprocessor.pl +++ b/gas-preprocessor.pl @@ -943,7 +943,7 @@

Re: [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations

2022-03-19 Thread Martin Storsjö
On Sun, 20 Mar 2022, Martin Storsjö wrote: The other main issue I'd like to request is to indent the assembly similarly to the rest of the existing assembly. For the 32 bit assembly, your patches do match the surrounding code, but for the 64 bit assembly, your patches align the operands

Re: [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations

2022-03-19 Thread Martin Storsjö
Hi Ben, On Thu, 17 Mar 2022, Ben Avison wrote: The VC1 decoder was missing lots of important fast paths for Arm, especially for 64-bit Arm. This submission fills in implementations for all functions where a fast path already existed and the fallback C implementation was taking 1% or more of

Re: [FFmpeg-devel] [PATCH] Keep including the full version.h when headers are included externally

2022-03-18 Thread Martin Storsjö
On Fri, 18 Mar 2022, Martin Storsjö wrote: This avoids unnecessary churn and build breakage for users, by making sure the whole version.h is included like it has been so far, while keeping the benefit of not needing to rebuild most files in the ffmpeg tree on minor/micro bumps. --- Surprisingly

[FFmpeg-devel] [PATCH] Keep including the full version.h when headers are included externally

2022-03-18 Thread Martin Storsjö
This avoids unnecessary churn and build breakage for users, by making sure the whole version.h is included like it has been so far, while keeping the benefit of not needing to rebuild most files in the ffmpeg tree on minor/micro bumps. --- Surprisingly many downstream users do seem to rely on the

Re: [FFmpeg-devel] [PATCH 3/3] gitignore: add config_components.h

2022-03-17 Thread Martin Storsjö
On Thu, 17 Mar 2022, James Almer wrote: Signed-off-by: James Almer --- .gitignore | 1 + 1 file changed, 1 insertion(+) All three LGTM - thanks, and sorry for missing these! // Martin

Re: [FFmpeg-devel] [PATCH] Fix libversion.sh for split headers

2022-03-17 Thread Martin Storsjö
On Wed, 16 Mar 2022, Martin Storsjö wrote: --- The extra dummy version_major.h isn't pretty though, but needed (I think?) to fulfill the make dependency. --- ffbuild/library.mak | 4 ++-- ffbuild/libversion.sh | 4 libavutil/version_major.h | 25 + 3 files

[FFmpeg-devel] [PATCH] Fix libversion.sh for split headers

2022-03-16 Thread Martin Storsjö
--- The extra dummy version_major.h isn't pretty though, but needed (I think?) to fulfill the make dependency. --- ffbuild/library.mak | 4 ++-- ffbuild/libversion.sh | 4 libavutil/version_major.h | 25 + 3 files changed, 31 insertions(+), 2 deletions(-)

Re: [FFmpeg-devel] [PATCH] avutil/attributes: add support for clang in AV_NOWARN_DEPRECATED

2022-03-16 Thread Martin Storsjö
On Wed, 16 Mar 2022, James Almer wrote: Signed-off-by: James Almer --- libavutil/attributes.h | 2 +- libavutil/version.h| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libavutil/attributes.h b/libavutil/attributes.h index 5cb9fe3452..04c615c952 100644 ---

Re: [FFmpeg-devel] [PATCH 00/13] [RFC] Reduce unnecessary recompilation

2022-03-16 Thread Martin Storsjö
On Mon, 14 Mar 2022, Michael Niedermayer wrote: On Fri, Mar 11, 2022 at 02:17:42PM +0200, Martin Storsjö wrote: On Wed, 23 Feb 2022, Martin Storsjö wrote: When updating the ffmpeg source, one quite often ends up in a situation where practically all of the codebase (or all of a library) gets

Re: [FFmpeg-devel] [PATCH] aarch64: Only emit the PAC/BTI note section when targeting ELF

2022-03-14 Thread Martin Storsjö
On Wed, 9 Mar 2022, Martin Storsjö wrote: This avoids build errors if such features are enabled while targeting another binary format. (Using such features on other platforms might require some other form of signaling/setup though, but the ELF specific .note section isn't applicable at least

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: add some neon pix_abs functions

2022-03-14 Thread Martin Storsjö
On Mon, 7 Mar 2022, Pop, Sebastian wrote: Here are a few suggestions: +add d18, d17, d18 // add to the end result register [...] +mov w0, v18.S[0]// copy result to general purpose register I think you can use 32-bit register s18 instead

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: add some neon pix_abs functions

2022-03-14 Thread Martin Storsjö
On Mon, 7 Mar 2022, Swinney, Jonathan wrote: - ff_pix_abs16_neon - ff_pix_abs16_xy2_neon In direct micro benchmarks of these ff functions versus their C implementations, these functions performed as follows on AWS Graviton 2: ff_pix_abs16_neon: c: benchmark ran 10 iterations in 0.955383

Re: [FFmpeg-devel] [PATCH v2 1/9] libavcodec: Split version.h

2022-03-12 Thread Martin Storsjö
On Fri, 11 Mar 2022, Martin Storsjö wrote: This avoids including version.h in all source files, avoiding unnecessary rebuilds when the version number is bumped. Only version_major.h is included by the main header, which defines availability of e.g. FF_API_* macros, and which is bumped much less

Re: [FFmpeg-devel] [PATCH] movenc: Use LIBAVFORMAT_IDENT instead of LIBAVCODEC_IDENT

2022-03-12 Thread Martin Storsjö
On Sat, 12 Mar 2022, James Almer wrote: On 3/11/2022 11:23 AM, Martin Storsjö wrote: The muxer seems to have had one seemingly accidental use of LIBAVCODEC_IDENT, while LIBAVFORMAT_IDENT probably is the relevant one (which is used multiple times in the same file). Signed-off-by: Martin

[FFmpeg-devel] [PATCH] movenc: Use LIBAVFORMAT_IDENT instead of LIBAVCODEC_IDENT

2022-03-11 Thread Martin Storsjö
The muxer seems to have had one seemingly accidental use of LIBAVCODEC_IDENT, while LIBAVFORMAT_IDENT probably is the relevant one (which is used multiple times in the same file). Signed-off-by: Martin Storsjö --- libavformat/movenc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff

[FFmpeg-devel] [PATCH v3 9/9] configure: Use a separate config_components.h header for $ALL_COMPONENTS

2022-03-11 Thread Martin Storsjö
This avoids unnecessary rebuilds of most source files if only the list of enabled components has changed, but not the other properties of the build, set in config.h. --- Patchwork notified me that the previous round failed building libavdevice/alsa.c due to missing an include of the new header. I

[FFmpeg-devel] [PATCH v2 9/9] configure: Use a separate config_components.h header for $ALL_COMPONENTS

2022-03-11 Thread Martin Storsjö
This avoids unnecessary rebuilds of most source files if only the list of enabled components has changed, but not the other properties of the build, set in config.h. --- configure | 17 +++-- fftools/ffplay.c | 1 + libavcodec/8svx.c

[FFmpeg-devel] [PATCH v2 8/9] doc: Add an entry to APIchanges about no longer implicitly including version.h

2022-03-11 Thread Martin Storsjö
--- doc/APIchanges | 6 ++ 1 file changed, 6 insertions(+) diff --git a/doc/APIchanges b/doc/APIchanges index ccc4f24b28..365a9747c9 100644 --- a/doc/APIchanges +++ b/doc/APIchanges @@ -14,6 +14,12 @@ libavutil: 2021-04-27 API changes, most recent first: +2022-*-* - xx - all

[FFmpeg-devel] [PATCH v2 7/9] libavfilter: Split version.h

2022-03-11 Thread Martin Storsjö
--- fftools/cmdutils.c | 1 + fftools/ffprobe.c | 1 + libavfilter/Makefile| 1 + libavfilter/avfilter.c | 1 + libavfilter/avfilter.h | 2 +- libavfilter/version.h | 13 ++-- libavfilter/version_major.h | 42

[FFmpeg-devel] [PATCH v2 6/9] libswscale: Split version.h

2022-03-11 Thread Martin Storsjö
--- fftools/cmdutils.c | 1 + fftools/ffprobe.c | 1 + libswscale/Makefile| 1 + libswscale/swscale.h | 2 +- libswscale/utils.c | 1 + libswscale/version.h | 9 ++--- libswscale/version_major.h | 35 +++ 7

[FFmpeg-devel] [PATCH v2 5/9] libswresample: Split version.h

2022-03-11 Thread Martin Storsjö
--- fftools/cmdutils.c| 1 + fftools/ffprobe.c | 1 + libswresample/Makefile| 1 + libswresample/swresample.c| 1 + libswresample/swresample.h| 2 +- libswresample/version.h | 3 ++- libswresample/version_major.h | 31

[FFmpeg-devel] [PATCH v2 4/9] libpostproc: Split version.h

2022-03-11 Thread Martin Storsjö
--- fftools/cmdutils.c | 1 + fftools/ffprobe.c | 1 + libpostproc/Makefile| 1 + libpostproc/postprocess.c | 1 + libpostproc/postprocess.h | 2 +- libpostproc/version.h | 3 ++- libpostproc/version_major.h | 31 +++ 7 files

[FFmpeg-devel] [PATCH v2 3/9] libavdevice: Split version.h

2022-03-11 Thread Martin Storsjö
--- fftools/cmdutils.c | 1 + fftools/ffprobe.c | 1 + libavdevice/Makefile| 1 + libavdevice/avdevice.c | 1 + libavdevice/avdevice.h | 2 +- libavdevice/version.h | 10 ++ libavdevice/version_major.h | 37

[FFmpeg-devel] [PATCH v2 2/9] libavformat: Split version.h

2022-03-11 Thread Martin Storsjö
--- fftools/cmdutils.c| 1 + fftools/ffprobe.c | 1 + libavdevice/pulse_audio_dec.c | 1 + libavdevice/pulse_audio_enc.c | 1 + libavformat/Makefile | 1 + libavformat/avformat.h| 2 +- libavformat/avio.h| 2 +- libavformat/flacenc.c

[FFmpeg-devel] [PATCH v2 1/9] libavcodec: Split version.h

2022-03-11 Thread Martin Storsjö
This avoids including version.h in all source files, avoiding unnecessary rebuilds when the version number is bumped. Only version_major.h is included by the main header, which defines availability of e.g. FF_API_* macros, and which is bumped much less often. --- fftools/cmdutils.c |

Re: [FFmpeg-devel] [PATCH 00/13] [RFC] Reduce unnecessary recompilation

2022-03-11 Thread Martin Storsjö
On Wed, 23 Feb 2022, Martin Storsjö wrote: When updating the ffmpeg source, one quite often ends up in a situation where practically all of the codebase (or all of a library) gets rebuilt, due to updates to headers that are included in most files. In some cases, full rebuilds are warranted

Re: [FFmpeg-devel] [PATCH] configure: move ranlib -D test after setting defaults

2022-03-10 Thread Martin Storsjö
On Mon, 14 Feb 2022, Adrian Ratiu wrote: In Gentoo and ChromeOS we want to allow pure LLVM builds without using GNU tools, so we block any unwanted mixed GNU/LLVM usages (GNU tools are still kept around in our chroots for projects like glibc which cannot yet be built otherwise). The default

Re: [FFmpeg-devel] [PATCH] arm64: Add Armv8.3-A PAC support to assembly files

2022-03-09 Thread Martin Storsjö
On Tue, 22 Feb 2022, Martin Storsjö wrote: On Mon, 14 Feb 2022, Andre Kempe wrote: This patch adds optional support for Arm Pointer Authentication Codes. PAC support is turned on or off at compile time using additional compiler flags. Unless any of these is enabled explicitly, no additional

[FFmpeg-devel] [PATCH] aarch64: Only emit the PAC/BTI note section when targeting ELF

2022-03-09 Thread Martin Storsjö
This avoids build errors if such features are enabled while targeting another binary format. (Using such features on other platforms might require some other form of signaling/setup though, but the ELF specific .note section isn't applicable at least.) Signed-off-by: Martin Storsjö

Re: [FFmpeg-devel] [PATCH] libavfilter: vf_scale: Properly take in->color_range into account

2022-03-06 Thread Martin Storsjö
On Sun, 6 Mar 2022, Michael Niedermayer wrote: On Sat, Mar 05, 2022 at 11:33:15PM +0200, Martin Storsjö wrote: On Fri, 4 Mar 2022, Michael Niedermayer wrote: On Thu, Mar 03, 2022 at 02:06:45PM +0200, Martin Storsjö wrote: While swscale can be reconfigured with sws_setColorspaceDetails

Re: [FFmpeg-devel] [PATCH] avfilter/vf_colorlevels: Fix build failure on ARM

2022-03-06 Thread Martin Storsjö
On Sun, 6 Mar 2022, Michael Niedermayer wrote: Signed-off-by: Michael Niedermayer --- libavfilter/vf_colorlevels.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) LGTM (Sorry I didn't check the ML before sending my patch. Maybe mention the commit that introduced the

[FFmpeg-devel] [PATCH] avfilter/vf_colorlevels: Fix building for arm

2022-03-06 Thread Martin Storsjö
This fixes building for arm after 10c2ef1ca41dbe7811f0588f4163c8cf7b8fda66. The argument to av_clip_uintp2 must be an assembly time immediate constant. Signed-off-by: Martin Storsjö --- libavfilter/vf_colorlevels.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff

Re: [FFmpeg-devel] [PATCH] libavfilter: vf_scale: Properly take in->color_range into account

2022-03-05 Thread Martin Storsjö
On Fri, 4 Mar 2022, Michael Niedermayer wrote: On Thu, Mar 03, 2022 at 02:06:45PM +0200, Martin Storsjö wrote: While swscale can be reconfigured with sws_setColorspaceDetails, the in/out ranges also need to be set before calling sws_init_context, otherwise the initialization might choose

Re: [FFmpeg-devel] [PATCH] avcodec/dnxhdenc: retry increasing qscale to not overflow max_bits

2022-03-05 Thread Martin Storsjö
On Sat, 5 Mar 2022, Paul B Mahol wrote: Increase mb_bits type from uint16_t to uint32_t to fix possible overflows in bit size calculations. Update fate test that needs change. Signed-off-by: Paul B Mahol --- libavcodec/dnxhdenc.c | 8 +--- libavcodec/dnxhdenc.h | 2 +-
