On Fri, 14 Jun 2024, Zhao Zhili wrote:
On Jun 13, 2024, at 20:54, Martin Storsjö wrote:
On Fri, 7 Jun 2024, Martin Storsjö wrote:
The default timer register pmccntr_el0 usually requires enabling
access with e.g. a kernel module.
---
cntvct_el0 has significantly better resolution than
On Fri, 14 Jun 2024, Gyan Doshi wrote:
On 2024-06-14 02:18 am, Martin Storsjö wrote:
On Thu, 13 Jun 2024, Gyan Doshi wrote:
On 2024-06-13 06:20 pm, Martin Storsjö wrote:
I'd otherwise want to push this, but I'm not entirely satisfied with
the option name quite yet. I'm
On Thu, 13 Jun 2024, Gyan Doshi wrote:
On 2024-06-13 06:20 pm, Martin Storsjö wrote:
On Wed, 5 Jun 2024, Martin Storsjö wrote:
This allows ending up with a normal, non-fragmented file when
the file is finished, while keeping the file readable if writing
is aborted abruptly at any point
On Fri, 7 Jun 2024, Martin Storsjö wrote:
The default timer register pmccntr_el0 usually requires enabling
access with e.g. a kernel module.
---
cntvct_el0 has significantly better resolution than
av_gettime_relative (while the unscaled nanosecond output of
clock_gettime is much higher
On Wed, 5 Jun 2024, Martin Storsjö wrote:
This allows ending up with a normal, non-fragmented file when
the file is finished, while keeping the file readable if writing
is aborted abruptly at any point. (Normally when writing a
mov/mp4 file, the unfinished file is completely useless unless it
On Fri, 7 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
Test on Apple M1:
rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0
On Fri, 7 Jun 2024, Ramiro Polla wrote:
chrRangeFromJpeg_8_c: 28.5
chrRangeFromJpeg_8_neon: 21.2
chrRangeFromJpeg_24_c: 81.2
chrRangeFromJpeg_24_neon: 34.7
chrRangeFromJpeg_128_c: 425.2
chrRangeFromJpeg_128_neon: 162.0
chrRangeFromJpeg_144_c: 480.2
chrRangeFromJpeg_144_neon: 180.2
chrRangeFromJp
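For context on what the chrRangeFromJpeg benchmark measures: it maps chroma samples from full ("JPEG") range to limited ("MPEG") range. The sketch below uses the textbook 8-bit mapping c' = 128 + (c - 128) * 224 / 255 with integer rounding to nearest. swscale itself operates on 15-bit int16_t intermediates with its own fixed-point constants, which are not reproduced here, and the function name chroma_from_jpeg_8 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: full-range (0..255) to limited-range (16..240)
 * chroma for 8-bit samples, c' = 128 + (c - 128) * 224 / 255,
 * rounded to nearest using integer arithmetic only. */
static uint8_t chroma_from_jpeg_8(uint8_t c)
{
    int v = (c - 128) * 224;
    /* round the signed quotient v / 255 to nearest, symmetric around 0 */
    int q = v >= 0 ? (v + 127) / 255 : -((-v + 127) / 255);
    return (uint8_t)(128 + q);
}
```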
On Sun, 9 Jun 2024, Rémi Denis-Courmont wrote:
This seems to have been omitted in
880e2aa23645ed9871c66ee1cbd00f93c72d2d73.
---
configure | 5 -
1 file changed, 5 deletions(-)
diff --git a/configure b/configure
index e69ed55837..4e27e6fd2b 100755
--- a/configure
+++ b/configure
@@ -2130,7 +2
On Sun, 9 Jun 2024, Rémi Denis-Courmont wrote:
The vendor has long since switched to Arm, with the last product reaching
its official end-of-life over 11 years ago. Linux support for the ISA
was dropped 7 years ago. More importantly, this architecture was never
supported by upstream GCC, and th
On Fri, 7 Jun 2024, Zhao Zhili wrote:
On Jun 7, 2024, at 17:09, Martin Storsjö wrote:
On Fri, 7 Jun 2024, Zhao Zhili wrote:
Note both tests use clang as compiler, which has vectorization
enabled by default with -O3.
FWIW, for more interesting benchmarks, you can configure the build with
On Fri, 7 Jun 2024, Rémi Denis-Courmont wrote:
On 7 June 2024 12:12:45 GMT+03:00, "Martin Storsjö" wrote:
The default timer register pmccntr_el0 usually requires enabling
access with e.g. a kernel module.
---
cntvct_el0 has significantly better resolution than
av_gettime_relat
The default timer register pmccntr_el0 usually requires enabling
access with e.g. a kernel module.
---
cntvct_el0 has significantly better resolution than
av_gettime_relative (while the unscaled nanosecond output of
clock_gettime is much higher resolution).
In one tested case, the cntvct_el0 timer
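A minimal sketch of the approach discussed in this thread, assuming a GCC/Clang-style compiler: on AArch64, read the virtual counter cntvct_el0 directly from user space; elsewhere, fall back to clock_gettime(CLOCK_MONOTONIC). The name read_ticks is hypothetical, and this is not the code in libavutil/aarch64/timer.h.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper returning a monotonically increasing tick count. */
static inline uint64_t read_ticks(void)
{
#if defined(__aarch64__)
    uint64_t t;
    /* isb keeps the counter read from being speculated early; cntvct_el0
     * is normally readable from EL0 without a kernel module,
     * unlike pmccntr_el0. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(t) :: "memory");
    return t;
#else
    /* Fallback: monotonic clock in nanoseconds. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

On AArch64, the value counts ticks of the system counter (its frequency can be read from cntfrq_el0), not nanoseconds, so only differences between two readings are meaningful for benchmarking.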
On Fri, 7 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
---
libavutil/timer.h | 5 +
1 file changed, 5 insertions(+)
diff --git a/libavutil/timer.h b/libavutil/timer.h
index 2cd299eca3..74c4d84e69 100644
--- a/libavutil/timer.h
+++ b/libavutil/timer.h
@@ -46,6 +46,8 @@
#include "macos_kperf
On Fri, 7 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
It will fall back to mach_absolute_time inside libavutil/timer.h
---
libavutil/aarch64/timer.h | 8 +---
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
index 8b28fd354c.
On Fri, 7 Jun 2024, Zhao Zhili wrote:
Note both tests use clang as compiler, which has vectorization
enabled by default with -O3.
FWIW, for more interesting benchmarks, you can configure the build with
--optflags="-O3 -fno-vectorize".
(Although, the benchmarks against a compiler vectorized
On Fri, 7 Jun 2024, Zhao Zhili wrote:
On Jun 7, 2024, at 16:21, Martin Storsjö wrote:
On Fri, 7 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
B0 is defined by system header.
Can you add more details about which header defines this? (I did a quick grep
in a copy of the android NDK
On Fri, 7 Jun 2024, Martin Storsjö wrote:
On Fri, 7 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
B0 is defined by system header.
Can you add more details about which header defines this? (I did a quick grep
in a copy of the android NDK, and found it in asm-generic/termbits-common.h
On Fri, 7 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
B0 is defined by system header.
Can you add more details about which header defines this? (I did a quick
grep in a copy of the android NDK, and found it in
asm-generic/termbits-common.h.)
// Martin
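To illustrate the conflict being asked about: B0 is the POSIX termios "hang up" baud-rate constant, which <termios.h> defines as a macro (in the Android NDK it comes from asm-generic/termbits-common.h, per the grep above). Any code that includes it and also uses B0 as an ordinary identifier gets macro-expanded. This tiny check is a hypothetical probe, not part of the patch.

```c
#include <termios.h>

/* B0 is a macro once <termios.h> is in scope, so a local variable or
 * constant named B0 would be replaced by the preprocessor. */
#ifdef B0
#define B0_IS_SYSTEM_MACRO 1
#else
#define B0_IS_SYSTEM_MACRO 0
#endif

/* Hypothetical probe: returns 1 if the system headers define B0. */
int b0_is_system_macro(void)
{
    return B0_IS_SYSTEM_MACRO;
}
```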
This allows ending up with a normal, non-fragmented file when
the file is finished, while keeping the file readable if writing
is aborted abruptly at any point. (Normally when writing a
mov/mp4 file, the unfinished file is completely useless unless it
is finished properly.)
This results in a file
On Tue, 4 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
Test on Apple M1:
rgb24_to_uv_1080_c: 7.2
rgb24_to_uv_1080_neon: 5.5
rgb24_to_uv_1280_c: 8.2
rgb24_to_uv_1280_neon: 6.2
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_540_c: 6.5
rgb24_to_uv_half_540_neon: 3.0
rgb24_
On Wed, 5 Jun 2024, Zhao Zhili wrote:
On Jun 5, 2024, at 14:29, Rémi Denis-Courmont wrote:
On 4 June 2024 16:55:01 GMT+03:00, Zhao Zhili <quinkbl...@foxmail.com> wrote:
From: Zhao Zhili
Test on Apple M1:
rgb24_to_uv_1080_c: 7.2
rgb24_to_uv_1080_neon: 5.5
rgb24_to_uv_1280_c: 8.2
r
On Tue, 4 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
The inline asm doesn't work on Android.
Using pmccntr_el0 doesn't work, no, but instead of falling back to
clock_gettime, you may want to use cntvct_el0 instead of pmccntr_el0. IIRC
that works on Android, at least it worked a number of
On Mon, 3 Jun 2024, Zhao Zhili wrote:
diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
new file mode 100644
index 00..0a46475723
--- /dev/null
+++ b/libswscale/aarch64/input.S
@@ -0,0 +1,229 @@
+/*
+ * Copyright (c) 2024 Zhao Zhili
+ *
+ * This file is part of FFmpeg
On Mon, 3 Jun 2024, Zhao Zhili wrote:
From: Zhao Zhili
---
I need help with the test. It succeeds with the following patch on ARM64,
but failed with x86. I'm not sure whether the issue is in the test, or
hidden in x86 asm, and I don't know x86 asm.
Note that by default, the output of swscale ca
On Sun, 2 Jun 2024, Dennis Sädtler wrote:
On 2024-06-02 21:36, Martin Storsjö wrote:
On Sat, 1 Jun 2024, Dennis Sädtler via ffmpeg-devel wrote:
Should the ftyp atom also be updated to remove brands no longer required
for non-fragmented files?
I'm not sure how important that is in real-
On Sat, 1 Jun 2024, Dennis Sädtler via ffmpeg-devel wrote:
Should the ftyp atom also be updated to remove brands no longer required
for non-fragmented files?
I'm not sure how important that is in real-world scenarios, so it might
not be worth it to deal with some of the additional changes requi
On Fri, 31 May 2024, Timo Rothenpieler wrote:
On 31/05/2024 10:53, Martin Storsjö wrote:
This allows ending up with a normal, non-fragmented file when
the file is finished, while keeping the file readable if writing
is aborted abruptly at any point. (Normally when writing a
mov/mp4 file, the
This allows ending up with a normal, non-fragmented file when
the file is finished, while keeping the file readable if writing
is aborted abruptly at any point. (Normally when writing a
mov/mp4 file, the unfinished file is completely useless unless it
is finished properly.)
This results in a file
On Wed, 22 May 2024, Andreas Rheinhardt wrote:
VVC does not have MMX code at all, so one can use the stricter
declare_func to also check that the MMX state has not been clobbered
with (which would be an ABI violation).
Signed-off-by: Andreas Rheinhardt
---
tests/checkasm/vvc_alf.c | 4 ++--
1 f
The loop filters can write before the pointer given to them;
the actual test invocations correctly used an offset, while
the benchmark calls were lacking an offset. Therefore, when
running with benchmarking, these tests could have spurious
failures.
---
tests/checkasm/h264dsp.c | 4 ++--
1 file ch
On Tue, 21 May 2024, Martin Storsjö wrote:
Don't benchmark every single combination of widths and heights;
only benchmark cases which are squares (like in vvc_mc.c).
Contrary to vvc_mc, which increases sizes by doubling dimensions,
vvc_alf tests all sizes in increments of 4. Limit benchma
On Tue, 21 May 2024, Rémi Denis-Courmont wrote:
On 21 May 2024 09:37:18 GMT+03:00, "Martin Storsjö" wrote:
On Tue, 21 May 2024, Rémi Denis-Courmont wrote:
Hi,
VVC benchmarks have increased checkasm runtime by at least an order of
magnitude. It's become so prohibitiv
Don't benchmark every single combination of widths and heights;
only benchmark cases which are squares (like in vvc_mc.c).
Contrary to vvc_mc, which increases sizes by doubling dimensions,
vvc_alf tests all sizes in increments of 4. Limit benchmarking to
the cases which are powers of two.
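A hypothetical sketch of the selection rule described above: every width and height is still tested in steps of 4, but only square, power-of-two sizes are benchmarked. The helper names and the max_size parameter are assumptions, not the actual tests/checkasm/vvc_alf.c code.

```c
/* Returns nonzero if x is a positive power of two. */
static int is_pow2(int x)
{
    return x > 0 && (x & (x - 1)) == 0;
}

/* Benchmark only square blocks whose side is a power of two. */
static int should_bench(int w, int h)
{
    return w == h && is_pow2(w);
}

/* Counts how many of the tested (w, h) pairs, with both dimensions in
 * 4, 8, ..., max_size, would also be benchmarked. */
static int count_benchmarked(int max_size)
{
    int n = 0;
    for (int h = 4; h <= max_size; h += 4)
        for (int w = 4; w <= max_size; w += 4)
            n += should_bench(w, h);
    return n;
}
```

With max_size = 128, 1024 (w, h) pairs are tested, but only 6 of them (4x4 through 128x128) are benchmarked.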
This re
On Tue, 21 May 2024, Rémi Denis-Courmont wrote:
Hi,
On 20 May 2024 03:42:03 GMT+03:00, Stone Chen
wrote:
Adds checkasm for DMVR SAD AVX2 implementation.
Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 70.0
vvc_sad_8x8_avx2: 10.0
vvc_sad_16x16_c: 280.0
vvc_sad_16x16_avx2: 20.0
vvc_sad_32x32_c: 1
On Sat, 11 May 2024, Ramiro Polla wrote:
On Sun, Jan 21, 2024 at 10:57 PM Ramiro Polla wrote:
---
libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c
b/libavcodec/aarch64/idctdsp_init_aarch64
On Sat, 11 May 2024, Lynne via ffmpeg-devel wrote:
Unintentionally removed as part of 03cf10164578aed33f4d0cb5b69d63669c01a538.
Untested, but it's assumed that unlike most of the old ARM code,
this one was still working.
---
libavcodec/aac/aacdec_float.c | 5 +
1 file changed, 5 insertions(+)
On Tue, 7 May 2024, Rémi Denis-Courmont wrote:
---
Makefile | 2 +-
configure | 3 +++
doc/APIchanges| 3 +++
ffbuild/arch.mak | 1 +
libavutil/cpu.h | 1 +
libavutil/tests/cpu.c | 1 +
tests/checkasm/checkasm.c | 1 +
7 files changed,
On Tue, 7 May 2024, Rémi Denis-Courmont wrote:
---
libavutil/riscv/cpu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
index c3683b06d0..69d1afe853 100644
--- a/libavutil/riscv/cpu.c
+++ b/libavutil/riscv/cpu.c
@@ -29,14 +29,
On Tue, 7 May 2024, Andreas Rheinhardt wrote:
Martin Storsjö:
On Mon, 6 May 2024, James Almer wrote:
It ignores and overwrites the previous values.
Fixes running the test under ubsan.
Signed-off-by: James Almer
---
tests/checkasm/blockdsp.c | 3 ++-
1 file changed, 2 insertions(+), 1
On Mon, 6 May 2024, James Almer wrote:
It ignores and overwrites the previous values.
Fixes running the test under ubsan.
Signed-off-by: James Almer
---
tests/checkasm/blockdsp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
The change is probably correct, but what issue is ubsan co
On Fri, 3 May 2024, Rémi Denis-Courmont wrote:
This adds the Linux-specific function call to detect CPU features. Unlike
the more portable auxiliary vector, this supports extensions other than
single lettered ones. At this point, FFmpeg already needs this to detect
Zba and Zbb at run-time, and p
On Tue, 30 Apr 2024, Andreas Rheinhardt wrote:
Regression since fd172185580c1ccdcfb90bbfdb59fa806fad3117;
triggered by vp4/KTkvw8dg1J8.avi in the FATE suite, but not
when running fate as this code is not used when the bitexact
flag is set.
Bisecting done by ami_stuff, patch from user Mika Fisch
This fixes crashes in the mspel tests on x86.
---
tests/checkasm/vc1dsp.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
index 407d9e5fe8..f18f0f8251 100644
--- a/tests/checkasm/vc1dsp.c
+++ b/tests/checkasm/vc1dsp.c
@@
On Thu, 25 Apr 2024, Derek Buitenhuis wrote:
Changes since last set:
* Updated commit message with RFC references.
* Properly support Retry-After as both a date and integer number of seconds.
I have tested this against both an HTTP-Date and seconds, and confirmed
it to work.
Derek Buitenhuis
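A minimal sketch of parsing a Retry-After value in both forms allowed by RFC 9110 (section 10.2.3): delay-seconds ("120") or an HTTP-date in IMF-fixdate format ("Sun, 06 Nov 1994 08:49:37 GMT"). This is not the libavformat/http.c implementation. The function name is hypothetical, the current time is passed in so the function can be tested, and the obsolete RFC 850 and asctime date formats are not handled. The date-to-epoch conversion uses Howard Hinnant's days_from_civil algorithm to avoid depending on timegm().

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (days_from_civil). */
static long days_from_civil(long y, unsigned m, unsigned d)
{
    y -= m <= 2;
    long era = (y >= 0 ? y : y - 399) / 400;
    unsigned yoe = (unsigned)(y - era * 400);
    unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (long)doe - 719468;
}

/* Hypothetical helper: seconds to wait before retrying, 0 if the date is
 * already in the past, or -1 if the value cannot be parsed.
 * 'now' is the current time in seconds since the Unix epoch. */
static long retry_after_seconds(const char *value, long long now)
{
    static const char months[12][4] = { "Jan", "Feb", "Mar", "Apr",
                                        "May", "Jun", "Jul", "Aug",
                                        "Sep", "Oct", "Nov", "Dec" };
    char wday[4], mon[4];
    int d, y, hh, mm, ss, m = -1, n = 0;
    char *end;

    /* delay-seconds form, e.g. "120" */
    if (isdigit((unsigned char)value[0])) {
        long secs = strtol(value, &end, 10);
        return *end == '\0' ? secs : -1;
    }

    /* IMF-fixdate form; %n records how much input was consumed */
    if (sscanf(value, "%3s, %d %3s %d %d:%d:%d GMT%n",
               wday, &d, mon, &y, &hh, &mm, &ss, &n) != 7 || value[n] != '\0')
        return -1;
    for (int i = 0; i < 12; i++)
        if (strcmp(mon, months[i]) == 0)
            m = i + 1;
    if (m < 0)
        return -1;

    long long when = (long long)days_from_civil(y, (unsigned)m, (unsigned)d) * 86400
                     + hh * 3600 + mm * 60 + ss;
    return when > now ? (long)(when - now) : 0;
}
```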
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:
This patch set adds support for properly handling HTTP 429 codes,
and their rate limiting, which is widely used and is standardized.
Changes since first set:
* Added AVERROR_HTTP_TOO_MANY_REQUESTS to error_entries in error.c, per
Andreas' review.
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:
Not every use case benefits from setting retries in terms of the backoff.
Signed-off-by: Derek Buitenhuis
---
libavformat/http.c| 12 +---
libavformat/version.h | 2 +-
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/libav
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:
429 and 503 codes can, and often do (e.g. all Google Cloud
Storage URLs can), return a Retry-After header with the error,
indicating how long to wait, in seconds, before retrying again.
If it is not respected by, for example, using our default backoff
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:
Added in thep previous commit.
Signed-off-by: Derek Buitenhuis
---
libavformat/http.c | 6 ++
1 file changed, 6 insertions(+)
diff --git a/libavformat/http.c b/libavformat/http.c
index ed20359552..bbace2694f 100644
--- a/libavformat/http.c
+++ b
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:
Added in thep previous commit.
Typo in the commit message
// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link abov
On Wed, 17 Apr 2024, Marvin Scholz wrote:
This fixes the checks to properly use runtime feature detection and
check the SDK version (*_MAX_ALLOWED) instead of the targeted version
for the relevant APIs.
As these things are pretty hard to think straight about, it could be good
with a more conc
On Wed, 17 Apr 2024, Ramiro Polla wrote:
This patch set adds fdct to checkasm and neon-optimized fdct for aarch64.
Ramiro Polla (2):
checkasm: add test for fdct
lavc/aarch64/fdct: add neon-optimized fdct for aarch64
libavcodec/aarch64/Makefile | 2 +
libavcodec/aarch64/fdct.h
Travis is no longer relevant for attempting to run CI jobs in our
setup.
---
.travis.yml | 30 --
1 file changed, 30 deletions(-)
delete mode 100644 .travis.yml
diff --git a/.travis.yml b/.travis.yml
deleted file mode 100644
index 784b7bdf73..00
--- a/.travis.
On Wed, 17 Apr 2024, Ramiro Polla wrote:
The code is imported from libjpeg-turbo-3.0.1. The neon registers used
have been changed to avoid modifying v8-v15.
---
libavcodec/aarch64/Makefile | 2 +
libavcodec/aarch64/fdct.h | 26 ++
libavcodec/aarch64/fdctdsp_init_aa
On Wed, 10 Apr 2024, J. Dekker wrote:
The exclude_guest option only has an effect on x86. Omitting
'exclude_guest' defaults to zero which implies that you can count guest
events should you run one. Some non-x86 kernels just ignore it, while
others (e.g. the Asahi Linux kernels) require the user
On Tue, 9 Apr 2024, James Almer wrote:
On 4/4/2024 7:29 AM, Martin Storsjö wrote:
This is based on a spec at https://aomediacodec.github.io/id3-emsg/,
further based on ISO/IEC 23009-1:2019.
Within libavformat, timed ID3 metadata (already supported by the
mpegts demuxer and muxer) is handled
On Thu, 4 Apr 2024, Martin Storsjö wrote:
We have tests to make sure that certain configurations do print
warnings. However, the normal operation of the muxer within this
test always printed a warning, so those tests to check for
extra warnings didn't essentially guard anything.
The wa
On Thu, 4 Apr 2024, Martin Storsjö wrote:
This line originates from 6f69f7a8bf6a0d013985578df2ef42ee6b1c7994.
---
libavformat/movenc.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 46a5b3a62f..ccdd2dbfc9 100644
--- a/libavformat/movenc.c
On Tue, 12 Mar 2024, Martin Storsjö wrote:
---
libavutil/aarch64/cpu.c | 25 +
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 7a05391343..196bdaf6b0 100644
--- a/libavutil/aarch64/cpu.c
+++ b
On Tue, 9 Apr 2024, J. Dekker wrote:
Note that the config.sh file is left without a shebang; this file is
supposed to be sourced into the current environment.
This commit is purely cosmetic.
Signed-off-by: J. Dekker
---
configure | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Thanks,
On Mon, 8 Apr 2024, J. Dekker wrote:
The preferred way to use LTO is --enable-lto but often times packagers
still end up with -flto in cflags for various reasons. Using grep
on binary object files is brittle and relies on specific object
representation, which in the case of LLVM bitcode, debug-i
On Mon, 8 Apr 2024, J. Dekker wrote:
In some cases, these scripts can be called directly by packagers, and
some systems require the interpreter to be explicit.
It is unclear to me which of the changes are needed and for what reason,
please elaborate much more in the commit message.
Is it po
Before: Cortex A53 A72 A78
ac3_sum_square_bufferfly_float_neon: 1005.7 516.5 224.5
After:
ac3_sum_square_bufferfly_float_neon: 981.7 504.5 223.2
---
libavcodec/aarch64/ac3dsp_neon.S | 16
1 file changed, 4 insertions(+), 12 deletions(-)
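For reference, a scalar sketch of what the benchmarked function computes, assuming it mirrors FFmpeg's C version: four running sums of squares over the left, right, sum (L+R) and difference (L-R) coefficients. Each accumulation is a multiply-add, the operation a NEON version can express with fmla. The function name here is hypothetical and the sketch is illustrative only.

```c
/* Hypothetical scalar reference:
 * sum[0] = sum of L^2, sum[1] = sum of R^2,
 * sum[2] = sum of (L+R)^2, sum[3] = sum of (L-R)^2. */
static void sum_square_butterfly_float_ref(float sum[4], const float *coef0,
                                           const float *coef1, int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0.0f;
    for (int i = 0; i < len; i++) {
        float lt = coef0[i], rt = coef1[i];
        float md = lt + rt, sd = lt - rt;
        /* four independent multiply-accumulates per coefficient pair */
        sum[0] += lt * lt;
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}
```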
On Sat, 6 Apr 2024, Geoff Hill wrote:
Thanks Martin for your review and testing.
Here's v4 with the following changes:
* Use fmla in sum_square_butterfly_float loop. Faster.
* Removed redundant loop bound zero checks in extract_exponents,
sum_square_bufferfly_int32 and sum_square_bufferf
On Tue, 2 Apr 2024, Geoff Hill wrote:
Signed-off-by: Geoff Hill
---
libavcodec/aarch64/ac3dsp_init_aarch64.c | 5
libavcodec/aarch64/ac3dsp_neon.S | 35
tests/checkasm/ac3dsp.c | 26 ++
3 files changed, 66 insertions(+)
diff
On Tue, 2 Apr 2024, Geoff Hill wrote:
Signed-off-by: Geoff Hill
---
libavcodec/aarch64/ac3dsp_init_aarch64.c | 5 +
libavcodec/aarch64/ac3dsp_neon.S | 24 +
tests/checkasm/ac3dsp.c | 27
3 files changed, 56 insertions(+)
d
On Tue, 2 Apr 2024, Geoff Hill wrote:
Here's v3 to push the AC-3 ARMv8 NEON experiment a step further.
This version implements 5 of the AC-3 encoder DSP functions,
and adds checkasm tests where missing.
I've tested that the checkasm tests pass on aarch64 and x86.
Thanks, I've tested that che
This is based on a spec at https://aomediacodec.github.io/id3-emsg/,
further based on ISO/IEC 23009-1:2019.
Within libavformat, timed ID3 metadata (already supported by the
mpegts demuxer and muxer) is handled as a separate data AVStream
with codec type AV_CODEC_ID_TIMED_ID3. However, it doesn't
h
We have tests to make sure that certain configurations do print
warnings. However, the normal operation of the muxer within this
test always printed a warning, so those tests to check for
extra warnings didn't essentially guard anything.
The warning that always was printed, "track 1: codec frame si
This line originates from 6f69f7a8bf6a0d013985578df2ef42ee6b1c7994.
---
libavformat/movenc.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 46a5b3a62f..ccdd2dbfc9 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -1173,8 +1173,6
This fixes assembling files starting with bare symbol declarations,
without explicitly switching to .text first.
---
gas-preprocessor.pl | 3 +++
1 file changed, 3 insertions(+)
diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index 2880858..b66181a 100755
--- a/gas-preprocessor.pl
+++ b/ga
On Tue, 26 Mar 2024, Jean-Baptiste Kempf wrote:
On Mon, 25 Mar 2024, at 22:56, J. Dekker wrote:
On Mon, 25 Mar 2024, Martin Storsjö wrote:
For some time now, we have had pretty complete AArch64 NEON coverage
for the hevc decoder.
However, some of these functions require the I8MM instruction set
On Mon, 25 Mar 2024, Martin Storsjö wrote:
For some time now, we have had pretty complete AArch64 NEON coverage
for the hevc decoder.
However, some of these functions require the I8MM instruction set
extension, and many of them (but not all) lack a plain NEON
version.
This patchset fills in a
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
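The row count can be worked out as follows. An 8-tap vertical filter needs 3 rows above and 4 rows below each output row, i.e. h + 7 input rows, and a horizontal pass that emits two rows per iteration rounds that up to an even count. Assuming h is even, h + 7 is odd, which gives h + 8. The helper below is hypothetical.

```c
/* Hypothetical helper: number of horizontally filtered rows to store for
 * an 8-tap vertical pass over a block of height h, when the horizontal
 * pass produces rows_per_iter rows per loop iteration. */
static int qpel_tmp_rows(int h, int rows_per_iter)
{
    int rows = h + 7; /* 3 rows above + h rows + 4 rows below */
    /* round up to a whole number of loop iterations */
    return (rows + rows_per_iter - 1) / rows_per_iter * rows_per_iter;
}
```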
AWS Graviton 3:
put_hevc_qpel_uni_w_hv4_8_c: 422.2
put_hevc_qpel_uni_w_hv4_8_neon: 140.7
put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7
put_hevc_qpel_uni_w_hv8_8_c: 1208.0
put_hevc_qpel_u
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which
As the plain neon qpel_h functions process two rows at a time,
we need to allocate storage for h+8 rows instead of h+7.
By allocating storage for h+8 rows, incrementing the stack
pointer won't end up at the right spot in the end. Store the
intended final stack pointer value in a register x14 which
The hv32 and hv64 functions were identical - both loop and
process 16 pixels at a time.
The hv16 function was near identical, except for the outer loop
(and using sp instead of a separate register).
Given the size of these functions, the extra cost of the outer
loop is negligible, so use the same
---
libavcodec/aarch64/hevcdsp_qpel_neon.S | 695 +
1 file changed, 355 insertions(+), 340 deletions(-)
diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S
b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index 06832603d9..ad568e415b 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_n
---
libavcodec/aarch64/hevcdsp_qpel_neon.S | 94 +++---
1 file changed, 86 insertions(+), 8 deletions(-)
diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S
b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index fba063186c..c04e8dbea8 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon
In addition to just templating, this contains one change to
ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register
which ff_hevc_put_hevc_epel_h32_8_neon requires.
AWS Graviton 3:
put_hevc_epel_bi_hv4_8_c: 176.5
put_hevc_epel_bi_hv4_8_neon: 62.0
put_hevc_epel_bi_hv4_8_i8mm: 58.0
put_hevc_epel
AWS Graviton 3:
put_hevc_qpel_uni_w_h4_8_c: 159.0
put_hevc_qpel_uni_w_h4_8_neon: 64.2
put_hevc_qpel_uni_w_h4_8_i8mm: 40.0
put_hevc_qpel_uni_w_h6_8_c: 344.7
put_hevc_qpel_uni_w_h6_8_neon: 114.5
put_hevc_qpel_uni_w_h6_8_i8mm: 82.0
put_hevc_qpel_uni_w_h8_8_c: 596.2
put_hevc_qpel_uni_w_h8_8_neon: 132.2
The first horizontal filter can use either i8mm or plain neon
versions, while the second part is a pure neon implementation.
---
libavcodec/aarch64/hevcdsp_epel_neon.S | 100 +
1 file changed, 100 insertions(+)
diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S
b/libavco
AWS Graviton 3:
put_hevc_epel_uni_w_h4_8_c: 97.2
put_hevc_epel_uni_w_h4_8_neon: 41.2
put_hevc_epel_uni_w_h4_8_i8mm: 35.2
put_hevc_epel_uni_w_h6_8_c: 203.7
put_hevc_epel_uni_w_h6_8_neon: 84.7
put_hevc_epel_uni_w_h6_8_i8mm: 74.7
put_hevc_epel_uni_w_h8_8_c: 345.7
put_hevc_epel_uni_w_h8_8_neon: 94.0
pu
AWS Graviton 3:
put_hevc_epel_h4_8_c: 64.7
put_hevc_epel_h4_8_neon: 25.0
put_hevc_epel_h4_8_i8mm: 21.2
put_hevc_epel_h6_8_c: 130.0
put_hevc_epel_h6_8_neon: 40.7
put_hevc_epel_h6_8_i8mm: 36.5
put_hevc_epel_h8_8_c: 209.0
put_hevc_epel_h8_8_neon: 45.2
put_hevc_epel_h8_8_i8mm: 41.2
put_hevc_epel_h12_8_
AWS Graviton 3:
put_hevc_epel_uni_w_hv4_8_c: 191.2
put_hevc_epel_uni_w_hv4_8_neon: 87.7
put_hevc_epel_uni_w_hv4_8_i8mm: 83.2
put_hevc_epel_uni_w_hv6_8_c: 349.5
put_hevc_epel_uni_w_hv6_8_neon: 153.0
put_hevc_epel_uni_w_hv6_8_i8mm: 148.5
put_hevc_epel_uni_w_hv8_8_c: 581.2
put_hevc_epel_uni_w_hv8_8_ne
AWS Graviton 3:
put_hevc_epel_uni_hv4_8_c: 163.5
put_hevc_epel_uni_hv4_8_neon: 59.7
put_hevc_epel_uni_hv4_8_i8mm: 57.5
put_hevc_epel_uni_hv6_8_c: 344.7
put_hevc_epel_uni_hv6_8_neon: 105.0
put_hevc_epel_uni_hv6_8_i8mm: 102.7
put_hevc_epel_uni_hv8_8_c: 552.2
put_hevc_epel_uni_hv8_8_neon: 111.2
put_he
---
libavcodec/aarch64/hevcdsp_qpel_neon.S | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S
b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index 0fcded344b..062b7d4d0f 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch6
AWS Graviton 3:
put_hevc_epel_hv4_8_c: 163.7
put_hevc_epel_hv4_8_neon: 52.5
put_hevc_epel_hv4_8_i8mm: 49.5
put_hevc_epel_hv6_8_c: 292.2
put_hevc_epel_hv6_8_neon: 97.7
put_hevc_epel_hv6_8_i8mm: 101.2
put_hevc_epel_hv8_8_c: 471.0
put_hevc_epel_hv8_8_neon: 106.7
put_hevc_epel_hv8_8_i8mm: 102.5
put_hev
This is a pure reordering of code without changing anything in
the individual functions.
---
libavcodec/aarch64/hevcdsp_epel_neon.S | 971 +
1 file changed, 497 insertions(+), 474 deletions(-)
diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S
b/libavcodec/aarch64/hevcds
For widths of 32 pixels and more, loop first horizontally,
then vertically.
Previously, this function would process a 16 pixel wide slice
of the block, looping vertically. After processing the whole
height, it would backtrack and process the next 16 pixel wide
slice.
When doing 8tap filtering hor
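A hypothetical C sketch of the two traversal orders for a block processed in 16-pixel-wide slices: the old order finishes one slice top to bottom before backtracking to the next, while the new order walks across each row first. With a stand-in per-row kernel, both orders produce identical output. The sketch does not model the 8-tap filtering details or the reason the new order is faster.

```c
#include <stdint.h>

/* Stand-in for the 16-pixel-wide kernel: processes one row of a slice. */
static void filter_row16(uint8_t *dst, const uint8_t *src)
{
    for (int x = 0; x < 16; x++)
        dst[x] = (uint8_t)(src[x] + 1);
}

/* Old order: one 16-wide slice over the full height, then backtrack. */
static void filter_column_major(uint8_t *dst, const uint8_t *src,
                                int stride, int w, int h)
{
    for (int x = 0; x < w; x += 16)
        for (int y = 0; y < h; y++)
            filter_row16(dst + y * stride + x, src + y * stride + x);
}

/* New order (widths >= 32): loop horizontally first, then vertically. */
static void filter_row_major(uint8_t *dst, const uint8_t *src,
                             int stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x += 16)
            filter_row16(dst + y * stride + x, src + y * stride + x);
}
```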
This gets rid of a couple instructions, but the actual performance
is almost identical on Cortex A72/A73. On Cortex A53, it is a
handful of cycles faster.
---
libavcodec/aarch64/hevcdsp_qpel_neon.S | 15 +--
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/libavcodec/aarc
Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon
store temporary buffers on the stack. When consuming it,
many of these functions use the stack pointer as an incremental pointer
for reading the data (instead of storing it in another register),
which is rather unusual.
Technically,
Group the epel and qpel functions together.
---
libavcodec/aarch64/hevcdsp_init_aarch64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c
b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 04692aa98e..d2f2a3681f 100644
--- a/libavcodec/
xes a subtle bug in the existing implementation;
two functions relied on the contents of the stack, below the
stack pointer, being untouched within a function. If a signal
gets delivered, those parts of the stack could be clobbered.
// Martin
Martin Storsjö (21):
aarch64: hevc: Reorder a misp
On Fri, 22 Mar 2024, Andreas Rheinhardt wrote:
Martin Storsjö:
Both patches seem to work fine with MSVC 19.27 - I vaguely prefer the v2
version, which is simpler.
But to me, we could also just revert the change to
libavcodec/ccaption_dec.c, and declare that we require MSVC 19.28
instead
On Thu, 21 Mar 2024, Andreas Rheinhardt wrote:
Andreas Rheinhardt:
C11 provides static assertions via _Static_assert and
provides static_assert as a convenience define for this
in assert.h. MSVC 19.27 declares support for C11, but does
not support _Static_assert, but somehow supports
static_ass
On Sun, 17 Mar 2024, Rémi Denis-Courmont wrote:
Obviously not. Imported libraries are only there to resolve missing
symbols.
Sure - but if resolving the missing symbols brings in those conflicting
object files, there's not much to do about it. If the static library
contains dec_init in a sta
On Sun, 10 Mar 2024, Andreas Rheinhardt wrote:
All versions of MSVC that support C11 (namely >= v19.27)
also support the restrict keyword, therefore av_restrict
is no longer necessary since 75697836b1db3e0f0a3b7061be6be28d00c675a0.
Signed-off-by: Andreas Rheinhardt
---
Untested except via godb
On Thu, 14 Mar 2024, J. Dekker wrote:
Martin Storsjö writes:
The first 32 elements of each row were correct, while the
last 16 were scrambled.
This hasn't been noticed, because the checkasm test erroneously
only checked half of the output (for 8 bit functions), and
apparently none o
This simplifies the code for checking the output, and can print
the failing output (including a map of matching/mismatching
elements) if checkasm is run with the -v/--verbose option.
---
tests/checkasm/hevc_pel.c | 71 ++-
1 file changed, 41 insertions(+), 30 de