Re: [FFmpeg-devel] [PATCH 1/3] riscv: add Zvbb vector bit manipulation extension
On Tue, 7 May 2024, Rémi Denis-Courmont wrote:

> ---
>  Makefile                  | 2 +-
>  configure                 | 3 +++
>  doc/APIchanges            | 3 +++
>  ffbuild/arch.mak          | 1 +
>  libavutil/cpu.h           | 1 +
>  libavutil/tests/cpu.c     | 1 +
>  tests/checkasm/checkasm.c | 1 +
>  7 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
> index d91bfeab5c..10e620963b 100644
> --- a/libavutil/tests/cpu.c
> +++ b/libavutil/tests/cpu.c
> @@ -94,6 +94,7 @@ static const struct {
>      { AV_CPU_FLAG_RVV_F32, "zve32f" },
>      { AV_CPU_FLAG_RVV_I64, "zve64x" },
>      { AV_CPU_FLAG_RVV_F64, "zve64d" },
> +    { AV_CPU_FLAG_RV_ZVBB, "zvbb" },
>  #endif
>      { 0 }
>  };

Doesn't this test require you to add this extension to the list in libavutil/cpu.c as well?

// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
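To make the concern above concrete, here is a minimal sketch, assuming the test resolves each name in its table through a separate parser table and compares the result with the expected flag. All names and flag values below (DEMO_FLAG_*, demo_parse and so on) are hypothetical stand-ins, not the libavutil API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_FLAG_RVV_F64 (1 << 0)
#define DEMO_FLAG_ZVBB    (1 << 1)

struct demo_entry {
    int flag;
    const char *name;
};

/* Stand-in for the parser's name list; "zvbb" has not been added yet. */
static const struct demo_entry demo_parser_table[] = {
    { DEMO_FLAG_RVV_F64, "zve64d" },
    { 0, NULL }
};

/* Stand-in for the test's table, which already lists "zvbb". */
static const struct demo_entry demo_test_table[] = {
    { DEMO_FLAG_RVV_F64, "zve64d" },
    { DEMO_FLAG_ZVBB,    "zvbb"   },
    { 0, NULL }
};

/* Look a name up in the parser table; returns its flag, or -1 if unknown. */
int demo_parse(const char *name)
{
    for (size_t i = 0; demo_parser_table[i].name; i++)
        if (!strcmp(demo_parser_table[i].name, name))
            return demo_parser_table[i].flag;
    return -1;
}

/* Count test entries whose name does not parse back to the same flag. */
int demo_count_mismatches(void)
{
    int mismatches = 0;
    for (size_t i = 0; demo_test_table[i].name; i++)
        if (demo_parse(demo_test_table[i].name) != demo_test_table[i].flag)
            mismatches++;
    return mismatches;
}
```

In this sketch the "zvbb" entry is the one mismatch, which is the kind of failure the question hints at when the name is added to the test but not to the parser.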
Re: [FFmpeg-devel] [PATCH] lavu/riscv: fix build without
On Tue, 7 May 2024, Rémi Denis-Courmont wrote:

> ---
>  libavutil/riscv/cpu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
> index c3683b06d0..69d1afe853 100644
> --- a/libavutil/riscv/cpu.c
> +++ b/libavutil/riscv/cpu.c
> @@ -29,14 +29,14 @@
>  #include
>  #define HWCAP_RV(letter) (1ul << ((letter) - 'A'))
>  #endif
> -#ifdef HAVE_SYS_HWPROBE_H
> +#if HAVE_SYS_HWPROBE_H
>  #include
>  #endif
>
>  int ff_get_cpu_flags_riscv(void)
>  {
>      int ret = 0;
> -#ifdef HAVE_SYS_HWPROBE_H
> +#if HAVE_SYS_HWPROBE_H
>      struct riscv_hwprobe pairs[] = {
>          { RISCV_HWPROBE_KEY_BASE_BEHAVIOR, 0 },
>          { RISCV_HWPROBE_KEY_IMA_EXT_0, 0 },
> --
> 2.43.0

LGTM

// Martin
Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab
On Tue, 7 May 2024, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> On Mon, 6 May 2024, James Almer wrote:
>>
>>> It ignores and overwrites the previous values.
>>> Fixes running the test under ubsan.
>>>
>>> Signed-off-by: James Almer
>>> ---
>>>  tests/checkasm/blockdsp.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> The change is probably correct, but what issue is ubsan complaining
>> about? If this would just be a dead store of unused random values, that
>> shouldn't be an ubsan issue in general, right?
>
> UBSan complains about unaligned stores in randomize_buffers; which is
> obvious given that i is incremented by 1, not by 2. I sent a patch that
> fixes this without removing randomization:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2024-May/326945.html

Thanks, that explains it. Those two patches LGTM.

// Martin
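As a rough illustration of the point about the loop step, here is a minimal sketch, assuming the buffers are filled with 16-bit stores at byte offsets from an even base address. The function names are hypothetical and this is not the checkasm code; it only shows why a step of 1 produces stores at odd addresses while a step of 2 does not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the 16-bit stores that would land on an odd byte offset when
 * walking offsets 0 .. len - 2 with the given step (step must be > 0).
 * The base address is assumed to be 2-byte aligned. */
size_t demo_count_misaligned_stores(size_t len, size_t step)
{
    size_t misaligned = 0;
    for (size_t i = 0; i + 1 < len; i += step)
        if (i % 2 != 0)
            misaligned++;
    return misaligned;
}

/* Fill len bytes with consecutive 16-bit values, stepping by 2 so that
 * each 16-bit slot is written exactly once. memcpy is used for the store,
 * so no alignment is assumed for buf itself. */
void demo_fill16(uint8_t *buf, size_t len, uint16_t seed)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t v = (uint16_t)(seed + i / 2);
        memcpy(buf + i, &v, sizeof(v));
    }
}
```

With an 8-byte buffer, stepping by 1 hits offsets 1, 3 and 5, which is the pattern UBSan would flag for aligned 16-bit stores; stepping by 2 hits none.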
Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab
On Mon, 6 May 2024, James Almer wrote:

> It ignores and overwrites the previous values.
> Fixes running the test under ubsan.
>
> Signed-off-by: James Almer
> ---
>  tests/checkasm/blockdsp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

The change is probably correct, but what issue is ubsan complaining about? If this would just be a dead store of unused random values, that shouldn't be an ubsan issue in general, right?

// Martin
Re: [FFmpeg-devel] [PATCH 2/2] lavu/riscv: add hwprobe() for CPU detection
On Fri, 3 May 2024, Rémi Denis-Courmont wrote:

> This adds the Linux-specific function call to detect CPU features. Unlike
> the more portable auxiliary vector, this supports extensions other than
> single lettered ones. At this point, FFmpeg already needs this to detect
> Zba and Zbb at run-time, and probably will need it for Zvbb in the near
> future. Support will be available in glibc 2.40 onward.
> ---
>  configure             | 3 +++
>  libavutil/riscv/cpu.c | 25 +
>  2 files changed, 28 insertions(+)
>
> @@ -27,10 +29,33 @@
>  #include
>  #define HWCAP_RV(letter) (1ul << ((letter) - 'A'))
>  #endif
> +#ifdef HAVE_SYS_HWPROBE_H

Aren't these kinds of config.h macros always defined, but with the values 0/1? I.e., shouldn't this use #if instead of #ifdef?

// Martin
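The difference between the two checks can be shown with a small sketch, assuming a config.h style macro that is always defined and set to 0 when the header is missing. HAVE_DEMO_HEADER_H and the function names are hypothetical.

```c
/* Mimics a config.h entry for a header that was NOT found:
 * the macro exists, but its value is 0. */
#define HAVE_DEMO_HEADER_H 0

/* Returns 1 if the #ifdef-guarded branch gets compiled in. */
int demo_ifdef_branch_taken(void)
{
#ifdef HAVE_DEMO_HEADER_H
    return 1; /* compiled in: the macro is defined, whatever its value */
#else
    return 0;
#endif
}

/* Returns 1 if the #if-guarded branch gets compiled in. */
int demo_if_branch_taken(void)
{
#if HAVE_DEMO_HEADER_H
    return 1;
#else
    return 0; /* compiled in: the macro evaluates to 0 */
#endif
}
```

So with 0/1 style macros, #ifdef would pull in the guarded code even when the feature is absent, while #if follows the actual value.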
Re: [FFmpeg-devel] [PATCH] avcodec/x86/vp3dsp_init: Set correct function pointer, fix crash
On Tue, 30 Apr 2024, Andreas Rheinhardt wrote:

> Regression since fd172185580c1ccdcfb90bbfdb59fa806fad3117;
> triggered by vp4/KTkvw8dg1J8.avi in the FATE suite, but not when
> running fate as this code is not used when the bitexact flag is set.
>
> Bisecting done by ami_stuff, patch from user Mika Fischer
> in ticket #10027 (which this commit fixes).
>
> Signed-off-by: Andreas Rheinhardt
> ---
>  libavcodec/x86/vp3dsp_init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavcodec/x86/vp3dsp_init.c b/libavcodec/x86/vp3dsp_init.c
> index f54fa57b3e..edac1764cb 100644
> --- a/libavcodec/x86/vp3dsp_init.c
> +++ b/libavcodec/x86/vp3dsp_init.c
> @@ -53,7 +53,7 @@ av_cold void ff_vp3dsp_init_x86(VP3DSPContext *c, int flags)
>      if (!(flags & AV_CODEC_FLAG_BITEXACT)) {
>          c->v_loop_filter = c->v_loop_filter_unaligned = ff_vp3_v_loop_filter_mmxext;
> -        c->h_loop_filter = c->v_loop_filter_unaligned = ff_vp3_h_loop_filter_mmxext;
> +        c->h_loop_filter = c->h_loop_filter_unaligned = ff_vp3_h_loop_filter_mmxext;
>      }
>  }
> --
> 2.40.1

LGTM

// Martin
[FFmpeg-devel] [PATCH] checkasm: vc1dsp: Align buffers sufficiently for the mspel tests
This fixes crashes in the mspel tests on x86.
---
 tests/checkasm/vc1dsp.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
index 407d9e5fe8..f18f0f8251 100644
--- a/tests/checkasm/vc1dsp.c
+++ b/tests/checkasm/vc1dsp.c
@@ -441,10 +441,10 @@ static void check_unescape(void)
 static void check_mspel_pixels(void)
 {
-    LOCAL_ALIGNED_8(uint8_t, src0, [32 * 32]);
-    LOCAL_ALIGNED_8(uint8_t, src1, [32 * 32]);
-    LOCAL_ALIGNED_8(uint8_t, dst0, [32 * 32]);
-    LOCAL_ALIGNED_8(uint8_t, dst1, [32 * 32]);
+    LOCAL_ALIGNED_16(uint8_t, src0, [32 * 32]);
+    LOCAL_ALIGNED_16(uint8_t, src1, [32 * 32]);
+    LOCAL_ALIGNED_16(uint8_t, dst0, [32 * 32]);
+    LOCAL_ALIGNED_16(uint8_t, dst1, [32 * 32]);
     VC1DSPContext h;
--
2.34.1
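For readers less familiar with alignment requirements, here is a minimal sketch of what a 16-byte aligned local buffer means, using C11 alignas rather than FFmpeg's LOCAL_ALIGNED_16 macro. The assumption, not stated in the patch, is that the x86 code uses loads or stores that need 16-byte aligned addresses. The function names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns nonzero if p is a multiple of the given alignment. */
int demo_is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Declares a 16-byte aligned local buffer, similar in spirit to what the
 * patch requests for the test buffers, and reports whether its address
 * really is 16-byte aligned. */
int demo_buffer_is_16_aligned(void)
{
    alignas(16) uint8_t buf[32 * 32];
    return demo_is_aligned(buf, 16);
}
```

A buffer that is only guaranteed 8-byte alignment may or may not happen to sit on a 16-byte boundary, which would explain crashes that depend on stack layout.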
Re: [FFmpeg-devel] [PATCH v3 0/2] HTTP Retry-After Support
On Thu, 25 Apr 2024, Derek Buitenhuis wrote:

> Changes since last set:
> * Updated commit message with RFC references.
> * Properly support Retry-After as both a date and integer number of seconds.
>
> I have tested this against both an HTTP-Date and seconds, and confirmed
> it to work.
>
> Derek Buitenhuis (2):
>   avformat/http: Rename parse_set_cookie_expiry_time to parse_http_date
>   avformat/http: Add support for Retry-After header
>
>  doc/protocols.texi    | 5
>  libavformat/http.c    | 62 ++-
>  libavformat/version.h | 2 +-
>  3 files changed, 49 insertions(+), 20 deletions(-)

Thanks, these patches LGTM.

// Martin
Re: [FFmpeg-devel] [PATCH v2 0/9] HTTP rate limiting and retry improvements
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:

> This patch set adds support for properly handling HTTP 429 codes, and their
> rate limiting, which is widely used and is standardized.
>
> Changes since first set:
> * Added AVERROR_HTTP_TOO_MANY_REQUESTS to error_entries in error.c, per Andreas' review.
> * Made respect_retry_after unsigned and use strtoull, per James' review.
> * Added docs, as per Stefano's reviews.
> * Added a new option to limit the total reconnect delay.
> * Unfortunate, but HTTP connection management is messy business.

I had a look over this patchset, and I had a handful of minor comments, but overall, the patchset seems fine to me.

Thanks!

// Martin
Re: [FFmpeg-devel] [PATCH v2 6/9] avformat/http: Add options to set the max number of connection retries
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:

> Not every use case benefits from setting retries in terms of the backoff.
>
> Signed-off-by: Derek Buitenhuis
> ---
>  libavformat/http.c    | 12 +---
>  libavformat/version.h | 2 +-
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/libavformat/http.c b/libavformat/http.c
> index 6927fea2fb..06bd3e340e 100644
> --- a/libavformat/http.c
> +++ b/libavformat/http.c
> @@ -140,6 +140,7 @@ typedef struct HTTPContext {
>      uint64_t filesize_from_content_range;
>      int respect_retry_after;
>      unsigned int retry_after;
> +    int reconnect_max_retries;
>  } HTTPContext;
>
>  #define OFFSET(x) offsetof(HTTPContext, x)
> @@ -178,6 +179,7 @@ static const AVOption options[] = {
>      { "reconnect_on_http_error", "list of http status codes to reconnect on", OFFSET(reconnect_on_http_error), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, D },
>      { "reconnect_streamed", "auto reconnect streamed / non seekable streams", OFFSET(reconnect_streamed), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, D },
>      { "reconnect_delay_max", "max reconnect delay in seconds after which to give up", OFFSET(reconnect_delay_max), AV_OPT_TYPE_INT, { .i64 = 120 }, 0, UINT_MAX/1000/1000, D },
> +    { "reconnect_max_retries", "the max number of times to retry a connection", OFFSET(reconnect_max_retries), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, D },
>      { "respect_retry_after", "respect the Retry-After header when retrying connections", OFFSET(respect_retry_after), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, D },
>      { "listen", "listen on HTTP", OFFSET(listen), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, 2, D | E },
>      { "resource", "The resource requested by a client", OFFSET(resource), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, E },
> @@ -359,7 +361,7 @@ static int http_open_cnx(URLContext *h, AVDictionary **options)
>  {
>      HTTPAuthType cur_auth_type, cur_proxy_auth_type;
>      HTTPContext *s = h->priv_data;
> -    int ret, auth_attempts = 0, redirects = 0;
> +    int ret, conn_attempts = 1, auth_attempts = 0, redirects = 0;
>      int reconnect_delay = 0;
>      uint64_t off;
>      char *cached;
> @@ -386,7 +388,8 @@ redo:
>      ret = http_open_cnx_internal(h, options);
>      if (ret < 0) {
>          if (!http_should_reconnect(s, ret) ||
> -            reconnect_delay > s->reconnect_delay_max)
> +            reconnect_delay > s->reconnect_delay_max ||
> +            (s->reconnect_max_retries >= 0 && conn_attempts > s->reconnect_max_retries))
>              goto fail;
>
>          if (s->respect_retry_after && s->retry_after > 0) {
> @@ -401,6 +404,7 @@ redo:
>          if (ret != AVERROR(ETIMEDOUT))
>              goto fail;
>          reconnect_delay = 1 + 2 * reconnect_delay;
> +        conn_attempts++;
>
>          /* restore the offset (http_connect resets it) */
>          s->off = off;
> @@ -1706,6 +1710,7 @@ static int http_read_stream(URLContext *h, uint8_t *buf, int size)
>      int err, read_ret;
>      int64_t seek_ret;
>      int reconnect_delay = 0;
> +    int conn_attempt = 1;

Minor inconsistency; the corresponding variable in the other function was called conn_attempts, as a plural.

// Martin
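To see how the new limit interacts with the existing backoff, here is a minimal simulation of the give-up condition quoted in the hunk above, assuming every connection attempt fails with a retryable error and that Retry-After is not involved. demo_count_open_attempts is a hypothetical helper, not part of http.c.

```c
/* Simulate the retry loop: returns how many times a connection would be
 * opened before giving up, assuming every open fails.
 * reconnect_max_retries < 0 means "no retry limit", as in the patch.
 * Keep reconnect_delay_max modest: the sketch does not guard the delay
 * against integer overflow. */
int demo_count_open_attempts(int reconnect_max_retries, int reconnect_delay_max)
{
    int conn_attempts = 1;   /* starts at 1, like in the patch */
    int reconnect_delay = 0; /* seconds */
    int opens = 0;

    for (;;) {
        opens++; /* the simulated open always fails */

        /* Same two give-up conditions as in the quoted hunk. */
        if (reconnect_delay > reconnect_delay_max ||
            (reconnect_max_retries >= 0 &&
             conn_attempts > reconnect_max_retries))
            return opens;

        /* Backoff sequence: 0, 1, 3, 7, 15, ... seconds. */
        reconnect_delay = 1 + 2 * reconnect_delay;
        conn_attempts++;
    }
}
```

Under these assumptions, reconnect_max_retries = 3 gives one initial open plus three retries, while the default of -1 leaves reconnect_delay_max (120 by default) as the only limit, which the backoff sequence exceeds on the eighth open.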
Re: [FFmpeg-devel] [PATCH v2 4/9] avformat/http: Add support for Retry-After header
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:

> 429 and 503 codes can, and often do (e.g. all Google Cloud Storage URLs
> can), return a Retry-After header with the error, indicating how long to
> wait, in seconds, before retrying again. If it is not respected by, for
> example, using our default backoff strategy instead, chances of success
> are very unlikely.
>
> This adds an AVOption to respect that header.
>
> Signed-off-by: Derek Buitenhuis
> ---
>  libavformat/http.c    | 12
>  libavformat/version.h | 2 +-
>  2 files changed, 13 insertions(+), 1 deletion(-)

Is this feature standardized in an RFC, or is it some other spec somewhere? I think it would be nice to have a link to a spec in the commit message here.

> diff --git a/libavformat/http.c b/libavformat/http.c
> index e7603037f4..5ed481b63a 100644
> --- a/libavformat/http.c
> +++ b/libavformat/http.c
> @@ -138,6 +138,8 @@ typedef struct HTTPContext {
>      char *new_location;
>      AVDictionary *redirect_cache;
>      uint64_t filesize_from_content_range;
> +    int respect_retry_after;
> +    unsigned int retry_after;
>  } HTTPContext;
>
>  #define OFFSET(x) offsetof(HTTPContext, x)
> @@ -176,6 +178,7 @@ static const AVOption options[] = {
>      { "reconnect_on_http_error", "list of http status codes to reconnect on", OFFSET(reconnect_on_http_error), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, D },
>      { "reconnect_streamed", "auto reconnect streamed / non seekable streams", OFFSET(reconnect_streamed), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, D },
>      { "reconnect_delay_max", "max reconnect delay in seconds after which to give up", OFFSET(reconnect_delay_max), AV_OPT_TYPE_INT, { .i64 = 120 }, 0, UINT_MAX/1000/1000, D },
> +    { "respect_retry_after", "respect the Retry-After header when retrying connections", OFFSET(respect_retry_after), AV_OPT_TYPE_BOOL, { .i64 = 1 }, 0, 1, D },
>      { "listen", "listen on HTTP", OFFSET(listen), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, 2, D | E },
>      { "resource", "The resource requested by a client", OFFSET(resource), AV_OPT_TYPE_STRING, { .str = NULL }, 0, 0, E },
>      { "reply_code", "The http status code to return to a client", OFFSET(reply_code), AV_OPT_TYPE_INT, { .i64 = 200}, INT_MIN, 599, E},
> @@ -386,6 +389,13 @@ redo:
>              reconnect_delay > s->reconnect_delay_max)
>              goto fail;
>
> +        if (s->respect_retry_after && s->retry_after > 0) {
> +            reconnect_delay = s->retry_after;

It'd be nice to have a comment clarifying the units of both values here, which apparently both happen to be integer seconds?

> +            if (reconnect_delay > s->reconnect_delay_max)
> +                goto fail;
> +            s->retry_after = 0;
> +        }
> +
>          av_log(h, AV_LOG_WARNING, "Will reconnect at %"PRIu64" in %d second(s).\n", off, reconnect_delay);
>          ret = ff_network_sleep_interruptible(1000U * 1000 * reconnect_delay, >interrupt_callback);
>          if (ret != AVERROR(ETIMEDOUT))
> @@ -1231,6 +1241,8 @@ static int process_line(URLContext *h, char *line, int line_count, int *parsed_h
>          parse_expires(s, p);
>      } else if (!av_strcasecmp(tag, "Cache-Control")) {
>          parse_cache_control(s, p);
> +    } else if (!av_strcasecmp(tag, "Retry-After")) {
> +        s->retry_after = strtoul(p, NULL, 10);

Can you add a comment here, to clarify what unit the value is expressed in?

// Martin
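As an illustration of the kind of unit comment requested above, here is a minimal sketch of parsing the integer form of Retry-After, where the value is a delay in whole seconds. It is not the code from the patch: demo_parse_retry_after is a hypothetical helper, and the HTTP-date form of the header is deliberately left unhandled (it returns 0 for it).

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a Retry-After value given as a non-negative integer.
 * Returns the delay in SECONDS, saturated to UINT_MAX on overflow,
 * or 0 if the value is not a plain integer (e.g. the date form). */
unsigned int demo_parse_retry_after(const char *value)
{
    char *end;
    unsigned long long seconds;

    /* Skip optional leading whitespace. */
    while (*value == ' ' || *value == '\t')
        value++;
    if (!isdigit((unsigned char)*value))
        return 0; /* date form or garbage: not handled in this sketch */

    errno = 0;
    seconds = strtoull(value, &end, 10);

    /* Allow trailing whitespace, reject anything else. */
    while (*end == ' ' || *end == '\t')
        end++;
    if (*end != '\0')
        return 0;

    if (errno == ERANGE || seconds > UINT_MAX)
        return UINT_MAX; /* saturate; the caller compares against its max delay */
    return (unsigned int)seconds;
}
```

Keeping the unit in the function comment and the variable name makes it clear that the result can be compared directly against a maximum delay that is also expressed in seconds.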
Re: [FFmpeg-devel] [PATCH v2 2/9] avformat/http: Use AVERROR_HTTP_TOO_MANY_REQUESTS
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:

> Added in thep previous commit.
>
> Signed-off-by: Derek Buitenhuis
> ---
>  libavformat/http.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/libavformat/http.c b/libavformat/http.c
> index ed20359552..bbace2694f 100644
> --- a/libavformat/http.c
> +++ b/libavformat/http.c
> @@ -286,6 +286,7 @@ static int http_should_reconnect(HTTPContext *s, int err)
>      case AVERROR_HTTP_UNAUTHORIZED:
>      case AVERROR_HTTP_FORBIDDEN:
>      case AVERROR_HTTP_NOT_FOUND:
> +    case AVERROR_HTTP_TOO_MANY_REQUESTS:
>      case AVERROR_HTTP_OTHER_4XX:
>          status_group = "4xx";
>          break;
> @@ -522,6 +523,7 @@ int ff_http_averror(int status_code, int default_averror)
>          case 401: return AVERROR_HTTP_UNAUTHORIZED;
>          case 403: return AVERROR_HTTP_FORBIDDEN;
>          case 404: return AVERROR_HTTP_NOT_FOUND;
> +        case 429: return AVERROR_HTTP_TOO_MANY_REQUESTS;
>          default: break;
>      }
>      if (status_code >= 400 && status_code <= 499)
> @@ -558,6 +560,10 @@ static int http_write_reply(URLContext* h, int status_code)
>          reply_code = 404;
>          reply_text = "Not Found";
>          break;
> +    case 429:
> +        reply_code = 429;
> +        reply_text = "Too Many Requests";
> +        break;
>      case 200:

This function seems to handle both the literal status codes, like 429, and also AVERROR style error codes, as when called from handle_http_errors, so perhaps it would be good for consistency to add the AVERROR here too.

// Martin
Re: [FFmpeg-devel] [PATCH v2 2/9] avformat/http: Use AVERROR_HTTP_TOO_MANY_REQUESTS
On Mon, 22 Apr 2024, Derek Buitenhuis wrote:

> Added in thep previous commit.

Typo in the commit message

// Martin
Re: [FFmpeg-devel] [PATCH] avdevice/avfoundation: fix macOS/iOS/tvOS SDK conditional checks
On Wed, 17 Apr 2024, Marvin Scholz wrote:

> This fixes the checks to properly use runtime feature detection and
> check the SDK version (*_MAX_ALLOWED) instead of the targeted version
> for the relevant APIs.

As these things are pretty hard to think straight about, it could be good to have a more concrete example of what this achieves. I.e. if building with -mmacosx-version-min=10.13, we can still use the macOS 10.15 specific APIs, if they were available at build time, via the runtime check.

> The target is still checked (*_MIN_REQUIRED) to avoid using deprecated
> methods when targeting new enough versions.
> ---
>  libavdevice/avfoundation.m | 164 ++---
>  1 file changed, 116 insertions(+), 48 deletions(-)

The diff is pretty hard to read as is, but when applied and viewed with "git show -w", it becomes clearer.

The change from TARGET_OS_IPHONE to TARGET_OS_IOS is pretty subtle; iirc TARGET_OS_IPHONE was any non-desktop platform (ios/tvos/watchos etc), while TARGET_OS_IOS specifically is iOS. The change looks right, but it might be good to spell this out as well.

Specifically also, that TARGET_OS_IPHONE covers a whole class of OSes, while TARGET_OS_IOS is one OS - but the version defines for that OS are __IPHONE_OS_VERSION_MIN_REQUIRED and __IPHONE_OS_VERSION_MAX_ALLOWED.

> +        /* If the targeted macOS is new enough, this fallback case can never be reached, so do not
> +         * use a deprecated API to avoid compiler warnings.
> +         */

This sentence gets somewhat tangled at some point, so I don't think it reads quite the way you meant it. What about this:

    If the targeted macOS is new enough, use of older APIs will cause deprecation warnings. Due to the availability check, we actually won't ever execute the code in such builds, but the compiler will still warn about it, unless we actually ifdef out the reference.

Outside of what the patch does, I see the existing file uses this construct in a few places:

    #if !TARGET_OS_IPHONE && __MAC_OS_X_VERSION_MIN_REQUIRED >= 1070

I think it would seem more consistent to update this to use TARGET_OS_OSX instead of negating TARGET_OS_IPHONE - or is there something I'm missing?

As for alternative ways of doing this, that would be less unwieldy - I have something like this in mind:

    #define SDK_AT_LEAST(macos, ios, tvos) \
        (TARGET_OS_OSX && MAC_OS_X_VERSION_MAX_ALLOWED >= macos) || \
        (TARGET_OS_IOS && __IPHONE_OS_VERSION_MAX_ALLOWED >= ios) || \
        (TARGET_OS_TV && __TV_OS_VERSION_MAX_ALLOWED >= tvos)

    #if SDK_AT_LEAST(__MAC_10_15, __IPHONE_10_0, __TVOS_17_0)

We could add similar macros for both SDK_AT_LEAST and TARGET_VERSION_AT_LEAST, and variants for different combinations of macos/ios/tvos for when we don't want to specify all of them.

We can't use defined(macos) etc within this context though, so if we want to go this way, we'd need to start out with ifdefs for all the defines we use, like this:

    #ifndef __MAC_10_15
    #define __MAC_10_15
    #endif

There's of course a bit of fragility here; we need to make sure that we actually copypaste the exact right value here. But on the other hand, we could even make it intentionally something else, e.g. like this:

    #ifndef __MAC_10_15
    // If the SDK doesn't define this constant, the SDK doesn't support this version anyway, and we won't end up selecting it, so just use a dummy value instead.
    #define __MAC_10_15
    #endif

What do you think, does any of that seem like it would make the code more manageable?

// Martin
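The macro idea above can be tried out in isolation with a small sketch that compiles on any platform. Everything prefixed DEMO_ is a hypothetical stand-in with made-up values chosen for the demonstration; on Apple platforms the real macros come from the SDK headers. The sketch also wraps the whole expansion in parentheses, which is a defensive choice of this example rather than something from the mail.

```c
/* Hypothetical stand-ins: pretend we build for macOS with a 10.15 SDK. */
#define DEMO_TARGET_OS_OSX         1
#define DEMO_TARGET_OS_IOS         0
#define DEMO_MACOS_SDK_MAX_ALLOWED 101500
#define DEMO_IOS_SDK_MAX_ALLOWED   0

/* Hypothetical version constants used as macro arguments. */
#define DEMO_MAC_10_15   101500
#define DEMO_MAC_14_0    140000
#define DEMO_IPHONE_13_0 130000

/* True if the SDK for the current target OS is at least the given version. */
#define DEMO_SDK_AT_LEAST(macos, ios) \
    ((DEMO_TARGET_OS_OSX && DEMO_MACOS_SDK_MAX_ALLOWED >= (macos)) || \
     (DEMO_TARGET_OS_IOS && DEMO_IOS_SDK_MAX_ALLOWED   >= (ios)))

/* Returns 1 if code needing the (pretend) 10.15 SDK would be compiled in. */
int demo_has_10_15_sdk(void)
{
#if DEMO_SDK_AT_LEAST(DEMO_MAC_10_15, DEMO_IPHONE_13_0)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if code needing the (pretend) 14.0 SDK would be compiled in. */
int demo_has_14_0_sdk(void)
{
#if DEMO_SDK_AT_LEAST(DEMO_MAC_14_0, DEMO_IPHONE_13_0)
    return 1;
#else
    return 0;
#endif
}
```

With these stand-in values only the 10.15 check selects its branch, which mirrors how a single condition could replace the per-OS #if chains in the file.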
Re: [FFmpeg-devel] [PATCH v3 0/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
On Wed, 17 Apr 2024, Ramiro Polla wrote:

> This patch set adds fdct to checkasm and neon-optimized fdct for aarch64.
>
> Ramiro Polla (2):
>   checkasm: add test for fdct
>   lavc/aarch64/fdct: add neon-optimized fdct for aarch64
>
>  libavcodec/aarch64/Makefile               | 2 +
>  libavcodec/aarch64/fdct.h                 | 26 ++
>  libavcodec/aarch64/fdctdsp_init_aarch64.c | 39 +++
>  libavcodec/aarch64/fdctdsp_neon.S         | 368 ++
>  libavcodec/avcodec.h                      | 1 +
>  libavcodec/fdctdsp.c                      | 4 +-
>  libavcodec/fdctdsp.h                      | 2 +
>  libavcodec/options_table.h                | 1 +
>  libavcodec/tests/aarch64/dct.c            | 2 +
>  tests/checkasm/Makefile                   | 1 +
>  tests/checkasm/checkasm.c                 | 3 +
>  tests/checkasm/checkasm.h                 | 1 +
>  tests/checkasm/fdctdsp.c                  | 68
>  tests/fate/checkasm.mak                   | 1 +
>  14 files changed, 518 insertions(+), 1 deletion(-)
>  create mode 100644 libavcodec/aarch64/fdct.h
>  create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
>  create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
>  create mode 100644 tests/checkasm/fdctdsp.c

LGTM, thanks!

// Martin
[FFmpeg-devel] [PATCH] Remove .travis.yml
Travis is no longer relevant for attempting to run CI jobs in our setup.
---
 .travis.yml | 30 --
 1 file changed, 30 deletions(-)
 delete mode 100644 .travis.yml

diff --git a/.travis.yml b/.travis.yml
deleted file mode 100644
index 784b7bdf73..00
--- a/.travis.yml
+++ /dev/null
@@ -1,30 +0,0 @@
-language: c
-sudo: false
-os:
-  - linux
-  - osx
-addons:
-  apt:
-    packages:
-      - nasm
-      - diffutils
-compiler:
-  - clang
-  - gcc
-matrix:
-    exclude:
-        - os: osx
-          compiler: gcc
-cache:
-  directories:
-    - ffmpeg-samples
-before_install:
-  - if [ "$TRAVIS_OS_NAME" == "osx" ]; then brew update; fi
-install:
-  - if [ "$TRAVIS_OS_NAME" == "osx" ]; then brew install nasm; fi
-script:
-  - mkdir -p ffmpeg-samples
-  - ./configure --samples=ffmpeg-samples --cc=$CC
-  - make -j 8
-  - make fate-rsync
-  - make check -j 8
--
2.39.3 (Apple Git-146)
Re: [FFmpeg-devel] [PATCH v2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
On Wed, 17 Apr 2024, Ramiro Polla wrote:

> The code is imported from libjpeg-turbo-3.0.1. The neon registers used
> have been changed to avoid modifying v8-v15.
> ---
>  libavcodec/aarch64/Makefile               | 2 +
>  libavcodec/aarch64/fdct.h                 | 26 ++
>  libavcodec/aarch64/fdctdsp_init_aarch64.c | 39 +++
>  libavcodec/aarch64/fdctdsp_neon.S         | 368 ++
>  libavcodec/avcodec.h                      | 1 +
>  libavcodec/fdctdsp.c                      | 4 +-
>  libavcodec/fdctdsp.h                      | 2 +
>  libavcodec/options_table.h                | 1 +
>  libavcodec/tests/aarch64/dct.c            | 2 +
>  tests/checkasm/Makefile                   | 1 +
>  tests/checkasm/checkasm.c                 | 3 +
>  tests/checkasm/checkasm.h                 | 1 +
>  tests/checkasm/fdctdsp.c                  | 68
>  tests/fate/checkasm.mak                   | 1 +
>  14 files changed, 518 insertions(+), 1 deletion(-)
>  create mode 100644 libavcodec/aarch64/fdct.h
>  create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
>  create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
>  create mode 100644 tests/checkasm/fdctdsp.c

Overall LGTM, thanks!

You may wish to split adding the checkasm test to a separate patch, before adding the new implementation.

I was surprised by the header libavcodec/aarch64/fdct.h which seemed redundant on first glance, but I see that this is needed for the dct test executable in libavcodec/tests/aarch64/dct.c, so I guess this is reasonable. (In most other asm implementations, we just declare the functions at the start of the *_init.c files.)

// Martin
Re: [FFmpeg-devel] [PATCH v2] tests/checkasm: add exclude_guest for non-x86 linux perf
On Wed, 10 Apr 2024, J. Dekker wrote:

> The exclude_guest option only has an effect on x86. Omitting
> 'exclude_guest' defaults to zero which implies that you can count guest
> events should you run one. Some non-x86 kernels just ignore it, while
> others (e.g. the Asahi Linux kernels) require the user to explicitly set
> the option to 1, i.e. the only behaviour that makes sense when counting
> guest events isn't supported.
>
> Signed-off-by: J. Dekker
> ---
>
> Made commit message clearer, no functional change since v1.
>
>  tests/checkasm/checkasm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
> index dcd2fd6957..8be6cb0f55 100644
> --- a/tests/checkasm/checkasm.c
> +++ b/tests/checkasm/checkasm.c
> @@ -742,6 +742,9 @@ static int bench_init_linux(void)
>          .disabled = 1, // start counting only on demand
>          .exclude_kernel = 1,
>          .exclude_hv = 1,
> +#if !ARCH_X86
> +        .exclude_guest = 1,
> +#endif
>      };
>
>      printf("benchmarking with Linux Perf Monitoring API\n");
> --
> 2.44.0

Thanks, the updated commit message feels more readable to me at least.

I'm not familiar with the perf API, but I tested perf on an aarch64 machine where perf benchmarking previously worked, and it still works after this change, so it seems ok.

// Martin
Re: [FFmpeg-devel] [PATCH] movenc: Allow writing timed ID3 metadata
On Tue, 9 Apr 2024, James Almer wrote:

> On 4/4/2024 7:29 AM, Martin Storsjö wrote:
>> This is based on a spec at https://aomediacodec.github.io/id3-emsg/,
>> further based on ISO/IEC 23009-1:2019.
>>
>> Within libavformat, timed ID3 metadata (already supported by the mpegts
>> demuxer and muxer) is handled as a separate data AVStream with codec
>> type AV_CODEC_ID_TIMED_ID3. However, it doesn't have a corresponding
>> track in the mov file - instead, these events are written as separate
>> toplevel 'emsg' boxes.
>> ---
>>  libavformat/movenc.c       | 49 -
>>  libavformat/tests/movenc.c | 55 +-
>>  tests/ref/fate/movenc      | 8 ++
>>  3 files changed, 104 insertions(+), 8 deletions(-)
>
> Should be ok.

Thanks for the review, pushed now.

// Martin
Re: [FFmpeg-devel] [PATCH] tests/movenc: Validate that normal muxer usage doesn't print warnings
On Thu, 4 Apr 2024, Martin Storsjö wrote:

> We have tests to make sure that certain configurations do print warnings.
> However, the normal operation of the muxer within this test always printed
> a warning, so those tests to check for extra warnings didn't essentially
> guard anything.
>
> The warning that always was printed, "track 1: codec frame size is not
> set" was not present in the libav fork where this testcase originated, it
> was removed in f234e8a32e6c69d7b63f8627f278be7c2c987f43.
>
> Set the frame size for the audio stream to silence the warning, and use
> this frame size in a couple later calculations, and check that one test
> configuration doesn't print warnings.
>
> Setting the frame size apparently changes the rounding of a timestamp in
> the ismv muxing testcase.
> ---
>  libavformat/tests/movenc.c | 10 --
>  tests/ref/fate/movenc      | 2 +-
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/libavformat/tests/movenc.c b/libavformat/tests/movenc.c
> index 77f73abdfa..12a3632d4e 100644
> --- a/libavformat/tests/movenc.c
> +++ b/libavformat/tests/movenc.c
> @@ -215,6 +215,7 @@ static void init_fps(int bf, int audio_preroll, int fps)
>      st->codecpar->codec_type = AVMEDIA_TYPE_AUDIO;
>      st->codecpar->codec_id = AV_CODEC_ID_AAC;
>      st->codecpar->sample_rate = 44100;
> +    st->codecpar->frame_size = 1024;
>      st->codecpar->ch_layout = (AVChannelLayout)AV_CHANNEL_LAYOUT_STEREO;
>      st->time_base.num = 1;
>      st->time_base.den = 44100;
> @@ -232,9 +233,10 @@ static void init_fps(int bf, int audio_preroll, int fps)
>      frames = 0;
>      gop_size = 30;
>      duration = video_st->time_base.den / fps;
> -    audio_duration = 1024LL * audio_st->time_base.den / audio_st->codecpar->sample_rate;
> +    audio_duration = (long long)audio_st->codecpar->frame_size *
> +                     audio_st->time_base.den / audio_st->codecpar->sample_rate;
>      if (audio_preroll)
> -        audio_preroll = 2048LL * audio_st->time_base.den / audio_st->codecpar->sample_rate;
> +        audio_preroll = 2 * audio_duration;
>
>      bframes = bf;
>      video_dts = bframes ? -duration : 0;
> @@ -442,6 +444,7 @@ int main(int argc, char **argv)
>      // Similar to the previous one, but with input that doesn't start at
>      // pts/dts 0. avoid_negative_ts behaves in the same way as
>      // in non-empty-moov-no-elst above.
> +    init_count_warnings();
>      init_out("empty-moov-no-elst");
>      av_dict_set(, "movflags", "+frag_keyframe+empty_moov", 0);
>      init(1, 0);
> @@ -449,6 +452,9 @@ int main(int argc, char **argv)
>      finish();
>      close_out();
>
> +    reset_count_warnings();
> +    check(num_warnings == 0, "Unexpected warnings printed");
> +
>      // Same as the previous one, but disable avoid_negative_ts (which
>      // would require using an edit list, but with empty_moov, one can't
>      // write a sensible edit list, when the start timestamps aren't known).
> diff --git a/tests/ref/fate/movenc b/tests/ref/fate/movenc
> index 968a3d27f2..0c77f5187c 100644
> --- a/tests/ref/fate/movenc
> +++ b/tests/ref/fate/movenc
> @@ -20,7 +20,7 @@ write_data len 828, time nopts, type unknown atom -
>  write_data len 728, time 99, type sync atom moof
>  write_data len 812, time nopts, type unknown atom -
>  write_data len 148, time nopts, type trailer atom -
> -92ce825ff40505ec8676191705adb7e7 4439 ismv
> +d2df24d323f4a8896441cd91203ac5f8 4439 ismv
>  write_data len 36, time nopts, type header atom ftyp
>  write_data len 1123, time nopts, type header atom -
>  write_data len 796, time 0, type sync atom moof
> --
> 2.39.3 (Apple Git-146)

Will push within a few days if there are no objections.

// Martin
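The duration arithmetic in the hunk above can be checked by hand with a small sketch. demo_frame_duration is a hypothetical helper that mirrors the expression from the patch (frame size times the time base denominator, divided by the sample rate); it is not part of the test program.

```c
#include <stdint.h>

/* Duration of one audio frame expressed in time base units, assuming a
 * time base of 1/time_base_den. The multiplication is done in 64 bits,
 * as the (long long) cast in the patch does. */
int64_t demo_frame_duration(int frame_size, int time_base_den, int sample_rate)
{
    return (int64_t)frame_size * time_base_den / sample_rate;
}
```

With the values from the test (frame size 1024, time base 1/44100, sample rate 44100) one frame lasts 1024 ticks, so the preroll of two frames is 2048 ticks, matching the constants that the patch replaces.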
Re: [FFmpeg-devel] [PATCH] movenc: Remove a leftover commented out line
On Thu, 4 Apr 2024, Martin Storsjö wrote:

> This line originates from 6f69f7a8bf6a0d013985578df2ef42ee6b1c7994.
> ---
>  libavformat/movenc.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/libavformat/movenc.c b/libavformat/movenc.c
> index 46a5b3a62f..ccdd2dbfc9 100644
> --- a/libavformat/movenc.c
> +++ b/libavformat/movenc.c
> @@ -1173,8 +1173,6 @@ static int get_samples_per_packet(MOVTrack *track)
>  {
>      int i, first_duration;
>
> -// return track->par->frame_size;
> -
>      /* use 1 for raw PCM */
>      if (!track->audio_vbr)
>          return 1;
> --
> 2.39.3 (Apple Git-146)

Will apply.

// Martin
Re: [FFmpeg-devel] [PATCH] aarch64: Factorize code for CPU feature detection on Apple platforms
On Tue, 12 Mar 2024, Martin Storsjö wrote:

---
 libavutil/aarch64/cpu.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 7a05391343..196bdaf6b0 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -45,22 +45,23 @@ static int detect_flags(void)
 #elif defined(__APPLE__) && HAVE_SYSCTLBYNAME
 #include

+static int have_feature(const char *feature) {
+    uint32_t value = 0;
+    size_t size = sizeof(value);
+    if (!sysctlbyname(feature, &value, &size, NULL, 0))
+        return value;
+    return 0;
+}
+
 static int detect_flags(void)
 {
-    uint32_t value = 0;
-    size_t size;
     int flags = 0;

-    size = sizeof(value);
-    if (!sysctlbyname("hw.optional.arm.FEAT_DotProd", &value, &size, NULL, 0)) {
-        if (value)
-            flags |= AV_CPU_FLAG_DOTPROD;
-    }
-    size = sizeof(value);
-    if (!sysctlbyname("hw.optional.arm.FEAT_I8MM", &value, &size, NULL, 0)) {
-        if (value)
-            flags |= AV_CPU_FLAG_I8MM;
-    }
+    if (have_feature("hw.optional.arm.FEAT_DotProd"))
+        flags |= AV_CPU_FLAG_DOTPROD;
+    if (have_feature("hw.optional.arm.FEAT_I8MM"))
+        flags |= AV_CPU_FLAG_I8MM;
+
     return flags;
 }

Will apply.

// Martin
Re: [FFmpeg-devel] [PATCH v3 3/5] configure: switch to shebang without space
On Tue, 9 Apr 2024, J. Dekker wrote:

Note that the config.sh file is left without a shebang; this file is supposed to be sourced into the current environment.

This commit is purely cosmetic.

Signed-off-by: J. Dekker
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, this set seems fine to me - the explanations seem good now.

(I'd consider merging patches 3-5 though, but keeping the full commit message from patch 3.)

// Martin
Re: [FFmpeg-devel] [PATCH v2 2/2] configure: simplify bigendian check
On Mon, 8 Apr 2024, J. Dekker wrote:

The preferred way to use LTO is --enable-lto but often times packagers still end up with -flto in cflags for various reasons. Using grep on binary object files is brittle and relies on specific object representation, which in the case of LLVM bitcode, debug-info or other intermediary formats can fail silently. This patch changes the check to a more commonly used define for big-endian systems.

It's not common only for big-endian systems, but for GCC-style compilers on all endians.

More checks may need to be added in the future to cover legacy machines.

Don't use the word "legacy" here. This define is not standard, so it's perfectly plausible to have a modern, standards compliant compiler that just doesn't use this define.

With the commit message you added here, the change is ok, but please do reword the last sentence above. I'd suggest changing the last paragraph into this:

---
This patch changes the check to a more commonly used define for GCC style compilers. More checks may be needed to cover other potential compilers that don't use the __BYTE_ORDER__ define.
---

// Martin
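To make the point about the define concrete, here is a minimal sketch in C (not the configure check itself, which is a shell test). It assumes only the GCC-style __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ macros; the function names are made up for illustration. When the macros are absent the compile-time helper reports "unknown", which is the case where another check would be needed.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time answer, available only on compilers that provide the
 * GCC-style macros. Returns 1 for big-endian, 0 for little-endian and
 * -1 when the macros are not defined (hypothetical helper name). */
static int is_bigendian_from_define(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
    return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
#else
    return -1;
#endif
}

/* Runtime probe, used here only to cross-check the define:
 * look at the first byte of a known 32-bit value in memory. */
static int is_bigendian_at_runtime(void)
{
    const uint32_t probe = 0x01020304;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

Unlike grepping an object file, neither helper depends on how the compiler represents objects on disk, so LTO bitcode does not affect the result.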
Re: [FFmpeg-devel] [PATCH v2 1/2] configure, etc: unify shebang usage
On Mon, 8 Apr 2024, J. Dekker wrote:

In some cases, these scripts can be called directly by packagers, and some systems require the interpreter to be explicit.

It is unclear to me which of the changes are needed and for what reason; please elaborate much more in the commit message.

Is it possible to elaborate on "some systems require the interpreter to be explicit"? It'd be much nicer to reason about if there was a concrete example of such a case (even if it certainly is right to add the missing shebang line).

The changes I see fall into these categories:

- Change "#! " into "#!". Does this change have a functional effect for someone (where, and why?) or is it purely a cosmetic change?

- Add a shebang line in the generated ffbuild/config.sh. This script is highly unlikely to be useful to call on its own like that, so while this probably is good for consistency I don't see it ever making a difference.

- Add a shebang line in ffbuild/libversion.sh. I can see the value in calling this script directly, outside of our build system. I presume this is the actual change that makes a difference here?

I don't mind the changes, but I'd prefer to split them into two separate commits: adding the missing shebangs (with an example of the case where it really does make a difference), vs removing extra spaces in shebangs for consistency (with explicit clarification in the commit message whether this is only for stylistic consistency or whether it does make a difference somewhere, and if it does, where).

// Martin
[FFmpeg-devel] [PATCH] aarch64: ac3dsp: Simplify the end of ff_ac3_sum_square_butterfly_float_neon
Before:                              Cortex A53    A72    A78
ac3_sum_square_butterfly_float_neon:     1005.7  516.5  224.5
After:
ac3_sum_square_butterfly_float_neon:      981.7  504.5  223.2
---
 libavcodec/aarch64/ac3dsp_neon.S | 16
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index 20beb6cc50..7e97cc39f7 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -103,17 +103,9 @@ function ff_ac3_sum_square_butterfly_float_neon, export=1
         fmla            v3.4s, v17.4s, v17.4s
         subs            w3, w3, #4
         b.gt            1b
-        faddp           v0.4s, v0.4s, v0.4s
-        faddp           v0.2s, v0.2s, v0.2s
-        st1             {v0.s}[0], [x0], #4
-        faddp           v1.4s, v1.4s, v1.4s
-        faddp           v1.2s, v1.2s, v1.2s
-        st1             {v1.s}[0], [x0], #4
-        faddp           v2.4s, v2.4s, v2.4s
-        faddp           v2.2s, v2.2s, v2.2s
-        st1             {v2.s}[0], [x0], #4
-        faddp           v3.4s, v3.4s, v3.4s
-        faddp           v3.2s, v3.2s, v3.2s
-        st1             {v3.s}[0], [x0]
+        faddp           v0.4s, v0.4s, v1.4s
+        faddp           v2.4s, v2.4s, v3.4s
+        faddp           v0.4s, v0.4s, v2.4s
+        st1             {v0.4s}, [x0]
         ret
 endfunc
--
2.39.3 (Apple Git-146)
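For readers less familiar with faddp, the shortened function tail can be modelled in scalar C as follows. This is only an illustrative sketch: faddp_4s and reduce_four_accumulators are made-up names, and the model assumes the usual semantics of the vector faddp (pairwise sums of the first operand in the low half, of the second operand in the high half).

```c
#include <stddef.h>

/* Scalar model of "faddp Vd.4s, Vn.4s, Vm.4s": the low half of the result
 * holds the pairwise sums of n, the high half the pairwise sums of m. */
static void faddp_4s(float d[4], const float n[4], const float m[4])
{
    float r[4];

    r[0] = n[0] + n[1];
    r[1] = n[2] + n[3];
    r[2] = m[0] + m[1];
    r[3] = m[2] + m[3];
    for (size_t i = 0; i < 4; i++)
        d[i] = r[i];            /* copied last, so d may alias n or m */
}

/* Reduce four 4-lane accumulators to four scalars with three pairwise
 * adds, mirroring the three faddp instructions before the single st1. */
static void reduce_four_accumulators(float sum[4],
                                     const float v0[4], const float v1[4],
                                     const float v2[4], const float v3[4])
{
    float a[4], b[4];

    faddp_4s(a, v0, v1);        /* partial sums of v0 and v1 */
    faddp_4s(b, v2, v3);        /* partial sums of v2 and v3 */
    faddp_4s(sum, a, b);        /* sum[k] = total of accumulator k */
}
```

Each total is still computed as (x0 + x1) + (x2 + x3), so the rounding should match the longer per-register sequence it replaces.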
Re: [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP
On Sat, 6 Apr 2024, Geoff Hill wrote:

Thanks Martin for your review and testing.

Here's v4 with the following changes:

* Use fmla in sum_square_butterfly_float loop. Faster.
* Removed redundant loop bound zero checks in extract_exponents, sum_square_butterfly_int32 and sum_square_butterfly_float.
* Fixed randomize_int24() to also use negative values.
* Carry copyright from arm implementation over to aarch64. I did use this version as reference.
* Fix indentation to match existing aarch64 assembly style.

Tested once again on aarch64 and x86.

Thanks, this set looked good, so I pushed it. I amended the commits a bit, moving the added copyright line in checkasm/ac3dsp.c from patch 1 to 2, where that file actually gets extended.

Actually, after pushing, I realized another thing that can be done better in ff_ac3_sum_square_butterfly_float_neon - I'll send a patch for that.

On AWS Graviton2 (t4g.medium), GCC 12.3:

$ tests/checkasm/checkasm --bench --test=ac3dsp
...
NEON:
 - ac3dsp.ac3_exponent_min [OK]
 - ac3dsp.ac3_extract_exponents [OK]
 - ac3dsp.float_to_fixed24 [OK]
 - ac3dsp.ac3_sum_square_butterfly_int32 [OK]
 - ac3dsp.ac3_sum_square_butterfly_float [OK]
checkasm: all 20 tests passed
float_to_fixed24_c: 2460.5
float_to_fixed24_neon: 561.5

FWIW, it's usually neater to include such numbers in the commit message, so they get brought along into the final git history (to show the benefit we got from the optimization at the time), quoting only those functions that are added/modified in each patch. I didn't amend that into the commit messages this time, but you can keep it in mind for the future.

Anyway, thanks for the patches!

// Martin
Re: [FFmpeg-devel] [PATCH v3 5/5] avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON
On Tue, 2 Apr 2024, Geoff Hill wrote:

Signed-off-by: Geoff Hill
---
 libavcodec/aarch64/ac3dsp_init_aarch64.c | 5
 libavcodec/aarch64/ac3dsp_neon.S | 35
 tests/checkasm/ac3dsp.c | 26 ++
 3 files changed, 66 insertions(+)

diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index fa8fcf2e47..4a78ec0b2a 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -88,3 +88,38 @@ function ff_ac3_sum_square_butterfly_int32_neon, export=1
         st1 {v0.1d-v3.1d}, [x0]
 1:      ret
 endfunc
+
+function ff_ac3_sum_square_butterfly_float_neon, export=1
+        cbz w3, 1f
+        movi v0.4s, #0
+        movi v1.4s, #0
+        movi v2.4s, #0
+        movi v3.4s, #0
+0:      ld1 {v30.4s}, [x1], #16
+        ld1 {v31.4s}, [x2], #16
+        fadd v16.4s, v30.4s, v31.4s
+        fsub v17.4s, v30.4s, v31.4s
+        fmul v30.4s, v30.4s, v30.4s
+        fadd v0.4s, v0.4s, v30.4s

The arm version here used vmla instead of separate vmul+vadd - is there any reason why we can't use fmla here?

// Martin
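As a reading aid for the loop body quoted above, here is a scalar sketch in C of what the kernel appears to compute, inferred from the fadd/fsub/fmul sequence in the patch. The function name is hypothetical and this is not claimed to be the C reference from libavcodec. Each "sum[k] += x * x" statement is one multiply-accumulate, which is the operation fmla performs per lane in a single instruction.

```c
/* Scalar sketch: accumulate the energy of the two input channels and of
 * their sum and difference (hypothetical helper, for illustration only). */
static void sum_square_butterfly_float_sketch(float sum[4],
                                              const float *coef0,
                                              const float *coef1,
                                              int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0.0f;

    for (int i = 0; i < len; i++) {
        float lt = coef0[i];
        float rt = coef1[i];
        float md = lt + rt;     /* fadd v16, v30, v31 */
        float sd = lt - rt;     /* fsub v17, v30, v31 */

        sum[0] += lt * lt;      /* multiply-accumulate into accumulator 0 */
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}
```

Note that a fused multiply-add rounds once instead of twice, so results from fmla can differ in the last bits from a separate fmul followed by fadd.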
Re: [FFmpeg-devel] [PATCH v3 4/5] avcodec/ac3: Implement sum_square_butterfly_int32 for aarch64 NEON
On Tue, 2 Apr 2024, Geoff Hill wrote:

Signed-off-by: Geoff Hill
---
 libavcodec/aarch64/ac3dsp_init_aarch64.c | 5 +
 libavcodec/aarch64/ac3dsp_neon.S | 24 +
 tests/checkasm/ac3dsp.c | 27
 3 files changed, 56 insertions(+)

diff --git a/libavcodec/aarch64/ac3dsp_init_aarch64.c b/libavcodec/aarch64/ac3dsp_init_aarch64.c
index 1bdc215b51..e95436c651 100644
--- a/libavcodec/aarch64/ac3dsp_init_aarch64.c
+++ b/libavcodec/aarch64/ac3dsp_init_aarch64.c
@@ -28,6 +28,10 @@
 void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 void ff_ac3_extract_exponents_neon(uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_float_to_fixed24_neon(int32_t *dst, const float *src, size_t len);
+void ff_ac3_sum_square_butterfly_int32_neon(int64_t sum[4],
+                                            const int32_t *coef0,
+                                            const int32_t *coef1,
+                                            int len);

 av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
 {
@@ -37,4 +41,5 @@ av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
     c->ac3_exponent_min = ff_ac3_exponent_min_neon;
     c->extract_exponents = ff_ac3_extract_exponents_neon;
     c->float_to_fixed24 = ff_float_to_fixed24_neon;
+    c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_neon;
 }
diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index b26f71a3f6..fa8fcf2e47 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -64,3 +64,27 @@ function ff_float_to_fixed24_neon, export=1
         b.ne 0b
         ret
 endfunc
+
+function ff_ac3_sum_square_butterfly_int32_neon, export=1
+        cbz w3, 1f

The arm version of this patch doesn't have any corresponding check for whether this parameter is zero, and the checkasm test doesn't test that behaviour either. Is that never feasible (and we could leave it out here) or should we test that and fix it in other assembly versions? In the latter case, it's of course ok to defer that to a separate later patch, not holding up this one.

// Martin
Re: [FFmpeg-devel] [PATCH v3 0/5] avcodec/ac3: Add aarch64 NEON DSP
On Tue, 2 Apr 2024, Geoff Hill wrote:

Here's v3 to push the AC-3 ARMv8 NEON experiment a step further. This version implements 5 of the AC-3 encoder DSP functions, and adds checkasm tests where missing. I've tested that the checkasm tests pass on aarch64 and x86.

Thanks, I've tested that checkasm also passes on 32 bit arm (where we also do have an ac3dsp implementation).

Overall the patches look mostly fine.

Are these implementations based on the existing 32 bit arm ones? The code is quite similar (although there aren't very many different ways to implement things, so this could be a coincidence). If based on the existing code, it would be good to retain the copyright statement from that file.

These functions have a different indentation than the rest of essentially all our aarch64 assembly (the code you're adding is aligned in two different ways) - please check other files (e.g. vp8dsp_neon.S) for examples. The instructions should be aligned to 8 leading spaces, and operands to 24 leading characters.

Other than those generic points, I have two comments on the patches themselves.

// Martin
[FFmpeg-devel] [PATCH] movenc: Allow writing timed ID3 metadata
This is based on a spec at https://aomediacodec.github.io/id3-emsg/, further based on ISO/IEC 23009-1:2019.

Within libavformat, timed ID3 metadata (already supported by the mpegts demuxer and muxer) is handled as a separate data AVStream with codec ID AV_CODEC_ID_TIMED_ID3. However, it doesn't have a corresponding track in the mov file - instead, these events are written as separate toplevel 'emsg' boxes.
---
 libavformat/movenc.c | 49 -
 libavformat/tests/movenc.c | 55 +-
 tests/ref/fate/movenc | 8 ++
 3 files changed, 104 insertions(+), 8 deletions(-)

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index ccdd2dbfc9..29b1e4bb0f 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -5515,7 +5515,7 @@ static int mov_write_ftyp_tag(AVIOContext *pb, AVFormatContext *s)
 {
     MOVMuxContext *mov = s->priv_data;
     int64_t pos = avio_tell(pb);
-    int has_h264 = 0, has_av1 = 0, has_video = 0, has_dolby = 0;
+    int has_h264 = 0, has_av1 = 0, has_video = 0, has_dolby = 0, has_id3 = 0;
     int has_iamf = 0;

     for (int i = 0; i < s->nb_stream_groups; i++) {
@@ -5544,6 +5544,8 @@ static int mov_write_ftyp_tag(AVIOContext *pb, AVFormatContext *s)
                                     st->codecpar->nb_coded_side_data,
                                     AV_PKT_DATA_DOVI_CONF))
             has_dolby = 1;
+        if (st->codecpar->codec_id == AV_CODEC_ID_TIMED_ID3)
+            has_id3 = 1;
     }

     avio_wb32(pb, 0); /* size */
@@ -5623,6 +5625,9 @@ static int mov_write_ftyp_tag(AVIOContext *pb, AVFormatContext *s)
     if (mov->flags & FF_MOV_FLAG_DASH && mov->flags & FF_MOV_FLAG_GLOBAL_SIDX)
         ffio_wfourcc(pb, "dash");

+    if (has_id3)
+        ffio_wfourcc(pb, "aid3");
+
     return update_size(pb, pos);
 }

@@ -6704,6 +6709,34 @@ static int mov_build_iamf_packet(AVFormatContext *s, MOVTrack *trk, AVPacket *pk
     return ret;
 }

+static int mov_write_emsg_tag(AVIOContext *pb, AVStream *st, AVPacket *pkt)
+{
+    int64_t pos = avio_tell(pb);
+    const char *scheme_id_uri = "https://aomedia.org/emsg/ID3";
+    const char *value = "";
+
+    av_assert0(st->time_base.num == 1);
+
+    avio_write_marker(pb,
+                      av_rescale_q(pkt->pts, st->time_base, AV_TIME_BASE_Q),
+                      AVIO_DATA_MARKER_BOUNDARY_POINT);
+
+    avio_wb32(pb, 0); /* size */
+    ffio_wfourcc(pb, "emsg");
+    avio_w8(pb, 1); /* version */
+    avio_wb24(pb, 0);
+    avio_wb32(pb, st->time_base.den); /* timescale */
+    avio_wb64(pb, pkt->pts); /* presentation_time */
+    avio_wb32(pb, 0xU); /* event_duration */
+    avio_wb32(pb, 0); /* id */
+    /* null terminated UTF8 strings */
+    avio_write(pb, scheme_id_uri, strlen(scheme_id_uri) + 1);
+    avio_write(pb, value, strlen(value) + 1);
+    avio_write(pb, pkt->data, pkt->size);
+
+    return update_size(pb, pos);
+}
+
 static int mov_write_packet(AVFormatContext *s, AVPacket *pkt)
 {
     MOVMuxContext *mov = s->priv_data;
@@ -6714,6 +6747,11 @@ static int mov_write_packet(AVFormatContext *s, AVPacket *pkt)
         return 1;
     }

+    if (s->streams[pkt->stream_index]->codecpar->codec_id == AV_CODEC_ID_TIMED_ID3) {
+        mov_write_emsg_tag(s->pb, s->streams[pkt->stream_index], pkt);
+        return 0;
+    }
+
     trk = s->streams[pkt->stream_index]->priv_data;

     if (trk->iamf) {
@@ -7365,6 +7403,12 @@ static int mov_init(AVFormatContext *s)
         AVStream *st = s->streams[i];
         if (st->priv_data)
             continue;
+        // Don't produce a track in the output file for timed ID3 streams.
+        if (st->codecpar->codec_id == AV_CODEC_ID_TIMED_ID3) {
+            // Leave priv_data set to NULL for these AVStreams that don't
+            // have a corresponding track.
+            continue;
+        }
         st->priv_data = st;
         mov->nb_tracks++;
     }
@@ -7462,6 +7506,9 @@ static int mov_init(AVFormatContext *s)
         MOVTrack *track = st->priv_data;
         AVDictionaryEntry *lang = av_dict_get(st->metadata, "language", NULL,0);

+        if (!track)
+            continue;
+
         if (!track->st) {
             track->st = st;
             track->par = st->codecpar;
diff --git a/libavformat/tests/movenc.c b/libavformat/tests/movenc.c
index 12a3632d4e..2fd5c67e76 100644
--- a/libavformat/tests/movenc.c
+++ b/libavformat/tests/movenc.c
@@ -58,7 +58,7 @@
 struct AVMD5* md5;
 uint8_t hash[HASH_SIZE];
 AVPacket *pkt;
-AVStream *video_st, *audio_st;
+AVStream *video_st, *audio_st, *id3_st;
 int64_t audio_dts, video_dts;

 int bframes;
@@ -177,7 +177,7 @@ static void check_func(int value, int line, const char *msg, ...)
 }
 #define check(value, ...) check_func(value, __LINE__, __VA_ARGS__)

-static void init_fps(int bf, int audio_preroll, int fps)
+static void init_fps(int bf, int audio_preroll, int fps, int id3)
 {
     AVStream *st;
     int iobuf_size =
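The byte layout that mov_write_emsg_tag produces can be sketched without libavformat as below. This is a standalone illustration, not the muxer code: write_emsg_v1_sketch and put_be are hypothetical helpers, the size is computed up front instead of being patched in afterwards, and the event_duration value 0xFFFFFFFF ("unknown duration") is an assumption, since the constant is not legible in the patch text above.

```c
#include <stdint.h>
#include <string.h>

/* Write the low nbytes of v to dst in big-endian order. */
static size_t put_be(uint8_t *dst, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
    return (size_t)nbytes;
}

/* Serialize a version 1 'emsg' box carrying an ID3 payload into buf.
 * Returns the box size in bytes, or 0 if it does not fit. */
static size_t write_emsg_v1_sketch(uint8_t *buf, size_t cap,
                                   uint32_t timescale, uint64_t pts,
                                   const uint8_t *payload, size_t payload_size)
{
    static const char scheme_id_uri[] = "https://aomedia.org/emsg/ID3";
    static const char value[] = "";
    /* 32 fixed bytes, then two NUL-terminated strings, then the payload */
    size_t total = 32 + sizeof(scheme_id_uri) + sizeof(value) + payload_size;
    size_t pos = 0;

    if (total > cap || total > UINT32_MAX)
        return 0;

    pos += put_be(buf + pos, total, 4);         /* size */
    memcpy(buf + pos, "emsg", 4);               /* box type */
    pos += 4;
    pos += put_be(buf + pos, 1, 1);             /* version */
    pos += put_be(buf + pos, 0, 3);             /* flags */
    pos += put_be(buf + pos, timescale, 4);     /* timescale */
    pos += put_be(buf + pos, pts, 8);           /* presentation_time */
    pos += put_be(buf + pos, 0xFFFFFFFFu, 4);   /* event_duration (assumed) */
    pos += put_be(buf + pos, 0, 4);             /* id */
    memcpy(buf + pos, scheme_id_uri, sizeof(scheme_id_uri));
    pos += sizeof(scheme_id_uri);
    memcpy(buf + pos, value, sizeof(value));
    pos += sizeof(value);
    if (payload_size)
        memcpy(buf + pos, payload, payload_size);
    pos += payload_size;

    return pos;
}
```

Because the timestamp is stored as a 64-bit presentation_time in the stream's own timescale, the sketch, like the patch, needs no per-track state to place the event.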
[FFmpeg-devel] [PATCH] tests/movenc: Validate that normal muxer usage doesn't print warnings
We have tests to make sure that certain configurations do print warnings. However, the normal operation of the muxer within this test always printed a warning, so the tests that check for extra warnings didn't actually guard anything.

The warning that always was printed, "track 1: codec frame size is not set", was not present in the libav fork where this testcase originated; it was removed in f234e8a32e6c69d7b63f8627f278be7c2c987f43.

Set the frame size for the audio stream to silence the warning, use this frame size in a couple of later calculations, and check that one test configuration doesn't print warnings.

Setting the frame size apparently changes the rounding of a timestamp in the ismv muxing testcase.
---
 libavformat/tests/movenc.c | 10 --
 tests/ref/fate/movenc | 2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/libavformat/tests/movenc.c b/libavformat/tests/movenc.c
index 77f73abdfa..12a3632d4e 100644
--- a/libavformat/tests/movenc.c
+++ b/libavformat/tests/movenc.c
@@ -215,6 +215,7 @@ static void init_fps(int bf, int audio_preroll, int fps)
     st->codecpar->codec_type = AVMEDIA_TYPE_AUDIO;
     st->codecpar->codec_id = AV_CODEC_ID_AAC;
     st->codecpar->sample_rate = 44100;
+    st->codecpar->frame_size = 1024;
     st->codecpar->ch_layout = (AVChannelLayout)AV_CHANNEL_LAYOUT_STEREO;
     st->time_base.num = 1;
     st->time_base.den = 44100;
@@ -232,9 +233,10 @@ static void init_fps(int bf, int audio_preroll, int fps)
     frames = 0;
     gop_size = 30;
     duration = video_st->time_base.den / fps;
-    audio_duration = 1024LL * audio_st->time_base.den / audio_st->codecpar->sample_rate;
+    audio_duration = (long long)audio_st->codecpar->frame_size *
+                     audio_st->time_base.den / audio_st->codecpar->sample_rate;
     if (audio_preroll)
-        audio_preroll = 2048LL * audio_st->time_base.den / audio_st->codecpar->sample_rate;
+        audio_preroll = 2 * audio_duration;

     bframes = bf;
     video_dts = bframes ? -duration : 0;
@@ -442,6 +444,7 @@ int main(int argc, char **argv)
     // Similar to the previous one, but with input that doesn't start at
     // pts/dts 0. avoid_negative_ts behaves in the same way as
     // in non-empty-moov-no-elst above.
+    init_count_warnings();
     init_out("empty-moov-no-elst");
     av_dict_set(, "movflags", "+frag_keyframe+empty_moov", 0);
     init(1, 0);
@@ -449,6 +452,9 @@ int main(int argc, char **argv)
     finish();
     close_out();

+    reset_count_warnings();
+    check(num_warnings == 0, "Unexpected warnings printed");
+
     // Same as the previous one, but disable avoid_negative_ts (which
     // would require using an edit list, but with empty_moov, one can't
     // write a sensible edit list, when the start timestamps aren't known).
diff --git a/tests/ref/fate/movenc b/tests/ref/fate/movenc
index 968a3d27f2..0c77f5187c 100644
--- a/tests/ref/fate/movenc
+++ b/tests/ref/fate/movenc
@@ -20,7 +20,7 @@ write_data len 828, time nopts, type unknown atom -
 write_data len 728, time 99, type sync atom moof
 write_data len 812, time nopts, type unknown atom -
 write_data len 148, time nopts, type trailer atom -
-92ce825ff40505ec8676191705adb7e7 4439 ismv
+d2df24d323f4a8896441cd91203ac5f8 4439 ismv
 write_data len 36, time nopts, type header atom ftyp
 write_data len 1123, time nopts, type header atom -
 write_data len 796, time 0, type sync atom moof
--
2.39.3 (Apple Git-146)
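The duration arithmetic touched by this patch is small enough to check by hand. The sketch below restates it as a standalone C helper (audio_frame_duration is a made-up name; the test program computes the same expression inline). With frame_size 1024 and a 1/44100 time base at 44100 Hz, one frame lasts 1024 ticks and the two-frame preroll 2048 ticks, matching the constants that were hard-coded before.

```c
#include <stdint.h>

/* Duration of one audio frame in time_base units, where the time base is
 * 1/time_base_den. The multiplication is done in 64 bits before dividing,
 * as in the patched test. */
static int64_t audio_frame_duration(int frame_size, int time_base_den,
                                    int sample_rate)
{
    return (int64_t)frame_size * time_base_den / sample_rate;
}
```

The integer division truncates, so the result is exact only when the time base denominator is a multiple of the sample rate divided by any common factor with the frame size.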
[FFmpeg-devel] [PATCH] movenc: Remove a leftover commented out line
This line originates from 6f69f7a8bf6a0d013985578df2ef42ee6b1c7994.
---
 libavformat/movenc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libavformat/movenc.c b/libavformat/movenc.c
index 46a5b3a62f..ccdd2dbfc9 100644
--- a/libavformat/movenc.c
+++ b/libavformat/movenc.c
@@ -1173,8 +1173,6 @@ static int get_samples_per_packet(MOVTrack *track)
 {
     int i, first_duration;

-// return track->par->frame_size;
-
     /* use 1 for raw PCM */
     if (!track->audio_vbr)
         return 1;
--
2.39.3 (Apple Git-146)
[FFmpeg-devel] [GASPP PATCH] Implicitly start out in the text section for armasm
This fixes assembling files starting with bare symbol declarations, without explicitly switching to .text first.
---
 gas-preprocessor.pl | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index 2880858..b66181a 100755
--- a/gas-preprocessor.pl
+++ b/gas-preprocessor.pl
@@ -289,6 +289,9 @@ my %aarch64_req_alias;
 if ($force_thumb) {
     parse_line(".thumb\n");
 }
+if ($as_type eq "armasm") {
+    parse_line(".text\n");
+}

 # pass 1: parse .macro
 # note that the handling of arguments is probably overly permissive vs. gas
--
2.34.1
Re: [FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions
On Tue, 26 Mar 2024, Jean-Baptiste Kempf wrote:

On Mon, 25 Mar 2024, at 22:56, J. Dekker wrote:

On Mon, 25 Mar 2024, Martin Storsjö wrote:

For some time, we have had pretty complete AArch64 NEON coverage for the hevc decoder. However, some of these functions require the I8MM instruction set extension, and many of them (but not all) lack a plain NEON version.

This patchset fills in a regular NEON version of all functions where we have an I8MM function.

For context; the I8MM instruction set extension is a mandatory part of armv8.6-a. E.g. Apple M2, AWS Graviton 3 have it, but Apple M1 and Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402 fps to 649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation; two functions relied on the contents of the stack, below the stack pointer, being untouched within a function. If a signal gets delivered, those parts of the stack could be clobbered.

I know this is a bit short notice for a patchset of this size - but, would people be OK with merging this patchset before the impending 7.0 branch (which is made within the next 24h)?

The patches pass all my tricky build configurations, they give a very non-negligible speedup on many common CPUs, and patch #2 fixes a real bug in the existing implementations. (A bug fix patch can of course be backported after the branch too, but performance optimizations aren't generally relevant for backporting.)

// Martin

Yes, please. I will tomorrow morning if you didn’t already push.

+1

Thanks, I pushed this set now.

// Martin
Re: [FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions
On Mon, 25 Mar 2024, Martin Storsjö wrote:

For some time, we have had pretty complete AArch64 NEON coverage for the hevc decoder. However, some of these functions require the I8MM instruction set extension, and many of them (but not all) lack a plain NEON version.

This patchset fills in a regular NEON version of all functions where we have an I8MM function.

For context; the I8MM instruction set extension is a mandatory part of armv8.6-a. E.g. Apple M2, AWS Graviton 3 have it, but Apple M1 and Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402 fps to 649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation; two functions relied on the contents of the stack, below the stack pointer, being untouched within a function. If a signal gets delivered, those parts of the stack could be clobbered.

I know this is a bit short notice for a patchset of this size - but, would people be OK with merging this patchset before the impending 7.0 branch (which is made within the next 24h)?

The patches pass all my tricky build configurations, they give a very non-negligible speedup on many common CPUs, and patch #2 fixes a real bug in the existing implementations. (A bug fix patch can of course be backported after the branch too, but performance optimizations aren't generally relevant for backporting.)

// Martin
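The effect of filling in plain NEON versions can be pictured with a small dispatch sketch in C. Everything here is hypothetical (flag names, kernel identifiers, select_kernel); it only illustrates the layering that the init code in the following patches appears to use: assign the NEON function first, then let the I8MM variant override it when the CPU reports that extension.

```c
/* Hypothetical CPU feature flags, for illustration only. */
enum {
    SKETCH_CPU_NEON = 1 << 0,
    SKETCH_CPU_I8MM = 1 << 1,
};

/* Identifiers standing in for the C, NEON and I8MM implementations. */
enum sketch_kernel {
    SKETCH_KERNEL_C,
    SKETCH_KERNEL_NEON,
    SKETCH_KERNEL_I8MM,
};

/* Pick the best available implementation: later checks override earlier
 * assignments, so the most specific extension wins. */
static enum sketch_kernel select_kernel(int cpu_flags)
{
    enum sketch_kernel k = SKETCH_KERNEL_C;

    if (cpu_flags & SKETCH_CPU_NEON)
        k = SKETCH_KERNEL_NEON;
    if (cpu_flags & SKETCH_CPU_I8MM)
        k = SKETCH_KERNEL_I8MM;
    return k;
}
```

Before the patchset, a CPU with NEON but without I8MM (such as the Apple M1 mentioned above) fell back to C for the functions that only had an I8MM version; with the added NEON versions the middle case is covered.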
[FFmpeg-devel] [PATCH 21/21] aarch64: hevc: Produce plain neon versions of qpel_bi_hv
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7.

By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which we store on the stack.

AWS Graviton 3:
put_hevc_qpel_bi_hv4_8_c: 385.7
put_hevc_qpel_bi_hv4_8_neon: 131.0
put_hevc_qpel_bi_hv4_8_i8mm: 92.2
put_hevc_qpel_bi_hv6_8_c: 701.0
put_hevc_qpel_bi_hv6_8_neon: 239.5
put_hevc_qpel_bi_hv6_8_i8mm: 191.0
put_hevc_qpel_bi_hv8_8_c: 1162.0
put_hevc_qpel_bi_hv8_8_neon: 228.0
put_hevc_qpel_bi_hv8_8_i8mm: 225.2
put_hevc_qpel_bi_hv12_8_c: 2305.0
put_hevc_qpel_bi_hv12_8_neon: 558.0
put_hevc_qpel_bi_hv12_8_i8mm: 483.2
put_hevc_qpel_bi_hv16_8_c: 3965.2
put_hevc_qpel_bi_hv16_8_neon: 732.7
put_hevc_qpel_bi_hv16_8_i8mm: 656.5
put_hevc_qpel_bi_hv24_8_c: 8709.7
put_hevc_qpel_bi_hv24_8_neon: 1555.2
put_hevc_qpel_bi_hv24_8_i8mm: 1448.7
put_hevc_qpel_bi_hv32_8_c: 14818.0
put_hevc_qpel_bi_hv32_8_neon: 2763.7
put_hevc_qpel_bi_hv32_8_i8mm: 2468.0
put_hevc_qpel_bi_hv48_8_c: 32855.5
put_hevc_qpel_bi_hv48_8_neon: 6107.2
put_hevc_qpel_bi_hv48_8_i8mm: 5452.7
put_hevc_qpel_bi_hv64_8_c: 57591.5
put_hevc_qpel_bi_hv64_8_neon: 10660.2
put_hevc_qpel_bi_hv64_8_i8mm: 9580.0
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 +
 libavcodec/aarch64/hevcdsp_qpel_neon.S| 164 +-
 2 files changed, 103 insertions(+), 66 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index e9ee901322..e24dd0cbda 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -319,6 +319,10 @@ NEON8_FNPROTO(qpel_bi_v, (uint8_t *dst, ptrdiff_t dststride,
         const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
         int height, intptr_t mx, intptr_t my, int width),);

+NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride,
+        const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
+        int height, intptr_t mx, intptr_t my, int width),);
+
 NEON8_FNPROTO(qpel_bi_hv, (uint8_t *dst, ptrdiff_t dststride,
         const uint8_t *src, ptrdiff_t srcstride, const int16_t *src2,
         int height, intptr_t mx, intptr_t my, int width), _i8mm);
@@ -452,6 +456,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv,);
         NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 1, qpel_uni_hv,);
         NEON8_FNASSIGN_PARTIAL_5(c->put_hevc_qpel_uni_w, 1, 1, qpel_uni_w_hv,);
+        NEON8_FNASSIGN(c->put_hevc_qpel_bi, 1, 1, qpel_bi_hv,);

         if (have_i8mm(cpu_flags)) {
             NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h, _i8mm);
diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index df7032b692..8ddaa32b70 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -4590,14 +4590,6 @@ endfunc

 qpel_uni_w_hv neon

-#if HAVE_I8MM
-ENABLE_I8MM
-
-qpel_uni_w_hv neon_i8mm
-
-DISABLE_I8MM
-#endif
-
 function hevc_put_hevc_qpel_bi_hv4_8_end_neon
         mov             x9, #(MAX_PB_SIZE * 2)
         load_qpel_filterh x7, x6
@@ -4620,7 +4612,8 @@ function hevc_put_hevc_qpel_bi_hv4_8_end_neon
 .endm
 1:      calc_all
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc

 function hevc_put_hevc_qpel_bi_hv6_8_end_neon
@@ -4650,7 +4643,8 @@ function hevc_put_hevc_qpel_bi_hv6_8_end_neon
 .endm
 1:      calc_all
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc

 function hevc_put_hevc_qpel_bi_hv8_8_end_neon
@@ -4678,7 +4672,8 @@ function hevc_put_hevc_qpel_bi_hv8_8_end_neon
 .endm
 1:      calc_all
 .purgem calc
-2:      ret
+2:      mov             sp, x14
+        ret
 endfunc

 function hevc_put_hevc_qpel_bi_hv16_8_end_neon
@@ -4723,83 +4718,87 @@ function hevc_put_hevc_qpel_bi_hv16_8_end_neon
         subs            x10, x10, #16
         add             x4, x4, #32
         b.ne            0b
-        add             w10, w5, #7
-        lsl             x10, x10, #7
-        sub             x10, x10, x6, lsl #1 // part of first line
-        add             sp, sp, x10 // tmp_array without first line
+        mov             sp, x14
         ret
 endfunc

-#if HAVE_I8MM
-ENABLE_I8MM
-
-function ff_hevc_put_hevc_qpel_bi_hv4_8_neon_i8mm, export=1
-        add             w10, w5, #7
+.macro qpel_bi_hv suffix
+function ff_hevc_put_hevc_qpel_bi_hv4_8_\suffix, export=1
+        add             w10, w5, #8
         lsl             x10, x10, #7
+        mov             x14, sp
         sub             sp, sp, x10 // tmp_array
-        stp             x7, x30, [sp, #-48]!
+        stp             x7, x30, [sp, #-64]!
         stp             x4, x5, [sp, #16]
         stp             x0, x1, [sp, #32]
+        str             x14,[sp,
[FFmpeg-devel] [PATCH 20/21] aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. AWS Graviton 3: put_hevc_qpel_uni_w_hv4_8_c: 422.2 put_hevc_qpel_uni_w_hv4_8_neon: 140.7 put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7 put_hevc_qpel_uni_w_hv8_8_c: 1208.0 put_hevc_qpel_uni_w_hv8_8_neon: 268.2 put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5 put_hevc_qpel_uni_w_hv16_8_c: 4297.2 put_hevc_qpel_uni_w_hv16_8_neon: 802.2 put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2 put_hevc_qpel_uni_w_hv32_8_c: 15518.5 put_hevc_qpel_uni_w_hv32_8_neon: 3085.2 put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2 put_hevc_qpel_uni_w_hv64_8_c: 57254.5 put_hevc_qpel_uni_w_hv64_8_neon: 11787.5 put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0 --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +++ libavcodec/aarch64/hevcdsp_qpel_neon.S| 47 +++ 2 files changed, 37 insertions(+), 16 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 0531db027b..e9ee901322 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -305,6 +305,11 @@ NEON8_FNPROTO(epel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, int height, int denom, int wx, int ox, intptr_t mx, intptr_t my, int width), _i8mm); +NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, +const uint8_t *_src, ptrdiff_t _srcstride, +int height, int denom, int wx, int ox, +intptr_t mx, intptr_t my, int width),); + NEON8_FNPROTO_PARTIAL_5(qpel_uni_w_hv, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride, int height, int denom, int wx, int ox, @@ -446,6 +451,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth) NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv,); NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 1, qpel_uni_hv,); +NEON8_FNASSIGN_PARTIAL_5(c->put_hevc_qpel_uni_w, 1, 1, qpel_uni_w_hv,); if (have_i8mm(cpu_flags)) { NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h, _i8mm); diff 
--git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index f285ab7461..df7032b692 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -4164,7 +4164,7 @@ qpel_hv neon_i8mm DISABLE_I8MM #endif -.macro QPEL_UNI_W_HV_HEADER width +.macro QPEL_UNI_W_HV_HEADER width, suffix ldp x14, x15, [sp] // mx, my ldr w13, [sp, #16] // width stp x19, x30, [sp, #-80]! @@ -4173,7 +4173,7 @@ DISABLE_I8MM stp x24, x25, [sp, #48] stp x26, x27, [sp, #64] mov x19, sp -mov x11, #9088 +mov x11, #(MAX_PB_SIZE*(MAX_PB_SIZE+8)*2) sub sp, sp, x11 mov x20, x0 mov x21, x1 @@ -4190,7 +4190,16 @@ DISABLE_I8MM mov w26, #-6 sub w26, w26, w5// -shift mov w27, w13// width -bl X(ff_hevc_put_hevc_qpel_h\width\()_8_neon_i8mm) +.ifc \suffix, neon +.if \width >= 32 +mov w6, #\width +bl X(ff_hevc_put_hevc_qpel_h32_8_neon) +.else +bl X(ff_hevc_put_hevc_qpel_h\width\()_8_\suffix) +.endif +.else +bl X(ff_hevc_put_hevc_qpel_h\width\()_8_\suffix) +.endif movrel x9, qpel_filters add x9, x9, x23, lsl #3 ld1 {v0.8b}, [x9] @@ -4552,33 +4561,39 @@ function hevc_put_hevc_qpel_uni_w_hv16_8_end_neon ret endfunc -#if HAVE_I8MM -ENABLE_I8MM - -function ff_hevc_put_hevc_qpel_uni_w_hv4_8_neon_i8mm, export=1 -QPEL_UNI_W_HV_HEADER 4 +.macro qpel_uni_w_hv suffix +function ff_hevc_put_hevc_qpel_uni_w_hv4_8_\suffix, export=1 +QPEL_UNI_W_HV_HEADER 4, \suffix b hevc_put_hevc_qpel_uni_w_hv4_8_end_neon endfunc -function ff_hevc_put_hevc_qpel_uni_w_hv8_8_neon_i8mm, export=1 -QPEL_UNI_W_HV_HEADER 8 +function ff_hevc_put_hevc_qpel_uni_w_hv8_8_\suffix, export=1 +QPEL_UNI_W_HV_HEADER 8, \suffix b hevc_put_hevc_qpel_uni_w_hv8_8_end_neon endfunc -function ff_hevc_put_hevc_qpel_uni_w_hv16_8_neon_i8mm, export=1 -QPEL_UNI_W_HV_HEADER 16 +function ff_hevc_put_hevc_qpel_uni_w_hv16_8_\suffix, export=1 +QPEL_UNI_W_HV_HEADER 16, \suffix b hevc_put_hevc_qpel_uni_w_hv16_8_end_neon endfunc -function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1 
-QPEL_UNI_W_HV_HEADER 32 +function ff_hevc_put_hevc_qpel_uni_w_hv32_8_\suffix, export=1 +QPEL_UNI_W_HV_HEADER 32, \suffix b hevc_put_hevc_qpel_uni_w_hv16_8_end_neon endfunc
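For reference, the scratch-buffer arithmetic behind the change from `#9088` to `#(MAX_PB_SIZE*(MAX_PB_SIZE+8)*2)` can be sketched in C. This is a minimal sketch, not code from the patch: `tmp_rows`, `tmp_bytes` and `QPEL_EXTRA` are hypothetical names, `MAX_PB_SIZE` is taken to be 64 (which is what the old constant implies), and rounding h+7 up to a multiple of the rows written per iteration is an assumed generalisation of the "two rows at a time" remark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PB_SIZE 64  /* consistent with the old constant: 64 * (64 + 7) * 2 == 9088 */
#define QPEL_EXTRA   7  /* an 8-tap vertical filter reads 7 rows beyond the block height */

/* Rows the horizontal pass ends up writing when it emits rows_per_iter rows
 * per loop iteration (hypothetical helper: rounds height + 7 up). */
static size_t tmp_rows(int height, int rows_per_iter)
{
    assert(height > 0 && rows_per_iter > 0);
    int needed = height + QPEL_EXTRA;
    return (size_t)((needed + rows_per_iter - 1) / rows_per_iter * rows_per_iter);
}

/* Bytes of int16_t scratch space for the intermediate array. */
static size_t tmp_bytes(int height, int rows_per_iter)
{
    return tmp_rows(height, rows_per_iter) * MAX_PB_SIZE * sizeof(int16_t);
}
```

With one row per iteration, `tmp_bytes(64, 1)` gives the old 9088 bytes; with two rows per iteration, `tmp_bytes(64, 2)` gives 9216, which equals `MAX_PB_SIZE*(MAX_PB_SIZE+8)*2`. Block heights are even here, so h+7 is odd and the round-up always lands on h+8.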
[FFmpeg-devel] [PATCH 19/21] aarch64: hevc: Produce plain neon versions of qpel_uni_hv
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7.

With storage for h+8 rows allocated, incrementing the stack pointer while consuming the rows no longer ends up at the right spot at the end. Store the intended final stack pointer value in register x14, which we also save on the stack.

AWS Graviton 3:
put_hevc_qpel_uni_hv4_8_c: 384.2
put_hevc_qpel_uni_hv4_8_neon: 127.5
put_hevc_qpel_uni_hv4_8_i8mm: 85.5
put_hevc_qpel_uni_hv6_8_c: 705.5
put_hevc_qpel_uni_hv6_8_neon: 224.5
put_hevc_qpel_uni_hv6_8_i8mm: 176.2
put_hevc_qpel_uni_hv8_8_c: 1136.5
put_hevc_qpel_uni_hv8_8_neon: 216.5
put_hevc_qpel_uni_hv8_8_i8mm: 214.0
put_hevc_qpel_uni_hv12_8_c: 2259.5
put_hevc_qpel_uni_hv12_8_neon: 498.5
put_hevc_qpel_uni_hv12_8_i8mm: 410.7
put_hevc_qpel_uni_hv16_8_c: 3824.7
put_hevc_qpel_uni_hv16_8_neon: 670.0
put_hevc_qpel_uni_hv16_8_i8mm: 603.7
put_hevc_qpel_uni_hv24_8_c: 8113.5
put_hevc_qpel_uni_hv24_8_neon: 1474.7
put_hevc_qpel_uni_hv24_8_i8mm: 1351.5
put_hevc_qpel_uni_hv32_8_c: 14744.5
put_hevc_qpel_uni_hv32_8_neon: 2599.7
put_hevc_qpel_uni_hv32_8_i8mm: 2266.0
put_hevc_qpel_uni_hv48_8_c: 32800.0
put_hevc_qpel_uni_hv48_8_neon: 5650.0
put_hevc_qpel_uni_hv48_8_i8mm: 5011.7
put_hevc_qpel_uni_hv64_8_c: 57856.2
put_hevc_qpel_uni_hv64_8_neon: 9863.5
put_hevc_qpel_uni_hv64_8_i8mm: 8767.7
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 +
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 156 ++
 2 files changed, 102 insertions(+), 59 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 105c26017b..0531db027b 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -277,6 +277,10 @@ NEON8_FNPROTO(qpel_uni_v, (uint8_t *dst, ptrdiff_t dststride, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width),); +NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, +const uint8_t *src, ptrdiff_t srcstride, +int height,
intptr_t mx, intptr_t my, int width),); + NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width), _i8mm); @@ -441,6 +445,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth) NEON8_FNASSIGN_SHARED_32(c->put_hevc_qpel_uni_w, 0, 1, qpel_uni_w_h,); NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv,); +NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 1, qpel_uni_hv,); if (have_i8mm(cpu_flags)) { NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h, _i8mm); diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index 7bffb991a7..f285ab7461 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -2169,7 +2169,8 @@ function hevc_put_hevc_qpel_uni_hv4_8_end_neon .endm 1: calc_all .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_uni_hv6_8_end_neon @@ -2198,7 +2199,8 @@ function hevc_put_hevc_qpel_uni_hv6_8_end_neon .endm 1: calc_all .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_uni_hv8_8_end_neon @@ -2225,7 +2227,8 @@ function hevc_put_hevc_qpel_uni_hv8_8_end_neon .endm 1: calc_all .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_uni_hv12_8_end_neon @@ -2252,7 +2255,8 @@ function hevc_put_hevc_qpel_uni_hv12_8_end_neon .endm 1: calc_all2 .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_uni_hv16_8_end_neon @@ -2286,21 +2290,17 @@ function hevc_put_hevc_qpel_uni_hv16_8_end_neon add sp, sp, #32 subsw7, w7, #16 b.ne0b -add w10, w4, #6 -add sp, sp, x12 // discard rest of first line -lsl x10, x10, #7 -add sp, sp, x10 // tmp_array without first line +mov sp, x14 ret endfunc -#if HAVE_I8MM -ENABLE_I8MM - -function ff_hevc_put_hevc_qpel_uni_hv4_8_neon_i8mm, export=1 -add w10, w4, #7 +.macro qpel_uni_hv suffix +function ff_hevc_put_hevc_qpel_uni_hv4_8_\suffix, export=1 +add w10, 
w4, #8 lsl x10, x10, #7 +mov x14, sp sub sp, sp, x10 // tmp_array -str x30, [sp, #-48]! +stp x30, x14,[sp, #-48]! stp x4, x6, [sp, #16] stp x0, x1, [sp, #32] sub x1, x2, x3, lsl #1 @@ -2309,18 +2309,19 @@ function ff_hevc_put_hevc_qpel_uni_hv4_8_neon_i8mm, export=1 mov
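The stack pointer bookkeeping described in this commit message can be modelled in a few lines of C. This is a rough sketch under stated assumptions, not the patch's logic verbatim: it assumes the shared tail advances sp by exactly h+7 rows of 128 bytes (which is what the pre-patch code relied on when it returned with a plain `ret`), it ignores the balanced 48-byte push/pop around the call, and `alloc_tmp`, `consume_rows` and `bytes_short` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ROW_BYTES 128  /* MAX_PB_SIZE * sizeof(int16_t); the "lsl x10, x10, #7" */

/* Entry: reserve (height + 8) rows below the incoming stack pointer. */
static uintptr_t alloc_tmp(uintptr_t sp, int height)
{
    assert(height > 0);
    return sp - (uintptr_t)(height + 8) * ROW_BYTES;
}

/* Tail: the vertical filter walks the array row by row, reading height + 7 rows. */
static uintptr_t consume_rows(uintptr_t sp, int height)
{
    return sp + (uintptr_t)(height + 7) * ROW_BYTES;
}

/* How far short of the entry value sp is once the tail has consumed its rows. */
static uintptr_t bytes_short(int height)
{
    const uintptr_t entry_sp = (uintptr_t)1 << 20;  /* arbitrary model address */
    return entry_sp - consume_rows(alloc_tmp(entry_sp, height), height);
}
```

In this model `bytes_short()` is 128 for every height, i.e. one unused row, so the tail cannot simply return. Restoring the value captured by `mov x14, sp` with `mov sp, x14` sidesteps the mismatch regardless of how many rows were allocated.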
[FFmpeg-devel] [PATCH 18/21] aarch64: hevc: Produce plain neon versions of qpel_hv
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7.

With storage for h+8 rows allocated, incrementing the stack pointer while consuming the rows no longer ends up at the right spot at the end. Store the intended final stack pointer value in register x14, which we also save on the stack.

AWS Graviton 3:
put_hevc_qpel_hv4_8_c: 386.0
put_hevc_qpel_hv4_8_neon: 125.7
put_hevc_qpel_hv4_8_i8mm: 83.2
put_hevc_qpel_hv6_8_c: 749.0
put_hevc_qpel_hv6_8_neon: 207.0
put_hevc_qpel_hv6_8_i8mm: 166.0
put_hevc_qpel_hv8_8_c: 1305.2
put_hevc_qpel_hv8_8_neon: 216.5
put_hevc_qpel_hv8_8_i8mm: 213.0
put_hevc_qpel_hv12_8_c: 2570.5
put_hevc_qpel_hv12_8_neon: 480.0
put_hevc_qpel_hv12_8_i8mm: 398.2
put_hevc_qpel_hv16_8_c: 4158.7
put_hevc_qpel_hv16_8_neon: 659.7
put_hevc_qpel_hv16_8_i8mm: 593.5
put_hevc_qpel_hv24_8_c: 8626.7
put_hevc_qpel_hv24_8_neon: 1653.5
put_hevc_qpel_hv24_8_i8mm: 1398.7
put_hevc_qpel_hv32_8_c: 14646.0
put_hevc_qpel_hv32_8_neon: 2566.2
put_hevc_qpel_hv32_8_i8mm: 2287.5
put_hevc_qpel_hv48_8_c: 31072.5
put_hevc_qpel_hv48_8_neon: 6228.5
put_hevc_qpel_hv48_8_i8mm: 5291.0
put_hevc_qpel_hv64_8_c: 53847.2
put_hevc_qpel_hv64_8_neon: 9856.7
put_hevc_qpel_hv64_8_i8mm: 8831.0
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 166 +-
 2 files changed, 104 insertions(+), 68 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index ea0d26c019..105c26017b 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -265,6 +265,10 @@ NEON8_FNPROTO(qpel_v, (int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width),); +NEON8_FNPROTO(qpel_hv, (int16_t *dst, +const uint8_t *src, ptrdiff_t srcstride, +int height, intptr_t mx, intptr_t my, int width),); + NEON8_FNPROTO(qpel_hv, (int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int
width), _i8mm); @@ -436,6 +440,8 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth) NEON8_FNASSIGN_SHARED_32(c->put_hevc_qpel_uni_w, 0, 1, qpel_uni_w_h,); +NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv,); + if (have_i8mm(cpu_flags)) { NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h, _i8mm); NEON8_FNASSIGN(c->put_hevc_epel, 1, 1, epel_hv, _i8mm); diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index ad568e415b..7bffb991a7 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -3804,7 +3804,8 @@ function hevc_put_hevc_qpel_hv4_8_end_neon .endm 1: calc_all .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_hv6_8_end_neon @@ -3831,7 +3832,8 @@ function hevc_put_hevc_qpel_hv6_8_end_neon .endm 1: calc_all .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_hv8_8_end_neon @@ -3857,7 +3859,8 @@ function hevc_put_hevc_qpel_hv8_8_end_neon .endm 1: calc_all .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_hv12_8_end_neon @@ -3882,7 +3885,8 @@ function hevc_put_hevc_qpel_hv12_8_end_neon .endm 1: calc_all2 .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_hv16_8_end_neon @@ -3906,7 +3910,8 @@ function hevc_put_hevc_qpel_hv16_8_end_neon .endm 1: calc_all2 .purgem calc -2: ret +2: mov sp, x14 +ret endfunc function hevc_put_hevc_qpel_hv32_8_end_neon @@ -3937,162 +3942,187 @@ function hevc_put_hevc_qpel_hv32_8_end_neon add sp, sp, #32 subsw6, w6, #16 b.hi0b -add w10, w3, #6 -add sp, sp, #64 // discard rest of first line -lsl x10, x10, #7 -add sp, sp, x10 // tmp_array without first line +mov sp, x14 ret endfunc -#if HAVE_I8MM -ENABLE_I8MM -function ff_hevc_put_hevc_qpel_hv4_8_neon_i8mm, export=1 -add w10, w3, #7 +.macro qpel_hv suffix +function ff_hevc_put_hevc_qpel_hv4_8_\suffix, export=1 +add w10, w3, #8 mov x7, #128 lsl x10, x10, #7 +mov x14, sp sub sp, 
sp, x10 // tmp_array -stp x5, x30, [sp, #-32]! -stp x0, x3, [sp, #16] -add x0, sp, #32 +stp x5, x30, [sp, #-48]! +stp x0, x3, [sp, #16] +str x14, [sp, #32] +add x0, sp, #48
[FFmpeg-devel] [PATCH 16/21] aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
The hv32 and hv64 functions were identical - both loop and process 16 pixels at a time. The hv16 function was near identical, except for the outer loop (and using sp instead of a separate register). Given the size of these functions, the extra cost of the outer loop is negligible, so use the same function for hv16 as well. This removes over 200 lines of duplicated assembly, and over 4 KB of binary size. --- libavcodec/aarch64/hevcdsp_qpel_neon.S | 220 + 1 file changed, 3 insertions(+), 217 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index c04e8dbea8..06832603d9 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -4381,231 +4381,17 @@ function ff_hevc_put_hevc_qpel_uni_w_hv16_8_neon_i8mm, export=1 b hevc_put_hevc_qpel_uni_w_hv16_8_end_neon endfunc -function hevc_put_hevc_qpel_uni_w_hv16_8_end_neon -ldp q16, q1, [sp] -add sp, sp, x10 -ldp q17, q2, [sp] -add sp, sp, x10 -ldp q18, q3, [sp] -add sp, sp, x10 -ldp q19, q4, [sp] -add sp, sp, x10 -ldp q20, q5, [sp] -add sp, sp, x10 -ldp q21, q6, [sp] -add sp, sp, x10 -ldp q22, q7, [sp] -add sp, sp, x10 -1: -ldp q23, q31, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v16, v17, v18, v19, v20, v21, v22, v23 -QPEL_FILTER_H2 v25, v16, v17, v18, v19, v20, v21, v22, v23 -QPEL_FILTER_H v26, v1, v2, v3, v4, v5, v6, v7, v31 -QPEL_FILTER_H2 v27, v1, v2, v3, v4, v5, v6, v7, v31 -QPEL_UNI_W_HV_16 -subsw22, w22, #1 -b.eq2f - -ldp q16, q1, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v17, v18, v19, v20, v21, v22, v23, v16 -QPEL_FILTER_H2 v25, v17, v18, v19, v20, v21, v22, v23, v16 -QPEL_FILTER_H v26, v2, v3, v4, v5, v6, v7, v31, v1 -QPEL_FILTER_H2 v27, v2, v3, v4, v5, v6, v7, v31, v1 -QPEL_UNI_W_HV_16 -subsw22, w22, #1 -b.eq2f - -ldp q17, q2, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v18, v19, v20, v21, v22, v23, v16, v17 -QPEL_FILTER_H2 v25, v18, v19, v20, v21, v22, v23, v16, v17 -QPEL_FILTER_H v26, v3, v4, v5, v6, v7, v31, v1, v2 
-QPEL_FILTER_H2 v27, v3, v4, v5, v6, v7, v31, v1, v2 -QPEL_UNI_W_HV_16 -subsw22, w22, #1 -b.eq2f - -ldp q18, q3, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v19, v20, v21, v22, v23, v16, v17, v18 -QPEL_FILTER_H2 v25, v19, v20, v21, v22, v23, v16, v17, v18 -QPEL_FILTER_H v26, v4, v5, v6, v7, v31, v1, v2, v3 -QPEL_FILTER_H2 v27, v4, v5, v6, v7, v31, v1, v2, v3 -QPEL_UNI_W_HV_16 -subsw22, w22, #1 -b.eq2f - -ldp q19, q4, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v20, v21, v22, v23, v16, v17, v18, v19 -QPEL_FILTER_H2 v25, v20, v21, v22, v23, v16, v17, v18, v19 -QPEL_FILTER_H v26, v5, v6, v7, v31, v1, v2, v3, v4 -QPEL_FILTER_H2 v27, v5, v6, v7, v31, v1, v2, v3, v4 -QPEL_UNI_W_HV_16 -subsw22, w22, #1 -b.eq2f - -ldp q20, q5, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v21, v22, v23, v16, v17, v18, v19, v20 -QPEL_FILTER_H2 v25, v21, v22, v23, v16, v17, v18, v19, v20 -QPEL_FILTER_H v26, v6, v7, v31, v1, v2, v3, v4, v5 -QPEL_FILTER_H2 v27, v6, v7, v31, v1, v2, v3, v4, v5 -QPEL_UNI_W_HV_16 -subsw22, w22, #1 -b.eq2f - -ldp q21, q6, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v22, v23, v16, v17, v18, v19, v20, v21 -QPEL_FILTER_H2 v25, v22, v23, v16, v17, v18, v19, v20, v21 -QPEL_FILTER_H v26, v7, v31, v1, v2, v3, v4, v5, v6 -QPEL_FILTER_H2 v27, v7, v31, v1, v2, v3, v4, v5, v6 -QPEL_UNI_W_HV_16 -subsw22, w22, #1 -b.eq2f - -ldp q22, q7, [sp] -add sp, sp, x10 -QPEL_FILTER_H v24, v23, v16, v17, v18, v19, v20, v21, v22 -QPEL_FILTER_H2 v25, v23, v16, v17, v18, v19, v20, v21, v22 -QPEL_FILTER_H v26, v31, v1, v2, v3, v4, v5, v6, v7 -QPEL_FILTER_H2 v27, v31, v1, v2, v3, v4, v5, v6, v7 -
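The deduplication argument in this commit message (one 16-pixel kernel plus an outer loop serving hv16, hv32 and hv64) can be illustrated with a small C sketch. Everything here is hypothetical: `strip16` is a stand-in that only scales and clips the 16-bit intermediates rather than running the real vertical filter and weighting, and `put_strips` and `check_width` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_W 64
#define MAX_H 4

/* Hypothetical 16-pixel-wide kernel standing in for the shared tail: it only
 * scales the 16-bit intermediates back down and clips them to 8 bits. */
static void strip16(uint8_t *dst, ptrdiff_t dststride,
                    const int16_t *tmp, ptrdiff_t tmpstride, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 16; x++) {
            int v = (tmp[x] + 32) >> 6;
            dst[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
        dst += dststride;
        tmp += tmpstride;
    }
}

/* One driver for every width that is a multiple of 16. For width == 16 the
 * outer loop runs exactly once, which is the small extra cost of sharing. */
static int put_strips(uint8_t *dst, ptrdiff_t dststride,
                      const int16_t *tmp, ptrdiff_t tmpstride,
                      int height, int width)
{
    int strips = 0;
    assert(width > 0 && width <= MAX_W && width % 16 == 0);
    for (int x = 0; x < width; x += 16, strips++)
        strip16(dst + x, dststride, tmp + x, tmpstride, height);
    return strips;
}

/* Self-check: returns the strip count if every output pixel is right, else -1. */
static int check_width(int width)
{
    int16_t tmp[MAX_H * MAX_W];
    uint8_t dst[MAX_H * MAX_W] = { 0 };
    for (int i = 0; i < MAX_H * MAX_W; i++)
        tmp[i] = (int16_t)((i % 200) * 64);
    int strips = put_strips(dst, MAX_W, tmp, MAX_W, MAX_H, width);
    for (int y = 0; y < MAX_H; y++)
        for (int x = 0; x < width; x++)
            if (dst[y * MAX_W + x] != (y * MAX_W + x) % 200)
                return -1;
    return strips;
}
```

The same driver handles all three widths; only the iteration count of the outer loop changes (1, 2 or 4), which is why keeping a separate hv16 body buys very little.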
[FFmpeg-devel] [PATCH 17/21] aarch64: hevc: Reorder qpel_hv functions to prepare for templating
--- libavcodec/aarch64/hevcdsp_qpel_neon.S | 695 + 1 file changed, 355 insertions(+), 340 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index 06832603d9..ad568e415b 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -2146,29 +2146,6 @@ function ff_hevc_put_hevc_qpel_uni_w_v64_8_neon, export=1 ret endfunc -#if HAVE_I8MM -ENABLE_I8MM - -function ff_hevc_put_hevc_qpel_uni_hv4_8_neon_i8mm, export=1 -add w10, w4, #7 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -str x30, [sp, #-48]! -stp x4, x6, [sp, #16] -stp x0, x1, [sp, #32] -sub x1, x2, x3, lsl #1 -sub x1, x1, x3 -add x0, sp, #48 -mov x2, x3 -add x3, x4, #7 -mov x4, x5 -bl X(ff_hevc_put_hevc_qpel_h4_8_neon_i8mm) -ldp x4, x6, [sp, #16] -ldp x0, x1, [sp, #32] -ldr x30, [sp], #48 -b hevc_put_hevc_qpel_uni_hv4_8_end_neon -endfunc - function hevc_put_hevc_qpel_uni_hv4_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 @@ -2195,26 +2172,6 @@ function hevc_put_hevc_qpel_uni_hv4_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_qpel_uni_hv6_8_neon_i8mm, export=1 -add w10, w4, #7 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -str x30, [sp, #-48]! -stp x4, x6, [sp, #16] -stp x0, x1, [sp, #32] -sub x1, x2, x3, lsl #1 -sub x1, x1, x3 -add x0, sp, #48 -mov x2, x3 -add w3, w4, #7 -mov x4, x5 -bl X(ff_hevc_put_hevc_qpel_h6_8_neon_i8mm) -ldp x4, x6, [sp, #16] -ldp x0, x1, [sp, #32] -ldr x30, [sp], #48 -b hevc_put_hevc_qpel_uni_hv6_8_end_neon -endfunc - function hevc_put_hevc_qpel_uni_hv6_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 @@ -2244,26 +2201,6 @@ function hevc_put_hevc_qpel_uni_hv6_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_qpel_uni_hv8_8_neon_i8mm, export=1 -add w10, w4, #7 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -str x30, [sp, #-48]! 
-stp x4, x6, [sp, #16] -stp x0, x1, [sp, #32] -sub x1, x2, x3, lsl #1 -sub x1, x1, x3 -add x0, sp, #48 -mov x2, x3 -add w3, w4, #7 -mov x4, x5 -bl X(ff_hevc_put_hevc_qpel_h8_8_neon_i8mm) -ldp x4, x6, [sp, #16] -ldp x0, x1, [sp, #32] -ldr x30, [sp], #48 -b hevc_put_hevc_qpel_uni_hv8_8_end_neon -endfunc - function hevc_put_hevc_qpel_uni_hv8_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 @@ -2291,26 +2228,6 @@ function hevc_put_hevc_qpel_uni_hv8_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_qpel_uni_hv12_8_neon_i8mm, export=1 -add w10, w4, #7 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x7, x30, [sp, #-48]! -stp x4, x6, [sp, #16] -stp x0, x1, [sp, #32] -sub x1, x2, x3, lsl #1 -sub x1, x1, x3 -mov x2, x3 -add x0, sp, #48 -add w3, w4, #7 -mov x4, x5 -bl X(ff_hevc_put_hevc_qpel_h12_8_neon_i8mm) -ldp x4, x6, [sp, #16] -ldp x0, x1, [sp, #32] -ldp x7, x30, [sp], #48 -b hevc_put_hevc_qpel_uni_hv12_8_end_neon -endfunc - function hevc_put_hevc_qpel_uni_hv12_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 @@ -2338,26 +2255,6 @@ function hevc_put_hevc_qpel_uni_hv12_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_qpel_uni_hv16_8_neon_i8mm, export=1 -add w10, w4, #7 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x7, x30, [sp, #-48]! -stp x4, x6, [sp, #16] -stp x0, x1, [sp, #32] -add x0, sp, #48 -sub
[FFmpeg-devel] [PATCH 15/21] aarch64: hevc: Split the qpel_*_hv functions into two parts
--- libavcodec/aarch64/hevcdsp_qpel_neon.S | 94 +++--- 1 file changed, 86 insertions(+), 8 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index fba063186c..c04e8dbea8 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -2166,6 +2166,10 @@ function ff_hevc_put_hevc_qpel_uni_hv4_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 +b hevc_put_hevc_qpel_uni_hv4_8_end_neon +endfunc + +function hevc_put_hevc_qpel_uni_hv4_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 ldr d16, [sp] @@ -2208,6 +2212,10 @@ function ff_hevc_put_hevc_qpel_uni_hv6_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 +b hevc_put_hevc_qpel_uni_hv6_8_end_neon +endfunc + +function hevc_put_hevc_qpel_uni_hv6_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 sub x1, x1, #4 @@ -2253,6 +2261,10 @@ function ff_hevc_put_hevc_qpel_uni_hv8_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 +b hevc_put_hevc_qpel_uni_hv8_8_end_neon +endfunc + +function hevc_put_hevc_qpel_uni_hv8_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 ldr q16, [sp] @@ -2296,6 +2308,10 @@ function ff_hevc_put_hevc_qpel_uni_hv12_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 +b hevc_put_hevc_qpel_uni_hv12_8_end_neon +endfunc + +function hevc_put_hevc_qpel_uni_hv12_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 sub x1, x1, #8 @@ -2339,7 +2355,10 @@ function ff_hevc_put_hevc_qpel_uni_hv16_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 -.Lqpel_uni_hv16_loop: +b hevc_put_hevc_qpel_uni_hv16_8_end_neon +endfunc + +function hevc_put_hevc_qpel_uni_hv16_8_end_neon mov x9, #(MAX_PB_SIZE * 2) load_qpel_filterh x6, x5 sub w12, w9, w7, lsl #1 @@ -2414,7 +2433,7 @@ function 
ff_hevc_put_hevc_qpel_uni_hv32_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 -b .Lqpel_uni_hv16_loop +b hevc_put_hevc_qpel_uni_hv16_8_end_neon endfunc function ff_hevc_put_hevc_qpel_uni_hv48_8_neon_i8mm, export=1 @@ -2434,7 +2453,7 @@ function ff_hevc_put_hevc_qpel_uni_hv48_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 -b .Lqpel_uni_hv16_loop +b hevc_put_hevc_qpel_uni_hv16_8_end_neon endfunc function ff_hevc_put_hevc_qpel_uni_hv64_8_neon_i8mm, export=1 @@ -2454,7 +2473,7 @@ function ff_hevc_put_hevc_qpel_uni_hv64_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 -b .Lqpel_uni_hv16_loop +b hevc_put_hevc_qpel_uni_hv16_8_end_neon endfunc DISABLE_I8MM #endif @@ -3776,6 +3795,10 @@ function ff_hevc_put_hevc_qpel_hv4_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_qpel_h4_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_qpel_hv4_8_end_neon +endfunc + +function hevc_put_hevc_qpel_hv4_8_end_neon load_qpel_filterh x5, x4 ldr d16, [sp] ldr d17, [sp, x7] @@ -3813,6 +3836,10 @@ function ff_hevc_put_hevc_qpel_hv6_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_qpel_h6_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_qpel_hv6_8_end_neon +endfunc + +function hevc_put_hevc_qpel_hv6_8_end_neon mov x8, #120 load_qpel_filterh x5, x4 ldr q16, [sp] @@ -3852,6 +3879,10 @@ function ff_hevc_put_hevc_qpel_hv8_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_qpel_h8_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_qpel_hv8_8_end_neon
[FFmpeg-devel] [PATCH 13/21] aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
Besides the templating, this contains one functional change: ff_hevc_put_hevc_epel_bi_hv32_8 now sets the w6 register, which ff_hevc_put_hevc_epel_h32_8_neon requires.

AWS Graviton 3:
put_hevc_epel_bi_hv4_8_c: 176.5
put_hevc_epel_bi_hv4_8_neon: 62.0
put_hevc_epel_bi_hv4_8_i8mm: 58.0
put_hevc_epel_bi_hv6_8_c: 343.7
put_hevc_epel_bi_hv6_8_neon: 109.7
put_hevc_epel_bi_hv6_8_i8mm: 105.7
put_hevc_epel_bi_hv8_8_c: 536.0
put_hevc_epel_bi_hv8_8_neon: 112.7
put_hevc_epel_bi_hv8_8_i8mm: 111.7
put_hevc_epel_bi_hv12_8_c: 1107.7
put_hevc_epel_bi_hv12_8_neon: 254.7
put_hevc_epel_bi_hv12_8_i8mm: 239.0
put_hevc_epel_bi_hv16_8_c: 1927.7
put_hevc_epel_bi_hv16_8_neon: 356.2
put_hevc_epel_bi_hv16_8_i8mm: 334.2
put_hevc_epel_bi_hv24_8_c: 4195.2
put_hevc_epel_bi_hv24_8_neon: 736.7
put_hevc_epel_bi_hv24_8_i8mm: 715.5
put_hevc_epel_bi_hv32_8_c: 7280.5
put_hevc_epel_bi_hv32_8_neon: 1287.7
put_hevc_epel_bi_hv32_8_i8mm: 1162.2
put_hevc_epel_bi_hv48_8_c: 16857.7
put_hevc_epel_bi_hv48_8_neon: 2836.2
put_hevc_epel_bi_hv48_8_i8mm: 2908.5
put_hevc_epel_bi_hv64_8_c: 29248.2
put_hevc_epel_bi_hv64_8_neon: 5051.7
put_hevc_epel_bi_hv64_8_i8mm: 4491.5
---
 libavcodec/aarch64/hevcdsp_epel_neon.S | 62 +++
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 ++
 2 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index d0c6205e1c..cb17758a72 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -3792,14 +3792,6 @@ endfunc epel_uni_w_hv neon -#if HAVE_I8MM -ENABLE_I8MM - -epel_uni_w_hv neon_i8mm - -DISABLE_I8MM -#endif - function hevc_put_hevc_epel_bi_hv4_8_end_neon load_epel_filterh x7, x6 mov x10, #(MAX_PB_SIZE * 2) @@ -3978,10 +3970,8 @@ function hevc_put_hevc_epel_bi_hv32_8_end_neon ret endfunc -#if HAVE_I8MM -ENABLE_I8MM - -function ff_hevc_put_hevc_epel_bi_hv4_8_neon_i8mm, export=1 +.macro epel_bi_hv suffix +function ff_hevc_put_hevc_epel_bi_hv4_8_\suffix, export=1 add w10,
w5, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -3994,14 +3984,14 @@ function ff_hevc_put_hevc_epel_bi_hv4_8_neon_i8mm, export=1 add w3, w5, #3 mov x4, x6 mov x5, x7 -bl X(ff_hevc_put_hevc_epel_h4_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h4_8_\suffix) ldp x4, x5, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 b hevc_put_hevc_epel_bi_hv4_8_end_neon endfunc -function ff_hevc_put_hevc_epel_bi_hv6_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_bi_hv6_8_\suffix, export=1 add w10, w5, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -4014,14 +4004,14 @@ function ff_hevc_put_hevc_epel_bi_hv6_8_neon_i8mm, export=1 add w3, w5, #3 mov x4, x6 mov x5, x7 -bl X(ff_hevc_put_hevc_epel_h6_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h6_8_\suffix) ldp x4, x5, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 b hevc_put_hevc_epel_bi_hv6_8_end_neon endfunc -function ff_hevc_put_hevc_epel_bi_hv8_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_bi_hv8_8_\suffix, export=1 add w10, w5, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -4034,14 +4024,14 @@ function ff_hevc_put_hevc_epel_bi_hv8_8_neon_i8mm, export=1 add w3, w5, #3 mov x4, x6 mov x5, x7 -bl X(ff_hevc_put_hevc_epel_h8_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h8_8_\suffix) ldp x4, x5, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 b hevc_put_hevc_epel_bi_hv8_8_end_neon endfunc -function ff_hevc_put_hevc_epel_bi_hv12_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_bi_hv12_8_\suffix, export=1 add w10, w5, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -4054,14 +4044,14 @@ function ff_hevc_put_hevc_epel_bi_hv12_8_neon_i8mm, export=1 add w3, w5, #3 mov x4, x6 mov x5, x7 -bl X(ff_hevc_put_hevc_epel_h12_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h12_8_\suffix) ldp x4, x5, [sp, #16] ldp x0, x1, [sp, #32] ldp x7, x30, [sp], #48 b hevc_put_hevc_epel_bi_hv12_8_end_neon endfunc
[FFmpeg-devel] [PATCH 14/21] aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
AWS Graviton 3:
put_hevc_qpel_uni_w_h4_8_c: 159.0
put_hevc_qpel_uni_w_h4_8_neon: 64.2
put_hevc_qpel_uni_w_h4_8_i8mm: 40.0
put_hevc_qpel_uni_w_h6_8_c: 344.7
put_hevc_qpel_uni_w_h6_8_neon: 114.5
put_hevc_qpel_uni_w_h6_8_i8mm: 82.0
put_hevc_qpel_uni_w_h8_8_c: 596.2
put_hevc_qpel_uni_w_h8_8_neon: 132.2
put_hevc_qpel_uni_w_h8_8_i8mm: 106.0
put_hevc_qpel_uni_w_h12_8_c: 1325.0
put_hevc_qpel_uni_w_h12_8_neon: 299.0
put_hevc_qpel_uni_w_h12_8_i8mm: 211.5
put_hevc_qpel_uni_w_h16_8_c: 2300.0
put_hevc_qpel_uni_w_h16_8_neon: 422.0
put_hevc_qpel_uni_w_h16_8_i8mm: 286.2
put_hevc_qpel_uni_w_h24_8_c: 5059.0
put_hevc_qpel_uni_w_h24_8_neon: 912.2
put_hevc_qpel_uni_w_h24_8_i8mm: 664.2
put_hevc_qpel_uni_w_h32_8_c: 9198.2
put_hevc_qpel_uni_w_h32_8_neon: 1638.2
put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7
put_hevc_qpel_uni_w_h48_8_c: 20754.7
put_hevc_qpel_uni_w_h48_8_neon: 3633.7
put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7
put_hevc_qpel_uni_w_h64_8_c: 36854.7
put_hevc_qpel_uni_w_h64_8_neon: 6435.7
put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 7 +
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 405 +-
 2 files changed, 410 insertions(+), 2 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 6110a360d8..ea0d26c019 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -277,6 +277,11 @@ NEON8_FNPROTO(qpel_uni_hv, (uint8_t *dst, ptrdiff_t dststride, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width), _i8mm); +NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, +const uint8_t *_src, ptrdiff_t _srcstride, +int height, int denom, int wx, int ox, +intptr_t mx, intptr_t my, int width),); + NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride, int height, int denom, int wx, int ox, @@ -429,6 +434,8 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c,
const int bit_depth) NEON8_FNASSIGN(c->put_hevc_epel_uni_w, 1, 1, epel_uni_w_hv,); NEON8_FNASSIGN(c->put_hevc_epel_bi, 1, 1, epel_bi_hv,); +NEON8_FNASSIGN_SHARED_32(c->put_hevc_qpel_uni_w, 0, 1, qpel_uni_w_h,); + if (have_i8mm(cpu_flags)) { NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h, _i8mm); NEON8_FNASSIGN(c->put_hevc_epel, 1, 1, epel_hv, _i8mm); diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index 062b7d4d0f..fba063186c 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -2456,8 +2456,10 @@ function ff_hevc_put_hevc_qpel_uni_hv64_8_neon_i8mm, export=1 ldp x7, x30, [sp], #48 b .Lqpel_uni_hv16_loop endfunc +DISABLE_I8MM +#endif -.macro QPEL_UNI_W_H_HEADER +.macro QPEL_UNI_W_H_HEADER elems=4s ldr x12, [sp] sub x2, x2, #3 movrel x9, qpel_filters @@ -2465,11 +2467,410 @@ endfunc ld1r{v28.2d}, [x9] mov w10, #-6 sub w10, w10, w5 -dup v30.4s, w6 // wx +dup v30.\elems, w6 // wx dup v31.4s, w10 // shift dup v29.4s, w7 // ox .endm +function ff_hevc_put_hevc_qpel_uni_w_h4_8_neon, export=1 +QPEL_UNI_W_H_HEADER 4h +sxtlv0.8h, v28.8b +1: +ld1 {v1.8b, v2.8b}, [x2], x3 +subsw4, w4, #1 +uxtlv1.8h, v1.8b +uxtlv2.8h, v2.8b +ext v3.16b, v1.16b, v2.16b, #2 +ext v4.16b, v1.16b, v2.16b, #4 +ext v5.16b, v1.16b, v2.16b, #6 +ext v6.16b, v1.16b, v2.16b, #8 +ext v7.16b, v1.16b, v2.16b, #10 +ext v16.16b, v1.16b, v2.16b, #12 +ext v17.16b, v1.16b, v2.16b, #14 +mul v18.4h, v1.4h, v0.h[0] +mla v18.4h, v3.4h, v0.h[1] +mla v18.4h, v4.4h, v0.h[2] +mla v18.4h, v5.4h, v0.h[3] +mla v18.4h, v6.4h, v0.h[4] +mla v18.4h, v7.4h, v0.h[5] +mla v18.4h, v16.4h, v0.h[6] +mla v18.4h, v17.4h, v0.h[7] +smull v16.4s, v18.4h, v30.4h +sqrshl v16.4s, v16.4s, v31.4s +sqadd v16.4s, v16.4s, v29.4s +sqxtn v16.4h, v16.4s +sqxtun v16.8b, v16.8h +str s16, [x0] +add x0, x0, x1 +b.hi1b +ret +endfunc + +function
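As a reading aid for the assembly above, the per-pixel arithmetic of the 8-bit qpel_uni_w_h path can be written as scalar C. This is a sketch derived from the visible instructions (8-tap filter starting at src - 3, `smull` by wx, `sqrshl` by -(6 + denom), `sqadd` of ox, then narrowing with unsigned saturation); it is not FFmpeg's C reference code. `filter8`, `weight_pixel` and `flat_row` are hypothetical names, and the taps are the HEVC half-sample luma filter, used here only as one illustrative row of the filter table.

```c
#include <assert.h>
#include <stdint.h>

/* HEVC half-sample luma taps, used here only as an illustrative filter. */
static const int taps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

/* 8-tap horizontal filter; src points at the output position and the taps
 * span x-3 .. x+4 (the assembly's "sub x2, x2, #3"). */
static int filter8(const uint8_t *src)
{
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += taps[k] * src[k - 3];
    return sum;
}

/* Weighting as read from the assembly: multiply by wx, rounding right shift
 * by denom + 6, add ox, saturate to 8 bits. Relies on >> being an arithmetic
 * shift for negative values, as it is on common compilers. */
static uint8_t weight_pixel(int filtered, int denom, int wx, int ox)
{
    int shift = denom + 6;
    assert(denom >= 0);
    int v = ((filtered * wx + (1 << (shift - 1))) >> shift) + ox;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Demo: a flat row of `pixel` run through the filter and the weighting. */
static int flat_row(int pixel, int denom, int wx, int ox)
{
    uint8_t row[8];
    for (int i = 0; i < 8; i++)
        row[i] = (uint8_t)pixel;
    return weight_pixel(filter8(row + 3), denom, wx, ox);
}
```

Since the taps sum to 64, a flat row of value p filters to 64*p, and a unit weight (wx equal to 1 << denom) with ox = 0 returns p again; the saturation at the end only matters for large weights or offsets.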
[FFmpeg-devel] [PATCH 08/21] aarch64: hevc: Split the epel_*_hv functions into two parts
The first horizontal filter can use either i8mm or plain neon versions, while the second part is a pure neon implementation. --- libavcodec/aarch64/hevcdsp_epel_neon.S | 100 + 1 file changed, 100 insertions(+) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index 0e49491a81..6be171ece1 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -2186,6 +2186,10 @@ function ff_hevc_put_hevc_epel_hv4_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_epel_h4_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_epel_hv4_8_end_neon +endfunc + +function hevc_put_hevc_epel_hv4_8_end_neon load_epel_filterh x5, x4 mov x10, #(MAX_PB_SIZE * 2) ldr d16, [sp] @@ -2215,6 +2219,10 @@ function ff_hevc_put_hevc_epel_hv6_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_epel_h6_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_epel_hv6_8_end_neon +endfunc + +function hevc_put_hevc_epel_hv6_8_end_neon load_epel_filterh x5, x4 mov x5, #120 mov x10, #(MAX_PB_SIZE * 2) @@ -2247,6 +2255,10 @@ function ff_hevc_put_hevc_epel_hv8_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_epel_h8_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_epel_hv8_8_end_neon +endfunc + +function hevc_put_hevc_epel_hv8_8_end_neon load_epel_filterh x5, x4 mov x10, #(MAX_PB_SIZE * 2) ldr q16, [sp] @@ -2277,6 +2289,10 @@ function ff_hevc_put_hevc_epel_hv12_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_epel_h12_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_epel_hv12_8_end_neon +endfunc + +function hevc_put_hevc_epel_hv12_8_end_neon load_epel_filterh x5, x4 mov x5, #112 mov x10, #(MAX_PB_SIZE * 2) @@ -2309,6 +2325,10 @@ function ff_hevc_put_hevc_epel_hv16_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_epel_h16_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_epel_hv16_8_end_neon +endfunc + +function 
hevc_put_hevc_epel_hv16_8_end_neon load_epel_filterh x5, x4 mov x10, #(MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h}, [sp], x10 @@ -2340,6 +2360,10 @@ function ff_hevc_put_hevc_epel_hv24_8_neon_i8mm, export=1 bl X(ff_hevc_put_hevc_epel_h24_8_neon_i8mm) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 +b hevc_put_hevc_epel_hv24_8_end_neon +endfunc + +function hevc_put_hevc_epel_hv24_8_end_neon load_epel_filterh x5, x4 mov x10, #(MAX_PB_SIZE * 2) ld1 {v16.8h, v17.8h, v18.8h}, [sp], x10 @@ -2445,6 +2469,10 @@ function ff_hevc_put_hevc_epel_uni_hv4_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 +b hevc_put_hevc_epel_uni_hv4_8_end_neon +endfunc + +function hevc_put_hevc_epel_uni_hv4_8_end_neon load_epel_filterh x6, x5 mov x10, #(MAX_PB_SIZE * 2) ld1 {v16.4h}, [sp], x10 @@ -2478,6 +2506,10 @@ function ff_hevc_put_hevc_epel_uni_hv6_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 +b hevc_put_hevc_epel_uni_hv6_8_end_neon +endfunc + +function hevc_put_hevc_epel_uni_hv6_8_end_neon load_epel_filterh x6, x5 sub x1, x1, #4 mov x10, #(MAX_PB_SIZE * 2) @@ -2514,6 +2546,10 @@ function ff_hevc_put_hevc_epel_uni_hv8_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 +b hevc_put_hevc_epel_uni_hv8_8_end_neon +endfunc + +function hevc_put_hevc_epel_uni_hv8_8_end_neon load_epel_filterh x6, x5 mov x10, #(MAX_PB_SIZE * 2) ld1 {v16.8h}, [sp], x10 @@ -2548,6 +2584,10 @@ function ff_hevc_put_hevc_epel_uni_hv12_8_neon_i8mm, export=1 ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 +b hevc_put_hevc_epel_uni_hv12_8_end_neon +endfunc + +function
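The two-part split described in this commit message (a variant-specific horizontal pass followed by a shared, pure neon vertical tail) has a straightforward C analogue. The sketch below is an illustration under assumptions, not the assembly's structure verbatim: the 4-tap filter is an arbitrary one whose taps sum to 64, `h_pass_mla` and `h_pass_dot` merely stand in for the plain neon and i8mm first halves, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TMP_STRIDE 64   /* elements per intermediate row (MAX_PB_SIZE) */
#define EXTRA_ROWS 3    /* a 4-tap vertical filter needs height + 3 rows */
#define MAX_H      16   /* largest block height this sketch accepts */

/* Illustrative 4-tap filter whose taps sum to 64; not read from FFmpeg's tables. */
static const int taps[4] = { -4, 36, 36, -4 };

typedef void (*h_pass_fn)(int16_t *tmp, const uint8_t *src, ptrdiff_t stride,
                          int rows, int width);

/* Part one, variant A: accumulate tap by tap. */
static void h_pass_mla(int16_t *tmp, const uint8_t *src, ptrdiff_t stride,
                       int rows, int width)
{
    for (int y = 0; y < rows; y++, src += stride, tmp += TMP_STRIDE)
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 4; k++)
                sum += taps[k] * src[x + k - 1];
            tmp[x] = (int16_t)sum;
        }
}

/* Part one, variant B: the same sum written as a single dot product. */
static void h_pass_dot(int16_t *tmp, const uint8_t *src, ptrdiff_t stride,
                       int rows, int width)
{
    for (int y = 0; y < rows; y++, src += stride, tmp += TMP_STRIDE)
        for (int x = 0; x < width; x++) {
            const uint8_t *p = src + x - 1;
            tmp[x] = (int16_t)(taps[0] * p[0] + taps[1] * p[1] +
                               taps[2] * p[2] + taps[3] * p[3]);
        }
}

/* Part two, shared by both variants: vertical filter over the intermediates. */
static void v_tail(int16_t *dst, ptrdiff_t dststride, const int16_t *tmp,
                   int height, int width)
{
    for (int y = 0; y < height; y++, dst += dststride, tmp += TMP_STRIDE)
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 4; k++)
                sum += taps[k] * tmp[k * TMP_STRIDE + x];
            dst[x] = (int16_t)(sum >> 6);  /* assumes arithmetic shift for negatives */
        }
}

/* Entry point: only the first part differs between variants. */
static void put_hv(h_pass_fn h_pass, int16_t *dst, ptrdiff_t dststride,
                   const uint8_t *src, ptrdiff_t stride, int height, int width)
{
    int16_t tmp[(MAX_H + EXTRA_ROWS) * TMP_STRIDE];
    assert(height > 0 && height <= MAX_H && width > 0 && width <= TMP_STRIDE);
    h_pass(tmp, src - stride, stride, height + EXTRA_ROWS, width);
    v_tail(dst, dststride, tmp, height, width);
}

/* Self-check: 1 if both variants produce identical 8x8 output. */
static int variants_agree(void)
{
    uint8_t buf[12 * 12];
    int16_t a[8 * 8], b[8 * 8];
    for (int i = 0; i < 12 * 12; i++)
        buf[i] = (uint8_t)((i * 37) & 0xff);
    put_hv(h_pass_mla, a, 8, buf + 12 + 1, 12, 8, 8);
    put_hv(h_pass_dot, b, 8, buf + 12 + 1, 12, 8, 8);
    for (int i = 0; i < 64; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Flat input: both passes have gain 64 and the tail shifts one factor back out. */
static int flat_value(int pixel)
{
    uint8_t buf[12 * 12];
    int16_t out[8 * 8];
    for (int i = 0; i < 12 * 12; i++)
        buf[i] = (uint8_t)pixel;
    put_hv(h_pass_mla, out, 8, buf + 12 + 1, 12, 8, 8);
    return out[0];
}
```

Because the intermediate array is the only interface between the two halves, any first pass that fills it with the same values can share the tail, which is the property the `b hevc_put_hevc_epel_hv*_8_end_neon` branches rely on.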
[FFmpeg-devel] [PATCH 07/21] aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
AWS Graviton 3: put_hevc_epel_uni_w_h4_8_c: 97.2 put_hevc_epel_uni_w_h4_8_neon: 41.2 put_hevc_epel_uni_w_h4_8_i8mm: 35.2 put_hevc_epel_uni_w_h6_8_c: 203.7 put_hevc_epel_uni_w_h6_8_neon: 84.7 put_hevc_epel_uni_w_h6_8_i8mm: 74.7 put_hevc_epel_uni_w_h8_8_c: 345.7 put_hevc_epel_uni_w_h8_8_neon: 94.0 put_hevc_epel_uni_w_h8_8_i8mm: 80.7 put_hevc_epel_uni_w_h12_8_c: 768.7 put_hevc_epel_uni_w_h12_8_neon: 196.7 put_hevc_epel_uni_w_h12_8_i8mm: 169.7 put_hevc_epel_uni_w_h16_8_c: 1313.0 put_hevc_epel_uni_w_h16_8_neon: 290.7 put_hevc_epel_uni_w_h16_8_i8mm: 238.0 put_hevc_epel_uni_w_h24_8_c: 2877.5 put_hevc_epel_uni_w_h24_8_neon: 650.0 put_hevc_epel_uni_w_h24_8_i8mm: 512.0 put_hevc_epel_uni_w_h32_8_c: 5113.5 put_hevc_epel_uni_w_h32_8_neon: 1129.5 put_hevc_epel_uni_w_h32_8_i8mm: 739.2 put_hevc_epel_uni_w_h48_8_c: 11757.0 put_hevc_epel_uni_w_h48_8_neon: 2518.7 put_hevc_epel_uni_w_h48_8_i8mm: 1688.5 put_hevc_epel_uni_w_h64_8_c: 20478.0 put_hevc_epel_uni_w_h64_8_neon: 4411.7 put_hevc_epel_uni_w_h64_8_i8mm: 2884.0 --- libavcodec/aarch64/hevcdsp_epel_neon.S| 326 +- libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 + 2 files changed, 319 insertions(+), 13 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index 419e83529a..0e49491a81 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -1520,6 +1520,319 @@ function ff_hevc_put_hevc_epel_h32_8_neon, export=1 ret endfunc +.macro EPEL_UNI_W_H_HEADER elems=4s +ldr x12, [sp] +sub x2, x2, #1 +movrel x9, epel_filters +add x9, x9, x12, lsl #2 +ld1r{v28.4s}, [x9] +mov w10, #-6 +sub w10, w10, w5 +dup v30.\elems, w6 +dup v31.4s, w10 +dup v29.4s, w7 +.endm + +function ff_hevc_put_hevc_epel_uni_w_h4_8_neon, export=1 +EPEL_UNI_W_H_HEADER 4h +sxtlv0.8h, v28.8b +1: +ld1 {v4.8b}, [x2], x3 +subsw4, w4, #1 +uxtlv4.8h, v4.8b +ext v5.16b, v4.16b, v4.16b, #2 +ext v6.16b, v4.16b, v4.16b, #4 +ext v7.16b, v4.16b, v4.16b, #6 +mul v16.4h, v4.4h, v0.h[0] +mla v16.4h, 
v5.4h, v0.h[1] +mla v16.4h, v6.4h, v0.h[2] +mla v16.4h, v7.4h, v0.h[3] +smull v16.4s, v16.4h, v30.4h +sqrshl v16.4s, v16.4s, v31.4s +sqadd v16.4s, v16.4s, v29.4s +sqxtn v16.4h, v16.4s +sqxtun v16.8b, v16.8h +str s16, [x0] +add x0, x0, x1 +b.hi1b +ret +endfunc + +function ff_hevc_put_hevc_epel_uni_w_h6_8_neon, export=1 +EPEL_UNI_W_H_HEADER 8h +sub x1, x1, #4 +sxtlv0.8h, v28.8b +1: +ld1 {v3.8b, v4.8b}, [x2], x3 +subsw4, w4, #1 +uxtlv3.8h, v3.8b +uxtlv4.8h, v4.8b +ext v5.16b, v3.16b, v4.16b, #2 +ext v6.16b, v3.16b, v4.16b, #4 +ext v7.16b, v3.16b, v4.16b, #6 +mul v16.8h, v3.8h, v0.h[0] +mla v16.8h, v5.8h, v0.h[1] +mla v16.8h, v6.8h, v0.h[2] +mla v16.8h, v7.8h, v0.h[3] +smull v17.4s, v16.4h, v30.4h +smull2 v18.4s, v16.8h, v30.8h +sqrshl v17.4s, v17.4s, v31.4s +sqrshl v18.4s, v18.4s, v31.4s +sqadd v17.4s, v17.4s, v29.4s +sqadd v18.4s, v18.4s, v29.4s +sqxtn v16.4h, v17.4s +sqxtn2 v16.8h, v18.4s +sqxtun v16.8b, v16.8h +str s16, [x0], #4 +st1 {v16.h}[2], [x0], x1 +b.hi1b +ret +endfunc + +function ff_hevc_put_hevc_epel_uni_w_h8_8_neon, export=1 +EPEL_UNI_W_H_HEADER 8h +sxtlv0.8h, v28.8b +1: +ld1 {v3.8b, v4.8b}, [x2], x3 +subsw4, w4, #1 +uxtlv3.8h, v3.8b +uxtlv4.8h, v4.8b +ext v5.16b, v3.16b, v4.16b, #2 +ext v6.16b, v3.16b, v4.16b, #4 +ext v7.16b, v3.16b, v4.16b, #6 +mul v16.8h, v3.8h, v0.h[0] +mla v16.8h, v5.8h, v0.h[1] +mla v16.8h, v6.8h, v0.h[2] +mla v16.8h, v7.8h, v0.h[3] +smull v17.4s, v16.4h, v30.4h +smull2 v18.4s, v16.8h, v30.8h +sqrshl v17.4s,
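Reading the instruction sequence in this diff (smull by v30, sqrshl by v31, sqadd of v29, then sqxtn and sqxtun), each output pixel appears to be weighted, rounded, offset and clipped as in the scalar sketch below. It assumes that w5, w6 and w7 carry denom, wx and ox, as EPEL_UNI_W_H_HEADER suggests. The function name epel_uni_w_pixel is hypothetical, and the 32-bit saturation of sqadd is not modelled.

```c
#include <assert.h>
#include <stdint.h>

static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* filtered: 16-bit result of the 4-tap horizontal filter for one pixel.
 * Assumes an arithmetic right shift for negative values, which is what
 * mainstream compilers implement. */
static uint8_t epel_uni_w_pixel(int16_t filtered, int denom, int wx, int ox)
{
    int shift = denom + 6;                  /* header: v31 = -(6 + denom) */
    int32_t prod, rounded;

    assert(denom >= 0);
    prod    = (int32_t)filtered * wx;                  /* smull  */
    rounded = (prod + (1 << (shift - 1))) >> shift;    /* sqrshl by a negative amount */
    return (uint8_t)clip_int(rounded + ox, 0, 255);    /* sqadd, sqxtn, sqxtun */
}
```

For example, with unit weight and zero offset a filtered value of 6400 (a flat area of value 100 with taps summing to 64) maps back to 100.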
[FFmpeg-devel] [PATCH 06/21] aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
AWS Graviton 3: put_hevc_epel_h4_8_c: 64.7 put_hevc_epel_h4_8_neon: 25.0 put_hevc_epel_h4_8_i8mm: 21.2 put_hevc_epel_h6_8_c: 130.0 put_hevc_epel_h6_8_neon: 40.7 put_hevc_epel_h6_8_i8mm: 36.5 put_hevc_epel_h8_8_c: 209.0 put_hevc_epel_h8_8_neon: 45.2 put_hevc_epel_h8_8_i8mm: 41.2 put_hevc_epel_h12_8_c: 465.5 put_hevc_epel_h12_8_neon: 104.5 put_hevc_epel_h12_8_i8mm: 86.5 put_hevc_epel_h16_8_c: 830.7 put_hevc_epel_h16_8_neon: 134.2 put_hevc_epel_h16_8_i8mm: 114.0 put_hevc_epel_h24_8_c: 1844.7 put_hevc_epel_h24_8_neon: 282.2 put_hevc_epel_h24_8_i8mm: 277.2 put_hevc_epel_h32_8_c: 3227.5 put_hevc_epel_h32_8_neon: 501.5 put_hevc_epel_h32_8_i8mm: 396.0 put_hevc_epel_h48_8_c: 7229.2 put_hevc_epel_h48_8_neon: 1120.2 put_hevc_epel_h48_8_i8mm: 901.2 put_hevc_epel_h64_8_c: 12869.0 put_hevc_epel_h64_8_neon: 1999.2 put_hevc_epel_h64_8_i8mm: 1610.5 --- libavcodec/aarch64/hevcdsp_epel_neon.S| 194 +- libavcodec/aarch64/hevcdsp_init_aarch64.c | 17 ++ 2 files changed, 209 insertions(+), 2 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index d3f0a26f79..419e83529a 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -1321,8 +1321,6 @@ function ff_hevc_put_hevc_epel_uni_v64_8_neon, export=1 ret endfunc -#if HAVE_I8MM -ENABLE_I8MM .macro EPEL_H_HEADER movrel x5, epel_filters @@ -1332,6 +1330,198 @@ ENABLE_I8MM mov x10, #(MAX_PB_SIZE * 2) .endm +function ff_hevc_put_hevc_epel_h4_8_neon, export=1 +EPEL_H_HEADER +sxtlv0.8h, v30.8b +1: ld1 {v4.8b}, [x1], x2 +subsw3, w3, #1 // height +uxtlv4.8h, v4.8b +ext v5.16b, v4.16b, v4.16b, #2 +ext v6.16b, v4.16b, v4.16b, #4 +ext v7.16b, v4.16b, v4.16b, #6 +mul v16.4h, v4.4h, v0.h[0] +mla v16.4h, v5.4h, v0.h[1] +mla v16.4h, v6.4h, v0.h[2] +mla v16.4h, v7.4h, v0.h[3] +st1 {v16.4h}, [x0], x10 +b.ne1b +ret +endfunc + +function ff_hevc_put_hevc_epel_h6_8_neon, export=1 +EPEL_H_HEADER +sxtlv0.8h, v30.8b +add x6, x0, #8 +1: ld1 {v3.16b}, [x1], x2 +subsw3, 
w3, #1 // height +uxtl2 v4.8h, v3.16b +uxtlv3.8h, v3.8b +ext v5.16b, v3.16b, v4.16b, #2 +ext v6.16b, v3.16b, v4.16b, #4 +ext v7.16b, v3.16b, v4.16b, #6 +mul v16.8h, v3.8h, v0.h[0] +mla v16.8h, v5.8h, v0.h[1] +mla v16.8h, v6.8h, v0.h[2] +mla v16.8h, v7.8h, v0.h[3] +st1 {v16.4h}, [x0], x10 +st1 {v16.s}[2], [x6], x10 +b.ne1b +ret +endfunc + +function ff_hevc_put_hevc_epel_h8_8_neon, export=1 +EPEL_H_HEADER +sxtlv0.8h, v30.8b +1: ld1 {v3.16b}, [x1], x2 +subsw3, w3, #1 // height +uxtl2 v4.8h, v3.16b +uxtlv3.8h, v3.8b +ext v5.16b, v3.16b, v4.16b, #2 +ext v6.16b, v3.16b, v4.16b, #4 +ext v7.16b, v3.16b, v4.16b, #6 +mul v16.8h, v3.8h, v0.h[0] +mla v16.8h, v5.8h, v0.h[1] +mla v16.8h, v6.8h, v0.h[2] +mla v16.8h, v7.8h, v0.h[3] +st1 {v16.8h}, [x0], x10 +b.ne1b +ret +endfunc + +function ff_hevc_put_hevc_epel_h12_8_neon, export=1 +EPEL_H_HEADER +add x6, x0, #16 +sxtlv0.8h, v30.8b +1: ld1 {v3.16b}, [x1], x2 +subsw3, w3, #1 // height +uxtl2 v4.8h, v3.16b +uxtlv3.8h, v3.8b +ext v5.16b, v3.16b, v4.16b, #2 +ext v6.16b, v3.16b, v4.16b, #4 +ext v7.16b, v3.16b, v4.16b, #6 +ext v20.16b, v4.16b, v4.16b, #2 +ext v21.16b, v4.16b, v4.16b, #4 +ext v22.16b, v4.16b, v4.16b, #6 +mul v16.8h, v3.8h, v0.h[0] +mla v16.8h, v5.8h, v0.h[1] +mla v16.8h, v6.8h, v0.h[2] +mla v16.8h, v7.8h, v0.h[3] +mul v17.4h, v4.4h, v0.h[0] +mla v17.4h, v20.4h, v0.h[1] +mla v17.4h, v21.4h, v0.h[2] +mla v17.4h, v22.4h, v0.h[3] +st1 {v16.8h}, [x0], x10 +st1
[FFmpeg-devel] [PATCH 12/21] aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
AWS Graviton 3: put_hevc_epel_uni_w_hv4_8_c: 191.2 put_hevc_epel_uni_w_hv4_8_neon: 87.7 put_hevc_epel_uni_w_hv4_8_i8mm: 83.2 put_hevc_epel_uni_w_hv6_8_c: 349.5 put_hevc_epel_uni_w_hv6_8_neon: 153.0 put_hevc_epel_uni_w_hv6_8_i8mm: 148.5 put_hevc_epel_uni_w_hv8_8_c: 581.2 put_hevc_epel_uni_w_hv8_8_neon: 166.7 put_hevc_epel_uni_w_hv8_8_i8mm: 163.5 put_hevc_epel_uni_w_hv12_8_c: 1230.0 put_hevc_epel_uni_w_hv12_8_neon: 387.7 put_hevc_epel_uni_w_hv12_8_i8mm: 370.2 put_hevc_epel_uni_w_hv16_8_c: 2003.2 put_hevc_epel_uni_w_hv16_8_neon: 501.5 put_hevc_epel_uni_w_hv16_8_i8mm: 490.2 put_hevc_epel_uni_w_hv24_8_c: 4448.7 put_hevc_epel_uni_w_hv24_8_neon: 1092.2 put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7 put_hevc_epel_uni_w_hv32_8_c: 7817.2 put_hevc_epel_uni_w_hv32_8_neon: 1916.2 put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5 put_hevc_epel_uni_w_hv48_8_c: 16728.2 put_hevc_epel_uni_w_hv48_8_neon: 4263.7 put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7 put_hevc_epel_uni_w_hv64_8_c: 29563.2 put_hevc_epel_uni_w_hv64_8_neon: 7474.2 put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5 --- libavcodec/aarch64/hevcdsp_epel_neon.S| 55 --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +++ 2 files changed, 36 insertions(+), 25 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index 876db9d449..d0c6205e1c 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -3573,10 +3573,8 @@ function hevc_put_hevc_epel_uni_w_hv24_8_end_neon ret endfunc -#if HAVE_I8MM -ENABLE_I8MM - -function ff_hevc_put_hevc_epel_uni_w_hv4_8_neon_i8mm, export=1 +.macro epel_uni_w_hv suffix +function ff_hevc_put_hevc_epel_uni_w_hv4_8_\suffix, export=1 epel_uni_w_hv_start sxtwx4, w4 @@ -3591,14 +3589,14 @@ function ff_hevc_put_hevc_epel_uni_w_hv4_8_neon_i8mm, export=1 mov x2, x3 add x3, x4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h4_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h4_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b 
hevc_put_hevc_epel_uni_w_hv4_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_w_hv6_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_w_hv6_8_\suffix, export=1 epel_uni_w_hv_start sxtwx4, w4 @@ -3613,14 +3611,14 @@ function ff_hevc_put_hevc_epel_uni_w_hv6_8_neon_i8mm, export=1 mov x2, x3 add x3, x4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h6_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h6_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b hevc_put_hevc_epel_uni_w_hv6_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_w_hv8_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_w_hv8_8_\suffix, export=1 epel_uni_w_hv_start sxtwx4, w4 @@ -3635,14 +3633,14 @@ function ff_hevc_put_hevc_epel_uni_w_hv8_8_neon_i8mm, export=1 mov x2, x3 add x3, x4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h8_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h8_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b hevc_put_hevc_epel_uni_w_hv8_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_w_hv12_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_w_hv12_8_\suffix, export=1 epel_uni_w_hv_start sxtwx4, w4 @@ -3657,14 +3655,14 @@ function ff_hevc_put_hevc_epel_uni_w_hv12_8_neon_i8mm, export=1 mov x2, x3 add x3, x4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h12_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h12_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b hevc_put_hevc_epel_uni_w_hv12_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_w_hv16_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_w_hv16_8_\suffix, export=1 epel_uni_w_hv_start sxtwx4, w4 @@ -3679,14 +3677,14 @@ function ff_hevc_put_hevc_epel_uni_w_hv16_8_neon_i8mm, export=1 mov x2, x3 add x3, x4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h16_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h16_8_\suffix) ldp x4, x6, [sp, #16]
[FFmpeg-devel] [PATCH 11/21] aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
AWS Graviton 3: put_hevc_epel_uni_hv4_8_c: 163.5 put_hevc_epel_uni_hv4_8_neon: 59.7 put_hevc_epel_uni_hv4_8_i8mm: 57.5 put_hevc_epel_uni_hv6_8_c: 344.7 put_hevc_epel_uni_hv6_8_neon: 105.0 put_hevc_epel_uni_hv6_8_i8mm: 102.7 put_hevc_epel_uni_hv8_8_c: 552.2 put_hevc_epel_uni_hv8_8_neon: 111.2 put_hevc_epel_uni_hv8_8_i8mm: 104.0 put_hevc_epel_uni_hv12_8_c: 1195.0 put_hevc_epel_uni_hv12_8_neon: 248.7 put_hevc_epel_uni_hv12_8_i8mm: 229.5 put_hevc_epel_uni_hv16_8_c: 1910.2 put_hevc_epel_uni_hv16_8_neon: 339.5 put_hevc_epel_uni_hv16_8_i8mm: 323.2 put_hevc_epel_uni_hv24_8_c: 4048.2 put_hevc_epel_uni_hv24_8_neon: 737.7 put_hevc_epel_uni_hv24_8_i8mm: 713.7 put_hevc_epel_uni_hv32_8_c: 6865.7 put_hevc_epel_uni_hv32_8_neon: 1285.0 put_hevc_epel_uni_hv32_8_i8mm: 1206.0 put_hevc_epel_uni_hv48_8_c: 15830.5 put_hevc_epel_uni_hv48_8_neon: 2844.7 put_hevc_epel_uni_hv48_8_i8mm: 2914.0 put_hevc_epel_uni_hv64_8_c: 27912.7 put_hevc_epel_uni_hv64_8_neon: 4970.5 put_hevc_epel_uni_hv64_8_i8mm: 4653.7 --- libavcodec/aarch64/hevcdsp_epel_neon.S| 67 +++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 ++ 2 files changed, 38 insertions(+), 34 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index 024464723b..876db9d449 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -2460,14 +2460,6 @@ endfunc epel_hv neon -#if HAVE_I8MM -ENABLE_I8MM - -epel_hv neon_i8mm - -DISABLE_I8MM -#endif - function hevc_put_hevc_epel_uni_hv4_8_end_neon load_epel_filterh x6, x5 mov x10, #(MAX_PB_SIZE * 2) @@ -2596,10 +2588,8 @@ function hevc_put_hevc_epel_uni_hv24_8_end_neon 2: ret endfunc -#if HAVE_I8MM -ENABLE_I8MM - -function ff_hevc_put_hevc_epel_uni_hv4_8_neon_i8mm, export=1 +.macro epel_uni_hv suffix +function ff_hevc_put_hevc_epel_uni_hv4_8_\suffix, export=1 add w10, w4, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2611,14 +2601,14 @@ function ff_hevc_put_hevc_epel_uni_hv4_8_neon_i8mm, export=1 mov x2, 
x3 add w3, w4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h4_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h4_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b hevc_put_hevc_epel_uni_hv4_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_hv6_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_hv6_8_\suffix, export=1 add w10, w4, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2630,14 +2620,14 @@ function ff_hevc_put_hevc_epel_uni_hv6_8_neon_i8mm, export=1 mov x2, x3 add w3, w4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h6_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h6_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b hevc_put_hevc_epel_uni_hv6_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_hv8_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_hv8_8_\suffix, export=1 add w10, w4, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2649,14 +2639,14 @@ function ff_hevc_put_hevc_epel_uni_hv8_8_neon_i8mm, export=1 mov x2, x3 add w3, w4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h8_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h8_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b hevc_put_hevc_epel_uni_hv8_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_hv12_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_hv12_8_\suffix, export=1 add w10, w4, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2668,14 +2658,14 @@ function ff_hevc_put_hevc_epel_uni_hv12_8_neon_i8mm, export=1 mov x2, x3 add w3, w4, #3 mov x4, x5 -bl X(ff_hevc_put_hevc_epel_h12_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h12_8_\suffix) ldp x4, x6, [sp, #16] ldp x0, x1, [sp, #32] ldr x30, [sp], #48 b hevc_put_hevc_epel_uni_hv12_8_end_neon endfunc -function ff_hevc_put_hevc_epel_uni_hv16_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_uni_hv16_8_\suffix, export=1 add
[FFmpeg-devel] [PATCH 05/21] aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h
---
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index 0fcded344b..062b7d4d0f 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -2462,8 +2462,7 @@ endfunc
         sub             x2, x2, #3
         movrel          x9, qpel_filters
         add             x9, x9, x12, lsl #3
-        ldr             x11, [x9]
-        dup             v28.2d, x11
+        ld1r            {v28.2d}, [x9]
         mov             w10, #-6
         sub             w10, w10, w5
         dup             v30.4s, w6 // wx
--
2.39.3 (Apple Git-146)
[FFmpeg-devel] [PATCH 10/21] aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
AWS Graviton 3: put_hevc_epel_hv4_8_c: 163.7 put_hevc_epel_hv4_8_neon: 52.5 put_hevc_epel_hv4_8_i8mm: 49.5 put_hevc_epel_hv6_8_c: 292.2 put_hevc_epel_hv6_8_neon: 97.7 put_hevc_epel_hv6_8_i8mm: 101.2 put_hevc_epel_hv8_8_c: 471.0 put_hevc_epel_hv8_8_neon: 106.7 put_hevc_epel_hv8_8_i8mm: 102.5 put_hevc_epel_hv12_8_c: 1030.2 put_hevc_epel_hv12_8_neon: 240.5 put_hevc_epel_hv12_8_i8mm: 215.0 put_hevc_epel_hv16_8_c: 1711.5 put_hevc_epel_hv16_8_neon: 340.2 put_hevc_epel_hv16_8_i8mm: 319.2 put_hevc_epel_hv24_8_c: 3670.0 put_hevc_epel_hv24_8_neon: 702.0 put_hevc_epel_hv24_8_i8mm: 666.5 put_hevc_epel_hv32_8_c: 6785.5 put_hevc_epel_hv32_8_neon: 1247.0 put_hevc_epel_hv32_8_i8mm: 1169.0 put_hevc_epel_hv48_8_c: 14689.7 put_hevc_epel_hv48_8_neon: 2665.2 put_hevc_epel_hv48_8_i8mm: 2740.0 put_hevc_epel_hv64_8_c: 25899.2 put_hevc_epel_hv64_8_neon: 4801.2 put_hevc_epel_hv64_8_i8mm: 4487.7 --- libavcodec/aarch64/hevcdsp_epel_neon.S| 58 +-- libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +++ 2 files changed, 38 insertions(+), 26 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index 2088630da1..024464723b 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -2298,10 +2298,8 @@ function hevc_put_hevc_epel_hv24_8_end_neon 2: ret endfunc -#if HAVE_I8MM -ENABLE_I8MM - -function ff_hevc_put_hevc_epel_hv4_8_neon_i8mm, export=1 +.macro epel_hv suffix +function ff_hevc_put_hevc_epel_hv4_8_\suffix, export=1 add w10, w3, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2310,13 +2308,13 @@ function ff_hevc_put_hevc_epel_hv4_8_neon_i8mm, export=1 add x0, sp, #32 sub x1, x1, x2 add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h4_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h4_8_\suffix) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 b hevc_put_hevc_epel_hv4_8_end_neon endfunc -function ff_hevc_put_hevc_epel_hv6_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_hv6_8_\suffix, export=1 add w10, w3, #3 lsl 
x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2325,13 +2323,13 @@ function ff_hevc_put_hevc_epel_hv6_8_neon_i8mm, export=1 add x0, sp, #32 sub x1, x1, x2 add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h6_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h6_8_\suffix) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 b hevc_put_hevc_epel_hv6_8_end_neon endfunc -function ff_hevc_put_hevc_epel_hv8_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_hv8_8_\suffix, export=1 add w10, w3, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2340,13 +2338,13 @@ function ff_hevc_put_hevc_epel_hv8_8_neon_i8mm, export=1 add x0, sp, #32 sub x1, x1, x2 add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h8_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h8_8_\suffix) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 b hevc_put_hevc_epel_hv8_8_end_neon endfunc -function ff_hevc_put_hevc_epel_hv12_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_hv12_8_\suffix, export=1 add w10, w3, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2355,13 +2353,13 @@ function ff_hevc_put_hevc_epel_hv12_8_neon_i8mm, export=1 add x0, sp, #32 sub x1, x1, x2 add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h12_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h12_8_\suffix) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 b hevc_put_hevc_epel_hv12_8_end_neon endfunc -function ff_hevc_put_hevc_epel_hv16_8_neon_i8mm, export=1 +function ff_hevc_put_hevc_epel_hv16_8_\suffix, export=1 add w10, w3, #3 lsl x10, x10, #7 sub sp, sp, x10 // tmp_array @@ -2370,13 +2368,13 @@ function ff_hevc_put_hevc_epel_hv16_8_neon_i8mm, export=1 add x0, sp, #32 sub x1, x1, x2 add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h16_8_neon_i8mm) +bl X(ff_hevc_put_hevc_epel_h16_8_\suffix) ldp x0, x3, [sp, #16] ldp x5, x30, [sp], #32 b
[FFmpeg-devel] [PATCH 09/21] aarch64: hevc: Reorder epel_hv functions to prepare for templating
This is a pure reordering of code without changing anything in the individual functions. --- libavcodec/aarch64/hevcdsp_epel_neon.S | 971 + 1 file changed, 497 insertions(+), 474 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index 6be171ece1..2088630da1 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -2173,21 +2173,9 @@ function ff_hevc_put_hevc_epel_h64_8_neon_i8mm, export=1 ret endfunc +DISABLE_I8MM +#endif -function ff_hevc_put_hevc_epel_hv4_8_neon_i8mm, export=1 -add w10, w3, #3 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x5, x30, [sp, #-32]! -stp x0, x3, [sp, #16] -add x0, sp, #32 -sub x1, x1, x2 -add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h4_8_neon_i8mm) -ldp x0, x3, [sp, #16] -ldp x5, x30, [sp], #32 -b hevc_put_hevc_epel_hv4_8_end_neon -endfunc function hevc_put_hevc_epel_hv4_8_end_neon load_epel_filterh x5, x4 @@ -2207,21 +2195,6 @@ function hevc_put_hevc_epel_hv4_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_epel_hv6_8_neon_i8mm, export=1 -add w10, w3, #3 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x5, x30, [sp, #-32]! -stp x0, x3, [sp, #16] -add x0, sp, #32 -sub x1, x1, x2 -add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h6_8_neon_i8mm) -ldp x0, x3, [sp, #16] -ldp x5, x30, [sp], #32 -b hevc_put_hevc_epel_hv6_8_end_neon -endfunc - function hevc_put_hevc_epel_hv6_8_end_neon load_epel_filterh x5, x4 mov x5, #120 @@ -2243,21 +2216,6 @@ function hevc_put_hevc_epel_hv6_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_epel_hv8_8_neon_i8mm, export=1 -add w10, w3, #3 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x5, x30, [sp, #-32]! 
-stp x0, x3, [sp, #16] -add x0, sp, #32 -sub x1, x1, x2 -add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h8_8_neon_i8mm) -ldp x0, x3, [sp, #16] -ldp x5, x30, [sp], #32 -b hevc_put_hevc_epel_hv8_8_end_neon -endfunc - function hevc_put_hevc_epel_hv8_8_end_neon load_epel_filterh x5, x4 mov x10, #(MAX_PB_SIZE * 2) @@ -2277,21 +2235,6 @@ function hevc_put_hevc_epel_hv8_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_epel_hv12_8_neon_i8mm, export=1 -add w10, w3, #3 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x5, x30, [sp, #-32]! -stp x0, x3, [sp, #16] -add x0, sp, #32 -sub x1, x1, x2 -add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h12_8_neon_i8mm) -ldp x0, x3, [sp, #16] -ldp x5, x30, [sp], #32 -b hevc_put_hevc_epel_hv12_8_end_neon -endfunc - function hevc_put_hevc_epel_hv12_8_end_neon load_epel_filterh x5, x4 mov x5, #112 @@ -2313,21 +2256,6 @@ function hevc_put_hevc_epel_hv12_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_epel_hv16_8_neon_i8mm, export=1 -add w10, w3, #3 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x5, x30, [sp, #-32]! -stp x0, x3, [sp, #16] -add x0, sp, #32 -sub x1, x1, x2 -add w3, w3, #3 -bl X(ff_hevc_put_hevc_epel_h16_8_neon_i8mm) -ldp x0, x3, [sp, #16] -ldp x5, x30, [sp], #32 -b hevc_put_hevc_epel_hv16_8_end_neon -endfunc - function hevc_put_hevc_epel_hv16_8_end_neon load_epel_filterh x5, x4 mov x10, #(MAX_PB_SIZE * 2) @@ -2348,21 +2276,6 @@ function hevc_put_hevc_epel_hv16_8_end_neon 2: ret endfunc -function ff_hevc_put_hevc_epel_hv24_8_neon_i8mm, export=1 -add w10, w3, #3 -lsl x10, x10, #7 -sub sp, sp, x10 // tmp_array -stp x5, x30, [sp, #-32]! -stp x0, x3, [sp, #16] -add x0, sp, #32 -sub x1, x1, x2 -add w3, w3, #3 -bl
[FFmpeg-devel] [PATCH 04/21] aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping
For widths of 32 pixels and more, loop first horizontally, then vertically.

Previously, this function would process a 16 pixel wide slice of the block, looping vertically. After processing the whole height, it would backtrack and process the next 16 pixel wide slice.

When doing 8tap filtering horizontally, the function must load 7 more pixels (in practice, 8) following the actual inputs, and this was done for each slice.

By iterating first horizontally throughout each line, then vertically, we access data in a more cache friendly order, and we don't need to reload data unnecessarily.

Keep the original order in put_hevc_\type\()_h12_8_neon; the only suboptimal case there is for width=24. But specializing an optimal variant for that would require more code, which might not be worth it.

For the h16 case, this implementation would give a slowdown, as it now loads the first 8 pixels separately from the rest, but for larger widths, it is a gain. Therefore, keep the h16 case as it was (but remove the outer loop), and create a new specialized version for horizontal looping with 16 pixels at a time.
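The difference between the two iteration orders can be sketched in scalar C. This is a rough model, not the assembly: the function names and the load counter are hypothetical, the coefficients are illustrative, and src is assumed to already point at the first tap. Both orders produce the same output; the counter only makes the repeated 8 pixel overlap per slice visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 8-tap coefficients (they sum to 64); not taken from the patch. */
static const int taps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

static int16_t filter8(const uint8_t *p)
{
    int sum = 0;
    for (int k = 0; k < 8; k++)
        sum += taps[k] * p[k];
    return (int16_t)sum;
}

/* Old order: one 16 pixel wide slice at a time, over the whole height.
 * Returns the number of source pixels loaded. */
static size_t qpel_h_slices(int16_t *dst, const uint8_t *src, int stride,
                            int height, int width)
{
    size_t loaded = 0;

    assert(width % 16 == 0);
    for (int x0 = 0; x0 < width; x0 += 16)
        for (int y = 0; y < height; y++) {
            const uint8_t *row = src + y * stride + x0;
            loaded += 16 + 8;           /* inputs plus trailing pixels, per slice */
            for (int x = 0; x < 16; x++)
                dst[y * width + x0 + x] = filter8(row + x);
        }
    return loaded;
}

/* New order: walk each line horizontally, then step to the next line. */
static size_t qpel_h_rows(int16_t *dst, const uint8_t *src, int stride,
                          int height, int width)
{
    size_t loaded = 0;

    assert(width % 16 == 0);
    for (int y = 0; y < height; y++) {
        const uint8_t *row = src + y * stride;
        loaded += (size_t)width + 8;    /* trailing pixels loaded once per line */
        for (int x = 0; x < width; x++)
            dst[y * width + x] = filter8(row + x);
    }
    return loaded;
}
```

In this model a 64 pixel wide line costs 4 * 24 = 96 loaded pixels in slice order and 64 + 8 = 72 in line order.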
Before:                    Cortex A53      A72      A73  Graviton 3
put_hevc_qpel_h16_8_neon:       710.5    667.7    692.5       211.0
put_hevc_qpel_h32_8_neon:      2791.5   2643.5   2732.0       883.5
put_hevc_qpel_h64_8_neon:     10954.0  10657.0  10874.2      3241.5
After:
put_hevc_qpel_h16_8_neon:       697.5    663.5    705.7       212.5
put_hevc_qpel_h32_8_neon:      2767.2   2684.5   2791.2       920.5
put_hevc_qpel_h64_8_neon:     10559.2  10471.5  10932.2      3051.7
---
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  20 +++--
 libavcodec/aarch64/hevcdsp_qpel_neon.S    | 103 +-
 2 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index d2f2a3681f..1e9f5e32db 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -109,6 +109,8 @@ void ff_hevc_put_hevc_qpel_h12_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff
                                       intptr_t mx, intptr_t my, int width);
 void ff_hevc_put_hevc_qpel_h16_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
                                       intptr_t mx, intptr_t my, int width);
+void ff_hevc_put_hevc_qpel_h32_8_neon(int16_t *dst, const uint8_t *_src, ptrdiff_t _srcstride, int height,
+                                      intptr_t mx, intptr_t my, int width);
 void ff_hevc_put_hevc_qpel_uni_h4_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
                                          ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t
                                          my, int width);
@@ -124,6 +126,9 @@ void ff_hevc_put_hevc_qpel_uni_h12_8_neon(uint8_t *_dst, ptrdiff_t _dststride, c
 void ff_hevc_put_hevc_qpel_uni_h16_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
                                           ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t
                                           my, int width);
+void ff_hevc_put_hevc_qpel_uni_h32_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
+                                          ptrdiff_t _srcstride, int height, intptr_t mx, intptr_t
+                                          my, int width);
 void ff_hevc_put_hevc_qpel_bi_h4_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src,
                                         ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t
                                         mx, intptr_t my, int width);
@@ -139,6
+144,9 @@ void ff_hevc_put_hevc_qpel_bi_h12_8_neon(uint8_t *_dst, ptrdiff_t _dststride, co void ff_hevc_put_hevc_qpel_bi_h16_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t mx, intptr_t my, int width); +void ff_hevc_put_hevc_qpel_bi_h32_8_neon(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, + ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t + mx, intptr_t my, int width); #define NEON8_FNPROTO(fn, args, ext) \ void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ @@ -335,28 +343,28 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth) c->put_hevc_qpel[3][0][1] = ff_hevc_put_hevc_qpel_h8_8_neon; c->put_hevc_qpel[4][0][1] = c->put_hevc_qpel[6][0][1] = ff_hevc_put_hevc_qpel_h12_8_neon; -c->put_hevc_qpel[5][0][1] = +c->put_hevc_qpel[5][0][1] = ff_hevc_put_hevc_qpel_h16_8_neon; c->put_hevc_qpel[7][0][1] = c->put_hevc_qpel[8][0][1] = -c->put_hevc_qpel[9][0][1] =
[FFmpeg-devel] [PATCH 03/21] aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon
This gets rid of a couple instructions, but the actual performance is almost identical on Cortex A72/A73. On Cortex A53, it is a handful of cycles faster.
---
 libavcodec/aarch64/hevcdsp_qpel_neon.S | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S
index 815d897094..432558bb95 100644
--- a/libavcodec/aarch64/hevcdsp_qpel_neon.S
+++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S
@@ -512,11 +512,10 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1
 .ifc \type, qpel
         mov             dststride, #(MAX_PB_SIZE << 1)
         lsl             x13, srcstride, #1 // srcstridel
-        mov             x14, #((MAX_PB_SIZE << 2) - 16)
+        mov             x14, #(MAX_PB_SIZE << 2)
 .else
         lsl             x14, dststride, #1 // dststridel
         lsl             x13, srcstride, #1 // srcstridel
-        sub             x14, x14, #8
 .endif
         add             x10, dst, dststride // dstb
         add             x12, src, srcstride // srcb
@@ -527,10 +526,8 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1
         bl              ff_hevc_put_hevc_h16_8_neon
 .ifc \type, qpel
-        st1             {v26.8h}, [dst], #16
-        st1             {v28.8h}, [x10], #16
-        st1             {v27.8h}, [dst], x14
-        st1             {v29.8h}, [x10], x14
+        st1             {v26.8h, v27.8h}, [dst], x14
+        st1             {v28.8h, v29.8h}, [x10], x14
 .else
 .ifc \type, qpel_bi
         ld1             {v16.8h, v17.8h}, [ x4], x16
@@ -549,10 +546,8 @@ function ff_hevc_put_hevc_\type\()_h16_8_neon, export=1
         sqrshrun        v28.8b, v28.8h, #6
         sqrshrun        v29.8b, v29.8h, #6
 .endif
-        st1             {v26.8b}, [dst], #8
-        st1             {v28.8b}, [x10], #8
-        st1             {v27.8b}, [dst], x14
-        st1             {v29.8b}, [x10], x14
+        st1             {v26.8b, v27.8b}, [dst], x14
+        st1             {v28.8b, v29.8b}, [x10], x14
 .endif
         b.gt            1b // double line
         subs            width, width, #16
--
2.39.3 (Apple Git-146)
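The pointer arithmetic behind the merged stores can be checked with a small C model of the qpel case. It is only a sketch: store_split and store_merged are hypothetical names, memcpy stands in for st1, and ROW_BYTES mirrors the dststride of MAX_PB_SIZE << 1 bytes set up in the diff. With two 16 byte stores, the second post-increment has to be reduced by the 16 bytes the first one already advanced; with one 32 byte store the full two-row step can be used directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PB_SIZE 64
#define ROW_BYTES   (MAX_PB_SIZE * 2)   /* one int16_t row, MAX_PB_SIZE << 1 */

/* Before: two 16 byte stores, each with its own post-increment. */
static uint8_t *store_split(uint8_t *dst, const uint8_t lo[16],
                            const uint8_t hi[16])
{
    assert(dst && lo && hi);
    memcpy(dst, lo, 16);
    dst += 16;                          /* st1 {v26.8h}, [dst], #16 */
    memcpy(dst, hi, 16);
    dst += 2 * ROW_BYTES - 16;          /* st1 {v27.8h}, [dst], x14 */
    return dst;
}

/* After: one 32 byte store, post-incremented by the full two-row step. */
static uint8_t *store_merged(uint8_t *dst, const uint8_t lo[16],
                             const uint8_t hi[16])
{
    assert(dst && lo && hi);
    memcpy(dst, lo, 16);                /* st1 {v26.8h, v27.8h}, [dst], x14 */
    memcpy(dst + 16, hi, 16);
    return dst + 2 * ROW_BYTES;
}
```

Both helpers write the same 32 bytes and return the same advanced pointer, two rows further down, since the loop handles two lines per iteration.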
[FFmpeg-devel] [PATCH 02/21] aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm
Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon store temporary buffers on the stack. When consuming them, many of these functions use the stack pointer as an incrementing pointer for reading the data (instead of copying it to another register), which is rather unusual. Technically, this is fine as long as the pointer remains properly aligned.

However, in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm, after incrementing sp while reading data (within each 16 pixel wide stripe), the function would then reset the stack pointer back to a lower value to read the next 16 pixel wide stripe, expecting the data to remain untouched.

This can't be assumed; data on the stack below the stack pointer can be clobbered (e.g. by a signal handler). Some OS ABIs allow for a little margin that won't be touched, aka a red zone, but not all do. The ones that do guarantee 16 or 128 bytes, not 9 KB.

Convert this function to use a separate pointer register to iterate through the data, retaining the stack pointer to point at the bottom of the data we require to remain untouched.
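The hazard can be illustrated with a toy model in C. Everything here is an assumption made for illustration: the "stack" is a byte array, sp is an index into it, anything at an index below sp counts as free, and clobber_below_sp() plays the role of a signal handler arriving at an unlucky moment. None of the names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 64

typedef struct {
    unsigned char mem[STACK_SIZE];
    size_t sp;      /* indices below sp are "below the stack pointer" */
} ToyStack;

/* Models an asynchronous event that may overwrite anything below sp. */
static void clobber_below_sp(ToyStack *s)
{
    memset(s->mem, 0xAA, s->sp);
}

/* Risky: advance sp itself while reading, then rewind for a second pass. */
static unsigned sum_two_passes_moving_sp(ToyStack *s, size_t len)
{
    size_t base = s->sp;
    unsigned sum = 0;

    assert(base + len <= STACK_SIZE);
    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < len; i++)
            sum += s->mem[s->sp++];
        clobber_below_sp(s);    /* the data now sits below sp, unprotected */
        s->sp = base;           /* rewind, hoping the data survived */
    }
    return sum;
}

/* Safe: sp stays at the bottom of the data, a separate cursor walks it. */
static unsigned sum_two_passes_cursor(ToyStack *s, size_t len)
{
    unsigned sum = 0;

    assert(s->sp + len <= STACK_SIZE);
    for (int pass = 0; pass < 2; pass++) {
        size_t cur = s->sp;
        for (size_t i = 0; i < len; i++)
            sum += s->mem[cur++];
        clobber_below_sp(s);    /* only touches memory the code no longer needs */
    }
    return sum;
}
```

In the cursor variant the second pass still sees the original data, while the moving-sp variant reads whatever the simulated handler left behind.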
--- libavcodec/aarch64/hevcdsp_qpel_neon.S | 130 + 1 file changed, 66 insertions(+), 64 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index 9be29cafe2..815d897094 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_neon.S +++ b/libavcodec/aarch64/hevcdsp_qpel_neon.S @@ -3981,24 +3981,25 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1 mov x11, sp mov w12, w22 mov x13, x20 +mov x14, sp 3: -ldp q16, q1, [sp] -add sp, sp, x10 -ldp q17, q2, [sp] -add sp, sp, x10 -ldp q18, q3, [sp] -add sp, sp, x10 -ldp q19, q4, [sp] -add sp, sp, x10 -ldp q20, q5, [sp] -add sp, sp, x10 -ldp q21, q6, [sp] -add sp, sp, x10 -ldp q22, q7, [sp] -add sp, sp, x10 +ldp q16, q1, [x11] +add x11, x11, x10 +ldp q17, q2, [x11] +add x11, x11, x10 +ldp q18, q3, [x11] +add x11, x11, x10 +ldp q19, q4, [x11] +add x11, x11, x10 +ldp q20, q5, [x11] +add x11, x11, x10 +ldp q21, q6, [x11] +add x11, x11, x10 +ldp q22, q7, [x11] +add x11, x11, x10 1: -ldp q23, q31, [sp] -add sp, sp, x10 +ldp q23, q31, [x11] +add x11, x11, x10 QPEL_FILTER_H v24, v16, v17, v18, v19, v20, v21, v22, v23 QPEL_FILTER_H2 v25, v16, v17, v18, v19, v20, v21, v22, v23 QPEL_FILTER_H v26, v1, v2, v3, v4, v5, v6, v7, v31 @@ -4007,8 +4008,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1 subsw22, w22, #1 b.eq2f -ldp q16, q1, [sp] -add sp, sp, x10 +ldp q16, q1, [x11] +add x11, x11, x10 QPEL_FILTER_H v24, v17, v18, v19, v20, v21, v22, v23, v16 QPEL_FILTER_H2 v25, v17, v18, v19, v20, v21, v22, v23, v16 QPEL_FILTER_H v26, v2, v3, v4, v5, v6, v7, v31, v1 @@ -4017,8 +4018,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1 subsw22, w22, #1 b.eq2f -ldp q17, q2, [sp] -add sp, sp, x10 +ldp q17, q2, [x11] +add x11, x11, x10 QPEL_FILTER_H v24, v18, v19, v20, v21, v22, v23, v16, v17 QPEL_FILTER_H2 v25, v18, v19, v20, v21, v22, v23, v16, v17 QPEL_FILTER_H v26, v3, v4, v5, v6, v7, v31, v1, v2 @@ -4027,8 +4028,8 @@ function 
ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1 subsw22, w22, #1 b.eq2f -ldp q18, q3, [sp] -add sp, sp, x10 +ldp q18, q3, [x11] +add x11, x11, x10 QPEL_FILTER_H v24, v19, v20, v21, v22, v23, v16, v17, v18 QPEL_FILTER_H2 v25, v19, v20, v21, v22, v23, v16, v17, v18 QPEL_FILTER_H v26, v4, v5, v6, v7, v31, v1, v2, v3 @@ -4037,8 +4038,8 @@ function ff_hevc_put_hevc_qpel_uni_w_hv32_8_neon_i8mm, export=1 subsw22, w22, #1 b.eq2f -ldp q19, q4, [sp] -add sp, sp, x10 +ldp q19, q4, [x11] +add x11, x11, x10
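For a sense of scale, the "9 KB" figure in the commit message above can be reproduced with a little arithmetic. This is a back-of-the-envelope sketch, not code from the patch: the row width of 64 16-bit intermediates, the 7 extra rows for an 8-tap vertical filter, and the helper name are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 64 intermediates per temporary row. */
#define TMP_ROW_ELEMS 64
/* Assumed: an 8-tap vertical filter needs 7 rows beyond the block height. */
#define QPEL_EXTRA_ROWS 7

/* Bytes of temporary stack storage for a block of the given height. */
static size_t qpel_hv_tmp_bytes(int height)
{
    return (size_t)(height + QPEL_EXTRA_ROWS) * TMP_ROW_ELEMS * sizeof(int16_t);
}
```

Under these assumptions a block height of 64 needs 71 rows of 128 bytes, which is far more than a 16 or 128 byte red zone could cover.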
[FFmpeg-devel] [PATCH 01/21] aarch64: hevc: Reorder a misplaced function init line
Group the epel and qpel functions together. --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 04692aa98e..d2f2a3681f 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -381,12 +381,12 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth) NEON8_FNASSIGN(c->put_hevc_epel, 1, 1, epel_hv, _i8mm); NEON8_FNASSIGN(c->put_hevc_epel_uni, 1, 1, epel_uni_hv, _i8mm); NEON8_FNASSIGN(c->put_hevc_epel_uni_w, 0, 1, epel_uni_w_h ,_i8mm); +NEON8_FNASSIGN(c->put_hevc_epel_uni_w, 1, 1, epel_uni_w_hv, _i8mm); NEON8_FNASSIGN(c->put_hevc_epel_bi, 1, 1, epel_bi_hv, _i8mm); NEON8_FNASSIGN(c->put_hevc_qpel, 0, 1, qpel_h, _i8mm); NEON8_FNASSIGN(c->put_hevc_qpel, 1, 1, qpel_hv, _i8mm); NEON8_FNASSIGN(c->put_hevc_qpel_uni, 1, 1, qpel_uni_hv, _i8mm); NEON8_FNASSIGN(c->put_hevc_qpel_uni_w, 0, 1, qpel_uni_w_h, _i8mm); -NEON8_FNASSIGN(c->put_hevc_epel_uni_w, 1, 1, epel_uni_w_hv, _i8mm); NEON8_FNASSIGN_PARTIAL_5(c->put_hevc_qpel_uni_w, 1, 1, qpel_uni_w_hv, _i8mm); NEON8_FNASSIGN(c->put_hevc_qpel_bi, 1, 1, qpel_bi_hv, _i8mm); } -- 2.39.3 (Apple Git-146) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions
Hi,

For some time now, we have had pretty complete AArch64 NEON coverage for the hevc decoder. However, some of these functions require the I8MM instruction set extension, and many of them (but not all) lack a plain NEON version.

This patchset fills in a regular NEON version of all functions where we have an I8MM function.

For context: the I8MM instruction set extension is a mandatory part of armv8.6-a. E.g. Apple M2 and AWS Graviton 3 have it, but Apple M1 and Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402 fps to 649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation: two functions relied on the contents of the stack, below the stack pointer, being untouched within a function. If a signal gets delivered, those parts of the stack could be clobbered.

// Martin

Martin Storsjö (21):
  aarch64: hevc: Reorder a misplaced function init line
  aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm
  aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon
  aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping
  aarch64: hevc: Use ld1r instead of ldr+dup in hevc_qpel_uni_w_h
  aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8
  aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8
  aarch64: hevc: Split the epel_*_hv functions into two parts
  aarch64: hevc: Reorder epel_hv functions to prepare for templating
  aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm
  aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm
  aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm
  aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8
  aarch64: hevc: Split the qpel_*_hv functions into two parts
  aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions
  aarch64: hevc: Reorder qpel_hv functions to prepare for templating
  aarch64: hevc: Produce plain neon versions of qpel_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_hv
  aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv
  aarch64: hevc: Produce plain neon versions of qpel_bi_hv

 libavcodec/aarch64/hevcdsp_epel_neon.S    | 1529 +++--
 libavcodec/aarch64/hevcdsp_init_aarch64.c |   96 +-
 libavcodec/aarch64/hevcdsp_qpel_neon.S    | 1804 +
 3 files changed, 2291 insertions(+), 1138 deletions(-)

--
2.39.3 (Apple Git-146)
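To illustrate why a plain NEON version matters on CPUs without I8MM, here is a minimal sketch of runtime dispatch. The flag values, type and function names are hypothetical stand-ins, not FFmpeg's AV_CPU_FLAG_* constants or its NEON8_FNASSIGN macros; the only point is that later assignments override earlier ones, so each CPU ends up with the best version it supports.

```c
/* Hypothetical feature bits; not FFmpeg's AV_CPU_FLAG_* values. */
enum { CPU_NEON = 1 << 0, CPU_I8MM = 1 << 1 };

typedef const char *(*pel_func)(void);

/* Stand-ins for the C, NEON and I8MM implementations of one routine. */
static const char *pel_c(void)    { return "c"; }
static const char *pel_neon(void) { return "neon"; }
static const char *pel_i8mm(void) { return "i8mm"; }

/* Later assignments override earlier ones, so the best available version wins. */
static pel_func select_pel(int cpu_flags)
{
    pel_func fn = pel_c;
    if (cpu_flags & CPU_NEON)
        fn = pel_neon;
    if (cpu_flags & CPU_I8MM)
        fn = pel_i8mm;
    return fn;
}
```

With only the NEON bit set (the Apple M1 case mentioned above), the selection lands on the NEON version; if that version did not exist, the same CPU would fall back to the C implementation.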
Re: [FFmpeg-devel] [PATCH v2] configure: Explicitly check for static_assert
On Fri, 22 Mar 2024, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> Both patches seem to work fine with MSVC 19.27 - I vaguely prefer the v2 version, which is simpler.
>>
>> But to me, we could also just revert the change to libavcodec/ccaption_dec.c, and declare that we require MSVC 19.28 instead. MSVC 19.27, when executed with -std:c11 without -nologo, it prints this:
>>
>> /std:c11 is a preview implementation of the ISO C11 standard, and we're eager to hear about bugs and suggestions for improvements. However, note that these features are provided as-is without support.
>>
>> And I don't have any specific reasons for wanting to use this compiler - I just tested the lowest version that was supposed to be supported earlier and noted that it had broken recently.
>>
>> So to me, reverting to requiring _Static_assert would be quite ok as well.
>
> We can actually do both: Test for static_assert and for _Static_assert (to exclude MSVC 19.27; is 19.28 still supposed to be a preview implementation?).

19.28 no longer has that preview implementation banner, so from there on, it should be fine.

> The reason I prefer static_assert in the codebase is that _Static_assert is actually deprecated with C23 (although I don't think it will be removed any time).

Ah, I see. Right, with that in mind, unifying usage to static_assert sounds good.

No strong opinion either way about the configure checks still (or whether we should require _Static_assert to be supported), except that strictly requiring static_assert seems less kludgy than trying to define it ourselves.

// Martin
Re: [FFmpeg-devel] [PATCH v2] configure: Explicitly check for static_assert
On Thu, 21 Mar 2024, Andreas Rheinhardt wrote: Andreas Rheinhardt: C11 provides static assertions via _Static_assert and provides static_assert as a convenience define for this in assert.h. MSVC 19.27 declares support for C11, but does not support _Static_assert, but somehow supports static_assert. That's therefore what we use. But apparently there are some old GCC toolchains where _Static_assert is supported, but assert.h does not provide the fallback define. Some fate boxes are affected by this [1]. This commit therefore checks whether static_assert works with assert.h included; if not, it errors out. Users like the above can still add -Dstatic_assert=_Static_assert to cflags as a workaround. [1]: https://fate.ffmpeg.org/report.cgi?time=20240321123620=sh4-debian-qemu-gcc-4.7 Signed-off-by: Andreas Rheinhardt --- This is what a test without fallback looks like. Posted to gather opinions on what people prefer. configure | 13 + 1 file changed, 13 insertions(+) diff --git a/configure b/configure index 6d7b33b0ff..c2d2c70c20 100755 --- a/configure +++ b/configure @@ -5589,6 +5589,19 @@ check_cxxflags_cc -std=$stdcxx ctype.h "__cplusplus >= 201103L" || check_cflags_cc -std=$stdc ctype.h "__STDC_VERSION__ >= 201112L" || { check_cflags_cc -std=c11 ctype.h "__STDC_VERSION__ >= 201112L" && stdc="c11" || die "Compiler lacks C11 support"; } +test_cc < +#include +struct Foo { +int a; +void *ptr; +} obj; +static_assert(offsetof(struct Foo, a) == 0, + "First element of struct does not have offset 0"); +static_assert(offsetof(struct Foo, ptr) >= offsetof(struct Foo, a) + sizeof(obj.a), + "elements not properly ordered in struct"); +EOF + check_cppflags -D_FILE_OFFSET_BITS=64 check_cppflags -D_LARGEFILE_SOURCE Jan has tested old toolchains and found out that his GCC 4.7 has proper C11 headers; so this seems to be unique to Michael's setup. This makes me prefer this patch instead of the version with the fallback. 
(Michael can simply add -Dstatic_assert=_Static_assert to his cflags.) Of course others are still invited to share their opinions. Both patches seem to work fine with MSVC 19.27 - I vaguely prefer the v2 version, which is simpler. But to me, we could also just revert the change to libavcodec/ccaption_dec.c, and declare that we require MSVC 19.28 instead. MSVC 19.27, when executed with -std:c11 without -nologo, it prints this: /std:c11 is a preview implementation of the ISO C11 standard, and we're eager to hear about bugs and suggestions for improvements. However, note that these features are provided as-is without support. And I don't have any specific reasons for wanting to use this compiler - I just tested the lowest version that was supposed to be supported earlier and noted that it had broken recently. So to me, reverting to requiring _Static_assert would be quite ok as well. // Martin
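For reference, the kind of program the configure check in this thread compiles could look roughly like the sketch below. The two include lines are inferred from the identifiers used (offsetof and static_assert), and the #ifndef block is an in-source stand-in for the -Dstatic_assert=_Static_assert workaround; it is aimed at the old GCC case only and is an assumption of this sketch, not part of the proposed patch. The helper function at the end is hypothetical.

```c
#include <assert.h>  /* C11: defines the static_assert convenience macro */
#include <stddef.h>  /* offsetof */

/* In-source equivalent of the -Dstatic_assert=_Static_assert workaround,
 * for toolchains whose assert.h lacks the macro. */
#ifndef static_assert
#define static_assert _Static_assert
#endif

struct Foo {
    int a;
    void *ptr;
} obj;

static_assert(offsetof(struct Foo, a) == 0,
              "First element of struct does not have offset 0");
static_assert(offsetof(struct Foo, ptr) >= offsetof(struct Foo, a) + sizeof(obj.a),
              "elements not properly ordered in struct");

/* Hypothetical helper; it can only be compiled if both checks above passed. */
static size_t foo_ptr_offset(void)
{
    return offsetof(struct Foo, ptr);
}
```

If the file compiles at all, both assertions held; a toolchain without a working static_assert fails at compile time, which is what the configure test relies on.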
Re: [FFmpeg-devel] duplicate symbol '_dec_init' in: fftools/ffmpeg_dec.o
On Sun, 17 Mar 2024, Rémi Denis-Courmont wrote:

> Obviously not. Imported libraries are only there to resolve missing symbols.

Sure - but if resolving the missing symbols brings in those conflicting object files, there's not much to do about it.

If the static library contains dec_init in a standalone object file that nothing references, then sure, it won't be an issue. But if linking libbr brings in the object file that defines that symbol, we can't get around it.

Example:

$ cat mylib.h
void mylib_func(void);
$ cat mylib.c
#include "mylib.h"
void mylib_func(void) { }
void dec_init(void) { }
$ cat main.c
#include "mylib.h"
void dec_init(void) { }
int main(int argc, char **argv) { mylib_func(); return 0; }
$ gcc -c mylib.c
$ ar rcs libmylib.a mylib.o
$ gcc -c main.c
$ gcc main.o -o main -L. -lmylib
/usr/bin/ld: ./libmylib.a(mylib.o): in function `dec_init':
mylib.c:(.text+0xb): multiple definition of `dec_init'; main.o:main.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

I don't see what you propose that the FFmpeg build system should do differently to get around this issue, other than libbr not exposing global symbols outside of their namespace.

// Martin
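As a sketch of what "not exposing global symbols outside of their namespace" could mean on the library side, the mylib.c from the example above could be written as follows. This is a hypothetical rewrite of the toy library (with return values added so the effect is observable), not a change to any real project.

```c
/* Library side. `static` gives the helper internal linkage, so its name
 * is not visible to other object files at link time. */
static int dec_init(void)
{
    return 1;
}

/* Exported entry point, carrying the library's prefix. */
int mylib_func(void)
{
    return dec_init();
}
```

With the helper made static (or renamed with a mylib_ prefix if it must stay global), an application defining its own dec_init in another object file would presumably no longer collide with it.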
Re: [FFmpeg-devel] [PATCH] configure: Remove av_restrict
On Sun, 10 Mar 2024, Andreas Rheinhardt wrote:

> All versions of MSVC that support C11 (namely >= v19.27) also support the restrict keyword, therefore av_restrict is no longer necessary since 75697836b1db3e0f0a3b7061be6be28d00c675a0.
>
> Signed-off-by: Andreas Rheinhardt
> ---
> Untested except via godbolt. MSVC actually uses it for optimizations: https://godbolt.org/z/3EzPnff9T

This change looks good overall, thanks! Fate runs successfully both with an old version of MSVC targeting x86_64 and a new one targeting aarch64.

However, MSVC 19.27 (aka 2019 16.7) can't successfully build ffmpeg at the moment - it regressed in ec1b6e0cd404b2f7f4d202802b1c0a40d52fc9b0. Now building fails with this error:

src/libavcodec/ccaption_dec.c(186): error C2143: syntax error: missing ')' before 'sizeof'
src/libavcodec/ccaption_dec.c(186): error C2143: syntax error: missing '{' before 'sizeof'
src/libavcodec/ccaption_dec.c(186): error C2059: syntax error: 'sizeof'

This issue is not present with the following version, MSVC 2019 16.8 (aka 19.28) though.

> Btw: The block about __declspec(restrict) was always unneeded for FFmpeg due to 17fad33f81c7e9787fcdc17934fc1eee6c6aa4bf. It came from Libav commit 17fad33f81c7e9787fcdc17934fc1eee6c6aa4bf.

This looks like a copypaste typo, I presume the latter should have been 0cff125200ab53fa3ae70d85b4f614f269fe3426. (The code it changed originated in dfa559bcbd41397b3408c59d016631c7c65e320f in libav.)

// Martin
Re: [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm
On Thu, 14 Mar 2024, J. Dekker wrote:

> Martin Storsjö writes:
>
>> The first 32 elements of each row were correct, while the last 16 were scrambled.
>>
>> This hasn't been noticed, because the checkasm test erroneously only checked half of the output (for 8 bit functions), and apparently none of the samples as part of "fate-hevc" seem to trigger this specific function.
>> ---
>> libavcodec/aarch64/hevcdsp_epel_neon.S | 14 +-
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> Thanks for the fixes, wonder if we should use checkasm_check() exclusively in checkasm rather than memcmp(), would probably be useful.

Wherever it makes sense and works, then yes, using checkasm_check() probably is useful. (Within dav1d, we use it in most tests except for a few.)

FWIW, many checkasm tests seem to have pretty naive setups, where e.g. all rows are tightly packed. If they'd use a bigger stride with more padding between rows, one can also detect some other cases of potential asm bugs.

> Pushed set

Thanks!

// Martin
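The point about strides and padding can be sketched as follows. This is a hypothetical, self-contained illustration, not checkasm code: the guard value, helper names and the deliberately faulty write_rows() stand-in are all made up. The idea is that with a stride larger than the row width, the bytes between rows can be filled with a marker and checked afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD 0xA5 /* arbitrary marker value for untouched bytes */

/* Fill the whole buffer, padding included, with the guard value. */
static void fill_guard(uint8_t *buf, ptrdiff_t stride, int h)
{
    memset(buf, GUARD, (size_t)stride * h);
}

/* Return 1 if every padding byte (columns w..stride-1) still holds GUARD. */
static int padding_intact(const uint8_t *buf, ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (ptrdiff_t x = w; x < stride; x++)
            if (buf[y * stride + x] != GUARD)
                return 0;
    return 1;
}

/* Stand-in for a routine under test; `extra` simulates writing past the row end. */
static void write_rows(uint8_t *dst, ptrdiff_t stride, int w, int h, int extra)
{
    for (int y = 0; y < h; y++)
        memset(dst + y * stride, 0, (size_t)(w + extra));
}
```

With tightly packed rows (stride equal to the width), a one byte overwrite lands in the next row, which the routine writes anyway, so it would go unnoticed; with padding, padding_intact() reports it.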
[FFmpeg-devel] [PATCH 4/4] checkasm: hevc_pel: Use checkasm_check for printing failing output
This simplifies the code for checking the output, and can print the failing output (including a map of matching/mismatching elements) if checkasm is run with the -v/--verbose option. --- tests/checkasm/hevc_pel.c | 71 ++- 1 file changed, 41 insertions(+), 30 deletions(-) diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c index 73a4619978..ed22ec4f9d 100644 --- a/tests/checkasm/hevc_pel.c +++ b/tests/checkasm/hevc_pel.c @@ -36,6 +36,15 @@ static const int offsets[] = {0, 255, -1 }; #define SIZEOF_PIXEL ((bit_depth + 7) / 8) #define BUF_SIZE (2 * MAX_PB_SIZE * (2 * 4 + MAX_PB_SIZE)) +#define checkasm_check_pixel(buf1, stride1, buf2, stride2, ...) \ +((bit_depth > 8) ? \ + checkasm_check(uint16_t, (const uint16_t*)buf1, stride1, \ + (const uint16_t*)buf2, stride2, \ + __VA_ARGS__) :\ + checkasm_check(uint8_t, (const uint8_t*) buf1, stride1, \ + (const uint8_t*) buf2, stride2, \ + __VA_ARGS__)) + #define randomize_buffers() \ do { \ uint32_t mask = pixel_mask[bit_depth - 8]; \ @@ -78,7 +87,7 @@ static void checkasm_check_hevc_qpel(void) LOCAL_ALIGNED_32(uint8_t, dst1, [BUF_SIZE]); HEVCDSPContext h; -int size, bit_depth, i, j, row; +int size, bit_depth, i, j; declare_func(void, int16_t *dst, uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width); @@ -102,12 +111,9 @@ static void checkasm_check_hevc_qpel(void) randomize_buffers(); call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); -for (row = 0; row < size[sizes]; row++) { -if (memcmp(dstw0 + row * MAX_PB_SIZE, - dstw1 + row * MAX_PB_SIZE, - sizes[size] * sizeof(int16_t))) -fail(); -} +checkasm_check(int16_t, dstw0, MAX_PB_SIZE * sizeof(int16_t), +dstw1, MAX_PB_SIZE * sizeof(int16_t), +size[sizes], size[sizes], "dst"); bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); } } @@ -152,8 +158,9 @@ static void 
checkasm_check_hevc_qpel_uni(void) call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); -if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL)) -fail(); +checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL, + dst1, sizes[size] * SIZEOF_PIXEL, + size[sizes], size[sizes], "dst"); bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); @@ -204,8 +211,9 @@ static void checkasm_check_hevc_qpel_uni_w(void) call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]); -if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL)) -fail(); +checkasm_check_pixel(dst0, sizes[size] * SIZEOF_PIXEL, + dst1, sizes[size] * SIZEOF_PIXEL, + size[sizes], size[sizes], "dst"); bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]); @@ -258,8 +266,9 @@ static void checkasm_check_hevc_qpel_bi(void) call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, ref1, sizes[size], i, j, sizes[size]); -if (memcmp(dst0, dst1, sizes[size] * sizes[size] * SIZEOF_PIXEL)) -
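To give an idea of what a 2D comparison with a match/mismatch map does, here is a minimal sketch for int16_t data. It is not the real checkasm_check() implementation; the function name, the return value and the '.'/'x' characters are assumptions for illustration. As in the calls in the patch above, strides are given in bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compare two w x h blocks of int16_t; strides are in bytes.
 * Returns the number of mismatching elements and, if verbose is set,
 * prints a map with '.' for matches and 'x' for mismatches. */
static int compare_2d_int16(const int16_t *a, ptrdiff_t stride_a,
                            const int16_t *b, ptrdiff_t stride_b,
                            int w, int h, int verbose)
{
    int mismatches = 0;
    for (int y = 0; y < h; y++) {
        const int16_t *ra = (const int16_t *)((const uint8_t *)a + y * stride_a);
        const int16_t *rb = (const int16_t *)((const uint8_t *)b + y * stride_b);
        for (int x = 0; x < w; x++) {
            int bad = ra[x] != rb[x];
            mismatches += bad;
            if (verbose)
                putchar(bad ? 'x' : '.');
        }
        if (verbose)
            putchar('\n');
    }
    return mismatches;
}
```

Compared with a plain memcmp() over the whole buffer, this only looks at the w x h area that the function is supposed to write, and the map shows where in the block the output diverges.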
[FFmpeg-devel] [PATCH 3/4] checkasm: hevc_pel: Split a couple excessively long lines
--- tests/checkasm/hevc_pel.c | 134 -- 1 file changed, 98 insertions(+), 36 deletions(-) diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c index 065da87622..73a4619978 100644 --- a/tests/checkasm/hevc_pel.c +++ b/tests/checkasm/hevc_pel.c @@ -96,13 +96,16 @@ static void checkasm_check_hevc_qpel(void) case 3: type = "qpel_hv"; break; // 1 1 } -if (check_func(h.put_hevc_qpel[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) { +if (check_func(h.put_hevc_qpel[size][j][i], + "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) { int16_t *dstw0 = (int16_t *) dst0, *dstw1 = (int16_t *) dst1; randomize_buffers(); call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); for (row = 0; row < size[sizes]; row++) { -if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * sizeof(int16_t))) +if (memcmp(dstw0 + row * MAX_PB_SIZE, + dstw1 + row * MAX_PB_SIZE, + sizes[size] * sizeof(int16_t))) fail(); } bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); @@ -140,13 +143,20 @@ static void checkasm_check_hevc_qpel_uni(void) case 3: type = "qpel_uni_hv"; break; // 1 1 } -if (check_func(h.put_hevc_qpel_uni[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) { +if (check_func(h.put_hevc_qpel_uni[size][j][i], + "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) { randomize_buffers(); -call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); -call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); +call_ref(dst0, sizes[size] * SIZEOF_PIXEL, + src0, sizes[size] * SIZEOF_PIXEL, + sizes[size], i, j, sizes[size]); +call_new(dst1, sizes[size] * SIZEOF_PIXEL, + src1, sizes[size] * SIZEOF_PIXEL, + sizes[size], i, j, sizes[size]); if (memcmp(dst0, dst1, sizes[size] * sizes[size] * 
SIZEOF_PIXEL)) fail(); -bench_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); +bench_new(dst1, sizes[size] * SIZEOF_PIXEL, + src1, sizes[size] * SIZEOF_PIXEL, + sizes[size], i, j, sizes[size]); } } } @@ -182,16 +192,23 @@ static void checkasm_check_hevc_qpel_uni_w(void) case 3: type = "qpel_uni_w_hv"; break; // 1 1 } -if (check_func(h.put_hevc_qpel_uni_w[size][j][i], "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) { +if (check_func(h.put_hevc_qpel_uni_w[size][j][i], + "put_hevc_%s%d_%d", type, sizes[size], bit_depth)) { for (denom = denoms; *denom >= 0; denom++) { for (wx = weights; *wx >= 0; wx++) { for (ox = offsets; *ox >= 0; ox++) { randomize_buffers(); -call_ref(dst0, sizes[size] * SIZEOF_PIXEL, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]); -call_new(dst1, sizes[size] * SIZEOF_PIXEL, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], *denom, *wx, *ox, i, j, sizes[size]); +call_ref(dst0, sizes[size] * SIZEOF_PIXEL, + src0, sizes[size] * SIZEOF_PIXEL, + sizes[size], *denom, *wx, *ox, i, j, sizes[size]); +call_new(dst1, sizes[size] * SIZEOF_PIXEL, + src1, sizes[size] * SIZEOF_PIXEL, + sizes[size], *denom, *wx, *ox, i, j, sizes[size]); if (memcmp(dst0, dst1, sizes[size] * sizes[size] *
[FFmpeg-devel] [PATCH 2/4] checkasm: hevc_pel: Check the full output in hevc_epel/hevc_qpel
Previously it only checked half the output in 8 bit per pixel mode, as the output actually is 16 bit elements here. --- tests/checkasm/hevc_pel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c index f9a7a7717c..065da87622 100644 --- a/tests/checkasm/hevc_pel.c +++ b/tests/checkasm/hevc_pel.c @@ -102,7 +102,7 @@ static void checkasm_check_hevc_qpel(void) call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); for (row = 0; row < size[sizes]; row++) { -if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * SIZEOF_PIXEL)) +if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * sizeof(int16_t))) fail(); } bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); @@ -334,7 +334,7 @@ static void checkasm_check_hevc_epel(void) call_ref(dstw0, src0, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); call_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); for (row = 0; row < size[sizes]; row++) { -if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * SIZEOF_PIXEL)) +if (memcmp(dstw0 + row * MAX_PB_SIZE, dstw1 + row * MAX_PB_SIZE, sizes[size] * sizeof(int16_t))) fail(); } bench_new(dstw1, src1, sizes[size] * SIZEOF_PIXEL, sizes[size], i, j, sizes[size]); -- 2.39.3 (Apple Git-146) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
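The effect of the wrong byte count can be reproduced in isolation. In the sketch below (a hypothetical helper, not the checkasm code itself), the buffers hold int16_t elements, so comparing width * 1 bytes per row, as happens with 8 bit pixels, covers only the first half of each row.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare h rows of two int16_t buffers; row_elems is the distance between
 * rows in elements, bytes_per_row the number of bytes compared in each row. */
static int rows_equal(const int16_t *a, const int16_t *b,
                      size_t row_elems, int h, size_t bytes_per_row)
{
    for (int row = 0; row < h; row++)
        if (memcmp(a + row * row_elems, b + row * row_elems, bytes_per_row))
            return 0;
    return 1;
}
```

For an 8 element wide row with a difference in element 6, passing 8 * 1 as bytes_per_row reports the rows as equal, while 8 * sizeof(int16_t) catches the mismatch.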
[FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm
The first 32 elements of each row were correct, while the last 16 were scrambled. This hasn't been noticed, because the checkasm test erroneously only checked half of the output (for 8 bit functions), and apparently none of the samples as part of "fate-hevc" seem to trigger this specific function. --- libavcodec/aarch64/hevcdsp_epel_neon.S | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_epel_neon.S b/libavcodec/aarch64/hevcdsp_epel_neon.S index 2dafa09337..d3f0a26f79 100644 --- a/libavcodec/aarch64/hevcdsp_epel_neon.S +++ b/libavcodec/aarch64/hevcdsp_epel_neon.S @@ -1572,6 +1572,7 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1 xtn2v22.8h, v26.4s xtn v23.4h, v23.4s xtn2v23.8h, v27.4s +add x7, x0, #64 st4 {v20.8h, v21.8h, v22.8h, v23.8h}, [x0], x10 ext v4.16b, v2.16b, v3.16b, #1 ext v5.16b, v2.16b, v3.16b, #2 @@ -1584,11 +1585,14 @@ function ff_hevc_put_hevc_epel_h48_8_neon_i8mm, export=1 usdot v21.4s, v4.16b, v30.16b usdot v22.4s, v5.16b, v30.16b usdot v23.4s, v6.16b, v30.16b -xtn v20.4h, v20.4s -xtn2v20.8h, v22.4s -xtn v21.4h, v21.4s -xtn2v21.8h, v23.4s -add x7, x0, #64 +zip1v24.4s, v20.4s, v22.4s +zip2v25.4s, v20.4s, v22.4s +zip1v26.4s, v21.4s, v23.4s +zip2v27.4s, v21.4s, v23.4s +xtn v20.4h, v24.4s +xtn2v20.8h, v25.4s +xtn v21.4h, v26.4s +xtn2v21.8h, v27.4s st2 {v20.8h, v21.8h}, [x7] b.ne1b ret -- 2.39.3 (Apple Git-146) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
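The lane shuffling in this fix can be modelled in plain C. The sketch below is an interpretation, not part of the patch: it assumes that lane k of v20..v23 holds output elements 4k+0..4k+3 respectively (inferred from the ext #1/#2/#3 pattern feeding the usdot instructions), and it models zip1/zip2, xtn/xtn2 and st2 with hypothetical helper functions.

```c
#include <stdint.h>

/* zip1/zip2 on four 32-bit lanes: interleave the low/high halves of a and b. */
static void zip1(int32_t d[4], const int32_t a[4], const int32_t b[4])
{
    d[0] = a[0]; d[1] = b[0]; d[2] = a[1]; d[3] = b[1];
}

static void zip2(int32_t d[4], const int32_t a[4], const int32_t b[4])
{
    d[0] = a[2]; d[1] = b[2]; d[2] = a[3]; d[3] = b[3];
}

/* xtn + xtn2: narrow lo into lanes 0..3 and hi into lanes 4..7. */
static void narrow_pair(int16_t d[8], const int32_t lo[4], const int32_t hi[4])
{
    for (int i = 0; i < 4; i++) {
        d[i]     = (int16_t)lo[i];
        d[i + 4] = (int16_t)hi[i];
    }
}

/* st2: store two 8-lane registers with their lanes interleaved. */
static void st2(int16_t out[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* v[j][k] models lane k of register v(20+j). Old sequence: narrow directly. */
static void store_old(int16_t out[16], int32_t v[4][4])
{
    int16_t r20[8], r21[8];
    narrow_pair(r20, v[0], v[2]);
    narrow_pair(r21, v[1], v[3]);
    st2(out, r20, r21);
}

/* New sequence: zip the 32-bit lanes first, then narrow and store. */
static void store_new(int16_t out[16], int32_t v[4][4])
{
    int32_t t24[4], t25[4], t26[4], t27[4];
    int16_t r20[8], r21[8];
    zip1(t24, v[0], v[2]);
    zip2(t25, v[0], v[2]);
    zip1(t26, v[1], v[3]);
    zip2(t27, v[1], v[3]);
    narrow_pair(r20, t24, t25);
    narrow_pair(r21, t26, t27);
    st2(out, r20, r21);
}
```

Under the stated lane assumption, store_new() writes the 16 elements in sequential order, while store_old() writes them in a scrambled order, which would match the symptom described in the commit message.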
[FFmpeg-devel] [PATCH] aarch64: Factorize code for CPU feature detection on Apple platforms
--- libavutil/aarch64/cpu.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c index 7a05391343..196bdaf6b0 100644 --- a/libavutil/aarch64/cpu.c +++ b/libavutil/aarch64/cpu.c @@ -45,22 +45,23 @@ static int detect_flags(void) #elif defined(__APPLE__) && HAVE_SYSCTLBYNAME #include <sys/sysctl.h> +static int have_feature(const char *feature) { +uint32_t value = 0; +size_t size = sizeof(value); +if (!sysctlbyname(feature, &value, &size, NULL, 0)) +return value; +return 0; +} + static int detect_flags(void) { -uint32_t value = 0; -size_t size; int flags = 0; -size = sizeof(value); -if (!sysctlbyname("hw.optional.arm.FEAT_DotProd", &value, &size, NULL, 0)) { -if (value) -flags |= AV_CPU_FLAG_DOTPROD; -} -size = sizeof(value); -if (!sysctlbyname("hw.optional.arm.FEAT_I8MM", &value, &size, NULL, 0)) { -if (value) -flags |= AV_CPU_FLAG_I8MM; -} +if (have_feature("hw.optional.arm.FEAT_DotProd")) +flags |= AV_CPU_FLAG_DOTPROD; +if (have_feature("hw.optional.arm.FEAT_I8MM")) +flags |= AV_CPU_FLAG_I8MM; + return flags; } -- 2.34.1
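For readers who want to try the helper outside of FFmpeg, a standalone sketch could look like this. It guards on __APPLE__ only (the patch additionally relies on a configure-provided HAVE_SYSCTLBYNAME), and the fallback branch for other platforms is an addition of this sketch so that it builds everywhere.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Return the value of a boolean sysctl key, or 0 if the key is missing
 * or the platform has no sysctlbyname(). */
static int have_feature(const char *feature)
{
#if defined(__APPLE__)
    uint32_t value = 0;
    size_t size = sizeof(value);
    if (!sysctlbyname(feature, &value, &size, NULL, 0))
        return (int)value;
#else
    (void)feature;
#endif
    return 0;
}
```

A caller would then test e.g. have_feature("hw.optional.arm.FEAT_I8MM") and set the corresponding flag; a key that does not exist simply yields 0.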
Re: [FFmpeg-devel] [PATCH 02/18] fftools/ffmpeg_filter: refactor setting input timebase
On Mon, 11 Mar 2024, Anton Khirnov wrote:

> Well it IS obsolete. AFAIK it was never a particularly popular codec, and was only really used by the anime and ripping scenes in early 2000s, and even they dropped it very quickly once x264 appeared.

Within the scene of mobile HW, they commonly had HW codecs for H263 and MPEG4 (or SW codecs), with many but not all also supporting H264. So for one specific generation of mobile devices, MPEG4 was the same level of lingua franca that H264 is today.

Obviously not a big use case today in nontrivial numbers of course, but it is an example of a "scene" where the codec did have a pretty broad adoption.

> And again - that does not mean the capability should be removed, but it does mean that we shouldn't insist on tuning it for the smoothest user experience, since this time is then NOT spent doing something actually useful.

I guess that's true.

// Martin
Re: [FFmpeg-devel] [PATCH 02/18] fftools/ffmpeg_filter: refactor setting input timebase
On Mon, 11 Mar 2024, Anton Khirnov wrote: I think the point is, that one can't just dismiss that anybody would want to encode mpeg4 video any longer, even if it is obsolete. I also would like to keep being able to do that. That capability is not going away though, and I'm not arguing that it should. Ok, good. The generally dismissive arguments about mpeg4 encoding being obsolete and something that nobody should be doing, could be interpreted in such a way. That said, I haven't followed the discussion closely enough about what to do with the time bases. The only change is that in some rare cases the automatically selected timebase no longer fits into mpeg4 constraints, so the user has to specify either the framerate or the timebase explicitly. Right, I see. Specifically, the commandline used by Michael involves the extremely obscure case of converting subtitles to video (NOT hardsubbing, but really 1 sub -> 1 video). Since subtitle encoding API is hardcoded to AV_TIME_BASE_Q, that timebase gets used for encoding, and the mpeg4 encoder rejects it. If it was hardsubbing (i.e. 1 video + 1 sub -> 1 video), the input video timebase should be used, which would probably work. I don't think it's that big of a deal to require users to specify the timebase or framerate explicitly in such a situation. Inventing new APIs to cover it automagically seems like a waste of time, unless somebody has actual (not potential) uses for this. Right, I would agree with this. (If someone else would volunteer to add said API I would consider accepting it though.) Is this a usecase that currently works, but would go away by getting rid of codec specific code in the tools, or is it a nice-to-have new extra feature that is being requested? // Martin
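The constraint under discussion can be made concrete with a small sketch. The only fact taken from this thread is the encoder's limit of 65535 on the timebase denominator; the Rational type and both helper functions are hypothetical and are not FFmpeg API. AV_TIME_BASE_Q is modelled as 1/1000000.

```c
/* Minimal rational type for illustration; not FFmpeg's AVRational. */
typedef struct { int num, den; } Rational;

/* Largest timebase denominator accepted, per the encoder error quoted in this thread. */
#define MPEG4_MAX_TB_DEN 65535

/* Return 1 if the timebase denominator fits the limit. */
static int mpeg4_timebase_ok(Rational tb)
{
    return tb.den > 0 && tb.den <= MPEG4_MAX_TB_DEN;
}

/* For constant frame rate video, the timebase can be taken as 1/framerate. */
static Rational timebase_from_framerate(Rational fr)
{
    Rational tb = { fr.den, fr.num };
    return tb;
}
```

A microsecond timebase fails the check, while timebases derived from common frame rates such as 25/1 or 30000/1001 pass, which is why specifying the framerate or timebase explicitly works around the problem.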
Re: [FFmpeg-devel] [PATCH 02/18] fftools/ffmpeg_filter: refactor setting input timebase
On Mon, 11 Mar 2024, Anton Khirnov wrote: Quoting Tobias Rapp (2024-03-11 11:12:38) On 10/03/2024 23:49, Anton Khirnov wrote: Quoting James Almer (2024-03-10 23:29:27) On 3/10/2024 7:24 PM, Anton Khirnov wrote: Quoting Michael Niedermayer (2024-03-10 20:21:47) On Sun, Mar 10, 2024 at 07:13:18AM +0100, Anton Khirnov wrote: Quoting Michael Niedermayer (2024-03-10 04:36:29) why not automatically choose a supported timebase ? "[mpeg4 @ 0x55973c869f00] timebase 1/100 not supported by MPEG 4 standard, the maximum admitted value for the timebase denominator is 65535" Because I don't want ffmpeg CLI to have codec-specific code for a codec that's been obsolete for 15+ years. One could also potentially do it inside the encoder itself, but it is nontrivial since the computations are spread across a number of places in mpeg4videoenc.c and mpegvideo_enc.c. And again, it seems like a waste of time - there is no reason to encode mpeg4 today. This is not mpeg4 specific, its just a new additional case that fails The case you reported is mpeg4 specific. ./ffmpeg -i mm-small.mpg test.dv [dvvideo @ 0x7f868800f100] Found no DV profile for 80x60 yuv420p video. Valid DV profiles are: There is no mechanism for an encoder to export supported time bases. Could it be added as an extension to AVProfile, or AVCodec? The two cases are actually pretty different: * mpeg4 has a constraint on the range of timebases, and actually does some perverted computations with the timestamps * DV just needs your video to be CFR, with a list of supported framerates; dvenc should probably read AVCodecContext.framerate instead of time_base But most importantly, is there an actual current use case for either of those encoders? They have both been obsolete for close to two decades. It seems silly to add new API that won't actually be useful to anyone. Hardware doesn't get outdated as quickly as software. 
And there are people that do not switch their full environment to a new codec every decade just to be "in line". And your point is...? I think the point is, that one can't just dismiss that anybody would want to encode mpeg4 video any longer, even if it is obsolete. I also would like to keep being able to do that. That said, I haven't followed the discussion closely enough about what to do with the time bases. // Martin
[FFmpeg-devel] [PATCH 2/2] libavcodec: Don't include libavcodec/x86/vvc/Makefile on any architecture
This currently builds files in the libavcodec/x86/{vvc,h26x} subdirectories, which is somewhat unexpected when building for another architecture than x86. The regular arch subdirectories are handled with -include $(SRC_PATH)/$(1)/$(ARCH)/Makefile in the toplevel Makefile. Switch this to a similar optional inclusion, using $(ARCH). --- libavcodec/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/Makefile b/libavcodec/Makefile index 5d99120aa9..708434ac76 100644 --- a/libavcodec/Makefile +++ b/libavcodec/Makefile @@ -64,7 +64,7 @@ OBJS = ac3_parser.o \ # subsystems include $(SRC_PATH)/libavcodec/vvc/Makefile -include $(SRC_PATH)/libavcodec/x86/vvc/Makefile +-include $(SRC_PATH)/libavcodec/$(ARCH)/vvc/Makefile OBJS-$(CONFIG_AANDCTTABLES)+= aandcttab.o OBJS-$(CONFIG_AC3DSP) += ac3dsp.o ac3.o ac3tab.o OBJS-$(CONFIG_ADTS_HEADER) += adts_header.o mpeg4audio_sample_rates.o -- 2.39.3 (Apple Git-145) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 1/2] makefile: Clean up missed object files with "make clean"
In some builds, the following object files could be left behind after make clean: ./libavfilter/metal/utils.o ./libavfilter/metal/vf_yadif_videotoolbox.metallib.o ./libavcodec/x86/h26x/h2656dsp.o ./libavcodec/neon/mpegvideo.o ./ffbuild/bin2c_host.o --- ffbuild/common.mak | 2 +- libavcodec/neon/Makefile| 3 +++ libavcodec/x86/vvc/Makefile | 2 +- libavfilter/Makefile| 1 + 4 files changed, 6 insertions(+), 2 deletions(-) diff --git a/ffbuild/common.mak b/ffbuild/common.mak index ac54ac0681..87a3ffd2b0 100644 --- a/ffbuild/common.mak +++ b/ffbuild/common.mak @@ -140,7 +140,7 @@ else endif clean:: - $(RM) $(BIN2CEXE) + $(RM) $(BIN2CEXE) $(CLEANSUFFIXES:%=ffbuild/%) %.c %.h %.pc %.ver %.version: TAG = GEN diff --git a/libavcodec/neon/Makefile b/libavcodec/neon/Makefile index 607f116a77..83c2f0051c 100644 --- a/libavcodec/neon/Makefile +++ b/libavcodec/neon/Makefile @@ -1 +1,4 @@ +clean:: + $(RM) $(CLEANSUFFIXES:%=libavcodec/neon/%) + OBJS-$(CONFIG_MPEGVIDEO) += neon/mpegvideo.o diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile index 82f281d1c7..d1623bd46a 100644 --- a/libavcodec/x86/vvc/Makefile +++ b/libavcodec/x86/vvc/Makefile @@ -1,5 +1,5 @@ clean:: - $(RM) $(CLEANSUFFIXES:%=libavcodec/x86/vvc/%) + $(RM) $(CLEANSUFFIXES:%=libavcodec/x86/vvc/%) $(CLEANSUFFIXES:%=libavcodec/x86/h26x/%) OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvcdsp_init.o \ x86/h26x/h2656dsp.o diff --git a/libavfilter/Makefile b/libavfilter/Makefile index f6c1d641d6..994d9773ba 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -666,6 +666,7 @@ TOOLS-$(CONFIG_LIBZMQ) += zmqsend clean:: $(RM) $(CLEANSUFFIXES:%=libavfilter/dnn/%) $(CLEANSUFFIXES:%=libavfilter/opencl/%) \ + $(CLEANSUFFIXES:%=libavfilter/metal/%) \ $(CLEANSUFFIXES:%=libavfilter/vulkan/%) OPENCL = $(subst $(SRC_PATH)/,,$(wildcard $(SRC_PATH)/libavfilter/opencl/*.cl)) -- 2.39.3 (Apple Git-145) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, 
visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH] libavdevice: Fix the avfoundation device after switching to FFInputFormat
This was missed in b800327f4c7233d09baca958121722a04c2035ff. --- libavdevice/avfoundation.m | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/libavdevice/avfoundation.m b/libavdevice/avfoundation.m index a0ef87edff..d9b17ccdae 100644 --- a/libavdevice/avfoundation.m +++ b/libavdevice/avfoundation.m @@ -32,6 +32,7 @@ #include "libavutil/pixdesc.h" #include "libavutil/opt.h" #include "libavutil/avstring.h" +#include "libavformat/demux.h" #include "libavformat/internal.h" #include "libavutil/internal.h" #include "libavutil/parseutils.h" @@ -1292,13 +1293,13 @@ static int avf_close(AVFormatContext *s) .category = AV_CLASS_CATEGORY_DEVICE_VIDEO_INPUT, }; -const AVInputFormat ff_avfoundation_demuxer = { -.name = "avfoundation", -.long_name = NULL_IF_CONFIG_SMALL("AVFoundation input device"), +const FFInputFormat ff_avfoundation_demuxer = { +.p.name = "avfoundation", +.p.long_name= NULL_IF_CONFIG_SMALL("AVFoundation input device"), +.p.flags= AVFMT_NOFILE, +.p.priv_class = _class, .priv_data_size = sizeof(AVFContext), .read_header= avf_read_header, .read_packet= avf_read_packet, .read_close = avf_close, -.flags = AVFMT_NOFILE, -.priv_class = _class, }; -- 2.39.3 (Apple Git-145) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
On Wed, 6 Mar 2024, Ramiro Polla wrote:
ping

Did you miss my response here? https://ffmpeg.org/pipermail/ffmpeg-devel/2024-February/321448.html

// Martin
Re: [FFmpeg-devel] [PATCH] aarch64: Use regular hwcaps flags instead of HWCAP_CPUID for CPU feature detection on Linux
On Wed, 28 Feb 2024, Martin Storsjö wrote:

The CPU feature detection was added in 493fcde50a84cb23854335bcb0e55c6f383d55db, using HWCAP_CPUID.

The argument for using that, was that HWCAP_CPUID was added much earlier in the kernel (in Linux v4.11), while the HWCAP flags for individual features were added much later. And if compiling with older userland headers that lack the bits for e.g. HWCAP_I8MM, we wouldn't be able to detect that feature. (In practice, e.g. Ubuntu 20.04 lacks HWCAP_I8MM in userland headers, but the toolchain does support assembling such instructions).

However, while the flag HWCAP_I8MM was added only in Linux v5.10, any CPU with that feature is most likely running a kernel that is newer than that as well. So by using HWCAP_CPUID, we could detect that feature on kernels between v4.11 and v5.10, but that is a quite unlikely case in practice.

By using regular hwcaps flags, the code is much simplified, and doesn't rely on inline assembly to read the cpu id registers.

And instead of requiring the userland headers to provide the definitions of the hwcap flags, provide our own definitions of the constants (they are fixed constants anyway), with names not conflicting with the ones from system headers. This avoids a number of ifdefs, and allows detecting these features even if building with userland headers that don't contain these definitions yet.

Also, slightly older versions of QEMU, e.g. 6.2 in Ubuntu 22.04, do expose these features via HWCAP flags, but the emulated cpuid registers are missing the bits for exposing e.g. I8MM.
---
 libavutil/aarch64/cpu.c | 30 --
 1 file changed, 8 insertions(+), 22 deletions(-)

Will apply on Monday, if there are no objections.

// Martin
Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON
On Wed, 28 Feb 2024, J. Dekker wrote:
Martin Storsjö writes:
On Wed, 28 Feb 2024, J. Dekker wrote:
Martin Storsjö writes:
On Tue, 27 Feb 2024, J. Dekker wrote:

Benched using single-threaded full decode on an Ampere Altra.

Bpp Before After  Speedup
  8  73,3s  65,2s 1.124x
 10 114,2s 104,0s 1.098x
 12 125,8s 115,7s 1.087x

Signed-off-by: J. Dekker
---
Slightly improved 12bit version.

 libavcodec/aarch64/hevcdsp_deblock_neon.S | 417 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  18 +
 2 files changed, 435 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_deblock_neon.S b/libavcodec/aarch64/hevcdsp_deblock_neon.S
index 8227f65649..581056a91e 100644
--- a/libavcodec/aarch64/hevcdsp_deblock_neon.S
+++ b/libavcodec/aarch64/hevcdsp_deblock_neon.S
@@ -181,3 +181,420 @@ hevc_h_loop_filter_chroma 12
 hevc_v_loop_filter_chroma 8
 hevc_v_loop_filter_chroma 10
 hevc_v_loop_filter_chroma 12
+
+.macro hevc_loop_filter_luma_body bitdepth
+function hevc_loop_filter_luma_body_\bitdepth\()_neon, export=0
+.if \bitdepth > 8
+        lsl             w2, w2, #(\bitdepth - 8) // beta <<= BIT_DEPTH - 8
+.else
+        uxtl            v0.8h, v0.8b
+        uxtl            v1.8h, v1.8b
+        uxtl            v2.8h, v2.8b
+        uxtl            v3.8h, v3.8b
+        uxtl            v4.8h, v4.8b
+        uxtl            v5.8h, v5.8b
+        uxtl            v6.8h, v6.8b
+        uxtl            v7.8h, v7.8b
+.endif
+        ldr             w7, [x3] // tc[0]
+        ldr             w8, [x3, #4] // tc[1]
+        dup             v18.4h, w7
+        dup             v19.4h, w8
+        trn1            v18.2d, v18.2d, v19.2d
+.if \bitdepth > 8
+        shl             v18.8h, v18.8h, #(\bitdepth - 8)
+.endif
+        dup             v27.8h, w2 // beta
+        // tc25
+        shl             v19.8h, v18.8h, #2 // * 4
+        add             v19.8h, v19.8h, v18.8h // (tc * 5)
+        srshr           v19.8h, v19.8h, #1 // (tc * 5 + 1) >> 1
+        sshr            v17.8h, v27.8h, #2 // beta2
+
+        // beta_2 check
+        // dp0 = abs(P2 - 2 * P1 + P0)
+        add             v22.8h, v3.8h, v1.8h
+        shl             v23.8h, v2.8h, #1
+        sabd            v30.8h, v22.8h, v23.8h
+        // dq0 = abs(Q2 - 2 * Q1 + Q0)
+        add             v21.8h, v6.8h, v4.8h
+        shl             v26.8h, v5.8h, #1
+        sabd            v31.8h, v21.8h, v26.8h
+        // d0 = dp0 + dq0
+        add             v20.8h, v30.8h, v31.8h
+        shl             v25.8h, v20.8h, #1
+        // (d0 << 1) < beta_2
+        cmgt            v23.8h, v17.8h, v25.8h
+
+        // beta check
+        // d0 + d3 < beta
+        mov             x9, #0x
+        dup             v24.2d, x9
+        and             v25.16b, v24.16b, v20.16b
+        addp            v25.8h, v25.8h, v25.8h // 1+0 0+1 1+0 0+1
+        addp            v25.4h, v25.4h, v25.4h // 1+0+0+1 1+0+0+1
+        cmgt            v25.4h, v27.4h, v25.4h // lower/upper mask in h[0/1]
+        mov             w9, v25.s[0]

I don't quite understand what this sequence does and/or how our data is laid out in our registers - we have d0 on input in v20, where's d3? And doesn't the "and" throw away half of the input elements here?

I see some similar patterns with the masking and handling below as well - I get a feeling that I don't quite understand the algorithm here, and/or the data layout.

We have d0, d1, d2, d3 for both 4 line blocks in v20, mask out d1/d2 and use pair-wise adds to move our data around and calculate d0+d3 together. The first addp just moves elements around, the second addp adds d0 + 0 + 0 + d3.

Right, I guess this is the bit that was surprising. I would have expected to have e.g. all the d0 values for e.g. the 8 individual pixels in one SIMD register, and all the d3 values for all pixels in another SIMD register.

So as we're operating on 8 pixels in parallel, each of those 8 pixels have their own d0/d3 values, right? Or is this a case where we have just one d0/d3 value for a range of pixels?

Yes, d0/d1/d2/d3 are per 4 lines of 8 pixels, it's because d0 and d3 are calculated within their own line, d0 from line 0, d3 from line 3. Maybe it's more confusing since we are doing both halves of the filter at the same time? v20 contains d0 d1 d2 d3 d0 d1 d2 d3, where the second d0 is distinct from the first.

But essentially we're doing the same operation across the entire 8 lines, the filter just makes an overall skip decision for each block of 4 lines based on the sum of the result from line 0 and 3.

Ah, right, I see. I guess this makes sense then. Thanks!

Thus, no further objections to it; the optimizing of loading/storing can be done separately.

// Martin
Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON
On Wed, 28 Feb 2024, J. Dekker wrote:
Martin Storsjö writes:
On Tue, 27 Feb 2024, J. Dekker wrote:

Benched using single-threaded full decode on an Ampere Altra.

Bpp Before After  Speedup
  8  73,3s  65,2s 1.124x
 10 114,2s 104,0s 1.098x
 12 125,8s 115,7s 1.087x

Signed-off-by: J. Dekker
---
Slightly improved 12bit version.

 libavcodec/aarch64/hevcdsp_deblock_neon.S | 417 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  18 +
 2 files changed, 435 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_deblock_neon.S b/libavcodec/aarch64/hevcdsp_deblock_neon.S
index 8227f65649..581056a91e 100644
--- a/libavcodec/aarch64/hevcdsp_deblock_neon.S
+++ b/libavcodec/aarch64/hevcdsp_deblock_neon.S
@@ -181,3 +181,420 @@ hevc_h_loop_filter_chroma 12
 hevc_v_loop_filter_chroma 8
 hevc_v_loop_filter_chroma 10
 hevc_v_loop_filter_chroma 12
+
+.macro hevc_loop_filter_luma_body bitdepth
+function hevc_loop_filter_luma_body_\bitdepth\()_neon, export=0
+.if \bitdepth > 8
+        lsl             w2, w2, #(\bitdepth - 8) // beta <<= BIT_DEPTH - 8
+.else
+        uxtl            v0.8h, v0.8b
+        uxtl            v1.8h, v1.8b
+        uxtl            v2.8h, v2.8b
+        uxtl            v3.8h, v3.8b
+        uxtl            v4.8h, v4.8b
+        uxtl            v5.8h, v5.8b
+        uxtl            v6.8h, v6.8b
+        uxtl            v7.8h, v7.8b
+.endif
+        ldr             w7, [x3] // tc[0]
+        ldr             w8, [x3, #4] // tc[1]
+        dup             v18.4h, w7
+        dup             v19.4h, w8
+        trn1            v18.2d, v18.2d, v19.2d
+.if \bitdepth > 8
+        shl             v18.8h, v18.8h, #(\bitdepth - 8)
+.endif
+        dup             v27.8h, w2 // beta
+        // tc25
+        shl             v19.8h, v18.8h, #2 // * 4
+        add             v19.8h, v19.8h, v18.8h // (tc * 5)
+        srshr           v19.8h, v19.8h, #1 // (tc * 5 + 1) >> 1
+        sshr            v17.8h, v27.8h, #2 // beta2
+
+        // beta_2 check
+        // dp0 = abs(P2 - 2 * P1 + P0)
+        add             v22.8h, v3.8h, v1.8h
+        shl             v23.8h, v2.8h, #1
+        sabd            v30.8h, v22.8h, v23.8h
+        // dq0 = abs(Q2 - 2 * Q1 + Q0)
+        add             v21.8h, v6.8h, v4.8h
+        shl             v26.8h, v5.8h, #1
+        sabd            v31.8h, v21.8h, v26.8h
+        // d0 = dp0 + dq0
+        add             v20.8h, v30.8h, v31.8h
+        shl             v25.8h, v20.8h, #1
+        // (d0 << 1) < beta_2
+        cmgt            v23.8h, v17.8h, v25.8h
+
+        // beta check
+        // d0 + d3 < beta
+        mov             x9, #0x
+        dup             v24.2d, x9
+        and             v25.16b, v24.16b, v20.16b
+        addp            v25.8h, v25.8h, v25.8h // 1+0 0+1 1+0 0+1
+        addp            v25.4h, v25.4h, v25.4h // 1+0+0+1 1+0+0+1
+        cmgt            v25.4h, v27.4h, v25.4h // lower/upper mask in h[0/1]
+        mov             w9, v25.s[0]

I don't quite understand what this sequence does and/or how our data is laid out in our registers - we have d0 on input in v20, where's d3? And doesn't the "and" throw away half of the input elements here?

I see some similar patterns with the masking and handling below as well - I get a feeling that I don't quite understand the algorithm here, and/or the data layout.

We have d0, d1, d2, d3 for both 4 line blocks in v20, mask out d1/d2 and use pair-wise adds to move our data around and calculate d0+d3 together. The first addp just moves elements around, the second addp adds d0 + 0 + 0 + d3.

Right, I guess this is the bit that was surprising. I would have expected to have e.g. all the d0 values for e.g. the 8 individual pixels in one SIMD register, and all the d3 values for all pixels in another SIMD register.

So as we're operating on 8 pixels in parallel, each of those 8 pixels have their own d0/d3 values, right? Or is this a case where we have just one d0/d3 value for a range of pixels?

// Martin
[FFmpeg-devel] [PATCH] aarch64: Use regular hwcaps flags instead of HWCAP_CPUID for CPU feature detection on Linux
The CPU feature detection was added in 493fcde50a84cb23854335bcb0e55c6f383d55db, using HWCAP_CPUID.

The argument for using that, was that HWCAP_CPUID was added much earlier in the kernel (in Linux v4.11), while the HWCAP flags for individual features were added much later. And if compiling with older userland headers that lack the bits for e.g. HWCAP_I8MM, we wouldn't be able to detect that feature. (In practice, e.g. Ubuntu 20.04 lacks HWCAP_I8MM in userland headers, but the toolchain does support assembling such instructions).

However, while the flag HWCAP_I8MM was added only in Linux v5.10, any CPU with that feature is most likely running a kernel that is newer than that as well. So by using HWCAP_CPUID, we could detect that feature on kernels between v4.11 and v5.10, but that is a quite unlikely case in practice.

By using regular hwcaps flags, the code is much simplified, and doesn't rely on inline assembly to read the cpu id registers.

And instead of requiring the userland headers to provide the definitions of the hwcap flags, provide our own definitions of the constants (they are fixed constants anyway), with names not conflicting with the ones from system headers. This avoids a number of ifdefs, and allows detecting these features even if building with userland headers that don't contain these definitions yet.

Also, slightly older versions of QEMU, e.g. 6.2 in Ubuntu 22.04, do expose these features via HWCAP flags, but the emulated cpuid registers are missing the bits for exposing e.g. I8MM.
---
 libavutil/aarch64/cpu.c | 30 --
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index f27fef3992..7a05391343 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -24,34 +24,20 @@
 #include
 #include
 
-#define get_cpu_feature_reg(reg, val) \
-        __asm__("mrs %0, " #reg : "=r" (val))
+#define HWCAP_AARCH64_ASIMDDP (1 << 20)
+#define HWCAP2_AARCH64_I8MM   (1 << 13)
 
 static int detect_flags(void)
 {
     int flags = 0;
 
-#if defined(HWCAP_CPUID) && HAVE_INLINE_ASM
     unsigned long hwcap = getauxval(AT_HWCAP);
-    // We can check for DOTPROD and I8MM using HWCAP_ASIMDDP and
-    // HWCAP2_I8MM too, avoiding to read the CPUID registers (which triggers
-    // a trap, handled by the kernel). However the HWCAP_* defines for these
-    // extensions are added much later than HWCAP_CPUID, so the userland
-    // headers might lack support for them even if the binary later is run
-    // on hardware that does support it (and where the kernel might support
-    // HWCAP_CPUID).
-    // See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html
-    if (hwcap & HWCAP_CPUID) {
-        uint64_t tmp;
-
-        get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
-        if (((tmp >> 44) & 0xf) == 0x1)
-            flags |= AV_CPU_FLAG_DOTPROD;
-        get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
-        if (((tmp >> 52) & 0xf) == 0x1)
-            flags |= AV_CPU_FLAG_I8MM;
-    }
-#endif
+    unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+    if (hwcap & HWCAP_AARCH64_ASIMDDP)
+        flags |= AV_CPU_FLAG_DOTPROD;
+    if (hwcap2 & HWCAP2_AARCH64_I8MM)
+        flags |= AV_CPU_FLAG_I8MM;
 
     return flags;
 }
-- 
2.39.3 (Apple Git-145)
Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON
On Tue, 27 Feb 2024, J. Dekker wrote:

Benched using single-threaded full decode on an Ampere Altra.

Bpp Before After  Speedup
  8  73,3s  65,2s 1.124x
 10 114,2s 104,0s 1.098x
 12 125,8s 115,7s 1.087x

Signed-off-by: J. Dekker
---
Slightly improved 12bit version.

 libavcodec/aarch64/hevcdsp_deblock_neon.S | 417 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  18 +
 2 files changed, 435 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_deblock_neon.S b/libavcodec/aarch64/hevcdsp_deblock_neon.S
index 8227f65649..581056a91e 100644
--- a/libavcodec/aarch64/hevcdsp_deblock_neon.S
+++ b/libavcodec/aarch64/hevcdsp_deblock_neon.S
@@ -181,3 +181,420 @@ hevc_h_loop_filter_chroma 12
 hevc_v_loop_filter_chroma 8
 hevc_v_loop_filter_chroma 10
 hevc_v_loop_filter_chroma 12
+
+.macro hevc_loop_filter_luma_body bitdepth
+function hevc_loop_filter_luma_body_\bitdepth\()_neon, export=0
+.if \bitdepth > 8
+        lsl             w2, w2, #(\bitdepth - 8) // beta <<= BIT_DEPTH - 8
+.else
+        uxtl            v0.8h, v0.8b
+        uxtl            v1.8h, v1.8b
+        uxtl            v2.8h, v2.8b
+        uxtl            v3.8h, v3.8b
+        uxtl            v4.8h, v4.8b
+        uxtl            v5.8h, v5.8b
+        uxtl            v6.8h, v6.8b
+        uxtl            v7.8h, v7.8b
+.endif
+        ldr             w7, [x3] // tc[0]
+        ldr             w8, [x3, #4] // tc[1]
+        dup             v18.4h, w7
+        dup             v19.4h, w8
+        trn1            v18.2d, v18.2d, v19.2d
+.if \bitdepth > 8
+        shl             v18.8h, v18.8h, #(\bitdepth - 8)
+.endif
+        dup             v27.8h, w2 // beta
+        // tc25
+        shl             v19.8h, v18.8h, #2 // * 4
+        add             v19.8h, v19.8h, v18.8h // (tc * 5)
+        srshr           v19.8h, v19.8h, #1 // (tc * 5 + 1) >> 1
+        sshr            v17.8h, v27.8h, #2 // beta2
+
+        // beta_2 check
+        // dp0 = abs(P2 - 2 * P1 + P0)
+        add             v22.8h, v3.8h, v1.8h
+        shl             v23.8h, v2.8h, #1
+        sabd            v30.8h, v22.8h, v23.8h
+        // dq0 = abs(Q2 - 2 * Q1 + Q0)
+        add             v21.8h, v6.8h, v4.8h
+        shl             v26.8h, v5.8h, #1
+        sabd            v31.8h, v21.8h, v26.8h
+        // d0 = dp0 + dq0
+        add             v20.8h, v30.8h, v31.8h
+        shl             v25.8h, v20.8h, #1
+        // (d0 << 1) < beta_2
+        cmgt            v23.8h, v17.8h, v25.8h
+
+        // beta check
+        // d0 + d3 < beta
+        mov             x9, #0x
+        dup             v24.2d, x9
+        and             v25.16b, v24.16b, v20.16b
+        addp            v25.8h, v25.8h, v25.8h // 1+0 0+1 1+0 0+1
+        addp            v25.4h, v25.4h, v25.4h // 1+0+0+1 1+0+0+1
+        cmgt            v25.4h, v27.4h, v25.4h // lower/upper mask in h[0/1]
+        mov             w9, v25.s[0]

I don't quite understand what this sequence does and/or how our data is laid out in our registers - we have d0 on input in v20, where's d3? And doesn't the "and" throw away half of the input elements here?

I see some similar patterns with the masking and handling below as well - I get a feeling that I don't quite understand the algorithm here, and/or the data layout.

+.if \bitdepth > 8
+        ld1             {v0.8h}, [x0], x1
+        ld1             {v1.8h}, [x0], x1
+        ld1             {v2.8h}, [x0], x1
+        ld1             {v3.8h}, [x0], x1
+        ld1             {v4.8h}, [x0], x1
+        ld1             {v5.8h}, [x0], x1
+        ld1             {v6.8h}, [x0], x1
+        ld1             {v7.8h}, [x0]
+        mov             w14, #((1 << \bitdepth) - 1)

For loads like these, we can generally save a bit by using two alternating registers for loading, with a double stride - see e.g. the vp9 loop filter implementations. But that's a micro optimization.

Other than that, this mostly looks reasonable.

// Martin
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/x86: disable hevc 12b luma deblock
On Sat, 24 Feb 2024, J. Dekker wrote:
Nuo Mi writes:
On Wed, Feb 21, 2024 at 7:10 PM J. Dekker wrote:

Over/underflow in some cases.

Signed-off-by: J. Dekker
---
 libavcodec/x86/hevcdsp_init.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c
index 31e81eb11f..11cb1b3bfd 100644
--- a/libavcodec/x86/hevcdsp_init.c
+++ b/libavcodec/x86/hevcdsp_init.c
@@ -1205,10 +1205,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
         if (EXTERNAL_SSE2(cpu_flags)) {
             c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_12_sse2;
             c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_12_sse2;
-            if (ARCH_X86_64) {
-                c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_sse2;
-                c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_sse2;
-            }
+            // FIXME: 12-bit luma deblock over/underflows in some cases
+            // if (ARCH_X86_64) {
+            //     c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_12_sse2;
+            //     c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_12_sse2;
+            // }
             SAO_BAND_INIT(12, sse2);
             SAO_EDGE_INIT(12, sse2);

Hi Dekker,

VVC will utilize this function as well. Could you please share the HEVC clip or data that caused the overflow? We'll make efforts to address it during the VVC porting

You can just run ./tests/checkasm/checkasm --test=hevc_deblock to find a failing case.

To clarify, this is with the new checkasm test added in this patchset, not currently in git master - otherwise fate would be failing for everybody on x86.

My guess is that delta0 overflows before the right shift; see the ARM64 asm, which specifically widens this calculation on the 12 bit variant. But I'm not 100% sure, I don't know x86 asm.

Are you sure the input is within valid range? It's always possible that checkasm produces inputs that the real decoder wouldn't - but it's also possible that this is a real decoder bug that just hasn't been triggered by any other test yet.

// Martin
[FFmpeg-devel] [GASPP PATCH] Don't mangle .L local labels for ELF targets
This fixes building FFmpeg's libavcodec/aarch64/h264idct_neon.S for a Linux target. (It's not necessary to use gas-preprocessor for such a target for a very long time, but it can be useful to be able to test gas-preprocessor there still.) --- gas-preprocessor.pl | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl index ba75611..2880858 100755 --- a/gas-preprocessor.pl +++ b/gas-preprocessor.pl @@ -738,7 +738,10 @@ sub handle_serialized_line { } # mach-o local symbol names start with L (no dot) -$line =~ s/(?https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 3/3] avcodec/aarch64: add hevc deblock NEON
On Wed, 21 Feb 2024, J. Dekker wrote: Benched using single-threaded full decode on an Ampere Altra. Bpp Before After Speedup 8 73,3s 65,2s 1.124x 10 114,2s 104,0s 1.098x 12 125,8s 115,7s 1.087x Signed-off-by: J. Dekker --- libavcodec/aarch64/hevcdsp_deblock_neon.S | 421 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 18 + 2 files changed, 439 insertions(+) +0: // STRONG FILTER + +// P0 = p0 + av_clip(((p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3) - p0, -tc3, tc3); +add v21.8h, v2.8h, v3.8h // (p1 + p0 +add v21.8h, v4.8h, v21.8h // + q0) +shl v21.8h, v21.8h, #1 // * 2 +add v22.8h, v1.8h, v5.8h // (p2 + q1) +add v21.8h, v22.8h, v21.8h // + +srshrv21.8h, v21.8h, #3 // >> 3 +sub v21.8h, v21.8h, v3.8h //- p0 + The srshr line is incorrectly indented here (and elsewhere) +sqxtun v4.8b, v4.8h +sqxtun v5.8b, v5.8h +sqxtun v6.8b, v6.8h +sqxtun v7.8b, v7.8h +.endif +ret +3: ret x6 Please indent the "x6" here like other operands +.macro hevc_loop_filter_luma dir bitdepth +function ff_hevc_\dir\()_loop_filter_luma_\bitdepth\()_neon, export=1 +mov x6, x30 +.if \dir == v In GAS assembler, .if does a numerical comparison - it can't do string comparisons. The right way to do this is to do ".ifc \dir, v", which does a string comparison. (If you really do need to do this like a numerical comparison, it's possible to define e.g. "v" as a numeric symbol as well, see e.g. https://code.videolan.org/videolan/dav1d/-/merge_requests/1603/diffs?commit_id=d4746c908c56cb2e8545efd348b8cdc13f2f2253 but that's not really the nicest way to do it.) This issue breaks compilation with Clang. With gas-preprocessor (for MSVC), it manages to build correctly, but does the wrong thing. To avoid me having to test all these build configurations manually, remembering to check all these corner case build configurations and check indentation and all, I've set up a PoC for testing such things on Github Actions. 
If you have a repo on github, grab my commits from https://github.com/mstorsjo/FFmpeg/commits/gha-aarch64 (there are a couple of them), add your changes on top of these, and push it as a branch to your own github repo, then check the output from the actions. Here's the output of a run with the patches you just posted: https://github.com/mstorsjo/FFmpeg/actions/runs/7988312683 // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] checkasm: Add a "run-checkasm" make target
On Wed, 14 Feb 2024, Martin Storsjö wrote: Contrary to the existing "fate-checkasm", this always prints the tool output, and runs all tests at once instead of splitting it up per target group. This is more useful when the user expects to look directly at the tool output, instead of being part of a full fate run. (On failure with the regular "make fate-checkasm" targets, none of the tool output is printed, but stored in files. If run with reporting set up to the FATE website, the individual failures are uploaded there, but if it is run in some sort of other CI setup, the intermediate files might not be available afterwards for inspection.) --- tests/checkasm/Makefile | 4 1 file changed, 4 insertions(+) diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 3562acb2b2..3af42a679b 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -91,6 +91,10 @@ CHECKASM := tests/checkasm/checkasm$(EXESUF) $(CHECKASM): $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) $(EXTRALIBS-avcodec) $(EXTRALIBS-avfilter) $(EXTRALIBS-avformat) $(EXTRALIBS-avutil) $(EXTRALIBS-swresample) $(EXTRALIBS) +run-checkasm: $(CHECKASM) +run-checkasm: + $(TARGET_EXEC) $(TARGET_PATH)/$(CHECKASM) I've amended this locally with a $(Q) at the start, to silence the executed command, unless executed with V=1. I'll push this patch later today if there aren't any objections. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avutil/intreadwrite: Remove obsolete warning
On Mon, 19 Feb 2024, Andreas Rheinhardt wrote: Andreas Rheinhardt: Obsolete since 7ec2354c38978b918dc079b611393becb6c80bf7. Signed-off-by: Andreas Rheinhardt --- libavutil/intreadwrite.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/libavutil/intreadwrite.h b/libavutil/intreadwrite.h index 21df7887f3..d0a5773b54 100644 --- a/libavutil/intreadwrite.h +++ b/libavutil/intreadwrite.h @@ -583,9 +583,7 @@ union unaligned_16 { uint16_t l; } __attribute__((packed)) av_alias; #endif /* Parameters for AV_COPY*, AV_SWAP*, AV_ZERO* must be - * naturally aligned. They may be implemented using MMX, - * so emms_c() must be called before using any float code - * afterwards. + * naturally aligned. */ #define AV_COPY(n, d, s) \ Will apply this patch tomorrow unless there are objections. LGTM, thanks! // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] flvdec: Honor the "flv_metadata" option for the "datastream" metadata field
On Fri, 9 Feb 2024, Martin Storsjö wrote:

By default the option "flv_metadata" (internally using the field name "trust_metadata") is set to 0, meaning that we don't allocate streams based on information in the metadata, only based on actual streams we encounter. However the "datastream" metadata field still would allocate a subtitle stream.

When muxing, the "datastream" field is added if either a data stream or subtitle stream is present - but the same metadata field is used to preemptively create a subtitle stream only. Thus, if the field was added due to a data stream, not a subtitle stream, the demuxer would create a stream which won't get any actual packets.

If there was such an extra, empty subtitle stream, running avformat_find_stream_info still used to terminate within reasonable time before 3749eede66c3774799766b1f246afae8a6ffc9bb. After that commit, it no longer would terminate until it reaches the max analyze duration, which is 90 seconds for flv streams (see e6a084641aada7a2e4672172f2ee26642800a361, 24fdf7334d2bb9aab0abdbc878b8ae51eb57c86b and f58e011a1f30332ba824c155078ca701e29aef63).

Before that commit (which removed the deprecated AVStream.codec), the "st->codecpar->codec_id = AV_CODEC_ID_TEXT", set within the demuxer, would get propagated into st->codec->codec_id by numerous avcodec_parameters_to_context(st->codec, st->codecpar), then further into st->internal->avctx->codec_id by update_stream_avctx within read_frame_internal in libavformat/utils.c (demux.c these days).
---
 libavformat/flvdec.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Will push soon if there are no objections.

// Martin
[FFmpeg-devel] [PATCH] checkasm: Add a "run-checkasm" make target
Contrary to the existing "fate-checkasm", this always prints the tool output, and runs all tests at once instead of splitting it up per target group. This is more useful when the user expects to look directly at the tool output, instead of being part of a full fate run. (On failure with the regular "make fate-checkasm" targets, none of the tool output is printed, but stored in files. If run with reporting set up to the FATE website, the individual failures are uploaded there, but if it is run in some sort of other CI setup, the intermediate files might not be available afterwards for inspection.) --- tests/checkasm/Makefile | 4 1 file changed, 4 insertions(+) diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 3562acb2b2..3af42a679b 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -91,6 +91,10 @@ CHECKASM := tests/checkasm/checkasm$(EXESUF) $(CHECKASM): $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) $(EXTRALIBS-avcodec) $(EXTRALIBS-avfilter) $(EXTRALIBS-avformat) $(EXTRALIBS-avutil) $(EXTRALIBS-swresample) $(EXTRALIBS) +run-checkasm: $(CHECKASM) +run-checkasm: + $(TARGET_EXEC) $(TARGET_PATH)/$(CHECKASM) + checkasm: $(CHECKASM) testclean:: checkasmclean -- 2.39.3 (Apple Git-145) ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64
Hi,

On Sun, 4 Feb 2024, Ramiro Polla wrote:

> The code is imported from libjpeg-turbo-3.0.1. The neon registers used have been changed to avoid modifying v8-v15.
> ---

I don't remember if we have any extra routines we need to follow when importing foreign code with a differing license. The license here seems fine in any case though.

This seems to work fine in all my test environments. And thanks for making sure it doesn't use v8-v15!

I'm not so familiar with these DSP functions, or with whether it is the norm to add a new constant like FF_DCT_NEON, but I guess it seems to match the pattern of the existing code.

I presume the main case that tests this is "make fate-dct8x8", which builds and executes libavcodec/tests/dct?

How much work would it be to integrate testing of these routines into checkasm? That way we could rest assured that the assembly passes all the ABI checks that we do there, including which registers must not be clobbered.

The assembly uses a different indentation width than the rest of our assembly. I recently spent some effort on cleaning that up so that our code is mostly consistent, so I'd prefer not to add new code that deviates from it. It primarily looks like you'd need to add 4 spaces at the start of each line.

I've used a script for mostly automatically reindenting our arm assembly; you can grab it at https://martin.st/temp/ffmpeg-asm-indent.pl, and run it as "cat file.S | ./ffmpeg-asm-indent.pl > tmp; mv tmp file.S". It's not 100% accurate, but mostly gets you there, so it's good to manually check the result afterwards as well.

// Martin
Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n
On Tue, 13 Feb 2024, Ridley Combs wrote:

> It looks like checkout has different behavior from reset, and fate uses a hard reset. To test, I committed the change adding tests/ref/** -text, unix2dos'd tests/ref/fate/sub-scc, then ran git -c core.autocrlf=true reset --quiet --hard; this dos2unix'd the file as expected when run with a working tree containing the .gitattributes change (but not otherwise).
>
> Git doesn't have any "memory" of the CRLFiness of a file beyond the content of the file itself (whether in the working tree or in committed blobs). It just doesn't necessarily replace every file in checkout invocations when they differ only in line endings. Windows was a mistake.

To rephrase: reset vs checkout doesn't make any difference here.

It simply seems to be the case that, as long as there are no changes to the file contents themselves between the relevant git commits, and the file isn't flagged as dirty in the stat cache of the local workdir, git never revisits the .gitattributes for this particular file.

// Martin
Re: [FFmpeg-devel] [PATCH] fate/subtitles: Ignore line endings for sub-scc test
On Tue, 13 Feb 2024, Andreas Rheinhardt wrote:

> Since 7bf1b9b35769b37684dd2f18a54f01d852a540c8, the test produces ordinary \n, yet this is not what the reference file used for the most time, leading to test failures.
>
> Signed-off-by: Andreas Rheinhardt
> ---
>  tests/fate/subtitles.mak | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tests/fate/subtitles.mak b/tests/fate/subtitles.mak
> index cea4c810dd..90412e9ac1 100644
> --- a/tests/fate/subtitles.mak
> +++ b/tests/fate/subtitles.mak
> @@ -114,6 +114,7 @@ fate-sub-charenc: CMD = fmtstdout ass -sub_charenc cp1251 -i $(TARGET_SAMPLES)/s
>
>  FATE_SUBTITLES-$(call DEMDEC, SCC, CCAPTION) += fate-sub-scc
>  fate-sub-scc: CMD = fmtstdout ass -ss 57 -i $(TARGET_SAMPLES)/sub/witch.scc
> +fate-sub-scc: CMP = diff
>
>  FATE_SUBTITLES-$(call DEMMUX, SCC, SCC) += fate-sub-scc-remux
>  fate-sub-scc-remux: CMD = fmtstdout scc -i $(TARGET_SAMPLES)/sub/witch.scc -ss 4:00 -map 0 -c copy
> --
> 2.34.1

Looks ok to me, as a temporary measure until we figure out the best way to upgrade everybody's workdirs without needing interaction.

(As an added note to the other thread: even if we could easily patch fate.sh, every current user's workdir is also prone to this issue, and the way of fixing it is kinda non-obvious.)

// Martin
Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n
On Tue, 13 Feb 2024, Ridley Combs wrote:

> It looks like checkout has different behavior from reset, and fate uses a hard reset. To test, I committed the change adding tests/ref/** -text, unix2dos'd tests/ref/fate/sub-scc, then ran git -c core.autocrlf=true reset --quiet --hard; this dos2unix'd the file as expected when run with a working tree containing the .gitattributes change (but not otherwise).

The difference here seems to be that you actively modify tests/ref/fate/sub-scc, which causes git to consider the file as needing to be restored when you run git reset.

When fate updates from one version to another, the files won't be locally modified, i.e. git's stat cache or similar has this file flagged as "not dirty".

So I suggest you retry your procedure by not manually modifying the file, but just letting git handle it, simulating exactly what happens on fate instances when updating from one version to another.

I.e., first check out 7bf1b9b3576~, nuke the file and check it out again, make sure that it contains CRLF. Then check out current master, which lacks attributes, but the local file in your workdir still contains CRLF. Then do any series of "git reset --hard", with/without "-c core.autocrlf", to commits on your experimental branch, and it won't change the line endings of the ref file, unless there actually are content changes to that particular file, between the git commits that you do check out.

// Martin
Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n
On Tue, 13 Feb 2024, Ridley Combs via ffmpeg-devel wrote:

> On Feb 13, 2024, at 01:28, Anton Khirnov wrote:
>> Quoting Martin Storsjö (2024-02-12 12:31:29)
>>> On Mon, 12 Feb 2024, Hendrik Leppkes wrote:
>>>> On Mon, Feb 12, 2024 at 11:22 AM Martin Storsjö wrote:
>>>>>> diff --git a/.gitattributes b/.gitattributes
>>>>>> index 5a19b963b6..a900528e47 100644
>>>>>> --- a/.gitattributes
>>>>>> +++ b/.gitattributes
>>>>>> @@ -1,2 +1 @@
>>>>>>  *.pnm -diff -text
>>>>>> -tests/ref/fate/sub-scc eol=crlf
>>>>>
>>>>> This change seems to have had a tricky effect on the tests/ref/fate/sub-scc file.
>>>>>
>>>>> Previously, when checked out, users got the file with CRLF newlines. When updating to this git commit, or past it, that file remains untouched, with CRLF still present, and the fate-sub-scc test fails.
>>>>>
>>>>> If one does "rm tests/ref/fate/sub-scc; git checkout tests/ref/fate/sub-scc", then the file does get restored with LF newlines, and the test passes.
>>>>>
>>>>> It's easy to do this change manually in the source checkout of a fate runner, but I'm not sure how easily we get all fate instances fixed that way - currently this test is failing in most of them.
>>>>
>>>> Can this be fixed by restoring the .gitattributes entry but with eol=lf? Not sure if Git would reset the file then.
>>>
>>> No, that doesn't seem to make any difference.
>>>
>>> Not sure if there are any other straightforward/elegant fixes, short of renaming the file, which I guess would require renaming the test itself.
>>
>> I'm fine with renaming the test, unless anyone has a better fix.
>
> We could probably tweak the fate runner script to make sure this gets fixed up; can anyone try this patch on one of the affected machines? https://gist.github.com/rcombs/c2ad470bf36c5cbd3fc33e699330eb15

That doesn't seem to make any difference.

Also, updating fate.sh doesn't necessarily propagate automatically to runners - in order to run fate, one needs to run fate.sh before it even clones/checks out the directory where it fetches the latest source. So unless one has later changed one's setup to invoke a fate.sh from the checkout, most fate runners just use whatever copy of fate.sh they had when they were set up.

> Alternately, we could set -text on all fate ref files, or explicitly set eol=lf for them, to ensure their line endings never get rewritten like this regardless of git config. I think either of these solutions would fix this in fate, but only after the fix commit gets checked out *followed by* at least one other commit.

Neither of those seems to make any difference either.

It's quite easy to test for oneself:

$ git checkout -b experiment
$
$
$ git checkout 7bf1b9b3576~ # Reset original state, for testing
$ rm tests/ref/fate/sub-scc; git checkout tests/ref/fate/sub-scc
$ vi tests/ref/fate/sub-scc # inspect that the file originally has CRLF
$ git checkout experiment~ # check out the commit setting attributes
$ git checkout experiment # check out the next commit, with the new attributes set
$ vi tests/ref/fate/sub-scc # observe that the file still has CRLF
$ git checkout --detach
$ git -c core.autocrlf=false reset --hard 7bf1b9b3576
$ vi tests/ref/fate/sub-scc # observe that the file still has CRLF

It seems to me (I haven't tried to dig into manuals) that the attribute gets stuck in whatever form it was when the file was first created in the workdir. E.g. doing a "git checkout d1df72a702~" (the commit before the file was originally added) followed by "git checkout 7bf1b9b3576" does fix it.

This is at least observed with git 2.25.1. Not sure if this is intended behaviour or a bug on git's side.

// Martin
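The "vi ... # inspect" steps in the procedure above can also be checked programmatically. The following is a minimal sketch, assuming one only wants to know which kind of line endings a buffer (e.g. the loaded contents of tests/ref/fate/sub-scc) contains; the helper name count_line_endings is hypothetical and is not part of FFmpeg, fate.sh or git.

```c
#include <stddef.h>

/* Hypothetical helper (not part of FFmpeg or git): counts CRLF and bare-LF
 * line endings in a buffer, as a scriptable stand-in for inspecting the
 * reference file in an editor. */
static void count_line_endings(const char *buf, size_t len,
                               size_t *crlf, size_t *lf)
{
    *crlf = 0;
    *lf   = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        /* A '\n' directly preceded by '\r' is a CRLF ending. */
        if (i > 0 && buf[i - 1] == '\r')
            (*crlf)++;
        else
            (*lf)++;
    }
}
```

A workdir still affected by the issue would be expected to report a non-zero CRLF count for the ref file after updating past 7bf1b9b3576, while a freshly checked out copy would report only bare LF endings.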
Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n
On Mon, 12 Feb 2024, Hendrik Leppkes wrote:

> On Mon, Feb 12, 2024 at 11:22 AM Martin Storsjö wrote:
>> > diff --git a/.gitattributes b/.gitattributes
>> > index 5a19b963b6..a900528e47 100644
>> > --- a/.gitattributes
>> > +++ b/.gitattributes
>> > @@ -1,2 +1 @@
>> >  *.pnm -diff -text
>> > -tests/ref/fate/sub-scc eol=crlf
>>
>> This change seems to have had a tricky effect on the tests/ref/fate/sub-scc file.
>>
>> Previously, when checked out, users got the file with CRLF newlines. When updating to this git commit, or past it, that file remains untouched, with CRLF still present, and the fate-sub-scc test fails.
>>
>> If one does "rm tests/ref/fate/sub-scc; git checkout tests/ref/fate/sub-scc", then the file does get restored with LF newlines, and the test passes.
>>
>> It's easy to do this change manually in the source checkout of a fate runner, but I'm not sure how easily we get all fate instances fixed that way - currently this test is failing in most of them.
>
> Can this be fixed by restoring the .gitattributes entry but with eol=lf? Not sure if Git would reset the file then.

No, that doesn't seem to make any difference.

Not sure if there are any other straightforward/elegant fixes, short of renaming the file, which I guess would require renaming the test itself.

// Martin
Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n
On Mon, 12 Feb 2024, rcombs wrote:

> ffmpeg | branch: master | rcombs | Sun Jan 28 14:27:17 2024 -0800| [7bf1b9b35769b37684dd2f18a54f01d852a540c8] | committer: rcombs
>
> lavf/assenc: normalize line endings to \n
>
> Previously, we produced output with either \r\n or mixed line endings. This was undesirable unto itself, but also made working with patches affecting FATE output particularly challenging, especially via the mailing list.
>
> Everything that consumes the SSA/ASS format is line-ending-agnostic, so \n is selected to simplify git/ML usage in FATE.
>
> Extra \r characters at the end of a packet are dropped. These are always ignored by the renderer anyway.
>
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=7bf1b9b35769b37684dd2f18a54f01d852a540c8
> ---
>  .gitattributes                          |   1
>  libavformat/assenc.c                    |  22
>  tests/ref/fate/sub-aqtitle              |  94
>  tests/ref/fate/sub-ass-to-ass-transcode | 124
>  tests/ref/fate/sub-cc                   |  32
>  tests/ref/fate/sub-cc-realtime          |  44
>  tests/ref/fate/sub-cc-scte20            |  34
>  tests/ref/fate/sub-charenc              | 128
>  tests/ref/fate/sub-jacosub              |  50
>  tests/ref/fate/sub-microdvd             |  48
>  tests/ref/fate/sub-movtext              |  34
>  tests/ref/fate/sub-mpl2                 |  36
>  tests/ref/fate/sub-mpsub                |  70
>  tests/ref/fate/sub-mpsub-frames         |  32
>  tests/ref/fate/sub-pjs                  |  34
>  tests/ref/fate/sub-realtext             |  38
>  tests/ref/fate/sub-sami                 |  46
>  tests/ref/fate/sub-sami2                | 186
>  tests/ref/fate/sub-srt                  | 102
>  tests/ref/fate/sub-srt-badsyntax        |  48
>  tests/ref/fate/sub-ssa-to-ass-remux     | 168
>  tests/ref/fate/sub-stl                  |  62
>  tests/ref/fate/sub-subviewer            |  34
>  tests/ref/fate/sub-subviewer1           |  48
>  tests/ref/fate/sub-vplayer              |  34
>  tests/ref/fate/sub-webvtt               |  58
>  tests/ref/fate/sub-webvtt2              |  52
>  27 files changed, 831 insertions(+), 828 deletions(-)
>
> diff --git a/.gitattributes b/.gitattributes
> index 5a19b963b6..a900528e47 100644
> --- a/.gitattributes
> +++ b/.gitattributes
> @@ -1,2 +1 @@
>  *.pnm -diff -text
> -tests/ref/fate/sub-scc eol=crlf

This change seems to have had a tricky effect on the tests/ref/fate/sub-scc file.

Previously, when checked out, users got the file with CRLF newlines. When updating to this git commit, or past it, that file remains untouched, with CRLF still present, and the fate-sub-scc test fails.

If one does "rm tests/ref/fate/sub-scc; git checkout tests/ref/fate/sub-scc", then the file does get restored with LF newlines, and the test passes.

It's easy to do this change manually in the source checkout of a fate runner, but I'm not sure how easily we get all fate instances fixed that way - currently this test is failing in most of them.

// Martin
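The quoted commit message describes dropping extra \r characters at the end of a packet so that every line can be terminated with a plain \n. As an illustration only, and not the actual code from libavformat/assenc.c, that kind of normalization could be sketched as below; the function name trimmed_length is hypothetical, and trimming trailing \n together with \r is an assumption of this sketch.

```c
#include <stddef.h>

/* Illustrative only: this is not the code from libavformat/assenc.c.
 * Returns the length of the text with any trailing '\r' or '\n'
 * characters dropped, so that the caller can then append exactly
 * one '\n' after writing the first 'len' bytes. */
static size_t trimmed_length(const char *text, size_t len)
{
    while (len > 0 && (text[len - 1] == '\r' || text[len - 1] == '\n'))
        len--;
    return len;
}
```

A muxer following this approach would write the trimmed bytes followed by a single "\n", which gives the same output regardless of whether the input packet ended in \r\n, \n, or nothing at all.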
Re: [FFmpeg-devel] [PATCH] avcodec/dca_core: Remove unused emms.h inclusion
On Fri, 9 Feb 2024, Andreas Rheinhardt wrote:

> Possible since 7ec2354c38978b918dc079b611393becb6c80bf7.
>
> Signed-off-by: Andreas Rheinhardt
> ---
>  libavcodec/dca_core.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/libavcodec/dca_core.c b/libavcodec/dca_core.c
> index 5dd727fc72..697fc74295 100644
> --- a/libavcodec/dca_core.c
> +++ b/libavcodec/dca_core.c
> @@ -19,7 +19,6 @@
>   */
>
>  #include "libavutil/channel_layout.h"
> -#include "libavutil/emms.h"
>  #include "dcaadpcm.h"
>  #include "dcadec.h"
>  #include "dcadata.h"
> --
> 2.34.1

LGTM and thanks!

// Martin
Re: [FFmpeg-devel] [PATCH v2] lavc/dxv: align to 4x4 blocks instead of 16x16
On Fri, 9 Feb 2024, Connor Worley wrote:

> The previous assumption that DXV needs to be aligned to 16x16 was erroneous. 4x4 works just as well, and FATE decoder tests pass for all texture formats.
>
> On the encoder side, we should reject input that isn't 4x4 aligned, like the HAP encoder does, and stop aligning to 16x16. This both solves the uninitialized reads causing current FATE tests to fail and produces smaller encoded outputs.
>
> With regard to correctness, I've checked the decoding path by encoding a real-world sample with git master, and decoding it with
>
>   ffmpeg -i dxt1-master.mov -c:v rawvideo -f framecrc -
>
> The results are exactly the same between master and this patch.
>
> On the encoding side, I've encoded a real-world sample with both master and this patch, and decoded both versions with
>
>   ffmpeg -i dxt1-{master,patch}.mov -c:v rawvideo -f framecrc -
>
> Under this patch, results for both inputs are exactly the same. In other words, the extra padding gained by 16x16 alignment over 4x4 alignment has no impact on decoded video.
>
> Signed-off-by: Connor Worley
> ---
>  libavcodec/dxv.c            |  6 +++---
>  libavcodec/dxvenc.c         | 14 +++---
>  tests/ref/fate/dxv3enc-dxt1 |  2 +-
>  3 files changed, 15 insertions(+), 7 deletions(-)

LGTM, will push soon to get FATE back to green again.

// Martin
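To make the difference between 4x4 and 16x16 alignment concrete, here is a small sketch of the arithmetic involved. The helper names align_up and is_block_aligned and the BLOCK_SIZE constant are hypothetical; this is not code taken from libavcodec/dxv.c or libavcodec/dxvenc.c, and how the actual patch reports rejected input is not shown here.

```c
/* Hypothetical helpers for illustration; not taken from libavcodec. */
#define BLOCK_SIZE 4 /* texture block edge length, per the quoted message */

/* Rounds value up to the next multiple of alignment.
 * alignment must be a power of two. */
static int align_up(int value, int alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Returns 1 if both dimensions are multiples of the block size,
 * i.e. the kind of input an encoder could accept without padding. */
static int is_block_aligned(int width, int height)
{
    return width % BLOCK_SIZE == 0 && height % BLOCK_SIZE == 0;
}
```

For a height of 1084, for instance, align_up(1084, 4) leaves the value unchanged while align_up(1084, 16) pads it to 1088, which is the kind of extra padding the quoted message says has no impact on the decoded video.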
Re: [FFmpeg-devel] [PATCH v2 2/2] avcodec/hevc_mp4toannexb: check bytes left for nalu_len
On Fri, 9 Feb 2024, Nuo Mi wrote:

> similar issue as in the previous commit
> ---
>  libavcodec/bsf/hevc_mp4toannexb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Keep in mind that while the patches are posted together, they can end up at different places further along in review, and in commits, so each commit message should ideally be understandable on its own.

// Martin
Re: [FFmpeg-devel] [PATCH] x86: Remove inline MMX assembly that clobbers the FPU state
On Fri, 26 Jan 2024, Martin Storsjö wrote:

> On Fri, 26 Jan 2024, Martin Storsjö wrote:
>> These inline implementations of AV_COPY64, AV_SWAP64 and AV_ZERO64 are known to clobber the FPU state - which has to be restored with the 'emms' instruction afterwards.
>>
>> This was known and signaled with the FF_COPY_SWAP_ZERO_USES_MMX define, which calling code seems to have been supposed to check, in order to call emms_c() after using them. See 0b1972d4096df5879038f0af776f87f41e90ebd4, 29c4c0886d143790fcbeddbe40a23dfc6f56345c and df215e575850e41b19aeb1fd99e53372a6b3d537 for history on earlier fixes in the same area.
>>
>> However, new code can use these AV_*64() macros without knowing about the need to call emms_c().
>>
>> Just get rid of these dangerous inline assembly snippets; this doesn't make any difference for 64 bit architectures anyway.
>>
>> Signed-off-by: Martin Storsjö
>> ---
>>  libavcodec/dca_core.c        | 16
>>  libavutil/x86/intreadwrite.h | 36
>>  2 files changed, 52 deletions(-)
>
> I forgot to add some more context here; the VVC tests fail on i386 in some cases.
>
> https://patchwork.ffmpeg.org/project/ffmpeg/patch/20240125170518.61211-1-p...@frankplowman.com/ fixes this, by using av_log2() instead of the float log2() in the VVC decoder. This patch fixes the same issue as well, by eliminating the FPU state clobbering (so that float math functions anywhere in decoders work as expected).

If there are no better suggestions here, I would like to go ahead and push this.

// Martin
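For readers unfamiliar with these macros, the sketch below shows what 64-bit copy, swap and zero operations can look like when written in plain C rather than inline MMX assembly, so that callers have no emms_c() obligation to remember. The function names copy64, swap64 and zero64 are hypothetical stand-ins and are not the definitions from libavutil/intreadwrite.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins illustrating the semantics of AV_COPY64,
 * AV_SWAP64 and AV_ZERO64; not the libavutil definitions.
 * memcpy/memset keep the accesses free of alignment and aliasing
 * issues and leave instruction selection to the compiler. */
static void copy64(void *dst, const void *src)
{
    memcpy(dst, src, 8);
}

static void swap64(void *a, void *b)
{
    uint64_t tmp;
    memcpy(&tmp, a, 8);
    memcpy(a, b, 8);
    memcpy(b, &tmp, 8);
}

static void zero64(void *dst)
{
    memset(dst, 0, 8);
}
```

On 64-bit targets a compiler would typically turn each of these into one or two ordinary integer loads and stores, which is consistent with the remark in the quoted patch that the removal makes no difference there.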
[FFmpeg-devel] [PATCH] flvdec: Honor the "flv_metadata" option for the "datastream" metadata field
By default the option "flv_metadata" (internally using the field name "trust_metadata") is set to 0, meaning that we don't allocate streams based on information in the metadata, only based on actual streams we encounter.

However the "datastream" metadata field still would allocate a subtitle stream. When muxing, the "datastream" field is added if either a data stream or subtitle stream is present - but the same metadata field is used to preemptively create a subtitle stream only. Thus, if the field was added due to a data stream, not a subtitle stream, the demuxer would create a stream which won't get any actual packets.

If there was such an extra, empty subtitle stream, running avformat_find_stream_info still used to terminate within reasonable time before 3749eede66c3774799766b1f246afae8a6ffc9bb. After that commit, it no longer would terminate until it reaches the max analyze duration, which is 90 seconds for flv streams (see e6a084641aada7a2e4672172f2ee26642800a361, 24fdf7334d2bb9aab0abdbc878b8ae51eb57c86b and f58e011a1f30332ba824c155078ca701e29aef63).

Before that commit (which removed the deprecated AVStream.codec), the "st->codecpar->codec_id = AV_CODEC_ID_TEXT", set within the demuxer, would get propagated into st->codec->codec_id by numerous avcodec_parameters_to_context(st->codec, st->codecpar), then further into st->internal->avctx->codec_id by update_stream_avctx within read_frame_internal in libavformat/utils.c (demux.c these days).
---
 libavformat/flvdec.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavformat/flvdec.c b/libavformat/flvdec.c
index e25b5bd163..d898341871 100644
--- a/libavformat/flvdec.c
+++ b/libavformat/flvdec.c
@@ -627,12 +627,7 @@ static int amf_parse_object(AVFormatContext *s, AVStream *astream,
             else if (!strcmp(key, "audiodatarate") &&
                      0 <= (int)(num_val * 1024.0))
                 flv->audio_bit_rate = num_val * 1024.0;
-            else if (!strcmp(key, "datastream")) {
-                AVStream *st = create_stream(s, AVMEDIA_TYPE_SUBTITLE);
-                if (!st)
-                    return AVERROR(ENOMEM);
-                st->codecpar->codec_id = AV_CODEC_ID_TEXT;
-            } else if (!strcmp(key, "framerate")) {
+            else if (!strcmp(key, "framerate")) {
                 flv->framerate = av_d2q(num_val, 1000);
                 if (vstream)
                     vstream->avg_frame_rate = flv->framerate;
@@ -654,6 +649,11 @@ static int amf_parse_object(AVFormatContext *s, AVStream *astream,
                     vpar->width = num_val;
                 } else if (!strcmp(key, "height") && vpar) {
                     vpar->height = num_val;
+                } else if (!strcmp(key, "datastream")) {
+                    AVStream *st = create_stream(s, AVMEDIA_TYPE_SUBTITLE);
+                    if (!st)
+                        return AVERROR(ENOMEM);
+                    st->codecpar->codec_id = AV_CODEC_ID_TEXT;
                 }
             }
         }
--
2.39.3 (Apple Git-145)
Re: [FFmpeg-devel] [PATCH 24/24] libs: bump major version for all libraries
On Fri, 26 Jan 2024, James Almer wrote:

> On 1/26/2024 1:52 PM, Martin Storsjö wrote:
>> On Fri, 26 Jan 2024, James Almer wrote:
>>> On 1/26/2024 1:44 PM, Vittorio Giovara wrote:
>>>> On Thu, Jan 25, 2024 at 2:48 PM James Almer wrote:
>>>>> Signed-off-by: James Almer
>>>>> ---
>>>>>  doc/APIchanges                | 2 +-
>>>>>  libavcodec/version.h          | 2 +-
>>>>>  libavcodec/version_major.h    | 2 +-
>>>>>  libavdevice/version.h         | 2 +-
>>>>>  libavdevice/version_major.h   | 2 +-
>>>>>  libavfilter/version.h         | 2 +-
>>>>>  libavfilter/version_major.h   | 2 +-
>>>>>  libavformat/version.h         | 2 +-
>>>>>  libavformat/version_major.h   | 2 +-
>>>>>  libavutil/version.h           | 6 +++---
>>>>>  libpostproc/version.h         | 2 +-
>>>>>  libpostproc/version_major.h   | 2 +-
>>>>>  libswresample/version.h       | 2 +-
>>>>>  libswresample/version_major.h | 2 +-
>>>>>  libswscale/version.h          | 2 +-
>>>>>  libswscale/version_major.h    | 2 +-
>>>>>  16 files changed, 18 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/doc/APIchanges b/doc/APIchanges
>>>>> index e477ed78e0..60711379a1 100644
>>>>> --- a/doc/APIchanges
>>>>> +++ b/doc/APIchanges
>>>>> @@ -1,4 +1,4 @@
>>>>> -The last version increases of all libraries were on 2023-02-09
>>>>> +The last version increases of all libraries were on 2024-01-xx
>>>>>
>>>>>  API changes, most recent first:
>>>>>
>>>>> diff --git a/libavcodec/version.h b/libavcodec/version.h
>>>>> index 0fae3d06d3..8c3d476003 100644
>>>>> --- a/libavcodec/version.h
>>>>> +++ b/libavcodec/version.h
>>>>> @@ -29,7 +29,7 @@
>>>>>
>>>>>  #include "version_major.h"
>>>>>
>>>>> -#define LIBAVCODEC_VERSION_MINOR 38
>>>>> +#define LIBAVCODEC_VERSION_MINOR 0
>>>>>  #define LIBAVCODEC_VERSION_MICRO 100
>>>>
>>>> should we use this bump opportunity to reset MICRO to 0 too?
>>>
>>> It's an option. I don't recall if we decided anything about it last bump or during a meeting. And I don't know how much code out there still bothers to check for it to distinguish projects. But I guess that after so many bumps, any existing library user has long since stopped looking at it.
>>
>> VLC 3 (which still is the latest stable version) still has such checks around. VLC git master also still does have some checks, but only for deciding which "AVPROVIDER" to print in log messages, no functional differences.
>
> VLC 3 surely won't compile and link with current ffmpeg, right? Or did they port it to the decoupled input/output decoder and encoder API, and even the new channel layout API?

They do backport updates to ffmpeg to VLC 3 in general, although it seems that they're still pretty far behind (at ffmpeg 4.4.4) indeed.

// Martin
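For context on why MICRO starts at 100: FFmpeg's libraries use micro versions of 100 and above, while the Libav fork used values below 100, which let downstream code tell the two projects apart at build time. A minimal sketch of such a check follows; the function name is_ffmpeg_micro and the EXAMPLE_VERSION_MICRO macro are hypothetical, and real code would instead test LIBAVCODEC_VERSION_MICRO from libavcodec/version.h in a preprocessor conditional. This is not a reproduction of the actual VLC checks.

```c
/* Hypothetical example value; real code would include
 * libavcodec/version.h and use LIBAVCODEC_VERSION_MICRO. */
#define EXAMPLE_VERSION_MICRO 100

/* Returns 1 if the micro version follows the FFmpeg convention
 * (micro >= 100), 0 if it looks like a Libav-style value (< 100). */
static int is_ffmpeg_micro(int micro)
{
    return micro >= 100;
}
```

Resetting MICRO to 0, as asked in the quoted question, would make a check of this form misidentify FFmpeg as Libav in any project that still carries it, which is the concern behind the VLC remark.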