Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)
Le lauantaina 27. toukokuuta 2023, 23.35.14 EEST Martin Storsjö a écrit : > > NEON detection could be added here, though I've yet to see an Armv8 > > implementation without AdvSIMD. > > I guess we could, but as it's part of the require baseline for armv8-a I > don't think there's much need for it? I mean that if you have 95% of the code needed, you might as well add the last 5%. I don't think it's used by much anything. > If configured with --disable-neon we > don't return that cpuflag though. > > > FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via > > HWCAP2_I8MM, using trapped ID registers is not (yet) necessary. > > Ah, I see. I guess using those would be more straightforward. > > OTOH, HWCAP_CPUID is available much earlier than HWCAP_ASIMDDP or > HWCAP2_I8MM (I do some amount of cross building with a fairly old > sysroot). If the sysroot shares the not-too-old kernel of the host, then ID registers will work anyway. The age of the userspace toolchain affects availability of HWCAP constant definitions, not that of ID registers, which is a purely in- kernel feature. > I'll think about it, whether it's worth complicating things to > try both approaches, or if we should just go with the plain HWCAPs here. I don't really mind either way. -- 雷米‧德尼-库尔蒙 http://www.remlab.net/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit : > > Someone would have to write assembler code that would fail to build under > > a toolchain with a lower target version. That sounds like a bug that > > should be spotted and fixed, rather than papered over. > > I don't see how anything here suggests papering over such an issue? For instance, say the toolchain, or the FFmpeg build flags, sets ARMv8.6 as target architecture. It makes perfect sense for the C compiler to emit non- hint ARMv8.3 PAuth instructions, which would crash on an ARMv8.0-8.2 processor. But it makes zero sense for the assembler to change its behaviour. So all existing ASIMD assembler files written for ARMv8.0-A, should still generate ARMv8.0-A code. But now somebody can accidentally write an ARMv8.1-8.6 instruction into their ARMv8.0 assembler code, and it will not trigger a build error (to them). There are no clear reasons to trying to avoid *lowering* the target version for *assembler* (as opposed to C). If a file needs ARMv8.2, it should just ARMv8.2 as its target, whether the overall build targets ARMv8.0, ARMv8.2 or ARMv8.6. The C code around it should ensure that the requirements are met. There are no benefits to preserving a higher version; it just makes the configure checks needlessly intricate. > if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we > break the dotprod extension. If we only add ".arch_extension i8mm" without > the .arch directive, we get what we want to though. Yes, but you can also just set `.arch armv8.4-a`, which is ostensibly The Way to enable DotProd on a craptastic assembler. If even that doesn't work, the assembler presumably doesn't support DotProd at all (or it's some obscure unstable development version that was snapshot between adding DotProd and adding complete ARMv8.4, which we really should not care about). Likewise, for I8MM, you can just do `.arch armv8.6-a`. So what if the target version was 8.7 or 9.2? Well, in an assembler file, we don't really care, as noted above. > > If the problem is to avoid `.arch_extension`, then I don't really see > > why you can't just use `.arch` with plus, and simplify a lot. > > Well Clang doesn't quite support that currently either. For > ".arch_extension dotprod" it errors out since it doesn't recognize the > dotprod feature in that directive. It does accept ".arch > armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod > extension in the assembly despite that. (I'll look into fixing this in > upstream LLVM afterwards.) I don't know if `.arch_extension` is specified by Arm or just a GCCism. But if you accept DotProd in `.arch` in arch and then don't enable it, then that's clearly a bug. But then again, that's moot if you can just do `.arch armv8.4- a` instead. > Taking it back to the drawing board: So for enabling e.g. i8mm, we could > either do > .arch armv8.6-a > or > .arch armv8.2-a+dotprod > or > .arch armv8.2 > .arch_extension dotprod > > > Out of these, I initially preferred doing the third approach. I agree that that's the cleanest option. But that's not the point here. The point is that what this patch is a hell of a lot more involved than doing just that, and it seems unnecessarily intricate for the purpose of enabling DotProd. > There's no functional difference between the second and third one, except > the single-line form is more messy to handle, as we can have various > combinations of what actually is supported. Yes but a pile of #if's is even more messy to handle. > And with the single-line .arch > form, we can't just add e.g. i8mm on top of a -march= setting that already > supports dotprod, without respecifying what the toolchain itself defaults > to. > > > The documentation for .arch_extension hints at it being possible to > disable support for extensions with it too, but that doesn't seem to be > the case in practice. If it was, we could add macros to only enable > specifically the extensions we want around those functions that should use > them and nothing more. But I guess if that's not actually supported we > can't do that. GCC supports it with RV64 (`.option arch` plus/minus). I haven't tried it on AArch64. Of course, LLVM does not support it, even though the patch has been out there for a long time and even though it is part of the ABI spec rather than made up by GNU/binutils. Thanks to that, RVV is unusable on LLVM (Linux kernel specifically requires GNU/as due to that). > I guess the alternative would be to just try to set .arch > . I was worried that support for > e.g. armv8.6-a appeared later in toolchains than support for the > individual extension i8mm, but at least from a quick browse in binutils > history, they seem to have been added at the same time, so there's > probably no such drawback. > > Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" > supported significantly earlier than ".arch
Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h
Hi, Martin I have finished the modification, please review again. Thanks. 在 2023/5/26 16:34, Martin Storsjö 写道: Hi, Overall these patches seem mostly ok, but I've got a few minor points to make: - The usdot instruction requires the i8mm extension (part of armv8.6-a), while udot or sdot would require the dotprod extension (available in armv8.4-a). If you could manage with udot or sdot, these functions would be usable on a wider set of CPUs. Therefore, the current guards are wrong. Also, I finally got support implemented for optionally using these cpu extensions, even if the baseline of the compile don't include it, by runtime enabling it. See the patchset at https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009. To adapt your patches on top of this, see the two topmost commits at https://github.com/mstorsjo/ffmpeg/commits/archext. Fixed. - The indentation is inconsistent; in the first patch, you have some instructions written like this: + sqadd v1.4s, v1.4s, v29.4s While you later use this style: + dup v1.16b, v28.b[1] The latter seems to match the style we commonly use; please reformat your code to match that consistently. With some macro invocations in the first patch, you also seem to have too much indentation in some places. See e.g. this: +1: ldr q23, [x2, x3] + add x2, x2, x3, lsl #1 + QPEL_FILTER_B v26, v16, v17, v18, v19, v20, v21, v22, v23 + QPEL_FILTER_B2 v27, v16, v17, v18, v19, v20, v21, v22, v23 + QPEL_UNI_W_V_16 + subs w4, w4, #1 + b.eq 2f (If the macro name is too long, that's ok, but here there's no need to have those lines unaligned.) Fixed. - In the third patch, you've got multiple parameters from the stack like this: + ldp x14, x15, [sp] // mx, my + ldr w13, [sp, #16] // width I see that the mx an my parameters are intptr_t; that's good, since if they would be 32 bit integers, the ABI for such parameters on the stack differ between macOS/Darwin and Linux. But as long as they're intptr_t they behave the same. - At the same place, you're backing up a bunch of registers: + stp x20, x21, [sp, #-16]! + stp x22, x23, [sp, #-16]! + stp x24, x25, [sp, #-16]! + stp x26, x27, [sp, #-16]! + stp x28, x30, [sp, #-16]! This is inefficient; instead, do this: + stp x28, x30, [sp, #-80]! + stp x20, x21, [sp, #16] + stp x22, x23, [sp, #32] + stp x24, x25, [sp, #48] + stp x26, x27, [sp, #64] Also, following that, I see that you back up the stack pointer in x28. Why do you use x28 for that? Using x29 would be customary as frame pointer. Using more efficient implementation now. And x28 in this case is a common callee save register. Anyway, I use x19 instead now. Aside for that, I think the rest of the patches is acceptable. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".From 224e1b907b9273f6fecaef007730bd1168493515 Mon Sep 17 00:00:00 2001 From: myais Date: Fri, 5 May 2023 22:06:22 +0800 Subject: [PATCH 2/3] lavc/aarch64: new optimization for 8-bit hevc_qpel_uni_w_h --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 15 +- libavcodec/aarch64/hevcdsp_qpel_neon.S| 434 ++ 2 files changed, 448 insertions(+), 1 deletion(-) diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 6b5341dd45..a7e62c7d15 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -145,6 +145,7 @@ void ff_hevc_put_hevc_qpel_bi_h16_8_neon(uint8_t *_dst, ptrdiff_t _dststride, co void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ void ff_hevc_put_hevc_##fn##64_8_neon##ext args; \ + NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride, int height, int denom, int wx, int ox, @@ -155,6 +156,12 @@ NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, int height, int denom, int wx, int ox, intptr_t mx, intptr_t my, int width),); +NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst, ptrdiff_t _dststride, +const uint8_t *_src, ptrdiff_t _srcstride, +int height, int denom, int wx, int ox, +intptr_t mx, intptr_t my, int width), _i8mm); + + #define NEON8_FNASSIGN(member, v, h, fn, ext) \ member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext; \ member[2][v][h] = ff_hevc_put_hevc_##fn##6_8_neon##ext; \ @@ -174,9
Re: [FFmpeg-devel] [PATCH] lavu/tx: stop using av_log(NULL, )
May 28, 2023, 03:07 by jamr...@gmail.com: > On 5/24/2023 9:32 PM, Lynne wrote: > >> May 24, 2023, 23:24 by leo.i...@gmail.com: >> >>> On 5/24/23 16:35, Lynne wrote: >>> Patch attached. >>> >>> +av_log((void *)_class, AV_LOG_DEBUG, "%s\n", bp.str); >>> >>> The type of the first argument to av_log should be AVClass **, but this >>> only appears to be AVClass *. See libavutil/log.c line 428. >>> >>> - Leo Izen >>> >> >> Right, thanks, changed to: >> >>> static const AVClass tx_class = { >>> .class_name = "tx", >>> .item_name = av_default_item_name, >>> .version = LIBAVUTIL_VERSION_INT, >>> }; >>> >>> static const struct { >>> const AVClass *tx_class; >>> } tx_log = { >>> _class, >>> }; >>> >> Will push this tomorrow. >> > > Can't add an AVClass* field to AVTXContext and set it to _class during > init? > The struct is accessed from asm, didn't really want to fix all the loads for something which only runs at init, and only if !CONFIG_SMALL. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] lavu/tx: stop using av_log(NULL, )
On 5/24/2023 9:32 PM, Lynne wrote: May 24, 2023, 23:24 by leo.i...@gmail.com: On 5/24/23 16:35, Lynne wrote: Patch attached. +av_log((void *)_class, AV_LOG_DEBUG, "%s\n", bp.str); The type of the first argument to av_log should be AVClass **, but this only appears to be AVClass *. See libavutil/log.c line 428. - Leo Izen Right, thanks, changed to: static const AVClass tx_class = { .class_name = "tx", .item_name = av_default_item_name, .version = LIBAVUTIL_VERSION_INT, }; static const struct { const AVClass *tx_class; } tx_log = { _class, }; Will push this tomorrow. Can't add an AVClass* field to AVTXContext and set it to _class during init? ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
On Sat, 27 May 2023, Rémi Denis-Courmont wrote: Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit : These are available since ARMv8.4-a and ARMv8.6-a respectively, but can also be available optionally since ARMv8.2-a. Check if these are available for use unconditionally (e.g. if compiling with -march=armv8.6-a), or if they can be enabled with specific assembler directives. Use ".arch_extension " for enabling a specific extension in assembly; the same can also be achieved with ".arch armv8.2-a+", but with .arch_extension is easier to combine multiple separate features. Enabling these extensions requires setting a base architecture level of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary; if the base level is high enough (which might unlock other extensions without .arch_extension), we don't want to lower it. I don't follow how that would actually happen, TBH. Even if the default target version is, say, 8.5, the assembler won't magically start emitting 8.5 instructions. Someone would have to write assembler code that would fail to build under a toolchain with a lower target version. That sounds like a bug that should be spotted and fixed, rather than papered over. I don't see how anything here suggests papering over such an issue? I'm not sure exactly which parts of the message you refer to here, but I'll elaborate on the point about why we only should set .arch if we really need to. Consider a build configuration with -march=armv8.4-a. We test that the dotprod extension is available and usable without adding any directives - so we won't add any directives for that. We also test that the assembler does support i8mm, with ".arch armv8.2-a" plus ".arch_extension i8mm". But if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we break the dotprod extension. If we only add ".arch_extension i8mm" without the .arch directive, we get what we want to though. If the problem is to avoid `.arch_extension`, then I don't really see why you can't just use `.arch` with plus, and simplify a lot. Well Clang doesn't quite support that currently either. For ".arch_extension dotprod" it errors out since it doesn't recognize the dotprod feature in that directive. It does accept ".arch armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod extension in the assembly despite that. (I'll look into fixing this in upstream LLVM afterwards.) As Clang/LLVM has these limitations/issues currently, one main design criterion here is that we shouldn't add any extra .arch/.arch_extension directives unless we need and can (and gain some instruction support from it). Taking it back to the drawing board: So for enabling e.g. i8mm, we could either do .arch armv8.6-a or .arch armv8.2-a+dotprod or .arch armv8.2 .arch_extension dotprod Out of these, I initially preferred doing the third approach. There's no functional difference between the second and third one, except the single-line form is more messy to handle, as we can have various combinations of what actually is supported. And with the single-line .arch form, we can't just add e.g. i8mm on top of a -march= setting that already supports dotprod, without respecifying what the toolchain itself defaults to. The documentation for .arch_extension hints at it being possible to disable support for extensions with it too, but that doesn't seem to be the case in practice. If it was, we could add macros to only enable specifically the extensions we want around those functions that should use them and nothing more. But I guess if that's not actually supported we can't do that. I guess the alternative would be to just try to set .arch . I was worried that support for e.g. armv8.6-a appeared later in toolchains than support for the individual extension i8mm, but at least from a quick browse in binutils history, they seem to have been added at the same time, so there's probably no such drawback. Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" supported significantly earlier than ".arch armv9-a"? It looks like binutils learnt about sve2 in 2019, but about armv9-a in 2021? OTOH that's probably not too much of a real issue either. If we'd do that, it does simplify the configure logic a fair bit and reduces the number of configure variables we need by a lot. It does enable a few more instruction set extensions than what we need though, but that's probably not a real issue. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)
On Sat, 27 May 2023, Rémi Denis-Courmont wrote: Le perjantaina 26. toukokuuta 2023, 11.03.14 EEST Martin Storsjö a écrit : Based on code by Janne Grunau. Using HWCAP_CPUID for user space access to the CPU feature registers. See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html. --- configure | 2 ++ libavutil/aarch64/cpu.c | 38 ++ 2 files changed, 40 insertions(+) diff --git a/configure b/configure index 3c7473efb2..b5357b8d27 100755 --- a/configure +++ b/configure @@ -2207,6 +2207,7 @@ HAVE_LIST_PUB=" HEADERS_LIST=" arpa_inet_h +asm_hwcap_h asm_types_h cdio_paranoia_h cdio_paranoia_paranoia_h @@ -6422,6 +6423,7 @@ check_headers io.h enabled libdrm && check_headers linux/dma-buf.h +check_headers asm/hwcap.h check_headers linux/perf_event.h check_headers libcrystalhd/libcrystalhd_if.h check_headers malloc.h diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c index 42b33e4a2d..34c838c2f5 100644 --- a/libavutil/aarch64/cpu.c +++ b/libavutil/aarch64/cpu.c @@ -20,6 +20,42 @@ #include "libavutil/cpu_internal.h" #include "config.h" +#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && HAVE_ASM_HWCAP_H +#include +#include +#include + +#define get_cpu_feature_reg(reg, val) \ +__asm__("mrs %0, " #reg : "=r" (val)) Strictly speaking, this can read any system register. One way to prevent that would be to include the ID_ prefix and _EL1 suffix in the macro. I would have used a pure static inline instead, but that's just a matter of taste. + +static int detect_flags(void) +{ +unsigned long ret = getauxval(AT_HWCAP); +int flags = 0; +#if defined(HWCAP_CPUID) +uint64_t tmp; +if (!(ret & HWCAP_CPUID)) +return flags; +get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp); +if (((tmp >> 44) & 0xf) == 0x1) +flags |= AV_CPU_FLAG_DOTPROD; +get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp); +if (((tmp >> 52) & 0xf) == 0x1) +flags |= AV_CPU_FLAG_I8MM; +#endif NEON detection could be added here, though I've yet to see an Armv8 implementation without AdvSIMD. I guess we could, but as it's part of the require baseline for armv8-a I don't think there's much need for it? If configured with --disable-neon we don't return that cpuflag though. FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via HWCAP2_I8MM, using trapped ID registers is not (yet) necessary. Ah, I see. I guess using those would be more straightforward. OTOH, HWCAP_CPUID is available much earlier than HWCAP_ASIMDDP or HWCAP2_I8MM (I do some amount of cross building with a fairly old sysroot). I'll think about it, whether it's worth complicating things to try both approaches, or if we should just go with the plain HWCAPs here. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h
Hi, On Sat, 27 May 2023, myais wrote: I saw your new opinions. Do you mean that the code of my current patch should be guard as follows? C code: /if (have_i8mm(cpu_flags)) {// //}/ /asm code :/ /#if HAVE_I8MM/ /#endif/ Yes I mean my current code base does not have those definitions, should I implement them directly as if they already exist? I suggest you apply my patches from the patchset I pointed you to, https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009 - or from the start of the branch on github I pointed you to. It's likely that the patches will change a little before they're accepted, but I believe the main interface towards your code, "#if HAVE_I8MM" and "if (have_i8mm(cpu_flags))" will remain as such. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avfilter/af_aresample: switch to activate
Will apply soon. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate
On Sat, May 27, 2023 at 5:46 PM Paul B Mahol wrote: > > > On Sat, May 27, 2023 at 5:43 PM Anton Khirnov wrote: > >> Quoting Paul B Mahol (2023-05-18 19:48:47) >> > Attached. >> > >> > From c06d3d9f6020fdda19641637cafa6f86c77b4b73 Mon Sep 17 00:00:00 2001 >> > From: Paul B Mahol >> > Date: Wed, 17 May 2023 09:14:56 +0200 >> > Subject: [PATCH 21/27] avfilter/af_apad: switch to activate >> > >> > Signed-off-by: Paul B Mahol >> > --- >> > libavfilter/af_apad.c | 105 -- >> > 1 file changed, 71 insertions(+), 34 deletions(-) >> >> The commit message for such commits should mention why is this better, >> because it is not obvious. For one thing, the new code is significantly >> longer. >> > Than I will count your net change in lines too, and will reject such patches from 0. ~30 lines is irrelevant. This fixes EOF reporting which when combined with other filters in filter graph can reduce usage of both processing and memory resources significantly. I'm astonished of your lack of general knowledge in this domain. > > >> >> -- >> Anton Khirnov >> ___ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel >> >> To unsubscribe, visit link above, or email >> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". >> > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate
On Sat, May 27, 2023 at 5:43 PM Anton Khirnov wrote: > Quoting Paul B Mahol (2023-05-18 19:48:47) > > Attached. > > > > From c06d3d9f6020fdda19641637cafa6f86c77b4b73 Mon Sep 17 00:00:00 2001 > > From: Paul B Mahol > > Date: Wed, 17 May 2023 09:14:56 +0200 > > Subject: [PATCH 21/27] avfilter/af_apad: switch to activate > > > > Signed-off-by: Paul B Mahol > > --- > > libavfilter/af_apad.c | 105 -- > > 1 file changed, 71 insertions(+), 34 deletions(-) > > The commit message for such commits should mention why is this better, > because it is not obvious. For one thing, the new code is significantly > longer. > > > -- > Anton Khirnov > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate
Quoting Paul B Mahol (2023-05-18 19:48:47) > Attached. > > From c06d3d9f6020fdda19641637cafa6f86c77b4b73 Mon Sep 17 00:00:00 2001 > From: Paul B Mahol > Date: Wed, 17 May 2023 09:14:56 +0200 > Subject: [PATCH 21/27] avfilter/af_apad: switch to activate > > Signed-off-by: Paul B Mahol > --- > libavfilter/af_apad.c | 105 -- > 1 file changed, 71 insertions(+), 34 deletions(-) The commit message for such commits should mention why is this better, because it is not obvious. For one thing, the new code is significantly longer. -- Anton Khirnov ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH v3] decklink: Convert to using avpriv_packet_list functions
Hello Marton, On Fri, May 12, 2023 at 11:09 AM Devin Heitmueller wrote: > > The existing DecklinkQueue implementation was using the PacketList > structure but wasn't using the standard avpriv_packet_list_get and > avpriv_packet_list_put functions. Convert to using them so we > eliminate the duplicate logic, per Marton Balint's suggestion. > > Updated to reflect feedback from Marton Balint provided on 05/11/23. > > Signed-off-by: Devin Heitmueller I believe this version of the patch incorporates the changes you requested. Did you have any further comments before it can be merged? Thanks, Devin -- Devin Heitmueller, Senior Software Engineer LTN Global Communications o: +1 (301) 363-1001 w: https://ltnglobal.com e: devin.heitmuel...@ltnglobal.com ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate
Will apply. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] nellymoser: user float literals for table
On Sat, May 27, 2023 at 8:17 AM Andreas Rheinhardt wrote: > > Tristan Matthews: > > --- > > libavcodec/nellymoser.c | 36 ++-- > > 1 file changed, 18 insertions(+), 18 deletions(-) > > > > diff --git a/libavcodec/nellymoser.c b/libavcodec/nellymoser.c > > index 66c5f83a56..fa22b79909 100644 > > --- a/libavcodec/nellymoser.c > > +++ b/libavcodec/nellymoser.c > > @@ -39,30 +39,30 @@ > > #include "nellymoser.h" > > > > const float ff_nelly_dequantization_table[127] = { > > - 0.00, > > + 0.00f, > > > > --0.8472560048, 0.7224709988, > > +-0.8472560048f, 0.7224709988f, > > > > --1.5247479677,-0.4531480074, 0.3753609955, 1.4717899561, > > +-1.5247479677f,-0.4531480074f, 0.3753609955f, 1.4717899561f, > > > > --1.9822579622,-1.1929379702,-0.5829370022,-0.0693780035, 0.3909569979, > > 0.9069200158, 1.4862740040, 2.2215409279, > > +-1.9822579622f,-1.1929379702f,-0.5829370022f,-0.0693780035f, > > 0.3909569979f, 0.9069200158f, 1.4862740040f, 2.2215409279f, > > > > --2.3887870312,-1.8067539930,-1.4105420113,-1.0773609877,-0.7995010018,-0.5558109879,-0.3334020078,-0.1324490011, > > - 0.0568020009, 0.2548770010, 0.4773550034, 0.7386850119, 1.0443060398, > > 1.3954459429, 1.8098750114, 2.3918759823, > > +-2.3887870312f,-1.8067539930f,-1.4105420113f,-1.0773609877f,-0.7995010018f,-0.5558109879f,-0.3334020078f,-0.1324490011f, > > + 0.0568020009f, 0.2548770010f, 0.4773550034f, 0.7386850119f, > > 1.0443060398f, 1.3954459429f, 1.8098750114f, 2.3918759823f, > > > > --2.3893830776,-1.9884680510,-1.7514040470,-1.5643119812,-1.3922129869,-1.2164649963,-1.0469499826,-0.8905100226, > > --0.7645580173,-0.6454579830,-0.5259280205,-0.4059549868,-0.3029719889,-0.2096900046,-0.1239869967,-0.0479229987, > > - 0.025773, 0.1001340002, 0.1737180054, 0.2585540116, 0.3522900045, > > 0.4569880068, 0.5767750144, 0.7003160119, > > - 0.8425520062, 1.0093879700, 1.1821349859, 1.3534560204, 1.5320819616, > > 1.7332619429, 1.9722349644, 2.3978140354, > > +-2.3893830776f,-1.9884680510f,-1.7514040470f,-1.5643119812f,-1.3922129869f,-1.2164649963f,-1.0469499826f,-0.8905100226f, > > +-0.7645580173f,-0.6454579830f,-0.5259280205f,-0.4059549868f,-0.3029719889f,-0.2096900046f,-0.1239869967f,-0.0479229987f, > > + 0.025773f, 0.1001340002f, 0.1737180054f, 0.2585540116f, > > 0.3522900045f, 0.4569880068f, 0.5767750144f, 0.7003160119f, > > + 0.8425520062f, 1.0093879700f, 1.1821349859f, 1.3534560204f, > > 1.5320819616f, 1.7332619429f, 1.9722349644f, 2.3978140354f, > > > > --2.5756309032,-2.0573320389,-1.8984919786,-1.7727810144,-1.6662600040,-1.5742180347,-1.4993319511,-1.4316639900, > > --1.3652280569,-1.3000990152,-1.2280930281,-1.1588579416,-1.0921250582,-1.0135740042,-0.9202849865,-0.8287050128, > > --0.7374889851,-0.6447759867,-0.5590940118,-0.4857139885,-0.4110319912,-0.3459700048,-0.2851159871,-0.2341620028, > > --0.1870580018,-0.1442500055,-0.1107169986,-0.0739680007,-0.0365610011,-0.0073290002, > > 0.0203610007, 0.0479039997, > > - 0.0751969963, 0.098091, 0.1220389977, 0.145862, 0.1694349945, > > 0.1970459968, 0.2252430022, 0.2556869984, > > - 0.2870100141, 0.3197099864, 0.3525829911, 0.3889069855, 0.4334920049, > > 0.4769459963, 0.5204820037, 0.5644530058, > > - 0.6122040153, 0.6685929894, 0.7341650128, 0.8032159805, 0.8784040213, > > 0.9566209912, 1.0397069454, 1.1293770075, > > - 1.2211159468, 1.3080279827, 1.4024800062, 1.5056819916, 1.6227730513, > > 1.7724959850, 1.9430880547, 2.2903931141 > > +-2.5756309032f,-2.0573320389f,-1.8984919786f,-1.7727810144f,-1.6662600040f,-1.5742180347f,-1.4993319511f,-1.4316639900f, > > +-1.3652280569f,-1.3000990152f,-1.2280930281f,-1.1588579416f,-1.0921250582f,-1.0135740042f,-0.9202849865f,-0.8287050128f, > > +-0.7374889851f,-0.6447759867f,-0.5590940118f,-0.4857139885f,-0.4110319912f,-0.3459700048f,-0.2851159871f,-0.2341620028f, > > +-0.1870580018f,-0.1442500055f,-0.1107169986f,-0.0739680007f,-0.0365610011f,-0.0073290002f, > > 0.0203610007f, 0.0479039997f, > > + 0.0751969963f, 0.098091f, 0.1220389977f, 0.145862f, > > 0.1694349945f, 0.1970459968f, 0.2252430022f, 0.2556869984f, > > + 0.2870100141f, 0.3197099864f, 0.3525829911f, 0.3889069855f, > > 0.4334920049f, 0.4769459963f, 0.5204820037f, 0.5644530058f, > > + 0.6122040153f, 0.6685929894f, 0.7341650128f, 0.8032159805f, > > 0.8784040213f, 0.9566209912f, 1.0397069454f, 1.1293770075f, > > + 1.2211159468f, 1.3080279827f, 1.4024800062f, 1.5056819916f, > > 1.6227730513f, 1.7724959850f, 1.9430880547f, 2.2903931141f > > }; > > > > const uint8_t ff_nelly_band_sizes_table[NELLY_BANDS] = { > > What's the point of this? Has some compiler started emitting warnings > about the lossy double->float conversions? No, just my eyeballs. There are other places in the encoder/decoder with double to float conversions but changing these changed the bias of some expressions (e.g., the initialization of pow_table) so I thought I'd start with this first unless there was no interest. Best, -t > > -
Re: [FFmpeg-devel] [PATCH] nellymoser: user float literals for table
Tristan Matthews: > --- > libavcodec/nellymoser.c | 36 ++-- > 1 file changed, 18 insertions(+), 18 deletions(-) > > diff --git a/libavcodec/nellymoser.c b/libavcodec/nellymoser.c > index 66c5f83a56..fa22b79909 100644 > --- a/libavcodec/nellymoser.c > +++ b/libavcodec/nellymoser.c > @@ -39,30 +39,30 @@ > #include "nellymoser.h" > > const float ff_nelly_dequantization_table[127] = { > - 0.00, > + 0.00f, > > --0.8472560048, 0.7224709988, > +-0.8472560048f, 0.7224709988f, > > --1.5247479677,-0.4531480074, 0.3753609955, 1.4717899561, > +-1.5247479677f,-0.4531480074f, 0.3753609955f, 1.4717899561f, > > --1.9822579622,-1.1929379702,-0.5829370022,-0.0693780035, 0.3909569979, > 0.9069200158, 1.4862740040, 2.2215409279, > +-1.9822579622f,-1.1929379702f,-0.5829370022f,-0.0693780035f, 0.3909569979f, > 0.9069200158f, 1.4862740040f, 2.2215409279f, > > --2.3887870312,-1.8067539930,-1.4105420113,-1.0773609877,-0.7995010018,-0.5558109879,-0.3334020078,-0.1324490011, > - 0.0568020009, 0.2548770010, 0.4773550034, 0.7386850119, 1.0443060398, > 1.3954459429, 1.8098750114, 2.3918759823, > +-2.3887870312f,-1.8067539930f,-1.4105420113f,-1.0773609877f,-0.7995010018f,-0.5558109879f,-0.3334020078f,-0.1324490011f, > + 0.0568020009f, 0.2548770010f, 0.4773550034f, 0.7386850119f, 1.0443060398f, > 1.3954459429f, 1.8098750114f, 2.3918759823f, > > --2.3893830776,-1.9884680510,-1.7514040470,-1.5643119812,-1.3922129869,-1.2164649963,-1.0469499826,-0.8905100226, > --0.7645580173,-0.6454579830,-0.5259280205,-0.4059549868,-0.3029719889,-0.2096900046,-0.1239869967,-0.0479229987, > - 0.025773, 0.1001340002, 0.1737180054, 0.2585540116, 0.3522900045, > 0.4569880068, 0.5767750144, 0.7003160119, > - 0.8425520062, 1.0093879700, 1.1821349859, 1.3534560204, 1.5320819616, > 1.7332619429, 1.9722349644, 2.3978140354, > +-2.3893830776f,-1.9884680510f,-1.7514040470f,-1.5643119812f,-1.3922129869f,-1.2164649963f,-1.0469499826f,-0.8905100226f, > +-0.7645580173f,-0.6454579830f,-0.5259280205f,-0.4059549868f,-0.3029719889f,-0.2096900046f,-0.1239869967f,-0.0479229987f, > + 0.025773f, 0.1001340002f, 0.1737180054f, 0.2585540116f, 0.3522900045f, > 0.4569880068f, 0.5767750144f, 0.7003160119f, > + 0.8425520062f, 1.0093879700f, 1.1821349859f, 1.3534560204f, 1.5320819616f, > 1.7332619429f, 1.9722349644f, 2.3978140354f, > > --2.5756309032,-2.0573320389,-1.8984919786,-1.7727810144,-1.6662600040,-1.5742180347,-1.4993319511,-1.4316639900, > --1.3652280569,-1.3000990152,-1.2280930281,-1.1588579416,-1.0921250582,-1.0135740042,-0.9202849865,-0.8287050128, > --0.7374889851,-0.6447759867,-0.5590940118,-0.4857139885,-0.4110319912,-0.3459700048,-0.2851159871,-0.2341620028, > --0.1870580018,-0.1442500055,-0.1107169986,-0.0739680007,-0.0365610011,-0.0073290002, > 0.0203610007, 0.0479039997, > - 0.0751969963, 0.098091, 0.1220389977, 0.145862, 0.1694349945, > 0.1970459968, 0.2252430022, 0.2556869984, > - 0.2870100141, 0.3197099864, 0.3525829911, 0.3889069855, 0.4334920049, > 0.4769459963, 0.5204820037, 0.5644530058, > - 0.6122040153, 0.6685929894, 0.7341650128, 0.8032159805, 0.8784040213, > 0.9566209912, 1.0397069454, 1.1293770075, > - 1.2211159468, 1.3080279827, 1.4024800062, 1.5056819916, 1.6227730513, > 1.7724959850, 1.9430880547, 2.2903931141 > +-2.5756309032f,-2.0573320389f,-1.8984919786f,-1.7727810144f,-1.6662600040f,-1.5742180347f,-1.4993319511f,-1.4316639900f, > +-1.3652280569f,-1.3000990152f,-1.2280930281f,-1.1588579416f,-1.0921250582f,-1.0135740042f,-0.9202849865f,-0.8287050128f, > +-0.7374889851f,-0.6447759867f,-0.5590940118f,-0.4857139885f,-0.4110319912f,-0.3459700048f,-0.2851159871f,-0.2341620028f, > +-0.1870580018f,-0.1442500055f,-0.1107169986f,-0.0739680007f,-0.0365610011f,-0.0073290002f, > 0.0203610007f, 0.0479039997f, > + 0.0751969963f, 0.098091f, 0.1220389977f, 0.145862f, 0.1694349945f, > 0.1970459968f, 0.2252430022f, 0.2556869984f, > + 0.2870100141f, 0.3197099864f, 0.3525829911f, 0.3889069855f, 0.4334920049f, > 0.4769459963f, 0.5204820037f, 0.5644530058f, > + 0.6122040153f, 0.6685929894f, 0.7341650128f, 0.8032159805f, 0.8784040213f, > 0.9566209912f, 1.0397069454f, 1.1293770075f, > + 1.2211159468f, 1.3080279827f, 1.4024800062f, 1.5056819916f, 1.6227730513f, > 1.7724959850f, 1.9430880547f, 2.2903931141f > }; > > const uint8_t ff_nelly_band_sizes_table[NELLY_BANDS] = { What's the point of this? Has some compiler started emitting warnings about the lossy double->float conversions? - Andreas ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] fate: add adpcm_ima_cunning tests
On 27/5/23 19:14, Anton Khirnov wrote: Hi Zane, Quoting Zane van Iperen (2020-05-09 16:00:04) diff --git a/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 new file mode 100644 index 00..df9edc403d --- /dev/null +++ b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 @@ -0,0 +1 @@ +d41d8cd98f00b204e9800998ecf8427e This test seems to produce an empty file. Is that intended? Responded on IRC, posting here for posterity: Yes, that test is meant to produce an empty file. The channels are planar - the input file has missing data for the second one. See https://0x0.st/HqUw.png ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] fate: add adpcm_ima_cunning tests
Hi Zane, Quoting Zane van Iperen (2020-05-09 16:00:04) > diff --git a/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 > b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 > new file mode 100644 > index 00..df9edc403d > --- /dev/null > +++ b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 > @@ -0,0 +1 @@ > +d41d8cd98f00b204e9800998ecf8427e This test seems to produce an empty file. Is that intended? -- Anton Khirnov ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)
Le perjantaina 26. toukokuuta 2023, 11.03.14 EEST Martin Storsjö a écrit : > Based on code by Janne Grunau. > > Using HWCAP_CPUID for user space access to the CPU feature registers. See > https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html. > --- > configure | 2 ++ > libavutil/aarch64/cpu.c | 38 ++ > 2 files changed, 40 insertions(+) > > diff --git a/configure b/configure > index 3c7473efb2..b5357b8d27 100755 > --- a/configure > +++ b/configure > @@ -2207,6 +2207,7 @@ HAVE_LIST_PUB=" > > HEADERS_LIST=" > arpa_inet_h > +asm_hwcap_h > asm_types_h > cdio_paranoia_h > cdio_paranoia_paranoia_h > @@ -6422,6 +6423,7 @@ check_headers io.h > enabled libdrm && > check_headers linux/dma-buf.h > > +check_headers asm/hwcap.h > check_headers linux/perf_event.h > check_headers libcrystalhd/libcrystalhd_if.h > check_headers malloc.h > diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c > index 42b33e4a2d..34c838c2f5 100644 > --- a/libavutil/aarch64/cpu.c > +++ b/libavutil/aarch64/cpu.c > @@ -20,6 +20,42 @@ > #include "libavutil/cpu_internal.h" > #include "config.h" > > +#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL && > HAVE_ASM_HWCAP_H +#include > +#include > +#include > + > +#define get_cpu_feature_reg(reg, val) \ > +__asm__("mrs %0, " #reg : "=r" (val)) Strictly speaking, this can read any system register. One way to prevent that would be to include the ID_ prefix and _EL1 suffix in the macro. I would have used a pure static inline instead, but that's just a matter of taste. > + > +static int detect_flags(void) > +{ > +unsigned long ret = getauxval(AT_HWCAP); > +int flags = 0; > +#if defined(HWCAP_CPUID) > +uint64_t tmp; > +if (!(ret & HWCAP_CPUID)) > +return flags; > +get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp); > +if (((tmp >> 44) & 0xf) == 0x1) > +flags |= AV_CPU_FLAG_DOTPROD; > +get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp); > +if (((tmp >> 52) & 0xf) == 0x1) > +flags |= AV_CPU_FLAG_I8MM; > +#endif NEON detection could be added here, though I've yet to see an Armv8 implementation without AdvSIMD. FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via HWCAP2_I8MM, using trapped ID registers is not (yet) necessary. > + > +return flags; > +} > + > +#else > + > +static int detect_flags(void) > +{ > +return 0; > +} > + > +#endif > + > int ff_get_cpu_flags_aarch64(void) > { > int flags = AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 | > @@ -33,6 +69,8 @@ int ff_get_cpu_flags_aarch64(void) > flags |= AV_CPU_FLAG_I8MM; > #endif > > +flags |= detect_flags(); > + > return flags; > } -- レミ・デニ-クールモン http://www.remlab.net/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions
Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit : > These are available since ARMv8.4-a and ARMv8.6-a respectively, > but can also be available optionally since ARMv8.2-a. > > Check if these are available for use unconditionally (e.g. if compiling > with -march=armv8.6-a), or if they can be enabled with specific > assembler directives. > > Use ".arch_extension " for enabling a specific extension in > assembly; the same can also be achieved with ".arch armv8.2-a+", > but with .arch_extension is easier to combine multiple separate > features. > > Enabling these extensions requires setting a base architecture level > of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary; > if the base level is high enough (which might unlock other extensions > without .arch_extension), we don't want to lower it. I don't follow how that would actually happen, TBH. Even if the default target version is, say, 8.5, the assembler won't magically start emitting 8.5 instructions. Someone would have to write assembler code that would fail to build under a toolchain with a lower target version. That sounds like a bug that should be spotted and fixed, rather than papered over. Conversely the logic here seems unnecessarily, if not counter-productively, intricate. > Only add .arch/.arch_extension if needed, e.g. current clang fails > to recognize the dotprod and i8mm features in .arch_extension, but > can successfully assemble these instructions if part of the baseline > set with -march. IME, Clang is utterly useless for assembling. That has become one of my pet peeves with Rust inline assembler, which is other much nicer than C inline assembler, whence you can't just work around it with `-no-integrated-as`. But I digress. If the problem is to avoid `.arch_extension`, then I don't really see why you can't just use `.arch` with plus, and simplify a lot. -- 雷米‧德尼-库尔蒙 http://www.remlab.net/ ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h
Hi, Martin, I saw your new opinions. Do you mean that the code of my current patch should be guard as follows? C code: /if (have_i8mm(cpu_flags)) {// //}/ /asm code :/ /#if HAVE_I8MM/ /#endif/ I mean my current code base does not have those definitions, should I implement them directly as if they already exist? other opinions is under modification.. 在 2023/5/26 16:34, Martin Storsjö 写道: Hi, Overall these patches seem mostly ok, but I've got a few minor points to make: - The usdot instruction requires the i8mm extension (part of armv8.6-a), while udot or sdot would require the dotprod extension (available in armv8.4-a). If you could manage with udot or sdot, these functions would be usable on a wider set of CPUs. Therefore, the current guards are wrong. Also, I finally got support implemented for optionally using these cpu extensions, even if the baseline of the compile don't include it, by runtime enabling it. See the patchset at https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009. To adapt your patches on top of this, see the two topmost commits at https://github.com/mstorsjo/ffmpeg/commits/archext. - The indentation is inconsistent; in the first patch, you have some instructions written like this: + sqadd v1.4s, v1.4s, v29.4s While you later use this style: + dup v1.16b, v28.b[1] The latter seems to match the style we commonly use; please reformat your code to match that consistently. With some macro invocations in the first patch, you also seem to have too much indentation in some places. See e.g. this: +1: ldr q23, [x2, x3] + add x2, x2, x3, lsl #1 + QPEL_FILTER_B v26, v16, v17, v18, v19, v20, v21, v22, v23 + QPEL_FILTER_B2 v27, v16, v17, v18, v19, v20, v21, v22, v23 + QPEL_UNI_W_V_16 + subs w4, w4, #1 + b.eq 2f (If the macro name is too long, that's ok, but here there's no need to have those lines unaligned.) - In the third patch, you've got multiple parameters from the stack like this: + ldp x14, x15, [sp] // mx, my + ldr w13, [sp, #16] // width I see that the mx an my parameters are intptr_t; that's good, since if they would be 32 bit integers, the ABI for such parameters on the stack differ between macOS/Darwin and Linux. But as long as they're intptr_t they behave the same. - At the same place, you're backing up a bunch of registers: + stp x20, x21, [sp, #-16]! + stp x22, x23, [sp, #-16]! + stp x24, x25, [sp, #-16]! + stp x26, x27, [sp, #-16]! + stp x28, x30, [sp, #-16]! This is inefficient; instead, do this: + stp x28, x30, [sp, #-80]! + stp x20, x21, [sp, #16] + stp x22, x23, [sp, #32] + stp x24, x25, [sp, #48] + stp x26, x27, [sp, #64] Also, following that, I see that you back up the stack pointer in x28. Why do you use x28 for that? Using x29 would be customary as frame pointer. Aside for that, I think the rest of the patches is acceptable. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h
yes, of course, Re-add the missing mailing list... 在 2023/5/27 13:45, Martin Storsjö 写道: Hi, Thanks - can you send the new patches to the mailing list too? They need to be available publicly for review before they can be accepted. (I didn't check these yet.) // Martin On Sat, 27 May 2023, myais wrote: Hi, Martin, Thank you for your correction, and I completed the modification according to your opinion, the attachments are the new patches. Thanks. 在 2023/5/24 20:49, Martin Storsjö 写道: Hi, On Tue, 23 May 2023, myais wrote: Do you have any new opinions here? I am looking forward to your reply. I've started looking at this now after focusing on a different issue first. The big thing is that this is the first new optional instruction set on top of aarch64, so there's a bit of work to do to handle that properly (with runtime detection, and assembling that code even if the baseline target doesn't support it). I've started looking into that now. In your case your patches don't care about that and just hardcode enabling it if the compiler baseline support the instruction, and skips it otherwise. I guess that's possibly fine, but your condition for the code is wrong; the "usdot" instruction requires the "i8mm" extension, not "dotprod". i8mm is part of armv8.6-a (and is available on graviton 3, luckily, which allows me to test it). So instead of __ARM_FEATURE_DOTPROD, this should use __ARM_FEATURE_MATMUL_INT8, and the functions should probably use i8mm as suffix instead of dotprod. I guess you can resubmit them with that change (and make sure you don't end up with the "no trailing newline at the end of file" issue in the changed files in any intermediate commit). In addition, I have some new similar patches, which are the aarch64 implementations of some other functions, should I wait for your feedback before submitting or submit it directly? I'd prefer to settle these patches first before taking on the next set. // Martin From b08c4e571a49de109a0619fb9de9461af4901115 Mon Sep 17 00:00:00 2001 From: myais Date: Wed, 3 May 2023 09:53:07 +0800 Subject: [PATCH 1/3] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels and qpel_uni_w_v --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 51 ++ libavcodec/aarch64/hevcdsp_qpel_neon.S| 710 ++ 2 files changed, 761 insertions(+) diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index be1049a2ec..6b5341dd45 100644 --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c @@ -128,6 +128,52 @@ void ff_hevc_put_hevc_qpel_bi_h16_8_neon(uint8_t *_dst, ptrdiff_t _dststride, co ptrdiff_t _srcstride, const int16_t *src2, int height, intptr_t mx, intptr_t my, int width); +#define NEON8_FNPROTO(fn, args, ext) \ +void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##6_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##12_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##24_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##48_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##64_8_neon##ext args; \ + +#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \ +void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \ +void ff_hevc_put_hevc_##fn##64_8_neon##ext args; \ + +NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride, +const uint8_t *_src, ptrdiff_t _srcstride, +int height, int denom, int wx, int ox, +intptr_t mx, intptr_t my, int width),); + +NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst, ptrdiff_t _dststride, +const uint8_t *_src, ptrdiff_t _srcstride, +int height, int denom, int wx, int ox, +intptr_t mx, intptr_t my, int width),); + +#define NEON8_FNASSIGN(member, v, h, fn, ext) \ +member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext; \ +member[2][v][h] = ff_hevc_put_hevc_##fn##6_8_neon##ext; \ +member[3][v][h] = ff_hevc_put_hevc_##fn##8_8_neon##ext; \ +member[4][v][h] = ff_hevc_put_hevc_##fn##12_8_neon##ext; \ +member[5][v][h] = ff_hevc_put_hevc_##fn##16_8_neon##ext; \ +member[6][v][h] = ff_hevc_put_hevc_##fn##24_8_neon##ext; \ +member[7][v][h] = ff_hevc_put_hevc_##fn##32_8_neon##ext; \ +member[8][v][h] = ff_hevc_put_hevc_##fn##48_8_neon##ext; \ +member[9][v][h] = ff_hevc_put_hevc_##fn##64_8_neon##ext; + +#define NEON8_FNASSIGN_PARTIAL_4(member, v, h, fn, ext) \ +member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext; \ +member[3][v][h] =