Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)

2023-05-27 Thread Rémi Denis-Courmont
Le lauantaina 27. toukokuuta 2023, 23.35.14 EEST Martin Storsjö a écrit :
> > NEON detection could be added here, though I've yet to see an Armv8
> > implementation without AdvSIMD.
> 
> I guess we could, but as it's part of the require baseline for armv8-a I
> don't think there's much need for it?

I mean that if you have 95% of the code needed, you might as well add the last 
5%. I don't think it's used by much anything.

> If configured with --disable-neon we
> don't return that cpuflag though.
> 
> > FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via
> > HWCAP2_I8MM, using trapped ID registers is not (yet) necessary.
> 
> Ah, I see. I guess using those would be more straightforward.
> 
> OTOH, HWCAP_CPUID is available much earlier than HWCAP_ASIMDDP or
> HWCAP2_I8MM (I do some amount of cross building with a fairly old
> sysroot).

If the sysroot shares the not-too-old kernel of the host, then ID registers 
will work anyway. The age of the userspace toolchain affects availability of 
HWCAP constant definitions, not that of ID registers, which is a purely in-
kernel feature.

> I'll think about it, whether it's worth complicating things to
> try both approaches, or if we should just go with the plain HWCAPs here.

I don't really mind either way.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-05-27 Thread Rémi Denis-Courmont
Le sunnuntaina 28. toukokuuta 2023, 0.34.15 EEST Martin Storsjö a écrit :
> > Someone would have to write assembler code that would fail to build under
> > a toolchain with a lower target version. That sounds like a bug that
> > should be spotted and fixed, rather than papered over.
> 
> I don't see how anything here suggests papering over such an issue?

For instance, say the toolchain, or the FFmpeg build flags, sets ARMv8.6 as 
target architecture. It makes perfect sense for the C compiler to emit non-
hint ARMv8.3 PAuth instructions, which would crash on an ARMv8.0-8.2 
processor. But it makes zero sense for the assembler to change its behaviour. 
So all existing ASIMD assembler files written for ARMv8.0-A, should still 
generate ARMv8.0-A code.

But now somebody can accidentally write an ARMv8.1-8.6 instruction into their 
ARMv8.0 assembler code, and it will not trigger a build error (to them). There 
are no clear reasons to trying to avoid *lowering* the target version for 
*assembler* (as opposed to C).

If a file needs ARMv8.2, it should just ARMv8.2 as its target, whether the 
overall build targets ARMv8.0, ARMv8.2 or ARMv8.6. The C code around it should 
ensure that the requirements are met. There are no benefits to preserving a 
higher version; it just makes the configure checks needlessly intricate.

> if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we
> break the dotprod extension. If we only add ".arch_extension i8mm" without
> the .arch directive, we get what we want to though.

Yes, but you can also just set `.arch armv8.4-a`, which is ostensibly The Way 
to enable DotProd on a craptastic assembler. If even that doesn't work, the 
assembler presumably doesn't support DotProd at all (or it's some obscure 
unstable development version that was snapshot between adding DotProd and 
adding complete ARMv8.4, which we really should not care about).

Likewise, for I8MM, you can just do `.arch armv8.6-a`.

So what if the target version was 8.7 or 9.2? Well, in an assembler file, we 
don't really care, as noted above.

> > If the problem is to avoid `.arch_extension`, then I don't really see
> > why you can't just use `.arch` with plus, and simplify a lot.
> 
> Well Clang doesn't quite support that currently either. For
> ".arch_extension dotprod" it errors out since it doesn't recognize the
> dotprod feature in that directive. It does accept ".arch
> armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod
> extension in the assembly despite that. (I'll look into fixing this in
> upstream LLVM afterwards.)

I don't know if `.arch_extension` is specified by Arm or just a GCCism. But if 
you accept DotProd in `.arch` in arch and then don't enable it, then that's 
clearly a bug. But then again, that's moot if you can just do `.arch armv8.4-
a` instead.

> Taking it back to the drawing board: So for enabling e.g. i8mm, we could
> either do
>  .arch armv8.6-a
> or
>  .arch armv8.2-a+dotprod
> or
>  .arch armv8.2
>  .arch_extension dotprod
> 
> 
> Out of these, I initially preferred doing the third approach.

I agree that that's the cleanest option. But that's not the point here. The 
point is that what this patch is a hell of a lot more involved than doing just 
that, and it seems unnecessarily intricate for the purpose of enabling 
DotProd.

> There's no functional difference between the second and third one, except
> the single-line form is more messy to handle, as we can have various
> combinations of what actually is supported.

Yes but a pile of #if's is even more messy to handle.

> And with the single-line .arch
> form, we can't just add e.g. i8mm on top of a -march= setting that already
> supports dotprod, without respecifying what the toolchain itself defaults
> to.
> 
> 
> The documentation for .arch_extension hints at it being possible to
> disable support for extensions with it too, but that doesn't seem to be
> the case in practice. If it was, we could add macros to only enable
> specifically the extensions we want around those functions that should use
> them and nothing more. But I guess if that's not actually supported we
> can't do that.

GCC supports it with RV64 (`.option arch` plus/minus). I haven't tried it on 
AArch64. Of course, LLVM does not support it, even though the patch has been 
out there for a long time and even though it is part of the ABI spec rather 
than made up by GNU/binutils. Thanks to that, RVV is unusable on LLVM (Linux 
kernel specifically requires GNU/as due to that).

> I guess the alternative would be to just try to set .arch
> . I was worried that support for
> e.g. armv8.6-a appeared later in toolchains than support for the
> individual extension i8mm, but at least from a quick browse in binutils
> history, they seem to have been added at the same time, so there's
> probably no such drawback.
> 
> Or what's the situation with e.g. SVE2 - was ".arch_extension sve2"
> supported significantly earlier than ".arch 

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-05-27 Thread Logan.Lyu

Hi, Martin

I have finished the modification, please review again.

Thanks.


在 2023/5/26 16:34, Martin Storsjö 写道:

Hi,

Overall these patches seem mostly ok, but I've got a few minor points 
to make:


- The usdot instruction requires the i8mm extension (part of 
armv8.6-a), while udot or sdot would require the dotprod extension 
(available in armv8.4-a). If you could manage with udot or sdot, these 
functions would be usable on a wider set of CPUs.


Therefore, the current guards are wrong. Also, I finally got support 
implemented for optionally using these cpu extensions, even if the 
baseline of the compile don't include it, by runtime enabling it. See 
the patchset at 
https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009.


To adapt your patches on top of this, see the two topmost commits at 
https://github.com/mstorsjo/ffmpeg/commits/archext.

Fixed.
- The indentation is inconsistent; in the first patch, you have some 
instructions written like this:


+    sqadd   v1.4s, v1.4s, v29.4s

While you later use this style:

+    dup v1.16b, v28.b[1]

The latter seems to match the style we commonly use; please reformat 
your code to match that consistently.


With some macro invocations in the first patch, you also seem to have 
too much indentation in some places. See e.g. this:


+1:  ldr q23, [x2, x3]
+    add x2, x2, x3, lsl #1
+    QPEL_FILTER_B v26, v16, v17, v18, v19, v20, v21, v22, v23
+    QPEL_FILTER_B2    v27, v16, v17, v18, v19, v20, v21, v22, v23
+    QPEL_UNI_W_V_16
+    subs    w4, w4, #1
+    b.eq    2f

(If the macro name is too long, that's ok, but here there's no need to 
have those lines unaligned.)

Fixed.


- In the third patch, you've got multiple parameters from the stack 
like this:


+    ldp x14, x15, [sp]  // mx, my
+    ldr w13, [sp, #16]  // width

I see that the mx an my parameters are intptr_t; that's good, since if 
they would be 32 bit integers, the ABI for such parameters on the 
stack differ between macOS/Darwin and Linux. But as long as they're 
intptr_t they behave the same.


- At the same place, you're backing up a bunch of registers:

+    stp x20, x21, [sp, #-16]!
+    stp x22, x23, [sp, #-16]!
+    stp x24, x25, [sp, #-16]!
+    stp x26, x27, [sp, #-16]!
+    stp x28, x30, [sp, #-16]!

This is inefficient; instead, do this:

+    stp x28, x30, [sp, #-80]!
+    stp x20, x21, [sp, #16]
+    stp x22, x23, [sp, #32]
+    stp x24, x25, [sp, #48]
+    stp x26, x27, [sp, #64]

Also, following that, I see that you back up the stack pointer in x28. 
Why do you use x28 for that? Using x29 would be customary as frame 
pointer.
Using more efficient implementation now.  And x28 in this case is a 
common callee save register. Anyway, I use x19 instead now.


Aside for that, I think the rest of the patches is acceptable.

// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".From 224e1b907b9273f6fecaef007730bd1168493515 Mon Sep 17 00:00:00 2001
From: myais 
Date: Fri, 5 May 2023 22:06:22 +0800
Subject: [PATCH 2/3] lavc/aarch64: new optimization for 8-bit
 hevc_qpel_uni_w_h

---
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  15 +-
 libavcodec/aarch64/hevcdsp_qpel_neon.S| 434 ++
 2 files changed, 448 insertions(+), 1 deletion(-)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c 
b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 6b5341dd45..a7e62c7d15 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -145,6 +145,7 @@ void ff_hevc_put_hevc_qpel_bi_h16_8_neon(uint8_t *_dst, 
ptrdiff_t _dststride, co
 void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \
 void ff_hevc_put_hevc_##fn##64_8_neon##ext args; \
 
+
 NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
 const uint8_t *_src, ptrdiff_t _srcstride,
 int height, int denom, int wx, int ox,
@@ -155,6 +156,12 @@ NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst,  
ptrdiff_t _dststride,
 int height, int denom, int wx, int ox,
 intptr_t mx, intptr_t my, int width),);
 
+NEON8_FNPROTO(qpel_uni_w_h, (uint8_t *_dst,  ptrdiff_t _dststride,
+const uint8_t *_src, ptrdiff_t _srcstride,
+int height, int denom, int wx, int ox,
+intptr_t mx, intptr_t my, int width), _i8mm);
+
+
 #define NEON8_FNASSIGN(member, v, h, fn, ext) \
 member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext;  \
 member[2][v][h] = ff_hevc_put_hevc_##fn##6_8_neon##ext;  \
@@ -174,9 

Re: [FFmpeg-devel] [PATCH] lavu/tx: stop using av_log(NULL, )

2023-05-27 Thread Lynne
May 28, 2023, 03:07 by jamr...@gmail.com:

> On 5/24/2023 9:32 PM, Lynne wrote:
>
>> May 24, 2023, 23:24 by leo.i...@gmail.com:
>>
>>> On 5/24/23 16:35, Lynne wrote:
>>>
 Patch attached.

>>>
>>> +av_log((void *)_class, AV_LOG_DEBUG, "%s\n", bp.str);
>>>
>>> The type of the first argument to av_log should be AVClass **, but this 
>>> only appears to be AVClass *. See libavutil/log.c line 428.
>>>
>>> - Leo Izen
>>>
>>
>> Right, thanks, changed to:
>>
>>> static const AVClass tx_class = {
>>>       .class_name    = "tx",
>>>       .item_name = av_default_item_name,
>>>       .version   = LIBAVUTIL_VERSION_INT,
>>> };
>>>
>>> static const struct {
>>>       const AVClass *tx_class;
>>> } tx_log = {
>>>       _class,
>>> };
>>>
>> Will push this tomorrow.
>>
>
> Can't add an AVClass* field to AVTXContext and set it to _class during 
> init?
>

The struct is accessed from asm, didn't really want to fix all the loads
for something which only runs at init, and only if !CONFIG_SMALL.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] lavu/tx: stop using av_log(NULL, )

2023-05-27 Thread James Almer

On 5/24/2023 9:32 PM, Lynne wrote:

May 24, 2023, 23:24 by leo.i...@gmail.com:


On 5/24/23 16:35, Lynne wrote:


Patch attached.



+av_log((void *)_class, AV_LOG_DEBUG, "%s\n", bp.str);

The type of the first argument to av_log should be AVClass **, but this only 
appears to be AVClass *. See libavutil/log.c line 428.

- Leo Izen



Right, thanks, changed to:


static const AVClass tx_class = {
      .class_name    = "tx",
      .item_name = av_default_item_name,
      .version   = LIBAVUTIL_VERSION_INT,
};

static const struct {
      const AVClass *tx_class;
} tx_log = {
      _class,
};

Will push this tomorrow.


Can't add an AVClass* field to AVTXContext and set it to _class 
during init?

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-05-27 Thread Martin Storsjö

On Sat, 27 May 2023, Rémi Denis-Courmont wrote:


Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit :

These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.

Check if these are available for use unconditionally (e.g. if compiling
with -march=armv8.6-a), or if they can be enabled with specific
assembler directives.

Use ".arch_extension " for enabling a specific extension in
assembly; the same can also be achieved with ".arch armv8.2-a+",
but with .arch_extension is easier to combine multiple separate
features.

Enabling these extensions requires setting a base architecture level
of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
if the base level is high enough (which might unlock other extensions
without .arch_extension), we don't want to lower it.


I don't follow how that would actually happen, TBH. Even if the default target 
version is, say, 8.5, the assembler won't magically start emitting 8.5 
instructions.


Someone would have to write assembler code that would fail to build under a 
toolchain with a lower target version. That sounds like a bug that should be 
spotted and fixed, rather than papered over.


I don't see how anything here suggests papering over such an issue?

I'm not sure exactly which parts of the message you refer to here, but 
I'll elaborate on the point about why we only should set .arch if we 
really need to.



Consider a build configuration with -march=armv8.4-a. We test that the 
dotprod extension is available and usable without adding any directives - 
so we won't add any directives for that. We also test that the assembler 
does support i8mm, with ".arch armv8.2-a" plus ".arch_extension i8mm".


But if we do add ".arch armv8.2-a" and ".arch_extension i8mm", then we 
break the dotprod extension. If we only add ".arch_extension i8mm" without 
the .arch directive, we get what we want to though.


If the problem is to avoid `.arch_extension`, then I don't really see 
why you can't just use `.arch` with plus, and simplify a lot.


Well Clang doesn't quite support that currently either. For 
".arch_extension dotprod" it errors out since it doesn't recognize the 
dotprod feature in that directive. It does accept ".arch 
armv8.2-a+dotprod" but it doesn't actually unlock using the dotprod 
extension in the assembly despite that. (I'll look into fixing this in 
upstream LLVM afterwards.)


As Clang/LLVM has these limitations/issues currently, one main design 
criterion here is that we shouldn't add any extra .arch/.arch_extension 
directives unless we need and can (and gain some instruction support from 
it).



Taking it back to the drawing board: So for enabling e.g. i8mm, we could 
either do

.arch armv8.6-a
or
.arch armv8.2-a+dotprod
or
.arch armv8.2
.arch_extension dotprod


Out of these, I initially preferred doing the third approach.

There's no functional difference between the second and third one, except 
the single-line form is more messy to handle, as we can have various 
combinations of what actually is supported. And with the single-line .arch 
form, we can't just add e.g. i8mm on top of a -march= setting that already 
supports dotprod, without respecifying what the toolchain itself defaults 
to.



The documentation for .arch_extension hints at it being possible to 
disable support for extensions with it too, but that doesn't seem to be 
the case in practice. If it was, we could add macros to only enable 
specifically the extensions we want around those functions that should use 
them and nothing more. But I guess if that's not actually supported we 
can't do that.



I guess the alternative would be to just try to set .arch 
. I was worried that support for 
e.g. armv8.6-a appeared later in toolchains than support for the 
individual extension i8mm, but at least from a quick browse in binutils 
history, they seem to have been added at the same time, so there's 
probably no such drawback.


Or what's the situation with e.g. SVE2 - was ".arch_extension sve2" 
supported significantly earlier than ".arch armv9-a"? It looks like 
binutils learnt about sve2 in 2019, but about armv9-a in 2021? OTOH that's 
probably not too much of a real issue either.


If we'd do that, it does simplify the configure logic a fair bit and 
reduces the number of configure variables we need by a lot. It does enable 
a few more instruction set extensions than what we need though, but that's 
probably not a real issue.


// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)

2023-05-27 Thread Martin Storsjö

On Sat, 27 May 2023, Rémi Denis-Courmont wrote:


Le perjantaina 26. toukokuuta 2023, 11.03.14 EEST Martin Storsjö a écrit :

Based on code by Janne Grunau.

Using HWCAP_CPUID for user space access to the CPU feature registers. See
https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html.
---
 configure   |  2 ++
 libavutil/aarch64/cpu.c | 38 ++
 2 files changed, 40 insertions(+)

diff --git a/configure b/configure
index 3c7473efb2..b5357b8d27 100755
--- a/configure
+++ b/configure
@@ -2207,6 +2207,7 @@ HAVE_LIST_PUB="

 HEADERS_LIST="
 arpa_inet_h
+asm_hwcap_h
 asm_types_h
 cdio_paranoia_h
 cdio_paranoia_paranoia_h
@@ -6422,6 +6423,7 @@ check_headers io.h
 enabled libdrm &&
 check_headers linux/dma-buf.h

+check_headers asm/hwcap.h
 check_headers linux/perf_event.h
 check_headers libcrystalhd/libcrystalhd_if.h
 check_headers malloc.h
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 42b33e4a2d..34c838c2f5 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -20,6 +20,42 @@
 #include "libavutil/cpu_internal.h"
 #include "config.h"

+#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL &&
HAVE_ASM_HWCAP_H +#include 
+#include 
+#include 
+
+#define get_cpu_feature_reg(reg, val) \
+__asm__("mrs %0, " #reg : "=r" (val))


Strictly speaking, this can read any system register. One way to prevent that 
would be to include the ID_ prefix and _EL1 suffix in the macro. I would have 
used a pure static inline instead, but that's just a matter of taste.



+
+static int detect_flags(void)
+{
+unsigned long ret = getauxval(AT_HWCAP);
+int flags = 0;
+#if defined(HWCAP_CPUID)
+uint64_t tmp;
+if (!(ret & HWCAP_CPUID))
+return flags;
+get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
+if (((tmp >> 44) & 0xf) == 0x1)
+flags |= AV_CPU_FLAG_DOTPROD;
+get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
+if (((tmp >> 52) & 0xf) == 0x1)
+flags |= AV_CPU_FLAG_I8MM;
+#endif


NEON detection could be added here, though I've yet to see an Armv8 
implementation without AdvSIMD.


I guess we could, but as it's part of the require baseline for armv8-a I 
don't think there's much need for it? If configured with --disable-neon we 
don't return that cpuflag though.


FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via 
HWCAP2_I8MM, using trapped ID registers is not (yet) necessary.


Ah, I see. I guess using those would be more straightforward.

OTOH, HWCAP_CPUID is available much earlier than HWCAP_ASIMDDP or 
HWCAP2_I8MM (I do some amount of cross building with a fairly old 
sysroot). I'll think about it, whether it's worth complicating things to 
try both approaches, or if we should just go with the plain HWCAPs here.


// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-05-27 Thread Martin Storsjö

Hi,

On Sat, 27 May 2023, myais wrote:

I saw your new opinions. Do you mean that the code of my current patch 
should be guard as follows?


C code:

/if (have_i8mm(cpu_flags)) {//
//}/

/asm code :/

/#if HAVE_I8MM/

/#endif/


Yes

I mean my current code base does not have those definitions, should I 
implement them directly as if they already exist?


I suggest you apply my patches from the patchset I pointed you to, 
https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009 - or from 
the start of the branch on github I pointed you to. It's likely that the 
patches will change a little before they're accepted, but I believe the 
main interface towards your code, "#if HAVE_I8MM" and "if 
(have_i8mm(cpu_flags))" will remain as such.


// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avfilter/af_aresample: switch to activate

2023-05-27 Thread Paul B Mahol
Will apply soon.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate

2023-05-27 Thread Paul B Mahol
On Sat, May 27, 2023 at 5:46 PM Paul B Mahol  wrote:

>
>
> On Sat, May 27, 2023 at 5:43 PM Anton Khirnov  wrote:
>
>> Quoting Paul B Mahol (2023-05-18 19:48:47)
>> > Attached.
>> >
>> > From c06d3d9f6020fdda19641637cafa6f86c77b4b73 Mon Sep 17 00:00:00 2001
>> > From: Paul B Mahol 
>> > Date: Wed, 17 May 2023 09:14:56 +0200
>> > Subject: [PATCH 21/27] avfilter/af_apad: switch to activate
>> >
>> > Signed-off-by: Paul B Mahol 
>> > ---
>> >  libavfilter/af_apad.c | 105 --
>> >  1 file changed, 71 insertions(+), 34 deletions(-)
>>
>> The commit message for such commits should mention why is this better,
>> because it is not obvious. For one thing, the new code is significantly
>> longer.
>>
>

Than I will count your net change in lines too, and will reject such
patches from 0.
~30 lines is irrelevant. This fixes EOF reporting which when combined with
other filters in filter graph can reduce usage of both processing and
memory resources significantly.

I'm astonished of your lack of general knowledge in this domain.



>
>
>>
>> --
>> Anton Khirnov
>> ___
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>>
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate

2023-05-27 Thread Paul B Mahol
On Sat, May 27, 2023 at 5:43 PM Anton Khirnov  wrote:

> Quoting Paul B Mahol (2023-05-18 19:48:47)
> > Attached.
> >
> > From c06d3d9f6020fdda19641637cafa6f86c77b4b73 Mon Sep 17 00:00:00 2001
> > From: Paul B Mahol 
> > Date: Wed, 17 May 2023 09:14:56 +0200
> > Subject: [PATCH 21/27] avfilter/af_apad: switch to activate
> >
> > Signed-off-by: Paul B Mahol 
> > ---
> >  libavfilter/af_apad.c | 105 --
> >  1 file changed, 71 insertions(+), 34 deletions(-)
>
> The commit message for such commits should mention why is this better,
> because it is not obvious. For one thing, the new code is significantly
> longer.
>




>
> --
> Anton Khirnov
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate

2023-05-27 Thread Anton Khirnov
Quoting Paul B Mahol (2023-05-18 19:48:47)
> Attached.
> 
> From c06d3d9f6020fdda19641637cafa6f86c77b4b73 Mon Sep 17 00:00:00 2001
> From: Paul B Mahol 
> Date: Wed, 17 May 2023 09:14:56 +0200
> Subject: [PATCH 21/27] avfilter/af_apad: switch to activate
> 
> Signed-off-by: Paul B Mahol 
> ---
>  libavfilter/af_apad.c | 105 --
>  1 file changed, 71 insertions(+), 34 deletions(-)

The commit message for such commits should mention why is this better,
because it is not obvious. For one thing, the new code is significantly
longer.

-- 
Anton Khirnov
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v3] decklink: Convert to using avpriv_packet_list functions

2023-05-27 Thread Devin Heitmueller
Hello Marton,

On Fri, May 12, 2023 at 11:09 AM Devin Heitmueller
 wrote:
>
> The existing DecklinkQueue implementation was using the PacketList
> structure but wasn't using the standard avpriv_packet_list_get and
> avpriv_packet_list_put functions.  Convert to using them so we
> eliminate the duplicate logic, per Marton Balint's suggestion.
>
> Updated to reflect feedback from Marton Balint provided on 05/11/23.
>
> Signed-off-by: Devin Heitmueller 

I believe this version of the patch incorporates the changes you
requested.  Did you have any further comments before it can be merged?

Thanks,

Devin

-- 
Devin Heitmueller, Senior Software Engineer
LTN Global Communications
o: +1 (301) 363-1001
w: https://ltnglobal.com  e: devin.heitmuel...@ltnglobal.com
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avfilter/af_apad: switch to activate

2023-05-27 Thread Paul B Mahol
Will apply.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] nellymoser: user float literals for table

2023-05-27 Thread Tristan Matthews
On Sat, May 27, 2023 at 8:17 AM Andreas Rheinhardt
 wrote:
>
> Tristan Matthews:
> > ---
> >  libavcodec/nellymoser.c | 36 ++--
> >  1 file changed, 18 insertions(+), 18 deletions(-)
> >
> > diff --git a/libavcodec/nellymoser.c b/libavcodec/nellymoser.c
> > index 66c5f83a56..fa22b79909 100644
> > --- a/libavcodec/nellymoser.c
> > +++ b/libavcodec/nellymoser.c
> > @@ -39,30 +39,30 @@
> >  #include "nellymoser.h"
> >
> >  const float ff_nelly_dequantization_table[127] = {
> > - 0.00,
> > + 0.00f,
> >
> > --0.8472560048, 0.7224709988,
> > +-0.8472560048f, 0.7224709988f,
> >
> > --1.5247479677,-0.4531480074, 0.3753609955, 1.4717899561,
> > +-1.5247479677f,-0.4531480074f, 0.3753609955f, 1.4717899561f,
> >
> > --1.9822579622,-1.1929379702,-0.5829370022,-0.0693780035, 0.3909569979, 
> > 0.9069200158, 1.4862740040, 2.2215409279,
> > +-1.9822579622f,-1.1929379702f,-0.5829370022f,-0.0693780035f, 
> > 0.3909569979f, 0.9069200158f, 1.4862740040f, 2.2215409279f,
> >
> > --2.3887870312,-1.8067539930,-1.4105420113,-1.0773609877,-0.7995010018,-0.5558109879,-0.3334020078,-0.1324490011,
> > - 0.0568020009, 0.2548770010, 0.4773550034, 0.7386850119, 1.0443060398, 
> > 1.3954459429, 1.8098750114, 2.3918759823,
> > +-2.3887870312f,-1.8067539930f,-1.4105420113f,-1.0773609877f,-0.7995010018f,-0.5558109879f,-0.3334020078f,-0.1324490011f,
> > + 0.0568020009f, 0.2548770010f, 0.4773550034f, 0.7386850119f, 
> > 1.0443060398f, 1.3954459429f, 1.8098750114f, 2.3918759823f,
> >
> > --2.3893830776,-1.9884680510,-1.7514040470,-1.5643119812,-1.3922129869,-1.2164649963,-1.0469499826,-0.8905100226,
> > --0.7645580173,-0.6454579830,-0.5259280205,-0.4059549868,-0.3029719889,-0.2096900046,-0.1239869967,-0.0479229987,
> > - 0.025773, 0.1001340002, 0.1737180054, 0.2585540116, 0.3522900045, 
> > 0.4569880068, 0.5767750144, 0.7003160119,
> > - 0.8425520062, 1.0093879700, 1.1821349859, 1.3534560204, 1.5320819616, 
> > 1.7332619429, 1.9722349644, 2.3978140354,
> > +-2.3893830776f,-1.9884680510f,-1.7514040470f,-1.5643119812f,-1.3922129869f,-1.2164649963f,-1.0469499826f,-0.8905100226f,
> > +-0.7645580173f,-0.6454579830f,-0.5259280205f,-0.4059549868f,-0.3029719889f,-0.2096900046f,-0.1239869967f,-0.0479229987f,
> > + 0.025773f, 0.1001340002f, 0.1737180054f, 0.2585540116f, 
> > 0.3522900045f, 0.4569880068f, 0.5767750144f, 0.7003160119f,
> > + 0.8425520062f, 1.0093879700f, 1.1821349859f, 1.3534560204f, 
> > 1.5320819616f, 1.7332619429f, 1.9722349644f, 2.3978140354f,
> >
> > --2.5756309032,-2.0573320389,-1.8984919786,-1.7727810144,-1.6662600040,-1.5742180347,-1.4993319511,-1.4316639900,
> > --1.3652280569,-1.3000990152,-1.2280930281,-1.1588579416,-1.0921250582,-1.0135740042,-0.9202849865,-0.8287050128,
> > --0.7374889851,-0.6447759867,-0.5590940118,-0.4857139885,-0.4110319912,-0.3459700048,-0.2851159871,-0.2341620028,
> > --0.1870580018,-0.1442500055,-0.1107169986,-0.0739680007,-0.0365610011,-0.0073290002,
> >  0.0203610007, 0.0479039997,
> > - 0.0751969963, 0.098091, 0.1220389977, 0.145862, 0.1694349945, 
> > 0.1970459968, 0.2252430022, 0.2556869984,
> > - 0.2870100141, 0.3197099864, 0.3525829911, 0.3889069855, 0.4334920049, 
> > 0.4769459963, 0.5204820037, 0.5644530058,
> > - 0.6122040153, 0.6685929894, 0.7341650128, 0.8032159805, 0.8784040213, 
> > 0.9566209912, 1.0397069454, 1.1293770075,
> > - 1.2211159468, 1.3080279827, 1.4024800062, 1.5056819916, 1.6227730513, 
> > 1.7724959850, 1.9430880547, 2.2903931141
> > +-2.5756309032f,-2.0573320389f,-1.8984919786f,-1.7727810144f,-1.6662600040f,-1.5742180347f,-1.4993319511f,-1.4316639900f,
> > +-1.3652280569f,-1.3000990152f,-1.2280930281f,-1.1588579416f,-1.0921250582f,-1.0135740042f,-0.9202849865f,-0.8287050128f,
> > +-0.7374889851f,-0.6447759867f,-0.5590940118f,-0.4857139885f,-0.4110319912f,-0.3459700048f,-0.2851159871f,-0.2341620028f,
> > +-0.1870580018f,-0.1442500055f,-0.1107169986f,-0.0739680007f,-0.0365610011f,-0.0073290002f,
> >  0.0203610007f, 0.0479039997f,
> > + 0.0751969963f, 0.098091f, 0.1220389977f, 0.145862f, 
> > 0.1694349945f, 0.1970459968f, 0.2252430022f, 0.2556869984f,
> > + 0.2870100141f, 0.3197099864f, 0.3525829911f, 0.3889069855f, 
> > 0.4334920049f, 0.4769459963f, 0.5204820037f, 0.5644530058f,
> > + 0.6122040153f, 0.6685929894f, 0.7341650128f, 0.8032159805f, 
> > 0.8784040213f, 0.9566209912f, 1.0397069454f, 1.1293770075f,
> > + 1.2211159468f, 1.3080279827f, 1.4024800062f, 1.5056819916f, 
> > 1.6227730513f, 1.7724959850f, 1.9430880547f, 2.2903931141f
> >  };
> >
> >  const uint8_t ff_nelly_band_sizes_table[NELLY_BANDS] = {
>
> What's the point of this? Has some compiler started emitting warnings
> about the lossy double->float conversions?

No, just my eyeballs. There are other places in the encoder/decoder
with double to float conversions but changing these changed the bias
of some expressions (e.g., the initialization of pow_table) so I
thought I'd start with this first unless there was no interest.

Best,
-t

>
> - 

Re: [FFmpeg-devel] [PATCH] nellymoser: user float literals for table

2023-05-27 Thread Andreas Rheinhardt
Tristan Matthews:
> ---
>  libavcodec/nellymoser.c | 36 ++--
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/libavcodec/nellymoser.c b/libavcodec/nellymoser.c
> index 66c5f83a56..fa22b79909 100644
> --- a/libavcodec/nellymoser.c
> +++ b/libavcodec/nellymoser.c
> @@ -39,30 +39,30 @@
>  #include "nellymoser.h"
>  
>  const float ff_nelly_dequantization_table[127] = {
> - 0.00,
> + 0.00f,
>  
> --0.8472560048, 0.7224709988,
> +-0.8472560048f, 0.7224709988f,
>  
> --1.5247479677,-0.4531480074, 0.3753609955, 1.4717899561,
> +-1.5247479677f,-0.4531480074f, 0.3753609955f, 1.4717899561f,
>  
> --1.9822579622,-1.1929379702,-0.5829370022,-0.0693780035, 0.3909569979, 
> 0.9069200158, 1.4862740040, 2.2215409279,
> +-1.9822579622f,-1.1929379702f,-0.5829370022f,-0.0693780035f, 0.3909569979f, 
> 0.9069200158f, 1.4862740040f, 2.2215409279f,
>  
> --2.3887870312,-1.8067539930,-1.4105420113,-1.0773609877,-0.7995010018,-0.5558109879,-0.3334020078,-0.1324490011,
> - 0.0568020009, 0.2548770010, 0.4773550034, 0.7386850119, 1.0443060398, 
> 1.3954459429, 1.8098750114, 2.3918759823,
> +-2.3887870312f,-1.8067539930f,-1.4105420113f,-1.0773609877f,-0.7995010018f,-0.5558109879f,-0.3334020078f,-0.1324490011f,
> + 0.0568020009f, 0.2548770010f, 0.4773550034f, 0.7386850119f, 1.0443060398f, 
> 1.3954459429f, 1.8098750114f, 2.3918759823f,
>  
> --2.3893830776,-1.9884680510,-1.7514040470,-1.5643119812,-1.3922129869,-1.2164649963,-1.0469499826,-0.8905100226,
> --0.7645580173,-0.6454579830,-0.5259280205,-0.4059549868,-0.3029719889,-0.2096900046,-0.1239869967,-0.0479229987,
> - 0.025773, 0.1001340002, 0.1737180054, 0.2585540116, 0.3522900045, 
> 0.4569880068, 0.5767750144, 0.7003160119,
> - 0.8425520062, 1.0093879700, 1.1821349859, 1.3534560204, 1.5320819616, 
> 1.7332619429, 1.9722349644, 2.3978140354,
> +-2.3893830776f,-1.9884680510f,-1.7514040470f,-1.5643119812f,-1.3922129869f,-1.2164649963f,-1.0469499826f,-0.8905100226f,
> +-0.7645580173f,-0.6454579830f,-0.5259280205f,-0.4059549868f,-0.3029719889f,-0.2096900046f,-0.1239869967f,-0.0479229987f,
> + 0.025773f, 0.1001340002f, 0.1737180054f, 0.2585540116f, 0.3522900045f, 
> 0.4569880068f, 0.5767750144f, 0.7003160119f,
> + 0.8425520062f, 1.0093879700f, 1.1821349859f, 1.3534560204f, 1.5320819616f, 
> 1.7332619429f, 1.9722349644f, 2.3978140354f,
>  
> --2.5756309032,-2.0573320389,-1.8984919786,-1.7727810144,-1.6662600040,-1.5742180347,-1.4993319511,-1.4316639900,
> --1.3652280569,-1.3000990152,-1.2280930281,-1.1588579416,-1.0921250582,-1.0135740042,-0.9202849865,-0.8287050128,
> --0.7374889851,-0.6447759867,-0.5590940118,-0.4857139885,-0.4110319912,-0.3459700048,-0.2851159871,-0.2341620028,
> --0.1870580018,-0.1442500055,-0.1107169986,-0.0739680007,-0.0365610011,-0.0073290002,
>  0.0203610007, 0.0479039997,
> - 0.0751969963, 0.098091, 0.1220389977, 0.145862, 0.1694349945, 
> 0.1970459968, 0.2252430022, 0.2556869984,
> - 0.2870100141, 0.3197099864, 0.3525829911, 0.3889069855, 0.4334920049, 
> 0.4769459963, 0.5204820037, 0.5644530058,
> - 0.6122040153, 0.6685929894, 0.7341650128, 0.8032159805, 0.8784040213, 
> 0.9566209912, 1.0397069454, 1.1293770075,
> - 1.2211159468, 1.3080279827, 1.4024800062, 1.5056819916, 1.6227730513, 
> 1.7724959850, 1.9430880547, 2.2903931141
> +-2.5756309032f,-2.0573320389f,-1.8984919786f,-1.7727810144f,-1.6662600040f,-1.5742180347f,-1.4993319511f,-1.4316639900f,
> +-1.3652280569f,-1.3000990152f,-1.2280930281f,-1.1588579416f,-1.0921250582f,-1.0135740042f,-0.9202849865f,-0.8287050128f,
> +-0.7374889851f,-0.6447759867f,-0.5590940118f,-0.4857139885f,-0.4110319912f,-0.3459700048f,-0.2851159871f,-0.2341620028f,
> +-0.1870580018f,-0.1442500055f,-0.1107169986f,-0.0739680007f,-0.0365610011f,-0.0073290002f,
>  0.0203610007f, 0.0479039997f,
> + 0.0751969963f, 0.098091f, 0.1220389977f, 0.145862f, 0.1694349945f, 
> 0.1970459968f, 0.2252430022f, 0.2556869984f,
> + 0.2870100141f, 0.3197099864f, 0.3525829911f, 0.3889069855f, 0.4334920049f, 
> 0.4769459963f, 0.5204820037f, 0.5644530058f,
> + 0.6122040153f, 0.6685929894f, 0.7341650128f, 0.8032159805f, 0.8784040213f, 
> 0.9566209912f, 1.0397069454f, 1.1293770075f,
> + 1.2211159468f, 1.3080279827f, 1.4024800062f, 1.5056819916f, 1.6227730513f, 
> 1.7724959850f, 1.9430880547f, 2.2903931141f
>  };
>  
>  const uint8_t ff_nelly_band_sizes_table[NELLY_BANDS] = {

What's the point of this? Has some compiler started emitting warnings
about the lossy double->float conversions?

- Andreas

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] fate: add adpcm_ima_cunning tests

2023-05-27 Thread Zane van Iperen




On 27/5/23 19:14, Anton Khirnov wrote:

Hi Zane,
Quoting Zane van Iperen (2020-05-09 16:00:04)

diff --git a/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 
b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1
new file mode 100644
index 00..df9edc403d
--- /dev/null
+++ b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1
@@ -0,0 +1 @@
+d41d8cd98f00b204e9800998ecf8427e


This test seems to produce an empty file. Is that intended?



Responded on IRC, posting here for posterity:

Yes, that test is meant to produce an empty file. The channels are planar - the 
input file has missing data for the second one.
See https://0x0.st/HqUw.png
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] fate: add adpcm_ima_cunning tests

2023-05-27 Thread Anton Khirnov
Hi Zane,
Quoting Zane van Iperen (2020-05-09 16:00:04)
> diff --git a/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1 
> b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1
> new file mode 100644
> index 00..df9edc403d
> --- /dev/null
> +++ b/tests/ref/fate/adpcm-ima-cunning-trunc-t2-track1
> @@ -0,0 +1 @@
> +d41d8cd98f00b204e9800998ecf8427e

This test seems to produce an empty file. Is that intended?

-- 
Anton Khirnov
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)

2023-05-27 Thread Rémi Denis-Courmont
Le perjantaina 26. toukokuuta 2023, 11.03.14 EEST Martin Storsjö a écrit :
> Based on code by Janne Grunau.
> 
> Using HWCAP_CPUID for user space access to the CPU feature registers. See
> https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html.
> ---
>  configure   |  2 ++
>  libavutil/aarch64/cpu.c | 38 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/configure b/configure
> index 3c7473efb2..b5357b8d27 100755
> --- a/configure
> +++ b/configure
> @@ -2207,6 +2207,7 @@ HAVE_LIST_PUB="
> 
>  HEADERS_LIST="
>  arpa_inet_h
> +asm_hwcap_h
>  asm_types_h
>  cdio_paranoia_h
>  cdio_paranoia_paranoia_h
> @@ -6422,6 +6423,7 @@ check_headers io.h
>  enabled libdrm &&
>  check_headers linux/dma-buf.h
> 
> +check_headers asm/hwcap.h
>  check_headers linux/perf_event.h
>  check_headers libcrystalhd/libcrystalhd_if.h
>  check_headers malloc.h
> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
> index 42b33e4a2d..34c838c2f5 100644
> --- a/libavutil/aarch64/cpu.c
> +++ b/libavutil/aarch64/cpu.c
> @@ -20,6 +20,42 @@
>  #include "libavutil/cpu_internal.h"
>  #include "config.h"
> 
> +#if (defined(__linux__) || defined(__ANDROID__)) && HAVE_GETAUXVAL &&
> HAVE_ASM_HWCAP_H +#include 
> +#include 
> +#include 
> +
> +#define get_cpu_feature_reg(reg, val) \
> +__asm__("mrs %0, " #reg : "=r" (val))

Strictly speaking, this can read any system register. One way to prevent that 
would be to include the ID_ prefix and _EL1 suffix in the macro. I would have 
used a pure static inline instead, but that's just a matter of taste.

> +
> +static int detect_flags(void)
> +{
> +unsigned long ret = getauxval(AT_HWCAP);
> +int flags = 0;
> +#if defined(HWCAP_CPUID)
> +uint64_t tmp;
> +if (!(ret & HWCAP_CPUID))
> +return flags;
> +get_cpu_feature_reg(ID_AA64ISAR0_EL1, tmp);
> +if (((tmp >> 44) & 0xf) == 0x1)
> +flags |= AV_CPU_FLAG_DOTPROD;
> +get_cpu_feature_reg(ID_AA64ISAR1_EL1, tmp);
> +if (((tmp >> 52) & 0xf) == 0x1)
> +flags |= AV_CPU_FLAG_I8MM;
> +#endif

NEON detection could be added here, though I've yet to see an Armv8 
implementation without AdvSIMD.

FWIW, DotProd is exposed as HWCAP_ASIMDDP and I8MM is exposed via HWCAP2_I8MM, 
using trapped ID registers is not (yet) necessary.

> +
> +return flags;
> +}
> +
> +#else
> +
> +static int detect_flags(void)
> +{
> +return 0;
> +}
> +
> +#endif
> +
>  int ff_get_cpu_flags_aarch64(void)
>  {
>  int flags = AV_CPU_FLAG_ARMV8 * HAVE_ARMV8 |
> @@ -33,6 +69,8 @@ int ff_get_cpu_flags_aarch64(void)
>  flags |= AV_CPU_FLAG_I8MM;
>  #endif
> 
> +flags |= detect_flags();
> +
>  return flags;
>  }


-- 
レミ・デニ-クールモン
http://www.remlab.net/



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-05-27 Thread Rémi Denis-Courmont
Le perjantaina 26. toukokuuta 2023, 11.03.12 EEST Martin Storsjö a écrit :
> These are available since ARMv8.4-a and ARMv8.6-a respectively,
> but can also be available optionally since ARMv8.2-a.
> 
> Check if these are available for use unconditionally (e.g. if compiling
> with -march=armv8.6-a), or if they can be enabled with specific
> assembler directives.
> 
> Use ".arch_extension " for enabling a specific extension in
> assembly; the same can also be achieved with ".arch armv8.2-a+",
> but with .arch_extension is easier to combine multiple separate
> features.
> 
> Enabling these extensions requires setting a base architecture level
> of armv8.2-a with .arch. Don't add ".arch armv8.2-a" unless necessary;
> if the base level is high enough (which might unlock other extensions
> without .arch_extension), we don't want to lower it.

I don't follow how that would actually happen, TBH. Even if the default target 
version is, say, 8.5, the assembler won't magically start emitting 8.5 
instructions.

Someone would have to write assembler code that would fail to build under a 
toolchain with a lower target version. That sounds like a bug that should be 
spotted and fixed, rather than papered over. Conversely the logic here seems 
unnecessarily, if not counter-productively, intricate.

> Only add .arch/.arch_extension if needed, e.g. current clang fails
> to recognize the dotprod and i8mm features in .arch_extension, but
> can successfully assemble these instructions if part of the baseline
> set with -march.

IME, Clang is utterly useless for assembling. That has become one of my pet 
peeves with Rust inline assembler, which is other much nicer than C inline 
assembler, whence you can't just work around it with `-no-integrated-as`. But 
I digress.

If the problem is to avoid `.arch_extension`, then I don't really see why you 
can't just use `.arch` with plus, and simplify a lot.


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-05-27 Thread myais

Hi, Martin,

I saw your new opinions. Do you mean that the code of my current patch 
should be guard as follows?


C code:

/if (have_i8mm(cpu_flags)) {//
//}/

/asm code :/

/#if HAVE_I8MM/

/#endif/


I mean my current code base does not have those definitions, should I 
implement them directly as if they already exist?



other opinions is under modification..


在 2023/5/26 16:34, Martin Storsjö 写道:

Hi,

Overall these patches seem mostly ok, but I've got a few minor points 
to make:


- The usdot instruction requires the i8mm extension (part of 
armv8.6-a), while udot or sdot would require the dotprod extension 
(available in armv8.4-a). If you could manage with udot or sdot, these 
functions would be usable on a wider set of CPUs.


Therefore, the current guards are wrong. Also, I finally got support 
implemented for optionally using these cpu extensions, even if the 
baseline of the compile don't include it, by runtime enabling it. See 
the patchset at 
https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009.


To adapt your patches on top of this, see the two topmost commits at 
https://github.com/mstorsjo/ffmpeg/commits/archext.


- The indentation is inconsistent; in the first patch, you have some 
instructions written like this:


+    sqadd   v1.4s, v1.4s, v29.4s

While you later use this style:

+    dup v1.16b, v28.b[1]

The latter seems to match the style we commonly use; please reformat 
your code to match that consistently.


With some macro invocations in the first patch, you also seem to have 
too much indentation in some places. See e.g. this:


+1:  ldr q23, [x2, x3]
+    add x2, x2, x3, lsl #1
+    QPEL_FILTER_B v26, v16, v17, v18, v19, v20, v21, v22, v23
+    QPEL_FILTER_B2    v27, v16, v17, v18, v19, v20, v21, v22, v23
+    QPEL_UNI_W_V_16
+    subs    w4, w4, #1
+    b.eq    2f

(If the macro name is too long, that's ok, but here there's no need to 
have those lines unaligned.)


- In the third patch, you've got multiple parameters from the stack 
like this:


+    ldp x14, x15, [sp]  // mx, my
+    ldr w13, [sp, #16]  // width

I see that the mx an my parameters are intptr_t; that's good, since if 
they would be 32 bit integers, the ABI for such parameters on the 
stack differ between macOS/Darwin and Linux. But as long as they're 
intptr_t they behave the same.


- At the same place, you're backing up a bunch of registers:

+    stp x20, x21, [sp, #-16]!
+    stp x22, x23, [sp, #-16]!
+    stp x24, x25, [sp, #-16]!
+    stp x26, x27, [sp, #-16]!
+    stp x28, x30, [sp, #-16]!

This is inefficient; instead, do this:

+    stp x28, x30, [sp, #-80]!
+    stp x20, x21, [sp, #16]
+    stp x22, x23, [sp, #32]
+    stp x24, x25, [sp, #48]
+    stp x26, x27, [sp, #64]

Also, following that, I see that you back up the stack pointer in x28. 
Why do you use x28 for that? Using x29 would be customary as frame 
pointer.


Aside for that, I think the rest of the patches is acceptable.

// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-05-27 Thread myais

yes, of course,   Re-add the missing mailing list...


在 2023/5/27 13:45, Martin Storsjö 写道:

Hi,

Thanks - can you send the new patches to the mailing list too? They 
need to be available publicly for review before they can be accepted. 
(I didn't check these yet.)


// Martin

On Sat, 27 May 2023, myais wrote:


Hi, Martin,

Thank you for your correction, and I completed the modification 
according to your opinion, the attachments are the new patches.



Thanks.


在 2023/5/24 20:49, Martin Storsjö 写道:

Hi,

On Tue, 23 May 2023, myais wrote:


Do you have any new opinions here? I am looking forward to your reply.


I've started looking at this now after focusing on a different issue 
first.


The big thing is that this is the first new optional instruction set 
on top of aarch64, so there's a bit of work to do to handle that 
properly (with runtime detection, and assembling that code even if 
the baseline target doesn't support it). I've started looking into 
that now.


In your case your patches don't care about that and just hardcode 
enabling it if the compiler baseline support the instruction, and 
skips it otherwise. I guess that's possibly fine, but your condition 
for the code is wrong; the "usdot" instruction requires the "i8mm" 
extension, not "dotprod". i8mm is part of armv8.6-a (and is 
available on graviton 3, luckily, which allows me to test it).


So instead of __ARM_FEATURE_DOTPROD, this should use 
__ARM_FEATURE_MATMUL_INT8, and the functions should probably use 
i8mm as suffix instead of dotprod. I guess you can resubmit them 
with that change (and make sure you don't end up with the "no 
trailing newline at the end of file" issue in the changed files in 
any intermediate commit).


In addition, I have some new similar patches, which are the aarch64 
implementations of some other functions, should I wait for your 
feedback before submitting or submit it directly?


I'd prefer to settle these patches first before taking on the next set.

// Martin
From b08c4e571a49de109a0619fb9de9461af4901115 Mon Sep 17 00:00:00 2001
From: myais 
Date: Wed, 3 May 2023 09:53:07 +0800
Subject: [PATCH 1/3] lavc/aarch64: new optimization for 8-bit
 hevc_pel_uni_w_pixels and qpel_uni_w_v

---
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  51 ++
 libavcodec/aarch64/hevcdsp_qpel_neon.S| 710 ++
 2 files changed, 761 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c 
b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index be1049a2ec..6b5341dd45 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -128,6 +128,52 @@ void ff_hevc_put_hevc_qpel_bi_h16_8_neon(uint8_t *_dst, 
ptrdiff_t _dststride, co
  ptrdiff_t _srcstride, const int16_t 
*src2, int height, intptr_t
  mx, intptr_t my, int width);
 
+#define NEON8_FNPROTO(fn, args, ext) \
+void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##6_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##12_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##24_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##32_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##48_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##64_8_neon##ext args; \
+
+#define NEON8_FNPROTO_PARTIAL_4(fn, args, ext) \
+void ff_hevc_put_hevc_##fn##4_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##8_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##16_8_neon##ext args; \
+void ff_hevc_put_hevc_##fn##64_8_neon##ext args; \
+
+NEON8_FNPROTO(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
+const uint8_t *_src, ptrdiff_t _srcstride,
+int height, int denom, int wx, int ox,
+intptr_t mx, intptr_t my, int width),);
+
+NEON8_FNPROTO_PARTIAL_4(qpel_uni_w_v, (uint8_t *_dst,  ptrdiff_t _dststride,
+const uint8_t *_src, ptrdiff_t _srcstride,
+int height, int denom, int wx, int ox,
+intptr_t mx, intptr_t my, int width),);
+
+#define NEON8_FNASSIGN(member, v, h, fn, ext) \
+member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext;  \
+member[2][v][h] = ff_hevc_put_hevc_##fn##6_8_neon##ext;  \
+member[3][v][h] = ff_hevc_put_hevc_##fn##8_8_neon##ext;  \
+member[4][v][h] = ff_hevc_put_hevc_##fn##12_8_neon##ext; \
+member[5][v][h] = ff_hevc_put_hevc_##fn##16_8_neon##ext; \
+member[6][v][h] = ff_hevc_put_hevc_##fn##24_8_neon##ext; \
+member[7][v][h] = ff_hevc_put_hevc_##fn##32_8_neon##ext; \
+member[8][v][h] = ff_hevc_put_hevc_##fn##48_8_neon##ext; \
+member[9][v][h] = ff_hevc_put_hevc_##fn##64_8_neon##ext;
+
+#define NEON8_FNASSIGN_PARTIAL_4(member, v, h, fn, ext) \
+member[1][v][h] = ff_hevc_put_hevc_##fn##4_8_neon##ext;  \
+member[3][v][h] =