Re: [FFmpeg-devel] [PATCH] aarch64: ac3dsp: Simplify the end of ff_ac3_sum_square_butterfly_float_neon

2024-04-09 Thread J. Dekker

Martin Storsjö writes:

> Before:                               Cortex A53     A72     A78
> ac3_sum_square_bufferfly_float_neon:      1005.7   516.5   224.5
> After:
> ac3_sum_square_bufferfly_float_neon:       981.7   504.5   223.2
> ---
>  libavcodec/aarch64/ac3dsp_neon.S | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
> index 20beb6cc50..7e97cc39f7 100644
> --- a/libavcodec/aarch64/ac3dsp_neon.S
> +++ b/libavcodec/aarch64/ac3dsp_neon.S
> @@ -103,17 +103,9 @@ function ff_ac3_sum_square_butterfly_float_neon, export=1
>          fmla            v3.4s, v17.4s, v17.4s
>          subs            w3, w3, #4
>          b.gt            1b
> -        faddp           v0.4s, v0.4s, v0.4s
> -        faddp           v0.2s, v0.2s, v0.2s
> -        st1             {v0.s}[0], [x0], #4
> -        faddp           v1.4s, v1.4s, v1.4s
> -        faddp           v1.2s, v1.2s, v1.2s
> -        st1             {v1.s}[0], [x0], #4
> -        faddp           v2.4s, v2.4s, v2.4s
> -        faddp           v2.2s, v2.2s, v2.2s
> -        st1             {v2.s}[0], [x0], #4
> -        faddp           v3.4s, v3.4s, v3.4s
> -        faddp           v3.2s, v3.2s, v3.2s
> -        st1             {v3.s}[0], [x0]
> +        faddp           v0.4s, v0.4s, v1.4s
> +        faddp           v2.4s, v2.4s, v3.4s
> +        faddp           v0.4s, v0.4s, v2.4s
> +        st1             {v0.4s}, [x0]
>          ret
>  endfunc

Thanks, LGTM. Pushed with M1 benchmark on Linux.
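
For anyone reading along, here is a minimal scalar sketch of why the
three-instruction tail gives the same four sums as the old twelve-instruction
one. It is only a Python model of the lane behaviour of faddp, not FFmpeg code;
the helper names faddp, reduce_old and reduce_new are made up for illustration.
The model suggests that both versions add the lanes of each accumulator in the
same order, (a0+a1)+(a2+a3), so the results should match exactly rather than
just approximately.

```python
def faddp(n, m):
    """Model of the vector form of AArch64 `faddp Vd, Vn, Vm`.

    Adjacent pairs of the concatenation Vn:Vm are added, so for .4s
    operands the result is [n0+n1, n2+n3, m0+m1, m2+m3].
    """
    cat = list(n) + list(m)
    return [cat[i] + cat[i + 1] for i in range(0, len(cat), 2)]


def reduce_old(acc):
    """Old tail: reduce each accumulator on its own, then store lane 0."""
    out = []
    for v in acc:
        t = faddp(v, v)          # .4s: [v0+v1, v2+v3, v0+v1, v2+v3]
        t = faddp(t[:2], t[:2])  # .2s: [sum, sum]
        out.append(t[0])         # st1 {vN.s}[0], [x0], #4
    return out


def reduce_new(acc):
    """New tail: three pairwise adds across registers, one 4-lane store."""
    v0, v1, v2, v3 = acc
    v0 = faddp(v0, v1)           # [v0 low, v0 high, v1 low, v1 high]
    v2 = faddp(v2, v3)           # [v2 low, v2 high, v3 low, v3 high]
    return faddp(v0, v2)         # [sum(v0), sum(v1), sum(v2), sum(v3)]


# Four example accumulators standing in for v0..v3 after the loop.
acc = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [0.5, 0.25, 0.125, 0.0625],
    [-1.0, 1.0, -2.0, 2.0],
]
old_result = reduce_old(acc)
new_result = reduce_new(acc)
print(old_result, new_result)
```

The single st1 {v0.4s} then writes all four sums at once, which is where the
saved instructions come from.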

-- 
jd
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH] aarch64: ac3dsp: Simplify the end of ff_ac3_sum_square_butterfly_float_neon

2024-04-08 Thread Martin Storsjö
Before:                               Cortex A53     A72     A78
ac3_sum_square_bufferfly_float_neon:      1005.7   516.5   224.5
After:
ac3_sum_square_bufferfly_float_neon:       981.7   504.5   223.2
---
 libavcodec/aarch64/ac3dsp_neon.S | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
index 20beb6cc50..7e97cc39f7 100644
--- a/libavcodec/aarch64/ac3dsp_neon.S
+++ b/libavcodec/aarch64/ac3dsp_neon.S
@@ -103,17 +103,9 @@ function ff_ac3_sum_square_butterfly_float_neon, export=1
         fmla            v3.4s, v17.4s, v17.4s
         subs            w3, w3, #4
         b.gt            1b
-        faddp           v0.4s, v0.4s, v0.4s
-        faddp           v0.2s, v0.2s, v0.2s
-        st1             {v0.s}[0], [x0], #4
-        faddp           v1.4s, v1.4s, v1.4s
-        faddp           v1.2s, v1.2s, v1.2s
-        st1             {v1.s}[0], [x0], #4
-        faddp           v2.4s, v2.4s, v2.4s
-        faddp           v2.2s, v2.2s, v2.2s
-        st1             {v2.s}[0], [x0], #4
-        faddp           v3.4s, v3.4s, v3.4s
-        faddp           v3.2s, v3.2s, v3.2s
-        st1             {v3.s}[0], [x0]
+        faddp           v0.4s, v0.4s, v1.4s
+        faddp           v2.4s, v2.4s, v3.4s
+        faddp           v0.4s, v0.4s, v2.4s
+        st1             {v0.4s}, [x0]
         ret
 endfunc
-- 
2.39.3 (Apple Git-146)
