Re: [FFmpeg-devel] [PATCH v3 0/5] avcodec/ac3: Add aarch64 NEON DSP
On Tue, 2 Apr 2024, Geoff Hill wrote:

> Here's v3 to push the AC-3 ARMv8 NEON experiment a step further. This
> version implements 5 of the AC-3 encoder DSP functions, and adds
> checkasm tests where missing. I've tested that the checkasm tests pass
> on aarch64 and x86.

Thanks, I've tested that checkasm also passes on 32-bit arm (where we
also have an ac3dsp implementation).

Overall the patches look mostly fine.

Are these implementations based on the existing 32-bit arm ones? The
code is quite similar (although there aren't very many different ways to
implement things, so this could be a coincidence). If they are based on
the existing code, it would be good to retain the copyright statement
from that file.

These functions are indented differently from essentially all the rest
of our aarch64 assembly (the code you're adding is aligned in two
different ways); please check other files (e.g. vp8dsp_neon.S) for
examples. The instructions should be aligned to 8 leading spaces, and
operands to 24 leading characters.

Other than those generic points, I have two comments on the patches
themselves.

// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
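As an illustration of the alignment rule stated in the review (instructions after 8 leading spaces, operands after 24 leading characters), here is a minimal C sketch that lays out one assembly line that way. `format_asm_line` is a hypothetical helper written for this note, not part of FFmpeg, and the 15-character mnemonic limit is an assumption that follows from the two column numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of FFmpeg): writes one assembly line with
 * 8 leading spaces before the mnemonic and 24 leading characters before
 * the operands. Returns the snprintf() result, i.e. the length the full
 * line needs.
 */
static int format_asm_line(char *out, size_t size,
                           const char *mnemonic, const char *operands)
{
    assert(out && mnemonic && operands);
    /* 8 spaces + 15-character mnemonic field + 1 separator = 24 characters. */
    assert(strlen(mnemonic) <= 15);
    return snprintf(out, size, "%8s%-15s %s", "", mnemonic, operands);
}
```

For example, `format_asm_line(buf, sizeof(buf), "ld1", "{v0.4s}, [x1], #16")` produces a line in which `ld1` starts at offset 8 and `{v0.4s}` at offset 24, counting from 0. The instruction itself is only an illustrative operand string, not taken from the patch.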
[FFmpeg-devel] [PATCH v3 0/5] avcodec/ac3: Add aarch64 NEON DSP
Here's v3 to push the AC-3 ARMv8 NEON experiment a step further.

This version implements 5 of the AC-3 encoder DSP functions, and adds
checkasm tests where missing. I've tested that the checkasm tests pass
on aarch64 and x86.

On AWS Graviton2 (t4g.medium), GCC 12.3:

$ tests/checkasm/checkasm --bench --verbose --test=ac3dsp
...
NEON:
 - ac3dsp.ac3_exponent_min [OK]
 - ac3dsp.ac3_extract_exponents [OK]
 - ac3dsp.float_to_fixed24 [OK]
 - ac3dsp.ac3_sum_square_butterfly_int32 [OK]
 - ac3dsp.ac3_sum_square_butterfly_float [OK]
checkasm: all 20 tests passed
ac3_exponent_min_reuse0_c: 9.0
ac3_exponent_min_reuse0_neon: 9.7
ac3_exponent_min_reuse1_c: 1037.5
ac3_exponent_min_reuse1_neon: 54.0
ac3_exponent_min_reuse2_c: 1820.7
ac3_exponent_min_reuse2_neon: 135.2
ac3_exponent_min_reuse3_c: 2080.5
ac3_exponent_min_reuse3_neon: 167.7
ac3_exponent_min_reuse4_c: 2493.2
ac3_exponent_min_reuse4_neon: 200.0
ac3_exponent_min_reuse5_c: 2970.0
ac3_exponent_min_reuse5_neon: 231.7
ac3_extract_exponents_n512_c: 1717.5
ac3_extract_exponents_n512_neon: 506.7
ac3_extract_exponents_n768_c: 2562.7
ac3_extract_exponents_n768_neon: 769.7
ac3_extract_exponents_n1024_c: 3389.2
ac3_extract_exponents_n1024_neon: 1019.0
ac3_extract_exponents_n1280_c: 4210.7
ac3_extract_exponents_n1280_neon: 1267.5
ac3_extract_exponents_n1536_c: 5071.5
ac3_extract_exponents_n1536_neon: 1522.0
ac3_extract_exponents_n1792_c: 5896.5
ac3_extract_exponents_n1792_neon: 1784.0
ac3_extract_exponents_n2048_c: 6779.2
ac3_extract_exponents_n2048_neon: 2051.0
ac3_extract_exponents_n2304_c: 7559.5
ac3_extract_exponents_n2304_neon: 2290.0
ac3_extract_exponents_n2560_c: 8397.2
ac3_extract_exponents_n2560_neon: 2552.5
ac3_extract_exponents_n2816_c: 9224.2
ac3_extract_exponents_n2816_neon: 2797.7
ac3_extract_exponents_n3072_c: 10026.2
ac3_extract_exponents_n3072_neon: 3047.7
ac3_sum_square_bufferfly_float_c: 1605.7
ac3_sum_square_bufferfly_float_neon: 365.7
ac3_sum_square_bufferfly_int32_c: 965.5
ac3_sum_square_bufferfly_int32_neon: 486.2
float_to_fixed24_c: 2453.7
float_to_fixed24_neon: 516.2

Geoff Hill (5):
  avcodec/ac3: Implement float_to_fixed24 for aarch64 NEON
  avcodec/ac3: Implement ac3_exponent_min for aarch64 NEON
  avcodec/ac3: Implement ac3_extract_exponents for aarch64 NEON
  avcodec/ac3: Implement sum_square_butterfly_int32 for aarch64 NEON
  avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON

 libavcodec/aarch64/Makefile              |   2 +
 libavcodec/aarch64/ac3dsp_init_aarch64.c |  50 +
 libavcodec/aarch64/ac3dsp_neon.S         | 125 ++
 libavcodec/ac3dsp.c                      |   4 +-
 libavcodec/ac3dsp.h                      |   3 +-
 tests/checkasm/ac3dsp.c                  | 130 +++
 6 files changed, 312 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/aarch64/ac3dsp_init_aarch64.c
 create mode 100644 libavcodec/aarch64/ac3dsp_neon.S

-- 
2.44.0
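The `_c` and `_neon` figures in the checkasm output come in pairs, so the gain is easier to read as a ratio. The following is a rough sketch that computes that ratio for a few pairs copied from the listing. The `bench_result` struct and the `speedup` and `print_speedups` functions are hypothetical names introduced here; they are not part of checkasm or of the patch set.

```c
#include <assert.h>
#include <stdio.h>

/* One pair of figures copied from the checkasm --bench listing. */
struct bench_result {
    const char *name;
    double c_time;    /* the *_c figure */
    double neon_time; /* the *_neon figure */
};

static const struct bench_result results[] = {
    { "ac3_exponent_min_reuse1",        1037.5,  54.0 },
    { "ac3_extract_exponents_n512",     1717.5, 506.7 },
    { "ac3_sum_square_bufferfly_float", 1605.7, 365.7 },
    { "ac3_sum_square_bufferfly_int32",  965.5, 486.2 },
    { "float_to_fixed24",               2453.7, 516.2 },
};

/* Ratio of the C figure to the NEON figure; above 1.0 means NEON is faster. */
static double speedup(double c_time, double neon_time)
{
    assert(neon_time > 0.0);
    return c_time / neon_time;
}

/* Prints one "name ratio" line per entry in the table above. */
static void print_speedups(FILE *out)
{
    size_t i;

    for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        fprintf(out, "%-32s %5.2fx\n", results[i].name,
                speedup(results[i].c_time, results[i].neon_time));
}
```

Calling `print_speedups(stdout)` lists the ratio for each selected pair. The `reuse0` pair (9.0 vs 9.7) is left out of the table; it is the one case in the listing where the NEON figure is not lower than the C one.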