[FFmpeg-devel] [PATCH 0/3] Provide neon implementations

2022-09-20 Thread Hubert Mazur
. Hubert Mazur (3): lavc/aarch64: Add neon implementation for pix_median_abs16 lavc/aarch64: Add neon implementation for vsad8_intra lavc/aarch64: Add neon implementation for pix_median_abs8 libavcodec/aarch64/me_cmp_init_aarch64.c | 10 ++ libavcodec/aarch64/me_cmp_neon.S | 182

[FFmpeg-devel] [PATCH 1/3] lavc/aarch64: Add neon implementation for pix_median_abs16

2022-09-20 Thread Hubert Mazur
Provide optimized implementation for pix_median_abs16 function. Performance comparison tests are shown below. - median_sad_0_c: 720.5 - median_sad_0_neon: 127.2 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64

[FFmpeg-devel] [PATCH 2/3] lavc/aarch64: Add neon implementation for vsad8_intra

2022-09-20 Thread Hubert Mazur
Provide optimized implementation for vsad8_intra function. Performance comparison tests are shown below. - vsad_5_c: 94.7 - vsad_5_neon: 20.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64.c | 3

[FFmpeg-devel] [PATCH 3/3] lavc/aarch64: Add neon implementation for pix_median_abs8

2022-09-20 Thread Hubert Mazur
Provide optimized implementation for pix_median_abs8 function. Performance comparison tests are shown below. - median_sad_1_c: 277.0 - median_sad_1_neon: 82.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64

Re: [FFmpeg-devel] [PATCH 1/2] aarch64: me_cmp: Avoid redundant loads in ff_pix_abs16_y2_neon

2022-09-28 Thread Hubert Mazur
LGTM, thanks! On Wed, Sep 28, 2022 at 11:13 AM Martin Storsjö wrote: > This avoids one redundant load per row; pix3 from the previous > iteration can be used as pix2 in the next one. > > Before: Cortex A53 A72 A73 > pix_abs_0_2_neon: 138.0 59.7 48.0 > After: > pix_abs_0_2_neon:
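
The load-reuse idea described in the quoted patch can be illustrated in scalar C. This is only a sketch under assumptions, not the NEON code from the patch: the function names sad16_y2_naive and sad16_y2_reuse are hypothetical, the rounded average (a + b + 1) >> 1 is assumed to be the half-pel interpolation in use, and small local arrays stand in for vector registers. The first version loads two rows of pix2 per iteration; the second loads only the new lower row and carries it over as the upper row of the next iteration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Rounded average of two pixel values (assumed half-pel interpolation). */
static inline int avg2(int a, int b) { return (a + b + 1) >> 1; }

/* Hypothetical baseline: reads the current row and the row below it
 * from memory on every iteration, i.e. two row loads of pix2 per row. */
int sad16_y2_naive(const uint8_t *pix1, const uint8_t *pix2,
                   ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        const uint8_t *cur   = pix2 + y * stride;
        const uint8_t *below = cur + stride;
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[y * stride + x] - avg2(cur[x], below[x]));
    }
    return sum;
}

/* Hypothetical reuse variant: the row loaded as "lower" in one iteration
 * becomes "upper" in the next, so only one new row of pix2 is loaded per
 * iteration. The local arrays stand in for vector registers. */
int sad16_y2_reuse(const uint8_t *pix1, const uint8_t *pix2,
                   ptrdiff_t stride, int h)
{
    uint8_t upper[16], lower[16];
    int sum = 0;

    memcpy(upper, pix2, 16);                        /* initial load */
    for (int y = 0; y < h; y++) {
        memcpy(lower, pix2 + (y + 1) * stride, 16); /* the only new load */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[y * stride + x] - avg2(upper[x], lower[x]));
        memcpy(upper, lower, 16);                   /* carry over to next row */
    }
    return sum;
}
```

Both functions read h + 1 rows of pix2 in total and should return the same sum; the difference is only in how often each row is fetched from memory.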

Re: [FFmpeg-devel] [PATCH 2/2] aarch64: me_cmp: Avoid using the non-unrolled codepath for the minimum unroll size

2022-09-28 Thread Hubert Mazur
LGTM. On Wed, Sep 28, 2022 at 11:13 AM Martin Storsjö wrote: > Signed-off-by: Martin Storsjö > --- > libavcodec/aarch64/me_cmp_neon.S | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/libavcodec/aarch64/me_cmp_neon.S > b/libavcodec/aarch64/me_cmp_neon.S > index 832

[FFmpeg-devel] [PATCH 0/4] Provide neon implementations for hscale functions

2022-10-17 Thread Hubert Mazur
Provide arm64 neon optimized functions from swscale family. Hubert Mazur (4): sw_scale: Add specializations for hscale 8 to 19 tests/sw_scale: Add test cases for input sizes 16 sw_scale: Add specializations for hscale 16 to 15 sw_scale: Add specializations for hscale 16 to 19 libswscale

[FFmpeg-devel] [PATCH 1/4] sw_scale: Add specializations for hscale 8 to 19

2022-10-17 Thread Hubert Mazur
: 32803.7 hscale_8_to_19__fs_32_dstW_512_neon: 5474.2 hscale_8_to_19__fs_40_dstW_512_c: 40948.0 hscale_8_to_19__fs_40_dstW_512_neon: 6669.7 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 292 ++- libswscale/aarch64/swscale.c | 13 +- 2 files changed

[FFmpeg-devel] [PATCH 2/4] tests/sw_scale: Add test cases for input sizes 16

2022-10-17 Thread Hubert Mazur
Previously test cases handled only input sizes equal to 8. Add support for input size 16 which is used by scaling routines hscale16To15 and hscale16To19. Pass SwsContext pointer to each function as some of them make use of it. Signed-off-by: Hubert Mazur --- tests/checkasm/sw_scale.c | 35

[FFmpeg-devel] [PATCH 3/4] sw_scale: Add specializations for hscale 16 to 15

2022-10-17 Thread Hubert Mazur
hscale_16_to_15__fs_32_dstW_512_neon: 9511.2 hscale_16_to_15__fs_40_dstW_512_c: 48995.7 hscale_16_to_15__fs_40_dstW_512_neon: 11570.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 409 ++- libswscale/aarch64/swscale.c | 66 +- libswscale/swscale.c

[FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19

2022-10-17 Thread Hubert Mazur
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 402 +++ libswscale/aarch64/swscale.c | 70 +- 2 files changed, 471

Re: [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19

2022-10-25 Thread Hubert Mazur
Thanks for the review. I will fix the failing checkasm first and then take care of the minor issues. I will try to resend fixed versions this week. Regards, Hubert On Mon, Oct 24, 2022 at 3:19 PM Martin Storsjö wrote: > On Mon, 17 Oct 2022, Hubert Mazur wrote: > > > Provid

[FFmpeg-devel] [PATCH 0/3] sw_scale: Provide neon implementation for hscale

2022-10-28 Thread Hubert Mazur
. After fixing x86 the patch for checkasm could be merged. https://patchwork.ffmpeg.org/project/ffmpeg/patch/20221017130715.30896-3-...@semihalf.com/ Hubert Mazur (3): sw_scale: Add specializations for hscale 8 to 19 sw_scale: Add specializations for hscale 16 to 15 sw_scale: Add

[FFmpeg-devel] [PATCH 1/3] sw_scale: Add specializations for hscale 8 to 19

2022-10-28 Thread Hubert Mazur
: 32803.7 hscale_8_to_19__fs_32_dstW_512_neon: 5474.2 hscale_8_to_19__fs_40_dstW_512_c: 40948.0 hscale_8_to_19__fs_40_dstW_512_neon: 6669.7 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 291 +++ libswscale/aarch64/swscale.c | 13 +- 2 files changed

[FFmpeg-devel] [PATCH 2/3] sw_scale: Add specializations for hscale 16 to 15

2022-10-28 Thread Hubert Mazur
hscale_16_to_15__fs_32_dstW_512_neon: 9511.2 hscale_16_to_15__fs_40_dstW_512_c: 48995.7 hscale_16_to_15__fs_40_dstW_512_neon: 11570.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 407 +++ libswscale/aarch64/swscale.c | 61 ++ libswscale/swscale.c

[FFmpeg-devel] [PATCH 3/3] sw_scale: Add specializations for hscale 16 to 19

2022-10-28 Thread Hubert Mazur
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 402 +++ libswscale/aarch64/swscale.c | 66 ++ 2 files changed, 468

[FFmpeg-devel] [PATCH 0/2] lavc/aarch64: Provide neon implementations

2022-06-29 Thread Hubert Mazur
Provide neon implementations for motion estimation functions. Hubert Mazur (2): lavc/aarch64: Assign callback with function lavc/aarch64: Add pix_abs16_x2 neon implementation libavcodec/aarch64/me_cmp_init_aarch64.c | 5 + libavcodec/aarch64/me_cmp_neon.S | 134

[FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function

2022-06-29 Thread Hubert Mazur
Assign c->sad[0] callback with already existing neon implementation of pix_abs16 function. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarc

[FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation

2022-06-29 Thread Hubert Mazur
Provide neon implementation for pix_abs16_x2 function. Performance tests of implementation are below. - pix_abs_0_1_c: 291.9 - pix_abs_0_1_neon: 73.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64.c | 3