from:"Hubert Mazur"

[FFmpeg-devel] [PATCH 0/3] Provide neon implementations

2022-09-20 Thread Hubert Mazur

. Hubert Mazur (3): lavc/aarch64: Add neon implementation for pix_median_abs16 lavc/aarch64: Add neon implementation for vsad8_intra lavc/aarch64: Add neon implementation for pix_median_abs8 libavcodec/aarch64/me_cmp_init_aarch64.c | 10 ++ libavcodec/aarch64/me_cmp_neon.S | 182

[FFmpeg-devel] [PATCH 1/3] lavc/aarch64: Add neon implementation for pix_median_abs16

2022-09-20 Thread Hubert Mazur

Provide optimized implementation for pix_median_abs16 function. Performance comparison tests are shown below. - median_sad_0_c: 720.5 - median_sad_0_neon: 127.2 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64

[FFmpeg-devel] [PATCH 2/3] lavc/aarch64: Add neon implementation for vsad8_intra

2022-09-20 Thread Hubert Mazur

Provide optimized implementation for vsad8_intra function. Performance comparison tests are shown below. - vsad_5_c: 94.7 - vsad_5_neon: 20.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64.c | 3

[FFmpeg-devel] [PATCH 3/3] lavc/aarch64: Add neon implementation for pix_median_abs8

2022-09-20 Thread Hubert Mazur

Provide optimized implementation for pix_median_abs8 function. Performance comparison tests are shown below. - median_sad_1_c: 277.0 - median_sad_1_neon: 82.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64

Re: [FFmpeg-devel] [PATCH 1/2] aarch64: me_cmp: Avoid redundant loads in ff_pix_abs16_y2_neon

2022-09-28 Thread Hubert Mazur

LGTM, thanks! On Wed, Sep 28, 2022 at 11:13 AM Martin Storsjö wrote: > This avoids one redundant load per row; pix3 from the previous > iteration can be used as pix2 in the next one. > > Before: Cortex A53 A72 A73 > pix_abs_0_2_neon: 138.0 59.7 48.0 > After: > pix_abs_0_2_neon:
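
The row reuse described in the quoted patch can be illustrated in scalar C. This is only a sketch of the idea, not the NEON code from the patch: the function names `sad16_y2_naive` and `sad16_y2_reuse` are hypothetical, and the 16-pixel width and rounded average are assumptions based on the usual definition of a vertically interpolated SAD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Rounded average of two pixel values. */
static inline int avg2(int a, int b) { return (a + b + 1) >> 1; }

/* Baseline (hypothetical name): reads both the pix2 row and the
 * pix3 row (one stride below) from memory on every iteration. */
static int sad16_y2_naive(const uint8_t *pix1, const uint8_t *pix2,
                          ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride;
    int sum = 0;

    assert(h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg2(pix2[x], pix3[x]));
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}

/* Row-reuse variant (hypothetical name): the lower row of one
 * iteration is kept and becomes the upper row of the next, so only
 * one new reference row is read per iteration. */
static int sad16_y2_reuse(const uint8_t *pix1, const uint8_t *pix2,
                          ptrdiff_t stride, int h)
{
    uint8_t upper[16];
    int sum = 0;

    assert(h > 0);
    memcpy(upper, pix2, sizeof(upper));      /* first upper row */
    for (int y = 0; y < h; y++) {
        const uint8_t *lower = pix2 + (y + 1) * stride;
        for (int x = 0; x < 16; x++) {
            sum += abs(pix1[x] - avg2(upper[x], lower[x]));
            upper[x] = lower[x];             /* carry over to next row */
        }
        pix1 += stride;
    }
    return sum;
}
```

Both functions read `h + 1` reference rows in total and should return the same sum. In a SIMD version the carried row would presumably stay in a vector register rather than a local array.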

Re: [FFmpeg-devel] [PATCH 2/2] aarch64: me_cmp: Avoid using the non-unrolled codepath for the minimum unroll size

2022-09-28 Thread Hubert Mazur

LGTM. On Wed, Sep 28, 2022 at 11:13 AM Martin Storsjö wrote: > Signed-off-by: Martin Storsjö > --- > libavcodec/aarch64/me_cmp_neon.S | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/libavcodec/aarch64/me_cmp_neon.S > b/libavcodec/aarch64/me_cmp_neon.S > index 832

[FFmpeg-devel] [PATCH 0/4] Provide neon implementations for hscale functions

2022-10-17 Thread Hubert Mazur

Provide arm64 neon optimized functions from swscale family. Hubert Mazur (4): sw_scale: Add specializations for hscale 8 to 19 tests/sw_scale: Add test cases for input sizes 16 sw_scale: Add specializations for hscale 16 to 15 sw_scale: Add specializations for hscale 16 to 19 libswscale

[FFmpeg-devel] [PATCH 1/4] sw_scale: Add specializations for hscale 8 to 19

2022-10-17 Thread Hubert Mazur

: 32803.7 hscale_8_to_19__fs_32_dstW_512_neon: 5474.2 hscale_8_to_19__fs_40_dstW_512_c: 40948.0 hscale_8_to_19__fs_40_dstW_512_neon: 6669.7 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 292 ++- libswscale/aarch64/swscale.c | 13 +- 2 files changed

[FFmpeg-devel] [PATCH 2/4] tests/sw_scale: Add test cases for input sizes 16

2022-10-17 Thread Hubert Mazur

Previously test cases handled only input sizes equal to 8. Add support for input size 16 which is used by scaling routines hscale16To15 and hscale16To19. Pass SwsContext pointer to each function as some of them make use of it. Signed-off-by: Hubert Mazur --- tests/checkasm/sw_scale.c | 35

[FFmpeg-devel] [PATCH 3/4] sw_scale: Add specializations for hscale 16 to 15

2022-10-17 Thread Hubert Mazur

hscale_16_to_15__fs_32_dstW_512_neon: 9511.2 hscale_16_to_15__fs_40_dstW_512_c: 48995.7 hscale_16_to_15__fs_40_dstW_512_neon: 11570.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 409 ++- libswscale/aarch64/swscale.c | 66 +- libswscale/swscale.c

[FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19

2022-10-17 Thread Hubert Mazur

hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 402 +++ libswscale/aarch64/swscale.c | 70 +- 2 files changed, 471

Re: [FFmpeg-devel] [PATCH 4/4] sw_scale: Add specializations for hscale 16 to 19

2022-10-25 Thread Hubert Mazur

Thanks for the review. I will fix the failing checkasm first and then take care of the minor issues. I will try to resend fixed versions this week. Regards, Hubert On Mon, Oct 24, 2022 at 3:19 PM Martin Storsjö wrote: > On Mon, 17 Oct 2022, Hubert Mazur wrote: > > > Provid

[FFmpeg-devel] [PATCH 0/3] sw_scale: Provide neon implementation for hscale

2022-10-28 Thread Hubert Mazur

. After fixing x86 the patch for checkasm could be merged. https://patchwork.ffmpeg.org/project/ffmpeg/patch/20221017130715.30896-3-...@semihalf.com/ Hubert Mazur (3): sw_scale: Add specializations for hscale 8 to 19 sw_scale: Add specializations for hscale 16 to 15 sw_scale: Add

[FFmpeg-devel] [PATCH 1/3] sw_scale: Add specializations for hscale 8 to 19

2022-10-28 Thread Hubert Mazur

: 32803.7 hscale_8_to_19__fs_32_dstW_512_neon: 5474.2 hscale_8_to_19__fs_40_dstW_512_c: 40948.0 hscale_8_to_19__fs_40_dstW_512_neon: 6669.7 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 291 +++ libswscale/aarch64/swscale.c | 13 +- 2 files changed

[FFmpeg-devel] [PATCH 2/3] sw_scale: Add specializations for hscale 16 to 15

2022-10-28 Thread Hubert Mazur

hscale_16_to_15__fs_32_dstW_512_neon: 9511.2 hscale_16_to_15__fs_40_dstW_512_c: 48995.7 hscale_16_to_15__fs_40_dstW_512_neon: 11570.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 407 +++ libswscale/aarch64/swscale.c | 61 ++ libswscale/swscale.c

[FFmpeg-devel] [PATCH 3/3] sw_scale: Add specializations for hscale 16 to 19

2022-10-28 Thread Hubert Mazur

hscale_16_to_19__fs_32_dstW_512_neon: 9502.7 hscale_16_to_19__fs_40_dstW_512_c: 45477.5 hscale_16_to_19__fs_40_dstW_512_neon: 11552.0 Signed-off-by: Hubert Mazur --- libswscale/aarch64/hscale.S | 402 +++ libswscale/aarch64/swscale.c | 66 ++ 2 files changed, 468

[FFmpeg-devel] [PATCH 0/2] lavc/aarch64: Provide neon implementations

2022-06-29 Thread Hubert Mazur

Provide neon implementations for motion estimation functions. Hubert Mazur (2): lavc/aarch64: Assign callback with function lavc/aarch64: Add pix_abs16_x2 neon implementation libavcodec/aarch64/me_cmp_init_aarch64.c | 5 + libavcodec/aarch64/me_cmp_neon.S | 134

[FFmpeg-devel] [PATCH 1/2] lavc/aarch64: Assign callback with function

2022-06-29 Thread Hubert Mazur

Assign c->sad[0] callback with already existing neon implementation of pix_abs16 function. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarc
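
Assigning a callback as described above amounts to overwriting a function-pointer slot when an optimized routine is available. The sketch below shows that pattern in generic C. The names `CmpTable`, `cmp_fn`, `sad16_c`, `sad16_fast` and `cmp_table_init` are hypothetical and the signature is simplified, so this is not the actual FFmpeg init code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical comparison-function signature. */
typedef int (*cmp_fn)(const uint8_t *a, const uint8_t *b,
                      ptrdiff_t stride, int h);

/* Hypothetical dispatch table; slot 0 holds the 16-pixel-wide SAD. */
typedef struct CmpTable {
    cmp_fn sad[2];
} CmpTable;

/* Portable reference: sum of absolute differences over 16 x h pixels. */
static int sad16_c(const uint8_t *a, const uint8_t *b,
                   ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}

/* Stand-in for an optimized routine; here it simply forwards to the
 * reference so that the sketch stays portable. */
static int sad16_fast(const uint8_t *a, const uint8_t *b,
                      ptrdiff_t stride, int h)
{
    return sad16_c(a, b, stride, h);
}

/* Fill the table with the reference, then override the slot when the
 * (assumed) CPU feature check reports that the fast path is usable. */
static void cmp_table_init(CmpTable *c, int have_fast)
{
    assert(c != NULL);
    c->sad[0] = sad16_c;
    c->sad[1] = NULL;
    if (have_fast)
        c->sad[0] = sad16_fast;
}
```

Callers then invoke `c->sad[0](...)` without knowing which implementation was selected.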

[FFmpeg-devel] [PATCH 2/2] lavc/aarch64: Add pix_abs16_x2 neon implementation

2022-06-29 Thread Hubert Mazur

Provide neon implementation for pix_abs16_x2 function. Performance tests of implementation are below. - pix_abs_0_1_c: 291.9 - pix_abs_0_1_neon: 73.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur --- libavcodec/aarch64/me_cmp_init_aarch64.c | 3

19 matches
