Hubert Mazur (3):
lavc/aarch64: Add neon implementation for pix_median_abs16
lavc/aarch64: Add neon implementation for vsad8_intra
lavc/aarch64: Add neon implementation for pix_median_abs8
libavcodec/aarch64/me_cmp_init_aarch64.c | 10 ++
libavcodec/aarch64/me_cmp_neon.S | 182
Provide optimized implementation for pix_median_abs16 function.
Performance comparison tests are shown below.
- median_sad_0_c: 720.5
- median_sad_0_neon: 127.2
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur
---
libavcodec/aarch64
Provide optimized implementation for vsad8_intra function.
Performance comparison tests are shown below.
- vsad_5_c: 94.7
- vsad_5_neon: 20.7
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 3
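For readers unfamiliar with the metric, the following is a minimal scalar sketch of what a vsad8_intra-style function computes: the sum of absolute differences between each 8-pixel row and the row directly below it. The name vsad8_intra_ref and its reduced signature are hypothetical and used only for illustration; this is not the patch's NEON code, and the real callback may take additional context arguments.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not the NEON implementation):
 * accumulates |s[x] - s[x + stride]| over an 8-pixel-wide block,
 * comparing every row with the row below it. */
static int vsad8_intra_ref(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;

    /* h rows give h - 1 vertical row pairs. */
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 8; x++)
            score += abs(s[x] - s[x + stride]);
        s += stride;
    }
    return score;
}
```

A block with a single row has no vertical neighbours, so the sketch returns 0 for h = 1.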
Provide optimized implementation for pix_median_abs8 function.
Performance comparison tests are shown below.
- median_sad_1_c: 277.0
- median_sad_1_neon: 82.0
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur
---
libavcodec/aarch64/me_cmp_init_aarch64
LGTM, thanks!
On Wed, Sep 28, 2022 at 11:13 AM Martin Storsjö wrote:
> This avoids one redundant load per row; pix3 from the previous
> iteration can be used as pix2 in the next one.
>
> Before:          Cortex A53    A72    A73
> pix_abs_0_2_neon:     138.0   59.7   48.0
> After:
> pix_abs_0_2_neon:
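The row-reuse idea described in the quoted message can be sketched in scalar C. This is an illustration under stated assumptions, not the patch itself: it assumes the function in question computes a sum of absolute differences against the rounded vertical average of two adjacent reference rows, and the names pix_abs16_y2_naive, pix_abs16_y2_reuse and avg2 are hypothetical. The local arrays stand in for vector registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Rounded average of two pixel values. */
static int avg2(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* Naive form: reads two reference rows (pix2 and pix3) per iteration. */
static int pix_abs16_y2_naive(const uint8_t *pix1, const uint8_t *pix2,
                              ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        const uint8_t *pix3 = pix2 + stride;
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg2(pix2[x], pix3[x]));
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Reuse form: the lower row of one iteration becomes the upper row of
 * the next, so only one new reference row is loaded per iteration. */
static int pix_abs16_y2_reuse(const uint8_t *pix1, const uint8_t *pix2,
                              ptrdiff_t stride, int h)
{
    uint8_t upper[16], lower[16];
    int sum = 0;

    memcpy(upper, pix2, 16);            /* one extra load before the loop */
    for (int y = 0; y < h; y++) {
        pix2 += stride;
        memcpy(lower, pix2, 16);        /* the only reference load here */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg2(upper[x], lower[x]));
        memcpy(upper, lower, 16);       /* stands in for keeping a register */
        pix1 += stride;
    }
    return sum;
}
```

Both forms read h + 1 reference rows in total and return the same sum; the second simply avoids fetching each interior row twice.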
LGTM.
On Wed, Sep 28, 2022 at 11:13 AM Martin Storsjö wrote:
> Signed-off-by: Martin Storsjö
> ---
> libavcodec/aarch64/me_cmp_neon.S | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/libavcodec/aarch64/me_cmp_neon.S
> b/libavcodec/aarch64/me_cmp_neon.S
> index 832
Provide arm64 neon optimized functions from swscale family.
Hubert Mazur (4):
sw_scale: Add specializations for hscale 8 to 19
tests/sw_scale: Add test cases for input sizes 16
sw_scale: Add specializations for hscale 16 to 15
sw_scale: Add specializations for hscale 16 to 19
libswscale
: 32803.7
hscale_8_to_19__fs_32_dstW_512_neon: 5474.2
hscale_8_to_19__fs_40_dstW_512_c: 40948.0
hscale_8_to_19__fs_40_dstW_512_neon: 6669.7
Signed-off-by: Hubert Mazur
---
libswscale/aarch64/hscale.S | 292 ++-
libswscale/aarch64/swscale.c | 13 +-
2 files changed
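As background for the benchmark names above, here is a rough scalar sketch of a horizontal scaler that takes 8-bit input samples and produces 19-bit intermediates, with "fs" read as the filter size and "dstW" as the output width. The function name hscale_8_to_19_ref, the argument order, the shift by 3 and the clip to (1 << 19) - 1 are assumptions chosen to illustrate the idea (they presume filter taps that sum to roughly 1 << 14); they are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical scalar reference for an 8-bit to 19-bit horizontal scale.
 * Each output sample is a filterSize-tap dot product starting at
 * src[filterPos[i]], scaled down and clipped to 19 bits. */
static void hscale_8_to_19_ref(int32_t *dst, int dstW, const uint8_t *src,
                               const int16_t *filter,
                               const int32_t *filterPos, int filterSize)
{
    const int max19 = (1 << 19) - 1;

    for (int i = 0; i < dstW; i++) {
        int srcPos = filterPos[i];
        int val = 0;

        for (int j = 0; j < filterSize; j++)
            val += (int)src[srcPos + j] * filter[filterSize * i + j];

        /* Assumed scaling: 8-bit sample times 14-bit tap, minus 3 bits.
         * Right-shifting a negative int is implementation-defined in C;
         * common compilers perform an arithmetic shift. */
        val >>= 3;
        dst[i] = val < max19 ? val : max19;
    }
}
```

The inner dot product over filterSize taps is the part a NEON specialization would vectorize, which is why the benchmarks are split by filter size.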
Previously test cases handled only input sizes equal to 8.
Add support for input size 16 which is used by scaling
routines hscale16To15 and hscale16To19. Pass SwsContext
pointer to each function as some of them make use of it.
Signed-off-by: Hubert Mazur
---
tests/checkasm/sw_scale.c | 35
hscale_16_to_15__fs_32_dstW_512_neon: 9511.2
hscale_16_to_15__fs_40_dstW_512_c: 48995.7
hscale_16_to_15__fs_40_dstW_512_neon: 11570.0
Signed-off-by: Hubert Mazur
---
libswscale/aarch64/hscale.S | 409 ++-
libswscale/aarch64/swscale.c | 66 +-
libswscale/swscale.c
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
hscale_16_to_19__fs_40_dstW_512_c: 45477.5
hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
Signed-off-by: Hubert Mazur
---
libswscale/aarch64/hscale.S | 402 +++
libswscale/aarch64/swscale.c | 70 +-
2 files changed, 471
Thanks for the review.
I will fix the failing checkasm first and then take care of the minor
issues. I will try to resend fixed versions this week.
Regards,
Hubert
On Mon, Oct 24, 2022 at 3:19 PM Martin Storsjö wrote:
> On Mon, 17 Oct 2022, Hubert Mazur wrote:
>
> > Provid
. After fixing x86
the patch for checkasm could be merged.
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20221017130715.30896-3-...@semihalf.com/
Hubert Mazur (3):
sw_scale: Add specializations for hscale 8 to 19
sw_scale: Add specializations for hscale 16 to 15
sw_scale: Add
: 32803.7
hscale_8_to_19__fs_32_dstW_512_neon: 5474.2
hscale_8_to_19__fs_40_dstW_512_c: 40948.0
hscale_8_to_19__fs_40_dstW_512_neon: 6669.7
Signed-off-by: Hubert Mazur
---
libswscale/aarch64/hscale.S | 291 +++
libswscale/aarch64/swscale.c | 13 +-
2 files changed
hscale_16_to_15__fs_32_dstW_512_neon: 9511.2
hscale_16_to_15__fs_40_dstW_512_c: 48995.7
hscale_16_to_15__fs_40_dstW_512_neon: 11570.0
Signed-off-by: Hubert Mazur
---
libswscale/aarch64/hscale.S | 407 +++
libswscale/aarch64/swscale.c | 61 ++
libswscale/swscale.c
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
hscale_16_to_19__fs_40_dstW_512_c: 45477.5
hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
Signed-off-by: Hubert Mazur
---
libswscale/aarch64/hscale.S | 402 +++
libswscale/aarch64/swscale.c | 66 ++
2 files changed, 468
Provide neon implementations for motion estimation functions.
Hubert Mazur (2):
lavc/aarch64: Assign callback with function
lavc/aarch64: Add pix_abs16_x2 neon implementation
libavcodec/aarch64/me_cmp_init_aarch64.c | 5 +
libavcodec/aarch64/me_cmp_neon.S | 134
Assign c->sad[0] callback with already existing neon implementation
of pix_abs16 function.
Signed-off-by: Hubert Mazur
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c
b/libavcodec/aarc
Provide neon implementation for pix_abs16_x2 function.
Performance tests of implementation are below.
- pix_abs_0_1_c: 291.9
- pix_abs_0_1_neon: 73.7
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 3
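To make the operation concrete, the sketch below shows in scalar C what a pix_abs16_x2-style function is assumed to compute: a sum of absolute differences between a 16-pixel-wide block and the rounded average of each reference pixel with its right-hand neighbour (a horizontal half-pixel position). The name pix_abs16_x2_ref and the reduced signature are hypothetical; this is not the NEON code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not the NEON implementation):
 * SAD of pix1 against the horizontally interpolated pix2.
 * Note that each row reads 17 bytes of pix2 (x + 1 reaches index 16). */
static int pix_abs16_x2_ref(const uint8_t *pix1, const uint8_t *pix2,
                            ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int interp = (pix2[x] + pix2[x + 1] + 1) >> 1; /* rounded avg */
            sum += abs(pix1[x] - interp);
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```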