Re: [libav-devel] [PATCH 4/4] h264/aarch64: add intra loop filter neon asm
On Tue, 1 Jan 2019, Janne Grunau wrote:

> Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported
> (x264 uses nv12 chroma) and optimized.
>
> Cycle count for checkasm --bench on a Snapdragon 820e:
>
> h264_h_loop_filter_luma_intra_8bpp_c:             60.0
> h264_h_loop_filter_luma_intra_8bpp_neon:          54.2
> h264_v_loop_filter_luma_intra_8bpp_c:            148.3
> h264_v_loop_filter_luma_intra_8bpp_neon:          73.8
> h264_h_loop_filter_chroma_intra_8bpp_c:           27.8
> h264_h_loop_filter_chroma_intra_8bpp_neon:        21.4
> h264_h_loop_filter_chroma_mbaff_intra_8bpp_c:     15.8
> h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon:  15.7
> h264_v_loop_filter_chroma_intra_8bpp_c:           45.8
> h264_v_loop_filter_chroma_intra_8bpp_neon:        17.3
> ---
>  libavcodec/aarch64/h264dsp_init_aarch64.c |  16 ++
>  libavcodec/aarch64/h264dsp_neon.S         | 297 ++
>  2 files changed, 313 insertions(+)

LGTM

// Martin
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 3/4] h264/aarch64: optimize neon loop filter
On Tue, 1 Jan 2019, Janne Grunau wrote:

> Exit as soon as possible if no filtering will be done.
>
> Improves the checkasm --bench cycle count on a Snapdragon 820e:
>
> h264_h_loop_filter_luma_8bpp_c:        72.4 ->  72.5
> h264_h_loop_filter_luma_8bpp_neon:     97.1 ->  56.3
> h264_v_loop_filter_luma_8bpp_c:       174.0 -> 173.5
> h264_v_loop_filter_luma_8bpp_neon:     62.9 ->  60.9
> h264_h_loop_filter_chroma_8bpp_c:      30.2 ->  30.3
> h264_h_loop_filter_chroma_8bpp_neon:   51.6 ->  25.7
> h264_v_loop_filter_chroma_8bpp_c:      57.3 ->  57.3
> h264_v_loop_filter_chroma_8bpp_neon:   28.0 ->  24.0
> ---
>  libavcodec/aarch64/h264dsp_neon.S | 33 ++-
>  1 file changed, 19 insertions(+), 14 deletions(-)

LGTM

// Martin
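For readers who do not have the H.264 deblocking rules in their head, the idea behind "exit as soon as possible" can be illustrated in scalar C. This is only a sketch and not the code from the patch: the NEON version works on vector masks, and the helper names `needs_filtering` and `edge_needs_filtering` are made up for this note. It relies on the standard per-position condition that a sample pair across the edge is filtered only if |p0 - q0| < alpha, |p1 - p0| < beta and |q1 - q0| < beta.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): the H.264 threshold test
 * for one position across the edge. */
static int needs_filtering(int p1, int p0, int q0, int q1, int alpha, int beta)
{
    return abs(p0 - q0) < alpha &&
           abs(p1 - p0) < beta  &&
           abs(q1 - q0) < beta;
}

/* Hypothetical sketch: scan `width` positions along a horizontal edge.
 * `pix` points at the first q0 sample; p0 and p1 are the rows above it.
 * Returns 0 when no position passes the test, so a caller could return
 * before doing any of the filter arithmetic. */
static int edge_needs_filtering(const uint8_t *pix, ptrdiff_t stride,
                                int width, int alpha, int beta)
{
    /* A threshold of zero can never be satisfied by an absolute difference. */
    if (alpha == 0 || beta == 0)
        return 0;

    for (int x = 0; x < width; x++) {
        if (needs_filtering(pix[x - 2 * stride], pix[x - stride],
                            pix[x], pix[x + stride], alpha, beta))
            return 1;
    }
    return 0;
}
```

A caller would check `edge_needs_filtering()` first and skip the edge entirely on 0. The quoted numbers suggest the gain is largest for the horizontal (h_) filters, but the sketch does not attempt to explain why.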
Re: [libav-devel] [PATCH 2/4] checkasm/h264: add loop filter tests
On Tue, 1 Jan 2019, Janne Grunau wrote:

> ---
>  tests/checkasm/h264dsp.c | 124 +++
>  1 file changed, 124 insertions(+)

Looks ok to me

// Martin
Re: [libav-devel] [PATCH 1/4] h264/aarch64: sign extend int stride in loop filter asm
On Tue, 1 Jan 2019, Janne Grunau wrote:

> ---
>  libavcodec/aarch64/h264dsp_neon.S | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/libavcodec/aarch64/h264dsp_neon.S b/libavcodec/aarch64/h264dsp_neon.S
> index 9b4610a4d4..60ffa24500 100644
> --- a/libavcodec/aarch64/h264dsp_neon.S
> +++ b/libavcodec/aarch64/h264dsp_neon.S
> @@ -130,6 +130,7 @@ endfunc
>
>  function ff_h264_h_loop_filter_luma_neon, export=1
>          h264_loop_filter_start
> +        sxtw            x1, w1
>
>          sub             x0, x0, #4
>          ld1             {v6.8B}, [x0], x1
> @@ -210,6 +211,7 @@ endfunc
>
>  function ff_h264_v_loop_filter_chroma_neon, export=1
>          h264_loop_filter_start
> +        sxtw            x1, w1
>
>          sub             x0, x0, x1, lsl #1
>          ld1             {v18.8B}, [x0], x1
> @@ -228,6 +230,7 @@ endfunc
>
>  function ff_h264_h_loop_filter_chroma_neon, export=1
>          h264_loop_filter_start
> +        sxtw            x1, w1
>
>          sub             x0, x0, #2
>          ld1             {v18.S}[0], [x0], x1
> --
> 2.20.1

LGTM

// Martin
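As background on why the `sxtw` is needed: the stride is passed as a 32-bit `int` in `w1`, and under AAPCS64 the upper 32 bits of `x1` are unspecified in that case. Using `x1` directly as a post-index offset, as the `ld1 ..., [x0], x1` lines do, therefore only works if those bits happen to hold the sign extension. The following C model is a rough illustration of what the instruction does to the register. The function name `sxtw` is simply borrowed from the mnemonic and is not an existing API.

```c
#include <stdint.h>

/* Hypothetical model (not from the patch) of `sxtw x1, w1`:
 * take the low 32 bits of a 64-bit register value, interpret them as a
 * signed int, and widen to 64 bits. The upper 32 bits of the input are
 * ignored, which is the point: they may contain anything. */
static int64_t sxtw(uint64_t reg)
{
    uint32_t low = (uint32_t)(reg & 0xFFFFFFFFu);

    /* Bit 31 set means the 32-bit value is negative; subtract 2^32 to get
     * the same value as a 64-bit signed integer. */
    return (low & 0x80000000u) ? (int64_t)low - (INT64_C(1) << 32)
                               : (int64_t)low;
}
```

For example, a stride of -16 may arrive as `0x00000000FFFFFFF0` or with arbitrary upper bits such as `0xDEADBEEFFFFFFFF0`. In both cases `sxtw()` yields -16, whereas using the raw 64-bit value as an offset would move the pointer by a large unrelated amount.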