Our tests show that CPU clocks are reduced for each module:

  ~48% for qpel weight
  ~17% for epel
  ~71% for sao edge mode
  ~48% for sao band mode
  ~60% for idct of 16x16 block

Overall, decoding speeds up by 20~30% (increase in FPS).

We also compared the decoding results to make sure they are the same
before and after the optimization.

These patches are based on the n3.4 release.

Meng Wang (5):
  avcodec/hevcdsp: Add NEON optimization for qpel weighted mode
  avcodec/hevcdsp: Add NEON optimization for epel
  avcodec/hevcdsp: Use pre-load (pld) to optimize data loading
  avcodec/hevcdsp: Add NEON optimization for sao
  avcodec/hevcdsp: Add NEON optimization for idct16x16

Shengbin Meng (1):
  avcodec/hevcdsp: Add NEON optimization for whole-pixel interpolation

 libavcodec/arm/Makefile            |    4 +-
 libavcodec/arm/hevcdsp_epel_neon.S | 2078 ++++++++++++++++++++++++++++++++++++
 libavcodec/arm/hevcdsp_idct_neon.S |  241 +++++
 libavcodec/arm/hevcdsp_init_neon.c |  695 ++++++++++++
 libavcodec/arm/hevcdsp_qpel_neon.S |  702 ++++++++++++
 libavcodec/arm/hevcdsp_sao_neon.S  |  181 ++++
 6 files changed, 3900 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/arm/hevcdsp_epel_neon.S
 create mode 100644 libavcodec/arm/hevcdsp_sao_neon.S

-- 
2.13.6 (Apple Git-96)