Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-22 Thread Shengbin Meng
The code looks good to me. I think the wrapper is fine, because that part of 
code is not suitable for NEON assembly.

But you can remove the using of `sizeof(uint8_t)` as suggested by Carl.

Shengbin Meng

> On 19 Mar 2018, at 12:41, Yingming Fan  wrote:
> 
> Hi, is there any review about this patch? What’s your option about wrapper we 
> used in this patch.
> 
> Yingming Fan
> 
>> On 11 Mar 2018, at 8:59 PM, Yingming Fan  wrote:
>> 
>> 
>>> On 11 Mar 2018, at 8:54 PM, Carl Eugen Hoyos  wrote:
>>> 
>>> 2018-03-08 8:03 GMT+01:00 Yingming Fan :
 From: Meng Wang 
>>> 
 +stride_dst /= sizeof(uint8_t);
 +stride_src /= sizeof(uint8_t);
>>> 
>>> FFmpeg requires sizeof(uint8_t) to be 1, please simplify
>>> your patch accordingly.
>>> 
>>> Why is the wrapper function needed?
>> 
>> We use wrapper because codes in wrapper no need to be written with assembly, 
>> C codes more readable.
>> 
>>> 
>>> Carl Eugen
>>> ___
>>> ffmpeg-devel mailing list
>>> ffmpeg-devel@ffmpeg.org
>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> 
> 
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-22 Thread Shengbin Meng
Hi,

By checkasm benchmark, I can see a speedup of ~3x for band mode and ~6x for 
edge mode on my device (the device has aarch64 CPU, but I configured ffmpeg 
with `—arch=arm`). And FATE passed as well.

Results of a checkasm run:

$./tests/checkasm/checkasm --test=hevc_sao --bench
$ sudo ./tests/checkasm/checkasm --test=hevc_sao --bench
benchmarking with Linux Perf Monitoring API
nop: 49.8
checkasm: using random seed 1088726844
NEON:
 - hevc_sao.sao_band [OK]
 - hevc_sao.sao_edge [OK]
checkasm: all 10 tests passed
hevc_sao_band_8x8_8_c: 578.0
hevc_sao_band_8x8_8_neon: 215.3
hevc_sao_band_16x16_8_c: 2004.3
hevc_sao_band_16x16_8_neon: 680.8
hevc_sao_band_32x32_8_c: 8363.5
hevc_sao_band_32x32_8_neon: 2579.3
hevc_sao_band_48x48_8_c: 18268.3
hevc_sao_band_48x48_8_neon: 5653.3
hevc_sao_band_64x64_8_c: 32001.8
hevc_sao_band_64x64_8_neon: 9952.0
hevc_sao_edge_8x8_8_c: 1211.0
hevc_sao_edge_8x8_8_neon: 217.5
hevc_sao_edge_16x16_8_c: 4708.5
hevc_sao_edge_16x16_8_neon: 767.0
hevc_sao_edge_32x32_8_c: 18673.0
hevc_sao_edge_32x32_8_neon: 2967.3
hevc_sao_edge_48x48_8_c: 41936.3
hevc_sao_edge_48x48_8_neon: 6642.8
hevc_sao_edge_64x64_8_c: 74015.8
hevc_sao_edge_64x64_8_neon: 11781.8

Regards
Shengbin

> On 11 Mar 2018, at 10:27, Yingming Fan  wrote:
> 
> Hi, there. 
> I have already pushed a patch which add hevc_sao checkasm and patch was 
> adopted.
> You can verify this optimization by using checkasm under arm device, 
> `checkasm --test=hevc_sao --bench`.
> hevc_sao_band speed up ~2x, hevc_sao_edge speed up ~4x. Also passed FATE 
> under arm platform.
> 
> Yingming Fan
> 
>> On 8 Mar 2018, at 3:03 PM, Yingming Fan  wrote:
>> 
>> From: Meng Wang 
>> 
>> Signed-off-by: Meng Wang 
>> ---
>> As FFmpeg hevc decoder have no SAO neon optimization, we add sao_band and 
>> sao_edge neon codes in this patch.
>> I have already submit a patch called 'checkasm/hevc_sao : add hevc_sao for 
>> checkasm' several days ago.
>> Results below was printed by hevc_sao checkasm on an armv7 device Nexus 5. 
>> From the results we can see: hevc_sao_band speed up ~2x, hevc_sao_edge speed 
>> up ~4x. 
>> Also test FATE under armv7 device and MacOS.
>> 
>> hevc_sao_band_8x8_8_c: 804.9
>> hevc_sao_band_8x8_8_neon: 452.4
>> hevc_sao_band_16x16_8_c: 2638.1
>> hevc_sao_band_16x16_8_neon: 1169.9
>> hevc_sao_band_32x32_8_c: 9259.9
>> hevc_sao_band_32x32_8_neon: 3956.1
>> hevc_sao_band_48x48_8_c: 20344.6
>> hevc_sao_band_48x48_8_neon: 8649.6
>> hevc_sao_band_64x64_8_c: 35684.6
>> hevc_sao_band_64x64_8_neon: 15213.1
>> hevc_sao_edge_8x8_8_c: 1761.6
>> hevc_sao_edge_8x8_8_neon: 414.6
>> hevc_sao_edge_16x16_8_c: 6844.4
>> hevc_sao_edge_16x16_8_neon: 1589.9
>> hevc_sao_edge_32x32_8_c: 27156.4
>> hevc_sao_edge_32x32_8_neon: 6116.6
>> hevc_sao_edge_48x48_8_c: 60004.6
>> hevc_sao_edge_48x48_8_neon: 13686.4
>> hevc_sao_edge_64x64_8_c: 106708.1
>> hevc_sao_edge_64x64_8_neon: 24240.1
>> 
>> libavcodec/arm/Makefile|   3 +-
>> libavcodec/arm/hevcdsp_init_neon.c |  63 +
>> libavcodec/arm/hevcdsp_sao_neon.S  | 181 
>> +
>> 3 files changed, 246 insertions(+), 1 deletion(-)
>> create mode 100644 libavcodec/arm/hevcdsp_sao_neon.S
>> 
>> diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
>> index 1eeac5449e..2ee913e8a8 100644
>> --- a/libavcodec/arm/Makefile
>> +++ b/libavcodec/arm/Makefile
>> @@ -136,7 +136,8 @@ NEON-OBJS-$(CONFIG_DCA_DECODER)+= 
>> arm/synth_filter_neon.o
>> NEON-OBJS-$(CONFIG_HEVC_DECODER)   += arm/hevcdsp_init_neon.o   \
>>  arm/hevcdsp_deblock_neon.o\
>>  arm/hevcdsp_idct_neon.o   \
>> -  arm/hevcdsp_qpel_neon.o
>> +  arm/hevcdsp_qpel_neon.o   \
>> +  arm/hevcdsp_sao_neon.o
>> NEON-OBJS-$(CONFIG_RV30_DECODER)   += arm/rv34dsp_neon.o
>> NEON-OBJS-$(CONFIG_RV40_DECODER)   += arm/rv34dsp_neon.o\
>>  arm/rv40dsp_neon.o
>> diff --git a/libavcodec/arm/hevcdsp_init_neon.c 
>> b/libavcodec/arm/hevcdsp_init_neon.c
>> index a4628d2a93..3c480f12f8 100644
>> --- a/libavcodec/arm/hevcdsp_init_neon.c
>> +++ b/libavcodec/arm/hevcdsp_init_neon.c
>> @@ -21,8 +21,16 @@
>> #include "libavutil/attributes.h"
>> #include "libavutil/arm/cpu.h"
>> #include "libavcodec/hevcdsp.h"
>> +#include "libavcodec/avcodec.h"
>> #include "hevcdsp_arm.h"
>> 
>> +void ff_hevc_sao_band_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src,
>> +  ptrdiff_t stride_dst, ptrdiff_t 
>> stride_src,
>> +  int16_t *sao_offset_val, int 
>> sao_left_class,
>> +  int width, int height);
>> +void ff_hevc_sao_edge_filter_neon_8_wrapper(uint8_t *_dst, uint8_t 

Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-18 Thread Yingming Fan
Hi, is there any review about this patch? What’s your option about wrapper we 
used in this patch.

Yingming Fan

> On 11 Mar 2018, at 8:59 PM, Yingming Fan  wrote:
> 
> 
>> On 11 Mar 2018, at 8:54 PM, Carl Eugen Hoyos  wrote:
>> 
>> 2018-03-08 8:03 GMT+01:00 Yingming Fan :
>>> From: Meng Wang 
>> 
>>> +stride_dst /= sizeof(uint8_t);
>>> +stride_src /= sizeof(uint8_t);
>> 
>> FFmpeg requires sizeof(uint8_t) to be 1, please simplify
>> your patch accordingly.
>> 
>> Why is the wrapper function needed?
> 
> We use wrapper because codes in wrapper no need to be written with assembly, 
> C codes more readable.
> 
>> 
>> Carl Eugen
>> ___
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-11 Thread Yingming Fan

> On 11 Mar 2018, at 8:54 PM, Carl Eugen Hoyos  wrote:
> 
> 2018-03-08 8:03 GMT+01:00 Yingming Fan :
>> From: Meng Wang 
> 
>> +stride_dst /= sizeof(uint8_t);
>> +stride_src /= sizeof(uint8_t);
> 
> FFmpeg requires sizeof(uint8_t) to be 1, please simplify
> your patch accordingly.
> 
> Why is the wrapper function needed?

We use wrapper because codes in wrapper no need to be written with assembly, C 
codes more readable.

> 
> Carl Eugen
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-11 Thread Carl Eugen Hoyos
2018-03-08 8:03 GMT+01:00 Yingming Fan :
> From: Meng Wang 

> +stride_dst /= sizeof(uint8_t);
> +stride_src /= sizeof(uint8_t);

FFmpeg requires sizeof(uint8_t) to be 1, please simplify
your patch accordingly.

Why is the wrapper function needed?

Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-10 Thread Yingming Fan
Hi, there. 
I have already pushed a patch which add hevc_sao checkasm and patch was adopted.
You can verify this optimization by using checkasm under arm device, `checkasm 
--test=hevc_sao --bench`.
hevc_sao_band speed up ~2x, hevc_sao_edge speed up ~4x. Also passed FATE under 
arm platform.

Yingming Fan

> On 8 Mar 2018, at 3:03 PM, Yingming Fan  wrote:
> 
> From: Meng Wang 
> 
> Signed-off-by: Meng Wang 
> ---
> As FFmpeg hevc decoder have no SAO neon optimization, we add sao_band and 
> sao_edge neon codes in this patch.
> I have already submit a patch called 'checkasm/hevc_sao : add hevc_sao for 
> checkasm' several days ago.
> Results below was printed by hevc_sao checkasm on an armv7 device Nexus 5. 
> From the results we can see: hevc_sao_band speed up ~2x, hevc_sao_edge speed 
> up ~4x. 
> Also test FATE under armv7 device and MacOS.
> 
> hevc_sao_band_8x8_8_c: 804.9
> hevc_sao_band_8x8_8_neon: 452.4
> hevc_sao_band_16x16_8_c: 2638.1
> hevc_sao_band_16x16_8_neon: 1169.9
> hevc_sao_band_32x32_8_c: 9259.9
> hevc_sao_band_32x32_8_neon: 3956.1
> hevc_sao_band_48x48_8_c: 20344.6
> hevc_sao_band_48x48_8_neon: 8649.6
> hevc_sao_band_64x64_8_c: 35684.6
> hevc_sao_band_64x64_8_neon: 15213.1
> hevc_sao_edge_8x8_8_c: 1761.6
> hevc_sao_edge_8x8_8_neon: 414.6
> hevc_sao_edge_16x16_8_c: 6844.4
> hevc_sao_edge_16x16_8_neon: 1589.9
> hevc_sao_edge_32x32_8_c: 27156.4
> hevc_sao_edge_32x32_8_neon: 6116.6
> hevc_sao_edge_48x48_8_c: 60004.6
> hevc_sao_edge_48x48_8_neon: 13686.4
> hevc_sao_edge_64x64_8_c: 106708.1
> hevc_sao_edge_64x64_8_neon: 24240.1
> 
> libavcodec/arm/Makefile|   3 +-
> libavcodec/arm/hevcdsp_init_neon.c |  63 +
> libavcodec/arm/hevcdsp_sao_neon.S  | 181 +
> 3 files changed, 246 insertions(+), 1 deletion(-)
> create mode 100644 libavcodec/arm/hevcdsp_sao_neon.S
> 
> diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
> index 1eeac5449e..2ee913e8a8 100644
> --- a/libavcodec/arm/Makefile
> +++ b/libavcodec/arm/Makefile
> @@ -136,7 +136,8 @@ NEON-OBJS-$(CONFIG_DCA_DECODER)+= 
> arm/synth_filter_neon.o
> NEON-OBJS-$(CONFIG_HEVC_DECODER)   += arm/hevcdsp_init_neon.o   \
>   arm/hevcdsp_deblock_neon.o\
>   arm/hevcdsp_idct_neon.o   \
> -  arm/hevcdsp_qpel_neon.o
> +  arm/hevcdsp_qpel_neon.o   \
> +   arm/hevcdsp_sao_neon.o
> NEON-OBJS-$(CONFIG_RV30_DECODER)   += arm/rv34dsp_neon.o
> NEON-OBJS-$(CONFIG_RV40_DECODER)   += arm/rv34dsp_neon.o\
>   arm/rv40dsp_neon.o
> diff --git a/libavcodec/arm/hevcdsp_init_neon.c 
> b/libavcodec/arm/hevcdsp_init_neon.c
> index a4628d2a93..3c480f12f8 100644
> --- a/libavcodec/arm/hevcdsp_init_neon.c
> +++ b/libavcodec/arm/hevcdsp_init_neon.c
> @@ -21,8 +21,16 @@
> #include "libavutil/attributes.h"
> #include "libavutil/arm/cpu.h"
> #include "libavcodec/hevcdsp.h"
> +#include "libavcodec/avcodec.h"
> #include "hevcdsp_arm.h"
> 
> +void ff_hevc_sao_band_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src,
> +  ptrdiff_t stride_dst, ptrdiff_t stride_src,
> +  int16_t *sao_offset_val, int 
> sao_left_class,
> +  int width, int height);
> +void ff_hevc_sao_edge_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src, 
> ptrdiff_t stride_dst, int16_t *sao_offset_val,
> +  int eo, int width, int height);
> +
> void ff_hevc_v_loop_filter_luma_neon(uint8_t *_pix, ptrdiff_t _stride, int 
> _beta, int *_tc, uint8_t *_no_p, uint8_t *_no_q);
> void ff_hevc_h_loop_filter_luma_neon(uint8_t *_pix, ptrdiff_t _stride, int 
> _beta, int *_tc, uint8_t *_no_p, uint8_t *_no_q);
> void ff_hevc_v_loop_filter_chroma_neon(uint8_t *_pix, ptrdiff_t _stride, int 
> *_tc, uint8_t *_no_p, uint8_t *_no_q);
> @@ -142,6 +150,51 @@ QPEL_FUNC_UW(ff_hevc_put_qpel_uw_h3v2_neon_8);
> QPEL_FUNC_UW(ff_hevc_put_qpel_uw_h3v3_neon_8);
> #undef QPEL_FUNC_UW
> 
> +void ff_hevc_sao_band_filter_neon_8(uint8_t *dst, uint8_t *src, ptrdiff_t 
> stride_dst, ptrdiff_t stride_src, int width, int height, int16_t 
> *offset_table);
> +
> +void ff_hevc_sao_band_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src,
> +  ptrdiff_t stride_dst, ptrdiff_t stride_src,
> +  int16_t *sao_offset_val, int 
> sao_left_class,
> +  int width, int height) {
> +uint8_t *dst = (uint8_t *)_dst;
> +uint8_t *src = (uint8_t *)_src;
> +int16_t offset_table[32] = {0};
> +int k;
> +
> +stride_dst /= sizeof(uint8_t);
> +stride_src /= sizeof(uint8_t);
> +
> +for (k = 0; k < 4; k++) {
> 

[FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-07 Thread Yingming Fan
From: Meng Wang 

Signed-off-by: Meng Wang 
---
As FFmpeg hevc decoder have no SAO neon optimization, we add sao_band and 
sao_edge neon codes in this patch.
I have already submit a patch called 'checkasm/hevc_sao : add hevc_sao for 
checkasm' several days ago.
Results below was printed by hevc_sao checkasm on an armv7 device Nexus 5. 
From the results we can see: hevc_sao_band speed up ~2x, hevc_sao_edge speed up 
~4x. 
Also test FATE under armv7 device and MacOS.

hevc_sao_band_8x8_8_c: 804.9
hevc_sao_band_8x8_8_neon: 452.4
hevc_sao_band_16x16_8_c: 2638.1
hevc_sao_band_16x16_8_neon: 1169.9
hevc_sao_band_32x32_8_c: 9259.9
hevc_sao_band_32x32_8_neon: 3956.1
hevc_sao_band_48x48_8_c: 20344.6
hevc_sao_band_48x48_8_neon: 8649.6
hevc_sao_band_64x64_8_c: 35684.6
hevc_sao_band_64x64_8_neon: 15213.1
hevc_sao_edge_8x8_8_c: 1761.6
hevc_sao_edge_8x8_8_neon: 414.6
hevc_sao_edge_16x16_8_c: 6844.4
hevc_sao_edge_16x16_8_neon: 1589.9
hevc_sao_edge_32x32_8_c: 27156.4
hevc_sao_edge_32x32_8_neon: 6116.6
hevc_sao_edge_48x48_8_c: 60004.6
hevc_sao_edge_48x48_8_neon: 13686.4
hevc_sao_edge_64x64_8_c: 106708.1
hevc_sao_edge_64x64_8_neon: 24240.1

 libavcodec/arm/Makefile|   3 +-
 libavcodec/arm/hevcdsp_init_neon.c |  63 +
 libavcodec/arm/hevcdsp_sao_neon.S  | 181 +
 3 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/arm/hevcdsp_sao_neon.S

diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
index 1eeac5449e..2ee913e8a8 100644
--- a/libavcodec/arm/Makefile
+++ b/libavcodec/arm/Makefile
@@ -136,7 +136,8 @@ NEON-OBJS-$(CONFIG_DCA_DECODER)+= 
arm/synth_filter_neon.o
 NEON-OBJS-$(CONFIG_HEVC_DECODER)   += arm/hevcdsp_init_neon.o   \
   arm/hevcdsp_deblock_neon.o\
   arm/hevcdsp_idct_neon.o   \
-  arm/hevcdsp_qpel_neon.o
+  arm/hevcdsp_qpel_neon.o   \
+ arm/hevcdsp_sao_neon.o
 NEON-OBJS-$(CONFIG_RV30_DECODER)   += arm/rv34dsp_neon.o
 NEON-OBJS-$(CONFIG_RV40_DECODER)   += arm/rv34dsp_neon.o\
   arm/rv40dsp_neon.o
diff --git a/libavcodec/arm/hevcdsp_init_neon.c 
b/libavcodec/arm/hevcdsp_init_neon.c
index a4628d2a93..3c480f12f8 100644
--- a/libavcodec/arm/hevcdsp_init_neon.c
+++ b/libavcodec/arm/hevcdsp_init_neon.c
@@ -21,8 +21,16 @@
 #include "libavutil/attributes.h"
 #include "libavutil/arm/cpu.h"
 #include "libavcodec/hevcdsp.h"
+#include "libavcodec/avcodec.h"
 #include "hevcdsp_arm.h"
 
+void ff_hevc_sao_band_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src,
+  ptrdiff_t stride_dst, ptrdiff_t stride_src,
+  int16_t *sao_offset_val, int sao_left_class,
+  int width, int height);
+void ff_hevc_sao_edge_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src, 
ptrdiff_t stride_dst, int16_t *sao_offset_val,
+  int eo, int width, int height);
+
 void ff_hevc_v_loop_filter_luma_neon(uint8_t *_pix, ptrdiff_t _stride, int 
_beta, int *_tc, uint8_t *_no_p, uint8_t *_no_q);
 void ff_hevc_h_loop_filter_luma_neon(uint8_t *_pix, ptrdiff_t _stride, int 
_beta, int *_tc, uint8_t *_no_p, uint8_t *_no_q);
 void ff_hevc_v_loop_filter_chroma_neon(uint8_t *_pix, ptrdiff_t _stride, int 
*_tc, uint8_t *_no_p, uint8_t *_no_q);
@@ -142,6 +150,51 @@ QPEL_FUNC_UW(ff_hevc_put_qpel_uw_h3v2_neon_8);
 QPEL_FUNC_UW(ff_hevc_put_qpel_uw_h3v3_neon_8);
 #undef QPEL_FUNC_UW
 
+void ff_hevc_sao_band_filter_neon_8(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride_dst, ptrdiff_t stride_src, int width, int height, int16_t *offset_table);
+
+void ff_hevc_sao_band_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src,
+  ptrdiff_t stride_dst, ptrdiff_t stride_src,
+  int16_t *sao_offset_val, int sao_left_class,
+  int width, int height) {
+uint8_t *dst = (uint8_t *)_dst;
+uint8_t *src = (uint8_t *)_src;
+int16_t offset_table[32] = {0};
+int k;
+
+stride_dst /= sizeof(uint8_t);
+stride_src /= sizeof(uint8_t);
+
+for (k = 0; k < 4; k++) {
+offset_table[(k + sao_left_class) & 31] = sao_offset_val[k + 1];
+}
+
+ff_hevc_sao_band_filter_neon_8(dst, src, stride_dst, stride_src, width, 
height, offset_table);
+}
+
+void ff_hevc_sao_edge_filter_neon_8(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride_dst, ptrdiff_t stride_src, int width, int height,
+int a_stride, int b_stride, int16_t 
*sao_offset_val, uint8_t *edge_idx);
+
+void ff_hevc_sao_edge_filter_neon_8_wrapper(uint8_t *_dst, uint8_t *_src, 
ptrdiff_t stride_dst, int16_t *sao_offset_val,
+