from:"Shiyou Yin"

Re: [FFmpeg-devel] [PATCH] swscale: [loongarch] Fix undeclared functions prob.

2024-05-29 Thread Shiyou Yin

Ping.

> On May 15, 2024, at 16:57, 金波 wrote:
> 
> Looks good to me.
> 
> On 2024-05-08 18:07:49, "yinshiyou-hf" wrote:
>> When compiling with '--disable-lasx', 'lumRangeFromJpeg_lasx' is undeclared.
>> ---
>> libswscale/loongarch/swscale_init_loongarch.c | 2 ++
>> 1 file changed, 2 insertions(+)
>> 
>> diff --git a/libswscale/loongarch/swscale_init_loongarch.c 
>> b/libswscale/loongarch/swscale_init_loongarch.c
>> index 3a5a7ee856..4af62ad9f8 100644
>> --- a/libswscale/loongarch/swscale_init_loongarch.c
>> +++ b/libswscale/loongarch/swscale_init_loongarch.c
>> @@ -41,6 +41,7 @@ av_cold void 
>> ff_sws_init_range_convert_loongarch(SwsContext *c)
>> }
>> }
>> }
>> +#if HAVE_LASX
>> if (have_lasx(cpu_flags)) {
>> if (c->srcRange != c->dstRange && !isAnyRGB(c->dstFormat)) {
>> if (c->dstBpc <= 14) {
>> @@ -54,6 +55,7 @@ av_cold void 
>> ff_sws_init_range_convert_loongarch(SwsContext *c)
>> }
>> }
>> }
>> +#endif // #if HAVE_LASX
>> }
>> 
>> av_cold void ff_sws_init_swscale_loongarch(SwsContext *c)
>> -- 
>> 2.20.1
>> 
>> ___
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> 
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
> 

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH] swscale: [loongarch] Fix undeclared functions prob.

2024-05-08 Thread Shiyou Yin

When compiling with '--disable-lasx', 'lumRangeFromJpeg_lasx' is undeclared.
---
 libswscale/loongarch/swscale_init_loongarch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libswscale/loongarch/swscale_init_loongarch.c 
b/libswscale/loongarch/swscale_init_loongarch.c
index 3a5a7ee856..4af62ad9f8 100644
--- a/libswscale/loongarch/swscale_init_loongarch.c
+++ b/libswscale/loongarch/swscale_init_loongarch.c
@@ -41,6 +41,7 @@ av_cold void ff_sws_init_range_convert_loongarch(SwsContext 
*c)
 }
 }
 }
+#if HAVE_LASX
 if (have_lasx(cpu_flags)) {
 if (c->srcRange != c->dstRange && !isAnyRGB(c->dstFormat)) {
 if (c->dstBpc <= 14) {
@@ -54,6 +55,7 @@ av_cold void ff_sws_init_range_convert_loongarch(SwsContext 
*c)
 }
 }
 }
+#endif // #if HAVE_LASX
 }
 
 av_cold void ff_sws_init_swscale_loongarch(SwsContext *c)
-- 
2.20.1
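
The guard added by this patch follows a common two-level pattern: a compile-time check (`#if HAVE_LASX`) decides whether the LASX symbols exist at all, and a runtime check decides whether the CPU may use them. A minimal sketch of that idea, assuming hypothetical names (`SKETCH_FLAG_*`, the `range_*_stub` functions, and `pick_range_fn` are illustrative and are not FFmpeg's):

```c
#include <stdint.h>

/* Stands in for the value a configure script would write to config.h;
 * 0 models a build configured with --disable-lasx. */
#ifndef HAVE_LASX
#define HAVE_LASX 0
#endif

/* Hypothetical runtime CPU flags, not FFmpeg's AV_CPU_FLAG_* values. */
#define SKETCH_FLAG_LSX  (1 << 0)
#define SKETCH_FLAG_LASX (1 << 1)

typedef void (*range_fn)(int16_t *dst, int width);

static void range_c_stub(int16_t *dst, int width)   { (void)dst; (void)width; }
static void range_lsx_stub(int16_t *dst, int width) { (void)dst; (void)width; }

#if HAVE_LASX
/* Only declared when LASX is enabled at build time. */
static void range_lasx_stub(int16_t *dst, int width) { (void)dst; (void)width; }
#endif

static range_fn pick_range_fn(int cpu_flags)
{
    range_fn fn = range_c_stub;

    if (cpu_flags & SKETCH_FLAG_LSX)
        fn = range_lsx_stub;
#if HAVE_LASX
    /* Without this guard, a build with HAVE_LASX == 0 would still
     * reference range_lasx_stub and fail with "undeclared". */
    if (cpu_flags & SKETCH_FLAG_LASX)
        fn = range_lasx_stub;
#endif
    return fn;
}
```

With `HAVE_LASX` left at 0, a caller that reports both flags still gets the LSX stub, since the LASX branch is never compiled.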


Re: [FFmpeg-devel] Add optimization in swscale for LA.

2024-04-09 Thread Shiyou Yin



> On Mar 27, 2024, at 03:31, Michael Niedermayer wrote:
> 
> On Tue, Mar 26, 2024 at 11:11:00AM +0800, Shiyou Yin wrote:
>> 
>>> On Mar 16, 2024, at 11:03, Shiyou Yin wrote:
>>> 
>>> [PATCH 1/3] swscale: [LA] Optimize range convert for yuvj420p.
>>> [PATCH 2/3] swscale: [LA] Optimize yuv2plane1_8_c.
>>> [PATCH 3/3] swscale: [LA] Optimize swscale funcs in input.c
>>> 
>> 
>> Hi Michael,
>> Could you please help review this patch set? Thanks.
> 
> I can apply it if it has been reviewed but i cannot review it currently
> 
> thx
> 

Please help apply this patch set; it has been tested and reviewed by my
colleague.

Re: [FFmpeg-devel] Add optimization in swscale for LA.

2024-03-25 Thread Shiyou Yin


> On Mar 16, 2024, at 11:03, Shiyou Yin wrote:
> 
> [PATCH 1/3] swscale: [LA] Optimize range convert for yuvj420p.
> [PATCH 2/3] swscale: [LA] Optimize yuv2plane1_8_c.
> [PATCH 3/3] swscale: [LA] Optimize swscale funcs in input.c
> 

Hi Michael,
Could you please help review this patch set? Thanks.

[FFmpeg-devel] [PATCH 3/3] swscale: [LA] Optimize swscale funcs in input.c

2024-03-15 Thread Shiyou Yin

  xr0,   xr0,   xr0
+beqz a4,2f
+1:
+xvld xr1,   a1,3
+xvld xr2,   a1,35
+addi.d   a4,a4,-1
+addi.d   a1,a1,64
+xvpickev.b   xr3,   xr2,   xr1
+xvpermi.dxr3,   xr3,   0xd8
+xvpackev.b   xr3,   xr0,   xr3
+xvslli.h xr1,   xr3,   6
+xvsrli.h xr2,   xr3,   2
+xvor.v   xr3,   xr2,   xr1
+xvst xr3,   a0,0
+addi.d   a0,a0,32
+bnez a4,1b
+2:
+beqz t0,4f
+3:
+ld.b t1,a1,3
+addi.d   t0,t0,-1
+addi.d   a1,a1,4
+andi t1,t1,0xff
+slli.w   t2,t1,6
+srli.w   t3,t1,2
+or   t1,t2,t3
+st.h t1,a0,0
+addi.d   a0,a0,2
+bnez t0,3b
+4:
+endfunc
diff --git a/libswscale/loongarch/input_lasx.c 
b/libswscale/loongarch/input_lasx.c
index 4830072eaf..0f1d954880 100644
--- a/libswscale/loongarch/input_lasx.c
+++ b/libswscale/loongarch/input_lasx.c
@@ -200,3 +200,46 @@ void planar_rgb_to_y_lasx(uint8_t *_dst, const uint8_t 
*src[4], int width,
 dst[i] = (tem_ry * r + tem_gy * g + tem_by * b + set) >> shift;
 }
 }
+
+av_cold void ff_sws_init_input_lasx(SwsContext *c)
+{
+enum AVPixelFormat srcFormat = c->srcFormat;
+
+switch (srcFormat) {
+case AV_PIX_FMT_YUYV422:
+c->chrToYV12 = yuy2ToUV_lasx;
+break;
+case AV_PIX_FMT_YVYU422:
+c->chrToYV12 = yvy2ToUV_lasx;
+break;
+case AV_PIX_FMT_UYVY422:
+c->chrToYV12 = uyvyToUV_lasx;
+break;
+case AV_PIX_FMT_NV12:
+case AV_PIX_FMT_NV16:
+case AV_PIX_FMT_NV24:
+c->chrToYV12 = nv12ToUV_lasx;
+break;
+case AV_PIX_FMT_NV21:
+case AV_PIX_FMT_NV42:
+c->chrToYV12 = nv21ToUV_lasx;
+break;
+case AV_PIX_FMT_GBRAP:
+case AV_PIX_FMT_GBRP:
+c->readChrPlanar = planar_rgb_to_uv_lasx;
+break;
+}
+
+if (c->needAlpha) {
+switch (srcFormat) {
+case AV_PIX_FMT_BGRA:
+case AV_PIX_FMT_RGBA:
+c->alpToYV12 = rgbaToA_lasx;
+break;
+case AV_PIX_FMT_ABGR:
+case AV_PIX_FMT_ARGB:
+c->alpToYV12 = abgrToA_lasx;
+break;
+}
+}
+}
diff --git a/libswscale/loongarch/input_lsx.c b/libswscale/loongarch/input_lsx.c
new file mode 100644
index 00..1bb04457bb
--- /dev/null
+++ b/libswscale/loongarch/input_lsx.c
@@ -0,0 +1,65 @@
+/*
+ * Copyright (C) 2024 Loongson Technology Corporation Limited
+ * Contributed by Shiyou Yin
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "swscale_loongarch.h"
+
+av_cold void ff_sws_init_input_lsx(SwsContext *c)
+{
+enum AVPixelFormat srcFormat = c->srcFormat;
+
+switch (srcFormat) {
+case AV_PIX_FMT_YUYV422:
+c->chrToYV12 = yuy2ToUV_lsx;
+break;
+case AV_PIX_FMT_YVYU422:
+c->chrToYV12 = yvy2ToUV_lsx;
+break;
+case AV_PIX_FMT_UYVY422:
+c->chrToYV12 = uyvyToUV_lsx;
+break;
+case AV_PIX_FMT_NV12:
+case AV_PIX_FMT_NV16:
+case AV_PIX_FMT_NV24:
+c->chrToYV12 = nv12ToUV_lsx;
+break;
+case AV_PIX_FMT_NV21:
+case AV_PIX_FMT_NV42:
+c->chrToYV12 = nv21ToUV_lsx;
+break;
+case AV_PIX_FMT_GBRAP:
+case AV_PIX_FMT_GBRP:
+c->readChrPlanar = planar_rgb_to_uv_lsx;
+break;
+}
+
+if (c->needAlpha) {
+switch (srcFormat) {
+case AV_PIX_FMT_BGRA:
+case AV_PIX_FMT_RGBA:
+c->alpToYV12 = rgbaToA_lsx;
+break;
+case AV_PIX_FMT_ABGR:
+case AV_PIX_FMT_ARGB:
+c->alpToYV12 = abgrToA_lsx;
+break;
+}
+}
+}
diff --git a/libswscale/loongarch/swscale_init_loongarch.c 
b/libswscale/loongarch/swscale_init_loongarch.c
index 04d2553fa4..3a5a7ee856 100644
--- a/libswscale/loongarch/swscale_init_loongarch.c
+++ b/libswscale/loongarch/swscale_init_loongarch.c
@@ -63,6 +63,7 @@ av_cold void ff_sws_init_swscale_loongarch(SwsContext *c)
 ff_sws_init_output_lsx(c, >yuv

[FFmpeg-devel] [PATCH 2/3] swscale: [LA] Optimize yuv2plane1_8_c.

2024-03-15 Thread Shiyou Yin

---
 libswscale/loongarch/output.S | 254 +-
 libswscale/loongarch/output_lasx.c|  23 +-
 libswscale/loongarch/output_lsx.c |  22 +-
 libswscale/loongarch/swscale_init_loongarch.c |  12 +-
 libswscale/loongarch/swscale_loongarch.h  |  29 +-
 5 files changed, 324 insertions(+), 16 deletions(-)

diff --git a/libswscale/loongarch/output.S b/libswscale/loongarch/output.S
index b44bac502a..d71667e38a 100644
--- a/libswscale/loongarch/output.S
+++ b/libswscale/loongarch/output.S
@@ -23,11 +23,11 @@
 
 #include "libavcodec/loongarch/loongson_asm.S"
 
-/* static void ff_yuv2planeX_8_lsx(const int16_t *filter, int filterSize,
+/* static void yuv2planeX_8_lsx(const int16_t *filter, int filterSize,
  * const int16_t **src, uint8_t *dest, int 
dstW,
  * const uint8_t *dither, int offset)
  */
-function ff_yuv2planeX_8_lsx
+function yuv2planeX_8_lsx
 addi.w  t1, a6, 1
 addi.w  t2, a6, 2
 addi.w  t3, a6, 3
@@ -136,3 +136,253 @@ function ff_yuv2planeX_8_lsx
 blt zero,   a4, .DEST
 .END:
 endfunc
+
+/*
+ * void yuv2plane1_8_lsx(const int16_t *src, uint8_t *dest, int dstW,
+ *   const uint8_t *dither, int offset)
+ */
+function yuv2plane1_8_lsx
+addi.w   t1,a4,1
+addi.w   t2,a4,2
+addi.w   t3,a4,3
+addi.w   t4,a4,4
+addi.w   t5,a4,5
+addi.w   t6,a4,6
+addi.w   t7,a4,7
+andi t0,a4,7
+andi t1,t1,7
+andi t2,t2,7
+andi t3,t3,7
+andi t4,t4,7
+andi t5,t5,7
+andi t6,t6,7
+andi t7,t7,7
+ldx.bu   t0,a3,t0
+ldx.bu   t1,a3,t1
+ldx.bu   t2,a3,t2
+ldx.bu   t3,a3,t3
+ldx.bu   t4,a3,t4
+ldx.bu   t5,a3,t5
+ldx.bu   t6,a3,t6
+ldx.bu   t7,a3,t7
+vinsgr2vr.h  vr1,   t0,0
+vinsgr2vr.h  vr1,   t1,1
+vinsgr2vr.h  vr1,   t2,2
+vinsgr2vr.h  vr1,   t3,3
+vinsgr2vr.h  vr1,   t4,4
+vinsgr2vr.h  vr1,   t5,5
+vinsgr2vr.h  vr1,   t6,6
+vinsgr2vr.h  vr1,   t7,7
+vsub.h   vr0,   vr0,   vr0
+vilvl.h  vr2,   vr0,   vr1
+vilvh.h  vr3,   vr0,   vr1
+
+andi t8,a2,7
+srli.d   a2,a2,3
+beqz a2,2f
+1:
+vld  vr1,   a0,0
+addi.d   a0,a0,16
+vshuf4i.dvr0,   vr1,   8
+vexth.w.hvr4,   vr0
+vexth.w.hvr5,   vr1
+
+vadd.w   vr4,   vr2,   vr4
+vadd.w   vr5,   vr3,   vr5
+vsrai.w  vr4,   vr4,   7
+vsrai.w  vr5,   vr5,   7
+vclip255.w   vr4,   vr4
+vclip255.w   vr5,   vr5
+vpickev.hvr1,   vr5,   vr4
+vpickev.bvr1,   vr1,   vr1
+fst.df1,a1,0
+addi.d   a1,a1,8
+addi.d   a2,a2,-1
+bnez a2,1b
+2:
+beqz t8,4f
+3:
+add.wa4,a4,t8
+addi.w   t1,a4,1
+addi.w   t2,a4,2
+addi.w   t3,a4,3
+addi.w   t4,a4,4
+addi.w   t5,a4,5
+addi.w   t6,a4,6
+addi.w   t7,a4,7
+andi t0,a4,7
+andi t1,t1,7
+andi t2,t2,7
+andi t3,t3,7
+andi t4,t4,7
+andi t5,t5,7
+andi t6,t6,7
+andi t7,t7,7
+ldx.bu   t0,a3,t0
+ldx.bu   t1,a3,t1
+ldx.bu   t2,a3,t2
+ldx.bu   t3,a3,t3
+ldx.bu   t4,a3,t4
+ldx.bu   t5,a3,t5
+ldx.bu   t6,a3,t6
+ldx.bu   t7,a3,t7
+vinsgr2vr.h  vr1,   t0,0
+vinsgr2vr.h  vr1,   t1,1
+vinsgr2vr.h  vr1,   t2,2
+vinsgr2vr.h  vr1,   t3,3
+vinsgr2vr.h  vr1,   t4,4
+vinsgr2vr.h  vr1,   t5,5
+vinsgr2vr.h  vr1,   t6,6
+vinsgr2vr.h  vr1,   t7,7
+vsub.h   vr0,   vr0,   vr0
+vilvl.h  vr2,   vr0,   vr1
+vilvh.h  vr3,   vr0,   vr1
+
+addi.d   a0,a0,-16
+add.da0,a0,t8
+add.da0,a0,t8
+addi.d   a1,a1,-8
+add.da1,a1,t8
+
+vld  vr1,   a0,0
+vshuf4i.dvr0,   vr1,   8
+vexth.w.hvr4,   vr0
+vexth.w.hvr5,   vr1
+
+vadd.w   vr4,   vr2,   vr4
+vadd.w   vr5,   vr3,   vr5
+vsrai.w  vr4,   vr4,   7
+vsrai.w  vr5,   vr5,   7
+vclip255.w   vr4,   vr4
+vclip255.w   vr5,   vr5
+vpickev.hvr1,   vr5,   vr4
+

[FFmpeg-devel] Add optimization in swscale for LA.

2024-03-15 Thread Shiyou Yin

[PATCH 1/3] swscale: [LA] Optimize range convert for yuvj420p.
[PATCH 2/3] swscale: [LA] Optimize yuv2plane1_8_c.
[PATCH 3/3] swscale: [LA] Optimize swscale funcs in input.c


[FFmpeg-devel] [PATCH 1/3] swscale: [LA] Optimize range convert for yuvj420p.

2024-03-15 Thread Shiyou Yin

---
 libswscale/loongarch/swscale.S| 368 ++
 libswscale/loongarch/swscale_init_loongarch.c |  33 ++
 libswscale/loongarch/swscale_loongarch.h  |  11 +
 libswscale/swscale_internal.h |   1 +
 libswscale/utils.c|   6 +-
 5 files changed, 418 insertions(+), 1 deletion(-)

diff --git a/libswscale/loongarch/swscale.S b/libswscale/loongarch/swscale.S
index aa4c5cbe28..67b1bc834d 100644
--- a/libswscale/loongarch/swscale.S
+++ b/libswscale/loongarch/swscale.S
@@ -1866,3 +1866,371 @@ function ff_hscale_16_to_19_sub_lsx
 ld.d s8,  sp, 64
 addi.d   sp,  sp, 72
 endfunc
+
+function lumRangeFromJpeg_lsx
+li.w  t0,14071
+li.w  t1,33561947
+vreplgr2vr.h  vr0,   t0
+srli.wt2,a1,3
+andi  t3,a1,7
+beqz  t2,2f
+1:
+vld   vr1,   a0,0
+vreplgr2vr.w  vr2,   t1
+vreplgr2vr.w  vr3,   t1
+vmaddwev.w.h  vr2,   vr0,   vr1
+vmaddwod.w.h  vr3,   vr0,   vr1
+vsrai.w   vr2,   vr2,   14
+vsrai.w   vr3,   vr3,   14
+vpackev.h vr1,   vr3,   vr2
+vst   vr1,   a0,0
+addi.da0,a0,16
+addi.dt2,t2,-1
+bnez  t2,1b
+2:
+beqz  t3,4f
+3:
+ld.h  t4,a0,0
+mul.w t4,t4,t0
+add.w t4,t4,t1
+srai.wt4,t4,14
+st.h  t4,a0,0
+addi.da0,a0,2
+addi.dt3,t3,-1
+bnez  t3,3b
+4:
+endfunc
+
+function lumRangeFromJpeg_lasx
+li.w   t0,14071
+li.w   t1,33561947
+xvreplgr2vr.h  xr0,   t0
+srli.w t2,a1,4
+andi   t3,a1,15
+beqz   t2,2f
+1:
+xvld   xr1,   a0,0
+xvreplgr2vr.w  xr2,   t1
+xvreplgr2vr.w  xr3,   t1
+xvmaddwev.w.h  xr2,   xr0,   xr1
+xvmaddwod.w.h  xr3,   xr0,   xr1
+xvsrai.w   xr2,   xr2,   14
+xvsrai.w   xr3,   xr3,   14
+xvpackev.h xr1,   xr3,   xr2
+xvst   xr1,   a0,0
+addi.d a0,a0,32
+addi.d t2,t2,-1
+bnez   t2,1b
+2:
+beqz  t3,4f
+3:
+ld.h  t4,a0,0
+mul.w t4,t4,t0
+add.w t4,t4,t1
+srai.wt4,t4,14
+st.h  t4,a0,0
+addi.da0,a0,2
+addi.dt3,t3,-1
+bnez  t3,3b
+4:
+endfunc
+
+function lumRangeToJpeg_lsx
+li.w  t0,19077
+li.w  t1,-39057361
+li.w  t2,30189
+vreplgr2vr.h  vr0,   t0
+vreplgr2vr.h  vr4,   t2
+srli.wt2,a1,3
+andi  t3,a1,7
+beqz  t2,2f
+1:
+vld   vr1,   a0,0
+vreplgr2vr.w  vr2,   t1
+vreplgr2vr.w  vr3,   t1
+vmin.hvr1,   vr1,   vr4
+vmaddwev.w.h  vr2,   vr0,   vr1
+vmaddwod.w.h  vr3,   vr0,   vr1
+vsrai.w   vr2,   vr2,   14
+vsrai.w   vr3,   vr3,   14
+vpackev.h vr1,   vr3,   vr2
+vst   vr1,   a0,0
+addi.da0,a0,16
+addi.dt2,t2,-1
+bnez  t2,1b
+2:
+beqz  t3,4f
+3:
+ld.h  t4,a0,0
+vreplgr2vr.h  vr1,   t4
+vmin.hvr1,   vr1,   vr4
+vpickve2gr.h  t4,vr1,   0
+mul.w t4,t4,t0
+add.w t4,t4,t1
+srai.wt4,t4,14
+st.h  t4,a0,0
+addi.da0,a0,2
+addi.dt3,t3,-1
+bnez  t3,3b
+4:
+endfunc
+
+function lumRangeToJpeg_lasx
+li.w   t0,19077
+li.w   t1,-39057361
+li.w   t2,30189
+xvreplgr2vr.h  xr0,   t0
+xvreplgr2vr.h  xr4,   t2
+srli.w t2,a1,4
+andi   t3,a1,15
+beqz   t2,2f
+1:
+xvld   xr1,   a0,0
+xvreplgr2vr.w  xr2,   t1
+xvreplgr2vr.w  xr3,   t1
+xvmin.hxr1,   xr1,   xr4
+xvmaddwev.w.h  xr2,   xr0,   xr1
+xvmaddwod.w.h  xr3,   xr0,   xr1
+xvsrai.w   xr2,   xr2,   14
+xvsrai.w   xr3,   xr3,   14
+xvpackev.h xr1,   xr3,   xr2
+xvst   xr1,   a0,0
+addi.d a0,a0,32
+addi.d t2,t2,-1
+bnez   t2,1b
+2:
+beqz   t3,4f
+3:
+ld.h   t4,a0,0
+vreplgr2vr.h   vr1,   t4
+vmin.h vr1,   vr1,   vr4
+vpickve2gr.h   t4,vr1,   0
+mul.w  t4,t4,t0
+add.w  t4,t4,t1
+srai.w t4,t4,14
+st.h   t4,a0,0
+addi.d a0,a0,2
+addi.d t3,t3,-1
+bnez
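
The constants loaded in the assembly above (14071, 33561947, 19077, -39057361, 30189, and the shift by 14) describe a fixed-point linear map on each luma sample. A scalar sketch of the same arithmetic, assuming samples are 8-bit values scaled by 128 as in swscale's 15-bit intermediate format; the function names are hypothetical:

```c
#include <stdint.h>

/* Full range (JPEG) to limited range: 0..255 maps to 16..235. */
static void lum_range_from_jpeg_ref(int16_t *dst, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = (int16_t)((dst[i] * 14071 + 33561947) >> 14);
}

/* Limited range to full range: 16..235 maps to 0..255.
 * The input is clamped to 30189 first so the result fits in int16_t,
 * which is the role of vmin.h in the vector code. */
static void lum_range_to_jpeg_ref(int16_t *dst, int width)
{
    for (int i = 0; i < width; i++) {
        int v = dst[i] < 30189 ? dst[i] : 30189;
        dst[i] = (int16_t)((v * 19077 - 39057361) >> 14);
    }
}
```

As a worked check, full-range white is 255 * 128 = 32640; (32640 * 14071 + 33561947) >> 14 gives 30080, which is 235 * 128, the limited-range white level.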

Re: [FFmpeg-devel] [PATCH v3 7/7] avutil/la: Add function performance testing

2023-05-25 Thread Shiyou Yin



> On May 25, 2023, at 10:36, Hao Chen wrote:
> 
> 
> On 2023/5/24 at 7:03 PM, Rémi Denis-Courmont wrote:
>> 
>> On 24 May 2023 at 10:39:59 GMT+03:00, Hao Chen wrote:
>>> On 2023/5/20 at 5:38 PM, Rémi Denis-Courmont wrote:
 On Saturday, 20 May 2023, at 10:27:19 EEST, Hao Chen wrote:
> From: yuanhecai 
> 
> This patch supports the use of the "checkasm --bench" testing feature
> on loongarch platform.
> 
> Change-Id: I42790388d057c9ade0dfa38a19d9c1fd44ca0bc3
> ---
>   libavutil/loongarch/timer.h | 48 +
>   libavutil/timer.h   |  2 ++
>   2 files changed, 50 insertions(+)
>   create mode 100644 libavutil/loongarch/timer.h
> 
> diff --git a/libavutil/loongarch/timer.h b/libavutil/loongarch/timer.h
> new file mode 100644
> index 00..44ed786409
> --- /dev/null
> +++ b/libavutil/loongarch/timer.h
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (c) 2023 Loongson Technology Corporation Limited
> + * Contributed by Hecai Yuan 
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVUTIL_LOONGARCH_TIMER_H
> +#define AVUTIL_LOONGARCH_TIMER_H
> +
> +#include 
> +#include "config.h"
> +
> +#if HAVE_INLINE_ASM
> +
> +#define AV_READ_TIME read_time
> +
> +static inline uint64_t read_time(void)
> +{
> +
> +#if ARCH_LOONGARCH64
> +uint64_t a, id = 0;
 Initial value is never used.
 
> +__asm__ volatile ( "rdtime.d  %0, %1" : "=r"(a), "=r"(id) :: "memory" );
> +return a;
> +#else
> +uint32_t a, id = 0;
> +__asm__ volatile ( "rdtimel.w  %0, %1" : "=r"(a), "=r"(id) :: "memory" );
> +return (uint64_t)a;
> +#endif
 Why do you clobber memory here?
 
> +}
> +
> +#endif /* HAVE_INLINE_ASM */
> +
> +#endif /* AVUTIL_LOONGARCH_TIMER_H */
> diff --git a/libavutil/timer.h b/libavutil/timer.h
> index d3db5a27ef..861ba7e9d7 100644
> --- a/libavutil/timer.h
> +++ b/libavutil/timer.h
> @@ -61,6 +61,8 @@
>   #   include "riscv/timer.h"
>   #elif ARCH_X86
>   #   include "x86/timer.h"
> +#elif ARCH_LOONGARCH
> +#   include "loongarch/timer.h"
>   #endif
> 
>   #if !defined(AV_READ_TIME)
> 
 Thanks for your advice.  As described in loongarch's instruction 
 manual, the rdtime.d instruction is used as follows:
 rdtime.d rd, rj. The rj register stores the counter ID. In this 
 application, the value of counter ID is equal to 0.
>> You're setting a value, zero, to a variable `id`, that is then used as 
>> output operand. As far as the compiler is concerned, the value zero is never 
>> used and the initialisation can be elided. The value of register %1 is 
>> unspecified.
>> 
>> If you meant for `id` to be an input operand, the constraints are incorrect.
>> 
> 
> 
> You are right! Thank you very much for your reminder. I will correct it.
> 

id is an output operand, so the constraints are correct; the initialization of
id is not necessary.
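
The point under discussion can be sketched as follows: if the instruction writes both registers, both operands belong in the output list with "=r", and neither variable needs an initial value. This is an illustration under assumptions, not the submitted patch: `read_time_sketch` is a hypothetical name, the "memory" clobber questioned in the review is left out on the assumption that no ordering against memory accesses is required, and a portable fallback is included only so the sketch builds on other architectures.

```c
#include <stdint.h>
#include <time.h>

static inline uint64_t read_time_sketch(void)
{
#if defined(__loongarch__) && defined(__loongarch_grlen) && __loongarch_grlen == 64
    uint64_t count, id;

    /* Both operands are written by the instruction, so both are
     * declared as outputs ("=r") and left uninitialized. */
    __asm__ volatile ("rdtime.d %0, %1" : "=r"(count), "=r"(id));
    (void)id; /* the counter ID is not needed here */
    return count;
#else
    /* Fallback for other targets: wall-clock time in nanoseconds. */
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}
```

If `id` were meant to select a counter, it would instead have to appear in the input list with an "r" constraint, which is the distinction Rémi raised.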


Re: [FFmpeg-devel] [PATCH v4 1/7] avcodec/la: add LSX optimization for h264 idct.

2023-05-24 Thread Shiyou Yin



> On May 25, 2023, at 05:28, Michael Niedermayer wrote:
> 
> On Wed, May 24, 2023 at 03:48:27PM +0800, Hao Chen wrote:
>> From: Shiyou Yin 
>> 
>> loongson_asm.S is LoongArch asm optimization helper.
>> Add functions:
>> ff_h264_idct_add_8_lsx
>> ff_h264_idct8_add_8_lsx
>> ff_h264_idct_dc_add_8_lsx
>> ff_h264_idct8_dc_add_8_lsx
>> ff_h264_idct_add16_8_lsx
>> ff_h264_idct8_add4_8_lsx
>> ff_h264_idct_add8_8_lsx
>> ff_h264_idct_add8_422_8_lsx
>> ff_h264_idct_add16_intra_8_lsx
>> ff_h264_luma_dc_dequant_idct_8_lsx
>> Replaced function(LSX is sufficient for these functions):
>> ff_h264_idct_add_lasx
>> ff_h264_idct4x4_addblk_dc_lasx
>> ff_h264_idct_add16_lasx
>> ff_h264_idct8_add4_lasx
>> ff_h264_idct_add8_lasx
>> ff_h264_idct_add8_422_lasx
>> ff_h264_idct_add16_intra_lasx
>> ff_h264_deq_idct_luma_dc_lasx
>> Renamed functions:
>> ff_h264_idct8_addblk_lasx ==> ff_h264_idct8_add_8_lasx
>> ff_h264_idct8_dc_addblk_lasx ==> ff_h264_idct8_dc_add_8_lasx
>> 
>> ./configure --disable-lasx
>> ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -y /dev/null -an
>> before: 155fps
>> after: 161fps
>> ---
>> libavcodec/loongarch/Makefile | 3 +-
>> libavcodec/loongarch/h264_deblock_lasx.c | 2 +-
>> libavcodec/loongarch/h264dsp_init_loongarch.c | 39 +-
>> libavcodec/loongarch/h264dsp_lasx.c | 2 +-
>> .../{h264dsp_lasx.h => h264dsp_loongarch.h} | 60 +-
>> libavcodec/loongarch/h264idct.S | 658 
>> libavcodec/loongarch/h264idct_lasx.c | 498 -
>> libavcodec/loongarch/h264idct_loongarch.c | 184 
>> libavcodec/loongarch/loongson_asm.S | 945 ++
>> 9 files changed, 1848 insertions(+), 543 deletions(-)
>> rename libavcodec/loongarch/{h264dsp_lasx.h => h264dsp_loongarch.h} (68%)
>> create mode 100644 libavcodec/loongarch/h264idct.S
>> delete mode 100644 libavcodec/loongarch/h264idct_lasx.c
>> create mode 100644 libavcodec/loongarch/h264idct_loongarch.c
>> create mode 100644 libavcodec/loongarch/loongson_asm.S
> 
> Applying: avcodec/la: add LSX optimization for h264 idct.
> .git/rebase-apply/patch:1431: tab in indent.
>   } else if (nnz) {
> warning: 1 line adds whitespace errors.
> 
Thanks, I will set core.whitespace in gitconfig to avoid these errors.


Re: [FFmpeg-devel] Add LSX optimization in avcodec and swscale.

2023-05-21 Thread Shiyou Yin



> On May 20, 2023, at 15:27, Hao Chen wrote:
> 
> Retrigger the fate test.
> v1: Add LSX optimization in avcodec and swscale, because the 2K series CPUs 
> only support LSX.
> v2: Modified the implementation of some functions and added support for the 
> checkasm --bench feature.
> v3: Fix whitespace errors in patch.
> 
> [PATCH v3 1/7] avcodec/la: add LSX optimization for h264 idct.
> [PATCH v3 2/7] avcodec/la: Add LSX optimization for loop filter.
> [PATCH v3 3/7] avcodec/la: Add LSX optimization for h264 chroma and
> [PATCH v3 4/7] avcodec/la: Add LSX optimization for h264 qpel.
> [PATCH v3 5/7] swscale/la: Optimize the functions of the swscale
> [PATCH v3 6/7] swscale/la: Add following builtin optimized functions
> [PATCH v3 7/7] avutil/la: Add function performance testing
> 
> 
LGTM.

Michael, please help review and merge this PR.
FFmpeg recently added checkasm for h264chroma, and this PR happens to fix the
failure on LA.



Re: [FFmpeg-devel] Add LSX optimization in avcodec and swscale.

2023-05-17 Thread Shiyou Yin



> On May 17, 2023, at 15:03, Hao Chen wrote:
> 
> v1: Add LSX optimization in avcodec and swscale, because the 2K series CPUs 
> only support LSX.
> v2: Modified the implementation of some functions and added support for the 
> checkasm --bench feature.
> 
> [PATCH v2 1/7] avcodec/la: add LSX optimization for h264 idct.
> [PATCH v2 2/7] avcodec/la: Add LSX optimization for loop filter.
> [PATCH v2 3/7] avcodec/la: Add LSX optimization for h264 chroma and
> [PATCH v2 4/7] avcodec/la: Add LSX optimization for h264 qpel.
> [PATCH v2 5/7] swscale/la: Optimize the functions of the swscale.
> [PATCH v2 6/7] swscale/la: Add following builtin optimized functions.
> [PATCH v2 7/7] avutil/la: Add function performance testing.
> 
> ___
> 

LGTM


Re: [FFmpeg-devel] [PATCH v2 1/7] avcodec/la: add LSX optimization for h264 idct.

2023-05-17 Thread Shiyou Yin



> On May 17, 2023, at 15:03, Hao Chen wrote:
> 
> From: Shiyou Yin 
> 
> loongson_asm.S is LoongArch asm optimization helper.
> Add functions:
>  ff_h264_idct_add_8_lsx
>  ff_h264_idct8_add_8_lsx
>  ff_h264_idct_dc_add_8_lsx
>  ff_h264_idct8_dc_add_8_lsx
>  ff_h264_idct_add16_8_lsx
>  ff_h264_idct8_add4_8_lsx
>  ff_h264_idct_add8_8_lsx
>  ff_h264_idct_add8_422_8_lsx
>  ff_h264_idct_add16_intra_8_lsx
>  ff_h264_luma_dc_dequant_idct_8_lsx
> Replaced function(LSX is sufficient for these functions):
>  ff_h264_idct_add_lasx
>  ff_h264_idct4x4_addblk_dc_lasx
>  ff_h264_idct_add16_lasx
>  ff_h264_idct8_add4_lasx
>  ff_h264_idct_add8_lasx
>  ff_h264_idct_add8_422_lasx
>  ff_h264_idct_add16_intra_lasx
>  ff_h264_deq_idct_luma_dc_lasx
> Renamed functions:
>  ff_h264_idct8_addblk_lasx ==> ff_h264_idct8_add_8_lasx
>  ff_h264_idct8_dc_addblk_lasx ==> ff_h264_idct8_dc_add_8_lasx
> 
> ./configure --disable-lasx
> ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -y /dev/null -an
> before: 155fps
> after:  161fps
> ---
> libavcodec/loongarch/Makefile |   3 +-
> libavcodec/loongarch/h264_deblock_lasx.c  |   2 +-
> libavcodec/loongarch/h264dsp_init_loongarch.c |  39 +-
> libavcodec/loongarch/h264dsp_lasx.c   |   2 +-
> .../{h264dsp_lasx.h => h264dsp_loongarch.h}   |  60 +-
> libavcodec/loongarch/h264idct.S   | 659 
> libavcodec/loongarch/h264idct_lasx.c  | 498 -
> libavcodec/loongarch/h264idct_loongarch.c | 185 
> libavcodec/loongarch/loongson_asm.S   | 946 ++
> 9 files changed, 1851 insertions(+), 543 deletions(-)
> rename libavcodec/loongarch/{h264dsp_lasx.h => h264dsp_loongarch.h} (68%)
> create mode 100644 libavcodec/loongarch/h264idct.S
> delete mode 100644 libavcodec/loongarch/h264idct_lasx.c
> create mode 100644 libavcodec/loongarch/h264idct_loongarch.c
> create mode 100644 libavcodec/loongarch/loongson_asm.S
> 

LGTM

Re: [FFmpeg-devel] [PATCH v1 3/6] avcodec/la: Add LSX optimization for h264 chroma and intrapred.

2023-05-11 Thread Shiyou Yin


> On May 4, 2023, at 16:49, Hao Chen wrote:

> diff --git a/libavcodec/loongarch/h264chroma_loongarch.h 
> b/libavcodec/loongarch/h264chroma_loongarch.h
> new file mode 100644
> index 00..26a7155389
> --- /dev/null
> +++ b/libavcodec/loongarch/h264chroma_loongarch.h
> @@ -0,0 +1,43 @@
> +/*
> + * Copyright (c) 2023 Loongson Technology Corporation Limited
> + * Contributed by Shiyou Yin <yinshiyou...@loongson.cn>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVCODEC_LOONGARCH_H264CHROMA_LOONGARCH_H
> +#define AVCODEC_LOONGARCH_H264CHROMA_LOONGARCH_H
> +
> +#include 
> +#include 
> +#include "libavcodec/h264.h"
> +
stdint.h and stddef.h are not necessary.

Re: [FFmpeg-devel] [PATCH] avutil: [LA] use getauxval to do runtime check.

2023-02-26 Thread Shiyou Yin



> On Feb 23, 2023, at 15:48, Steven Liu wrote:
> 
> On Tue, Feb 14, 2023, at 20:26, Shiyou Yin wrote:
>> 
>> Replace cpucfg with getauxval to avoid a crash when some
>> processor capabilities are not supported by the running kernel.
>> ---
>> libavutil/loongarch/cpu.c | 24 
>> 1 file changed, 8 insertions(+), 16 deletions(-)
>> 
>> diff --git a/libavutil/loongarch/cpu.c b/libavutil/loongarch/cpu.c
>> index e4b240bc44..cad8504fde 100644
>> --- a/libavutil/loongarch/cpu.c
>> +++ b/libavutil/loongarch/cpu.c
>> @@ -21,26 +21,18 @@
>> 
>> #include 
>> #include "cpu.h"
>> +#include <sys/auxv.h>
>> 
>> -#define LOONGARCH_CFG2 0x2
>> -#define LOONGARCH_CFG2_LSX    (1 << 6)
>> -#define LOONGARCH_CFG2_LASX   (1 << 7)
>> -
>> -static int cpu_flags_cpucfg(void)
>> +#define LA_HWCAP_LSX    (1<<4)
>> +#define LA_HWCAP_LASX   (1<<5)
>> +static int cpu_flags_getauxval(void)
>> {
>> int flags = 0;
>> -uint32_t cfg2 = 0;
>> -
>> -__asm__ volatile(
>> -"cpucfg %0, %1 \n\t"
>> -: "+"(cfg2)
>> -: "r"(LOONGARCH_CFG2)
>> -);
>> +int flag  = (int)getauxval(AT_HWCAP);
>> 
>> -if (cfg2 & LOONGARCH_CFG2_LSX)
>> +if (flag & LA_HWCAP_LSX)
>> flags |= AV_CPU_FLAG_LSX;
>> -
>> -if (cfg2 & LOONGARCH_CFG2_LASX)
>> +if (flag & LA_HWCAP_LASX)
>> flags |= AV_CPU_FLAG_LASX;
>> 
>> return flags;
>> @@ -49,7 +41,7 @@ static int cpu_flags_cpucfg(void)
>> int ff_get_cpu_flags_loongarch(void)
>> {
>> #if defined __linux__
>> -return cpu_flags_cpucfg();
>> +return cpu_flags_getauxval();
>> #else
>> /* Assume no SIMD ASE supported */
>> return 0;
>> --
>> 2.20.1
>> 
> 
> 
> LGTM
> 
> Thanks
> Steven
> ___

Could you please help merge this patch?

[FFmpeg-devel] [PATCH] avutil: [LA] use getauxval to do runtime check.

2023-02-14 Thread Shiyou Yin

Replace cpucfg with getauxval to avoid a crash when some
processor capabilities are not supported by the running kernel.
---
 libavutil/loongarch/cpu.c | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/libavutil/loongarch/cpu.c b/libavutil/loongarch/cpu.c
index e4b240bc44..cad8504fde 100644
--- a/libavutil/loongarch/cpu.c
+++ b/libavutil/loongarch/cpu.c
@@ -21,26 +21,18 @@
 
 #include 
 #include "cpu.h"
+#include <sys/auxv.h>
 
-#define LOONGARCH_CFG2 0x2
-#define LOONGARCH_CFG2_LSX    (1 << 6)
-#define LOONGARCH_CFG2_LASX   (1 << 7)
-
-static int cpu_flags_cpucfg(void)
+#define LA_HWCAP_LSX    (1<<4)
+#define LA_HWCAP_LASX   (1<<5)
+static int cpu_flags_getauxval(void)
 {
 int flags = 0;
-uint32_t cfg2 = 0;
-
-__asm__ volatile(
-"cpucfg %0, %1 \n\t"
-: "+"(cfg2)
-: "r"(LOONGARCH_CFG2)
-);
+int flag  = (int)getauxval(AT_HWCAP);
 
-if (cfg2 & LOONGARCH_CFG2_LSX)
+if (flag & LA_HWCAP_LSX)
 flags |= AV_CPU_FLAG_LSX;
-
-if (cfg2 & LOONGARCH_CFG2_LASX)
+if (flag & LA_HWCAP_LASX)
 flags |= AV_CPU_FLAG_LASX;
 
 return flags;
@@ -49,7 +41,7 @@ static int cpu_flags_cpucfg(void)
 int ff_get_cpu_flags_loongarch(void)
 {
 #if defined __linux__
-return cpu_flags_cpucfg();
+return cpu_flags_getauxval();
 #else
 /* Assume no SIMD ASE supported */
 return 0;
-- 
2.20.1
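
The motivation for this change is that `cpucfg` reports what the processor implements, while the auxiliary vector reports what the running kernel exposes to user space, so the latter is the safer thing to test before using SIMD code. A self-contained sketch of the detection logic, using the HWCAP bit positions from the patch; `SKETCH_FLAG_*`, `hwcap_to_flags`, and `detect_flags_sketch` are hypothetical names, and the bit mapping is separated out only so it can be exercised on any machine:

```c
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Bit positions taken from the patch above. */
#define LA_HWCAP_LSX   (1 << 4)
#define LA_HWCAP_LASX  (1 << 5)

/* Hypothetical result flags, not FFmpeg's AV_CPU_FLAG_* values. */
#define SKETCH_FLAG_LSX  (1 << 0)
#define SKETCH_FLAG_LASX (1 << 1)

/* Pure mapping from a HWCAP word to feature flags. */
static int hwcap_to_flags(unsigned long hwcap)
{
    int flags = 0;

    if (hwcap & LA_HWCAP_LSX)
        flags |= SKETCH_FLAG_LSX;
    if (hwcap & LA_HWCAP_LASX)
        flags |= SKETCH_FLAG_LASX;
    return flags;
}

static int detect_flags_sketch(void)
{
#if defined(__linux__) && defined(__loongarch__)
    /* The kernel only sets these bits for extensions it has enabled. */
    return hwcap_to_flags(getauxval(AT_HWCAP));
#else
    /* HWCAP bits mean different things on other architectures. */
    return 0;
#endif
}
```

The extra `__loongarch__` check is a choice made for this sketch: on other Linux targets the same bit positions carry unrelated meanings, so the function reports no SIMD support there.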


Re: [FFmpeg-devel] [PATCH 3/4] avcodec/mobiclip: Check input size before (re)allocation

2022-11-18 Thread Shiyou Yin

Please ignore the failure reports from the loongarch64 patchwork runner for now.
I stopped the service on Nov 16th, but results are still being posted to
patchwork.
I have mailed Andriy so that we can analyze this problem together.

> On 2022-11-19 05:09, Michael Niedermayer wrote:
> 
> Fixes: Timeout
> Fixes: 
> 52566/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MOBICLIP_fuzzer-4913160050311168
> 
> Found-by: continuous fuzzing process 
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
> libavcodec/mobiclip.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/libavcodec/mobiclip.c b/libavcodec/mobiclip.c
> index aca462428c..c3b2383dbc 100644
> --- a/libavcodec/mobiclip.c
> +++ b/libavcodec/mobiclip.c
> @@ -1216,6 +1216,9 @@ static int mobiclip_decode(AVCodecContext *avctx, 
> AVFrame *rframe,
> AVFrame *frame = s->pic[s->current_pic];
> int ret;
> 
> +if (avctx->height/16 * (avctx->width/16) * 2 > 8LL*FFALIGN(pkt->size, 2))
> +return AVERROR_INVALIDDATA;
> +
> av_fast_padded_malloc(&s->bitstream, &s->bitstream_size,
>   pkt->size);
> 
> -- 
> 2.17.1
> 


Re: [FFmpeg-devel] [PATCH v1 3/3] swscale/la: Add output_lasx.c file.

2022-09-20 Thread Shiyou Yin



> On 2022-09-11 10:06, Shiyou Yin wrote:
> 
> 
> 
>> On 2022-09-09 21:11, Andreas Rheinhardt <andreas.rheinha...@outlook.com> wrote:
>> 
>> Shiyou Yin:
>>> 
>>> 
>>>> On 2022-09-06 16:12, Shiyou Yin <yinshiyou...@loongson.cn> wrote:
>>>> 
>>>>> 
>>>>> On 2022-08-29 20:30, Andreas Rheinhardt <andreas.rheinha...@outlook.com> wrote:
>>>>> 
>>>>> Hao Chen:
>>>>>> ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 
>>>>>> -pix_fmt
>>>>>> rgb24 -y /dev/null -an
>>>>>> before: 150fps
>>>>>> after: 183fps
>>>>>> 
>>>>>> Signed-off-by: Hao Chen <chen...@loongson.cn>
>>>>>> ---
>>>>>> libswscale/loongarch/Makefile | 3 +-
>>>>>> libswscale/loongarch/output_lasx.c | 1982 +
>>>>>> libswscale/loongarch/swscale_init_loongarch.c | 3 +
>>>>>> libswscale/loongarch/swscale_loongarch.h | 6 +
>>>>>> 4 files changed, 1993 insertions(+), 1 deletion(-)
>>>>>> create mode 100644 libswscale/loongarch/output_lasx.c
>>>>>> 
>>> 
>>>>>> +static void
>>>>>> +yuv2rgb_2_template_lasx(SwsContext *c, const int16_t *buf[2],
>>>>>> + const int16_t *ubuf[2], const int16_t *vbuf[2],
>>>>>> + const int16_t *abuf[2], uint8_t *dest, int dstW,
>>>>>> + int yalpha, int uvalpha, int y,
>>>>>> + enum AVPixelFormat target, int hasAlpha)
>>>>>> +{
>>>>>> + const int16_t *buf0 = buf[0], *buf1 = buf[1],
>>>>>> + *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
>>>>>> + *vbuf0 = vbuf[0], *vbuf1 = vbuf[1];
>>>>>> + int yalpha1 = 4096 - yalpha;
>>>>>> + int uvalpha1 = 4096 - uvalpha;
>>>>>> + int i, count = 0;
>>>>>> + int len = dstW - 15;
>>>>>> + int len_count = (dstW + 1) >> 1;
>>>>>> + const void *r, *g, *b;
>>>>>> + int head = YUVRGB_TABLE_HEADROOM;
>>>>>> + __m256i v_yalpha1 = __lasx_xvreplgr2vr_w(yalpha1);
>>>>>> + __m256i v_uvalpha1 = __lasx_xvreplgr2vr_w(uvalpha1);
>>>>>> + __m256i v_yalpha = __lasx_xvreplgr2vr_w(yalpha);
>>>>>> + __m256i v_uvalpha = __lasx_xvreplgr2vr_w(uvalpha);
>>>>>> + __m256i headroom = __lasx_xvreplgr2vr_w(head);
>>>>>> +
>>>>>> + for (i = 0; i < len; i += 16) {
>>>>>> + int Y1, Y2, U, V;
>>>>>> + int i_dex = i << 1;
>>>>>> + int c_dex = count << 1;
>>>>>> + __m256i y0_h, y0_l, y0, u0, v0;
>>>>>> + __m256i y1_h, y1_l, y1, u1, v1;
>>>>>> + __m256i y_l, y_h, u, v;
>>>>>> +
>>>>>> + DUP4_ARG2(__lasx_xvldx, buf0, i_dex, ubuf0, c_dex, vbuf0, c_dex,
>>>>>> + buf1, i_dex, y0, u0, v0, y1);
>>>>>> + DUP2_ARG2(__lasx_xvldx, ubuf1, c_dex, vbuf1, c_dex, u1, v1);
>>>>>> + DUP2_ARG2(__lasx_xvsllwil_w_h, y0, 0, y1, 0, y0_l, y1_l);
>>>>>> + DUP2_ARG1(__lasx_xvexth_w_h, y0, y1, y0_h, y1_h);
>>>>>> + DUP4_ARG1(__lasx_vext2xv_w_h, u0, u1, v0, v1, u0, u1, v0, v1);
>>>>>> + y0_l = __lasx_xvmul_w(y0_l, v_yalpha1);
>>>>>> + y0_h = __lasx_xvmul_w(y0_h, v_yalpha1);
>>>>>> + u0 = __lasx_xvmul_w(u0, v_uvalpha1);
>>>>>> + v0 = __lasx_xvmul_w(v0, v_uvalpha1);
>>>>>> + y_l = __lasx_xvmadd_w(y0_l, v_yalpha, y1_l);
>>>>>> + y_h = __lasx_xvmadd_w(y0_h, v_yalpha, y1_h);
>>>>>> + u = __lasx_xvmadd_w(u0, v_uvalpha, u1);
>>>>>> + v = __lasx_xvmadd_w(v0, v_uvalpha, v1);
>>>>>> + y_l = __lasx_xvsrai_w(y_l, 19);
>>>>>> + y_h = __lasx_xvsrai_w(y_h, 19);
>>>>>> + u = __lasx_xvsrai_w(u, 19);
>>>>>> + v = __lasx_xvsrai_w(v, 19);
>>>>>> + u = __lasx_xvadd_w(u, headroom);
>>>>>> + v = __lasx_xvadd_w(v, headroom);
>>>>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 0, 1, 0, 0);
>>>>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 2, 3, 1, 1);
>>>>>> + WRITE_YUV2R

Re: [FFmpeg-devel] [PATCH v1 3/3] swscale/la: Add output_lasx.c file.

2022-09-10 Thread Shiyou Yin



> On 2022-09-09 21:11, Andreas Rheinhardt wrote:
> 
> Shiyou Yin:
>> 
>> 
>>> On 2022-09-06 16:12, Shiyou Yin wrote:
>>> 
>>>> 
>>>> On 2022-08-29 20:30, Andreas Rheinhardt <andreas.rheinha...@outlook.com> wrote:
>>>> 
>>>> Hao Chen:
>>>>> ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 
>>>>> -pix_fmt
>>>>> rgb24 -y /dev/null -an
>>>>> before: 150fps
>>>>> after: 183fps
>>>>> 
>>>>> Signed-off-by: Hao Chen 
>>>>> ---
>>>>> libswscale/loongarch/Makefile | 3 +-
>>>>> libswscale/loongarch/output_lasx.c | 1982 +
>>>>> libswscale/loongarch/swscale_init_loongarch.c | 3 +
>>>>> libswscale/loongarch/swscale_loongarch.h | 6 +
>>>>> 4 files changed, 1993 insertions(+), 1 deletion(-)
>>>>> create mode 100644 libswscale/loongarch/output_lasx.c
>>>>> 
>> 
>>>>> +static void
>>>>> +yuv2rgb_2_template_lasx(SwsContext *c, const int16_t *buf[2],
>>>>> + const int16_t *ubuf[2], const int16_t *vbuf[2],
>>>>> + const int16_t *abuf[2], uint8_t *dest, int dstW,
>>>>> + int yalpha, int uvalpha, int y,
>>>>> + enum AVPixelFormat target, int hasAlpha)
>>>>> +{
>>>>> + const int16_t *buf0 = buf[0], *buf1 = buf[1],
>>>>> + *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
>>>>> + *vbuf0 = vbuf[0], *vbuf1 = vbuf[1];
>>>>> + int yalpha1 = 4096 - yalpha;
>>>>> + int uvalpha1 = 4096 - uvalpha;
>>>>> + int i, count = 0;
>>>>> + int len = dstW - 15;
>>>>> + int len_count = (dstW + 1) >> 1;
>>>>> + const void *r, *g, *b;
>>>>> + int head = YUVRGB_TABLE_HEADROOM;
>>>>> + __m256i v_yalpha1 = __lasx_xvreplgr2vr_w(yalpha1);
>>>>> + __m256i v_uvalpha1 = __lasx_xvreplgr2vr_w(uvalpha1);
>>>>> + __m256i v_yalpha = __lasx_xvreplgr2vr_w(yalpha);
>>>>> + __m256i v_uvalpha = __lasx_xvreplgr2vr_w(uvalpha);
>>>>> + __m256i headroom = __lasx_xvreplgr2vr_w(head);
>>>>> +
>>>>> + for (i = 0; i < len; i += 16) {
>>>>> + int Y1, Y2, U, V;
>>>>> + int i_dex = i << 1;
>>>>> + int c_dex = count << 1;
>>>>> + __m256i y0_h, y0_l, y0, u0, v0;
>>>>> + __m256i y1_h, y1_l, y1, u1, v1;
>>>>> + __m256i y_l, y_h, u, v;
>>>>> +
>>>>> + DUP4_ARG2(__lasx_xvldx, buf0, i_dex, ubuf0, c_dex, vbuf0, c_dex,
>>>>> + buf1, i_dex, y0, u0, v0, y1);
>>>>> + DUP2_ARG2(__lasx_xvldx, ubuf1, c_dex, vbuf1, c_dex, u1, v1);
>>>>> + DUP2_ARG2(__lasx_xvsllwil_w_h, y0, 0, y1, 0, y0_l, y1_l);
>>>>> + DUP2_ARG1(__lasx_xvexth_w_h, y0, y1, y0_h, y1_h);
>>>>> + DUP4_ARG1(__lasx_vext2xv_w_h, u0, u1, v0, v1, u0, u1, v0, v1);
>>>>> + y0_l = __lasx_xvmul_w(y0_l, v_yalpha1);
>>>>> + y0_h = __lasx_xvmul_w(y0_h, v_yalpha1);
>>>>> + u0 = __lasx_xvmul_w(u0, v_uvalpha1);
>>>>> + v0 = __lasx_xvmul_w(v0, v_uvalpha1);
>>>>> + y_l = __lasx_xvmadd_w(y0_l, v_yalpha, y1_l);
>>>>> + y_h = __lasx_xvmadd_w(y0_h, v_yalpha, y1_h);
>>>>> + u = __lasx_xvmadd_w(u0, v_uvalpha, u1);
>>>>> + v = __lasx_xvmadd_w(v0, v_uvalpha, v1);
>>>>> + y_l = __lasx_xvsrai_w(y_l, 19);
>>>>> + y_h = __lasx_xvsrai_w(y_h, 19);
>>>>> + u = __lasx_xvsrai_w(u, 19);
>>>>> + v = __lasx_xvsrai_w(v, 19);
>>>>> + u = __lasx_xvadd_w(u, headroom);
>>>>> + v = __lasx_xvadd_w(v, headroom);
>>>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 0, 1, 0, 0);
>>>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 2, 3, 1, 1);
>>>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 0, 1, 2, 2);
>>>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 2, 3, 3, 3);
>>>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 4, 5, 4, 4);
>>>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 6, 7, 5, 5);
>>>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 4, 5, 6, 6);
>>>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 6, 7, 7, 7);
>>>>> + }
>>>>> + if (dstW - i >= 8) {
>>>>> + int Y1, Y2, U, V;
>>>>> + int i_dex = i << 1;
>>>>> + __m256i y0_l, y0, u0, v0;
>>>>> + __m256i y1_l, y1, u1, v1;
>>>>> + __m256i y_l, u, v;
>>>>> +
>>&

Re: [FFmpeg-devel] Fix bugs on Mips platform.

2022-09-09 Thread Shiyou Yin



> On 2022-09-09 17:41, Hao Chen wrote:
> 
> v2: Modifies the format of the Commit message.
> 
> [PATCH v2 1/2] lavc/mips: Fix bugs in me_cmp_msa.c file.
> [PATCH v2 2/2] lavc/mips: Fix hevc decoding bugs on MIPS platform.
> 
> 
LGTM.


Re: [FFmpeg-devel] [PATCH v1 3/3] swscale/la: Add output_lasx.c file.

2022-09-08 Thread Shiyou Yin



> On 2022-09-06 16:12, Shiyou Yin wrote:
> 
>> 
>> On 2022-08-29 20:30, Andreas Rheinhardt <andreas.rheinha...@outlook.com> wrote:
>> 
>> Hao Chen:
>>> ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 
>>> -pix_fmt
>>> rgb24 -y /dev/null -an
>>> before: 150fps
>>> after: 183fps
>>> 
>>> Signed-off-by: Hao Chen 
>>> ---
>>> libswscale/loongarch/Makefile | 3 +-
>>> libswscale/loongarch/output_lasx.c | 1982 +
>>> libswscale/loongarch/swscale_init_loongarch.c | 3 +
>>> libswscale/loongarch/swscale_loongarch.h | 6 +
>>> 4 files changed, 1993 insertions(+), 1 deletion(-)
>>> create mode 100644 libswscale/loongarch/output_lasx.c
>>> 

>>> +static void
>>> +yuv2rgb_2_template_lasx(SwsContext *c, const int16_t *buf[2],
>>> + const int16_t *ubuf[2], const int16_t *vbuf[2],
>>> + const int16_t *abuf[2], uint8_t *dest, int dstW,
>>> + int yalpha, int uvalpha, int y,
>>> + enum AVPixelFormat target, int hasAlpha)
>>> +{
>>> + const int16_t *buf0 = buf[0], *buf1 = buf[1],
>>> + *ubuf0 = ubuf[0], *ubuf1 = ubuf[1],
>>> + *vbuf0 = vbuf[0], *vbuf1 = vbuf[1];
>>> + int yalpha1 = 4096 - yalpha;
>>> + int uvalpha1 = 4096 - uvalpha;
>>> + int i, count = 0;
>>> + int len = dstW - 15;
>>> + int len_count = (dstW + 1) >> 1;
>>> + const void *r, *g, *b;
>>> + int head = YUVRGB_TABLE_HEADROOM;
>>> + __m256i v_yalpha1 = __lasx_xvreplgr2vr_w(yalpha1);
>>> + __m256i v_uvalpha1 = __lasx_xvreplgr2vr_w(uvalpha1);
>>> + __m256i v_yalpha = __lasx_xvreplgr2vr_w(yalpha);
>>> + __m256i v_uvalpha = __lasx_xvreplgr2vr_w(uvalpha);
>>> + __m256i headroom = __lasx_xvreplgr2vr_w(head);
>>> +
>>> + for (i = 0; i < len; i += 16) {
>>> + int Y1, Y2, U, V;
>>> + int i_dex = i << 1;
>>> + int c_dex = count << 1;
>>> + __m256i y0_h, y0_l, y0, u0, v0;
>>> + __m256i y1_h, y1_l, y1, u1, v1;
>>> + __m256i y_l, y_h, u, v;
>>> +
>>> + DUP4_ARG2(__lasx_xvldx, buf0, i_dex, ubuf0, c_dex, vbuf0, c_dex,
>>> + buf1, i_dex, y0, u0, v0, y1);
>>> + DUP2_ARG2(__lasx_xvldx, ubuf1, c_dex, vbuf1, c_dex, u1, v1);
>>> + DUP2_ARG2(__lasx_xvsllwil_w_h, y0, 0, y1, 0, y0_l, y1_l);
>>> + DUP2_ARG1(__lasx_xvexth_w_h, y0, y1, y0_h, y1_h);
>>> + DUP4_ARG1(__lasx_vext2xv_w_h, u0, u1, v0, v1, u0, u1, v0, v1);
>>> + y0_l = __lasx_xvmul_w(y0_l, v_yalpha1);
>>> + y0_h = __lasx_xvmul_w(y0_h, v_yalpha1);
>>> + u0 = __lasx_xvmul_w(u0, v_uvalpha1);
>>> + v0 = __lasx_xvmul_w(v0, v_uvalpha1);
>>> + y_l = __lasx_xvmadd_w(y0_l, v_yalpha, y1_l);
>>> + y_h = __lasx_xvmadd_w(y0_h, v_yalpha, y1_h);
>>> + u = __lasx_xvmadd_w(u0, v_uvalpha, u1);
>>> + v = __lasx_xvmadd_w(v0, v_uvalpha, v1);
>>> + y_l = __lasx_xvsrai_w(y_l, 19);
>>> + y_h = __lasx_xvsrai_w(y_h, 19);
>>> + u = __lasx_xvsrai_w(u, 19);
>>> + v = __lasx_xvsrai_w(v, 19);
>>> + u = __lasx_xvadd_w(u, headroom);
>>> + v = __lasx_xvadd_w(v, headroom);
>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 0, 1, 0, 0);
>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 2, 3, 1, 1);
>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 0, 1, 2, 2);
>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 2, 3, 3, 3);
>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 4, 5, 4, 4);
>>> + WRITE_YUV2RGB(y_l, y_l, u, v, 6, 7, 5, 5);
>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 4, 5, 6, 6);
>>> + WRITE_YUV2RGB(y_h, y_h, u, v, 6, 7, 7, 7);
>>> + }
>>> + if (dstW - i >= 8) {
>>> + int Y1, Y2, U, V;
>>> + int i_dex = i << 1;
>>> + __m256i y0_l, y0, u0, v0;
>>> + __m256i y1_l, y1, u1, v1;
>>> + __m256i y_l, u, v;
>>> +
>>> + y0 = __lasx_xvldx(buf0, i_dex);
>> 
>> 1. Not long ago, I tried to constify the src pointer of several asm
>> functions and noticed that they produced new warnings for loongarch
>> (according to patchwork:
>> https://patchwork.ffmpeg.org/project/ffmpeg/patch/db6pr0101mb2214178d3e6b8dca5b86f8198f...@db6pr0101mb2214.eurprd01.prod.exchangelabs.com/),
>> even though I was sure that the code is const-correct. After finding
>> (via https://github.com/opencv/opencv/pull/21833) a toolchain
>> (https://gitee.com/wenux/cross-compiler-la-on-x86) that can build the
>> lasx and lsx code (upstream GCC seems to be lacking lsx and lasx support
>> at the moment; at least, my self-compiled loongarch-GCC did not support
>> lsx and lasx) the issu

Re: [FFmpeg-devel] [PATCH v2] avcodec/mips: Fix MMI macro replaces in HEVC Decoder

2022-09-06 Thread Shiyou Yin



> On 2022-08-18 20:29, Shiyou Yin wrote:
> 
> 
> 
>> On 2022-08-18 19:44, 戚铁铮 <qitiezh...@360.cn> wrote:
>> 
>> 
>> At 2022/8/18 PM 7:01, "Qi Tiezheng" <qitiezh...@360.cn> wrote:
>> 
>>> The macro replacements in the latest Loongson MMI commit were incorrect.
>>> They cause heavy green tinting when playing HEVC videos. I compared the
>>> code with the older MMI implementation and found that several lines had
>>> been replaced with the wrong macros.
>>> 
>>> Signed-off-by: Qi Tiezheng <qitiezh...@360.cn>
>>> ---
>>> libavcodec/mips/hevcdsp_mmi.c | 16 
>>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>> 
>>> diff --git a/libavcodec/mips/hevcdsp_mmi.c b/libavcodec/mips/hevcdsp_mmi.c
>>> index 0ea88a7c08..1da56d3d87 100644
>>> --- a/libavcodec/mips/hevcdsp_mmi.c
>>> +++ b/libavcodec/mips/hevcdsp_mmi.c
>>> @@ -80,7 +80,7 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, 
>>> const uint8_t *_src, \
>>> "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
>>> "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
>>> "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
>>> - MMI_ULDC1(%[ftmp3], %[dst], 0x00) \
>>> + MMI_USDC1(%[ftmp3], %[dst], 0x00) \
>>> \
>>> "daddi %[x], %[x], -0x01 \n\t" \
>>> PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
>>> @@ -178,7 +178,7 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, 
>>> const uint8_t *_src,\
>>> "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
>>> "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
>>> "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
>>> - MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \
>>> + MMI_USDC1(%[ftmp3], %[tmp], 0x00) \
>>> \
>>> "daddi %[x], %[x], -0x01 \n\t" \
>>> PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
>>> @@ -690,10 +690,10 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t 
>>> *_dst, \
>>> \
>>> "1: \n\t" \
>>> "2: \n\t" \
>>> - MMI_ULDC1(%[ftmp3], %[src], 0x00) \
>>> - MMI_ULDC1(%[ftmp4], %[src], 0x01) \
>>> - MMI_ULDC1(%[ftmp5], %[src], 0x02) \
>>> - MMI_ULDC1(%[ftmp6], %[src], 0x03) \
>>> + MMI_ULWC1(%[ftmp2], %[src], 0x00) \
>>> + MMI_ULWC1(%[ftmp3], %[src], 0x01) \
>>> + MMI_ULWC1(%[ftmp4], %[src], 0x02) \
>>> + MMI_ULWC1(%[ftmp5], %[src], 0x03) \
>>> "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" \
>>> "pmullh %[ftmp2], %[ftmp2], %[ftmp1] \n\t" \
>>> "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" \
>>> @@ -707,7 +707,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t 
>>> *_dst, \
>>> "paddh %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \
>>> "paddh %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \
>>> "paddh %[ftmp2], %[ftmp2], %[ftmp4] \n\t" \
>>> - MMI_ULDC1(%[ftmp2], %[tmp], 0x00) \
>>> + MMI_USDC1(%[ftmp2], %[tmp], 0x00) \
>>> \
>>> "daddi %[x], %[x], -0x01 \n\t" \
>>> PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
>>> @@ -773,7 +773,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t 
>>> *_dst, \
>>> "paddw %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
>>> "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \
>>> "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
>>> - MMI_ULDC1(%[ftmp4], %[tmp], 0x02) \
>>> + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \
>>> "li %[rtmp0], 0x10 \n\t" \
>>> "dmtc1 %[rtmp0], %[ftmp8] \n\t" \
>>> "punpcklhw %[ftmp5], %[ftmp2], %[ftmp3] \n\t" \
>>> -- 
>>> 2.25.1
>> 
>> Sorry, I have to use an e-mail client because our e-mail server is
>> Exchange, not SMTP.
>> The patch system does not seem to process UTF-8 Chinese characters correctly.
>> I am trying again, sending it as an attachment.
>> 
> Thank you for fixing this bug.
> LGTM.
> 

Hi, Michael

Could you please help merge this fix?

Thanks,
Shiyou

Re: [FFmpeg-devel] [PATCH v1 3/3] swscale/la: Add output_lasx.c file.

2022-09-06 Thread Shiyou Yin


> On 2022-08-29 20:30, Andreas Rheinhardt wrote:
> 
> Hao Chen:
>> ffmpeg -i ~/media/1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -s 640x480 
>> -pix_fmt
>> rgb24 -y /dev/null -an
>> before: 150fps
>> after:  183fps
>> 
>> Signed-off-by: Hao Chen 
>> ---
>> libswscale/loongarch/Makefile |3 +-
>> libswscale/loongarch/output_lasx.c| 1982 +
>> libswscale/loongarch/swscale_init_loongarch.c |3 +
>> libswscale/loongarch/swscale_loongarch.h  |6 +
>> 4 files changed, 1993 insertions(+), 1 deletion(-)
>> create mode 100644 libswscale/loongarch/output_lasx.c
>> 
>> diff --git a/libswscale/loongarch/Makefile b/libswscale/loongarch/Makefile
>> index 4345971514..54d48b3de0 100644
>> --- a/libswscale/loongarch/Makefile
>> +++ b/libswscale/loongarch/Makefile
>> @@ -2,4 +2,5 @@ OBJS-$(CONFIG_SWSCALE)  += 
>> loongarch/swscale_init_loongarch.o
>> LASX-OBJS-$(CONFIG_SWSCALE) += loongarch/swscale_lasx.o \
>>loongarch/input_lasx.o   \
>>loongarch/yuv2rgb_lasx.o \
>> -   loongarch/rgb2rgb_lasx.o
>> +   loongarch/rgb2rgb_lasx.o \
>> +   
>> loongarch/output_lasx.o
>> diff --git a/libswscale/loongarch/output_lasx.c 
>> b/libswscale/loongarch/output_lasx.c
>> new file mode 100644
>> index 00..19f82692ff
>> --- /dev/null
>> +++ b/libswscale/loongarch/output_lasx.c
>> @@ -0,0 +1,1982 @@
>> +/*
>> + * Copyright (C) 2022 Loongson Technology Corporation Limited
>> + * Contributed by Hao Chen(chen...@loongson.cn)
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
>> USA
>> + */
>> +
>> +#include "swscale_loongarch.h"
>> +#include "libavutil/loongarch/loongson_intrinsics.h"
>> +
>> +void ff_yuv2planeX_8_lasx(const int16_t *filter, int filterSize,
>> +  const int16_t **src, uint8_t *dest, int dstW,
>> +  const uint8_t *dither, int offset)
>> +{
>> +int i;
>> +int len = dstW - 15;
>> +__m256i mask = {0x1C0C180814041000, 0x1C1814100C080400,
>> +0x1C0C180814041000, 0x1C1814100C080400};
>> +__m256i val1, val2, val3;
>> +uint8_t dither0 = dither[offset & 7];
>> +uint8_t dither1 = dither[(offset + 1) & 7];
>> +uint8_t dither2 = dither[(offset + 2) & 7];
>> +uint8_t dither3 = dither[(offset + 3) & 7];
>> +uint8_t dither4 = dither[(offset + 4) & 7];
>> +uint8_t dither5 = dither[(offset + 5) & 7];
>> +uint8_t dither6 = dither[(offset + 6) & 7];
>> +uint8_t dither7 = dither[(offset + 7) & 7];
>> +int val_1[8] = {dither0, dither2, dither4, dither6,
>> +dither0, dither2, dither4, dither6};
>> +int val_2[8] = {dither1, dither3, dither5, dither7,
>> +dither1, dither3, dither5, dither7};
>> +int val_3[8] = {dither0, dither1, dither2, dither3,
>> +dither4, dither5, dither6, dither7};
>> +
>> +DUP2_ARG2(__lasx_xvld, val_1, 0, val_2, 0, val1, val2);
>> +val3 = __lasx_xvld(val_3, 0);
>> +
>> +for (i = 0; i < len; i += 16) {
>> +int j;
>> +__m256i src0, filter0, val;
>> +__m256i val_ev, val_od;
>> +
>> +val_ev = __lasx_xvslli_w(val1, 12);
>> +val_od = __lasx_xvslli_w(val2, 12);
>> +
>> +for (j = 0; j < filterSize; j++) {
>> +src0  = __lasx_xvld(src[j]+ i, 0);
>> +filter0 = __lasx_xvldrepl_h((filter + j), 0);
>> +val_ev = __lasx_xvmaddwev_w_h(val_ev, src0, filter0);
>> +val_od = __lasx_xvmaddwod_w_h(val_od, src0, filter0);
>> +}
>> +val_ev = __lasx_xvsrai_w(val_ev, 19);
>> +val_od = __lasx_xvsrai_w(val_od, 19);
>> +val_ev = __lasx_xvclip255_w(val_ev);
>> +val_od = __lasx_xvclip255_w(val_od);
>> +val= __lasx_xvshuf_b(val_od, val_ev, mask);
>> +__lasx_xvstelm_d(val, (dest + i), 0, 0);
>> +__lasx_xvstelm_d(val, (dest + i), 8, 2);
>> +}
>> +if (dstW - i >= 8){
>> +int j;
>> +__m256i src0, filter0, val_h;
>> +__m256i val_l;
>> +
>> +val_l = __lasx_xvslli_w(val3, 12);
>> +
>> +

Re: [FFmpeg-devel] [PATCH v1 2/3] swscale/la: Add yuv2rgb_lasx.c and rgb2rgb_lasx.c files

2022-08-31 Thread Shiyou Yin



> --- /dev/null
> +++ b/libswscale/loongarch/rgb2rgb_lasx.c
> @@ -0,0 +1,52 @@
> +/*
> + * Copyright (c) 2022 Loongson Technology Corporation Limited
> + * Contributed by Hao Chen(chen...@loongson.cn)
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
> USA
> + */
> +
> +#include "swscale_loongarch.h"
> +#include "libavutil/loongarch/loongson_intrinsics.h"
> +
> +void ff_interleave_bytes_lasx(const uint8_t *src1, const uint8_t *src2,
> +  uint8_t *dest, int width, int height,
> +  int src1Stride, int src2Stride, int dstStride)
> +{
> +int h;
> +int len = width & (0xFFF0);
> +
> +for (h = 0; h < height; h++) {
> +int w, index = 0;
> +__m256i src_1, src_2, dst;
> +
> +for (w = 0; w < len; w += 16) {
> +DUP2_ARG2(__lasx_xvld, src1 + w, 0, src2 + w, 0, src_1, src_2);
> +src_1 = __lasx_xvpermi_d(src_1, 0xD8);
> +src_2 = __lasx_xvpermi_d(src_2, 0xD8);
> +dst   = __lasx_xvilvl_b(src_2, src_1);
> +__lasx_xvst(dst, dest + index, 0);
> +index  += 32;
> +}
> +for (w = 0; w < width; w++) {

w shouldn’t be reset to 0.
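
To illustrate the point, here is a portable sketch in plain C, with the inner 16-byte block loop standing in for the LASX body. The function name demo_interleave_row is hypothetical, and the sketch uses `width & ~15` for the block length, which is my own choice rather than something taken from the patch.

```c
#include <stdint.h>

/* Interleave one row: dest = s1[0], s2[0], s1[1], s2[1], ... */
static void demo_interleave_row(const uint8_t *src1, const uint8_t *src2,
                                uint8_t *dest, int width)
{
    int w;
    int len = width & ~15;   /* largest multiple of 16 that is <= width */

    /* Block part (stand-in for the vector loop): handles [0, len). */
    for (w = 0; w < len; w += 16) {
        for (int k = 0; k < 16; k++) {
            dest[2 * (w + k)]     = src1[w + k];
            dest[2 * (w + k) + 1] = src2[w + k];
        }
    }

    /* Scalar tail: w is NOT reset, so only [len, width) is handled here. */
    for (; w < width; w++) {
        dest[2 * w]     = src1[w];
        dest[2 * w + 1] = src2[w];
    }
}
```

Starting the tail loop from 0 again would make the scalar code walk over the pixels the block loop has already written, which defeats the purpose of the vector path.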


Re: [FFmpeg-devel] [PATCH v1 1/3] swscale/la: Optimize hscale functions with lasx.

2022-08-31 Thread Shiyou Yin


> +
> +void planar_rgb_to_uv_lasx(uint8_t *_dstU, uint8_t *_dstV, const uint8_t 
> *src[4],
> +   int width, int32_t *rgb2yuv)
> +{
> +int i;
> +uint16_t *dstU   = (uint16_t *)_dstU;
> +uint16_t *dstV   = (uint16_t *)_dstV;
> +int set  = 0x4001 << (RGB2YUV_SHIFT - 7);
> +int len  = width - 15;
> +int32_t tem_ru   = rgb2yuv[RU_IDX], tem_gu = rgb2yuv[GU_IDX];
> +int32_t tem_bu = rgb2yuv[BU_IDX], tem_rv   = rgb2yuv[RV_IDX];
> +int32_t tem_gv = rgb2yuv[GV_IDX], tem_bv = rgb2yuv[BV_IDX];
> +int shift= RGB2YUV_SHIFT - 6;
> +const uint8_t *src0 = src[0], *src1 = src[1], *src2 = src[2];
> +__m256i ru, gu, bu, rv, gv, bv;
> +__m256i mask = {0x0D0C090805040100, 0x1D1C191815141110,
> +0x0D0C090805040100, 0x1D1C191815141110};
> +__m256i temp = __lasx_xvreplgr2vr_w(set);
> +__m256i sra  = __lasx_xvreplgr2vr_w(shift);
> +
> +ru = __lasx_xvreplgr2vr_w(tem_ru);
> +gu = __lasx_xvreplgr2vr_w(tem_gu);
> +bu = __lasx_xvreplgr2vr_w(tem_bu);
> +rv = __lasx_xvreplgr2vr_w(tem_rv);
> +gv = __lasx_xvreplgr2vr_w(tem_gv);
> +bv = __lasx_xvreplgr2vr_w(tem_bv);
> +for (i = 0; i < len; i += 16) {
> +__m256i _g, _b, _r;
> +__m256i g_l, g_h, b_l, b_h, r_l, r_h;
> +__m256i v_l, v_h, u_l, u_h, u_lh, v_lh;
> +
> +DUP2_ARG2(__lasx_xvld, src0 + i, 0, src1 + i, 0, _g, _b);
> +_r   = __lasx_xvld(src2 + i, 0);

In this case it would be more readable not to use DUP2.
The same applies to the extensions of r, g and b that follow.
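
A small sketch of what I mean, in plain C. It assumes the DUP2 helper simply applies the given operation to each argument pair; DEMO_DUP2_ARG2 and demo_load are hypothetical stand-ins written for this example (demo_load is a scalar replacement for the vector load), not the definitions from loongson_intrinsics.h.

```c
#include <stdint.h>

/* Assumed shape of the helper: apply _INS to each (arg, arg) pair. */
#define DEMO_DUP2_ARG2(_INS, _IN0, _IN1, _IN2, _IN3, _OUT0, _OUT1) \
    do {                                                           \
        _OUT0 = _INS(_IN0, _IN1);                                  \
        _OUT1 = _INS(_IN2, _IN3);                                  \
    } while (0)

/* Hypothetical scalar stand-in for a vector load: read the byte at p[off]. */
static inline int demo_load(const uint8_t *p, int off)
{
    return p[off];
}

/* Mixed style, as in the quoted code: two loads via the macro, one plain. */
static int demo_sum_mixed(const uint8_t *s0, const uint8_t *s1,
                          const uint8_t *s2, int i)
{
    int g, b, r;

    DEMO_DUP2_ARG2(demo_load, s0 + i, 0, s1 + i, 0, g, b);
    r = demo_load(s2 + i, 0);
    return g + b + r;
}

/* Uniform style: three loads that look alike and are easier to scan. */
static int demo_sum_plain(const uint8_t *s0, const uint8_t *s1,
                          const uint8_t *s2, int i)
{
    int g = demo_load(s0 + i, 0);
    int b = demo_load(s1 + i, 0);
    int r = demo_load(s2 + i, 0);

    return g + b + r;
}
```

Both functions compute the same thing; the only difference is that the second one treats the three planes symmetrically.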

> +g_l = __lasx_vext2xv_wu_bu(_g);
> +DUP2_ARG1(__lasx_vext2xv_wu_bu, _b, _r, b_l, r_l);
> +_g   = __lasx_xvpermi_d(_g, 0x01);
> +_b   = __lasx_xvpermi_d(_b, 0x01);
> +_r   = __lasx_xvpermi_d(_r, 0x01);
> +g_h = __lasx_vext2xv_wu_bu(_g);
> +DUP2_ARG1(__lasx_vext2xv_wu_bu, _b, _r, b_h, r_h);
> +u_l  = __lasx_xvmadd_w(temp, ru, r_l);
> +u_h  = __lasx_xvmadd_w(temp, ru, r_h);
> +v_l  = __lasx_xvmadd_w(temp, rv, r_l);
> +v_h  = __lasx_xvmadd_w(temp, rv, r_h);
> +u_l  = __lasx_xvmadd_w(u_l, gu, g_l);
> +u_l  = __lasx_xvmadd_w(u_l, bu, b_l);
> +u_h  = __lasx_xvmadd_w(u_h, gu, g_h);
> +u_h  = __lasx_xvmadd_w(u_h, bu, b_h);
> +v_l  = __lasx_xvmadd_w(v_l, gv, g_l);
> +v_l  = __lasx_xvmadd_w(v_l, bv, b_l);
> +v_h  = __lasx_xvmadd_w(v_h, gv, g_h);
> +v_h  = __lasx_xvmadd_w(v_h, bv, b_h);
> +u_l  = __lasx_xvsra_w(u_l, sra);
> +u_h  = __lasx_xvsra_w(u_h, sra);
> +v_l  = __lasx_xvsra_w(v_l, sra);
> +v_h  = __lasx_xvsra_w(v_h, sra);
> +DUP2_ARG3(__lasx_xvshuf_b, u_h, u_l, mask, v_h, v_l, mask, u_lh, 
> v_lh);
> +u_lh = __lasx_xvpermi_d(u_lh, 0xD8);
> +v_lh = __lasx_xvpermi_d(v_lh, 0xD8);
> +__lasx_xvst(u_lh, (dstU + i), 0);
> +__lasx_xvst(v_lh, (dstV + i), 0);
> +}
> +if (width - i >= 8) {
> +__m256i _g, _b, _r;
> +__m256i g_l, b_l, r_l;
> +__m256i v_l, u_l, u, v;
> +
> +_g  = __lasx_xvldrepl_d((src0 + i), 0);
> +_b  = __lasx_xvldrepl_d((src1 + i), 0);
> +_r  = __lasx_xvldrepl_d((src2 + i), 0);
> +g_l = __lasx_vext2xv_wu_bu(_g);
> +DUP2_ARG1(__lasx_vext2xv_wu_bu, _b, _r, b_l, r_l);
> +u_l = __lasx_xvmadd_w(temp, ru, r_l);
> +v_l = __lasx_xvmadd_w(temp, rv, r_l);
> +u_l = __lasx_xvmadd_w(u_l, gu, g_l);
> +u_l = __lasx_xvmadd_w(u_l, bu, b_l);
> +v_l = __lasx_xvmadd_w(v_l, gv, g_l);
> +v_l = __lasx_xvmadd_w(v_l, bv, b_l);
> +u_l = __lasx_xvsra_w(u_l, sra);
> +v_l = __lasx_xvsra_w(v_l, sra);
> +DUP2_ARG3(__lasx_xvshuf_b, u_l, u_l, mask, v_l, v_l, mask, u, v);
> +__lasx_xvstelm_d(u, (dstU + i), 0, 0);
> +__lasx_xvstelm_d(u, (dstU + i), 8, 2);
> +__lasx_xvstelm_d(v, (dstV + i), 0, 0);
> +__lasx_xvstelm_d(v, (dstV + i), 8, 2);
> +i += 8;
> +}
> +for (; i < width; i++) {
> +int g = src[0][i];
> +int b = src[1][i];
> +int r = src[2][i];
> +
> +dstU[i] = (tem_ru * r + tem_gu * g + tem_bu * b + set) >> shift;
> +dstV[i] = (tem_rv * r + tem_gv * g + tem_bv * b + set) >> shift;
> +}

I suggest using the following loop structure instead:
for (i = width; i >= 16; i -= 16) {}
if (i >= 8) {}
while (i--) {}
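
A runnable sketch of that structure, where i counts the elements still to be processed. DemoCounts and demo_split are hypothetical names used only to show how many elements each stage would handle; in real code each stage would also advance the source and destination pointers. The sketch writes the last loop as `i-- > 0` so that a negative width cannot loop forever, which is a small deviation from the form above.

```c
/* How many elements each stage handled (demo-only bookkeeping). */
typedef struct DemoCounts {
    int by16;   /* elements handled in full 16-wide blocks */
    int by8;    /* elements handled by the single 8-wide step */
    int by1;    /* elements handled one at a time */
} DemoCounts;

static DemoCounts demo_split(int width)
{
    DemoCounts n = {0, 0, 0};
    int i;

    for (i = width; i >= 16; i -= 16)   /* full 16-element blocks */
        n.by16 += 16;

    if (i >= 8) {                       /* at most one 8-element block */
        n.by8 += 8;
        i -= 8;
    }

    while (i-- > 0)                     /* remaining 0..7 elements */
        n.by1++;

    return n;
}
```

With this form there is no separate `len = width - 15` bound to keep in sync, and each stage only has to compare the remaining count against its own block size.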


Re: [FFmpeg-devel] [PATCH v2] avcodec/mips: Fix MMI macro replaces in HEVC Decoder

2022-08-18 Thread Shiyou Yin



> On 2022-08-18 19:44, 戚铁铮 wrote:
> 
> 
> At 2022/8/18 PM 7:01, "Qi Tiezheng" wrote:
> 
>> The macro replacements in the latest Loongson MMI commit were incorrect.
>> They cause heavy green tinting when playing HEVC videos. I compared the
>> code with the older MMI implementation and found that several lines had
>> been replaced with the wrong macros.
>> 
>> Signed-off-by: Qi Tiezheng 
>> ---
>> libavcodec/mips/hevcdsp_mmi.c | 16 
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>> 
>> diff --git a/libavcodec/mips/hevcdsp_mmi.c b/libavcodec/mips/hevcdsp_mmi.c
>> index 0ea88a7c08..1da56d3d87 100644
>> --- a/libavcodec/mips/hevcdsp_mmi.c
>> +++ b/libavcodec/mips/hevcdsp_mmi.c
>> @@ -80,7 +80,7 @@ void ff_hevc_put_hevc_qpel_h##w##_8_mmi(int16_t *dst, 
>> const uint8_t *_src, \
>> "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
>> "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
>> "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
>> - MMI_ULDC1(%[ftmp3], %[dst], 0x00) \
>> + MMI_USDC1(%[ftmp3], %[dst], 0x00) \
>> \
>> "daddi %[x], %[x], -0x01 \n\t" \
>> PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
>> @@ -178,7 +178,7 @@ void ff_hevc_put_hevc_qpel_hv##w##_8_mmi(int16_t *dst, 
>> const uint8_t *_src,\
>> "paddh %[ftmp3], %[ftmp3], %[ftmp4] \n\t" \
>> "paddh %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
>> "paddh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
>> - MMI_ULDC1(%[ftmp3], %[tmp], 0x00) \
>> + MMI_USDC1(%[ftmp3], %[tmp], 0x00) \
>> \
>> "daddi %[x], %[x], -0x01 \n\t" \
>> PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
>> @@ -690,10 +690,10 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t 
>> *_dst, \
>> \
>> "1: \n\t" \
>> "2: \n\t" \
>> - MMI_ULDC1(%[ftmp3], %[src], 0x00) \
>> - MMI_ULDC1(%[ftmp4], %[src], 0x01) \
>> - MMI_ULDC1(%[ftmp5], %[src], 0x02) \
>> - MMI_ULDC1(%[ftmp6], %[src], 0x03) \
>> + MMI_ULWC1(%[ftmp2], %[src], 0x00) \
>> + MMI_ULWC1(%[ftmp3], %[src], 0x01) \
>> + MMI_ULWC1(%[ftmp4], %[src], 0x02) \
>> + MMI_ULWC1(%[ftmp5], %[src], 0x03) \
>> "punpcklbh %[ftmp2], %[ftmp2], %[ftmp0] \n\t" \
>> "pmullh %[ftmp2], %[ftmp2], %[ftmp1] \n\t" \
>> "punpcklbh %[ftmp3], %[ftmp3], %[ftmp0] \n\t" \
>> @@ -707,7 +707,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t 
>> *_dst, \
>> "paddh %[ftmp2], %[ftmp2], %[ftmp3] \n\t" \
>> "paddh %[ftmp4], %[ftmp4], %[ftmp5] \n\t" \
>> "paddh %[ftmp2], %[ftmp2], %[ftmp4] \n\t" \
>> - MMI_ULDC1(%[ftmp2], %[tmp], 0x00) \
>> + MMI_USDC1(%[ftmp2], %[tmp], 0x00) \
>> \
>> "daddi %[x], %[x], -0x01 \n\t" \
>> PTR_ADDIU "%[src], %[src], 0x04 \n\t" \
>> @@ -773,7 +773,7 @@ void ff_hevc_put_hevc_epel_bi_hv##w##_8_mmi(uint8_t 
>> *_dst, \
>> "paddw %[ftmp5], %[ftmp5], %[ftmp6] \n\t" \
>> "psraw %[ftmp5], %[ftmp5], %[ftmp0] \n\t" \
>> "packsswh %[ftmp3], %[ftmp3], %[ftmp5] \n\t" \
>> - MMI_ULDC1(%[ftmp4], %[tmp], 0x02) \
>> + MMI_ULDC1(%[ftmp4], %[src2], 0x00) \
>> "li %[rtmp0], 0x10 \n\t" \
>> "dmtc1 %[rtmp0], %[ftmp8] \n\t" \
>> "punpcklhw %[ftmp5], %[ftmp2], %[ftmp3] \n\t" \
>> -- 
>> 2.25.1
> 
> Sorry, I have to use an e-mail client because our e-mail server is
> Exchange, not SMTP.
> The patch system does not seem to process UTF-8 Chinese characters correctly.
> I am trying again, sending it as an attachment.
> 
Thank you for fixing this bug.
LGTM.


[FFmpeg-devel] [PATCH] MAINTAINERS: add myself as maintainer for LoongArch.

2022-06-01 Thread Shiyou Yin

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 46723972dc..274fc89203 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -546,6 +546,7 @@ Operating systems / CPU architectures
 
 Alpha   Falk Hueffner
 MIPS   Manojkumar Bhosale, Shiyou Yin
+LoongArch   Shiyou Yin
 Mac OS X / PowerPC  Romain Dolbeau, Guillaume Poirier
 Amiga / PowerPC Colin Ward
 Linux / PowerPC Lauri Kasanen
-- 
2.20.1


Re: [FFmpeg-devel] Optimize VP8,VP9,WMV3 decoding for loongarch.

2021-12-20 Thread Shiyou Yin



> On Dec 18, 2021, at 10:27 PM, Hao Chen wrote:
> 
> ./ffmpeg -i ../9_vp8_1080p_30fps_2Mbps.webm -f rawvideo -y /dev/null -an
> before: 210fps
> after : 585fps
> ffmpeg -i ../10_vp9_1080p_30fps_3Mbps.webm -f rawvideo -y /dev/null -an
> before:170fps
> after :294fps
> ./ffmpeg -i 11_wmv3_720p_24fps_7Mbps.wmv -f rawvideo -y /dev/null -an
> before:131fps
> after :229fps
> 
> [PATCH 1/4] avcodec: [loongarch] Optimize vp8_lpf/mc with LSX.
> [PATCH 2/4] avcodec: [loongarch] Optimize vp9_mc/intra with LSX.
> [PATCH 3/4] avcodec: [loongarch] Optimize vp9_lpf/idct with LSX.
> [PATCH 4/4] avcodec: [loongarch] Optimize vc1dsp with LASX.
> 
LGTM
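
The quoted before/after figures correspond to the speedup ratios worked out below. The snippet is only a worked calculation; speedup is a hypothetical helper name, not something from the patch set.

```c
#include <assert.h>

/* Hypothetical helper: ratio of the new frame rate to the old one. */
static inline double speedup(double before_fps, double after_fps)
{
    assert(before_fps > 0.0);
    return after_fps / before_fps;
}

/* Figures quoted above:
 *   VP8 : 210 -> 585 fps, about 2.79x
 *   VP9 : 170 -> 294 fps, about 1.73x
 *   WMV3: 131 -> 229 fps, about 1.75x
 */
```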


Re: [FFmpeg-devel] [PATCH v2 2/7] avcodec: [loongarch] Optimize h264_chroma_mc with LASX.

2021-12-15 Thread Shiyou Yin

> On Dec 15, 2021, at 12:26 AM, Michael Niedermayer wrote:
> 
> On Tue, Dec 14, 2021 at 09:33:11PM +0800, Hao Chen wrote:
>> From: Shiyou Yin 
>> 
>> ./ffmpeg -i ../1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -y /dev/null -an
>> before:170
>> after :183
>> 
>> Change-Id: I42ff23cc2dc7c32bd1b7e4274da9d9ec87065f20
> 
> There's something odd with the encoding of this mail
> 
>  I 1  [text/plain, 8bit, y, 159K]
>  I 2  [text/plain, 7bit, us-ascii, 0,2K]
> 
> and git seems to dislike it
> 
> Applying: libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 
> for all filter sizes.
> Press any key to continue...
> error: cannot convert from y to UTF-8
> fatal: could not parse patch
> 

Thank you for your reply. My colleague has uploaded a V3.
This patch set optimizes H264 decoding on loongarch.
Patches for other formats will be uploaded in the next few days.
I hope it's not too late for 5.0.

Re: [FFmpeg-devel] Optimize H264 decoding for loongarch.

2021-12-14 Thread Shiyou Yin



> On Dec 15, 2021, at 11:51 AM, Hao Chen wrote:
> 
> ./ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -y /dev/null -an
> before: 170fps
> after : 296fps
> 
> V2: Update loongson_intrinsics.h to 1.0.3 in patch 2/7.
> V3: Resubmit these patches due to the encoding problem of the last
> email.
> 
> [PATCH v3 1/7] avutil: [loongarch] Add support for loongarch SIMD.
> [PATCH v3 2/7] avcodec: [loongarch] Optimize h264_chroma_mc with LASX.
> [PATCH v3 3/7] avcodec: [loongarch] Optimize h264qpel with LASX.
> [PATCH v3 4/7] avcodec: [loongarch] Optimize h264dsp with LASX.
> [PATCH v3 5/7] avcodec: [loongarch] Optimize h264idct with LASX.
> [PATCH v3 6/7] avcodec: [loongarch] Optimize h264_deblock with LASX.
> [PATCH v3 7/7] avcodec: [loongarch] Optimize pred16x16_plane with LASX.
> 

LGTM.

These patches are extracted from Loongson's internal ffmpeg repository, which has
been used and verified on the loongarch platform for more than half a year.
I hope they can make it into the 5.0 release.

Re: [FFmpeg-devel] Optimize H264 decoding for loongarch.

2021-12-14 Thread Shiyou Yin



> On Dec 14, 2021, at 9:33 PM, Hao Chen wrote:
> 
> ./ffmpeg -i 1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -y /dev/null -an
> before: 170fps
> after : 296fps
> 
> V2: Update loongson_intrinsics.h to 1.0.3 in patch 2/7.
> 
> [PATCH v2 1/7] avutil: [loongarch] Add support for loongarch SIMD.
> [PATCH v2 2/7] avcodec: [loongarch] Optimize h264_chroma_mc with LASX.
> [PATCH v2 3/7] avcodec: [loongarch] Optimize h264qpel with LASX.
> [PATCH v2 4/7] avcodec: [loongarch] Optimize h264dsp with LASX.
> [PATCH v2 5/7] avcodec: [loongarch] Optimize h264idct with LASX.
> [PATCH v2 6/7] avcodec: [loongarch] Optimize h264_deblock with LASX.
> [PATCH v2 7/7] avcodec: [loongarch] Optimize pred16x16_plane with LASX.
> 

LGTM


[FFmpeg-devel] [PATCH v2 3/3] avcodec: [loongarch] Optimize decode_significance/_8x8_loongarch.

2021-11-30 Thread Shiyou Yin

From: Hao Chen 

Decoding 1080P H264 from 168fps to 170fps.

Signed-off-by: Shiyou Yin 
---
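
As a reading aid for the inline assembly in this patch, the control flow of decode_significance_loongarch can be sketched in plain C. This is my reading of the loop, not code from the patch: the CABAC decoder is replaced by a hypothetical BitSource that simply returns pre-recorded decisions, and decode_significance_sketch is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical stand-in for the CABAC decoder: decisions are read from two
 * caller-supplied arrays instead of being decoded from a bitstream. */
typedef struct BitSource {
    const unsigned char *sig;   /* "coefficient is significant" per position */
    const unsigned char *last;  /* "this was the last one" per position */
} BitSource;

static inline int decode_significance_sketch(const BitSource *src,
                                             int max_coeff, int *index)
{
    int *start = index;
    int pos = 0;

    assert(max_coeff > 1);              /* the sketch assumes at least two positions */
    do {
        if (src->sig[pos]) {            /* first GET_CABAC, label 3 */
            int last = src->last[pos];  /* second GET_CABAC, at state + last_off */
            *index = pos;               /* st.w of (state - base) */
            if (last)
                goto done;              /* bge ... 5f */
            index++;                    /* addi.d index, index, 4 */
        }
        pos++;                          /* label 4 */
    } while (pos < max_coeff - 1);
    *index = pos;                       /* loop ran out: final position is stored */
done:
    return (int)(index - start) + 1;    /* label 5: number of entries written */
}
```

The return value mirrors the final `add.d`/`srli.d` pair, which turns the byte distance travelled by the index pointer into an entry count.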
 libavcodec/h264_cabac.c   |   2 +
 libavcodec/loongarch/h264_cabac.c | 140 ++
 2 files changed, 142 insertions(+)
 create mode 100644 libavcodec/loongarch/h264_cabac.c

diff --git a/libavcodec/h264_cabac.c b/libavcodec/h264_cabac.c
index 973b3419c4..040fa0a257 100644
--- a/libavcodec/h264_cabac.c
+++ b/libavcodec/h264_cabac.c
@@ -42,6 +42,8 @@
 
 #if ARCH_X86
 #include "x86/h264_cabac.c"
+#elif ARCH_LOONGARCH64
+#include "loongarch/h264_cabac.c"
 #endif
 
 /* Cabac pre state table */
diff --git a/libavcodec/loongarch/h264_cabac.c 
b/libavcodec/loongarch/h264_cabac.c
new file mode 100644
index 00..d88743bed7
--- /dev/null
+++ b/libavcodec/loongarch/h264_cabac.c
@@ -0,0 +1,140 @@
+/*
+ * Loongson  optimized cabac
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ * Contributed by Hao Chen 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/cabac.h"
+#include "cabac.h"
+
+#define decode_significance decode_significance_loongarch
+static int decode_significance_loongarch(CABACContext *c, int max_coeff,
+uint8_t *significant_coeff_ctx_base, int *index, int64_t last_off)
+{
+void *end = significant_coeff_ctx_base + max_coeff - 1;
+int64_t minusstart = -(int64_t)significant_coeff_ctx_base;
+int64_t minusindex = 4 - (int64_t)index;
+int64_t bit, tmp0, tmp1, tmp2, one = 1;
+uint8_t *state = significant_coeff_ctx_base;
+
+__asm__ volatile(
+"3:"
+#if UNCHECKED_BITSTREAM_READER
+GET_CABAC_LOONGARCH_UNCBSR
+#else
+GET_CABAC_LOONGARCH
+#endif
+"blt     %[bit],     %[one],     4f              \n\t"
+"add.d   %[state],   %[state],   %[last_off]     \n\t"
+#if UNCHECKED_BITSTREAM_READER
+GET_CABAC_LOONGARCH_UNCBSR
+#else
+GET_CABAC_LOONGARCH
+#endif
+"sub.d   %[state],   %[state],   %[last_off]     \n\t"
+"add.d   %[tmp0],    %[state],   %[minusstart]   \n\t"
+"st.w    %[tmp0],    %[index],   0               \n\t"
+"bge     %[bit],     %[one],     5f              \n\t"
+"addi.d  %[index],   %[index],   4               \n\t"
+"4:                                              \n\t"
+"addi.d  %[state],   %[state],   1               \n\t"
+"blt     %[state],   %[end],     3b              \n\t"
+"add.d   %[tmp0],    %[state],   %[minusstart]   \n\t"
+"st.w    %[tmp0],    %[index],   0               \n\t"
+"5:                                              \n\t"
+"add.d   %[tmp0],    %[index],   %[minusindex]   \n\t"
+"srli.d  %[tmp0],    %[tmp0],    2               \n\t"
+: [bit]"=&r"(bit), [tmp0]"=&r"(tmp0), [tmp1]"=&r"(tmp1), [tmp2]"=&r"(tmp2),
+  [c_range]"+&r"(c->range), [c_low]"+&r"(c->low), [state]"+&r"(state),
+  [c_bytestream]"+&r"(c->bytestream), [index]"+&r"(index)
+: [tables]"r"(ff_h264_cabac_tables), [end]"r"(end), [one]"r"(one),
+  [minusstart]"r"(minusstart), [minusindex]"r"(minusindex),
+  [last_off]"r"(last_off),
+#if !UNCHECKED_BITSTREAM_READER
+  [c_bytestream_end]"r"(c->bytestream_end),
+#endif
+  [lps_off]"i"(H264_LPS_RANGE_OFFSET),
+  [mlps_off]"i"(H264_MLPS_STATE_OFFSET + 128),
+  [norm_off]"i"(H264_NORM_SHIFT_OFFSET),
+  [cabac_mask]"r"(CABAC_MASK)
+: "memory"
+);
+
+return (int)tmp0;
+}
+
+#define decode_significance_8x8 decode_significance_8x8_loongarch
+static int decode_significance_8x8_loongarch(
+CABACContext *c, uint8_t *significant_coeff_ctx_base,
+int *index, uint8_t *last_coeff_ctx_base, const uint8_t *sig_off)
+{
+int64_t minusindex = 4 - (in

[FFmpeg-devel] [PATCH v2 1/3] configure: Add support for loongarch.

2021-11-30 Thread Shiyou Yin

For la464 cpu: ./configure --cpu=la464

With cross-compiler:
./configure --cross-prefix=loongarch64-linux-gnu- \
--enable-cross-compile --arch=loongarch64 \
--target-os=linux --cpu=la464
---
 Changelog |  1 +
 configure | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/Changelog b/Changelog
index 56faa7f9f5..648079ab64 100644
--- a/Changelog
+++ b/Changelog
@@ -35,6 +35,7 @@ version :
 - bitpacked encoder
 - VideoToolbox VP9 hwaccel
 - VideoToolbox ProRes hwaccel
+- support loongarch.
 
 
 version 4.4:
diff --git a/configure b/configure
index d8b5be8bbb..aa94c39419 100755
--- a/configure
+++ b/configure
@@ -2032,6 +2032,9 @@ ARCH_LIST="
 avr32_uc
 bfin
 ia64
+loongarch
+loongarch32
+loongarch64
 m68k
 mips
 mips64
@@ -4959,6 +4962,9 @@ case "$arch" in
 arm*|iPad*|iPhone*)
 arch="arm"
 ;;
+loongarch*)
+arch="loongarch"
+;;
 mips*|IP*)
 case "$arch" in
 *el)
@@ -5106,6 +5112,18 @@ elif enabled bfin; then
 
 cpuflags="-mcpu=$cpu"
 
+elif enabled loongarch; then
+
+enable local_aligned
+enable simd_align_32
+enable fast_64bit
+enable fast_clz
+enable fast_unaligned
+case $cpu in
+la464)
+cpuflags="-march=$cpu"
+;;
+esac
 elif enabled mips; then
 
 if [ "$cpu" != "generic" ]; then
@@ -5362,6 +5380,11 @@ case "$arch" in
 aarch64|alpha|ia64)
 enabled shared && enable_weak pic
 ;;
+loongarch)
+check_64bit loongarch32 loongarch64
+enabled loongarch64 && disable loongarch32
+enabled shared && enable_weak pic
+;;
 mips)
 check_64bit mips mips64 '_MIPS_SIM > 1'
 enabled shared && enable_weak pic
-- 
2.20.1
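
The `check_64bit loongarch32 loongarch64` line in the last hunk picks the sub-architecture at configure time. As far as I recall, the helper does this by compiling a tiny probe whose array size becomes negative when the condition is false; the probe text below is my assumption and is not copied from configure. PROBE_SIZE, pointer_is_64bit and probe_size_for_this_target are hypothetical names.

```c
#include <assert.h>

/* Array size is 1 when the expression is true and -1 (a compile error)
 * when it is false, so a successful compile means "condition holds". */
#define PROBE_SIZE(expr) (2 * (expr) - 1)

/* A true condition yields a legal one-element array type. */
typedef char probe_true_compiles[PROBE_SIZE(1)];

/* Runtime mirror of the same test, so the sketch builds on any target. */
static inline int pointer_is_64bit(void)
{
    return sizeof(void *) > 4;
}

static inline int probe_size_for_this_target(void)
{
    return PROBE_SIZE((int)(sizeof(void *) > 4));
}

/* On a 64-bit target a declaration such as
 *     int test[PROBE_SIZE(sizeof(void *) > 4)];
 * compiles; on a 32-bit target it is rejected by the compiler. */
```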


[FFmpeg-devel] Add support for loongarch.

2021-11-30 Thread Shiyou Yin

V2:
1. Rebase.
2. Change author email from yinshi...@loongson.cn to
yinshiyou...@loongson.cn for 1/3 and 2/3.
3. Refine 2/3.

[PATCH v2 1/3] configure: Add support for loongarch.
[PATCH v2 2/3] avcodec: [loongarch] optimize get_cabac.
[PATCH v2 3/3] avcodec: [loongarch] Optimize decode_significance.


[FFmpeg-devel] [PATCH v2 2/3] avcodec: [loongarch] optimize get_cabac.

2021-11-30 Thread Shiyou Yin

Decoding 1080P H264 on 2.5GHz 3A5000: 165fps ==> 168fps.
Testing command: ffmpeg -i ***.mp4 -f rawvideo -y /dev/null -an
---
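
One difference visible between this posting and the earlier posting of the same patch is in the refill path of the macro: the earlier one computes `c_low ^ (c_low - 1)` with an `addi.d`/`xor` pair, while this one uses `ctz.d`. Both locate the lowest set bit of `c_low`. The sketch below checks that relationship with portable loops instead of compiler builtins; ctz64, lowest_bit_mask and popcount64 are hypothetical helper names, and this is an illustration rather than code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits with a plain loop (x must be non-zero). */
static inline int ctz64(uint64_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* x ^ (x - 1) keeps the lowest set bit of x and sets every bit below it. */
static inline uint64_t lowest_bit_mask(uint64_t x)
{
    return x ^ (x - 1);
}

/* Number of set bits, again with a plain loop. */
static inline int popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}
```

For any non-zero x, popcount64(lowest_bit_mask(x)) equals ctz64(x) + 1, so the two forms carry the same information.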
 libavcodec/cabac_functions.h |   3 +
 libavcodec/loongarch/cabac.h | 238 +++
 2 files changed, 241 insertions(+)
 create mode 100644 libavcodec/loongarch/cabac.h

diff --git a/libavcodec/cabac_functions.h b/libavcodec/cabac_functions.h
index 46af921822..2f2d48a8f8 100644
--- a/libavcodec/cabac_functions.h
+++ b/libavcodec/cabac_functions.h
@@ -49,6 +49,9 @@
 #if ARCH_MIPS
 #   include "mips/cabac.h"
 #endif
+#if ARCH_LOONGARCH64
+#   include "loongarch/cabac.h"
+#endif
 
 static const uint8_t * const ff_h264_norm_shift = ff_h264_cabac_tables + 
H264_NORM_SHIFT_OFFSET;
 static const uint8_t * const ff_h264_lps_range = ff_h264_cabac_tables + 
H264_LPS_RANGE_OFFSET;
diff --git a/libavcodec/loongarch/cabac.h b/libavcodec/loongarch/cabac.h
new file mode 100644
index 00..e1c946fe16
--- /dev/null
+++ b/libavcodec/loongarch/cabac.h
@@ -0,0 +1,238 @@
+/*
+ * Loongson  optimized cabac
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_LOONGARCH_CABAC_H
+#define AVCODEC_LOONGARCH_CABAC_H
+
+#include "libavcodec/cabac.h"
+#include "config.h"
+
+#define GET_CABAC_LOONGARCH_UNCBSR                        \
+"ld.bu    %[bit],     %[state],   0x0           \n\t"    \
+"andi     %[tmp0],    %[c_range], 0xC0          \n\t"    \
+"slli.d   %[tmp0],    %[tmp0],    0x01          \n\t"    \
+"add.d    %[tmp0],    %[tmp0],    %[tables]     \n\t"    \
+"add.d    %[tmp0],    %[tmp0],    %[bit]        \n\t"    \
+/* tmp1: RangeLPS */                                     \
+"ld.bu    %[tmp1],    %[tmp0],    %[lps_off]    \n\t"    \
+                                                         \
+"sub.d    %[c_range], %[c_range], %[tmp1]       \n\t"    \
+"slli.d   %[tmp0],    %[c_range], 0x11          \n\t"    \
+"bge      %[tmp0],    %[c_low],   1f            \n\t"    \
+"move     %[c_range], %[tmp1]                   \n\t"    \
+"nor      %[bit],     %[bit],     %[bit]        \n\t"    \
+"sub.d    %[c_low],   %[c_low],   %[tmp0]       \n\t"    \
+                                                         \
+"1:                                             \n\t"    \
+/* tmp1: *state */                                       \
+"add.d    %[tmp0],    %[tables],  %[bit]        \n\t"    \
+"ld.bu    %[tmp1],    %[tmp0],    %[mlps_off]   \n\t"    \
+/* tmp2: lps_mask */                                     \
+"add.d    %[tmp0],    %[tables],  %[c_range]    \n\t"    \
+"ld.bu    %[tmp2],    %[tmp0],    %[norm_off]   \n\t"    \
+                                                         \
+"andi     %[bit],     %[bit],     0x01          \n\t"    \
+"st.b     %[tmp1],    %[state],   0x0           \n\t"    \
+"sll.d    %[c_range], %[c_range], %[tmp2]       \n\t"    \
+"sll.d    %[c_low],   %[c_low],   %[tmp2]       \n\t"    \
+                                                         \
+"and      %[tmp1],    %[c_low],   %[cabac_mask] \n\t"    \
+"bnez     %[tmp1],    1f                        \n\t"    \
+"ld.hu    %[tmp1],    %[c_bytestream], 0x0      \n\t"    \
+"ctz.d    %[tmp0],    %[c_low]                  \n\t"    \
+"addi.d   %[tmp2],    %[tmp0],    -16           \n\t"    \
+"revb.2h

[FFmpeg-devel] [PATCH 3/3] avcodec: [loongarch] Optimize decode_significance/_8x8_loongarch.

2021-11-08 Thread Shiyou Yin

From: Hao Chen 

Decoding 1080P H264 from 168fps to 170fps.

Signed-off-by: Shiyou Yin 
---
 libavcodec/h264_cabac.c   |   2 +
 libavcodec/loongarch/h264_cabac.c | 140 ++
 2 files changed, 142 insertions(+)
 create mode 100644 libavcodec/loongarch/h264_cabac.c

diff --git a/libavcodec/h264_cabac.c b/libavcodec/h264_cabac.c
index 973b3419c4..040fa0a257 100644
--- a/libavcodec/h264_cabac.c
+++ b/libavcodec/h264_cabac.c
@@ -42,6 +42,8 @@
 
 #if ARCH_X86
 #include "x86/h264_cabac.c"
+#elif ARCH_LOONGARCH64
+#include "loongarch/h264_cabac.c"
 #endif
 
 /* Cabac pre state table */
diff --git a/libavcodec/loongarch/h264_cabac.c 
b/libavcodec/loongarch/h264_cabac.c
new file mode 100644
index 00..d88743bed7
--- /dev/null
+++ b/libavcodec/loongarch/h264_cabac.c
@@ -0,0 +1,140 @@
+/*
+ * Loongson  optimized cabac
+ *
+ * Copyright (c) 2021 Loongson Technology Corporation Limited
+ * Contributed by Hao Chen 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/cabac.h"
+#include "cabac.h"
+
+#define decode_significance decode_significance_loongarch
+static int decode_significance_loongarch(CABACContext *c, int max_coeff,
+uint8_t *significant_coeff_ctx_base, int *index, int64_t last_off)
+{
+void *end = significant_coeff_ctx_base + max_coeff - 1;
+int64_t minusstart = -(int64_t)significant_coeff_ctx_base;
+int64_t minusindex = 4 - (int64_t)index;
+int64_t bit, tmp0, tmp1, tmp2, one = 1;
+uint8_t *state = significant_coeff_ctx_base;
+
+__asm__ volatile(
+"3:"
+#if UNCHECKED_BITSTREAM_READER
+GET_CABAC_LOONGARCH_UNCBSR
+#else
+GET_CABAC_LOONGARCH
+#endif
+"blt     %[bit],     %[one],     4f              \n\t"
+"add.d   %[state],   %[state],   %[last_off]     \n\t"
+#if UNCHECKED_BITSTREAM_READER
+GET_CABAC_LOONGARCH_UNCBSR
+#else
+GET_CABAC_LOONGARCH
+#endif
+"sub.d   %[state],   %[state],   %[last_off]     \n\t"
+"add.d   %[tmp0],    %[state],   %[minusstart]   \n\t"
+"st.w    %[tmp0],    %[index],   0               \n\t"
+"bge     %[bit],     %[one],     5f              \n\t"
+"addi.d  %[index],   %[index],   4               \n\t"
+"4:                                              \n\t"
+"addi.d  %[state],   %[state],   1               \n\t"
+"blt     %[state],   %[end],     3b              \n\t"
+"add.d   %[tmp0],    %[state],   %[minusstart]   \n\t"
+"st.w    %[tmp0],    %[index],   0               \n\t"
+"5:                                              \n\t"
+"add.d   %[tmp0],    %[index],   %[minusindex]   \n\t"
+"srli.d  %[tmp0],    %[tmp0],    2               \n\t"
+: [bit]"=&r"(bit), [tmp0]"=&r"(tmp0), [tmp1]"=&r"(tmp1), [tmp2]"=&r"(tmp2),
+  [c_range]"+&r"(c->range), [c_low]"+&r"(c->low), [state]"+&r"(state),
+  [c_bytestream]"+&r"(c->bytestream), [index]"+&r"(index)
+: [tables]"r"(ff_h264_cabac_tables), [end]"r"(end), [one]"r"(one),
+  [minusstart]"r"(minusstart), [minusindex]"r"(minusindex),
+  [last_off]"r"(last_off),
+#if !UNCHECKED_BITSTREAM_READER
+  [c_bytestream_end]"r"(c->bytestream_end),
+#endif
+  [lps_off]"i"(H264_LPS_RANGE_OFFSET),
+  [mlps_off]"i"(H264_MLPS_STATE_OFFSET + 128),
+  [norm_off]"i"(H264_NORM_SHIFT_OFFSET),
+  [cabac_mask]"r"(CABAC_MASK)
+: "memory"
+);
+
+return (int)tmp0;
+}
+
+#define decode_significance_8x8 decode_significance_8x8_loongarch
+static int decode_significance_8x8_loongarch(
+CABACContext *c, uint8_t *significant_coeff_ctx_base,
+int *index, uint8_t *last_coeff_ctx_base, const uint8_t *sig_off)
+{
+int64_t minusindex = 4 - (in

[FFmpeg-devel] [PATCH 2/3] avcodec: [loongarch] optimize get_cabac.

2021-11-08 Thread Shiyou Yin

Decoding 1080P H264 on 2.5GHz 3A5000: 165fps ==> 168fps.
Testing command: ffmpeg -i ***.mp4 -f rawvideo -y /dev/null -an
---
 libavcodec/cabac_functions.h |   3 +
 libavcodec/loongarch/cabac.h | 254 +++
 2 files changed, 257 insertions(+)
 create mode 100644 libavcodec/loongarch/cabac.h

diff --git a/libavcodec/cabac_functions.h b/libavcodec/cabac_functions.h
index 46af921822..2f2d48a8f8 100644
--- a/libavcodec/cabac_functions.h
+++ b/libavcodec/cabac_functions.h
@@ -49,6 +49,9 @@
 #if ARCH_MIPS
 #   include "mips/cabac.h"
 #endif
+#if ARCH_LOONGARCH64
+#   include "loongarch/cabac.h"
+#endif
 
 static const uint8_t * const ff_h264_norm_shift = ff_h264_cabac_tables + 
H264_NORM_SHIFT_OFFSET;
 static const uint8_t * const ff_h264_lps_range = ff_h264_cabac_tables + 
H264_LPS_RANGE_OFFSET;
diff --git a/libavcodec/loongarch/cabac.h b/libavcodec/loongarch/cabac.h
new file mode 100644
index 00..71e8ba3be7
--- /dev/null
+++ b/libavcodec/loongarch/cabac.h
@@ -0,0 +1,254 @@
+/*
+ * Loongson  optimized cabac
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_LOONGARCH_CABAC_H
+#define AVCODEC_LOONGARCH_CABAC_H
+
+#include "libavcodec/cabac.h"
+#include "config.h"
+
+#define GET_CABAC_LOONGARCH_UNCBSR                        \
+"ld.bu    %[bit],     %[state],   0x0           \n\t"    \
+"andi     %[tmp0],    %[c_range], 0xC0          \n\t"    \
+"slli.d   %[tmp0],    %[tmp0],    0x01          \n\t"    \
+"add.d    %[tmp0],    %[tmp0],    %[tables]     \n\t"    \
+"add.d    %[tmp0],    %[tmp0],    %[bit]        \n\t"    \
+/* tmp1: RangeLPS */                                     \
+"ld.bu    %[tmp1],    %[tmp0],    %[lps_off]    \n\t"    \
+                                                         \
+"sub.d    %[c_range], %[c_range], %[tmp1]       \n\t"    \
+"slli.d   %[tmp0],    %[c_range], 0x11          \n\t"    \
+"bge      %[tmp0],    %[c_low],   1f            \n\t"    \
+"move     %[c_range], %[tmp1]                   \n\t"    \
+"nor      %[bit],     %[bit],     %[bit]        \n\t"    \
+"sub.d    %[c_low],   %[c_low],   %[tmp0]       \n\t"    \
+                                                         \
+"1:                                             \n\t"    \
+/* tmp1: *state */                                       \
+"add.d    %[tmp0],    %[tables],  %[bit]        \n\t"    \
+"ld.bu    %[tmp1],    %[tmp0],    %[mlps_off]   \n\t"    \
+/* tmp2: lps_mask */                                     \
+"add.d    %[tmp0],    %[tables],  %[c_range]    \n\t"    \
+"ld.bu    %[tmp2],    %[tmp0],    %[norm_off]   \n\t"    \
+                                                         \
+"andi     %[bit],     %[bit],     0x01          \n\t"    \
+"st.b     %[tmp1],    %[state],   0x0           \n\t"    \
+"sll.d    %[c_range], %[c_range], %[tmp2]       \n\t"    \
+"sll.d    %[c_low],   %[c_low],   %[tmp2]       \n\t"    \
+                                                         \
+"and      %[tmp1],    %[c_low],   %[cabac_mask] \n\t"    \
+"bnez     %[tmp1],    1f                        \n\t"    \
+"ld.hu    %[tmp1],    %[c_bytestream], 0x0      \n\t"    \
+"addi.d   %[tmp0],    %[c_low],   -0X01         \n\t"    \
+"xor      %[tmp0],    %[c_low],   %[tmp0]       \n\t"

[FFmpeg-devel] [PATCH 1/3] configure: Add support for loongarch.

2021-11-08 Thread Shiyou Yin

For la464 cpu: ./configure --cpu=la464
With cross-compiler:
./configure --cross-prefix=loongarch64-linux-gnu- \
--enable-cross-compile --arch=loongarch64 \
--target-os=linux --cpu=la464
---
 Changelog |  1 +
 configure | 17 +
 2 files changed, 18 insertions(+)

diff --git a/Changelog b/Changelog
index 765ec82915..60b74a3873 100644
--- a/Changelog
+++ b/Changelog
@@ -30,6 +30,7 @@ version :
 - xcorrelate video filter
 - varblur video filter
 - huesaturation video filter
+- support loongarch.
 
 
 version 4.4:
diff --git a/configure b/configure
index c01aa480c7..b4be706b65 100755
--- a/configure
+++ b/configure
@@ -2020,6 +2020,9 @@ ARCH_LIST="
 avr32_uc
 bfin
 ia64
+loongarch
+loongarch32
+loongarch64
 m68k
 mips
 mips64
@@ -4932,6 +4935,9 @@ case "$arch" in
 arm*|iPad*|iPhone*)
 arch="arm"
 ;;
+loongarch*)
+arch="loongarch"
+;;
 mips*|IP*)
 case "$arch" in
 *el)
@@ -5079,6 +5085,12 @@ elif enabled bfin; then
 
 cpuflags="-mcpu=$cpu"
 
+elif enabled loongarch; then
+case $cpu in
+la464)
+cpuflags="-march=$cpu"
+;;
+esac
 elif enabled mips; then
 
 if [ "$cpu" != "generic" ]; then
@@ -5335,6 +5347,11 @@ case "$arch" in
 aarch64|alpha|ia64)
 enabled shared && enable_weak pic
 ;;
+loongarch)
+check_64bit loongarch32 loongarch64
+enabled loongarch64 && disable loongarch32
+enabled shared && enable_weak pic
+;;
 mips)
 check_64bit mips mips64 '_MIPS_SIM > 1'
 enabled shared && enable_weak pic
-- 
2.20.1


[FFmpeg-devel] Add support for loongarch.

2021-11-08 Thread Shiyou Yin

[PATCH 1/3] configure: Add support for loongarch.
[PATCH 2/3] avcodec: [loongarch] optimize get_cabac.
[PATCH 3/3] avcodec: [loongarch] Optimize decode_significance.


[FFmpeg-devel] [PATCH v4 5/5] mips: Fix potential illegal instruction error.

2021-04-12 Thread Shiyou Yin

MSA2 optimizations are attached to MSA macros in generic_macros_msa.h,
which makes it difficult to do a runtime check for them. Removing this
code makes the build more robust. H264 1080p decoding: 5.13x ==> 5.12x.
---
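
The distinction drawn in the commit message can be sketched as follows. A routine selected through a function pointer can be guarded by a runtime CPU check, whereas an instruction placed in a shared macro under `#if` is emitted wherever the macro expands and cannot be guarded that way. Everything in the sketch (CPU_FLAG_EXT, sum_generic, sum_ext, select_sum) is hypothetical and only mimics the pattern; it is not FFmpeg code.

```c
#include <assert.h>

#define CPU_FLAG_EXT (1 << 0)   /* hypothetical "extension present" flag */

typedef int (*sum_fn)(const int *v, int n);

/* Baseline routine that is safe on every CPU. */
static int sum_generic(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a routine that would use extension-only instructions. */
static int sum_ext(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i + 1 < n; i += 2)   /* pairwise, as a SIMD-like variant */
        s += v[i] + v[i + 1];
    if (n & 1)
        s += v[n - 1];
    return s;
}

/* Runtime dispatch: the extension path is reachable only when the flag is set. */
static inline sum_fn select_sum(int cpu_flags)
{
    return (cpu_flags & CPU_FLAG_EXT) ? sum_ext : sum_generic;
}
```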
 configure   |  7 +--
 libavutil/mips/generic_macros_msa.h | 37 -
 2 files changed, 1 insertion(+), 43 deletions(-)

diff --git a/configure b/configure
index d7a3f50..7b05612 100755
--- a/configure
+++ b/configure
@@ -451,7 +451,6 @@ Optimization options (experts only):
   --disable-mipsdsp        disable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2      disable MIPS DSP ASE R2 optimizations
   --disable-msa            disable MSA optimizations
-  --disable-msa2           disable MSA2 optimizations
   --disable-mipsfpu        disable floating point MIPS optimizations
   --disable-mmi            disable Loongson SIMD optimizations
   --disable-fast-unaligned consider unaligned accesses slow
@@ -2025,7 +2024,6 @@ ARCH_EXT_LIST_MIPS="
 mipsdsp
 mipsdspr2
 msa
-msa2
 "
 
 ARCH_EXT_LIST_LOONGSON="
@@ -2564,7 +2562,6 @@ mipsdsp_deps="mips"
 mipsdspr2_deps="mips"
 mmi_deps_any="loongson2 loongson3"
 msa_deps="mipsfpu"
-msa2_deps="msa"
 
 cpunop_deps="i686"
 x86_64_select="i686"
@@ -5907,9 +5904,8 @@ elif enabled mips; then
 enabled mipsdsp && check_inline_asm_flags mipsdsp '"addu.qb $t0, $t1, 
$t2"' '-mdsp'
 enabled mipsdspr2 && check_inline_asm_flags mipsdspr2 '"absq_s.qb $t0, 
$t1"' '-mdspr2'
 
-# MSA and MSA2 can be detected at runtime so we supply extra flags here
+# MSA can be detected at runtime so we supply extra flags here
 enabled mipsfpu && enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 
1"' '-mmsa' && append MSAFLAGS '-mmsa'
-enabled msa && enabled msa2 && check_inline_asm msa2 '"nxbits.any.b $w0, 
$w0"' '-mmsa2' && append MSAFLAGS '-mmsa2'
 
 # loongson2 have no switch cflag so we can only probe toolchain ability
 enabled loongson2 && check_inline_asm loongson2 '"dmult.g $8, $9, $10"' && 
disable loongson3
@@ -7340,7 +7336,6 @@ if enabled mips; then
 echo "MIPS DSP R1 enabled   ${mipsdsp-no}"
 echo "MIPS DSP R2 enabled   ${mipsdspr2-no}"
 echo "MIPS MSA enabled  ${msa-no}"
-echo "MIPS MSA2 enabled ${msa2-no}"
 echo "LOONGSON MMI enabled  ${mmi-no}"
 fi
 if enabled ppc; then
diff --git a/libavutil/mips/generic_macros_msa.h 
b/libavutil/mips/generic_macros_msa.h
index bb25e9f..1486f72 100644
--- a/libavutil/mips/generic_macros_msa.h
+++ b/libavutil/mips/generic_macros_msa.h
@@ -25,10 +25,6 @@
 #include 
 #include 
 
-#if HAVE_MSA2
-#include 
-#endif
-
 #define ALIGNMENT   16
 #define ALLOC_ALIGNED(align) __attribute__ ((aligned((align) << 1)))
 
@@ -1119,15 +1115,6 @@
  unsigned absolute diff values, even-odd pairs are added
  together to generate 8 halfword results.
 */
-#if HAVE_MSA2
-#define SAD_UB2_UH(in0, in1, ref0, ref1) \
-( {  \
-v8u16 sad_m = { 0 }; \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in0, (v16u8) ref0); \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in1, (v16u8) ref1); \
-sad_m;   \
-} )
-#else
 #define SAD_UB2_UH(in0, in1, ref0, ref1)\
 ( { \
 v16u8 diff0_m, diff1_m; \
@@ -1141,7 +1128,6 @@
 \
 sad_m;  \
 } )
-#endif // #if HAVE_MSA2
 
 /* Description : Insert specified word elements from input vectors to 1
  destination vector
@@ -2183,12 +2169,6 @@
  extracted and interleaved with same vector 'in0' to generate
  4 word elements keeping sign intact
 */
-#if HAVE_MSA2
-#define UNPCK_R_SH_SW(in, out)   \
-{\
-out = (v4i32) __builtin_msa2_w2x_lo_s_h((v8i16) in); \
-}
-#else
 #define UNPCK_R_SH_SW(in, out)   \
 {\
 v8i16 sign_m;\
@@ -2196,7 +2176,6 @@
 sign_m = __msa_clti_s_h((v8i16) in, 0);  \
 out = (v4i32) __msa_ilvr_h(sign_m, (v8i16) in);  \
 }
-#endif // #if HAVE_MSA2
 
 /* Description : Sign extend byte elements from input vector and return
  halfword results in pair of vectors
@@ -2209,13 +2188,6 @@
  Then interleaved left with same vector 'in0' to
  generate 8 signed halfword elements in 'out1'
 */
-#if HAVE_MSA2
-#define UNPCK_SB_SH(in, out0, out1)   \
-{

[FFmpeg-devel] [PATCH v4 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

2021-04-12 Thread Shiyou Yin

From: gxw 

Speed of decoding H264 1080P: 5.05x ==> 5.13x

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/Makefile|   3 +-
 libavcodec/mips/h264_deblock_msa.c  | 153 
 libavcodec/mips/h264dsp_init_mips.c |   2 +
 libavcodec/mips/h264dsp_mips.h  |   4 +
 4 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/mips/h264_deblock_msa.c

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 2be4d9b..81a73a4 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -57,7 +57,8 @@ MSA-OBJS-$(CONFIG_VP8_DECODER)+= 
mips/vp8_mc_msa.o \
  mips/vp8_lpf_msa.o
 MSA-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_msa.o
 MSA-OBJS-$(CONFIG_H264DSP)+= mips/h264dsp_msa.o\
- mips/h264idct_msa.o
+ mips/h264idct_msa.o   \
+ mips/h264_deblock_msa.o
 MSA-OBJS-$(CONFIG_H264QPEL)   += mips/h264qpel_msa.o
 MSA-OBJS-$(CONFIG_H264CHROMA) += mips/h264chroma_msa.o
 MSA-OBJS-$(CONFIG_H264PRED)   += mips/h264pred_msa.o
diff --git a/libavcodec/mips/h264_deblock_msa.c 
b/libavcodec/mips/h264_deblock_msa.c
new file mode 100644
index 000..4fed55c
--- /dev/null
+++ b/libavcodec/mips/h264_deblock_msa.c
@@ -0,0 +1,153 @@
+/*
+ * MIPS SIMD optimized H.264 deblocking code
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ *Gu Xiwei 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/bit_depth_template.c"
+#include "h264dsp_mips.h"
+#include "libavutil/mips/generic_macros_msa.h"
+#include "libavcodec/mips/h264dsp_mips.h"
+
+#define h264_loop_filter_strength_iteration_msa(edges, step, mask_mv, dir, \
+d_idx, mask_dir)   \
+do {   \
+int b_idx = 0; \
+int step_x4 = step << 2; \
+int d_idx_12 = d_idx + 12; \
+int d_idx_52 = d_idx + 52; \
+int d_idx_x4 = d_idx << 2; \
+int d_idx_x4_48 = d_idx_x4 + 48; \
+int dir_x32  = dir * 32; \
+uint8_t *ref_t = (uint8_t*)ref; \
+uint8_t *mv_t  = (uint8_t*)mv; \
+uint8_t *nnz_t = (uint8_t*)nnz; \
+uint8_t *bS_t  = (uint8_t*)bS; \
+mask_mv <<= 3; \
+for (; b_idx < edges; b_idx += step) { \
+out &= mask_dir; \
+if (!(mask_mv & b_idx)) { \
+if (bidir) { \
+ref_2 = LD_SB(ref_t + d_idx_12); \
+ref_3 = LD_SB(ref_t + d_idx_52); \
+ref_0 = LD_SB(ref_t + 12); \
+ref_1 = LD_SB(ref_t + 52); \
+ref_2 = (v16i8)__msa_ilvr_w((v4i32)ref_3, (v4i32)ref_2); \
+ref_0 = (v16i8)__msa_ilvr_w((v4i32)ref_0, (v4i32)ref_0); \
+ref_1 = (v16i8)__msa_ilvr_w((v4i32)ref_1, (v4i32)ref_1); \
+ref_3 = (v16i8)__msa_shf_h((v8i16)ref_2, 0x4e); \
+ref_0 -= ref_2; \
+ref_1 -= ref_3; \
+ref_0 = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)ref_1); \
+\
+tmp_2 = LD_SH(mv_t + d_idx_x4_48);   \
+tmp_3 = LD_SH(mv_t + 48); \
+tmp_4 = LD_SH(mv_t + 208); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_0 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \
+tmp_0 += cnst_1; \
+tmp_0 = (v16i8)__msa_subs_u_b((v16u8)tmp_0, (v16u8)cnst_0);\
+tmp_0 = (v16i8)__msa_sat_s_h((v8i16)tmp_0, 7); \
+tmp_0 = __msa_pckev_b(tmp_0, tmp_0); \
+out   = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)tmp_0); \
+\
+tmp_2 = LD_SH(mv_t + 208 + d_idx_x4); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_1 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5

[FFmpeg-devel] [PATCH v4 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa

2021-04-12 Thread Shiyou Yin

From: gxw 

Use masks instead of conditional branches. H264 4K decoding speed
improves by about 0.1 fps (tested on 3A4000).

Signed-off-by: Shiyou Yin 
---
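Note: the "mask instead of branch" idea can be sketched in scalar C, one
byte per lane. This is only an illustration under stated assumptions: the
helper names lt_mask and blend_with_mask are hypothetical and do not appear
in the patch, and the real code operates on MSA vectors rather than byte
arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 0xFF where a < b, 0x00 elsewhere,
 * mimicking the per-lane result of a SIMD compare. */
static uint8_t lt_mask(uint8_t a, uint8_t b)
{
    return (uint8_t)-(a < b);
}

/* Hypothetical helper: take `filtered` where the mask is set and `orig`
 * elsewhere, without any data-dependent branch. */
static void blend_with_mask(uint8_t *dst, const uint8_t *orig,
                            const uint8_t *filtered, const uint8_t *mask,
                            size_t n)
{
    assert(dst && orig && filtered && mask);
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((filtered[i] & mask[i]) |
                           (orig[i] & (uint8_t)~mask[i]));
}
```

Every lane is computed unconditionally and the mask decides which result is
kept, so the filter no longer needs an early-exit test per condition.
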
 libavcodec/mips/h264dsp_msa.c | 465 --
 1 file changed, 171 insertions(+), 294 deletions(-)

diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c
index a8c3f3c..9d815f8 100644
--- a/libavcodec/mips/h264dsp_msa.c
+++ b/libavcodec/mips/h264dsp_msa.c
@@ -1284,284 +1284,160 @@ static void avc_loopfilter_cb_or_cr_intra_edge_ver_msa(uint8_t *data_cb_or_cr,
 }
 }
 
-static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
-   uint8_t bs0, uint8_t bs1,
-   uint8_t bs2, uint8_t bs3,
-   uint8_t tc0, uint8_t tc1,
-   uint8_t tc2, uint8_t tc3,
-   uint8_t alpha_in,
-   uint8_t beta_in,
-   ptrdiff_t img_width)
+static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t* pPix, uint32_t iStride,
+   uint8_t iAlpha, uint8_t iBeta,
+   uint8_t* pTc)
 {
-v16u8 tmp_vec, bs = { 0 };
-
-tmp_vec = (v16u8) __msa_fill_b(bs0);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 0, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs1);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 1, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs2);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 2, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs3);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 3, (v4i32) tmp_vec);
-
-if (!__msa_test_bz_v(bs)) {
-uint8_t *src = data - 4;
-v16u8 p3_org, p2_org, p1_org, p0_org, q0_org, q1_org, q2_org, q3_org;
-v16u8 p0_asub_q0, p1_asub_p0, q1_asub_q0, alpha, beta;
-v16u8 is_less_than, is_less_than_beta, is_less_than_alpha;
-v16u8 is_bs_greater_than0;
-v16u8 tc = { 0 };
-v16i8 zero = { 0 };
-
-tmp_vec = (v16u8) __msa_fill_b(tc0);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 0, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc1);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 1, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc2);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 2, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc3);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 3, (v4i32) tmp_vec);
-
-is_bs_greater_than0 = (zero < bs);
-
-{
-v16u8 row0, row1, row2, row3, row4, row5, row6, row7;
-v16u8 row8, row9, row10, row11, row12, row13, row14, row15;
-
-LD_UB8(src, img_width,
-   row0, row1, row2, row3, row4, row5, row6, row7);
-src += (8 * img_width);
-LD_UB8(src, img_width,
-   row8, row9, row10, row11, row12, row13, row14, row15);
-
-TRANSPOSE16x8_UB_UB(row0, row1, row2, row3, row4, row5, row6, row7,
-row8, row9, row10, row11,
-row12, row13, row14, row15,
-p3_org, p2_org, p1_org, p0_org,
-q0_org, q1_org, q2_org, q3_org);
-}
-
-p0_asub_q0 = __msa_asub_u_b(p0_org, q0_org);
-p1_asub_p0 = __msa_asub_u_b(p1_org, p0_org);
-q1_asub_q0 = __msa_asub_u_b(q1_org, q0_org);
-
-alpha = (v16u8) __msa_fill_b(alpha_in);
-beta = (v16u8) __msa_fill_b(beta_in);
-
-is_less_than_alpha = (p0_asub_q0 < alpha);
-is_less_than_beta = (p1_asub_p0 < beta);
-is_less_than = is_less_than_beta & is_less_than_alpha;
-is_less_than_beta = (q1_asub_q0 < beta);
-is_less_than = is_less_than_beta & is_less_than;
-is_less_than = is_less_than & is_bs_greater_than0;
-
-if (!__msa_test_bz_v(is_less_than)) {
-v16i8 negate_tc, sign_negate_tc;
-v16u8 p0, q0, p2_asub_p0, q2_asub_q0;
-v8i16 tc_r, tc_l, negate_tc_r, i16_negatetc_l;
-v8i16 p1_org_r, p0_org_r, q0_org_r, q1_org_r;
-v8i16 p1_org_l, p0_org_l, q0_org_l, q1_org_l;
-v8i16 p0_r, q0_r, p0_l, q0_l;
-
-negate_tc = zero - (v16i8) tc;
-sign_negate_tc = __msa_clti_s_b(negate_tc, 0);
-
-ILVRL_B2_SH(sign_negate_tc, negate_tc, negate_tc_r, i16_negatetc_l);
-
-UNPCK_UB_SH(tc, tc_r, tc_l);
-UNPCK_UB_SH(p1_org, p1_org_r, p1_org_l);
-UNPCK_UB_SH(p0_org, p0_org_r, p0_org_l);
-UNPCK_UB_SH(q0_org, q0_org_r, q0_org_l);
-
-p2_asub_p0 = __msa_asub_u_b(p2_org, p0_org);
-is_less_than_beta = (p2_asub_p0 < beta);
-

[FFmpeg-devel] [PATCH V4] [mips] Optimize H264 decoding for MIPS platform.

2021-04-12 Thread Shiyou Yin

v2: Fix a build error in [PATCH 2/5].
v3: Add [PATCH 4/5].
v4: Fix a bug in [PATCH 2/5] caused by the 'lhu' instruction on big-endian systems.
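
The v4 note concerns byte order: a native halfword load such as 'lhu' gives
different values on little-endian and big-endian hosts for the same two
bytes. A minimal C sketch of the issue follows. The function names are
hypothetical, and treating the bytestream as most-significant-byte-first is
an assumption based on the byte loads removed by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable read: the first byte is the most significant one. */
static uint16_t read_be16_bytes(const uint8_t *p)
{
    assert(p);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Native halfword load (roughly what a single lhu does), followed by a
 * byte swap only where the host byte order requires it. */
static uint16_t read_be16_native(const uint8_t *p)
{
    uint16_t v;
    assert(p);
    memcpy(&v, p, sizeof(v));
    if (host_is_little_endian())
        v = (uint16_t)((v >> 8) | (v << 8));
    return v;
}
```

Both readers return the same value on either kind of host. Skipping the swap
on a little-endian host, or applying it on a big-endian one, yields the
bytes in the wrong order.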

[PATCH v4 1/5] avcodec/mips: Restore the initialization sequence of
[PATCH v4 2/5] avcodec/mips: Refine get_cabac_inline_mips.
[PATCH v4 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.
[PATCH v4 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa
[PATCH v4 5/5] mips: Fix potential illegal instruction error.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH v4 1/5] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.

2021-04-12 Thread Shiyou Yin

The MSA optimization was refined in commits 93218c2 and ce0a52e and is
now faster than the MMI version, so initialize MMI first and let MSA override it.
H264 decoding speed: 4.83x ==> 4.89x (tested on 3A4000).
---
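Note: the order matters because each init block overwrites the same table
slots, so the block that runs last wins. A minimal sketch of that behaviour,
assuming hypothetical flag values, a one-slot context and dummy
implementations (none of these names are FFmpeg's):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU flag bits, not FFmpeg's AV_CPU_FLAG_* values. */
#define DEMO_FLAG_MMI 1
#define DEMO_FLAG_MSA 2

typedef const char *(*impl_fn)(void);

static const char *impl_c(void)   { return "c"; }
static const char *impl_mmi(void) { return "mmi"; }
static const char *impl_msa(void) { return "msa"; }

typedef struct DemoContext {
    impl_fn mc8;   /* one function-pointer slot, like a pixels_tab entry */
} DemoContext;

/* MMI is assigned first and MSA second, so MSA takes over whenever both
 * extensions are reported by the CPU. */
static void demo_init(DemoContext *c, int cpu_flags)
{
    assert(c != NULL);
    c->mc8 = impl_c;
    if (cpu_flags & DEMO_FLAG_MMI)
        c->mc8 = impl_mmi;
    if (cpu_flags & DEMO_FLAG_MSA)
        c->mc8 = impl_msa;
}
```

With both flags set the slot ends up pointing at the MSA routine. With only
the MMI flag set, the MMI routine stays in place.
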
 libavcodec/mips/h264chroma_init_mips.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libavcodec/mips/h264chroma_init_mips.c b/libavcodec/mips/h264chroma_init_mips.c
index 6bb19d3..755cc04 100644
--- a/libavcodec/mips/h264chroma_init_mips.c
+++ b/libavcodec/mips/h264chroma_init_mips.c
@@ -28,7 +28,15 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 int cpu_flags = av_get_cpu_flags();
 int high_bit_depth = bit_depth > 8;
 
-/* MMI apears to be faster than MSA here */
+if (have_mmi(cpu_flags)) {
+if (!high_bit_depth) {
+c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
+c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
+c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
+c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
+}
+}
+
 if (have_msa(cpu_flags)) {
 if (!high_bit_depth) {
 c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_msa;
@@ -40,13 +48,4 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 c->avg_h264_chroma_pixels_tab[2] = ff_avg_h264_chroma_mc2_msa;
 }
 }
-
-if (have_mmi(cpu_flags)) {
-if (!high_bit_depth) {
-c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
-c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
-c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
-c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
-}
-}
 }
-- 
2.1.0

[FFmpeg-devel] [PATCH v4 2/5] avcodec/mips: Refine get_cabac_inline_mips.

2021-04-12 Thread Shiyou Yin

1. Refine function get_cabac_inline_mips.
2. Optimize functions get_cabac_bypass and get_cabac_bypass_sign.

H264 decoding speed: 4.89x ==> 5.05x (tested on 3A4000).
---
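Note: in get_cabac_inline_mips the patch replaces the lps_mask arithmetic
with a compare (slt) and a branch. The two forms can be compared in plain C
as below. This is a simplified sketch, not the decoder itself: DemoCabac and
both function names are hypothetical, the table lookup and renormalization
are left out, and the mask is built from a comparison instead of an
arithmetic shift to stay within portable C.

```c
#include <assert.h>
#include <stdint.h>

typedef struct DemoCabac {
    int32_t low;
    int32_t range;
} DemoCabac;

/* Mask form: every update is computed and gated by an all-ones/zero mask. */
static int step_mask(DemoCabac *c, int state, int range_lps)
{
    assert(c);
    c->range -= range_lps;
    int32_t scaled   = c->range * (1 << 17);        /* range << (CABAC_BITS + 1) */
    int32_t lps_mask = -(int32_t)(scaled < c->low); /* -1 on the LPS path */
    c->low   -= scaled & lps_mask;
    c->range += (range_lps - c->range) & lps_mask;
    return state ^ lps_mask;
}

/* Branch form: the updates are skipped entirely on the MPS path. */
static int step_branch(DemoCabac *c, int state, int range_lps)
{
    assert(c);
    c->range -= range_lps;
    int32_t scaled = c->range * (1 << 17);
    if (scaled < c->low) {
        c->range = range_lps;
        state    = ~state;
        c->low  -= scaled;
    }
    return state;
}
```

Both functions leave low, range and the returned state identical for the
same input. Whether the branch form is faster depends on the CPU, and the
figures above are the only measurements given in this patch.
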
 libavcodec/mips/cabac.h | 140 ++--
 1 file changed, 112 insertions(+), 28 deletions(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..0648b9a 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin 
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
- uint8_t * const state){
+  uint8_t * const state){
 mips_reg tmp0, tmp1, tmp2, bit;
 
 __asm__ volatile (
 "lbu  %[bit],0(%[state])   \n\t"
 "and  %[tmp0],   %[c_range], 0xC0  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp0]   \n\t"
+PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[bit]\n\t"
 /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
 PTR_SUBU "%[c_range],%[c_range], %[tmp1]   \n\t"
 PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
-PTR_SUBU "%[tmp0],   %[tmp0],%[c_low]  \n\t"
-
-/* tmp2: lps_mask */
-PTR_SRA  "%[tmp2],   %[tmp0],0x1F  \n\t"
-/* If tmp0 < 0, lps_mask ==  0x*/
-/* If tmp0 >= 0, lps_mask ==  0x*/
+"slt  %[tmp2],   %[tmp0],%[c_low]  \n\t"
 "beqz %[tmp2],   1f\n\t"
-PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
+"move %[c_range],%[tmp1]   \n\t"
+"not  %[bit],%[bit]\n\t"
 PTR_SUBU "%[c_low],  %[c_low],   %[tmp0]   \n\t"
-PTR_SUBU "%[tmp0],   %[tmp1],%[c_range]\n\t"
-PTR_ADDU "%[c_range],%[c_range], %[tmp0]   \n\t"
-"xor  %[bit],%[bit], %[tmp2]   \n\t"
 
 "1:\n\t"
 /* tmp1: *state */
@@ -70,23 +64,21 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 PTR_SLL  "%[c_range],%[c_range], %[tmp2]   \n\t"
 PTR_SLL  "%[c_low],  %[c_low],   %[tmp2]   \n\t"
 
-"and  %[tmp0],   %[c_low],   %[cabac_mask] \n\t"
-"bnez %[tmp0],   1f\n\t"
-PTR_ADDIU"%[tmp0],   %[c_low],   -0x01 \n\t"
+"and  %[tmp1],   %[c_low],   %[cabac_mask] \n\t"
+"bnez %[tmp1],   1f\n\t"
+PTR_ADDIU"%[tmp0],   %[c_low],   -0X01 \n\t"
 "xor  %[tmp0],   %[c_low],   %[tmp0]   \n\t"
 PTR_SRA  "%[tmp0],   %[tmp0],0x0f  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
+/* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
 "lbu  %[tmp2],   %[norm_off](%[tmp0])  \n\t"
-#if CABAC_BITS == 16
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
-"lbu  %[tmp1],   1(%[c_bytestream])\n\t"
-PTR_SLL  "%[tmp0],   %[tmp0],0x09  \n\t"
-PTR_SLL  "%[tmp1],   %[tmp1],0x01  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp1]   \n\t"
+#if HAVE_BIGENDIAN
+"lhu  %[tmp0],   0(%[c_bytestream])\n\t"
 #else
-"

[FFmpeg-devel] [PATCH v3 5/5] mips: Fix potential illegal instruction error.

2021-03-30 Thread Shiyou Yin

MSA2 optimizations are embedded in the MSA macros in generic_macros_msa.h,
so it is difficult to do a runtime check for them. Removing this code
makes the MSA path more robust. H264 1080p decoding: 5.13x ==> 5.12x.
---
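Note: the difficulty comes from selecting the MSA2 variant with the
preprocessor inside shared macros. A minimal sketch contrasting build-time
and runtime selection, assuming a hypothetical switch DEMO_HAVE_EXT2 in
place of HAVE_MSA2 and dummy functions in place of real SIMD code:

```c
#include <assert.h>

/* Hypothetical build-time switch standing in for HAVE_MSA2. */
#ifndef DEMO_HAVE_EXT2
#define DEMO_HAVE_EXT2 0
#endif

/* Hypothetical runtime CPU flag, not one of FFmpeg's AV_CPU_FLAG_* values. */
#define DEMO_CPU_FLAG_EXT2 0x2

typedef int (*sum_fn)(int, int);

static int sum_baseline(int a, int b) { return a + b; }
/* Stand-in for code that would use extension-only instructions. */
static int sum_ext2(int a, int b)     { return a + b; }

/* Build-time selection: the choice is frozen into every user of the macro,
 * so no CPU check can intervene where the macro is expanded. */
#if DEMO_HAVE_EXT2
#define DEMO_SUM(a, b) sum_ext2((a), (b))
#else
#define DEMO_SUM(a, b) sum_baseline((a), (b))
#endif

static int demo_kernel(int a, int b)
{
    return DEMO_SUM(a, b);
}

/* Runtime selection: the extension path is taken only if the CPU reports it. */
static sum_fn pick_sum(int cpu_flags)
{
    assert(cpu_flags >= 0);
    return (cpu_flags & DEMO_CPU_FLAG_EXT2) ? sum_ext2 : sum_baseline;
}
```

A binary built with the switch enabled executes the extension code in
demo_kernel on every CPU, which is how an illegal instruction can occur on
hardware that only supports the base extension.
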
 configure   |  7 +--
 libavutil/mips/generic_macros_msa.h | 37 -
 2 files changed, 1 insertion(+), 43 deletions(-)

diff --git a/configure b/configure
index d7a3f50..7b05612 100755
--- a/configure
+++ b/configure
@@ -451,7 +451,6 @@ Optimization options (experts only):
   --disable-mipsdspdisable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2  disable MIPS DSP ASE R2 optimizations
   --disable-msadisable MSA optimizations
-  --disable-msa2   disable MSA2 optimizations
   --disable-mipsfpudisable floating point MIPS optimizations
   --disable-mmidisable Loongson SIMD optimizations
   --disable-fast-unaligned consider unaligned accesses slow
@@ -2025,7 +2024,6 @@ ARCH_EXT_LIST_MIPS="
 mipsdsp
 mipsdspr2
 msa
-msa2
 "
 
 ARCH_EXT_LIST_LOONGSON="
@@ -2564,7 +2562,6 @@ mipsdsp_deps="mips"
 mipsdspr2_deps="mips"
 mmi_deps_any="loongson2 loongson3"
 msa_deps="mipsfpu"
-msa2_deps="msa"
 
 cpunop_deps="i686"
 x86_64_select="i686"
@@ -5907,9 +5904,8 @@ elif enabled mips; then
 enabled mipsdsp && check_inline_asm_flags mipsdsp '"addu.qb $t0, $t1, $t2"' '-mdsp'
 enabled mipsdspr2 && check_inline_asm_flags mipsdspr2 '"absq_s.qb $t0, $t1"' '-mdspr2'
 
-# MSA and MSA2 can be detected at runtime so we supply extra flags here
+# MSA can be detected at runtime so we supply extra flags here
 enabled mipsfpu && enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 1"' '-mmsa' && append MSAFLAGS '-mmsa'
-enabled msa && enabled msa2 && check_inline_asm msa2 '"nxbits.any.b $w0, $w0"' '-mmsa2' && append MSAFLAGS '-mmsa2'
 
 # loongson2 have no switch cflag so we can only probe toolchain ability
 enabled loongson2 && check_inline_asm loongson2 '"dmult.g $8, $9, $10"' && disable loongson3
@@ -7340,7 +7336,6 @@ if enabled mips; then
 echo "MIPS DSP R1 enabled   ${mipsdsp-no}"
 echo "MIPS DSP R2 enabled   ${mipsdspr2-no}"
 echo "MIPS MSA enabled  ${msa-no}"
-echo "MIPS MSA2 enabled ${msa2-no}"
 echo "LOONGSON MMI enabled  ${mmi-no}"
 fi
 if enabled ppc; then
diff --git a/libavutil/mips/generic_macros_msa.h b/libavutil/mips/generic_macros_msa.h
index bb25e9f..1486f72 100644
--- a/libavutil/mips/generic_macros_msa.h
+++ b/libavutil/mips/generic_macros_msa.h
@@ -25,10 +25,6 @@
 #include 
 #include 
 
-#if HAVE_MSA2
-#include 
-#endif
-
 #define ALIGNMENT   16
 #define ALLOC_ALIGNED(align) __attribute__ ((aligned((align) << 1)))
 
@@ -1119,15 +1115,6 @@
  unsigned absolute diff values, even-odd pairs are added
  together to generate 8 halfword results.
 */
-#if HAVE_MSA2
-#define SAD_UB2_UH(in0, in1, ref0, ref1) \
-( {  \
-v8u16 sad_m = { 0 }; \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in0, (v16u8) ref0); \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in1, (v16u8) ref1); \
-sad_m;   \
-} )
-#else
 #define SAD_UB2_UH(in0, in1, ref0, ref1)\
 ( { \
 v16u8 diff0_m, diff1_m; \
@@ -1141,7 +1128,6 @@
 \
 sad_m;  \
 } )
-#endif // #if HAVE_MSA2
 
 /* Description : Insert specified word elements from input vectors to 1
  destination vector
@@ -2183,12 +2169,6 @@
  extracted and interleaved with same vector 'in0' to generate
  4 word elements keeping sign intact
 */
-#if HAVE_MSA2
-#define UNPCK_R_SH_SW(in, out)   \
-{\
-out = (v4i32) __builtin_msa2_w2x_lo_s_h((v8i16) in); \
-}
-#else
 #define UNPCK_R_SH_SW(in, out)   \
 {\
 v8i16 sign_m;\
@@ -2196,7 +2176,6 @@
 sign_m = __msa_clti_s_h((v8i16) in, 0);  \
 out = (v4i32) __msa_ilvr_h(sign_m, (v8i16) in);  \
 }
-#endif // #if HAVE_MSA2
 
 /* Description : Sign extend byte elements from input vector and return
  halfword results in pair of vectors
@@ -2209,13 +2188,6 @@
  Then interleaved left with same vector 'in0' to
  generate 8 signed halfword elements in 'out1'
 */
-#if HAVE_MSA2
-#define UNPCK_SB_SH(in, out0, out1)   \
-{

[FFmpeg-devel] [PATCH v3 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa

2021-03-30 Thread Shiyou Yin

From: gxw 

Use masks instead of conditional branches. H264 4K decoding speed
improves by about 0.1 fps (tested on 3A4000).

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/h264dsp_msa.c | 465 --
 1 file changed, 171 insertions(+), 294 deletions(-)

diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c
index a8c3f3c..9d815f8 100644
--- a/libavcodec/mips/h264dsp_msa.c
+++ b/libavcodec/mips/h264dsp_msa.c
@@ -1284,284 +1284,160 @@ static void avc_loopfilter_cb_or_cr_intra_edge_ver_msa(uint8_t *data_cb_or_cr,
 }
 }
 
-static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
-   uint8_t bs0, uint8_t bs1,
-   uint8_t bs2, uint8_t bs3,
-   uint8_t tc0, uint8_t tc1,
-   uint8_t tc2, uint8_t tc3,
-   uint8_t alpha_in,
-   uint8_t beta_in,
-   ptrdiff_t img_width)
+static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t* pPix, uint32_t iStride,
+   uint8_t iAlpha, uint8_t iBeta,
+   uint8_t* pTc)
 {
-v16u8 tmp_vec, bs = { 0 };
-
-tmp_vec = (v16u8) __msa_fill_b(bs0);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 0, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs1);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 1, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs2);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 2, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs3);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 3, (v4i32) tmp_vec);
-
-if (!__msa_test_bz_v(bs)) {
-uint8_t *src = data - 4;
-v16u8 p3_org, p2_org, p1_org, p0_org, q0_org, q1_org, q2_org, q3_org;
-v16u8 p0_asub_q0, p1_asub_p0, q1_asub_q0, alpha, beta;
-v16u8 is_less_than, is_less_than_beta, is_less_than_alpha;
-v16u8 is_bs_greater_than0;
-v16u8 tc = { 0 };
-v16i8 zero = { 0 };
-
-tmp_vec = (v16u8) __msa_fill_b(tc0);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 0, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc1);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 1, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc2);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 2, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc3);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 3, (v4i32) tmp_vec);
-
-is_bs_greater_than0 = (zero < bs);
-
-{
-v16u8 row0, row1, row2, row3, row4, row5, row6, row7;
-v16u8 row8, row9, row10, row11, row12, row13, row14, row15;
-
-LD_UB8(src, img_width,
-   row0, row1, row2, row3, row4, row5, row6, row7);
-src += (8 * img_width);
-LD_UB8(src, img_width,
-   row8, row9, row10, row11, row12, row13, row14, row15);
-
-TRANSPOSE16x8_UB_UB(row0, row1, row2, row3, row4, row5, row6, row7,
-row8, row9, row10, row11,
-row12, row13, row14, row15,
-p3_org, p2_org, p1_org, p0_org,
-q0_org, q1_org, q2_org, q3_org);
-}
-
-p0_asub_q0 = __msa_asub_u_b(p0_org, q0_org);
-p1_asub_p0 = __msa_asub_u_b(p1_org, p0_org);
-q1_asub_q0 = __msa_asub_u_b(q1_org, q0_org);
-
-alpha = (v16u8) __msa_fill_b(alpha_in);
-beta = (v16u8) __msa_fill_b(beta_in);
-
-is_less_than_alpha = (p0_asub_q0 < alpha);
-is_less_than_beta = (p1_asub_p0 < beta);
-is_less_than = is_less_than_beta & is_less_than_alpha;
-is_less_than_beta = (q1_asub_q0 < beta);
-is_less_than = is_less_than_beta & is_less_than;
-is_less_than = is_less_than & is_bs_greater_than0;
-
-if (!__msa_test_bz_v(is_less_than)) {
-v16i8 negate_tc, sign_negate_tc;
-v16u8 p0, q0, p2_asub_p0, q2_asub_q0;
-v8i16 tc_r, tc_l, negate_tc_r, i16_negatetc_l;
-v8i16 p1_org_r, p0_org_r, q0_org_r, q1_org_r;
-v8i16 p1_org_l, p0_org_l, q0_org_l, q1_org_l;
-v8i16 p0_r, q0_r, p0_l, q0_l;
-
-negate_tc = zero - (v16i8) tc;
-sign_negate_tc = __msa_clti_s_b(negate_tc, 0);
-
-ILVRL_B2_SH(sign_negate_tc, negate_tc, negate_tc_r, i16_negatetc_l);
-
-UNPCK_UB_SH(tc, tc_r, tc_l);
-UNPCK_UB_SH(p1_org, p1_org_r, p1_org_l);
-UNPCK_UB_SH(p0_org, p0_org_r, p0_org_l);
-UNPCK_UB_SH(q0_org, q0_org_r, q0_org_l);
-
-p2_asub_p0 = __msa_asub_u_b(p2_org, p0_org);
-is_less_than_beta = (p2_asub_p0 < beta);
-

[FFmpeg-devel] [PATCH v3 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

2021-03-30 Thread Shiyou Yin

From: gxw 

Speed of decoding H264 1080P: 5.05x ==> 5.13x

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/Makefile|   3 +-
 libavcodec/mips/h264_deblock_msa.c  | 153 
 libavcodec/mips/h264dsp_init_mips.c |   2 +
 libavcodec/mips/h264dsp_mips.h  |   4 +
 4 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/mips/h264_deblock_msa.c

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 2be4d9b..81a73a4 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -57,7 +57,8 @@ MSA-OBJS-$(CONFIG_VP8_DECODER)+= mips/vp8_mc_msa.o \
  mips/vp8_lpf_msa.o
 MSA-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_msa.o
 MSA-OBJS-$(CONFIG_H264DSP)+= mips/h264dsp_msa.o\
- mips/h264idct_msa.o
+ mips/h264idct_msa.o   \
+ mips/h264_deblock_msa.o
 MSA-OBJS-$(CONFIG_H264QPEL)   += mips/h264qpel_msa.o
 MSA-OBJS-$(CONFIG_H264CHROMA) += mips/h264chroma_msa.o
 MSA-OBJS-$(CONFIG_H264PRED)   += mips/h264pred_msa.o
diff --git a/libavcodec/mips/h264_deblock_msa.c b/libavcodec/mips/h264_deblock_msa.c
new file mode 100644
index 000..4fed55c
--- /dev/null
+++ b/libavcodec/mips/h264_deblock_msa.c
@@ -0,0 +1,153 @@
+/*
+ * MIPS SIMD optimized H.264 deblocking code
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ *Gu Xiwei 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/bit_depth_template.c"
+#include "h264dsp_mips.h"
+#include "libavutil/mips/generic_macros_msa.h"
+#include "libavcodec/mips/h264dsp_mips.h"
+
+#define h264_loop_filter_strength_iteration_msa(edges, step, mask_mv, dir, \
+d_idx, mask_dir)   \
+do {   \
+int b_idx = 0; \
+int step_x4 = step << 2; \
+int d_idx_12 = d_idx + 12; \
+int d_idx_52 = d_idx + 52; \
+int d_idx_x4 = d_idx << 2; \
+int d_idx_x4_48 = d_idx_x4 + 48; \
+int dir_x32  = dir * 32; \
+uint8_t *ref_t = (uint8_t*)ref; \
+uint8_t *mv_t  = (uint8_t*)mv; \
+uint8_t *nnz_t = (uint8_t*)nnz; \
+uint8_t *bS_t  = (uint8_t*)bS; \
+mask_mv <<= 3; \
+for (; b_idx < edges; b_idx += step) { \
+out &= mask_dir; \
+if (!(mask_mv & b_idx)) { \
+if (bidir) { \
+ref_2 = LD_SB(ref_t + d_idx_12); \
+ref_3 = LD_SB(ref_t + d_idx_52); \
+ref_0 = LD_SB(ref_t + 12); \
+ref_1 = LD_SB(ref_t + 52); \
+ref_2 = (v16i8)__msa_ilvr_w((v4i32)ref_3, (v4i32)ref_2); \
+ref_0 = (v16i8)__msa_ilvr_w((v4i32)ref_0, (v4i32)ref_0); \
+ref_1 = (v16i8)__msa_ilvr_w((v4i32)ref_1, (v4i32)ref_1); \
+ref_3 = (v16i8)__msa_shf_h((v8i16)ref_2, 0x4e); \
+ref_0 -= ref_2; \
+ref_1 -= ref_3; \
+ref_0 = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)ref_1); \
+\
+tmp_2 = LD_SH(mv_t + d_idx_x4_48);   \
+tmp_3 = LD_SH(mv_t + 48); \
+tmp_4 = LD_SH(mv_t + 208); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_0 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \
+tmp_0 += cnst_1; \
+tmp_0 = (v16i8)__msa_subs_u_b((v16u8)tmp_0, (v16u8)cnst_0);\
+tmp_0 = (v16i8)__msa_sat_s_h((v8i16)tmp_0, 7); \
+tmp_0 = __msa_pckev_b(tmp_0, tmp_0); \
+out   = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)tmp_0); \
+\
+tmp_2 = LD_SH(mv_t + 208 + d_idx_x4); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_1 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5

[FFmpeg-devel] [PATCH v3 2/5] avcodec/mips: Refine get_cabac_inline_mips.

2021-03-30 Thread Shiyou Yin

1. Refine function get_cabac_inline_mips.
2. Optimize functions get_cabac_bypass and get_cabac_bypass_sign.

H264 decoding speed: 4.89x ==> 5.05x (tested on 3A4000).
---
 libavcodec/mips/cabac.h | 131 +---
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..0ee7594 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin 
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
- uint8_t * const state){
+  uint8_t * const state){
 mips_reg tmp0, tmp1, tmp2, bit;
 
 __asm__ volatile (
 "lbu  %[bit],0(%[state])   \n\t"
 "and  %[tmp0],   %[c_range], 0xC0  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp0]   \n\t"
+PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[bit]\n\t"
 /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
 PTR_SUBU "%[c_range],%[c_range], %[tmp1]   \n\t"
 PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
-PTR_SUBU "%[tmp0],   %[tmp0],%[c_low]  \n\t"
-
-/* tmp2: lps_mask */
-PTR_SRA  "%[tmp2],   %[tmp0],0x1F  \n\t"
-/* If tmp0 < 0, lps_mask ==  0x*/
-/* If tmp0 >= 0, lps_mask ==  0x*/
+"slt  %[tmp2],   %[tmp0],%[c_low]  \n\t"
 "beqz %[tmp2],   1f\n\t"
-PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
+"move %[c_range],%[tmp1]   \n\t"
+"not  %[bit],%[bit]\n\t"
 PTR_SUBU "%[c_low],  %[c_low],   %[tmp0]   \n\t"
-PTR_SUBU "%[tmp0],   %[tmp1],%[c_range]\n\t"
-PTR_ADDU "%[c_range],%[c_range], %[tmp0]   \n\t"
-"xor  %[bit],%[bit], %[tmp2]   \n\t"
 
 "1:\n\t"
 /* tmp1: *state */
@@ -70,23 +64,18 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 PTR_SLL  "%[c_range],%[c_range], %[tmp2]   \n\t"
 PTR_SLL  "%[c_low],  %[c_low],   %[tmp2]   \n\t"
 
-"and  %[tmp0],   %[c_low],   %[cabac_mask] \n\t"
-"bnez %[tmp0],   1f\n\t"
-PTR_ADDIU"%[tmp0],   %[c_low],   -0x01 \n\t"
+"and  %[tmp1],   %[c_low],   %[cabac_mask] \n\t"
+"bnez %[tmp1],   1f\n\t"
+PTR_ADDIU"%[tmp0],   %[c_low],   -0X01 \n\t"
 "xor  %[tmp0],   %[c_low],   %[tmp0]   \n\t"
 PTR_SRA  "%[tmp0],   %[tmp0],0x0f  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
+/* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
 "lbu  %[tmp2],   %[norm_off](%[tmp0])  \n\t"
-#if CABAC_BITS == 16
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
-"lbu  %[tmp1],   1(%[c_bytestream])\n\t"
-PTR_SLL  "%[tmp0],   %[tmp0],0x09  \n\t"
-PTR_SLL  "%[tmp1],   %[tmp1],0x01  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp1]   \n\t"
-#else
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
+
+"lhu  %[t

[FFmpeg-devel] [mips] Optimize H264 decoding for MIPS platform.

2021-03-30 Thread Shiyou Yin

v2: Fix a build error in [PATCH 2/5].
v3: Add [PATCH 4/5].

[PATCH v3 1/5] avcodec/mips: Restore the initialization sequence of
[PATCH v3 2/5] avcodec/mips: Refine get_cabac_inline_mips.
[PATCH v3 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.
[PATCH v3 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa
[PATCH v3 5/5] mips: Fix potential illegal instruction error.

[FFmpeg-devel] [PATCH v3 1/5] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.

2021-03-30 Thread Shiyou Yin

The MSA optimization was refined in commits 93218c2 and ce0a52e and is
now faster than the MMI version, so initialize MMI first and let MSA override it.
H264 decoding speed: 4.83x ==> 4.89x (tested on 3A4000).
---
 libavcodec/mips/h264chroma_init_mips.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libavcodec/mips/h264chroma_init_mips.c b/libavcodec/mips/h264chroma_init_mips.c
index 6bb19d3..755cc04 100644
--- a/libavcodec/mips/h264chroma_init_mips.c
+++ b/libavcodec/mips/h264chroma_init_mips.c
@@ -28,7 +28,15 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 int cpu_flags = av_get_cpu_flags();
 int high_bit_depth = bit_depth > 8;
 
-/* MMI apears to be faster than MSA here */
+if (have_mmi(cpu_flags)) {
+if (!high_bit_depth) {
+c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
+c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
+c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
+c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
+}
+}
+
 if (have_msa(cpu_flags)) {
 if (!high_bit_depth) {
 c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_msa;
@@ -40,13 +48,4 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 c->avg_h264_chroma_pixels_tab[2] = ff_avg_h264_chroma_mc2_msa;
 }
 }
-
-if (have_mmi(cpu_flags)) {
-if (!high_bit_depth) {
-c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
-c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
-c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
-c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
-}
-}
 }
-- 
2.1.0

[FFmpeg-devel] [PATCH v3 3/4] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

2020-10-19 Thread Shiyou Yin

From: gxw 

Speed of decoding H264: 5.45x ==> 5.53x

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/Makefile|   3 +-
 libavcodec/mips/h264_deblock_msa.c  | 153 
 libavcodec/mips/h264dsp_init_mips.c |   2 +
 libavcodec/mips/h264dsp_mips.h  |   4 +
 4 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/mips/h264_deblock_msa.c

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 2be4d9b..81a73a4 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -57,7 +57,8 @@ MSA-OBJS-$(CONFIG_VP8_DECODER)+= mips/vp8_mc_msa.o \
  mips/vp8_lpf_msa.o
 MSA-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_msa.o
 MSA-OBJS-$(CONFIG_H264DSP)+= mips/h264dsp_msa.o\
- mips/h264idct_msa.o
+ mips/h264idct_msa.o   \
+ mips/h264_deblock_msa.o
 MSA-OBJS-$(CONFIG_H264QPEL)   += mips/h264qpel_msa.o
 MSA-OBJS-$(CONFIG_H264CHROMA) += mips/h264chroma_msa.o
 MSA-OBJS-$(CONFIG_H264PRED)   += mips/h264pred_msa.o
diff --git a/libavcodec/mips/h264_deblock_msa.c b/libavcodec/mips/h264_deblock_msa.c
new file mode 100644
index 000..4fed55c
--- /dev/null
+++ b/libavcodec/mips/h264_deblock_msa.c
@@ -0,0 +1,153 @@
+/*
+ * MIPS SIMD optimized H.264 deblocking code
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ *Gu Xiwei 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/bit_depth_template.c"
+#include "h264dsp_mips.h"
+#include "libavutil/mips/generic_macros_msa.h"
+#include "libavcodec/mips/h264dsp_mips.h"
+
+#define h264_loop_filter_strength_iteration_msa(edges, step, mask_mv, dir, \
+d_idx, mask_dir)   \
+do {   \
+int b_idx = 0; \
+int step_x4 = step << 2; \
+int d_idx_12 = d_idx + 12; \
+int d_idx_52 = d_idx + 52; \
+int d_idx_x4 = d_idx << 2; \
+int d_idx_x4_48 = d_idx_x4 + 48; \
+int dir_x32  = dir * 32; \
+uint8_t *ref_t = (uint8_t*)ref; \
+uint8_t *mv_t  = (uint8_t*)mv; \
+uint8_t *nnz_t = (uint8_t*)nnz; \
+uint8_t *bS_t  = (uint8_t*)bS; \
+mask_mv <<= 3; \
+for (; b_idx < edges; b_idx += step) { \
+out &= mask_dir; \
+if (!(mask_mv & b_idx)) { \
+if (bidir) { \
+ref_2 = LD_SB(ref_t + d_idx_12); \
+ref_3 = LD_SB(ref_t + d_idx_52); \
+ref_0 = LD_SB(ref_t + 12); \
+ref_1 = LD_SB(ref_t + 52); \
+ref_2 = (v16i8)__msa_ilvr_w((v4i32)ref_3, (v4i32)ref_2); \
+ref_0 = (v16i8)__msa_ilvr_w((v4i32)ref_0, (v4i32)ref_0); \
+ref_1 = (v16i8)__msa_ilvr_w((v4i32)ref_1, (v4i32)ref_1); \
+ref_3 = (v16i8)__msa_shf_h((v8i16)ref_2, 0x4e); \
+ref_0 -= ref_2; \
+ref_1 -= ref_3; \
+ref_0 = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)ref_1); \
+\
+tmp_2 = LD_SH(mv_t + d_idx_x4_48);   \
+tmp_3 = LD_SH(mv_t + 48); \
+tmp_4 = LD_SH(mv_t + 208); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_0 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \
+tmp_0 += cnst_1; \
+tmp_0 = (v16i8)__msa_subs_u_b((v16u8)tmp_0, (v16u8)cnst_0);\
+tmp_0 = (v16i8)__msa_sat_s_h((v8i16)tmp_0, 7); \
+tmp_0 = __msa_pckev_b(tmp_0, tmp_0); \
+out   = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)tmp_0); \
+\
+tmp_2 = LD_SH(mv_t + 208 + d_idx_x4); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_1 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5

[FFmpeg-devel] [PATCH v3 4/4] Fix potential illegal instruction error.

2020-10-19 Thread Shiyou Yin

MSA2 optimizations are attached to the MSA macros in generic_macros_msa.h,
which makes it difficult to add a runtime check for them. Removing this
code makes the MSA path more robust.
The impact on performance is negligible (167fps ==> 166fps).
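
To illustrate the problem described above: when a variant is chosen by a
preprocessor test inside a macro, the choice is baked into every expansion
at build time, so a runtime CPU check has nothing left to switch. The
following is only a minimal sketch. HAVE_FAST_EXT, sum_generic, sum_fast and
select_sum are hypothetical names standing in for HAVE_MSA2 and the macro
variants; they are not FFmpeg APIs.

```c
#include <stddef.h>

/* Hypothetical build-time switch, playing the role of HAVE_MSA2. */
#ifndef HAVE_FAST_EXT
#define HAVE_FAST_EXT 0
#endif

/* Baseline implementation: safe on every CPU the build targets. */
static int sum_generic(const unsigned char *p, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Stand-in for a variant built from extension-only instructions.
 * It is plain C here so that the sketch runs anywhere. */
static int sum_fast(const unsigned char *p, size_t n)
{
    int s = 0;
    while (n--)
        s += *p++;
    return s;
}

/* Compile-time selection: every use of SUM() expands to exactly one
 * variant, so no runtime check can redirect it afterwards. */
#if HAVE_FAST_EXT
#define SUM(p, n) sum_fast((p), (n))
#else
#define SUM(p, n) sum_generic((p), (n))
#endif

/* Runtime selection: the caller passes the detected CPU capability
 * and receives a function pointer to the matching variant. */
typedef int (*sum_fn)(const unsigned char *, size_t);

static sum_fn select_sum(int cpu_has_fast_ext)
{
    return cpu_has_fast_ext ? sum_fast : sum_generic;
}
```

Only the select_sum() style can honour a runtime CPU flag. The patch takes
the simpler route and drops the compile-time variants altogether.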
---
 configure   |  7 +--
 libavutil/mips/generic_macros_msa.h | 37 -
 2 files changed, 1 insertion(+), 43 deletions(-)

diff --git a/configure b/configure
index 8e451ca..f1f8ee9 100755
--- a/configure
+++ b/configure
@@ -450,7 +450,6 @@ Optimization options (experts only):
   --disable-mipsdsp        disable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2      disable MIPS DSP ASE R2 optimizations
   --disable-msa            disable MSA optimizations
-  --disable-msa2           disable MSA2 optimizations
   --disable-mipsfpu        disable floating point MIPS optimizations
   --disable-mmi            disable Loongson SIMD optimizations
   --disable-fast-unaligned consider unaligned accesses slow
@@ -2023,7 +2022,6 @@ ARCH_EXT_LIST_MIPS="
 mipsdsp
 mipsdspr2
 msa
-msa2
 "
 
 ARCH_EXT_LIST_LOONGSON="
@@ -2560,7 +2558,6 @@ mipsdsp_deps="mips"
 mipsdspr2_deps="mips"
 mmi_deps_any="loongson2 loongson3"
 msa_deps="mipsfpu"
-msa2_deps="msa"
 
 cpunop_deps="i686"
 x86_64_select="i686"
@@ -5886,9 +5883,8 @@ elif enabled mips; then
 enabled mipsdsp && check_inline_asm_flags mipsdsp '"addu.qb $t0, $t1, $t2"' '-mdsp'
 enabled mipsdspr2 && check_inline_asm_flags mipsdspr2 '"absq_s.qb $t0, $t1"' '-mdspr2'
 
-# MSA and MSA2 can be detected at runtime so we supply extra flags here
+# MSA can be detected at runtime so we supply extra flags here
 enabled mipsfpu && enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 1"' '-mmsa' && append MSAFLAGS '-mmsa'
-enabled msa && enabled msa2 && check_inline_asm msa2 '"nxbits.any.b $w0, $w0"' '-mmsa2' && append MSAFLAGS '-mmsa2'
 
 # loongson2 have no switch cflag so we can only probe toolchain ability
 enabled loongson2 && check_inline_asm loongson2 '"dmult.g $8, $9, $10"' && disable loongson3
@@ -7312,7 +7308,6 @@ if enabled mips; then
 echo "MIPS DSP R1 enabled   ${mipsdsp-no}"
 echo "MIPS DSP R2 enabled   ${mipsdspr2-no}"
 echo "MIPS MSA enabled  ${msa-no}"
-echo "MIPS MSA2 enabled ${msa2-no}"
 echo "LOONGSON MMI enabled  ${mmi-no}"
 fi
 if enabled ppc; then
diff --git a/libavutil/mips/generic_macros_msa.h b/libavutil/mips/generic_macros_msa.h
index bb25e9f..1486f72 100644
--- a/libavutil/mips/generic_macros_msa.h
+++ b/libavutil/mips/generic_macros_msa.h
@@ -25,10 +25,6 @@
 #include 
 #include 
 
-#if HAVE_MSA2
-#include 
-#endif
-
 #define ALIGNMENT   16
 #define ALLOC_ALIGNED(align) __attribute__ ((aligned((align) << 1)))
 
@@ -1119,15 +1115,6 @@
  unsigned absolute diff values, even-odd pairs are added
  together to generate 8 halfword results.
 */
-#if HAVE_MSA2
-#define SAD_UB2_UH(in0, in1, ref0, ref1) \
-( {  \
-v8u16 sad_m = { 0 }; \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in0, (v16u8) ref0); \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in1, (v16u8) ref1); \
-sad_m;   \
-} )
-#else
 #define SAD_UB2_UH(in0, in1, ref0, ref1)\
 ( { \
 v16u8 diff0_m, diff1_m; \
@@ -1141,7 +1128,6 @@
 \
 sad_m;  \
 } )
-#endif // #if HAVE_MSA2
 
 /* Description : Insert specified word elements from input vectors to 1
  destination vector
@@ -2183,12 +2169,6 @@
  extracted and interleaved with same vector 'in0' to generate
  4 word elements keeping sign intact
 */
-#if HAVE_MSA2
-#define UNPCK_R_SH_SW(in, out)   \
-{\
-out = (v4i32) __builtin_msa2_w2x_lo_s_h((v8i16) in); \
-}
-#else
 #define UNPCK_R_SH_SW(in, out)   \
 {\
 v8i16 sign_m;\
@@ -2196,7 +2176,6 @@
 sign_m = __msa_clti_s_h((v8i16) in, 0);  \
 out = (v4i32) __msa_ilvr_h(sign_m, (v8i16) in);  \
 }
-#endif // #if HAVE_MSA2
 
 /* Description : Sign extend byte elements from input vector and return
  halfword results in pair of vectors
@@ -2209,13 +2188,6 @@
  Then interleaved left with same vector 'in0' to
  generate 8 signed halfword elements in 'out1'
 */
-#if HAVE_MSA2
-#define UNPCK_SB_SH(in, out0, out1)

[FFmpeg-devel] MIPS: Optimize H264 decoding.

2020-10-19 Thread Shiyou Yin

H264 decoding speed: 154fps ==> 165fps, 5.14x ==> 5.53x (tested on 3A4000)

V2: Fixed a build error in [PATCH 2/3]:
"Error: opcode not supported on this processor: mips32r2 (mips32r2) `dsbh $10,$10'"

V3: Added a fix patch to make the MSA optimization more robust.

[PATCH v3 1/4] avcodec/mips: Restore the initialization sequence of.
[PATCH v3 2/4] avcodec/mips: Refine get_cabac_inline_mips.
[PATCH v3 3/4] avcodec/mips: Optimize function.
[PATCH v3 4/4] Fix potential illegal instruction error.

[FFmpeg-devel] [PATCH v3 1/4] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.

2020-10-19 Thread Shiyou Yin

The MSA version has been refined in commits 93218c2 and ce0a52e,
and is now better than the MMI version.
Speed of decoding H264: 5.14x ==> 5.23x (tested on 3A4000).
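
The reordering matters because each block overwrites the table entries set
by the block before it, so whichever extension is checked last wins when a
CPU reports both. A minimal sketch of that pattern follows. The flag bits,
kernel functions and ChromaTable type are hypothetical and only illustrate
the idea; they are not FFmpeg definitions.

```c
/* Hypothetical capability bits. */
enum { CPU_MMI = 1 << 0, CPU_MSA = 1 << 1 };

typedef int (*kernel_fn)(void);

/* Dummy kernels that only identify themselves. */
static int kernel_c(void)   { return 0; }
static int kernel_mmi(void) { return 1; }
static int kernel_msa(void) { return 2; }

typedef struct ChromaTable {
    kernel_fn mc8; /* covered by both extensions in this sketch */
    kernel_fn mc2; /* covered by the MSA block only */
} ChromaTable;

static void chroma_init(ChromaTable *t, int cpu_flags)
{
    /* Portable defaults first. */
    t->mc8 = kernel_c;
    t->mc2 = kernel_c;

    /* MMI is checked first, so it only survives when MSA is absent. */
    if (cpu_flags & CPU_MMI)
        t->mc8 = kernel_mmi;

    /* MSA is checked last and overwrites any entry it also provides. */
    if (cpu_flags & CPU_MSA) {
        t->mc8 = kernel_msa;
        t->mc2 = kernel_msa;
    }
}
```

With both bits set, mc8 ends up pointing at the MSA kernel. With only
CPU_MMI set, mc8 is the MMI kernel and mc2 keeps the portable default.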
---
 libavcodec/mips/h264chroma_init_mips.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libavcodec/mips/h264chroma_init_mips.c b/libavcodec/mips/h264chroma_init_mips.c
index 6bb19d3..755cc04 100644
--- a/libavcodec/mips/h264chroma_init_mips.c
+++ b/libavcodec/mips/h264chroma_init_mips.c
@@ -28,7 +28,15 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 int cpu_flags = av_get_cpu_flags();
 int high_bit_depth = bit_depth > 8;
 
-/* MMI apears to be faster than MSA here */
+if (have_mmi(cpu_flags)) {
+if (!high_bit_depth) {
+c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
+c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
+c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
+c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
+}
+}
+
 if (have_msa(cpu_flags)) {
 if (!high_bit_depth) {
 c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_msa;
@@ -40,13 +48,4 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 c->avg_h264_chroma_pixels_tab[2] = ff_avg_h264_chroma_mc2_msa;
 }
 }
-
-if (have_mmi(cpu_flags)) {
-if (!high_bit_depth) {
-c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
-c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
-c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
-c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
-}
-}
 }
-- 
2.1.0

[FFmpeg-devel] [PATCH v3 2/4] avcodec/mips: Refine get_cabac_inline_mips.

2020-10-19 Thread Shiyou Yin

1. Refine function get_cabac_inline_mips.
2. Optimize functions get_cabac_bypass and get_cabac_bypass_sign.

Speed of decoding H264: 5.23x ==> 5.45x (tested on 3A4000).
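
For readers who do not want to trace the assembly: the first hunk of
get_cabac_inline_mips replaces a mask-based update of range, low and the
state bit with a compare (slt) followed by a short branch. The sketch below
models both forms in C and is only an illustration under stated
assumptions. The Step struct and both function names are hypothetical, and
the shift of 17 corresponds to the 0x11 used in the assembly.

```c
#include <stdint.h>

#define SCALE_SHIFT 17 /* the 0x11 shift applied to c_range in the asm */

typedef struct Step {
    int32_t range;
    int32_t low;
    int32_t state;
} Step;

/* Old form: build an all-ones or all-zeros mask and apply it everywhere. */
static Step lps_select_mask(int32_t range, int32_t low,
                            int32_t range_lps, int32_t state)
{
    Step s;
    int32_t scaled, lps_mask;

    range   -= range_lps;
    scaled   = range << SCALE_SHIFT;
    /* The asm derives this mask with an arithmetic shift right by 31. */
    lps_mask = (scaled - low) < 0 ? -1 : 0;
    low     -= scaled & lps_mask;
    range   += (range_lps - range) & lps_mask;
    state   ^= lps_mask;

    s.range = range;
    s.low   = low;
    s.state = state;
    return s;
}

/* New form: one comparison, then the three updates on the taken path. */
static Step lps_select_branch(int32_t range, int32_t low,
                              int32_t range_lps, int32_t state)
{
    Step s;
    int32_t scaled;

    range -= range_lps;
    scaled = range << SCALE_SHIFT;
    if (scaled < low) {
        range = range_lps;
        state = ~state;
        low  -= scaled;
    }

    s.range = range;
    s.low   = low;
    s.state = state;
    return s;
}
```

Both functions return the same values for inputs in the normal CABAC
operating range, so the change affects instruction count and scheduling
only. Whether the branch form is faster depends on the CPU; the numbers
quoted above are the author's measurements on 3A4000.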
---
 libavcodec/mips/cabac.h | 131 +---
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..0ee7594 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin 
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
- uint8_t * const state){
+  uint8_t * const state){
 mips_reg tmp0, tmp1, tmp2, bit;
 
 __asm__ volatile (
 "lbu  %[bit],0(%[state])   \n\t"
 "and  %[tmp0],   %[c_range], 0xC0  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp0]   \n\t"
+PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[bit]\n\t"
 /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
 PTR_SUBU "%[c_range],%[c_range], %[tmp1]   \n\t"
 PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
-PTR_SUBU "%[tmp0],   %[tmp0],%[c_low]  \n\t"
-
-/* tmp2: lps_mask */
-PTR_SRA  "%[tmp2],   %[tmp0],0x1F  \n\t"
-/* If tmp0 < 0, lps_mask ==  0x*/
-/* If tmp0 >= 0, lps_mask ==  0x*/
+"slt  %[tmp2],   %[tmp0],%[c_low]  \n\t"
 "beqz %[tmp2],   1f\n\t"
-PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
+"move %[c_range],%[tmp1]   \n\t"
+"not  %[bit],%[bit]\n\t"
 PTR_SUBU "%[c_low],  %[c_low],   %[tmp0]   \n\t"
-PTR_SUBU "%[tmp0],   %[tmp1],%[c_range]\n\t"
-PTR_ADDU "%[c_range],%[c_range], %[tmp0]   \n\t"
-"xor  %[bit],%[bit], %[tmp2]   \n\t"
 
 "1:\n\t"
 /* tmp1: *state */
@@ -70,23 +64,18 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 PTR_SLL  "%[c_range],%[c_range], %[tmp2]   \n\t"
 PTR_SLL  "%[c_low],  %[c_low],   %[tmp2]   \n\t"
 
-"and  %[tmp0],   %[c_low],   %[cabac_mask] \n\t"
-"bnez %[tmp0],   1f\n\t"
-PTR_ADDIU"%[tmp0],   %[c_low],   -0x01 \n\t"
+"and  %[tmp1],   %[c_low],   %[cabac_mask] \n\t"
+"bnez %[tmp1],   1f\n\t"
+PTR_ADDIU"%[tmp0],   %[c_low],   -0X01 \n\t"
 "xor  %[tmp0],   %[c_low],   %[tmp0]   \n\t"
 PTR_SRA  "%[tmp0],   %[tmp0],0x0f  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
+/* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
 "lbu  %[tmp2],   %[norm_off](%[tmp0])  \n\t"
-#if CABAC_BITS == 16
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
-"lbu  %[tmp1],   1(%[c_bytestream])\n\t"
-PTR_SLL  "%[tmp0],   %[tmp0],0x09  \n\t"
-PTR_SLL  "%[tmp1],   %[tmp1],0x01  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp1]   \n\t"
-#else
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
+
+"lhu  %[t

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/mips: [loongson] Fixed mmi optimization

2020-09-10 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Shiyou Yin
>Sent: Thursday, September 3, 2020 2:30 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: gxw
>Subject: [FFmpeg-devel] [PATCH 1/2] avcodec/mips: [loongson] Fixed mmi 
>optimization
>
>From: gxw 
>
>Test case fate-checkasm-h264pred failed in latest community code.
>This patch fixed the bug.
>
>Signed-off-by: Shiyou Yin 
>---
> libavcodec/mips/h264pred_mmi.c | 8 ++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/libavcodec/mips/h264pred_mmi.c b/libavcodec/mips/h264pred_mmi.c
>index f4fe091..0209c2e 100644
>--- a/libavcodec/mips/h264pred_mmi.c
>+++ b/libavcodec/mips/h264pred_mmi.c
>@@ -178,7 +178,9 @@ void ff_pred8x8l_top_dc_8_mmi(uint8_t *src, int 
>has_topleft,
>
> "1: \n\t"
> "bnez   %[has_topright],2f  \n\t"
>-"pinsrh_3   %[ftmp2],   %[ftmp2],   %[ftmp4]\n\t"
>+"dli%[tmp0],0xa4\n\t"
>+"mtc1   %[tmp0],%[ftmp1]\n\t"
>+"pshufh %[ftmp2],   %[ftmp2],   %[ftmp1]\n\t"
>
> "2: \n\t"
> "dli%[tmp0],0x02\n\t"
>@@ -370,7 +372,9 @@ void ff_pred8x8l_vertical_8_mmi(uint8_t *src, int 
>has_topleft,
>
> "1: \n\t"
> "bnez   %[has_topright],2f  \n\t"
>-"pinsrh_3   %[ftmp11],  %[ftmp11],  %[ftmp9]\n\t"
>+"dli%[tmp0],0xa4\n\t"
>+"mtc1   %[tmp0],%[ftmp1]\n\t"
>+"pshufh %[ftmp11],  %[ftmp11],  %[ftmp1]\n\t"
>
> "2: \n\t"
> "dli%[tmp0],0x02\n\t"
>--
>2.1.0
>

Ping.
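
For readers following the quoted patch: it replaces pinsrh_3 with a pshufh
whose control value is 0xa4. The scalar model below shows what such a
shuffle produces, assuming that pshufh picks each destination halfword
through a 2-bit field of the control value (the same convention as x86
pshufw). That assumption and the function name shuffle_halfwords are mine,
not taken from the patch.

```c
#include <stdint.h>

/* Scalar model of a four-halfword shuffle on a 64-bit value.
 * Destination halfword i is source halfword ((imm >> (2 * i)) & 3). */
static uint64_t shuffle_halfwords(uint64_t src, unsigned imm)
{
    uint64_t dst = 0;

    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm >> (2 * i)) & 3;
        uint64_t h   = (src >> (16 * sel)) & 0xFFFF;

        dst |= h << (16 * i);
    }
    return dst;
}
```

Under that assumption, 0xa4 (binary 10 10 01 00) keeps halfwords 0, 1 and 2
in place and copies halfword 2 into position 3, so 0x4444333322221111
becomes 0x3333333322221111.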

Re: [FFmpeg-devel] [PATCH 2/2] Fix msa can't be disabled when '--cpu=loongson3a' assigned.

2020-09-10 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Shiyou Yin
>Sent: Thursday, September 3, 2020 2:30 PM
>To: ffmpeg-devel@ffmpeg.org
>Subject: [FFmpeg-devel] [PATCH 2/2] Fix msa can't be disabled when 
>'--cpu=loongson3a' assigned.
>
>There are compiler and runtime check for MSA and MMI.
>Remove the redundant setting of MSA and MMI for cores specified by "--cpu".
>
>Signed-off-by: Shiyou Yin 
>---
> configure | 9 -
> 1 file changed, 9 deletions(-)
>
>diff --git a/configure b/configure
>index 5640720..7f103fa 100755
>--- a/configure
>+++ b/configure
>@@ -5025,8 +5025,6 @@ elif enabled mips; then
> disable loongson3
> disable mipsdsp
> disable mipsdspr2
>-disable msa
>-disable mmi
>
> cpuflags="-march=$cpu"
>
>@@ -5035,17 +5033,13 @@ elif enabled mips; then
> mips1|mips3)
> ;;
> mips32r2)
>-enable msa
> enable mips32r2
> ;;
> mips32r5)
>-enable msa
> enable mips32r2
> enable mips32r5
> ;;
> mips64r2|mips64r5)
>-enable msa
>-enable mmi
> enable mips64r2
> enable loongson3
> ;;
>@@ -5062,7 +5056,6 @@ elif enabled mips; then
> enable mips32r2
> ;;
> p5600)
>-enable msa
> enable mips32r2
> enable mips32r5
> check_cflags "-mtune=p5600" && check_cflags "-msched-weight 
> -mload-store-pairs
>-funroll-loops"
>@@ -5077,7 +5070,6 @@ elif enabled mips; then
> ;;
> # Cores from Loongson
> loongson2e|loongson2f|loongson3*)
>-enable mmi
> enable local_aligned
> enable simd_align_16
> enable fast_64bit
>@@ -5100,7 +5092,6 @@ elif enabled mips; then
> case $cpu in
> loongson3*)
> enable loongson3
>-enable msa
> cpuflags="-march=loongson3a -mhard-float 
> $expensive_optimization_flag"
> ;;
> loongson2e)
>--
>2.1.0
>

Ping.

[FFmpeg-devel] [PATCH v2 3/3] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

2020-09-07 Thread Shiyou Yin

From: gxw 

Speed of decoding H264: 5.45x ==> 5.53x

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/Makefile|   3 +-
 libavcodec/mips/h264_deblock_msa.c  | 153 
 libavcodec/mips/h264dsp_init_mips.c |   2 +
 libavcodec/mips/h264dsp_mips.h  |   4 +
 4 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/mips/h264_deblock_msa.c

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 2be4d9b..81a73a4 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -57,7 +57,8 @@ MSA-OBJS-$(CONFIG_VP8_DECODER)+= mips/vp8_mc_msa.o \
  mips/vp8_lpf_msa.o
 MSA-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_msa.o
 MSA-OBJS-$(CONFIG_H264DSP)+= mips/h264dsp_msa.o\
- mips/h264idct_msa.o
+ mips/h264idct_msa.o   \
+ mips/h264_deblock_msa.o
 MSA-OBJS-$(CONFIG_H264QPEL)   += mips/h264qpel_msa.o
 MSA-OBJS-$(CONFIG_H264CHROMA) += mips/h264chroma_msa.o
 MSA-OBJS-$(CONFIG_H264PRED)   += mips/h264pred_msa.o
diff --git a/libavcodec/mips/h264_deblock_msa.c b/libavcodec/mips/h264_deblock_msa.c
new file mode 100644
index 000..4fed55c
--- /dev/null
+++ b/libavcodec/mips/h264_deblock_msa.c
@@ -0,0 +1,153 @@
+/*
+ * MIPS SIMD optimized H.264 deblocking code
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ *Gu Xiwei 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/bit_depth_template.c"
+#include "h264dsp_mips.h"
+#include "libavutil/mips/generic_macros_msa.h"
+#include "libavcodec/mips/h264dsp_mips.h"
+
+#define h264_loop_filter_strength_iteration_msa(edges, step, mask_mv, dir, \
+d_idx, mask_dir)   \
+do {   \
+int b_idx = 0; \
+int step_x4 = step << 2; \
+int d_idx_12 = d_idx + 12; \
+int d_idx_52 = d_idx + 52; \
+int d_idx_x4 = d_idx << 2; \
+int d_idx_x4_48 = d_idx_x4 + 48; \
+int dir_x32  = dir * 32; \
+uint8_t *ref_t = (uint8_t*)ref; \
+uint8_t *mv_t  = (uint8_t*)mv; \
+uint8_t *nnz_t = (uint8_t*)nnz; \
+uint8_t *bS_t  = (uint8_t*)bS; \
+mask_mv <<= 3; \
+for (; b_idx < edges; b_idx += step) { \
+out &= mask_dir; \
+if (!(mask_mv & b_idx)) { \
+if (bidir) { \
+ref_2 = LD_SB(ref_t + d_idx_12); \
+ref_3 = LD_SB(ref_t + d_idx_52); \
+ref_0 = LD_SB(ref_t + 12); \
+ref_1 = LD_SB(ref_t + 52); \
+ref_2 = (v16i8)__msa_ilvr_w((v4i32)ref_3, (v4i32)ref_2); \
+ref_0 = (v16i8)__msa_ilvr_w((v4i32)ref_0, (v4i32)ref_0); \
+ref_1 = (v16i8)__msa_ilvr_w((v4i32)ref_1, (v4i32)ref_1); \
+ref_3 = (v16i8)__msa_shf_h((v8i16)ref_2, 0x4e); \
+ref_0 -= ref_2; \
+ref_1 -= ref_3; \
+ref_0 = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)ref_1); \
+\
+tmp_2 = LD_SH(mv_t + d_idx_x4_48);   \
+tmp_3 = LD_SH(mv_t + 48); \
+tmp_4 = LD_SH(mv_t + 208); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_0 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \
+tmp_0 += cnst_1; \
+tmp_0 = (v16i8)__msa_subs_u_b((v16u8)tmp_0, (v16u8)cnst_0);\
+tmp_0 = (v16i8)__msa_sat_s_h((v8i16)tmp_0, 7); \
+tmp_0 = __msa_pckev_b(tmp_0, tmp_0); \
+out   = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)tmp_0); \
+\
+tmp_2 = LD_SH(mv_t + 208 + d_idx_x4); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_1 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5

[FFmpeg-devel] [PATCH v2 1/3] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.

2020-09-07 Thread Shiyou Yin

Speed of decoding H264: 5.14x ==> 5.23x (tested on 3A4000).
---
 libavcodec/mips/h264chroma_init_mips.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libavcodec/mips/h264chroma_init_mips.c b/libavcodec/mips/h264chroma_init_mips.c
index 6bb19d3..755cc04 100644
--- a/libavcodec/mips/h264chroma_init_mips.c
+++ b/libavcodec/mips/h264chroma_init_mips.c
@@ -28,7 +28,15 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 int cpu_flags = av_get_cpu_flags();
 int high_bit_depth = bit_depth > 8;
 
-/* MMI apears to be faster than MSA here */
+if (have_mmi(cpu_flags)) {
+if (!high_bit_depth) {
+c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
+c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
+c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
+c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
+}
+}
+
 if (have_msa(cpu_flags)) {
 if (!high_bit_depth) {
 c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_msa;
@@ -40,13 +48,4 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 c->avg_h264_chroma_pixels_tab[2] = ff_avg_h264_chroma_mc2_msa;
 }
 }
-
-if (have_mmi(cpu_flags)) {
-if (!high_bit_depth) {
-c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
-c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
-c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
-c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
-}
-}
 }
-- 
2.1.0

[FFmpeg-devel] [PATCH v2 2/3] avcodec/mips: Refine get_cabac_inline_mips.

2020-09-07 Thread Shiyou Yin

1. Refine function get_cabac_inline_mips.
2. Optimize functions get_cabac_bypass and get_cabac_bypass_sign.

Speed of decoding H264: 5.23x ==> 5.45x (tested on 3A4000).
---
 libavcodec/mips/cabac.h | 131 +---
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..0ee7594 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin 
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
- uint8_t * const state){
+  uint8_t * const state){
 mips_reg tmp0, tmp1, tmp2, bit;
 
 __asm__ volatile (
 "lbu  %[bit],0(%[state])   \n\t"
 "and  %[tmp0],   %[c_range], 0xC0  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp0]   \n\t"
+PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[bit]\n\t"
 /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
 PTR_SUBU "%[c_range],%[c_range], %[tmp1]   \n\t"
 PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
-PTR_SUBU "%[tmp0],   %[tmp0],%[c_low]  \n\t"
-
-/* tmp2: lps_mask */
-PTR_SRA  "%[tmp2],   %[tmp0],0x1F  \n\t"
-/* If tmp0 < 0, lps_mask ==  0x*/
-/* If tmp0 >= 0, lps_mask ==  0x*/
+"slt  %[tmp2],   %[tmp0],%[c_low]  \n\t"
 "beqz %[tmp2],   1f\n\t"
-PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
+"move %[c_range],%[tmp1]   \n\t"
+"not  %[bit],%[bit]\n\t"
 PTR_SUBU "%[c_low],  %[c_low],   %[tmp0]   \n\t"
-PTR_SUBU "%[tmp0],   %[tmp1],%[c_range]\n\t"
-PTR_ADDU "%[c_range],%[c_range], %[tmp0]   \n\t"
-"xor  %[bit],%[bit], %[tmp2]   \n\t"
 
 "1:\n\t"
 /* tmp1: *state */
@@ -70,23 +64,18 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 PTR_SLL  "%[c_range],%[c_range], %[tmp2]   \n\t"
 PTR_SLL  "%[c_low],  %[c_low],   %[tmp2]   \n\t"
 
-"and  %[tmp0],   %[c_low],   %[cabac_mask] \n\t"
-"bnez %[tmp0],   1f\n\t"
-PTR_ADDIU"%[tmp0],   %[c_low],   -0x01 \n\t"
+"and  %[tmp1],   %[c_low],   %[cabac_mask] \n\t"
+"bnez %[tmp1],   1f\n\t"
+PTR_ADDIU"%[tmp0],   %[c_low],   -0X01 \n\t"
 "xor  %[tmp0],   %[c_low],   %[tmp0]   \n\t"
 PTR_SRA  "%[tmp0],   %[tmp0],0x0f  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
+/* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
 "lbu  %[tmp2],   %[norm_off](%[tmp0])  \n\t"
-#if CABAC_BITS == 16
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
-"lbu  %[tmp1],   1(%[c_bytestream])\n\t"
-PTR_SLL  "%[tmp0],   %[tmp0],0x09  \n\t"
-PTR_SLL  "%[tmp1],   %[tmp1],0x01  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp1]   \n\t"
-#else
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
+
+"lhu  %[t

[FFmpeg-devel] [loongson] Optimize H264 decoding.

2020-09-07 Thread Shiyou Yin

H264 decoding speed: 154fps ==> 165fps, 5.14x ==> 5.53x (tested on 3A4000)

V2: Fixed a build error in [PATCH 2/3]:
"Error: opcode not supported on this processor: mips32r2 (mips32r2) `dsbh $10,$10'"

[PATCH 1/3] avcodec/mips: Restore the initialization sequence of MSA. 
[PATCH 2/3] avcodec/mips: Refine get_cabac_inline_mips.
[PATCH 3/3] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

[FFmpeg-devel] [PATCH 3/3] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

2020-09-04 Thread Shiyou Yin

From: gxw 

Speed of decoding H264: 5.45x ==> 5.53x

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/Makefile|   3 +-
 libavcodec/mips/h264_deblock_msa.c  | 153 
 libavcodec/mips/h264dsp_init_mips.c |   2 +
 libavcodec/mips/h264dsp_mips.h  |   4 +
 4 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/mips/h264_deblock_msa.c

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 2be4d9b..81a73a4 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -57,7 +57,8 @@ MSA-OBJS-$(CONFIG_VP8_DECODER)+= mips/vp8_mc_msa.o \
  mips/vp8_lpf_msa.o
 MSA-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_msa.o
 MSA-OBJS-$(CONFIG_H264DSP)+= mips/h264dsp_msa.o\
- mips/h264idct_msa.o
+ mips/h264idct_msa.o   \
+ mips/h264_deblock_msa.o
 MSA-OBJS-$(CONFIG_H264QPEL)   += mips/h264qpel_msa.o
 MSA-OBJS-$(CONFIG_H264CHROMA) += mips/h264chroma_msa.o
 MSA-OBJS-$(CONFIG_H264PRED)   += mips/h264pred_msa.o
diff --git a/libavcodec/mips/h264_deblock_msa.c b/libavcodec/mips/h264_deblock_msa.c
new file mode 100644
index 000..4fed55c
--- /dev/null
+++ b/libavcodec/mips/h264_deblock_msa.c
@@ -0,0 +1,153 @@
+/*
+ * MIPS SIMD optimized H.264 deblocking code
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ *Gu Xiwei 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/bit_depth_template.c"
+#include "h264dsp_mips.h"
+#include "libavutil/mips/generic_macros_msa.h"
+#include "libavcodec/mips/h264dsp_mips.h"
+
+#define h264_loop_filter_strength_iteration_msa(edges, step, mask_mv, dir, \
+d_idx, mask_dir)   \
+do {   \
+int b_idx = 0; \
+int step_x4 = step << 2; \
+int d_idx_12 = d_idx + 12; \
+int d_idx_52 = d_idx + 52; \
+int d_idx_x4 = d_idx << 2; \
+int d_idx_x4_48 = d_idx_x4 + 48; \
+int dir_x32  = dir * 32; \
+uint8_t *ref_t = (uint8_t*)ref; \
+uint8_t *mv_t  = (uint8_t*)mv; \
+uint8_t *nnz_t = (uint8_t*)nnz; \
+uint8_t *bS_t  = (uint8_t*)bS; \
+mask_mv <<= 3; \
+for (; b_idx < edges; b_idx += step) { \
+out &= mask_dir; \
+if (!(mask_mv & b_idx)) { \
+if (bidir) { \
+ref_2 = LD_SB(ref_t + d_idx_12); \
+ref_3 = LD_SB(ref_t + d_idx_52); \
+ref_0 = LD_SB(ref_t + 12); \
+ref_1 = LD_SB(ref_t + 52); \
+ref_2 = (v16i8)__msa_ilvr_w((v4i32)ref_3, (v4i32)ref_2); \
+ref_0 = (v16i8)__msa_ilvr_w((v4i32)ref_0, (v4i32)ref_0); \
+ref_1 = (v16i8)__msa_ilvr_w((v4i32)ref_1, (v4i32)ref_1); \
+ref_3 = (v16i8)__msa_shf_h((v8i16)ref_2, 0x4e); \
+ref_0 -= ref_2; \
+ref_1 -= ref_3; \
+ref_0 = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)ref_1); \
+\
+tmp_2 = LD_SH(mv_t + d_idx_x4_48);   \
+tmp_3 = LD_SH(mv_t + 48); \
+tmp_4 = LD_SH(mv_t + 208); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_0 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \
+tmp_0 += cnst_1; \
+tmp_0 = (v16i8)__msa_subs_u_b((v16u8)tmp_0, (v16u8)cnst_0);\
+tmp_0 = (v16i8)__msa_sat_s_h((v8i16)tmp_0, 7); \
+tmp_0 = __msa_pckev_b(tmp_0, tmp_0); \
+out   = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)tmp_0); \
+\
+tmp_2 = LD_SH(mv_t + 208 + d_idx_x4); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_1 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5

[FFmpeg-devel] [loongson] Optimize H264 decoding.

2020-09-04 Thread Shiyou Yin

H264 decoding speed: 154fps ==> 165fps, 5.14x ==> 5.53x (tested on 3A4000)

[PATCH 1/3] avcodec/mips: Restore the initialization sequence of MSA. 
[PATCH 2/3] avcodec/mips: Refine get_cabac_inline_mips.
[PATCH 3/3] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

[FFmpeg-devel] [PATCH 1/3] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.

2020-09-04 Thread Shiyou Yin

Speed of decoding H264: 5.14x ==> 5.23x (tested on 3A4000).
---
 libavcodec/mips/h264chroma_init_mips.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libavcodec/mips/h264chroma_init_mips.c b/libavcodec/mips/h264chroma_init_mips.c
index 6bb19d3..755cc04 100644
--- a/libavcodec/mips/h264chroma_init_mips.c
+++ b/libavcodec/mips/h264chroma_init_mips.c
@@ -28,7 +28,15 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 int cpu_flags = av_get_cpu_flags();
 int high_bit_depth = bit_depth > 8;
 
-/* MMI apears to be faster than MSA here */
+if (have_mmi(cpu_flags)) {
+if (!high_bit_depth) {
+c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
+c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
+c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
+c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
+}
+}
+
 if (have_msa(cpu_flags)) {
 if (!high_bit_depth) {
 c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_msa;
@@ -40,13 +48,4 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 c->avg_h264_chroma_pixels_tab[2] = ff_avg_h264_chroma_mc2_msa;
 }
 }
-
-if (have_mmi(cpu_flags)) {
-if (!high_bit_depth) {
-c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
-c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
-c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
-c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
-}
-}
 }
-- 
2.1.0

[FFmpeg-devel] [PATCH 2/3] avcodec/mips: Refine get_cabac_inline_mips.

2020-09-04 Thread Shiyou Yin

1. Refine function get_cabac_inline_mips.
2. Optimize functions get_cabac_bypass and get_cabac_bypass_sign.

Speed of decoding H264: 5.23x ==> 5.45x (tested on 3A4000).
---
 libavcodec/mips/cabac.h | 131 +---
 1 file changed, 102 insertions(+), 29 deletions(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..579d4ee 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin 
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
- uint8_t * const state){
+  uint8_t * const state){
 mips_reg tmp0, tmp1, tmp2, bit;
 
 __asm__ volatile (
 "lbu  %[bit],0(%[state])   \n\t"
 "and  %[tmp0],   %[c_range], 0xC0  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp0]   \n\t"
+PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[bit]\n\t"
 /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
 PTR_SUBU "%[c_range],%[c_range], %[tmp1]   \n\t"
 PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
-PTR_SUBU "%[tmp0],   %[tmp0],%[c_low]  \n\t"
-
-/* tmp2: lps_mask */
-PTR_SRA  "%[tmp2],   %[tmp0],0x1F  \n\t"
-/* If tmp0 < 0, lps_mask ==  0x*/
-/* If tmp0 >= 0, lps_mask ==  0x*/
+"slt  %[tmp2],   %[tmp0],%[c_low]  \n\t"
 "beqz %[tmp2],   1f\n\t"
-PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
+"move %[c_range],%[tmp1]   \n\t"
+"not  %[bit],%[bit]\n\t"
 PTR_SUBU "%[c_low],  %[c_low],   %[tmp0]   \n\t"
-PTR_SUBU "%[tmp0],   %[tmp1],%[c_range]\n\t"
-PTR_ADDU "%[c_range],%[c_range], %[tmp0]   \n\t"
-"xor  %[bit],%[bit], %[tmp2]   \n\t"
 
 "1:\n\t"
 /* tmp1: *state */
@@ -70,23 +64,18 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 PTR_SLL  "%[c_range],%[c_range], %[tmp2]   \n\t"
 PTR_SLL  "%[c_low],  %[c_low],   %[tmp2]   \n\t"
 
-"and  %[tmp0],   %[c_low],   %[cabac_mask] \n\t"
-"bnez %[tmp0],   1f\n\t"
-PTR_ADDIU"%[tmp0],   %[c_low],   -0x01 \n\t"
+"and  %[tmp1],   %[c_low],   %[cabac_mask] \n\t"
+"bnez %[tmp1],   1f\n\t"
+PTR_ADDIU"%[tmp0],   %[c_low],   -0X01 \n\t"
 "xor  %[tmp0],   %[c_low],   %[tmp0]   \n\t"
 PTR_SRA  "%[tmp0],   %[tmp0],0x0f  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
+/* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
 "lbu  %[tmp2],   %[norm_off](%[tmp0])  \n\t"
-#if CABAC_BITS == 16
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
-"lbu  %[tmp1],   1(%[c_bytestream])\n\t"
-PTR_SLL  "%[tmp0],   %[tmp0],0x09  \n\t"
-PTR_SLL  "%[tmp1],   %[tmp1],0x01  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp1]   \n\t"
-#else
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
+
+"lhu  %[t

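Editor's note: the hunk above replaces the mask arithmetic (PTR_SRA by 0x1F, then xor) with a compare and branch (slt, beqz, move, not). The following is a minimal plain-C sketch of the two forms of that one decoding step, not the code from the patch. It assumes CABAC_BITS is 16 (the asm shifts by 0x11), and the names CabacStep, cabac_step_mask and cabac_step_branch are hypothetical.

```c
#include <assert.h>

#define CABAC_BITS 16  /* assumption: the asm shifts by 0x11 == CABAC_BITS + 1 */

typedef struct {
    int range;  /* range after the step */
    int low;    /* low after the step */
    int s;      /* state value; the LPS path inverts every bit */
} CabacStep;

/* Branch-free form: an all-ones mask selects the LPS update. */
static CabacStep cabac_step_mask(int range, int low, int range_lps, int s)
{
    CabacStep r;
    int scaled, lps_mask;

    assert(range > range_lps);
    range   -= range_lps;
    scaled   = range << (CABAC_BITS + 1);
    lps_mask = -(scaled < low);                 /* -1 on LPS, 0 on MPS */
    r.low    = low - (scaled & lps_mask);
    r.range  = range + ((range_lps - range) & lps_mask);
    r.s      = s ^ lps_mask;
    return r;
}

/* Branchy form: compare first, then update only on the LPS path. */
static CabacStep cabac_step_branch(int range, int low, int range_lps, int s)
{
    CabacStep r;
    int scaled;

    assert(range > range_lps);
    range -= range_lps;
    scaled = range << (CABAC_BITS + 1);
    if (scaled < low) {        /* LPS path */
        low  -= scaled;
        range = range_lps;
        s     = ~s;
    }
    r.range = range;
    r.low   = low;
    r.s     = s;
    return r;
}
```

Both functions should return the same triple for any valid input; the branchy form simply skips three updates on the more probable path.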
[FFmpeg-devel] [PATCH 2/2] Fix msa can't be disabled when '--cpu=loongson3a' assigned.

2020-09-03 Thread Shiyou Yin

There are already compiler and runtime checks for MSA and MMI.
Remove the redundant MSA and MMI settings for cores specified by "--cpu".

Signed-off-by: Shiyou Yin 
---
 configure | 9 -
 1 file changed, 9 deletions(-)

diff --git a/configure b/configure
index 5640720..7f103fa 100755
--- a/configure
+++ b/configure
@@ -5025,8 +5025,6 @@ elif enabled mips; then
 disable loongson3
 disable mipsdsp
 disable mipsdspr2
-disable msa
-disable mmi
 
 cpuflags="-march=$cpu"
 
@@ -5035,17 +5033,13 @@ elif enabled mips; then
 mips1|mips3)
 ;;
 mips32r2)
-enable msa
 enable mips32r2
 ;;
 mips32r5)
-enable msa
 enable mips32r2
 enable mips32r5
 ;;
 mips64r2|mips64r5)
-enable msa
-enable mmi
 enable mips64r2
 enable loongson3
 ;;
@@ -5062,7 +5056,6 @@ elif enabled mips; then
 enable mips32r2
 ;;
 p5600)
-enable msa
 enable mips32r2
 enable mips32r5
 check_cflags "-mtune=p5600" && check_cflags "-msched-weight -mload-store-pairs -funroll-loops"
@@ -5077,7 +5070,6 @@ elif enabled mips; then
 ;;
 # Cores from Loongson
 loongson2e|loongson2f|loongson3*)
-enable mmi
 enable local_aligned
 enable simd_align_16
 enable fast_64bit
@@ -5100,7 +5092,6 @@ elif enabled mips; then
 case $cpu in
 loongson3*)
 enable loongson3
-enable msa
 cpuflags="-march=loongson3a -mhard-float $expensive_optimization_flag"
 ;;
 loongson2e)
-- 
2.1.0


[FFmpeg-devel] [PATCH 1/2] avcodec/mips: [loongson] Fixed mmi optimization

2020-09-03 Thread Shiyou Yin

From: gxw 

The test case fate-checkasm-h264pred fails with the latest community code.
This patch fixes the bug.

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/h264pred_mmi.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libavcodec/mips/h264pred_mmi.c b/libavcodec/mips/h264pred_mmi.c
index f4fe091..0209c2e 100644
--- a/libavcodec/mips/h264pred_mmi.c
+++ b/libavcodec/mips/h264pred_mmi.c
@@ -178,7 +178,9 @@ void ff_pred8x8l_top_dc_8_mmi(uint8_t *src, int has_topleft,
 
 "1: \n\t"
 "bnez   %[has_topright],2f  \n\t"
-"pinsrh_3   %[ftmp2],   %[ftmp2],   %[ftmp4]\n\t"
+"dli%[tmp0],0xa4\n\t"
+"mtc1   %[tmp0],%[ftmp1]\n\t"
+"pshufh %[ftmp2],   %[ftmp2],   %[ftmp1]\n\t"
 
 "2: \n\t"
 "dli%[tmp0],0x02\n\t"
@@ -370,7 +372,9 @@ void ff_pred8x8l_vertical_8_mmi(uint8_t *src, int has_topleft,
 
 "1: \n\t"
 "bnez   %[has_topright],2f  \n\t"
-"pinsrh_3   %[ftmp11],  %[ftmp11],  %[ftmp9]\n\t"
+"dli%[tmp0],0xa4\n\t"
+"mtc1   %[tmp0],%[ftmp1]\n\t"
+"pshufh %[ftmp11],  %[ftmp11],  %[ftmp1]\n\t"
 
 "2: \n\t"
 "dli%[tmp0],0x02\n\t"
-- 
2.1.0
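
Editor's note: the fix replaces pinsrh_3 with pshufh using the control value 0xa4. The sketch below is a software model, assuming pshufh selects each 16-bit result lane with two control bits (lane i comes from source lane (ctrl >> 2*i) & 3, as in comparable shuffle instructions). That reading is an assumption and should be checked against the Loongson MMI documentation. The name model_pshufh is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a 4 x 16-bit lane shuffle (assumed pshufh semantics). */
static uint64_t model_pshufh(uint64_t src, unsigned ctrl)
{
    uint64_t out = 0;

    assert(ctrl <= 0xff);                       /* two control bits per lane */
    for (int i = 0; i < 4; i++) {
        unsigned sel  = (ctrl >> (2 * i)) & 3;  /* source lane for result lane i */
        uint64_t lane = (src >> (16 * sel)) & 0xffff;
        out |= lane << (16 * i);
    }
    return out;
}
```

Under this model, 0xa4 (binary 10 10 01 00) keeps lanes 0 to 2 and copies lane 2 into lane 3, so the top halfword is duplicated from its neighbour within the same register.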


[FFmpeg-devel] [PATCH 4/4] [mips]: Fix segfault in imdct36_mips_float.

2020-07-29 Thread Shiyou Yin

'li.s' is a synthesized instruction. It does not work properly
when compiled with clang on MIPS, and a segfault occurs.
---
 libavcodec/mips/aacpsdsp_mips.c   |  13 +-
 libavcodec/mips/aacpsy_mips.h |  14 +-
 libavcodec/mips/fft_mips.c|  12 +-
 libavcodec/mips/mpegaudiodsp_mips_float.c | 492 +++---
 4 files changed, 264 insertions(+), 267 deletions(-)

diff --git a/libavcodec/mips/aacpsdsp_mips.c b/libavcodec/mips/aacpsdsp_mips.c
index ef47e31..f635413 100644
--- a/libavcodec/mips/aacpsdsp_mips.c
+++ b/libavcodec/mips/aacpsdsp_mips.c
@@ -293,16 +293,17 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
 float phi_fract0 = phi_fract[0];
 float phi_fract1 = phi_fract[1];
 float temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8, temp9;
+float f1, f2, f3;
 
 float *p_delay_end = (p_delay + (len << 1));
 
 /* merged 2 loops */
+f1 = 0.65143905753106;
+f2 = 0.56471812200776;
+f3 = 0.48954165955695;
 __asm__ volatile(
 ".setpush\n\t"
 ".setnoreorder   \n\t"
-"li.s%[ag0],0.65143905753106 \n\t"
-"li.s%[ag1],0.56471812200776 \n\t"
-"li.s%[ag2],0.48954165955695 \n\t"
 "mul.s   %[ag0],%[ag0],%[g_decay_slope]  \n\t"
 "mul.s   %[ag1],%[ag1],%[g_decay_slope]  \n\t"
 "mul.s   %[ag2],%[ag2],%[g_decay_slope]  \n\t"
@@ -378,10 +379,10 @@ static void ps_decorrelate_mips(float (*out)[2], float (*delay)[2],
   [temp3]"="(temp3), [temp4]"="(temp4), [temp5]"="(temp5),
   [temp6]"="(temp6), [temp7]"="(temp7), [temp8]"="(temp8),
   [temp9]"="(temp9), [p_delay]"+r"(p_delay), [p_ap_delay]"+r"(p_ap_delay),
-  [p_Q_fract]"+r"(p_Q_fract), [p_t_gain]"+r"(p_t_gain), [p_out]"+r"(p_out),
-  [ag0]"="(ag0), [ag1]"="(ag1), [ag2]"="(ag2)
+  [p_Q_fract]"+r"(p_Q_fract), [p_t_gain]"+r"(p_t_gain), [p_out]"+r"(p_out)
 : [phi_fract0]"f"(phi_fract0), [phi_fract1]"f"(phi_fract1),
-  [p_delay_end]"r"(p_delay_end), [g_decay_slope]"f"(g_decay_slope)
+  [p_delay_end]"r"(p_delay_end), [g_decay_slope]"f"(g_decay_slope),
+  [ag0]"f"(f1), [ag1]"f"(f2), [ag2]"f"(f3)
 : "memory"
 );
 }
diff --git a/libavcodec/mips/aacpsy_mips.h b/libavcodec/mips/aacpsy_mips.h
index a1fe5cc..7d27d32 100644
--- a/libavcodec/mips/aacpsy_mips.h
+++ b/libavcodec/mips/aacpsy_mips.h
@@ -135,11 +135,11 @@ static void psy_hp_filter_mips(const float *firbuf, float *hpfsmpl, const float
 float coeff3 = psy_fir_coeffs[7];
 float coeff4 = psy_fir_coeffs[9];
 
+float f1 = 32768.0;
 __asm__ volatile (
 ".set push  \n\t"
 ".set noreorder \n\t"
 
-"li.s   $f12,   32768   \n\t"
 "1: \n\t"
 "lwc1   $f0,40(%[fb])   \n\t"
 "lwc1   $f1,4(%[fb])\n\t"
@@ -203,14 +203,14 @@ static void psy_hp_filter_mips(const float *firbuf, float *hpfsmpl, const float
 "madd.s %[sum2],%[sum2],$f9,%[coeff4]   \n\t"
 "madd.s %[sum4],%[sum4],$f6,%[coeff4]   \n\t"
 "madd.s %[sum3],%[sum3],$f3,%[coeff4]   \n\t"
-"mul.s  %[sum1],%[sum1],$f12\n\t"
-"mul.s  %[sum2],%[sum2],$f12\n\t"
+"mul.s  %[sum1],%[sum1],%[f1]   \n\t"
+"mul.s  %[sum2],%[sum2],%[f1]   \n\t"
 "madd.s %[sum4],%[sum4],$f11,   %[coeff4]   \n\t"
 "madd.s %[sum3],%[sum3],$f8,%[coeff4]   \n\t"
 "swc1   %[sum1],0(%[hp])\n\t"
 "swc1   %[sum2],4(%[hp])\n\t"
-"mul.s  %[sum4],%[sum4],$f12\n\t"
-"mul.s  %[sum3],%[sum3],$f12\n\t"
+"mul.s  %[sum4],%[sum4],%[f1]   \n\t"
+"mul.s  %[sum3],%[sum3],%[f1]   \n\t"
 "swc1   %[sum4],12(%[hp])   \n\t"
 "swc1   %[sum3],8(%[hp])\n\t"
 "bne%[fb],  %[fb_end],  1b  \n\t"
@@ -223,9 +223,9 @@ static void psy_hp_filter_mips(const float *firbuf, float *hpfsmpl, const float
   [fb]"+r"(fb), [hp]"+r"(hp)
 : [coeff0]"f"(coeff0), [coeff1]"f"(coeff1),
   [coeff2]"f"(coeff2), [coeff3]"f"(coeff3),
-  [coeff4]"f"(coeff4), [fb_end]"r"(fb_end)
+  [coeff4]"f"(coeff4),

[FFmpeg-devel] [PATCH 3/4] [mips]: Fix a bug in get_cabac_inline_mips.

2020-07-29 Thread Shiyou Yin

Failing FATE case: fate-h264-conformance-caba2_sony_e
Clang is stricter about the use of register constraints.
---
 libavcodec/mips/cabac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index c595915..3d09e93 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -109,7 +109,7 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
   [lps_off]"i"(H264_LPS_RANGE_OFFSET),
   [mlps_off]"i"(H264_MLPS_STATE_OFFSET + 128),
   [norm_off]"i"(H264_NORM_SHIFT_OFFSET),
-  [cabac_mask]"i"(CABAC_MASK)
+  [cabac_mask]"r"(CABAC_MASK)
 : "memory"
 );
 
-- 
2.1.0


[FFmpeg-devel] [PATCH 0/4] [mips]: Adapt clang compiler for mips.

2020-07-29 Thread Shiyou Yin

Fix four problems encountered when compiling FFmpeg with clang on MIPS.
Configure command: ./configure --disable-mmi --cc=clang --cxx=clang++ --ld=clang

[PATCH 1/4] [mips]: Fix register constraint error reported by clang.
[PATCH 2/4] [mips]: Fix prob that 'ulw' and 'uld' unsupported by clang.
[PATCH 3/4] [mips]: Fix a bug in get_cabac_inline_mips.
[PATCH 4/4] [mips]: Fix segfault in imdct36_mips_float.


[FFmpeg-devel] [PATCH 1/4] [mips]: Fix register constraint error reported by clang.

2020-07-29 Thread Shiyou Yin

Clang reports the following error in aacsbr_mips.c, ac3dsp_mips.c and aacdec_mips.c:
"couldn't allocate output register for constraint 'r'"

Use the 'f' constraint for float variables.
---
 libavcodec/mips/aacdec_mips.c |  2 +-
 libavcodec/mips/aacsbr_mips.c |  2 +-
 libavcodec/mips/sbrdsp_mips.c | 19 ++-
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 8e30652..7f24789 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -340,7 +340,7 @@ static void update_ltp_mips(AACContext *ac, SingleChannelElement *sce)
 float *saved_ltp = sce->coeffs;
 const float *lwindow = ics->use_kb_window[0] ? ff_aac_kbd_long_1024 : ff_sine_1024;
 const float *swindow = ics->use_kb_window[0] ? ff_aac_kbd_short_128 : ff_sine_128;
-float temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
+uint32_t temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
 
 if (ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE) {
 float *p_saved_ltp = saved_ltp + 576;
diff --git a/libavcodec/mips/aacsbr_mips.c b/libavcodec/mips/aacsbr_mips.c
index 2e0cd72..5ef5e68 100644
--- a/libavcodec/mips/aacsbr_mips.c
+++ b/libavcodec/mips/aacsbr_mips.c
@@ -333,7 +333,7 @@ static void sbr_hf_assemble_mips(float Y1[38][64][2],
 int indexnoise = ch_data->f_indexnoise;
 int indexsine  = ch_data->f_indexsine;
 float *g_temp1, *q_temp1, *pok, *pok1;
-float temp1, temp2, temp3, temp4;
+uint32_t temp1, temp2, temp3, temp4;
 int size = m_max;
 
 if (sbr->reset) {
diff --git a/libavcodec/mips/sbrdsp_mips.c b/libavcodec/mips/sbrdsp_mips.c
index 83039fd..1c87c99 100644
--- a/libavcodec/mips/sbrdsp_mips.c
+++ b/libavcodec/mips/sbrdsp_mips.c
@@ -796,9 +796,9 @@ static void sbr_hf_apply_noise_2_mips(float (*Y)[2], const float *s_m,
  const float *q_filt, int noise,
  int kx, int m_max)
 {
-int m;
+int m, temp0, temp1;
 float *ff_table;
-float y0,y1, temp0, temp1, temp2, temp3, temp4, temp5;
+float y0, y1, temp2, temp3, temp4, temp5;
 
 for (m = 0; m < m_max; m++) {
 
@@ -808,14 +808,14 @@ static void sbr_hf_apply_noise_2_mips(float (*Y)[2], const float *s_m,
 
 __asm__ volatile(
 "lwc1   %[y0],       0(%[Y1])                              \n\t"
-"lwc1   %[temp1],    0(%[s_m1])                            \n\t"
+"lwc1   %[temp3],    0(%[s_m1])                            \n\t"
 "addiu  %[noise],    %[noise],    1                        \n\t"
 "andi   %[noise],    %[noise],    0x1ff                    \n\t"
 "sll    %[temp0],    %[noise],    3                        \n\t"
 PTR_ADDU "%[ff_table],%[ff_sbr_noise_table],%[temp0]       \n\t"
-"sub.s  %[y0],       %[y0],       %[temp1]                 \n\t"
-"mfc1   %[temp3],    %[temp1]                              \n\t"
-"bne    %[temp3],    $0,          1f                       \n\t"
+"sub.s  %[y0],       %[y0],       %[temp3]                 \n\t"
+"mfc1   %[temp1],    %[temp3]                              \n\t"
+"bne    %[temp1],    $0,          1f                       \n\t"
 "lwc1   %[y1],       4(%[Y1])                              \n\t"
 "lwc1   %[temp2],    0(%[q_filt1])                         \n\t"
 "lwc1   %[temp4],    0(%[ff_table])                        \n\t"
@@ -826,9 +826,10 @@ static void sbr_hf_apply_noise_2_mips(float (*Y)[2], const float *s_m,
 "1:                                                        \n\t"
 "swc1   %[y0],       0(%[Y1])                              \n\t"
 
-: [temp0]"="(temp0), [ff_table]"="(ff_table), [y0]"="(y0),
-  [y1]"="(y1), [temp1]"="(temp1), [temp2]"="(temp2),
-  [temp3]"="(temp3), [temp4]"="(temp4), [temp5]"="(temp5)
+: [temp0]"="(temp0), [temp1]"="(temp1), [y0]"="(y0),
+  [y1]"="(y1), [ff_table]"="(ff_table),
+  [temp2]"="(temp2), [temp3]"="(temp3),
+  [temp4]"="(temp4), [temp5]"="(temp5)
 : [ff_sbr_noise_table]"r"(ff_sbr_noise_table), [noise]"r"(noise),
   [Y1]"r"(Y1), [s_m1]"r"(s_m1), [q_filt1]"r"(q_filt1)
 : "memory"
-- 
2.1.0
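
Editor's note: in the sbrdsp hunk, temp1 becomes an int because it receives the raw bits of a float via mfc1 and is then compared against $0 with bne, which are integer-register operations. The following is a portable sketch of that idea in C, assuming 32-bit IEEE 754 floats. The names float_bits and bits_nonzero are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for "mfc1": copy the raw 32 bits of a float to an integer. */
static uint32_t float_bits(float f)
{
    uint32_t u;

    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Integer test on the bit pattern, like "bne reg, $0" after "mfc1". */
static int bits_nonzero(float f)
{
    return float_bits(f) != 0;
}
```

Note that the bit test is not the same as a floating-point comparison with zero: negative zero has a nonzero bit pattern, so an integer holder and an integer constraint describe the operation accurately.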


[FFmpeg-devel] [PATCH 2/4] [mips]: Fix prob that 'ulw' and 'uld' unsupported by clang.

2020-07-29 Thread Shiyou Yin

GCC supports these two synthesized instructions, but clang does not yet.
Use machine instructions instead so that the code also builds with clang.
---
 libavutil/mips/generic_macros_msa.h | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/libavutil/mips/generic_macros_msa.h b/libavutil/mips/generic_macros_msa.h
index 267d4e6..bb25e9f 100644
--- a/libavutil/mips/generic_macros_msa.h
+++ b/libavutil/mips/generic_macros_msa.h
@@ -111,10 +111,11 @@
 uint32_t val_lw_m;   \
  \
 __asm__ volatile (   \
-"ulw  %[val_lw_m],  %[psrc_lw_m]  \n\t"  \
+"lwr %[val_lw_m], 0(%[psrc_lw_m]) \n\t"  \
+"lwl %[val_lw_m], 3(%[psrc_lw_m]) \n\t"  \
  \
-: [val_lw_m] "=r" (val_lw_m) \
-: [psrc_lw_m] "m" (*psrc_lw_m)   \
+: [val_lw_m] "="(val_lw_m) \
+: [psrc_lw_m] "r"(psrc_lw_m) \
 );   \
  \
 val_lw_m;\
@@ -127,10 +128,11 @@
 uint64_t val_ld_m = 0;   \
  \
 __asm__ volatile (   \
-"uld  %[val_ld_m],  %[psrc_ld_m]  \n\t"  \
+"ldr %[val_ld_m], 0(%[psrc_ld_m]) \n\t"  \
+"ldl %[val_ld_m], 7(%[psrc_ld_m]) \n\t"  \
  \
-: [val_ld_m] "=r" (val_ld_m) \
-: [psrc_ld_m] "m" (*psrc_ld_m)   \
+: [val_ld_m] "=" (val_ld_m)\
+: [psrc_ld_m] "r" (psrc_ld_m)\
 );   \
  \
 val_ld_m;\
-- 
2.1.0
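
Editor's note: the lwr/lwl pair at offsets 0 and 3 (and ldr/ldl at offsets 0 and 7) is the usual little-endian sequence for an unaligned load. As a reference for what such a load yields on a little-endian target, here is a byte-wise sketch in portable C. It is an illustration only, not a replacement proposed in the thread, and the names load_le32 and load_le64 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 32 bits from any address, least significant byte first. */
static uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Read 64 bits from any address, least significant byte first. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}
```

A byte-wise load never requires alignment, which makes it a convenient oracle when checking the asm macros against misaligned pointers.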


Re: [FFmpeg-devel] [PATCH v6 0/6] MIPS MSA & MMI Runtime detection support

2020-07-23 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Michael Niedermayer
>Sent: Thursday, July 23, 2020 11:24 PM
>To: FFmpeg development discussions and patches
>Subject: Re: [FFmpeg-devel] [PATCH v6 0/6] MIPS MSA & MMI Runtime detection 
>support
>
>On Thu, Jul 23, 2020 at 04:18:32PM +0200, Michael Niedermayer wrote:
>> On Tue, Jul 21, 2020 at 03:40:25PM +0800, Shiyou Yin wrote:
>> > >-Original Message-
>> > >From: ffmpeg-devel-boun...@ffmpeg.org
>> > >[mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of Jiaxun Yang
>> > >Sent: Saturday, July 18, 2020 11:36 PM
>> > >To: ffmpeg-devel@ffmpeg.org
>> > >Cc: Jiaxun Yang
>> > >Subject: [FFmpeg-devel] [PATCH v6 0/6] MIPS MSA & MMI Runtime
>> > >detection support
>> > >
>> > >This series adds MIPS MSA & MMI runtime detection support
>> > >
>> > >Please review.
>> > >
>> > >Thanks!
>> > >
>> > >v2:
>> > >  - Add CPUCFG support.
>> > >  - Add "-mloongson-ext" to MMIFLAGS for Loongson-3 as well.
>> > >(Loongson2F don't need this flag)
>> > >
>> > >v3:
>> > >  - Address reveiew suggestions from Shiyou Yin and Weixi Gu.
>> > >
>> > >v4:
>> > >  - Disable DSP for generic CPU
>> > >
>> > >v5:
>> > >  - Clean ups
>> > >  - Address some GCC build warnings
>> > >
>> > >v6:
>> > >  - Address more Shiyou's comments
>> > >
>> > >Jiaxun Yang (6):
>> > >  ffbuild: Refine MIPS handling
>> > >  libavutils: Add parse_r helper for MIPS
>> > >  libavutil: Detect MMI and MSA flags for MIPS
>> > >  libavcodec: Enable runtime detection for MIPS MMI & MSA
>> > >  libavcodec: MIPS: MMI: Fix type mismatches
>> > >  libavcodec: MIPS: MMI: Move sp out of the clobber list
>> > >
>> > > configure   | 172 ++--
>> > > ffbuild/common.mak  |  10 +-
>> > > libavcodec/mips/Makefile|   3 +-
>> > > libavcodec/mips/blockdsp_init_mips.c|  40 +-
>> > > libavcodec/mips/cabac.h |   2 +-
>> > > libavcodec/mips/h263dsp_init_mips.c |  18 +-
>> > > libavcodec/mips/h264chroma_init_mips.c  |  55 +-
>> > > libavcodec/mips/h264dsp_init_mips.c | 225 +++--
>> > > libavcodec/mips/h264dsp_mips.h  |  18 +-
>> > > libavcodec/mips/h264dsp_mmi.c   |  56 +-
>> > > libavcodec/mips/h264pred_init_mips.c| 207 ++--
>> > > libavcodec/mips/h264qpel_init_mips.c| 412 
>> > > libavcodec/mips/hevcdsp_init_mips.c | 992 ++--
>> > > libavcodec/mips/hevcpred_init_mips.c|  40 +-
>> > > libavcodec/mips/hpeldsp_init_mips.c | 180 ++--
>> > > libavcodec/mips/idctdsp_init_mips.c |  74 +-
>> > > libavcodec/mips/me_cmp_init_mips.c  |  50 +-
>> > > libavcodec/mips/mpegvideo_init_mips.c   |  48 +-
>> > > libavcodec/mips/mpegvideoencdsp_init_mips.c |  21 +-
>> > > libavcodec/mips/pixblockdsp_init_mips.c |  63 +-
>> > > libavcodec/mips/qpeldsp_init_mips.c | 270 +++---
>> > > libavcodec/mips/vc1dsp_init_mips.c  | 186 ++--
>> > > libavcodec/mips/videodsp_init.c |  10 +-
>> > > libavcodec/mips/vp3dsp_init_mips.c  |  44 +-
>> > > libavcodec/mips/vp8dsp_init_mips.c  | 240 +++--
>> > > libavcodec/mips/vp9dsp_init_mips.c  |  16 +-
>> > > libavcodec/mips/wmv2dsp_init_mips.c |  18 +-
>> > > libavcodec/mips/wmv2dsp_mips.h  |   4 +-
>> > > libavcodec/mips/wmv2dsp_mmi.c   |   4 +-
>> > > libavcodec/mips/xvid_idct_mmi.c |   4 +-
>> > > libavcodec/mips/xvididct_init_mips.c|  31 +-
>> > > libavcodec/mips/xvididct_mips.h |   4 +-
>> > > libavutil/cpu.c |  10 +
>> > > libavutil/cpu.h |   3 +
>> > > libavutil/cpu_internal.h|   2 +
>> > > libavutil/mips/Makefile |   2 +-
>> > > libavutil/mips/asmdefs.h|  42 +
>> > > libavutil/mips/cpu.c| 134 +++
>> > > libavutil/mips/cpu.h|  28 +
>> > > libavutil/tests/cpu.c   |   3 +
>> > > tests/checkasm/checkasm.c   |   3 +
>> > > 41 files changed, 1919 insertions(+), 1825 deletions(-) create
>> > > mode 100644 libavutil/mips/cpu.c create mode 100644
>> > > libavutil/mips/cpu.h
>> > >
>> > >--
>> > >2.27.0
>> >
>> > LGTM
>>
>> will apply
>
>next time please make sure the patches do not add tabs in .h / .c files these 
>cannot be pushed
>
Thank you for the reminder; I will pay attention to it in the future.


Re: [FFmpeg-devel] [PATCH v6 0/6] MIPS MSA & MMI Runtime detection support

2020-07-21 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Saturday, July 18, 2020 11:36 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v6 0/6] MIPS MSA & MMI Runtime detection support
>
>This series adds MIPS MSA & MMI runtime detection support
>
>Please review.
>
>Thanks!
>
>v2:
>  - Add CPUCFG support.
>  - Add "-mloongson-ext" to MMIFLAGS for Loongson-3 as well. (Loongson2F don't 
> need this flag)
>
>v3:
>  - Address reveiew suggestions from Shiyou Yin and Weixi Gu.
>
>v4:
>  - Disable DSP for generic CPU
>
>v5:
>  - Clean ups
>  - Address some GCC build warnings
>
>v6:
>  - Address more Shiyou's comments
>
>Jiaxun Yang (6):
>  ffbuild: Refine MIPS handling
>  libavutils: Add parse_r helper for MIPS
>  libavutil: Detect MMI and MSA flags for MIPS
>  libavcodec: Enable runtime detection for MIPS MMI & MSA
>  libavcodec: MIPS: MMI: Fix type mismatches
>  libavcodec: MIPS: MMI: Move sp out of the clobber list
>
> configure   | 172 ++--
> ffbuild/common.mak  |  10 +-
> libavcodec/mips/Makefile|   3 +-
> libavcodec/mips/blockdsp_init_mips.c|  40 +-
> libavcodec/mips/cabac.h |   2 +-
> libavcodec/mips/h263dsp_init_mips.c |  18 +-
> libavcodec/mips/h264chroma_init_mips.c  |  55 +-
> libavcodec/mips/h264dsp_init_mips.c | 225 +++--
> libavcodec/mips/h264dsp_mips.h  |  18 +-
> libavcodec/mips/h264dsp_mmi.c   |  56 +-
> libavcodec/mips/h264pred_init_mips.c| 207 ++--
> libavcodec/mips/h264qpel_init_mips.c| 412 
> libavcodec/mips/hevcdsp_init_mips.c | 992 ++--
> libavcodec/mips/hevcpred_init_mips.c|  40 +-
> libavcodec/mips/hpeldsp_init_mips.c | 180 ++--
> libavcodec/mips/idctdsp_init_mips.c |  74 +-
> libavcodec/mips/me_cmp_init_mips.c  |  50 +-
> libavcodec/mips/mpegvideo_init_mips.c   |  48 +-
> libavcodec/mips/mpegvideoencdsp_init_mips.c |  21 +-
> libavcodec/mips/pixblockdsp_init_mips.c |  63 +-
> libavcodec/mips/qpeldsp_init_mips.c | 270 +++---
> libavcodec/mips/vc1dsp_init_mips.c  | 186 ++--
> libavcodec/mips/videodsp_init.c |  10 +-
> libavcodec/mips/vp3dsp_init_mips.c  |  44 +-
> libavcodec/mips/vp8dsp_init_mips.c  | 240 +++--
> libavcodec/mips/vp9dsp_init_mips.c  |  16 +-
> libavcodec/mips/wmv2dsp_init_mips.c |  18 +-
> libavcodec/mips/wmv2dsp_mips.h  |   4 +-
> libavcodec/mips/wmv2dsp_mmi.c   |   4 +-
> libavcodec/mips/xvid_idct_mmi.c |   4 +-
> libavcodec/mips/xvididct_init_mips.c|  31 +-
> libavcodec/mips/xvididct_mips.h |   4 +-
> libavutil/cpu.c |  10 +
> libavutil/cpu.h |   3 +
> libavutil/cpu_internal.h|   2 +
> libavutil/mips/Makefile |   2 +-
> libavutil/mips/asmdefs.h|  42 +
> libavutil/mips/cpu.c| 134 +++
> libavutil/mips/cpu.h|  28 +
> libavutil/tests/cpu.c   |   3 +
> tests/checkasm/checkasm.c   |   3 +
> 41 files changed, 1919 insertions(+), 1825 deletions(-)
> create mode 100644 libavutil/mips/cpu.c
> create mode 100644 libavutil/mips/cpu.h
>
>--
>2.27.0

LGTM



[FFmpeg-devel] [PATCH] avcodec/mips: fix type mismatch in h264dsp_msa.c

2020-07-18 Thread Shiyou Yin

gcc warning: assignment from incompatible pointer type.
---
 libavcodec/mips/h264dsp_mips.h | 24 +--
 libavcodec/mips/h264dsp_msa.c  | 94 ++
 2 files changed, 62 insertions(+), 56 deletions(-)

diff --git a/libavcodec/mips/h264dsp_mips.h b/libavcodec/mips/h264dsp_mips.h
index 21b7de0..9b6f054 100644
--- a/libavcodec/mips/h264dsp_mips.h
+++ b/libavcodec/mips/h264dsp_mips.h
@@ -25,21 +25,21 @@
 #include "libavcodec/h264dec.h"
 #include "constants.h"
 
-void ff_h264_h_lpf_luma_inter_msa(uint8_t *src, int stride,
+void ff_h264_h_lpf_luma_inter_msa(uint8_t *src, ptrdiff_t stride,
   int alpha, int beta, int8_t *tc0);
-void ff_h264_v_lpf_luma_inter_msa(uint8_t *src, int stride,
+void ff_h264_v_lpf_luma_inter_msa(uint8_t *src, ptrdiff_t stride,
   int alpha, int beta, int8_t *tc0);
-void ff_h264_h_lpf_chroma_inter_msa(uint8_t *src, int stride,
+void ff_h264_h_lpf_chroma_inter_msa(uint8_t *src, ptrdiff_t stride,
 int alpha, int beta, int8_t *tc0);
-void ff_h264_v_lpf_chroma_inter_msa(uint8_t *src, int stride,
+void ff_h264_v_lpf_chroma_inter_msa(uint8_t *src, ptrdiff_t stride,
 int alpha, int beta, int8_t *tc0);
-void ff_h264_h_loop_filter_chroma422_msa(uint8_t *src, int32_t stride,
+void ff_h264_h_loop_filter_chroma422_msa(uint8_t *src, ptrdiff_t stride,
  int32_t alpha, int32_t beta,
  int8_t *tc0);
-void ff_h264_h_loop_filter_chroma422_mbaff_msa(uint8_t *src, int32_t stride,
+void ff_h264_h_loop_filter_chroma422_mbaff_msa(uint8_t *src, ptrdiff_t stride,
int32_t alpha, int32_t beta,
int8_t *tc0);
-void ff_h264_h_loop_filter_luma_mbaff_msa(uint8_t *src, int32_t stride,
+void ff_h264_h_loop_filter_luma_mbaff_msa(uint8_t *src, ptrdiff_t stride,
   int32_t alpha, int32_t beta,
   int8_t *tc0);
 
@@ -67,15 +67,15 @@ void ff_h264_idct8_add4_msa(uint8_t *dst, const int *blk_offset,
 int16_t *blk, int dst_stride,
 const uint8_t nnzc[15 * 8]);
 
-void ff_h264_h_lpf_luma_intra_msa(uint8_t *src, int stride,
+void ff_h264_h_lpf_luma_intra_msa(uint8_t *src, ptrdiff_t stride,
   int alpha, int beta);
-void ff_h264_v_lpf_luma_intra_msa(uint8_t *src, int stride,
+void ff_h264_v_lpf_luma_intra_msa(uint8_t *src, ptrdiff_t stride,
   int alpha, int beta);
-void ff_h264_h_lpf_chroma_intra_msa(uint8_t *src, int stride,
+void ff_h264_h_lpf_chroma_intra_msa(uint8_t *src, ptrdiff_t stride,
 int alpha, int beta);
-void ff_h264_v_lpf_chroma_intra_msa(uint8_t *src, int stride,
+void ff_h264_v_lpf_chroma_intra_msa(uint8_t *src, ptrdiff_t stride,
 int alpha, int beta);
-void ff_h264_h_loop_filter_luma_mbaff_intra_msa(uint8_t *src, int stride,
+void ff_h264_h_loop_filter_luma_mbaff_intra_msa(uint8_t *src, ptrdiff_t stride,
 int alpha, int beta);
 
 void ff_biweight_h264_pixels16_8_msa(uint8_t *dst, uint8_t *src,
diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c
index dd05982..a8c3f3c 100644
--- a/libavcodec/mips/h264dsp_msa.c
+++ b/libavcodec/mips/h264dsp_msa.c
@@ -21,7 +21,7 @@
 #include "libavutil/mips/generic_macros_msa.h"
 #include "h264dsp_mips.h"
 
-static void avc_wgt_4x2_msa(uint8_t *data, int32_t stride,
+static void avc_wgt_4x2_msa(uint8_t *data, ptrdiff_t stride,
 int32_t log2_denom, int32_t src_weight,
 int32_t offset_in)
 {
@@ -48,8 +48,9 @@ static void avc_wgt_4x2_msa(uint8_t *data, int32_t stride,
 ST_W2(src0, 0, 1, data, stride);
 }
 
-static void avc_wgt_4x4_msa(uint8_t *data, int32_t stride, int32_t log2_denom,
-int32_t src_weight, int32_t offset_in)
+static void avc_wgt_4x4_msa(uint8_t *data, ptrdiff_t stride,
+int32_t log2_denom, int32_t src_weight,
+int32_t offset_in)
 {
 uint32_t tp0, tp1, tp2, tp3, offset_val;
 v16u8 src0 = { 0 };
@@ -74,8 +75,9 @@ static void avc_wgt_4x4_msa(uint8_t *data, int32_t stride, int32_t log2_denom,
 ST_W4(src0, 0, 1, 2, 3, data, stride);
 }
 
-static void avc_wgt_4x8_msa(uint8_t *data, int32_t stride, int32_t log2_denom,
-int32_t src_weight, int32_t offset_in)
+static void avc_wgt_4x8_msa(uint8_t *data, ptrdiff_t stride,
+int32_t log2_denom, int32_t src_weight,
+int32_t offset_in)
 {
 uint32_t tp0, tp1, tp2, tp3, offset_val;
 v16u8 src0 = { 0 }, src1 = {
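
Editor's note: the "assignment from incompatible pointer type" warning appears when a function taking an int stride is stored in a function pointer whose type declares ptrdiff_t stride. The following is a small sketch of a matching pair, assuming a simplified callback type. The names row_filter_fn, bump_first_pixel, example_filter and demo_negative_stride are hypothetical and not FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type in the style of a DSP function table. */
typedef void (*row_filter_fn)(uint8_t *src, ptrdiff_t stride, int rows);

/* Adds 1 to the first pixel of each row; the signature matches row_filter_fn. */
static void bump_first_pixel(uint8_t *src, ptrdiff_t stride, int rows)
{
    assert(rows >= 0);
    for (int y = 0; y < rows; y++)
        src[y * stride] += 1;
}

/* No cast and no warning: the parameter types agree exactly. */
static const row_filter_fn example_filter = bump_first_pixel;

/* Walk a 3-row, 4-byte-wide image bottom-up with a negative stride. */
static int demo_negative_stride(void)
{
    uint8_t img[3 * 4] = { 0 };

    example_filter(img + 2 * 4, -4, 3);
    return img[0] + img[4] + img[8];
}
```

ptrdiff_t is signed and pointer-sized, so the same prototype covers negative strides and large planes on 64-bit targets.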

Re: [FFmpeg-devel] [PATCH v5 5/6] libavcodec: MIPS: MMI: Fix type mismatches

2020-07-16 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Thursday, July 2, 2020 11:46 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v5 5/6] libavcodec: MIPS: MMI: Fix type 
>mismatches
>
>GCC complains about them.
>
>Signed-off-by: Jiaxun Yang 
>---
> libavcodec/mips/h264dsp_mips.h  | 18 +-
> libavcodec/mips/h264dsp_mmi.c   | 18 +-
> libavcodec/mips/xvid_idct_mmi.c |  4 ++--
> libavcodec/mips/xvididct_mips.h |  4 ++--
> 4 files changed, 22 insertions(+), 22 deletions(-)
>
>diff --git a/libavcodec/mips/h264dsp_mips.h b/libavcodec/mips/h264dsp_mips.h
>index 21b7de06f0..7b2a9fabe5 100644
>--- a/libavcodec/mips/h264dsp_mips.h
>+++ b/libavcodec/mips/h264dsp_mips.h
>@@ -357,23 +357,23 @@ void ff_h264_biweight_pixels4_8_mmi(uint8_t *dst, 
>uint8_t *src,
>
> void ff_deblock_v_chroma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, int 
> beta,
> int8_t *tc0);
>-void ff_deblock_v_chroma_intra_8_mmi(uint8_t *pix, int stride, int alpha,
>+void ff_deblock_v_chroma_intra_8_mmi(uint8_t *pix, ptrdiff_t stride, int 
>alpha,
> int beta);
>-void ff_deblock_h_chroma_8_mmi(uint8_t *pix, int stride, int alpha, int beta,
>+void ff_deblock_h_chroma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, int 
>beta,
> int8_t *tc0);
>-void ff_deblock_h_chroma_intra_8_mmi(uint8_t *pix, int stride, int alpha,
>+void ff_deblock_h_chroma_intra_8_mmi(uint8_t *pix, ptrdiff_t stride, int 
>alpha,
> int beta);
>-void ff_deblock_v_luma_8_mmi(uint8_t *pix, int stride, int alpha, int beta,
>+void ff_deblock_v_luma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, int 
>beta,
> int8_t *tc0);
>-void ff_deblock_v_luma_intra_8_mmi(uint8_t *pix, int stride, int alpha,
>+void ff_deblock_v_luma_intra_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha,
> int beta);
>-void ff_deblock_h_luma_8_mmi(uint8_t *pix, int stride, int alpha, int beta,
>+void ff_deblock_h_luma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, int 
>beta,
> int8_t *tc0);
>-void ff_deblock_h_luma_intra_8_mmi(uint8_t *pix, int stride, int alpha,
>+void ff_deblock_h_luma_intra_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha,
> int beta);
>-void ff_deblock_v8_luma_8_mmi(uint8_t *pix, int stride, int alpha, int beta,
>+void ff_deblock_v8_luma_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha, int 
>beta,
> int8_t *tc0);
>-void ff_deblock_v8_luma_intra_8_mmi(uint8_t *pix, int stride, int alpha,
>+void ff_deblock_v8_luma_intra_8_mmi(uint8_t *pix, ptrdiff_t stride, int alpha,
> int beta);
>
> void ff_put_h264_qpel16_mc00_mmi(uint8_t *dst, const uint8_t *src,
>diff --git a/libavcodec/mips/h264dsp_mmi.c b/libavcodec/mips/h264dsp_mmi.c
>index 0459711b82..7a60ee7c2b 100644
>--- a/libavcodec/mips/h264dsp_mmi.c
>+++ b/libavcodec/mips/h264dsp_mmi.c
>@@ -1433,7 +1433,7 @@ void ff_h264_biweight_pixels4_8_mmi(uint8_t *dst, 
>uint8_t *src,
> }
> }
>
>-void ff_deblock_v8_luma_8_mmi(uint8_t *pix, int stride, int alpha, int beta,
>+void ff_deblock_v8_luma_8_mmi(uint8_t *pix, pixdiff_t stride, int alpha, int 
>beta,


Typo: this should be ptrdiff_t.




Re: [FFmpeg-devel] [PATCH v5 4/6] libavcodec: Enable runtime detection for MIPS MMI & MSA

2020-07-16 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Thursday, July 2, 2020 11:46 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v5 4/6] libavcodec: Enable runtime detection 
>for MIPS MMI & MSA
>
>Apply optimized functions according to cpuflags.
>MSA is usually put after MMI as it's generally faster than MMI.
>
>Signed-off-by: Jiaxun Yang 
>--

>diff --git a/libavcodec/mips/wmv2dsp_mips.h b/libavcodec/mips/wmv2dsp_mips.h
>index 22894c505d..f7313460fb 100644
>--- a/libavcodec/mips/wmv2dsp_mips.h
>+++ b/libavcodec/mips/wmv2dsp_mips.h
>@@ -23,7 +23,7 @@
>
> #include "libavcodec/wmv2dsp.h"
>
>-void ff_wmv2_idct_add_mmi(uint8_t *dest, int line_size, int16_t *block);
>-void ff_wmv2_idct_put_mmi(uint8_t *dest, int line_size, int16_t *block);
>+void ff_wmv2_idct_add_mmi(uint8_t *dest, long int line_size, int16_t *block);
>+void ff_wmv2_idct_put_mmi(uint8_t *dest, long int line_size, int16_t *block);
>

The type of line_size should be ptrdiff_t.

> #endif /* AVCODEC_MIPS_WMV2DSP_MIPS_H */
>diff --git a/libavcodec/mips/wmv2dsp_mmi.c b/libavcodec/mips/wmv2dsp_mmi.c
>index 1f6ccb299b..8796ebe195 100644
>--- a/libavcodec/mips/wmv2dsp_mmi.c
>+++ b/libavcodec/mips/wmv2dsp_mmi.c
>@@ -95,7 +95,7 @@ static void wmv2_idct_col_mmi(short * b)
> b[56] = (a0 + a2 - a1 - a5 + 8192) >> 14;
> }
>
>-void ff_wmv2_idct_add_mmi(uint8_t *dest, int line_size, int16_t *block)
>+void ff_wmv2_idct_add_mmi(uint8_t *dest, long int line_size, int16_t *block)
> {
> int i;
> double ftmp[11];
>@@ -212,7 +212,7 @@ void ff_wmv2_idct_add_mmi(uint8_t *dest, int line_size, 
>int16_t *block)
> );
> }
>
>-void ff_wmv2_idct_put_mmi(uint8_t *dest, int line_size, int16_t *block)
>+void ff_wmv2_idct_put_mmi(uint8_t *dest, long int line_size, int16_t *block)

The type of line_size in these two functions should be ptrdiff_t.
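
As a minimal sketch of why ptrdiff_t suits a stride parameter: it is the signed type for pointer differences, so it can carry a negative line size for bottom-up pictures. The helper fill_rows below is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not from the patch).
 * line_size is the byte distance between two rows; it may be negative
 * when the picture is stored bottom-up. */
static void fill_rows(uint8_t *dest, ptrdiff_t line_size,
                      int rows, int width, uint8_t value)
{
    assert(rows >= 0 && width >= 0);
    for (ptrdiff_t y = 0; y < rows; y++) {
        uint8_t *row = dest + y * line_size; /* signed pointer arithmetic */
        for (int x = 0; x < width; x++)
            row[x] = value;
    }
}
```

With a negative line_size the caller passes a pointer to the last stored row, and the loop walks back towards the start of the buffer.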

> {
> int i;
> double ftmp[8];



Re: [FFmpeg-devel] [PATCH v5 1/6] ffbuild: Refine MIPS handling

2020-07-16 Thread Shiyou Yin

The attached patch includes all the changes corresponding to my comments below, 
for your reference.

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Thursday, July 2, 2020 11:46 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v5 1/6] ffbuild: Refine MIPS handling
>
>To enable runtime detection for MIPS, we need to refine the ffbuild
>part to support building these features together.
>
>Firstly, we fix configure so that it probes the native ability of the
>toolchain to decide whether a feature can be enabled, and clearly mark
>the conflicts between loongson2 & loongson3 and between Release 6 & the rest.
>
>Secondly, we compile MMI and MSA C sources with their own flags to ensure
>their flags won't pollute the whole program and generate illegal code.
>
>Signed-off-by: Jiaxun Yang 
>
>--
>v5: Minor fixes
>---
> configure| 180 +++
> ffbuild/common.mak   |  10 ++-
> libavcodec/mips/Makefile |   3 +-
> 3 files changed, 118 insertions(+), 75 deletions(-)
>
>diff --git a/configure b/configure
>index 7495f35faa..385a9e5f5f 100755
>--- a/configure
>+++ b/configure
>@@ -2551,7 +2551,7 @@ mips64r6_deps="mips"
> mipsfpu_deps="mips"
> mipsdsp_deps="mips"
> mipsdspr2_deps="mips"
>-mmi_deps="mips"
>+mmi_deps_any="loongson2 loongson3"
> msa_deps="mipsfpu"
> msa2_deps="msa"
>
>@@ -5002,8 +5002,6 @@ elif enabled bfin; then
>
> elif enabled mips; then
>
>-cpuflags="-march=$cpu"
>-
> if [ "$cpu" != "generic" ]; then
> disable mips32r2
> disable mips32r5
>@@ -5012,19 +5010,61 @@ elif enabled mips; then
> disable mips64r6
> disable loongson2
> disable loongson3
>+disable mipsdsp
>+disable mipsdspr2
>+disable msa
>+disable mmi
>+
>+cpuflags="-march=$cpu"
>
> case $cpu in
>-24kc|24kf*|24kec|34kc|1004kc|24kef*|34kf*|1004kf*|74kc|74kf)
>+# General ISA levels
>+mips1|mips3)
>+;;
>+mips32r2)
>+enable msa
> enable mips32r2
>-disable msa
> ;;
>-p5600|i6400|p6600)
>-disable mipsdsp
>-disable mipsdspr2
>+mips32r5)
>+enable msa
>+enable mips32r2
>+enable mips32r5
> ;;
>-loongson*)
>-enable loongson2
>+mips64r2|mips64r5)
>+enable msa
>+enable mmi
>+enable mips64r2
> enable loongson3
>+;;
>+# Cores from MIPS(MTI)
>+24kc)
>+disable mipsfpu
>+enable mips32r2
>+;;
>+24kf*|24kec|34kc|74Kc|1004kc)
>+enable mips32r2
>+;;
>+24kef*|34kf*|1004kf*)
>+enable mipsdsp
>+enable mips32r2
>+;;
>+p5600)
>+enable msa
>+enable mips32r2
>+enable mips32r5
>+check_cflags "-mtune=p5600" && check_cflags "-msched-weight -mload-store-pairs -funroll-loops"
>+;;
>+i6400)
>+enable mips64r6
>+check_cflags "-mtune=i6400 -mabi=64" && check_cflags "-msched-weight -mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
>+;;
>+p6600)
>+enable mips64r6
>+check_cflags "-mtune=p6600 -mabi=64" && check_cflags "-msched-weight -mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
>+;;
>+# Cores from Loongson
>+loongson2e|loongson2f|loongson3*)
>+enable mmi
> enable local_aligned
> enable simd_align_16
> enable fast_64bit
>@@ -5032,75 +5072,44 @@ elif enabled mips; then
> enable fast_cmov
> enable fast_unaligned
> disable aligned_stack
>-disable mipsdsp
>-disable mipsdspr2
> # When gcc version less than 5.3.0, add 
> -fno-expensive-optimizations flag.
>-if [ $cc == gcc ]; then
>-gcc_version=$(gcc -dumpversion)
>-if [ "$(echo "$gcc_version 5.3.0" | tr " " "\n" | sort -rV | head -n 1)" == "$gcc_version" ]; then
>-expensive_optimization_flag=""
>-else
>+if test "$cc_type" = "gcc"; then
>+case $gcc_basever in
>+2|2.*|3.*|4.*|5.0|5.1|5.2)
> 
> expensive_optimization_flag="-fno-expensive-optimizations"
>-fi
>+;;
>+*)
>+expensive_optimization_flag=""
>+

Re: [FFmpeg-devel] [PATCH v4 4/4] libavcodec: Enable runtime detection for MIPS MMI & MSA

2020-06-09 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Monday, June 8, 2020 11:32 AM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v4 4/4] libavcodec: Enable runtime detection 
>for MIPS MMI & MSA
>
>Apply optimized functions according to cpuflags.
>MSA is always put after MMI as it's usually faster than MMI.
>
>Signed-off-by: Jiaxun Yang 
>---
> libavcodec/mips/blockdsp_init_mips.c| 22 +-
> libavcodec/mips/cabac.h |  2 +-
> libavcodec/mips/h263dsp_init_mips.c | 12 +++---
> libavcodec/mips/h264chroma_init_mips.c  | 22 +-
> libavcodec/mips/h264dsp_init_mips.c | 25 -
> libavcodec/mips/h264pred_init_mips.c| 25 -
> libavcodec/mips/h264qpel_init_mips.c| 22 +-
> libavcodec/mips/hevcdsp_init_mips.c | 24 +++-
> libavcodec/mips/hevcpred_init_mips.c| 12 +++---
> libavcodec/mips/hpeldsp_init_mips.c | 22 +-
> libavcodec/mips/idctdsp_init_mips.c | 24 +++-
> libavcodec/mips/me_cmp_init_mips.c  | 12 +++---
> libavcodec/mips/mpegvideo_init_mips.c   | 22 +-
> libavcodec/mips/mpegvideoencdsp_init_mips.c | 13 ---
> libavcodec/mips/pixblockdsp_init_mips.c | 25 -
> libavcodec/mips/qpeldsp_init_mips.c | 12 +++---
> libavcodec/mips/vc1dsp_init_mips.c  | 22 +-
> libavcodec/mips/videodsp_init.c | 12 +++---
> libavcodec/mips/vp3dsp_init_mips.c  | 22 +-
> libavcodec/mips/vp8dsp_init_mips.c  | 22 +-
> libavcodec/mips/vp9dsp_init_mips.c  | 22 +-
> libavcodec/mips/wmv2dsp_init_mips.c | 12 +++---
> libavcodec/mips/xvididct_init_mips.c| 13 ---
> 23 files changed, 312 insertions(+), 109 deletions(-)
>
>diff --git a/libavcodec/mips/blockdsp_init_mips.c 
>b/libavcodec/mips/blockdsp_init_mips.c
>index 55ac1c3e99..47170c17ef 100644
>--- a/libavcodec/mips/blockdsp_init_mips.c
>+++ b/libavcodec/mips/blockdsp_init_mips.c
>@@ -19,6 +19,7 @@
>  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
> USA
>  */
>
>+#include "libavutil/mips/cpu.h"
> #include "blockdsp_mips.h"
>
> #if HAVE_MSA
>@@ -30,6 +31,10 @@ static av_cold void blockdsp_init_msa(BlockDSPContext *c)
> c->fill_block_tab[0] = ff_fill_block16_msa;
> c->fill_block_tab[1] = ff_fill_block8_msa;
> }
>+#else
>+static av_cold void blockdsp_init_msa(BlockDSPContext *c)
>+{
>+}
> #endif  // #if HAVE_MSA
>
> #if HAVE_MMI
>@@ -41,14 +46,19 @@ static av_cold void blockdsp_init_mmi(BlockDSPContext *c)
> c->fill_block_tab[0] = ff_fill_block16_mmi;
> c->fill_block_tab[1] = ff_fill_block8_mmi;
> }
>+#else
>+static av_cold void blockdsp_init_mmi(BlockDSPContext *c)
>+{
>+}
> #endif /* HAVE_MMI */
>

Move "#if HAVE_MSA" inside the init function instead of adding an empty #else 
stub (the same applies to the other init functions):
static av_cold void blockdsp_init_msa(BlockDSPContext *c)
{
#if HAVE_MSA
    c->clear_block = ff_clear_block_msa;
    c->clear_blocks = ff_clear_blocks_msa;

    c->fill_block_tab[0] = ff_fill_block16_msa;
    c->fill_block_tab[1] = ff_fill_block8_msa;
#endif  // #if HAVE_MSA
}
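
A reduced, self-contained sketch of this pattern, for reference. All names here (DemoDSPContext, demo_init, demo_clear_block_msa) are hypothetical stand-ins, and HAVE_MSA is defined locally only so the sketch compiles on its own; in FFmpeg it comes from config.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the HAVE_MSA value that configure normally provides. */
#ifndef HAVE_MSA
#define HAVE_MSA 0
#endif

/* Hypothetical, reduced stand-in for BlockDSPContext. */
typedef struct DemoDSPContext {
    void (*clear_block)(short *block);
} DemoDSPContext;

/* Portable default implementation. */
static void clear_block_c(short *block)
{
    for (int i = 0; i < 64; i++)
        block[i] = 0;
}

#if HAVE_MSA
/* Only declared when the MSA code is actually built. */
void demo_clear_block_msa(short *block);
#endif

/* Always defined, so callers need neither an #if nor an empty #else stub. */
static void demo_init_msa(DemoDSPContext *c)
{
#if HAVE_MSA
    c->clear_block = demo_clear_block_msa;
#endif
}

static void demo_init(DemoDSPContext *c, int cpu_has_msa)
{
    assert(c != NULL);
    c->clear_block = clear_block_c;  /* start from the C version */
    if (cpu_has_msa)
        demo_init_msa(c);            /* does nothing when HAVE_MSA is 0 */
}
```

The runtime check and the build-time guard stay independent: a build without MSA simply keeps the default pointers even if the CPU reports the feature.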



Re: [FFmpeg-devel] [PATCH v4 3/4] libavutil: Detect MMI and MSA flags for MIPS

2020-06-09 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Monday, June 8, 2020 11:32 AM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v4 3/4] libavutil: Detect MMI and MSA flags for 
>MIPS
>
>Add MMI & MSA runtime detection for MIPS.
>
>Basically there are two code paths. For systems that
>natively support the CPUCFG instruction, or whose kernel emulates
>that instruction, we sense this feature from HWCAP and
>report the flags according to the values read from CPUCFG. For
>systems that have no CPUCFG (or do not export it in HWCAP),
>we parse /proc/cpuinfo instead.
>
>Signed-off-by: Jiaxun Yang 
>---
>v2: Implement CPUCFG code path as CPUCFG emulation and HWCAP
>   have been accepted by Linux Kernel upstream.
>---
> libavutil/cpu.c   |  10 +++
> libavutil/cpu.h   |   3 +
> libavutil/cpu_internal.h  |   2 +
> libavutil/mips/Makefile   |   2 +-
> libavutil/mips/cpu.c  | 134 ++
> libavutil/mips/cpu.h  |  28 
> libavutil/tests/cpu.c |   3 +
> tests/checkasm/checkasm.c |   3 +
> 8 files changed, 184 insertions(+), 1 deletion(-)
> create mode 100644 libavutil/mips/cpu.c
> create mode 100644 libavutil/mips/cpu.h
>
>diff --git a/libavutil/mips/cpu.c b/libavutil/mips/cpu.c
>new file mode 100644
>index 00..e9e291a45a
>--- /dev/null
>+++ b/libavutil/mips/cpu.c
>@@ -0,0 +1,134 @@
>+/*
>+ * This file is part of FFmpeg.
>+ *
>+ * FFmpeg is free software; you can redistribute it and/or
>+ * modify it under the terms of the GNU Lesser General Public
>+ * License as published by the Free Software Foundation; either
>+ * version 2.1 of the License, or (at your option) any later version.
>+ *
>+ * FFmpeg is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+ * Lesser General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU Lesser General Public
>+ * License along with FFmpeg; if not, write to the Free Software
>+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
>USA
>+ */
>+
>+#include "libavutil/cpu.h"
>+#include "libavutil/cpu_internal.h"
>+#include "config.h"
>+#if defined __linux__ || defined __ANDROID__
>+#include 
>+#include 
>+#include 
>+#include 
>+#include "asmdefs.h"
>+#include "libavutil/avstring.h"
>+#endif
>+
>+#if defined __linux__ || defined __ANDROID__
>+
>+#define HWCAP_LOONGSON_CPUCFG (1 << 14)
>+
>+static int cpucfg_available(void)
>+{
>+return getauxval(AT_HWCAP) & HWCAP_LOONGSON_CPUCFG;
>+}
>+
>+/* Most toolchains have no CPUCFG support yet */
>+static uint32_t read_cpucfg(uint32_t reg)
>+{
>+  uint32_t __res;
>+
>+  __asm__ __volatile__(
>+  "parse_r __res,%0\n\t"
>+  "parse_r reg,%1\n\t"
>+  ".insn \n\t"
>+  ".word (0xc8080118 | (reg << 21) | (__res << 11))\n\t"
>+  :"=r"(__res)
>+  :"r"(reg)
>+  :
>+  );
>+  return __res;
>+}
>+
>+#define LOONGSON_CFG1 0x1
>+
>+#define LOONGSON_CFG1_MMI (1 << 4)
>+#define LOONGSON_CFG1_MSA1 (1 << 5)
>+
>+static int cpu_flags_cpucfg(void)
>+{
>+int flags = 0;
>+uint32_t cfg1 = read_cpucfg(LOONGSON_CFG1);
>+
>+if (cfg1 & LOONGSON_CFG1_MMI)
>+flags |= AV_CPU_FLAG_MMI;
>+
>+if (cfg1 & LOONGSON_CFG1_MMI)
>+flags |= AV_CPU_FLAG_MSA;

The second check should test LOONGSON_CFG1_MSA1, not LOONGSON_CFG1_MMI.
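
A small sketch of the intended mapping, written as a pure function so it can be checked without the CPUCFG instruction. The bit positions are the ones from the patch; demo_flags_from_cfg1 and the DEMO_CPU_FLAG_* values are placeholders for this sketch, not the real AV_CPU_FLAG_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as given in the patch. */
#define LOONGSON_CFG1_MMI  (1 << 4)
#define LOONGSON_CFG1_MSA1 (1 << 5)

/* Placeholder flag values for this sketch only. */
#define DEMO_CPU_FLAG_MMI (1 << 0)
#define DEMO_CPU_FLAG_MSA (1 << 1)

/* Hypothetical helper: each CFG1 bit sets exactly its own flag. */
static int demo_flags_from_cfg1(uint32_t cfg1)
{
    int flags = 0;

    if (cfg1 & LOONGSON_CFG1_MMI)
        flags |= DEMO_CPU_FLAG_MMI;

    if (cfg1 & LOONGSON_CFG1_MSA1)   /* MSA1 bit, not the MMI bit */
        flags |= DEMO_CPU_FLAG_MSA;

    /* No other flag may be produced by this function. */
    assert((flags & ~(DEMO_CPU_FLAG_MMI | DEMO_CPU_FLAG_MSA)) == 0);
    return flags;
}
```

With the MMI bit tested twice, a CPU that has MMI but no MSA would be reported as supporting MSA.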

>+
>+return flags;
>+}
>+


Re: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for MIPS

2020-06-09 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>jiaxun.y...@flygoat.com
>Sent: Tuesday, June 9, 2020 1:59 PM
>To: Shiyou Yin; 'FFmpeg development discussions and patches'
>Subject: Re: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for 
>MIPS
>
>
>
>On June 9, 2020 at 1:43:58 PM GMT+08:00, Shiyou Yin wrote:
>>>-Original Message-
>>>From: jiaxun.y...@flygoat.com [mailto:jiaxun.y...@flygoat.com]
>>>Sent: Tuesday, June 9, 2020 2:03 AM
>>>To: FFmpeg development discussions and patches; Shiyou Yin; 'FFmpeg 
>>>development discussions and
>>>patches'
>>>Subject: Re: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper 
>>>for MIPS
>>>
>>>
>>>
>>>On June 8, 2020 at 4:38:58 PM GMT+08:00, Shiyou Yin wrote:
>>>>>-Original Message-
>>>>>From: ffmpeg-devel-boun...@ffmpeg.org 
>>>>>[mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
>>>>>Jiaxun Yang
>>>>>Sent: Monday, June 8, 2020 11:30 AM
>>>>>To: ffmpeg-devel@ffmpeg.org
>>>>>Cc: yinshi...@loongson.cn; Jiaxun Yang
>>>>>Subject: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for 
>>>>>MIPS
>>>>>
>>>
>>>[...]
>>>
>>>>In inline assembler, we can add asmSymbolicName in Input/ Output Operands, 
>>>>format:
>>>>[ [asmSymbolicName] ] constraint (cExpression)
>>>>[ [asmSymbolicName] ] constraint (cVariableName)
>>>
>>>Could you expand it?
>>>I'm not really sure how that related to our case.
>>>
>>>I'm trying to use raw opcode in inline assembly and I need
>>>this helper to deal with oprands in raw opcode.
>>>
>>>Thanks!
>>>
>>
>>For the raw opcode case, another proposal for your reference.
>>static uint32_t read_cpucfg_2(uint32_t reg)
>>{
>>register uint32_t input __asm__ ("t0") = reg;
>>register uint32_t __res __asm__ ("t1") = 0;
>>__asm__ volatile(
>>".insn \n\t"
>>".word (0xc9084918) \n\t"
>>);
>>return __res;
>
>Actually this is not always safe.
>t0 and t1 might be clobbered by compiler optimization.
>
>>}
>
>I need this helper to ensure the register usage is guarded by
>compiler's allocation system.
>

Got it, then LGTM.

>--
>Jiaxun Yang

Re: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for MIPS

2020-06-08 Thread Shiyou Yin

>-Original Message-
>From: jiaxun.y...@flygoat.com [mailto:jiaxun.y...@flygoat.com]
>Sent: Tuesday, June 9, 2020 2:03 AM
>To: FFmpeg development discussions and patches; Shiyou Yin; 'FFmpeg 
>development discussions and
>patches'
>Subject: Re: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for 
>MIPS
>
>
>
>On June 8, 2020 at 4:38:58 PM GMT+08:00, Shiyou Yin wrote:
>>>-Original Message-
>>>From: ffmpeg-devel-boun...@ffmpeg.org 
>>>[mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
>>>Jiaxun Yang
>>>Sent: Monday, June 8, 2020 11:30 AM
>>>To: ffmpeg-devel@ffmpeg.org
>>>Cc: yinshi...@loongson.cn; Jiaxun Yang
>>>Subject: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for 
>>>MIPS
>>>
>
>[...]
>
>>In inline assembler, we can add asmSymbolicName in Input/ Output Operands, 
>>format:
>>[ [asmSymbolicName] ] constraint (cExpression)
>>[ [asmSymbolicName] ] constraint (cVariableName)
>
>Could you expand it?
>I'm not really sure how that related to our case.
>
>I'm trying to use raw opcode in inline assembly and I need
>this helper to deal with oprands in raw opcode.
>
>Thanks!
>

For the raw opcode case, here is another proposal for your reference:
static uint32_t read_cpucfg_2(uint32_t reg)
{
    register uint32_t input __asm__ ("t0") = reg;
    register uint32_t __res __asm__ ("t1") = 0;
    __asm__ volatile(
        ".insn \n\t"
        ".word (0xc9084918) \n\t"
    );
    return __res;
}
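
For reference, the fixed word above can be derived from the expression used in the read_cpucfg() hunk of patch 3/4, (0xc8080118 | (reg << 21) | (__res << 11)), once the register numbers are known. The sketch below assumes the o32-style numbering in which t0 is $8 and t1 is $9; demo_cpucfg_word is a hypothetical helper, not code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Base word taken from the read_cpucfg() hunk in patch 3/4. */
#define CPUCFG_BASE 0xc8080118u

/* Hypothetical helper: source register number goes to bit 21,
 * destination register number to bit 11, mirroring that hunk. */
static uint32_t demo_cpucfg_word(unsigned src_reg, unsigned dst_reg)
{
    assert(src_reg < 32 && dst_reg < 32);
    return CPUCFG_BASE | ((uint32_t)src_reg << 21) | ((uint32_t)dst_reg << 11);
}
```

Under that assumption, demo_cpucfg_word(8, 9) gives 0xc9084918, which matches the hard-coded word in the proposal.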



Re: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for MIPS

2020-06-08 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Monday, June 8, 2020 11:30 AM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v4 2/4] libavutils: Add parse_r helper for MIPS
>
>This helper, taken from kernel code, allows us to inline
>newer instructions (not implemented by the assembler) in
>an elegant manner.
>
>Signed-off-by: Jiaxun Yang 
>---
> libavutil/mips/asmdefs.h | 42 
> 1 file changed, 42 insertions(+)
>
>diff --git a/libavutil/mips/asmdefs.h b/libavutil/mips/asmdefs.h
>index 748119918a..7e0e4cf575 100644
>--- a/libavutil/mips/asmdefs.h
>+++ b/libavutil/mips/asmdefs.h
>@@ -55,4 +55,46 @@
> # define PTR_SLL "sll "
> #endif
>
>+/*
>+ * parse_r var, r - Helper assembler macro for parsing register names.
>+ *
>+ * This converts the register name in $n form provided in \r to the
>+ * corresponding register number, which is assigned to the variable \var. It 
>is
>+ * needed to allow explicit encoding of instructions in inline assembly where
>+ * registers are chosen by the compiler in $n form, allowing us to avoid using
>+ * fixed register numbers.
>+ *
>+ * It also allows newer instructions (not implemented by the assembler) to be
>+ * transparently implemented using assembler macros, instead of needing 
>separate
>+ * cases depending on toolchain support.
>+ *
>+ * Simple usage example:
>+ * __asm__ __volatile__("parse_r __rt, %0\n\t"
>+ *".insn\n\t"
>+ *"# di%0\n\t"
>+ *".word   (0x41606000 | (__rt << 16))"
>+ *: "=r" (status);
>+ */
>+
>+/* Match an individual register number and assign to \var */
>+#define _IFC_REG(n)   \
>+  ".ifc   \\r, $" #n "\n\t"   \
>+  "\\var  = " #n "\n\t"   \
>+  ".endif\n\t"
>+
>+__asm__(".macro   parse_r var r\n\t"
>+  "\\var  = -1\n\t"
>+  _IFC_REG(0)  _IFC_REG(1)  _IFC_REG(2)  _IFC_REG(3)
>+  _IFC_REG(4)  _IFC_REG(5)  _IFC_REG(6)  _IFC_REG(7)
>+  _IFC_REG(8)  _IFC_REG(9)  _IFC_REG(10) _IFC_REG(11)
>+  _IFC_REG(12) _IFC_REG(13) _IFC_REG(14) _IFC_REG(15)
>+  _IFC_REG(16) _IFC_REG(17) _IFC_REG(18) _IFC_REG(19)
>+  _IFC_REG(20) _IFC_REG(21) _IFC_REG(22) _IFC_REG(23)
>+  _IFC_REG(24) _IFC_REG(25) _IFC_REG(26) _IFC_REG(27)
>+  _IFC_REG(28) _IFC_REG(29) _IFC_REG(30) _IFC_REG(31)
>+  ".iflt  \\var\n\t"
>+  ".error \"Unable to parse register name \\r\"\n\t"
>+  ".endif\n\t"
>+  ".endm");
>+

In inline assembly, we can give the input/output operands an asmSymbolicName 
and refer to them by name in the template. Format:
[ [asmSymbolicName] ] constraint (cExpression)
[ [asmSymbolicName] ] constraint (cVariableName)
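
A minimal, architecture-neutral sketch of that syntax, assuming a GCC-compatible compiler. The template is only an assembler comment, so no instruction is emitted; the "0" matching constraint places the input in the same register as the output, which is why the function returns its argument. The names copy_via_asm and demo_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Operands are named [res] and [val] and referenced as %[res] / %[val]
 * in the template instead of by position (%0, %1). */
static uint32_t copy_via_asm(uint32_t val)
{
    uint32_t res;

    __asm__ volatile(
        "/* %[res] <- %[val] */"   /* comment only, no instruction */
        : [res] "=r" (res)         /* output operand named res */
        : [val] "0" (val)          /* input tied to the same register */
    );
    return res;
}

/* Small self-check showing the expected behaviour. */
static void demo_check(void)
{
    assert(copy_via_asm(42u) == 42u);
}
```

Note that symbolic names only replace the %0/%1 numbering; the compiler still substitutes a register name such as $2, not a register number, so this does not by itself cover the raw .word encoding case.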

> #endif /* AVCODEC_MIPS_ASMDEFS_H */
>--
>2.20.1

Re: [FFmpeg-devel] [PATCH v4 1/4] ffbuild: Refine MIPS handling

2020-06-08 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Monday, June 8, 2020 11:30 AM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v4 1/4] ffbuild: Refine MIPS handling
>
>To enable runtime detection for MIPS, we need to refine the ffbuild
>part to support building these features together.
>
>Firstly, we fix configure so that it probes the native ability of the
>toolchain to decide whether a feature can be enabled, and clearly mark
>the conflicts between loongson2 & loongson3 and between Release 6 & the rest.
>
>Secondly, we compile MMI and MSA C sources with their own flags to ensure
>their flags won't pollute the whole program and generate illegal code.
>
>Signed-off-by: Jiaxun Yang 
>
>--
>v3: Address Shiyou's review suggestions,
>   Fix GCC version detection method.
>v4: Disable DSP* by default.
>---
> configure| 183 +++
> ffbuild/common.mak   |  10 ++-
> libavcodec/mips/Makefile |   3 +-
> 3 files changed, 120 insertions(+), 76 deletions(-)
>
>diff --git a/configure b/configure
>index f97cad0298..03516effe1 100755
>--- a/configure
>+++ b/configure
>@@ -2542,7 +2542,7 @@ vsx_deps="altivec"
> power8_deps="vsx"
>
> loongson2_deps="mips"
>-loongson3_deps="mips"
>+loongson3_deps="mips64r2"
> mips32r2_deps="mips"
> mips32r5_deps="mips"
> mips32r6_deps="mips"
>@@ -2551,7 +2551,7 @@ mips64r6_deps="mips"
> mipsfpu_deps="mips"
> mipsdsp_deps="mips"
> mipsdspr2_deps="mips"
>-mmi_deps="mips"
>+mmi_deps_any="loongson2 loongson3"
> msa_deps="mipsfpu"
> msa2_deps="msa"
>
>@@ -4999,8 +4999,6 @@ elif enabled bfin; then
>
> elif enabled mips; then
>
>-cpuflags="-march=$cpu"
>-
> if [ "$cpu" != "generic" ]; then
> disable mips32r2
> disable mips32r5
>@@ -5009,19 +5007,63 @@ elif enabled mips; then
> disable mips64r6
> disable loongson2
> disable loongson3
>+disable mipsdsp
>+disable mipsdspr2
>+disable msa
>+disable mmi
>+
>+cpuflags="-march=$cpu"
>
> case $cpu in
>-24kc|24kf*|24kec|34kc|1004kc|24kef*|34kf*|1004kf*|74kc|74kf)
>+# General ISA levels
>+mips1|mips3)
>+;;
>+mips32r2)
>+enable msa
> enable mips32r2
>-disable msa
> ;;
>-p5600|i6400|p6600)
>-disable mipsdsp
>-disable mipsdspr2
>+mips32r5)
>+enable msa
>+enable mips32r2
>+enable mips32r5
> ;;
>-loongson*)
>-enable loongson2
>+mips64r2|mips64r5)
>+enable msa
>+enable mmi
>+enable mips64r2
> enable loongson3
>+;;
>+# Cores from MIPS(MTI)
>+24kc)
>+disable mipsfpu
>+enable mips32r2
>+;;
>+24kf*|24kec|34kc|74Kc|1004kc)
>+enable mips32r2
>+;;
>+24kef*|34kf*|1004kf*)
>+enable mipsdsp
>+enable mips32r2
>+;;
>+p5600)
>+enable msa
>+enable mips32r2
>+enable mips32r5
>+check_cflags "-mtune=p5600" && check_cflags "-msched-weight 
>-mload-store-pairs
>-funroll-loops"
>+;;
>+i6400)
>+enable mips64r6
>+check_cflags "-mtune=i6400 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
>+;;
>+p6600)
>+enable mips64r6
>+check_cflags "-mtune=p6600 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
>+;;
>+# Cores from Loongson
>+loongson2e|loongson2f|loongson3*)
>+disable mipsdsp
>+disable mipsdspr2

There is no need to disable mipsdsp and mipsdspr2 here again; they are already 
disabled at the top of this block.

>+enable mmi
> enable local_aligned
> enable simd_align_16
> enable fast_64bit
>@@ -5029,75 +5071,45 @@ elif enabled mips; then
> enable fast_cmov
> enable fast_unaligned
> disable aligned_stack
>-disable mipsdsp
>-disable mipsdspr2
> # When gcc version less than 5.3.0, add 
> -fno-expensive-optimizations flag.
>-if [ $cc == gcc ]; then
>-gcc_version=$(gcc -dumpversion)
>-if [ "$(echo "$gcc_version 5.3.0" | tr " " "\n" | sort 
>-rV | head -n 1)" ==
>"$gcc_version" ]; then
>-expensive_optimization_flag=""
>-else
>+if

Re: [FFmpeg-devel] [PATCH v3 1/4] ffbuild: Refine MIPS handling

2020-06-06 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Saturday, June 6, 2020 3:34 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v3 1/4] ffbuild: Refine MIPS handling
>
>To enable runtime detection for MIPS, we need to refine the ffbuild
>part to support building these features together.
>
>Firstly, we fix configure so that it probes the native ability of the
>toolchain to decide whether a feature can be enabled, and clearly mark
>the conflicts between loongson2 & loongson3 and between Release 6 & the rest.
>
>Secondly, we compile MMI and MSA C sources with their own flags to ensure
>their flags won't pollute the whole program and generate illegal code.
>
>Signed-off-by: Jiaxun Yang 
>
>--
>v3: Address Shiyou's review suggestions,
>   Fix GCC version detection method.
>---
> configure| 192 +--
> ffbuild/common.mak   |  10 +-
> libavcodec/mips/Makefile |   3 +-
> 3 files changed, 134 insertions(+), 71 deletions(-)
>
>diff --git a/configure b/configure
>index f97cad0298..f2d924529f 100755
>--- a/configure
>+++ b/configure
>@@ -2542,7 +2542,7 @@ vsx_deps="altivec"
> power8_deps="vsx"
>
> loongson2_deps="mips"
>-loongson3_deps="mips"
>+loongson3_deps="mips64r2"
> mips32r2_deps="mips"
> mips32r5_deps="mips"
> mips32r6_deps="mips"
>@@ -2551,7 +2551,7 @@ mips64r6_deps="mips"
> mipsfpu_deps="mips"
> mipsdsp_deps="mips"
> mipsdspr2_deps="mips"
>-mmi_deps="mips"
>+mmi_deps_any="loongson2 loongson3"
> msa_deps="mipsfpu"
> msa2_deps="msa"
>
>@@ -4999,8 +4999,6 @@ elif enabled bfin; then
>
> elif enabled mips; then
>
>-cpuflags="-march=$cpu"
>-
> if [ "$cpu" != "generic" ]; then
> disable mips32r2
> disable mips32r5
>@@ -5010,92 +5008,125 @@ elif enabled mips; then
> disable loongson2
> disable loongson3
>

I suggest disabling mipsdsp, mipsdspr2, mmi and msa here too.
Only a few kinds of CPU support them, so enable them in each case that does.

>+cpuflags="-march=$cpu"
>+
> case $cpu in
>-24kc|24kf*|24kec|34kc|1004kc|24kef*|34kf*|1004kf*|74kc|74kf)
>-enable mips32r2
>-disable msa
>-;;
>-p5600|i6400|p6600)
>+# General ISA levels
>+mips1|mips3)
> disable mipsdsp
> disable mipsdspr2
>+disable msa
>+disable mmi
> ;;
>-loongson*)
>-enable loongson2
>-enable loongson3
>-enable local_aligned
>-enable simd_align_16
>-enable fast_64bit
>-enable fast_clz
>-enable fast_cmov
>-enable fast_unaligned
>-disable aligned_stack
>+mips32r2)
> disable mipsdsp
> disable mipsdspr2
>-# When gcc version less than 5.3.0, add 
>-fno-expensive-optimizations flag.
>-if [ $cc == gcc ]; then
>-gcc_version=$(gcc -dumpversion)
>-if [ "$(echo "$gcc_version 5.3.0" | tr " " "\n" | sort 
>-rV | head -n 1)" ==
>"$gcc_version" ]; then
>-expensive_optimization_flag=""
>-else
>-
>expensive_optimization_flag="-fno-expensive-optimizations"
>-fi
>-fi
>-case $cpu in
>-loongson3*)
>-cpuflags="-march=loongson3a -mhard-float 
>$expensive_optimization_flag"
>-;;
>-loongson2e)
>-cpuflags="-march=loongson2e -mhard-float 
>$expensive_optimization_flag"
>-;;
>-loongson2f)
>-cpuflags="-march=loongson2f -mhard-float 
>$expensive_optimization_flag"
>-;;
>-esac
>+disable mmi
>+enable mips32r2
> ;;
>-*)
>-# Unknown CPU. Disable everything.
>-warn "unknown CPU. Disabling all MIPS optimizations."
>-disable mipsfpu
>+mips32r5)
> disable mipsdsp
> disable mipsdspr2
>-disable msa
> disable mmi
>+enable mips32r2
>+enable mips32r5
> ;;
>-esac
>-
>-case $cpu in
>-24kc)
>-disable mipsfpu
>+mips64r2|mips64r5)
> disable mipsdsp
> disable mipsdspr2
>+enable mips64r2
>+enable loongson3
> ;;
>-24kf*)
>+# Cores from MIPS(MTI)
>+24kc)
> disable mipsdsp
> disable mipsdspr2
>-;;
>-

Re: [FFmpeg-devel] [PATCH v2 1/4] ffbuild: Refine MIPS handling

2020-06-04 Thread Shiyou Yin

For your convenience, I have added my previous comments to this patch.

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Tuesday, June 2, 2020 10:15 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH v2 1/4] ffbuild: Refine MIPS handling
>
>To enable runtime detection for MIPS, we need to refine the ffbuild
>part to support building these features together.
>
>Firstly, we fix configure so that it probes the native ability of the
>toolchain to decide whether a feature can be enabled, and clearly mark
>the conflicts between loongson2 & loongson3 and between Release 6 & the rest.
>
>Secondly, we compile MMI and MSA C sources with their own flags to ensure
>their flags won't pollute the whole program and generate illegal code.
>
>Signed-off-by: Jiaxun Yang 
>---
> configure| 179 +++
> ffbuild/common.mak   |  10 ++-
> libavcodec/mips/Makefile |   3 +-
> 3 files changed, 117 insertions(+), 75 deletions(-)
>
>diff --git a/configure b/configure
>index f97cad0298..8dc3874642 100755
>--- a/configure
>+++ b/configure
>@@ -1113,6 +1113,26 @@ void foo(void){ __asm__ volatile($code); }
> EOF
> }
>
>+check_extra_inline_asm_flags(){
>+log check_extra_inline_asm_flags "$@"
>+name="$1"
>+extra=$2
>+code="$3"
>+flags=''
>+shift 3
>+while [ "$1" != "" ]; do
>+  append flags $1
>+  shift
>+done;
>+disable $name
>+cat > $TMPC <<EOF
>+void foo(void){ __asm__ volatile($code); }
>+EOF
>+log_file $TMPC
>+test_cmd $cc $CPPFLAGS $CFLAGS $flags "$@" $CC_C $(cc_o $TMPO) $TMPC &&
>+enable $name && append $extra "$flags"
>+}
>+

You can use check_inline_asm instead, e.g.:
enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 1"' '-mmsa' && append MSAFLAGS '-mmsa'
enabled mmi && check_inline_asm mmi '"punpcklhw $f0, $f0, $f0"' "-mloongson-mmi" && append MMIFLAGS '-mloongson-mmi'
enabled loongson3 && check_inline_asm loongson3 '"gsldxc1 $f0, 0($2, $3)"' '-mloongson-ext' && append MMIFLAGS '-mloongson-ext'

> check_inline_asm_flags(){
> log check_inline_asm_flags "$@"
> name="$1"
>@@ -2551,7 +2571,7 @@ mips64r6_deps="mips"
> mipsfpu_deps="mips"
> mipsdsp_deps="mips"
> mipsdspr2_deps="mips"
>-mmi_deps="mips"
>+mmi_deps_any="loongson2 loongson3"
> msa_deps="mipsfpu"
> msa2_deps="msa"
>
>@@ -4999,29 +5019,57 @@ elif enabled bfin; then
>
> elif enabled mips; then

In this block, you only need to disable the unsupported extensions for each CPU 
case, because all extensions in ARCH_EXT_LIST are enabled by default.

>
>-cpuflags="-march=$cpu"
>-
> if [ "$cpu" != "generic" ]; then
>-disable mips32r2
>-disable mips32r5
>-disable mips64r2
>-disable mips32r6
>-disable mips64r6
>-disable loongson2
>-disable loongson3
>+# DSP is disabled by default as it can't be detected at runtime
>+disable mipsdsp
>+disable mipsdspr2
>+
>+cpuflags="-march=$cpu"
>
> case $cpu in
>-24kc|24kf*|24kec|34kc|1004kc|24kef*|34kf*|1004kf*|74kc|74kf)
>+# General ISA levels
>+mips1|mips3)
>+disable msa
>+;;
>+mips32r2)
> enable mips32r2

If you haven't disabled it at the beginning of this if block, you needn't enable 
it explicitly here.

>+;;
>+mips32r5)
>+enable mips32r5
>+;;
>+mips64r2|mips64r5)
>+enable mips64r2
>+;;
>+# Cores from MIPS(MTI)
>+24kc)
>+disable mipsfpu
>+;;
>+24kf*|24kec|34kc|74Kc|1004kc)
>+disable mmi
> disable msa
> ;;
>-p5600|i6400|p6600)
>-disable mipsdsp
>-disable mipsdspr2
>+24kef*|34kf*|1004kf*)
>+disable mmi
>+disable msa
>+enable mipsdsp
>+;;
>+p5600)
>+disable mmi
>+enable mips32r5
>+check_cflags "-mtune=p5600" && check_cflags "-msched-weight 
>-mload-store-pairs
>-funroll-loops"
>+;;
>+i6400)
>+disable mmi
>+enable mips64r6
>+check_cflags "-mtune=i6400 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
>+;;
>+p6600)
>+disable mmi
>+enable mips64r6
>+check_cflags "-mtune=p6600 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
> ;;
>+# Cores from Loongson
> loongson*)
>-enable loongson2
>-enable loongson3
> enable

Re: [FFmpeg-devel] [PATCH 1/3] ffbuild: Refine MIPS handling

2020-05-31 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Shiyou Yin
>Sent: Sunday, May 31, 2020 11:33 AM
>To: 'FFmpeg development discussions and patches'
>Cc: yinshi...@loongson.cn
>Subject: Re: [FFmpeg-devel] [PATCH 1/3] ffbuild: Refine MIPS handling
>
>>-Original Message-
>>From: ffmpeg-devel-boun...@ffmpeg.org 
>>[mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
>>Jiaxun Yang
>>Sent: Tuesday, May 26, 2020 5:48 PM
>>To: ffmpeg-devel@ffmpeg.org
>>Cc: yinshi...@loongson.cn; Jiaxun Yang
>>Subject: [FFmpeg-devel] [PATCH 1/3] ffbuild: Refine MIPS handling
>>
>>To enable runtime detection for MIPS, we need to refine the ffbuild
>>part to support building these features together.
>>
>>Firstly, we fix configure so that it probes the native ability of the
>>toolchain to decide whether a feature can be enabled, and clearly mark
>>the conflicts between loongson2 & loongson3 and between Release 6 & the rest.
>>
>>Secondly, we compile MMI and MSA C sources with their own flags to ensure
>>their flags won't pollute the whole program and generate illegal code.
>>
>>Signed-off-by: Jiaxun Yang 
>>---
>> configure| 179 +++
>> ffbuild/common.mak   |  10 ++-
>> libavcodec/mips/Makefile |   3 +-
>> 3 files changed, 117 insertions(+), 75 deletions(-)
>>
>>diff --git a/configure b/configure
>>index f97cad0298..8dc3874642 100755
>>--- a/configure
>>+++ b/configure
>>@@ -1113,6 +1113,26 @@ void foo(void){ __asm__ volatile($code); }
>> EOF
>> }
>>
>>+check_extra_inline_asm_flags(){
>>+log check_extra_inline_asm_flags "$@"
>>+name="$1"
>>+extra=$2
>>+code="$3"
>>+flags=''
>>+shift 3
>>+while [ "$1" != "" ]; do
>>+  append flags $1
>>+  shift
>>+done;
>>+disable $name
>>+cat > $TMPC <<EOF
>>+void foo(void){ __asm__ volatile($code); }
>>+EOF
>>+log_file $TMPC
>>+test_cmd $cc $CPPFLAGS $CFLAGS $flags "$@" $CC_C $(cc_o $TMPO) $TMPC &&
>>+enable $name && append $extra "$flags"
>>+}
>>+
>
>Using the function check_inline_asm_flags is suggested.

To avoid adding '-mmsa' to the global CFLAGS, use check_inline_asm, e.g.:
enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 1"' '-mmsa'
enabled mmi && check_inline_asm mmi '"punpcklhw $f0, $f0, $f0"' "-mloongson-mmi"
enabled loongson3 && check_inline_asm loongson3 '"gsldxc1 $f0, 0($2, $3)"' '-mloongson-ext'

>
>> check_inline_asm_flags(){
>> log check_inline_asm_flags "$@"
>> name="$1"
>>@@ -2551,7 +2571,7 @@ mips64r6_deps="mips"
>> mipsfpu_deps="mips"
>> mipsdsp_deps="mips"
>> mipsdspr2_deps="mips"
>>-mmi_deps="mips"
>>+mmi_deps_any="loongson2 loongson3"
>> msa_deps="mipsfpu"
>> msa2_deps="msa"
>>
>>@@ -4999,29 +5019,57 @@ elif enabled bfin; then
>>
>> elif enabled mips; then
>>
>>-cpuflags="-march=$cpu"
>>-
>> if [ "$cpu" != "generic" ]; then
>>-disable mips32r2
>>-disable mips32r5
>>-disable mips64r2
>>-disable mips32r6
>>-disable mips64r6
>>-disable loongson2
>>-disable loongson3
>>+# DSP is disabled by deafult as they can't be detected at runtime
>>+disable mipsdsp
>>+disable mipsdspr2
>>+
>>+cpuflags="-march=$cpu"
>>
>> case $cpu in
>>-24kc|24kf*|24kec|34kc|1004kc|24kef*|34kf*|1004kf*|74kc|74kf)
>>+# General ISA levels
>>+mips1|mips3)
>>+disable msa
>>+;;
>>+mips32r2)
>> enable mips32r2
>>+;;
>>+mips32r5)
>>+enable mips32r5
>>+;;
>>+mips64r2|mips64r5)
>>+enable mips64r2
>>+;;
>>+# Cores from MIPS(MTI)
>>+24kc)
>>+disable mipsfpu
>>+;;
>>+24kf*|24kec|34kc|74Kc|1004kc)
>>+disable mmi
>> disable msa
>> ;;
>>-p5600|i6400|p6600)
>>-

Re: [FFmpeg-devel] [PATCH 1/3] ffbuild: Refine MIPS handling

2020-05-30 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Tuesday, May 26, 2020 5:48 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH 1/3] ffbuild: Refine MIPS handling
>
>To enable runtime detection for MIPS, we need to refine the ffbuild
>part to support building these features together.
>
>Firstly, we fixed configure so that it probes the native ability of the toolchain
>to decide whether a feature can be enabled, and clearly marked
>the conflicts between loongson2 & loongson3 and between Release 6 & the rest.
>
>Secondly, we compile MMI and MSA C sources with their own flags to ensure
>their flags won't pollute the whole program and generate illegal code.
>
>Signed-off-by: Jiaxun Yang 
>---
> configure| 179 +++
> ffbuild/common.mak   |  10 ++-
> libavcodec/mips/Makefile |   3 +-
> 3 files changed, 117 insertions(+), 75 deletions(-)
>
>diff --git a/configure b/configure
>index f97cad0298..8dc3874642 100755
>--- a/configure
>+++ b/configure
>@@ -1113,6 +1113,26 @@ void foo(void){ __asm__ volatile($code); }
> EOF
> }
>
>+check_extra_inline_asm_flags(){
>+log check_extra_inline_asm_flags "$@"
>+name="$1"
>+extra=$2
>+code="$3"
>+flags=''
>+shift 3
>+while [ "$1" != "" ]; do
>+  append flags $1
>+  shift
>+done;
>+disable $name
>+cat > $TMPC <<EOF
>+void foo(void){ __asm__ volatile($code); }
>+EOF
>+log_file $TMPC
>+test_cmd $cc $CPPFLAGS $CFLAGS $flags "$@" $CC_C $(cc_o $TMPO) $TMPC &&
>+enable $name && append $extra "$flags"
>+}
>+

Using the function check_inline_asm_flags is suggested.

> check_inline_asm_flags(){
> log check_inline_asm_flags "$@"
> name="$1"
>@@ -2551,7 +2571,7 @@ mips64r6_deps="mips"
> mipsfpu_deps="mips"
> mipsdsp_deps="mips"
> mipsdspr2_deps="mips"
>-mmi_deps="mips"
>+mmi_deps_any="loongson2 loongson3"
> msa_deps="mipsfpu"
> msa2_deps="msa"
>
>@@ -4999,29 +5019,57 @@ elif enabled bfin; then
>
> elif enabled mips; then
>
>-cpuflags="-march=$cpu"
>-
> if [ "$cpu" != "generic" ]; then
>-disable mips32r2
>-disable mips32r5
>-disable mips64r2
>-disable mips32r6
>-disable mips64r6
>-disable loongson2
>-disable loongson3
>+# DSP is disabled by deafult as they can't be detected at runtime
>+disable mipsdsp
>+disable mipsdspr2
>+
>+cpuflags="-march=$cpu"
>
> case $cpu in
>-24kc|24kf*|24kec|34kc|1004kc|24kef*|34kf*|1004kf*|74kc|74kf)
>+# General ISA levels
>+mips1|mips3)
>+disable msa
>+;;
>+mips32r2)
> enable mips32r2
>+;;
>+mips32r5)
>+enable mips32r5
>+;;
>+mips64r2|mips64r5)
>+enable mips64r2
>+;;
>+# Cores from MIPS(MTI)
>+24kc)
>+disable mipsfpu
>+;;
>+24kf*|24kec|34kc|74Kc|1004kc)
>+disable mmi
> disable msa
> ;;
>-p5600|i6400|p6600)
>-disable mipsdsp
>-disable mipsdspr2
>+24kef*|34kf*|1004kf*)
>+disable mmi
>+disable msa
>+enable mipsdsp
>+;;
>+p5600)
>+disable mmi
>+enable mips32r5
>+check_cflags "-mtune=p5600" && check_cflags "-msched-weight 
>-mload-store-pairs
>-funroll-loops"
>+;;
>+i6400)
>+disable mmi
>+enable mips64r6
>+check_cflags "-mtune=i6400 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
>+;;
>+p6600)
>+disable mmi
>+enable mips64r6
>+check_cflags "-mtune=p6600 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
> ;;
>+# Cores from Loongson
> loongson*)
>-enable loongson2
>-enable loongson3
> enable local_aligned
> enable simd_align_16
> enable fast_64bit
>@@ -5029,8 +5077,6 @@ elif enabled mips; then
> enable fast_cmov
> enable fast_unaligned
> disable aligned_stack
>-disable mipsdsp
>-disable mipsdspr2
> # When gcc version less than 5.3.0, add 
> -fno-expensive-optimizations flag.
> if [ $cc == gcc ]; then
> gcc_version=$(gcc -dumpversion)
>@@ -5042,62 +5088,26 @@ elif enabled mips; then
> fi
> case $cpu in
> loongson3*)
>+

Re: [FFmpeg-devel] [PATCH 1/3] ffbuild: Refine MIPS handling

2020-05-27 Thread Shiyou Yin



>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Jiaxun Yang
>Sent: Tuesday, May 26, 2020 5:48 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: yinshi...@loongson.cn; Jiaxun Yang
>Subject: [FFmpeg-devel] [PATCH 1/3] ffbuild: Refine MIPS handling
>
>To enable runtime detection for MIPS, we need to refine the ffbuild
>part to support building these features together.
>
>Firstly, we fixed configure so that it probes the native ability of the toolchain
>to decide whether a feature can be enabled, and clearly marked
>the conflicts between loongson2 & loongson3 and between Release 6 & the rest.
>
>Secondly, we compile MMI and MSA C sources with their own flags to ensure
>their flags won't pollute the whole program and generate illegal code.
>
>Signed-off-by: Jiaxun Yang 
>---
> configure| 179 +++
> ffbuild/common.mak   |  10 ++-
> libavcodec/mips/Makefile |   3 +-
> 3 files changed, 117 insertions(+), 75 deletions(-)
>
>diff --git a/configure b/configure
>index f97cad0298..8dc3874642 100755
>--- a/configure
>+++ b/configure
>@@ -1113,6 +1113,26 @@ void foo(void){ __asm__ volatile($code); }
> EOF
> }
>
>+check_extra_inline_asm_flags(){
>+log check_extra_inline_asm_flags "$@"
>+name="$1"
>+extra=$2
>+code="$3"
>+flags=''
>+shift 3
>+while [ "$1" != "" ]; do
>+  append flags $1
>+  shift
>+done;
>+disable $name
>+cat > $TMPC <<EOF
>+void foo(void){ __asm__ volatile($code); }
>+EOF
>+log_file $TMPC
>+test_cmd $cc $CPPFLAGS $CFLAGS $flags "$@" $CC_C $(cc_o $TMPO) $TMPC &&
>+enable $name && append $extra "$flags"
>+}
>+
> check_inline_asm_flags(){
> log check_inline_asm_flags "$@"
> name="$1"
>@@ -2551,7 +2571,7 @@ mips64r6_deps="mips"
> mipsfpu_deps="mips"
> mipsdsp_deps="mips"
> mipsdspr2_deps="mips"
>-mmi_deps="mips"
>+mmi_deps_any="loongson2 loongson3"
> msa_deps="mipsfpu"
> msa2_deps="msa"
>
>@@ -4999,29 +5019,57 @@ elif enabled bfin; then
>
> elif enabled mips; then
>
>-cpuflags="-march=$cpu"
>-
> if [ "$cpu" != "generic" ]; then
>-disable mips32r2
>-disable mips32r5
>-disable mips64r2
>-disable mips32r6
>-disable mips64r6
>-disable loongson2
>-disable loongson3
>+# DSP is disabled by deafult as they can't be detected at runtime
>+disable mipsdsp
>+disable mipsdspr2
>+
>+cpuflags="-march=$cpu"
>
> case $cpu in
>-24kc|24kf*|24kec|34kc|1004kc|24kef*|34kf*|1004kf*|74kc|74kf)
>+# General ISA levels
>+mips1|mips3)
>+disable msa
>+;;
>+mips32r2)
> enable mips32r2
>+;;
>+mips32r5)
>+enable mips32r5
>+;;
>+mips64r2|mips64r5)
>+enable mips64r2
>+;;
>+# Cores from MIPS(MTI)
>+24kc)
>+disable mipsfpu
>+;;
>+24kf*|24kec|34kc|74Kc|1004kc)
>+disable mmi
> disable msa
> ;;
>-p5600|i6400|p6600)
>-disable mipsdsp
>-disable mipsdspr2
>+24kef*|34kf*|1004kf*)
>+disable mmi
>+disable msa
>+enable mipsdsp
>+;;
>+p5600)
>+disable mmi
>+enable mips32r5
>+check_cflags "-mtune=p5600" && check_cflags "-msched-weight 
>-mload-store-pairs
>-funroll-loops"
>+;;
>+i6400)
>+disable mmi
>+enable mips64r6
>+check_cflags "-mtune=i6400 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
>+;;
>+p6600)
>+disable mmi
>+enable mips64r6
>+check_cflags "-mtune=p6600 -mabi=64" && check_cflags 
>"-msched-weight
>-mload-store-pairs -funroll-loops" && check_ldflags "-mabi=64"
> ;;
>+# Cores from Loongson
> loongson*)
>-enable loongson2
>-enable loongson3
> enable local_aligned
> enable simd_align_16
> enable fast_64bit
>@@ -5029,8 +5077,6 @@ elif enabled mips; then
> enable fast_cmov
> enable fast_unaligned
> disable aligned_stack
>-disable mipsdsp
>-disable mipsdspr2
> # When gcc version less than 5.3.0, add 
> -fno-expensive-optimizations flag.
> if [ $cc == gcc ]; then
> gcc_version=$(gcc -dumpversion)
>@@ -5042,62 +5088,26 @@ elif enabled mips; then
> fi
> case $cpu in
> loongson3*)
>+enable loongson3
>+

Re: [FFmpeg-devel] [PATCH v2] avcodec/mips: msa optimizations for vc1dsp

2019-10-28 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of gxw
>Sent: Monday, October 21, 2019 3:57 PM
>To: ffmpeg-devel@ffmpeg.org
>Subject: [FFmpeg-devel] [PATCH v2] avcodec/mips: msa optimizations for vc1dsp
>
>WMV3 decoding performance improves from 3.66x to 5.23x, tested on 3A4000.
>---
> libavcodec/mips/Makefile|   1 +
> libavcodec/mips/vc1dsp_init_mips.c  |  30 ++-
> libavcodec/mips/vc1dsp_mips.h   |  23 ++
> libavcodec/mips/vc1dsp_msa.c| 461 
> libavutil/mips/generic_macros_msa.h |   3 +
> 5 files changed, 514 insertions(+), 4 deletions(-)
> create mode 100644 libavcodec/mips/vc1dsp_msa.c
>
>diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
>index c5b54d5..b4993f6 100644
>--- a/libavcodec/mips/Makefile
>+++ b/libavcodec/mips/Makefile
>@@ -89,3 +89,4 @@ MMI-OBJS-$(CONFIG_WMV2DSP)+= 
>mips/wmv2dsp_mmi.o
> MMI-OBJS-$(CONFIG_HEVC_DECODER)   += mips/hevcdsp_mmi.o
> MMI-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_mmi.o
> MMI-OBJS-$(CONFIG_VP9_DECODER)+= mips/vp9_mc_mmi.o
>+MSA-OBJS-$(CONFIG_VC1_DECODER)+= mips/vc1dsp_msa.o
>diff --git a/libavcodec/mips/vc1dsp_init_mips.c 
>b/libavcodec/mips/vc1dsp_init_mips.c
>index 4adc9e1..c0007ff 100644
>--- a/libavcodec/mips/vc1dsp_init_mips.c
>+++ b/libavcodec/mips/vc1dsp_init_mips.c
>@@ -23,6 +23,10 @@
> #include "vc1dsp_mips.h"
> #include "config.h"
>
>+#define FN_ASSIGN(OP, X, Y, INSN) \
>+dsp->OP##vc1_mspel_pixels_tab[1][X+4*Y] = 
>ff_##OP##vc1_mspel_mc##X##Y##INSN; \
>+dsp->OP##vc1_mspel_pixels_tab[0][X+4*Y] = 
>ff_##OP##vc1_mspel_mc##X##Y##_16##INSN
>+
> #if HAVE_MMI
> static av_cold void vc1dsp_init_mmi(VC1DSPContext *dsp)
> {
>@@ -49,10 +53,6 @@ static av_cold void vc1dsp_init_mmi(VC1DSPContext *dsp)
> dsp->vc1_v_loop_filter16 = ff_vc1_v_loop_filter16_mmi;
> dsp->vc1_h_loop_filter16 = ff_vc1_h_loop_filter16_mmi;
>
>-#define FN_ASSIGN(OP, X, Y, INSN) \
>-dsp->OP##vc1_mspel_pixels_tab[1][X+4*Y] = 
>ff_##OP##vc1_mspel_mc##X##Y##INSN; \
>-dsp->OP##vc1_mspel_pixels_tab[0][X+4*Y] = 
>ff_##OP##vc1_mspel_mc##X##Y##_16##INSN
>-
> FN_ASSIGN(put_, 0, 0, _mmi);
> FN_ASSIGN(put_, 0, 1, _mmi);
> FN_ASSIGN(put_, 0, 2, _mmi);
>@@ -100,9 +100,31 @@ static av_cold void vc1dsp_init_mmi(VC1DSPContext *dsp)
> }
> #endif /* HAVE_MMI */
>
>+#if HAVE_MSA
>+static av_cold void vc1dsp_init_msa(VC1DSPContext *dsp)
>+{
>+dsp->vc1_inv_trans_8x8 = ff_vc1_inv_trans_8x8_msa;
>+dsp->vc1_inv_trans_4x8 = ff_vc1_inv_trans_4x8_msa;
>+dsp->vc1_inv_trans_8x4 = ff_vc1_inv_trans_8x4_msa;
>+
>+FN_ASSIGN(put_, 1, 1, _msa);
>+FN_ASSIGN(put_, 1, 2, _msa);
>+FN_ASSIGN(put_, 1, 3, _msa);
>+FN_ASSIGN(put_, 2, 1, _msa);
>+FN_ASSIGN(put_, 2, 2, _msa);
>+FN_ASSIGN(put_, 2, 3, _msa);
>+FN_ASSIGN(put_, 3, 1, _msa);
>+FN_ASSIGN(put_, 3, 2, _msa);
>+FN_ASSIGN(put_, 3, 3, _msa);
>+}
>+#endif /* HAVE_MSA */
>+
> av_cold void ff_vc1dsp_init_mips(VC1DSPContext *dsp)
> {
> #if HAVE_MMI
> vc1dsp_init_mmi(dsp);
> #endif /* HAVE_MMI */
>+#if HAVE_MSA
>+vc1dsp_init_msa(dsp);
>+#endif /* HAVE_MSA */
> }
>diff --git a/libavcodec/mips/vc1dsp_mips.h b/libavcodec/mips/vc1dsp_mips.h
>index 0db85fa..5f72e60 100644
>--- a/libavcodec/mips/vc1dsp_mips.h
>+++ b/libavcodec/mips/vc1dsp_mips.h
>@@ -191,4 +191,27 @@ void ff_avg_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* 
>align 8 */,
>   uint8_t *src /* align 1 */,
>   int stride, int h, int x, int y);
>
>+void ff_vc1_inv_trans_8x8_msa(int16_t block[64]);
>+void ff_vc1_inv_trans_8x4_msa(uint8_t *dest, ptrdiff_t linesize, int16_t 
>*block);
>+void ff_vc1_inv_trans_4x8_msa(uint8_t *dest, ptrdiff_t linesize, int16_t 
>*block);
>+
>+#define FF_PUT_VC1_MSPEL_MC_MSA(hmode, vmode) 
>\
>+void ff_put_vc1_mspel_mc ## hmode ## vmode ## _msa(uint8_t *dst,  
>\
>+  const uint8_t *src, 
>\
>+  ptrdiff_t stride, int rnd); 
>\
>+void ff_put_vc1_mspel_mc ## hmode ## vmode ## _16_msa(uint8_t *dst,   
>\
>+  const uint8_t *src, 
>\
>+  ptrdiff_t stride, int rnd);
>+
>+FF_PUT_VC1_MSPEL_MC_MSA(1, 1);
>+FF_PUT_VC1_MSPEL_MC_MSA(1, 2);
>+FF_PUT_VC1_MSPEL_MC_MSA(1, 3);
>+
>+FF_PUT_VC1_MSPEL_MC_MSA(2, 1);
>+FF_PUT_VC1_MSPEL_MC_MSA(2, 2);
>+FF_PUT_VC1_MSPEL_MC_MSA(2, 3);
>+
>+FF_PUT_VC1_MSPEL_MC_MSA(3, 1);
>+FF_PUT_VC1_MSPEL_MC_MSA(3, 2);
>+FF_PUT_VC1_MSPEL_MC_MSA(3, 3);
> #endif /* AVCODEC_MIPS_VC1DSP_MIPS_H */
>diff --git a/libavcodec/mips/vc1dsp_msa.c b/libavcodec/mips/vc1dsp_msa.c
>new file mode 100644
>index 000..6e588e8
>--- /dev/null
>+++ b/libavcodec/mips/vc1dsp_msa.c
>@@ -0,0

Re: [FFmpeg-devel] [PATCH] avcodec/mips: msa optimizations for vc1dsp

2019-10-14 Thread Shiyou Yin

>diff --git a/libavcodec/mips/vc1dsp_msa.c b/libavcodec/mips/vc1dsp_msa.c
>new file mode 100644
>index 000..1619ea4
>--- /dev/null
>+++ b/libavcodec/mips/vc1dsp_msa.c
>@@ -0,0 +1,483 @@
>+/*
>+ * Loongson SIMD optimized vc1dsp
>+ *
>+ * Copyright (c) 2019 Loongson Technology Corporation Limited
>+ *gxw 
>+ *
>+ * This file is part of FFmpeg.
>+ *
>+ * FFmpeg is free software; you can redistribute it and/or
>+ * modify it under the terms of the GNU Lesser General Public
>+ * License as published by the Free Software Foundation; either
>+ * version 2.1 of the License, or (at your option) any later version.
>+ *
>+ * FFmpeg is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+ * Lesser General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU Lesser General Public
>+ * License along with FFmpeg; if not, write to the Free Software
>+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
>USA
>+ */
>+
>+#include "vc1dsp_mips.h"
>+#include "constants.h"
>+#include "libavutil/mips/generic_macros_msa.h"
>+
>+void ff_vc1_inv_trans_8x8_msa(int16_t block[64])
>+{
>+v8i16 in0, in1, in2, in3, in4, in5, in6, in7;
>+v4i32 in_r0, in_r1, in_r2, in_r3, in_r4, in_r5, in_r6, in_r7;
>+v4i32 in_l0, in_l1, in_l2, in_l3, in_l4, in_l5, in_l6, in_l7;
>+v4i32 t_r1, t_r2, t_r3, t_r4, t_r5, t_r6, t_r7, t_r8;
>+v4i32 t_l1, t_l2, t_l3, t_l4, t_l5, t_l6, t_l7, t_l8;
>+v4i32 cnst_12 = {12, 12, 12, 12};
>+v4i32 cnst_4 = {4, 4, 4, 4};
>+v4i32 cnst_16 = {16, 16, 16, 16};
>+v4i32 cnst_6 = {6, 6, 6, 6};
>+v4i32 cnst_15 = {15, 15, 15, 15};
>+v4i32 cnst_9 = {9, 9, 9, 9};
>+v4i32 cnst_1 = {1, 1, 1, 1};
>+v4i32 cnst_64 = {64, 64, 64, 64};
>+
>+LD_SH8(block, 8, in0, in1, in2, in3, in4, in5, in6, in7);
>+UNPCK_SH_SW(in0, in_r0, in_l0);
>+UNPCK_SH_SW(in1, in_r1, in_l1);
>+UNPCK_SH_SW(in2, in_r2, in_l2);
>+UNPCK_SH_SW(in3, in_r3, in_l3);
>+UNPCK_SH_SW(in4, in_r4, in_l4);
>+UNPCK_SH_SW(in5, in_r5, in_l5);
>+UNPCK_SH_SW(in6, in_r6, in_l6);
>+UNPCK_SH_SW(in7, in_r7, in_l7);
>+// First loop
>+t_r1 = cnst_12 * (in_r0 + in_r4) + cnst_4;
>+t_l1 = cnst_12 * (in_l0 + in_l4) + cnst_4;
>+t_r2 = cnst_12 * (in_r0 - in_r4) + cnst_4;
>+t_l2 = cnst_12 * (in_l0 - in_l4) + cnst_4;
>+t_r3 = cnst_16 * in_r2 + cnst_6 * in_r6;
>+t_l3 = cnst_16 * in_l2 + cnst_6 * in_l6;
>+t_r4 = cnst_6 * in_r2 - cnst_16 * in_r6;
>+t_l4 = cnst_6 * in_l2 - cnst_16 * in_l6;
>+
>+ADD4(t_r1, t_r3, t_l1, t_l3, t_r2, t_r4, t_l2, t_l4, t_r5, t_l5, t_r6, 
>t_l6);
>+SUB4(t_r2, t_r4, t_l2, t_l4, t_r1, t_r3, t_l1, t_l3, t_r7, t_l7, t_r8, 
>t_l8);
>+t_r1 = cnst_16 * in_r1 + cnst_15 * in_r3 + cnst_9 * in_r5 + cnst_4 * 
>in_r7;
>+t_l1 = cnst_16 * in_l1 + cnst_15 * in_l3 + cnst_9 * in_l5 + cnst_4 * 
>in_l7;
>+t_r2 = cnst_15 * in_r1 - cnst_4 * in_r3 - cnst_16 * in_r5 - cnst_9 * 
>in_r7;
>+t_l2 = cnst_15 * in_l1 - cnst_4 * in_l3 - cnst_16 * in_l5 - cnst_9 * 
>in_l7;
>+t_r3 = cnst_9 * in_r1 - cnst_16 * in_r3 + cnst_4 * in_r5 + cnst_15 * 
>in_r7;
>+t_l3 = cnst_9 * in_l1 - cnst_16 * in_l3 + cnst_4 * in_l5 + cnst_15 * 
>in_l7;
>+t_r4 = cnst_4 * in_r1 - cnst_9 * in_r3 + cnst_15 * in_r5 - cnst_16 * 
>in_r7;
>+t_l4 = cnst_4 * in_l1 - cnst_9 * in_l3 + cnst_15 * in_l5 - cnst_16 * 
>in_l7;
>+
>+in_r0 = (t_r5 + t_r1) >> 3;
>+in_l0 = (t_l5 + t_l1) >> 3;
>+in_r1 = (t_r6 + t_r2) >> 3;
>+in_l1 = (t_l6 + t_l2) >> 3;
>+in_r2 = (t_r7 + t_r3) >> 3;
>+in_l2 = (t_l7 + t_l3) >> 3;
>+in_r3 = (t_r8 + t_r4) >> 3;
>+in_l3 = (t_l8 + t_l4) >> 3;
>+
>+in_r4 = (t_r8 - t_r4) >> 3;
>+in_l4 = (t_l8 - t_l4) >> 3;
>+in_r5 = (t_r7 - t_r3) >> 3;
>+in_l5 = (t_l7 - t_l3) >> 3;
>+in_r6 = (t_r6 - t_r2) >> 3;
>+in_l6 = (t_l6 - t_l2) >> 3;
>+in_r7 = (t_r5 - t_r1) >> 3;
>+in_l7 = (t_l5 - t_l1) >> 3;
>+TRANSPOSE4x4_SW_SW(in_r0, in_r1, in_r2, in_r3, in_r0, in_r1, in_r2, 
>in_r3);
>+TRANSPOSE4x4_SW_SW(in_l0, in_l1, in_l2, in_l3, t_l1, t_l2, t_l3, t_l4);
>+TRANSPOSE4x4_SW_SW(in_r4, in_r5, in_r6, in_r7, in_l0, in_l1, in_l2, 
>in_l3);
>+TRANSPOSE4x4_SW_SW(in_l4, in_l5, in_l6, in_l7, in_l4, in_l5, in_l6, 
>in_l7);
>+in_r4 = t_l1, in_r5 = t_l2, in_r6 = t_l3, in_r7 = t_l4;

It would be better to transpose 'in_l0, in_l1, in_l2, in_l3' directly into themselves,
and likewise 'in_r4, in_r5, in_r6, in_r7'.
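
For context, the quoted code appears to transpose an 8x8 block held as four 4x4 groups of vectors. A scalar sketch of that decomposition is given here; it is not the MSA macro itself, and the names transpose_quadrant and transpose8x8_by_quadrants are hypothetical. Each quadrant is transposed, and the two off-diagonal quadrants also exchange places, which is presumably why the patch routes one group through temporaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Transpose the 4x4 quadrant of src starting at (sr, sc)
 * into the quadrant of dst starting at (dr, dc). */
static void transpose_quadrant(int32_t dst[8][8], int32_t src[8][8],
                               int dr, int dc, int sr, int sc)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            dst[dr + i][dc + j] = src[sr + j][sc + i];
}

/* Scalar illustration only: 8x8 transpose built from four 4x4 transposes. */
void transpose8x8_by_quadrants(int32_t m[8][8])
{
    int32_t out[8][8];

    assert(m != NULL);
    transpose_quadrant(out, m, 0, 0, 0, 0); /* top-left stays in place      */
    transpose_quadrant(out, m, 0, 4, 4, 0); /* bottom-left goes top-right   */
    transpose_quadrant(out, m, 4, 0, 0, 4); /* top-right goes bottom-left   */
    transpose_quadrant(out, m, 4, 4, 4, 4); /* bottom-right stays in place  */
    memcpy(m, out, sizeof(out));
}
```

The two diagonal quadrants can use the same storage for input and output, as the first and last TRANSPOSE4x4_SW_SW calls in the patch already do; only the off-diagonal pair needs its destinations chosen with care.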


>+// Second loop
>+t_r1 = cnst_12 * (in_r0 + in_r4) + cnst_64;
>+t_l1 = cnst_12 * (in_l0 + in_l4) + cnst_64;
>+t_r2 = cnst_12 * (in_r0 - in_r4) + cnst_64;
>+t_l2 = cnst_12 * (in_l0 - in_l4) + cnst_64;
>+t_r3 = cnst_16 * in_r2 + cnst_6 * in_r6;
>+t_l3 = cnst_16 * in_l2 + cnst_6 * in_l6;
>+t_r4 = cnst_6 * in_r2 - cnst_16 * in_r6;
>+t_l4 = cnst_6 * in_l2 - cnst_16 * in_l6;

Re: [FFmpeg-devel] [PATCH] avcodec/mips: Fixed four warnings in vc1dsp

2019-10-13 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of gxw
>Sent: Saturday, October 12, 2019 10:48 AM
>To: ffmpeg-devel@ffmpeg.org
>Subject: [FFmpeg-devel] [PATCH] avcodec/mips: Fixed four warnings in vc1dsp
>
>Change the stride argument to ptrdiff_t in the following functions:
>ff_put_no_rnd_vc1_chroma_mc8_mmi, ff_put_no_rnd_vc1_chroma_mc4_mmi,
>ff_avg_no_rnd_vc1_chroma_mc8_mmi, ff_avg_no_rnd_vc1_chroma_mc4_mmi.
>---
> libavcodec/mips/vc1dsp_mips.h | 8 
> libavcodec/mips/vc1dsp_mmi.c  | 8 
> 2 files changed, 8 insertions(+), 8 deletions(-)
>
>diff --git a/libavcodec/mips/vc1dsp_mips.h b/libavcodec/mips/vc1dsp_mips.h
>index 5f72e60..5897dae 100644
>--- a/libavcodec/mips/vc1dsp_mips.h
>+++ b/libavcodec/mips/vc1dsp_mips.h
>@@ -180,16 +180,16 @@ void ff_vc1_h_loop_filter16_mmi(uint8_t *src, int 
>stride, int pq);
>
> void ff_put_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y);
>+  ptrdiff_t stride, int h, int x, int y);
> void ff_put_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y);
>+  ptrdiff_t stride, int h, int x, int y);
> void ff_avg_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y);
>+  ptrdiff_t stride, int h, int x, int y);
> void ff_avg_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y);
>+  ptrdiff_t stride, int h, int x, int y);
>
> void ff_vc1_inv_trans_8x8_msa(int16_t block[64]);
> void ff_vc1_inv_trans_8x4_msa(uint8_t *dest, ptrdiff_t linesize, int16_t 
> *block);
>diff --git a/libavcodec/mips/vc1dsp_mmi.c b/libavcodec/mips/vc1dsp_mmi.c
>index db314de..9837868 100644
>--- a/libavcodec/mips/vc1dsp_mmi.c
>+++ b/libavcodec/mips/vc1dsp_mmi.c
>@@ -2241,7 +2241,7 @@ DECLARE_FUNCTION(3, 3)
>
> void ff_put_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y)
>+  ptrdiff_t stride, int h, int x, int y)
> {
> const int A = (8 - x) * (8 - y);
> const int B = (x) * (8 - y);
>@@ -2296,7 +2296,7 @@ void ff_put_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* 
>align 8 */,
>
> void ff_put_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y)
>+  ptrdiff_t stride, int h, int x, int y)
> {
> const int A = (8 - x) * (8 - y);
> const int B = (x) * (8 - y);
>@@ -2349,7 +2349,7 @@ void ff_put_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* 
>align 8 */,
>
> void ff_avg_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y)
>+  ptrdiff_t stride, int h, int x, int y)
> {
> const int A = (8 - x) * (8 - y);
> const int B = (x) * (8 - y);
>@@ -2407,7 +2407,7 @@ void ff_avg_no_rnd_vc1_chroma_mc8_mmi(uint8_t *dst /* 
>align 8 */,
>
> void ff_avg_no_rnd_vc1_chroma_mc4_mmi(uint8_t *dst /* align 8 */,
>   uint8_t *src /* align 1 */,
>-  int stride, int h, int x, int y)
>+  ptrdiff_t stride, int h, int x, int y)
> {
> const int A = (8 - x) * (8 - y);
> const int B = (x) * (8 - y);
>--
>2.1.0
>

LGTM.
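
A minimal sketch of the kind of mismatch this patch removes, assuming the warnings were incompatible-pointer-type warnings (the patch description does not say). The type chroma_mc_fn and the kernel copy_block8 are hypothetical stand-ins, not FFmpeg names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table-entry type that takes a ptrdiff_t stride. */
typedef void (*chroma_mc_fn)(uint8_t *dst, uint8_t *src,
                             ptrdiff_t stride, int h, int x, int y);

/* Toy kernel: copies an 8-pixel-wide block; x and y are unused here. */
static void copy_block8(uint8_t *dst, uint8_t *src,
                        ptrdiff_t stride, int h, int x, int y)
{
    (void)x;
    (void)y;
    assert(h >= 0);
    for (int row = 0; row < h; row++) {
        for (int col = 0; col < 8; col++)
            dst[col] = src[col];
        dst += stride; /* ptrdiff_t also covers negative strides */
        src += stride;
    }
}

/* The signatures match, so no cast is needed and no warning is expected.
 * Declaring the kernel with 'int stride' would make this assignment
 * a type mismatch. */
const chroma_mc_fn chroma_fn = copy_block8;
```

Calling chroma_fn(dst, src, 8, 2, 0, 0) copies two rows of eight bytes.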



[FFmpeg-devel] [PATCH] avcodec/mips: Fix a warnning of indentation not reflect the block structure.

2019-09-08 Thread Shiyou Yin

The indentation of the code does not reflect the if block structure in
'apply_ltp_mips', and this generates a warning when building with
'-Wall' or '-Wmisleading-indentation'.
---
 libavcodec/mips/aacdec_mips.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mips/aacdec_mips.c b/libavcodec/mips/aacdec_mips.c
index 253cdeb..01a2b30 100644
--- a/libavcodec/mips/aacdec_mips.c
+++ b/libavcodec/mips/aacdec_mips.c
@@ -237,9 +237,9 @@ static void apply_ltp_mips(AACContext *ac, 
SingleChannelElement *sce)
 
 if (ltp->lag < 1024)
 num_samples = ltp->lag + 1024;
-j = (2048 - num_samples) >> 2;
-k = (2048 - num_samples) & 3;
-p_predTime = &predTime[num_samples];
+j = (2048 - num_samples) >> 2;
+k = (2048 - num_samples) & 3;
+p_predTime = &predTime[num_samples];
 
 for (i = 0; i < num_samples; i++)
 predTime[i] = sce->ltp_state[i + 2048 - ltp->lag] * ltp->coef;
-- 
2.1.0
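
A small standalone sketch of the control flow after the fix: only the assignment to num_samples is governed by the if, and the following statements run unconditionally, which the corrected indentation now shows. The struct LtpSplit, the function ltp_split and the default of 2048 are assumptions for illustration, not code taken from aacdec_mips.c.

```c
#include <assert.h>

/* Hypothetical container for the values computed in the hunk. */
typedef struct {
    int num_samples;
    int j;
    int k;
} LtpSplit;

LtpSplit ltp_split(int lag)
{
    LtpSplit s;

    assert(lag >= 0);
    s.num_samples = 2048;            /* assumed default for large lags */
    if (lag < 1024)
        s.num_samples = lag + 1024;  /* the only statement inside the if */
    /* Same indentation as the if: always executed. */
    s.j = (2048 - s.num_samples) >> 2;
    s.k = (2048 - s.num_samples) & 3;
    return s;
}
```

With the old layout the last two assignments were indented as if they belonged to the if, which is what '-Wmisleading-indentation' reports; the behaviour is the same in both layouts.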



[FFmpeg-devel] [PATCH] avutil/mips: remove redundant code in TRANSPOSE16x8_UB_UB.

2019-08-13 Thread Shiyou Yin

---
 libavutil/mips/generic_macros_msa.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libavutil/mips/generic_macros_msa.h 
b/libavutil/mips/generic_macros_msa.h
index 9ac0583..219ff07 100644
--- a/libavutil/mips/generic_macros_msa.h
+++ b/libavutil/mips/generic_macros_msa.h
@@ -2523,8 +2523,6 @@
 out5 = (v16u8) __msa_ilvod_w((v4i32) tmp3_m, (v4i32) tmp2_m);\
  \
 tmp2_m = (v16u8) __msa_ilvod_h((v8i16) tmp5_m, (v8i16) tmp4_m);  \
-tmp2_m = (v16u8) __msa_ilvod_h((v8i16) tmp5_m, (v8i16) tmp4_m);  \
-tmp3_m = (v16u8) __msa_ilvod_h((v8i16) tmp7_m, (v8i16) tmp6_m);  \
 tmp3_m = (v16u8) __msa_ilvod_h((v8i16) tmp7_m, (v8i16) tmp6_m);  \
 out3 = (v16u8) __msa_ilvev_w((v4i32) tmp3_m, (v4i32) tmp2_m);\
 out7 = (v16u8) __msa_ilvod_w((v4i32) tmp3_m, (v4i32) tmp2_m);\
-- 
2.1.0



Re: [FFmpeg-devel] [PATCH v4] avutil/mips: refine msa macros CLIP_*.

2019-08-11 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of
>Michael Niedermayer
>Sent: Friday, August 9, 2019 12:07 AM
>To: FFmpeg development discussions and patches
>Subject: Re: [FFmpeg-devel] [PATCH v4] avutil/mips: refine msa macros CLIP_*.
>
>On Thu, Aug 08, 2019 at 09:49:35AM +0800, 顾希伟 wrote:
>> > From: "Michael Niedermayer" 
>> > Sent: 2019-08-08 07:05:13 (Thursday)
>> > To: "FFmpeg development discussions and patches"
>> > 
>> > Cc:
>> > Subject: Re: [FFmpeg-devel] [PATCH v4] avutil/mips: refine msa macros CLIP_*.
>> >
>> > On Wed, Aug 07, 2019 at 05:52:00PM +0800, gxw wrote:
>> > > Changes in detail:
>> > > 1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in
>> > >    the source vector.
>> > > 2. Refine the implementation of the macros 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
>> > >    VP8 decoding speeds up by about 1.1% (from 7.03x to 7.11x).
>> > >    H264 decoding speeds up by about 0.5% (from 4.35x to 4.37x).
>> > >    Theora decoding speeds up by about 0.7% (from 5.79x to 5.83x).
>> > > 3. Remove the redundant macro 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255'
>> > >    instead, because there is no difference in the effect of these two macros.
>> >
>> > can these 3 things be split into 3 patches ?
>> > It would be clearer if each change would be in its own patch
>> >
>> > thanks
>> >
>> > [...]
>>
>> It can be split into 3 patches, but there are some benefits to keeping it as 1 patch: these
>> macros belong to the same class and are closely related, so it is more intuitive to put them in one patch.
>
>hmm
>does anyone else have any opinion about this?
>
>if not, I'll apply it
>

In fact, changes 2 and 3 are closely related: they use a new macro to replace both
'CLIP_SH/Wn_0_255' and 'CLIP_SH/Wn_0_255_MAX_SATU'. So it's better to put 2 & 3 in one patch.
Change 1 touches the same family of macros as changes 2 & 3. It is put together with them
mainly because too many macros are pending refactoring; it's a balance between patch
complexity and the number of patches.
So it's acceptable to me.
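
To illustrate change 1 in scalar terms, here is a sketch of a statement-style clip macro that stores the result back into its argument, in the spirit of the refactored 'CLIP_SH'. The names CLIP_INPLACE, CLIP_0_255 and clip_row_0_255 are hypothetical, and this is plain C, not the MSA implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v to [lo, hi] and store the result in v itself,
 * so callers write CLIP_INPLACE(x, lo, hi); instead of x = CLIP(x, lo, hi); */
#define CLIP_INPLACE(v, lo, hi)              \
    do {                                     \
        if ((v) < (lo))                      \
            (v) = (lo);                      \
        else if ((v) > (hi))                 \
            (v) = (hi);                      \
    } while (0)

/* One macro for the 0..255 case instead of two equivalent variants. */
#define CLIP_0_255(v) CLIP_INPLACE(v, 0, 255)

/* Clamp every element of a row of 16-bit samples to the 8-bit range. */
void clip_row_0_255(int16_t *row, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        CLIP_0_255(row[i]);
}
```

The call sites in the quoted hunks change in the same way: 'delta = CLIP_SH(delta, -tc, tc);' becomes 'CLIP_SH(delta, -tc, tc);'.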




Re: [FFmpeg-devel] [PATCH v4] avutil/mips: refine msa macros CLIP_*.

2019-08-07 Thread Shiyou Yin

LGTM.

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of gxw
>Sent: Wednesday, August 7, 2019 5:52 PM
>To: ffmpeg-devel@ffmpeg.org
>Subject: [FFmpeg-devel] [PATCH v4] avutil/mips: refine msa macros CLIP_*.
>
>Changes in detail:
>1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in
>   the source vector.
>2. Refine the implementation of the macros 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
>   VP8 decoding speeds up by about 1.1% (from 7.03x to 7.11x).
>   H264 decoding speeds up by about 0.5% (from 4.35x to 4.37x).
>   Theora decoding speeds up by about 0.7% (from 5.79x to 5.83x).
>3. Remove the redundant macro 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255'
>   instead, because there is no difference in the effect of these two macros.
>---
> libavcodec/mips/h264dsp_msa.c   |  39 +--
> libavcodec/mips/h264idct_msa.c  |   7 +-
> libavcodec/mips/hevc_idct_msa.c |  21 +++---
> libavcodec/mips/hevc_lpf_sao_msa.c  | 132 ++--
> libavcodec/mips/hevc_mc_bi_msa.c|  44 ++--
> libavcodec/mips/hevc_mc_biw_msa.c   |  56 +++
> libavcodec/mips/hevc_mc_uniw_msa.c  |  40 +--
> libavcodec/mips/hevcpred_msa.c  |   8 +--
> libavcodec/mips/idctdsp_msa.c   |   9 +--
> libavcodec/mips/qpeldsp_msa.c   |   4 +-
> libavcodec/mips/simple_idct_msa.c   |  98 +++---
> libavcodec/mips/vp3dsp_idct_msa.c   |  68 +++
> libavcodec/mips/vp8_idct_msa.c  |   5 +-
> libavcodec/mips/vp9_idct_msa.c  |  10 ++-
> libavutil/mips/generic_macros_msa.h | 119 +---
> 15 files changed, 280 insertions(+), 380 deletions(-)
>
>diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c
>index c4ba8c4..dd05982 100644
>--- a/libavcodec/mips/h264dsp_msa.c
>+++ b/libavcodec/mips/h264dsp_msa.c
>@@ -413,8 +413,7 @@ static void avc_biwgt_8x8_msa(uint8_t *src, uint8_t *dst, 
>int32_t stride,
> tmp7 = __msa_dpadd_s_h(offset, wgt, vec7);
> SRA_4V(tmp0, tmp1, tmp2, tmp3, denom);
> SRA_4V(tmp4, tmp5, tmp6, tmp7, denom);
>-CLIP_SH4_0_255(tmp0, tmp1, tmp2, tmp3);
>-CLIP_SH4_0_255(tmp4, tmp5, tmp6, tmp7);
>+CLIP_SH8_0_255(tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7);
> PCKEV_B2_UB(tmp1, tmp0, tmp3, tmp2, dst0, dst1);
> PCKEV_B2_UB(tmp5, tmp4, tmp7, tmp6, dst2, dst3);
> ST_D8(dst0, dst1, dst2, dst3, 0, 1, 0, 1, 0, 1, 0, 1, dst, stride);
>@@ -475,8 +474,7 @@ static void avc_biwgt_8x16_msa(uint8_t *src, uint8_t *dst, 
>int32_t stride,
>
> SRA_4V(temp0, temp1, temp2, temp3, denom);
> SRA_4V(temp4, temp5, temp6, temp7, denom);
>-CLIP_SH4_0_255(temp0, temp1, temp2, temp3);
>-CLIP_SH4_0_255(temp4, temp5, temp6, temp7);
>+CLIP_SH8_0_255(temp0, temp1, temp2, temp3, temp4, temp5, temp6, 
>temp7);
> PCKEV_B4_UB(temp1, temp0, temp3, temp2, temp5, temp4, temp7, temp6,
> dst0, dst1, dst2, dst3);
> ST_D8(dst0, dst1, dst2, dst3, 0, 1, 0, 1, 0, 1, 0, 1, dst, stride);
>@@ -531,7 +529,7 @@ static void avc_biwgt_8x16_msa(uint8_t *src, uint8_t *dst, 
>int32_t stride,
> temp = p1_or_q1_org_in << 1;  \
> clip3 = clip3 - temp; \
> clip3 = __msa_ave_s_h(p2_or_q2_org_in, clip3);\
>-clip3 = CLIP_SH(clip3, negate_tc_in, tc_in);  \
>+CLIP_SH(clip3, negate_tc_in, tc_in);  \
> p1_or_q1_out = p1_or_q1_org_in + clip3;   \
> }
>
>@@ -549,7 +547,7 @@ static void avc_biwgt_8x16_msa(uint8_t *src, uint8_t *dst, 
>int32_t stride,
> delta = q0_sub_p0 + p1_sub_q1;  \
> delta >>= 3;\
> \
>-delta = CLIP_SH(delta, negate_threshold_in, threshold_in);  \
>+CLIP_SH(delta, negate_threshold_in, threshold_in);  \
> \
> p0_or_q0_out = p0_or_q0_org_in + delta; \
> q0_or_p0_out = q0_or_p0_org_in - delta; \
>@@ -598,7 +596,7 @@ static void avc_biwgt_8x16_msa(uint8_t *src, uint8_t *dst, 
>int32_t stride,
> delta = q0_sub_p0 + p1_sub_q1;   \
> delta = __msa_srari_h(delta, 3); \
>  \
>-delta = CLIP_SH(delta, -tc, tc); \
>+CLIP_SH(delta, -tc, tc); \
>  \
> ILVR_B2_SH(zeros, src1, zeros, src2, res0_r, res1_r);\
>

Re: [FFmpeg-devel] [PATCH v2] avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros.

2019-08-06 Thread Shiyou Yin

LGTM.

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of gxw
>Sent: Tuesday, August 6, 2019 7:11 PM
>To: ffmpeg-devel@ffmpeg.org
>Subject: [FFmpeg-devel] [PATCH v2] avutil/mips: refactor msa SLDI_Bn_0 and 
>SLDI_Bn macros.
>
>Changes in detail:
>1. The previous order of parameters is irregular and difficult to
>   understand. Adjust the order of the parameters according to the
>   rule: (RTYPE, input registers, input mask/input index/..., output registers).
>   Most of the existing msa macros follow this rule.
>2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead.
>---
> libavcodec/mips/h264dsp_msa.c   |  9 ++--
> libavcodec/mips/h264qpel_msa.c  | 64 ++--
> libavcodec/mips/hevc_lpf_sao_msa.c  | 70 ---
> libavcodec/mips/hevcpred_msa.c  | 30 ++---
> libavcodec/mips/hpeldsp_msa.c   | 66 ++---
> libavcodec/mips/me_cmp_msa.c|  8 ++--
> libavcodec/mips/qpeldsp_msa.c   | 84 ++---
> libavcodec/mips/vp8_mc_msa.c|  4 +-
> libavcodec/mips/vp9_idct_msa.c  |  3 +-
> libavcodec/mips/vp9_lpf_msa.c   |  3 +-
> libavcodec/mips/vp9_mc_msa.c| 16 +++
> libavutil/mips/generic_macros_msa.h | 80 ++-
> 12 files changed, 222 insertions(+), 215 deletions(-)
>
>diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c
>index 89fe399..c4ba8c4 100644
>--- a/libavcodec/mips/h264dsp_msa.c
>+++ b/libavcodec/mips/h264dsp_msa.c
>@@ -620,7 +620,7 @@ static void avc_biwgt_8x16_msa(uint8_t *src, uint8_t *dst, 
>int32_t stride,
>  \
> out0 = (v16u8) __msa_ilvr_b((v16i8) in1, (v16i8) in0);   \
> out1 = (v16u8) __msa_sldi_b(zero_m, (v16i8) out0, 2);\
>-SLDI_B2_0_UB(out1, out2, out2, out3, 2); \
>+SLDI_B2_UB(zero_m, out1, zero_m, out2, 2, out2, out3);   \
> }
>
> #define AVC_LPF_H_2BYTE_CHROMA_422(src, stride, tc_val, alpha, beta, res)  \
>@@ -1025,7 +1025,8 @@ static void 
>avc_h_loop_filter_luma_mbaff_intra_msa(uint8_t *src, int32_t
>stride,
>
> ILVR_W2_SB(tmp2, tmp0, tmp3, tmp1, src6, src3);
> ILVL_W2_SB(tmp2, tmp0, tmp3, tmp1, src1, src5);
>-SLDI_B4_0_SB(src6, src1, src3, src5, src0, src2, src4, src7, 8);
>+SLDI_B4_SB(zeros, src6, zeros, src1, zeros, src3, zeros, src5,
>+   8, src0, src2, src4, src7);
>
> p0_asub_q0 = __msa_asub_u_b((v16u8) src2, (v16u8) src3);
> p1_asub_p0 = __msa_asub_u_b((v16u8) src1, (v16u8) src2);
>@@ -1116,10 +1117,10 @@ static void 
>avc_h_loop_filter_luma_mbaff_intra_msa(uint8_t *src, int32_t
>stride,
> ILVRL_H2_SH(zeros, dst2_x, tmp2, tmp3);
>
> ILVR_W2_UB(tmp2, tmp0, tmp3, tmp1, dst0, dst4);
>-SLDI_B2_0_UB(dst0, dst4, dst1, dst5, 8);
>+SLDI_B2_UB(zeros, dst0, zeros, dst4, 8, dst1, dst5);
> dst2_x = (v16u8) __msa_ilvl_w((v4i32) tmp2, (v4i32) tmp0);
> dst2_y = (v16u8) __msa_ilvl_w((v4i32) tmp3, (v4i32) tmp1);
>-SLDI_B2_0_UB(dst2_x, dst2_y, dst3_x, dst3_y, 8);
>+SLDI_B2_UB(zeros, dst2_x, zeros, dst2_y, 8, dst3_x, dst3_y);
>
> out0 = __msa_copy_u_w((v4i32) dst0, 0);
> out1 = __msa_copy_u_h((v8i16) dst0, 2);
>diff --git a/libavcodec/mips/h264qpel_msa.c b/libavcodec/mips/h264qpel_msa.c
>index df7e3e2..e435c18 100644
>--- a/libavcodec/mips/h264qpel_msa.c
>+++ b/libavcodec/mips/h264qpel_msa.c
>@@ -790,8 +790,8 @@ void ff_put_h264_qpel16_mc10_msa(uint8_t *dst, const 
>uint8_t *src,
>  minus5b, res4, res5, res6, res7);
> DPADD_SB4_SH(vec2, vec5, vec8, vec11, plus20b, plus20b, plus20b,
>  plus20b, res4, res5, res6, res7);
>-SLDI_B2_SB(src1, src3, src0, src2, src0, src2, 2);
>-SLDI_B2_SB(src5, src7, src4, src6, src4, src6, 2);
>+SLDI_B4_SB(src1, src0, src3, src2, src5, src4, src7, src6, 2,
>+   src0, src2, src4, src6);
> SRARI_H4_SH(res0, res1, res2, res3, 5);
> SRARI_H4_SH(res4, res5, res6, res7, 5);
> SAT_SH4_SH(res0, res1, res2, res3, 7);
>@@ -858,8 +858,8 @@ void ff_put_h264_qpel16_mc30_msa(uint8_t *dst, const 
>uint8_t *src,
>  minus5b, res4, res5, res6, res7);
> DPADD_SB4_SH(vec2, vec5, vec8, vec11, plus20b, plus20b, plus20b,
>  plus20b, res4, res5, res6, res7);
>-SLDI_B2_SB(src1, src3, src0, src2, src0, src2, 3);
>-SLDI_B2_SB(src5, src7, src4, src6, src4, src6, 3);
>+SLDI_B4_SB(src1, src0, src3, src2, src5, src4, src7, src6, 3,
>+   src0, src2, src4, src6);
> SRARI_H4_SH(res0, res1, res2, res3, 5);
> SRARI_H4_SH(res4, res5, res6, res7, 5);
> SAT_SH4_SH(res0, res1, res2, res3, 7);
>@@ -911,10 +911,10 @@ void ff_put_h264_qpel8_mc10_msa(uint8_t *dst, const 
>uint8_t *src,
> VSHF_B2_SB(src6, src6, src7, src7, mask2,

Re: [FFmpeg-devel] [PATCH] avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros.

2019-08-06 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of gxw
>Sent: Tuesday, August 6, 2019 11:38 AM
>To: ffmpeg-devel@ffmpeg.org
>Subject: [FFmpeg-devel] [PATCH] avutil/mips: refactor msa SLDI_Bn_0 and 
>SLDI_Bn macros.
>
>The changes are as follows:
>1. Modify the parameter order of SLDI_Bn. The previous parameter
>   order was difficult to understand.
>2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead.

It would be better to explain the new macro parameter order, or the rule it
follows, in the commit message.
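
To illustrate the kind of explanation meant here, the sketch below shows the
"inputs, then immediate, then outputs" order on a scalar model. It is only an
illustration: vec16, iota16, sldi_b_model and the *_MODEL macros are made-up
names that do not exist in generic_macros_msa.h, and the slide semantics are a
simplified reading of __msa_sldi_b(d, s, n), not a substitute for the MSA
documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte vector standing in for an MSA register. */
typedef struct { uint8_t b[16]; } vec16;

/* Scalar model (an assumption, not the real intrinsic): view 's' as the low
 * 16 bytes and 'd' as the high 16 bytes of a 32-byte row, then return the
 * 16 bytes that start at byte offset n. */
static vec16 sldi_b_model(vec16 d, vec16 s, int n)
{
    vec16 out;
    for (int i = 0; i < 16; i++)
        out.b[i] = (i + n < 16) ? s.b[i + n] : d.b[i + n - 16];
    return out;
}

/* Parameter order: input registers (d, s), immediate, output register. */
#define SLDI_B1_MODEL(d, s, slide_val, out)   \
{                                             \
    out = sldi_b_model(d, s, slide_val);      \
}

#define SLDI_B2_MODEL(d0, s0, d1, s1, slide_val, out0, out1)  \
{                                                             \
    SLDI_B1_MODEL(d0, s0, slide_val, out0)                    \
    SLDI_B1_MODEL(d1, s1, slide_val, out1)                    \
}

/* Helper: build a vector holding base, base + 1, ..., base + 15. */
static vec16 iota16(uint8_t base)
{
    vec16 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(base + i);
    return v;
}

/* Passing an explicit zero vector as 'd' covers what SLDI_Bn_0 used to do. */
static void sldi_model_check(void)
{
    vec16 zero = { { 0 } };
    vec16 src  = iota16(1);          /* bytes 1..16 */
    vec16 hi0, hi1;

    SLDI_B2_MODEL(zero, src, zero, src, 8, hi0, hi1);
    assert(hi0.b[0] == 9);           /* upper half of src moved down */
    assert(hi0.b[8] == 0);           /* zeros slid in from 'd'       */
    assert(hi1.b[7] == 16);
}
```

With this order the call site reads left to right as "what goes in, by how
much, where it ends up", which is the point of the rule.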


……
>diff --git a/libavutil/mips/generic_macros_msa.h 
>b/libavutil/mips/generic_macros_msa.h
>index 9ac0583..a5f8bba 100644
>--- a/libavutil/mips/generic_macros_msa.h
>+++ b/libavutil/mips/generic_macros_msa.h
>@@ -602,67 +602,48 @@
> }
> #define AVER_UB4_UB(...) AVER_UB4(v16u8, __VA_ARGS__)
>
>-/* Description : Immediate number of columns to slide with zero
>-   Arguments   : Inputs  - in0, in1, slide_val
>- Outputs - out0, out1
>+/* Description : Immediate number of columns to slide
>+   Arguments   : Inputs  - s, d, slide_val
>+ Outputs - out
>  Return Type - as per RTYPE
>-   Details : Byte elements from 'zero_m' vector are slide into 'in0' by
>+   Details : Byte elements from 'd' vector are slide into 's' by
>  number of elements specified by 'slide_val'
> */
>-#define SLDI_B2_0(RTYPE, in0, in1, out0, out1, slide_val) \
>-{ \
>-v16i8 zero_m = { 0 }; \
>-out0 = (RTYPE) __msa_sldi_b((v16i8) zero_m, (v16i8) in0, slide_val);  \
>-out1 = (RTYPE) __msa_sldi_b((v16i8) zero_m, (v16i8) in1, slide_val);  \
>-}
>-#define SLDI_B2_0_UB(...) SLDI_B2_0(v16u8, __VA_ARGS__)
>-#define SLDI_B2_0_SB(...) SLDI_B2_0(v16i8, __VA_ARGS__)
>-#define SLDI_B2_0_SW(...) SLDI_B2_0(v4i32, __VA_ARGS__)
>-
>-#define SLDI_B3_0(RTYPE, in0, in1, in2, out0, out1, out2,  slide_val) \
>-{ \
>-v16i8 zero_m = { 0 }; \
>-SLDI_B2_0(RTYPE, in0, in1, out0, out1, slide_val);\
>-out2 = (RTYPE) __msa_sldi_b((v16i8) zero_m, (v16i8) in2, slide_val);  \
>-}
>-#define SLDI_B3_0_UB(...) SLDI_B3_0(v16u8, __VA_ARGS__)
>-#define SLDI_B3_0_SB(...) SLDI_B3_0(v16i8, __VA_ARGS__)
>-
>-#define SLDI_B4_0(RTYPE, in0, in1, in2, in3,\
>-  out0, out1, out2, out3, slide_val)\
>-{   \
>-SLDI_B2_0(RTYPE, in0, in1, out0, out1, slide_val);  \
>-SLDI_B2_0(RTYPE, in2, in3, out2, out3, slide_val);  \
>+#define SLDI_B1(RTYPE, d, s, slide_val, out)  \
>+{ \
>+out = (RTYPE) __msa_sldi_b((v16i8) d, (v16i8) s, slide_val);  \
> }
>-#define SLDI_B4_0_UB(...) SLDI_B4_0(v16u8, __VA_ARGS__)
>-#define SLDI_B4_0_SB(...) SLDI_B4_0(v16i8, __VA_ARGS__)
>-#define SLDI_B4_0_SH(...) SLDI_B4_0(v8i16, __VA_ARGS__)
>
>-/* Description : Immediate number of columns to slide
>-   Arguments   : Inputs  - in0_0, in0_1, in1_0, in1_1, slide_val
>- Outputs - out0, out1
>- Return Type - as per RTYPE
>-   Details : Byte elements from 'in0_0' vector are slide into 'in1_0' by
>- number of elements specified by 'slide_val'
>-*/
>-#define SLDI_B2(RTYPE, in0_0, in0_1, in1_0, in1_1, out0, out1, slide_val)  \
>-{  \
>-out0 = (RTYPE) __msa_sldi_b((v16i8) in0_0, (v16i8) in1_0, slide_val);  \
>-out1 = (RTYPE) __msa_sldi_b((v16i8) in0_1, (v16i8) in1_1, slide_val);  \
>+#define SLDI_B2(RTYPE, d0, s0, d1, s1, slide_val, out0, out1)  \
>+{  \
>+SLDI_B1(RTYPE, d0, s0, slide_val, out0)\
>+SLDI_B1(RTYPE, d1, s1, slide_val, out1)\
> }
> #define SLDI_B2_UB(...) SLDI_B2(v16u8, __VA_ARGS__)
> #define SLDI_B2_SB(...) SLDI_B2(v16i8, __VA_ARGS__)
> #define SLDI_B2_SH(...) SLDI_B2(v8i16, __VA_ARGS__)
>+#define SLDI_B2_SW(...) SLDI_B2(v4i32, __VA_ARGS__)
>
>-#define SLDI_B3(RTYPE, in0_0, in0_1, in0_2, in1_0, in1_1, in1_2,   \
>-out0, out1, out2, slide_val)   \
>-{  \
>-SLDI_B2(RTYPE, in0_0, in0_1, in1_0, in1_1, out0, out1, slide_val)  \
>-out2 = (RTYPE) __msa_sldi_b((v16i8) in0_2, (v16i8) in1_2, slide_val);  \
>+#define SLDI_B3(RTYPE, d0, s0, d1, s1, d2, s2, slide_val,  \
>+out0, out1, out2)  \
>+{  \
>+

Re: [FFmpeg-devel] [PATCH v3] avutil/mips: Avoid instruction exception caused by gssqc1/gslqc1.

2019-07-30 Thread Shiyou Yin

>-Original Message-
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of Reimar Döffinger
>Sent: Tuesday, July 30, 2019 2:54 AM
>To: FFmpeg development discussions and patches
>Subject: Re: [FFmpeg-devel] [PATCH v3] avutil/mips: Avoid instruction 
>exception caused by gssqc1/gslqc1.
>
>On 29.07.2019, at 11:54, "Shiyou Yin"  wrote:
>>>
>> DECLARE_ALIGNED is defined in 'libavutil/mem.h' and depends on the compiler.
>> On both mips and x86, its definition is
>> '#define DECLARE_ALIGNED(n,t,v)  t __attribute__ ((aligned (n))) v' when
>> building with gcc or clang (the specific implementation within the compiler
>> is not considered here).
>> In libavcodec/x86, DECLARE_ALIGNED is also used to define 8/16/32-byte
>> aligned variables.
>
>The aligned attribute does not work reliably with stack variables in some
>cases.
>Compared with other code, I think you need to use LOCAL_ALIGNED_16 for the
>stack variable.
>Yes, it might work in your test even with DECLARE_ALIGNED, but it might not be
>robust.

You are right, LOCAL_ALIGNED_16 is likely to be more robust.
In v5, LOCAL_ALIGNED_16 is used for the stack variables.
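
For reference, one general way to get an aligned stack buffer without relying
on the aligned attribute is to over-allocate and round the pointer up. The
sketch below shows only that technique. align_up, carve_aligned16 and
stack_scratch_check are hypothetical names, and this is not a copy of
LOCAL_ALIGNED_16, whose actual expansion depends on the build configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round 'addr' up to the next multiple of 'align' (must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: return the first 16-byte-aligned address inside 'raw'.
 * The caller must have reserved at least 15 bytes of slack in 'raw'. */
static uint8_t *carve_aligned16(uint8_t *raw)
{
    return (uint8_t *)align_up((uintptr_t)raw, 16);
}

/* Use a 64-byte scratch area on the stack that is aligned by construction,
 * regardless of how the compiler happens to place 'raw'. */
static int stack_scratch_check(void)
{
    uint8_t raw[64 + 15];                    /* payload + worst-case padding */
    uint8_t *scratch = carve_aligned16(raw);

    for (int i = 0; i < 64; i++)
        scratch[i] = (uint8_t)i;

    return ((uintptr_t)scratch % 16 == 0) &&
           scratch >= raw &&
           scratch + 64 <= raw + sizeof(raw) &&
           scratch[63] == 63;
}
```

The cost is up to 15 wasted bytes per buffer, which is usually acceptable for
the small temporaries that 128-bit loads and stores such as gslqc1/gssqc1
operate on.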


