Re: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora decoding with mmi.

2019-02-14 Thread Michael Niedermayer
On Fri, Feb 15, 2019 at 12:13:43AM +0100, Michael Niedermayer wrote:
> On Wed, Feb 13, 2019 at 05:56:50PM +0800, Shiyou Yin wrote:
> > > >-----Original Message-----
> > >From: ffmpeg-devel-boun...@ffmpeg.org 
> > >[mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of gxw
> > >Sent: Tuesday, February 12, 2019 6:56 PM
> > >To: ffmpeg-devel@ffmpeg.org
> > >Cc: gxw
> > >Subject: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora 
> > >decoding with mmi.
> > >
> > >Optimize theora decoding with mmi in functions:
> > >1. ff_vp3_idct_add_mmi
> > >2. ff_vp3_idct_put_mmi
> > >3. ff_vp3_idct_dc_add_mmi
> > >4. ff_put_no_rnd_pixels_l2_mmi
> > >
> > >Theora decoding speed improved about 32%(from 88fps to 116fps, Tested on 
> > >loongson 3A3000).
> > >---
> > > libavcodec/mips/Makefile   |   1 +
> > > > libavcodec/mips/vp3dsp_idct_mmi.c  | 769 +
> > > libavcodec/mips/vp3dsp_init_mips.c |  14 +
> > > libavcodec/mips/vp3dsp_mips.h  |   6 +
> > > 4 files changed, 790 insertions(+)
> > > create mode 100644 libavcodec/mips/vp3dsp_idct_mmi.c
> > >
> > 
> > Verified + 1, LGTM.
> 
> will apply

One last-minute issue I noticed:
the author looks like a nickname or user name. Is that intended:
"gxw " ?
I mean, do you want "gxw" instead of your full name?

(I am asking because it cannot be changed after pushing ...)

Thanks

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety -- Benjamin Franklin


___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora decoding with mmi.

2019-02-14 Thread Michael Niedermayer
On Wed, Feb 13, 2019 at 05:56:50PM +0800, Shiyou Yin wrote:
> > >-----Original Message-----
> >From: ffmpeg-devel-boun...@ffmpeg.org 
> >[mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of gxw
> >Sent: Tuesday, February 12, 2019 6:56 PM
> >To: ffmpeg-devel@ffmpeg.org
> >Cc: gxw
> >Subject: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora 
> >decoding with mmi.
> >
> >Optimize theora decoding with mmi in functions:
> >1. ff_vp3_idct_add_mmi
> >2. ff_vp3_idct_put_mmi
> >3. ff_vp3_idct_dc_add_mmi
> >4. ff_put_no_rnd_pixels_l2_mmi
> >
> >Theora decoding speed improved about 32%(from 88fps to 116fps, Tested on 
> >loongson 3A3000).
> >---
> > libavcodec/mips/Makefile   |   1 +
> > > libavcodec/mips/vp3dsp_idct_mmi.c  | 769 +
> > libavcodec/mips/vp3dsp_init_mips.c |  14 +
> > libavcodec/mips/vp3dsp_mips.h  |   6 +
> > 4 files changed, 790 insertions(+)
> > create mode 100644 libavcodec/mips/vp3dsp_idct_mmi.c
> >
> 
> Verified + 1, LGTM.

will apply

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicine you will get
a job for life at the pharma industry.




Re: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora decoding with mmi.

2019-02-13 Thread Shiyou Yin
>-----Original Message-----
>From: ffmpeg-devel-boun...@ffmpeg.org [mailto:ffmpeg-devel-boun...@ffmpeg.org] 
>On Behalf Of gxw
>Sent: Tuesday, February 12, 2019 6:56 PM
>To: ffmpeg-devel@ffmpeg.org
>Cc: gxw
>Subject: [FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora 
>decoding with mmi.
>
>Optimize theora decoding with mmi in functions:
>1. ff_vp3_idct_add_mmi
>2. ff_vp3_idct_put_mmi
>3. ff_vp3_idct_dc_add_mmi
>4. ff_put_no_rnd_pixels_l2_mmi
>
>Theora decoding speed improved about 32%(from 88fps to 116fps, Tested on 
>loongson 3A3000).
>---
> libavcodec/mips/Makefile   |   1 +
> libavcodec/mips/vp3dsp_idct_mmi.c  | 769 +
> libavcodec/mips/vp3dsp_init_mips.c |  14 +
> libavcodec/mips/vp3dsp_mips.h  |   6 +
> 4 files changed, 790 insertions(+)
> create mode 100644 libavcodec/mips/vp3dsp_idct_mmi.c
>

Verified + 1, LGTM.




[FFmpeg-devel] [PATCH] avcodec/mips: [loongson] optimize theora decoding with mmi.

2019-02-12 Thread gxw
Optimize theora decoding with mmi in functions:
1. ff_vp3_idct_add_mmi
2. ff_vp3_idct_put_mmi
3. ff_vp3_idct_dc_add_mmi
4. ff_put_no_rnd_pixels_l2_mmi

Theora decoding speed improved by about 32% (from 88 fps to 116 fps, tested
on Loongson 3A3000).
---
 libavcodec/mips/Makefile   |   1 +
 libavcodec/mips/vp3dsp_idct_mmi.c  | 769 +
 libavcodec/mips/vp3dsp_init_mips.c |  14 +
 libavcodec/mips/vp3dsp_mips.h  |   6 +
 4 files changed, 790 insertions(+)
 create mode 100644 libavcodec/mips/vp3dsp_idct_mmi.c
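
A scalar sketch of what the LOAD_CONST macro and the
pcmpgth/or/pmullh/pmulhuh/pmullh/paddh sequence in idct_row_mmi appear to
compute per 16-bit lane. Constants such as 64277 do not fit a signed 16-bit
operand, so the code multiplies the absolute value with an unsigned high
multiply, restores the sign, and then adds the sign mask to round toward
minus infinity. All function names here (splat16, as_s16, mul_hi_ref,
mul_hi_lane, count_mismatches) are hypothetical and not part of the patch,
and the pshufh behaviour (an all-zero selector replicates halfword 0) is an
assumption. The model is only checked against the constant 64277.

```c
#include <stdint.h>

/* Hypothetical model of LOAD_CONST: assumes pshufh with an all-zero
 * selector copies halfword 0 into all four 16-bit lanes. */
static uint64_t splat16(uint16_t v)
{
    return (uint64_t)v * 0x0001000100010001ULL;
}

/* Reinterpret 16 bits as a signed lane without implementation-defined casts. */
static int16_t as_s16(uint16_t v)
{
    return (int16_t)(v >= 0x8000 ? (int32_t)v - 0x10000 : (int32_t)v);
}

/* Reference: floor((c * x) / 65536), with c treated as unsigned. */
static int16_t mul_hi_ref(uint16_t c, int16_t x)
{
    int32_t p = (int32_t)c * x;
    int32_t q = p / 65536;
    if (p % 65536 < 0)
        q -= 1;                 /* C division truncates; adjust to floor */
    return (int16_t)q;
}

/* Lane model of the instruction sequence, all arithmetic modulo 2^16. */
static int16_t mul_hi_lane(uint16_t c, int16_t x)
{
    uint16_t neg  = (x < 0) ? 0xFFFF : 0;                     /* pcmpgth: 0 > x */
    uint16_t mask = neg | 1;                                  /* or: -1 or +1 */
    uint16_t ax   = (uint16_t)((uint32_t)(uint16_t)x * mask); /* pmullh: |x| */
    uint16_t hi   = (uint16_t)(((uint32_t)c * ax) >> 16);     /* pmulhuh */
    uint16_t r    = (uint16_t)((uint32_t)hi * mask);          /* pmullh: sign */
    r = (uint16_t)(r + neg);                                  /* paddh: -1 if x < 0 */
    return as_s16(r);
}

/* Count inputs where the lane model differs from the reference. */
static int count_mismatches(uint16_t c)
{
    int bad = 0;
    for (int32_t x = INT16_MIN; x <= INT16_MAX; x++)
        bad += mul_hi_lane(c, (int16_t)x) != mul_hi_ref(c, (int16_t)x);
    return bad;
}
```

For example, mul_hi_lane(64277, -100) gives -99, which matches
floor(-6427700 / 65536).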

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 3029872..c827649 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -87,3 +87,4 @@ MMI-OBJS-$(CONFIG_HPELDSP)+= mips/hpeldsp_mmi.o
 MMI-OBJS-$(CONFIG_VC1_DECODER)+= mips/vc1dsp_mmi.o
 MMI-OBJS-$(CONFIG_WMV2DSP)+= mips/wmv2dsp_mmi.o
 MMI-OBJS-$(CONFIG_HEVC_DECODER)   += mips/hevcdsp_mmi.o
+MMI-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_mmi.o
diff --git a/libavcodec/mips/vp3dsp_idct_mmi.c b/libavcodec/mips/vp3dsp_idct_mmi.c
new file mode 100644
index 000..c5c4cf3
--- /dev/null
+++ b/libavcodec/mips/vp3dsp_idct_mmi.c
@@ -0,0 +1,769 @@
+/*
+ * Copyright (c) 2018 gxw 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "vp3dsp_mips.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mips/mmiutils.h"
+#include "libavutil/common.h"
+#include "libavcodec/rnd_avg.h"
+
+#define LOAD_CONST(dst, value)                              \
+    "li         %[tmp1],    "#value"                \n\t"   \
+    "dmtc1      %[tmp1],    "#dst"                  \n\t"   \
+    "pshufh     "#dst",     "#dst",     %[ftmp10]   \n\t"
+
+static void idct_row_mmi(int16_t *input)
+{
+    double ftmp[23];
+    uint64_t tmp[2];
+    __asm__ volatile (
+        "xor         %[ftmp10],  %[ftmp10],  %[ftmp10]   \n\t"
+        LOAD_CONST(%[csth_1], 1)
+        "li          %[tmp0],    0x02                    \n\t"
+        "1:                                              \n\t"
+        /* Load input */
+        "ldc1        %[ftmp0],   0x00(%[input])          \n\t"
+        "ldc1        %[ftmp1],   0x10(%[input])          \n\t"
+        "ldc1        %[ftmp2],   0x20(%[input])          \n\t"
+        "ldc1        %[ftmp3],   0x30(%[input])          \n\t"
+        "ldc1        %[ftmp4],   0x40(%[input])          \n\t"
+        "ldc1        %[ftmp5],   0x50(%[input])          \n\t"
+        "ldc1        %[ftmp6],   0x60(%[input])          \n\t"
+        "ldc1        %[ftmp7],   0x70(%[input])          \n\t"
+        LOAD_CONST(%[ftmp8], 64277)
+        LOAD_CONST(%[ftmp9], 12785)
+        "pmulhh      %[A],       %[ftmp9],   %[ftmp7]    \n\t"
+        "pcmpgth     %[C],       %[ftmp10],  %[ftmp1]    \n\t"
+        "or          %[mask],    %[C],       %[csth_1]   \n\t"
+        "pmullh      %[B],       %[ftmp1],   %[mask]     \n\t"
+        "pmulhuh     %[B],       %[ftmp8],   %[B]        \n\t"
+        "pmullh      %[B],       %[B],       %[mask]     \n\t"
+        "paddh       %[A],       %[A],       %[B]        \n\t"
+        "paddh       %[A],       %[A],       %[C]        \n\t"
+        "pcmpgth     %[D],       %[ftmp10],  %[ftmp7]    \n\t"
+        "or          %[mask],    %[D],       %[csth_1]   \n\t"
+        "pmullh      %[ftmp7],   %[ftmp7],   %[mask]     \n\t"
+        "pmulhuh     %[B],       %[ftmp8],   %[ftmp7]    \n\t"
+        "pmullh      %[B],       %[B],       %[mask]     \n\t"
+        "pmulhh      %[C],       %[ftmp9],   %[ftmp1]    \n\t"
+        "psubh       %[B],       %[C],       %[B]        \n\t"
+        "psubh       %[B],       %[B],       %[D]        \n\t"
+
+        LOAD_CONST(%[ftmp8], 54491)
+        LOAD_CONST(%[ftmp9], 36410)
+        "pcmpgth     %[Ad],      %[ftmp10],  %[ftmp5]    \n\t"
+        "or          %[mask],    %[Ad],      %[csth_1]   \n\t"
+        "pmullh      %[ftmp1],   %[ftmp5],   %[mask]     \n\t"
+        "pmulhuh     %[C],       %[ftmp9],   %[ftmp1]    \n\t"
+"pm