Re: [FFmpeg-devel] [PATCH v2] swscale/output: VSX-optimize 16-bit yuv2plane1

2018-12-14 Thread Michael Niedermayer

On Thu, Dec 13, 2018 at 02:07:58PM +0200, Lauri Kasanen wrote:
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
> -f null -vframes 100 -v error -nostats -
> 
> 2120 UNITS in planar1,   65393 runs, 143 skips
> 
> -cpuflags 0
> 
> 19157 UNITS in planar1,   65512 runs, 24 skips
> 
> 9.03632 speedup, 16be similarly.
> 
> Fate passes, each format tested with an image to video conversion.
> 
> Signed-off-by: Lauri Kasanen 
> ---
> 
> v2: Copy-pasted rows were flipped.
> 
>  libswscale/ppc/swscale_vsx.c | 59 
>  1 file changed, 59 insertions(+)

will apply

thanks

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you drop bombs on a foreign country and kill a hundred thousand
innocent people, expect your government to call the consequence
"unprovoked inhuman terrorist attacks" and use it to justify dropping
more bombs and killing more people. The technology changed, the idea is old.


___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH v2] swscale/output: VSX-optimize 16-bit yuv2plane1

2018-12-13 Thread Lauri Kasanen

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
-f null -vframes 100 -v error -nostats -

2120 UNITS in planar1,   65393 runs, 143 skips

-cpuflags 0

19157 UNITS in planar1,   65512 runs, 24 skips

9.03632 speedup, 16be similarly.

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen 
---

v2: Copy-pasted rows were flipped.

 libswscale/ppc/swscale_vsx.c | 59 
 1 file changed, 59 insertions(+)

diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c
index 6462c11..70da6ae 100644
--- a/libswscale/ppc/swscale_vsx.c
+++ b/libswscale/ppc/swscale_vsx.c
@@ -180,6 +180,60 @@ static void yuv2plane1_nbps_vsx(const int16_t *src, uint16_t *dest, int dstW,
 yuv2plane1_nbps_u(src, dest, dstW, big_endian, output_bits, i);
 }
 
+#undef output_pixel
+
+#define output_pixel(pos, val, bias, signedness) \
+if (big_endian) { \
+AV_WB16(pos, bias + av_clip_ ## signedness ## 16(val >> shift)); \
+} else { \
+AV_WL16(pos, bias + av_clip_ ## signedness ## 16(val >> shift)); \
+}
+
+static void yuv2plane1_16_u(const int32_t *src, uint16_t *dest, int dstW,
+  int big_endian, int output_bits, int start)
+{
+int i;
+const int shift = 3;
+
+for (i = start; i < dstW; i++) {
+int val = src[i] + (1 << (shift - 1));
+output_pixel(&dest[i], val, 0, uint);
+}
+}
+
+static void yuv2plane1_16_vsx(const int32_t *src, uint16_t *dest, int dstW,
+   int big_endian, int output_bits)
+{
+const int dst_u = -(uintptr_t)dest & 7;
+const int shift = 3;
+const int add = (1 << (shift - 1));
+const vector uint32_t vadd = (vector uint32_t) {add, add, add, add};
+const vector uint16_t vswap = (vector uint16_t) vec_splat_u16(big_endian ? 8 : 0);
+const vector uint32_t vshift = (vector uint32_t) vec_splat_u32(shift);
+vector uint32_t v, v2;
+vector uint16_t vd;
+int i;
+
+yuv2plane1_16_u(src, dest, dst_u, big_endian, output_bits, 0);
+
+for (i = dst_u; i < dstW - 7; i += 8) {
+v = vec_vsx_ld(0, (const uint32_t *) &src[i]);
+v = vec_add(v, vadd);
+v = vec_sr(v, vshift);
+
+v2 = vec_vsx_ld(0, (const uint32_t *) &src[i + 4]);
+v2 = vec_add(v2, vadd);
+v2 = vec_sr(v2, vshift);
+
+vd = vec_packsu(v, v2);
+vd = vec_rl(vd, vswap);
+
+vec_st(vd, 0, &dest[i]);
+}
+
+yuv2plane1_16_u(src, dest, dstW, big_endian, output_bits, i);
+}
+
 #define yuv2NBPS(bits, BE_LE, is_be, template_size, typeX_t) \
 static void yuv2plane1_ ## bits ## BE_LE ## _vsx(const int16_t *src, \
  uint8_t *dest, int dstW, \
@@ -197,6 +251,8 @@ yuv2NBPS(12, BE, 1, nbps, int16_t)
 yuv2NBPS(12, LE, 0, nbps, int16_t)
 yuv2NBPS(14, BE, 1, nbps, int16_t)
 yuv2NBPS(14, LE, 0, nbps, int16_t)
+yuv2NBPS(16, BE, 1, 16, int32_t)
+yuv2NBPS(16, LE, 0, 16, int32_t)
 
 #endif /* !HAVE_BIGENDIAN */
 
@@ -240,6 +296,9 @@ av_cold void ff_sws_init_swscale_vsx(SwsContext *c)
 case 14:
 c->yuv2plane1 = isBE(dstFormat) ? yuv2plane1_14BE_vsx  : yuv2plane1_14LE_vsx;
 break;
+case 16:
+c->yuv2plane1 = isBE(dstFormat) ? yuv2plane1_16BE_vsx  : yuv2plane1_16LE_vsx;
+break;
 #endif
 }
 }
-- 
2.6.2
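
For readers without a POWER machine, the per-pixel arithmetic in the patch above can be modelled in portable C. This is only a sketch, not code from the patch: the helper names round_shift_clip_u16, rotl16 and plane1_16_model are hypothetical, and it assumes a little-endian host (the VSX path is compiled under !HAVE_BIGENDIAN). It follows the scalar yuv2plane1_16_u path: add the rounding constant 1 << (shift - 1), shift right by 3, clip to 16 bits. The vector loop uses an unsigned shift followed by a saturating pack, so the model should only be expected to match it for non-negative inputs. The 16-bit rotate by 8 mirrors what vec_rl with vswap does per lane, which amounts to a byte swap for big-endian output.

```c
#include <stdint.h>

/* Hypothetical helper: rounding right shift by 3, then clip to [0, 65535].
 * Mirrors the scalar tail (src[i] + (1 << (shift - 1))) >> shift followed
 * by an unsigned 16-bit clip. 64-bit math avoids overflow near INT32_MAX. */
static uint16_t round_shift_clip_u16(int32_t v)
{
    const int shift = 3;
    int64_t t = (int64_t)v + (1 << (shift - 1));

    if (t < 0)
        return 0;          /* negative after rounding clips to 0 */
    t >>= shift;
    if (t > 65535)
        return 65535;      /* saturate like an unsigned 16-bit clip */
    return (uint16_t)t;
}

/* Rotate a 16-bit value left by n bits. Rotating by 8 swaps the two
 * bytes; rotating by 0 leaves the value unchanged. */
static uint16_t rotl16(uint16_t x, unsigned n)
{
    n &= 15;
    return (uint16_t)((x << n) | (x >> ((16 - n) & 15)));
}

/* Hypothetical scalar model of one output row on a little-endian host:
 * dest[i] holds the pixel in native order, byte-swapped when the
 * destination format is big-endian. */
static void plane1_16_model(const int32_t *src, uint16_t *dest,
                            int w, int big_endian)
{
    for (int i = 0; i < w; i++)
        dest[i] = rotl16(round_shift_clip_u16(src[i]), big_endian ? 8 : 0);
}
```

For example, an input of 0x1234 * 8 rounds and shifts to 0x1234, which is stored as 0x1234 for little-endian output and as 0x3412 for big-endian output.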
