On Thu, Nov 27, 2014 at 02:35:24PM +0800, rongyan wrote:
Hi,
We present 5 patches to fix h264 bugs for POWER8 little endian, which are
sent in 5 seperate emails.
This is the second, to fix the functions
h264_idct8_add_altivec();
h264_idct_dc_add_internal();
h264_loop_filter_luma_altivec();
write16x4() VEC_1D_DCT();
weight_h264_W_altivec();
biweight_h264_W_altivec();
VEC_LOAD_U8_ADD_S16_STORE_U8();
ALTIVEC_STORE_SUM_CLIP();
And add marcos GET_2PERM(), dstv_load(),vdst_load(), dest_unligned_store().
The fate test result after merge these 5 patches can be found on website by
searching ibmcrl, also attached in the below to facilitate the review. The
passed test cases change from 2017/2243 to 2209/2245.
Thanks.
Rong Yan
--
The world has enough for everyone's need, but not enough for everyone's
greed.
h264dsp.c | 374
+-
1 file changed, 205 insertions(+), 169 deletions(-)
dcaccec4338f960704148c933e1ec454dd4dc6a2
0002-libavcodec-ppc-h264dsp.c-fix-h264_idct8_add_altivec.patch
From 130b20e650a2d83a4c66cd23c10fe943742339f8 Mon Sep 17 00:00:00 2001
From: Rong Yan rongyan...@gmail.com
Date: Thu, 27 Nov 2014 05:49:53 +
Subject: [PATCH 2/5] libavcodec/ppc/h264dsp.c : fix h264_idct8_add_altivec()
h264_idct_dc_add_internal() h264_loop_filter_luma_altivec() write16x4()
VEC_1D_DCT() weight_h264_W_altivec() biweight_h264_W_altivec()
VEC_LOAD_U8_ADD_S16_STORE_U8() ALTIVEC_STORE_SUM_CLIP() add marcos
GET_2PERM() dstv_load() vdst_load() dest_unligned_store() for POWER LE
---
libavcodec/ppc/h264dsp.c | 374
++-
1 file changed, 205 insertions(+), 169 deletions(-)
diff --git a/libavcodec/ppc/h264dsp.c b/libavcodec/ppc/h264dsp.c
index 7fc7e0b..cfce32d 100644
--- a/libavcodec/ppc/h264dsp.c
+++ b/libavcodec/ppc/h264dsp.c
@@ -34,7 +34,7 @@
* IDCT transform:
/
-#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3) \
+#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3) {\
/* 1st stage */ \
vz0 = vec_add(vb0,vb2); /* temp[0] = Y[0] + Y[2] */ \
vz1 = vec_sub(vb0,vb2); /* temp[1] = Y[0] - Y[2] */ \
@@ -46,7 +46,8 @@
va0 = vec_add(vz0,vz3); /* x[0] = temp[0] + temp[3] */ \
va1 = vec_add(vz1,vz2); /* x[1] = temp[1] + temp[2] */ \
va2 = vec_sub(vz1,vz2); /* x[2] = temp[1] - temp[2] */ \
-va3 = vec_sub(vz0,vz3)/* x[3] = temp[0] - temp[3] */
+va3 = vec_sub(vz0,vz3);/* x[3] = temp[0] - temp[3] */\
+}
#define VEC_TRANSPOSE_4(a0,a1,a2,a3,b0,b1,b2,b3) \
b0 = vec_mergeh( a0, a0 ); \
@@ -62,14 +63,23 @@
b2 = vec_mergeh( a1, a3 ); \
b3 = vec_mergel( a1, a3 )
-#define VEC_LOAD_U8_ADD_S16_STORE_U8(va) \
-vdst_orig = vec_ld(0, dst); \
-vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask); \
-vdst_ss = (vec_s16) vec_mergeh(zero_u8v, vdst); \
-va = vec_add(va, vdst_ss);\
-va_u8 = vec_packsu(va, zero_s16v);\
-va_u32 = vec_splat((vec_u32)va_u8, 0); \
-vec_ste(va_u32, element, (uint32_t*)dst);
+#if HAVE_BIGENDIAN
+#define vdst_load(d)\
+vdst_orig = vec_ld(0, dst); \
+vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask)
+#else
+#define vdst_load(d)\
+vdst = vec_vsx_ld(0, dst)
+#endif
+
+#define VEC_LOAD_U8_ADD_S16_STORE_U8(va) {\
+vdst_load();\
+vdst_ss = (vec_s16) VEC_MERGEH(zero_u8v, vdst);\
+va = vec_add(va, vdst_ss);\
+va_u8 = vec_packsu(va, zero_s16v);\
+va_u32 = vec_splat((vec_u32)va_u8, 0);\
+vec_ste(va_u32, element, (uint32_t*)dst);\
+}
please dont mix whitespace changes with functional changes
this makes the patch and commit unreadable
it also can cause problems for other developers as rebasing their
work becomes harder if the code changed alot
please leave the whitespaces in place
git show HEAD^^^ -w --stat
libavcodec/ppc/h264dsp.c | 106 +++---
1 file changed, 71 insertions(+), 35 deletions(-)
git show HEAD^^^ --stat
libavcodec/ppc/h264dsp.c | 374 +-
1 file changed, 205 insertions(+), 169 deletions(-)
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato
signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel