Re: [FFmpeg-devel] [Patch 2/5]Fix h264 on POWER LE: libavcodec/ppc/h264dsp.c

2014-11-27 Thread Michael Niedermayer
On Thu, Nov 27, 2014 at 02:35:24PM +0800, rongyan wrote:
 Hi,
 We present 5 patches to fix h264 bugs for POWER8 little endian, which are 
 sent in 5 seperate emails.
 This is the second, to fix the functions  
 h264_idct8_add_altivec();
  
 h264_idct_dc_add_internal();
  
 h264_loop_filter_luma_altivec();
  
 write16x4() VEC_1D_DCT();
  
 weight_h264_W_altivec();
  
 biweight_h264_W_altivec();
  
 VEC_LOAD_U8_ADD_S16_STORE_U8();
  
 ALTIVEC_STORE_SUM_CLIP();
  
 And add marcos  GET_2PERM(), dstv_load(),vdst_load(), dest_unligned_store().
 
 The fate test result after merge these 5 patches can be found on website by 
 searching ibmcrl, also attached in the below to facilitate the review. The 
 passed test cases change from  2017/2243 to 2209/2245.
  
 Thanks.
  Rong Yan
   
   --
   The world has enough for everyone's need, but not enough for everyone's 
 greed.


  h264dsp.c |  374 
 +-
  1 file changed, 205 insertions(+), 169 deletions(-)
 dcaccec4338f960704148c933e1ec454dd4dc6a2  
 0002-libavcodec-ppc-h264dsp.c-fix-h264_idct8_add_altivec.patch
 From 130b20e650a2d83a4c66cd23c10fe943742339f8 Mon Sep 17 00:00:00 2001
 From: Rong Yan rongyan...@gmail.com
 Date: Thu, 27 Nov 2014 05:49:53 +
 Subject: [PATCH 2/5] libavcodec/ppc/h264dsp.c : fix h264_idct8_add_altivec()
  h264_idct_dc_add_internal() h264_loop_filter_luma_altivec() write16x4()
  VEC_1D_DCT() weight_h264_W_altivec() biweight_h264_W_altivec()
  VEC_LOAD_U8_ADD_S16_STORE_U8() ALTIVEC_STORE_SUM_CLIP() add marcos 
  GET_2PERM() dstv_load() vdst_load() dest_unligned_store() for POWER LE
 
 ---
  libavcodec/ppc/h264dsp.c | 374 
 ++-
  1 file changed, 205 insertions(+), 169 deletions(-)
 
 diff --git a/libavcodec/ppc/h264dsp.c b/libavcodec/ppc/h264dsp.c
 index 7fc7e0b..cfce32d 100644
 --- a/libavcodec/ppc/h264dsp.c
 +++ b/libavcodec/ppc/h264dsp.c
 @@ -34,7 +34,7 @@
   * IDCT transform:
   
 /
  
 -#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3)   \
 +#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3) {\
  /* 1st stage */   \
  vz0 = vec_add(vb0,vb2);   /* temp[0] = Y[0] + Y[2] */ \
  vz1 = vec_sub(vb0,vb2);   /* temp[1] = Y[0] - Y[2] */ \
 @@ -46,7 +46,8 @@
  va0 = vec_add(vz0,vz3);   /* x[0] = temp[0] + temp[3] */  \
  va1 = vec_add(vz1,vz2);   /* x[1] = temp[1] + temp[2] */  \
  va2 = vec_sub(vz1,vz2);   /* x[2] = temp[1] - temp[2] */  \
 -va3 = vec_sub(vz0,vz3)/* x[3] = temp[0] - temp[3] */
 +va3 = vec_sub(vz0,vz3);/* x[3] = temp[0] - temp[3] */\
 +}
  
  #define VEC_TRANSPOSE_4(a0,a1,a2,a3,b0,b1,b2,b3) \
  b0 = vec_mergeh( a0, a0 ); \
 @@ -62,14 +63,23 @@
  b2 = vec_mergeh( a1, a3 ); \
  b3 = vec_mergel( a1, a3 )
  
 -#define VEC_LOAD_U8_ADD_S16_STORE_U8(va)  \
 -vdst_orig = vec_ld(0, dst);   \
 -vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask);  \
 -vdst_ss = (vec_s16) vec_mergeh(zero_u8v, vdst); \
 -va = vec_add(va, vdst_ss);\
 -va_u8 = vec_packsu(va, zero_s16v);\
 -va_u32 = vec_splat((vec_u32)va_u8, 0);  \
 -vec_ste(va_u32, element, (uint32_t*)dst);
 +#if HAVE_BIGENDIAN
 +#define vdst_load(d)\
 +vdst_orig = vec_ld(0, dst); \
 +vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask)
 +#else
 +#define vdst_load(d)\
 +vdst = vec_vsx_ld(0, dst)
 +#endif
 +
 +#define VEC_LOAD_U8_ADD_S16_STORE_U8(va) {\
 +vdst_load();\
 +vdst_ss = (vec_s16) VEC_MERGEH(zero_u8v, vdst);\
 +va = vec_add(va, vdst_ss);\
 +va_u8 = vec_packsu(va, zero_s16v);\
 +va_u32 = vec_splat((vec_u32)va_u8, 0);\
 +vec_ste(va_u32, element, (uint32_t*)dst);\
 +}

please dont mix whitespace changes with functional changes
this makes the patch and commit unreadable
it also can cause problems for other developers as rebasing their
work becomes harder if the code changed alot
please leave the whitespaces in place

git show HEAD^^^ -w --stat
 libavcodec/ppc/h264dsp.c |  106 +++---
 1 file changed, 71 insertions(+), 35 deletions(-)


git show HEAD^^^  --stat
 libavcodec/ppc/h264dsp.c |  374 +-
 1 file changed, 205 insertions(+), 169 deletions(-)

 [...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [Patch 2/5]Fix h264 on POWER LE: libavcodec/ppc/h264dsp.c

2014-11-26 Thread rongyan
Hi,
We present 5 patches to fix h264 bugs for POWER8 little endian, which are sent 
in 5 seperate emails.
This is the second, to fix the functions  
h264_idct8_add_altivec();
 
h264_idct_dc_add_internal();
 
h264_loop_filter_luma_altivec();
 
write16x4() VEC_1D_DCT();
 
weight_h264_W_altivec();
 
biweight_h264_W_altivec();
 
VEC_LOAD_U8_ADD_S16_STORE_U8();
 
ALTIVEC_STORE_SUM_CLIP();
 
And add marcos  GET_2PERM(), dstv_load(),vdst_load(), dest_unligned_store().

The fate test result after merge these 5 patches can be found on website by 
searching ibmcrl, also attached in the below to facilitate the review. The 
passed test cases change from  2017/2243 to 2209/2245.
 
Thanks.
 Rong Yan
  
  --
  The world has enough for everyone's need, but not enough for everyone's greed.

0002-libavcodec-ppc-h264dsp.c-fix-h264_idct8_add_altivec.patch
Description: Binary data
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel