date:20140324

Re: [x265] [PATCH 1 of 3] primitives: added C primitives for upShift/downShift of input pixels

2014-03-24 Thread Murugan Vairavel

On Fri, Mar 21, 2014 at 8:44 PM, chen chenm...@163.com wrote:

 At 2014-03-21 13:35:29,muru...@multicorewareinc.com wrote:
 # HG changeset patch
 # User Murugan Vairavel muru...@multicorewareinc.com
 # Date 1395379028 -19800
 #  Fri Mar 21 10:47:08 2014 +0530
 # Node ID 0c4fdd43325e6501698a281862b1c027238a9c9d
 # Parent  fe3fcd9838c02fb65fed8638a13dea9f06f8a9be
 primitives: added C primitives for upShift/downShift of input pixels


 +void planecopy_cp_c(uint8_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height)
 This function only does the 8-bit to 10-bit conversion.

Yes, it only does the 8-bit to 10-bit upshift. Should I change the function name?


 +{
 +for (int r = 0; r < height; r++)
 +{
 +for (int c = 0; c < width; c++)
 +{
 +dst[c] = ((pixel)src[c]) << 2;
 +}
 +
 +dst += dstStride;
 +src += srcStride;
 +}
 +}
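A minimal sketch of how the C primitive could take the shift amount as a parameter, so the same reference covers 8-to-10 (shift = 2) and a possible future 8-to-12 (shift = 4) conversion. The name `planecopy_cp_shift_c` is hypothetical, and `pixel` is assumed to be `uint16_t` as in a high-bit-depth build.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: high-bit-depth build, where x265's `pixel` is 16 bits wide. */
typedef uint16_t pixel;

/* Hypothetical generalization of planecopy_cp_c: the bit-depth difference is
 * passed in, so 8->10 uses shift = 2 and 8->12 would use shift = 4. */
static void planecopy_cp_shift_c(const uint8_t *src, intptr_t srcStride,
                                 pixel *dst, intptr_t dstStride,
                                 int width, int height, int shift)
{
    for (int r = 0; r < height; r++)
    {
        for (int c = 0; c < width; c++)
            dst[c] = (pixel)(src[c] << shift); /* widen, then scale up */

        dst += dstStride;   /* strides are in elements, not bytes */
        src += srcStride;
    }
}
```

Keeping the shift as an argument avoids one function per bit-depth pair; an asm version can still specialize on common shift values.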
 ___
 x265-devel mailing list
 x265-devel@videolan.org
 https://mailman.videolan.org/listinfo/x265-devel




-- 
With Regards,

Murugan. V
+919659287478

Re: [x265] [PATCH 2 of 3] testbench: code for testing input pixel upShift/downShift primitives

2014-03-24 Thread Murugan Vairavel

On Fri, Mar 21, 2014 at 8:42 PM, chen chenm...@163.com wrote:

 At 2014-03-21 13:35:30,muru...@multicorewareinc.com wrote:
 # HG changeset patch
 # User Murugan Vairavel muru...@multicorewareinc.com
 # Date 1395379187 -19800
 #  Fri Mar 21 10:49:47 2014 +0530
 # Node ID 435e50b2b92c83e10fdb2bd86bc8e8df91b7338b
 # Parent  0c4fdd43325e6501698a281862b1c027238a9c9d
 testbench: code for testing input pixel upShift/downShift primitives
 
  /* [0] --- Random values
 @@ -79,16 +83,22 @@

  short_test_buff1[0][i]  = rand() & PIXEL_MAX;   // For block copy only

  short_test_buff2[0][i]  = rand() % 16383;   // for addAvg
  int_test_buff[0][i] = rand() % SHORT_MAX;
 +ushort_test_buff[0][i]  = rand() % ((1 << 10) - 1);
 +uchar_test_buff[0][i]  = rand() % ((1 << 8) - 1);
 Our code includes a clip operation, so you can use a wider dynamic range to verify it.

Do you mean increasing the dynamic range of the ushort buffer?





-- 
With Regards,

Murugan. V
+919659287478

Re: [x265] [PATCH 1 of 3] primitives: added C primitives for upShift/downShift of input pixels

2014-03-24 Thread chen



 
On Fri, Mar 21, 2014 at 8:44 PM, chen chenm...@163.com wrote:
At 2014-03-21 13:35:29,muru...@multicorewareinc.com wrote:

# HG changeset patch
# User Murugan Vairavel muru...@multicorewareinc.com
# Date 1395379028 -19800
#  Fri Mar 21 10:47:08 2014 +0530
# Node ID 0c4fdd43325e6501698a281862b1c027238a9c9d
# Parent  fe3fcd9838c02fb65fed8638a13dea9f06f8a9be
primitives: added C primitives for upShift/downShift of input pixels


+void planecopy_cp_c(uint8_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height)

This function only does the 8-bit to 10-bit conversion.

Yes, it only does the 8-bit to 10-bit upshift. Should I change the function name?
 
Yes, please change it; I am concerned that we will need 8-to-12 in the future.

Re: [x265] [PATCH 2 of 3] testbench: code for testing input pixel upShift/downShift primitives

2014-03-24 Thread chen


 
On Fri, Mar 21, 2014 at 8:42 PM, chen chenm...@163.com wrote:
At 2014-03-21 13:35:30,muru...@multicorewareinc.com wrote:

# HG changeset patch
# User Murugan Vairavel muru...@multicorewareinc.com
# Date 1395379187 -19800
#  Fri Mar 21 10:49:47 2014 +0530
# Node ID 435e50b2b92c83e10fdb2bd86bc8e8df91b7338b
# Parent  0c4fdd43325e6501698a281862b1c027238a9c9d
testbench: code for testing input pixel upShift/downShift primitives


 /* [0] --- Random values
@@ -79,16 +83,22 @@
 short_test_buff1[0][i]  = rand() & PIXEL_MAX;   // For block copy only
 short_test_buff2[0][i]  = rand() % 16383;   // for addAvg
 int_test_buff[0][i] = rand() % SHORT_MAX;
+ushort_test_buff[0][i]  = rand() % ((1 << 10) - 1);
+uchar_test_buff[0][i]  = rand() % ((1 << 8) - 1);

Our code includes a clip operation, so you can use a wider dynamic range to verify it.

Do you mean increasing the dynamic range of the ushort buffer?

Yes, a wider dynamic range would exercise our CLIP code.
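A sketch of why the wider range matters, assuming the clip referred to here is the saturation to 8 bits after the down-shift (what `packuswb` does in the asm). With 10-bit inputs and a shift of 2 every result already fits in 8 bits, so the clip never fires; note also that `rand() % ((1 << 10) - 1)` only yields values up to 1022. The helpers `downshift_ref` and `fill_random` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C reference for the down-shift: shift right, then saturate to
 * [0, 255], which is what packuswb does in the asm version. */
static void downshift_ref(const uint16_t *src, uint8_t *dst, int n, int shift)
{
    for (int i = 0; i < n; i++)
    {
        uint32_t v = (uint32_t)src[i] >> shift;
        dst[i] = (uint8_t)(v > 255 ? 255 : v);   /* the clip under test */
    }
}

/* Hypothetical test-buffer fill with values in [0, maxValue].  maxValue = 1023
 * covers valid 10-bit input only; maxValue = 0xFFFF also drives the clip. */
static void fill_random(uint16_t *buf, int n, uint32_t maxValue)
{
    for (int i = 0; i < n; i++)
        buf[i] = (uint16_t)((uint32_t)rand() % (maxValue + 1));
}
```

Where `RAND_MAX` is 32767 a single `rand()` call cannot reach 0xFFFF, but it still goes far past the point where the clip kicks in.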

Re: [x265] [PATCH 3 of 3] asm: code for input pixel upShift/downShift

2014-03-24 Thread chen

On Fri, Mar 21, 2014 at 8:01 PM, chen chenm...@163.com wrote:
At 2014-03-21 13:35:31,muru...@multicorewareinc.com wrote:
# HG changeset patch
# User Murugan Vairavel muru...@multicorewareinc.com
# Date 1395379456 -19800
#  Fri Mar 21 10:54:16 2014 +0530
# Node ID 29728f7728591116192575d411ef2db2dff49c18
# Parent  435e50b2b92c83e10fdb2bd86bc8e8df91b7338b
asm: code for input pixel upShift/downShift

+; Input 10bpp, Output 8bpp, width is multiple of 16
+;
+;void planecopy_sp(uint16_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask)
+;
+INIT_XMM sse2
+cglobal downShift_10, 7,7,3
+movd    m0, r6d    ; m0 = shift
+add r1, r1
+dec     r5d
+.loopH:
+xor r6, r6

tip: r6 is an offset; if you prepare 'r1 = r1 - r4', you can operate directly on r0.

But the number of pixels processed in each row is not equal to the width (r4) when the width is not a multiple of 16. If I do it as above, an output mismatch will occur.

Your algorithm loops over a multiple of 16 pixels in every row except the last one. You need not modify this part now; this is just for your information.
+.loopW:
+movu    m1, [r0 + r6 * 2]
+movu    m2, [r0 + r6 * 2 + 16]
+psrlw   m1, m0
+psrlw   m2, m0
+packuswb m1, m2
+movu    [r2 + r6], m1
+
+add r6, 16
+cmp r6d, r4d
+jl  .loopW
+
+; move to next row
+lea r0, [r0 + r1]
+lea r2, [r2 + r3]
add r0, r1
add r2, r3

I will modify that.

+dec r5d
+jnz .loopH
+
+;processing last row of every frame [To handle width which not a multiple of 16]
+
+.loop16:
+movu    m1, [r0]
+movu    m2, [r0 + 16]
+psrlw   m1, m0
+psrlw   m2, m0
+packuswb m1, m2
+movu    [r2], m1
+
+add r0, 2 * mmsize
+add r2, mmsize
+sub r4d, 16
+jz  .end
+cmp r4d, 15
+jg  .loop16

-- (X  16)  (X 15) ??

What does that mean?

 

r4d = X

sub r4d, 16 / jz  ->  (X - 16 == 0)

cmp r4d, 15 / jg  ->  (X - 16 > 15). The logic here has a small problem: it is correct but redundant, because when it is true it means (X - 16 >= 16). -_-!

+cmp r4d, 8
+jl  .process4
+movu    m1, [r0]
+psrlw   m1, m0
+packuswb m1, m1
+movh    [r2], m1
+
+add r0, mmsize
+add r2, 8
+sub r4d, 8
+jz  .end
+
+.process4:
+cmp r4d, 4
+jl  .process2
+movh    m1, [r0]
+psrlw   m1, m0
+packuswb m1, m1
+movd    [r2], m1
+
+add r0, 8
+add r2, 4
+sub r4d, 4
+jz  .end
+
+.process2:
+cmp r4d, 2
+jl  .process1
+movd    m1, [r0]
+psrlw   m1, m0
+packuswb m1, m1
+movd    r6, m1
+mov [r2], r6w
+
+add r0, 4
+add r2, 2
+sub r4d, 2
+jz  .end
+
+.process1:
+movd    m1, [r0]
+psrlw   m1, m0
+packuswb m1, m1
+movd    r6, m1
+mov [r2], r6b
+.end:
+RET
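A scalar model of the last-row tail handling in the asm above (a sketch, not the patch's code): it consumes 16-pixel blocks while at least 16 pixels remain, then at most one 8-, 4-, 2- and 1-pixel step. In this form the `sub r4d, 16` / `cmp r4d, 15` / `jg` sequence is simply `remaining >= 16`. The function name and the `chunks` bookkeeping are hypothetical and exist only to make the control flow checkable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the asm's last-row handling: 16-pixel blocks
 * while at least 16 pixels remain, then at most one 8-, 4-, 2- and 1-pixel
 * step.  Each step shifts right (psrlw) and saturates to 8 bits (packuswb).
 * The step sizes are recorded in chunks[] so the control flow can be checked. */
static int downshift_row_tail(const uint16_t *src, uint8_t *dst, int width,
                              int shift, int chunks[], int maxChunks)
{
    static const int tailSteps[] = { 8, 4, 2, 1 };
    int remaining = width, n = 0;

    while (remaining > 0 && n < maxChunks)
    {
        int step = 16;                 /* remaining >= 16: what "cmp 15 / jg" tests */
        if (remaining < 16)
        {
            int k = 0;
            while (tailSteps[k] > remaining)   /* largest step that still fits */
                k++;
            step = tailSteps[k];
        }
        for (int i = 0; i < step; i++)
        {
            uint32_t v = (uint32_t)src[i] >> shift;
            dst[i] = (uint8_t)(v > 255 ? 255 : v);
        }
        src += step;
        dst += step;
        remaining -= step;
        chunks[n++] = step;
    }
    return n;
}
```

For example, a width of 29 is handled as 16 + 8 + 4 + 1 and a width of 35 as 16 + 16 + 2 + 1.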

The 4-, 2- and 1-pixel paths could share the calculation code.

Do you mean defining a macro for that?

No. The last 16 or 8 pixels can cover all of these cases (the results just land in different parts of the register), so we don't need to calculate them several times.

[x265] [PATCH 1 of 7] improvement TEncBinCABAC::encodeBin by temporary variant and reduce AND operator

2014-03-24 Thread Min Chen

# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687461 25200
# Node ID 842aab45735b6b309f6945d4a9f04588ee0e8324
# Parent  fdd7c6168cf42a11240ff1c7fc7b401605524db2
improvement TEncBinCABAC::encodeBin by temporary variant and reduce AND operator

diff -r fdd7c6168cf4 -r 842aab45735b source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp
--- a/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Fri Mar 21 14:44:35 2014 -0500
+++ b/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Mon Mar 24 11:57:41 2014 -0700
@@ -190,25 +190,30 @@
 }
 ctxModel.bBinsCoded = 1;
 
-uint32_t mps = sbacGetMps(mstate);
+uint32_t range = m_range;
 uint32_t state = sbacGetState(mstate);
-uint32_t lps = g_lpsTable[state][(m_range >> 6) & 3];
-m_range -= lps;
+uint32_t lps = g_lpsTable[state][((uint8_t)range >> 6)];
+range -= lps;
 
-assert(lps != 0);
+assert(lps >= 2);
 
-int numBits = (uint32_t)(m_range - 256) >> 31;
+int numBits = (uint32_t)(range - 256) >> 31;
 uint32_t low = m_low;
-uint32_t range = m_range;
-if (binValue != mps)
+
+// NOTE: MPS must be LOWEST bit in mstate
+assert(((binValue ^ mstate) & 1) == (binValue != sbacGetMps(mstate)));
+if ((binValue ^ mstate) & 1)
 {
 // NOTE: lps is non-zero and the maximum of idx is 8 because lps less than 256
 //numBits   = g_renormTable[lps >> 3];
 unsigned long idx;
 CLZ32(idx, lps);
+assert(state != 63 || idx == 1);
+
 numBits = 8 - idx;
-if (numBits >= 6)
+if (state >= 63)
 numBits = 6;
+assert(numBits <= 6);
 
 low+= range;
 range   = lps;
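The two rewrites in this patch can be checked exhaustively. A sketch, assuming (as the patch's NOTE states) that the MPS lives in bit 0 of `mstate` and that `binValue` is 0 or 1: `(binValue ^ mstate) & 1` then equals `binValue != mps`, and `(uint8_t)range >> 6` selects the same `g_lpsTable` column as `(range >> 6) & 3` because both keep bits 6 and 7. All helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the patch's NOTE: mstate = (state << 1) | mps. */
static uint32_t sketch_get_mps(uint32_t mstate)   { return mstate & 1; }
static uint32_t sketch_get_state(uint32_t mstate) { return mstate >> 1; }

/* Old and new tests for "this bin is the LPS" (binValue is 0 or 1). */
static int lps_test_old(uint32_t binValue, uint32_t mstate)
{
    return binValue != sketch_get_mps(mstate);
}
static int lps_test_new(uint32_t binValue, uint32_t mstate)
{
    return (int)((binValue ^ mstate) & 1);
}

/* Old and new column index into g_lpsTable: both select bits 6 and 7. */
static uint32_t lps_col_old(uint32_t range) { return (range >> 6) & 3; }
static uint32_t lps_col_new(uint32_t range) { return (uint8_t)range >> 6; }

/* Compare over all 64 states, both MPS values, both bins, and a superset of
 * the legal [256, 510] range values. */
static void check_encodebin_rewrites(void)
{
    for (uint32_t state = 0; state < 64; state++)
        for (uint32_t mps = 0; mps < 2; mps++)
            for (uint32_t bin = 0; bin < 2; bin++)
            {
                uint32_t mstate = (state << 1) | mps;
                assert(sketch_get_state(mstate) == state);
                assert(lps_test_old(bin, mstate) == lps_test_new(bin, mstate));
            }

    for (uint32_t range = 0; range < 1024; range++)
        assert(lps_col_old(range) == lps_col_new(range));
}
```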


[x265] [PATCH 2 of 7] improvement TEncBinCABAC::writeOut by mask operator and local variant

2014-03-24 Thread Min Chen

# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687480 25200
# Node ID 928156df5d736de1c8f053ae06d8bb6ce11185e4
# Parent  842aab45735b6b309f6945d4a9f04588ee0e8324
improvement TEncBinCABAC::writeOut by mask operator and local variant

diff -r 842aab45735b -r 928156df5d73 source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp
--- a/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Mon Mar 24 11:57:41 2014 -0700
+++ b/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Mon Mar 24 11:58:00 2014 -0700
@@ -350,9 +350,10 @@
 void TEncBinCABAC::writeOut()
 {
 uint32_t leadByte = m_low >> (13 + m_bitsLeft);
+uint32_t low_mask = (uint32_t)(~0) >> (11 + 8 - m_bitsLeft);
 
 m_bitsLeft -= 8;
-m_low &= 0xffffffffu >> (11 - m_bitsLeft);
+m_low &= low_mask;
 
 if (leadByte == 0xff)
 {
@@ -360,25 +361,22 @@
 }
 else
 {
-if (m_numBufferedBytes > 0)
+uint32_t numBufferedBytes = m_numBufferedBytes;
+if (numBufferedBytes > 0)
 {
 uint32_t carry = leadByte >> 8;
 uint32_t byteTowrite = m_bufferedByte + carry;
-m_bufferedByte = leadByte & 0xff;
 m_bitIf->writeByte(byteTowrite);
 
 byteTowrite = (0xff + carry) & 0xff;
-while (m_numBufferedBytes > 1)
+while (numBufferedBytes > 1)
 {
 m_bitIf->writeByte(byteTowrite);
-m_numBufferedBytes--;
+numBufferedBytes--;
 }
 }
-else
-{
-m_numBufferedBytes = 1;
-m_bufferedByte = leadByte;
-}
+m_numBufferedBytes = 1;
+m_bufferedByte = (uint8_t)leadByte;
 }
 }
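A sketch comparing the old and new byte-flushing logic of `writeOut()` on a minimal stand-in state; `Sketch`, `write_byte`, `flush_old`, `flush_new` and the mask helpers are hypothetical, and only the `leadByte != 0xff` branch is modeled. It also checks that `low_mask`, computed before `m_bitsLeft -= 8`, equals the old mask computed after it. Both versions end with one buffered byte holding the low 8 bits of `leadByte`, so the output matches as long as `writeByte()` keeps only the low 8 bits (an assumption here).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the encoder state that writeOut() touches. */
typedef struct
{
    uint32_t numBufferedBytes;
    uint32_t bufferedByte;
    uint8_t  out[16];
    int      outLen;
} Sketch;

/* Assumption: writeByte() keeps only the low 8 bits of its argument. */
static void write_byte(Sketch *s, uint32_t b) { s->out[s->outLen++] = (uint8_t)b; }

/* leadByte != 0xff branch before the patch. */
static void flush_old(Sketch *s, uint32_t leadByte)
{
    if (s->numBufferedBytes > 0)
    {
        uint32_t carry = leadByte >> 8;
        uint32_t byteTowrite = s->bufferedByte + carry;
        s->bufferedByte = leadByte & 0xff;
        write_byte(s, byteTowrite);
        byteTowrite = (0xff + carry) & 0xff;   /* buffered 0xff bytes absorb the carry */
        while (s->numBufferedBytes > 1)
        {
            write_byte(s, byteTowrite);
            s->numBufferedBytes--;
        }
    }
    else
    {
        s->numBufferedBytes = 1;
        s->bufferedByte = leadByte;
    }
}

/* Same branch after the patch: local counter, unconditional state update. */
static void flush_new(Sketch *s, uint32_t leadByte)
{
    uint32_t numBufferedBytes = s->numBufferedBytes;
    if (numBufferedBytes > 0)
    {
        uint32_t carry = leadByte >> 8;
        uint32_t byteTowrite = s->bufferedByte + carry;
        write_byte(s, byteTowrite);
        byteTowrite = (0xff + carry) & 0xff;
        while (numBufferedBytes > 1)
        {
            write_byte(s, byteTowrite);
            numBufferedBytes--;
        }
    }
    s->numBufferedBytes = 1;
    s->bufferedByte = (uint8_t)leadByte;
}

/* Mask taken after "m_bitsLeft -= 8" (old) vs before it (new). */
static uint32_t mask_old(int bitsLeftAfter)  { return 0xffffffffu >> (11 - bitsLeftAfter); }
static uint32_t mask_new(int bitsLeftBefore) { return (uint32_t)(~0) >> (11 + 8 - bitsLeftBefore); }

/* Randomized comparison of both flush versions plus the mask identity. */
static int writeout_versions_agree(int trials)
{
    for (int b = -12; b <= 19; b++)          /* keeps both shift counts in [0, 31] */
        if (mask_old(b - 8) != mask_new(b))
            return 0;

    for (int t = 0; t < trials; t++)
    {
        Sketch a, n;
        memset(&a, 0, sizeof a);
        a.numBufferedBytes = (uint32_t)(rand() % 5);
        a.bufferedByte = (uint32_t)(rand() % 256);
        uint32_t leadByte = (uint32_t)(rand() % 0x200);
        if (leadByte == 0xff)
            continue;                        /* that branch is unchanged */
        n = a;
        flush_old(&a, leadByte);
        flush_new(&n, leadByte);
        if (a.outLen != n.outLen || memcmp(a.out, n.out, (size_t)a.outLen) != 0 ||
            a.numBufferedBytes != n.numBufferedBytes ||
            (a.bufferedByte & 0xff) != (n.bufferedByte & 0xff))
            return 0;
    }
    return 1;
}
```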
 


[x265] [PATCH 4 of 7] optimize: replace g_groupIdx[] by getGroupIdx()

2014-03-24 Thread Min Chen

# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687533 25200
# Node ID 105fa844e4e3e2c6bffb8d2ea613e56e429cdf64
# Parent  700a63ba598db1828534ee824fbb1f93fef86c0f
optimize: replace g_groupIdx[] by getGroupIdx()

diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibCommon/TComRom.cpp
--- a/source/Lib/TLibCommon/TComRom.cpp Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.cpp Mon Mar 24 11:58:53 2014 -0700
@@ -434,7 +434,6 @@
 // 

 
 const uint8_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
-const uint8_t g_groupIdx[32]   = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };
 
 // Rice parameters for absolute transform levels
 const uint8_t g_goRiceRange[5] = { 7, 14, 26, 46, 78 };
diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibCommon/TComRom.h
--- a/source/Lib/TLibCommon/TComRom.h   Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.h   Mon Mar 24 11:58:53 2014 -0700
@@ -128,7 +128,24 @@
 // Scanning order & context mapping table
 // 

 
-extern const uint8_t g_groupIdx[32];
+//extern const uint8_t g_groupIdx[32];
+static inline uint32_t getGroupIdx(const uint32_t idx)
+{
+uint32_t group = (idx >> 3);
+if (idx >= 24)
+group = 2;
+uint32_t groupIdx = ((idx >> (group + 1)) - 2) + 4 + (group << 1);
+if (idx <= 3)
+groupIdx = idx;
+
+#ifdef _DEBUG
+static const uint8_t g_groupIdx[32]   = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };
+assert(groupIdx == g_groupIdx[idx]);
+#endif
+
+return groupIdx;
+}
+
 extern const uint8_t g_minInGroup[10];
 
 extern const uint8_t g_goRiceRange[5];  //!< maximum value coded with Rice codes
diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp Mon Mar 24 11:58:53 2014 -0700
@@ -1330,8 +1330,8 @@
  */
 inline double TComTrQuant::xGetRateLast(uint32_t posx, uint32_t posy) const
 {
-uint32_t ctxX = g_groupIdx[posx];
-uint32_t ctxY = g_groupIdx[posy];
+uint32_t ctxX = getGroupIdx(posx);
+uint32_t ctxY = getGroupIdx(posy);
 uint32_t cost = m_estBitsSbac->lastXBits[ctxX] + m_estBitsSbac->lastYBits[ctxY];
 
 int32_t maskX = (int32_t)(2 - posx) >> 31;
diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibEncoder/TEncSbac.cpp
--- a/source/Lib/TLibEncoder/TEncSbac.cpp   Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibEncoder/TEncSbac.cpp   Mon Mar 24 11:58:53 2014 -0700
@@ -1988,8 +1988,8 @@
 }
 
 uint32_t ctxLast;
-uint32_t groupIdxX = g_groupIdx[posx];
-uint32_t groupIdxY = g_groupIdx[posy];
+uint32_t groupIdxX = getGroupIdx(posx);
+uint32_t groupIdxY = getGroupIdx(posy);
 
 int blkSizeOffset = ttype ? NUM_CTX_LAST_FLAG_XY_LUMA : ((log2TrSize - 2) * 3 + ((log2TrSize - 1) >> 2));
 int ctxShift = ttype ? log2TrSize - 2 : ((log2TrSize + 1) >> 2);
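A quick exhaustive check that the arithmetic in `getGroupIdx()` reproduces the removed `g_groupIdx[32]` table for every position 0..31. This is a standalone copy of the patch's function without the `_DEBUG` block, so it mirrors the patch's assert outside the encoder.

```c
#include <assert.h>
#include <stdint.h>

/* Table removed by the patch, kept here only as the reference. */
static const uint8_t g_groupIdx[32] = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
                                        8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };

/* Same arithmetic as the patch's getGroupIdx(). */
static uint32_t getGroupIdx(uint32_t idx)
{
    uint32_t group = idx >> 3;          /* 0..3 for idx < 32 */
    if (idx >= 24)
        group = 2;                      /* 24..31 reuse the group-2 formula */
    uint32_t groupIdx = ((idx >> (group + 1)) - 2) + 4 + (group << 1);
    if (idx <= 3)
        groupIdx = idx;                 /* first four positions map to themselves;
                                           the unsigned wrap above is discarded */
    return groupIdx;
}
```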


[x265] [PATCH 3 of 7] reduce g_minInGroup from uint32_t to uint8_t

2014-03-24 Thread Min Chen

# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687502 25200
# Node ID 700a63ba598db1828534ee824fbb1f93fef86c0f
# Parent  928156df5d736de1c8f053ae06d8bb6ce11185e4
reduce g_minInGroup from uint32_t to uint8_t

diff -r 928156df5d73 -r 700a63ba598d source/Lib/TLibCommon/TComRom.cpp
--- a/source/Lib/TLibCommon/TComRom.cpp Mon Mar 24 11:58:00 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.cpp Mon Mar 24 11:58:22 2014 -0700
@@ -433,7 +433,7 @@
 // Scanning order & context model mapping
 // 

 
-const uint32_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
+const uint8_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
 const uint8_t g_groupIdx[32]   = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };
 
 // Rice parameters for absolute transform levels
diff -r 928156df5d73 -r 700a63ba598d source/Lib/TLibCommon/TComRom.h
--- a/source/Lib/TLibCommon/TComRom.h   Mon Mar 24 11:58:00 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.h   Mon Mar 24 11:58:22 2014 -0700
@@ -129,7 +129,7 @@
 // 

 
 extern const uint8_t g_groupIdx[32];
-extern const uint32_t g_minInGroup[10];
+extern const uint8_t g_minInGroup[10];
 
 extern const uint8_t g_goRiceRange[5];  //!< maximum value coded with Rice codes
 //extern const uint8_t g_goRicePrefixLen[5];  //!< prefix length for each maximum value


[x265] [PATCH 5 of 7] improvement by replace SHIFT to MASK_AND

2014-03-24 Thread Min Chen

# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687572 25200
# Node ID d39b436d01f293e20fd51a5a53028166e50cee58
# Parent  105fa844e4e3e2c6bffb8d2ea613e56e429cdf64
improvement by replace SHIFT to MASK_AND

diff -r 105fa844e4e3 -r d39b436d01f2 source/Lib/TLibEncoder/TEncSbac.cpp
--- a/source/Lib/TLibEncoder/TEncSbac.cpp   Mon Mar 24 11:58:53 2014 -0700
+++ b/source/Lib/TLibEncoder/TEncSbac.cpp   Mon Mar 24 11:59:32 2014 -0700
@@ -2117,7 +2117,7 @@
 
 // Code position of last coefficient
 int posLastY = posLast >> log2TrSize;
-int posLastX = posLast - (posLastY << log2TrSize);
+int posLastX = posLast & (trSize - 1);
 codeLastSignificantXY(posLastX, posLastY, log2TrSize, ttype, codingParameters.scanType);
 //= code significance flag =
 ContextModel * const baseCoeffGroupCtx = &m_contextModels[OFF_SIG_CG_FLAG_CTX + (ttype ? NUM_SIG_CG_FLAG_CTX : 0)];
@@ -2178,9 +2178,9 @@
 if (sig)
 {
 absCoeff[numNonZero] = int(abs(coeff[blkPos]));
-coeffSigns = 2 * coeffSigns + (coeff[blkPos] < 0);
+coeffSigns = 2 * coeffSigns + ((uint32_t)coeff[blkPos] >> 31);
 numNonZero++;
-if (lastNZPosInCG == -1)
+if (lastNZPosInCG < 0)
 {
 lastNZPosInCG = scanPosSig;
 }
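Both rewrites in this patch rest on simple identities, sketched below with hypothetical helper names: for a power-of-two `trSize`, `posLast & (trSize - 1)` is the column that `posLast - (posLastY << log2TrSize)` computes, and `(uint32_t)x >> 31` is 1 exactly when a 32-bit `x` is negative. The `lastNZPosInCG < 0` test is equivalent to `== -1` as long as -1 is the only negative value the variable takes.

```c
#include <assert.h>
#include <stdint.h>

/* Column of the last significant coefficient, before and after the patch. */
static int pos_x_old(int posLast, int log2TrSize)
{
    int posLastY = posLast >> log2TrSize;
    return posLast - (posLastY << log2TrSize);
}

static int pos_x_new(int posLast, int log2TrSize)
{
    int trSize = 1 << log2TrSize;
    return posLast & (trSize - 1);   /* low log2TrSize bits = column */
}

/* Sign bit of a 32-bit coefficient: 1 if negative, 0 otherwise. */
static uint32_t sign_old(int32_t c) { return c < 0; }
static uint32_t sign_new(int32_t c) { return (uint32_t)c >> 31; }

/* Check the column identity for every position of 4x4 .. 32x32 blocks. */
static int last_pos_identity_holds(void)
{
    for (int log2TrSize = 2; log2TrSize <= 5; log2TrSize++)
        for (int pos = 0; pos < (1 << (2 * log2TrSize)); pos++)
            if (pos_x_old(pos, log2TrSize) != pos_x_new(pos, log2TrSize))
                return 0;
    return 1;
}
```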


[x265] [PATCH 7 of 7] cleanup on TComTrQuant::getTUEntropyCodingParameters

2014-03-24 Thread Min Chen

# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687606 25200
# Node ID 6d4a78f0c1b603370fcebafa70eee2f2dffdc11a
# Parent  5d22c7cd7cd603a3481720dd2467865012e39d37
cleanup on TComTrQuant::getTUEntropyCodingParameters

diff -r 5d22c7cd7cd6 -r 6d4a78f0c1b6 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp Mon Mar 24 11:59:50 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp Mon Mar 24 12:00:06 2014 -0700
@@ -486,41 +486,6 @@
 }
 }
 
-void TComTrQuant::getTUEntropyCodingParameters(TComDataCU* cu,
-   TUEntropyCodingParameters &result,
-   uint32_t   absPartIdx,
-   uint32_t   log2TrSize,
-   TextType   ttype)
-{
-//set the group layout
-const uint32_t log2TrSizeCG = log2TrSize - MLS_CG_LOG2_SIZE;
-result.log2TrSize   = log2TrSize;
-result.log2TrSizeCG = log2TrSizeCG;
-
-//set the scan orders
-result.scanType = COEFF_SCAN_TYPE(cu->getCoefScanIdx(absPartIdx, log2TrSize, ttype == TEXT_LUMA, cu->isIntra(absPartIdx)));
-result.scan   = g_scanOrder[SCAN_GROUPED_4x4][result.scanType][log2TrSize];
-result.scanCG = g_scanOrder[SCAN_UNGROUPED][result.scanType][log2TrSizeCG];
-
-//set the significance map context selection parameters
-TextType ctype = ttype == TEXT_LUMA ? TEXT_LUMA : TEXT_CHROMA;
-result.ctype = ctype;
-if (log2TrSize == 2)
-{
-result.firstSignificanceMapContext = significanceMapContextSetStart[ctype][CONTEXT_TYPE_4x4];
-}
-else if (log2TrSize == 3)
-{
-result.firstSignificanceMapContext = significanceMapContextSetStart[ctype][CONTEXT_TYPE_8x8];
-if (result.scanType != SCAN_DIAG)
-result.firstSignificanceMapContext += nonDiagonalScan8x8ContextOffset[ctype];
-}
-else
-{
-result.firstSignificanceMapContext = significanceMapContextSetStart[ctype][CONTEXT_TYPE_NxN];
-}
-}
-
 /** RDOQ with CABAC
  * \param cu pointer to coding unit structure
  * \param plSrcCoeff pointer to input buffer
@@ -643,7 +608,8 @@
 }
 else
 {
-const uint32_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, codingParameters);
+// NOTE: ttype is different to ctype, but getSigCtxInc may safety use it 
+const uint32_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, ttype, codingParameters.firstSignificanceMapContext);
 if (maxAbsLevel  3)
 {
 costSig[scanPos] = xGetRateSigCoef(0, ctxSig);
@@ -1055,7 +1021,8 @@
const uint32_t   log2TrSize,
const uint32_t   trSize,
const uint32_t   blkPos,
-   const TUEntropyCodingParameters &codingParameters)
+   const TextType   ctype,
+   const uint32_t   firstSignificanceMapContext)
 {
 static const uint8_t ctxIndMap[16] =
 {
@@ -1114,11 +1081,11 @@
 };
 
 int cnt = table_cnt[patternSigCtx][posXinSubset][posYinSubset];
-int offset = codingParameters.firstSignificanceMapContext;
+int offset = firstSignificanceMapContext;
 
 offset += cnt;
 
-return (codingParameters.ctype == TEXT_LUMA && (posX | posY) >= 4) ? 3 + offset : offset;
+return (ctype == TEXT_LUMA && (posX | posY) >= 4) ? 3 + offset : offset;
 }
 
 /** Get the best level in RD sense
diff -r 5d22c7cd7cd6 -r 6d4a78f0c1b6 source/Lib/TLibCommon/TComTrQuant.h
--- a/source/Lib/TLibCommon/TComTrQuant.h   Mon Mar 24 11:59:50 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.h   Mon Mar 24 12:00:06 2014 -0700
@@ -159,9 +159,42 @@
 void processScalingListEnc(int32_t *coeff, int32_t *quantcoeff, int quantScales, uint32_t height, uint32_t width, uint32_t ratio, int sizuNum, uint32_t dc);
 void processScalingListDec(int32_t *coeff, int32_t *dequantcoeff, int invQuantScales, uint32_t height, uint32_t width, uint32_t ratio, int sizuNum, uint32_t dc);
 static uint32_t calcPatternSigCtx(const uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG);
-static uint32_t getSigCtxInc(uint32_t patternSigCtx, const uint32_t log2TrSize, const uint32_t trSize, const uint32_t blkPos, const TUEntropyCodingParameters &codingParameters);
+static uint32_t getSigCtxInc(uint32_t patternSigCtx, const uint32_t log2TrSize, const uint32_t trSize, const uint32_t blkPos, const TextType ctype, const uint32_t firstSignificanceMapContext);
 static uint32_t

[x265] [PATCH 6 of 7] faster sign(X) and N^2 on TComTrQuant::xRateDistOptQuant

2014-03-24 Thread Min Chen

# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687590 25200
# Node ID 5d22c7cd7cd603a3481720dd2467865012e39d37
# Parent  d39b436d01f293e20fd51a5a53028166e50cee58
faster sign(X) and N^2 on TComTrQuant::xRateDistOptQuant

diff -r d39b436d01f2 -r 5d22c7cd7cd6 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp Mon Mar 24 11:59:32 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp Mon Mar 24 11:59:50 2014 -0700
@@ -876,7 +876,8 @@
 absSum += level;
 if (level)
 *lastPos = blkPos;
-dstCoeff[blkPos] = (srcCoeff[blkPos] < 0) ? -level : level;
+uint32_t mask = (int32_t)srcCoeff[blkPos] >> 31;
+dstCoeff[blkPos] = (level ^ mask) - mask;
 }
 
 //= clean uncoded coefficients =
@@ -895,7 +896,7 @@
 int tmpSum = 0;
 int n;
 
-for (int subSet = (trSize * trSize - 1) >> LOG2_SCAN_SET_SIZE; subSet >= 0; subSet--)
+for (int subSet = ((1 << (log2TrSize * 2)) - 1) >> LOG2_SCAN_SET_SIZE; subSet >= 0; subSet--)
 {
 int subPos = subSet << LOG2_SCAN_SET_SIZE;
 int firstNZPosInCG = SCAN_SET_SIZE, lastNZPosInCG = -1;
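A sketch of the branchless sign copy and of the loop's starting scan set. `apply_sign` and the `start_subset_*` helpers are hypothetical; the sketch keeps everything in `int32_t` rather than mixing in a `uint32_t` mask, and assumes `>>` on a negative `int32_t` is an arithmetic shift (implementation-defined in C, but what GCC, Clang and MSVC do). `LOG2_SCAN_SET_SIZE` is assumed to be 4 (16-coefficient scan sets).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: LOG2_SCAN_SET_SIZE is 4, i.e. 16-coefficient scan sets. */
#define SKETCH_LOG2_SCAN_SET_SIZE 4

/* Give `level` (>= 0) the sign of `src` without a branch.  mask is all ones
 * when src < 0 and zero otherwise, so (level ^ mask) - mask is -level or
 * level.  Relies on arithmetic >> for negative int32_t. */
static int32_t apply_sign(int32_t level, int32_t src)
{
    int32_t mask = src >> 31;
    return (level ^ mask) - mask;
}

/* Starting (highest) scan-set index of the reverse loop, before and after. */
static int start_subset_old(int log2TrSize)
{
    int trSize = 1 << log2TrSize;
    return (trSize * trSize - 1) >> SKETCH_LOG2_SCAN_SET_SIZE;
}

static int start_subset_new(int log2TrSize)
{
    return ((1 << (log2TrSize * 2)) - 1) >> SKETCH_LOG2_SCAN_SET_SIZE;
}
```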


[x265] x265 Custom Implementation

2014-03-24 Thread Nicolas Morey-Chaisemartin


Hi everyone,

My company (Kalray) is looking into writing an HEVC encoder based on x265 for its many-core processor (MPPA-256).
Because of our architecture (distributed, limited memory among other things), a 
direct port of x265 is not a viable solution.

Our plan is to write a custom encoder core optimized for our platform and use it as an accelerator for x265 running on an x86 processor.
This should look something like this:

/-----------------------\                    /---------------------\
|          x86          |                    | 1 or more MPPA-256  |
|                       |                    |                     |
|  x265 preAnalysis +   |===== PCI Link =====| Kalray Encoder Core |
|  Rate Control         |                    |                     |
|                       |                    |                     |
\-----------------------/                    \---------------------/

The idea for the encoder core is to implement a CTU encoder.
This leaves us some flexibility in how we dispatch CTUs across the cores (tiles, frame parallelism, etc.).

From what I could gather after a quick glance at the x265 code, x265 currently uses HM as-is to do the actual encoding. That is, with a few exceptions, HM code is used directly to try out the different modes, estimate costs, generate the bitstream, etc.

Therefore, our idea was to use HM structures as a stable interface between 
x265 and our encoder core. From these structures we can extract all the required info 
(pixels, reference frames, QPs, etc.) and convert/transfer it to our core.

What is your opinion on this approach?
Are the HM classes (at least structure-wise) a stable enough interface for this?

We are still in the design phase about this so any idea is more than welcome.

Regards
--
Nicolas Morey Chaisemartin
Phone : +33 6 42 46 68 87


Re: [x265] x265 Custom Implementation

2014-03-24 Thread chen

Hello,
 
From your description, this seems similar to my previous FPGA architecture, which was based on a task pool.
The biggest problem isn't the interface; the bottlenecks are transfer bandwidth and processing latency.
Of course, the RDO context is another bottleneck.

Best regards,
Min
 

Re: [x265] Preset superfast undocumented in help output and PDF page 3

2014-03-24 Thread Steve Borho

On Sun, Mar 23, 2014 at 9:43 AM, Mario Rohkrämer cont...@ligh.de wrote:
 There is a preset superfast which is missing in the help output of x265
 and in the verbose explanation of command line options in the Evaluator's
 Guide on page 3. But it appears in the table of quality presets on page 9
 and is available while encoding.

Thanks, it was missed when adding the CLI help and then that string
was copied to the eval guide.  I've fixed the CLI help.  The eval
guide will be updated.

-- 
Steve Borho

[x265] fix chroma lambda weighting

2014-03-24 Thread Satoshi Nakagawa

# HG changeset patch
# User Satoshi Nakagawa nakagawa...@oki.com
# Date 1395672158 -32400
#  Mon Mar 24 23:42:38 2014 +0900
# Node ID 08584b5913bce6a5f9d2f0d408fcdace6aa83a65
# Parent  fdd7c6168cf42a11240ff1c7fc7b401605524db2
fix chroma lambda weighting

diff -r fdd7c6168cf4 -r 08584b5913bc source/encoder/frameencoder.cpp
--- a/source/encoder/frameencoder.cpp   Fri Mar 21 14:44:35 2014 -0500
+++ b/source/encoder/frameencoder.cpp   Mon Mar 24 23:42:38 2014 +0900
@@ -335,11 +335,10 @@
 // instead we weight the distortion of chroma.
 int chromaQPOffset = slice->getPPS()->getChromaCbQpOffset() + slice->getSliceQpDeltaCb();
 int qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
-
+double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
 chromaQPOffset = slice->getPPS()->getChromaCrQpOffset() + slice->getSliceQpDeltaCr();
 qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
+double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
 double chromaLambda = lambda / crWeight;
 
 m_rows[row].m_search.setQPLambda(qp, lambda, chromaLambda);
@@ -376,10 +375,10 @@
 int qpc;
 int chromaQPOffset = slice->getPPS()->getChromaCbQpOffset() + slice->getSliceQpDeltaCb();
 qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
+double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
 chromaQPOffset = slice->getPPS()->getChromaCrQpOffset() + slice->getSliceQpDeltaCr();
 qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
+double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
 double chromaLambda = lambda / crWeight;
 
 // NOTE: set SAO lambda every Frame
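Why the exponent needs the division by 3, assuming the HM-style lambda model in which lambda doubles every 3 QP steps. Here $QP_c$ stands for the mapped chroma QP, `g_chromaScale[chFmt][qpc]`:

```latex
\lambda(QP) \propto 2^{(QP - 12)/3}
\quad\Longrightarrow\quad
w = \frac{\lambda(QP)}{\lambda(QP_c)} = 2^{(QP - QP_c)/3},
\qquad
\lambda_{\text{chroma}} = \frac{\lambda}{w} = \lambda(QP_c)
```

So for a chroma QP 3 steps below the luma QP the weight should be 2, whereas the old `pow(2.0, qp - qpc)` gave 8.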