Re: [x265] [PATCH 1 of 3] primitives: added C primitives for upShift/downShift of input pixels
On Fri, Mar 21, 2014 at 8:44 PM, chen chenm...@163.com wrote:
> At 2014-03-21 13:35:29, muru...@multicorewareinc.com wrote:
>> # HG changeset patch
>> # User Murugan Vairavel muru...@multicorewareinc.com
>> # Date 1395379028 -19800
>> #      Fri Mar 21 10:47:08 2014 +0530
>> # Node ID 0c4fdd43325e6501698a281862b1c027238a9c9d
>> # Parent  fe3fcd9838c02fb65fed8638a13dea9f06f8a9be
>> primitives: added C primitives for upShift/downShift of input pixels
>>
>> +void planecopy_cp_c(uint8_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height)
>
> This function handles the 8-bit to 10-bit conversion only.

Yes, it does the 8-bit to 10-bit upshift only. Should I change the function name?

>> +{
>> +    for (int r = 0; r < height; r++)
>> +    {
>> +        for (int c = 0; c < width; c++)
>> +        {
>> +            dst[c] = ((pixel)src[c]) << 2;
>> +        }
>> +
>> +        dst += dstStride;
>> +        src += srcStride;
>> +    }
>> +}

-- 
With Regards,
Murugan. V
+919659287478
___
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel
Re: [x265] [PATCH 2 of 3] testbench: code for testing input pixel upShift/downShift primitives
On Fri, Mar 21, 2014 at 8:42 PM, chen chenm...@163.com wrote:
> At 2014-03-21 13:35:30, muru...@multicorewareinc.com wrote:
>> # HG changeset patch
>> # User Murugan Vairavel muru...@multicorewareinc.com
>> # Date 1395379187 -19800
>> #      Fri Mar 21 10:49:47 2014 +0530
>> # Node ID 435e50b2b92c83e10fdb2bd86bc8e8df91b7338b
>> # Parent  0c4fdd43325e6501698a281862b1c027238a9c9d
>> testbench: code for testing input pixel upShift/downShift primitives
>>
>>      /* [0] --- Random values
>> @@ -79,16 +83,22 @@
>>          short_test_buff1[0][i] = rand() & PIXEL_MAX; // For block copy only
>>          short_test_buff2[0][i] = rand() % 16383;     // for addAvg
>>          int_test_buff[0][i] = rand() % SHORT_MAX;
>> +        ushort_test_buff[0][i] = rand() % ((1 << 10) - 1);
>> +        uchar_test_buff[0][i] = rand() % ((1 << 8) - 1);
>
> Our code includes a clip operation, so you can use a larger dynamic range to verify it.

Do you mean I should increase the dynamic range of the ushort buffer?

-- 
With Regards,
Murugan. V
+919659287478
Re: [x265] [PATCH 1 of 3] primitives: added C primitives for upShift/downShift of input pixels
> On Fri, Mar 21, 2014 at 8:44 PM, chen chenm...@163.com wrote:
>> At 2014-03-21 13:35:29, muru...@multicorewareinc.com wrote:
>>> # HG changeset patch
>>> # User Murugan Vairavel muru...@multicorewareinc.com
>>> # Date 1395379028 -19800
>>> #      Fri Mar 21 10:47:08 2014 +0530
>>> # Node ID 0c4fdd43325e6501698a281862b1c027238a9c9d
>>> # Parent  fe3fcd9838c02fb65fed8638a13dea9f06f8a9be
>>> primitives: added C primitives for upShift/downShift of input pixels
>>>
>>> +void planecopy_cp_c(uint8_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height)
>>
>> This function handles the 8-bit to 10-bit conversion only.
>
> Yes, it does the 8-bit to 10-bit upshift only. Should I change the function name?

Yes, please rename it. I am concerned that we may need an 8-bit to 12-bit conversion in the future.
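The concern about a future 8-bit to 12-bit path can be illustrated with a minimal sketch in which the shift amount becomes a parameter. The name planecopy_cp_shl and the extra shift argument are hypothetical and not part of the posted patch; pixel is assumed to be a 16-bit type, as in a high-bit-depth build.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: high-bit-depth build, so one pixel is stored in 16 bits.
typedef uint16_t pixel;

// Hypothetical generalization of planecopy_cp_c from the patch: the left
// shift is passed in instead of being hard-coded as "<< 2" (8 to 10 bits).
void planecopy_cp_shl(const uint8_t* src, intptr_t srcStride,
                      pixel* dst, intptr_t dstStride,
                      int width, int height, int shift)
{
    // An 8-bit sample shifted by at most 8 still fits in 16 bits.
    assert(shift >= 0 && shift <= 8);

    for (int r = 0; r < height; r++)
    {
        for (int c = 0; c < width; c++)
            dst[c] = (pixel)(src[c] << shift);

        dst += dstStride;
        src += srcStride;
    }
}
```

With shift = 2 this behaves like the 8-bit to 10-bit function in the patch; shift = 4 would give the 8-bit to 12-bit case mentioned in the reply.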
Re: [x265] [PATCH 2 of 3] testbench: code for testing input pixel upShift/downShift primitives
> On Fri, Mar 21, 2014 at 8:42 PM, chen chenm...@163.com wrote:
>> At 2014-03-21 13:35:30, muru...@multicorewareinc.com wrote:
>>> # HG changeset patch
>>> # User Murugan Vairavel muru...@multicorewareinc.com
>>> # Date 1395379187 -19800
>>> #      Fri Mar 21 10:49:47 2014 +0530
>>> # Node ID 435e50b2b92c83e10fdb2bd86bc8e8df91b7338b
>>> # Parent  0c4fdd43325e6501698a281862b1c027238a9c9d
>>> testbench: code for testing input pixel upShift/downShift primitives
>>>
>>>      /* [0] --- Random values
>>> @@ -79,16 +83,22 @@
>>>          short_test_buff1[0][i] = rand() & PIXEL_MAX; // For block copy only
>>>          short_test_buff2[0][i] = rand() % 16383;     // for addAvg
>>>          int_test_buff[0][i] = rand() % SHORT_MAX;
>>> +        ushort_test_buff[0][i] = rand() % ((1 << 10) - 1);
>>> +        uchar_test_buff[0][i] = rand() % ((1 << 8) - 1);
>>
>> Our code includes a clip operation, so you can use a larger dynamic range to verify it.
>
> Do you mean I should increase the dynamic range of the ushort buffer?

Yes, a larger dynamic range would exercise our clip code.
Re: [x265] [PATCH 3 of 3] asm: code for input pixel upShift/downShift
> On Fri, Mar 21, 2014 at 8:01 PM, chen chenm...@163.com wrote:
>> At 2014-03-21 13:35:31, muru...@multicorewareinc.com wrote:
>>> # HG changeset patch
>>> # User Murugan Vairavel muru...@multicorewareinc.com
>>> # Date 1395379456 -19800
>>> #      Fri Mar 21 10:54:16 2014 +0530
>>> # Node ID 29728f7728591116192575d411ef2db2dff49c18
>>> # Parent  435e50b2b92c83e10fdb2bd86bc8e8df91b7338b
>>> asm: code for input pixel upShift/downShift
>>>
>>> +; Input 10bpp, Output 8bpp, width is multiple of 16
>>> +;
>>> +;void planecopy_sp(uint16_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask)
>>> +;
>>> +INIT_XMM sse2
>>> +cglobal downShift_10, 7,7,3
>>> +    movd        m0, r6d        ; m0 = shift
>>> +    add         r1, r1
>>> +    dec         r5d
>>> +.loopH:
>>> +    xor         r6, r6
>>
>> Tip: r6 is an offset. If you prepare 'r1 = r1 - r4' beforehand, you can operate on r0 directly.
>
> But the number of pixels processed in each row is not equal to the width (r4) when the width is not a multiple of 16. If I do it as above, the output will mismatch.

Your algorithm loops over a multiple of 16 pixels for every row except the last one. You do not need to modify this part now; this is just for your information.

>>> +.loopW:
>>> +    movu        m1, [r0 + r6 * 2]
>>> +    movu        m2, [r0 + r6 * 2 + 16]
>>> +    psrlw       m1, m0
>>> +    psrlw       m2, m0
>>> +    packuswb    m1, m2
>>> +    movu        [r2 + r6], m1
>>> +
>>> +    add         r6, 16
>>> +    cmp         r6d, r4d
>>> +    jl          .loopW
>>> +
>>> +    ; move to next row
>>> +    lea         r0, [r0 + r1]
>>> +    lea         r2, [r2 + r3]
>>
>>     add r0, r1
>>     add r2, r3
>
> I will modify that.

>>> +    dec         r5d
>>> +    jnz         .loopH
>>> +
>>> +;processing last row of every frame [To handle width which not a multiple of 16]
>>> +
>>> +.loop16:
>>> +    movu        m1, [r0]
>>> +    movu        m2, [r0 + 16]
>>> +    psrlw       m1, m0
>>> +    psrlw       m2, m0
>>> +    packuswb    m1, m2
>>> +    movu        [r2], m1
>>> +
>>> +    add         r0, 2 * mmsize
>>> +    add         r2, mmsize
>>> +    sub         r4d, 16
>>> +    jz          .end
>>> +    cmp         r4d, 15
>>> +    jg          .loop16
>>
>> What does this comparison mean? With r4d = X:
>>     sub r4d, 16
>>     jz  .end       -> (X - 16 == 0)
>>     cmp r4d, 15
>>     jg  .loop16    -> (X - 16 > 15)
>> There is a small problem with the logic here: it is correct but redundant, because when the jump is taken it simply means (X - 16 >= 16).
>>
>>> +    cmp         r4d, 8
>>> +    jl          .process4
>>> +    movu        m1, [r0]
>>> +    psrlw       m1, m0
>>> +    packuswb    m1, m1
>>> +    movh        [r2], m1
>>> +
>>> +    add         r0, mmsize
>>> +    add         r2, 8
>>> +    sub         r4d, 8
>>> +    jz          .end
>>> +
>>> +.process4:
>>> +    cmp         r4d, 4
>>> +    jl          .process2
>>> +    movh        m1, [r0]
>>> +    psrlw       m1, m0
>>> +    packuswb    m1, m1
>>> +    movd        [r2], m1
>>> +
>>> +    add         r0, 8
>>> +    add         r2, 4
>>> +    sub         r4d, 4
>>> +    jz          .end
>>> +
>>> +.process2:
>>> +    cmp         r4d, 2
>>> +    jl          .process1
>>> +    movd        m1, [r0]
>>> +    psrlw       m1, m0
>>> +    packuswb    m1, m1
>>> +    movd        r6, m1
>>> +    mov         [r2], r6w
>>> +
>>> +    add         r0, 4
>>> +    add         r2, 2
>>> +    sub         r4d, 2
>>> +    jz          .end
>>> +
>>> +.process1:
>>> +    movd        m1, [r0]
>>> +    psrlw       m1, m0
>>> +    packuswb    m1, m1
>>> +    movd        r6, m1
>>> +    mov         [r2], r6b
>>> +.end:
>>> +    RET
>>
>> The 4, 2 and 1 pixel paths could share the calculation code.
>
> Do you mean defining a macro for that?

No. Loading the last 16 or 8 pixels covers all of these cases (the results just end up in different parts of the register), so we do not need to repeat the calculation several times.
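For checking an assembly routine like the one reviewed above, a scalar reference is useful. The following is only a sketch: the name planecopy_sp_ref is hypothetical, and the "shift right, then mask" semantics are an assumption taken from the prototype in the assembly comment (the posted assembly itself narrows with packuswb and does not read the mask argument).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference for the 16-bit to 8-bit down-shift.
// Assumption: each output sample is (input >> shift) & mask, following the
// planecopy_sp prototype quoted in the assembly comment.
void planecopy_sp_ref(const uint16_t* src, intptr_t srcStride,
                      uint8_t* dst, intptr_t dstStride,
                      int width, int height, int shift, uint16_t mask)
{
    assert(shift >= 0 && shift < 16);

    for (int r = 0; r < height; r++)
    {
        // Every row handles exactly 'width' samples, so no special tail
        // path is needed when width is not a multiple of 16.
        for (int c = 0; c < width; c++)
            dst[c] = (uint8_t)((src[c] >> shift) & mask);

        dst += dstStride;
        src += srcStride;
    }
}
```

A testbench could run this reference and the SIMD version on the same buffers, including widths such as 17 or 23, and compare the outputs byte for byte.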
[x265] [PATCH 1 of 7] improvement TEncBinCABAC::encodeBin by temporary variant and reduce AND operator
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687461 25200
# Node ID 842aab45735b6b309f6945d4a9f04588ee0e8324
# Parent  fdd7c6168cf42a11240ff1c7fc7b401605524db2
improvement TEncBinCABAC::encodeBin by temporary variant and reduce AND operator

diff -r fdd7c6168cf4 -r 842aab45735b source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp
--- a/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Fri Mar 21 14:44:35 2014 -0500
+++ b/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Mon Mar 24 11:57:41 2014 -0700
@@ -190,25 +190,30 @@
     }
     ctxModel.bBinsCoded = 1;

-    uint32_t mps = sbacGetMps(mstate);
+    uint32_t range = m_range;
     uint32_t state = sbacGetState(mstate);
-    uint32_t lps = g_lpsTable[state][(m_range >> 6) & 3];
-    m_range -= lps;
+    uint32_t lps = g_lpsTable[state][((uint8_t)range >> 6)];
+    range -= lps;

-    assert(lps != 0);
+    assert(lps >= 2);

-    int numBits = (uint32_t)(m_range - 256) >> 31;
+    int numBits = (uint32_t)(range - 256) >> 31;
     uint32_t low = m_low;
-    uint32_t range = m_range;
-    if (binValue != mps)
+
+    // NOTE: MPS must be LOWEST bit in mstate
+    assert(((binValue ^ mstate) & 1) == (binValue != sbacGetMps(mstate)));
+    if ((binValue ^ mstate) & 1)
     {
         // NOTE: lps is non-zero and the maximum of idx is 8 because lps less than 256
         //numBits = g_renormTable[lps >> 3];
         unsigned long idx;
         CLZ32(idx, lps);
+        assert(state != 63 || idx == 1);
+
         numBits = 8 - idx;
-        if (numBits >= 6)
+        if (state >= 63)
             numBits = 6;
+        assert(numBits <= 6);

         low += range;
         range = lps;
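The replacement of "binValue != mps" by an XOR on the packed context byte can be checked exhaustively. This is a sketch under an assumed layout, following the NOTE in the patch: bit 0 of mstate holds the MPS and the upper bits hold the state index. The helper names below are hypothetical stand-ins for sbacGetMps and sbacGetState.

```cpp
#include <cassert>
#include <cstdint>

// Assumed packing: bit 0 = MPS, remaining bits = probability state (0..63).
inline uint32_t getMps(uint32_t mstate)   { return mstate & 1; }
inline uint32_t getState(uint32_t mstate) { return mstate >> 1; }

// Original test: unpack the MPS, then compare it with the bin value.
inline bool isLpsByCompare(uint32_t binValue, uint32_t mstate)
{
    return binValue != getMps(mstate);
}

// Patched test: XOR with the packed byte and keep only bit 0.
inline bool isLpsByXor(uint32_t binValue, uint32_t mstate)
{
    return ((binValue ^ mstate) & 1) != 0;
}

// Returns true if both tests agree for every packed state and bin value.
bool lpsTestsAgree()
{
    for (uint32_t mstate = 0; mstate < 128; mstate++)
        for (uint32_t binValue = 0; binValue < 2; binValue++)
            if (isLpsByCompare(binValue, mstate) != isLpsByXor(binValue, mstate))
                return false;
    return true;
}
```

The XOR form avoids materializing the MPS in a separate variable, which appears to be the point of the "reduce AND operator" wording in the commit title.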
[x265] [PATCH 2 of 7] improvement TEncBinCABAC::writeOut by mask operator and local variant
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687480 25200
# Node ID 928156df5d736de1c8f053ae06d8bb6ce11185e4
# Parent  842aab45735b6b309f6945d4a9f04588ee0e8324
improvement TEncBinCABAC::writeOut by mask operator and local variant

diff -r 842aab45735b -r 928156df5d73 source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp
--- a/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Mon Mar 24 11:57:41 2014 -0700
+++ b/source/Lib/TLibEncoder/TEncBinCoderCABAC.cpp  Mon Mar 24 11:58:00 2014 -0700
@@ -350,9 +350,10 @@
 void TEncBinCABAC::writeOut()
 {
     uint32_t leadByte = m_low >> (13 + m_bitsLeft);
+    uint32_t low_mask = (uint32_t)(~0) >> (11 + 8 - m_bitsLeft);

     m_bitsLeft -= 8;
-    m_low &= 0xffffffffu >> (11 - m_bitsLeft);
+    m_low &= low_mask;

     if (leadByte == 0xff)
     {
@@ -360,25 +361,22 @@
     }
     else
     {
-        if (m_numBufferedBytes > 0)
+        uint32_t numBufferedBytes = m_numBufferedBytes;
+        if (numBufferedBytes > 0)
         {
             uint32_t carry = leadByte >> 8;
             uint32_t byteTowrite = m_bufferedByte + carry;
-            m_bufferedByte = leadByte & 0xff;
             m_bitIf->writeByte(byteTowrite);

             byteTowrite = (0xff + carry) & 0xff;
-            while (m_numBufferedBytes > 1)
+            while (numBufferedBytes > 1)
             {
                 m_bitIf->writeByte(byteTowrite);
-                m_numBufferedBytes--;
+                numBufferedBytes--;
             }
         }
-        else
-        {
-            m_numBufferedBytes = 1;
-            m_bufferedByte = leadByte;
-        }
+        m_numBufferedBytes = 1;
+        m_bufferedByte = (uint8_t)leadByte;
     }
 }
[x265] [PATCH 4 of 7] optimize: replace g_groupIdx[] by getGroupIdx()
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687533 25200
# Node ID 105fa844e4e3e2c6bffb8d2ea613e56e429cdf64
# Parent  700a63ba598db1828534ee824fbb1f93fef86c0f
optimize: replace g_groupIdx[] by getGroupIdx()

diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibCommon/TComRom.cpp
--- a/source/Lib/TLibCommon/TComRom.cpp  Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.cpp  Mon Mar 24 11:58:53 2014 -0700
@@ -434,7 +434,6 @@
 //

 const uint8_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
-const uint8_t g_groupIdx[32] = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };

 // Rice parameters for absolute transform levels
 const uint8_t g_goRiceRange[5] = { 7, 14, 26, 46, 78 };
diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibCommon/TComRom.h
--- a/source/Lib/TLibCommon/TComRom.h  Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.h  Mon Mar 24 11:58:53 2014 -0700
@@ -128,7 +128,24 @@
 // Scanning order & context mapping table
 //

-extern const uint8_t g_groupIdx[32];
+//extern const uint8_t g_groupIdx[32];
+static inline uint32_t getGroupIdx(const uint32_t idx)
+{
+    uint32_t group = (idx >> 3);
+    if (idx >= 24)
+        group = 2;
+    uint32_t groupIdx = ((idx >> (group + 1)) - 2) + 4 + (group << 1);
+    if (idx <= 3)
+        groupIdx = idx;
+
+#ifdef _DEBUG
+    static const uint8_t g_groupIdx[32] = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };
+    assert(groupIdx == g_groupIdx[idx]);
+#endif
+
+    return groupIdx;
+}
+
 extern const uint8_t g_minInGroup[10];
 extern const uint8_t g_goRiceRange[5];       //!< maximum value coded with Rice codes
diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp  Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp  Mon Mar 24 11:58:53 2014 -0700
@@ -1330,8 +1330,8 @@
  */
 inline double TComTrQuant::xGetRateLast(uint32_t posx, uint32_t posy) const
 {
-    uint32_t ctxX = g_groupIdx[posx];
-    uint32_t ctxY = g_groupIdx[posy];
+    uint32_t ctxX = getGroupIdx(posx);
+    uint32_t ctxY = getGroupIdx(posy);
     uint32_t cost = m_estBitsSbac->lastXBits[ctxX] + m_estBitsSbac->lastYBits[ctxY];

     int32_t maskX = (int32_t)(2 - posx) >> 31;
diff -r 700a63ba598d -r 105fa844e4e3 source/Lib/TLibEncoder/TEncSbac.cpp
--- a/source/Lib/TLibEncoder/TEncSbac.cpp  Mon Mar 24 11:58:22 2014 -0700
+++ b/source/Lib/TLibEncoder/TEncSbac.cpp  Mon Mar 24 11:58:53 2014 -0700
@@ -1988,8 +1988,8 @@
     }

     uint32_t ctxLast;
-    uint32_t groupIdxX = g_groupIdx[posx];
-    uint32_t groupIdxY = g_groupIdx[posy];
+    uint32_t groupIdxX = getGroupIdx(posx);
+    uint32_t groupIdxY = getGroupIdx(posy);

     int blkSizeOffset = ttype ? NUM_CTX_LAST_FLAG_XY_LUMA : ((log2TrSize - 2) * 3 + ((log2TrSize - 1) >> 2));
     int ctxShift = ttype ? log2TrSize - 2 : ((log2TrSize + 1) >> 2);
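The arithmetic in getGroupIdx() can be checked against the table it replaces. The sketch below is a standalone reconstruction: the shift and comparison operators are filled in as ">>", "<<", ">=" and "<=" because the archived copy of the patch lost them, and kGroupIdx is a hypothetical local name for the removed g_groupIdx table.

```cpp
#include <cassert>
#include <cstdint>

// Copy of the 32-entry table that the patch removes from TComRom.cpp.
static const uint8_t kGroupIdx[32] = {
    0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
    8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9
};

// Computed replacement for the table lookup (valid for idx in 0..31).
inline uint32_t getGroupIdx(uint32_t idx)
{
    uint32_t group = idx >> 3;          // 0, 1, 2 or 3
    if (idx >= 24)
        group = 2;                      // 16..31 share the same shift
    // For idx < 4 this wraps around (unsigned), but it is overwritten below.
    uint32_t groupIdx = ((idx >> (group + 1)) - 2) + 4 + (group << 1);
    if (idx <= 3)
        groupIdx = idx;                 // the first four entries are identity
    return groupIdx;
}

// Returns true if the formula reproduces every table entry.
bool matchesOriginalTable()
{
    for (uint32_t idx = 0; idx < 32; idx++)
        if (getGroupIdx(idx) != kGroupIdx[idx])
            return false;
    return true;
}
```

This mirrors the _DEBUG assertion in the patch, only run over the whole input range at once rather than per call.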
[x265] [PATCH 3 of 7] reduce g_minInGroup from uint32_t to uint8_t
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687502 25200
# Node ID 700a63ba598db1828534ee824fbb1f93fef86c0f
# Parent  928156df5d736de1c8f053ae06d8bb6ce11185e4
reduce g_minInGroup from uint32_t to uint8_t

diff -r 928156df5d73 -r 700a63ba598d source/Lib/TLibCommon/TComRom.cpp
--- a/source/Lib/TLibCommon/TComRom.cpp  Mon Mar 24 11:58:00 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.cpp  Mon Mar 24 11:58:22 2014 -0700
@@ -433,7 +433,7 @@
 // Scanning order & context model mapping
 //

-const uint32_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
+const uint8_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
 const uint8_t g_groupIdx[32] = { 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9 };

 // Rice parameters for absolute transform levels
diff -r 928156df5d73 -r 700a63ba598d source/Lib/TLibCommon/TComRom.h
--- a/source/Lib/TLibCommon/TComRom.h  Mon Mar 24 11:58:00 2014 -0700
+++ b/source/Lib/TLibCommon/TComRom.h  Mon Mar 24 11:58:22 2014 -0700
@@ -129,7 +129,7 @@
 //

 extern const uint8_t g_groupIdx[32];
-extern const uint32_t g_minInGroup[10];
+extern const uint8_t g_minInGroup[10];

 extern const uint8_t g_goRiceRange[5];       //!< maximum value coded with Rice codes
 //extern const uint8_t g_goRicePrefixLen[5];  //!< prefix length for each maximum value
[x265] [PATCH 5 of 7] improvement by replace SHIFT to MASK_AND
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687572 25200
# Node ID d39b436d01f293e20fd51a5a53028166e50cee58
# Parent  105fa844e4e3e2c6bffb8d2ea613e56e429cdf64
improvement by replace SHIFT to MASK_AND

diff -r 105fa844e4e3 -r d39b436d01f2 source/Lib/TLibEncoder/TEncSbac.cpp
--- a/source/Lib/TLibEncoder/TEncSbac.cpp  Mon Mar 24 11:58:53 2014 -0700
+++ b/source/Lib/TLibEncoder/TEncSbac.cpp  Mon Mar 24 11:59:32 2014 -0700
@@ -2117,7 +2117,7 @@

     // Code position of last coefficient
     int posLastY = posLast >> log2TrSize;
-    int posLastX = posLast - (posLastY << log2TrSize);
+    int posLastX = posLast & (trSize - 1);
     codeLastSignificantXY(posLastX, posLastY, log2TrSize, ttype, codingParameters.scanType);

     //= code significance flag =
     ContextModel * const baseCoeffGroupCtx = &m_contextModels[OFF_SIG_CG_FLAG_CTX + (ttype ? NUM_SIG_CG_FLAG_CTX : 0)];
@@ -2178,9 +2178,9 @@
             if (sig)
             {
                 absCoeff[numNonZero] = int(abs(coeff[blkPos]));
-                coeffSigns = 2 * coeffSigns + (coeff[blkPos] < 0);
+                coeffSigns = 2 * coeffSigns + ((uint32_t)coeff[blkPos] >> 31);
                 numNonZero++;
-                if (lastNZPosInCG == -1)
+                if (lastNZPosInCG < 0)
                 {
                     lastNZPosInCG = scanPosSig;
                 }
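Both substitutions in this patch rely on simple identities that can be checked directly. The sketch below uses hypothetical helper names and assumes, as in the patch context, that trSize equals 1 << log2TrSize and that coefficients are 32-bit signed integers.

```cpp
#include <cassert>
#include <cstdint>

// Original form: recover the column by subtracting the row start.
inline int lastXBySubtract(int posLast, int log2TrSize)
{
    int posLastY = posLast >> log2TrSize;
    return posLast - (posLastY << log2TrSize);
}

// Patched form: trSize is a power of two, so masking keeps the low bits.
inline int lastXByMask(int posLast, int log2TrSize)
{
    int trSize = 1 << log2TrSize;
    return posLast & (trSize - 1);
}

// Sign flag without a comparison: the top bit of a 32-bit value is 1
// exactly when the value is negative.
inline uint32_t signFlag(int32_t coeff)
{
    return (uint32_t)coeff >> 31;
}

// Returns true if both column computations agree for 4x4 up to 32x32 blocks.
bool columnFormsAgree()
{
    for (int log2TrSize = 2; log2TrSize <= 5; log2TrSize++)
    {
        int numCoeff = 1 << (2 * log2TrSize);
        for (int posLast = 0; posLast < numCoeff; posLast++)
            if (lastXBySubtract(posLast, log2TrSize) != lastXByMask(posLast, log2TrSize))
                return false;
    }
    return true;
}
```

The masked form no longer depends on posLastY, so the two position components can be computed independently.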
[x265] [PATCH 7 of 7] cleanup on TComTrQuant::getTUEntropyCodingParameters
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687606 25200
# Node ID 6d4a78f0c1b603370fcebafa70eee2f2dffdc11a
# Parent  5d22c7cd7cd603a3481720dd2467865012e39d37
cleanup on TComTrQuant::getTUEntropyCodingParameters

diff -r 5d22c7cd7cd6 -r 6d4a78f0c1b6 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp  Mon Mar 24 11:59:50 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp  Mon Mar 24 12:00:06 2014 -0700
@@ -486,41 +486,6 @@
     }
 }

-void TComTrQuant::getTUEntropyCodingParameters(TComDataCU* cu,
-                                               TUEntropyCodingParameters &result,
-                                               uint32_t absPartIdx,
-                                               uint32_t log2TrSize,
-                                               TextType ttype)
-{
-    //set the group layout
-    const uint32_t log2TrSizeCG = log2TrSize - MLS_CG_LOG2_SIZE;
-    result.log2TrSize = log2TrSize;
-    result.log2TrSizeCG = log2TrSizeCG;
-
-    //set the scan orders
-    result.scanType = COEFF_SCAN_TYPE(cu->getCoefScanIdx(absPartIdx, log2TrSize, ttype == TEXT_LUMA, cu->isIntra(absPartIdx)));
-    result.scan = g_scanOrder[SCAN_GROUPED_4x4][result.scanType][log2TrSize];
-    result.scanCG = g_scanOrder[SCAN_UNGROUPED][result.scanType][log2TrSizeCG];
-
-    //set the significance map context selection parameters
-    TextType ctype = ttype == TEXT_LUMA ? TEXT_LUMA : TEXT_CHROMA;
-    result.ctype = ctype;
-    if (log2TrSize == 2)
-    {
-        result.firstSignificanceMapContext = significanceMapContextSetStart[ctype][CONTEXT_TYPE_4x4];
-    }
-    else if (log2TrSize == 3)
-    {
-        result.firstSignificanceMapContext = significanceMapContextSetStart[ctype][CONTEXT_TYPE_8x8];
-        if (result.scanType != SCAN_DIAG)
-            result.firstSignificanceMapContext += nonDiagonalScan8x8ContextOffset[ctype];
-    }
-    else
-    {
-        result.firstSignificanceMapContext = significanceMapContextSetStart[ctype][CONTEXT_TYPE_NxN];
-    }
-}
-
 /** RDOQ with CABAC
  * \param cu pointer to coding unit structure
  * \param plSrcCoeff pointer to input buffer
@@ -643,7 +608,8 @@
                 }
                 else
                 {
-                    const uint32_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, codingParameters);
+                    // NOTE: ttype is different to ctype, but getSigCtxInc may safety use it
+                    const uint32_t ctxSig = getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, ttype, codingParameters.firstSignificanceMapContext);
                     if (maxAbsLevel < 3)
                     {
                         costSig[scanPos] = xGetRateSigCoef(0, ctxSig);
@@ -1055,7 +1021,8 @@
                                    const uint32_t log2TrSize,
                                    const uint32_t trSize,
                                    const uint32_t blkPos,
-                                   const TUEntropyCodingParameters &codingParameters)
+                                   const TextType ctype,
+                                   const uint32_t firstSignificanceMapContext)
 {
     static const uint8_t ctxIndMap[16] =
     {
@@ -1114,11 +1081,11 @@
     };

     int cnt = table_cnt[patternSigCtx][posXinSubset][posYinSubset];
-    int offset = codingParameters.firstSignificanceMapContext;
+    int offset = firstSignificanceMapContext;

     offset += cnt;

-    return (codingParameters.ctype == TEXT_LUMA && (posX | posY) >= 4) ? 3 + offset : offset;
+    return (ctype == TEXT_LUMA && (posX | posY) >= 4) ? 3 + offset : offset;
 }

 /** Get the best level in RD sense
diff -r 5d22c7cd7cd6 -r 6d4a78f0c1b6 source/Lib/TLibCommon/TComTrQuant.h
--- a/source/Lib/TLibCommon/TComTrQuant.h  Mon Mar 24 11:59:50 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.h  Mon Mar 24 12:00:06 2014 -0700
@@ -159,9 +159,42 @@
     void processScalingListEnc(int32_t *coeff, int32_t *quantcoeff, int quantScales, uint32_t height, uint32_t width, uint32_t ratio, int sizuNum, uint32_t dc);
     void processScalingListDec(int32_t *coeff, int32_t *dequantcoeff, int invQuantScales, uint32_t height, uint32_t width, uint32_t ratio, int sizuNum, uint32_t dc);
     static uint32_t calcPatternSigCtx(const uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG);
-    static uint32_t getSigCtxInc(uint32_t patternSigCtx, const uint32_t log2TrSize, const uint32_t trSize, const uint32_t blkPos, const TUEntropyCodingParameters &codingParameters);
+    static uint32_t getSigCtxInc(uint32_t patternSigCtx, const uint32_t log2TrSize, const uint32_t trSize, const uint32_t blkPos, const TextType ctype, const uint32_t firstSignificanceMapContext);
     static uint32_t
[x265] [PATCH 6 of 7] faster sign(X) and N^2 on TComTrQuant::xRateDistOptQuant
# HG changeset patch
# User Min Chen chenm...@163.com
# Date 1395687590 25200
# Node ID 5d22c7cd7cd603a3481720dd2467865012e39d37
# Parent  d39b436d01f293e20fd51a5a53028166e50cee58
faster sign(X) and N^2 on TComTrQuant::xRateDistOptQuant

diff -r d39b436d01f2 -r 5d22c7cd7cd6 source/Lib/TLibCommon/TComTrQuant.cpp
--- a/source/Lib/TLibCommon/TComTrQuant.cpp  Mon Mar 24 11:59:32 2014 -0700
+++ b/source/Lib/TLibCommon/TComTrQuant.cpp  Mon Mar 24 11:59:50 2014 -0700
@@ -876,7 +876,8 @@
         absSum += level;
         if (level)
             *lastPos = blkPos;
-        dstCoeff[blkPos] = (srcCoeff[blkPos] < 0) ? -level : level;
+        uint32_t mask = (int32_t)srcCoeff[blkPos] >> 31;
+        dstCoeff[blkPos] = (level ^ mask) - mask;
     }

     //= clean uncoded coefficients =
@@ -895,7 +896,7 @@
     int tmpSum = 0;
     int n;

-    for (int subSet = (trSize * trSize - 1) >> LOG2_SCAN_SET_SIZE; subSet >= 0; subSet--)
+    for (int subSet = ((1 << (log2TrSize * 2)) - 1) >> LOG2_SCAN_SET_SIZE; subSet >= 0; subSet--)
     {
         int subPos = subSet << LOG2_SCAN_SET_SIZE;
         int firstNZPosInCG = SCAN_SET_SIZE, lastNZPosInCG = -1;
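The "faster sign(X)" change replaces a conditional negate with an XOR and a subtract. A minimal sketch of the idea follows; applySign is a hypothetical helper name, and the code assumes that right-shifting a negative int32_t is an arithmetic shift (guaranteed from C++20, implementation-defined but near-universal before that).

```cpp
#include <cassert>
#include <cstdint>

// Gives 'level' the sign of 'src' without a branch.
// mask is 0x00000000 for src >= 0 and 0xFFFFFFFF for src < 0 (assuming an
// arithmetic right shift). For the negative case, (level ^ mask) - mask
// equals ~level + 1, which is the two's complement negation of level.
inline int32_t applySign(uint32_t level, int32_t src)
{
    uint32_t mask = (uint32_t)(src >> 31);
    return (int32_t)((level ^ mask) - mask);
}

// The loop-bound change in the same patch uses the identity
// trSize * trSize == 1 << (2 * log2TrSize) for trSize == 1 << log2TrSize.
inline int numCoeffByShift(int log2TrSize)
{
    return 1 << (log2TrSize * 2);
}
```

For example, applySign(5, -3) yields -5 and applySign(5, 3) yields 5, matching the original ternary expression.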
[x265] x265 Custom Implementation
Hi everyone,

My company (Kalray) is looking into writing an HEVC encoder based on x265 for its many-core processor (MPPA-256). Because of our architecture (distributed, with limited memory, among other things), a direct port of x265 is not a viable solution.

Our plan is to write a custom encoder core optimized for our platform and use it as an accelerator for x265 running on an x86 processor. This should look something like this:

/--------------------\                 /---------------------\
|        x86         |                 | 1 or more MPPA-256  |
|                    |                 |                     |
| x265 preAnalysis + | <= PCI Link =>  | Kalray Encoder Core |
|    Rate Control    |                 |                     |
|                    |                 |                     |
\--------------------/                 \---------------------/

The idea for the encoder core is to implement a CTU encoder. This leaves us some flexibility in how we dispatch the CTUs across the cores (tiles, frame parallelism, etc.).

From what I could gather after a quick glance at the x265 code, x265 currently uses HM as is to do the actual encoding. That is, with a few exceptions, HM code is used directly to try out the different modes, estimate costs, generate the bitstream, etc.

Therefore, our idea was to use the HM structures as a stable interface between x265 and our encoder core. From these structures we can extract all the required information (pixels, reference frames, QPs, etc.) and convert/transfer it to our core.

What is your opinion on this approach? Are the HM classes (at least structure-wise) an interface stable enough to do this? We are still in the design phase, so any idea is more than welcome.

Regards
-- 
Nicolas Morey Chaisemartin
Phone : +33 6 42 46 68 87
Re: [x265] x265 Custom Implementation
Hello,

From your description, this seems similar to my previous FPGA architecture, which was based on a task pool. The biggest problem is not the interface; the bottleneck is transfer bandwidth and processing latency. Of course, the RDO context is another bottleneck.

Best regards,
Min

At 2014-03-25 01:04:57, Nicolas Morey-Chaisemartin nmo...@kalray.eu wrote:
> Hi everyone,
>
> My company (Kalray) is looking into writing an HEVC encoder based on x265 for its many-core processor (MPPA-256). Because of our architecture (distributed, with limited memory, among other things), a direct port of x265 is not a viable solution.
>
> Our plan is to write a custom encoder core optimized for our platform and use it as an accelerator for x265 running on an x86 processor. This should look something like this:
>
> /--------------------\                 /---------------------\
> |        x86         |                 | 1 or more MPPA-256  |
> |                    |                 |                     |
> | x265 preAnalysis + | <= PCI Link =>  | Kalray Encoder Core |
> |    Rate Control    |                 |                     |
> |                    |                 |                     |
> \--------------------/                 \---------------------/
>
> The idea for the encoder core is to implement a CTU encoder. This leaves us some flexibility in how we dispatch the CTUs across the cores (tiles, frame parallelism, etc.).
>
> From what I could gather after a quick glance at the x265 code, x265 currently uses HM as is to do the actual encoding. That is, with a few exceptions, HM code is used directly to try out the different modes, estimate costs, generate the bitstream, etc.
>
> Therefore, our idea was to use the HM structures as a stable interface between x265 and our encoder core. From these structures we can extract all the required information (pixels, reference frames, QPs, etc.) and convert/transfer it to our core.
>
> What is your opinion on this approach? Are the HM classes (at least structure-wise) an interface stable enough to do this? We are still in the design phase, so any idea is more than welcome.
>
> Regards
> -- 
> Nicolas Morey Chaisemartin
> Phone : +33 6 42 46 68 87
Re: [x265] Preset superfast undocumented in help output and PDF page 3
On Sun, Mar 23, 2014 at 9:43 AM, Mario Rohkrämer cont...@ligh.de wrote:
> There is a preset "superfast" which is missing from the help output of x265 and from the verbose explanation of command line options on page 3 of the Evaluator's Guide. But it appears in the table of quality presets on page 9 and is available while encoding.

Thanks. It was missed when the CLI help was added, and that string was then copied to the eval guide. I've fixed the CLI help. The eval guide will be updated.

-- 
Steve Borho
[x265] fix chroma lambda weighting
# HG changeset patch
# User Satoshi Nakagawa nakagawa...@oki.com
# Date 1395672158 -32400
#      Mon Mar 24 23:42:38 2014 +0900
# Node ID 08584b5913bce6a5f9d2f0d408fcdace6aa83a65
# Parent  fdd7c6168cf42a11240ff1c7fc7b401605524db2
fix chroma lambda weighting

diff -r fdd7c6168cf4 -r 08584b5913bc source/encoder/frameencoder.cpp
--- a/source/encoder/frameencoder.cpp  Fri Mar 21 14:44:35 2014 -0500
+++ b/source/encoder/frameencoder.cpp  Mon Mar 24 23:42:38 2014 +0900
@@ -335,11 +335,10 @@
     // instead we weight the distortion of chroma.
     int chromaQPOffset = slice->getPPS()->getChromaCbQpOffset() + slice->getSliceQpDeltaCb();
     int qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-    double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
-
+    double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
     chromaQPOffset = slice->getPPS()->getChromaCrQpOffset() + slice->getSliceQpDeltaCr();
     qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-    double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
+    double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
     double chromaLambda = lambda / crWeight;

     m_rows[row].m_search.setQPLambda(qp, lambda, chromaLambda);
@@ -376,10 +375,10 @@
     int qpc;
     int chromaQPOffset = slice->getPPS()->getChromaCbQpOffset() + slice->getSliceQpDeltaCb();
     qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-    double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
+    double cbWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
     chromaQPOffset = slice->getPPS()->getChromaCrQpOffset() + slice->getSliceQpDeltaCr();
     qpc = Clip3(0, MAX_MAX_QP, qp + chromaQPOffset);
-    double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc])); // takes into account of the chroma qp mapping and chroma qp Offset
+    double crWeight = pow(2.0, (qp - g_chromaScale[chFmt][qpc]) / 3.0); // takes into account of the chroma qp mapping and chroma qp Offset
     double chromaLambda = lambda / crWeight;

     // NOTE: set SAO lambda every Frame
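The effect of the added "/ 3.0" can be seen with a small numeric sketch. In HEVC the quantizer step size doubles every 6 QP, so squared-error distortion (and with it the Lagrange multiplier) scales roughly as 2^(QP/3). The helper names below are hypothetical, and the mapped chroma QP is passed in directly rather than looked up in g_chromaScale.

```cpp
#include <cassert>
#include <cmath>

// Weight as computed before the fix: one QP step was treated as a full
// doubling of the distortion weight.
inline double chromaWeightBeforeFix(int qp, int mappedChromaQp)
{
    return std::pow(2.0, qp - mappedChromaQp);
}

// Weight as computed after the fix: three QP steps give one doubling,
// consistent with distortion scaling as 2^(QP/3).
inline double chromaWeightAfterFix(int qp, int mappedChromaQp)
{
    return std::pow(2.0, (qp - mappedChromaQp) / 3.0);
}
```

For a luma QP of 30 and a mapped chroma QP of 27, the old expression gives a weight of 8 while the corrected one gives 2, so the old code shrank chromaLambda (lambda / crWeight) four times too much in that case.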