On Tue, Feb 4, 2020 at 5:10 PM Srikanth Kurapati <srikanth.kurap...@multicorewareinc.com> wrote:
> On Wed, Jan 29, 2020 at 12:20 PM <srikanth.kurap...@multicorewareinc.com>
> wrote:
>
>> # HG changeset patch
>> # User Srikanth Kurapati
>> # Date 1580280547 -19800
>> # Wed Jan 29 12:19:07 2020 +0530
>> # Node ID e9c8c0089bddc9e9e47774b5fda1f4dff1fb45e4
>> # Parent fdbd4e4a2aff93bfc14b10efcd9e681a7ebae311
>> Edge Aware Quad Tree Establishment.
>>
>> This patch does the following:
>> 1. Terminates recursion using edge information.
>> 2. Adds modes for "--rskip". Modes 0,1 for current usage and 2,3 for edge
>> based rskips for RD levels 0 to 6.
>>
> [KS] Since there are only 0 to 6 levels, should we mention?
> [Srikanth] edited.
> [KS] CLI option is --rskip, not --r-skip
> [srikanth] edited
> [KS] We do malloc_zero here including the margins instead of copying the
> last row/column values to the padding. IIRC, copying last row/column values
> has some significance for interpolation/MC. Can you verify if this is right
> to do ?
> How is this question relevant here ? I don't understand.
>

[KS] I basically wanted you to confirm if the pixel value of padding has an impact on the edge based depth decisions your algorithm is taking

>>      if (m_fencPic->create(param, !!m_param->bCopyPicToFrame) && m_lowres.create(param, m_fencPic, param->rc.qgSize))
>>      {
>>          X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
>> @@ -267,4 +282,10 @@
>>          X265_FREE(m_gaussianPic);
>>          X265_FREE(m_thetaPic);
>>      }
>> +
>> +    if (m_param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>> +    {
>> +        X265_FREE_ZERO(m_edgeBitPlane);
>> +        m_edgeBitPic = NULL;
>> +    }
>>  }
>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/common/frame.h
>> --- a/source/common/frame.h Sat Jan 25 18:08:03 2020 +0530
>> +++ b/source/common/frame.h Wed Jan 29 12:19:07 2020 +0530
>> @@ -137,6 +137,9 @@
>>      pixel* m_gaussianPic;
>>      pixel* m_thetaPic;
>>
>> +    pixel* m_edgeBitPlane;
>> +    pixel* m_edgeBitPic;
>>
> [KS] Do we need 2 pointers? m_edgeBitPlane is used only for
> allocation/freeing, in all other places only edgepic is used.
> [Srikanth] Yes. This is a fix to avoid unnecessary computation of the
> edgepic pointer for every CU.

[KS] I understand that from your code. My suggestion is to see if you can find an alternate way to do the same logic without adding a member to Frame class.

>> +    uint32_t m_bitPlaneSize;
>>
> [KS] when bitPlaneSize can be a local variable, what is the significance
> for making it part of Frame?
> [Srikanth] to avoid duplication of the computation in the future.

[KS] Is there any followup patch that uses this size? If not, I recommend making it local for now

> [KS] bEnableRecursionSkip == RDCOST/EDGE_BASED_RSKIP is checked twice
> unnecessarily (you do it once before complexityCheck CU call). Can you
> optimize this code?
> [KS] same here
> [Srikanth] No. currently the idea is to run both the algorithms
> independently in different modes and not simultaneously. We cannot
> optimize further here even if I replace with a switch we will still be
> doing the same number of integer comparisons.

[KS] I feel it can be a little cleaner. I will let you take the call on code optimization.
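
For illustration only (this is not code from the patch): one possible shape for the "a little cleaner" variant discussed above is to look at the rskip mode once and keep each metric in its own function. The sketch below assumes plain std::vector buffers in place of the encoder's pixel planes, and every name in it (RskipMode, isHomogeneous, isEdgeSparse, shouldSkipRecursion) is hypothetical. The CU-size condition and the forced skip of mode 3 from the patch are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical mirror of the patch's rskip modes (0..3); not an x265 type.
enum RskipMode { RSKIP_OFF = 0, RSKIP_RDCOST = 1, RSKIP_EDGE = 2, RSKIP_EDGE_FORCED = 3 };

// Homogeneity test, following the arithmetic of the original complexityCheckCU:
// skip when the mean absolute deviation is below 10% of the mean.
bool isHomogeneous(const std::vector<uint8_t>& block)
{
    if (block.empty())
        return false;
    const uint32_t count = static_cast<uint32_t>(block.size());
    uint32_t mean = 0;
    for (uint8_t p : block)
        mean += p;
    mean /= count;
    uint32_t homo = 0;
    for (uint8_t p : block)
        homo += static_cast<uint32_t>(std::abs(static_cast<int>(p) - static_cast<int>(mean)));
    homo /= count;
    return homo < 0.1 * mean;
}

// Edge test, following the variance formula in the patch: skip when the
// variance of the edge plane does not exceed the threshold.
bool isEdgeSparse(const std::vector<uint8_t>& edgeBits, double threshold)
{
    if (edgeBits.empty())
        return false;
    uint64_t sum = 0, ss = 0;
    for (uint8_t b : edgeBits)
    {
        sum += b;
        ss += static_cast<uint64_t>(b) * b;
    }
    const double n = static_cast<double>(edgeBits.size());
    const double variance = (ss - static_cast<double>(sum) * sum / n) / n;
    return variance <= threshold;
}

// The mode is examined exactly once, here; the metric functions never see it.
bool shouldSkipRecursion(RskipMode mode, const std::vector<uint8_t>& block,
                         const std::vector<uint8_t>& edgeBits, double threshold)
{
    switch (mode)
    {
    case RSKIP_RDCOST:
        return isHomogeneous(block);
    case RSKIP_EDGE:
    case RSKIP_EDGE_FORCED:
        return isEdgeSparse(edgeBits, threshold);
    default:
        return false; // mode 0: never skip on this basis
    }
}
```

With this split there is no trailing "else" branch to justify, since the switch default covers the disabled mode. As a side note, if the edge plane holds only 0 and 1, sum and ss are both the edge-pixel count, so the quantity compared against the threshold is p(1 - p) for edge fraction p rather than p itself.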
>> +    {
>> +        int blockType = bestMode.cu.m_log2CUSize[0] - 2;
>> +        int shift = bestMode.cu.m_log2CUSize[0] * 2;
>> +        intptr_t stride = m_frame->m_fencPic->m_stride;
>> +        intptr_t blockOffsetLuma = bestMode.cu.m_cuPelX + bestMode.cu.m_cuPelY * stride;
>> +        uint64_t sum_ss = primitives.cu[blockType].var(m_frame->m_edgeBitPic + blockOffsetLuma, stride);
>> +        uint32_t sum = (uint32_t)sum_ss;
>> +        uint32_t ss = (uint32_t)(sum_ss >> 32);
>> +        uint32_t pixelCount = 1 << shift;
>> +        double cuEdgeVariance = (ss - ((double)sum * sum / pixelCount)) / pixelCount;
>> +        if (cuEdgeVariance > (double)m_param->edgeThreshold)
>> +            return false;
>> +        else
>> +            return true;
>>      }
>>
> [KS] Earlier for my question to combine edgeRecursion with
> complexityCheck, you mentioned that - Homogeneity and variance are two
> different metrics and they don't do the same functionality and so you
> didn't combine. I would like to understand the reasoning behind combining
> them in this patch.
> [Srikanth] the function prototypes turned out to be the same and they are
> for similar purpose.
>
>> -    homo = homo / (cuSize * cuSize);
>> -
>> -    if (homo < (.1 * mean))
>> -        return true;
>> -
>> -    return false;
>> +    else
>> +        return false;
>>
> [KS] When does the encoder hit this final "else"?
> [Srikanth] It's just a safety mechanism which will otherwise lead to
> compiler warnings.

[KS] The function is called only if rskip >= EDGE_BASED_RSKIP || rskip == RDCOST_BASED_RSKIP. In that case, can you help me understand why you need an else-if?
Rewriting the conditional statements appropriately should resolve compiler warnings
>
> On Mon, Feb 3, 2020 at 3:46 PM Kavitha Sampath <
> kavi...@multicorewareinc.com> wrote:
>
>>
>>
>> On Wed, Jan 29, 2020 at 12:20 PM <srikanth.kurap...@multicorewareinc.com>
>> wrote:
>>
>>> # HG changeset patch
>>> # User Srikanth Kurapati
>>> # Date 1580280547 -19800
>>> # Wed Jan 29 12:19:07 2020 +0530
>>> # Node ID e9c8c0089bddc9e9e47774b5fda1f4dff1fb45e4
>>> # Parent fdbd4e4a2aff93bfc14b10efcd9e681a7ebae311
>>> Edge Aware Quad Tree Establishment.
>>>
>>> This patch does the following:
>>> 1. Terminates recursion using edge information.
>>> 2. Adds modes for "--rskip". Modes 0,1 for current usage and 2,3 for
>>> edge based rskips for RD levels 0 to 6.
>>>
>> [KS] Since there are only 0 to 6 levels, should we mention?
>>
>>> 3. Adds option "edge-threshold" to decide recursion skip using CU edge
>>> density.
>>> 4. Re uses edge information when already available in encoder.
>>>
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd doc/reST/cli.rst
>>> --- a/doc/reST/cli.rst Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/doc/reST/cli.rst Wed Jan 29 12:19:07 2020 +0530
>>> @@ -842,15 +842,31 @@
>>>      Measure 2Nx2N merge candidates first; if no residual is found,
>>>      additional modes at that depth are not analysed. Default disabled
>>>
>>> -.. option:: --rskip, --no-rskip
>>> -
>>> -    This option determines early exit from CU depth recursion. When a skip CU is
>>> -    found, additional heuristics (depending on rd-level) are used to decide whether
>>> -    to terminate recursion. In rdlevels 5 and 6, comparison with inter2Nx2N is used,
>>> -    while at rdlevels 4 and neighbour costs are used to skip recursion.
>>> -    Provides minimal quality degradation at good performance gains when enabled.
>>> -
>>> -    Default: enabled, disabled for :option:`--tune grain`
>>> +.. option:: --rskip <0|1|2|3>
>>> +
>>> +    This option determines early exit from CU depth recursion in modes 1, 2 and 3.
>>> When a skip CU is
>>> +    found, additional heuristics (depending on RD level and rskip mode) are used to decide whether
>>> +    to terminate recursion. The following table summarizes the behavior.
>>> +
>>> +    +----------+------------+----------------------------------------------------------------+
>>> +    | RD Level | Rskip Mode | Skip Recursion Heuristic                                       |
>>> +    +==========+============+================================================================+
>>> +    | 0 - 4    | 1          | Neighbour costs.                                               |
>>> +    +----------+------------+----------------------------------------------------------------+
>>> +    | 5 - 6    | 1          | Comparison with inter2Nx2N.                                    |
>>> +    +----------+------------+----------------------------------------------------------------+
>>> +    | 0 - 6    | 2          | CU edge denstiy.                                               |
>>> +    +----------+------------+----------------------------------------------------------------+
>>> +    | 0 - 6    | 3          | CU edge denstiy with forceful skip for lower levels of CTU.    |
>>> +    +----------+------------+----------------------------------------------------------------+
>>> +
>>> +    Provides minimal quality degradation at good performance gains for non-zero modes.
>>> +    :option:`--r-skip mode 0` means disabled. Default: 1, disabled when :option:`--tune grain` is used.
>>>
>> [KS] CLI option is --rskip, not --r-skip
>>
>>> +
>>> +.. option:: --edge-threshold <0..100>
>>> +
>>> +    Denotes the minimum expected edge-density percentage within the CU, below which the recursion is skipped.
>>> +    Default: 5, requires :option:`--rskip mode 2|3` to be enabled.
>>>
>>>  .. option:: --splitrd-skip, --no-splitrd-skip
>>>
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/CMakeLists.txt
>>> --- a/source/CMakeLists.txt Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/CMakeLists.txt Wed Jan 29 12:19:07 2020 +0530
>>> @@ -29,7 +29,7 @@
>>>  option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
>>>  mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
>>>  # X265_BUILD must be incremented each time the public API is changed
>>> -set(X265_BUILD 188)
>>> +set(X265_BUILD 189)
>>>  configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
>>>                 "${PROJECT_BINARY_DIR}/x265.def")
>>>  configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/common/common.h
>>> --- a/source/common/common.h Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/common/common.h Wed Jan 29 12:19:07 2020 +0530
>>> @@ -129,6 +129,7 @@
>>>  typedef uint64_t sum2_t;
>>>  typedef uint64_t pixel4;
>>>  typedef int64_t ssum2_t;
>>> +#define SHIFT_TO_BITPLANE 9
>>>  #define HISTOGRAM_BINS 1024
>>>  #define SHIFT 1
>>>  #else
>>> @@ -137,6 +138,7 @@
>>>  typedef uint32_t sum2_t;
>>>  typedef uint32_t pixel4;
>>>  typedef int32_t ssum2_t; // Signed sum
>>> +#define SHIFT_TO_BITPLANE 7
>>>  #define HISTOGRAM_BINS 256
>>>  #define SHIFT 0
>>>  #endif // if HIGH_BIT_DEPTH
>>> @@ -272,6 +274,9 @@
>>>  #define MAX_TR_SIZE (1 << MAX_LOG2_TR_SIZE)
>>>  #define MAX_TS_SIZE (1 << MAX_LOG2_TS_SIZE)
>>>
>>> +#define RDCOST_BASED_RSKIP 1
>>> +#define EDGE_BASED_RSKIP 2
>>> +
>>>  #define COEF_REMAIN_BIN_REDUCTION 3 // indicates the level at which the VLC
>>>                                      // transitions from Golomb-Rice to TU+EG(k)
>>>
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/common/frame.cpp
>>> --- a/source/common/frame.cpp Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/common/frame.cpp Wed Jan 29 12:19:07 2020 +0530
>>> @@ -61,6 +61,8 @@
>>>      m_edgePic = NULL;
>>>      m_gaussianPic = NULL;
>>>      m_thetaPic = NULL;
>>> +    m_edgeBitPlane = NULL;
>>> +    m_edgeBitPic = NULL;
>>>  }
>>>
>>>  bool Frame::create(x265_param *param, float* quantOffsets)
>>> @@ -115,6 +117,19 @@
>>>          m_thetaPic = X265_MALLOC(pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
>>>      }
>>>
>>> +    if (param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +    {
>>> +        uint32_t numCuInWidth = (param->sourceWidth + param->maxCUSize - 1) / param->maxCUSize;
>>> +        uint32_t numCuInHeight = (param->sourceHeight + param->maxCUSize - 1) / param->maxCUSize;
>>> +        uint32_t lumaMarginX = param->maxCUSize + 32;
>>> +        uint32_t lumaMarginY = param->maxCUSize + 16;
>>> +        uint32_t stride = (numCuInWidth * param->maxCUSize) + (lumaMarginX << 1);
>>> +        uint32_t maxHeight = numCuInHeight * param->maxCUSize;
>>> +        m_bitPlaneSize = stride * (maxHeight + (lumaMarginY * 2));
>>> +        CHECKED_MALLOC_ZERO(m_edgeBitPlane, pixel, m_bitPlaneSize);
>>> +        m_edgeBitPic = m_edgeBitPlane + lumaMarginY * stride + lumaMarginX;
>>> +    }
>>> +
>>>
>> [KS] We do malloc_zero here including the margins instead of copying the
>> last row/column values to the padding. IIRC, copying last row/column values
>> has some significance for interpolation/MC. Can you verify if this is right
>> to do?
>>
>>>      if (m_fencPic->create(param, !!m_param->bCopyPicToFrame) && m_lowres.create(param, m_fencPic, param->rc.qgSize))
>>>      {
>>>          X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
>>> @@ -267,4 +282,10 @@
>>>          X265_FREE(m_gaussianPic);
>>>          X265_FREE(m_thetaPic);
>>>      }
>>> +
>>> +    if (m_param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +    {
>>> +        X265_FREE_ZERO(m_edgeBitPlane);
>>> +        m_edgeBitPic = NULL;
>>> +    }
>>>  }
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/common/frame.h
>>> --- a/source/common/frame.h Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/common/frame.h Wed Jan 29 12:19:07 2020 +0530
>>> @@ -137,6 +137,9 @@
>>>      pixel* m_gaussianPic;
>>>      pixel* m_thetaPic;
>>>
>>> +    pixel* m_edgeBitPlane;
>>> +    pixel* m_edgeBitPic;
>>>
>> [KS] Do we need 2 pointers?
>> m_edgeBitPlane is used only for
>> allocation/freeing, in all other places only edgepic is used.
>>
>>> +    uint32_t m_bitPlaneSize;
>>>
>> [KS] when bitPlaneSize can be a local variable, what is the significance
>> for making it part of Frame?
>>
>>>      Frame();
>>>
>>>      bool create(x265_param *param, float* quantOffsets);
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/common/param.cpp
>>> --- a/source/common/param.cpp Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/common/param.cpp Wed Jan 29 12:19:07 2020 +0530
>>> @@ -199,6 +199,7 @@
>>>      param->bEnableWeightedBiPred = 0;
>>>      param->bEnableEarlySkip = 1;
>>>      param->bEnableRecursionSkip = 1;
>>> +    param->edgeThreshold = 0.05f;
>>>      param->bEnableAMP = 0;
>>>      param->bEnableRectInter = 0;
>>>      param->rdLevel = 3;
>>> @@ -702,8 +703,9 @@
>>>      OPT("ref") p->maxNumReferences = atoi(value);
>>>      OPT("fast-intra") p->bEnableFastIntra = atobool(value);
>>>      OPT("early-skip") p->bEnableEarlySkip = atobool(value);
>>> -    OPT("rskip") p->bEnableRecursionSkip = atobool(value);
>>> -    OPT("me")p->searchMethod = parseName(value, x265_motion_est_names, bError);
>>> +    OPT("rskip") p->bEnableRecursionSkip = atoi(value);
>>> +    OPT("edge-threshold") p->edgeThreshold = atoi(value)/100.0f;
>>> +    OPT("me") p->searchMethod = parseName(value, x265_motion_est_names, bError);
>>>      OPT("subme") p->subpelRefine = atoi(value);
>>>      OPT("merange") p->searchRange = atoi(value);
>>>      OPT("rect") p->bEnableRectInter = atobool(value);
>>> @@ -919,7 +921,7 @@
>>>      OPT("max-merge") p->maxNumMergeCand = (uint32_t)atoi(value);
>>>      OPT("temporal-mvp") p->bEnableTemporalMvp = atobool(value);
>>>      OPT("early-skip") p->bEnableEarlySkip = atobool(value);
>>> -    OPT("rskip") p->bEnableRecursionSkip = atobool(value);
>>> +    OPT("rskip") p->bEnableRecursionSkip = atoi(value);
>>>      OPT("rdpenalty") p->rdPenalty = atoi(value);
>>>      OPT("tskip") p->bEnableTransformSkip = atobool(value);
>>>      OPT("no-tskip-fast") p->bEnableTSkipFast = atobool(value);
>>> @@ -1221,6 +1223,7 @@
>>>              }
>>>          }
>>>          OPT("hist-threshold") p->edgeTransitionThreshold = atof(value);
>>> +        OPT("edge-threshold") p->edgeThreshold = atoi(value)/100.0f;
>>>          OPT("lookahead-threads") p->lookaheadThreads = atoi(value);
>>>          OPT("opt-cu-delta-qp") p->bOptCUDeltaQP = atobool(value);
>>>          OPT("multi-pass-opt-analysis") p->analysisMultiPassRefine = atobool(value);
>>> @@ -1596,9 +1599,16 @@
>>>      CHECK(param->rdLevel < 1 || param->rdLevel > 6,
>>>            "RD Level is out of range");
>>>      CHECK(param->rdoqLevel < 0 || param->rdoqLevel > 2,
>>> -          "RDOQ Level is out of range");
>>> +          "RDOQ Level is out of range");
>>>      CHECK(param->dynamicRd < 0 || param->dynamicRd > x265_ADAPT_RD_STRENGTH,
>>> -          "Dynamic RD strength must be between 0 and 4");
>>> +          "Dynamic RD strength must be between 0 and 4");
>>> +    CHECK(param->bEnableRecursionSkip > 3 || param->bEnableRecursionSkip < 0,
>>> +          "Invalid Recursion skip mode. Valid modes 0,1,2,3");
>>> +    if (param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +    {
>>> +        CHECK(param->edgeThreshold < 0.0f || param->edgeThreshold > 1.0f,
>>> +              "Minimum edge density percentage for a CU should be an integer between 0 to 100");
>>> +    }
>>>      CHECK(param->bframes && param->bframes >= param->lookaheadDepth && !param->rc.bStatRead,
>>>            "Lookahead depth must be greater than the max consecutive bframe count");
>>>      CHECK(param->bframes < 0,
>>> @@ -1908,7 +1918,9 @@
>>>      TOOLVAL(param->psyRdoq, "psy-rdoq=%.2lf");
>>>      TOOLOPT(param->bEnableRdRefine, "rd-refine");
>>>      TOOLOPT(param->bEnableEarlySkip, "early-skip");
>>> -    TOOLOPT(param->bEnableRecursionSkip, "rskip");
>>> +    TOOLVAL(param->bEnableRecursionSkip, "rskip mode=%d");
>>> +    if (param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +        TOOLVAL(param->edgeThreshold, "rskip-threshold=%.2f");
>>>      TOOLOPT(param->bEnableSplitRdSkip, "splitrd-skip");
>>>      TOOLVAL(param->noiseReductionIntra, "nr-intra=%d");
>>>      TOOLVAL(param->noiseReductionInter, "nr-inter=%d");
>>> @@ -2067,6 +2079,9 @@
>>>      s += sprintf(s, " selective-sao=%d", p->selectiveSAO);
>>>      BOOL(p->bEnableEarlySkip, "early-skip");
>>>      BOOL(p->bEnableRecursionSkip, "rskip");
>>> +    if (p->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +        s += sprintf(s, " edge-threshold=%f", p->edgeThreshold);
>>> +
>>>      BOOL(p->bEnableFastIntra, "fast-intra");
>>>      BOOL(p->bEnableTSkipFast, "tskip-fast");
>>>      BOOL(p->bCULossless, "cu-lossless");
>>> @@ -2374,6 +2389,7 @@
>>>      dst->rdLevel = src->rdLevel;
>>>      dst->bEnableEarlySkip = src->bEnableEarlySkip;
>>>      dst->bEnableRecursionSkip = src->bEnableRecursionSkip;
>>> +    dst->edgeThreshold = src->edgeThreshold;
>>>      dst->bEnableFastIntra = src->bEnableFastIntra;
>>>      dst->bEnableTSkipFast = src->bEnableTSkipFast;
>>>      dst->bCULossless = src->bCULossless;
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/common/pixel.cpp
>>> --- a/source/common/pixel.cpp Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/common/pixel.cpp Wed Jan 29 12:19:07 2020 +0530
>>> @@ -876,6 +876,18 @@
>>>      }
>>>  }
>>>
>>> +static void planecopy_pp_shr_c(const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift)
>>> +{
>>> +    for (int r = 0; r < height; r++)
>>> +    {
>>> +        for (int c = 0; c < width; c++)
>>> +            dst[c] = (pixel)((src[c] >> shift));
>>> +
>>> +        dst += dstStride;
>>> +        src += srcStride;
>>> +    }
>>> +}
>>> +
>>>  static void planecopy_sp_shl_c(const uint16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask)
>>>  {
>>>      for (int r = 0; r < height; r++)
>>> @@ -1316,6 +1328,7 @@
>>>      p.planecopy_cp = planecopy_cp_c;
>>>      p.planecopy_sp = planecopy_sp_c;
>>>      p.planecopy_sp_shl = planecopy_sp_shl_c;
>>> +    p.planecopy_pp_shr = planecopy_pp_shr_c;
>>>  #if HIGH_BIT_DEPTH
>>>      p.planeClipAndMax = planeClipAndMax_c;
>>>  #endif
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/common/primitives.h
>>> --- a/source/common/primitives.h Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/common/primitives.h Wed Jan 29 12:19:07 2020 +0530
>>> @@ -204,6 +204,7 @@
>>>  typedef void (*sign_t)(int8_t *dst, const pixel *src1, const pixel *src2, const int endX);
>>>  typedef void (*planecopy_cp_t) (const uint8_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);
>>>  typedef void (*planecopy_sp_t) (const uint16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask);
>>> +typedef void (*planecopy_pp_t) (const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);
>>>  typedef pixel (*planeClipAndMax_t)(pixel *src, intptr_t stride, int width, int height, uint64_t *outsum, const pixel minPix, const pixel maxPix);
>>>
>>>  typedef void (*cutree_propagate_cost) (int* dst, const uint16_t* propagateIn, const int32_t* intraCosts, const uint16_t* interCosts, const int32_t* invQscales, const double* fpsFactor, int len);
>>> @@ -358,6 +359,7 @@
>>>      planecopy_cp_t planecopy_cp;
>>>      planecopy_sp_t planecopy_sp;
>>>      planecopy_sp_t planecopy_sp_shl;
>>> +    planecopy_pp_t planecopy_pp_shr;
>>>      planeClipAndMax_t planeClipAndMax;
>>>
>>>      weightp_sp_t weight_sp;
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/encoder/analysis.cpp
>>> --- a/source/encoder/analysis.cpp Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/encoder/analysis.cpp Wed Jan 29 12:19:07 2020 +0530
>>> @@ -1317,12 +1317,21 @@
>>>      if (md.bestMode && m_param->bEnableRecursionSkip && !bCtuInfoCheck && !(m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1])))
>>>      {
>>>          skipRecursion = md.bestMode->cu.isSkipped(0);
>>> -        if (mightSplit && depth >= minDepth && !skipRecursion)
>>> +        if (mightSplit && !skipRecursion)
>>>          {
>>> -            if (depth)
>>> -                skipRecursion = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);
>>> -            if (m_bHD && !skipRecursion && m_param->rdLevel == 2 && md.fencYuv.m_size !=
>>> MAX_CU_SIZE)
>>> +            if (depth >= minDepth && m_param->bEnableRecursionSkip == RDCOST_BASED_RSKIP)
>>> +            {
>>> +                if (depth)
>>> +                    skipRecursion = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);
>>> +                if (m_bHD && !skipRecursion && m_param->rdLevel == 2 && md.fencYuv.m_size != MAX_CU_SIZE)
>>> +                    skipRecursion = complexityCheckCU(*md.bestMode);
>>> +            }
>>> +            else if (cuGeom.log2CUSize >= MAX_LOG2_CU_SIZE - 1 && m_param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +            {
>>>                  skipRecursion = complexityCheckCU(*md.bestMode);
>>> +            }
>>> +            else if (m_param->bEnableRecursionSkip > EDGE_BASED_RSKIP)
>>> +                skipRecursion = true;
>>>          }
>>>      }
>>>      if (m_param->bAnalysisType == AVC_INFO && md.bestMode && cuGeom.numPartitions <= 16 && m_param->analysisLoadReuseLevel == 7)
>>> @@ -2015,8 +2024,12 @@
>>>          checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
>>>          checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
>>>
>>> -        if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
>>> +        if (m_param->bEnableRecursionSkip == RDCOST_BASED_RSKIP && depth && m_modeDepth[depth - 1].bestMode)
>>>              skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
>>> +        else if (cuGeom.log2CUSize >= MAX_LOG2_CU_SIZE - 1 && m_param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +            skipRecursion = md.bestMode && complexityCheckCU(*md.bestMode);
>>> +        else if (m_param->bEnableRecursionSkip > EDGE_BASED_RSKIP)
>>> +            skipRecursion = true;
>>>      }
>>>      if (m_param->bAnalysisType == AVC_INFO && md.bestMode && cuGeom.numPartitions <= 16 && m_param->analysisLoadReuseLevel == 7)
>>>          skipRecursion = true;
>>> @@ -3525,26 +3538,47 @@
>>>
>>>  bool Analysis::complexityCheckCU(const Mode& bestMode)
>>>  {
>>> -    uint32_t mean = 0;
>>> -    uint32_t homo = 0;
>>> -    uint32_t cuSize = bestMode.fencYuv->m_size;
>>> -    for (uint32_t y = 0; y < cuSize; y++) {
>>> -        for (uint32_t x = 0; x < cuSize; x++) {
>>> -            mean += (bestMode.fencYuv->m_buf[0][y * cuSize + x]);
>>> +    if (m_param->bEnableRecursionSkip == RDCOST_BASED_RSKIP)
>>>
>> [KS] bEnableRecursionSkip == RDCOST/EDGE_BASED_RSKIP is checked twice
>> unnecessarily (you do it once before complexityCheck CU call). Can you
>> optimize this code?
>>
>>> +    {
>>> +        uint32_t mean = 0;
>>> +        uint32_t homo = 0;
>>> +        uint32_t cuSize = bestMode.fencYuv->m_size;
>>> +        for (uint32_t y = 0; y < cuSize; y++) {
>>> +            for (uint32_t x = 0; x < cuSize; x++) {
>>> +                mean += (bestMode.fencYuv->m_buf[0][y * cuSize + x]);
>>> +            }
>>>          }
>>> +        mean = mean / (cuSize * cuSize);
>>> +        for (uint32_t y = 0; y < cuSize; y++) {
>>> +            for (uint32_t x = 0; x < cuSize; x++) {
>>> +                homo += abs(int(bestMode.fencYuv->m_buf[0][y * cuSize + x] - mean));
>>> +            }
>>> +        }
>>> +        homo = homo / (cuSize * cuSize);
>>> +
>>> +        if (homo < (.1 * mean))
>>> +            return true;
>>> +
>>> +        return false;
>>>      }
>>> -    mean = mean / (cuSize * cuSize);
>>> -    for (uint32_t y = 0 ; y < cuSize; y++){
>>> -        for (uint32_t x = 0 ; x < cuSize; x++){
>>> -            homo += abs(int(bestMode.fencYuv->m_buf[0][y * cuSize + x] - mean));
>>> -        }
>>> +    else if (m_param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>>
>> [KS] same here
>>
>>> +    {
>>> +        int blockType = bestMode.cu.m_log2CUSize[0] - 2;
>>> +        int shift = bestMode.cu.m_log2CUSize[0] * 2;
>>> +        intptr_t stride = m_frame->m_fencPic->m_stride;
>>> +        intptr_t blockOffsetLuma = bestMode.cu.m_cuPelX + bestMode.cu.m_cuPelY * stride;
>>> +        uint64_t sum_ss = primitives.cu[blockType].var(m_frame->m_edgeBitPic + blockOffsetLuma, stride);
>>> +        uint32_t sum = (uint32_t)sum_ss;
>>> +        uint32_t ss = (uint32_t)(sum_ss >> 32);
>>> +        uint32_t pixelCount = 1 << shift;
>>> +        double cuEdgeVariance = (ss - ((double)sum * sum / pixelCount)) / pixelCount;
>>> +        if (cuEdgeVariance > (double)m_param->edgeThreshold)
>>> +            return false;
>>> +        else
>>> +            return true;
>>>      }
>>>
>> [KS] Earlier for my question to combine edgeRecursion with
>> complexityCheck, you mentioned that - Homogeneity and variance are two
>> different metrics and they don't do the same functionality and so you
>> didn't combine. I would like to understand the reasoning behind combining
>> them in this patch.
>>
>>> -    homo = homo / (cuSize * cuSize);
>>> -
>>> -    if (homo < (.1 * mean))
>>> -        return true;
>>> -
>>> -    return false;
>>> +    else
>>> +        return false;
>>>
>> [KS] When does the encoder hit this final "else"?
>>
>>>  }
>>>
>>>  uint32_t Analysis::calculateCUVariance(const CUData& ctu, const CUGeom& cuGeom)
>>> @@ -3570,7 +3604,6 @@
>>>              cnt++;
>>>          }
>>>      }
>>> -
>>>      return cuVariance / cnt;
>>>  }
>>>
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/encoder/analysis.h
>>> --- a/source/encoder/analysis.h Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/encoder/analysis.h Wed Jan 29 12:19:07 2020 +0530
>>> @@ -52,7 +52,7 @@
>>>          splitRefs = 0;
>>>          mvCost[0] = 0; // L0
>>>          mvCost[1] = 0; // L1
>>> -        sa8dCost = 0;
>>> +        sa8dCost = 0;
>>>      }
>>>  };
>>>
>>> @@ -120,7 +120,6 @@
>>>
>>>      Mode& compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, const Entropy& initialContext);
>>>      int32_t loadTUDepth(CUGeom cuGeom, CUData parentCTU);
>>> -
>>>  protected:
>>>      /* Analysis data for save/load mode, writes/reads data based on absPartIdx */
>>>      x265_analysis_inter_data* m_reuseInterDataCTU;
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/encoder/encoder.cpp
>>> --- a/source/encoder/encoder.cpp Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/encoder/encoder.cpp Wed Jan 29 12:19:07 2020 +0530
>>> @@ -1343,9 +1343,9 @@
>>>      int32_t numBytes = m_param->sourceBitDepth > 8 ? 2 : 1;
>>>      memset(m_edgePic, 0, bufSize * numBytes);
>>>
>>> -    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height, pic->width, false))
>>> -    {
>>> -        x265_log(m_param, X265_LOG_ERROR, "Failed edge computation!");
>>> +    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height, pic->width, false, 1))
>>> +    {
>>> +        x265_log(m_param, X265_LOG_ERROR, "Failed to compute edge!");
>>>          return false;
>>>      }
>>>
>>> @@ -1660,6 +1660,12 @@
>>>                  }
>>>              }
>>>          }
>>> +        if (m_param->bEnableRecursionSkip >= EDGE_BASED_RSKIP && m_param->bHistBasedSceneCut)
>>> +        {
>>> +            pixel* src = m_edgePic;
>>> +            primitives.planecopy_pp_shr(src, inFrame->m_fencPic->m_picWidth, inFrame->m_edgeBitPic, inFrame->m_fencPic->m_stride,
>>> +                inFrame->m_fencPic->m_picWidth, inFrame->m_fencPic->m_picHeight, 0);
>>> +        }
>>>      }
>>>      else
>>>      {
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/encoder/frameencoder.cpp
>>> --- a/source/encoder/frameencoder.cpp Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/encoder/frameencoder.cpp Wed Jan 29 12:19:07 2020 +0530
>>> @@ -130,7 +130,7 @@
>>>          {
>>>              rowSum += sliceGroupSizeAccu;
>>>              m_sliceBaseRow[++sidx] = i;
>>> -        }
>>> +        }
>>>      }
>>>      X265_CHECK(sidx < m_param->maxSlices, "sliceID check failed!");
>>>      m_sliceBaseRow[0] = 0;
>>> @@ -268,6 +268,19 @@
>>>      curFrame->m_encData->m_jobProvider = this;
>>>      curFrame->m_encData->m_slice->m_mref = m_mref;
>>>
>>> +    if (!m_param->bHistBasedSceneCut && m_param->rc.aqMode != X265_AQ_EDGE && m_param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +    {
>>> +        int height = curFrame->m_fencPic->m_picHeight;
>>> +        int width = curFrame->m_fencPic->m_picWidth;
>>> +        intptr_t stride = curFrame->m_fencPic->m_stride;
>>> +
>>> +        if (!computeEdge(curFrame->m_edgeBitPic, curFrame->m_fencPic->m_picOrg[0], NULL, stride, height, width, false, 1))
>>> +        {
>>> +            x265_log(m_param, X265_LOG_ERROR, " Failed to compute edge !");
>>> +            return false;
>>> +        }
>>> +    }
>>> +
>>>      if (!m_cuGeoms)
>>>      {
>>>          if (!initializeGeoms())
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/encoder/slicetype.cpp
>>> --- a/source/encoder/slicetype.cpp Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/encoder/slicetype.cpp Wed Jan 29 12:19:07 2020 +0530
>>> @@ -87,7 +87,7 @@
>>>
>>>  namespace X265_NS {
>>>
>>> -bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta)
>>> +bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel)
>>>  {
>>>      intptr_t rowOne = 0, rowTwo = 0, rowThree = 0, colOne = 0, colTwo = 0, colThree = 0;
>>>      intptr_t middle = 0, topLeft = 0, topRight = 0, bottomLeft = 0, bottomRight = 0;
>>> @@ -141,7 +141,7 @@
>>>                      theta = 180 + theta;
>>>                  edgeTheta[middle] = (pixel)theta;
>>>              }
>>> -            edgePic[middle] = (pixel)(gradientMagnitude >= edgeThreshold ? edgeThreshold : blackPixel);
>>> +            edgePic[middle] = (pixel)(gradientMagnitude >= EDGE_THRESHOLD ? whitePixel : blackPixel);
>>>          }
>>>      }
>>>      return true;
>>> @@ -519,6 +519,13 @@
>>>      if (param->rc.aqMode == X265_AQ_EDGE)
>>>          edgeFilter(curFrame, param);
>>>
>>> +    if (param->rc.aqMode == X265_AQ_EDGE && !param->bHistBasedSceneCut && param->bEnableRecursionSkip >= EDGE_BASED_RSKIP)
>>> +    {
>>> +        pixel* src = curFrame->m_edgePic + curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride + curFrame->m_fencPic->m_lumaMarginX;
>>> +        primitives.planecopy_pp_shr(src, curFrame->m_fencPic->m_stride, curFrame->m_edgeBitPic,
>>> +            curFrame->m_fencPic->m_stride, curFrame->m_fencPic->m_picWidth, curFrame->m_fencPic->m_picHeight, SHIFT_TO_BITPLANE);
>>> +    }
>>> +
>>>      if (param->rc.aqMode == X265_AQ_AUTO_VARIANCE || param->rc.aqMode == X265_AQ_AUTO_VARIANCE_BIASED || param->rc.aqMode == X265_AQ_EDGE)
>>>      {
>>>          double bit_depth_correction = 1.f / (1 << (2 * (X265_DEPTH - 8)));
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/encoder/slicetype.h
>>> --- a/source/encoder/slicetype.h Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/encoder/slicetype.h Wed Jan 29 12:19:07 2020 +0530
>>> @@ -44,9 +44,9 @@
>>>  #define EDGE_INCLINATION 45
>>>
>>>  #if HIGH_BIT_DEPTH
>>> -#define edgeThreshold 1023.0
>>> +#define EDGE_THRESHOLD 1023.0
>>>  #else
>>> -#define edgeThreshold 255.0
>>> +#define EDGE_THRESHOLD 255.0
>>>  #endif
>>>  #define PI 3.14159265
>>>
>>> @@ -101,7 +101,7 @@
>>>  protected:
>>>
>>>      uint32_t acEnergyCu(Frame* curFrame, uint32_t blockX, uint32_t blockY, int csp, uint32_t qgSize);
>>> -    uint32_t edgeDensityCu(Frame*curFrame, uint32_t &avgAngle, uint32_t blockX, uint32_t blockY, uint32_t qgSize);
>>> +    uint32_t edgeDensityCu(Frame* curFrame, uint32_t &avgAngle, uint32_t blockX, uint32_t blockY, uint32_t qgSize);
>>>      uint32_t lumaSumCu(Frame* curFrame, uint32_t blockX, uint32_t blockY, uint32_t qgSize);
>>>      uint32_t weightCostLuma(Lowres& fenc, Lowres& ref, WeightParam& wp);
>>>      bool allocWeightedRef(Lowres& fenc);
>>> @@ -265,7 +265,6 @@
>>>      CostEstimateGroup& operator=(const CostEstimateGroup&);
>>>  };
>>>
>>> -bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta);
>>> -
>>> +bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel = EDGE_THRESHOLD);
>>>  }
>>>  #endif // ifndef X265_SLICETYPE_H
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/test/regression-tests.txt
>>> --- a/source/test/regression-tests.txt Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/test/regression-tests.txt Wed Jan 29 12:19:07 2020 +0530
>>> @@ -162,7 +162,11 @@
>>>  sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02 --frame-dup --dup-threshold 60 --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000
>>>  sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02
>>>  sintel_trailer_2k_1920x1080_24.yuv, --preset ultrafast --hist-scenecut --hist-threshold 0.02
>>> -
>>> +crowd_run_1080p50.yuv, --preset faster --ctu 32 --rskip 2 --edge-threshold 5
>>> +crowd_run_1080p50.yuv, --preset fast --ctu 64 --rskip 2 --edge-threshold 5 --aq-mode 4
>>> +crowd_run_1080p50.yuv, --preset slow --ctu 32 --rskip 2 --edge-threshold 5 --hist-scenecut --hist-threshold 0.1
>>> +crowd_run_1080p50.yuv, --preset slower --ctu 16 --rskip 2 --edge-threshold 5 --hist-scenecut --hist-threshold 0.1 --aq-mode 4
>>> +
>>>  # Main12 intraCost overflow bug test
>>>  720p50_parkrun_ter.y4m,--preset medium
>>>
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/x265.h
>>> --- a/source/x265.h Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/x265.h Wed Jan 29 12:19:07 2020 +0530
>>> @@ -1255,9 +1255,9 @@
>>>       * skip blocks. Default is disabled */
>>>      int bEnableEarlySkip;
>>>
>>> -    /* Enable early CU size decisions to avoid recursing to higher depths.
>>> +    /* Enable early CU size decisions to avoid recursing to higher depths.
>>>       * Default is enabled */
>>> -    int bEnableRecursionSkip;
>>> +    int bEnableRecursionSkip;
>>>
>>>      /* Use a faster search method to find the best intra mode. Default is 0 */
>>>      int bEnableFastIntra;
>>> @@ -1857,7 +1857,7 @@
>>>      double edgeTransitionThreshold;
>>>
>>>      /* Enables histogram based scenecut detection algorithm to detect scenecuts. Default disabled */
>>> -    int bHistBasedSceneCut;
>>> +    int bHistBasedSceneCut;
>>>
>>>      /* Enable HME search ranges for L0, L1 and L2 respectively. */
>>>      int hmeRange[3];
>>> @@ -1874,7 +1874,7 @@
>>>       * analysis information stored in analysis-save. Higher the refine level higher
>>>       * the information stored. Default is 5 */
>>>      int analysisSaveReuseLevel;
>>> -
>>> +
>>>      /* A value between 1 and 10 (both inclusive) determines the level of
>>>       * analysis information reused in analysis-load. Higher the refine level higher
>>>       * the information reused. Default is 5 */
>>> @@ -1901,6 +1901,9 @@
>>>       * info is available from the corresponding analysis-save. */
>>>
>>>      int confWinBottomOffset;
>>> +
>>> +    /* Edge variance threshold for quad tree establishment.
 */
>>> +    float edgeThreshold;
>>>  } x265_param;
>>>
>>>  /* x265_param_alloc:
>>> diff -r fdbd4e4a2aff -r e9c8c0089bdd source/x265cli.h
>>> --- a/source/x265cli.h Sat Jan 25 18:08:03 2020 +0530
>>> +++ b/source/x265cli.h Wed Jan 29 12:19:07 2020 +0530
>>> @@ -105,8 +105,8 @@
>>>      { "amp", no_argument, NULL, 0 },
>>>      { "no-early-skip", no_argument, NULL, 0 },
>>>      { "early-skip", no_argument, NULL, 0 },
>>> -    { "no-rskip", no_argument, NULL, 0 },
>>> -    { "rskip", no_argument, NULL, 0 },
>>> +    { "rskip", required_argument, NULL, 0 },
>>> +    { "edge-threshold", required_argument, NULL, 0 },
>>>      { "no-fast-cbf", no_argument, NULL, 0 },
>>>      { "fast-cbf", no_argument, NULL, 0 },
>>>      { "no-tskip", no_argument, NULL, 0 },
>>> @@ -457,7 +457,9 @@
>>>      H0(" --[no-]ssim-rd Enable ssim rate distortion optimization, 0 to disable. Default %s\n", OPT(param->bSsimRd));
>>>      H0(" --[no-]rd-refine Enable QP based RD refinement for rd levels 5 and 6. Default %s\n", OPT(param->bEnableRdRefine));
>>>      H0(" --[no-]early-skip Enable early SKIP detection. Default %s\n", OPT(param->bEnableEarlySkip));
>>> -    H0(" --[no-]rskip Enable early exit from recursion. Default %s\n", OPT(param->bEnableRecursionSkip));
>>> +    H0(" --rskip <mode> Set mode for early exit from recursion. Mode 1: exit using rdcost. Mode 2: exit using edge density. Mode 3: exit using edge density with forceful skip for small sized CU's."
>>> +       " Mode 0: disabled. Default %s\n", OPT(param->bEnableRecursionSkip));
>>> +    H1(" --edge-threshold Threshold in terms of percentage for minimum edge density in CUs to terminate the recursion depth. Applicable only for rskip modes 2 and 3. Default %s\n", OPT(param->edgeThreshold));
>>>      H1(" --[no-]tskip-fast Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
>>>      H1(" --[no-]splitrd-skip Enable skipping split RD analysis when sum of split CU rdCost larger than one split CU rdCost for Intra CU.
 Default %s\n", OPT(param->bEnableSplitRdSkip));
>>>      H1(" --nr-intra <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
>>> _______________________________________________
>>> x265-devel mailing list
>>> x265-devel@videolan.org
>>> https://mailman.videolan.org/listinfo/x265-devel
>>>
>>
>>
>> --
>> Regards,
>> Kavitha
>
>
> --
> *With Regards,*
> *Srikanth Kurapati.*

--
Regards,
Kavitha