On Mon, Jan 20, 2020 at 12:12 PM Srikanth Kurapati <
srikanth.kurap...@multicorewareinc.com> wrote:

> Patch does not apply on current x265 tip, please fix.
> [Srikanth] - resolved
> I am assuming the following will be sent as follow up patches
> 1. Edge aware quadtree feature for Rd levels 5, 6
> 2. Asm version of planecopy used for this feature
> [Srikanth] - Yes, they will be addressed in follow up patches.
> Check comments below
>
> On Fri, Jan 10, 2020 at 5:46 PM <srikanth.kurap...@multicorewareinc.com>
> wrote:
>
>> # HG changeset patch
>> # User Srikanth Kurapati
>> # Date 1578656713 -19800
>> #      Fri Jan 10 17:15:13 2020 +0530
>> # Node ID 82a92c26b4429327c9038d822e02ad6c0de290d4
>> # Parent  6b348d5b56d86ddfc3874d0f50f1283edab5fb4f
>> Edge Aware Quad Tree Establishment.
>>
>> This patch does the following:
>> 1. Terminates recursion using edge information.
>> 2. Adds modes for "--rskip". Modes 0,1 for current usage and 2 for edge
>> based
>> rskip for RD levels 0 to 4.
>> 3. Adds option "edge-threshold" to decide recursion skip using CU edge
>> density.
>> 4. Re uses edge information when already available in encoder.
>>
>> diff -r 6b348d5b56d8 -r 82a92c26b442 doc/reST/cli.rst
>> --- a/doc/reST/cli.rst  Fri Jan 10 14:38:32 2020 +0530
>> +++ b/doc/reST/cli.rst  Fri Jan 10 17:15:13 2020 +0530
>> @@ -842,15 +842,20 @@
>>         Measure 2Nx2N merge candidates first; if no residual is found,
>>         additional modes at that depth are not analysed. Default disabled
>>
>> -.. option:: --rskip, --no-rskip
>> -
>> -       This option determines early exit from CU depth recursion. When a
>> skip CU is
>> -       found, additional heuristics (depending on rd-level) are used to
>> decide whether
>> -       to terminate recursion. In rdlevels 5 and 6, comparison with
>> inter2Nx2N is used,
>> -       while at rdlevels 4 and neighbour costs are used to skip
>> recursion.
>> -       Provides minimal quality degradation at good performance gains
>> when enabled.
>> -
>> -       Default: enabled, disabled for :option:`--tune grain`
>> +.. option:: --rskip <0|1|2>
>> +
>> +       This option determines early exit from CU depth recursion when
>> enabled. When a skip CU is
>> +       found, additional heuristics (depending on RD level and rskip
>> mode) are used to decide whether
>> +       to terminate recursion. In RD levels 5 and 6, comparison with
>> inter2Nx2N is used,
>> +       while at RD levels 4 and below, neighbour costs are used to skip
>> recursion in mode 1, and CU edge density in mode 2.
>> +       Provides minimal quality degradation at good performance gains
>> when enabled. :option:`--r-skip mode 0` means disabled.
>> +
>> +       Default: 1, disabled when :option:`--tune grain` is used.
>> +
>> +.. option:: --edge-threshold <0..100>
>> +
>> +       Denotes the minimum edge-density percentage (computed as
>> variance) within the CU, below which the recursion is skipped.
>> +       Default: 5, requires :option:`--rskip mode 2` to be enabled.
>>
> [KS] I don't think it's necessary to talk about variance here. And again
> it is "minimum expected" (still not clear)
> [Srikanth] Addressed and Resolved.
>
>
>>  .. option:: --splitrd-skip, --no-splitrd-skip
>>
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/CMakeLists.txt
>> --- a/source/CMakeLists.txt     Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/CMakeLists.txt     Fri Jan 10 17:15:13 2020 +0530
>> @@ -29,7 +29,7 @@
>>  option(STATIC_LINK_CRT "Statically link C runtime for release builds"
>> OFF)
>>  mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
>>  # X265_BUILD must be incremented each time the public API is changed
>> -set(X265_BUILD 186)
>> +set(X265_BUILD 187)
>>  configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
>>                 "${PROJECT_BINARY_DIR}/x265.def")
>>  configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/common.h
>> --- a/source/common/common.h    Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/common/common.h    Fri Jan 10 17:15:13 2020 +0530
>> @@ -129,6 +129,7 @@
>>  typedef uint64_t sum2_t;
>>  typedef uint64_t pixel4;
>>  typedef int64_t  ssum2_t;
>> +#define SHIFT_TO_BITPLANE 9
>>  #define HISTOGRAM_BINS 1024
>>  #define SHIFT 1
>>  #else
>> @@ -137,6 +138,7 @@
>>  typedef uint32_t sum2_t;
>>  typedef uint32_t pixel4;
>>  typedef int32_t  ssum2_t; // Signed sum
>> +#define SHIFT_TO_BITPLANE 7
>>  #define HISTOGRAM_BINS 256
>>  #define SHIFT 0
>>  #endif // if HIGH_BIT_DEPTH
>> @@ -272,6 +274,9 @@
>>  #define MAX_TR_SIZE (1 << MAX_LOG2_TR_SIZE)
>>  #define MAX_TS_SIZE (1 << MAX_LOG2_TS_SIZE)
>>
>> +#define RDCOST_BASED_RSKIP 1
>> +#define EDGE_BASED_RSKIP 2
>> +
>>  #define COEF_REMAIN_BIN_REDUCTION   3 // indicates the level at which
>> the VLC
>>                                        // transitions from Golomb-Rice to
>> TU+EG(k)
>>
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/frame.cpp
>> --- a/source/common/frame.cpp   Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/common/frame.cpp   Fri Jan 10 17:15:13 2020 +0530
>> @@ -61,6 +61,7 @@
>>      m_edgePic = NULL;
>>      m_gaussianPic = NULL;
>>      m_thetaPic = NULL;
>> +    m_edgeBitPlane = NULL;
>>  }
>>
>>  bool Frame::create(x265_param *param, float* quantOffsets)
>> @@ -115,6 +116,18 @@
>>          m_thetaPic = X265_MALLOC(pixel, m_stride * (maxHeight +
>> (m_lumaMarginY * 2)));
>>      }
>>
>> +    if (param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>> +    {
>> +        uint32_t numCuInWidth = (param->sourceWidth + param->maxCUSize -
>> 1) / param->maxCUSize;
>> +        uint32_t numCuInHeight = (param->sourceHeight + param->maxCUSize
>> - 1) / param->maxCUSize;
>> +        uint32_t lumaMarginX = param->maxCUSize + 32;
>> +        uint32_t lumaMarginY = param->maxCUSize + 16;
>> +        uint32_t stride = (numCuInWidth * param->maxCUSize) +
>> (lumaMarginX << 1);
>> +        uint32_t maxHeight = numCuInHeight * param->maxCUSize;
>> +        m_bitPlaneSize = stride * (maxHeight + (lumaMarginY * 2));
>> +        CHECKED_MALLOC_ZERO(m_edgeBitPlane, pixel, m_bitPlaneSize);
>> +    }
>> +
>>      if (m_fencPic->create(param, !!m_param->bCopyPicToFrame) &&
>> m_lowres.create(param, m_fencPic, param->rc.qgSize))
>>      {
>>          X265_CHECK((m_reconColCount == NULL), "m_reconColCount was
>> initialized");
>> @@ -267,4 +280,9 @@
>>          X265_FREE(m_gaussianPic);
>>          X265_FREE(m_thetaPic);
>>      }
>> +
>> +    if (m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>> +    {
>> +        X265_FREE(m_edgeBitPlane);
>> +    }
>>  }
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/frame.h
>> --- a/source/common/frame.h     Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/common/frame.h     Fri Jan 10 17:15:13 2020 +0530
>> @@ -137,6 +137,8 @@
>>      pixel*                 m_gaussianPic;
>>      pixel*                 m_thetaPic;
>>
>> +    pixel*                 m_edgeBitPlane;
>> +    uint32_t               m_bitPlaneSize;
>>      Frame();
>>
>>      bool create(x265_param *param, float* quantOffsets);
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/param.cpp
>> --- a/source/common/param.cpp   Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/common/param.cpp   Fri Jan 10 17:15:13 2020 +0530
>> @@ -199,6 +199,7 @@
>>      param->bEnableWeightedBiPred = 0;
>>      param->bEnableEarlySkip = 1;
>>      param->bEnableRecursionSkip = 1;
>> +    param->edgeThreshold = 0.05f;
>>      param->bEnableAMP = 0;
>>      param->bEnableRectInter = 0;
>>      param->rdLevel = 3;
>> @@ -696,7 +697,8 @@
>>      OPT("ref") p->maxNumReferences = atoi(value);
>>      OPT("fast-intra") p->bEnableFastIntra = atobool(value);
>>      OPT("early-skip") p->bEnableEarlySkip = atobool(value);
>> -    OPT("rskip") p->bEnableRecursionSkip = atobool(value);
>> +    OPT("rskip") p->bEnableRecursionSkip = atoi(value);
>> +    OPT("edge-threshold") p->edgeThreshold = atoi(value)/100.0f;
>>
> [KS] White space
> [Srikanth] Added.
> [KS] Edge threshold is applicable only for rskip 2, should this condition
> be checking that?
> [Srikanth] Yes, a failure in the derived range indicates a failure in the
> input range.
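
To illustrate the point about the derived range, here is a minimal standalone
sketch. It assumes only what the patch shows: the CLI value is an integer
percentage parsed with atoi() and divided by 100, and the stored fraction is
checked against [0, 1]. The helper names edgeThresholdFromPercent and
edgeThresholdInRange are hypothetical and are not part of x265.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not an x265 function): converts the CLI percentage
// string to the fraction stored in the parameters, as the patch does with
// atoi(value) / 100.0f.
float edgeThresholdFromPercent(const char* value)
{
    assert(value != nullptr);
    return std::atoi(value) / 100.0f;
}

// Hypothetical helper: the range check applied to the derived fraction.
// An input outside 0..100 always lands outside 0.0..1.0, so checking the
// derived value also rejects a bad input value.
bool edgeThresholdInRange(float fraction)
{
    return fraction >= 0.0f && fraction <= 1.0f;
}
```

For example, an input of "5" maps to 0.05 and passes, while "150" maps to 1.5
and "-1" maps to -0.01, both of which fail the check.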
>
>>      CHECK(param->bframes && param->bframes >= param->lookaheadDepth &&
>> !param->rc.bStatRead,
>>            "Lookahead depth must be greater than the max consecutive
>> bframe count");
>>      CHECK(param->bframes < 0,
>> @@ -1891,7 +1901,11 @@
>>      TOOLVAL(param->psyRdoq, "psy-rdoq=%.2lf");
>>      TOOLOPT(param->bEnableRdRefine, "rd-refine");
>>      TOOLOPT(param->bEnableEarlySkip, "early-skip");
>> -    TOOLOPT(param->bEnableRecursionSkip, "rskip");
>> +    TOOLVAL(param->bEnableRecursionSkip, "rskip mode=%d");
>> +    if (param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>> +    {
>> +        TOOLVAL(param->edgeThreshold, "rskip-threshold=%.2f");
>> +    }
>>
> [KS] Braces are optional for single line loop/conditional blocks. Good to
> follow to maintain consistent coding style
> [Srikanth]  Addressed.
> [KS] Regarding the question on topskip, you mentioned that it is applied on
> the lower levels of the CTU and increases the overall processing time. Do
> you mean that the topskip algo is compute intensive? If so, given that we
> do not restrict its computation in RD levels 0 to 4, why is the processing
> time increasing? Can you clarify?
> [Srikanth] In the worst case of a complete coding unit tree (when the
> frames are extremely edgy) we will have to compute 4^d more variances for
> every parent node. This leads to negligible gain in fps over the existing
> method.
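
For reference, the 4^d growth mentioned above can be sketched as follows. This
is only a counting illustration under the assumption of a complete quadtree;
the function names cusAtDepth and totalCus are hypothetical and do not exist
in x265.

```cpp
#include <cassert>
#include <cstdint>

// Number of CUs at a given depth of a complete quadtree: 4^depth.
uint64_t cusAtDepth(unsigned depth)
{
    assert(depth < 32); // keeps the shift below 64 bits
    return uint64_t(1) << (2 * depth);
}

// Total number of CUs from depth 0 through maxDepth, i.e. the number of
// per-CU variance computations if every node of the tree is visited.
uint64_t totalCus(unsigned maxDepth)
{
    uint64_t total = 0;
    for (unsigned d = 0; d <= maxDepth; d++)
        total += cusAtDepth(d);
    return total;
}
```

With a 64x64 CTU split down to 8x8 (depth 3), a complete tree has
1 + 4 + 16 + 64 = 85 nodes.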
>
> +                else if (cuGeom.log2CUSize >= 5 &&
>> m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>
> [KS] Don't use numbers
>
>> +                {
>> +                    skipRecursion = edgeRecursionSkip(parentCTU,
>> cuGeom.log2CUSize);
>> +                }
>>
> [KS] As discussed, are we planning to let users do both rskip 1 and 2 (if
> reqd), given that both algorithms are orthogonal? Can that be expected in
> the subsequent patch?
> [Srikanth] We can expect that only if the combination experiment gives
> good results.
>
>>              }
>>          }
>> +
>>          if (m_param->bAnalysisType == AVC_INFO && md.bestMode &&
>> cuGeom.numPartitions <= 16 && m_param->analysisReuseLevel == 7)
>>              skipRecursion = true;
>>          /* Step 2. Evaluate each of the 4 split sub-blocks in series */
>> @@ -3543,6 +3551,25 @@
>>      return false;
>>  }
>>
>> +bool Analysis::edgeRecursionSkip(const CUData& ctu, int log2CUSize)
>> +{
>> +    int blockType = log2CUSize - 2;
>> +    int shift = log2CUSize * 2;
>> +    intptr_t stride = m_frame->m_fencPic->m_stride;
>> +    pixel* edgePic = m_frame->m_edgeBitPlane +
>> m_frame->m_fencPic->m_lumaMarginY * m_frame->m_fencPic->m_stride +
>> m_frame->m_fencPic->m_lumaMarginX;
>> +    intptr_t blockOffsetLuma = ctu.m_cuPelX + ctu.m_cuPelY * stride;
>>
> [KS] Are you sure the blockOffset computation is correct? pelx and
> pely will give incorrect values. Did you debug and verify your logic?
> [Srikanth] Yes. ctu.pelx and ctu.pely values are always multiples of max
> cu & max cu -1 which serves the requirement of the algorithm.
>
[KS] That is not the only requirement. pelx and pely provide the starting
offset of the buffer for which the variance should be computed. parentCTU's
pelx and pely do not return the correct x,y coordinates of the CU
undergoing analysis.
Once again, I insist that you debug this, understand how pelx and pely are
computed, and fix it before sending the next updated patch.
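
To make the concern above concrete, here is a minimal standalone sketch of the
offset arithmetic. It is not x265 code: BlockPos and planeOffset are
hypothetical names, and how the CU's position inside the CTU is obtained is
left as an input (I believe x265 derives it from cuGeom, but treat that as an
assumption to verify while debugging).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a pixel position in the picture.
struct BlockPos
{
    uint32_t x;
    uint32_t y;
};

// Offset into a plane for the CU under analysis. The CTU origin alone only
// addresses the top-left corner of the CTU; the CU's own position inside
// the CTU has to be added before applying the stride.
std::ptrdiff_t planeOffset(BlockPos ctuOrigin, BlockPos cuOffsetInCtu, std::ptrdiff_t stride)
{
    assert(stride > 0);
    std::ptrdiff_t x = std::ptrdiff_t(ctuOrigin.x) + cuOffsetInCtu.x;
    std::ptrdiff_t y = std::ptrdiff_t(ctuOrigin.y) + cuOffsetInCtu.y;
    return x + y * stride;
}
```

For a CTU at (64, 64) and a 32x32 CU located at (32, 32) inside it, the block
starts at (96, 96), not at (64, 64). Using only the CTU origin would compute
the variance over the wrong block for every CU except the first one in the
CTU.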

>
> +    uint64_t sum_ss = primitives.cu[blockType].var(edgePic +
>> blockOffsetLuma, stride);
>> +    uint32_t sum = (uint32_t)sum_ss;
>> +    uint32_t ss = (uint32_t)(sum_ss >> 32);
>> +    uint32_t pixelCount = 1 << shift;
>> +    double cuEdgeVariance = (ss - ((double)sum * sum / pixelCount)) /
>> pixelCount;
>> +
>> +    if (cuEdgeVariance > (double)m_param->edgeThreshold)
>> +        return false;
>> +    else
>> +        return true;
>> +}
>> +
>>  uint32_t Analysis::calculateCUVariance(const CUData& ctu, const CUGeom&
>> cuGeom)
>>  {
>>      uint32_t cuVariance = 0;
>> @@ -3566,7 +3593,6 @@
>>              cnt++;
>>          }
>>      }
>> -
>>      return cuVariance / cnt;
>>  }
>>
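
A note on what the variance in edgeRecursionSkip measures, as a hedged sketch.
It assumes the bit plane holds only 0 and 1 (which is what computeEdge with
whitePixel = 1, or the shift by SHIFT_TO_BITPLANE, appears to produce) and
that the var primitive packs the sum in the low 32 bits and the sum of squares
in the high 32 bits, as the patch unpacks it. packSumSs and blockVariance are
hypothetical helper names, not x265 functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs sum (low 32 bits) and sum of squares (high
// 32 bits) the way the patch expects to unpack the var primitive result.
uint64_t packSumSs(uint32_t sum, uint32_t ss)
{
    return (uint64_t(ss) << 32) | sum;
}

// Same arithmetic as in the patch: variance = (ss - sum^2 / n) / n.
double blockVariance(uint64_t sumSs, uint32_t pixelCount)
{
    assert(pixelCount > 0);
    uint32_t sum = (uint32_t)sumSs;
    uint32_t ss = (uint32_t)(sumSs >> 32);
    return (ss - (double)sum * sum / pixelCount) / pixelCount;
}
```

On a 0/1 plane, sum equals ss equals the number of edge pixels, so with edge
density p = sum / n the variance reduces to p * (1 - p). A block with 4 edge
pixels out of 16 gives 0.25 * 0.75 = 0.1875. Note that under these
assumptions a block made entirely of edge pixels has zero variance, just like
a block with no edges, so the value compared against edgeThreshold is not the
edge density itself.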
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/analysis.h
>> --- a/source/encoder/analysis.h Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/encoder/analysis.h Fri Jan 10 17:15:13 2020 +0530
>> @@ -52,7 +52,7 @@
>>          splitRefs = 0;
>>          mvCost[0] = 0; // L0
>>          mvCost[1] = 0; // L1
>> -        sa8dCost    = 0;
>> +        sa8dCost  = 0;
>>      }
>>  };
>>
>> @@ -120,7 +120,6 @@
>>
>>      Mode& compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom,
>> const Entropy& initialContext);
>>      int32_t loadTUDepth(CUGeom cuGeom, CUData parentCTU);
>> -
>>  protected:
>>      /* Analysis data for save/load mode, writes/reads data based on
>> absPartIdx */
>>      x265_analysis_inter_data*  m_reuseInterDataCTU;
>> @@ -192,6 +191,7 @@
>>      uint32_t topSkipMinDepth(const CUData& parentCTU, const CUGeom&
>> cuGeom);
>>      bool recursionDepthCheck(const CUData& parentCTU, const CUGeom&
>> cuGeom, const Mode& bestMode);
>>      bool complexityCheckCU(const Mode& bestMode);
>> +    bool edgeRecursionSkip(const CUData& parentCTU, int32_t log2CuSize);
>>
>>      /* generate residual and recon pixels for an entire CTU recursively
>> (RD0) */
>>      void encodeResidue(const CUData& parentCTU, const CUGeom& cuGeom);
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/encoder.cpp
>> --- a/source/encoder/encoder.cpp        Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/encoder/encoder.cpp        Fri Jan 10 17:15:13 2020 +0530
>> @@ -1351,9 +1351,9 @@
>>      int32_t numBytes = m_param->sourceBitDepth > 8 ? 2 : 1;
>>      memset(m_edgePic, 0, bufSize * numBytes);
>>
>> -    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height,
>> pic->width, false))
>> -    {
>> -        x265_log(m_param, X265_LOG_ERROR, "Failed edge computation!");
>> +    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height,
>> pic->width, false, 1))
>> +    {
>> +        x265_log(m_param, X265_LOG_ERROR, "Failed to compute edge!");
>>          return false;
>>      }
>>
>> @@ -1668,6 +1668,13 @@
>>                          }
>>                      }
>>                  }
>> +                if (m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP &&
>> m_param->bHistBasedSceneCut)
>> +                {
>> +                    pixel* src = m_edgePic;
>> +                    pixel* edgePic = inFrame->m_edgeBitPlane +
>> inFrame->m_fencPic->m_lumaMarginY * inFrame->m_fencPic->m_stride +
>> inFrame->m_fencPic->m_lumaMarginX;
>> +                    primitives.planecopy_pp_shr(src,
>> inFrame->m_fencPic->m_picWidth, edgePic, inFrame->m_fencPic->m_stride,
>> +                        inFrame->m_fencPic->m_picWidth,
>> inFrame->m_fencPic->m_picHeight, 0);
>> +                }
>>              }
>>              else
>>              {
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/frameencoder.cpp
>> --- a/source/encoder/frameencoder.cpp   Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/encoder/frameencoder.cpp   Fri Jan 10 17:15:13 2020 +0530
>> @@ -130,7 +130,7 @@
>>          {
>>              rowSum += sliceGroupSizeAccu;
>>              m_sliceBaseRow[++sidx] = i;
>> -        }
>> +        }
>>      }
>>      X265_CHECK(sidx < m_param->maxSlices, "sliceID check failed!");
>>      m_sliceBaseRow[0] = 0;
>> @@ -268,6 +268,20 @@
>>      curFrame->m_encData->m_jobProvider = this;
>>      curFrame->m_encData->m_slice->m_mref = m_mref;
>>
>> +    if (!m_param->bHistBasedSceneCut && m_param->rc.aqMode !=
>> X265_AQ_EDGE && m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>> +    {
>> +        int height = curFrame->m_fencPic->m_picHeight;
>> +        int width = curFrame->m_fencPic->m_picWidth;
>> +        intptr_t stride = curFrame->m_fencPic->m_stride;
>> +        pixel* edgePic = curFrame->m_edgeBitPlane +
>> curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride +
>> curFrame->m_fencPic->m_lumaMarginX;
>> +
>> +        if (!computeEdge(edgePic, curFrame->m_fencPic->m_picOrg[0],
>> NULL, stride, height, width, false, 1))
>> +        {
>> +            x265_log(m_param, X265_LOG_ERROR, " Failed to compute edge
>> !");
>> +            return false;
>> +        }
>> +    }
>> +
>>      if (!m_cuGeoms)
>>      {
>>          if (!initializeGeoms())
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/slicetype.cpp
>> --- a/source/encoder/slicetype.cpp      Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/encoder/slicetype.cpp      Fri Jan 10 17:15:13 2020 +0530
>> @@ -87,7 +87,7 @@
>>
>>  namespace X265_NS {
>>
>> -bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta,
>> intptr_t stride, int height, int width, bool bcalcTheta)
>> +bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta,
>> intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel)
>>  {
>>      intptr_t rowOne = 0, rowTwo = 0, rowThree = 0, colOne = 0, colTwo =
>> 0, colThree = 0;
>>      intptr_t middle = 0, topLeft = 0, topRight = 0, bottomLeft = 0,
>> bottomRight = 0;
>> @@ -141,7 +141,7 @@
>>                         theta = 180 + theta;
>>                      edgeTheta[middle] = (pixel)theta;
>>                  }
>> -                edgePic[middle] = (pixel)(gradientMagnitude >=
>> edgeThreshold ? edgeThreshold : blackPixel);
>> +                edgePic[middle] = (pixel)(gradientMagnitude >=
>> EDGE_THRESHOLD ? whitePixel : blackPixel);
>>              }
>>          }
>>          return true;
>> @@ -519,6 +519,14 @@
>>                  if (param->rc.aqMode == X265_AQ_EDGE)
>>                      edgeFilter(curFrame, param);
>>
>> +                if (param->rc.aqMode == X265_AQ_EDGE &&
>> !param->bHistBasedSceneCut && param->bEnableRecursionSkip ==
>> EDGE_BASED_RSKIP)
>> +                {
>> +                    pixel* src = curFrame->m_edgePic +
>> curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride +
>> curFrame->m_fencPic->m_lumaMarginX;
>> +                    pixel* dst = curFrame->m_edgeBitPlane +
>> curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride +
>> curFrame->m_fencPic->m_lumaMarginX;
>> +                    primitives.planecopy_pp_shr(src,
>> curFrame->m_fencPic->m_stride, dst,
>> +                        curFrame->m_fencPic->m_stride,
>> curFrame->m_fencPic->m_picWidth, curFrame->m_fencPic->m_picHeight,
>> SHIFT_TO_BITPLANE);
>> +                }
>> +
>>                  if (param->rc.aqMode == X265_AQ_AUTO_VARIANCE ||
>> param->rc.aqMode == X265_AQ_AUTO_VARIANCE_BIASED || param->rc.aqMode ==
>> X265_AQ_EDGE)
>>                  {
>>                      double bit_depth_correction = 1.f / (1 << (2 *
>> (X265_DEPTH - 8)));
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/slicetype.h
>> --- a/source/encoder/slicetype.h        Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/encoder/slicetype.h        Fri Jan 10 17:15:13 2020 +0530
>> @@ -44,9 +44,9 @@
>>  #define EDGE_INCLINATION 45
>>
>>  #if HIGH_BIT_DEPTH
>> -#define edgeThreshold 1023.0
>> +#define EDGE_THRESHOLD 1023.0
>>  #else
>> -#define edgeThreshold 255.0
>> +#define EDGE_THRESHOLD 255.0
>>  #endif
>>  #define PI 3.14159265
>>
>> @@ -101,7 +101,7 @@
>>  protected:
>>
>>      uint32_t acEnergyCu(Frame* curFrame, uint32_t blockX, uint32_t
>> blockY, int csp, uint32_t qgSize);
>> -    uint32_t edgeDensityCu(Frame*curFrame, uint32_t &avgAngle, uint32_t
>> blockX, uint32_t blockY, uint32_t qgSize);
>> +    uint32_t edgeDensityCu(Frame* curFrame, uint32_t &avgAngle, uint32_t
>> blockX, uint32_t blockY, uint32_t qgSize);
>>      uint32_t lumaSumCu(Frame* curFrame, uint32_t blockX, uint32_t
>> blockY, uint32_t qgSize);
>>      uint32_t weightCostLuma(Lowres& fenc, Lowres& ref, WeightParam& wp);
>>      bool     allocWeightedRef(Lowres& fenc);
>> @@ -265,7 +265,6 @@
>>      CostEstimateGroup& operator=(const CostEstimateGroup&);
>>  };
>>
>> -bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta,
>> intptr_t stride, int height, int width, bool bcalcTheta);
>> -
>> +bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta,
>> intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel =
>> EDGE_THRESHOLD);
>>  }
>>  #endif // ifndef X265_SLICETYPE_H
>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/test/regression-tests.txt
>> --- a/source/test/regression-tests.txt  Fri Jan 10 14:38:32 2020 +0530
>> +++ b/source/test/regression-tests.txt  Fri Jan 10 17:15:13 2020 +0530
>> @@ -161,6 +161,11 @@
>>  Island_960x540_24.yuv,--no-cutree --aq-mode 0 --bitrate 6000
>> --scenecut-aware-qp
>>  sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut
>> --hist-threshold 0.02 --frame-dup --dup-threshold 60 --hrd --bitrate 10000
>> --vbv-bufsize 15000 --vbv-maxrate 12000
>>  sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut
>> --hist-threshold 0.02
>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5
>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5 --aq-mode 4
>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5 --hist-scenecut
>> --hist-threshold 0.1
>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5 --hist-scenecut
>> --hist-threshold 0.1 --aq-mode 4
>> +
>>
> [KS] Test presets with different maxCU size
> [Srikanth] Addressed.
> [KS] This is no longer on/off CLI to enable/disable (Previous review
> comment not addressed)
> [Srikanth] addressed.
>
>> +    H1("   --edge-threshold              Threshold in terms of
>> percentage for edge density in CUs to terminate the recursion depth.
>> Applicable only for rskip mode 2. Default %s\n", OPT(param->edgeThreshold));
>>      H1("   --[no-]tskip-fast             Enable fast intra transform
>> skipping. Default %s\n", OPT(param->bEnableTSkipFast));
>>      H1("   --[no-]splitrd-skip           Enable skipping split RD
>> analysis when sum of split CU rdCost larger than one split CU rdCost for
>> Intra CU. Default %s\n", OPT(param->bEnableSplitRdSkip));
>>      H1("   --nr-intra <integer>          An integer value in range of 0
>> to 2000, which denotes strength of noise reduction in intra CUs. Default
>> 0\n");
>
>
> On Mon, Jan 13, 2020 at 3:48 PM Kavitha Sampath <
> kavi...@multicorewareinc.com> wrote:
>
>> Patch does not apply on current x265 tip, please fix.
>>
>> I am assuming the following will be sent as follow up patches
>> 1. Edge aware quadtree feature for Rd levels 5, 6
>> 2. Asm version of planecopy used for this feature
>>
>> Check comments below
>>
>> On Fri, Jan 10, 2020 at 5:46 PM <srikanth.kurap...@multicorewareinc.com>
>> wrote:
>>
>>> # HG changeset patch
>>> # User Srikanth Kurapati
>>> # Date 1578656713 -19800
>>> #      Fri Jan 10 17:15:13 2020 +0530
>>> # Node ID 82a92c26b4429327c9038d822e02ad6c0de290d4
>>> # Parent  6b348d5b56d86ddfc3874d0f50f1283edab5fb4f
>>> Edge Aware Quad Tree Establishment.
>>>
>>> This patch does the following:
>>> 1. Terminates recursion using edge information.
>>> 2. Adds modes for "--rskip". Modes 0,1 for current usage and 2 for edge
>>> based
>>> rskip for RD levels 0 to 4.
>>> 3. Adds option "edge-threshold" to decide recursion skip using CU edge
>>> density.
>>> 4. Re uses edge information when already available in encoder.
>>>
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 doc/reST/cli.rst
>>> --- a/doc/reST/cli.rst  Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/doc/reST/cli.rst  Fri Jan 10 17:15:13 2020 +0530
>>> @@ -842,15 +842,20 @@
>>>         Measure 2Nx2N merge candidates first; if no residual is found,
>>>         additional modes at that depth are not analysed. Default disabled
>>>
>>> -.. option:: --rskip, --no-rskip
>>> -
>>> -       This option determines early exit from CU depth recursion. When
>>> a skip CU is
>>> -       found, additional heuristics (depending on rd-level) are used to
>>> decide whether
>>> -       to terminate recursion. In rdlevels 5 and 6, comparison with
>>> inter2Nx2N is used,
>>> -       while at rdlevels 4 and neighbour costs are used to skip
>>> recursion.
>>> -       Provides minimal quality degradation at good performance gains
>>> when enabled.
>>> -
>>> -       Default: enabled, disabled for :option:`--tune grain`
>>> +.. option:: --rskip <0|1|2>
>>> +
>>> +       This option determines early exit from CU depth recursion when
>>> enabled. When a skip CU is
>>> +       found, additional heuristics (depending on RD level and rskip
>>> mode) are used to decide whether
>>> +       to terminate recursion. In RD levels 5 and 6, comparison with
>>> inter2Nx2N is used,
>>> +       while at RD levels 4 and below, neighbour costs are used to skip
>>> recursion in mode 1, and CU edge density in mode 2.
>>> +       Provides minimal quality degradation at good performance gains
>>> when enabled. :option:`--r-skip mode 0` means disabled.
>>> +
>>> +       Default: 1, disabled when :option:`--tune grain` is used.
>>> +
>>> +.. option:: --edge-threshold <0..100>
>>> +
>>> +       Denotes the minimum edge-density percentage (computed as
>>> variance) within the CU, below which the recursion is skipped.
>>> +       Default: 5, requires :option:`--rskip mode 2` to be enabled.
>>>
>> [KS] I don't think it's necessary to talk about variance here. And again
>> it is "minimum expected" (still not clear)
>>
>>
>>>  .. option:: --splitrd-skip, --no-splitrd-skip
>>>
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/CMakeLists.txt
>>> --- a/source/CMakeLists.txt     Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/CMakeLists.txt     Fri Jan 10 17:15:13 2020 +0530
>>> @@ -29,7 +29,7 @@
>>>  option(STATIC_LINK_CRT "Statically link C runtime for release builds"
>>> OFF)
>>>  mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
>>>  # X265_BUILD must be incremented each time the public API is changed
>>> -set(X265_BUILD 186)
>>> +set(X265_BUILD 187)
>>>  configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
>>>                 "${PROJECT_BINARY_DIR}/x265.def")
>>>  configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/common.h
>>> --- a/source/common/common.h    Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/common/common.h    Fri Jan 10 17:15:13 2020 +0530
>>> @@ -129,6 +129,7 @@
>>>  typedef uint64_t sum2_t;
>>>  typedef uint64_t pixel4;
>>>  typedef int64_t  ssum2_t;
>>> +#define SHIFT_TO_BITPLANE 9
>>>  #define HISTOGRAM_BINS 1024
>>>  #define SHIFT 1
>>>  #else
>>> @@ -137,6 +138,7 @@
>>>  typedef uint32_t sum2_t;
>>>  typedef uint32_t pixel4;
>>>  typedef int32_t  ssum2_t; // Signed sum
>>> +#define SHIFT_TO_BITPLANE 7
>>>  #define HISTOGRAM_BINS 256
>>>  #define SHIFT 0
>>>  #endif // if HIGH_BIT_DEPTH
>>> @@ -272,6 +274,9 @@
>>>  #define MAX_TR_SIZE (1 << MAX_LOG2_TR_SIZE)
>>>  #define MAX_TS_SIZE (1 << MAX_LOG2_TS_SIZE)
>>>
>>> +#define RDCOST_BASED_RSKIP 1
>>> +#define EDGE_BASED_RSKIP 2
>>> +
>>>  #define COEF_REMAIN_BIN_REDUCTION   3 // indicates the level at which
>>> the VLC
>>>                                        // transitions from Golomb-Rice
>>> to TU+EG(k)
>>>
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/frame.cpp
>>> --- a/source/common/frame.cpp   Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/common/frame.cpp   Fri Jan 10 17:15:13 2020 +0530
>>> @@ -61,6 +61,7 @@
>>>      m_edgePic = NULL;
>>>      m_gaussianPic = NULL;
>>>      m_thetaPic = NULL;
>>> +    m_edgeBitPlane = NULL;
>>>  }
>>>
>>>  bool Frame::create(x265_param *param, float* quantOffsets)
>>> @@ -115,6 +116,18 @@
>>>          m_thetaPic = X265_MALLOC(pixel, m_stride * (maxHeight +
>>> (m_lumaMarginY * 2)));
>>>      }
>>>
>>> +    if (param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>>> +    {
>>> +        uint32_t numCuInWidth = (param->sourceWidth + param->maxCUSize
>>> - 1) / param->maxCUSize;
>>> +        uint32_t numCuInHeight = (param->sourceHeight +
>>> param->maxCUSize - 1) / param->maxCUSize;
>>> +        uint32_t lumaMarginX = param->maxCUSize + 32;
>>> +        uint32_t lumaMarginY = param->maxCUSize + 16;
>>> +        uint32_t stride = (numCuInWidth * param->maxCUSize) +
>>> (lumaMarginX << 1);
>>> +        uint32_t maxHeight = numCuInHeight * param->maxCUSize;
>>> +        m_bitPlaneSize = stride * (maxHeight + (lumaMarginY * 2));
>>> +        CHECKED_MALLOC_ZERO(m_edgeBitPlane, pixel, m_bitPlaneSize);
>>> +    }
>>> +
>>>      if (m_fencPic->create(param, !!m_param->bCopyPicToFrame) &&
>>> m_lowres.create(param, m_fencPic, param->rc.qgSize))
>>>      {
>>>          X265_CHECK((m_reconColCount == NULL), "m_reconColCount was
>>> initialized");
>>> @@ -267,4 +280,9 @@
>>>          X265_FREE(m_gaussianPic);
>>>          X265_FREE(m_thetaPic);
>>>      }
>>> +
>>> +    if (m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>>> +    {
>>> +        X265_FREE(m_edgeBitPlane);
>>> +    }
>>>  }
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/frame.h
>>> --- a/source/common/frame.h     Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/common/frame.h     Fri Jan 10 17:15:13 2020 +0530
>>> @@ -137,6 +137,8 @@
>>>      pixel*                 m_gaussianPic;
>>>      pixel*                 m_thetaPic;
>>>
>>> +    pixel*                 m_edgeBitPlane;
>>> +    uint32_t               m_bitPlaneSize;
>>>      Frame();
>>>
>>>      bool create(x265_param *param, float* quantOffsets);
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/param.cpp
>>> --- a/source/common/param.cpp   Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/common/param.cpp   Fri Jan 10 17:15:13 2020 +0530
>>> @@ -199,6 +199,7 @@
>>>      param->bEnableWeightedBiPred = 0;
>>>      param->bEnableEarlySkip = 1;
>>>      param->bEnableRecursionSkip = 1;
>>> +    param->edgeThreshold = 0.05f;
>>>      param->bEnableAMP = 0;
>>>      param->bEnableRectInter = 0;
>>>      param->rdLevel = 3;
>>> @@ -696,7 +697,8 @@
>>>      OPT("ref") p->maxNumReferences = atoi(value);
>>>      OPT("fast-intra") p->bEnableFastIntra = atobool(value);
>>>      OPT("early-skip") p->bEnableEarlySkip = atobool(value);
>>> -    OPT("rskip") p->bEnableRecursionSkip = atobool(value);
>>> +    OPT("rskip") p->bEnableRecursionSkip = atoi(value);
>>> +    OPT("edge-threshold") p->edgeThreshold = atoi(value)/100.0f;
>>>
>> [KS] White space
>>
>>>      OPT("me")p->searchMethod = parseName(value, x265_motion_est_names,
>>> bError);
>>>      OPT("subme") p->subpelRefine = atoi(value);
>>>      OPT("merange") p->searchRange = atoi(value);
>>> @@ -913,7 +915,7 @@
>>>      OPT("max-merge") p->maxNumMergeCand = (uint32_t)atoi(value);
>>>      OPT("temporal-mvp") p->bEnableTemporalMvp = atobool(value);
>>>      OPT("early-skip") p->bEnableEarlySkip = atobool(value);
>>> -    OPT("rskip") p->bEnableRecursionSkip = atobool(value);
>>> +    OPT("rskip") p->bEnableRecursionSkip = atoi(value);
>>>      OPT("rdpenalty") p->rdPenalty = atoi(value);
>>>      OPT("tskip") p->bEnableTransformSkip = atobool(value);
>>>      OPT("no-tskip-fast") p->bEnableTSkipFast = atobool(value);
>>> @@ -1215,6 +1217,7 @@
>>>              }
>>>          }
>>>          OPT("hist-threshold") p->edgeTransitionThreshold = atof(value);
>>> +        OPT("edge-threshold") p->edgeThreshold = atoi(value)/100.0f;
>>>          OPT("lookahead-threads") p->lookaheadThreads = atoi(value);
>>>          OPT("opt-cu-delta-qp") p->bOptCUDeltaQP = atobool(value);
>>>          OPT("multi-pass-opt-analysis") p->analysisMultiPassRefine =
>>> atobool(value);
>>> @@ -1583,9 +1586,16 @@
>>>      CHECK(param->rdLevel < 1 || param->rdLevel > 6,
>>>            "RD Level is out of range");
>>>      CHECK(param->rdoqLevel < 0 || param->rdoqLevel > 2,
>>> -        "RDOQ Level is out of range");
>>> +          "RDOQ Level is out of range");
>>>      CHECK(param->dynamicRd < 0 || param->dynamicRd >
>>> x265_ADAPT_RD_STRENGTH,
>>> -        "Dynamic RD strength must be between 0 and 4");
>>> +          "Dynamic RD strength must be between 0 and 4");
>>> +    CHECK(param->bEnableRecursionSkip > 2 ||
>>> param->bEnableRecursionSkip < 0,
>>> +          "Invalid Recursion skip mode. Valid modes 0,1,2");
>>> +    if (param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>>> +    {
>>> +        CHECK(param->edgeThreshold < 0.0f || param->edgeThreshold >
>>> 1.0f,
>>> +              "Minimum edge density percentage for a CU should be an
>>> integer between 0 to 100");
>>> +    }
>>>
>> [KS] Edge threshold is applicable only for rskip 2; should this condition
>> be checking for that?
>>
>>>      CHECK(param->bframes && param->bframes >= param->lookaheadDepth &&
>>> !param->rc.bStatRead,
>>>            "Lookahead depth must be greater than the max consecutive
>>> bframe count");
>>>      CHECK(param->bframes < 0,
>>> @@ -1891,7 +1901,11 @@
>>>      TOOLVAL(param->psyRdoq, "psy-rdoq=%.2lf");
>>>      TOOLOPT(param->bEnableRdRefine, "rd-refine");
>>>      TOOLOPT(param->bEnableEarlySkip, "early-skip");
>>> -    TOOLOPT(param->bEnableRecursionSkip, "rskip");
>>> +    TOOLVAL(param->bEnableRecursionSkip, "rskip mode=%d");
>>> +    if (param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>>> +    {
>>> +        TOOLVAL(param->edgeThreshold, "rskip-threshold=%.2f");
>>> +    }
>>>
>> [KS] Braces are optional for single-line loop/conditional blocks. Please
>> omit them to keep the coding style consistent.
>>
>>>      TOOLOPT(param->bEnableSplitRdSkip, "splitrd-skip");
>>>      TOOLVAL(param->noiseReductionIntra, "nr-intra=%d");
>>>      TOOLVAL(param->noiseReductionInter, "nr-inter=%d");
>>> @@ -2050,6 +2064,10 @@
>>>      s += sprintf(s, " selective-sao=%d", p->selectiveSAO);
>>>      BOOL(p->bEnableEarlySkip, "early-skip");
>>>      BOOL(p->bEnableRecursionSkip, "rskip");
>>> +    if (p->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>>> +    {
>>> +        s += sprintf(s, " edge-threshold=%f", p->edgeThreshold);
>>> +    }
>>>      BOOL(p->bEnableFastIntra, "fast-intra");
>>>      BOOL(p->bEnableTSkipFast, "tskip-fast");
>>>      BOOL(p->bCULossless, "cu-lossless");
>>> @@ -2354,6 +2372,7 @@
>>>      dst->rdLevel = src->rdLevel;
>>>      dst->bEnableEarlySkip = src->bEnableEarlySkip;
>>>      dst->bEnableRecursionSkip = src->bEnableRecursionSkip;
>>> +    dst->edgeThreshold = src->edgeThreshold;
>>>      dst->bEnableFastIntra = src->bEnableFastIntra;
>>>      dst->bEnableTSkipFast = src->bEnableTSkipFast;
>>>      dst->bCULossless = src->bCULossless;
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/pixel.cpp
>>> --- a/source/common/pixel.cpp   Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/common/pixel.cpp   Fri Jan 10 17:15:13 2020 +0530
>>> @@ -876,6 +876,18 @@
>>>      }
>>>  }
>>>
>>> +static void planecopy_pp_shr_c(const pixel* src, intptr_t srcStride,
>>> pixel* dst, intptr_t dstStride, int width, int height, int shift)
>>> +{
>>> +    for (int r = 0; r < height; r++)
>>> +    {
>>> +        for (int c = 0; c < width; c++)
>>> +            dst[c] = (pixel)((src[c] >> shift));
>>> +
>>> +        dst += dstStride;
>>> +        src += srcStride;
>>> +    }
>>> +}
>>> +
>>>  static void planecopy_sp_shl_c(const uint16_t* src, intptr_t srcStride,
>>> pixel* dst, intptr_t dstStride, int width, int height, int shift, uint16_t
>>> mask)
>>>  {
>>>      for (int r = 0; r < height; r++)
>>> @@ -1316,6 +1328,7 @@
>>>      p.planecopy_cp = planecopy_cp_c;
>>>      p.planecopy_sp = planecopy_sp_c;
>>>      p.planecopy_sp_shl = planecopy_sp_shl_c;
>>> +    p.planecopy_pp_shr = planecopy_pp_shr_c;
>>>  #if HIGH_BIT_DEPTH
>>>      p.planeClipAndMax = planeClipAndMax_c;
>>>  #endif
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/common/primitives.h
>>> --- a/source/common/primitives.h        Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/common/primitives.h        Fri Jan 10 17:15:13 2020 +0530
>>> @@ -204,6 +204,7 @@
>>>  typedef void (*sign_t)(int8_t *dst, const pixel *src1, const pixel
>>> *src2, const int endX);
>>>  typedef void (*planecopy_cp_t) (const uint8_t* src, intptr_t srcStride,
>>> pixel* dst, intptr_t dstStride, int width, int height, int shift);
>>>  typedef void (*planecopy_sp_t) (const uint16_t* src, intptr_t
>>> srcStride, pixel* dst, intptr_t dstStride, int width, int height, int
>>> shift, uint16_t mask);
>>> +typedef void (*planecopy_pp_t) (const pixel* src, intptr_t srcStride,
>>> pixel* dst, intptr_t dstStride, int width, int height, int shift);
>>>  typedef pixel (*planeClipAndMax_t)(pixel *src, intptr_t stride, int
>>> width, int height, uint64_t *outsum, const pixel minPix, const pixel
>>> maxPix);
>>>
>>>  typedef void (*cutree_propagate_cost) (int* dst, const uint16_t*
>>> propagateIn, const int32_t* intraCosts, const uint16_t* interCosts, const
>>> int32_t* invQscales, const double* fpsFactor, int len);
>>> @@ -358,6 +359,7 @@
>>>      planecopy_cp_t        planecopy_cp;
>>>      planecopy_sp_t        planecopy_sp;
>>>      planecopy_sp_t        planecopy_sp_shl;
>>> +    planecopy_pp_t        planecopy_pp_shr;
>>>      planeClipAndMax_t     planeClipAndMax;
>>>
>>>      weightp_sp_t          weight_sp;
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/analysis.cpp
>>> --- a/source/encoder/analysis.cpp       Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/encoder/analysis.cpp       Fri Jan 10 17:15:13 2020 +0530
>>> @@ -1313,14 +1313,22 @@
>>>          if (md.bestMode && m_param->bEnableRecursionSkip &&
>>> !bCtuInfoCheck && !(m_param->bAnalysisType == AVC_INFO &&
>>> m_param->analysisReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1])))
>>>          {
>>>              skipRecursion = md.bestMode->cu.isSkipped(0);
>>> -            if (mightSplit && depth >= minDepth && !skipRecursion)
>>> +            if (mightSplit && !skipRecursion)
>>>              {
>>> -                if (depth)
>>> -                    skipRecursion = recursionDepthCheck(parentCTU,
>>> cuGeom, *md.bestMode);
>>> -                if (m_bHD && !skipRecursion && m_param->rdLevel == 2 &&
>>> md.fencYuv.m_size != MAX_CU_SIZE)
>>> -                    skipRecursion = complexityCheckCU(*md.bestMode);
>>> +                if (depth >= minDepth && m_param->bEnableRecursionSkip
>>> == RDCOST_BASED_RSKIP)
>>> +                {
>>> +                    if (depth)
>>> +                        skipRecursion = recursionDepthCheck(parentCTU,
>>> cuGeom, *md.bestMode);
>>> +                    if (m_bHD && !skipRecursion && m_param->rdLevel ==
>>> 2 && md.fencYuv.m_size != MAX_CU_SIZE)
>>> +                        skipRecursion = complexityCheckCU(*md.bestMode);
>>> +                }
>>>
>> [KS] Regarding the question on topskip, you mentioned that it is applied
>> at the lower levels of the CTU and increases the overall processing time.
>> Do you mean that the topskip algorithm is compute-intensive? If so, given
>> that we do not restrict its computation in rd 0 to 4, why is the
>> processing time increasing? Can you clarify?
>>
>>> +                else if (cuGeom.log2CUSize >= 5 &&
>>> m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>>
>> [KS] Don't use hard-coded numbers (the literal 5 here).
>>
>>> +                {
>>> +                    skipRecursion = edgeRecursionSkip(parentCTU,
>>> cuGeom.log2CUSize);
>>> +                }
>>>
>> [KS] As discussed, are we planning to let users enable both rskip 1 and 2
>> (if required), given that the two algorithms are orthogonal? Can that be
>> expected in a subsequent patch?
>>
>>>              }
>>>          }
>>> +
>>>          if (m_param->bAnalysisType == AVC_INFO && md.bestMode &&
>>> cuGeom.numPartitions <= 16 && m_param->analysisReuseLevel == 7)
>>>              skipRecursion = true;
>>>          /* Step 2. Evaluate each of the 4 split sub-blocks in series */
>>> @@ -3543,6 +3551,25 @@
>>>      return false;
>>>  }
>>>
>>> +bool Analysis::edgeRecursionSkip(const CUData& ctu, int log2CUSize)
>>> +{
>>> +    int blockType = log2CUSize - 2;
>>> +    int shift = log2CUSize * 2;
>>> +    intptr_t stride = m_frame->m_fencPic->m_stride;
>>> +    pixel* edgePic = m_frame->m_edgeBitPlane +
>>> m_frame->m_fencPic->m_lumaMarginY * m_frame->m_fencPic->m_stride +
>>> m_frame->m_fencPic->m_lumaMarginX;
>>> +    intptr_t blockOffsetLuma = ctu.m_cuPelX + ctu.m_cuPelY * stride;
>>>
>> [KS] Are you sure the blockOffset computation is correct? m_cuPelX and
>> m_cuPelY will give incorrect values. Did you debug and verify your logic?
>>
>>> +    uint64_t sum_ss = primitives.cu[blockType].var(edgePic +
>>> blockOffsetLuma, stride);
>>> +    uint32_t sum = (uint32_t)sum_ss;
>>> +    uint32_t ss = (uint32_t)(sum_ss >> 32);
>>> +    uint32_t pixelCount = 1 << shift;
>>> +    double cuEdgeVariance = (ss - ((double)sum * sum / pixelCount)) /
>>> pixelCount;
>>> +
>>> +    if (cuEdgeVariance > (double)m_param->edgeThreshold)
>>> +        return false;
>>> +    else
>>> +        return true;
>>> +}
>>> +
>>>  uint32_t Analysis::calculateCUVariance(const CUData& ctu, const CUGeom&
>>> cuGeom)
>>>  {
>>>      uint32_t cuVariance = 0;
>>> @@ -3566,7 +3593,6 @@
>>>              cnt++;
>>>          }
>>>      }
>>> -
>>>      return cuVariance / cnt;
>>>  }
>>>
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/analysis.h
>>> --- a/source/encoder/analysis.h Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/encoder/analysis.h Fri Jan 10 17:15:13 2020 +0530
>>> @@ -52,7 +52,7 @@
>>>          splitRefs = 0;
>>>          mvCost[0] = 0; // L0
>>>          mvCost[1] = 0; // L1
>>> -        sa8dCost    = 0;
>>> +        sa8dCost  = 0;
>>>      }
>>>  };
>>>
>>> @@ -120,7 +120,6 @@
>>>
>>>      Mode& compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom,
>>> const Entropy& initialContext);
>>>      int32_t loadTUDepth(CUGeom cuGeom, CUData parentCTU);
>>> -
>>>  protected:
>>>      /* Analysis data for save/load mode, writes/reads data based on
>>> absPartIdx */
>>>      x265_analysis_inter_data*  m_reuseInterDataCTU;
>>> @@ -192,6 +191,7 @@
>>>      uint32_t topSkipMinDepth(const CUData& parentCTU, const CUGeom&
>>> cuGeom);
>>>      bool recursionDepthCheck(const CUData& parentCTU, const CUGeom&
>>> cuGeom, const Mode& bestMode);
>>>      bool complexityCheckCU(const Mode& bestMode);
>>> +    bool edgeRecursionSkip(const CUData& parentCTU, int32_t log2CuSize);
>>>
>>>      /* generate residual and recon pixels for an entire CTU recursively
>>> (RD0) */
>>>      void encodeResidue(const CUData& parentCTU, const CUGeom& cuGeom);
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/encoder.cpp
>>> --- a/source/encoder/encoder.cpp        Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/encoder/encoder.cpp        Fri Jan 10 17:15:13 2020 +0530
>>> @@ -1351,9 +1351,9 @@
>>>      int32_t numBytes = m_param->sourceBitDepth > 8 ? 2 : 1;
>>>      memset(m_edgePic, 0, bufSize * numBytes);
>>>
>>> -    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height,
>>> pic->width, false))
>>> -    {
>>> -        x265_log(m_param, X265_LOG_ERROR, "Failed edge computation!");
>>> +    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height,
>>> pic->width, false, 1))
>>> +    {
>>> +        x265_log(m_param, X265_LOG_ERROR, "Failed to compute edge!");
>>>          return false;
>>>      }
>>>
>>> @@ -1668,6 +1668,13 @@
>>>                          }
>>>                      }
>>>                  }
>>> +                if (m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP
>>> && m_param->bHistBasedSceneCut)
>>> +                {
>>> +                    pixel* src = m_edgePic;
>>> +                    pixel* edgePic = inFrame->m_edgeBitPlane +
>>> inFrame->m_fencPic->m_lumaMarginY * inFrame->m_fencPic->m_stride +
>>> inFrame->m_fencPic->m_lumaMarginX;
>>> +                    primitives.planecopy_pp_shr(src,
>>> inFrame->m_fencPic->m_picWidth, edgePic, inFrame->m_fencPic->m_stride,
>>> +                        inFrame->m_fencPic->m_picWidth,
>>> inFrame->m_fencPic->m_picHeight, 0);
>>> +                }
>>>              }
>>>              else
>>>              {
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/frameencoder.cpp
>>> --- a/source/encoder/frameencoder.cpp   Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/encoder/frameencoder.cpp   Fri Jan 10 17:15:13 2020 +0530
>>> @@ -130,7 +130,7 @@
>>>          {
>>>              rowSum += sliceGroupSizeAccu;
>>>              m_sliceBaseRow[++sidx] = i;
>>> -        }
>>> +        }
>>>      }
>>>      X265_CHECK(sidx < m_param->maxSlices, "sliceID check failed!");
>>>      m_sliceBaseRow[0] = 0;
>>> @@ -268,6 +268,20 @@
>>>      curFrame->m_encData->m_jobProvider = this;
>>>      curFrame->m_encData->m_slice->m_mref = m_mref;
>>>
>>> +    if (!m_param->bHistBasedSceneCut && m_param->rc.aqMode !=
>>> X265_AQ_EDGE && m_param->bEnableRecursionSkip == EDGE_BASED_RSKIP)
>>> +    {
>>> +        int height = curFrame->m_fencPic->m_picHeight;
>>> +        int width = curFrame->m_fencPic->m_picWidth;
>>> +        intptr_t stride = curFrame->m_fencPic->m_stride;
>>> +        pixel* edgePic = curFrame->m_edgeBitPlane +
>>> curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride +
>>> curFrame->m_fencPic->m_lumaMarginX;
>>> +
>>> +        if (!computeEdge(edgePic, curFrame->m_fencPic->m_picOrg[0],
>>> NULL, stride, height, width, false, 1))
>>> +        {
>>> +            x265_log(m_param, X265_LOG_ERROR, " Failed to compute edge
>>> !");
>>> +            return false;
>>> +        }
>>> +    }
>>> +
>>>      if (!m_cuGeoms)
>>>      {
>>>          if (!initializeGeoms())
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/slicetype.cpp
>>> --- a/source/encoder/slicetype.cpp      Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/encoder/slicetype.cpp      Fri Jan 10 17:15:13 2020 +0530
>>> @@ -87,7 +87,7 @@
>>>
>>>  namespace X265_NS {
>>>
>>> -bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta,
>>> intptr_t stride, int height, int width, bool bcalcTheta)
>>> +bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta,
>>> intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel)
>>>  {
>>>      intptr_t rowOne = 0, rowTwo = 0, rowThree = 0, colOne = 0, colTwo =
>>> 0, colThree = 0;
>>>      intptr_t middle = 0, topLeft = 0, topRight = 0, bottomLeft = 0,
>>> bottomRight = 0;
>>> @@ -141,7 +141,7 @@
>>>                         theta = 180 + theta;
>>>                      edgeTheta[middle] = (pixel)theta;
>>>                  }
>>> -                edgePic[middle] = (pixel)(gradientMagnitude >=
>>> edgeThreshold ? edgeThreshold : blackPixel);
>>> +                edgePic[middle] = (pixel)(gradientMagnitude >=
>>> EDGE_THRESHOLD ? whitePixel : blackPixel);
>>>              }
>>>          }
>>>          return true;
>>> @@ -519,6 +519,14 @@
>>>                  if (param->rc.aqMode == X265_AQ_EDGE)
>>>                      edgeFilter(curFrame, param);
>>>
>>> +                if (param->rc.aqMode == X265_AQ_EDGE &&
>>> !param->bHistBasedSceneCut && param->bEnableRecursionSkip ==
>>> EDGE_BASED_RSKIP)
>>> +                {
>>> +                    pixel* src = curFrame->m_edgePic +
>>> curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride +
>>> curFrame->m_fencPic->m_lumaMarginX;
>>> +                    pixel* dst = curFrame->m_edgeBitPlane +
>>> curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride +
>>> curFrame->m_fencPic->m_lumaMarginX;
>>> +                    primitives.planecopy_pp_shr(src,
>>> curFrame->m_fencPic->m_stride, dst,
>>> +                        curFrame->m_fencPic->m_stride,
>>> curFrame->m_fencPic->m_picWidth, curFrame->m_fencPic->m_picHeight,
>>> SHIFT_TO_BITPLANE);
>>> +                }
>>> +
>>>                  if (param->rc.aqMode == X265_AQ_AUTO_VARIANCE ||
>>> param->rc.aqMode == X265_AQ_AUTO_VARIANCE_BIASED || param->rc.aqMode ==
>>> X265_AQ_EDGE)
>>>                  {
>>>                      double bit_depth_correction = 1.f / (1 << (2 *
>>> (X265_DEPTH - 8)));
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/encoder/slicetype.h
>>> --- a/source/encoder/slicetype.h        Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/encoder/slicetype.h        Fri Jan 10 17:15:13 2020 +0530
>>> @@ -44,9 +44,9 @@
>>>  #define EDGE_INCLINATION 45
>>>
>>>  #if HIGH_BIT_DEPTH
>>> -#define edgeThreshold 1023.0
>>> +#define EDGE_THRESHOLD 1023.0
>>>  #else
>>> -#define edgeThreshold 255.0
>>> +#define EDGE_THRESHOLD 255.0
>>>  #endif
>>>  #define PI 3.14159265
>>>
>>> @@ -101,7 +101,7 @@
>>>  protected:
>>>
>>>      uint32_t acEnergyCu(Frame* curFrame, uint32_t blockX, uint32_t
>>> blockY, int csp, uint32_t qgSize);
>>> -    uint32_t edgeDensityCu(Frame*curFrame, uint32_t &avgAngle, uint32_t
>>> blockX, uint32_t blockY, uint32_t qgSize);
>>> +    uint32_t edgeDensityCu(Frame* curFrame, uint32_t &avgAngle,
>>> uint32_t blockX, uint32_t blockY, uint32_t qgSize);
>>>      uint32_t lumaSumCu(Frame* curFrame, uint32_t blockX, uint32_t
>>> blockY, uint32_t qgSize);
>>>      uint32_t weightCostLuma(Lowres& fenc, Lowres& ref, WeightParam& wp);
>>>      bool     allocWeightedRef(Lowres& fenc);
>>> @@ -265,7 +265,6 @@
>>>      CostEstimateGroup& operator=(const CostEstimateGroup&);
>>>  };
>>>
>>> -bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta,
>>> intptr_t stride, int height, int width, bool bcalcTheta);
>>> -
>>> +bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta,
>>> intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel =
>>> EDGE_THRESHOLD);
>>>  }
>>>  #endif // ifndef X265_SLICETYPE_H
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/test/regression-tests.txt
>>> --- a/source/test/regression-tests.txt  Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/test/regression-tests.txt  Fri Jan 10 17:15:13 2020 +0530
>>> @@ -161,6 +161,11 @@
>>>  Island_960x540_24.yuv,--no-cutree --aq-mode 0 --bitrate 6000
>>> --scenecut-aware-qp
>>>  sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut
>>> --hist-threshold 0.02 --frame-dup --dup-threshold 60 --hrd --bitrate 10000
>>> --vbv-bufsize 15000 --vbv-maxrate 12000
>>>  sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut
>>> --hist-threshold 0.02
>>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5
>>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5 --aq-mode 4
>>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5 --hist-scenecut
>>> --hist-threshold 0.1
>>> +crowd_run_1080p50.yuv, --rskip 2 --edge-threshold 5 --hist-scenecut
>>> --hist-threshold 0.1 --aq-mode 4
>>> +
>>>
>> [KS] Also test presets with different max CU sizes.
>>
>>>  # Main12 intraCost overflow bug test
>>>  720p50_parkrun_ter.y4m,--preset medium
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/x265.h
>>> --- a/source/x265.h     Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/x265.h     Fri Jan 10 17:15:13 2020 +0530
>>> @@ -1867,6 +1867,8 @@
>>>      * Auto-enabled when max-cll, max-fall, or mastering display info is
>>> specified.
>>>      * Default is disabled */
>>>      int       bEmitHDR10SEI;
>>> +    /* Edge variance threshold for quad tree establishment. */
>>> +    float     edgeThreshold;
>>>  } x265_param;
>>>
>>>  /* x265_param_alloc:
>>> diff -r 6b348d5b56d8 -r 82a92c26b442 source/x265cli.h
>>> --- a/source/x265cli.h  Fri Jan 10 14:38:32 2020 +0530
>>> +++ b/source/x265cli.h  Fri Jan 10 17:15:13 2020 +0530
>>> @@ -105,8 +105,8 @@
>>>      { "amp",                  no_argument, NULL, 0 },
>>>      { "no-early-skip",        no_argument, NULL, 0 },
>>>      { "early-skip",           no_argument, NULL, 0 },
>>> -    { "no-rskip",             no_argument, NULL, 0 },
>>> -    { "rskip",                no_argument, NULL, 0 },
>>> +    { "rskip",                required_argument, NULL, 0 },
>>> +    { "edge-threshold",       required_argument, NULL, 0 },
>>>      { "no-fast-cbf",          no_argument, NULL, 0 },
>>>      { "fast-cbf",             no_argument, NULL, 0 },
>>>      { "no-tskip",             no_argument, NULL, 0 },
>>> @@ -455,7 +455,8 @@
>>>      H0("   --[no-]ssim-rd                Enable ssim rate distortion
>>> optimization, 0 to disable. Default %s\n", OPT(param->bSsimRd));
>>>      H0("   --[no-]rd-refine              Enable QP based RD refinement
>>> for rd levels 5 and 6. Default %s\n", OPT(param->bEnableRdRefine));
>>>      H0("   --[no-]early-skip             Enable early SKIP detection.
>>> Default %s\n", OPT(param->bEnableEarlySkip));
>>> -    H0("   --[no-]rskip                  Enable early exit from
>>> recursion. Default %s\n", OPT(param->bEnableRecursionSkip));
>>> +    H0("   --rskip <mode>                Enable early exit from
>>> recursion. Mode 1: exit using rdcost. Mode 2: exit using edge density. Mode
>>> 0: Disabled. Default %s\n", OPT(param->bEnableRecursionSkip));
>>>
>> [KS] This is no longer an on/off CLI option that enables/disables the
>> feature (previous review comment not addressed).
>>
>>> +    H1("   --edge-threshold              Threshold in terms of
>>> percentage for edge density in CUs to terminate the recursion depth.
>>> Applicable only for rskip mode 2. Default %s\n", OPT(param->edgeThreshold));
>>>      H1("   --[no-]tskip-fast             Enable fast intra transform
>>> skipping. Default %s\n", OPT(param->bEnableTSkipFast));
>>>      H1("   --[no-]splitrd-skip           Enable skipping split RD
>>> analysis when sum of split CU rdCost larger than one split CU rdCost for
>>> Intra CU. Default %s\n", OPT(param->bEnableSplitRdSkip));
>>>      H1("   --nr-intra <integer>          An integer value in range of 0
>>> to 2000, which denotes strength of noise reduction in intra CUs. Default
>>> 0\n");
>>> _______________________________________________
>>> x265-devel mailing list
>>> x265-devel@videolan.org
>>> https://mailman.videolan.org/listinfo/x265-devel
>>>
>>
>>
>> --
>> Regards,
>> Kavitha
>>
>
>
> --
> *With Regards,*
> *Srikanth Kurapati.*
>

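On the blockOffset and edge-variance comments above, here is a minimal
standalone sketch of what I would expect the computation to do. It is not
the x265 code: blockVar is a hypothetical stand-in for primitives.cu[].var
(assumed to pack the sum in the low 32 bits and the sum of squares in the
high 32 bits, which is how the patch unpacks it), pixel is assumed to be
8-bit, and cuPelX/cuPelY are assumed to be the top-left position of the CU
itself within the picture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint8_t pixel; // assumption: 8-bit build

// Hypothetical stand-in for primitives.cu[].var: returns the pixel sum in
// the low 32 bits and the sum of squares in the high 32 bits.
static uint64_t blockVar(const pixel* src, ptrdiff_t stride, int size)
{
    uint32_t sum = 0, ss = 0;
    for (int y = 0; y < size; y++, src += stride)
    {
        for (int x = 0; x < size; x++)
        {
            sum += src[x];
            ss += (uint32_t)src[x] * src[x];
        }
    }
    return sum + ((uint64_t)ss << 32);
}

// Variance of one (1 << log2CUSize)-square block of a 0/1 edge bit plane.
// cuPelX/cuPelY are assumed to address the CU itself, not the parent CTU.
static double edgeVariance(const pixel* plane, ptrdiff_t stride,
                           uint32_t cuPelX, uint32_t cuPelY, int log2CUSize)
{
    int size = 1 << log2CUSize;
    const pixel* block = plane + (ptrdiff_t)cuPelY * stride + cuPelX;

    uint64_t sum_ss = blockVar(block, stride, size);
    uint32_t sum = (uint32_t)sum_ss;
    uint32_t ss = (uint32_t)(sum_ss >> 32);

    double pixelCount = (double)(1u << (2 * log2CUSize));
    return (ss - (double)sum * sum / pixelCount) / pixelCount;
}
```

For a 0/1 bit plane with edge density p the variance works out to p(1 - p),
so a 32x32 CU with half of its pixels marked as edges gives 0.25, which is
the largest value possible. That may be worth keeping in mind when the
threshold is documented as a percentage of edge density.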

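On the option handling (rskip becoming a mode rather than an on/off switch,
and edge-threshold only mattering for mode 2), a rough sketch of the parsing
and range checks is below. The enum names, the RskipParams struct and
parseRskip are all hypothetical and only illustrate the idea; they are not
the identifiers used in the patch or in x265.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical mode names, for illustration only.
enum RskipMode { RSKIP_OFF = 0, RSKIP_RDCOST = 1, RSKIP_EDGE = 2 };

// Hypothetical container for the two settings.
struct RskipParams
{
    int   mode;
    float edgeThreshold; // stored as a fraction in [0, 1]
};

// Parses the mode and, only for the edge-based mode, the threshold given
// as an integer percentage. Returns false when a value is out of range.
static bool parseRskip(const char* modeStr, const char* thresholdStr,
                       RskipParams& out)
{
    int mode = std::atoi(modeStr);
    if (mode < RSKIP_OFF || mode > RSKIP_EDGE)
        return false;
    out.mode = mode;

    if (mode == RSKIP_EDGE)
    {
        int percent = std::atoi(thresholdStr);
        if (percent < 0 || percent > 100)
            return false;
        out.edgeThreshold = percent / 100.0f;
    }
    return true;
}
```

Checking the percentage before the division keeps the error message in the
same units the user typed, and skipping the threshold entirely for modes 0
and 1 matches the "applicable only for rskip mode 2" wording in the help
text.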
-- 
Regards,
Kavitha