Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost

2017-08-18 Thread Mario *LigH* Rohkrämer

On 18.08.2017 at 14:34, Mario *LigH* Rohkrämer wrote:

On 11.08.2017 at 21:32, Ximing Cheng wrote:



splitrd-skip


Does not appear in the list of CLI commands in the help output.  
Intentional or forgotten?



My mistake. "--log-level full" had disappeared from my script, which logs the
help output of all built versions ... this is an advanced parameter,
documented only in the verbose help output.


--

Fun and success!
Mario *LigH* Rohkrämer
mailto:cont...@ligh.de

___
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel


Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost

2017-08-18 Thread Mario *LigH* Rohkrämer
On 11.08.2017 at 21:32, Ximing Cheng wrote:



splitrd-skip


Does not appear in the list of CLI commands in the help output.  
Intentional or forgotten?


--

Fun and success!
Mario *LigH* Rohkrämer
mailto:cont...@ligh.de



Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost

2017-08-18 Thread Pradeep Ramachandran
Pushed to the default branch. I agree that this looks like a bit-exact change,
and it gives a nice perf boost.
Can this also be extended to inter analysis? The same logic should work
there too, and we don't have an early-out there.

Thanks,
Pradeep.


Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost

2017-08-12 Thread Tom Vaughan
Thanks for this additional explanation, and thanks again for your
contribution!




Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost

2017-08-11 Thread Ximing Cheng
In fact, this skip is not a fast skip algorithm.
Once the accumulated cost of the sub-CUs analysed so far is larger than the
best cost of the non-split CU (both RD costs are taken without the split-flag
cost), splitting into 4 parts at this CU depth is already worse than not
splitting. So the analysis of the remaining N * 1/4 parts of the CU is useless.

.............
.  A  .  B  .
.     .     .
.............
.  C  .  D  .
.     .     .
.............
   (A, B, C, D are the 4 parts of a CU)

Suppose the sum of the sub-CU costs analysed so far (A_Cost + B_Cost) is larger
than the non-split cost (NSCost), i.e. NSCost < A_Cost + B_Cost, and the
remaining parts (C, D) still went through RD analysis. Then:

C_Cost + D_Cost >= 0 --->
NSCost < A_Cost + B_Cost + C_Cost + D_Cost ---> (likely that)
NSCost + splitCost(splitflag = 0) < A_Cost + B_Cost + C_Cost + D_Cost +
splitCost(splitflag = 1)  ---> choose non-split

So the RD analysis of C and D can be skipped.
In my test cases, the MD5 checksum of the output bitstream is identical to the
original after this skip.
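
The argument above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `SplitDecision` and `decideSplit` are hypothetical names, the sub-CU costs are passed in as plain numbers instead of being produced by RD analysis, and the split-flag cost is ignored (which is the "likely that" step of the derivation).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch; it is not part of x265.
struct SplitDecision
{
    bool chooseSplit; // true if the four-way split is cheaper
    int  analysed;    // number of sub-CUs whose cost was actually accumulated
};

// nsCost   : best RD cost of the non-split CU (without split-flag cost)
// subCosts : RD costs of sub-CUs A, B, C, D (without split-flag cost)
// earlyOut : stand-in for the splitrd-skip switch in this sketch
SplitDecision decideSplit(uint64_t nsCost,
                          const std::array<uint64_t, 4>& subCosts,
                          bool earlyOut)
{
    uint64_t curCost = 0;
    int analysed = 0;

    for (uint64_t cost : subCosts)
    {
        // Costs are non-negative, so once the running sum exceeds nsCost
        // the full sum can only be larger: the split can no longer win.
        if (earlyOut && curCost > nsCost)
            break;
        curCost += cost;
        ++analysed;
    }

    // An early exit only ever happens when the split has already lost.
    assert(analysed == 4 || curCost > nsCost);

    return { analysed == 4 && curCost < nsCost, analysed };
}
```

For example, with NSCost = 100 and sub-CU costs 60, 50, 40, 30, the loop stops after B because 60 + 50 > 100, and the decision (non-split) is the same as when all four parts are analysed.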


-- Original --
From: "Ximing Cheng"
Send time: Friday, Aug 4, 2017 1:56 AM
To: "x265-devel"

Subject: [x265] [PATCH] intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost



# HG changeset patch
# User Ximing Cheng 
# Date 1501782508 -28800
#  Fri Aug 04 01:48:28 2017 +0800
# Node ID 5943a1f73d5814a3a723f814a4dd0635b1fe2b35
# Parent  d11482e5fedbcdaf62ee3c6872f43827d99ad181
intra: skip RD analysis when sum of sub CUsplitcost bigger than non-split cost

diff -r d11482e5fedb -r 5943a1f73d58 source/CMakeLists.txt
--- a/source/CMakeLists.txt Mon Jul 24 11:15:38 2017 +0530
+++ b/source/CMakeLists.txt Fri Aug 04 01:48:28 2017 +0800
@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 131)
+set(X265_BUILD 132)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r d11482e5fedb -r 5943a1f73d58 source/common/param.cpp
--- a/source/common/param.cpp   Mon Jul 24 11:15:38 2017 +0530
+++ b/source/common/param.cpp   Fri Aug 04 01:48:28 2017 +0800
@@ -157,6 +157,7 @@
 param->bEnableConstrainedIntra = 0;
 param->bEnableStrongIntraSmoothing = 1;
 param->bEnableFastIntra = 0;
+param->bEnableSplitRdSkip = 0;
 
 /* Inter Coding tools */
 param->searchMethod = X265_HEX_SEARCH;
@@ -975,6 +976,7 @@
 OPT("refine-inter") p->interRefine = atobool(value);
 OPT("refine-mv") p->mvRefine = atobool(value);
 OPT("force-flush") p->forceFlush = atoi(value);
+OPT("splitrd-skip") p->bEnableSplitRdSkip = atobool(value);
 else
 return X265_PARAM_BAD_NAME;
 }
@@ -1431,6 +1433,7 @@
 TOOLOPT(param->bEnableRdRefine, "rd-refine");
 TOOLOPT(param->bEnableEarlySkip, "early-skip");
 TOOLOPT(param->bEnableRecursionSkip, "rskip");
+TOOLOPT(param->bEnableSplitRdSkip, "splitrd-skip");
 TOOLVAL(param->noiseReductionIntra, "nr-intra=%d");
 TOOLVAL(param->noiseReductionInter, "nr-inter=%d");
 TOOLOPT(param->bEnableTSkipFast, "tskip-fast");
@@ -1560,6 +1563,7 @@
 BOOL(p->bEnableTSkipFast, "tskip-fast");
 BOOL(p->bCULossless, "cu-lossless");
 BOOL(p->bIntraInBFrames, "b-intra");
+BOOL(p->bEnableSplitRdSkip, "splitrd-skip");
 s += sprintf(s, " rdpenalty=%d", p->rdPenalty);
 s += sprintf(s, " psy-rd=%.2f", p->psyRd);
 s += sprintf(s, " psy-rdoq=%.2f", p->psyRdoq);
diff -r d11482e5fedb -r 5943a1f73d58 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp   Mon Jul 24 11:15:38 2017 +0530
+++ b/source/encoder/analysis.cpp   Fri Aug 04 01:48:28 2017 +0800
@@ -485,7 +485,7 @@
 md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);
 }
 
-void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
+uint64_t Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
 {
 uint32_t depth = cuGeom.depth;
 ModeDepth& md = m_modeDepth[depth];
@@ -560,6 +560,8 @@
 invalidateContexts(nextDepth);
 Entropy* nextContext = &m_rqt[depth].cur;
 int32_t nextQP = qp;
+uint64_t curCost = 0;
+int skipSplitCheck = 0;
 
 for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
 {
@@ -572,7 +574,17 @@
 if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
 nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
 
-compressIntraCU(parentCTU, childGeom, nextQP);
+if