[x265] Re: [PATCH] intra: skip RD analysis when sum of subCUsplitcost bigger than non-split cost

Ximing Cheng Sat, 22 Jul 2017 13:26:08 -0700

In each test case below, the first line is the command line, the second line 
is the output before applying the patch, and the third line is the output 
after applying it. There is no BD-rate loss in my test cases, and the lower 
the output bitrate, the larger the speedup.



BasketballDrive_1920x1080_50.yuv --frames 500 --fps 50 --input-res 1920x1080 
--keyint 0 -o test.265
encoded 500 frames in 261.10s (1.91 fps), 10683.10 kb/s, Avg QP:35.21
encoded 500 frames in 254.14s (1.97 fps), 10683.10 kb/s, Avg QP:35.21


BasketballDrive_1920x1080_50.yuv --frames 500 --bitrate 2048 --fps 50 
--input-res 1920x1080 --keyint 0 -o test.265
encoded 500 frames in 202.13s (2.47 fps), 2059.03 kb/s, Avg QP:48.72
encoded 500 frames in 162.09s (3.08 fps), 2059.03 kb/s, Avg QP:48.72


BasketballDrive_1920x1080_50.yuv --frames 500 --bitrate 1024 --fps 50 
--input-res 1920x1080 --keyint 0 -o test.265
encoded 500 frames in 176.67s (2.83 fps), 1778.76 kb/s, Avg QP:50.93
encoded 500 frames in 103.88s (4.81 fps), 1778.76 kb/s, Avg QP:50.93


BasketballDrill_832x480_50.yuv --frames 500 --bitrate 1024 --fps 50 --input-res 
832x480 --keyint 0 -o test.265
encoded 500 frames in 45.75s (10.93 fps), 1025.68 kb/s, Avg QP:46.18
encoded 500 frames in 41.50s (12.05 fps), 1025.68 kb/s, Avg QP:46.18


BasketballDrill_832x480_50.yuv --frames 500 --bitrate 2048 --fps 50 --input-res 
832x480 --keyint 0 -o test.265
encoded 500 frames in 50.72s (9.86 fps), 2059.46 kb/s, Avg QP:41.22
encoded 500 frames in 49.03s (10.20 fps), 2059.46 kb/s, Avg QP:41.22


BasketballDrill_832x480_50.yuv --frames 500 --bitrate 4096 --fps 50 --input-res 
832x480 --keyint 0 -o test.265
encoded 500 frames in 58.47s (8.55 fps), 4109.42 kb/s, Avg QP:35.65
encoded 500 frames in 56.16s (8.90 fps), 4109.42 kb/s, Avg QP:35.65


BQMall_832x480_60.yuv --frames 600 --bitrate 4096 --fps 60 --input-res 832x480 
--keyint 0 -o test.265
encoded 600 frames in 63.68s (9.42 fps), 3927.55 kb/s, Avg QP:38.39
encoded 600 frames in 62.43s (9.61 fps), 3927.55 kb/s, Avg QP:38.39


BQMall_832x480_60.yuv --frames 600 --bitrate 2048 --fps 60 --input-res 832x480 
--keyint 0 -o test.265
encoded 600 frames in 57.47s (10.44 fps), 2010.04 kb/s, Avg QP:43.65
encoded 600 frames in 53.77s (11.16 fps), 2010.04 kb/s, Avg QP:43.65


BQMall_832x480_60.yuv --frames 600 --bitrate 1024 --fps 60 --input-res 832x480 
--keyint 0 -o test.265
encoded 600 frames in 51.82s (11.58 fps), 970.95 kb/s, Avg QP:49.04
encoded 600 frames in 43.71s (13.73 fps), 970.95 kb/s, Avg QP:49.04
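
For reference, the relative saving in each pair follows directly from the two 
encode times. A minimal sketch of that arithmetic is below; the helper name 
speedupPercent is only illustrative and is not part of x265.

```cpp
#include <cassert>

// Percentage reduction in encode time: (before - after) / before * 100.
// Illustrative helper only; not part of x265.
double speedupPercent(double beforeSec, double afterSec)
{
    assert(beforeSec > 0.0); // a zero or negative baseline time is meaningless
    return (beforeSec - afterSec) / beforeSec * 100.0;
}

// Encode times (seconds) copied from the three BasketballDrive runs above.
struct Run { const char* label; double beforeSec; double afterSec; };

const Run kBasketballDriveRuns[] = {
    { "no --bitrate",   261.10, 254.14 },
    { "--bitrate 2048", 202.13, 162.09 },
    { "--bitrate 1024", 176.67, 103.88 },
};
```

Applied to the three BasketballDrive runs, this gives roughly 2.7%, 19.8% and 
41.2% less encode time, which matches the trend that lower bitrates gain more.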
 




------------------ Original ------------------
From: "Ximing Cheng";<[email protected]>;
Send time: Saturday, Jul 22, 2017 11:57
To: "Development for x265"<[email protected]>; 

Subject: Re: [x265] [PATCH] intra: skip RD analysis when sum of subCUsplitcost 
bigger than non-split cost



I will give my test cases later, and I will add a CLI option for this skip. 
Sorry for the late reply; I was very busy on workdays.
 
---Original---
From: "Tom Vaughan"<[email protected]>
Date: 2017/7/22 03:33:28
To: "Development for x265"<[email protected]>;
Subject: Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CUsplitcost 
bigger than non-split cost


 
Ximing - thanks for your contribution!

 

Pradeep - This early decision optimization certainly looks like it has good 
potential to be a smart tradeoff for intermediate presets (like medium).   
Sure, the test was only for 2 seconds of video, but for this test case we see a 
10% performance gain with no loss of quality... so that's very promising.  
You're right;  more testing is needed to fully understand the cost/benefit 
under a wide range of conditions.  Can we ask one of our engineers to run some 
tests?  

 

I also think we need to take the lead when it comes to naming CLI options and 
parameters.  Maybe --splitrdskip?

 

Thanks,

Tom

 

From: x265-devel [mailto:[email protected]] On Behalf Of Pradeep 
Ramachandran
Sent: Tuesday, July 18, 2017 10:10 AM
To: Development for x265
Subject: Re: [x265] [PATCH] intra: skip RD analysis when sum of sub CU 
splitcost bigger than non-split cost

 

 

On Fri, Jul 14, 2017 at 10:38 PM, Ximing Cheng <[email protected]> 
wrote:

command line:


x265 --input BasketballDrive_1920x1080_50.yuv --input-res 1920x1080 --fps 50 
--frames 100 --keyint 0 -o test.265


 


before patch 


encoded 100 frames in 57.28s (1.75 fps), 10496.03 kb/s, Avg QP:34.74


after patch


encoded 100 frames in 51.52s (1.94 fps), 10496.03 kb/s, Avg QP:34.74


 


Thanks for your test data. This looks like an intra-only optimization. Could 
you please share more results with longer test sequences at different 
resolutions, if you have them? We can also run additional tests before 
considering this improvement, to see whether the gains hold more widely.


 


Also, please add a new param field and CLI option that may be used to exercise 
this. It is always better to do this than to affect default encodes with 
such performance optimizations. The param should be off by default, and if we 
can clearly see the benefits to all possible encodes for a given preset, then 
we can consider enabling that optimization for that given preset.
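
To make the request concrete, here is a minimal sketch of how such a gate 
could look. It is an assumption for illustration only: ParamSketch, 
bEnableSplitRdSkip and shouldSkipSplit are hypothetical names, not existing 
x265 fields or functions, and the real change would live in x265_param and 
the CLI parsing code.

```cpp
#include <cstdint>

// Hypothetical stand-in for the encoder's parameter struct. The field name
// bEnableSplitRdSkip is an assumption, not an existing x265 parameter.
struct ParamSketch
{
    int bEnableSplitRdSkip = 0; // off by default, so default encodes are unaffected
};

// Returns true when the remaining sub-CUs may be skipped: only if the option
// is enabled AND the accumulated sub-CU cost already exceeds the non-split cost.
bool shouldSkipSplit(const ParamSketch& param,
                     uint64_t accumulatedSplitCost,
                     uint64_t nonSplitCost)
{
    return param.bEnableSplitRdSkip != 0 && accumulatedSplitCost > nonSplitCost;
}
```

With the field left at 0 the early exit never triggers, so existing presets 
keep their current output until the option is explicitly turned on.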


 


 


 


------------------ Original ------------------


From:  "Ximing Cheng";<[email protected]>;


Send time: Saturday, Jul 15, 2017 1:07 AM


To: "x265-devel"<[email protected]>; 


Subject:  [x265] [PATCH] intra: skip RD analysis when sum of sub CU splitcost 
bigger than non-split cost



 


# HG changeset patch
# User Ximing Cheng <[email protected]>
# Date 1500052036 -28800
#      Sat Jul 15 01:07:16 2017 +0800
# Node ID 9c2e9f6c6ee73e75b94c2e52f85a64bca628baf0
# Parent  3f6841d271e36dc324936f09846d1f2cb77c63e5
intra: skip RD analysis when sum of sub CU split cost bigger than non-split cost
This patch will speed up all intra case with almost no BDRATE loss

diff -r 3f6841d271e3 -r 9c2e9f6c6ee7 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp Wed Jun 28 10:44:19 2017 +0530
+++ b/source/encoder/analysis.cpp Sat Jul 15 01:07:16 2017 +0800
@@ -485,7 +485,7 @@
     md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);
 }
 
-void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
+uint64_t Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
 {
     uint32_t depth = cuGeom.depth;
     ModeDepth& md = m_modeDepth[depth];
@@ -561,6 +561,9 @@
         Entropy* nextContext = &m_rqt[depth].cur;
         int32_t nextQP = qp;
 
+        uint64_t curCost = 0;
+        int skipSplitCheck = 0;
+
         for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
         {
             const CUGeom& childGeom = *(&cuGeom + cuGeom.childOffset + subPartIdx);
@@ -572,7 +575,12 @@
                 if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
                     nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
 
-                compressIntraCU(parentCTU, childGeom, nextQP);
+                curCost += compressIntraCU(parentCTU, childGeom, nextQP);
+                if (m_modeDepth[depth].bestMode && curCost > m_modeDepth[depth].bestMode->rdCost)
+                {
+                    skipSplitCheck = 1;
+                    break;
+                }
 
                 // Save best CU and pred data for this sub CU
                 splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
@@ -590,14 +598,18 @@
                     memset(parentCTU.m_cuDepth + childGeom.absPartIdx, 0, childGeom.numPartitions);
             }
         }
-        nextContext->store(splitPred->contexts);
-        if (mightNotSplit)
-            addSplitFlagCost(*splitPred, cuGeom.depth);
-        else
-            updateModeCost(*splitPred);
-
-        checkDQPForSplitPred(*splitPred, cuGeom);
-        checkBestMode(*splitPred, depth);
+
+        if (!skipSplitCheck)
+        {
+            nextContext->store(splitPred->contexts);
+            if (mightNotSplit)
+                addSplitFlagCost(*splitPred, cuGeom.depth);
+            else
+                updateModeCost(*splitPred);
+
+            checkDQPForSplitPred(*splitPred, cuGeom);
+            checkBestMode(*splitPred, depth);
+        }
     }
 
     if (m_param->bEnableRdRefine && depth <= m_slice->m_pps->maxCuDQPDepth)
@@ -620,6 +632,8 @@
     md.bestMode->cu.copyToPic(depth);
     if (md.bestMode != &md.pred[PRED_SPLIT])
         md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic, parentCTU.m_cuAddr, cuGeom.absPartIdx);
+
+    return md.bestMode->rdCost;
 }
 
 void Analysis::PMODE::processTasks(int workerThreadId)
diff -r 3f6841d271e3 -r 9c2e9f6c6ee7 source/encoder/analysis.h
--- a/source/encoder/analysis.h Wed Jun 28 10:44:19 2017 +0530
+++ b/source/encoder/analysis.h Sat Jul 15 01:07:16 2017 +0800
@@ -145,7 +145,7 @@
     void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
 
     /* full analysis for an I-slice CU */
-    void compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);
+    uint64_t compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);
 
     /* full analysis for a P or B slice CU */
     uint32_t compressInterCU_dist(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);
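
The idea behind the patch can be illustrated outside the encoder with a toy 
quadtree. This is only a sketch under simplifying assumptions: Node, Stats 
and bestCost are hypothetical names, not x265 types, and the model ignores 
split-flag cost, entropy contexts, QP handling and reconstruction.

```cpp
#include <cstdint>
#include <vector>

// Toy quadtree node. This is NOT x265's CUData/CUGeom, only an illustration.
struct Node
{
    uint64_t nonSplitCost;       // cost of coding this block without splitting
    std::vector<Node> children;  // empty for a leaf, otherwise four sub-blocks
};

// Counts how many nodes were analysed, to make the saving visible.
struct Stats { int visited = 0; };

// Returns the best cost for `node`. As in the patch, the children's best
// costs are accumulated and the loop stops as soon as the running sum
// exceeds the non-split cost, because the split can then no longer win.
uint64_t bestCost(const Node& node, Stats& stats)
{
    ++stats.visited;
    if (node.children.empty())
        return node.nonSplitCost;

    uint64_t splitCost = 0;
    for (const Node& child : node.children)
    {
        splitCost += bestCost(child, stats);
        if (splitCost > node.nonSplitCost)
            return node.nonSplitCost; // early out: remaining children are skipped
    }
    return splitCost < node.nonSplitCost ? splitCost : node.nonSplitCost;
}
```

For a parent with non-split cost 100 and leaf children costing 60, 50, 10 and 
10, the running sum passes 100 after the second child, so only three nodes 
are visited instead of five and the non-split cost is returned.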


_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel




