This patch has been pushed to the master branch. *__________________________* *Karam Singh* *Ph.D. IIT Guwahati* Senior Software (Video Coding) Engineer Mobile: +91 8011279030 Block 9A, 6th floor, DLF Cyber City Manapakkam, Chennai 600 089
On Fri, Sep 6, 2024 at 4:16 PM Wei Chen <wei.c...@arm.com> wrote: > Starting from b3ce1c32da303c92241e85bd69298ab9903c0126, > We observed that the video streams generated by x265 > compiled with CLANG and GCC, using the same parameters > and input video, are inconsistent. > > Encoding logs for CLANG compiled x265: > x265 [info]: frame I: 2, Avg QP:37.79 kb/s: 72220.75 > x265 [info]: frame P: 20, Avg QP:37.75 kb/s: 74146.30 > x265 [info]: frame B: 76, Avg QP:39.75 kb/s: 67231.66 > > Encoding logs for GCC complied x265: > x265 [info]: frame I: 2, Avg QP:37.79 kb/s: 72220.75 > x265 [info]: frame P: 20, Avg QP:37.75 kb/s: 74022.17 > x265 [info]: frame B: 76, Avg QP:39.75 kb/s: 67220.26 > > This because a macro LOAD_DIFF_16x4_sve2 was introduced in > commit b3ce1c32da303c92241e85bd69298ab9903c0126 for AArch64 > (This macro has been renamed to LOAD_DIFF_16x4_sve now) and > is used by SATD kernels. The macro LOAD_DIFF_16x4_sve uses > unprotected callee-saved SVE registers z8 to z15. > > Unfortunately, in the caller function MotionEstimate of SATD, > the CLANG generates some code to use s10. Here is a simplified > code snippet: > > fmov s10, #0.50000000 > bl subpelCompare -> will invoke SATD > fadd s0, s0, s10 > > Thus, the content of s10 used in MotionEstimate is overwritten > by the SATD function, which affects the correctness of subsequent > video encoding. > > But we did not observe similar code in GCC to access s8 - s15 > in MotionEstimate. At the same time, we can influence CLANG by > modifying -mpu=neoverse-N2, in this case CLANG will not use s10 > in MotionEstimate, thereby making the video stream consistent. > However, such compiler dependent behavior is unreliable. > > Therefore, in this patch, we avoid using z8 to z15 and extra stack > operations by reusing z0 to z7 in LOAD_DIFF_16x4_sve. Here are the > test results after applying the patch: > > Encoding logs for CLANG compiled x265: > x265 [info]: frame I: 2, Avg QP:37.79 kb/s: 72220.75 > x265 [info]: frame P: 20, Avg QP:37.75 kb/s: 74022.17 > x265 [info]: frame B: 76, Avg QP:39.75 kb/s: 67220.26 > > Encoding logs for GCC compiled x265: > x265 [info]: frame I: 2, Avg QP:37.79 kb/s: 72220.75 > x265 [info]: frame P: 20, Avg QP:37.75 kb/s: 74022.17 > x265 [info]: frame B: 76, Avg QP:39.75 kb/s: 67220.26 > > Change-Id: Ic0e0c0706b99e53b138cceb758ee6ce148130e4b > --- > source/common/aarch64/pixel-util-sve.S | 34 +++++++++++++------------- > 1 file changed, 17 insertions(+), 17 deletions(-) > > diff --git a/source/common/aarch64/pixel-util-sve.S > b/source/common/aarch64/pixel-util-sve.S > index 3d073d42e..106ba903a 100644 > --- a/source/common/aarch64/pixel-util-sve.S > +++ b/source/common/aarch64/pixel-util-sve.S > @@ -190,27 +190,27 @@ endfunc > ld1b {z7.h}, p0/z, [x2, x11] > add x0, x0, x1 > add x2, x2, x3 > - ld1b {z29.h}, p0/z, [x0] > - ld1b {z9.h}, p0/z, [x0, x11] > - ld1b {z10.h}, p0/z, [x2] > - ld1b {z11.h}, p0/z, [x2, x11] > - add x0, x0, x1 > - add x2, x2, x3 > - ld1b {z12.h}, p0/z, [x0] > - ld1b {z13.h}, p0/z, [x0, x11] > - ld1b {z14.h}, p0/z, [x2] > - ld1b {z15.h}, p0/z, [x2, x11] > - add x0, x0, x1 > - add x2, x2, x3 > - > sub \v0\().h, z0.h, z2.h > sub \v4\().h, z1.h, z3.h > sub \v1\().h, z4.h, z6.h > sub \v5\().h, z5.h, z7.h > - sub \v2\().h, z29.h, z10.h > - sub \v6\().h, z9.h, z11.h > - sub \v3\().h, z12.h, z14.h > - sub \v7\().h, z13.h, z15.h > + > + ld1b {z0.h}, p0/z, [x0] > + ld1b {z1.h}, p0/z, [x0, x11] > + ld1b {z2.h}, p0/z, [x2] > + ld1b {z3.h}, p0/z, [x2, x11] > + add x0, x0, x1 > + add x2, x2, x3 > + ld1b {z4.h}, p0/z, [x0] > + ld1b {z5.h}, p0/z, [x0, x11] > + ld1b {z6.h}, p0/z, [x2] > + ld1b {z7.h}, p0/z, [x2, x11] > + add x0, x0, x1 > + add x2, x2, x3 > + sub \v2\().h, z0.h, z2.h > + sub \v6\().h, z1.h, z3.h > + sub \v3\().h, z4.h, z6.h > + sub \v7\().h, z5.h, z7.h > .endm > > // one vertical hadamard pass and two horizontal > -- > 2.34.1_______________________________________________ > x265-devel mailing list > x265-devel@videolan.org > https://mailman.videolan.org/listinfo/x265-devel >
_______________________________________________ x265-devel mailing list x265-devel@videolan.org https://mailman.videolan.org/listinfo/x265-devel