On 03/10, dave wrote:
>
> >>This produces some interesting numbers.
> sorry, I mixed these two up.
> >>>>incorrect:Without using registers for constants
> >>>>with using registers
> >>>>x265 [info]: I32: Intra 100%(DC 0% P 40% Ang 58%)
> >>>>
> >>>>encoded 2000 frames in 95.98s (20.84 fps), 1020.04 kb/s
> >>>>
> >>>>incorrect:With using registers for constants
> >>>>without using registers
> >>>>x265 [info]: I32: Intra 99%(DC 39% P 16% Ang 43%)
> >>>>
> >>>>encoded 2000 frames in 93.10s (21.48 fps), 1008.63 kb/s
> >>>>
> >>>>I just added --cu-stats to the same command options that I used
> >>>>previously and I ran it several times and got exactly the same
> >>>>percentages. Times varied by less than a second for each build. So
> >>>>how can simple register usage in one primitive affect intra pred
> >>>>decisions?
> >>>it shouldn't, the behavior must be wrong in one of the cases. no change
> >>>in performance should be able to impact the encoder output (or any
> >>>coding decisions)
> >>>
> >>So execution time isn't directly measured for decision making?
> >>
> >>The output is also different.
> >>
> >>ls -l bridge-close*
> >>-rw-r--r-- 1 shakezula shakezula 8432204 Mar 10 09:25 bridge-close1.y4m
> >>-rw-r--r-- 1 shakezula shakezula 8527219 Mar 10 07:49 bridge-close.y4m
> >>
> >> bridge-close1.y4m was generated without the use of registers to hold
> >>constants.
> >yeah, definitely a bug in one of the two versions and if the testbench
> >doesn't catch it that's really bad.
> I am using the same source tree for both so the only differences is
> the register usage.
>
> The unpatched tip, which is going to use c code for planar32,
> produces the same intra pred decision percentages as not using
> registers for constants but different encoded output.
>
> x265 [info]: I32: Intra 99%(DC 39% P 16% Ang 43%)
>
> encoded 2000 frames in 101.82s (19.64 fps), 1008.64 kb/s
>
> ls -l bridge-close.*
> -rw-r--r-- 1 shakezula shakezula 8432239 Mar 10 10:03 bridge-close.hevc
>
> The reconstructed output of all three looks the same.
>
> Just to test for overflow I modified the testbench to test with all
> maximum 10-bit values of 0x3FF instead of random values and it
> passes. One more bit, 0x4FF, and it fails. Though the y4m file has
> 8 bit depth.
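
The overflow observation quoted above can be checked with plain arithmetic. What follows is only a sketch: it uses the HEVC planar formula for a 32x32 block and assumes the primitive accumulates in unsigned 16-bit lanes, which the thread does not confirm. The function names are made up for illustration and are not x265 identifiers.

```python
# Sketch only: worst-case intermediate sum of HEVC planar prediction.
# Assumption (not confirmed in the thread): the primitive accumulates
# in unsigned 16-bit lanes. All names here are hypothetical.

def planar_sum(n, x, y, left, top, top_right, bottom_left):
    """Sum for sample (x, y) of an n x n block, before the final shift."""
    return ((n - 1 - x) * left[y] + (x + 1) * top_right
            + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n)

def max_planar_sum(n, value):
    """Largest sum when every reference sample equals `value`."""
    left = [value] * n
    top = [value] * n
    return max(planar_sum(n, x, y, left, top, value, value)
               for y in range(n) for x in range(n))

def fits_uint16(v):
    """True if v can be held in an unsigned 16-bit lane."""
    return 0 <= v <= 0xFFFF

if __name__ == "__main__":
    for value in (0xFF, 0x3FF, 0x4FF):
        s = max_planar_sum(32, value)
        # The four weights always add up to 64, so s == 64 * value + 32.
        print(hex(value), s, s >> 6, fits_uint16(s))
```

With all references at 0x3FF the sum is 65504, which still fits below 65536. With 0x4FF (1279, which already needs 11 bits) it is 81888 and no longer fits, so that pattern is consistent with a 16-bit intermediate. For the 8-bit samples in the y4m file the worst case is only 16352.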
This sounds like your outputs would be non-deterministic if you just ran
the same encode multiple times? That would be a different class of bug,
perhaps unrelated to your work on the intra primitives.

I don't think we often check for non-determinism on older architectures.
We regularly test --no-asm against fully optimized outputs, but this only
tests primitives normally used on our test machines.

-- 
Steve Borho
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
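
One way to check for the kind of non-determinism mentioned above is to encode the same input several times and compare the resulting bitstreams byte for byte. This is a minimal sketch that only compares files that already exist. The helper names and the example file names are hypothetical, and it does not invoke x265 itself.

```python
# Sketch only: compare encoder outputs by SHA-256 digest.
# Helper names and example file names are hypothetical.
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def group_by_digest(paths):
    """Map each distinct digest to the list of files that share it."""
    groups = {}
    for p in paths:
        groups.setdefault(file_digest(p), []).append(str(p))
    return groups

def outputs_identical(paths):
    """True if all given files have the same content."""
    return len(group_by_digest(paths)) <= 1

if __name__ == "__main__":
    import sys
    # Example: python compare.py run1.hevc run2.hevc run3.hevc
    for digest, files in group_by_digest(sys.argv[1:]).items():
        print(digest[:16], files)
```

If repeated runs of one build land in a single group, that build is deterministic for that input. The same comparison then applies between a --no-asm output and an optimized output, where a mismatch points at a primitive.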
