At 2015-03-07 07:58:13,dave <[email protected]> wrote:
>On 03/06/2015 05:07 PM, Min Chen wrote:
>> # HG changeset patch
>> # User Min Chen <[email protected]>
>> # Date 1425690429 28800
>> # Node ID 63d132c844b9d299081b40e7589275b78fe71093
>> # Parent 043c2418864b0a3ada6f597e6def6ead73d90b5f
>> asm: improve on intra_dc32
>> ---
>> source/common/x86/intrapred8.asm | 71
>> ++++++++++++--------------------------
>> 1 files changed, 22 insertions(+), 49 deletions(-)
>>
>> diff -r 043c2418864b -r 63d132c844b9 source/common/x86/intrapred8.asm
>> --- a/source/common/x86/intrapred8.asm Fri Mar 06 13:15:55 2015 -0600
>> +++ b/source/common/x86/intrapred8.asm Fri Mar 06 17:07:09 2015 -0800
>> @@ -524,15 +524,21 @@
>> pshuflw m1, m1, 0x00 ; m1 = byte [dc_val ...]
>> pshufd m1, m1, 0x00
>>
>> + lea r2, [r0 + r1 * 2]
>> %assign x 0
>> -%rep 16
>> +%rep 8
>> ; store DC 16x16
>> movu [r0], m1
>> + movu [r0 + 16], m1
>> movu [r0 + r1], m1
>> - movu [r0 + 16], m1
>> movu [r0 + r1 + 16], m1
>> -%if x < 16
>> - lea r0, [r0 + 2 * r1]
>> + movu [r2], m1
>> + movu [r2 + 16], m1
>> + movu [r2 + r1], m1
>> + movu [r2 + r1 + 16], m1
>> +%if x < 8
>> + lea r0, [r0 + 4 * r1]
>> + lea r2, [r2 + 4 * r1]
>All this does is trade 15 "lea r0..." for 7 "lea r0..." and 8 "lea r2..."
>
>./test/TestBench --testbench intrapred | grep intra_dc_32x32
>intra_dc_32x32[f=0] 4.45x 1680.01 7475.13
>
>and the original code.
>
>./test/TestBench --testbench intrapred | grep intra_dc_32x32
>intra_dc_32x32[f=0] 4.53x 1650.03 7475.56
I verified it before uploading: on my Haswell PC it is ~1.5% faster, because the two independent LEAs (for r0 and r2) can be dispatched to Port1 and Port5 in parallel.
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
