On Sat, 16 Dec 2006 11:34, Jimi Xenidis wrote:
> If you really want to explore mem/page copy for XenPPC then you have
> to understand that since we run without an MMU, profiling code with
> MMU on, _including_ RMA, is not helpful because the access is guarded ...
> Please run your experiments _in_ Xen ...
The timing code has now been added to Xen itself (in setup.c);
however, the results match the earlier userspace timings:
elapsed time: 0x000000000000a8f5
elapsed time using dcbz: 0x0000000000005410
elapsed time: 0x000000000000a987
elapsed time using dcbz: 0x0000000000005361
elapsed time: 0x0000000000000862
elapsed time using dcbz: 0x0000000000000420
elapsed time: 0x0000000000000859
elapsed time using dcbz: 0x0000000000000424
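
To make the comparison easier to read, the hex values above can be turned into ratios with a few lines of C. This is only a sketch: the names `timings` and `speedup_percent` are made up for illustration, and the units are whatever the timer reported.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed times copied from the log above: { plain copy, copy using dcbz }. */
static const uint64_t timings[4][2] = {
    { 0xa8f5, 0x5410 },
    { 0xa987, 0x5361 },
    { 0x0862, 0x0420 },
    { 0x0859, 0x0424 },
};

/* Plain-copy time as a percentage of the dcbz time (integer math,
 * rounded down). 200 means the plain copy took twice as long. */
static unsigned speedup_percent(uint64_t plain, uint64_t with_dcbz)
{
    assert(with_dcbz != 0);
    return (unsigned)((plain * 100) / with_dcbz);
}
```

For all four pairs the result lands between 200 and 203, i.e. the dcbz variant takes roughly half the time of the plain one.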
> You will probably find that grouping (as Hollis suggests) by cache
> line will be much better. but also prefetch the next line somehow.
Somewhat better... (the following observations were made running in user space)
Unrolling the copy loop (by cache line) improves performance by a few percent.
(I did not record the time; also, the unrolled loop still used the same number
of registers and did no touching.)
However, including dcbz at the beginning of the loop slowed things down. Perhaps
I need to dcbz a couple of lines ahead of usage?
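
A rough sketch of what "dcbz a couple of lines ahead" could look like is below. It is not the code that was timed. The 128-byte line size, the 4096-byte page size, the distance of two lines, and the names `zero_line`, `copy_page_by_line` and `USE_DCBZ` are all assumptions for illustration. The dcbz is only emitted when `USE_DCBZ` is defined on a PowerPC build; otherwise it compiles to a no-op so the loop structure can be checked anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES  128u   /* assumed cache-line (dcbz block) size */
#define PAGE_BYTES  4096u  /* assumed page size */
#define DCBZ_AHEAD  2u     /* lines to run ahead; a guess to be tuned */
#define LINE_WORDS  (LINE_BYTES / sizeof(uint64_t))
#define PAGE_LINES  (PAGE_BYTES / LINE_BYTES)

/* Establish a destination line in the cache without fetching it.
 * Real dcbz only with USE_DCBZ on PowerPC; a no-op elsewhere. */
static inline void zero_line(void *p)
{
#if defined(USE_DCBZ) && defined(__powerpc__)
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/* Copy one page, one cache line per outer iteration.
 * dst and src must be line-aligned and must not overlap. */
static void copy_page_by_line(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    size_t line, w;

    assert(((uintptr_t)dst % LINE_BYTES) == 0);
    assert(((uintptr_t)src % LINE_BYTES) == 0);

    /* Prime the first DCBZ_AHEAD destination lines. */
    for (line = 0; line < DCBZ_AHEAD && line < PAGE_LINES; line++)
        zero_line(d + line * LINE_WORDS);

    for (line = 0; line < PAGE_LINES; line++) {
        /* Zero a line we will reach later, never past the page end. */
        if (line + DCBZ_AHEAD < PAGE_LINES)
            zero_line(d + (line + DCBZ_AHEAD) * LINE_WORDS);

        /* Copy the current line in groups of four loads, four stores. */
        for (w = 0; w < LINE_WORDS; w += 4) {
            uint64_t a = s[w], b = s[w + 1], c = s[w + 2], e = s[w + 3];
            d[w] = a; d[w + 1] = b; d[w + 2] = c; d[w + 3] = e;
        }
        s += LINE_WORDS;
        d += LINE_WORDS;
    }
}
```

Whether running ahead actually helps would have to be measured; `DCBZ_AHEAD` is the knob to vary.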