Hollis Blanchard writes:
> Hi Paul, some Xen people were just noticing that copy_4K_page
> (arch/powerpc/lib/copypage_64.S) doesn't use the dcbtst instruction. Why
> doesn't it help there?

Why would we want to read the cache lines for the destination from
memory when we're only going to overwrite them completely anyway?

A stronger argument would be for using dcbz, but IIRC it actually made
things slower (on POWER4 at least). I suspect the hardware is
gathering the stores for the whole of each cache line automatically,
so using dcbz doesn't provide any benefit.
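
For anyone who wants to try the destination-touch variant themselves, here is a rough C sketch, not the kernel routine. It assumes GCC or Clang, where `__builtin_prefetch(addr, 1)` is a write-prefetch hint; on PowerPC that is generally expected to become dcbtst, but that mapping is an assumption worth checking in the generated assembly. The name `copy_page_sketch` and the 128-byte `LINE_BYTES` are hypothetical values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096
#define LINE_BYTES 128  /* assumed cache line size; adjust for the target CPU */

/*
 * Copy one 4K page a cache line at a time.  If hint_dst is nonzero, issue
 * a write-prefetch hint on each destination line before storing to it.
 * The hint never changes the result; it can only change the timing.
 */
static void copy_page_sketch(void *dst, const void *src, int hint_dst)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off;

    assert(dst != NULL && src != NULL);

    for (off = 0; off < PAGE_BYTES; off += LINE_BYTES) {
        if (hint_dst)
            __builtin_prefetch(d + off, 1);  /* 1 = prefetch for write */
        memcpy(d + off, s + off, LINE_BYTES);
    }
}
```

Since every destination line is overwritten in full, the hinted variant asks the hardware to fetch data that is immediately discarded, which is the objection above. Timing both settings of `hint_dst` on the machine in question is the only way to settle it.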

I did a lot of measurements of memory copy speed on POWER4 (using
different copy loops, copy sizes, alignments, cache hot/cold cases)
and the copy_4K_page loop is the fastest I could come up with for
POWER4. If anyone can come up with a routine that is measurably
faster on current machines, I'm happy to look at it, of course.
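
If someone does want to compare candidate loops, a minimal user-space timing harness might look like the sketch below. It is not the harness used for the POWER4 measurements; the names `copy_fn`, `copy_memcpy`, `copy_words` and `ns_per_copy` are hypothetical, it covers only the cache-hot case, and it assumes GCC or Clang for the empty asm used as a compiler barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096

typedef void (*copy_fn)(void *dst, const void *src);

/* Baseline: let the C library copy the page. */
static void copy_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
}

/* Candidate: 64-bit words, unrolled by four.  Assumes both buffers are
 * 8-byte aligned and may legitimately be accessed as uint64_t. */
static void copy_words(void *dst, const void *src)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    size_t i;

    for (i = 0; i < PAGE_BYTES / sizeof(uint64_t); i += 4) {
        d[i]     = s[i];
        d[i + 1] = s[i + 1];
        d[i + 2] = s[i + 2];
        d[i + 3] = s[i + 3];
    }
}

/* Average CPU time in nanoseconds for one page copy, cache hot. */
static double ns_per_copy(copy_fn fn, void *dst, const void *src, long iters)
{
    clock_t t0, t1;
    long i;

    assert(iters > 0);
    fn(dst, src);  /* warm-up so both pages are in cache */

    t0 = clock();
    for (i = 0; i < iters; i++) {
        fn(dst, src);
        /* Compiler barrier: keep the copies from being optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    t1 = clock();

    return (double)(t1 - t0) * 1e9 / (double)CLOCKS_PER_SEC / (double)iters;
}
```

A cache-cold run would need the buffers evicted between iterations (for example by cycling through a working set much larger than the cache), and different alignments would need offset buffers; both are left out here. `clock()` is coarse, so `iters` has to be large enough for the total run to take a noticeable fraction of a second.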
Xen-ppc-devel mailing list