On Tue, 2006-08-29 at 10:16 +1000, Paul Mackerras wrote:
> Hollis Blanchard writes:
>
> > Hi Paul, some Xen people were just noticing that copy_4K_page
> > (arch/powerpc/lib/copypage_64.S) doesn't use the dcbtst instruction. Why
> > doesn't it help there?
>
> Why would we want to read the cache lines for the destination from
> memory when we're only going to overwrite them completely anyway?
>
> A stronger argument would be for using dcbz, but IIRC it actually made
> things slower (on POWER4 at least). I suspect the hardware is
> gathering the stores for the whole of each cache line automatically,
> so using dcbz doesn't provide any benefit.
Yes, dcbz makes more sense.

> I did a lot of measurements of memory copy speed on POWER4 (using
> different copy loops, copy sizes, alignments, cache hot/cold cases)
> and the copy_4K_page loop is the fastest I could come up with for
> POWER4. If anyone can come up with a routine that is measurably
> faster on current machines, I'm happy to look at it, of course.

I figured you had done measurements; we were just curious about the
unexpected results.

Thanks!

-- 
Hollis Blanchard
IBM Linux Technology Center

_______________________________________________
Xen-ppc-devel mailing list
Xenfirstname.lastname@example.org
http://lists.xensource.com/xen-ppc-devel