I would expect to see dcbtst in here, no?

Nah, dcbtst is expensive (it causes some non-cheap bus
transactions) and not needed at all; dcbz is much better
(but can only be used if you overwrite the whole cache line,
which is true here).
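
To make the dcbz point concrete, here is a rough sketch in C of a page clear that zeroes one cache line per step. The names (clear_page_sketch, zero_line) and the sizes (4 KiB pages, 128-byte lines as on the 970) are assumptions for illustration, not the code under review. On non-PowerPC builds it falls back to memset so it still runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE       4096u  /* assumption: 4 KiB pages */
#define CACHE_LINE_SIZE 128u   /* assumption: 128-byte lines, as on the 970 */

/* Zero one cache line. On PowerPC, dcbz establishes the line as all
 * zeroes in the cache without reading the old contents from memory.
 * Elsewhere, fall back to memset so the sketch remains runnable. */
static void zero_line(void *line)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile ("dcbz 0,%0" : : "r" (line) : "memory");
#else
    memset(line, 0, CACHE_LINE_SIZE);
#endif
}

/* Clear a whole page, one cache line at a time. The page must be
 * cache-line aligned (and, for dcbz, ordinary cacheable memory). */
static void clear_page_sketch(void *page)
{
    unsigned char *p = page;

    assert(((uintptr_t)page % CACHE_LINE_SIZE) == 0);
    for (size_t off = 0; off < PAGE_SIZE; off += CACHE_LINE_SIZE)
        zero_line(p + off);
}
```

This only works because every byte of every line gets overwritten anyway; dcbz on a line you want to partially keep would destroy the rest of it.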

Both functions (copy and clear) could stand a little loop unrolling.

ldu ; stdu ; bdnz is not the best loop possible, especially not on
970/POWER4/POWER5.  You guys have Macs, so use Shark (go to the code
browser, press cmd-shift-M, select "show 970 dispatch groups" and
"show 970 details drawer").  In most cases the time spent in the loop
will be dominated by memory (cache) speed, of course, but still.
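
For illustration, a rough C sketch of the kind of unrolling meant here: four doublewords per iteration, with the loads grouped ahead of the stores and plain indexed accesses rather than one update-form load and store per trip. The name copy_page_unrolled, the unroll factor of four, and the 4 KiB page size are assumptions; the right factor is something to check in Shark, and a compiler may reshape this loop anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u                              /* assumption: 4 KiB pages */
#define PAGE_WORDS (PAGE_BYTES / sizeof(uint64_t))    /* doublewords per page */

/* Copy one page, four doublewords per iteration. The four loads are
 * independent of each other, as are the four stores, so the only
 * per-iteration dependency is the loop index itself. */
static void copy_page_unrolled(uint64_t *dst, const uint64_t *src)
{
    for (size_t i = 0; i < PAGE_WORDS; i += 4) {
        uint64_t a = src[i];
        uint64_t b = src[i + 1];
        uint64_t c = src[i + 2];
        uint64_t d = src[i + 3];

        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
}
```

PAGE_WORDS is a multiple of four, so no cleanup loop is needed for the tail.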

I can understand if you're not *really* trying to optimize these, but in that case why do you want to add dcbz? Is there a noticeable performance

Yes, dcbz is (should be) a huge improvement.


Xen-ppc-devel mailing list