On 28.08.2025 23:33, Jason Andryuk wrote:
> On 2025-08-28 05:17, Jan Beulich wrote:
>> The present copy_page_sse2() is useful in case the destination page isn't
>> going to get touched again soon, or if we want to limit churn on the
>> caches. Just rename it, to fit the corresponding {clear,scrub}_page_*()
>> naming scheme.
>>
>> For cases where latency is the most important aspect, or when it is
>> expected that sufficiently large parts of a destination page will get
>> accessed again soon after the copying, introduce a "hot" alternative.
>> Again use alternatives patching to select between a "legacy" and an ERMS
>> variant.
>>
>> Don't switch any callers just yet - this will be the subject of subsequent
>> changes.
>>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>
> Reviewed-by: Jason Andryuk <jason.andr...@amd.com>

Thanks.

>> To avoid the NOP padding (also in clear_page_hot()) we could use a double
>> REP prefix in the replacement code (accounting for the REX one in the code
>> being replaced).
>
> Did my tool chain do it automatically?
>
> 0000000000000000 <.altinstr_replacement>:
>    0:   b9 00 10 00 00          mov    $0x1000,%ecx
>    5:   f3 f3 a4                repz rep movsb %ds:(%rsi),%es:(%rdi)

Interesting. That looks like a bug to me, when source code merely has

    rep movsb

Did you also check what copy_page_movsq (i.e. "rep movsq") expands to? What
gas version is this? With 2.45 I get

0000000000000000 <.altinstr_replacement>:
   0:   b9 00 10 00 00          mov    $0x1000,%ecx
   5:   f3 a4                   rep movsb (%rsi),(%rdi)

(the omission of segment indicators when there's no segment override is indeed
a change in 2.45).

Jan
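
For illustration only, the length arithmetic behind the double-REP remark can
be sketched as below. The two replacement byte sequences are taken from the
listings quoted in this mail. The legacy sequence is an assumption (a 5-byte
"mov $0x200,%ecx" followed by "rep movsq" with its REX.W prefix), and the
helper names nop_padding() and count_rep_prefixes() are hypothetical, not part
of the patch or of any tool.

```python
# Replacement code as disassembled in this thread.
ERMS_SINGLE_REP = bytes.fromhex("b900100000" "f3a4")    # mov $0x1000,%ecx; rep movsb
ERMS_DOUBLE_REP = bytes.fromhex("b900100000" "f3f3a4")  # same, with a second F3 prefix

# ASSUMPTION: the code being replaced is "mov $0x200,%ecx; rep movsq".
# The 0x48 byte is the REX.W prefix mentioned in the patch remark.
LEGACY_MOVSQ = bytes.fromhex("b900020000" "f348a5")


def nop_padding(original: bytes, replacement: bytes) -> int:
    """Padding bytes needed when the replacement is shorter than the original."""
    return max(0, len(original) - len(replacement))


def count_rep_prefixes(insn: bytes) -> int:
    """Count the leading F3 (REP/REPE) prefix bytes of one instruction."""
    count = 0
    for byte in insn:
        if byte != 0xF3:
            break
        count += 1
    return count


if __name__ == "__main__":
    for name, code in (("single REP", ERMS_SINGLE_REP),
                       ("double REP", ERMS_DOUBLE_REP)):
        print(f"{name}: {len(code)} bytes, "
              f"padding vs. legacy = {nop_padding(LEGACY_MOVSQ, code)}")
```

Under that assumption the single-prefix form is one byte shorter than the
legacy sequence, while the doubled prefix matches its length, which is the
effect the remark about avoiding NOP padding describes.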