On 2025-08-29 02:29, Jan Beulich wrote:
On 28.08.2025 23:33, Jason Andryuk wrote:
On 2025-08-28 05:17, Jan Beulich wrote:
The present copy_page_sse2() is useful in case the destination page isn't
going to get touched again soon, or if we want to limit churn on the
caches. Just rename it, to fit the corresponding {clear,scrub}_page_*()
naming scheme.

For cases where latency is the most important aspect, or when it is
expected that sufficiently large parts of a destination page will get
accessed again soon after the copying, introduce a "hot" alternative.
Again use alternatives patching to select between a "legacy" and an ERMS
variant.

Don't switch any callers just yet - this will be the subject of subsequent
changes.

Signed-off-by: Jan Beulich <jbeul...@suse.com>

Reviewed-by: Jason Andryuk <jason.andr...@amd.com>

Thanks.

To avoid the NOP padding (also in clear_page_hot()) we could use a double
REP prefix in the replacement code (accounting for the REX one in the code
being replaced).
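
To make the byte counting behind that suggestion concrete, here is a minimal sketch. It assumes the code being replaced is "mov $0x200,%ecx; rep movsq" (PAGE_SIZE / 8 quadwords for a 4 KiB page), which the thread implies but does not show. The array names and the nop_padding() helper are hypothetical and exist only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: encoding of the code being replaced (not shown in the thread). */
const unsigned char legacy_movsq[] = {
    0xb9, 0x00, 0x02, 0x00, 0x00,   /* mov $0x200,%ecx */
    0xf3, 0x48, 0xa5                /* rep movsq: REP, REX.W, MOVS */
};

/* ERMS replacement, bytes as in the gas 2.45 disassembly quoted in the thread. */
const unsigned char erms_movsb[] = {
    0xb9, 0x00, 0x10, 0x00, 0x00,   /* mov $0x1000,%ecx */
    0xf3, 0xa4                      /* rep movsb */
};

/* Same replacement with a second, redundant REP prefix added by hand. */
const unsigned char erms_movsb_double_rep[] = {
    0xb9, 0x00, 0x10, 0x00, 0x00,   /* mov $0x1000,%ecx */
    0xf3, 0xf3, 0xa4                /* rep rep movsb */
};

/* Hypothetical helper: bytes of padding needed when the replacement is shorter. */
size_t nop_padding(size_t orig_len, size_t repl_len)
{
    return orig_len > repl_len ? orig_len - repl_len : 0;
}

/* The extra REP stands in for the REX prefix, so both sequences are 8 bytes. */
static_assert(sizeof(erms_movsb_double_rep) == sizeof(legacy_movsq),
              "double-REP variant should match the original length");
```

Under that assumption, the plain "rep movsb" replacement is one byte shorter than the original and would need a single NOP of padding, while the double-REP form needs none.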

Did my tool chain do it automatically?

0000000000000000 <.altinstr_replacement>:
     0: b9 00 10 00 00          mov    $0x1000,%ecx
     5: f3 f3 a4                repz rep movsb %ds:(%rsi),%es:(%rdi)

Interesting. That looks like a bug to me, when source code merely has

         rep movsb

Did you also check what copy_page_movsq (i.e. "rep movsq") expands to?
What gas version is this? With 2.45 I get

0000000000000000 <.altinstr_replacement>:
    0:  b9 00 10 00 00          mov    $0x1000,%ecx
    5:  f3 a4                   rep movsb (%rsi),(%rdi)

(the omission of segment indicators when there's no segment override is
indeed a change in 2.45).

Oh, sorry, I forgot I had added the extra rep when I looked at the disassembly. It is as you show.

Sorry for the noise.

-Jason
