On 25/09/2025 11:46 am, Jan Beulich wrote:
> Along with Zen2 (which doesn't expose ERMS), both families reportedly
> suffer from sub-optimal aliasing detection when deciding whether REP MOVSB
> can actually be carried out the accelerated way. Therefore we want to
> avoid its use in the common case (memset(), copy_page_hot()).
>
> Reported-by: Andrew Cooper <[email protected]>
> Signed-off-by: Jan Beulich <[email protected]>
> ---
> Question is whether merely avoiding REP MOVSB (but not REP MOVSQ) is going
> to be good enough.

In the problem case, MOVSQ is 8 times less bad than MOVSB, but they're
both slower than alternative algorithms.

> --- a/xen/arch/x86/copy_page.S
> +++ b/xen/arch/x86/copy_page.S
> @@ -57,6 +57,6 @@ END(copy_page_cold)
>  .endm
>
>  FUNC(copy_page_hot)
> -        ALTERNATIVE copy_page_movsq, copy_page_movsb, X86_FEATURE_ERMS
> +        ALTERNATIVE copy_page_movsq, copy_page_movsb,
> X86_FEATURE_XEN_REP_MOVSB
>          RET
>  END(copy_page_hot)

Hmm. Overall I think this patch is an improvement. But, for any
copy_page variants, we know both pointers are 4k aligned, so will not
tickle the problem case.

This does mess with the naming of the synthetic feature.

> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -1386,6 +1386,10 @@ static void cf_check init_amd(struct cpu
>
>      check_syscfg_dram_mod_en();
>
> +    if (c == &boot_cpu_data && cpu_has(c, X86_FEATURE_ERMS)
> +        && c->family != 0x19 /* Zen3/4 */)

Even if this is Linux style, && on the previous line please.

~Andrew
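
For illustration, the operator placement asked for in the review, as a
standalone C sketch. The names cpu_sketch and want_rep_movsb are
hypothetical stand-ins and not Xen's code, and the sketch assumes the
quoted condition is what decides whether REP MOVSB gets used:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few CPU properties the quoted hunk tests. */
struct cpu_sketch {
    unsigned int family; /* CPU family, e.g. 0x19 for Zen3/4 */
    bool erms;           /* whether the CPU advertises ERMS */
};

/*
 * Assumption: REP MOVSB is only wanted when ERMS is advertised and the
 * CPU is not family 0x19.  The && sits at the end of the first line
 * rather than at the start of the continuation line.
 */
static bool want_rep_movsb(const struct cpu_sketch *c)
{
    return c->erms &&
           c->family != 0x19 /* Zen3/4 */;
}
```

The boot-CPU check from the quoted hunk is left out here to keep the
sketch self-contained.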
