On 11/08/2025 11:06 am, Jan Beulich wrote:
> On 11.08.2025 11:50, Andrew Cooper wrote:
>> On 11/08/2025 9:16 am, Jan Beulich wrote:
>>> On 09.08.2025 00:20, Andrew Cooper wrote:
>>>> +                   "mov %%rax, %%rdx\n\t"
>>>> +                   "shr $32, %%rdx\n\t"
>>>> +                   ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS,
>>>> +
>>>> +                   [msr] "i" (msr), "a" (val) : "rcx", "rdx");
>>>     [msr] "i" (msr), "a" (val), "c" (msr) : "rdx");
>>>
>>> allowing the compiler to actually know what's put in %ecx? That'll make
>>> original and 2nd replacement code 10 bytes, better balancing with the 9
>>> bytes of the 1st replacement. And I'd guess that the potentially dead
>>> MOV to %ecx would be hidden in the noise as well.
>> I considered that, but what can the compiler do as a result of knowing %ecx?
> For example ...
>
>> That said, we do need an RDMSR form (which I desperately want to make
>> foo = rdmsr(MSR_BAR) but my cleanup series from 2019 got nowhere), and
>> in a read+write case I suppose the compiler could deduplicate the setup
>> of %ecx.
> ... this. But also simply to use a good pattern (exposing as much as possible
> to the compiler), so there are more good instances of code for future cloning
> from. (In size-optimizing builds, the compiler could further favor ADD/SUB
> over MOV when the two MSRs accessed are relatively close together.)

I have seen the compiler do this in the past, but couldn't reproduce it
for this work. We specifically do not want any conversion to ADD/SUB,
because that takes our "close to a nop" and makes it not so.

~Andrew