>>> On 10.02.16 at 16:03, <andrew.coop...@citrix.com> wrote: > On 10/02/16 12:57, Jan Beulich wrote: >> Also drop an unnecessary va adjustment in the code being touched. >> >> Signed-off-by: Jan Beulich <jbeul...@suse.com> >> >> --- a/xen/arch/x86/flushtlb.c >> +++ b/xen/arch/x86/flushtlb.c >> @@ -139,10 +139,12 @@ unsigned int flush_area_local(const void >> c->x86_clflush_size && c->x86_cache_size && sz && >> ((sz >> 10) < c->x86_cache_size) ) >> { >> - va = (const void *)((unsigned long)va & ~(sz - 1)); >> + alternative(ASM_NOP3, "sfence", X86_FEATURE_CLFLUSHOPT); > > Why separate? This would be better in the lower alternative(), with one > single nop making up the difference in length. That way, processors > without CLFLUSHOPT don't suffer the 1 cycle instruction decode stall > from the redundant rex prefix.
Why would we want the fence inside the loop - a single fence is sufficient for the entire flush. Also if we're worried about the REX decode, this could easily be a NOP instead, just that I'm not certain which one in the end is less decode overhead. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel