On 10.10.2022 20:56, Andrew Cooper wrote:
> On 06/10/2022 14:11, Jan Beulich wrote:
>> In an entirely different context I came across Linux commit 428e3d08574b
>> ("KVM: x86: Fix zero iterations REP-string"), which points out that
>> we're still doing things wrong: For one, there's no zero-extension at
>> all on AMD. And then while RCX is zero-extended from 32 bits uniformly
>> for all string instructions on newer hardware, RSI/RDI are only for MOVS
>> and STOS on the systems I have access to. (On an old family 0xf system
>> I've further found that for REP LODS even RCX is not zero-extended.)
>>
>> Fixes: 79e996a89f69 ("x86emul: correct 64-bit mode repeated string insn handling with zero count")
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> Partly RFC for none of this being documented anywhere (and it partly
>> being model specific); inquiry pending.
> 
> None of this surprises me.  The rep instructions have always been
> microcoded, and 0 reps is a special case which has been largely ignored
> until recently.
> 
> I wouldn't be surprised if the behaviour changes with
> MISC_ENABLE.FAST_STRINGS (given the KVM commit message) and I also
> wouldn't be surprised if it's different between Core and Atom too (given
> the Fam 0xf observation).
> 
> It's almost worth executing a zero-length rep stub, except that may
> potentially go very wrong in certain ecx/rcx cases.
> 
> I'm not sure how important these cases are to cover.  Given that they do
> differ between vendors and generation, and that their use in compiled
> code is not going to consider the registers live after use, is the
> complexity really worth it?

What do you mean by "complexity"? The patch doesn't add any: it only
converts "true" to "false" in several places and updates a comment. I
don't think we can legitimately simplify things (by removing logic), so
the only thing I can think of is your idea of executing a zero-length
REP stub (which you say may be problematic in certain cases). Patch 2
makes clear why this wouldn't be a good idea for INS and OUTS. It also
cannot possibly be got right when emulating 16-bit code (without
switching to a 16-bit code segment), and it's uncertain whether a 32-bit
address size override would actually yield the same behavior as a native
address size operation in 32-bit code. Of course, if we limited this
(the way we currently do) to just 32-bit addressing in 64-bit mode, then
a stub ought to be representative (with the INS/OUTS caveat remaining),
but that would be, as you say, adding complexity for likely little gain.
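
To make the behavior under discussion concrete, here is a rough C sketch
of the zero-iteration case (32-bit address size in 64-bit mode) as the
patch description words it. This is not the x86emul code: every name in
it (struct regs, struct hw_model, zero_count_rep_addr32() and the enum)
is hypothetical, and it assumes that only the index registers an
instruction actually uses (RSI and RDI for MOVS, RDI for STOS) are
candidates for zero-extension. The family 0xf flag merely encodes the
single observation quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; illustration only, not Xen code. */
enum string_insn {
    INSN_MOVS, INSN_STOS, INSN_LODS, INSN_CMPS, INSN_SCAS, INSN_INS, INSN_OUTS
};

struct regs {
    uint64_t rcx, rsi, rdi;
};

struct hw_model {
    bool amd;        /* observed: no zero-extension at all */
    bool old_fam_f;  /* observed: REP LODS leaves RCX untouched */
};

/*
 * Model the register state after a REP string instruction that uses a
 * 32-bit address size in 64-bit mode and performs zero iterations.
 */
static void zero_count_rep_addr32(struct regs *r, enum string_insn insn,
                                  const struct hw_model *hw)
{
    /* Only the zero-iteration case (ECX == 0) is modelled here. */
    if ((uint32_t)r->rcx != 0)
        return;

    /* AMD: upper halves of all three registers are left alone. */
    if (hw->amd)
        return;

    /* RCX is zero-extended, except for REP LODS on the old system. */
    if (!(hw->old_fam_f && insn == INSN_LODS))
        r->rcx = (uint32_t)r->rcx;

    /* Assumption: only registers the instruction uses are affected. */
    if (insn == INSN_MOVS)
        r->rsi = (uint32_t)r->rsi;
    if (insn == INSN_MOVS || insn == INSN_STOS)
        r->rdi = (uint32_t)r->rdi;
}
```

Whether an emulator should mirror this per vendor and model, or settle
for one approximation, is exactly the open question in this thread.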

Jan
