On 23/04/2025 7:13 am, Jan Beulich wrote:
> SDM revision 087 points out that apparently as of quite some time ago on
> Intel hardware BSF and BSR may alter all arithmetic flags, not just ZF.
> Because of the inconsistency (and because documentation doesn't look to
> be quite right about PF), best we can do is simply take the flag values
> from what the processor produces, just like we do for various other
> arithmetic insns. (Note also that AMD and Intel have always been
> disagreeing on arithmetic flags other than ZF.)

The new footnote Intel have added about "older processors" does match
AMD, saying "unmodified".

I think it's clear now that Intel and AMD behaviour was the same
originally, except AMD wrote "unmodified" and Intel wrote "undefined",
and Intel used this flexibility to alter the behaviour.

>
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> ---
> Since I happened to look at it, I wonder whether for e.g. SHLD/SHRD we
> wouldn't be better off using _emulate_2op_SrcV_nobyte() as well; we
> already do so in x86_emul_rmw(), after all.

Probably a good move.

>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -5268,16 +5268,14 @@ x86_emulate(
>          break;
>  
>      case X86EMUL_OPC(0x0f, 0xbc): /* bsf or tzcnt */
> -    {
> -        bool zf;
> -
> -        asm ( "bsf %2,%0" ASM_FLAG_OUT(, "; setz %1")
> -              : "=r" (dst.val), ASM_FLAG_OUT("=@ccz", "=qm") (zf)
> -              : "rm" (src.val) );
> -        _regs.eflags &= ~X86_EFLAGS_ZF;
>          if ( (vex.pfx == vex_f3) && vcpu_has_bmi1() )
>          {
> -            _regs.eflags &= ~X86_EFLAGS_CF;
> +            bool zf;
> +
> +            asm ( "bsf %2,%0" ASM_FLAG_OUT(, "; setz %1")
> +                  : "=r" (dst.val), ASM_FLAG_OUT("=@ccz", "=qm") (zf)
> +                  : "rm" (src.val) );
> +            _regs.eflags &= ~(X86_EFLAGS_ZF | X86_EFLAGS_CF);
>              if ( zf )
>              {
>                  _regs.eflags |= X86_EFLAGS_CF;
> @@ -5286,25 +5284,23 @@ x86_emulate(
>              else if ( !dst.val )
>                  _regs.eflags |= X86_EFLAGS_ZF;
>          }
> -        else if ( zf )
> +        else
>          {
> -            _regs.eflags |= X86_EFLAGS_ZF;
> -            dst.type = OP_NONE;
> +            emulate_2op_SrcV_srcmem("bsf", src, dst, _regs.eflags);
> +            if ( _regs.eflags & X86_EFLAGS_ZF )
> +                dst.type = OP_NONE;

On Intel, BSF/BSR writes back the destination register.  Notably, it
gets 0 extended per normal rules, which is why you have to be extra
careful when using the trick of preloading it with -1; the result must
be interpreted as (int) even over a 64bit operation.

This needs an amd_like() qualification to override dst.type.  This
aspect genuinely is different between them.  Alternatively, we might be
able to set the operand size always to 64 and write back the entire
register as the processor gave to us, but I'm not sure if that will have
effects elsewhere.

Finally, I'm wary leaving TZCNT/LZCNT using the old logic.  Despite the
absence of an update in 087, I find it unlikely that they behave
differently WRT flags (specifically, I severely doubt they've got
differing circuitry).

I'd suggest giving them the same fully-emulated treatment as BSF/BSR.

~Andrew

Reply via email to