On 22/11/2019 13:31, Jan Beulich wrote:
> On 21.11.2019 23:15, Andrew Cooper wrote:
>> --- a/xen/arch/x86/hvm/svm/emulate.c
>> +++ b/xen/arch/x86/hvm/svm/emulate.c
>> @@ -117,6 +117,61 @@ unsigned int svm_get_insn_len(struct vcpu *v, unsigned 
>> int instr_enc)
>>  }
>>  
>>  /*
>> + * TASK_SWITCH vmexits never provide an instruction length.  We must always
>> + * decode under %rip to find the answer.
>> + */
>> +unsigned int svm_get_task_switch_insn_len(struct vcpu *v)
>> +{
>> +    struct hvm_emulate_ctxt ctxt;
>> +    struct x86_emulate_state *state;
>> +    unsigned int emul_len, modrm_reg;
>> +
>> +    ASSERT(v == current);
> You look to be using v here just for this ASSERT() - is this really
> worth it? By making the function take "void" it would be quite obvious
> that it would act on the current vCPU only.

This was cribbed largely from svm_get_insn_len(), which also behaves the
same.

>
>> +    hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
>> +    hvm_emulate_init_per_insn(&ctxt, NULL, 0);
>> +    state = x86_decode_insn(&ctxt.ctxt, hvmemul_insn_fetch);
>> +    if ( IS_ERR_OR_NULL(state) )
>> +        return 0;
>> +
>> +    emul_len = x86_insn_length(state, &ctxt.ctxt);
>> +
>> +    /*
>> +     * Check for an instruction which can cause a task switch.  Any far
>> +     * jmp/call/ret, any software interrupt/exception, and iret.
>> +     */
>> +    switch ( ctxt.ctxt.opcode )
>> +    {
>> +    case 0xff: /* Grp 5 */
>> +        /* call / jmp (far, absolute indirect) */
>> +        if ( x86_insn_modrm(state, NULL, &modrm_reg) != 3 ||
> DYM "== 3", to bail upon non-memory operands?

Ah yes (and this demonstrates that I really need to get an XTF tested
sorted soon.)

>
>> +             (modrm_reg != 3 && modrm_reg != 5) )
>> +        {
>> +            /* Wrong instruction.  Throw #GP back for now. */
>> +    default:
>> +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> +            emul_len = 0;
>> +            break;
>> +        }
>> +        /* Fallthrough */
>> +    case 0x62: /* bound */
> Does "bound" really belong on this list? It raising #BR is like
> insns raising random other exceptions, not like INTO / INT3,
> where the IDT descriptor also has to have suitable DPL for the
> exception to actually get delivered (rather than #GP). I.e. it
> shouldn't make it here in the first place, due to the
> X86_EVENTTYPE_HW_EXCEPTION check in the caller.
>
> IOW if "bound" needs to be here, then all others need to be as
> well, unless they can't cause any exception at all.

More experimentation required.  BOUND doesn't appear to be special cased
by SVM, but is by VT-x.  VT-x however does throw it in the same category
as #UD, and identify it to be a hardware exception.

I suspect you are right, and t doesn't want to be here.

>> +    case 0x9a: /* call (far, absolute) */
>> +    case 0xca: /* ret imm16 (far) */
>> +    case 0xcb: /* ret (far) */
>> +    case 0xcc: /* int3 */
>> +    case 0xcd: /* int imm8 */
>> +    case 0xce: /* into */
>> +    case 0xcf: /* iret */
>> +    case 0xea: /* jmp (far, absolute) */
>> +    case 0xf1: /* icebp */
> Same perhaps for ICEBP, albeit I'm less certain here, as its
> behavior is too poorly documented (if at all).

ICEBP's #DB is a trap, not a fault, so instruction length is important.

>
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -2776,7 +2776,41 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>  
>>      case VMEXIT_TASK_SWITCH: {
>>          enum hvm_task_switch_reason reason;
>> -        int32_t errcode = -1;
>> +        int32_t errcode = -1, insn_len = -1;
>> +
>> +        /*
>> +         * All TASK_SWITCH intercepts have fault-like semantics.  NRIP is
>> +         * never provided, even for instruction-induced task switches, but 
>> we
>> +         * need to know the instruction length in order to set %eip suitably
>> +         * in the outgoing TSS.
>> +         *
>> +         * For a task switch which vectored through the IDT, look at the 
>> type
>> +         * to distinguish interrupts/exceptions from instruction based
>> +         * switches.
>> +         */
>> +        if ( vmcb->eventinj.fields.v )
>> +        {
>> +            /*
>> +             * HW_EXCEPTION, NMI and EXT_INTR are not instruction based.  
>> All
>> +             * others are.
>> +             */
>> +            if ( vmcb->eventinj.fields.type <= X86_EVENTTYPE_HW_EXCEPTION )
>> +                insn_len = 0;
>> +
>> +            /*
>> +             * Clobber the vectoring information, as we are going to emulate
>> +             * the task switch in full.
>> +             */
>> +            vmcb->eventinj.bytes = 0;
>> +        }
>> +
>> +        /*
>> +         * insn_len being -1 indicates that we have an instruction-induced
>> +         * task switch.  Decode under %rip to find its length.
>> +         */
>> +        if ( insn_len < 0 && (insn_len = svm_get_task_switch_insn_len(v)) 
>> == 0 )
>> +            break;
> Won't this live-lock the guest?

Potentially, yes.

> I.e. isn't it better to e.g. crash it
> if svm_get_task_switch_insn_len() didn't raise #GP(0)?

No - that would need and XSA if we got it wrong, as none of these are
privileged instruction.

However, it occurs to me that we are in a position to use
svm_crash_or_fault(), so I'll respin with that in mind.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to