On Tue, Oct 22, 2019 at 06:57:32PM +0900, Iori YONEJI wrote: > On Tue, Oct 22, 2019 at 11:17 AM Mike Larkin <mlar...@nested.page> wrote: > > > > On Mon, Oct 21, 2019 at 03:52:52AM +0900, Iori YONEJI wrote: > > > Hello tech@, > > > > > > I have a question (or maybe a suggestion) about vmm(4). > > > > > > I'm writing a small additional feature to sys/arch/amd64/amd64/vmm.c > > > and found a seemingly unneeded vmclear() at the end of > > > vcpu_readregs_vmx(). This function didn't seem to affect VM state at > > > first glance because, obviously, its name is _read_. Against this > > > intuition, this function clears VM state, which makes the vmexit > > > handling stop (eg. "advance rip" failure) and the VM die if it is used > > > in there. Possibly this was added as a counterpart of > > > vcpu_reload_vmcs_vmx() at the very beginning of the read function, but > > > I don't think it does undo the reload. > > > > > > This vmclear can be removed in my understanding and also I confirmed > > > that Alpine Linux and OpenBSD guest VMs can run on the vmm(4) without > > > this vmclear call. > > > > > > Or I am just wrong and it may be actually needed in some corner cases. > > > If so, I will restore (vmptrld) vcpu->vc_control_pa every time the VM > > > context continues after calling readregs function or add another flag > > > to indicate whether vmclear is needed here if all of the corner cases > > > are known. > > > > > > Would you mind to tell me I am correct or not, or how it has to be > > > dealt with? > > > > > > > > > Index: sys/arch/amd64/amd64/vmm.c > > > =================================================================== > > > RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v > > > retrieving revision 1.254 > > > diff -u -p -r1.254 vmm.c > > > --- sys/arch/amd64/amd64/vmm.c 22 Sep 2019 08:47:54 -0000 1.254 > > > +++ sys/arch/amd64/amd64/vmm.c 20 Oct 2019 18:04:17 -0000 > > > @@ -1602,8 +1602,10 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > > errout: > > > ret = EINVAL; > > > out: > > > + /* XXX Why do we need vmclear here? > > > if (vmclear(&vcpu->vc_control_pa)) > > > ret = EINVAL; > > > + */ > > > return (ret); > > > } > > > > > > > The assumption in vcpu_run_vmx is that we will always come in with a cleared > > VMCS state, unless we are staying in the run loop (the "resume" flag in that > > function tracks this). So any ioctl (including things like read/write regs) > > that might possibly manipulate the VMCS, need to clear the VMCS before they > > return to user space. That was the original reason for that clear at the end > > of the function. > > > > If you are adding new functionality that uses read/write regs internally, > > then > > you're right; on return, you'll have a cleared VMCS and subsequent vmreads > > and > > vmwrites will fail. > > > > It looks like there are a couple of other users of readregs/writeregs > > internally - the presently unused GVA translator function as well as the > > in/out > > exit handler structure preparation code. Both of those look like they got > > lucky > > in how they were handling things; they aren't touching the VMCS after > > read/write > > regs or we would have seen them explode like you're saying. > > > > It looks like the start of vcpu_run_vmx *almost* properly handles coming > > in with an arbitrary VMCS. The "almost" part is the bit under #ifdef > > VMM_DEBUG > > that handles triple faults. It seems to assume that the proper VMCS is > > currently > > loaded, which is certainly not true all the time. I'll fix that part. > > > > Based on the fact that the other code paths look clean, I think we can > > remove > > the vmclear from the end of read/write regs, as you suggest. I'll do that as > > well. > > > > Might I ask what you're working on? Just in case I know of someone else > > working > > on the same thing? (There are about a dozen vmm side-projects going on that > > I'm > > aware of and it would be a waste for two people to be working on the same > > thing > > without knowing of the other). Feel free to mail me off-list if you don't > > want > > to share in public. > > > > -ml > > > Thank you. I didn't notice that the VMCS must be cleared before > returning to user space. > > Talking about the use of this function from outside of the run loop, I > worry about that it would overwrite VMCS memory when another physical > CPU that is running the vcpu's run loop has exited and written back its > new state just before read regs clears its VMCS. > If those control_pa(s) always point different memory addresses, this > will never happen. Otherwise, I think it would (very unlikely) happen. > > And thank you to remove the vmclear. > > > Might I ask what you're working on? > I am working on adding MMIO handling. > > Originally, I encountered very same situation as this report: > Booting NetBSD on vmm fails with uvm_fault > https://marc.info/?l=openbsd-bugs&m=153761926725642&w=2 > > The address 0xfed40014 is the TPM capability register as defined in TIS > specification, which is not needed for the guest VM for now. I can > manage it by handling this specific issue by just ignoring EPT page fault > at the address (luckily, the destination register seems to be always > cleared before read operation in this case and zero means > "TPM not available"). > To advance from this hacky workaround and to make the handling more > general, I am going to parse this instruction (it's at GVA, so I need > to use vmm_translate_gva so as read regs) and store proper data at > specified register. > > Currently, only mov instruction is valid and only VMX is supported, > but I will hopefully post the patch soon as "the scaffold" after make > it work on AMD machine.
Sounds good, looking forward to seeing it. Once note; you'll eventually need to handle the string in/out instructions in vmd(8), as this is what NetBSD uses once autoconf is complete. When you get there, let me or pd@ know, we have some old diffs that implemented this. You'll know you're there once you can see the kernel autoconf messages but then nothing after that (but the VM still keeps working). -ml