On Wed, Apr 01, 2020 at 01:12:04AM +0200, Paolo Bonzini wrote:
> On 31/03/20 18:51, Peter Xu wrote:
> > On Tue, Mar 31, 2020 at 05:34:43PM +0200, Paolo Bonzini wrote:
> >> On 31/03/20 17:23, Peter Xu wrote:
> >>>> Or KVM_MEM_READONLY.
> >>> Yeah, I used a new flag because I thought READONLY was a bit tricky to
> >>> be used directly here.  The thing is IIUC if guest writes to a
> >>> READONLY slot then KVM should either ignore the write or trigger an
> >>> error which I didn't check, however here what we want to do is to let
> >>> the write to fallback to the userspace so it's neither dropped (we
> >>> still want the written data to land gracefully on RAM), nor triggering
> >>> an error (because the slot is actually writable).
> >>
> >> No, writes fall back to userspace with KVM_MEM_READONLY.
> >
> > I read that __kvm_write_guest_page() will return -EFAULT when writting
> > to the read-only memslot, and e.g. kvm_write_guest_virt_helper() will
> > return with X86EMUL_IO_NEEDED, which will be translated into a
> > EMULATION_OK in x86_emulate_insn().  Then in x86_emulate_instruction()
> > it seems to get a "1" returned (note that I think it does not set
> > either vcpu->arch.pio.count or vcpu->mmio_needed).  Does that mean
> > it'll retry the write forever instead of quit into the userspace?  I
> > may possibly have misread somewhere, though..
>
> We are definitely relying on KVM_MEM_READONLY to exit to userspace, in
> order to emulate flash memory.
>
> > However... I think I might find another race with this:
> >
> >             main thread                       vcpu thread
> >             -----------                       -----------
> >                                            dirty GFN1, cached in PML
> >                                            ...
> >             remove memslot1 of GFN1
> >               set slot READONLY (whatever, or INVALID)
> >               sync log (NOTE: no GFN1 yet)
> >                                            vmexit, flush PML with RCU
> >                                            (will flush to old bitmap)  <------- [1]
> >             delete memslot1 (old bitmap freed)                         <------- [2]
> >             add memslot2 of GFN1 (memslot2 could be smaller)
> >               add memslot2
> >
> > I'm not 100% sure, but I think GFN1's dirty bit will be lost though
> > it's correctly applied at [1] but quickly freed at [2].
>
> Yes, we probably need to do a mass vCPU kick when a slot is made
> READONLY, before KVM_SET_USER_MEMORY_REGION returns (and after
> releasing slots_lock).  It makes sense to guarantee that you can't get
> any more dirtying after KVM_SET_USER_MEMORY_REGION returns.
Sounds doable.  Though we still need a synchronous way to kick the vCPUs
in KVM to make sure the PML buffers are flushed before
KVM_SET_USER_MEMORY_REGION returns, am I right?  Is there an existing
good way to do this?  Something along the lines of the sketch below is
roughly what I had in mind.
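Completely untested, just to illustrate the idea: KVM_REQ_PML_FLUSH is a
made-up request name (and number), and it assumes the KVM_REQUEST_WAIT
semantics of kvm_make_all_cpus_request() are strong enough to guarantee
that each vCPU has really gone through the VM-exit path that writes the
PML buffer back before the call returns, which is exactly the part I'm
not sure about:

/* Sketch against virt/kvm/kvm_main.c; completely untested. */
#include <linux/kvm_host.h>

/* Made-up request; the request number is only a placeholder. */
#define KVM_REQ_PML_FLUSH	KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_WAIT)

/*
 * Called with slots_lock held, after the memslot has been switched to
 * READONLY/INVALID but before its old dirty bitmap is freed.
 */
static void kvm_vm_flush_pml_buffers(struct kvm *kvm)
{
	/*
	 * Kick every vCPU out of guest mode and wait for the kick IPIs
	 * to be acknowledged.  With PML enabled the VM-exit path
	 * (vmx_handle_exit() -> vmx_flush_pml_buffer()) writes the
	 * logged GFNs back into the still-installed dirty bitmap, so
	 * ideally nothing is left in the hardware buffers when this
	 * returns.
	 */
	kvm_make_all_cpus_request(kvm, KVM_REQ_PML_FLUSH);
}

The request itself wouldn't need any real handling beyond being cleared
with kvm_check_request() in vcpu_enter_guest(); its only purpose is to
force the VM-exit.

Thanks,

-- 
Peter Xu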