On 31/03/20 18:51, Peter Xu wrote:
> On Tue, Mar 31, 2020 at 05:34:43PM +0200, Paolo Bonzini wrote:
>> On 31/03/20 17:23, Peter Xu wrote:
>>>> Or KVM_MEM_READONLY.
>>> Yeah, I used a new flag because I thought READONLY was a bit tricky
>>> to be used directly here.  The thing is, IIUC, if the guest writes
>>> to a READONLY slot then KVM should either ignore the write or
>>> trigger an error, which I didn't check.  However, what we want to
>>> do here is to let the write fall back to userspace, so it is
>>> neither dropped (we still want the written data to land gracefully
>>> on RAM) nor triggering an error (because the slot is actually
>>> writable).
>>
>> No, writes fall back to userspace with KVM_MEM_READONLY.
>
> I read that __kvm_write_guest_page() will return -EFAULT when writing
> to a read-only memslot, and that e.g. kvm_write_guest_virt_helper()
> will then return X86EMUL_IO_NEEDED, which is translated into
> EMULATION_OK in x86_emulate_insn().  Then x86_emulate_instruction()
> seems to return "1" (note that I think it sets neither
> vcpu->arch.pio.count nor vcpu->mmio_needed).  Does that mean it will
> retry the write forever instead of exiting to userspace?  I may
> possibly have misread somewhere, though..
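(For reference, a condensed sketch of the lookup that gates writes to a
read-only slot, abbreviated from virt/kvm/kvm_main.c of that era with
unrelated details trimmed, so treat it as illustrative rather than
exact:)

    static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot,
                                           gfn_t gfn, gfn_t *nr_pages,
                                           bool write)
    {
            if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
                    return KVM_HVA_ERR_BAD;

            /*
             * A write to a KVM_MEM_READONLY slot has no valid host
             * virtual address; __kvm_write_guest_page() turns this
             * into -EFAULT.
             */
            if (memslot_is_readonly(slot) && write)
                    return KVM_HVA_ERR_RO_BAD;

            if (nr_pages)
                    *nr_pages = slot->npages - (gfn - slot->base_gfn);

            return __gfn_to_hva_memslot(slot, gfn);
    }

(If I'm reading the contemporary code right, for ordinary emulated
guest writes that failure is caught in emulator_read_write(), which
sets vcpu->run->exit_reason = KVM_EXIT_MMIO and vcpu->mmio_needed = 1
before returning X86EMUL_IO_NEEDED, so the vCPU does exit to userspace;
kvm_write_guest_virt_helper() is a different, internal path.)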
We are definitely relying on KVM_MEM_READONLY to exit to userspace, in
order to emulate flash memory.

> However... I think I might have found another race with this:
>
>     main thread                     vcpu thread
>     -----------                     -----------
>                                     dirty GFN1, cached in PML
>                                     ...
>     remove memslot1 of GFN1
>       set slot READONLY (whatever, or INVALID)
>       sync log (NOTE: no GFN1 yet)
>                                     vmexit, flush PML with RCU
>                                     (will flush to old bitmap)  <------- [1]
>       delete memslot1 (old bitmap freed)                        <------- [2]
>     add memslot2 of GFN1 (memslot2 could be smaller)
>       add memslot2
>
> I'm not 100% sure, but I think GFN1's dirty bit will be lost: it is
> correctly applied at [1], but the bitmap it lands in is freed at [2].

Yes, we probably need to do a mass vCPU kick when a slot is made
READONLY, before KVM_SET_USER_MEMORY_REGION returns (and after
releasing slots_lock); see the sketch at the end of this message.  It
makes sense to guarantee that you cannot get any more dirtying after
KVM_SET_USER_MEMORY_REGION returns.

Paolo

>>> I think the whole kick operation is indeed too heavy for this when
>>> done with the run_on_cpu() trick, because the thing we want to know
>>> (PML flushing) is actually per-vCPU and has no BQL interaction.  Do
>>> we have (or need) a lightweight way to kick one vCPU synchronously?
>>> I was wondering about something like responding to a "sync kick"
>>> request in the vcpu thread right after KVM_RUN ends (when we don't
>>> hold the BQL yet).  Would that make sense?
>>
>> Not synchronously, because anything synchronous is very susceptible
>> to deadlocks.
>
> Yeah, it's easy to deadlock (I suffer from it...), but besides the
> above case (which I really think is special) I still think we
> unluckily need a synchronous way.  For example, the VGA code needs
> the latest dirty bit information to decide whether to update the
> screen (or it could stall), and the migration code needs to calculate
> downtime with the current dirty bit information, etc.
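To make the proposal above concrete, here is a minimal sketch of the
mass kick; the helper name and its call site in the memslot-update path
are assumptions for illustration, not actual kernel code:

    /*
     * Hypothetical helper: force every vCPU out of guest mode so that
     * dirty GFNs buffered in the PML are flushed to the still-live
     * dirty bitmap before the memslot update is committed.
     */
    static void kvm_kick_all_vcpus(struct kvm *kvm)
    {
            struct kvm_vcpu *vcpu;
            int i;

            kvm_for_each_vcpu(i, vcpu, kvm)
                    kvm_vcpu_kick(vcpu);    /* IPI; forces a VM exit */
    }

kvm_for_each_vcpu() and kvm_vcpu_kick() exist as used here, but a bare
kick is asynchronous: to close the window between [1] and [2], the
caller would also have to wait until every vCPU has actually exited,
e.g. by turning the kick into a waited-for request
(kvm_make_all_cpus_request() with a KVM_REQUEST_WAIT-style request)
before the old dirty bitmap is freed.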