Re: anon & pmap_page_protect
> Date: Fri, 1 Sep 2023 23:02:59 +0200
> From: Martin Pieuchot
>
> On 12/08/23(Sat) 10:43, Martin Pieuchot wrote:
> > Since UVM has been imported, we zap mappings associated to anon pages
> > before deactivating or freeing them.  Sadly, with the introduction of
> > locking for amaps & anons, I added new code paths that do not respect
> > this behavior.
> >
> > The diff below restores it by moving the call to pmap_page_protect()
> > inside uvm_anon_release().  With it the 3 code paths using the function
> > are now coherent with the rest of UVM.
> >
> > I remember a discussion we had questioning the need for zapping such
> > mappings.  I'm interested in hearing more arguments for or against this
> > change.  However, right now, I'm more concerned about coherency, so I'd
> > like to commit the change below before we try something different.
>
> Unless there's an objection, I'll commit this tomorrow in order to
> unblock my upcoming UVM changes.

I was planning to take a closer look at this, but the change should be
fine in terms of correctness.

> > Index: uvm/uvm_anon.c
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
> > retrieving revision 1.55
> > diff -u -p -u -7 -r1.55 uvm_anon.c
> > --- uvm/uvm_anon.c	11 Apr 2023 00:45:09 -	1.55
> > +++ uvm/uvm_anon.c	15 May 2023 13:55:28 -
> > @@ -251,14 +251,15 @@ uvm_anon_release(struct vm_anon *anon)
> >  	KASSERT((pg->pg_flags & PG_RELEASED) != 0);
> >  	KASSERT((pg->pg_flags & PG_BUSY) != 0);
> >  	KASSERT(pg->uobject == NULL);
> >  	KASSERT(pg->uanon == anon);
> >  	KASSERT(anon->an_ref == 0);
> >
> >  	uvm_lock_pageq();
> > +	pmap_page_protect(pg, PROT_NONE);
> >  	uvm_pagefree(pg);
> >  	uvm_unlock_pageq();
> >  	KASSERT(anon->an_page == NULL);
> >  	lock = anon->an_lock;
> >  	uvm_anfree(anon);
> >  	rw_exit(lock);
> >
> >  	/* Note: extra reference is held for PG_RELEASED case. */
> >
> > Index: uvm/uvm_fault.c
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> > retrieving revision 1.133
> > diff -u -p -u -7 -r1.133 uvm_fault.c
> > --- uvm/uvm_fault.c	4 Nov 2022 09:36:44 -	1.133
> > +++ uvm/uvm_fault.c	15 May 2023 13:55:28 -
> > @@ -392,15 +392,14 @@ uvmfault_anonget(struct uvm_faultinfo *u
> >
> >  		/*
> >  		 * if we were RELEASED during I/O, then our anon is
> >  		 * no longer part of an amap.  we need to free the
> >  		 * anon and try again.
> >  		 */
> >  		if (pg->pg_flags & PG_RELEASED) {
> > -			pmap_page_protect(pg, PROT_NONE);
> >  			KASSERT(anon->an_ref == 0);
> >  			/*
> >  			 * Released while we had unlocked amap.
> >  			 */
> >  			if (locked)
> >  				uvmfault_unlockall(ufi, NULL, NULL);
> >  			uvm_anon_release(anon);	/* frees page for us */
> >
Re: anon & pmap_page_protect
On 12/08/23(Sat) 10:43, Martin Pieuchot wrote:
> Since UVM has been imported, we zap mappings associated to anon pages
> before deactivating or freeing them.  Sadly, with the introduction of
> locking for amaps & anons, I added new code paths that do not respect
> this behavior.
>
> The diff below restores it by moving the call to pmap_page_protect()
> inside uvm_anon_release().  With it the 3 code paths using the function
> are now coherent with the rest of UVM.
>
> I remember a discussion we had questioning the need for zapping such
> mappings.  I'm interested in hearing more arguments for or against this
> change.  However, right now, I'm more concerned about coherency, so I'd
> like to commit the change below before we try something different.

Unless there's an objection, I'll commit this tomorrow in order to
unblock my upcoming UVM changes.

> Index: uvm/uvm_anon.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
> retrieving revision 1.55
> diff -u -p -u -7 -r1.55 uvm_anon.c
> --- uvm/uvm_anon.c	11 Apr 2023 00:45:09 -	1.55
> +++ uvm/uvm_anon.c	15 May 2023 13:55:28 -
> @@ -251,14 +251,15 @@ uvm_anon_release(struct vm_anon *anon)
>  	KASSERT((pg->pg_flags & PG_RELEASED) != 0);
>  	KASSERT((pg->pg_flags & PG_BUSY) != 0);
>  	KASSERT(pg->uobject == NULL);
>  	KASSERT(pg->uanon == anon);
>  	KASSERT(anon->an_ref == 0);
>
>  	uvm_lock_pageq();
> +	pmap_page_protect(pg, PROT_NONE);
>  	uvm_pagefree(pg);
>  	uvm_unlock_pageq();
>  	KASSERT(anon->an_page == NULL);
>  	lock = anon->an_lock;
>  	uvm_anfree(anon);
>  	rw_exit(lock);
>
>  	/* Note: extra reference is held for PG_RELEASED case. */
>
> Index: uvm/uvm_fault.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> retrieving revision 1.133
> diff -u -p -u -7 -r1.133 uvm_fault.c
> --- uvm/uvm_fault.c	4 Nov 2022 09:36:44 -	1.133
> +++ uvm/uvm_fault.c	15 May 2023 13:55:28 -
> @@ -392,15 +392,14 @@ uvmfault_anonget(struct uvm_faultinfo *u
>
>  		/*
>  		 * if we were RELEASED during I/O, then our anon is
>  		 * no longer part of an amap.  we need to free the
>  		 * anon and try again.
>  		 */
>  		if (pg->pg_flags & PG_RELEASED) {
> -			pmap_page_protect(pg, PROT_NONE);
>  			KASSERT(anon->an_ref == 0);
>  			/*
>  			 * Released while we had unlocked amap.
>  			 */
>  			if (locked)
>  				uvmfault_unlockall(ufi, NULL, NULL);
>  			uvm_anon_release(anon);	/* frees page for us */
>
vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr
Now that my i8259 fix is in, it's safe to expand the testing pool for
this diff. (Without that fix, users would definitely hit the hung block
device issue testing this one.) Hoping that folks who run non-OpenBSD
guests or strange configurations can give it a spin.

This change removes an ioctl(2) call from the vcpu thread hot path in
vmd. Instead of making that syscall to toggle a pending interrupt flag
on or off on the vcpu object in vmm(4), it adds a flag to the
vm_run_params struct sent with the VMM_IOC_RUN ioctl. The in-kernel
vcpu runloop can now toggle the pending interrupt state prior to vm
entry.

mbuhl@ and phessler@ have run this diff on their machines. Current
observations are reduced average network latency for guests.

My terse measurements using the following btrace script show some
promising changes in terms of reducing ioctl syscalls:

/* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
syscall:ioctl:entry
/arg1 == 2148292102/
{
	@total[tid] = count();
	@running[tid] = count();
}

interval:hz:1
{
	print(@running);
	clear(@running);
}

Measuring from boot of an OpenBSD guest to after the guest finishes
relinking (based on my manual observation of the libevent thread
settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR
ioctls for a single guest:

## -current
@total[433237]: 1325100		# vcpu thread (!!)
@total[187073]: 80239		# libevent thread

## with diff
@total[550347]: 42		# vcpu thread (!!)
@total[256550]: 86946		# libevent thread

Most of the VMM_IOC_INTR ioctls on the vcpu threads come from seabios
and the bootloader prodding some of the emulated hardware, but even
after the bootloader you'll see ~10-20k/s of ioctls on -current
vs. ~4-5k/s with the diff.

At steady-state, the vcpu thread no longer makes the VMM_IOC_INTR calls
at all and you should see the libevent thread calling it at a rate of
~100/s (probably hardclock?). *Without* the diff, I see a steady 650/s
rate on the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)

To test:
- rebuild & install new kernel
- copy/symlink vmmvar.h into /usr/include/machine/
- rebuild & re-install vmd & vmctl
- reboot

-dv

diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
 M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+  0-
 M  sys/arch/amd64/include/vmmvar.h     |   1+  0-
 M  usr.sbin/vmd/vm.c                   |   2+  16-

3 files changed, 13 insertions(+), 16 deletions(-)

diff refs/heads/master refs/heads/vmm-vrp_intr_pending
commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *
 	 */
 	irq = vrp->vrp_irq;
 
+	if (vrp->vrp_intr_pending)
+		vcpu->vc_intr = 1;
+	else
+		vcpu->vc_intr = 0;
+
 	if (vrp->vrp_continue) {
 		switch (vcpu->vc_gueststate.vg_exit_reason) {
 		case VMX_EXIT_IO:
@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *
 
 	irq = vrp->vrp_irq;
 
+	if (vrp->vrp_intr_pending)
+		vcpu->vc_intr = 1;
+	else
+		vcpu->vc_intr = 0;
+
 	/*
 	 * If we are returning from userspace (vmd) because we exited
 	 * last time, fix up any needed vcpu state first. Which state
blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
blob + 88545b54b35dd60280ba87403e343db9463d7419
--- sys/arch/amd64/include/vmmvar.h
+++ sys/arch/amd64/include/vmmvar.h
@@ -456,6 +456,7 @@ struct vm_run_params {
 	uint32_t	vrp_vcpu_id;
 	uint8_t		vrp_continue;		/* Continuing from an exit */
 	uint16_t	vrp_irq;		/* IRQ to inject */
+	uint8_t		vrp_intr_pending;	/* Additional intrs pending? */
 
 	/* Input/output parameter to VMM_IOC_RUN */
 	struct vm_exit	*vrp_exit;		/* updated exit data */
blob - 5f598bcc14af5115372d34a4176254d377aad91c
blob + 447fc219adadf945de2bf25d5335993c2abdc26f
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
 		} else
 			vrp->vrp_irq = 0x;
 
-		/* Still more pending? */
-		if (i8259_is_pending()) {
-			/*
-			 * XXX can probably avoid ioctls here by providing intr
-			 * in vrp
-			 */
-			if (vcpu_pic_intr(vrp->vrp_vm_id,
-			    vrp->vrp_vcpu_id, 1)) {
-				fatal("can't set INTR");
-			}
-		} else {
-			if (vcpu_pic_intr(vrp->vrp_vm_id,
-			    vrp->vrp_vcpu_id, 0)) {
-