Re: anon & pmap_page_protect

2023-09-01 Thread Mark Kettenis
> Date: Fri, 1 Sep 2023 23:02:59 +0200
> From: Martin Pieuchot 
> 
> On 12/08/23(Sat) 10:43, Martin Pieuchot wrote:
> > Since UVM has been imported, we zap mappings associated with anon
> > pages before deactivating or freeing them.  Sadly, with the
> > introduction of locking for amaps & anons, I added new code paths
> > that do not respect this behavior.
> > 
> > The diff below restores it by moving the call to pmap_page_protect()
> > inside uvm_anon_release().  With it, the three code paths using the
> > function are now coherent with the rest of UVM.
> > 
> > I remember a discussion we had questioning the need for zapping such
> > mappings.  I'm interested in hearing more arguments for or against this
> > change. However, right now, I'm more concerned about coherency, so I'd
> > like to commit the change below before we try something different.
> 
> Unless there's an objection, I'll commit this tomorrow in order to
> unblock my upcoming UVM changes.

I was planning to take a closer look at this, but the change should be
fine in terms of correctness.

> > Index: uvm/uvm_anon.c
> > ===================================================================
> > RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
> > retrieving revision 1.55
> > diff -u -p -u -7 -r1.55 uvm_anon.c
> > --- uvm/uvm_anon.c  11 Apr 2023 00:45:09 -  1.55
> > +++ uvm/uvm_anon.c  15 May 2023 13:55:28 -
> > @@ -251,14 +251,15 @@ uvm_anon_release(struct vm_anon *anon)
> > KASSERT((pg->pg_flags & PG_RELEASED) != 0);
> > KASSERT((pg->pg_flags & PG_BUSY) != 0);
> > KASSERT(pg->uobject == NULL);
> > KASSERT(pg->uanon == anon);
> > KASSERT(anon->an_ref == 0);
> >  
> > uvm_lock_pageq();
> > +   pmap_page_protect(pg, PROT_NONE);
> > uvm_pagefree(pg);
> > uvm_unlock_pageq();
> > KASSERT(anon->an_page == NULL);
> > lock = anon->an_lock;
> > uvm_anfree(anon);
> > rw_exit(lock);
> > /* Note: extra reference is held for PG_RELEASED case. */
> > Index: uvm/uvm_fault.c
> > ===================================================================
> > RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> > retrieving revision 1.133
> > diff -u -p -u -7 -r1.133 uvm_fault.c
> > --- uvm/uvm_fault.c 4 Nov 2022 09:36:44 -   1.133
> > +++ uvm/uvm_fault.c 15 May 2023 13:55:28 -
> > @@ -392,15 +392,14 @@ uvmfault_anonget(struct uvm_faultinfo *u
> >  
> > /*
> >  * if we were RELEASED during I/O, then our anon is
> >  * no longer part of an amap.   we need to free the
> >  * anon and try again.
> >  */
> > if (pg->pg_flags & PG_RELEASED) {
> > -   pmap_page_protect(pg, PROT_NONE);
> > KASSERT(anon->an_ref == 0);
> > /*
> >  * Released while we had unlocked amap.
> >  */
> > if (locked)
> > uvmfault_unlockall(ufi, NULL, NULL);
> > uvm_anon_release(anon); /* frees page for us */
> > 
> 
> 



vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-01 Thread Dave Voutila
Now that my i8259 fix is in, it's safe to expand the testing pool for
this diff. (Without that fix, users would definitely hit the hung block
device issue testing this one.) Hoping that folks who run non-OpenBSD
guests or strange configurations can give it a spin.

This change removes an ioctl(2) call from the vcpu thread hot path in
vmd. Instead of making that syscall to toggle a pending interrupt flag
on the vcpu object in vmm(4), it adds a flag to the vm_run_params
struct sent with the VMM_IOC_RUN ioctl. The in-kernel vcpu run loop can
now toggle the pending interrupt state prior to vm entry.
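
Concretely, the vcpu thread's per-exit dance drops from two syscalls to
one. Here's a minimal sketch of the userspace side, hand-written for
illustration (i8259_is_pending() and vcpu_pic_intr() are vmd's own
helpers visible in the diff below; run_vcpu_once() and its error
handling are invented here):

  #include <sys/types.h>
  #include <sys/ioctl.h>
  #include <machine/vmmvar.h>   /* struct vm_run_params, VMM_IOC_RUN */
  #include <errno.h>

  int i8259_is_pending(void);   /* vmd's PIC helper; see vm.c diff */

  static int
  run_vcpu_once(int vmm_fd, struct vm_run_params *vrp)
  {
          /*
           * Before: a separate VMM_IOC_INTR round-trip on every exit,
           * via vmd's vcpu_pic_intr() wrapper:
           *
           *      if (vcpu_pic_intr(vrp->vrp_vm_id, vrp->vrp_vcpu_id,
           *          i8259_is_pending() ? 1 : 0))
           *              fatal("can't set INTR");
           */

          /* After: carry the pending bit inside the run params. */
          vrp->vrp_intr_pending = i8259_is_pending() ? 1 : 0;

          if (ioctl(vmm_fd, VMM_IOC_RUN, vrp) == -1)
                  return (errno);
          return (0);
  }

The kernel half of the trade shows up in the vmm_machdep.c hunks below:
the run loop copies vrp_intr_pending into vcpu->vc_intr before entering
the guest.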

mbuhl@ and phessler@ have run this diff on their machines and so far
report reduced average network latency for guests.

My rough measurements using the following btrace script show a
promising reduction in ioctl syscalls:

  /* VMM_IOC_INTR: 0x800c5606 -> 2148292102 */
  syscall:ioctl:entry
  /arg1 == 2148292102/
  {
@total[tid] = count();
@running[tid] = count();
  }
  interval:hz:1
  {
print(@running);
clear(@running);
  }
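
(To reproduce: run the script file with btrace(8) while the guest
boots; @running is printed and cleared once per second, and the
cumulative @total counts quoted below are from the script's output
when it exits.)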

Measuring from boot of an OpenBSD guest to after the guest finishes
relinking (based on my manual observation of the libevent thread
settling down in syscall rate), I see a huge reduction in VMM_IOC_INTR
ioctls for a single guest:

## -current
@total[433237]: 1325100  # vcpu thread (!!)
@total[187073]: 80239# libevent thread

## with diff
@total[550347]: 42   # vcpu thread (!!)
@total[256550]: 86946# libevent thread

Most of the VMM_IOC_INTR ioctls on the vcpu threads come from SeaBIOS
and the bootloader prodding some of the emulated hardware, but even
after the bootloader you'll see ~10-20k/s of ioctls on -current
vs. ~4-5k/s with the diff.

At steady state, the vcpu thread no longer makes the VMM_IOC_INTR calls
at all and you should see the libevent thread calling it at a rate of
~100/s (probably hardclock?). *Without* the diff, I see a steady 650/s
rate on the vcpu thread at idle. *With* the diff, it's 0/s at idle. :)

To test:
- rebuild & install new kernel
- copy/symlink vmmvar.h into /usr/include/machine/
- rebuild & re-install vmd & vmctl
- reboot

-dv


diffstat refs/heads/master refs/heads/vmm-vrp_intr_pending
 M  sys/arch/amd64/amd64/vmm_machdep.c  |  10+   0-
 M  sys/arch/amd64/include/vmmvar.h |   1+   0-
 M  usr.sbin/vmd/vm.c   |   2+  16-

3 files changed, 13 insertions(+), 16 deletions(-)

diff refs/heads/master refs/heads/vmm-vrp_intr_pending
commit - 8afcf90fb39e4a84606e93137c2b6c20f44312cb
commit + 10eeb8a0414ec927b6282473c50043a7027d6b41
blob - 24a376a8f3bc94bc4a4203fe66c5994594adff46
blob + e3b6d10a0ae78b12ec2f3296f708b42540ce798e
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3973,6 +3973,11 @@ vcpu_run_vmx(struct vcpu *vcpu, struct vm_run_params *
 */
irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
if (vrp->vrp_continue) {
switch (vcpu->vc_gueststate.vg_exit_reason) {
case VMX_EXIT_IO:
@@ -6381,6 +6386,11 @@ vcpu_run_svm(struct vcpu *vcpu, struct vm_run_params *

irq = vrp->vrp_irq;

+   if (vrp->vrp_intr_pending)
+   vcpu->vc_intr = 1;
+   else
+   vcpu->vc_intr = 0;
+
/*
 * If we are returning from userspace (vmd) because we exited
 * last time, fix up any needed vcpu state first. Which state
blob - e9f8384cccfde33034d7ac9782610f93eb5dc640
blob + 88545b54b35dd60280ba87403e343db9463d7419
--- sys/arch/amd64/include/vmmvar.h
+++ sys/arch/amd64/include/vmmvar.h
@@ -456,6 +456,7 @@ struct vm_run_params {
	uint32_t	vrp_vcpu_id;
	uint8_t		vrp_continue;	/* Continuing from an exit */
	uint16_t	vrp_irq;	/* IRQ to inject */
+	uint8_t		vrp_intr_pending;	/* Additional intrs pending? */

/* Input/output parameter to VMM_IOC_RUN */
struct vm_exit  *vrp_exit;  /* updated exit data */
blob - 5f598bcc14af5115372d34a4176254d377aad91c
blob + 447fc219adadf945de2bf25d5335993c2abdc26f
--- usr.sbin/vmd/vm.c
+++ usr.sbin/vmd/vm.c
@@ -1610,22 +1610,8 @@ vcpu_run_loop(void *arg)
	} else
		vrp->vrp_irq = 0xFFFF;

-   /* Still more pending? */
-   if (i8259_is_pending()) {
-   /*
-* XXX can probably avoid ioctls here by providing intr
-* in vrp
-*/
-   if (vcpu_pic_intr(vrp->vrp_vm_id,
-   vrp->vrp_vcpu_id, 1)) {
-   fatal("can't set INTR");
-   }
-   } else {
-   if (vcpu_pic_intr(vrp->vrp_vm_id,
-   vrp->vrp_vcpu_id, 0)) {
-
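
The archived hunk is cut off above. Going by the diffstat (2+/16- for
vm.c), the rest of the removed else-branch presumably mirrors the
if-branch with a fatal("can't clear INTR") and closing braces, while
the two added lines collapse the whole block into setting the new
flag, roughly (a reconstruction for readability, not the verbatim
commit):

  /* Still more pending? Let vmm(4) toggle vc_intr from vrp instead. */
  vrp->vrp_intr_pending = i8259_is_pending();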