On Fri, Jun 16, 2017 at 10:25:29AM +0200, Mike Belopuhov wrote:
> On Fri, Jun 16, 2017 at 16:31 +1000, Jonathan Matthew wrote:
> > Recently I updated the kernel lock profiling stuff I've been working
> > on, since it had been rotting a bit since witness was introduced.
> > Running my diff on a KVM VM, I found there was a pretty huge
> > performance impact (10 minutes to build a kernel instead of 4), which
> > turned out to be because reading the emulated HPET in KVM is slow,
> > and lock profiling involves a lot of extra clock reads.  The diff
> > below adds a new TSC-based timecounter implementation for KVM and
> > Xen to remedy this.
> >
> > KVM and Xen provide frequently-updated views of system time from the
> > host to each vcpu in a way that lets the VM get accurate high
> > resolution time without much work.  Linux calls this mechanism
> > 'pvclock' so I'm doing the same.
> >
> > The pvclock structure gives you a system time (in nanoseconds), the
> > TSC reading from when the time was updated, and scaling factors for
> > converting TSC values to nanoseconds.  Usually you subtract the TSC
> > reading in the pvclock structure from a current reading, convert
> > that to nanoseconds, and add it to the system time.  I decided to go
> > the other way in order to keep all the available resolution.
> >
> > Using pvclock as the timecounter reduces the overhead of lock
> > profiling to almost nothing.  Even without the extra clock reads for
> > lock profiling, it cuts a few seconds off kernel compile time on a
> > 2 vcpu vm.  I've run it for ~12 hours without ntpd and the clock
> > keeps time accurately.
> >
> > One wrinkle here is that the KVM pvclock mechanism requires setup on
> > each vcpu, so I added a new pvbus function that gets called from
> > cpu_hatch, allowing any hypervisor-specific setup to happen there.
> >
> > I still need to try this on xen, but comments at this stage are
> > welcome.
> >
>
> Cool!
> You've beaten both of us to it :)
>
> Last time I tried uebayashi's pvclock on Xen, it didn't work for me.
> I didn't have time to investigate why, but probably because we need
> per-cpu readings, which you do for KVM.  I'll test this on Xen as
> soon as I get to the office.
>
> Now, regarding the diff: pvbus_init_vcpu.  Ah yes, please.  It was a
> chicken-and-egg problem for me: I didn't have Xen, but wanted a
> callback from cpu_hatch to set up shared info pages and events
> (interrupt delivery) for all CPUs.  So please factor it out and let's
> get that committed.
Updated version of this is below.  The init_cpu function pointer is now
in the pvbus_hv so it's easier to decide what it does at runtime.

> I don't know if it's a good idea to depend on Xen's definition of
> vcpu_time_info.  I think I had factored it out into pvclock_time_info
> and put it into pvclockvar.h or something like that, and then made
> Xen use those definitions instead of its own.  Dunno what's the best
> course of action here.
>
> But this brings up another point: where and how to perform the
> pvclock initialization and attachment.  In your diff pvclock_xen_init
> comes a bit too early: none of the Xen things are initialized at that
> point, and the shared info page isn't allocated.

I'm dropping the xen bits for now, since they need more work in a few
different ways.

> I told Stefan in Munich that perhaps we could have a kvm.c shim that
> would prepare and attach pvclock (and maybe provide some flags and
> other bells and whistles).
>
> I think we need to call pvclock attachment from Xen code where it's
> appropriate, not from pvbus code.  Or do a config_attach on it.  Why
> didn't you want to put it in its own device driver?

I was hoping it'd remain as simple as the first diff, but since things
aren't going that way I'm happy to reconsider.

> It's nice that this version avoids using assembly.  Any idea what the
> reason was for the Linux/FreeBSD code to use it?  Were they afraid of
> losing precision, maybe?

I think so.  We probably want to try harder to keep the high bits of
the system time, since the low 32 bits wrap every 4s or so.

oks on this bit?
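For reference, the conversion being discussed works out to roughly the
following stand-alone sketch.  The struct and field names here follow the
Xen vcpu_time_info layout rather than anything in the diff below, and the
version-retry loop a real guest needs is only noted in a comment; a
128-bit intermediate is used so the multiply doesn't throw away the high
bits mentioned above.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone model of a pvclock time info page.  The host
 * bumps "version" around each update (it is odd while an update is in
 * progress), so a real guest rereads the fields until it observes the
 * same even version before and after.
 */
struct pvclock_time_info {
	uint32_t version;
	uint64_t tsc_timestamp;		/* TSC value when system_time was set */
	uint64_t system_time;		/* ns since boot at tsc_timestamp */
	uint32_t tsc_to_system_mul;	/* 32.32 fixed-point ns per TSC tick */
	int8_t   tsc_shift;		/* binary pre-scale applied to delta */
};

static uint64_t
pvclock_delta_to_ns(const struct pvclock_time_info *ti, uint64_t delta)
{
	/* Apply the binary pre-scale first... */
	if (ti->tsc_shift >= 0)
		delta <<= ti->tsc_shift;
	else
		delta >>= -ti->tsc_shift;
	/*
	 * ...then the 64x32 -> 96 bit fixed-point multiply.  The 128-bit
	 * intermediate keeps the bits a plain 64-bit multiply would lose;
	 * the low 32 bits of the result wrap every 2^32 ns, i.e. about
	 * every 4.3 seconds.
	 */
	return (uint64_t)(((unsigned __int128)delta *
	    ti->tsc_to_system_mul) >> 32);
}

static uint64_t
pvclock_read_ns(const struct pvclock_time_info *ti, uint64_t tsc)
{
	/* A real guest wraps this in the version-check retry loop. */
	return ti->system_time +
	    pvclock_delta_to_ns(ti, tsc - ti->tsc_timestamp);
}
```

With tsc_to_system_mul = 0x80000000 (0.5 ns per tick in 32.32 fixed
point) and tsc_shift = 0, a delta of 2000 ticks converts to 1000 ns,
which is then added to system_time.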
Index: arch/amd64/amd64/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.105
diff -u -p -r1.105 cpu.c
--- arch/amd64/amd64/cpu.c	30 May 2017 15:11:32 -0000	1.105
+++ arch/amd64/amd64/cpu.c	18 Jun 2017 09:16:12 -0000
@@ -67,6 +67,7 @@
 #include "lapic.h"
 #include "ioapic.h"
 #include "vmm.h"
+#include "pvbus.h"
 
 #include <sys/param.h>
 #include <sys/timeout.h>
@@ -103,6 +104,10 @@
 #include <machine/i82093var.h>
 #endif
 
+#if NPVBUS > 0
+#include <dev/pv/pvvar.h>
+#endif
+
 #include <dev/ic/mc146818reg.h>
 #include <amd64/isa/nvram.h>
 #include <dev/isa/isareg.h>
@@ -728,6 +733,9 @@ cpu_hatch(void *v)
 	lldt(0);
 	cpu_init(ci);
 
+#if NPVBUS > 0
+	pvbus_init_cpu();
+#endif
 
 	/* Re-initialise memory range handling on AP */
 	if (mem_range_softc.mr_op != NULL)
Index: arch/i386/i386/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/cpu.c,v
retrieving revision 1.84
diff -u -p -r1.84 cpu.c
--- arch/i386/i386/cpu.c	30 May 2017 15:11:32 -0000	1.84
+++ arch/i386/i386/cpu.c	18 Jun 2017 09:16:13 -0000
@@ -67,6 +67,7 @@
 #include "lapic.h"
 #include "ioapic.h"
 #include "vmm.h"
+#include "pvbus.h"
 
 #include <sys/param.h>
 #include <sys/timeout.h>
@@ -104,6 +105,10 @@
 #include <machine/i82093var.h>
 #endif
 
+#if NPVBUS > 0
+#include <dev/pv/pvvar.h>
+#endif
+
 #include <dev/ic/mc146818reg.h>
 #include <i386/isa/nvram.h>
 #include <dev/isa/isareg.h>
@@ -626,6 +631,9 @@ cpu_hatch(void *v)
 	ci->ci_curpmap = pmap_kernel();
 	cpu_init(ci);
 
+#if NPVBUS > 0
+	pvbus_init_cpu();
+#endif
 
 	/* Re-initialise memory range handling on AP */
 	if (mem_range_softc.mr_op != NULL)
Index: dev/pv/pvbus.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/pvbus.c,v
retrieving revision 1.16
diff -u -p -r1.16 pvbus.c
--- dev/pv/pvbus.c	10 Jan 2017 17:16:39 -0000	1.16
+++ dev/pv/pvbus.c	18 Jun 2017 09:16:17 -0000
@@ -210,6 +210,19 @@ pvbus_identify(void)
 	has_hv_cpuid = 1;
 }
 
+void
+pvbus_init_cpu(void)
+{
+	int i;
+
+	for (i = 0; i < PVBUS_MAX; i++) {
+		if (pvbus_hv[i].hv_base == 0)
+			continue;
+		if (pvbus_hv[i].hv_init_cpu != NULL)
+			(pvbus_hv[i].hv_init_cpu)(&pvbus_hv[i]);
+	}
+}
+
 int
 pvbus_activate(struct device *self, int act)
 {
Index: dev/pv/pvvar.h
===================================================================
RCS file: /cvs/src/sys/dev/pv/pvvar.h,v
retrieving revision 1.9
diff -u -p -r1.9 pvvar.h
--- dev/pv/pvvar.h	10 Jan 2017 17:16:39 -0000	1.9
+++ dev/pv/pvvar.h	18 Jun 2017 09:16:17 -0000
@@ -56,6 +56,7 @@ struct pvbus_hv {
 	void		*hv_arg;
 	int		 (*hv_kvop)(void *, int, char *, char *, size_t);
+	void		 (*hv_init_cpu)(struct pvbus_hv *);
 };
 
 struct pvbus_softc {
@@ -77,6 +78,7 @@ struct pv_attach_args {
 
 void	 pvbus_identify(void);
 int	 pvbus_probe(void);
+void	 pvbus_init_cpu(void);
 void	 pvbus_reboot(struct device *);
 void	 pvbus_shutdown(struct device *);
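To show how a backend would hang off the new hook, here's a stand-alone
mock of the dispatch: the trimmed-down struct, the PVBUS_MAX value, and
kvm_init_cpu are hypothetical stand-ins (a real KVM backend would do its
per-vcpu pvclock MSR setup in the callback), but the loop is the same
shape as the pvbus_init_cpu() in the diff.

```c
#include <stddef.h>

#define PVBUS_MAX	2	/* mock value; the real one lives elsewhere */

/* Trimmed-down mock of struct pvbus_hv with just the new member. */
struct pvbus_hv {
	unsigned int	  hv_base;	/* nonzero once the hv is detected */
	void		(*hv_init_cpu)(struct pvbus_hv *);
};

static struct pvbus_hv pvbus_hv[PVBUS_MAX];
static int kvm_inits;

/* Hypothetical KVM hook: per-vcpu pvclock setup would happen here. */
static void
kvm_init_cpu(struct pvbus_hv *hv)
{
	kvm_inits++;
}

/* Same shape as the pvbus_init_cpu() added by the diff above. */
static void
pvbus_init_cpu(void)
{
	int i;

	for (i = 0; i < PVBUS_MAX; i++) {
		if (pvbus_hv[i].hv_base == 0)
			continue;
		if (pvbus_hv[i].hv_init_cpu != NULL)
			(pvbus_hv[i].hv_init_cpu)(&pvbus_hv[i]);
	}
}
```

Since cpu_hatch runs once on every AP, the callback fires once per cpu
for each detected hypervisor that registered one, and slots with
hv_base == 0 or no callback are skipped.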