> Date: Fri, 8 Jan 2021 10:18:27 -0600
> From: Scott Cheloha <scottchel...@gmail.com>
>
> On Thu, Jan 07, 2021 at 08:12:10PM -0600, Scott Cheloha wrote:
> > On Thu, Jan 07, 2021 at 09:37:58PM +0100, Mark Kettenis wrote:
> > > > Date: Thu, 7 Jan 2021 11:15:41 -0600
> > > > From: Scott Cheloha <scottchel...@gmail.com>
> > > >
> > > > Hi,
> > > >
> > > > I want to isolate statclock() from hardclock(9).  This will simplify
> > > > the logic in my WIP dynamic clock interrupt framework.
> > > >
> > > > Currently, if stathz is zero, we call statclock() from within
> > > > hardclock(9).  It looks like this (see sys/kern/kern_clock.c):
> > > >
> > > > void
> > > > hardclock(struct clockframe *frame)
> > > > {
> > > > 	/* [...] */
> > > >
> > > > 	if (stathz == 0)
> > > > 		statclock(frame);
> > > >
> > > > 	/* [...] */
> > > >
> > > > This is the case on alpha, amd64 (w/ lapic), hppa, i386 (w/ lapic),
> > > > loongson, luna88k, mips64, and sh.
> > > >
> > > > (We seem to do it on sgi, too.  I was under the impression that sgi
> > > > *was* a mips64 platform, yet sgi seems to have its own clock
> > > > interrupt code distinct from mips64's general clock interrupt code
> > > > which is used by e.g. octeon).
> > > >
> > > > However, if stathz is not zero we call statclock() separately.  This
> > > > is the case on armv7, arm, arm64, macppc, powerpc64, and sparc64.
> > > >
> > > > (The situation for the general powerpc code and socppc in particular
> > > > is a mystery to me.)
> > > >
> > > > If we could remove this MD distinction it would make my MI framework
> > > > simpler.  Instead of checking stathz and conditionally starting a
> > > > statclock event I will be able to unconditionally start a statclock
> > > > event on all platforms on every CPU.
> > > >
> > > > In general I don't think the "is stathz zero?" variance between
> > > > platforms is useful:
> > > >
> > > > - The difference is invisible to userspace, as we hide the fact that
> > > >   stathz is zero from e.g. the kern.clockrate sysctl.
> > > >
> > > > - We run statclock() at some point, regardless of whether stathz is
> > > >   zero.  If statclock() is run from hardclock(9) then isn't stathz
> > > >   effectively equal to hz?
> > > >
> > > > - Because stathz might be zero we need to add a bunch of safety checks
> > > >   to our MI code to ensure we don't accidentally divide by zero.
> > > >
> > > > Maybe we can ensure stathz is non-zero in a later diff...
> > > >
> > > > --
> > > >
> > > > Anyway, I don't think I have missed any platforms.  However, if
> > > > platform experts could weigh in here to verify my changes (and test
> > > > them!) I'd really appreciate it.
> > > >
> > > > In particular, I'm confused about how clock interrupts work on
> > > > powerpc, socppc, and sgi.
> > > >
> > > > --
> > > >
> > > > Thoughts?  Platform-specific OKs?
> > >
> > > I wouldn't be opposed to doing this.  It is less magic!
> > >
> > > But yes, this needs to be tested on the platforms that you change.
> >
> > I guess I'll CC all the platform-specific people I'm aware of.
> >
> > > Note that many platforms don't have separate schedclock and
> > > statclock.  But on many platforms where we use a one-shot timer as the
> > > clock we have a randomized statclock.  I'm sure Theo would love to
> > > tell you about the cpuhog story...
> >
> > I am familiar with cpuhog.  It's the one thing everybody mentions when
> > I talk about clock interrupts and/or statclock().
> >
> > Related:
> >
> > I wonder if we could avoid the cpuhog problem entirely by implementing
> > some kind of MI cycle counting clock API that we use to timestamp
> > whenever we cross the syscall boundary, or enter an interrupt, etc.,
> > to determine the time a thread spends using the CPU without any
> > sampling error.
> >
> > Instead of a process accumulating ticks from a sampling clock
> > interrupt you would accumulate, say, a 64-bit count of cycles, or
> > something like that.
> >
> > Sampling with a regular clock interrupt is prone to error and trickery
> > like cpuhog.  The classic BSD solution to the cpuhog exploit was to
> > randomize the statclock/schedclock to make it harder to fool the
> > sampler.  But if we used cycle counts or instruction counts at each
> > state transition it would be impossible to fool because we wouldn't be
> > sampling at all.
> >
> > Unsure what the performance implications would be, but in general I
> > would guess that most platforms have a way to count instructions or
> > cycles and that reading this data is fast enough for us to use it in
> > e.g. syscall() or the interrupt handler without a huge performance
> > hit.
> >
> > > Anyway, we probably want that on amd64 as well.
> >
> > My WIP dynamic clock interrupt system can run a randomized statclock()
> > on amd64 boxes with a lapic.  I imagine we will be able to do the same
> > on i386 systems that have a lapic, too, though it will be slower
> > because all the i386 timecounters are glacial compared to the TSC.
> >
> > Eventually I want to isolate schedclock() from statclock() and run it
> > as an independent event.  But that's a "later on" goal.  For now I'm
> > just trying to get every platform as similar as possible to make
> > merging the dynamic clock interrupt work less painful.
>
> Whoops, some garbage snuck into amd64/lapic.c.
>
> Here's the patch without it.
>
> Also, sorry if I've CC'd you and you're not the right person for one
> of these platforms/architectures.  My thinking is:
>
> miod: loongson (?), sh
> aoyama: luna88k
> visa: mips64, sgi
> deraadt: alpha
> kettenis: hppa
> sthen: i386
hppa is happy with this

> Index: sys/kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_clock.c
> --- sys/kern/kern_clock.c	21 Jan 2020 16:16:23 -0000	1.101
> +++ sys/kern/kern_clock.c	8 Jan 2021 15:56:24 -0000
> @@ -164,12 +164,6 @@ hardclock(struct clockframe *frame)
>  		}
>  	}
>  
> -	/*
> -	 * If no separate statistics clock is available, run it from here.
> -	 */
> -	if (stathz == 0)
> -		statclock(frame);
> -
>  	if (--ci->ci_schedstate.spc_rrticks <= 0)
>  		roundrobin(ci);
>  
> Index: sys/arch/alpha/alpha/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/alpha/alpha/clock.c,v
> retrieving revision 1.24
> diff -u -p -r1.24 clock.c
> --- sys/arch/alpha/alpha/clock.c	6 Jul 2020 13:33:06 -0000	1.24
> +++ sys/arch/alpha/alpha/clock.c	8 Jan 2021 15:56:24 -0000
> @@ -136,6 +136,13 @@ clockattach(dev, fns)
>   * Machine-dependent clock routines.
>   */
>  
> +void
> +clockintr(struct clockframe *frame)
> +{
> +	hardclock(frame);
> +	statclock(frame);
> +}
> +
>  /*
>   * Start the real-time and statistics clocks. Leave stathz 0 since there
>   * are no other timers available.
> @@ -165,7 +172,7 @@ cpu_initclocks(void)
>  	 * hardclock, which would then fall over because the pointer
>  	 * to the virtual timers wasn't set at that time.
>  	 */
> -	platform.clockintr = hardclock;
> +	platform.clockintr = clockintr;
>  	schedhz = 16;
>  
>  	evcount_attach(&clk_count, "clock", &clk_irq);
> Index: sys/arch/amd64/amd64/lapic.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> retrieving revision 1.57
> diff -u -p -r1.57 lapic.c
> --- sys/arch/amd64/amd64/lapic.c	6 Sep 2020 20:50:00 -0000	1.57
> +++ sys/arch/amd64/amd64/lapic.c	8 Jan 2021 15:56:25 -0000
> @@ -452,6 +452,7 @@ lapic_clockintr(void *arg, struct intrfr
>  	floor = ci->ci_handled_intr_level;
>  	ci->ci_handled_intr_level = ci->ci_ilevel;
>  	hardclock((struct clockframe *)&frame);
> +	statclock((struct clockframe *)&frame);
>  	ci->ci_handled_intr_level = floor;
>  
>  	clk_count.ec_count++;
> Index: sys/arch/hppa/dev/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/hppa/dev/clock.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 clock.c
> --- sys/arch/hppa/dev/clock.c	6 Jul 2020 13:33:07 -0000	1.31
> +++ sys/arch/hppa/dev/clock.c	8 Jan 2021 15:56:25 -0000
> @@ -43,7 +43,7 @@
>  
>  u_long	cpu_hzticks;
>  
> -int	cpu_hardclock(void *);
> +int	cpu_clockintr(void *);
>  u_int	itmr_get_timecount(struct timecounter *);
>  
>  struct timecounter itmr_timecounter = {
> @@ -106,7 +106,7 @@ cpu_initclocks(void)
>  }
>  
>  int
> -cpu_hardclock(void *v)
> +cpu_clockintr(void *v)
>  {
>  	struct cpu_info *ci = curcpu();
>  	u_long __itmr, delta, eta;
> @@ -114,14 +114,15 @@ cpu_hardclock(void *v)
>  	register_t eiem;
>  
>  	/*
> -	 * Invoke hardclock as many times as there has been cpu_hzticks
> -	 * ticks since the last interrupt.
> +	 * Invoke hardclock and statclock as many times as there has been
> +	 * cpu_hzticks ticks since the last interrupt.
>  	 */
>  	for (;;) {
>  		mfctl(CR_ITMR, __itmr);
>  		delta = __itmr - ci->ci_itmr;
>  		if (delta >= cpu_hzticks) {
>  			hardclock(v);
> +			statclock(v);
>  			ci->ci_itmr += cpu_hzticks;
>  		} else
>  			break;
> Index: sys/arch/hppa/dev/cpu.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/hppa/dev/cpu.c,v
> retrieving revision 1.42
> diff -u -p -r1.42 cpu.c
> --- sys/arch/hppa/dev/cpu.c	29 May 2020 04:42:23 -0000	1.42
> +++ sys/arch/hppa/dev/cpu.c	8 Jan 2021 15:56:25 -0000
> @@ -89,7 +89,7 @@ cpuattach(struct device *parent, struct
>  	extern u_int cpu_ticksnum, cpu_ticksdenom;
>  	extern u_int fpu_enable;
>  	/* clock.c */
> -	extern int cpu_hardclock(void *);
> +	extern int cpu_clockintr(void *);
>  	/* ipi.c */
>  	extern int hppa_ipi_intr(void *);
>  
> @@ -173,7 +173,7 @@ cpuattach(struct device *parent, struct
>  		printf(", %u/%u D/I BTLBs",
>  		    pdc_btlb.finfo.num_i, pdc_btlb.finfo.num_d);
>  
> -	cpu_intr_establish(IPL_CLOCK, 31, cpu_hardclock, NULL, "clock");
> +	cpu_intr_establish(IPL_CLOCK, 31, cpu_clockintr, NULL, "clock");
>  #ifdef MULTIPROCESSOR
>  	cpu_intr_establish(IPL_IPI, 30, hppa_ipi_intr, NULL, "ipi");
>  #endif
> Index: sys/arch/i386/i386/lapic.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
> retrieving revision 1.47
> diff -u -p -r1.47 lapic.c
> --- sys/arch/i386/i386/lapic.c	30 Jul 2018 14:19:12 -0000	1.47
> +++ sys/arch/i386/i386/lapic.c	8 Jan 2021 15:56:25 -0000
> @@ -257,6 +257,7 @@ lapic_clockintr(void *arg)
>  	struct clockframe *frame = arg;
>  
>  	hardclock(frame);
> +	statclock(frame);
>  
>  	clk_count.ec_count++;
>  }
> Index: sys/arch/loongson/dev/glxclk.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/loongson/dev/glxclk.c,v
> retrieving revision 1.5
> diff -u -p -r1.5 glxclk.c
> --- sys/arch/loongson/dev/glxclk.c	19 Jul 2015 21:11:47 -0000	1.5
> +++ sys/arch/loongson/dev/glxclk.c	8 Jan 2021 15:56:25 -0000
> @@ -286,6 +286,7 @@ glxclk_intr(void *arg)
>  		return 1;
>  
>  	hardclock(frame);
> +	statclock(frame);
>  
>  	return 1;
>  }
> Index: sys/arch/luna88k/luna88k/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/luna88k/luna88k/clock.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 clock.c
> --- sys/arch/luna88k/luna88k/clock.c	6 Jul 2020 13:33:07 -0000	1.15
> +++ sys/arch/luna88k/luna88k/clock.c	8 Jan 2021 15:56:25 -0000
> @@ -165,8 +165,10 @@ clockintr(void *eframe)
>  	clockevc->ec_count++;
>  
>  	*(volatile uint32_t *)(ci->ci_clock_ack) = ~0;
> -	if (clockinitted)
> +	if (clockinitted) {
>  		hardclock(eframe);
> +		statclock(eframe);
> +	}
>  	return 1;
>  }
>  
> Index: sys/arch/mips64/mips64/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
> retrieving revision 1.42
> diff -u -p -r1.42 clock.c
> --- sys/arch/mips64/mips64/clock.c	30 Jun 2020 14:56:10 -0000	1.42
> +++ sys/arch/mips64/mips64/clock.c	8 Jan 2021 15:56:25 -0000
> @@ -151,6 +151,7 @@ cp0_int5(uint32_t mask, struct trapframe
>  	while (ci->ci_pendingticks) {
>  		cp0_clock_count.ec_count++;
>  		hardclock(tf);
> +		statclock(tf);
>  		ci->ci_pendingticks--;
>  	}
>  #ifdef MULTIPROCESSOR
> Index: sys/arch/sgi/localbus/int.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/sgi/localbus/int.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 int.c
> --- sys/arch/sgi/localbus/int.c	24 Feb 2018 11:42:31 -0000	1.15
> +++ sys/arch/sgi/localbus/int.c	8 Jan 2021 15:56:25 -0000
> @@ -524,6 +524,7 @@ int_8254_intr0(uint32_t hwpend, struct t
>  	while (ci->ci_pendingticks) {
>  		int_clock_count.ec_count++;
>  		hardclock(tf);
> +		statclock(tf);
>  		ci->ci_pendingticks--;
>  	}
>  }
> Index: sys/arch/sh/sh/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 clock.c
> --- sys/arch/sh/sh/clock.c	20 Oct 2020 15:59:17 -0000	1.11
> +++ sys/arch/sh/sh/clock.c	8 Jan 2021 15:56:25 -0000
> @@ -333,6 +333,7 @@ sh3_clock_intr(void *arg) /* trap frame
>  	_reg_bclr_2(SH3_TCR0, TCR_UNF);
>  
>  	hardclock(arg);
> +	statclock(arg);
>  
>  	return (1);
>  }
> @@ -354,6 +355,7 @@ sh4_clock_intr(void *arg) /* trap frame
>  	_reg_bclr_2(SH4_TCR0, TCR_UNF);
>  
>  	hardclock(arg);
> +	statclock(arg);
>  
>  	return (1);
>  }