> Date: Fri, 8 Jan 2021 10:18:27 -0600
> From: Scott Cheloha <scottchel...@gmail.com>
> 
> On Thu, Jan 07, 2021 at 08:12:10PM -0600, Scott Cheloha wrote:
> > On Thu, Jan 07, 2021 at 09:37:58PM +0100, Mark Kettenis wrote:
> > > > Date: Thu, 7 Jan 2021 11:15:41 -0600
> > > > From: Scott Cheloha <scottchel...@gmail.com>
> > > > 
> > > > Hi,
> > > > 
> > > > I want to isolate statclock() from hardclock(9).  This will simplify
> > > > the logic in my WIP dynamic clock interrupt framework.
> > > > 
> > > > Currently, if stathz is zero, we call statclock() from within
> > > > hardclock(9).  It looks like this (see sys/kern/kern_clock.c):
> > > > 
> > > > void
> > > > hardclock(struct clockframe *frame)
> > > > {
> > > >         /* [...] */
> > > > 
> > > >         if (stathz == 0)
> > > >                 statclock(frame);
> > > > 
> > > >         /* [...] */
> > > > 
> > > > This is the case on alpha, amd64 (w/ lapic), hppa, i386 (w/ lapic),
> > > > loongson, luna88k, mips64, and sh.
> > > > 
> > > > (We seem to do it on sgi, too.  I was under the impression that sgi
> > > > *was* a mips64 platform, yet sgi seems to have its own clock
> > > > interrupt code distinct from mips64's general clock interrupt code
> > > > which is used by e.g. octeon).
> > > > 
> > > > However, if stathz is not zero we call statclock() separately.  This
> > > > is the case on armv7, arm, arm64, macppc, powerpc64, and sparc64.
> > > > 
> > > > (The situation for the general powerpc code and socppc in particular
> > > > is a mystery to me.)
> > > > 
> > > > If we could remove this MD distinction it would make my MI framework
> > > > simpler.  Instead of checking stathz and conditionally starting a
> > > > statclock event I will be able to unconditionally start a statclock
> > > > event on all platforms on every CPU.
> > > > 
> > > > In general I don't think the "is stathz zero?" variance between
> > > > platforms is useful:
> > > > 
> > > > - The difference is invisible to userspace, as we hide the fact that
> > > >   stathz is zero from e.g. the kern.clockrate sysctl.
> > > > 
> > > > - We run statclock() at some point, regardless of whether stathz is
> > > >   zero.  If statclock() is run from hardclock(9) then isn't stathz
> > > >   effectively equal to hz?
> > > > 
> > > > - Because stathz might be zero we need to add a bunch of safety checks
> > > >   to our MI code to ensure we don't accidentally divide by zero.
> > > > 
> > > > Maybe we can ensure stathz is non-zero in a later diff...
> > > > 
> > > > --
> > > > 
> > > > Anyway, I don't think I have missed any platforms.  However, if
> > > > platform experts could weigh in here to verify my changes (and test
> > > > them!) I'd really appreciate it.
> > > > 
> > > > In particular, I'm confused about how clock interrupts work on
> > > > powerpc, socppc, and sgi.
> > > > 
> > > > --
> > > > 
> > > > Thoughts?  Platform-specific OKs?
> > > 
> > > I wouldn't be opposed to doing this.  It is less magic!
> > > 
> > > But yes, this needs to be tested on the platforms that you change.
> > 
> > I guess I'll CC all the platform-specific people I'm aware of.
> > 
> > > Note that many platforms don't have separate schedclock and
> > > statclock.  But on many platforms where we use a one-shot timer as the
> > > clock we have a randomized statclock.  I'm sure Theo would love to
> > > tell you about the cpuhog story...
> > 
> > I am familiar with cpuhog.  It's the one thing everybody mentions when
> > I talk about clock interrupts and/or statclock().
> > 
> > Related:
> > 
> > I wonder if we could avoid the cpuhog problem entirely by implementing
> > some kind of MI cycle counting clock API that we use to timestamp
> > whenever we cross the syscall boundary, or enter an interrupt, etc.,
> > to determine the time a thread spends using the CPU without any
> > sampling error.
> > 
> > Instead of a process accumulating ticks from a sampling clock
> > interrupt you would accumulate, say, a 64-bit count of cycles, or
> > something like that.
> > 
> > Sampling with a regular clock interrupt is prone to error and trickery
> > like cpuhog.  The classic BSD solution to the cpuhog exploit was to
> > randomize the statclock/schedclock to make it harder to fool the
> > sampler.  But if we used cycle counts or instruction counts at each
> > state transition it would be impossible to fool because we wouldn't be
> > sampling at all.
> > 
> > Unsure what the performance implications would be, but in general I
> > would guess that most platforms have a way to count instructions or
> > cycles and that reading this data is fast enough for us to use it in
> > e.g. syscall() or the interrupt handler without a huge performance
> > hit.
> > 
> > > Anyway, we probably want that on amd64 as well.
> > 
> > My WIP dynamic clock interrupt system can run a randomized statclock()
> > on amd64 boxes with a lapic.  I imagine we will be able to do the same
> > on i386 systems that have a lapic, too, though it will be slower
> > because all the i386 timecounters are glacial compared to the TSC.
> > 
> > Eventually I want to isolate schedclock() from statclock() and run it
> > as an independent event.  But that's a "later on" goal.  For now I'm
> > just trying to get every platform as similar as possible to make
> > merging the dynamic clock interrupt work less painful.
> 
> Whoops, some garbage snuck into amd64/lapic.c.
> 
> Here's the patch without it.
> 
> Also, sorry if I've CC'd you and you're not the right person for one
> of these platforms/architectures.  My thinking is:
> 
> miod: loongson (?), sh
> aoyama: luna88k
> visa: mips64, sgi
> deraadt: alpha
> kettenis: hppa
> sthen: i386

hppa is happy with this

> Index: sys/kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_clock.c
> --- sys/kern/kern_clock.c     21 Jan 2020 16:16:23 -0000      1.101
> +++ sys/kern/kern_clock.c     8 Jan 2021 15:56:24 -0000
> @@ -164,12 +164,6 @@ hardclock(struct clockframe *frame)
>               }
>       }
>  
> -     /*
> -      * If no separate statistics clock is available, run it from here.
> -      */
> -     if (stathz == 0)
> -             statclock(frame);
> -
>       if (--ci->ci_schedstate.spc_rrticks <= 0)
>               roundrobin(ci);
>  
> Index: sys/arch/alpha/alpha/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/alpha/alpha/clock.c,v
> retrieving revision 1.24
> diff -u -p -r1.24 clock.c
> --- sys/arch/alpha/alpha/clock.c      6 Jul 2020 13:33:06 -0000       1.24
> +++ sys/arch/alpha/alpha/clock.c      8 Jan 2021 15:56:24 -0000
> @@ -136,6 +136,13 @@ clockattach(dev, fns)
>   * Machine-dependent clock routines.
>   */
>  
> +void
> +clockintr(struct clockframe *frame)
> +{
> +     hardclock(frame);
> +     statclock(frame);
> +}
> +
>  /*
>   * Start the real-time and statistics clocks. Leave stathz 0 since there
>   * are no other timers available.
> @@ -165,7 +172,7 @@ cpu_initclocks(void)
>        * hardclock, which would then fall over because the pointer
>        * to the virtual timers wasn't set at that time.
>        */
> -     platform.clockintr = hardclock;
> +     platform.clockintr = clockintr;
>       schedhz = 16;
>  
>       evcount_attach(&clk_count, "clock", &clk_irq);
> Index: sys/arch/amd64/amd64/lapic.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> retrieving revision 1.57
> diff -u -p -r1.57 lapic.c
> --- sys/arch/amd64/amd64/lapic.c      6 Sep 2020 20:50:00 -0000       1.57
> +++ sys/arch/amd64/amd64/lapic.c      8 Jan 2021 15:56:25 -0000
> @@ -452,6 +452,7 @@ lapic_clockintr(void *arg, struct intrfr
>       floor = ci->ci_handled_intr_level;
>       ci->ci_handled_intr_level = ci->ci_ilevel;
>       hardclock((struct clockframe *)&frame);
> +     statclock((struct clockframe *)&frame);
>       ci->ci_handled_intr_level = floor;
>  
>       clk_count.ec_count++;
> Index: sys/arch/hppa/dev/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/hppa/dev/clock.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 clock.c
> --- sys/arch/hppa/dev/clock.c 6 Jul 2020 13:33:07 -0000       1.31
> +++ sys/arch/hppa/dev/clock.c 8 Jan 2021 15:56:25 -0000
> @@ -43,7 +43,7 @@
>  
>  u_long       cpu_hzticks;
>  
> -int  cpu_hardclock(void *);
> +int  cpu_clockintr(void *);
>  u_int        itmr_get_timecount(struct timecounter *);
>  
>  struct timecounter itmr_timecounter = {
> @@ -106,7 +106,7 @@ cpu_initclocks(void)
>  }
>  
>  int
> -cpu_hardclock(void *v)
> +cpu_clockintr(void *v)
>  {
>       struct cpu_info *ci = curcpu();
>       u_long __itmr, delta, eta;
> @@ -114,14 +114,15 @@ cpu_hardclock(void *v)
>       register_t eiem;
>  
>       /*
> -      * Invoke hardclock as many times as there has been cpu_hzticks
> -      * ticks since the last interrupt.
> +      * Invoke hardclock and statclock as many times as there has been
> +      * cpu_hzticks ticks since the last interrupt.
>        */
>       for (;;) {
>               mfctl(CR_ITMR, __itmr);
>               delta = __itmr - ci->ci_itmr;
>               if (delta >= cpu_hzticks) {
>                       hardclock(v);
> +                     statclock(v);
>                       ci->ci_itmr += cpu_hzticks;
>               } else
>                       break;
> Index: sys/arch/hppa/dev/cpu.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/hppa/dev/cpu.c,v
> retrieving revision 1.42
> diff -u -p -r1.42 cpu.c
> --- sys/arch/hppa/dev/cpu.c   29 May 2020 04:42:23 -0000      1.42
> +++ sys/arch/hppa/dev/cpu.c   8 Jan 2021 15:56:25 -0000
> @@ -89,7 +89,7 @@ cpuattach(struct device *parent, struct 
>       extern u_int cpu_ticksnum, cpu_ticksdenom;
>       extern u_int fpu_enable;
>       /* clock.c */
> -     extern int cpu_hardclock(void *);
> +     extern int cpu_clockintr(void *);
>       /* ipi.c */
>       extern int hppa_ipi_intr(void *);
>  
> @@ -173,7 +173,7 @@ cpuattach(struct device *parent, struct 
>               printf(", %u/%u D/I BTLBs",
>                   pdc_btlb.finfo.num_i, pdc_btlb.finfo.num_d);
>  
> -     cpu_intr_establish(IPL_CLOCK, 31, cpu_hardclock, NULL, "clock");
> +     cpu_intr_establish(IPL_CLOCK, 31, cpu_clockintr, NULL, "clock");
>  #ifdef MULTIPROCESSOR
>       cpu_intr_establish(IPL_IPI, 30, hppa_ipi_intr, NULL, "ipi");
>  #endif
> Index: sys/arch/i386/i386/lapic.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
> retrieving revision 1.47
> diff -u -p -r1.47 lapic.c
> --- sys/arch/i386/i386/lapic.c        30 Jul 2018 14:19:12 -0000      1.47
> +++ sys/arch/i386/i386/lapic.c        8 Jan 2021 15:56:25 -0000
> @@ -257,6 +257,7 @@ lapic_clockintr(void *arg)
>       struct clockframe *frame = arg;
>  
>       hardclock(frame);
> +     statclock(frame);
>  
>       clk_count.ec_count++;
>  }
> Index: sys/arch/loongson/dev/glxclk.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/loongson/dev/glxclk.c,v
> retrieving revision 1.5
> diff -u -p -r1.5 glxclk.c
> --- sys/arch/loongson/dev/glxclk.c    19 Jul 2015 21:11:47 -0000      1.5
> +++ sys/arch/loongson/dev/glxclk.c    8 Jan 2021 15:56:25 -0000
> @@ -286,6 +286,7 @@ glxclk_intr(void *arg)
>               return 1;
>  
>       hardclock(frame);
> +     statclock(frame);
>  
>       return 1;
>  }
> Index: sys/arch/luna88k/luna88k/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/luna88k/luna88k/clock.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 clock.c
> --- sys/arch/luna88k/luna88k/clock.c  6 Jul 2020 13:33:07 -0000       1.15
> +++ sys/arch/luna88k/luna88k/clock.c  8 Jan 2021 15:56:25 -0000
> @@ -165,8 +165,10 @@ clockintr(void *eframe)
>               clockevc->ec_count++;
>  
>       *(volatile uint32_t *)(ci->ci_clock_ack) = ~0;
> -     if (clockinitted)
> +     if (clockinitted) {
>               hardclock(eframe);
> +             statclock(eframe);
> +     }
>       return 1;
>  }
>  
> Index: sys/arch/mips64/mips64/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
> retrieving revision 1.42
> diff -u -p -r1.42 clock.c
> --- sys/arch/mips64/mips64/clock.c    30 Jun 2020 14:56:10 -0000      1.42
> +++ sys/arch/mips64/mips64/clock.c    8 Jan 2021 15:56:25 -0000
> @@ -151,6 +151,7 @@ cp0_int5(uint32_t mask, struct trapframe
>               while (ci->ci_pendingticks) {
>                       cp0_clock_count.ec_count++;
>                       hardclock(tf);
> +                     statclock(tf);
>                       ci->ci_pendingticks--;
>               }
>  #ifdef MULTIPROCESSOR
> Index: sys/arch/sgi/localbus/int.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/sgi/localbus/int.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 int.c
> --- sys/arch/sgi/localbus/int.c       24 Feb 2018 11:42:31 -0000      1.15
> +++ sys/arch/sgi/localbus/int.c       8 Jan 2021 15:56:25 -0000
> @@ -524,6 +524,7 @@ int_8254_intr0(uint32_t hwpend, struct t
>                       while (ci->ci_pendingticks) {
>                               int_clock_count.ec_count++;
>                               hardclock(tf);
> +                             statclock(tf);
>                               ci->ci_pendingticks--;
>                       }
>               }
> Index: sys/arch/sh/sh/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 clock.c
> --- sys/arch/sh/sh/clock.c    20 Oct 2020 15:59:17 -0000      1.11
> +++ sys/arch/sh/sh/clock.c    8 Jan 2021 15:56:25 -0000
> @@ -333,6 +333,7 @@ sh3_clock_intr(void *arg) /* trap frame 
>       _reg_bclr_2(SH3_TCR0, TCR_UNF);
>  
>       hardclock(arg);
> +     statclock(arg);
>  
>       return (1);
>  }
> @@ -354,6 +355,7 @@ sh4_clock_intr(void *arg) /* trap frame 
>       _reg_bclr_2(SH4_TCR0, TCR_UNF);
>  
>       hardclock(arg);
> +     statclock(arg);
>  
>       return (1);
>  }
> 
