On Thu, Aug 24, 2023 at 07:21:29PM +0200, Martin Pieuchot wrote:
> On 23/08/23(Wed) 18:52, Scott Cheloha wrote:
> > This is the next patch in the clock interrupt reorganization series.
>
> Thanks for your diff.  I'm sorry but it is really hard for me to help
> review this diff because there is still no man page for this
> API+subsystem.
>
> Can we start with that please?
Sure, a first draft of a clockintr_establish.9 manpage is included
below.

We also have a manpage in the tree, clockintr.9.  It is a bit out of
date, but it covers the broad strokes of how the driver-facing portion
of the subsystem works.

> > This patch moves the entry points for the interval and profile dt(4)
> > providers from hardclock(9) to a dedicated clock interrupt
> > callback, dt_prov_profile_intr(), in dev/dt/dt_prov_profile.c.
> >
> > - To preserve current behavior, (1) both provider entrypoints have
> > been moved into a single callback function, (2) the interrupt runs at
> > the same frequency as the hardclock, and (3) the interrupt is
> > staggered to co-occur with the hardclock on a given CPU.
>
> The only behavior that needs to be preserved is the output of dumping
> stacks.  That means DT_FA_PROFILE and DT_FA_STATIC certainly need to
> be adapted with this change.  You can figure that out by looking at
> the output of /usr/src/share/btrace/kprofile.bt without and with this
> diff.
>
> Please generate a FlameGraph to make sure they're still the same.

dt_prov_profile_intr() runs at the same stack depth as hardclock(), so
the FlameGraphs are indeed still the same.

> Apart from that I'd prefer if we could skip the mechanical change and
> go straight to what dt(4) needs.  Otherwise we will have to re-design
> everything.

I think a mechanical "move the code from point A to point B" patch is
useful.  It makes the changes easier to follow when tracing the
revision history in the future.

If you insist on skipping it, though, I think I can live without it.

> If you don't want to do this work, then leave it and tell
> me what you need and what is your plan so I can help you and do it
> myself.

I am very keen to do this, or at least to help with it.

> dt(4) needs a way to schedule two different kinds of periodic
> timeouts with the highest precision possible.  It is currently
> plugged into hardclock because there is nothing better.

Yes.

> The current code assumes the periodic entry points are external to
> dt(4).  This diff moves them into the middle of dt(4) but keeps the
> existing flow, which makes the code very convoluted.
>
> A starting point to understand the added complexity is to see that
> the DT_ENTER() macros are no longer needed if we move the entry
> points inside dt(4).

I did see that.  It seems really easy to remove the macros in a
subsequent patch, though.  Again, if you want to do it all in one
patch that's OK.

> The first periodic timeout is dt_prov_interval_enter().  It could be
> implemented as a per-PCB timeout_add_nsec(9).  The drawback of this
> approach is that it uses too much code in the kernel, which is a
> problem when instrumenting the kernel itself.  Every subsystem used
> by dt(4) is impossible to instrument with btrace(8).

I think you can avoid this instrumentation problem by using clockintr,
where the callback functions are run from hardware interrupt context,
just like hardclock().

> The second periodic timeout is dt_prov_profile_enter().  It is
> similar to the previous one and has to run on every CPU.
>
> Both are currently bound to tick, but we want per-PCB time
> resolution.  We can get rid of `dp_nticks' and `dp_maxtick' if we
> control when the timeouts fire.

My current thought is that each PCB could have its own "dp_period" or
"dp_interval", a count of nanoseconds.
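To make that concrete, here is a rough sketch of the kind of per-PCB
callback I have in mind.  Everything named here (dp_interval,
dt_pcb_of(), dt_pcb_fire()) is hypothetical, not part of the diff
below:

/*
 * Hypothetical sketch, not part of the diff below: "dp_interval",
 * "dt_pcb_of()", and "dt_pcb_fire()" are illustrative names only.
 */
#include <sys/clockintr.h>
#include <dev/dt/dtvar.h>

struct dt_pcb *dt_pcb_of(struct clockintr *);	/* assumed lookup */
void dt_pcb_fire(struct dt_pcb *, void *);	/* assumed event hook */

void
dt_pcb_intr(struct clockintr *cl, void *frame)
{
	struct dt_pcb *dp = dt_pcb_of(cl);
	uint64_t count, i;

	/*
	 * Reschedule the clockintr dp_interval nanoseconds past its
	 * current expiration time.  clockintr_advance() returns the
	 * number of intervals that have elapsed, so if we are running
	 * late we can still record one event per missed period.
	 */
	count = clockintr_advance(cl, dp->dp_interval);
	for (i = 0; i < count; i++)
		dt_pcb_fire(dp, frame);
}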
> > - Each dt_pcb gets a provider-specific "dp_clockintr" pointer.  If the
> > PCB's implementing provider needs a clock interrupt to do its work, it
> > stores the handle in dp_clockintr.  The handle, if any, is established
> > during dtpv_alloc() and disestablished during dtpv_dealloc().
>
> Sorry, but as I said I don't understand what a clockintr is.  How
> does it fit in the kernel and how is it supposed to be used?

The goal of clockintr is to provide a machine-independent API for
scheduling clock interrupts.  You can use it to implement something
like hardclock() or statclock().  We are already using it to implement
these functions, among others.
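In case it helps, the usual idiom looks roughly like this.  It is only
a sketch against the interfaces documented in the manpage below; the
example_* names, the 10 millisecond period, and the staggering
parameters are arbitrary:

/* Sketch only: example_* names and the 10 ms period are arbitrary. */
#include <sys/clockintr.h>

#define EXAMPLE_PERIOD	10000000ULL	/* 10 ms, in nanoseconds */

struct clockintr *example_clockintr;

void
example_intr(struct clockintr *cl, void *frame)
{
	/* Reschedule first, then do the periodic work. */
	clockintr_advance(cl, EXAMPLE_PERIOD);
	/* ... sample a stack, bump a counter, etc ... */
}

void
example_init(struct clockintr_queue *cq)
{
	example_clockintr = clockintr_establish(cq, example_intr);
	if (example_clockintr == NULL)
		panic("can't establish example clockintr");

	/*
	 * Offset the expiration time so that every CPU's interrupt
	 * does not fire at the same moment, then start the cycle.
	 */
	clockintr_stagger(example_clockintr, EXAMPLE_PERIOD,
	    CPU_INFO_UNIT(curcpu()), MAXCPUS);
	clockintr_advance(example_clockintr, EXAMPLE_PERIOD);
}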
+.\" +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. +.\" +.Dd $Mdocdate$ +.Dt CLOCKINTR_ESTABLISH 9 +.Os +.Sh NAME +.Nm clockintr_advance , +.Nm clockintr_cancel , +.Nm clockintr_disestablish , +.Nm clockintr_establish , +.Nm clockintr_expiration , +.Nm clockintr_nsecuptime , +.Nm clockintr_schedule , +.Nm clockintr_stagger +.Nd schedule a function for execution from clock interrupt context +.Sh SYNOPSIS +.In sys/clockintr.h +.Ft uint64_t +.Fo clockintr_advance +.Fa "struct clockintr *cl" +.Fa "uint64_t interval" +.Fc +.Ft void +.Fo clockintr_cancel +.Fa "struct clockintr *cl" +.Fc +.Ft void +.Fo clockintr_disestablish +.Fa "struct clockintr *cl" +.Fc +.Ft struct clockintr * +.Fo clockintr_establish +.Fa "struct clockintr_queue *queue" +.Fa "void (*func)(struct clockintr *, void *)" +.Fc +.Ft uint64_t +.Fo clockintr_expiration +.Fa "const struct clockintr *cl" +.Fc +.Ft uint64_t +.Fo clockintr_nsecuptime +.Fa "const struct clockintr *cl" +.Fc +.Ft void +.Fo clockintr_schedule +.Fa "struct clockintr *cl" +.Fa "uint64_t abs" +.Fc +.Ft void +.Fo clockintr_stagger +.Fa "struct clockintr *cl" +.Fa "uint64_t interval" +.Fa "u_int numerator" +.Fa "u_int denominator" +.Fc +.Sh DESCRIPTION +The clock interrupt subsystem schedules functions for asynchronous execution +in the future from a hardware interrupt context. +.Pp +The +.Fn clockintr_establish +function allocates a new clock interrupt object +.Po +a +.Dq clockintr +.Pc +and binds it to the given clock interrupt +.Fa queue . +When the clockintr is executed, +the callback function +.Fa func +will be called from a hardware interrupt context on the CPU in control of the +.Fa queue . +The new clockintr is not initially scheduled for execution and has an +expiration time of zero. +.Pp +The +.Fn clockintr_schedule +function schedules the given clockintr for execution at or after the +absolute time +.Fa abs +has elapsed on the system uptime clock. +If the clockintr is already pending, +its original expiration time is quietly superseded by +.Fa abs . +.Pp +The +.Fn clockintr_advance +function reschedules the given clockintr to expire in the future at an offset +from the current expiration time equal to an integral multiple of the given +.Fa interval , +a non-zero count of nanoseconds. +This interface is useful for rescheduling periodic clock interrupts. +.Pp +The +.Fn clockintr_cancel +function cancels any pending execution of the given clockintr. +Its internal expiration time is unmodified. +.Pp +The +.Fn clockintr_disestablish +function cancels any pending execution of the given clockintr and frees +its underlying object. +It is an error to use a clockintr after it is disestablished. +It is an error for a callback function to disestablish its own clockintr. +.Pp +The +.Fn clockintr_expiration +function returns the given clockintr's expiration time. +.Pp +The +.Fn clockintr_nsecuptime +function returns the system uptime value cached by the clock interrupt +subsystem. +The result is reasonably accurate and is produced much more quickly than +.Xr nsecuptime 9 . 
+It is an error to call this function outside of a clockintr callback function.
+.Pp
+The
+.Fn clockintr_stagger
+function resets the given clockintr's expiration time to a fraction of the given
+.Fa interval ,
+a count of nanoseconds.
+The clockintr is not scheduled for execution:
+only its internal expiration time is modified.
+This interface may be used in combination with
+.Fn clockintr_advance
+to minimize the likelihood that a periodic clock interrupt will execute
+simultaneously on more than one CPU.
+It is an error if the
+.Fa numerator
+is greater than or equal to the
+.Fa denominator .
+It is an error to stagger a clockintr that is already scheduled for execution.
+.Sh CONTEXT
+The
+.Fn clockintr_advance ,
+.Fn clockintr_cancel ,
+.Fn clockintr_expiration ,
+.Fn clockintr_schedule ,
+and
+.Fn clockintr_stagger
+functions may be called during autoconf,
+from process context,
+or from interrupt context.
+.Pp
+The
+.Fn clockintr_disestablish
+and
+.Fn clockintr_establish
+functions may be called during autoconf or from process context.
+.Pp
+The
+.Fn clockintr_nsecuptime
+function may only be called during a clockintr callback function.
+.Pp
+When a clockintr is executed,
+the callback function
+.Fa func
+set by
+.Fn clockintr_establish
+is called from the
+.Dv IPL_CLOCK
+interrupt context on the CPU in control of the clockintr's parent
+.Fa queue .
+The function
+.Fa func
+must not block.
+The first argument to
+.Fa func
+is a pointer to a private copy of the underlying clock interrupt object.
+The callback function may use this pointer to reschedule the clockintr.
+The second argument to
+.Fa func
+is a pointer to the current CPU's machine-dependent clockframe object.
+.Sh RETURN VALUES
+The
+.Fn clockintr_advance
+function returns the number of intervals that have elapsed since the clockintr's
+expiration time,
+or zero if the clockintr's expiration time has not yet elapsed.
+.Pp
+The
+.Fn clockintr_establish
+function returns a pointer to a new clock interrupt handle on success,
+or
+.Dv NULL
+if the allocation fails.
+.Pp
+The
+.Fn clockintr_expiration
+and
+.Fn clockintr_nsecuptime
+functions' return values are as described above.
+.Sh CODE REFERENCES
+.Pa sys/kern/kern_clockintr.c
+.Sh SEE ALSO
+.Xr hardclock 9 ,
+.Xr hz 9 ,
+.Xr inittodr 9 ,
+.Xr nanouptime 9 ,
+.Xr spl 9 ,
+.Xr tc_init 9 ,
+.Xr timeout 9
+.Rs
+.%A Steven McCanne
+.%A Chris Torek
+.%T A Randomized Sampling Clock for CPU Utilization Estimation and Code Profiling
+.%B \&In Proc. Winter 1993 USENIX Conference
+.%D 1993
+.%P pp. 387\(en394
+.%I USENIX Association
+.Re
+.Rs
+.%A Richard McDougall
+.%A Jim Mauro
+.%B Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture
+.%I Prentice Hall
+.%I Sun Microsystems Press
+.%D 2nd Edition, 2007
+.%P pp. 912\(en925
+.Re
+.Sh HISTORY
+The clock interrupt subsystem first appeared in
+.Ox 7.3 .
Index: sys/kern/kern_clockintr.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clockintr.c,v
retrieving revision 1.32
diff -u -p -r1.32 kern_clockintr.c
--- sys/kern/kern_clockintr.c	21 Aug 2023 17:22:04 -0000	1.32
+++ sys/kern/kern_clockintr.c	26 Aug 2023 01:44:38 -0000
@@ -218,7 +218,7 @@ clockintr_dispatch(void *frame)
 {
 	uint64_t lateness, run = 0, start;
 	struct cpu_info *ci = curcpu();
-	struct clockintr *cl;
+	struct clockintr *cl, *shadow;
 	struct clockintr_queue *cq = &ci->ci_queue;
 	u_int ogen;
@@ -257,22 +257,26 @@ clockintr_dispatch(void *frame)
 			break;
 		}
 		clockintr_cancel_locked(cl);
-		cq->cq_shadow.cl_expiration = cl->cl_expiration;
+		shadow = &cq->cq_shadow;
+		shadow->cl_expiration = cl->cl_expiration;
+		shadow->cl_func = cl->cl_func;
 		cq->cq_running = cl;
 		mtx_leave(&cq->cq_mtx);
 
-		cl->cl_func(&cq->cq_shadow, frame);
+		shadow->cl_func(shadow, frame);
 
 		mtx_enter(&cq->cq_mtx);
-		cq->cq_running = NULL;
-		if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
-			CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
-			CLR(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING);
-		}
-		if (ISSET(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING)) {
-			CLR(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING);
-			clockintr_schedule_locked(cl,
-			    cq->cq_shadow.cl_expiration);
+		if (cq->cq_running != NULL) {
+			cq->cq_running = NULL;
+			if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
+				CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
+				CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
+			}
+			if (ISSET(shadow->cl_flags, CLST_SHADOW_PENDING)) {
+				CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
+				clockintr_schedule_locked(cl,
+				    shadow->cl_expiration);
+			}
 		}
 		run++;
 	}
@@ -382,6 +386,34 @@ clockintr_cancel_locked(struct clockintr
 
 	TAILQ_REMOVE(&cq->cq_pend, cl, cl_plink);
 	CLR(cl->cl_flags, CLST_PENDING);
+}
+
+void
+clockintr_disestablish(struct clockintr *cl)
+{
+	struct clockintr_queue *cq = cl->cl_queue;
+	int was_next;
+
+	if (cl == &cq->cq_shadow)
+		panic("%s: called during dispatch\n", __func__);
+
+	mtx_enter(&cq->cq_mtx);
+	if (ISSET(cl->cl_flags, CLST_PENDING)) {
+		was_next = cl == TAILQ_FIRST(&cq->cq_pend);
+		clockintr_cancel_locked(cl);
+		if (ISSET(cq->cq_flags, CQ_INTRCLOCK)) {
+			if (was_next && !TAILQ_EMPTY(&cq->cq_pend)) {
+				if (cq == &curcpu()->ci_queue)
+					clockqueue_reset_intrclock(cq);
+			}
+		}
+	}
+	if (cl == cq->cq_running)
+		cq->cq_running = NULL;
+	TAILQ_REMOVE(&cq->cq_est, cl, cl_elink);
+	mtx_leave(&cq->cq_mtx);
+
+	free(cl, M_DEVBUF, sizeof *cl);
 }
 
 struct clockintr *

Index: sys/kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.115
diff -u -p -r1.115 kern_clock.c
--- sys/kern/kern_clock.c	23 Aug 2023 01:55:45 -0000	1.115
+++ sys/kern/kern_clock.c	26 Aug 2023 01:44:38 -0000
@@ -49,11 +49,6 @@
 #include <sys/sched.h>
 #include <sys/timetc.h>
 
-#include "dt.h"
-#if NDT > 0
-#include <dev/dt/dtvar.h>
-#endif
-
 /*
  * Clock handling routines.
 *
@@ -114,12 +109,6 @@ initclocks(void)
 void
 hardclock(struct clockframe *frame)
 {
-#if NDT > 0
-	DT_ENTER(profile, NULL);
-	if (CPU_IS_PRIMARY(curcpu()))
-		DT_ENTER(interval, NULL);
-#endif
-
 	/*
 	 * If we are not the primary CPU, we're not allowed to do
 	 * any more work.
Index: sys/sys/clockintr.h
===================================================================
RCS file: /cvs/src/sys/sys/clockintr.h,v
retrieving revision 1.10
diff -u -p -r1.10 clockintr.h
--- sys/sys/clockintr.h	21 Aug 2023 17:22:04 -0000	1.10
+++ sys/sys/clockintr.h	26 Aug 2023 01:44:38 -0000
@@ -128,6 +128,7 @@ void clockintr_trigger(void);
 
 uint64_t clockintr_advance(struct clockintr *, uint64_t);
 void clockintr_cancel(struct clockintr *);
+void clockintr_disestablish(struct clockintr *);
 struct clockintr *clockintr_establish(struct clockintr_queue *,
     void (*)(struct clockintr *, void *));
 void clockintr_stagger(struct clockintr *, uint64_t, u_int, u_int);

Index: sys/dev/dt/dt_prov_profile.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_prov_profile.c,v
retrieving revision 1.5
diff -u -p -r1.5 dt_prov_profile.c
--- sys/dev/dt/dt_prov_profile.c	26 Apr 2023 16:53:59 -0000	1.5
+++ sys/dev/dt/dt_prov_profile.c	26 Aug 2023 01:44:38 -0000
@@ -20,6 +20,7 @@
 #include <sys/systm.h>
 #include <sys/param.h>
 #include <sys/atomic.h>
+#include <sys/clockintr.h>
 
 #include <dev/dt/dtvar.h>
 
@@ -31,6 +32,8 @@ struct dt_probe *dtpp_interval;	/* glob
 
 int	dt_prov_profile_alloc(struct dt_probe *, struct dt_softc *,
 	    struct dt_pcb_list *, struct dtioc_req *);
+int	dt_prov_profile_dealloc(struct dt_probe *, struct dt_softc *,
+	    struct dtioc_req *);
 int	dt_prov_profile_enter(struct dt_provider *, ...);
 int	dt_prov_interval_enter(struct dt_provider *, ...);
 
@@ -39,7 +42,7 @@ struct dt_provider dt_prov_profile = {
 	.dtpv_alloc	= dt_prov_profile_alloc,
 	.dtpv_enter	= dt_prov_profile_enter,
 	.dtpv_leave	= NULL,
-	.dtpv_dealloc	= NULL,
+	.dtpv_dealloc	= dt_prov_profile_dealloc,
 };
 
 struct dt_provider dt_prov_interval = {
@@ -47,9 +50,11 @@ struct dt_provider dt_prov_interval = {
 	.dtpv_alloc	= dt_prov_profile_alloc,
 	.dtpv_enter	= dt_prov_interval_enter,
 	.dtpv_leave	= NULL,
-	.dtpv_dealloc	= NULL,
+	.dtpv_dealloc	= dt_prov_profile_dealloc,
 };
 
+void	dt_prov_profile_intr(struct clockintr *, void *);
+
 int
 dt_prov_profile_init(void)
 {
@@ -85,10 +90,17 @@ dt_prov_profile_alloc(struct dt_probe *d
 			continue;
 
 		dp = dt_pcb_alloc(dtp, sc);
-		if (dp == NULL) {
-			dt_pcb_purge(plist);
-			return ENOMEM;
+		if (dp == NULL)
+			goto purge;
+
+		dp->dp_clockintr = clockintr_establish(&ci->ci_queue,
+		    dt_prov_profile_intr);
+		if (dp->dp_clockintr == NULL) {
+			dt_pcb_free(dp);
+			goto purge;
 		}
+		clockintr_stagger(dp->dp_clockintr, hardclock_period,
+		    CPU_INFO_UNIT(ci), MAXCPUS);
 
 		dp->dp_maxtick = hz / dtrq->dtrq_rate;
 		dp->dp_cpuid = ci->ci_cpuid;
@@ -99,6 +111,31 @@ dt_prov_profile_alloc(struct dt_probe *d
 	}
 
 	return 0;
+purge:
+	TAILQ_FOREACH(dp, plist, dp_snext)
+		clockintr_disestablish(dp->dp_clockintr);
+	dt_pcb_purge(plist);
+	return ENOMEM;
+}
+
+int
+dt_prov_profile_dealloc(struct dt_probe *dtp, struct dt_softc *sc,
+    struct dtioc_req *dtrq)
+{
+	struct dt_pcb *dp;
+
+	KASSERT(dtioc_req_isvalid(dtrq));
+	KASSERT(dtp == dtpp_profile || dtp == dtpp_interval);
+
+	TAILQ_FOREACH(dp, &sc->ds_pcbs, dp_snext) {
+		if (dp->dp_dtp == dtp) {
+			if (dp->dp_clockintr != NULL) {
+				clockintr_disestablish(dp->dp_clockintr);
+				dp->dp_clockintr = NULL;
+			}
+		}
+	}
+	return 0;
 }
 
 static inline void
@@ -148,4 +185,18 @@ dt_prov_interval_enter(struct dt_provide
 	}
 	smr_read_leave();
 	return 0;
+}
+
+void
+dt_prov_profile_intr(struct clockintr *cl, void *cf)
+{
+	uint64_t count, i;
+	struct cpu_info *ci = curcpu();
+
+	count = clockintr_advance(cl, hardclock_period);
+	for (i = 0; i < count; i++) {
+		DT_ENTER(profile, NULL);
+		if (CPU_IS_PRIMARY(ci))
+			DT_ENTER(interval, NULL);
+	}
 }

Index: sys/dev/dt/dt_dev.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
retrieving revision 1.28
diff -u -p -r1.28 dt_dev.c
--- sys/dev/dt/dt_dev.c	14 Jul 2023 07:07:08 -0000	1.28
+++ sys/dev/dt/dt_dev.c	26 Aug 2023 01:44:39 -0000
@@ -19,6 +19,7 @@
 #include <sys/types.h>
 #include <sys/systm.h>
 #include <sys/param.h>
+#include <sys/clockintr.h>
 #include <sys/device.h>
 #include <sys/exec_elf.h>
 #include <sys/malloc.h>
@@ -82,32 +83,6 @@
 
 #define DPRINTF(x...)	/* nothing */
 
-/*
- * Descriptor associated with each program opening /dev/dt.  It is used
- * to keep track of enabled PCBs.
- *
- *  Locks used to protect struct members in this file:
- *	m	per-softc mutex
- *	K	kernel lock
- */
-struct dt_softc {
-	SLIST_ENTRY(dt_softc)	 ds_next;	/* [K] descriptor list */
-	int			 ds_unit;	/* [I] D_CLONE unique unit */
-	pid_t			 ds_pid;	/* [I] PID of tracing program */
-
-	struct mutex		 ds_mtx;
-
-	struct dt_pcb_list	 ds_pcbs;	/* [K] list of enabled PCBs */
-	struct dt_evt		*ds_bufqueue;	/* [K] copy evts to userland */
-	size_t			 ds_bufqlen;	/* [K] length of the queue */
-	int			 ds_recording;	/* [K] currently recording? */
-	int			 ds_evtcnt;	/* [m] # of readable evts */
-
-	/* Counters */
-	uint64_t		 ds_readevt;	/* [m] # of events read */
-	uint64_t		 ds_dropevt;	/* [m] # of events dropped */
-};
-
 SLIST_HEAD(, dt_softc) dtdev_list;	/* [K] list of open /dev/dt nodes */
 
 /*
@@ -486,6 +461,14 @@ dt_ioctl_record_start(struct dt_softc *s
 		SMR_SLIST_INSERT_HEAD_LOCKED(&dtp->dtp_pcbs, dp, dp_pnext);
 		dtp->dtp_recording++;
 		dtp->dtp_prov->dtpv_recording++;
+
+		/*
+		 * XXX Not provider-agnostic.  Is there a better place
+		 * to do this?
+		 */
+		if (dp->dp_clockintr != NULL)
+			clockintr_advance(dp->dp_clockintr, hardclock_period);
+
 	}
 	rw_exit_write(&dt_lock);
 
@@ -514,6 +497,11 @@ dt_ioctl_record_stop(struct dt_softc *sc
 
 		dtp->dtp_recording--;
 		dtp->dtp_prov->dtpv_recording--;
+
+		/* XXX This is not provider-agnostic. */
+		if (dp->dp_clockintr != NULL)
+			clockintr_cancel(dp->dp_clockintr);
+
 		SMR_SLIST_REMOVE_LOCKED(&dtp->dtp_pcbs, dp, dt_pcb, dp_pnext);
 	}
 	rw_exit_write(&dt_lock);

Index: sys/dev/dt/dtvar.h
===================================================================
RCS file: /cvs/src/sys/dev/dt/dtvar.h,v
retrieving revision 1.17
diff -u -p -r1.17 dtvar.h
--- sys/dev/dt/dtvar.h	26 Apr 2023 16:53:59 -0000	1.17
+++ sys/dev/dt/dtvar.h	26 Aug 2023 01:44:41 -0000
@@ -158,12 +158,39 @@ struct dtioc_getaux {
 #include <sys/queue.h>
 #include <sys/smr.h>
 
+struct dt_pcb;
+TAILQ_HEAD(dt_pcb_list, dt_pcb);
+
 /* Flags that make sense for all providers. */
 #define DTEVT_COMMON	(DTEVT_EXECNAME|DTEVT_KSTACK|DTEVT_USTACK)
 
 #define M_DT	M_DEVBUF	/* XXX FIXME */
 
-struct dt_softc;
+/*
+ * Descriptor associated with each program opening /dev/dt.  It is used
+ * to keep track of enabled PCBs.
+ *
+ *  Locks used to protect struct members in this file:
+ *	m	per-softc mutex
+ *	K	kernel lock
+ */
+struct dt_softc {
+	SLIST_ENTRY(dt_softc)	 ds_next;	/* [K] descriptor list */
+	int			 ds_unit;	/* [I] D_CLONE unique unit */
+	pid_t			 ds_pid;	/* [I] PID of tracing program */
+
+	struct mutex		 ds_mtx;
+
+	struct dt_pcb_list	 ds_pcbs;	/* [K] list of enabled PCBs */
+	struct dt_evt		*ds_bufqueue;	/* [K] copy evts to userland */
+	size_t			 ds_bufqlen;	/* [K] length of the queue */
+	int			 ds_recording;	/* [K] currently recording? */
+	int			 ds_evtcnt;	/* [m] # of readable evts */
+
+	/* Counters */
+	uint64_t		 ds_readevt;	/* [m] # of events read */
+	uint64_t		 ds_dropevt;	/* [m] # of events dropped */
+};
 
 int	dtioc_req_isvalid(struct dtioc_req *);
 
@@ -197,6 +224,7 @@ struct dt_pcb {
 	struct dt_filter	 dp_filter;	/* [I] filter to match */
 
 	/* Provider specific fields. */
+	struct clockintr	*dp_clockintr;	/* [?] clock interrupt handle */
 	unsigned int		 dp_cpuid;	/* [I] on which CPU */
 	unsigned int		 dp_maxtick;	/* [I] freq. of profiling */
 	unsigned int		 dp_nticks;	/* [c] current tick count */
@@ -204,8 +232,6 @@ struct dt_pcb {
 	/* Counters */
 	uint64_t		 dp_dropevt;	/* [m] # dropped event */
 };
-
-TAILQ_HEAD(dt_pcb_list, dt_pcb);
 
 struct dt_pcb	*dt_pcb_alloc(struct dt_probe *, struct dt_softc *);
 void		 dt_pcb_free(struct dt_pcb *);