On Thu, Aug 24, 2023 at 07:21:29PM +0200, Martin Pieuchot wrote:
> On 23/08/23(Wed) 18:52, Scott Cheloha wrote:
> > This is the next patch in the clock interrupt reorganization series.
>
> Thanks for your diff.  I'm sorry but it is really hard for me to help
> review this diff because there is still no man page for this
> API+subsystem.
>
> Can we start with that please?
Sure, a first draft of a clockintr_establish.9 manpage is included
below.

We also have a manpage in the tree, clockintr.9.  It is a bit out of
date, but it covers the broad strokes of how the driver-facing portion
of the subsystem works.

> > This patch moves the entry points for the interval and profile dt(4)
> > providers from hardclock(9) to a dedicated clock interrupt
> > callback, dt_prov_profile_intr(), in dev/dt/dt_prov_profile.c.
> >
> > - To preserve current behavior, (1) both provider entrypoints have
> > been moved into a single callback function, (2) the interrupt runs at
> > the same frequency as the hardclock, and (3) the interrupt is
> > staggered to co-occur with the hardclock on a given CPU.
>
> The only behavior that needs to be preserved is the output of dumping
> stacks.  That means DT_FA_PROFILE and DT_FA_STATIC certainly need to
> be adapted with this change.  You can figure that out by looking at
> the output of /usr/src/share/btrace/kprofile.bt without and with this
> diff.
>
> Please generate a FlameGraph to make sure they're still the same.

dt_prov_profile_intr() runs at the same stack depth as hardclock(), so
the FlameGraphs are indeed still the same.

> Apart from that I'd prefer if we could skip the mechanical change and
> go straight to what dt(4) needs.  Otherwise we will have to re-design
> everything.

I think a mechanical "move the code from point A to point B" patch is
useful.  It makes the changes easier to follow when tracing the
revision history in the future.

If you insist on skipping it, though, I think I can live without it.

> If you don't want to do this work, then leave it and tell
> me what you need and what is your plan so I can help you and do it
> myself.

I am very keen to do this, or at least to help with it.

> dt(4) needs a way to schedule two different kinds of periodic
> timeouts with the highest precision possible.  It is currently
> plugged into hardclock because there is nothing better.

Yes.

> The current code assumes the periodic entry points are external to
> dt(4).  This diff moves them into the middle of dt(4) but keeps the
> existing flow, which makes the code very convoluted.
>
> A starting point to understand the added complexity is to see that
> the DT_ENTER() macros are no longer needed if we move the entry
> points inside dt(4).

I did see that.  It seems really easy to remove the macros in a
subsequent patch, though.  Again, if you want to do it all in one
patch that's OK.

> The first periodic timeout is dt_prov_interval_enter().  It could be
> implemented as a per-PCB timeout_add_nsec(9).  The drawback of this
> approach is that it uses too much code in the kernel, which is a
> problem when instrumenting the kernel itself.  Every subsystem used
> by dt(4) is impossible to instrument with btrace(8).

I think you can avoid this instrumentation problem by using clockintr,
where the callback functions are run from hardware interrupt context,
just like hardclock().

> The second periodic timeout is dt_prov_profile_enter().  It is
> similar to the previous one and has to run on every CPU.
>
> Both are currently bound to tick, but we want per-PCB time
> resolution.  We can get rid of `dp_nticks' and `dp_maxtick' if we
> control when the timeouts fire.

My current thought is that each PCB could have its own "dp_period" or
"dp_interval", a count of nanoseconds.
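To make that concrete, here is a rough sketch of the kind of per-PCB
callback I have in mind.  Everything named here (dp_interval,
dt_pcb_of(), dt_pcb_fire()) is hypothetical, not part of the diff
below:

/*
 * Hypothetical sketch, not part of the diff below: "dp_interval",
 * "dt_pcb_of()", and "dt_pcb_fire()" are illustrative names only.
 */
#include <sys/clockintr.h>
#include <dev/dt/dtvar.h>

struct dt_pcb *dt_pcb_of(struct clockintr *);	/* assumed lookup */
void dt_pcb_fire(struct dt_pcb *, void *);	/* assumed event hook */

void
dt_pcb_intr(struct clockintr *cl, void *frame)
{
	struct dt_pcb *dp = dt_pcb_of(cl);
	uint64_t count, i;

	/*
	 * Reschedule the clockintr dp_interval nanoseconds past its
	 * current expiration time.  clockintr_advance() returns the
	 * number of intervals that have elapsed, so if we are running
	 * late we can still record one event per missed period.
	 */
	count = clockintr_advance(cl, dp->dp_interval);
	for (i = 0; i < count; i++)
		dt_pcb_fire(dp, frame);
}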
> > - Each dt_pcb gets a provider-specific "dp_clockintr" pointer.  If the
> > PCB's implementing provider needs a clock interrupt to do its work, it
> > stores the handle in dp_clockintr.  The handle, if any, is established
> > during dtpv_alloc() and disestablished during dtpv_dealloc().
>
> Sorry, but as I said I don't understand what a clockintr is.  How
> does it fit in the kernel and how is it supposed to be used?

The goal of clockintr is to provide a machine-independent API for
scheduling clock interrupts.  You can use it to implement something
like hardclock() or statclock().  We are already using it to implement
these functions, among others.
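In case it helps, the usual idiom looks roughly like this.  It is only
a sketch against the interfaces documented in the manpage below; the
example_* names, the 10 millisecond period, and the staggering
parameters are arbitrary:

/* Sketch only: example_* names and the 10 ms period are arbitrary. */
#include <sys/clockintr.h>

#define EXAMPLE_PERIOD	10000000ULL	/* 10 ms, in nanoseconds */

struct clockintr *example_clockintr;

void
example_intr(struct clockintr *cl, void *frame)
{
	/* Reschedule first, then do the periodic work. */
	clockintr_advance(cl, EXAMPLE_PERIOD);
	/* ... sample a stack, bump a counter, etc ... */
}

void
example_init(struct clockintr_queue *cq)
{
	example_clockintr = clockintr_establish(cq, example_intr);
	if (example_clockintr == NULL)
		panic("can't establish example clockintr");

	/*
	 * Offset the expiration time so that every CPU's interrupt
	 * does not fire at the same moment, then start the cycle.
	 */
	clockintr_stagger(example_clockintr, EXAMPLE_PERIOD,
	    CPU_INFO_UNIT(curcpu()), MAXCPUS);
	clockintr_advance(example_clockintr, EXAMPLE_PERIOD);
}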
+.\" +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. +.\" +.Dd $Mdocdate$ +.Dt CLOCKINTR_ESTABLISH 9 +.Os +.Sh NAME +.Nm clockintr_advance , +.Nm clockintr_cancel , +.Nm clockintr_disestablish , +.Nm clockintr_establish , +.Nm clockintr_expiration , +.Nm clockintr_nsecuptime , +.Nm clockintr_schedule , +.Nm clockintr_stagger +.Nd schedule a function for execution from clock interrupt context +.Sh SYNOPSIS +.In sys/clockintr.h +.Ft uint64_t +.Fo clockintr_advance +.Fa "struct clockintr *cl" +.Fa "uint64_t interval" +.Fc +.Ft void +.Fo clockintr_cancel +.Fa "struct clockintr *cl" +.Fc +.Ft void +.Fo clockintr_disestablish +.Fa "struct clockintr *cl" +.Fc +.Ft struct clockintr * +.Fo clockintr_establish +.Fa "struct clockintr_queue *queue" +.Fa "void (*func)(struct clockintr *, void *)" +.Fc +.Ft uint64_t +.Fo clockintr_expiration +.Fa "const struct clockintr *cl" +.Fc +.Ft uint64_t +.Fo clockintr_nsecuptime +.Fa "const struct clockintr *cl" +.Fc +.Ft void +.Fo clockintr_schedule +.Fa "struct clockintr *cl" +.Fa "uint64_t abs" +.Fc +.Ft void +.Fo clockintr_stagger +.Fa "struct clockintr *cl" +.Fa "uint64_t interval" +.Fa "u_int numerator" +.Fa "u_int denominator" +.Fc +.Sh DESCRIPTION +The clock interrupt subsystem schedules functions for asynchronous execution +in the future from a hardware interrupt context. +.Pp +The +.Fn clockintr_establish +function allocates a new clock interrupt object +.Po +a +.Dq clockintr +.Pc +and binds it to the given clock interrupt +.Fa queue . +When the clockintr is executed, +the callback function +.Fa func +will be called from a hardware interrupt context on the CPU in control of the +.Fa queue . +The new clockintr is not initially scheduled for execution and has an +expiration time of zero. +.Pp +The +.Fn clockintr_schedule +function schedules the given clockintr for execution at or after the +absolute time +.Fa abs +has elapsed on the system uptime clock. +If the clockintr is already pending, +its original expiration time is quietly superseded by +.Fa abs . +.Pp +The +.Fn clockintr_advance +function reschedules the given clockintr to expire in the future at an offset +from the current expiration time equal to an integral multiple of the given +.Fa interval , +a non-zero count of nanoseconds. +This interface is useful for rescheduling periodic clock interrupts. +.Pp +The +.Fn clockintr_cancel +function cancels any pending execution of the given clockintr. +Its internal expiration time is unmodified. +.Pp +The +.Fn clockintr_disestablish +function cancels any pending execution of the given clockintr and frees +its underlying object. +It is an error to use a clockintr after it is disestablished. +It is an error for a callback function to disestablish its own clockintr. +.Pp +The +.Fn clockintr_expiration +function returns the given clockintr's expiration time. +.Pp +The +.Fn clockintr_nsecuptime +function returns the system uptime value cached by the clock interrupt +subsystem. +The result is reasonably accurate and is produced much more quickly than +.Xr nsecuptime 9 . 
+It is an error to call this function outside of a clockintr callback function.
+.Pp
+The
+.Fn clockintr_stagger
+function resets the given clockintr's expiration time to a fraction of the given
+.Fa interval ,
+a count of nanoseconds.
+The clockintr is not scheduled for execution:
+only its internal expiration time is modified.
+This interface may be used in combination with
+.Fn clockintr_advance
+to minimize the likelihood that a periodic clock interrupt will execute
+simultaneously on more than one CPU.
+It is an error if the
+.Fa numerator
+is greater than or equal to the
+.Fa denominator .
+It is an error to stagger a clockintr that is already scheduled for execution.
+.Sh CONTEXT
+The
+.Fn clockintr_advance ,
+.Fn clockintr_cancel ,
+.Fn clockintr_expiration ,
+.Fn clockintr_schedule ,
+and
+.Fn clockintr_stagger
+functions may be called during autoconf,
+from process context,
+or from interrupt context.
+.Pp
+The
+.Fn clockintr_disestablish
+and
+.Fn clockintr_establish
+functions may be called during autoconf or from process context.
+.Pp
+The
+.Fn clockintr_nsecuptime
+function may only be called during a clockintr callback function.
+.Pp
+When a clockintr is executed,
+the callback function
+.Fa func
+set by
+.Fn clockintr_establish
+is called from the
+.Dv IPL_CLOCK
+interrupt context on the CPU in control of the clockintr's parent
+.Fa queue .
+The function
+.Fa func
+must not block.
+The first argument to
+.Fa func
+is a pointer to a private copy of the underlying clock interrupt object.
+The callback function may use this pointer to reschedule the clockintr.
+The second argument to
+.Fa func
+is a pointer to the current CPU's machine-dependent clockframe object.
+.Sh RETURN VALUES
+The
+.Fn clockintr_advance
+function returns the number of intervals that have elapsed since the clockintr's
+expiration time,
+or zero if the clockintr's expiration time has not yet elapsed.
+.Pp
+The
+.Fn clockintr_establish
+function returns a pointer to a new clock interrupt handle on success,
+or
+.Dv NULL
+if the allocation fails.
+.Pp
+The
+.Fn clockintr_expiration
+and
+.Fn clockintr_nsecuptime
+functions' return values are as described above.
+.Sh CODE REFERENCES
+.Pa sys/kern/kern_clockintr.c
+.Sh SEE ALSO
+.Xr hardclock 9 ,
+.Xr hz 9 ,
+.Xr inittodr 9 ,
+.Xr nanouptime 9 ,
+.Xr spl 9 ,
+.Xr tc_init 9 ,
+.Xr timeout 9
+.Rs
+.%A Steven McCanne
+.%A Chris Torek
+.%T A Randomized Sampling Clock for CPU Utilization Estimation and Code Profiling
+.%B \&In Proc. Winter 1993 USENIX Conference
+.%D 1993
+.%P pp. 387\(en394
+.%I USENIX Association
+.Re
+.Rs
+.%A Richard McDougall
+.%A Jim Mauro
+.%B Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture
+.%I Prentice Hall
+.%I Sun Microsystems Press
+.%D 2nd Edition, 2007
+.%P pp. 912\(en925
+.Re
+.Sh HISTORY
+The clock interrupt subsystem first appeared in
+.Ox 7.3 .
Index: sys/kern/kern_clockintr.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clockintr.c,v
retrieving revision 1.32
diff -u -p -r1.32 kern_clockintr.c
--- sys/kern/kern_clockintr.c	21 Aug 2023 17:22:04 -0000	1.32
+++ sys/kern/kern_clockintr.c	26 Aug 2023 01:44:38 -0000
@@ -218,7 +218,7 @@ clockintr_dispatch(void *frame)
 {
 	uint64_t lateness, run = 0, start;
 	struct cpu_info *ci = curcpu();
-	struct clockintr *cl;
+	struct clockintr *cl, *shadow;
 	struct clockintr_queue *cq = &ci->ci_queue;
 	u_int ogen;
@@ -257,22 +257,26 @@ clockintr_dispatch(void *frame)
 			break;
 		}
 		clockintr_cancel_locked(cl);
-		cq->cq_shadow.cl_expiration = cl->cl_expiration;
+		shadow = &cq->cq_shadow;
+		shadow->cl_expiration = cl->cl_expiration;
+		shadow->cl_func = cl->cl_func;
 		cq->cq_running = cl;
 		mtx_leave(&cq->cq_mtx);
 
-		cl->cl_func(&cq->cq_shadow, frame);
+		shadow->cl_func(shadow, frame);
 
 		mtx_enter(&cq->cq_mtx);
-		cq->cq_running = NULL;
-		if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
-			CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
-			CLR(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING);
-		}
-		if (ISSET(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING)) {
-			CLR(cq->cq_shadow.cl_flags, CLST_SHADOW_PENDING);
-			clockintr_schedule_locked(cl,
-			    cq->cq_shadow.cl_expiration);
+		if (cq->cq_running != NULL) {
+			cq->cq_running = NULL;
+			if (ISSET(cl->cl_flags, CLST_IGNORE_SHADOW)) {
+				CLR(cl->cl_flags, CLST_IGNORE_SHADOW);
+				CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
+			}
+			if (ISSET(shadow->cl_flags, CLST_SHADOW_PENDING)) {
+				CLR(shadow->cl_flags, CLST_SHADOW_PENDING);
+				clockintr_schedule_locked(cl,
+				    shadow->cl_expiration);
+			}
 		}
 		run++;
 	}
@@ -382,6 +386,34 @@ clockintr_cancel_locked(struct clockintr
 
 	TAILQ_REMOVE(&cq->cq_pend, cl, cl_plink);
 	CLR(cl->cl_flags, CLST_PENDING);
+}
+
+void
+clockintr_disestablish(struct clockintr *cl)
+{
+	struct clockintr_queue *cq = cl->cl_queue;
+	int was_next;
+
+	if (cl == &cq->cq_shadow)
+		panic("%s: called during dispatch\n", __func__);
+
+	mtx_enter(&cq->cq_mtx);
+	if (ISSET(cl->cl_flags, CLST_PENDING)) {
+		was_next = cl == TAILQ_FIRST(&cq->cq_pend);
+		clockintr_cancel_locked(cl);
+		if (ISSET(cq->cq_flags, CQ_INTRCLOCK)) {
+			if (was_next && !TAILQ_EMPTY(&cq->cq_pend)) {
+				if (cq == &curcpu()->ci_queue)
+					clockqueue_reset_intrclock(cq);
+			}
+		}
+	}
+	if (cl == cq->cq_running)
+		cq->cq_running = NULL;
+	TAILQ_REMOVE(&cq->cq_est, cl, cl_elink);
+	mtx_leave(&cq->cq_mtx);
+
+	free(cl, M_DEVBUF, sizeof *cl);
 }
 
 struct clockintr *

Index: sys/kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.115
diff -u -p -r1.115 kern_clock.c
--- sys/kern/kern_clock.c	23 Aug 2023 01:55:45 -0000	1.115
+++ sys/kern/kern_clock.c	26 Aug 2023 01:44:38 -0000
@@ -49,11 +49,6 @@
 #include <sys/sched.h>
 #include <sys/timetc.h>
 
-#include "dt.h"
-#if NDT > 0
-#include <dev/dt/dtvar.h>
-#endif
-
 /*
  * Clock handling routines.
 *
@@ -114,12 +109,6 @@ initclocks(void)
 void
 hardclock(struct clockframe *frame)
 {
-#if NDT > 0
-	DT_ENTER(profile, NULL);
-	if (CPU_IS_PRIMARY(curcpu()))
-		DT_ENTER(interval, NULL);
-#endif
-
 	/*
 	 * If we are not the primary CPU, we're not allowed to do
 	 * any more work.
Index: sys/sys/clockintr.h
===================================================================
RCS file: /cvs/src/sys/sys/clockintr.h,v
retrieving revision 1.10
diff -u -p -r1.10 clockintr.h
--- sys/sys/clockintr.h	21 Aug 2023 17:22:04 -0000	1.10
+++ sys/sys/clockintr.h	26 Aug 2023 01:44:38 -0000
@@ -128,6 +128,7 @@ void clockintr_trigger(void);
 
 uint64_t clockintr_advance(struct clockintr *, uint64_t);
 void clockintr_cancel(struct clockintr *);
+void clockintr_disestablish(struct clockintr *);
 struct clockintr *clockintr_establish(struct clockintr_queue *,
     void (*)(struct clockintr *, void *));
 void clockintr_stagger(struct clockintr *, uint64_t, u_int, u_int);

Index: sys/dev/dt/dt_prov_profile.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_prov_profile.c,v
retrieving revision 1.5
diff -u -p -r1.5 dt_prov_profile.c
--- sys/dev/dt/dt_prov_profile.c	26 Apr 2023 16:53:59 -0000	1.5
+++ sys/dev/dt/dt_prov_profile.c	26 Aug 2023 01:44:38 -0000
@@ -20,6 +20,7 @@
 #include <sys/systm.h>
 #include <sys/param.h>
 #include <sys/atomic.h>
+#include <sys/clockintr.h>
 
 #include <dev/dt/dtvar.h>
 
@@ -31,6 +32,8 @@ struct dt_probe *dtpp_interval;	/* glob
 
 int	dt_prov_profile_alloc(struct dt_probe *, struct dt_softc *,
 	    struct dt_pcb_list *, struct dtioc_req *);
+int	dt_prov_profile_dealloc(struct dt_probe *, struct dt_softc *,
+	    struct dtioc_req *);
 int	dt_prov_profile_enter(struct dt_provider *, ...);
 int	dt_prov_interval_enter(struct dt_provider *, ...);
 
@@ -39,7 +42,7 @@ struct dt_provider dt_prov_profile = {
 	.dtpv_alloc	= dt_prov_profile_alloc,
 	.dtpv_enter	= dt_prov_profile_enter,
 	.dtpv_leave	= NULL,
-	.dtpv_dealloc	= NULL,
+	.dtpv_dealloc	= dt_prov_profile_dealloc,
 };
 
 struct dt_provider dt_prov_interval = {
@@ -47,9 +50,11 @@ struct dt_provider dt_prov_interval = {
 	.dtpv_alloc	= dt_prov_profile_alloc,
 	.dtpv_enter	= dt_prov_interval_enter,
 	.dtpv_leave	= NULL,
-	.dtpv_dealloc	= NULL,
+	.dtpv_dealloc	= dt_prov_profile_dealloc,
 };
 
+void	dt_prov_profile_intr(struct clockintr *, void *);
+
 int
 dt_prov_profile_init(void)
 {
@@ -85,10 +90,17 @@ dt_prov_profile_alloc(struct dt_probe *d
 			continue;
 
 		dp = dt_pcb_alloc(dtp, sc);
-		if (dp == NULL) {
-			dt_pcb_purge(plist);
-			return ENOMEM;
+		if (dp == NULL)
+			goto purge;
+
+		dp->dp_clockintr = clockintr_establish(&ci->ci_queue,
+		    dt_prov_profile_intr);
+		if (dp->dp_clockintr == NULL) {
+			dt_pcb_free(dp);
+			goto purge;
 		}
+		clockintr_stagger(dp->dp_clockintr, hardclock_period,
+		    CPU_INFO_UNIT(ci), MAXCPUS);
 
 		dp->dp_maxtick = hz / dtrq->dtrq_rate;
 		dp->dp_cpuid = ci->ci_cpuid;
@@ -99,6 +111,31 @@ dt_prov_profile_alloc(struct dt_probe *d
 	}
 
 	return 0;
+purge:
+	TAILQ_FOREACH(dp, plist, dp_snext)
+		clockintr_disestablish(dp->dp_clockintr);
+	dt_pcb_purge(plist);
+	return ENOMEM;
+}
+
+int
+dt_prov_profile_dealloc(struct dt_probe *dtp, struct dt_softc *sc,
+    struct dtioc_req *dtrq)
+{
+	struct dt_pcb *dp;
+
+	KASSERT(dtioc_req_isvalid(dtrq));
+	KASSERT(dtp == dtpp_profile || dtp == dtpp_interval);
+
+	TAILQ_FOREACH(dp, &sc->ds_pcbs, dp_snext) {
+		if (dp->dp_dtp == dtp) {
+			if (dp->dp_clockintr != NULL) {
+				clockintr_disestablish(dp->dp_clockintr);
+				dp->dp_clockintr = NULL;
+			}
+		}
+	}
+	return 0;
 }
 
 static inline void
@@ -148,4 +185,18 @@ dt_prov_interval_enter(struct dt_provide
 	}
 	smr_read_leave();
 	return 0;
+}
+
+void
+dt_prov_profile_intr(struct clockintr *cl, void *cf)
+{
+	uint64_t count, i;
+	struct cpu_info *ci = curcpu();
+
+	count = clockintr_advance(cl, hardclock_period);
+	for (i = 0; i < count; i++) {
+		DT_ENTER(profile, NULL);
+		if (CPU_IS_PRIMARY(ci))
+			DT_ENTER(interval, NULL);
+	}
 }

Index: sys/dev/dt/dt_dev.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
retrieving revision 1.28
diff -u -p -r1.28 dt_dev.c
--- sys/dev/dt/dt_dev.c	14 Jul 2023 07:07:08 -0000	1.28
+++ sys/dev/dt/dt_dev.c	26 Aug 2023 01:44:39 -0000
@@ -19,6 +19,7 @@
 #include <sys/types.h>
 #include <sys/systm.h>
 #include <sys/param.h>
+#include <sys/clockintr.h>
 #include <sys/device.h>
 #include <sys/exec_elf.h>
 #include <sys/malloc.h>
@@ -82,32 +83,6 @@
 
 #define DPRINTF(x...)	/* nothing */
 
-/*
- * Descriptor associated with each program opening /dev/dt.  It is used
- * to keep track of enabled PCBs.
- *
- *  Locks used to protect struct members in this file:
- *	m	per-softc mutex
- *	K	kernel lock
- */
-struct dt_softc {
-	SLIST_ENTRY(dt_softc)	 ds_next;	/* [K] descriptor list */
-	int			 ds_unit;	/* [I] D_CLONE unique unit */
-	pid_t			 ds_pid;	/* [I] PID of tracing program */
-
-	struct mutex		 ds_mtx;
-
-	struct dt_pcb_list	 ds_pcbs;	/* [K] list of enabled PCBs */
-	struct dt_evt		*ds_bufqueue;	/* [K] copy evts to userland */
-	size_t			 ds_bufqlen;	/* [K] length of the queue */
-	int			 ds_recording;	/* [K] currently recording? */
-	int			 ds_evtcnt;	/* [m] # of readable evts */
-
-	/* Counters */
-	uint64_t		 ds_readevt;	/* [m] # of events read */
-	uint64_t		 ds_dropevt;	/* [m] # of events dropped */
-};
-
 SLIST_HEAD(, dt_softc) dtdev_list;	/* [K] list of open /dev/dt nodes */
 
 /*
@@ -486,6 +461,14 @@ dt_ioctl_record_start(struct dt_softc *s
 		SMR_SLIST_INSERT_HEAD_LOCKED(&dtp->dtp_pcbs, dp, dp_pnext);
 		dtp->dtp_recording++;
 		dtp->dtp_prov->dtpv_recording++;
+
+		/*
+		 * XXX Not provider-agnostic.  Is there a better place
+		 * to do this?
+		 */
+		if (dp->dp_clockintr != NULL)
+			clockintr_advance(dp->dp_clockintr, hardclock_period);
+
 	}
 	rw_exit_write(&dt_lock);
 
@@ -514,6 +497,11 @@ dt_ioctl_record_stop(struct dt_softc *sc
 
 		dtp->dtp_recording--;
 		dtp->dtp_prov->dtpv_recording--;
+
+		/* XXX This is not provider-agnostic. */
+		if (dp->dp_clockintr != NULL)
+			clockintr_cancel(dp->dp_clockintr);
+
 		SMR_SLIST_REMOVE_LOCKED(&dtp->dtp_pcbs, dp, dt_pcb, dp_pnext);
 	}
 	rw_exit_write(&dt_lock);

Index: sys/dev/dt/dtvar.h
===================================================================
RCS file: /cvs/src/sys/dev/dt/dtvar.h,v
retrieving revision 1.17
diff -u -p -r1.17 dtvar.h
--- sys/dev/dt/dtvar.h	26 Apr 2023 16:53:59 -0000	1.17
+++ sys/dev/dt/dtvar.h	26 Aug 2023 01:44:41 -0000
@@ -158,12 +158,39 @@ struct dtioc_getaux {
 #include <sys/queue.h>
 #include <sys/smr.h>
 
+struct dt_pcb;
+TAILQ_HEAD(dt_pcb_list, dt_pcb);
+
 /* Flags that make sense for all providers. */
 #define DTEVT_COMMON	(DTEVT_EXECNAME|DTEVT_KSTACK|DTEVT_USTACK)
 
 #define M_DT	M_DEVBUF	/* XXX FIXME */
 
-struct dt_softc;
+/*
+ * Descriptor associated with each program opening /dev/dt.  It is used
+ * to keep track of enabled PCBs.
+ *
+ *  Locks used to protect struct members in this file:
+ *	m	per-softc mutex
+ *	K	kernel lock
+ */
+struct dt_softc {
+	SLIST_ENTRY(dt_softc)	 ds_next;	/* [K] descriptor list */
+	int			 ds_unit;	/* [I] D_CLONE unique unit */
+	pid_t			 ds_pid;	/* [I] PID of tracing program */
+
+	struct mutex		 ds_mtx;
+
+	struct dt_pcb_list	 ds_pcbs;	/* [K] list of enabled PCBs */
+	struct dt_evt		*ds_bufqueue;	/* [K] copy evts to userland */
+	size_t			 ds_bufqlen;	/* [K] length of the queue */
+	int			 ds_recording;	/* [K] currently recording? */
+	int			 ds_evtcnt;	/* [m] # of readable evts */
+
+	/* Counters */
+	uint64_t		 ds_readevt;	/* [m] # of events read */
+	uint64_t		 ds_dropevt;	/* [m] # of events dropped */
+};
 
 int	dtioc_req_isvalid(struct dtioc_req *);
 
@@ -197,6 +224,7 @@ struct dt_pcb {
 	struct dt_filter	 dp_filter;	/* [I] filter to match */
 
 	/* Provider specific fields. */
+	struct clockintr	*dp_clockintr;	/* [?] clock interrupt handle */
 	unsigned int		 dp_cpuid;	/* [I] on which CPU */
 	unsigned int		 dp_maxtick;	/* [I] freq. of profiling */
 	unsigned int		 dp_nticks;	/* [c] current tick count */
@@ -204,8 +232,6 @@ struct dt_pcb {
 	/* Counters */
 	uint64_t		 dp_dropevt;	/* [m] # dropped event */
 };
-
-TAILQ_HEAD(dt_pcb_list, dt_pcb);
 
 struct dt_pcb	*dt_pcb_alloc(struct dt_probe *, struct dt_softc *);
 void		 dt_pcb_free(struct dt_pcb *);