On 23/08/23(Wed) 18:52, Scott Cheloha wrote: > This is the next patch in the clock interrupt reorganization series.
Thanks for your diff. I'm sorry but it is really hard for me to help review this diff because there is still no man page for this API+subsystem. Can we start with that please? > This patch moves the entry points for the interval and profile dt(4) > providers from the hardclock(9) to a dedicated clock interrupt > callback, dt_prov_profile_intr(), in dev/dt/dt_prov_profile.c. > > - To preserve current behavior, (1) both provider entrypoints have > been moved into a single callback function, (2) the interrupt runs at > the same frequency as the hardclock, and (3) the interrupt is > staggered to co-occur with the hardclock on a given CPU. The only behavior that needs to be preserved is the output of dumping stacks. That means DT_FA_PROFILE and DT_FA_STATIC certainly needs to be adapted with this change. You can figure that out by looking at the output of /usr/src/share/btrace/kprofile.bt without and with this diff. Please generate a FlameGraph to make sure they're still the same. Apart from that I'd prefer if we could skip the mechanical change and go straight to what dt(4) needs. Otherwise we will have to re-design everything. If you don't want to do this work, then leave it and tell me what you need and what is your plan so I can help you and do it myself. dt(4) needs a way to schedule two different kind of periodic timeouts with the higher precision possible. It is currently plugged to hardclock because there is nothing better. The current code assumes the periodic entry points are external to dt(4). This diff moves them in the middle of dt(4) but keeps the existing flow which makes the code very convoluted. A starting point to understand the added complexity it so see that the DT_ENTER() macro are no longer needed if we move the entry points inside dt(4). The first periodic timeout is dt_prov_interval_enter(). It could be implemented as a per-PCB timeout_add_nsec(9). The drawback of this approach is that it uses too much code in the kernel which is a problem when instrumenting the kernel itself. Every subsystem used by dt(4) is impossible to instrument with btrace(8). The second periodic timeout it dt_prov_profile_enter(). It is similar to the previous one and has to run on every CPU. Both are currently bound to tick, but we want per-PCB time resolution. We can get rid of `dp_nticks' and `dp_maxtick' if we control when the timeouts fires. > - Each dt_pcb gets a provider-specific "dp_clockintr" pointer. If the > PCB's implementing provider needs a clock interrupt to do its work, it > stores the handle in dp_clockintr. The handle, if any, is established > during dtpv_alloc() and disestablished during dtpv_dealloc(). Sorry, but as I said I don't understand what is a clockintr. How does it fit in the kernel and how is it supposed to be used? Why have it per PCB and not per provider or for the whole driver instead? Per-PCB implies that if I run 3 different profiling on a 32 CPU machines I now have 96 different clockintr. Is it what we want? If so, why not simply use timeout(9)? What's the difference? > One alternative is to start running the clock interrupts when they > are allocated in dtpv_alloc() and stop them when they are freed in > dtpv_dealloc(). This is wasteful, though: the PCBs are not recording > yet, so the interrupts won't perform any useful work until the > controlling process enables recording. > > An additional pair of provider hooks, e.g. "dtpv_record_start" and > "dtpv_record_stop", might resolve this. Another alternative would be to have a single hardclock-like handler for dt(4). Sorry, I don't understand the big picture behind clockintr, so I can't help. > - We haven't needed to destroy clock interrupts yet, so the > clockintr_disestablish() function used in this patch is new. So maybe that's fishy. Are they supposed to be destroyed? What is the goal of this API?