Re: dt(4), hardclock(9): move interval, profile providers to dedicated callback

Martin Pieuchot Mon, 04 Sep 2023 04:26:23 -0700

On 25/08/23(Fri) 21:00, Scott Cheloha wrote:
> On Thu, Aug 24, 2023 at 07:21:29PM +0200, Martin Pieuchot wrote:
> > On 23/08/23(Wed) 18:52, Scott Cheloha wrote:
> > > This is the next patch in the clock interrupt reorganization series.
> > 
> > Thanks for your diff.  I'm sorry but it is really hard for me to help
> > review this diff because there is still no man page for this API+subsystem.
> > 
> > Can we start with that please?
> 
> Sure, a first draft of a clockintr_establish.9 manpage is included
> below.


Lovely, I'll answer to that first in this email.  Please commit it, so
we can tweak it in tree.  ok mpi@

> We also have a manpage in the tree, clockintr.9.  It is a bit out of
> date, but it covers the broad strokes of how the driver-facing portion
> of the subsystem works.

Currently reading it, should we get rid of `schedhz' in the manpage and
its remaining in the kernel?

Why isn't the statclock() always randomize?  I see a couple of
clocks/archs that do not use CL_RNDSTAT...   Is there any technical
reason apart from testing?

I don't understand what you mean with "Until we understand scheduler
lock contention better"?  What does lock contention has to do with
firing hardclock and statclock at the same moment?

> Index: share/man/man9/clockintr_establish.9
> ===================================================================
> RCS file: share/man/man9/clockintr_establish.9
> diff -N share/man/man9/clockintr_establish.9
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ share/man/man9/clockintr_establish.9      26 Aug 2023 01:44:37 -0000
> @@ -0,0 +1,239 @@
> +.\" $OpenBSD$
> +.\"
> +.\" Copyright (c) 2020-2023 Scott Cheloha <chel...@openbsd.org>
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate$
> +.Dt CLOCKINTR_ESTABLISH 9
> +.Os
> +.Sh NAME
> +.Nm clockintr_advance ,
> +.Nm clockintr_cancel ,
> +.Nm clockintr_disestablish ,
> +.Nm clockintr_establish ,
> +.Nm clockintr_expiration ,
> +.Nm clockintr_nsecuptime ,
> +.Nm clockintr_schedule ,
> +.Nm clockintr_stagger
> +.Nd schedule a function for execution from clock interrupt context
> +.Sh SYNOPSIS
> +.In sys/clockintr.h
> +.Ft uint64_t
> +.Fo clockintr_advance
> +.Fa "struct clockintr *cl"
> +.Fa "uint64_t interval"

Can we do s/interval/nsecs/?

> +.Fc
> +.Ft void
> +.Fo clockintr_cancel
> +.Fa "struct clockintr *cl"
> +.Fc
> +.Ft void
> +.Fo clockintr_disestablish
> +.Fa "struct clockintr *cl"
> +.Fc
> +.Ft struct clockintr *
> +.Fo clockintr_establish
> +.Fa "struct clockintr_queue *queue"
> +.Fa "void (*func)(struct clockintr *, void *)"
> +.Fc
> +.Ft uint64_t
> +.Fo clockintr_expiration
> +.Fa "const struct clockintr *cl"
> +.Fc
> +.Ft uint64_t
> +.Fo clockintr_nsecuptime
> +.Fa "const struct clockintr *cl"
> +.Fc
> +.Ft void
> +.Fo clockintr_schedule
> +.Fa "struct clockintr *cl"
> +.Fa "uint64_t abs"
> +.Fc
> +.Ft void
> +.Fo clockintr_stagger
> +.Fa "struct clockintr *cl"
> +.Fa "uint64_t interval"
> +.Fa "u_int numerator"
> +.Fa "u_int denominator"
> +.Fc
> +.Sh DESCRIPTION
> +The clock interrupt subsystem schedules functions for asynchronous execution
> +in the future from a hardware interrupt context.

This is the same description as timeout_set(9) apart from "hardware
interrupt context".  Why should I use this one and not the other one?
I'm confused.  I dislike choices.

> +.Pp
> +The
> +.Fn clockintr_establish
> +function allocates a new clock interrupt object
> +.Po
> +a
> +.Dq clockintr
> +.Pc
> +and binds it to the given clock interrupt
> +.Fa queue .
> +When the clockintr is executed,
> +the callback function
> +.Fa func
> +will be called from a hardware interrupt context on the CPU in control of the
> +.Fa queue .

It seems clockintr queues are per-CPU, can we abstract this data
structure from the caller and pass CPUs instead?  I don't see the point
of manipulating such queues outside of kern/kern_clockintr.c 

> +The new clockintr is not initially scheduled for execution and has an
> +expiration time of zero.

Speaking of expiration time confuse me.  I don't see this as necessary,
at least for the moment.  When writing API I'd strongly suggest to not
add functions that are not used.  This applies to clockintr_expiration()
below.

> +.Pp
> +The
> +.Fn clockintr_schedule
> +function schedules the given clockintr for execution at or after the
> +absolute time
> +.Fa abs
> +has elapsed on the system uptime clock.

Lovely.  

> +If the clockintr is already pending,
> +its original expiration time is quietly superseded by
> +.Fa abs .

This is also true for clockintr_advance() I guess?

> +.Pp
> +The
> +.Fn clockintr_advance
> +function reschedules the given clockintr to expire in the future at an offset
> +from the current expiration time equal to an integral multiple of the given
> +.Fa interval ,
> +a non-zero count of nanoseconds.

.Fn clockintr_advance
schedules the given clockintr for execution after at least
.Fa nsecs .

This is simpler to understand, no?  Does the implementation details
matter that much for the reader?

> +This interface is useful for rescheduling periodic clock interrupts.

This is obvious and not necessary in my opinion.

> +.Pp
> +The
> +.Fn clockintr_cancel
> +function cancels any pending execution of the given clockintr.

Lovely.  I'd argue we should use add/del instead of advance/cancel to
keep it closer to timeout(9) and save us some mental efforts.

> +Its internal expiration time is unmodified.

That's IMHO unnecessary internal implementation detail.

> +.Pp
> +The
> +.Fn clockintr_disestablish
> +function cancels any pending execution of the given clockintr and frees
> +its underlying object.
> +It is an error to use a clockintr after it is disestablished.
> +It is an error for a callback function to disestablish its own clockintr.
> +.Pp
> +The
> +.Fn clockintr_expiration
> +function returns the given clockintr's expiration time.
> +.Pp
> +The
> +.Fn clockintr_nsecuptime
> +function returns the system uptime value cached by the clock interrupt
> +subsystem.
> +The result is reasonably accurate and is produced much more quickly than
> +.Xr nsecuptime 9 .

This function is not yet used...  While I see the benefit of having a
cache I disagree about introducing this now.  It is really hard for me
to work with you because you throw too much code in your diff that are
not used.  I'd love to review, give inputs and discuss all of these but
I can't do it because it's just too much and out of scope.  So I feel
pressured and I just want to run away.

We'll need some kind of cache in the scheduler but I'm not sure tying
this to `clockintr' object is the way to go.  I have the feeling we're
missing the per-CPU picture here.  More on this in a later discussion.

> +It is an error to call this function outside of a clockintr callback 
> function.
> +.Pp
> +The
> +.Fn clockintr_stagger
> +function resets the given clockintr's expiration time to a fraction of the 
> given
> +.Fa interval ,
> +a count of nanoseconds.
> +The clockintr is not scheduled for execution:
> +only its internal expiration time is modified.

Modifies the expiration time of the given clockintr by a small delta in
order minimize simultaneous executions.

I find it confusing to talk about expiration time.  Especially that you
call this function only once after establish.

> +This interface may be used in combination with
> +.Fn clockintr_advance
> +to minimize the likelihood that a periodic clock interrupt will execute
> +simultaneously on more than one CPU.
> +It is an error if the
> +.Fa numerator
> +is greater than or equal to the
> +.Fa denominator .
> +It is an error to stagger a clockintr that is already scheduled for 
> execution.
> +.Sh CONTEXT
> +The
> +.Fn clockintr_advance ,
> +.Fn clockintr_cancel ,
> +.Fn clockintr_expiration ,
> +.Fn clockintr_schedule ,
> +and
> +.Fn clockintr_stagger
> +functions may be called during autoconf,
> +from process context,
> +or from interrupt context.
> +.Pp
> +The
> +.Fn clockintr_disestablish
> +and
> +.Fn clockintr_establish
> +functions may be called during autoconf or from process context.
> +.Pp
> +The
> +.Fn clockintr_nsecuptime
> +function may only be called during a clockintr callback function.
> +.Pp
> +When a clockintr is executed,
> +the callback function
> +.Fa func
> +set by
> +.Fn clockintr_establish
> +is called from the
> +.Dv IPL_CLOCK
> +interrupt context on the CPU in control of the clockintr's parent
> +.Fa queue .
> +The function
> +.Fa func
> +must not block.
> +The first argument to
> +.Fa func
> +is a pointer to a private copy of the underlying clock interrupt object.
> +The callback function may use this pointer to reschedule the clockintr.
> +The second argument to
> +.Fa func
> +is a pointer to the current CPU's machine-dependent clockframe object.
> +.Sh RETURN VALUES
> +The
> +.Fn clockintr_advance
> +function returns the number of intervals that have elapsed since the 
> clockintr's
> +expiration time,
> +or zero if the clockintr's expiration time has not yet elapsed.
> +.Pp
> +The
> +.Fn clockintr_establish
> +function returns a pointer to a new clock interrupt handle on success,
> +or
> +.Dv NULL
> +if the allocation fails.
> +.Pp
> +The
> +.Fn clockintr_expiration
> +and
> +.Fn clockintr_nsecuptime
> +functions' return values are as described above.
> +.Sh CODE REFERENCES
> +.Pa sys/kern/kern_clockintr.c
> +.Sh SEE ALSO
> +.Xr hardclock 9 ,
> +.Xr hz 9 ,
> +.Xr inittodr 9 ,
> +.Xr nanouptime 9 ,
> +.Xr spl 9 ,
> +.Xr tc_init 9 ,
> +.Xr timeout 9
> +.Rs
> +.%A Steven McCanne
> +.%A Chris Torek
> +.%T A Randomized Sampling Clock for CPU Utilization Estimation and Code 
> Profiling
> +.%B \&In Proc. Winter 1993 USENIX Conference
> +.%D 1993
> +.%P pp. 387\(en394
> +.%I USENIX Association
> +.Re
> +.Rs
> +.%A Richard McDougall
> +.%A Jim Mauro
> +.%B Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture
> +.%I Prentice Hall
> +.%I Sun Microsystems Press
> +.%D 2nd Edition, 2007
> +.%P pp. 912\(en925
> +.Re
> +.Sh HISTORY
> +The clock interrupt subsystem first appeared in
> +.Ox 7.3 .

Re: dt(4), hardclock(9): move interval, profile providers to dedicated callback

Reply via email to