Hi, I want to add clock-based timeouts to the kernel because tick-based timeouts suffer from a few problems:
1. They are not sensitive to NTP adjustment, so they can easily expire
   too early or too late.  This is incorrect behavior, particularly for
   POSIX interfaces that forbid early return.

2. Tick-based sleeps incur an additional tick of wakeup latency due to
   their granularity.  10ms is a long time.

2a. Given (2), system calls using tick-based sleeps cannot block for
    less than a tick.  This is part of the root cause of the "vmm(4)
    time problem".

3. Ticks are not as robust as the system clock.  If a tick runs too
   long or too short, every timeout accumulates the error, so very
   long timeouts end up with a lot of error.  This is the other half
   of the "vmm(4) time problem".

Basically, ticks are a poor approximation of the system clock.  We
should use the real thing where possible.

FreeBSD, Linux, and Solaris/illumos all have varying implementations
of clock-based timeouts.  They usually call them "high resolution"
timeouts or something like that, which to my mind somewhat misses the
point (being backed by a real clock), but they are equivalent to what
I'm proposing here.

I tried adding clock-based timeouts to the system last year, but it
broke some stuff and I ended up reverting the commit.

Attached is my second attempt.  It takes a less clever approach:
instead of quietly replacing tick-based timeouts with clock-based
timeouts, the two will coexist.  Tick-based timeouts are still needed
because the scheduler is tick-based; until that changes you can't
replace them entirely.  A tick is not a fixed unit of time, so
approximations like "a tick is 10 milliseconds" don't work and break
things in subtle ways.  I discovered this the hard way.

What follows is an overview of the clock-based timeout API.  After
that I'll summarize the more significant parts of the implementation.

--

Clock-based timeouts are initialized at runtime with
timeout_set_kclock(9).  This interface is equivalent to
timeout_set_flags(9), except that it accepts an additional parameter,
a "kclock".

A kclock is an integer identifying one of the kernel's "real" clocks.
At the moment we only support KCLOCK_UPTIME, the uptime clock
(nanouptime(9)).  In the future I will add support for KCLOCK_RUNTIME
(nanoruntime(9)) and KCLOCK_UTC (nanotime(9)).  When those other
clocks are supported we can add POSIX's clock_nanosleep() to the
kernel.  We can also add proper support for CLOCK_REALTIME timeouts in
the kernel (CC guenther@, who mentioned wanting this for futex(2), for
pthreads).

I chose *not* to use the POSIX CLOCK_* constants directly because they
are baked into the ABI, in sys/_time.h.  I think it's a better idea
for us to write a "clock-to-kclock" conversion function for syscalls
to use when needed than to chain ourselves to the ABI.
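Roughly, I'm imagining a small helper along these lines.  This is only
a sketch of the idea and is not part of the attached diff: the name
clockid_to_kclock() is made up, KCLOCK_UTC doesn't exist yet, and I'm
assuming CLOCK_MONOTONIC remains backed by nanouptime(9):

	int
	clockid_to_kclock(clockid_t clock_id)
	{
		switch (clock_id) {
		case CLOCK_MONOTONIC:
			/* Backed by nanouptime(9). */
			return KCLOCK_UPTIME;
		default:
			/*
			 * Other clocks, e.g. CLOCK_REALTIME -> KCLOCK_UTC,
			 * can be added here as those kclocks appear.
			 * Callers treat KCLOCK_NONE as EINVAL.
			 */
			return KCLOCK_NONE;
		}
	}

A syscall like clock_nanosleep() would map its clockid_t argument to a
kclock up front and reject anything that maps to KCLOCK_NONE.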
Anyway, to set up a timeout scheduled against the uptime clock you
would do something like this:

	struct timeout to;

	timeout_set_kclock(&to, func, &to, 0, KCLOCK_UPTIME);

Or if you wanted a process-context timeout you could do:

	timeout_set_kclock(&to, func, &to, TIMEOUT_PROC, KCLOCK_UPTIME);

Relative clock-based timeouts are scheduled with timeout_in_nsec(9).
The timeout's associated clock is read and the timeout is set to
expire the given number of nanoseconds after the current time.  This
has more overhead than a tick-based timeout because you are reading
the hardware clock.  That cost is unavoidable: if you want clock-based
timeouts you need to read the clock.

For example, to set the sample timeout to expire in half a second:

	timeout_in_nsec(&to, MSEC_TO_NSEC(500));

If a clock-based timeout is periodic it should be rescheduled from the
callback function with timeout_advance_nsec(9).  This function finds
the next expiration point for the given period and schedules the
timeout to expire then.

You need a special interface for this operation because a nanosecond
is much more fine-grained than a tick.  The timeout will drift out of
phase if you don't take its last expiration time into account when
rescheduling it.  Calling timeout_in_nsec(9) from the callback
function is insufficient for periodic timeouts.

timeout_advance_nsec(9) also optionally returns the number of missed
expirations.  This can be used to implement something like POSIX's
timer_getoverrun().

So, for example, if the sample timeout also had a period of half a
second, rescheduling it would look something like this:

	void
	func(void *arg)
	{
		struct timeout *to = arg;

		timeout_advance_nsec(to, MSEC_TO_NSEC(500), NULL);
	}

I have not exposed an absolute timeout interface yet.  We can cross
that bridge when one is needed.

Clock-based timeouts are still canceled with timeout_del(9) and
timeout_del_barrier(9).

--

As for the implementation: clock-based timeouts are kept on a separate
timeout wheel, one indexed by hashing absolute timespecs instead of
absolute tick values.  See timeout_bucket() and timeout_maskwheel() in
the patch.

Each timeout has an absolute expiry time, to_abstime, a timespec.
Each timeout also has an associated kclock, to_kclock.  These new
members are used during softclock() to determine whether the timeout
has expired and, if not, which bucket to put it into.

The timeout_hardclock_update() code for the clock-based timeout wheel
is somewhat more complex because an arbitrary number of buckets can
expire during a single call: the longer you wait between calls, the
more buckets expire.  At 100hz we dump about two buckets per call on
average, so the overhead is small.  After dumping expired buckets we
update the internal state for each kclock for use in the subsequent
softclock().  For now there is only one such kclock, KCLOCK_UPTIME.

I have divided the logical parts of softclock() into functions that
handle both tick-based and clock-based timeouts.  The result is that
softclock() is nearly pseudocode now.

--

Nothing in the system uses clock-based timeouts yet, so unlike my
first attempt this diff will not break anything.

My first targets for applying these new timeouts are tsleep_nsec(9),
msleep_nsec(9), and rwsleep_nsec(9).  Using them there will make
blocking system calls that accept a timeout clock-based, which will
make those calls more correct.  After that, setitimer(2) and the
kevent(2) timer filters are obvious applications.

--

The alternative to reusing the timeout namespace is a new interface
for clock-based timeouts.  Linux did this; they called it hrtimers.
FreeBSD did not; they stuck with callout.  If possible I'd like to
avoid the Linux route because I'd wind up duplicating a bunch of code.

--

Thoughts on this approach?  Thoughts on the proposed API?

-Scott

Index: kern/kern_timeout.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.76
diff -u -p -r1.76 kern_timeout.c
--- kern/kern_timeout.c	25 Jul 2020 00:48:04 -0000	1.76
+++ kern/kern_timeout.c	26 Jul 2020 01:41:15 -0000
@@ -59,16 +59,28 @@ struct timeoutstat tostat;	/* [T] stati
  * of the global variable "ticks" when the timeout should be called. There are
  * four levels with 256 buckets each.
*/ -#define BUCKETS 1024 +#define WHEELCOUNT 4 #define WHEELSIZE 256 #define WHEELMASK 255 #define WHEELBITS 8 +#define BUCKETS (WHEELCOUNT * WHEELSIZE) -struct circq timeout_wheel[BUCKETS]; /* [T] Queues of timeouts */ +struct circq timeout_wheel[BUCKETS]; /* [T] Tick-based timeouts */ +struct circq timeout_wheel_kc[BUCKETS]; /* [T] Kernel clock-based timeouts */ struct circq timeout_new; /* [T] New, unscheduled timeouts */ struct circq timeout_todo; /* [T] Due or needs rescheduling */ struct circq timeout_proc; /* [T] Due + needs process context */ +time_t timeout_level_width[WHEELCOUNT]; /* [I] Wheel level width (seconds) */ +struct timespec tick_ts; /* [I] Length of a tick (1/hz secs) */ +struct timespec intvl_limit; /* [I] Max interval for easy advance */ + +struct kclock { + struct timespec kc_lastscan; /* [T] Clock time at last wheel scan */ + struct timespec kc_offset; /* [T] Offset from primary kclock */ + struct timespec kc_late; /* [T] Late if due prior */ +} timeout_kclock[KCLOCK_MAX]; + #define MASKWHEEL(wheel, time) (((time) >> ((wheel)*WHEELBITS)) & WHEELMASK) #define BUCKET(rel, abs) \ @@ -150,9 +162,19 @@ struct lock_type timeout_spinlock_type = ((needsproc) ? &timeout_sleeplock_obj : &timeout_spinlock_obj) #endif +void kclock_nanotime(int, struct timespec *); +uint64_t itimer_advance(const struct timespec *, const struct timespec *, + const struct timespec *, struct timespec *); void softclock(void *); void softclock_create_thread(void *); void softclock_thread(void *); +void _timeout_set(struct timeout *, void (*)(void *), void *, int, int); +uint32_t timeout_bucket(struct timeout *); +int timeout_has_expired(const struct timeout *); +int timeout_is_late(const struct timeout *); +uint32_t timeout_maskwheel(uint32_t, const struct timespec *); +void timeout_reschedule(struct timeout *, int); +void timeout_run(struct timeout *); void timeout_proc_barrier(void *); /* @@ -202,13 +224,20 @@ timeout_sync_leave(int needsproc) void timeout_startup(void) { - int b; + int b, level; CIRCQ_INIT(&timeout_new); CIRCQ_INIT(&timeout_todo); CIRCQ_INIT(&timeout_proc); for (b = 0; b < nitems(timeout_wheel); b++) CIRCQ_INIT(&timeout_wheel[b]); + for (b = 0; b < nitems(timeout_wheel_kc); b++) + CIRCQ_INIT(&timeout_wheel_kc[b]); + + for (level = 0; level < nitems(timeout_level_width); level++) + timeout_level_width[level] = 2 << (level * WHEELBITS); + NSEC_TO_TIMESPEC(tick_nsec, &tick_ts); + NSEC_TO_TIMESPEC(UINT64_MAX, &intvl_limit); } void @@ -225,23 +254,38 @@ timeout_proc_init(void) } void +_timeout_set(struct timeout *to, void (*fn)(void *), void *arg, int flags, + int kclock) +{ + to->to_func = fn; + to->to_arg = arg; + to->to_flags = flags | TIMEOUT_INITIALIZED; + to->to_kclock = kclock; +} + +void timeout_set(struct timeout *new, void (*fn)(void *), void *arg) { - timeout_set_flags(new, fn, arg, 0); + _timeout_set(new, fn, arg, 0, KCLOCK_NONE); } void timeout_set_flags(struct timeout *to, void (*fn)(void *), void *arg, int flags) { - to->to_func = fn; - to->to_arg = arg; - to->to_flags = flags | TIMEOUT_INITIALIZED; + _timeout_set(to, fn, arg, flags, KCLOCK_NONE); } void timeout_set_proc(struct timeout *new, void (*fn)(void *), void *arg) { - timeout_set_flags(new, fn, arg, TIMEOUT_PROC); + _timeout_set(new, fn, arg, TIMEOUT_PROC, KCLOCK_NONE); +} + +void +timeout_set_kclock(struct timeout *to, void (*fn)(void *), void *arg, int flags, + int kclock) +{ + _timeout_set(to, fn, arg, flags | TIMEOUT_KCLOCK, kclock); } int @@ -251,6 +295,8 @@ timeout_add(struct timeout *new, int to_ int ret 
= 1; KASSERT(ISSET(new->to_flags, TIMEOUT_INITIALIZED)); + KASSERT(!ISSET(new->to_flags, TIMEOUT_KCLOCK)); + KASSERT(new->to_kclock == KCLOCK_NONE); KASSERT(to_ticks >= 0); mtx_enter(&timeout_mutex); @@ -376,6 +422,157 @@ timeout_add_nsec(struct timeout *to, int } int +timeout_at_ts(struct timeout *to, const struct timespec *ts) +{ + struct timespec old_abstime; + int ret = 1; + + KASSERT(ISSET(to->to_flags, TIMEOUT_INITIALIZED | TIMEOUT_KCLOCK)); + + mtx_enter(&timeout_mutex); + + old_abstime = to->to_abstime; + to->to_abstime = *ts; + CLR(to->to_flags, TIMEOUT_TRIGGERED); + + if (ISSET(to->to_flags, TIMEOUT_ONQUEUE)) { + if (timespeccmp(ts, &old_abstime, <)) { + CIRCQ_REMOVE(&to->to_list); + CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list); + } + tostat.tos_readded++; + ret = 0; + } else { + SET(to->to_flags, TIMEOUT_ONQUEUE); + CIRCQ_INSERT_TAIL(&timeout_new, &to->to_list); + } + + tostat.tos_added++; + + mtx_leave(&timeout_mutex); + + return ret; +} + +int +timeout_in_nsec(struct timeout *to, uint64_t nsecs) +{ + struct timespec deadline, interval, now; + + kclock_nanotime(to->to_kclock, &now); + NSEC_TO_TIMESPEC(nsecs, &interval); + timespecadd(&now, &interval, &deadline); + + return timeout_at_ts(to, &deadline); +} + +int +timeout_advance_nsec(struct timeout *to, uint64_t nsecs, uint64_t *omissed) +{ + struct timespec intvl, next, now; + uint64_t missed; + int ret; + + kclock_nanotime(to->to_kclock, &now); + NSEC_TO_TIMESPEC(nsecs, &intvl); + missed = itimer_advance(&to->to_abstime, &intvl, &now, &next); + ret = timeout_at_ts(to, &next); + if (omissed != NULL) + *omissed = missed; + return ret; +} + +/* + * Given an interval timer with a period of invtl that most recently + * expired at absolute time last, find the timer's next absolute + * expiration after absolute time now and write it to next. + * + * Returns the number of intervals that elapsed between last and now, + * i.e. the number of expirations the timer missed. + */ +uint64_t +itimer_advance(const struct timespec *last, const struct timespec *intvl, + const struct timespec *now, struct timespec *next) +{ + struct timespec base, diff, minbase, intvl_product, intvl_product_max; + uint64_t intvl_nsecs, missed, quo; + + /* + * Typical case: no additional intervals have elapsed. + */ + timespecadd(last, intvl, next); + if (timespeccmp(now, next, <)) + return 0; + + missed = 0; + + /* + * If the interval is too large we can't use 64-bit integer math. + * Practical intervals are never this large. + */ + if (__predict_false(timespeccmp(&intvl_limit, intvl, <))) { + while (timespeccmp(next, now, <=)) { + timespecadd(next, intvl, next); + missed = MAX(missed, missed + 1); + } + return missed; + } + + /* + * Find a base within interval product range of the current time. + * The last expiration time should practically always be within + * range, but for sake of correctness we handle cases where longer + * expanses of time have elapsed. + */ + intvl_nsecs = TIMESPEC_TO_NSEC(intvl); + quo = UINT64_MAX / intvl_nsecs; + NSEC_TO_TIMESPEC(quo * intvl_nsecs, &intvl_product_max); + timespecsub(now, &intvl_product_max, &minbase); + base = *last; + if (__predict_false(timespeccmp(&base, &minbase, <))) { + while (timespeccmp(&base, &minbase, <)) { + timespecadd(&base, &intvl_product_max, &base); + missed = MAX(missed, missed + quo); + } + } + + /* + * We have a base within range. Now find the interval product + * that, when added to the base, gets us just past the current + * time to the most imminent expiration point. 
+ * + * If the product would overflow a 64-bit integer we advance the + * base by one interval and retry. This can happen at most once. + */ + for (;;) { + timespecsub(now, &base, &diff); + quo = TIMESPEC_TO_NSEC(&diff) / intvl_nsecs; + if (intvl_nsecs * quo <= UINT64_MAX - intvl_nsecs) + break; + timespecadd(&base, intvl, &base); + missed = MAX(missed, missed + 1); + } + + NSEC_TO_TIMESPEC(intvl_nsecs * (quo + 1), &intvl_product); + timespecadd(&base, &intvl_product, next); + missed = MAX(missed, missed + quo); + + return missed; +} + +void +kclock_nanotime(int kclock, struct timespec *now) +{ + switch (kclock) { + case KCLOCK_UPTIME: + nanouptime(now); + break; + default: + panic("invalid kclock: 0x%x", kclock); + } +} + +int timeout_del(struct timeout *to) { int ret = 0; @@ -444,6 +641,35 @@ timeout_proc_barrier(void *arg) cond_signal(c); } +uint32_t +timeout_bucket(struct timeout *to) +{ + struct kclock *kc = &timeout_kclock[to->to_kclock]; + struct timespec diff; + uint32_t level; + + KASSERT(ISSET(to->to_flags, TIMEOUT_KCLOCK)); + KASSERT(timespeccmp(&kc->kc_lastscan, &to->to_abstime, <)); + + timespecsub(&to->to_abstime, &kc->kc_lastscan, &diff); + for (level = 0; level < nitems(timeout_level_width) - 1; level++) { + if (diff.tv_sec < timeout_level_width[level]) + break; + } + return level * WHEELSIZE + timeout_maskwheel(level, &to->to_abstime); +} + +uint32_t +timeout_maskwheel(uint32_t level, const struct timespec *abstime) +{ + uint32_t hi, lo; + + hi = abstime->tv_sec << 7; + lo = abstime->tv_nsec / 7812500; + + return ((hi | lo) >> (level * WHEELBITS)) & WHEELMASK; +} + /* * This is called from hardclock() on the primary CPU at the start of * every tick. @@ -451,7 +677,15 @@ timeout_proc_barrier(void *arg) void timeout_hardclock_update(void) { - int need_softclock = 1; + struct timespec elapsed, now; + struct kclock *kc; + struct timespec *lastscan; + int b, done, first, i, last, level, need_softclock, off; + + kclock_nanotime(KCLOCK_UPTIME, &now); + lastscan = &timeout_kclock[KCLOCK_UPTIME].kc_lastscan; + timespecsub(&now, lastscan, &elapsed); + need_softclock = 1; mtx_enter(&timeout_mutex); @@ -465,6 +699,44 @@ timeout_hardclock_update(void) } } + /* + * Dump the buckets that expired while we were away. + * + * If the elapsed time has exceeded a level's limit then we need + * to dump every bucket in the level. We have necessarily completed + * a lap of that level, too, so we need to process buckets in the + * next level. + * + * Otherwise we need to compare indices: if the index of the first + * expired bucket is greater than that of the last then we have + * completed a lap of the level and need to process buckets in the + * next level. + */ + for (level = 0; level < nitems(timeout_level_width); level++) { + first = timeout_maskwheel(level, lastscan); + if (elapsed.tv_sec >= timeout_level_width[level]) { + last = (first == 0) ? 
WHEELSIZE - 1 : first - 1; + done = 0; + } else { + last = timeout_maskwheel(level, &now); + done = first <= last; + } + off = level * WHEELSIZE; + for (b = first;; b = (b + 1) % WHEELSIZE) { + CIRCQ_CONCAT(&timeout_todo, &timeout_wheel_kc[off + b]); + if (b == last) + break; + } + if (done) + break; + } + + for (i = 0; i < nitems(timeout_kclock); i++) { + kc = &timeout_kclock[i]; + timespecadd(&now, &kc->kc_offset, &kc->kc_lastscan); + timespecsub(&kc->kc_lastscan, &tick_ts, &kc->kc_late); + } + if (CIRCQ_EMPTY(&timeout_new) && CIRCQ_EMPTY(&timeout_todo)) need_softclock = 0; @@ -497,6 +769,44 @@ timeout_run(struct timeout *to) mtx_enter(&timeout_mutex); } +int +timeout_has_expired(const struct timeout *to) +{ + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) { + struct kclock *kc = &timeout_kclock[to->to_kclock]; + return timespeccmp(&to->to_abstime, &kc->kc_lastscan, <=); + } + + return (to->to_time - ticks) <= 0; +} + +int +timeout_is_late(const struct timeout *to) +{ + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) { + struct kclock *kc = &timeout_kclock[to->to_kclock]; + return timespeccmp(&to->to_abstime, &kc->kc_late, <=); + } + + return (to->to_time - ticks) < 0; +} + +void +timeout_reschedule(struct timeout *to, int new) +{ + struct circq *bucket; + + tostat.tos_scheduled++; + if (!new) + tostat.tos_rescheduled++; + + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) + bucket = &timeout_wheel_kc[timeout_bucket(to)]; + else + bucket = &BUCKET(to->to_time - ticks, to->to_time); + CIRCQ_INSERT_TAIL(bucket, &to->to_list); +} + /* * Timeouts are processed here instead of timeout_hardclock_update() * to avoid doing any more work at IPL_CLOCK than absolutely necessary. @@ -506,9 +816,8 @@ timeout_run(struct timeout *to) void softclock(void *arg) { - struct circq *bucket; struct timeout *first_new, *to; - int delta, needsproc, new; + int needsproc, new; first_new = NULL; new = 0; @@ -521,23 +830,13 @@ softclock(void *arg) to = timeout_from_circq(CIRCQ_FIRST(&timeout_todo)); CIRCQ_REMOVE(&to->to_list); if (to == first_new) - new = 1; - - /* - * If due run it or defer execution to the thread, - * otherwise insert it into the right bucket. 
- */ - delta = to->to_time - ticks; - if (delta > 0) { - bucket = &BUCKET(delta, to->to_time); - CIRCQ_INSERT_TAIL(bucket, &to->to_list); - tostat.tos_scheduled++; - if (!new) - tostat.tos_rescheduled++; + new = 0; + if (!timeout_has_expired(to)) { + timeout_reschedule(to, new); continue; } - if (!new && delta < 0) - tostat.tos_late++; + if (!new && timeout_is_late(to)) + tostat.tos_late++; if (ISSET(to->to_flags, TIMEOUT_PROC)) { CIRCQ_INSERT_TAIL(&timeout_proc, &to->to_list); continue; @@ -642,52 +941,114 @@ timeout_sysctl(void *oldp, size_t *oldle } #ifdef DDB +const char *db_kclock(int); void db_show_callout_bucket(struct circq *); +void db_show_timeout(struct timeout *, struct circq *); +const char *db_timespec(const struct timespec *); + +const char * +db_kclock(int kclock) +{ + switch (kclock) { + case KCLOCK_UPTIME: + return "uptime"; + default: + return "invalid"; + } +} + +const char * +db_timespec(const struct timespec *ts) +{ + static char buf[32]; + struct timespec tmp, zero; + + if (ts->tv_sec >= 0) { + snprintf(buf, sizeof(buf), "%lld.%09ld", + ts->tv_sec, ts->tv_nsec); + return buf; + } + + timespecclear(&zero); + timespecsub(&zero, ts, &tmp); + snprintf(buf, sizeof(buf), "-%lld.%09ld", tmp.tv_sec, tmp.tv_nsec); + return buf; +} void db_show_callout_bucket(struct circq *bucket) { - char buf[8]; - struct timeout *to; struct circq *p; + + CIRCQ_FOREACH(p, bucket) + db_show_timeout(timeout_from_circq(p), bucket); +} + +void +db_show_timeout(struct timeout *to, struct circq *bucket) +{ + struct timespec remaining; + struct kclock *kc; + char buf[8]; db_expr_t offset; + struct circq *wheel; char *name, *where; int width = sizeof(long) * 2; - CIRCQ_FOREACH(p, bucket) { - to = timeout_from_circq(p); - db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset); - name = name ? name : "?"; - if (bucket == &timeout_todo) - where = "softint"; - else if (bucket == &timeout_proc) - where = "thread"; - else if (bucket == &timeout_new) - where = "new"; - else { - snprintf(buf, sizeof(buf), "%3ld/%1ld", - (bucket - timeout_wheel) % WHEELSIZE, - (bucket - timeout_wheel) / WHEELSIZE); - where = buf; - } - db_printf("%9d %7s 0x%0*lx %s\n", - to->to_time - ticks, where, width, (ulong)to->to_arg, name); + db_find_sym_and_offset((vaddr_t)to->to_func, &name, &offset); + name = name ? 
name : "?"; + if (bucket == &timeout_new) + where = "new"; + else if (bucket == &timeout_todo) + where = "softint"; + else if (bucket == &timeout_proc) + where = "thread"; + else { + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) + wheel = timeout_wheel_kc; + else + wheel = timeout_wheel; + snprintf(buf, sizeof(buf), "%3ld/%1ld", + (bucket - wheel) % WHEELSIZE, + (bucket - wheel) / WHEELSIZE); + where = buf; + } + if (ISSET(to->to_flags, TIMEOUT_KCLOCK)) { + kc = &timeout_kclock[to->to_kclock]; + timespecsub(&to->to_abstime, &kc->kc_lastscan, &remaining); + db_printf("%20s %8s %7s 0x%0*lx %s\n", + db_timespec(&remaining), db_kclock(to->to_kclock), where, + width, (ulong)to->to_arg, name); + } else { + db_printf("%20d %8s %7s 0x%0*lx %s\n", + to->to_time - ticks, "ticks", where, + width, (ulong)to->to_arg, name); } } void db_show_callout(db_expr_t addr, int haddr, db_expr_t count, char *modif) { + struct kclock *kc; int width = sizeof(long) * 2 + 2; - int b; - - db_printf("ticks now: %d\n", ticks); - db_printf("%9s %7s %*s func\n", "ticks", "wheel", width, "arg"); + int b, i; + db_printf("%20s %8s\n", "lastscan", "clock"); + db_printf("%20d %8s\n", ticks, "ticks"); + for (i = 0; i < nitems(timeout_kclock); i++) { + kc = &timeout_kclock[i]; + db_printf("%20s %8s\n", + db_timespec(&kc->kc_lastscan), db_kclock(i)); + } + db_printf("\n"); + db_printf("%20s %8s %7s %*s %s\n", + "remaining", "clock", "wheel", width, "arg", "func"); db_show_callout_bucket(&timeout_new); db_show_callout_bucket(&timeout_todo); db_show_callout_bucket(&timeout_proc); for (b = 0; b < nitems(timeout_wheel); b++) db_show_callout_bucket(&timeout_wheel[b]); + for (b = 0; b < nitems(timeout_wheel_kc); b++) + db_show_callout_bucket(&timeout_wheel_kc[b]); } #endif Index: sys/timeout.h =================================================================== RCS file: /cvs/src/sys/sys/timeout.h,v retrieving revision 1.37 diff -u -p -r1.37 timeout.h --- sys/timeout.h 25 Jul 2020 00:48:03 -0000 1.37 +++ sys/timeout.h 26 Jul 2020 01:41:15 -0000 @@ -1,4 +1,4 @@ -/* $OpenBSD: timeout.h,v 1.37 2020/07/25 00:48:03 cheloha Exp $ */ +/* $OpenBSD: timeout.h,v 1.36 2020/01/03 02:16:38 cheloha Exp $ */ /* * Copyright (c) 2000-2001 Artur Grabowski <a...@openbsd.org> * All rights reserved. @@ -51,6 +51,8 @@ * These functions may be called in interrupt context (anything below splhigh). 
*/ +#include <sys/time.h> + struct circq { struct circq *next; /* next element */ struct circq *prev; /* previous element */ @@ -58,10 +60,12 @@ struct circq { struct timeout { struct circq to_list; /* timeout queue, don't move */ + struct timespec to_abstime; /* absolute time to run at */ void (*to_func)(void *); /* function to call */ void *to_arg; /* function argument */ int to_time; /* ticks on event */ int to_flags; /* misc flags */ + int to_kclock; /* abstime's kernel clock */ }; /* @@ -71,6 +75,7 @@ struct timeout { #define TIMEOUT_ONQUEUE 0x02 /* on any timeout queue */ #define TIMEOUT_INITIALIZED 0x04 /* initialized */ #define TIMEOUT_TRIGGERED 0x08 /* running or ran */ +#define TIMEOUT_KCLOCK 0x10 /* clock-based timeout */ struct timeoutstat { uint64_t tos_added; /* timeout_add*(9) calls */ @@ -100,21 +105,36 @@ int timeout_sysctl(void *, size_t *, voi #define timeout_initialized(to) ((to)->to_flags & TIMEOUT_INITIALIZED) #define timeout_triggered(to) ((to)->to_flags & TIMEOUT_TRIGGERED) -#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags) { \ +#define KCLOCK_NONE (-1) /* dummy clock for sanity checks */ +#define KCLOCK_UPTIME 0 /* uptime clock; time since boot */ +#define KCLOCK_MAX 1 + +#define _TIMEOUT_INITIALIZER(fn, arg, flags, kclock) { \ .to_list = { NULL, NULL }, \ + .to_abstime = { .tv_sec = 0, .tv_nsec = 0 }, \ .to_func = (fn), \ .to_arg = (arg), \ .to_time = 0, \ - .to_flags = (flags) | TIMEOUT_INITIALIZED \ + .to_flags = (flags) | TIMEOUT_INITIALIZED, \ + .to_kclock = (kclock) \ } -#define TIMEOUT_INITIALIZER(_f, _a) TIMEOUT_INITIALIZER_FLAGS((_f), (_a), 0) +#define TIMEOUT_INITIALIZER_KCLOCK(fn, arg, flags, kclock) \ + _TIMEOUT_INITIALIZER((fn), (args), (flags) | TIMEOUT_KCLOCK, (kclock)) + +#define TIMEOUT_INITIALIZER_FLAGS(fn, arg, flags) \ + _TIMEOUT_INITIALIZER((fn), (args), (flags), KCLOCK_NONE) + +#define TIMEOUT_INITIALIZER(_f, _a) \ + _TIMEOUT_INITIALIZER((_f), (_a), 0, KCLOCK_NONE) struct bintime; void timeout_set(struct timeout *, void (*)(void *), void *); void timeout_set_flags(struct timeout *, void (*)(void *), void *, int); +void timeout_set_kclock(struct timeout *, void (*)(void *), void *, int, int); void timeout_set_proc(struct timeout *, void (*)(void *), void *); + int timeout_add(struct timeout *, int); int timeout_add_tv(struct timeout *, const struct timeval *); int timeout_add_ts(struct timeout *, const struct timespec *); @@ -123,6 +143,10 @@ int timeout_add_sec(struct timeout *, in int timeout_add_msec(struct timeout *, int); int timeout_add_usec(struct timeout *, int); int timeout_add_nsec(struct timeout *, int); + +int timeout_advance_nsec(struct timeout *, uint64_t, uint64_t *); +int timeout_in_nsec(struct timeout *, uint64_t); + int timeout_del(struct timeout *); int timeout_del_barrier(struct timeout *); void timeout_barrier(struct timeout *);