On Tue, May 04, 2021 at 11:24:05AM +0200, Martin Pieuchot wrote:
> On 04/05/21(Tue) 01:10, Scott Cheloha wrote:
> > [...]
> > I want to run softclock() without the kernel lock. The way to go, I
> > think, is to first push the kernel lock down into timeout_run(), and
> > then to remove the kernel lock from each timeout, one by one.
>
> Grabbing and releasing the KERNEL_LOCK() on a per-timeout basis creates
> more latency than running all timeouts in a batch after having grabbed
> the KERNEL_LOCK(). I doubt this is the best way forward.
>
> > Before we can push the kernel lock down into timeout_run() we need to
> > remove the kernel lock from timeout_del_barrier(9).
>
> Seems worth it on its own.
>
> > The kernel lock is used in timeout_del_barrier(9) to determine whether
> > the given timeout has stopped running. Because the softclock() runs
> > with the kernel lock we currently assume that once the calling thread
> > has taken the kernel lock any ongoing softclock() must have returned
> > and relinquished the lock, so the timeout in question has returned.
as i'll try to explain below, it's not about waiting for a specific
timeout, it's about knowing that a currently running timeout has
finished.
> So you want to stop using the KERNEL_LOCK() to do the serialization?
the KERNEL_LOCK in timeout_barrier was only ever an elegant^Wclever^W
hack. because timeouts run under the kernel lock, i knew i could take
the lock and know that timeouts aren't running anymore. that's all.
in hindsight this means that the thread calling timeout_barrier spins
when it could sleep, and worse, it spins waiting for all pending
timeouts to run. so yeah, i agree that undoing the hack is worth it on
its own.
> > The simplest replacement I can think of is a volatile pointer to the
> > running timeout that we set before leaving the timeout_mutex and clear
> > after reentering the same during timeout_run().
>
> Sounds like a condition variable protected by this mutex. Interesting
> that cond_wait(9) doesn't work with a mutex.
cond_wait borrows the sched lock to coordinate between the waiting
thread and the signalling context. the cond api is basically a wrapper
around sleep_setup/sleep_finish.
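to illustrate (this is from memory, so the details may not match
kern_synch.c exactly), the pair is roughly:

	void
	cond_wait(struct cond *c, const char *wmesg)
	{
		struct sleep_state sls;
		int wait;

		wait = c->c_wait;
		while (wait) {
			sleep_setup(&sls, c, PWAIT, wmesg);
			wait = c->c_wait;
			sleep_finish(&sls, wait);
		}
	}

	void
	cond_signal(struct cond *c)
	{
		c->c_wait = 0;
		wakeup_one(c);
	}

the waiter rechecks c_wait after sleep_setup, so a signal that lands
between the check and the actual sleep isn't lost. there's no mutex
anywhere in there, which is why cond_wait doesn't take one.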
> > So in the non-TIMEOUT_PROC case the timeout_del_barrier(9) caller just
> > spins until the timeout function returns and the timeout_running
> > pointer is changed. Not every caller can sleep during
> > timeout_del_barrier(9). I think spinning is the simplest thing that
> > will definitely work here.
>
> This keeps the current semantic indeed.
i don't want to throw timeout_barrier out just yet.
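for reference, the timeout_running pointer scott describes would be
maintained in timeout_run() along these lines (my paraphrase of the
description above, not code from an actual diff):

	volatile struct timeout *timeout_running = NULL;

	void
	timeout_run(struct timeout *to)
	{
		void (*fn)(void *);
		void *arg;

		MUTEX_ASSERT_LOCKED(&timeout_mutex);

		/* ... clear TIMEOUT_ONQUEUE, pick up fn/arg ... */

		timeout_running = to;
		mtx_leave(&timeout_mutex);
		fn(arg);
		mtx_enter(&timeout_mutex);
		timeout_running = NULL;
	}

the spin loop in the timeout_del_barrier() quoted below then watches
that pointer to decide when the handler has finished.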
> > -void
> > -timeout_barrier(struct timeout *to)
> > +int
> > +timeout_del_barrier(struct timeout *to)
> > {
> > + struct timeout barrier;
> > + struct cond c = COND_INITIALIZER();
> > int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
> >
> > timeout_sync_order(needsproc);
> >
> > - if (!needsproc) {
> > - KERNEL_LOCK();
> > - splx(splsoftclock());
> > - KERNEL_UNLOCK();
> > - } else {
> > - struct cond c = COND_INITIALIZER();
> > - struct timeout barrier;
> > + mtx_enter(&timeout_mutex);
> > +
> > + if (timeout_del_locked(to)) {
> > + mtx_leave(&timeout_mutex);
> > + return 1;
> > + }
> >
> > + if (needsproc) {
> > timeout_set_proc(&barrier, timeout_proc_barrier, &c);
> > barrier.to_process = curproc->p_p;
> > -
> > - mtx_enter(&timeout_mutex);
> > SET(barrier.to_flags, TIMEOUT_ONQUEUE);
> > CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
> > mtx_leave(&timeout_mutex);
> > -
> > wakeup_one(&timeout_proc);
> > -
> > cond_wait(&c, "tmobar");
> > + } else {
> > + mtx_leave(&timeout_mutex);
> > + /* XXX Is this in the right spot? */
> > + splx(splsoftclock());
> > + while (timeout_running == to)
> > + CPU_BUSY_CYCLE();
>
> Won't splx() execute the soft-interrupt if there's any pending?
> Shouldn't the barrier be before? Could you add `spc->spc_spinning++'
> around the spinning loop? What happens if two threads call
> timeout_del_barrier(9) with the same argument at the same time? Is
> it possible and/or supported?
the timeout passed to timeout_barrier is only used to figure out which
context the barrier should apply to, i.e., it's used to pick between the
softint or proc queue runners. like the barriers elsewhere in
the kernel, it's just supposed to wait for the current work to finish
running.
it is unfortunate that the timeout_barrier manpage isn't very clear.
it's worth reading the manpages for intr_barrier and taskq_barrier.
sched_barrier is a thing that exists too, but isn't documented. also
have a look at the EXAMPLES section in the cond_wait(9) manpage.
however, don't read the taskq_barrier code. timeout_barrier is like
intr_barrier in that it uses the argument to figure out where the work
runs, but again, it doesn't care about the specific handler itself. we
could have a timeout_barrier(void) and a timeout_barrier_proc(void)
instead.
the point i'm trying to make here is that timeout_barrier isn't a
snowflake that exists in a vacuum; it should have semantics similar to
the other barriers in the kernel.
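to make the intended use concrete, the typical pattern is a teardown
path like this (sc and sc_tmo are made-up names for illustration):

	timeout_del(&sc->sc_tmo);	/* cancel it if it's pending */
	timeout_barrier(&sc->sc_tmo);	/* wait out a running handler */
	/* the handler can no longer be touching sc */
	free(sc, M_DEVBUF, sizeof(*sc));

the argument only tells the barrier which queue runner to wait behind;
it doesn't matter whether that particular timeout was ever scheduled.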
something like this should work. i haven't kicked it around much
though, so maybe it doesn't. it would be nice if there was a
CIRCQ_INSERT_HEAD to use, but that would be a separate change.
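for the record, a head insert would just mirror the tail macro, along
these lines (untested, and minus whatever stats accounting
CIRCQ_INSERT_TAIL does):

	#define CIRCQ_INSERT_HEAD(list, elem) do {		\
		(elem)->next = (list)->next;			\
		(elem)->prev = (list);				\
		(list)->next->prev = (elem);			\
		(list)->next = (elem);				\
	} while (0)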
Index: kern_timeout.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.83
diff -u -p -r1.83 kern_timeout.c
--- kern_timeout.c 8 Feb 2021 08:18:45 -0000 1.83
+++ kern_timeout.c 4 May 2021 11:49:46 -0000
@@ -175,7 +175,7 @@ void softclock_thread(void *);
uint32_t timeout_bucket(const struct timeout *);
uint32_t timeout_maskwheel(uint32_t, const struct timespec *);
void timeout_run(struct timeout *);
-void timeout_proc_barrier(void *);
+void timeout_barrier_signal(void *);
/*
* The first thing in a struct timeout is its struct circq, so we
@@ -490,34 +490,33 @@ timeout_del_barrier(struct timeout *to)
void
timeout_barrier(struct timeout *to)
{
- int needsproc = ISSET(to->to_flags, TIMEOUT_PROC);
+ int proc = ISSET(to->to_flags, TIMEOUT_PROC);
+ struct cond c = COND_INITIALIZER();
+ struct timeout barrier;
- timeout_sync_order(needsproc);
-
- if (!needsproc) {
- KERNEL_LOCK();
- splx(splsoftclock());
- KERNEL_UNLOCK();
- } else {
- struct cond c = COND_INITIALIZER();
- struct timeout barrier;
+ timeout_sync_order(proc);
- timeout_set_proc(&barrier, timeout_proc_barrier, &c);
- barrier.to_process = curproc->p_p;
+ _timeout_set(&barrier, timeout_barrier_signal, &c, proc, KCLOCK_NONE);
+#if NKCOV > 0
+ barrier.to_process = curproc->p_p;
+#endif
- mtx_enter(&timeout_mutex);
- SET(barrier.to_flags, TIMEOUT_ONQUEUE);
- CIRCQ_INSERT_TAIL(&timeout_proc, &barrier.to_list);
- mtx_leave(&timeout_mutex);
+ mtx_enter(&timeout_mutex);
+ SET(barrier.to_flags, TIMEOUT_ONQUEUE);
+ CIRCQ_INSERT_TAIL(proc ? &timeout_proc : &timeout_todo,
+ &barrier.to_list);
+ mtx_leave(&timeout_mutex);
+ if (proc)
wakeup_one(&timeout_proc);
+ else
+ softintr_schedule(softclock_si);
- cond_wait(&c, "tmobar");
- }
+ cond_wait(&c, "tmobar");
}
void
-timeout_proc_barrier(void *arg)
+timeout_barrier_signal(void *arg)
{
struct cond *c = arg;