On Thu, Jul 15, 2010 at 07:30:41PM +0200, Andrej Podzimek wrote:
> >I don't know why you're having problems, but for efficiency's sake the
> >amount of work being dispatched to the task queue should be at least
> >(handwaving wildly) 10-100x more time consuming than the operation of
> >dispatching itself.
> 
> Well, this is exactly why I am having problems. The dispatched tasks
> are so short that dispatching can easily take (shooting in the dark)
> 20 times longer than the tasks themselves. Consequently, the task
> queue threads need to hold their bucket mutexes locked for 95% of
> their effective running time. This is not (yet) disastrous, since
> there are per-CPU buckets. However, once all buckets get full, most
> threads (and especially the task producers) start to compete for the
> global task queue mutex, which is what I observe. They spin all the
> time and it is just a matter of chance that those 100 tasks per second
> eventually get dispatched.

Task queues are useful when you have: a) a task that can take a fairly
long time to process, possibly unbounded, or that just involves high
latencies (e.g., disk I/O), and b) a thread that must produce a result
quickly.

For example, in the secure NFS stack we used to have nfsd threads
blocking on upcalls to gssd(1M).  This was problematic because nfsd is
designed to have a fixed number of threads, each of which blocks on at
most a single filesystem transaction, but gssd could take longer than
that to complete an upcall.  So we changed the secure RPC
implementation to queue up the part of the RPCSEC_GSS handling that
requires upcalls to gssd; the nfsd thread unwinds immediately after
queueing that task.

There are probably many task queue examples where the latency mismatch
between the caller and the task is less dramatic, but there should
always be some such mismatch; otherwise there is no point in using
task queues.
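
To make the shape of that pattern concrete, here is a rough, untested
sketch (slow_tq, slow_work and fast_path are made-up names for
illustration; taskq_create()/taskq_dispatch() and kmem_alloc() are the
real kernel interfaces):

    #include <sys/types.h>
    #include <sys/taskq.h>
    #include <sys/kmem.h>
    #include <sys/disp.h>
    #include <sys/errno.h>

    typedef struct slow_arg {
            void    *sa_data;       /* whatever the deferred work needs */
    } slow_arg_t;

    static taskq_t *slow_tq;

    /* Runs in taskq context; free to block for as long as it needs. */
    static void
    slow_work(void *arg)
    {
            slow_arg_t *sa = arg;

            /* ... potentially high-latency processing of sa->sa_data ... */

            kmem_free(sa, sizeof (*sa));
    }

    void
    slow_init(void)
    {
            slow_tq = taskq_create("slow_tq", 4, minclsyspri,
                4, 64, TASKQ_PREPOPULATE);
    }

    /* Fast path: must not block, so it defers the slow part. */
    int
    fast_path(void *data)
    {
            slow_arg_t *sa = kmem_alloc(sizeof (*sa), KM_NOSLEEP);

            if (sa == NULL)
                    return (ENOMEM);
            sa->sa_data = data;
            if (taskq_dispatch(slow_tq, slow_work, sa, TQ_NOSLEEP) == 0) {
                    kmem_free(sa, sizeof (*sa));
                    return (ENOMEM);
            }
            return (0);
    }

The fast path only pays for an allocation and a dispatch; all the
latency lives in slow_work(), where blocking is harmless.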

> What I originally tested was a workload with big bursts (== millions)
> of small tasks that take just a few instructions in most cases, but
> some of them (perhaps one in a thousand) *might* sleep. Batching tasks
> works fine (as already mentioned), but then none of the tasks can
> sleep. So I tried the task queues, but they are obviously not designed
> for this type of workload. As you say, a task would have to take much
> more time than the dispatching overhead to keep mutex contention
> acceptable.

Can you re-organize the tasks such that you only queue the tasks that
must sleep?  For example, if the sleeping comes from kmem_*alloc() then
you could use KM_NOSLEEP instead of KM_SLEEP and then queue up tasks
only when the allocation fails (the queued task will use KM_SLEEP, of
course).
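
Roughly like this (an untested sketch; retry_tq, retry_task, handle,
process and BUFLEN are made-up names, and dispatch-failure handling is
elided):

    #include <sys/taskq.h>
    #include <sys/kmem.h>

    #define BUFLEN  1024            /* arbitrary size for illustration */

    extern taskq_t *retry_tq;       /* created elsewhere via taskq_create() */

    typedef struct work work_t;
    extern void process(work_t *, void *, size_t);

    /* Taskq context: allowed to sleep, so KM_SLEEP is fine here. */
    static void
    retry_task(void *arg)
    {
            work_t *w = arg;
            void *buf = kmem_alloc(BUFLEN, KM_SLEEP);

            process(w, buf, BUFLEN);
            kmem_free(buf, BUFLEN);
    }

    /* Hot path: never sleeps; queues only the rare failing case. */
    void
    handle(work_t *w)
    {
            void *buf = kmem_alloc(BUFLEN, KM_NOSLEEP);

            if (buf != NULL) {
                    process(w, buf, BUFLEN);
                    kmem_free(buf, BUFLEN);
            } else {
                    /* Rare case: defer to a thread that may block. */
                    (void) taskq_dispatch(retry_tq, retry_task, w,
                        TQ_NOSLEEP);
            }
    }

That way only your roughly one-in-a-thousand sleepers pay the dispatch
cost, and the task queue's mutexes see a tiny fraction of the traffic.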

Nico