On 28.11.2014 13:26, Paolo Bonzini wrote:
>
> On 28/11/2014 12:46, Peter Lieven wrote:
>>> I get:
>>> Run operation 40000000 iterations 9.883958 s, 4046K operations/s, 247ns per
>>> coroutine
>> Ok, understood, it "steals" the whole pool, right? Isn't that bad if we have
>> more than one thread in need of a lot of coroutines?
> Overall the algorithm is expected to adapt. The N threads contribute to
> the global release pool, so the pool will fill up N times faster than if
> you had only one thread. There can be some variance, which is why the
> maximum size of the pool is twice the threshold (and probably could be
> tuned better).
>
> Benchmarks are needed on real I/O too, of course, especially with high
> queue depth.
Yes, cool. The atomic operations are a bit tricky at first glance ;-)

Question: Why is the pool_size increment atomic, but the reset to zero not?

Idea: If the release_pool is full, why not put the coroutine into the
thread-local alloc_pool instead of throwing it away? :-)

Run operation 40000000 iterations 9.057805 s, 4416K operations/s, 226ns per
coroutine

diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index 6bee354..edea162 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -25,8 +25,9 @@ enum {
 
 /** Free list to speed up creation */
 static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int pool_size;
+static unsigned int release_pool_size;
 static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool);
+static __thread unsigned int alloc_pool_size;
 
 /* The GPrivate is only used to invoke coroutine_pool_cleanup. */
 static void coroutine_pool_cleanup(void *value);
@@ -39,12 +40,12 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 
     if (CONFIG_COROUTINE_POOL) {
         co = QSLIST_FIRST(&alloc_pool);
         if (!co) {
-            if (pool_size > POOL_BATCH_SIZE) {
-                /* This is not exact; there could be a little skew between pool_size
+            if (release_pool_size > POOL_BATCH_SIZE) {
+                /* This is not exact; there could be a little skew between release_pool_size
                  * and the actual size of alloc_pool. But it is just a heuristic,
                  * it does not need to be perfect.
                  */
-                pool_size = 0;
+                alloc_pool_size = atomic_fetch_and(&release_pool_size, 0);
                 QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
                 co = QSLIST_FIRST(&alloc_pool);
@@ -53,6 +54,8 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
                  */
                 g_private_set(&dummy_key, &dummy_key);
             }
+        } else {
+            alloc_pool_size--;
         }
         if (co) {
             QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
@@ -71,10 +74,15 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 static void coroutine_delete(Coroutine *co)
 {
     if (CONFIG_COROUTINE_POOL) {
-        if (pool_size < POOL_BATCH_SIZE * 2) {
+        if (release_pool_size < POOL_BATCH_SIZE * 2) {
             co->caller = NULL;
             QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
-            atomic_inc(&pool_size);
+            atomic_inc(&release_pool_size);
+            return;
+        } else if (alloc_pool_size < POOL_BATCH_SIZE) {
+            co->caller = NULL;
+            QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
+            alloc_pool_size++;
             return;
         }
     }

Bug?: The release_pool is not cleaned up on termination, I think.

Peter