On 28.11.2014 13:26, Paolo Bonzini wrote:
>
> On 28/11/2014 12:46, Peter Lieven wrote:
>>> I get:
>>> Run operation 40000000 iterations 9.883958 s, 4046K operations/s, 247ns per
>>> coroutine
>> Ok, understood, it "steals" the whole pool, right? Isn't that bad if we have
>> more than one thread in need of a lot of coroutines?
> Overall the algorithm is expected to adapt. The N threads contribute to
> the global release pool, so the pool will fill up N times faster than if
> you had only one thread. There can be some variance, which is why the
> maximum size of the pool is twice the threshold (and probably could be
> tuned better).
>
> Benchmarks are needed on real I/O too, of course, especially with high
> queue depth.
Yes, cool. The atomic operations are a bit tricky at first glance ;-)

Question: Why is the pool_size increment atomic, but the reset to zero not?

Idea: If the release_pool is full, why not put the coroutine into the
thread-local alloc_pool instead of throwing it away? :-)

Run operation 40000000 iterations 9.057805 s, 4416K operations/s, 226ns per
coroutine

diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index 6bee354..edea162 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -25,8 +25,9 @@ enum {
 
 /** Free list to speed up creation */
 static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int pool_size;
+static unsigned int release_pool_size;
 static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool);
+static __thread unsigned int alloc_pool_size;
 
 /* The GPrivate is only used to invoke coroutine_pool_cleanup. */
 static void coroutine_pool_cleanup(void *value);
@@ -39,12 +40,12 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 
     if (CONFIG_COROUTINE_POOL) {
         co = QSLIST_FIRST(&alloc_pool);
         if (!co) {
-            if (pool_size > POOL_BATCH_SIZE) {
-                /* This is not exact; there could be a little skew between pool_size
+            if (release_pool_size > POOL_BATCH_SIZE) {
+                /* This is not exact; there could be a little skew between release_pool_size
                  * and the actual size of alloc_pool. But it is just a heuristic,
                  * it does not need to be perfect.
                  */
-                pool_size = 0;
+                alloc_pool_size = atomic_fetch_and(&release_pool_size, 0);
                 QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
                 co = QSLIST_FIRST(&alloc_pool);
@@ -53,6 +54,8 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
                  */
                 g_private_set(&dummy_key, &dummy_key);
             }
+        } else {
+            alloc_pool_size--;
         }
         if (co) {
             QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
@@ -71,10 +74,15 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 static void coroutine_delete(Coroutine *co)
 {
     if (CONFIG_COROUTINE_POOL) {
-        if (pool_size < POOL_BATCH_SIZE * 2) {
+        if (release_pool_size < POOL_BATCH_SIZE * 2) {
             co->caller = NULL;
             QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
-            atomic_inc(&pool_size);
+            atomic_inc(&release_pool_size);
+            return;
+        } else if (alloc_pool_size < POOL_BATCH_SIZE) {
+            co->caller = NULL;
+            QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
+            alloc_pool_size++;
             return;
         }
     }

Bug?: The release_pool is not cleaned up on termination, I think.

Peter