Re: [PATCH] rcu: Unify force quiescent state

Akihiko Odaki Sat, 18 Oct 2025 06:58:44 -0700

On 2025/10/17 4:33, Dmitry Osipenko wrote:

On 10/16/25 09:34, Akihiko Odaki wrote:

-        /* Wait for one thread to report a quiescent state and try again.
+        /*
+         * Sleep for a while and try again.
           * Release rcu_registry_lock, so rcu_(un)register_thread() doesn't
           * wait too much time.
           *
@@ -133,7 +150,20 @@ static void wait_for_readers(void)
           * rcu_registry_lock is released.
           */
          qemu_mutex_unlock(&rcu_registry_lock);
-        qemu_event_wait(&rcu_gp_event);
+
+        if (forced) {
+            qemu_event_wait(&rcu_gp_event);
+
+            /*
+             * We want to be notified of changes made to rcu_gp_ongoing
+             * while we walk the list.
+             */
+            qemu_event_reset(&rcu_gp_event);
+        } else {
+            g_usleep(10000);
+            sleeps++;


Thanks a lot for this RCU improvement. It indeed removes the hard stalls
with unmapping of virtio-gpu blobs.

Am I understanding correctly that potentially we will be hitting this
g_usleep(10000) and stall virtio-gpu for the first ~10ms? I.e. the
MemoryRegion patches from Alex [1] are still needed to avoid stalls
entirely.

[1]
https://lore.kernel.org/qemu-devel/[email protected]/


That is right, but "avoiding stalls entirely" also causes use-after-free.

The problem with virtio-gpu on TCG is that TCG keeps using the old memory map until force_rcu is triggered. So, without force_rcu, the following pseudo-code on a guest will result in use-after-free:


address = blob_map(resource_id);
blob_unmap(resource_id);

for (i = 0; i < some_big_number; i++)
  *(uint8_t *)address = 0;

*(uint8_t *)address will dereference the blob until force_rcu is triggered, so finalizing MemoryRegion before force_rcu results in use-after-free.
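
To make the hazard concrete, here is a toy model in C. It is not QEMU code: struct toy_view only stands in for a FlatView, and every toy_* name is made up for illustration. The sketch assumes a vCPU keeps translating through the view it cached until it passes a quiescent state, so the old view (and the blob behind it) may be reclaimed only once no vCPU still caches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a FlatView; not QEMU's real type. */
struct toy_view {
    bool blob_mapped;
};

/* Hypothetical vCPU that keeps using the view it cached last. */
struct toy_vcpu {
    const struct toy_view *cached;
};

/* The most recently published view. */
static const struct toy_view *current_view;

/* Publish a new view and return the old one, which vCPUs may still use. */
static const struct toy_view *toy_publish(const struct toy_view *next)
{
    const struct toy_view *old = current_view;

    current_view = next;
    return old;
}

/* Quiescent state: the vCPU drops its cached view and picks up the current one. */
static void toy_quiesce(struct toy_vcpu *cpu)
{
    cpu->cached = current_view;
}

/* The old view is safe to reclaim only if no vCPU still caches it. */
static bool toy_can_reclaim(const struct toy_view *old,
                            const struct toy_vcpu *cpus, size_t n)
{
    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].cached == old) {
            return false;
        }
    }
    return true;
}
```

In this model, toy_can_reclaim() keeps returning false after blob_unmap() publishes a view without the blob, and it flips to true only after toy_quiesce() runs on the vCPU. That is the role force_rcu plays for TCG.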

The best option I have in mind to eliminate the delay entirely is to call drain_call_rcu(), but I'm not in favor of such a change (for now). drain_call_rcu() eliminates the delay if the FlatView protected by RCU is the only referrer of the MemoryRegion, but that is not guaranteed.

Performance should not be a concern in this situation anyway. The guest should not waste CPU time by polling in the first place if you really care about performance; since it's a para-virtualized device and not real hardware, CPU time may be shared between the guest and the device, and thus polling on the guest has an inherent risk of slowing down the device. For performance-sensitive workloads, the guest should:


- avoid polling and
- accumulate commands instead of waiting for each one

The delay will be less problematic if the guest does so, and I think at least Linux does avoid polling.

That said, stalling the guest forever in this situation is "wrong" (!= "bad performance"). I wrote this patch to guarantee forward progress, which is mandatory for semantic correctness.
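
To illustrate what guaranteeing forward progress means here, below is a toy model, not the code in the patch. The waiter sleeps a bounded number of times and then forces a quiescent state, so the total wait stays bounded even if the reader never becomes quiescent on its own. TOY_MAX_SLEEPS and all toy_* names are hypothetical; only the 10000 us sleep is taken from the hunk quoted above, and the real escalation condition is not shown there.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SLEEP_US   10000u /* matches g_usleep(10000) in the quoted hunk */
#define TOY_MAX_SLEEPS 3u     /* hypothetical escalation threshold */

/* Hypothetical reader that may never go quiescent unless forced. */
struct toy_reader {
    bool quiescent;
};

/* Stand-in for forcing a quiescent state on the reader. */
static void toy_force_quiescent(struct toy_reader *r)
{
    r->quiescent = true;
}

/*
 * Wait until the reader is quiescent and return the simulated time
 * spent sleeping, in microseconds. The loop always terminates because
 * it forces the reader once the sleep budget is used up.
 */
static unsigned toy_wait_for_reader(struct toy_reader *r)
{
    unsigned sleeps = 0;

    while (!r->quiescent) {
        if (sleeps < TOY_MAX_SLEEPS) {
            sleeps++; /* stands in for g_usleep(TOY_SLEEP_US) */
        } else {
            toy_force_quiescent(r);
        }
    }
    assert(sleeps <= TOY_MAX_SLEEPS);
    return sleeps * TOY_SLEEP_US;
}
```

With a reader that never yields, the model waits for at most TOY_MAX_SLEEPS * TOY_SLEEP_US before forcing it; a reader that is already quiescent costs nothing.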

Perhaps drain_call_rcu() may also make sense in other, performance-sensitive scenarios, but it should be added only after benchmarking, or we will have a premature optimization.


Regards,
Akihiko Odaki
