On Wed, Sep 21, 2016 at 20:19:18 +0200, Paolo Bonzini wrote:
(snip)
> No, this is not true. Barriers order stores and loads within a thread
> _and_ establish synchronizes-with edges.
>
> In the example above you are violating causality:
>
> - cpu0 stores cpu->running before loading pending_cpus
>
> - because pending_cpus == 0, cpu1 stores pending_cpus = 1 after cpu0
>   loads it
>
> - cpu1 loads cpu->running after it stores pending_cpus
OK. So I simplified the example to understand this better:

    cpu0                    cpu1
    ----                    ----
  { x = y = 0; r0 and r1 are private variables }
    x = 1                   y = 1
    smp_mb()                smp_mb()
    r0 = y                  r1 = x

Turns out this is scenario 10 here: https://lwn.net/Articles/573436/

The source of my confusion was not paying due attention to smp_mb(),
which is necessary for maintaining transitivity.

> > Is there a performance (scalability) reason behind this patch?
>
> Yes: it speeds up all cpu_exec_start/end, _not_ start/end_exclusive.
>
> With this patch, as long as there are no start/end_exclusive (which are
> supposed to be rare) there is no contention on multiple CPUs doing
> cpu_exec_start/end.
>
> Without it, as CPUs increase, the global cpu_list_mutex is going to
> become a bottleneck.

I see. Scalability-wise I wouldn't expect much improvement for MTTCG
full-system emulation, since the iothread lock is still acquired on every
CPU loop exit (just like in KVM). For user-mode, however, this should
yield measurable improvements =D

Thanks,

		E.