On 28/11/2014 12:21, Peter Lieven wrote:
> On 28.11.2014 at 12:14, Paolo Bonzini wrote:
>>> master:
>>> Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns
>>> per coroutine
>>>
>>> paolo:
>>> Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns
>>> per coroutine
>> Nice. :)
>>
>> Can you please try "coroutine: Use __thread … " together, too? I still
>> see 11% time spent in pthread_getspecific, and I get ~10% more indeed if
>> I apply it here (my times are 191/160/145).
>
> indeed:
>
> Run operation 40000000 iterations 10.138684 s, 3945K operations/s, 253ns per
> coroutine

Your perf_master2 uses the ring buffer unconditionally, right? I wonder
if we can use a similar algorithm but with arrays instead of lists...

Paolo