On 28.11.2014 at 12:23, Paolo Bonzini wrote:
>
> On 28/11/2014 12:21, Peter Lieven wrote:
>> On 28.11.2014 at 12:14, Paolo Bonzini wrote:
>>>> master:
>>>> Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns
>>>> per coroutine
>>>>
>>>> paolo:
>>>> Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns
>>>> per coroutine
>>> Nice. :)
>>>
>>> Can you please try "coroutine: Use __thread … " together, too? I still
>>> see 11% time spent in pthread_getspecific, and I get ~10% more indeed if
>>> I apply it here (my times are 191/160/145).
>> indeed:
>>
>> Run operation 40000000 iterations 10.138684 s, 3945K operations/s, 253ns per
>> coroutine
> Your perf_master2 uses the ring buffer unconditionally, right? I wonder
> if we can use a similar algorithm but with arrays instead of lists...
Do you mean an algorithm similar to perf_master2, or to the current
implementation? The ring buffer seems to have a drawback when it comes
to excessive coroutine nesting. My idea was that you do not throw away
hot coroutines when the pool is full. However, I do not know if this is
really a problem, since the pool is only full when there is not much
I/O. Or is this assumption too simple?

Peter
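
To make the trade-off above concrete, here is a minimal sketch of the two
eviction policies when the pool is full. It is not taken from perf_master2
or from the current QEMU code: the names StackPool, RingPool, POOL_SIZE and
the use of plain int ids in place of Coroutine pointers are all hypothetical,
and the sketch assumes the ring buffer overwrites its oldest entry while a
bounded array pool rejects the newly released (hot) one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical capacity, chosen only for the example */

/* Bounded array pool (LIFO): when full, the newly released item is rejected. */
typedef struct {
    int items[POOL_SIZE];
    size_t count;
} StackPool;

static bool stack_put(StackPool *p, int co)
{
    if (p->count == POOL_SIZE) {
        return false;               /* full: the hot item is thrown away */
    }
    p->items[p->count++] = co;
    return true;
}

static bool stack_get(StackPool *p, int *co)
{
    if (p->count == 0) {
        return false;
    }
    *co = p->items[--p->count];     /* most recently released item */
    return true;
}

/* Ring pool (assumed policy): when full, the oldest item is overwritten. */
typedef struct {
    int items[POOL_SIZE];
    size_t top;                     /* next write slot */
    size_t count;
} RingPool;

/* Returns true if an old entry had to be evicted; it is stored in *evicted. */
static bool ring_put(RingPool *p, int co, int *evicted)
{
    bool full = (p->count == POOL_SIZE);

    assert(p->top < POOL_SIZE);
    if (full) {
        *evicted = p->items[p->top]; /* when full, oldest sits at the write slot */
    } else {
        p->count++;
    }
    p->items[p->top] = co;
    p->top = (p->top + 1) % POOL_SIZE;
    return full;
}

static bool ring_get(RingPool *p, int *co)
{
    if (p->count == 0) {
        return false;
    }
    p->top = (p->top + POOL_SIZE - 1) % POOL_SIZE;
    *co = p->items[p->top];         /* most recently released item */
    p->count--;
    return true;
}
```

With five releases into a pool of four, the array pool rejects the fifth
(hottest) item, while the ring keeps it and evicts the first (coldest) one.
Whether that difference matters in practice depends on how often the pool is
actually full under real I/O load, which this sketch does not measure.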