On Tue, Feb 14, 2012 at 1:12 PM, Alex Barcelo <abarc...@ac.upc.edu> wrote:
> On Tue, Feb 14, 2012 at 13:17, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Tue, Feb 14, 2012 at 11:38 AM, Alex Barcelo <abarc...@ac.upc.edu> wrote:
>>> On Tue, Feb 14, 2012 at 09:33, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>> On Mon, Feb 13, 2012 at 04:11:15PM +0100, Alex Barcelo wrote:
>>>>> This new implementation... well, it seems to work (I have done an
>>>>> Ubuntu installation with a CD-ROM and a qcow drive, which seems to
>>>>> use quite a lot of coroutines). Of course I ran coroutine-test and
>>>>> it passed. But I wasn't confident enough to propose it as a
>>>>> "mature alternative", and I don't have any performance benchmark,
>>>>> which would be interesting. So I thought the best option would be
>>>>> to send this patch to the developers as an alternative to ucontext.
>>>>
>>>> As a starting point, I suggest looking at
>>>> test-coroutine.c:perf_lifecycle(). It's a simple create-and-then-enter
>>>> benchmark which measures the latency of doing this. I expect you will
>>>> find performance is identical to the ucontext version because the
>>>> coroutine should be pooled and created using sigaltstack only once.
>>>>
>>>> The interesting thing would be to benchmark ucontext coroutine creation
>>>> against sigaltstack. Even then it may not matter much as long as pooled
>>>> coroutines are used most of the time.
>>>
>>> I didn't see the performance mode for test-coroutine. Now a benchmark
>>> test is easy (it's half done). The lifecycle test is not a good
>>> benchmark here, because sigaltstack is only called once. (As you said,
>>> the timing changes by less than 1%.)
>>>
>>> I thought it would be interesting to add a performance test for
>>> nesting (which can be coroutine-creation intensive), so I did. I will
>>> send it as a patch; it's simple, but it works for this.
>>>
>>> The preliminary results are:
>>>
>>> ucontext (traditional) method:
>>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.452988 s
>>>
>>> sigaltstack (new) method:
>>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.689649 s
>>
>> Please run the tests with more iterations. The execution time should
>> be several seconds to reduce any scheduler impact or other hiccups. I
>> suggest scaling iterations up to around 10 seconds.
>
> Ok, 10.2 s vs 10.5 s (the traditional ucontext still wins, but the
> difference no longer seems relevant).
>
>>> sigaltstack is worse (which doesn't surprise me: it is more
>>> complicated, does more jumps, and its control flow is more erratic).
>>> But a loss of efficiency in coroutine creation should not matter much
>>> (how many coroutines are created in a typical qemu-system execution?
>>> I'm thinking "one"). Also, as you said ;) pooled coroutines are used
>>> most of the time in a real qemu-system execution.
>>
>> No, a lot of coroutines are created - each parallel disk I/O request
>> involves a coroutine. Coroutines are also being used in other
>> subsystems (e.g. virtfs).
>>
>> Hopefully the number of active coroutines is still <100, but it's
>> definitely >1.
>
> I put a "Hello world, look, I'm in a coroutine" printf inside the
> coroutine creation function, and I have only seen it twice in a normal
> qemu-system execution. And I was doubting it.
Run a couple of "dd if=/dev/vda of=/dev/null iflag=direct" processes
inside the guest to get some parallel I/O requests going.

Stefan