Hi,

tl;dr: if you notice new weird behavior in rumprun, it's probably the new scheduler. Please file an issue.

The thread scheduler that rumprun inherited from the Xen Mini-OS was not designed for a situation with many infrequently running threads. That was fine for Mini-OS, but a rump kernel creates a large('ish) number of kernel threads at bootstrap. Now that we support variable-sized stacks, those threads cost us no more than a few pages of memory per thread -- except when the scheduler ran. The old scheduler kept everything in a single runqueue and moved a thread to the end of the queue when it was run. Therefore, the vast majority of scheduling operations consisted of stepping over a large number of blocked threads before finding actual work to schedule.

I rewrote the scheduler to use separate run/block/timeout queues for essentially O(1) scheduling. In a microbenchmark with two sched_yield()ing threads, the new scheduler gives >50% better performance for that scientifically "common workload" of incrementing an integer by one every time a thread runs. That will probably translate to a few % better application performance in heavy network use scenarios, so not a magical enough optimization to glow in the dark, but a few % is always a few %. Getting closer to beating Linux performance ....

All tests pass on both bare metal and Xen (~200 runs total), but given that a scheduler is finicky by nature and that our test suite could be stronger, I am not extremely confident the code will be perfect before a bug-shakedown period. If suspicious problems appear, do not hesitate to speculatively blame me.

  - antti
