On Wed, 01/07 15:08, Stefan Hajnoczi wrote:
> On Tue, Dec 16, 2014 at 10:04:38AM +0800, Fam Zheng wrote:
> > On Thu, 12/04 11:43, Fam Zheng wrote:
> > > v2: Emulate nanosecond precision of timeout with ppoll and timerfd.
> > > Their performance is on par with each other, but both are much better
> > > than qemu.git:
> > >
> > >     syscall            high # of fd    low # of fd
> > >     ------------------------------------------------
> > >     qemu.git(ppoll)    44              96
> > >     ppoll+epoll        85              101
> > >     timerfd+epoll      87              109
> >
> > More data points.
> >
> > Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and
> > found that:
> >
> > 0) when the # of fds is high, the epoll solutions are much better (+30%).
> >
> > 1) timerfd+epoll is slightly better than ppoll+epoll, but the difference
> >    is minimal.
> >
> > 2) the original code is 2%~5% faster than the new implementations when
> >    the # of fds is low.
>
> What is "high" and "low"?
>
> I'd like to understand whether they are extremes that almost no users
> will encounter or whether they are plausible in the real world.

In the original tests, "low" means very few fds, say 15, and "high" means
what we get after plugging in one virtio-serial device, say 70. I wouldn't
consider that an extreme case, because we assign one ioeventfd to each
virtqueue, and the number of virtqueues can be a multiple of the number of
host CPU cores. On a relatively big system it can easily reach a few hundred.

>
> > This leads to the conclusion that we'll have a small performance
> > degradation if we merge this series. I'm thinking about possible
> > optimizations. Options in my mind are:
> >
> > 1) Remove the 1ns PR_SET_TIMERSLACK in timerfd+epoll. This doesn't make
> >    qemu_poll faster than the old qemu_poll_ns, but may have other
> >    positive effects that compensate for the cost.
>
> Sounds like a random hack. What is the reasoning for messing with timer
> slack?

In a test this doesn't work. The reason is that timer slack affects the
timeout of the poll syscalls, so the two are correlated. Anyway, I've
dropped this option.

Fam

>
> Perhaps it is worth investigating timer slack as an independent issue
> though.
>
> > 2) Use a dynamic switch between ppoll and timerfd+epoll. In poll-linux.c,
> >    we start with pure ppoll, while keeping track of the elapsed time in
> >    ppoll. Periodically, we try "timerfd+epoll" for a few iterations, so
> >    that we can compare whether it is faster than pure ppoll. If it is,
> >    swap them: use timerfd+epoll and periodically try "ppoll".
> >
> > That said, I'll also look at the kernel side. Maybe optimizing ppoll, or
> > just adding EPOLL_NANOSECOND_TIMEOUT to epoll_create1, is a better place
> > for engineering.
>
> I agree that a kernel fix would be good. Even if the patch is rejected,
> we might get good ideas on how applications can optimize.
>
> Stefan