Re: [Qemu-devel] [PATCH v2 0/6] aio: Support epoll by introducing qemu_poll abstraction

Fam Zheng Wed, 07 Jan 2015 18:54:01 -0800

On Wed, 01/07 15:08, Stefan Hajnoczi wrote:
> On Tue, Dec 16, 2014 at 10:04:38AM +0800, Fam Zheng wrote:
> > On Thu, 12/04 11:43, Fam Zheng wrote:
> > > v2: Emulate nanosecond precision of the timeout with ppoll and timerfd.
> > >     Their performance is on par with each other, but both much better than
> > >     qemu.git:
> > > 
> > >     syscall         high # of fd      low # of fd
> > >     -------------------------------------------------
> > >     qemu.git(ppoll) 44                96
> > >     ppoll+epoll     85                101
> > >     timerfd+epoll   87                109
> > 
> > More data points.
> > 
> > Xiaomei tested this series (applied on top of RHEL 7 qemu-kvm-rhev) and
> > found that:
> > 
> > 0) when # of fds is high, epoll solutions are much better (+30%).
> > 
> > 1) timerfd+epoll is slightly better than ppoll+epoll, but the difference is
> > minimal.
> > 
> > 2) original code is 2%~5% faster than the new implementations when # of fds
> > is low.
> 
> What is "high" and "low"?
> 
> I'd like to understand whether they are extremes that almost no users
> will encounter or whether they are plausible in the real world.


In the original case, "low" means very few fds, say 15, and "high" means what
we get after plugging in one virtio-serial device, say 70. I wouldn't consider
it an extreme case, because we assign one ioeventfd to each vq, and the number
of vqs can be a multiple of the number of host CPU cores. In a relatively big
system it can easily reach a few hundred.

> 
> > This leads to the conclusion that we'll have a small performance
> > degradation if we merge this series. I'm thinking about possible optimizations.
> > Options in my mind are:
> > 
> > 1) Remove 1ns PR_SET_TIMERSLACK in timerfd+epoll. This doesn't make
> > qemu_poll faster than the old qemu_poll_ns, but may have other positive
> > effects that compensate for the cost.
> 
> Sounds like a random hack.  What is the reasoning for messing with timer
> slack?

In my test this doesn't work. The reason is that timer slack affects the
timeout of the poll system calls, so the two are correlated. Anyway, I've
dropped this option.
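
For anyone who wants to try the correlation themselves, here is a minimal
sketch of the knob in question. Timer slack is a per-thread value, read and
written with prctl(2), that lets the kernel delay the expiry of timeouts in
calls such as ppoll and epoll_wait in order to group wakeups. The helper names
below are made up for illustration and are not taken from the series.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Hypothetical helper: return the calling thread's current timer slack
 * in nanoseconds, or -1 on error. */
long get_timer_slack_ns(void)
{
    return prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0);
}

/* Hypothetical helper: set the calling thread's timer slack.
 * Returns 0 on success, -1 on error. */
int set_timer_slack_ns(unsigned long slack_ns)
{
    /* The kernel treats 0 as "reset to the thread's default slack",
     * so this sketch only accepts explicit positive values. */
    assert(slack_ns > 0);
    return prctl(PR_SET_TIMERSLACK, slack_ns, 0, 0, 0);
}
```

Calling set_timer_slack_ns(1) before the poll loop asks for the tightest slack,
which is the setting item 1) above proposes to remove. Timing a short ppoll
timeout with and without it is one way to see how much the two interact on a
given host.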

Fam

> 
> Perhaps it is worth investigating timer slack as an independent issue
> though.
> 
> > 2) Use dynamic switch between ppoll and timerfd+epoll. In poll-linux.c, we
> > start with pure ppoll, while keeping track of elapsed time in ppoll. And
> > periodically, we try "timerfd+epoll" for a few iterations, so that we can
> > compare if it is faster than pure ppoll. If it is, swap them: use
> > timerfd+epoll and periodically try "ppoll".
> > 
> > That said, I'll also look at the kernel side. Maybe optimizing ppoll or just
> > adding EPOLL_NANOSECOND_TIMEOUT to epoll_create1 is a better place for
> > engineering.
> 
> I agree that a kernel fix would be good.  Even if the patch is rejected,
> we might get good ideas on how applications can optimize.
> 
> Stefan
