Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Thu, 06 Sep 2018, Peter Zijlstra wrote:

>On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
>> On Fri, 20 Jul 2018, Andrew Morton wrote:
>> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>> >than spin_lock_irq()?  Relative to all the other stuff those functions
>> >are doing?  If so, how come?  Some architectural thing makes
>> >local_irq_save() much more costly than local_irq_disable()?
>>
>> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
>> the cost of Xen is much higher.
>
>Xen is a moot argument. IIRC the point is that POPF (as used by
>*irqrestore()) is a very expensive operation because it changes all
>flags and thus has very 'difficult' instruction dependencies, killing
>the front end reorder and generating a giant bubble in the pipeline.
>
>Similarly, I suppose PUSHF is an expensive instruction because it needs
>all the flags 'stable' and thus needs to wait for a fair number of prior
>instructions to retire before it can get on with it.
>
>Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
>because the latter only has dependencies on instructions that muck about
>with IF -- not that many.

ack. In fact it turns out that my Xen numbers for this patch were
actually native (popf), and not the xen_restore_fl() as it was using
hvm and not paravirt.

Thanks,
Davidlohr
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:
> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
>
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.

Xen is a moot argument. IIRC the point is that POPF (as used by
*irqrestore()) is a very expensive operation because it changes all
flags and thus has very 'difficult' instruction dependencies, killing
the front end reorder and generating a giant bubble in the pipeline.

Similarly, I suppose PUSHF is an expensive instruction because it needs
all the flags 'stable' and thus needs to wait for a fair number of prior
instructions to retire before it can get on with it.

Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
because the latter only has dependencies on instructions that muck about
with IF -- not that many.
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Sat, 21 Jul 2018 11:31:27 -0700 Davidlohr Bueso wrote:

> On Sat, 21 Jul 2018, Peter Zijlstra wrote:
>
> >On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
> >> On Fri, 20 Jul 2018, Andrew Morton wrote:
> >>
> >> > We could open-code it locally.  Add a couple of
> >> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
> >> > Xen but surely just reading the thing isn't too expensive?
> >>
> >> We could also pass on the responsibility to lockdep and just use
> >> lockdep_assert_irqs_enabled().  But I guess that would be less effective
> >> than to just open code it in epoll without lockdep -- note that over 80
> >> places in the kernel do this.
> >
> >The lockdep thing is relatively recent. I think someone proposed to go
> >replace a bunch of the open-coded ones at some point.
>
> For the open coded checks, I'm seeing a small (1-2% ish) cost for bare
> metal on workload 1). I don't see (via code inspection) any additional
> overhead in xen either. While negligible in the overall of things, I do
> like the idea of lockdep handling it nonetheless.

It's good that the overhead goes away when lockdep is disabled.  That's
a big advantage over open-coding it.

> I can add the open coded version if people really feel that it would catch
> more bugs (no lockdep users out there in production afaik :) in the long
> term; but if lockdep is where things are headed...
>
> ...
>
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -670,6 +670,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>  	struct epitem *epi, *nepi;
>  	LIST_HEAD(txlist);
>
> +	/* must not be called with irqs off */
> +	lockdep_assert_irqs_enabled();

Although I may be the humble comment's most avid fan, these ones are too
obvious to be worth the space ;)  Would you survive if I zapped 'em?
--- a/fs/eventpoll.c~fs-epoll-robustify-irq-safety-with-lockdep_assert_irqs_enabled-fix
+++ a/fs/eventpoll.c
@@ -670,7 +670,6 @@ static __poll_t ep_scan_ready_list(struc
 	struct epitem *epi, *nepi;
 	LIST_HEAD(txlist);
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	/*
@@ -767,7 +766,6 @@ static int ep_remove(struct eventpoll *e
 {
 	struct file *file = epi->ffd.file;
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	/*
@@ -1418,7 +1416,6 @@ static int ep_insert(struct eventpoll *e
 	struct epitem *epi;
 	struct ep_pqueue epq;
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	user_watches = atomic_long_read(&ep->user->epoll_watches);
@@ -1549,7 +1546,6 @@ static int ep_modify(struct eventpoll *e
 	int pwake = 0;
 	poll_table pt;
 
-	/* must not be called with irqs off */
 	lockdep_assert_irqs_enabled();
 
 	init_poll_funcptr(&pt, NULL);
_
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Sat, 21 Jul 2018, Peter Zijlstra wrote:

>On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
>> On Fri, 20 Jul 2018, Andrew Morton wrote:
>>
>> > We could open-code it locally.  Add a couple of
>> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>> > Xen but surely just reading the thing isn't too expensive?
>>
>> We could also pass on the responsibility to lockdep and just use
>> lockdep_assert_irqs_enabled().  But I guess that would be less effective
>> than to just open code it in epoll without lockdep -- note that over 80
>> places in the kernel do this.
>
>The lockdep thing is relatively recent. I think someone proposed to go
>replace a bunch of the open-coded ones at some point.

For the open coded checks, I'm seeing a small (1-2% ish) cost for bare
metal on workload 1). I don't see (via code inspection) any additional
overhead in xen either. While negligible in the overall of things, I do
like the idea of lockdep handling it nonetheless.

I can add the open coded version if people really feel that it would catch
more bugs (no lockdep users out there in production afaik :) in the long
term; but if lockdep is where things are headed...

Thanks,
Davidlohr

---8<

[PATCH -next 3/2] fs/epoll: robustify irq safety with lockdep_assert_irqs_enabled()

Sprinkle lockdep_assert_irqs_enabled() checks in the functions that do
not save and restore interrupts when dealing with the ep->wq.lock. These
are ep_scan_ready_list() and those called by epoll_ctl(): ep_insert,
ep_modify and ep_remove.
Signed-off-by: Davidlohr Bueso
---
 fs/eventpoll.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1b1abc461fc0..97b9b73dfec8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -670,6 +670,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	struct epitem *epi, *nepi;
 	LIST_HEAD(txlist);
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	/*
 	 * We need to lock this because we could be hit by
 	 * eventpoll_release_file() and epoll_ctl().
@@ -764,6 +767,9 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 {
 	struct file *file = epi->ffd.file;
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	/*
 	 * Removes poll wait queue hooks.
 	 */
@@ -1412,6 +1418,9 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 	struct epitem *epi;
 	struct ep_pqueue epq;
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	user_watches = atomic_long_read(&ep->user->epoll_watches);
 	if (unlikely(user_watches >= max_user_watches))
 		return -ENOSPC;
@@ -1540,6 +1549,9 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 	int pwake = 0;
 	poll_table pt;
 
+	/* must not be called with irqs off */
+	lockdep_assert_irqs_enabled();
+
 	init_poll_funcptr(&pt, NULL);
 
 	/*
-- 
2.16.4
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:
>
> > We could open-code it locally.  Add a couple of
> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
> > Xen but surely just reading the thing isn't too expensive?
>
> We could also pass on the responsibility to lockdep and just use
> lockdep_assert_irqs_enabled().  But I guess that would be less effective
> than to just open code it in epoll without lockdep -- note that over 80
> places in the kernel do this.

The lockdep thing is relatively recent. I think someone proposed to go
replace a bunch of the open-coded ones at some point.
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Fri, 20 Jul 2018, Andrew Morton wrote:

>We could open-code it locally.  Add a couple of
>WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>Xen but surely just reading the thing isn't too expensive?

We could also pass on the responsibility to lockdep and just use
lockdep_assert_irqs_enabled(). But I guess that would be less effective
than to just open code it in epoll without lockdep -- note that over 80
places in the kernel do this.

Thanks,
Davidlohr
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Fri, 20 Jul 2018, Andrew Morton wrote:

>Did you try measuring it on bare hardware?

I did, and wasn't expecting much difference; below are numbers for a
2-socket, 40-core (ht) IvyBridge on a few workloads. Unfortunately I
don't have a Xen environment, and for the Xen results I do have (the
numbers in patch 1) I don't have the actual workload, so I cannot
compare them directly.

1) An epoll_wait (pipes io) microbench run in different configurations
   (http://linux-scalability.org/epoll/epoll-test.c) shows around a
   7-10% improvement in the overall total number of epoll_wait() loop
   iterations, using both regular and nested epolls -- very raw
   numbers, but measurable nonetheless.

   # threads	vanilla		dirty
	 1	1677717		1805587
	 2	1660510		1854064
	 4	1610184		1805484
	 8	1577696		1751222
	16	1568837		1725299
	32	1291532		1378463
	64	 752584		 787368

   Note that stddev is pretty small.

2) Another pipe test, which shows no real measurable improvement
   (http://www.xmailserver.org/linux-patches/pipetest.c).

>I'd have more confidence if we had some warning mechanism if we run
>spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
>afaict we don't have that.  Probably for good reasons - I wonder what
>they are?
>
>Well ignored ;)
>
>We could open-code it locally.  Add a couple of
>WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>Xen but surely just reading the thing isn't too expensive?

I agree, I'll see what I can come up with and also ask the customer to
test in his setup. Bare metal would also need some new numbers I guess.

Thanks,
Davidlohr
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Fri, 20 Jul 2018 13:05:59 -0700 Davidlohr Bueso wrote:

> On Fri, 20 Jul 2018, Andrew Morton wrote:
>
> >On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso wrote:
> >
> >> Hi,
> >>
> >> Both patches replace saving+restoring interrupts when taking the
> >> ep->lock (now the waitqueue lock), with just disabling local irqs.
> >> This shows immediate performance benefits in patch 1 for an epoll
> >> workload running on Xen.
> >
> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
>
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.
>
> And at least considering ep_scan_ready_list(), the lock is taken/released
> twice, to deal with the ovflist when the ep->wq.lock is not held. To the
> point that it yields measurable results (see patch 1) across incremental
> thread counts.

Did you try measuring it on bare hardware?

> >> The main concern we need to have with this
> >> sort of changes in epoll is the ep_poll_callback() which is passed
> >> to the wait queue wakeup and is done very often under irq context,
> >> this patch does not touch this call.
> >
> >Yeah, these changes are scary.  For the code as it stands now, and for
> >the code as it evolves.
>
> Yes which is why I've been throwing lots of epoll workloads at it.

I'm sure.  It's the "as it evolves" that is worrisome, and has caught us
in the past.

> >I'd have more confidence if we had some warning mechanism if we run
> >spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
> >afaict we don't have that.  Probably for good reasons - I wonder what
> >they are?

Well ignored ;)

We could open-code it locally.  Add a couple of
WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
Xen but surely just reading the thing isn't too expensive?
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Fri, 20 Jul 2018, Andrew Morton wrote:

>On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso wrote:
>
>> Hi,
>>
>> Both patches replace saving+restoring interrupts when taking the
>> ep->lock (now the waitqueue lock), with just disabling local irqs.
>> This shows immediate performance benefits in patch 1 for an epoll
>> workload running on Xen.
>
>I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>than spin_lock_irq()?  Relative to all the other stuff those functions
>are doing?  If so, how come?  Some architectural thing makes
>local_irq_save() much more costly than local_irq_disable()?

For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
the cost of Xen is much higher.

And at least considering ep_scan_ready_list(), the lock is taken/released
twice, to deal with the ovflist when the ep->wq.lock is not held. To the
point that it yields measurable results (see patch 1) across incremental
thread counts.

>> The main concern we need to have with this
>> sort of changes in epoll is the ep_poll_callback() which is passed
>> to the wait queue wakeup and is done very often under irq context,
>> this patch does not touch this call.
>
>Yeah, these changes are scary.  For the code as it stands now, and for
>the code as it evolves.

Yes which is why I've been throwing lots of epoll workloads at it.

>I'd have more confidence if we had some warning mechanism if we run
>spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
>afaict we don't have that.  Probably for good reasons - I wonder what
>they are?
>
>> Patches have been tested pretty heavily with the customer workload,
>> microbenchmarks, ltp testcases and two high level workloads that
>> use epoll under the hood: nginx and libevent benchmarks.
>>
>> Details are in the individual patches.
>>
>> Applies on top of mmotd.
>
>Please convince me about the performance benefits?

As for numbers, I only have patch 1.

Thanks,
Davidlohr
Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible
On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso wrote:

> Hi,
>
> Both patches replace saving+restoring interrupts when taking the
> ep->lock (now the waitqueue lock), with just disabling local irqs.
> This shows immediate performance benefits in patch 1 for an epoll
> workload running on Xen.

I'm surprised.  Is spin_lock_irqsave() significantly more expensive
than spin_lock_irq()?  Relative to all the other stuff those functions
are doing?  If so, how come?  Some architectural thing makes
local_irq_save() much more costly than local_irq_disable()?

> The main concern we need to have with this
> sort of changes in epoll is the ep_poll_callback() which is passed
> to the wait queue wakeup and is done very often under irq context,
> this patch does not touch this call.

Yeah, these changes are scary.  For the code as it stands now, and for
the code as it evolves.

I'd have more confidence if we had some warning mechanism if we run
spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
afaict we don't have that.  Probably for good reasons - I wonder what
they are?

> Patches have been tested pretty heavily with the customer workload,
> microbenchmarks, ltp testcases and two high level workloads that
> use epoll under the hood: nginx and libevent benchmarks.
>
> Details are in the individual patches.
>
> Applies on top of mmotd.

Please convince me about the performance benefits?
[PATCH -next 0/2] fs/epoll: loosen irq safety when possible
Hi,

Both patches replace saving+restoring interrupts when taking the
ep->lock (now the waitqueue lock), with just disabling local irqs.
This shows immediate performance benefits in patch 1 for an epoll
workload running on Xen. The main concern we need to have with this
sort of change in epoll is the ep_poll_callback() which is passed
to the wait queue wakeup and is done very often under irq context;
this patch does not touch that call.

Patches have been tested pretty heavily with the customer workload,
microbenchmarks, ltp testcases and two high level workloads that
use epoll under the hood: nginx and libevent benchmarks.

Details are in the individual patches.

Applies on top of mmotd.

Thanks!

Davidlohr Bueso (2):
  fs/epoll: loosen irq safety in ep_scan_ready_list()
  fs/epoll: loosen irq safety in epoll_insert() and epoll_remove()

 fs/eventpoll.c | 23 ++++++++++-----------
 1 file changed, 10 insertions(+), 13 deletions(-)

-- 
2.16.4