Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-09-06 Thread Davidlohr Bueso

On Thu, 06 Sep 2018, Peter Zijlstra wrote:
>On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
>> On Fri, 20 Jul 2018, Andrew Morton wrote:
>> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>> >than spin_lock_irq()?  Relative to all the other stuff those functions
>> >are doing?  If so, how come?  Some architectural thing makes
>> >local_irq_save() much more costly than local_irq_disable()?
>>
>> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
>> the cost of Xen is much higher.
>
>Xen is a moot argument. IIRC the point is that POPF (as used by
>*irqrestore()) is a very expensive operation because it changes all
>flags and thus has very 'difficult' instruction dependencies, killing
>the front end reorder and generating a giant bubble in the pipeline.
>
>Similarly, I suppose PUSHF is an expensive instruction because it needs
>all the flags 'stable' and thus needs to wait for a fair number of prior
>instructions to retire before it can get on with it.
>
>Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
>because the latter only has dependencies on instructions that muck about
>with IF -- not that many.


ack.

In fact it turns out that my Xen numbers for this patch were actually
native (POPF), not xen_restore_fl(): the guest was running HVM, not
paravirt.

Thanks,
Davidlohr


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-09-06 Thread Peter Zijlstra
On Fri, Jul 20, 2018 at 01:05:59PM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:

> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
> 
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.

Xen is a moot argument. IIRC the point is that POPF (as used by
*irqrestore()) is a very expensive operation because it changes all
flags and thus has very 'difficult' instruction dependencies, killing
the front end reorder and generating a giant bubble in the pipeline.

Similarly, I suppose PUSHF is an expensive instruction because it needs
all the flags 'stable' and thus needs to wait for a fair number of prior
instructions to retire before it can get on with it.

Combined the whole PUSHF + POPF is _far_ more expensive than STI + CLI,
because the latter only has dependencies on instructions that muck about
with IF -- not that many.
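
To make that concrete, here is roughly what the x86 primitives in question
look like (a simplified sketch modeled on arch/x86/include/asm/irqflags.h
of that era, not the verbatim source):

/* irq save/restore path: PUSHF reads EFLAGS, POPF rewrites all of it */
static inline unsigned long native_save_fl(void)
{
	unsigned long flags;

	asm volatile("pushf ; pop %0" : "=rm" (flags) : : "memory");
	return flags;
}

static inline void native_restore_fl(unsigned long flags)
{
	asm volatile("push %0 ; popf" : : "g" (flags) : "memory", "cc");
}

/* plain disable/enable path: CLI/STI only touch the IF flag */
static inline void native_irq_disable(void)
{
	asm volatile("cli" : : : "memory");
}

static inline void native_irq_enable(void)
{
	asm volatile("sti" : : : "memory");
}

spin_lock_irqsave()/spin_unlock_irqrestore() bottom out in the PUSHF/POPF
pair above, whereas spin_lock_irq()/spin_unlock_irq() only need CLI/STI.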


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-24 Thread Andrew Morton
On Sat, 21 Jul 2018 11:31:27 -0700 Davidlohr Bueso  wrote:

> On Sat, 21 Jul 2018, Peter Zijlstra wrote:
> 
> >On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
> >> On Fri, 20 Jul 2018, Andrew Morton wrote:
> >>
> >> > We could open-code it locally.  Add a couple of
> >> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
> >> > Xen but surely just reading the thing isn't too expensive?
> >>
> >> We could also pass on the responsibility to lockdep and just use
> >> lockdep_assert_irqs_enabled(). But I guess that would be less effective
> >> than to just open code it in epoll without lockdep -- note that over 80
> >> places in the kernel do this.
> >
> >The lockdep thing is relatively recent. I think someone proposed to go
> >replace a bunch of the open-coded ones at some point.
> 
> For the open coded checks, I'm seeing a small (1-2% ish) cost for bare
> metal on workload 1). I don't see (via code inspection) any additional
> overhead in xen either. While negligible in the overall scheme of things,
> I do like the idea of lockdep handling it nonetheless.

It's good that the overhead goes away when lockdep is disabled.  That's
a big advantage over open-coding it.

> I can add the open coded version if people really feel that it would catch
> more bugs (no lockdep users out there in production afaik :) in the long
> term; but if lockdep is where things are headed...
> 
> ...
>
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -670,6 +670,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>   struct epitem *epi, *nepi;
>   LIST_HEAD(txlist);
>  
> + /* must not be called with irqs off */
> + lockdep_assert_irqs_enabled();

Although I be the humble comment's most avid fan, these ones are too obvious
to be worth the space ;)

Would you survive if I zapped 'em?

--- a/fs/eventpoll.c~fs-epoll-robustify-irq-safety-with-lockdep_assert_irqs_enabled-fix
+++ a/fs/eventpoll.c
@@ -670,7 +670,6 @@ static __poll_t ep_scan_ready_list(struc
struct epitem *epi, *nepi;
LIST_HEAD(txlist);
 
-   /* must not be called with irqs off */
lockdep_assert_irqs_enabled();
 
/*
@@ -767,7 +766,6 @@ static int ep_remove(struct eventpoll *e
 {
struct file *file = epi->ffd.file;
 
-   /* must not be called with irqs off */
lockdep_assert_irqs_enabled();
 
/*
@@ -1418,7 +1416,6 @@ static int ep_insert(struct eventpoll *e
struct epitem *epi;
struct ep_pqueue epq;
 
-   /* must not be called with irqs off */
lockdep_assert_irqs_enabled();
 
user_watches = atomic_long_read(&ep->user->epoll_watches);
@@ -1549,7 +1546,6 @@ static int ep_modify(struct eventpoll *e
int pwake = 0;
poll_table pt;
 
-   /* must not be called with irqs off */
lockdep_assert_irqs_enabled();
 
init_poll_funcptr(&pt, NULL);
_
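
For reference, lockdep_assert_irqs_enabled() costs nothing in production
builds because it compiles to a no-op without lockdep; a sketch of the
include/linux/lockdep.h definition from that era (the exact config gates
may differ):

#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING)
/* warn once if the assertion fails while lockdep is tracking irq state */
# define lockdep_assert_irqs_enabled()	do {				\
		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
			  !current->hardirqs_enabled,			\
			  "IRQs not enabled as expected\n");		\
	} while (0)
#else
/* without lockdep the check vanishes entirely */
# define lockdep_assert_irqs_enabled()	do { } while (0)
#endif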



Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-21 Thread Davidlohr Bueso

On Sat, 21 Jul 2018, Peter Zijlstra wrote:
>On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
>> On Fri, 20 Jul 2018, Andrew Morton wrote:
>> > We could open-code it locally.  Add a couple of
>> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>> > Xen but surely just reading the thing isn't too expensive?
>>
>> We could also pass on the responsibility to lockdep and just use
>> lockdep_assert_irqs_enabled(). But I guess that would be less effective
>> than to just open code it in epoll without lockdep -- note that over 80
>> places in the kernel do this.
>
>The lockdep thing is relatively recent. I think someone proposed to go
>replace a bunch of the open-coded ones at some point.


For the open coded checks, I'm seeing a small (1-2% ish) cost for bare
metal on workload 1). I don't see (via code inspection) any additional
overhead in xen either. While negligible in the overall scheme of things,
I do like the idea of lockdep handling it nonetheless.

I can add the open coded version if people really feel that it would catch
more bugs (no lockdep users out there in production afaik :) in the long
term; but if lockdep is where things are headed...

Thanks,
Davidlohr

---8<
[PATCH -next 3/2] fs/epoll: robustify irq safety with lockdep_assert_irqs_enabled()

Sprinkle lockdep_assert_irqs_enabled() checks in the functions that
do not save and restore interrupts when dealing with the ep->wq.lock.
These are ep_scan_ready_list() and those called by epoll_ctl():
ep_insert, ep_modify and ep_remove.

Signed-off-by: Davidlohr Bueso 
---
fs/eventpoll.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1b1abc461fc0..97b9b73dfec8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -670,6 +670,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
struct epitem *epi, *nepi;
LIST_HEAD(txlist);

+   /* must not be called with irqs off */
+   lockdep_assert_irqs_enabled();
+
/*
 * We need to lock this because we could be hit by
 * eventpoll_release_file() and epoll_ctl().
@@ -764,6 +767,9 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
{
struct file *file = epi->ffd.file;

+   /* must not be called with irqs off */
+   lockdep_assert_irqs_enabled();
+
/*
 * Removes poll wait queue hooks.
 */
@@ -1412,6 +1418,9 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
struct epitem *epi;
struct ep_pqueue epq;

+   /* must not be called with irqs off */
+   lockdep_assert_irqs_enabled();
+
user_watches = atomic_long_read(&ep->user->epoll_watches);
if (unlikely(user_watches >= max_user_watches))
return -ENOSPC;
@@ -1540,6 +1549,9 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
int pwake = 0;
poll_table pt;

+   /* must not be called with irqs off */
+   lockdep_assert_irqs_enabled();
+
init_poll_funcptr(&pt, NULL);

/*
--
2.16.4



Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-21 Thread Peter Zijlstra
On Sat, Jul 21, 2018 at 10:21:20AM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:
> 
> > We could open-code it locally.  Add a couple of
> > WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
> > Xen but surely just reading the thing isn't too expensive?
> 
> We could also pass on the responsibility to lockdep and just use
> lockdep_assert_irqs_enabled(). But I guess that would be less effective
> than to just open code it in epoll without lockdep -- note that over 80
> places in the kernel do this.

The lockdep thing is relatively recent. I think someone proposed to go
replace a bunch of the open-coded ones at some point.


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-21 Thread Davidlohr Bueso

On Fri, 20 Jul 2018, Andrew Morton wrote:
>We could open-code it locally.  Add a couple of
>WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>Xen but surely just reading the thing isn't too expensive?

We could also pass on the responsibility to lockdep and just use
lockdep_assert_irqs_enabled(). But I guess that would be less effective
than to just open code it in epoll without lockdep -- note that over 80
places in the kernel do this.

Thanks,
Davidlohr


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-20 Thread Davidlohr Bueso

On Fri, 20 Jul 2018, Andrew Morton wrote:
>Did you try measuring it on bare hardware?

I did and wasn't expecting much difference.

These are bare-metal numbers for a 2-socket, 40-core (HT) IvyBridge on a few
workloads. Unfortunately I don't have a Xen environment, and for the Xen
results I do have (the numbers in patch 1) I don't have the actual workload,
so I cannot compare them directly.

1) Different configurations were used for an epoll_wait (pipes io) microbench
(http://linux-scalability.org/epoll/epoll-test.c). It shows around a 7-10%
improvement in the overall total number of times the epoll_wait() loop runs
when using both regular and nested epolls -- very raw numbers, but measurable
nonetheless (a minimal sketch of this kind of loop follows below, after 2)).

# threads   vanilla dirty
1   1677717 1805587
2   1660510 1854064
4   1610184 1805484
8   1577696 1751222
16  1568837 1725299
32  1291532 1378463
64   752584  787368

Note that stddev is pretty small.

2) Another pipe test, which shows no real measurable improvement.
(http://www.xmailserver.org/linux-patches/pipetest.c)
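
As referenced in 1) above, a minimal sketch of a loop-counting epoll_wait()
microbenchmark (an illustration only, not the linked epoll-test.c):

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	unsigned long loops = 0;
	int pfd[2], epfd;
	time_t end;
	char c;

	if (pipe(pfd))
		return 1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		return 1;
	ev.data.fd = pfd[0];
	epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev);

	end = time(NULL) + 10;			/* run for ~10 seconds */
	while (time(NULL) < end) {
		if (write(pfd[1], "x", 1) != 1)	/* make the read end ready */
			break;
		if (epoll_wait(epfd, &ev, 1, -1) == 1) {
			read(pfd[0], &c, 1);	/* drain so the fd goes idle */
			loops++;
		}
	}
	printf("epoll_wait loops: %lu\n", loops);
	return 0;
}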


>> >I'd have more confidence if we had some warning mechanism if we run
>> >spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
>> >afaict we don't have that.  Probably for good reasons - I wonder what
>> >they are?
>
>Well ignored ;)
>
>We could open-code it locally.  Add a couple of
>WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
>Xen but surely just reading the thing isn't too expensive?


I agree. I'll see what I can come up with and also ask the customer to test
it in their setup. Bare metal would also need some new numbers, I guess.

Thanks,
Davidlohr


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-20 Thread Andrew Morton
On Fri, 20 Jul 2018 13:05:59 -0700 Davidlohr Bueso  wrote:

> On Fri, 20 Jul 2018, Andrew Morton wrote:
> 
> >On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso  wrote:
> >
> >> Hi,
> >>
> >> Both patches replace saving+restoring interrupts when taking the
> >> ep->lock (now the waitqueue lock), with just disabling local irqs.
> >> This shows immediate performance benefits in patch 1 for an epoll
> >> workload running on Xen.
> >
> >I'm surprised.  Is spin_lock_irqsave() significantly more expensive
> >than spin_lock_irq()?  Relative to all the other stuff those functions
> >are doing?  If so, how come?  Some architectural thing makes
> >local_irq_save() much more costly than local_irq_disable()?
> 
> For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
> the cost of Xen is much higher.
> 
> And at least considering ep_scan_ready_list(), the lock is taken/released
> twice, to deal with the ovflist when the ep->wq.lock is not held. To the
> point that it yields measurable results (see patch 1) across incremental
> thread counts.

Did you try measuring it on bare hardware?

> >
> >> The main concern we need to have with this
> >> sort of changes in epoll is the ep_poll_callback() which is passed
> >> to the wait queue wakeup and is done very often under irq context,
> >> this patch does not touch this call.
> >
> >Yeah, these changes are scary.  For the code as it stands now, and for
> >the code as it evolves.
> 
> Yes which is why I've been throwing lots of epoll workloads at it.

I'm sure.  It's the "as it evolves" that is worrisome, and has caught
us in the past.

> >
> >I'd have more confidence if we had some warning mechanism if we run
> >spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
> >afaict we don't have that.  Probably for good reasons - I wonder what
> >they are?

Well ignored ;)

We could open-code it locally.  Add a couple of
WARN_ON_ONCE(irqs_disabled())?  That might need re-benchmarking with
Xen but surely just reading the thing isn't too expensive?
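
Something like the following is what the open-coded check would look like
(sketch only, picking ep_scan_ready_list() as one example site):

static __poll_t ep_scan_ready_list(struct eventpoll *ep, ...)
{
	/*
	 * These paths take ep->wq.lock with plain spin_lock_irq(),
	 * so being called with irqs already disabled is probably a
	 * bug: the unlock would spuriously re-enable interrupts.
	 */
	WARN_ON_ONCE(irqs_disabled());
	...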




Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-20 Thread Davidlohr Bueso

On Fri, 20 Jul 2018, Andrew Morton wrote:
>On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso  wrote:
>
>> Hi,
>>
>> Both patches replace saving+restoring interrupts when taking the
>> ep->lock (now the waitqueue lock), with just disabling local irqs.
>> This shows immediate performance benefits in patch 1 for an epoll
>> workload running on Xen.
>
>I'm surprised.  Is spin_lock_irqsave() significantly more expensive
>than spin_lock_irq()?  Relative to all the other stuff those functions
>are doing?  If so, how come?  Some architectural thing makes
>local_irq_save() much more costly than local_irq_disable()?

For example, if you compare x86 native_restore_fl() to xen_restore_fl(),
the cost of Xen is much higher.

And at least considering ep_scan_ready_list(), the lock is taken/released
twice, to deal with the ovflist when the ep->wq.lock is not held. To the
point that it yields measurable results (see patch 1) across incremental
thread counts.

>> The main concern we need to have with this
>> sort of change in epoll is ep_poll_callback(), which is passed
>> to the wait queue wakeup and is run very often in irq context;
>> this series does not touch that call.
>
>Yeah, these changes are scary.  For the code as it stands now, and for
>the code as it evolves.

Yes, which is why I've been throwing lots of epoll workloads at it.

>I'd have more confidence if we had some warning mechanism if we run
>spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
>afaict we don't have that.  Probably for good reasons - I wonder what
>they are?

>> Patches have been tested pretty heavily with the customer workload,
>> microbenchmarks, ltp testcases and two high level workloads that
>> use epoll under the hood: nginx and libevent benchmarks.
>>
>> Details are in the individual patches.
>>
>> Applies on top of mmotd.
>
>Please convince me about the performance benefits?

As for numbers, I only have patch 1.

Thanks,
Davidlohr


Re: [PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-20 Thread Andrew Morton
On Fri, 20 Jul 2018 10:29:54 -0700 Davidlohr Bueso  wrote:

> Hi,
> 
> Both patches replace saving+restoring interrupts when taking the
> ep->lock (now the waitqueue lock), with just disabling local irqs.
> This shows immediate performance benefits in patch 1 for an epoll
> workload running on Xen.

I'm surprised.  Is spin_lock_irqsave() significantly more expensive
than spin_lock_irq()?  Relative to all the other stuff those functions
are doing?  If so, how come?  Some architectural thing makes
local_irq_save() much more costly than local_irq_disable()?

> The main concern we need to have with this
> sort of change in epoll is ep_poll_callback(), which is passed
> to the wait queue wakeup and is run very often in irq context;
> this series does not touch that call.

Yeah, these changes are scary.  For the code as it stands now, and for
the code as it evolves.

I'd have more confidence if we had some warning mechanism if we run
spin_lock_irq() when IRQs are disabled, which is probably-a-bug.  But
afaict we don't have that.  Probably for good reasons - I wonder what
they are?

> Patches have been tested pretty heavily with the customer workload,
> microbenchmarks, ltp testcases and two high level workloads that
> use epoll under the hood: nginx and libevent benchmarks.
> 
> Details are in the individual patches.
> 
> Applies on top of mmotd.

Please convince me about the performance benefits?


[PATCH -next 0/2] fs/epoll: loosen irq safety when possible

2018-07-20 Thread Davidlohr Bueso
Hi,

Both patches replace saving+restoring interrupts when taking the
ep->lock (now the waitqueue lock), with just disabling local irqs.
This shows immediate performance benefits in patch 1 for an epoll
workload running on Xen. The main concern we need to have with this
sort of change in epoll is ep_poll_callback(), which is passed
to the wait queue wakeup and is run very often in irq context;
this series does not touch that call.
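
Concretely, the conversion is the following pattern (a sketch using the
ready-list splice in ep_scan_ready_list() as an example; the real hunks
are in the individual patches):

/* Before: save and restore the caller's irq state. */
unsigned long flags;

spin_lock_irqsave(&ep->wq.lock, flags);
list_splice_init(&ep->rdllist, &txlist);
spin_unlock_irqrestore(&ep->wq.lock, flags);

/* After: these paths always run with irqs enabled, so the
 * save/restore (PUSHF/POPF on x86) can be dropped. */
spin_lock_irq(&ep->wq.lock);
list_splice_init(&ep->rdllist, &txlist);
spin_unlock_irq(&ep->wq.lock);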

Patches have been tested pretty heavily with the customer workload,
microbenchmarks, ltp testcases and two high level workloads that
use epoll under the hood: nginx and libevent benchmarks.

Details are in the individual patches.

Applies on top of mmotd.

Thanks!

Davidlohr Bueso (2):
  fs/epoll: loosen irq safety in ep_scan_ready_list()
  fs/epoll: loosen irq safety in epoll_insert() and epoll_remove()

 fs/eventpoll.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

-- 
2.16.4