Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-07 Thread Will Deacon
On Fri, Apr 07, 2017 at 01:30:11AM +1000, Nicholas Piggin wrote:
> On Thu, 6 Apr 2017 15:13:53 +0100
> Will Deacon wrote:
> > On Thu, Apr 06, 2017 at 10:59:58AM +1000, Nicholas Piggin wrote:
> > > Thanks for taking a look. The default spin primitives should just
> > > continue to do the right
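
The "default spin primitives" point can be illustrated with the usual override-or-default header pattern. This is a sketch only: the hook names `spin_begin()`, `spin_cpu_relax()` and `spin_end()` and the helper `wait_for_flag()` are assumptions for illustration, not quoted from the patch. An architecture header defines whichever hooks it has a hint for (here, hypothetically, x86 defining only the in-loop hook as PAUSE), and a generic header supplies no-op or compiler-barrier defaults for the rest, so architectures that never port anything keep their current behavior.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical arch header: x86 overrides only the in-loop hook with
 * the PAUSE hint and leaves begin/end to the generic defaults. */
#if defined(__x86_64__) || defined(__i386__)
#define spin_cpu_relax() __asm__ __volatile__("pause" ::: "memory")
#endif

/* Hypothetical generic header: fill in whatever the arch left undefined.
 * Begin/end do nothing; the in-loop hook is just a compiler barrier. */
#ifndef spin_begin
#define spin_begin() do { } while (0)
#endif
#ifndef spin_cpu_relax
#define spin_cpu_relax() __asm__ __volatile__("" ::: "memory")
#endif
#ifndef spin_end
#define spin_end() do { } while (0)
#endif

/* Caller written once against the three hooks; returns how many times
 * it relaxed before *flag became non-zero. */
static unsigned long wait_for_flag(atomic_int *flag)
{
	unsigned long spins = 0;

	spin_begin();
	while (!atomic_load_explicit(flag, memory_order_acquire)) {
		spin_cpu_relax();
		spins++;
	}
	spin_end();
	return spins;
}
```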

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-07 Thread Nicholas Piggin
On Fri, 7 Apr 2017 11:43:49 +0200
Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 10:31:46AM -0700, Linus Torvalds wrote:
> > But maybe "monitor" is really cheap. I suspect it's microcoded,
> > though, which implies "no".
>
> On my IVB-EP (will also try on something newer):
>
> MONITOR

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-07 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 10:31:46AM -0700, Linus Torvalds wrote:
> But maybe "monitor" is really cheap. I suspect it's microcoded,
> though, which implies "no".

On my IVB-EP (will also try on something newer):

MONITOR  ~332 cycles
MWAIT    ~224 cycles (C0, explicitly invalidated MONITOR)

So yes,
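
Taking the IVB-EP figures above at face value, arming the monitor and executing one MWAIT costs roughly 332 + 224 = 556 cycles before any waiting happens. The helper below only does that arithmetic, plus a break-even estimate under a hypothetical cost model in which one plain spin iteration costs `spin_cycles` (a made-up parameter): a wait that ends in fewer iterations than the break-even count is unlikely to benefit from MONITOR/MWAIT at all.

```c
#include <assert.h>

/* IVB-EP figures quoted in the message above (cycles). */
#define MONITOR_CYCLES 332u
#define MWAIT_CYCLES   224u	/* C0, explicitly invalidated MONITOR */

/* Fixed cost paid before a MONITOR/MWAIT pair can start waiting. */
static unsigned monitor_mwait_overhead(void)
{
	return MONITOR_CYCLES + MWAIT_CYCLES;
}

/* Hypothetical model: if one plain spin iteration costs spin_cycles
 * (must be non-zero), a wait that ends in fewer iterations than the
 * returned value finishes before MONITOR/MWAIT has paid its overhead.
 * The division rounds up. */
static unsigned breakeven_spins(unsigned spin_cycles)
{
	unsigned cost = monitor_mwait_overhead();

	return (cost + spin_cycles - 1) / spin_cycles;
}
```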

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Nicholas Piggin
On Thu, 6 Apr 2017 12:41:52 -0700
Linus Torvalds wrote:
> On Thu, Apr 6, 2017 at 12:23 PM, Peter Zijlstra wrote:
> >
> > Something like so then. According to the SDM mwait is a no-op if we do
> > not execute monitor first. So this variant should get the first
> > iteration without expensive

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Linus Torvalds
On Thu, Apr 6, 2017 at 12:23 PM, Peter Zijlstra wrote:
>
> Something like so then. According to the SDM mwait is a no-op if we do
> not execute monitor first. So this variant should get the first
> iteration without expensive instructions.

No, the problem is that we *would* have executed a prior
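
The rest of Linus's reply is truncated here, so what follows is only a sketch of the general rule it points toward, not his proposal: do not rely on MWAIT being a no-op, but arm the monitor for the exact address immediately before waiting, and re-check the condition in between so a store that lands after the first check is not slept through. MONITOR and MWAIT cannot normally run in user space, so `arm_monitor()` and `wait_for_store()` are hypothetical stand-ins that only count calls (the second also simulates the remote store so a single-threaded test terminates); `wait_until_set()` is likewise a made-up name.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for MONITOR and MWAIT: they only count calls.
 * wait_for_store() also simulates the remote store that would wake a
 * real MWAIT, so the loop terminates in a single-threaded test. */
static unsigned arm_calls, wait_calls;
static atomic_int *simulated_writer;

static void arm_monitor(const atomic_int *addr)
{
	(void)addr;		/* real code: MONITOR on addr */
	arm_calls++;
}

static void wait_for_store(void)
{
	wait_calls++;		/* real code: MWAIT */
	atomic_store(simulated_writer, 1);
}

static int wait_until_set(atomic_int *flag)
{
	/* Fast path: one plain load, nothing armed. */
	int val = atomic_load_explicit(flag, memory_order_acquire);

	while (!val) {
		arm_monitor(flag);	/* arm for this address right now */
		/* Re-check after arming: a store that landed since the
		 * last load must not be slept through. */
		val = atomic_load_explicit(flag, memory_order_acquire);
		if (val)
			break;
		wait_for_store();
		val = atomic_load_explicit(flag, memory_order_acquire);
	}
	return val;
}
```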

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 10:31:46AM -0700, Linus Torvalds wrote:
> And we'd probably want to make it even more strict, in that some mwait
> implementations might simply not be very good for short waits.

Yeah, we need to find something that works; assuming it's beneficial at all on modern chips.
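
One common way to cope with a wait primitive that is poor for short waits is to poll cheaply for a bounded number of iterations and only then fall back to the costlier primitive. That spin-then-wait technique is swapped in here for illustration; it is not something proposed in this message. `spin_then_wait()`, `max_spins` and `fake_slow_wait()` are hypothetical names, and the slow wait is passed in as a callback so the fallback path can be exercised without threads.

```c
#include <assert.h>
#include <stdatomic.h>

#define spin_cpu_relax() __asm__ __volatile__("" ::: "memory")

/* Poll *flag cheaply for up to max_spins iterations, then switch to
 * slow_wait() (a stand-in for a costlier primitive such as mwait or a
 * yield) between further polls. Returns how many slow waits were used. */
static unsigned spin_then_wait(atomic_int *flag, unsigned max_spins,
			       void (*slow_wait)(atomic_int *))
{
	unsigned spins = 0, slow = 0;

	while (!atomic_load_explicit(flag, memory_order_acquire)) {
		if (spins < max_spins) {
			spin_cpu_relax();	/* short waits stay cheap */
			spins++;
		} else {
			slow_wait(flag);	/* long waits pay the fixed cost */
			slow++;
		}
	}
	return slow;
}

/* Test double: pretends the awaited store lands during the slow wait. */
static void fake_slow_wait(atomic_int *flag)
{
	atomic_store_explicit(flag, 1, memory_order_release);
}
```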

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Linus Torvalds
On Thu, Apr 6, 2017 at 9:36 AM, Peter Zijlstra wrote:
>
> Something like the below, which is ugly (because I couldn't be bothered
> to resolve the header recursion and thus duplicates the monitor/mwait
> functions) and broken (because it hard assumes the hardware can do
> monitor/mwait).

Yeah, I
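
The sketch is described as broken because it assumes MONITOR/MWAIT exist. A minimal user-space check, assuming GCC or Clang and their `<cpuid.h>` header, reads CPUID leaf 1 and tests ECX bit 3, the MONITOR/MWAIT feature bit; in-kernel code would more likely test `X86_FEATURE_MWAIT` via `boot_cpu_has()`. `cpu_has_monitor_mwait()` and `choose_wait()` are hypothetical helpers, and a set bit only says the hardware has the instructions, not that the calling context may execute them.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX bit 3: MONITOR/MWAIT supported. */
#define CPUID1_ECX_MONITOR (1u << 3)

/* Reports whether the CPU advertises MONITOR/MWAIT. This is only a
 * hardware capability bit; the instructions are typically restricted
 * to kernel mode even when it is set. */
static bool cpu_has_monitor_mwait(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;		/* leaf 1 not available */
	return (ecx & CPUID1_ECX_MONITOR) != 0;
#else
	return false;			/* not x86: no MONITOR/MWAIT */
#endif
}

enum wait_kind { WAIT_SPIN, WAIT_MONITOR_MWAIT };

/* Pick a strategy instead of assuming the hardware has the feature. */
static enum wait_kind choose_wait(void)
{
	return cpu_has_monitor_mwait() ? WAIT_MONITOR_MWAIT : WAIT_SPIN;
}
```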

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Peter Zijlstra
On Thu, Apr 06, 2017 at 08:16:19AM -0700, Linus Torvalds wrote:
> In theory x86 could use monitor/mwait for it too, in practice I think
> it tends to still be too high latency (because it was originally just
> designed for the idle loop). mwait got extended to actually be useful,
> but I'm not

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Nicholas Piggin
On Thu, 6 Apr 2017 15:13:53 +0100
Will Deacon wrote:
> Hi Nick,
>
> On Thu, Apr 06, 2017 at 10:59:58AM +1000, Nicholas Piggin wrote:
> > On Wed, 05 Apr 2017 07:01:57 -0700 (PDT)
> > David Miller wrote:
> >
> > > From: Nicholas Piggin
> > > Date: Tue, 4 Apr 2017 13:02:33 +1000

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Linus Torvalds
On Thu, Apr 6, 2017 at 7:13 AM, Will Deacon wrote:
>
> We've wrapped this up in the arm64 code as __cmpwait, and we use that
> to build smp_cond_load_acquire. It would be nice to use the same machinery
> for the conditional spinning here, unless you anticipate that we're only
> going to be
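
For readers unfamiliar with `smp_cond_load_acquire()`: it repeatedly loads `*ptr` until a condition on the loaded value (spelled `VAL`) holds, then returns that value with acquire ordering; on arm64, `__cmpwait` lets the loop wait in WFE until the watched location changes instead of spinning. The macro below is a user-space analogue written with C11 atomics and GNU statement expressions (GCC/Clang). `cond_load_acquire` is a hypothetical name, and it reproduces only the polling semantics, not the WFE-based waiting.

```c
#include <assert.h>
#include <stdatomic.h>

#define relax_hint() __asm__ __volatile__("" ::: "memory")

/*
 * User-space analogue of the smp_cond_load_acquire() idea: keep loading
 * *ptr with acquire ordering until cond_expr, which may refer to VAL
 * (the value just loaded), is true, then yield that value.
 */
#define cond_load_acquire(ptr, cond_expr) ({				\
	__typeof__(ptr) cla_ptr_ = (ptr);				\
	/* comma operator strips _Atomic: VAL is a plain copy */	\
	__typeof__((void)0, *cla_ptr_) VAL;				\
	for (;;) {							\
		VAL = atomic_load_explicit(cla_ptr_,			\
					   memory_order_acquire);	\
		if (cond_expr)						\
			break;						\
		relax_hint();						\
	}								\
	VAL;								\
})
```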

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-06 Thread Will Deacon
Hi Nick,

On Thu, Apr 06, 2017 at 10:59:58AM +1000, Nicholas Piggin wrote:
> On Wed, 05 Apr 2017 07:01:57 -0700 (PDT)
> David Miller wrote:
>
> > From: Nicholas Piggin
> > Date: Tue, 4 Apr 2017 13:02:33 +1000
> >
> > > On Mon, 3 Apr 2017 17:43:05 -0700
> > > Linus Torvalds wrote:

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-05 Thread Nicholas Piggin
On Wed, 05 Apr 2017 07:01:57 -0700 (PDT)
David Miller wrote:
> From: Nicholas Piggin
> Date: Tue, 4 Apr 2017 13:02:33 +1000
>
> > On Mon, 3 Apr 2017 17:43:05 -0700
> > Linus Torvalds wrote:
> >
> >> But that depends on architectures having some pattern that we *can*
> >> abstract. Would

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-05 Thread David Miller
From: Nicholas Piggin
Date: Tue, 4 Apr 2017 13:02:33 +1000

> On Mon, 3 Apr 2017 17:43:05 -0700
> Linus Torvalds wrote:
>
>> But that depends on architectures having some pattern that we *can*
>> abstract. Would some "begin/in-loop/end" pattern like the above be
>> sufficient?
>
> Yes.

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-03 Thread Nicholas Piggin
On Tue, 4 Apr 2017 13:02:33 +1000
Nicholas Piggin wrote:
> On Mon, 3 Apr 2017 17:43:05 -0700
> Linus Torvalds wrote:
>
> > But that depends on architectures having some pattern that we *can*
> > abstract. Would some "begin/in-loop/end" pattern like the above be
> > sufficient?
>
> Yes.

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-03 Thread Nicholas Piggin
On Mon, 3 Apr 2017 17:43:05 -0700
Linus Torvalds wrote:
> On Mon, Apr 3, 2017 at 4:50 PM, Nicholas Piggin wrote:
> > If you have any ideas, I'd be open to them.
>
> So the idea would be that maybe we can just make those things
> explicit. IOW, instead of having that magical looping
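
The suggestion here is to make the architecture hooks explicit rather than hiding them inside a looping construct. Callers that simply wait on one condition could still get a compact form by wrapping the explicit hooks, as in the sketch below; `spin_until_cond()`, the three hook names and `ready_on_third_poll()` are assumptions for illustration. When the condition is already true, the wrapper skips begin/end entirely, which matters on an architecture where those hooks change thread priority.

```c
#include <assert.h>

/* Generic stand-ins for the three explicit hooks (hypothetical names). */
#define spin_begin()     do { } while (0)
#define spin_cpu_relax() __asm__ __volatile__("" ::: "memory")
#define spin_end()       do { } while (0)

/*
 * Hypothetical convenience wrapper built from the explicit hooks.
 * If cond is already true, begin/end are skipped; otherwise begin runs
 * once, relax runs between re-checks of cond, and end runs once.
 */
#define spin_until_cond(cond)				\
do {							\
	if (!(cond)) {					\
		spin_begin();				\
		do {					\
			spin_cpu_relax();		\
		} while (!(cond));			\
		spin_end();				\
	}						\
} while (0)

/* Usage example: a condition that becomes true on its third check. */
static int polls;

static int ready_on_third_poll(void)
{
	return ++polls >= 3;
}
```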

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-03 Thread Linus Torvalds
On Mon, Apr 3, 2017 at 4:50 PM, Nicholas Piggin wrote:
>
> POWER does not have an instruction like pause. We can only set current
> thread priority, and current implementations do something like allocate
> issue cycles to threads based on relative priorities. So there should
> be at least one or
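
Because POWER shares issue cycles by thread priority rather than offering a per-iteration pause, the natural mapping is to lower priority once on loop entry and restore it once on exit. The sketch assumes the Power ISA priority no-ops `or 1,1,1` (low) and `or 2,2,2` (medium), which the powerpc kernel exposes as `HMT_low()` and `HMT_medium()`. The `spin_*` hook names and `wait_nonzero()` are hypothetical, and on other architectures the hooks fall back to a no-op and a compiler barrier.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__powerpc64__)
/* Thread-priority no-ops: "or 1,1,1" = low, "or 2,2,2" = medium. */
#define spin_begin()     __asm__ __volatile__("or 1,1,1" ::: "memory")
#define spin_cpu_relax() __asm__ __volatile__("" ::: "memory")
#define spin_end()       __asm__ __volatile__("or 2,2,2" ::: "memory")
#else
/* Elsewhere: no priority control, just a compiler barrier in the loop. */
#define spin_begin()     do { } while (0)
#define spin_cpu_relax() __asm__ __volatile__("" ::: "memory")
#define spin_end()       do { } while (0)
#endif

static int wait_nonzero(atomic_int *v)
{
	int val;

	spin_begin();		/* drop priority once, on entry */
	while ((val = atomic_load_explicit(v, memory_order_acquire)) == 0)
		spin_cpu_relax();	/* nothing priority-related per iteration */
	spin_end();		/* restore priority once, on exit */
	return val;
}
```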

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-03 Thread Nicholas Piggin
On Mon, 3 Apr 2017 08:31:30 -0700
Linus Torvalds wrote:
> On Mon, Apr 3, 2017 at 1:13 AM, Nicholas Piggin wrote:
> >
> > The loops have some restrictions on what can be used, but they are
> > intended to be small and simple so it's not generally a problem:
> > - Don't use cpu_relax.

Re: [RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-03 Thread Linus Torvalds
On Mon, Apr 3, 2017 at 1:13 AM, Nicholas Piggin wrote:
>
> The loops have some restrictions on what can be used, but they are
> intended to be small and simple so it's not generally a problem:
> - Don't use cpu_relax.
> - Don't use return or goto.
> - Don't use sleeping or spinning primitives.
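
Assuming the explicit begin/in-loop/end form, the "no return or goto" rule follows from the pairing: leaving the loop early would skip `spin_end()`, and on an architecture where `spin_begin()` lowers thread priority the CPU would keep running at low priority. The sketch below shows a shape that respects the rule, recording the result and using `break` so control always reaches `spin_end()`. The hooks and `wait_flag_bounded()` are hypothetical, instrumented with a `spin_depth` counter purely so the pairing can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical instrumented hooks: spin_depth returns to zero only if
 * every spin_begin() was matched by a spin_end(). */
static int spin_depth;
#define spin_begin()     (spin_depth++)
#define spin_cpu_relax() __asm__ __volatile__("" ::: "memory")
#define spin_end()       (spin_depth--)

/* Wait for *flag, giving up after budget polls. The result is recorded
 * and the loop exits with break, never return, so spin_end() always runs. */
static bool wait_flag_bounded(atomic_int *flag, unsigned long budget)
{
	bool ok = false;

	spin_begin();
	while (budget--) {
		if (atomic_load_explicit(flag, memory_order_acquire)) {
			ok = true;
			break;	/* not "return true": spin_end() must run */
		}
		spin_cpu_relax();
	}
	spin_end();
	return ok;
}
```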

[RFC][PATCH] spin loop arch primitives for busy waiting

2017-04-03 Thread Nicholas Piggin
Hi,

I would like to revisit this again and see if people are opposed to this arch primitive. We have attributed cases of suboptimal performance on real customer workloads to this, so I'd like to find a solution.

Since last posting, I promised the s390 people I'd consider hypervisor yield