On Fri, Apr 07, 2017 at 01:30:11AM +1000, Nicholas Piggin wrote:
> On Thu, 6 Apr 2017 15:13:53 +0100
> Will Deacon wrote:
> > On Thu, Apr 06, 2017 at 10:59:58AM +1000, Nicholas Piggin wrote:
> > > Thanks for taking a look. The default spin primitives should just
> > > continue to do the right thin
On Fri, 7 Apr 2017 11:43:49 +0200
Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 10:31:46AM -0700, Linus Torvalds wrote:
> > But maybe "monitor" is really cheap. I suspect it's microcoded,
> > though, which implies "no".
>
> On my IVB-EP (will also try on something newer):
>
> MONITOR
On Thu, Apr 06, 2017 at 10:31:46AM -0700, Linus Torvalds wrote:
> But maybe "monitor" is really cheap. I suspect it's microcoded,
> though, which implies "no".
On my IVB-EP (will also try on something newer):
MONITOR ~332 cycles
MWAIT ~224 cycles (C0, explicitly invalidated MONITOR)
So yes, ex
On Thu, 6 Apr 2017 12:41:52 -0700
Linus Torvalds wrote:
> On Thu, Apr 6, 2017 at 12:23 PM, Peter Zijlstra wrote:
> >
> > Something like so then. According to the SDM mwait is a no-op if we do
> > not execute monitor first. So this variant should get the first
> > iteration without expensive inst
On Thu, Apr 6, 2017 at 12:23 PM, Peter Zijlstra wrote:
>
> Something like so then. According to the SDM mwait is a no-op if we do
> not execute monitor first. So this variant should get the first
> iteration without expensive instructions.
No, the problem is that we *would* have executed a prior
On Thu, Apr 06, 2017 at 10:31:46AM -0700, Linus Torvalds wrote:
> And we'd probably want to make it even more strict, in that some mwait
> implementations might simply not be very good for short waits.
Yeah, we need to find something that works; assuming it's beneficial at
all on modern chips.
On Thu, Apr 6, 2017 at 9:36 AM, Peter Zijlstra wrote:
>
> Something like the below, which is ugly (because I couldn't be bothered
> to resolve the header recursion and thus duplicates the monitor/mwait
> functions) and broken (because it hard assumes the hardware can do
> monitor/mwait).
Yeah, I
On Thu, Apr 06, 2017 at 08:16:19AM -0700, Linus Torvalds wrote:
> In theory x86 could use monitor/mwait for it too, in practice I think
> it tends to still be too high latency (because it was originally just
> designed for the idle loop). mwait got extended to actually be useful,
> but I'm not sur
On Thu, 6 Apr 2017 15:13:53 +0100
Will Deacon wrote:
> Hi Nick,
>
> On Thu, Apr 06, 2017 at 10:59:58AM +1000, Nicholas Piggin wrote:
> > On Wed, 05 Apr 2017 07:01:57 -0700 (PDT)
> > David Miller wrote:
> >
> > > From: Nicholas Piggin
> > > Date: Tue, 4 Apr 2017 13:02:33 +1000
> > >
> > >
On Thu, Apr 6, 2017 at 7:13 AM, Will Deacon wrote:
>
> We've wrapped this up in the arm64 code as __cmpwait, and we use that
> to build smp_cond_load_acquire. It would be nice to use the same machinery
> for the conditional spinning here, unless you anticipate that we're only
> going to be spinnin
Hi Nick,
On Thu, Apr 06, 2017 at 10:59:58AM +1000, Nicholas Piggin wrote:
> On Wed, 05 Apr 2017 07:01:57 -0700 (PDT)
> David Miller wrote:
>
> > From: Nicholas Piggin
> > Date: Tue, 4 Apr 2017 13:02:33 +1000
> >
> > > On Mon, 3 Apr 2017 17:43:05 -0700
> > > Linus Torvalds wrote:
> > >
> >
On Wed, 05 Apr 2017 07:01:57 -0700 (PDT)
David Miller wrote:
> From: Nicholas Piggin
> Date: Tue, 4 Apr 2017 13:02:33 +1000
>
> > On Mon, 3 Apr 2017 17:43:05 -0700
> > Linus Torvalds wrote:
> >
> >> But that depends on architectures having some pattern that we *can*
> >> abstract. Would som
From: Nicholas Piggin
Date: Tue, 4 Apr 2017 13:02:33 +1000
> On Mon, 3 Apr 2017 17:43:05 -0700
> Linus Torvalds wrote:
>
>> But that depends on architectures having some pattern that we *can*
>> abstract. Would some "begin/in-loop/end" pattern like the above be
>> sufficient?
>
> Yes. begin/in
On Tue, 4 Apr 2017 13:02:33 +1000
Nicholas Piggin wrote:
> On Mon, 3 Apr 2017 17:43:05 -0700
> Linus Torvalds wrote:
>
> > But that depends on architectures having some pattern that we *can*
> > abstract. Would some "begin/in-loop/end" pattern like the above be
> > sufficient?
>
> Yes. begi
On Mon, 3 Apr 2017 17:43:05 -0700
Linus Torvalds wrote:
> On Mon, Apr 3, 2017 at 4:50 PM, Nicholas Piggin wrote:
> > If you have any ideas, I'd be open to them.
>
> So the idea would be that maybe we can just make those things
> explicit. IOW, instead of having that magical looping construct
On Mon, Apr 3, 2017 at 4:50 PM, Nicholas Piggin wrote:
>
> POWER does not have an instruction like pause. We can only set current
> thread priority, and current implementations do something like allocate
> issue cycles to threads based on relative priorities. So there should
> be at least one or t
On Mon, 3 Apr 2017 08:31:30 -0700
Linus Torvalds wrote:
> On Mon, Apr 3, 2017 at 1:13 AM, Nicholas Piggin wrote:
> >
> > The loops have some restrictions on what can be used, but they are
> > intended to be small and simple so it's not generally a problem:
> > - Don't use cpu_relax.
> > - Don'
On Mon, Apr 3, 2017 at 1:13 AM, Nicholas Piggin wrote:
>
> The loops have some restrictions on what can be used, but they are
> intended to be small and simple so it's not generally a problem:
> - Don't use cpu_relax.
> - Don't use return or goto.
> - Don't use sleeping or spinning primitives.
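For readers following the thread, the restrictions quoted above and the
"begin/in-loop/end" pattern discussed elsewhere in it can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
the names spin_begin, spin_relax, spin_end and spin_wait_flag are
hypothetical and are not taken from the patch, the hooks are stubs that
only count nesting, and C11 atomics stand in for whatever the kernel would
use. The point is that the loop body exits with break rather than return
or goto, so the end hook always runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks, for illustration only; not the patch's API.
 * On an arch with thread priorities, begin would lower priority once,
 * relax would be the per-iteration hint, end would restore priority. */
static int spin_depth; /* bookkeeping for this single-threaded sketch */

static inline void spin_begin(void) { spin_depth++; }
static inline void spin_relax(void) { /* no-op stand-in */ }
static inline void spin_end(void)   { spin_depth--; }

/* Poll *flag up to max_iters times.
 * Returns true if the flag was observed set, false on giving up. */
static bool spin_wait_flag(atomic_bool *flag, unsigned long max_iters)
{
	bool seen = false;

	assert(flag != NULL);

	spin_begin();
	for (unsigned long i = 0; i < max_iters; i++) {
		if (atomic_load_explicit(flag, memory_order_acquire)) {
			seen = true;
			break; /* break, not return: spin_end() must still run */
		}
		spin_relax();
	}
	spin_end();

	return seen;
}
```

Bounding the loop with max_iters is a choice made for the sketch so it
always terminates; it is not something the thread proposes.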
Hi,
I would like to revisit this again and see if people are opposed to this
arch primitive. We have attributed cases of suboptimal performance on
real customer workloads to this, so I'd like to find a solution.
Since last posting, I promised the s390 people I'd consider hypervisor
yield addition