On Thu, Sep 01, 2016 at 10:17:57PM +0800, Boqun Feng wrote:
> On Thu, Sep 01, 2016 at 08:57:38AM +0200, Peter Zijlstra wrote:
> Could there be some code that relies on the full barrier semantics of
> schedule() to provide transitivity?
Could, sure, who knows. RCU might, although Paul typically
On Thu, Sep 01, 2016 at 08:57:38AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 01, 2016 at 07:47:10AM +1000, Benjamin Herrenschmidt wrote:
>
> > > OK, for giggles, could you (or Balbir) check what happens if you take
> > > that sync out?
>
> > The problem is no amount of testing can tell you it
On 01/09/16 11:48, Alexey Kardashevskiy wrote:
> On 31/08/16 17:28, Peter Zijlstra wrote:
>> On Wed, Aug 31, 2016 at 01:41:33PM +1000, Balbir Singh wrote:
>>> On 30/08/16 22:19, Peter Zijlstra wrote:
On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
>
>
> The origin of
On Thu, Sep 01, 2016 at 07:47:10AM +1000, Benjamin Herrenschmidt wrote:
> > OK, for giggles, could you (or Balbir) check what happens if you take
> > that sync out?
> The problem is no amount of testing can tell you it works for sure :-)
It breaking does prove the negative though, so still
On 01/09/16 07:47, Benjamin Herrenschmidt wrote:
> On Wed, 2016-08-31 at 15:31 +0200, Peter Zijlstra wrote:
>> On Wed, Aug 31, 2016 at 07:28:18AM +1000, Benjamin Herrenschmidt
>> wrote:
>>
>>>
>>> On powerpc we have a sync deep in _switch to achieve that.
>>
>> OK, for giggles, could you (or
On 31/08/16 17:28, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 01:41:33PM +1000, Balbir Singh wrote:
>> On 30/08/16 22:19, Peter Zijlstra wrote:
>>> On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
The origin of the issue I've seen seems to be related to
rwsem
On Wed, 2016-08-31 at 15:31 +0200, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 07:28:18AM +1000, Benjamin Herrenschmidt
> wrote:
>
> >
> > On powerpc we have a sync deep in _switch to achieve that.
>
> OK, for giggles, could you (or Balbir) check what happens if you take
> that sync out?
>
On Wed, Aug 31, 2016 at 07:28:18AM +1000, Benjamin Herrenschmidt wrote:
> On powerpc we have a sync deep in _switch to achieve that.
OK, for giggles, could you (or Balbir) check what happens if you take
that sync out?
There should be enough serialization in the generic code to cover the
case
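The trade-off under discussion — powerpc's `sync` in `_switch` versus relying on the generic code's rq->lock handoff — can be sketched in userspace C11. This is a hypothetical analog, not kernel code (the names `rq_lock`, `rq_trylock`, and `switch_full_barrier` are invented for illustration): a release/acquire lock handoff is weaker than a full fence, which is why dropping the `sync` is only safe if nothing depends on schedule() providing full-barrier semantics.

```c
#include <stdatomic.h>

/* Toy stand-in for rq->lock: 0 = free, 1 = held. */
static atomic_int rq_lock;

static _Bool rq_trylock(void)
{
    int expected = 0;
    /* Acquire on success: later accesses cannot move above the lock. */
    return atomic_compare_exchange_strong_explicit(
        &rq_lock, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void rq_unlock(void)
{
    /* Release: earlier accesses cannot move below the unlock. */
    atomic_store_explicit(&rq_lock, 0, memory_order_release);
}

static void switch_full_barrier(void)
{
    /* Portable analog of powerpc's "sync" in _switch: a full fence,
     * strictly stronger than the release/acquire pair above, and the
     * thing that makes a context switch a full mb(). */
    atomic_thread_fence(memory_order_seq_cst);
}
```

The release/acquire pair orders accesses with respect to the lock itself; only the seq_cst fence gives the transitive, full-barrier ordering the thread is debating.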
On Wed, 2016-08-31 at 09:18 +0200, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 07:28:18AM +1000, Benjamin Herrenschmidt
> wrote:
> >
> > It's always been a requirement that if you actually context switch
> > a
> > full mb() is implied ...
>
> >
> > On powerpc we have a sync deep in _switch
On Wed, 2016-08-31 at 09:20 +0200, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 07:25:01AM +1000, Benjamin Herrenschmidt wrote:
> >
> > On Tue, 2016-08-30 at 15:04 +0200, Oleg Nesterov wrote:
> > >
> > >
> > > Confused... how this connects to UNLOCK+LOCK on rq->lock? A LOAD can
> > > leak
On Wed, 2016-08-31 at 09:28 +0200, Peter Zijlstra wrote:
>
> What hardware do you see this on, is it shiny new Power8 chips which
> have never before seen deep queues or something. Or is it 'regular'
> old Power7 like stuff?
Power8 which isn't *that* new these days...
Cheers,
Ben.
On 31/08/16 17:28, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 01:41:33PM +1000, Balbir Singh wrote:
>> On 30/08/16 22:19, Peter Zijlstra wrote:
>>> On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
The origin of the issue I've seen seems to be related to
On Wed, Aug 31, 2016 at 01:41:33PM +1000, Balbir Singh wrote:
> On 30/08/16 22:19, Peter Zijlstra wrote:
> > On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
> >>
> >>
> >> The origin of the issue I've seen seems to be related to
> >> rwsem spin lock stealing. Basically I see the
On Wed, Aug 31, 2016 at 07:25:01AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2016-08-30 at 15:04 +0200, Oleg Nesterov wrote:
> >
> > Confused... how this connects to UNLOCK+LOCK on rq->lock? A LOAD can
> > leak into the critical section.
> >
> > But context switch should imply mb() we can
On Wed, Aug 31, 2016 at 07:28:18AM +1000, Benjamin Herrenschmidt wrote:
> It's always been a requirement that if you actually context switch a
> full mb() is implied ...
> On powerpc we have a sync deep in _switch to achieve that.
OK, fair enough. I must've missed it in the x86 switch_to, must
On 30/08/16 22:19, Peter Zijlstra wrote:
> On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
>>
>>
>> The origin of the issue I've seen seems to be related to
>> rwsem spin lock stealing. Basically I see the system deadlock'd in the
>> following state
>
> As Nick says (good to see
On 30/08/16 22:58, Oleg Nesterov wrote:
> On 08/30, Balbir Singh wrote:
>>
>> The origin of the issue I've seen seems to be related to
>> rwsem spin lock stealing. Basically I see the system deadlock'd in the
>> following state
>>
>> I have a system with multiple threads and
>>
>> Most of the
On Tue, 2016-08-30 at 20:34 +0200, Peter Zijlstra wrote:
>
> I'm not actually sure it does. There is the comment from 8643cda549ca4
> which explains the program order guarantees.
>
> But I'm not sure who or what would supply a full smp_mb() when you call
> schedule() -- I mean, it's true on x86,
On Tue, 2016-08-30 at 15:04 +0200, Oleg Nesterov wrote:
>
> Confused... how this connects to UNLOCK+LOCK on rq->lock? A LOAD can
> leak into the critical section.
>
> But context switch should imply mb() we can rely on?
Between setting of ->on_rq and returning to the task so it can
change its
On Tue, Aug 30, 2016 at 06:57:47PM +0200, Oleg Nesterov wrote:
> On 08/30, Peter Zijlstra wrote:
> > On Tue, Aug 30, 2016 at 03:04:27PM +0200, Oleg Nesterov wrote:
> > > But context switch should imply mb() we can rely on?
> >
> > Not sure it should, on x86 switch_mm does a CR3 write and that is
On 08/30, Peter Zijlstra wrote:
> On Tue, Aug 30, 2016 at 03:04:27PM +0200, Oleg Nesterov wrote:
> > On 08/30, Peter Zijlstra wrote:
> > >
> > > /*
> > > * Ensure we load p->on_rq _after_ p->state, otherwise it would
> > > * be possible to, falsely, observe p->on_rq == 0 and get stuck
> >
On Tue, Aug 30, 2016 at 03:04:27PM +0200, Oleg Nesterov wrote:
> On 08/30, Peter Zijlstra wrote:
> >
> > /*
> > * Ensure we load p->on_rq _after_ p->state, otherwise it would
> > * be possible to, falsely, observe p->on_rq == 0 and get stuck
> > * in smp_cond_load_acquire()
On 08/30, Peter Zijlstra wrote:
>
> /*
> * Ensure we load p->on_rq _after_ p->state, otherwise it would
> * be possible to, falsely, observe p->on_rq == 0 and get stuck
> * in smp_cond_load_acquire() below.
> *
> * sched_ttwu_pending()
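The ordering that comment describes can be sketched in userspace C11. This is purely illustrative, not kernel code (the struct and function names are invented, and an acquire load stands in for the smp_rmb() the kernel actually uses between two plain loads): the waker must load p->on_rq after p->state, or it may observe a stale on_rq == 0 and spin in smp_cond_load_acquire().

```c
#include <stdatomic.h>

/* Toy task, mirroring the two fields the comment talks about. */
struct task {
    _Atomic int state;   /* e.g. TASK_RUNNING vs TASK_UNINTERRUPTIBLE */
    _Atomic int on_rq;   /* still queued on a runqueue? */
};

static int load_on_rq_after_state(struct task *p)
{
    /* The acquire load of ->state keeps the ->on_rq load below from
     * being reordered before it -- the role smp_rmb() plays in
     * try_to_wake_up(). Without this ordering, a stale on_rq == 0
     * could be observed even though ->state was already updated. */
    int state = atomic_load_explicit(&p->state, memory_order_acquire);
    (void)state;
    return atomic_load_explicit(&p->on_rq, memory_order_relaxed);
}
```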
On 08/30, Balbir Singh wrote:
>
> The origin of the issue I've seen seems to be related to
> rwsem spin lock stealing. Basically I see the system deadlock'd in the
> following state
>
> I have a system with multiple threads and
>
> Most of the threads are stuck doing
>
> [67272.593915] ---
On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
>
>
> The origin of the issue I've seen seems to be related to
> rwsem spin lock stealing. Basically I see the system deadlock'd in the
> following state
As Nick says (good to see you're back Nick!), this is unrelated to
rwsems.
On Tue, 30 Aug 2016 18:49:37 +1000
Balbir Singh wrote:
> The origin of the issue I've seen seems to be related to
> rwsem spin lock stealing. Basically I see the system deadlock'd in the
> following state
BTW. this is not really to do with rwsems, but purely a scheduler bug.
It was seen in