Re: rcu stalls and soft lockups with recent kernels

2016-03-23 Thread Paul E. McKenney
On Wed, Mar 23, 2016 at 05:50:14AM +0100, Mike Galbraith wrote:
> (cc)
>
> On Tue, 2016-03-22 at 16:22 -0400, Ion Badulescu wrote:
> > On 03/17/2016 10:28 PM, Mike Galbraith wrote:
> > > On Wed, 2016-03-16 at 12:15 -0400, Ion Badulescu wrote:
> > > > Just following up to my own email:
> > > >
> > > >

Re: rcu stalls and soft lockups with recent kernels

2016-03-22 Thread Mike Galbraith
(cc)
On Tue, 2016-03-22 at 16:22 -0400, Ion Badulescu wrote:
> On 03/17/2016 10:28 PM, Mike Galbraith wrote:
> > On Wed, 2016-03-16 at 12:15 -0400, Ion Badulescu wrote:
> > > Just following up to my own email:
> > >
> > > It turns out that we can eliminate the RCU stalls by changing from
> > >

Re: rcu stalls and soft lockups with recent kernels

2016-03-22 Thread Ion Badulescu
On 03/17/2016 10:28 PM, Mike Galbraith wrote: On Wed, 2016-03-16 at 12:15 -0400, Ion Badulescu wrote: Just following up to my own email: It turns out that we can eliminate the RCU stalls by changing from CONFIG_RCU_NOCB_CPU_ALL to CONFIG_RCU_NOCB_CPU_NONE. Letting each cpu handle its own RCU

Re: rcu stalls and soft lockups with recent kernels

2016-03-19 Thread Ion Badulescu
Just following up to my own email: It turns out that we can eliminate the RCU stalls by changing from CONFIG_RCU_NOCB_CPU_ALL to CONFIG_RCU_NOCB_CPU_NONE. Letting each cpu handle its own RCU callbacks completely fixes the problems for us. Now, CONFIG_NO_HZ_FULL and CONFIG_RCU_NOCB_CPU_ALL is

Re: rcu stalls and soft lockups with recent kernels

2016-03-18 Thread Mike Galbraith
On Wed, 2016-03-16 at 12:15 -0400, Ion Badulescu wrote:
> Just following up to my own email:
>
> It turns out that we can eliminate the RCU stalls by changing from
> CONFIG_RCU_NOCB_CPU_ALL to CONFIG_RCU_NOCB_CPU_NONE. Letting each cpu
> handle its own RCU callbacks completely fixes the

rcu stalls and soft lockups with recent kernels

2016-02-04 Thread Ion Badulescu
Hello, We run a compute cluster of about 800 or so machines here, which makes heavy use of NFS and fscache (on a dedicated local drive with an ext4 filesystem) and also exercises the other local drives pretty hard. All the compute jobs run as unprivileged users with SCHED_OTHER scheduling, nice
