Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-24 Thread Linus Torvalds
On Fri, Jan 22, 2016 at 8:46 AM, Christoph Lameter  wrote:
>
> Subject: vmstat: Remove BUG_ON from vmstat_update
>
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. [..]

Ok, I am assuming this is in Andrew's queue already, but this bug hit
my machine overnight, so I'm applying it directly..

Linus


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-24 Thread Linus Torvalds
On Fri, Jan 22, 2016 at 8:46 AM, Christoph Lameter  wrote:
>
> Subject: vmstat: Remove BUG_ON from vmstat_update
>
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. [..]

Ok, I am assuming this is in Andrew's queue already, but this bug hit
my machine overnight, so I'm applying it directly..

Linus


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Michal Hocko
On Fri 22-01-16 10:46:14, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Could you repost the patch with the updated description?
> 
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.
>

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut 
down on idle"
Reported-by: Sasha Levin 

Would be appropriate IMO

> Signed-off-by: Christoph Lameter 

Acked-by: Michal Hocko 

Thanks!

> 
> Index: linux/mm/vmstat.c
> ===
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>* Defer the checking for differentials to the
>* shepherd thread on a different processor.
>*/
> - int r;
> - /*
> -  * Shepherd work thread does not race since it never
> -  * changes the bit if its zero but the cpu
> -  * online / off line code may race if
> -  * worker threads are still allowed during
> -  * shutdown / startup.
> -  */
> - r = cpumask_test_and_set_cpu(smp_processor_id(),
> - cpu_stat_off);
> - VM_BUG_ON(r);
> + cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>   }
>  }

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Christoph Lameter
On Fri, 22 Jan 2016, Michal Hocko wrote:

> Could you repost the patch with the updated description?

Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 * Defer the checking for differentials to the
 * shepherd thread on a different processor.
 */
-   int r;
-   /*
-* Shepherd work thread does not race since it never
-* changes the bit if its zero but the cpu
-* online / off line code may race if
-* worker threads are still allowed during
-* shutdown / startup.
-*/
-   r = cpumask_test_and_set_cpu(smp_processor_id(),
-   cpu_stat_off);
-   VM_BUG_ON(r);
+   cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
}
 }



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Michal Hocko
On Fri 22-01-16 10:07:01, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> > What is the point of keeping it in the first place. The code can
> > perfectly cope with the race.
> 
> Ok then lets do that.

Could you repost the patch with the updated description?

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Christoph Lameter
On Fri, 22 Jan 2016, Michal Hocko wrote:

> Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> What is the point of keeping it in the first place. The code can
> perfectly cope with the race.

Ok then lets do that.



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Michal Hocko
On Thu 21-01-16 11:38:46, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
> 
> > It goes like this:
> > CPU0:   CPU1
> > vmstat_update
> >   cpumask_test_and_set_cpu (0->1)
> > [...]
> > vmstat_shepherd
> >   
> > cpumask_test_and_clear_cpu(CPU0) (1->0)
> > quiet_vmstat
> >   cpumask_test_and_set_cpu (0->1)
> >   queue_delayed_work_on(CPU0)
> > refresh_cpu_vm_stats()
> > [...]
> > vmstat_update
> >   nothing_to_do
> >   cpumask_test_and_set_cpu (1->1)
> >   VM_BUG_ON
> >
> > Or am I missing something?
> 
> Ok then the following should fix it:

Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
What is the point of keeping it in the first place. The code can
perfectly cope with the race.
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Shiraz Hashim
On Thu, Jan 21, 2016 at 11:08 PM, Christoph Lameter  wrote:
> Subject: vmstat: Queue work before clearing cpu_stat_off
>
> There is a race between vmstat_shepherd and quiet_vmstat() because
> the responsibility for checking for counter updates changes depending
> on the state of teh bit in cpu_stat_off. So queue the work before
> changing state of the bit in vmstat_shepherd. That way quiet_vmstat
> is guaranteed to remove the work request when clearing the bit and the
> bug in vmstat_update wont trigger anymore.
>
> Signed-off-by: Christoph Lameter 
>
> Index: linux/mm/vmstat.c
> ===
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
> get_online_cpus();
> /* Check processors whose vmstat worker threads have been disabled */
> for_each_cpu(cpu, cpu_stat_off)
> -   if (need_update(cpu) &&
> -   cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> +   if (need_update(cpu)) {
>
> queue_delayed_work_on(cpu, vmstat_wq,
> _cpu(vmstat_work, cpu), 0);
>
> +   cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
> +   }
> +
> put_online_cpus();
>
> schedule_delayed_work(,


This can alternatively lead to following where vmstat may not be
scheduled for cpu  when it is back from idle.

CPU0:CPU1:
   vmstat_shepherd
queue_delayed_work_on(CPU0)
quiet_vmstat
  cancel_delayed_work
  cpumask_test_and_set_cpu (0->1)

cpumask_clear_cpu(CPU0) (1->0)

-- 
regards
Shiraz Hashim


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Shiraz Hashim
On Thu, Jan 21, 2016 at 11:08 PM, Christoph Lameter  wrote:
> Subject: vmstat: Queue work before clearing cpu_stat_off
>
> There is a race between vmstat_shepherd and quiet_vmstat() because
> the responsibility for checking for counter updates changes depending
> on the state of teh bit in cpu_stat_off. So queue the work before
> changing state of the bit in vmstat_shepherd. That way quiet_vmstat
> is guaranteed to remove the work request when clearing the bit and the
> bug in vmstat_update wont trigger anymore.
>
> Signed-off-by: Christoph Lameter 
>
> Index: linux/mm/vmstat.c
> ===
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
> get_online_cpus();
> /* Check processors whose vmstat worker threads have been disabled */
> for_each_cpu(cpu, cpu_stat_off)
> -   if (need_update(cpu) &&
> -   cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> +   if (need_update(cpu)) {
>
> queue_delayed_work_on(cpu, vmstat_wq,
> _cpu(vmstat_work, cpu), 0);
>
> +   cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
> +   }
> +
> put_online_cpus();
>
> schedule_delayed_work(,


This can alternatively lead to following where vmstat may not be
scheduled for cpu  when it is back from idle.

CPU0:CPU1:
   vmstat_shepherd
queue_delayed_work_on(CPU0)
quiet_vmstat
  cancel_delayed_work
  cpumask_test_and_set_cpu (0->1)

cpumask_clear_cpu(CPU0) (1->0)

-- 
regards
Shiraz Hashim


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Christoph Lameter
On Fri, 22 Jan 2016, Michal Hocko wrote:

> Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> What is the point of keeping it in the first place. The code can
> perfectly cope with the race.

Ok then lets do that.



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Michal Hocko
On Fri 22-01-16 10:46:14, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Could you repost the patch with the updated description?
> 
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.
>

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut 
down on idle"
Reported-by: Sasha Levin 

Would be appropriate IMO

> Signed-off-by: Christoph Lameter 

Acked-by: Michal Hocko 

Thanks!

> 
> Index: linux/mm/vmstat.c
> ===
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>* Defer the checking for differentials to the
>* shepherd thread on a different processor.
>*/
> - int r;
> - /*
> -  * Shepherd work thread does not race since it never
> -  * changes the bit if its zero but the cpu
> -  * online / off line code may race if
> -  * worker threads are still allowed during
> -  * shutdown / startup.
> -  */
> - r = cpumask_test_and_set_cpu(smp_processor_id(),
> - cpu_stat_off);
> - VM_BUG_ON(r);
> + cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>   }
>  }

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Michal Hocko
On Thu 21-01-16 11:38:46, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
> 
> > It goes like this:
> > CPU0:   CPU1
> > vmstat_update
> >   cpumask_test_and_set_cpu (0->1)
> > [...]
> > vmstat_shepherd
> >   
> > cpumask_test_and_clear_cpu(CPU0) (1->0)
> > quiet_vmstat
> >   cpumask_test_and_set_cpu (0->1)
> >   queue_delayed_work_on(CPU0)
> > refresh_cpu_vm_stats()
> > [...]
> > vmstat_update
> >   nothing_to_do
> >   cpumask_test_and_set_cpu (1->1)
> >   VM_BUG_ON
> >
> > Or am I missing something?
> 
> Ok then the following should fix it:

Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
What is the point of keeping it in the first place. The code can
perfectly cope with the race.
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Christoph Lameter
On Fri, 22 Jan 2016, Michal Hocko wrote:

> Could you repost the patch with the updated description?

Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 * Defer the checking for differentials to the
 * shepherd thread on a different processor.
 */
-   int r;
-   /*
-* Shepherd work thread does not race since it never
-* changes the bit if its zero but the cpu
-* online / off line code may race if
-* worker threads are still allowed during
-* shutdown / startup.
-*/
-   r = cpumask_test_and_set_cpu(smp_processor_id(),
-   cpu_stat_off);
-   VM_BUG_ON(r);
+   cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
}
 }



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-22 Thread Michal Hocko
On Fri 22-01-16 10:07:01, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> > What is the point of keeping it in the first place. The code can
> > perfectly cope with the race.
> 
> Ok then lets do that.

Could you repost the patch with the updated description?

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Christoph Lameter
On Thu, 21 Jan 2016, Michal Hocko wrote:

> It goes like this:
> CPU0: CPU1
> vmstat_update
>   cpumask_test_and_set_cpu (0->1)
> [...]
>   vmstat_shepherd
> 
> cpumask_test_and_clear_cpu(CPU0) (1->0)
> quiet_vmstat
>   cpumask_test_and_set_cpu (0->1)
> queue_delayed_work_on(CPU0)
> refresh_cpu_vm_stats()
> [...]
> vmstat_update
>   nothing_to_do
>   cpumask_test_and_set_cpu (1->1)
>   VM_BUG_ON
>
> Or am I missing something?

Ok then the following should fix it:



Subject: vmstat: Queue work before clearing cpu_stat_off

There is a race between vmstat_shepherd and quiet_vmstat() because
the responsibility for checking for counter updates changes depending
on the state of teh bit in cpu_stat_off. So queue the work before
changing state of the bit in vmstat_shepherd. That way quiet_vmstat
is guaranteed to remove the work request when clearing the bit and the
bug in vmstat_update wont trigger anymore.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
get_online_cpus();
/* Check processors whose vmstat worker threads have been disabled */
for_each_cpu(cpu, cpu_stat_off)
-   if (need_update(cpu) &&
-   cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+   if (need_update(cpu)) {

queue_delayed_work_on(cpu, vmstat_wq,
_cpu(vmstat_work, cpu), 0);

+   cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
+   }
+
put_online_cpus();

schedule_delayed_work(,


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Michal Hocko
On Thu 21-01-16 09:45:12, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
[...]
> > The vmstat update might be still waiting for its timer, idle mode started
> > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> > idle terminates and the originally schedule vmstate_update executes it
> > sees the bit set and BUG_ON.
> 
> Ok so we are going into idle mode and the vmstat_update timer is pending.
> Then the timer will not fire since going idle switches preemption off.
> quiet_vmstat will run without the chance of running vmstat_update
> 
> We could be going idle and not have disabled preemption yet. Then
> vmstat_update will run. On return to the idling operation preemption will
> be disabled and quiet_vmstat() will be run.
> 
> I do not see how these two things could race.

It goes like this:
CPU0:   CPU1
vmstat_update
  cpumask_test_and_set_cpu (0->1)
[...]
vmstat_shepherd
  
cpumask_test_and_clear_cpu(CPU0) (1->0)
quiet_vmstat
  cpumask_test_and_set_cpu (0->1)
  queue_delayed_work_on(CPU0)
refresh_cpu_vm_stats()
[...]
vmstat_update
  nothing_to_do
  cpumask_test_and_set_cpu (1->1)
  VM_BUG_ON

Or am I missing something?
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Christoph Lameter
On Thu, 21 Jan 2016, Michal Hocko wrote:

> > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > > particular cpu to be handled by vmstat_shepherd. This might trigger
> > > a VM_BUG_ON in vmstat_update because the work item might have been
> > > sleeping during the idle period and see the cpu_stat_off updated after
> > > the wake up. The VM_BUG_ON is therefore misleading and no more
> > > appropriate. Moreover it doesn't really suite any protection from real
> > > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > > anytime it sees a particular cpu set or vmstat_update would do the same
> > > from the worker context directly. Even when the two would race the
> > > result wouldn't be incorrect as the counters update is fully idempotent.
> >
> >
> > Hmmm... the vmstat_update can be interrupted while running and the cpu put
> > into idle mode? If vmstat_update is running then the cpu is not idle but
> > running code. If this is really going on then there is other stuff wrong
> > with the idling logic.
>
> The vmstat update might be still waiting for its timer, idle mode started
> and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> idle terminates and the originally schedule vmstate_update executes it
> sees the bit set and BUG_ON.

Ok so we are going into idle mode and the vmstat_update timer is pending.
Then the timer will not fire since going idle switches preemption off.
quiet_vmstat will run without the chance of running vmstat_update

We could be going idle and not have disabled preemption yet. Then
vmstat_update will run. On return to the idling operation preemption will
be disabled and quiet_vmstat() will be run.

I do not see how these two things could race.





Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Michal Hocko
On Wed 20-01-16 15:57:43, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> > [...]
> > > Subject: vmstat: Remove BUG_ON from vmstat_update
> > >
> > > If we detect that there is nothing to do just set the flag and do not 
> > > check
> > > if it was already set before. Races really do not matter. If the flag is
> > > set by any code then the shepherd will start dealing with the situation
> > > and reenable the vmstat workers when necessary again.
> > >
> > > Concurrent actions could be onlining and offlining of processors or be a
> > > result of concurrency issues when updating the cpumask from multiple
> > > processors.
> >
> > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle) is merged the VM_BUG_ON is simply bogus because
> > vmstat_update might "race" with quiet_vmstat. The changelog should
> > reflect that. What about the following wording?
> 
> How can it race if preemption is off?

See below.

> > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > particular cpu to be handled by vmstat_shepherd. This might trigger
> > a VM_BUG_ON in vmstat_update because the work item might have been
> > sleeping during the idle period and see the cpu_stat_off updated after
> > the wake up. The VM_BUG_ON is therefore misleading and no more
> > appropriate. Moreover it doesn't really suite any protection from real
> > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > anytime it sees a particular cpu set or vmstat_update would do the same
> > from the worker context directly. Even when the two would race the
> > result wouldn't be incorrect as the counters update is fully idempotent.
> 
> 
> Hmmm... the vmstat_update can be interrupted while running and the cpu put
> into idle mode? If vmstat_update is running then the cpu is not idle but
> running code. If this is really going on then there is other stuff wrong
> with the idling logic.

The vmstat update might be still waiting for its timer, idle mode started
and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
idle terminates and the originally schedule vmstate_update executes it
sees the bit set and BUG_ON.

> > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle")
> > CC: stable # 4.4+
> 
> ?? There has not been an upstream release with this yet.

Ohh, I thought it made it into 4.4 but you are right it is post 4.4 so
no CC: stable required.

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Michal Hocko
On Wed 20-01-16 15:57:43, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> > [...]
> > > Subject: vmstat: Remove BUG_ON from vmstat_update
> > >
> > > If we detect that there is nothing to do just set the flag and do not 
> > > check
> > > if it was already set before. Races really do not matter. If the flag is
> > > set by any code then the shepherd will start dealing with the situation
> > > and reenable the vmstat workers when necessary again.
> > >
> > > Concurrent actions could be onlining and offlining of processors or be a
> > > result of concurrency issues when updating the cpumask from multiple
> > > processors.
> >
> > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle) is merged the VM_BUG_ON is simply bogus because
> > vmstat_update might "race" with quiet_vmstat. The changelog should
> > reflect that. What about the following wording?
> 
> How can it race if preemption is off?

See below.

> > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > particular cpu to be handled by vmstat_shepherd. This might trigger
> > a VM_BUG_ON in vmstat_update because the work item might have been
> > sleeping during the idle period and see the cpu_stat_off updated after
> > the wake up. The VM_BUG_ON is therefore misleading and no more
> > appropriate. Moreover it doesn't really suite any protection from real
> > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > anytime it sees a particular cpu set or vmstat_update would do the same
> > from the worker context directly. Even when the two would race the
> > result wouldn't be incorrect as the counters update is fully idempotent.
> 
> 
> Hmmm... the vmstat_update can be interrupted while running and the cpu put
> into idle mode? If vmstat_update is running then the cpu is not idle but
> running code. If this is really going on then there is other stuff wrong
> with the idling logic.

The vmstat update might be still waiting for its timer, idle mode started
and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
idle terminates and the originally schedule vmstate_update executes it
sees the bit set and BUG_ON.

> > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle")
> > CC: stable # 4.4+
> 
> ?? There has not been an upstream release with this yet.

Ohh, I thought it made it into 4.4 but you are right it is post 4.4 so
no CC: stable required.

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Christoph Lameter
On Thu, 21 Jan 2016, Michal Hocko wrote:

> > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > > particular cpu to be handled by vmstat_shepherd. This might trigger
> > > a VM_BUG_ON in vmstat_update because the work item might have been
> > > sleeping during the idle period and see the cpu_stat_off updated after
> > > the wake up. The VM_BUG_ON is therefore misleading and no more
> > > appropriate. Moreover it doesn't really suite any protection from real
> > > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > > anytime it sees a particular cpu set or vmstat_update would do the same
> > > from the worker context directly. Even when the two would race the
> > > result wouldn't be incorrect as the counters update is fully idempotent.
> >
> >
> > Hmmm... the vmstat_update can be interrupted while running and the cpu put
> > into idle mode? If vmstat_update is running then the cpu is not idle but
> > running code. If this is really going on then there is other stuff wrong
> > with the idling logic.
>
> The vmstat update might be still waiting for its timer, idle mode started
> and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> idle terminates and the originally schedule vmstate_update executes it
> sees the bit set and BUG_ON.

Ok so we are going into idle mode and the vmstat_update timer is pending.
Then the timer will not fire since going idle switches preemption off.
quiet_vmstat will run without the chance of running vmstat_update

We could be going idle and not have disabled preemption yet. Then
vmstat_update will run. On return to the idling operation preemption will
be disabled and quiet_vmstat() will be run.

I do not see how these two things could race.





Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Michal Hocko
On Thu 21-01-16 09:45:12, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
[...]
> > The vmstat update might be still waiting for its timer, idle mode started
> > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> > idle terminates and the originally schedule vmstate_update executes it
> > sees the bit set and BUG_ON.
> 
> Ok so we are going into idle mode and the vmstat_update timer is pending.
> Then the timer will not fire since going idle switches preemption off.
> quiet_vmstat will run without the chance of running vmstat_update
> 
> We could be going idle and not have disabled preemption yet. Then
> vmstat_update will run. On return to the idling operation preemption will
> be disabled and quiet_vmstat() will be run.
> 
> I do not see how these two things could race.

It goes like this:
CPU0:   CPU1
vmstat_update
  cpumask_test_and_set_cpu (0->1)
[...]
vmstat_shepherd
  
cpumask_test_and_clear_cpu(CPU0) (1->0)
quiet_vmstat
  cpumask_test_and_set_cpu (0->1)
  queue_delayed_work_on(CPU0)
refresh_cpu_vm_stats()
[...]
vmstat_update
  nothing_to_do
  cpumask_test_and_set_cpu (1->1)
  VM_BUG_ON

Or am I missing something?
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-21 Thread Christoph Lameter
On Thu, 21 Jan 2016, Michal Hocko wrote:

> It goes like this:
> CPU0: CPU1
> vmstat_update
>   cpumask_test_and_set_cpu (0->1)
> [...]
>   vmstat_shepherd
> 
> cpumask_test_and_clear_cpu(CPU0) (1->0)
> quiet_vmstat
>   cpumask_test_and_set_cpu (0->1)
> queue_delayed_work_on(CPU0)
> refresh_cpu_vm_stats()
> [...]
> vmstat_update
>   nothing_to_do
>   cpumask_test_and_set_cpu (1->1)
>   VM_BUG_ON
>
> Or am I missing something?

Ok then the following should fix it:



Subject: vmstat: Queue work before clearing cpu_stat_off

There is a race between vmstat_shepherd and quiet_vmstat() because
the responsibility for checking for counter updates changes depending
on the state of teh bit in cpu_stat_off. So queue the work before
changing state of the bit in vmstat_shepherd. That way quiet_vmstat
is guaranteed to remove the work request when clearing the bit and the
bug in vmstat_update wont trigger anymore.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
get_online_cpus();
/* Check processors whose vmstat worker threads have been disabled */
for_each_cpu(cpu, cpu_stat_off)
-   if (need_update(cpu) &&
-   cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+   if (need_update(cpu)) {

queue_delayed_work_on(cpu, vmstat_wq,
_cpu(vmstat_work, cpu), 0);

+   cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
+   }
+
put_online_cpus();

schedule_delayed_work(,


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> [...]
> > Subject: vmstat: Remove BUG_ON from vmstat_update
> >
> > If we detect that there is nothing to do just set the flag and do not check
> > if it was already set before. Races really do not matter. If the flag is
> > set by any code then the shepherd will start dealing with the situation
> > and reenable the vmstat workers when necessary again.
> >
> > Concurrent actions could be onlining and offlining of processors or be a
> > result of concurrency issues when updating the cpumask from multiple
> > processors.
>
> Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle) is merged the VM_BUG_ON is simply bogus because
> vmstat_update might "race" with quiet_vmstat. The changelog should
> reflect that. What about the following wording?

How can it race if preemption is off?

> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.


Hmmm... the vmstat_update can be interrupted while running and the cpu put
into idle mode? If vmstat_update is running then the cpu is not idle but
running code. If this is really going on then there is other stuff wrong
with the idling logic.

> Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle")
> CC: stable # 4.4+

?? There has not been an upstream release with this yet.


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Michal Hocko
On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
[...]
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Concurrent actions could be onlining and offlining of processors or be a
> result of concurrency issues when updating the cpumask from multiple
> processors.

Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle) is merged the VM_BUG_ON is simply bogus because
vmstat_update might "race" with quiet_vmstat. The changelog should
reflect that. What about the following wording?

"
Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle")
CC: stable # 4.4+
"

> Signed-off-by: Christoph Lameter 
> 
> Index: linux/mm/vmstat.c
> ===
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>* Defer the checking for differentials to the
>* shepherd thread on a different processor.
>*/
> - int r;
> - /*
> -  * Shepherd work thread does not race since it never
> -  * changes the bit if its zero but the cpu
> -  * online / off line code may race if
> -  * worker threads are still allowed during
> -  * shutdown / startup.
> -  */
> - r = cpumask_test_and_set_cpu(smp_processor_id(),
> - cpu_stat_off);
> - VM_BUG_ON(r);
> + cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>   }
>  }

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Sasha Levin wrote:

>
> As I've mentioned - this reproduces frequently. I'd be happy to add in debug
> information into the kernel that might help you reproduce it, but as it seems
> like a timing issue, I can't provide a simple reproducer.

This isnt really important I think. Lets remove it.


Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Concurrent actions could be onlining and offlining of processors or be a
result of concurrency issues when updating the cpumask from multiple
processors.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 * Defer the checking for differentials to the
 * shepherd thread on a different processor.
 */
-   int r;
-   /*
-* Shepherd work thread does not race since it never
-* changes the bit if its zero but the cpu
-* online / off line code may race if
-* worker threads are still allowed during
-* shutdown / startup.
-*/
-   r = cpumask_test_and_set_cpu(smp_processor_id(),
-   cpu_stat_off);
-   VM_BUG_ON(r);
+   cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
}
 }



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Sasha Levin
On 01/20/2016 10:20 AM, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
>> > On Wed 20-01-16 09:56:26, Sasha Levin wrote:
>>> > > On 01/20/2016 09:37 AM, Michal Hocko wrote:
 > > > I am just reading through this old discussion again because "vmstat:
 > > > make vmstat_updater deferrable again and shut down on idle" which 
 > > > seems
 > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not 
 > > > see
 > > > any follow up fix merged to linus tree
>>> > >
>>> > > So this isn't an "old" discussion - the bug is very much there and I can
>>> > > hit it easily. As a workaround I've "disabled" vmstat.
>> >
>> > Well the report is since 18th Dec which is over month old. Should we
>> > revert 0eb77e988032 as a pre caution and make sure this is done properly
>> > in -mm tree. AFAIR none of the proposed fix worked without other
>> > fallouts?
> Seems that we are unable to get enough information to reproduce the issue?

As I've mentioned - this reproduces frequently. I'd be happy to add in debug
information into the kernel that might help you reproduce it, but as it seems
like a timing issue, I can't provide a simple reproducer.


Thanks,
Sasha



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> > On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > > I am just reading through this old discussion again because "vmstat:
> > > make vmstat_updater deferrable again and shut down on idle" which seems
> > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > > any follow up fix merged to linus tree
> >
> > So this isn't an "old" discussion - the bug is very much there and I can
> > hit it easily. As a workaround I've "disabled" vmstat.
>
> Well the report is since 18th Dec which is over month old. Should we
> revert 0eb77e988032 as a pre caution and make sure this is done properly
> in -mm tree. AFAIR none of the proposed fix worked without other
> fallouts?

Seems that we are unable to get enough information to reproduce the issue?



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Michal Hocko
On Wed 20-01-16 09:14:06, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > [CCing Andrew]
> >
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> Is there any way to reproce this issue? This is running through trinity
> right? Can we please get the exact syscall that causes this to occur?

As per the backtrace in the initial report this seems to be time
dependent as the crash happens from _kthread_ context. So it doesn't
seem to be directly related to any particular syscall.
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Michal Hocko wrote:

> [CCing Andrew]
>
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

Is there any way to reproce this issue? This is running through trinity
right? Can we please get the exact syscall that causes this to occur?


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Michal Hocko
On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> So this isn't an "old" discussion - the bug is very much there and I can
> hit it easily. As a workaround I've "disabled" vmstat.

Well the report is since 18th Dec which is over month old. Should we
revert 0eb77e988032 as a pre caution and make sure this is done properly
in -mm tree. AFAIR none of the proposed fix worked without other
fallouts?
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Sasha Levin
On 01/20/2016 09:37 AM, Michal Hocko wrote:
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

So this isn't an "old" discussion - the bug is very much there and I can
hit it easily. As a workaround I've "disabled" vmstat.


Thanks,
Sasha


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Sasha Levin
On 01/20/2016 09:37 AM, Michal Hocko wrote:
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

So this isn't an "old" discussion - the bug is very much there and I can
hit it easily. As a workaround I've "disabled" vmstat.


Thanks,
Sasha


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Sasha Levin
On 01/20/2016 10:20 AM, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
>> > On Wed 20-01-16 09:56:26, Sasha Levin wrote:
>>> > > On 01/20/2016 09:37 AM, Michal Hocko wrote:
 > > > I am just reading through this old discussion again because "vmstat:
 > > > make vmstat_updater deferrable again and shut down on idle" which 
 > > > seems
 > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not 
 > > > see
 > > > any follow up fix merged to linus tree
>>> > >
>>> > > So this isn't an "old" discussion - the bug is very much there and I can
>>> > > hit it easily. As a workaround I've "disabled" vmstat.
>> >
>> > Well the report is since 18th Dec which is over month old. Should we
>> > revert 0eb77e988032 as a pre caution and make sure this is done properly
>> > in -mm tree. AFAIR none of the proposed fix worked without other
>> > fallouts?
> Seems that we are unable to get enough information to reproduce the issue?

As I've mentioned - this reproduces frequently. I'd be happy to add in debug
information into the kernel that might help you reproduce it, but as it seems
like a timing issue, I can't provide a simple reproducer.


Thanks,
Sasha



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Michal Hocko
On Wed 20-01-16 09:14:06, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > [CCing Andrew]
> >
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> Is there any way to reproce this issue? This is running through trinity
> right? Can we please get the exact syscall that causes this to occur?

As per the backtrace in the initial report this seems to be time
dependent as the crash happens from _kthread_ context. So it doesn't
seem to be directly related to any particular syscall.
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> > On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > > I am just reading through this old discussion again because "vmstat:
> > > make vmstat_updater deferrable again and shut down on idle" which seems
> > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > > any follow up fix merged to linus tree
> >
> > So this isn't an "old" discussion - the bug is very much there and I can
> > hit it easily. As a workaround I've "disabled" vmstat.
>
> Well the report is since 18th Dec which is over month old. Should we
> revert 0eb77e988032 as a pre caution and make sure this is done properly
> in -mm tree. AFAIR none of the proposed fix worked without other
> fallouts?

Seems that we are unable to get enough information to reproduce the issue?



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Sasha Levin wrote:

>
> As I've mentioned - this reproduces frequently. I'd be happy to add in debug
> information into the kernel that might help you reproduce it, but as it seems
> like a timing issue, I can't provide a simple reproducer.

This isnt really important I think. Lets remove it.


Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Concurrent actions could be onlining and offlining of processors or be a
result of concurrency issues when updating the cpumask from multiple
processors.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 * Defer the checking for differentials to the
 * shepherd thread on a different processor.
 */
-   int r;
-   /*
-* Shepherd work thread does not race since it never
-* changes the bit if its zero but the cpu
-* online / off line code may race if
-* worker threads are still allowed during
-* shutdown / startup.
-*/
-   r = cpumask_test_and_set_cpu(smp_processor_id(),
-   cpu_stat_off);
-   VM_BUG_ON(r);
+   cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
}
 }



Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Michal Hocko
On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> So this isn't an "old" discussion - the bug is very much there and I can
> hit it easily. As a workaround I've "disabled" vmstat.

Well the report is since 18th Dec which is over month old. Should we
revert 0eb77e988032 as a pre caution and make sure this is done properly
in -mm tree. AFAIR none of the proposed fix worked without other
fallouts?
-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Michal Hocko
On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
[...]
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Concurrent actions could be onlining and offlining of processors or be a
> result of concurrency issues when updating the cpumask from multiple
> processors.

Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle) is merged the VM_BUG_ON is simply bogus because
vmstat_update might "race" with quiet_vmstat. The changelog should
reflect that. What about the following wording?

"
Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle")
CC: stable # 4.4+
"

> Signed-off-by: Christoph Lameter 
> 
> Index: linux/mm/vmstat.c
> ===
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>* Defer the checking for differentials to the
>* shepherd thread on a different processor.
>*/
> - int r;
> - /*
> -  * Shepherd work thread does not race since it never
> -  * changes the bit if its zero but the cpu
> -  * online / off line code may race if
> -  * worker threads are still allowed during
> -  * shutdown / startup.
> -  */
> - r = cpumask_test_and_set_cpu(smp_processor_id(),
> - cpu_stat_off);
> - VM_BUG_ON(r);
> + cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>   }
>  }

-- 
Michal Hocko
SUSE Labs


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Michal Hocko wrote:

> [CCing Andrew]
>
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

Is there any way to reproce this issue? This is running through trinity
right? Can we please get the exact syscall that causes this to occur?


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-20 Thread Christoph Lameter
On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> [...]
> > Subject: vmstat: Remove BUG_ON from vmstat_update
> >
> > If we detect that there is nothing to do just set the flag and do not check
> > if it was already set before. Races really do not matter. If the flag is
> > set by any code then the shepherd will start dealing with the situation
> > and reenable the vmstat workers when necessary again.
> >
> > Concurrent actions could be onlining and offlining of processors or be a
> > result of concurrency issues when updating the cpumask from multiple
> > processors.
>
> Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle) is merged the VM_BUG_ON is simply bogus because
> vmstat_update might "race" with quiet_vmstat. The changelog should
> reflect that. What about the following wording?

How can it race if preemption is off?

> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.


Hmmm... the vmstat_update can be interrupted while running and the cpu put
into idle mode? If vmstat_update is running then the cpu is not idle but
running code. If this is really going on then there is other stuff wrong
with the idling logic.

> Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle")
> CC: stable # 4.4+

?? There has not been an upstream release with this yet.


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-04 Thread Sasha Levin
On 01/04/2016 01:05 PM, Christoph Lameter wrote:
> On Thu, 24 Dec 2015, Sasha Levin wrote:
> 
>>> Also what workload triggers the BUG()?
>>
>> Fuzzing with trinity inside a KVM guest. I've attached my config.
> 
> Ok build and bootup works fine after fix from Tetsuo to config. Does not
> like my initrd it seems. Is there a root with the tools available somehow?

Will is hosting a stand-alone version here: 
git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git


Thanks,
Sasha

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-04 Thread Christoph Lameter
On Thu, 24 Dec 2015, Sasha Levin wrote:

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok build and bootup works fine after fix from Tetsuo to config. Does not
like my initrd it seems. Is there a root with the tools available somehow?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-04 Thread Sasha Levin
On 01/04/2016 01:05 PM, Christoph Lameter wrote:
> On Thu, 24 Dec 2015, Sasha Levin wrote:
> 
>>> Also what workload triggers the BUG()?
>>
>> Fuzzing with trinity inside a KVM guest. I've attached my config.
> 
> Ok build and bootup works fine after fix from Tetsuo to config. Does not
> like my initrd it seems. Is there a root with the tools available somehow?

Will is hosting a stand-alone version here: 
git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git


Thanks,
Sasha

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2016-01-04 Thread Christoph Lameter
On Thu, 24 Dec 2015, Sasha Levin wrote:

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok build and bootup works fine after fix from Tetsuo to config. Does not
like my initrd it seems. Is there a root with the tools available somehow?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-29 Thread Christoph Lameter
Builds with your kernel config fail with

 CHK include/generated/bounds.h
  CHK include/generated/asm-offsets.h
  CALLscripts/checksyscalls.sh
  CHK include/generated/compile.h
make[1]: *** No rule to make target 'signing_key.pem', needed by
'certs/signing_key.x509'.  Stop.
Makefile:967: recipe for target 'certs' failed
make: *** [certs] Error 2
make: *** Waiting for unfinished jobs
  CHK kernel/config_data.h

???

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-29 Thread Christoph Lameter
On Thu, 24 Dec 2015, Sasha Levin wrote:

> >> > [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> >> > [ 3462.554836] pending: vmstat_update
> > Does that mean that vmstat_update locks up or something that schedules it?
>
> I think it means that vmstat_update didn't finish running (working).

There is nothing in there that blocks.

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok will have a look.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-29 Thread Christoph Lameter
On Thu, 24 Dec 2015, Sasha Levin wrote:

> >> > [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> >> > [ 3462.554836] pending: vmstat_update
> > Does that mean that vmstat_update locks up or something that schedules it?
>
> I think it means that vmstat_update didn't finish running (working).

There is nothing in there that blocks.

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok will have a look.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-29 Thread Christoph Lameter
Builds with your kernel config fail with

 CHK include/generated/bounds.h
  CHK include/generated/asm-offsets.h
  CALLscripts/checksyscalls.sh
  CHK include/generated/compile.h
make[1]: *** No rule to make target 'signing_key.pem', needed by
'certs/signing_key.x509'.  Stop.
Makefile:967: recipe for target 'certs' failed
make: *** [certs] Error 2
make: *** Waiting for unfinished jobs
  CHK kernel/config_data.h

???

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-24 Thread Sasha Levin
On 12/21/2015 04:14 PM, Christoph Lameter wrote:
> On Mon, 21 Dec 2015, Sasha Levin wrote:
> 
>> > I've also noticed a new warning from the workqueue code which my scripts
>> > didn't pick up before:
>> >
>> > [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 
>> > stuck for 54s!
>> > [ 3462.522041] workqueue vmstat: flags=0xc
>> > [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
>> > [ 3462.554836] pending: vmstat_update
> Does that mean that vmstat_update locks up or something that schedules it?

I think it means that vmstat_update didn't finish running (working).

> Also what workload triggers the BUG()?

Fuzzing with trinity inside a KVM guest. I've attached my config.


Thanks,
Sasha



config-sasha.gz
Description: application/gzip


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-24 Thread Sasha Levin
On 12/21/2015 04:14 PM, Christoph Lameter wrote:
> On Mon, 21 Dec 2015, Sasha Levin wrote:
> 
>> > I've also noticed a new warning from the workqueue code which my scripts
>> > didn't pick up before:
>> >
>> > [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 
>> > stuck for 54s!
>> > [ 3462.522041] workqueue vmstat: flags=0xc
>> > [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
>> > [ 3462.554836] pending: vmstat_update
> Does that mean that vmstat_update locks up or something that schedules it?

I think it means that vmstat_update didn't finish running (working).

> Also what workload triggers the BUG()?

Fuzzing with trinity inside a KVM guest. I've attached my config.


Thanks,
Sasha



config-sasha.gz
Description: application/gzip


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-22 Thread Christoph Lameter
Ran this here but no errors. Need config etc to reproduce.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-22 Thread Christoph Lameter
Ran this here but no errors. Need config etc to reproduce.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Christoph Lameter
On Mon, 21 Dec 2015, Sasha Levin wrote:

> I've also noticed a new warning from the workqueue code which my scripts
> didn't pick up before:
>
> [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 
> stuck for 54s!
> [ 3462.522041] workqueue vmstat: flags=0xc
> [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> [ 3462.554836] pending: vmstat_update

Does that mean that vmstat_update locks up or something that schedules it?

Also what workload triggers the BUG()?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Sasha Levin
On 12/21/2015 03:28 PM, Sasha Levin wrote:
> On 12/21/2015 08:08 AM, Christoph Lameter wrote:
>> > On Fri, 18 Dec 2015, Sasha Levin wrote:
>> > 
 >> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
>> > Hmmm.. Yes we need to fold the diffs first before disabling the timer
>> > otherwise the shepherd task may intervene.
>> > 
>> > Does this patch fix it?
> It didn't. With the patch I'm still seeing:

I've also noticed a new warning from the workqueue code which my scripts
didn't pick up before:

[ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 
stuck for 54s!
[ 3462.522041] workqueue vmstat: flags=0xc
[ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
[ 3462.554836] pending: vmstat_update


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Sasha Levin
On 12/21/2015 08:08 AM, Christoph Lameter wrote:
> On Fri, 18 Dec 2015, Sasha Levin wrote:
> 
>> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
> Hmmm.. Yes we need to fold the diffs first before disabling the timer
> otherwise the shepherd task may intervene.
> 
> Does this patch fix it?

It didn't. With the patch I'm still seeing:

[ 1997.674112] kernel BUG at mm/vmstat.c:1408!
[ 1997.674930] invalid opcode:  [#1] PREEMPT SMP KASAN
[ 1997.676252] Modules linked in:
[ 1997.676880] CPU: 4 PID: 14713 Comm: kworker/4:0 Not tainted 
4.4.0-rc5-next-20151221-sasha-00020-g840272e-dirty #2753
[ 1997.679262] Workqueue: vmstat vmstat_update
[ 1997.680109] task: 88015bca ti: 88015bcb8000 task.ti: 
88015bcb8000
[ 1997.681279] RIP: 0010:[]  [] 
vmstat_update+0x178/0x1a0
[ 1997.682810] RSP: 0018:88015bcbfc00  EFLAGS: 00010297
[ 1997.683611] RAX: 0004 RBX: 8803d2801a18 RCX: 
[ 1997.684689] RDX: 0007 RSI: ad098220 RDI: bc8be724
[ 1997.685750] RBP: 88015bcbfc20 R08:  R09: 88015bca0230
[ 1997.686812] R10: ad098220 R11: 880180a1be78 R12: 880180a1be60
[ 1997.688087] R13: 880157b27908 R14: 880180a21000 R15: 880157b27900
[ 1997.689120] FS:  () GS:880180a0() 
knlGS:
[ 1997.690290] CS:  0010 DS:  ES:  CR0: 8005003b
[ 1997.691141] CR2: 7f05e569 CR3: 000158699000 CR4: 06a0
[ 1997.692189] Stack:
[ 1997.692496]  880180a21000 880157b27918 880180a1be60 
880157b27908
[ 1997.693689]  88015bcbfd40 a23aa40f af66686b 
880157b27948
[ 1997.694851]  880180a1be68 8801 880157b27910 
880157b27920
[ 1997.696013] Call Trace:
[ 1997.696542]  [] process_one_work+0xacf/0x12a0
[ 1997.697516]  [] ? cancel_delayed_work_sync+0x10/0x10
[ 1997.698889]  [] ? __schedule+0x1127/0x14c0
[ 1997.699738]  [] ? get_parent_ip+0xd/0x40
[ 1997.700547]  [] ? preempt_count_add+0xe9/0x140
[ 1997.701426]  [] worker_thread+0xcb8/0x1090
[ 1997.702259]  [] ? load_mm_ldt+0x1f0/0x1f0
[ 1997.703084]  [] ? update_rq_clock+0x123/0x2e0
[ 1997.703962]  [] ? process_one_work+0x12a0/0x12a0
[ 1997.704896]  [] ? process_one_work+0x12a0/0x12a0
[ 1997.705804]  [] kthread+0x31e/0x330
[ 1997.706551]  [] ? finish_task_switch+0x6c5/0x970
[ 1997.707481]  [] ? kthread_worker_fn+0x680/0x680
[ 1997.708374]  [] ? kthread_worker_fn+0x680/0x680
[ 1997.709269]  [] ret_from_fork+0x3f/0x70
[ 1997.710067]  [] ? kthread_worker_fn+0x680/0x680
[ 1997.710964] Code: 75 1e be 79 00 00 00 48 c7 c7 40 16 10 ad 89 45 e4 e8 2d 
88 cd ff 8b 45 e4 c6 05 a0 a9 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 
48 c7 c7 c0 21 48 b1 e8 45 d6 ad 01 48 83 c4 08 5b 41 5c
[ 1997.714961] RIP  [] vmstat_update+0x178/0x1a0
[ 1997.715852]  RSP 


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Christoph Lameter
On Fri, 18 Dec 2015, Sasha Levin wrote:

> [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)

Hmmm.. Yes we need to fold the diffs first before disabling the timer
otherwise the shepherd task may intervene.

Does this patch fix it?


Subject: quiet_vmstat: Avoid race with shepherd by folding counters first

We need to fold the counters first otherwise the shepherd task may
remotely reactivate the vmstat worker.

This also avoids the strange loop. Nothing can really increase the
counters at that point since we are in the cpu idle loop. So
folding the counters once is enough. Cancelling work that does
not exist is fine too so just avoid the branches completely.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
if (system_state != SYSTEM_RUNNING)
return;

-   do {
-   if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-   cancel_delayed_work(this_cpu_ptr(_work));
-
-   } while (refresh_cpu_vm_stats(false));
+   refresh_cpu_vm_stats(false);
+   cancel_delayed_work(this_cpu_ptr(_work));
+   cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 }

 /*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Sasha Levin
On 12/21/2015 03:28 PM, Sasha Levin wrote:
> On 12/21/2015 08:08 AM, Christoph Lameter wrote:
>> > On Fri, 18 Dec 2015, Sasha Levin wrote:
>> > 
 >> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
>> > Hmmm.. Yes we need to fold the diffs first before disabling the timer
>> > otherwise the shepherd task may intervene.
>> > 
>> > Does this patch fix it?
> It didn't. With the patch I'm still seeing:

I've also noticed a new warning from the workqueue code which my scripts
didn't pick up before:

[ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 
stuck for 54s!
[ 3462.522041] workqueue vmstat: flags=0xc
[ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
[ 3462.554836] pending: vmstat_update


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Sasha Levin
On 12/21/2015 08:08 AM, Christoph Lameter wrote:
> On Fri, 18 Dec 2015, Sasha Levin wrote:
> 
>> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
> Hmmm.. Yes we need to fold the diffs first before disabling the timer
> otherwise the shepherd task may intervene.
> 
> Does this patch fix it?

It didn't. With the patch I'm still seeing:

[ 1997.674112] kernel BUG at mm/vmstat.c:1408!
[ 1997.674930] invalid opcode:  [#1] PREEMPT SMP KASAN
[ 1997.676252] Modules linked in:
[ 1997.676880] CPU: 4 PID: 14713 Comm: kworker/4:0 Not tainted 
4.4.0-rc5-next-20151221-sasha-00020-g840272e-dirty #2753
[ 1997.679262] Workqueue: vmstat vmstat_update
[ 1997.680109] task: 88015bca ti: 88015bcb8000 task.ti: 
88015bcb8000
[ 1997.681279] RIP: 0010:[]  [] 
vmstat_update+0x178/0x1a0
[ 1997.682810] RSP: 0018:88015bcbfc00  EFLAGS: 00010297
[ 1997.683611] RAX: 0004 RBX: 8803d2801a18 RCX: 
[ 1997.684689] RDX: 0007 RSI: ad098220 RDI: bc8be724
[ 1997.685750] RBP: 88015bcbfc20 R08:  R09: 88015bca0230
[ 1997.686812] R10: ad098220 R11: 880180a1be78 R12: 880180a1be60
[ 1997.688087] R13: 880157b27908 R14: 880180a21000 R15: 880157b27900
[ 1997.689120] FS:  () GS:880180a0() 
knlGS:
[ 1997.690290] CS:  0010 DS:  ES:  CR0: 8005003b
[ 1997.691141] CR2: 7f05e569 CR3: 000158699000 CR4: 06a0
[ 1997.692189] Stack:
[ 1997.692496]  880180a21000 880157b27918 880180a1be60 
880157b27908
[ 1997.693689]  88015bcbfd40 a23aa40f af66686b 
880157b27948
[ 1997.694851]  880180a1be68 8801 880157b27910 
880157b27920
[ 1997.696013] Call Trace:
[ 1997.696542]  [] process_one_work+0xacf/0x12a0
[ 1997.697516]  [] ? cancel_delayed_work_sync+0x10/0x10
[ 1997.698889]  [] ? __schedule+0x1127/0x14c0
[ 1997.699738]  [] ? get_parent_ip+0xd/0x40
[ 1997.700547]  [] ? preempt_count_add+0xe9/0x140
[ 1997.701426]  [] worker_thread+0xcb8/0x1090
[ 1997.702259]  [] ? load_mm_ldt+0x1f0/0x1f0
[ 1997.703084]  [] ? update_rq_clock+0x123/0x2e0
[ 1997.703962]  [] ? process_one_work+0x12a0/0x12a0
[ 1997.704896]  [] ? process_one_work+0x12a0/0x12a0
[ 1997.705804]  [] kthread+0x31e/0x330
[ 1997.706551]  [] ? finish_task_switch+0x6c5/0x970
[ 1997.707481]  [] ? kthread_worker_fn+0x680/0x680
[ 1997.708374]  [] ? kthread_worker_fn+0x680/0x680
[ 1997.709269]  [] ret_from_fork+0x3f/0x70
[ 1997.710067]  [] ? kthread_worker_fn+0x680/0x680
[ 1997.710964] Code: 75 1e be 79 00 00 00 48 c7 c7 40 16 10 ad 89 45 e4 e8 2d 
88 cd ff 8b 45 e4 c6 05 a0 a9 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 
48 c7 c7 c0 21 48 b1 e8 45 d6 ad 01 48 83 c4 08 5b 41 5c
[ 1997.714961] RIP  [] vmstat_update+0x178/0x1a0
[ 1997.715852]  RSP 


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Christoph Lameter
On Mon, 21 Dec 2015, Sasha Levin wrote:

> I've also noticed a new warning from the workqueue code which my scripts
> didn't pick up before:
>
> [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 
> stuck for 54s!
> [ 3462.522041] workqueue vmstat: flags=0xc
> [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> [ 3462.554836] pending: vmstat_update

Does that mean that vmstat_update locks up or something that schedules it?

Also what workload triggers the BUG()?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-21 Thread Christoph Lameter
On Fri, 18 Dec 2015, Sasha Levin wrote:

> [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)

Hmmm.. Yes we need to fold the diffs first before disabling the timer
otherwise the shepherd task may intervene.

Does this patch fix it?


Subject: quiet_vmstat: Avoid race with shepherd by folding counters first

We need to fold the counters first otherwise the shepherd task may
remotely reactivate the vmstat worker.

This also avoids the strange loop. Nothing can really increase the
counters at that point since we are in the cpu idle loop. So
folding the counters once is enough. Cancelling work that does
not exist is fine too so just avoid the branches completely.

Signed-off-by: Christoph Lameter 

Index: linux/mm/vmstat.c
===
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
if (system_state != SYSTEM_RUNNING)
return;

-   do {
-   if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-   cancel_delayed_work(this_cpu_ptr(_work));
-
-   } while (refresh_cpu_vm_stats(false));
+   refresh_cpu_vm_stats(false);
+   cancel_delayed_work(this_cpu_ptr(_work));
+   cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 }

 /*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-18 Thread Sasha Levin
Hi all,

I've started seeing the following in the latest -next kernel.

[  531.127489] kernel BUG at mm/vmstat.c:1408!
[  531.128157] invalid opcode:  [#1] PREEMPT SMP KASAN
[  531.128872] Modules linked in:
[  531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 
4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750
[  531.130939] Workqueue: vmstat vmstat_update
[  531.131741] task: 88020407 ti: 880204078000 task.ti: 
880204078000
[  531.133189] RIP: vmstat_update (mm/vmstat.c:1408)
[  531.134466] RSP: 0018:88020407fbf8  EFLAGS: 00010293
[  531.135132] RAX: 0006 RBX: 8800418e2fd8 RCX: 
[  531.135995] RDX: 0007 RSI: 8c0982a0 RDI: 9b8bd6e4
[  531.137475] RBP: 88020407fc18 R08:  R09: 880204070230
[  531.138304] R10: 8c0982a0 R11: e272b4f2 R12: 880204c1bf60
[  531.139329] R13: 880204ab09c8 R14: 880204ab09b8 R15: 880204ab09b0
[  531.140261] FS:  () GS:880204c0() 
knlGS:
[  531.141218] CS:  0010 DS:  ES:  CR0: 8005003b
[  531.142036] CR2: 7f039a8c1944 CR3: 0ea28000 CR4: 06a0
[  531.142752] Stack:
[  531.142963]  880204c21000 880204c21000 880204c1bf60 
880204ab09c8
[  531.144095]  88020407fd40 813a8fea 41b58ab3 
8e667cdb
[  531.145258]  880204ab09f8 880204c1bf68 880204ab09c0 
8802
[  531.146475] Call Trace:
[  531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 
kernel/workqueue.c:2045)
[  531.150790] worker_thread (include/linux/compiler.h:218 
include/linux/list.h:206 kernel/workqueue.c:2171)
[  531.155176] kthread (kernel/kthread.c:209)
[  531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469)
[ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 
cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 
c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c
All code

   0:   75 1e   jne0x20
   2:   be 79 00 00 00  mov$0x79,%esi
   7:   48 c7 c7 80 0f 10 8cmov$0x8c100f80,%rdi
   e:   89 45 e4mov%eax,-0x1c(%rbp)
  11:   e8 cd 92 cd ff  callq  0xffcd92e3
  16:   8b 45 e4mov-0x1c(%rbp),%eax
  19:   c6 05 e1 c4 13 1a 01movb   $0x1,0x1a13c4e1(%rip)# 0x1a13c501
  20:   89 c0   mov%eax,%eax
  22:   f0 48 0f ab 03  lock bts %rax,(%rbx)
  27:   72 02   jb 0x2b
  29:   eb 0e   jmp0x39
  2b:*  0f 0b   ud2 <-- trapping instruction
  2d:   48 c7 c7 c0 f1 47 90mov$0x9047f1c0,%rdi
  34:   e8 3d 03 ae 01  callq  0x1ae0376
  39:   48 83 c4 08 add$0x8,%rsp
  3d:   5b  pop%rbx
  3e:   41 5c   pop%r12
...

Code starting with the faulting instruction
===
   0:   0f 0b   ud2
   2:   48 c7 c7 c0 f1 47 90mov$0x9047f1c0,%rdi
   9:   e8 3d 03 ae 01  callq  0x1ae034b
   e:   48 83 c4 08 add$0x8,%rsp
  12:   5b  pop%rbx
  13:   41 5c   pop%r12
...
[  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
[  531.165523]  RSP 


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


mm, vmstat: kernel BUG at mm/vmstat.c:1408!

2015-12-18 Thread Sasha Levin
Hi all,

I've started seeing the following in the latest -next kernel.

[  531.127489] kernel BUG at mm/vmstat.c:1408!
[  531.128157] invalid opcode:  [#1] PREEMPT SMP KASAN
[  531.128872] Modules linked in:
[  531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 
4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750
[  531.130939] Workqueue: vmstat vmstat_update
[  531.131741] task: 88020407 ti: 880204078000 task.ti: 
880204078000
[  531.133189] RIP: vmstat_update (mm/vmstat.c:1408)
[  531.134466] RSP: 0018:88020407fbf8  EFLAGS: 00010293
[  531.135132] RAX: 0006 RBX: 8800418e2fd8 RCX: 
[  531.135995] RDX: 0007 RSI: 8c0982a0 RDI: 9b8bd6e4
[  531.137475] RBP: 88020407fc18 R08:  R09: 880204070230
[  531.138304] R10: 8c0982a0 R11: e272b4f2 R12: 880204c1bf60
[  531.139329] R13: 880204ab09c8 R14: 880204ab09b8 R15: 880204ab09b0
[  531.140261] FS:  () GS:880204c0() 
knlGS:
[  531.141218] CS:  0010 DS:  ES:  CR0: 8005003b
[  531.142036] CR2: 7f039a8c1944 CR3: 0ea28000 CR4: 06a0
[  531.142752] Stack:
[  531.142963]  880204c21000 880204c21000 880204c1bf60 
880204ab09c8
[  531.144095]  88020407fd40 813a8fea 41b58ab3 
8e667cdb
[  531.145258]  880204ab09f8 880204c1bf68 880204ab09c0 
8802
[  531.146475] Call Trace:
[  531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 
kernel/workqueue.c:2045)
[  531.150790] worker_thread (include/linux/compiler.h:218 
include/linux/list.h:206 kernel/workqueue.c:2171)
[  531.155176] kthread (kernel/kthread.c:209)
[  531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469)
[ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 
cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 
c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c
All code

   0:   75 1e   jne0x20
   2:   be 79 00 00 00  mov$0x79,%esi
   7:   48 c7 c7 80 0f 10 8cmov$0x8c100f80,%rdi
   e:   89 45 e4mov%eax,-0x1c(%rbp)
  11:   e8 cd 92 cd ff  callq  0xffcd92e3
  16:   8b 45 e4mov-0x1c(%rbp),%eax
  19:   c6 05 e1 c4 13 1a 01movb   $0x1,0x1a13c4e1(%rip)# 0x1a13c501
  20:   89 c0   mov%eax,%eax
  22:   f0 48 0f ab 03  lock bts %rax,(%rbx)
  27:   72 02   jb 0x2b
  29:   eb 0e   jmp0x39
  2b:*  0f 0b   ud2 <-- trapping instruction
  2d:   48 c7 c7 c0 f1 47 90mov$0x9047f1c0,%rdi
  34:   e8 3d 03 ae 01  callq  0x1ae0376
  39:   48 83 c4 08 add$0x8,%rsp
  3d:   5b  pop%rbx
  3e:   41 5c   pop%r12
...

Code starting with the faulting instruction
===
   0:   0f 0b   ud2
   2:   48 c7 c7 c0 f1 47 90mov$0x9047f1c0,%rdi
   9:   e8 3d 03 ae 01  callq  0x1ae034b
   e:   48 83 c4 08 add$0x8,%rsp
  12:   5b  pop%rbx
  13:   41 5c   pop%r12
...
[  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
[  531.165523]  RSP 


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/