[PATCH RT 2/8] softirq: Init softirq local lock after per cpu section is set up

2012-10-11 Thread Steven Rostedt
From: Steven Rostedt <rost...@goodmis.org>

I discovered this bug when booting 3.4-rt on my powerpc box. It crashed
with the following report:

------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in:
NIP: c04aa03c LR: c04aa01c CTR: c009b2ac
REGS: c0003e8d7950 TRAP: 0700   Not tainted  (3.4.11-test-rt19)
MSR: 90029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 2482  XER: 2000
SOFTE: 0
TASK = c0003e8fdcd0[11] 'ksoftirqd/1' THREAD: c0003e8d4000 CPU: 1
GPR00: 0001 c0003e8d7bd0 c0d6cbb0 
GPR04: c0003e8fdcd0  24004082 c0011454
GPR08:  8001 c0003e8fdcd1 
GPR12: 2484 cfff0280  3ad8
GPR16:  0072c798 0060 
GPR20: 00642741 0072c858 3af0 0417
GPR24: 0072dcd0 c0003e7ff990  0001
GPR28:  c0792340 c0ccec78 c1182338
NIP [c04aa03c] .wakeup_next_waiter+0x44/0xb8
LR [c04aa01c] .wakeup_next_waiter+0x24/0xb8
Call Trace:
[c0003e8d7bd0] [c04aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable)
[c0003e8d7c60] [c04a0320] .rt_spin_lock_slowunlock+0x8c/0xe4
[c0003e8d7ce0] [c04a07cc] .rt_spin_unlock+0x54/0x64
[c0003e8d7d60] [c00636bc] .__thread_do_softirq+0x130/0x174
[c0003e8d7df0] [c006379c] .run_ksoftirqd+0x9c/0x1a4
[c0003e8d7ea0] [c0080b68] .kthread+0xa8/0xb4
[c0003e8d7f90] [c001c2f8] .kernel_thread+0x54/0x70
Instruction dump:
6000 e86d01c8 38630730 4bff7061 6000 ebbf0008 7c7c1b78 e81d0040
7fe00278 7c74 7800d182 6801 <0b00> e88d01c8 387d0010 38840738

The rtmutex_common.h:75 is:

static inline struct rt_mutex_waiter *
rt_mutex_top_waiter(struct rt_mutex *lock)
{
	struct rt_mutex_waiter *w;

	w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
			       list_entry);
	BUG_ON(w->lock != lock);

	return w;
}

Where the waiter->lock is corrupted. I saw various other random bugs
that all had to do with the softirq lock and plist. As a plist needs to
be initialized before it is used, I investigated how this lock is
initialized. It's initialized with:

void __init softirq_early_init(void)
{
local_irq_lock_init(local_softirq_lock);
}

Where:

#define local_irq_lock_init(lvar)				\
	do {							\
		int __cpu;					\
		for_each_possible_cpu(__cpu)			\
			spin_lock_init(&per_cpu(lvar, __cpu).lock); \
	} while (0)

As the softirq lock is a local_irq_lock, which is a per_cpu lock, the
initialization is done to all per_cpu instances of the lock. But let's
look at where softirq_early_init() is called from.

In init/main.c: start_kernel()

/*
 * Interrupts are still disabled. Do necessary setups, then
 * enable them
 */
softirq_early_init();
tick_init();
boot_cpu_init();
page_address_init();
printk(KERN_NOTICE "%s", linux_banner);
setup_arch(&command_line);
mm_init_owner(&init_mm, &init_task);
mm_init_cpumask(&init_mm);
setup_command_line(command_line);
setup_nr_cpu_ids();
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */

One of the first things called is the initialization of the softirq
lock. But if you look further down, you see the per_cpu areas have not
been set up yet. Thus initializing a local_irq_lock() before the
per_cpu section is set up may not work, as it initializes the per-CPU
locks before the per-CPU storage they live in exists.

By moving the softirq_early_init() right after setup_per_cpu_areas(),
the kernel boots fine.

Signed-off-by: Steven Rostedt <rost...@goodmis.org>
Cc: Clark Williams <cl...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
Cc: Carsten Emde <c...@osadl.org>
Cc: voml...@texas.net
Cc: stable...@vger.kernel.org
Link: http://lkml.kernel.org/r/1349362924.6755.18.ca...@gandalf.local.home
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 init/main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index d432bea..6f96224 100644
--- a/init/main.c
+++ b/init/main.c
@@ -490,7 +490,6 @@ asmlinkage void __init start_kernel(void)
  * Interrupts are still disabled. Do necessary setups, then
  * enable them
  */
-   softirq_early_init();
tick_init();
boot_cpu_init();
page_address_init();
@@ -501,6 +500,7 @@ asmlinkage void __init start_kernel(void)
setup_command_line(command_line);
setup_nr_cpu_ids();
setup_per_cpu_areas();
+   softirq_early_init();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
 
build_all_zonelists(NULL);
-- 
