Re: BUG: spinlock bad magic on CPU#0 on BeagleBone

2012-12-19 Thread Stephen Boyd
On 12/19/2012 10:44 PM, Bedia, Vaibhav wrote:
> On Thu, Dec 20, 2012 at 11:55:24, Stephen Boyd wrote:
>> On 12/19/2012 8:48 PM, Bedia, Vaibhav wrote:
>>> I tried out 3 variants of AM335x boards - 2 of these (BeagleBone and EVM) 
>>> have DDR2
>>> and 1 has DDR3 (EVM-SK). The BUG is triggered on all of these at the same 
>>> point.
>>>
>>> With Stephen's change I don't see this on any of the board variants :)
>>> New bootlog below.
>> Great! Can I have your Tested-by then? I'll wrap it up into a patch. Is
>> this a new regression? From a glance at the code it looks to have
>> existed for quite a while now.
> I went back to a branch based off 3.7-rc4 and don't see the issue there.
> Not sure what is triggering this now.
>
> Tested-by: Vaibhav Bedia 

Thanks. I was thrown off by the author date of the patch which
introduced your problem:

commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e
Author: Eric W. Biederman 
AuthorDate: Sun Mar 7 18:49:36 2010 -0800
Commit: Eric W. Biederman 
CommitDate: Mon Nov 19 05:59:18 2012 -0800

vfs: Add setns support for the mount namespace


There is a two-year gap between the author date and the commit date.
Either way, the regression looks to be isolated to the 3.8 merge window
but affects quite a few architectures. Patch to follow shortly.
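The author-date/commit-date gap is easy to see with git log format placeholders. As a sketch, the scratch repository, user name, and commit message below are made up for illustration; only the two dates mirror the real commit 8823c079, which shows the same pattern because it was applied long after it was written:

```shell
# Build a throwaway repo with a deliberately old author date and a
# recent committer date, then print both, as 'git log' records them.
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q .
GIT_AUTHOR_DATE='2010-03-07T18:49:36-0800' \
GIT_COMMITTER_DATE='2012-11-19T05:59:18-0800' \
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'demo: author/commit date gap'
# %ad is the author date, %cd the committer date
git log -1 --format='AuthorDate: %ad%nCommitDate: %cd'
```

Running the same `--format` against the real commit in a kernel tree shows the 2010 author date next to the 2012 commit date.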

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: BUG: spinlock bad magic on CPU#0 on BeagleBone

2012-12-19 Thread Bedia, Vaibhav
On Thu, Dec 20, 2012 at 11:55:24, Stephen Boyd wrote:
> On 12/19/2012 8:48 PM, Bedia, Vaibhav wrote:
> > I tried out 3 variants of AM335x boards - 2 of these (BeagleBone and EVM) 
> > have DDR2
> > and 1 has DDR3 (EVM-SK). The BUG is triggered on all of these at the same 
> > point.
> >
> > With Stephen's change I don't see this on any of the board variants :)
> > New bootlog below.
> 
> Great! Can I have your Tested-by then? I'll wrap it up into a patch. Is
> this a new regression? From a glance at the code it looks to have
> existed for quite a while now.

I went back to a branch based off 3.7-rc4 and don't see the issue there.
Not sure what is triggering this now.

Tested-by: Vaibhav Bedia 



Re: BUG: spinlock bad magic on CPU#0 on BeagleBone

2012-12-19 Thread Stephen Boyd
On 12/19/2012 8:48 PM, Bedia, Vaibhav wrote:
> I tried out 3 variants of AM335x boards - 2 of these (BeagleBone and EVM) 
> have DDR2
> and 1 has DDR3 (EVM-SK). The BUG is triggered on all of these at the same 
> point.
>
> With Stephen's change I don't see this on any of the board variants :)
> New bootlog below.

Great! Can I have your Tested-by then? I'll wrap it up into a patch. Is
this a new regression? From a glance at the code it looks to have
existed for quite a while now.

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.



RE: BUG: spinlock bad magic on CPU#0 on BeagleBone

2012-12-19 Thread Bedia, Vaibhav
On Thu, Dec 20, 2012 at 01:53:42, Stephen Boyd wrote:
> On 12/19/12 08:53, Paul Walmsley wrote:
> > On Wed, 19 Dec 2012, Bedia, Vaibhav wrote:
> >
> >> Current mainline on Beaglebone using the omap2plus_defconfig + 3 build 
> >> fixes
> >> is triggering a BUG()
> > ...
> >
> >> [0.109688] Security Framework initialized
> >> [0.109889] Mount-cache hash table entries: 512
> >> [0.112674] BUG: spinlock bad magic on CPU#0, swapper/0/0
> >> [0.112724]  lock: atomic64_lock+0x240/0x400, .magic: , .owner: 
> >> /-1, .owner_cpu: 0
> >> [0.112782] [] (unwind_backtrace+0x0/0xf0) from [] 
> >> (do_raw_spin_lock+0x158/0x198)
> >> [0.112813] [] (do_raw_spin_lock+0x158/0x198) from 
> >> [] (_raw_spin_lock_irqsave+0x4c/0x58)
> >> [0.112844] [] (_raw_spin_lock_irqsave+0x4c/0x58) from 
> >> [] (atomic64_add_return+0x30/0x5c)
> >> [0.112886] [] (atomic64_add_return+0x30/0x5c) from 
> >> [] (alloc_mnt_ns.clone.14+0x44/0xac)
> >> [0.112914] [] (alloc_mnt_ns.clone.14+0x44/0xac) from 
> >> [] (create_mnt_ns+0xc/0x54)
> >> [0.112951] [] (create_mnt_ns+0xc/0x54) from [] 
> >> (mnt_init+0x120/0x1d4)
> >> [0.112978] [] (mnt_init+0x120/0x1d4) from [] 
> >> (vfs_caches_init+0xe0/0x10c)
> >> [0.113005] [] (vfs_caches_init+0xe0/0x10c) from [] 
> >> (start_kernel+0x29c/0x300)
> >> [0.113029] [] (start_kernel+0x29c/0x300) from [<80008078>] 
> >> (0x80008078)
> >> [0.118290] CPU: Testing write buffer coherency: ok
> >> [0.118968] CPU0: thread -1, cpu 0, socket -1, mpidr 0
> >> [0.119053] Setting up static identity map for 0x804de2c8 - 0x804de338
> >> [0.120698] Brought up 1 CPUs
> > This is probably a memory corruption bug, there's probably some code 
> > executing early that's writing outside its own data and trashing some 
> > previously-allocated memory.
> 
> I'm not so sure. It looks like atomic64s use spinlocks on processors
> that don't have 64-bit atomic instructions (see lib/atomic64.c). And
> those spinlocks are not initialized until a pure initcall runs,
> init_atomic64_lock(). Pure initcalls don't run until after
> vfs_caches_init() and so you get this BUG() warning that the spinlock is
> not initialized.
> 
> How about we initialize the locks statically? Does that fix your problem?
> 
> >8-
> 
> diff --git a/lib/atomic64.c b/lib/atomic64.c
> index 9785378..08a4f06 100644
> --- a/lib/atomic64.c
> +++ b/lib/atomic64.c
> @@ -31,7 +31,11 @@
>  static union {
> raw_spinlock_t lock;
> char pad[L1_CACHE_BYTES];
> -} atomic64_lock[NR_LOCKS] __cacheline_aligned_in_smp;
> +} atomic64_lock[NR_LOCKS] __cacheline_aligned_in_smp = {
> +   [0 ... (NR_LOCKS - 1)] = {
> +   .lock =  __RAW_SPIN_LOCK_UNLOCKED(atomic64_lock.lock),
> +   },
> +};
>  
>  static inline raw_spinlock_t *lock_addr(const atomic64_t *v)
>  {
> @@ -173,14 +177,3 @@ int atomic64_add_unless(atomic64_t *v, long long a, long long u)
> return ret;
>  }
>  EXPORT_SYMBOL(atomic64_add_unless);
> -
> -static int init_atomic64_lock(void)
> -{
> -   int i;
> -
> -   for (i = 0; i < NR_LOCKS; ++i)
> -   raw_spin_lock_init(&atomic64_lock[i].lock);
> -   return 0;
> -}
> -
> -pure_initcall(init_atomic64_lock);
> 

I tried out 3 variants of AM335x boards - 2 of these (BeagleBone and EVM) have DDR2
and 1 has DDR3 (EVM-SK). The BUG is triggered on all of these at the same point.

With Stephen's change I don't see this on any of the board variants :)
New bootlog below.

Thanks,
Vaibhav

---


[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 3.7.0-01415-g55bc169-dirty (a0393953@psplinux063) (gcc version 4.5.3 20110311 (prerelease) (GCC) ) #4 SMP Thu Dec 20 09:59:12 IST 2012
[0.00] CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c53c7d
[0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[0.00] Machine: Generic AM33XX (Flattened Device Tree), model: TI AM335x BeagleBone
[0.00] Memory policy: ECC disabled, Data cache writeback
[0.00] AM335X ES1.0 (neon )
[0.00] PERCPU: Embedded 9 pages/cpu @c0f1a000 s12992 r8192 d15680 u36864
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 64768
[0.00] Kernel command line: console=ttyO0,115200n8 mem=256M root=/dev/ram rw initrd=0x8200,16MB ramdisk_size=65536 earlyprintk=serial
[0.00] PID hash table entries: 1024 (order: 0, 4096 bytes)
[0.00] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[0.00] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[0.00] __ex_table already sorted, skipping sort
[0.00] Memory: 255MB = 255MB total
[0.00] Memory: 229012k/229012k available, 33132k reserved, 0K highmem
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xfff0 - 0xfffe   ( 896 kB)
[

Re: BUG: spinlock bad magic on CPU#0 on BeagleBone

2012-12-19 Thread Stephen Boyd
On 12/19/12 08:53, Paul Walmsley wrote:
> On Wed, 19 Dec 2012, Bedia, Vaibhav wrote:
>
>> Current mainline on Beaglebone using the omap2plus_defconfig + 3 build fixes
>> is triggering a BUG()
> ...
>
>> [0.109688] Security Framework initialized
>> [0.109889] Mount-cache hash table entries: 512
>> [0.112674] BUG: spinlock bad magic on CPU#0, swapper/0/0
>> [0.112724]  lock: atomic64_lock+0x240/0x400, .magic: , .owner: 
>> /-1, .owner_cpu: 0
>> [0.112782] [] (unwind_backtrace+0x0/0xf0) from [] 
>> (do_raw_spin_lock+0x158/0x198)
>> [0.112813] [] (do_raw_spin_lock+0x158/0x198) from [] 
>> (_raw_spin_lock_irqsave+0x4c/0x58)
>> [0.112844] [] (_raw_spin_lock_irqsave+0x4c/0x58) from 
>> [] (atomic64_add_return+0x30/0x5c)
>> [0.112886] [] (atomic64_add_return+0x30/0x5c) from 
>> [] (alloc_mnt_ns.clone.14+0x44/0xac)
>> [0.112914] [] (alloc_mnt_ns.clone.14+0x44/0xac) from 
>> [] (create_mnt_ns+0xc/0x54)
>> [0.112951] [] (create_mnt_ns+0xc/0x54) from [] 
>> (mnt_init+0x120/0x1d4)
>> [0.112978] [] (mnt_init+0x120/0x1d4) from [] 
>> (vfs_caches_init+0xe0/0x10c)
>> [0.113005] [] (vfs_caches_init+0xe0/0x10c) from [] 
>> (start_kernel+0x29c/0x300)
>> [0.113029] [] (start_kernel+0x29c/0x300) from [<80008078>] 
>> (0x80008078)
>> [0.118290] CPU: Testing write buffer coherency: ok
>> [0.118968] CPU0: thread -1, cpu 0, socket -1, mpidr 0
>> [0.119053] Setting up static identity map for 0x804de2c8 - 0x804de338
>> [0.120698] Brought up 1 CPUs
> This is probably a memory corruption bug, there's probably some code 
> executing early that's writing outside its own data and trashing some 
> previously-allocated memory.

I'm not so sure. It looks like atomic64s use spinlocks on processors
that don't have 64-bit atomic instructions (see lib/atomic64.c). And
those spinlocks are not initialized until a pure initcall runs,
init_atomic64_lock(). Pure initcalls don't run until after
vfs_caches_init() and so you get this BUG() warning that the spinlock is
not initialized.

How about we initialize the locks statically? Does that fix your problem?

>8-

diff --git a/lib/atomic64.c b/lib/atomic64.c
index 9785378..08a4f06 100644
--- a/lib/atomic64.c
+++ b/lib/atomic64.c
@@ -31,7 +31,11 @@
 static union {
raw_spinlock_t lock;
char pad[L1_CACHE_BYTES];
-} atomic64_lock[NR_LOCKS] __cacheline_aligned_in_smp;
+} atomic64_lock[NR_LOCKS] __cacheline_aligned_in_smp = {
+   [0 ... (NR_LOCKS - 1)] = {
+   .lock =  __RAW_SPIN_LOCK_UNLOCKED(atomic64_lock.lock),
+   },
+};
 
 static inline raw_spinlock_t *lock_addr(const atomic64_t *v)
 {
@@ -173,14 +177,3 @@ int atomic64_add_unless(atomic64_t *v, long long a, long long u)
return ret;
 }
 EXPORT_SYMBOL(atomic64_add_unless);
-
-static int init_atomic64_lock(void)
-{
-   int i;
-
-   for (i = 0; i < NR_LOCKS; ++i)
-   raw_spin_lock_init(&atomic64_lock[i].lock);
-   return 0;
-}
-
-pure_initcall(init_atomic64_lock);
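For readers outside the kernel tree, the scheme lib/atomic64.c uses can be sketched as a userspace analogue. This is an illustration, not the kernel code: raw_spinlock_t is modelled with pthread mutexes, the address hash is simplified, and NR_LOCKS/the shift amount are made-up constants. The point it demonstrates is the one in the patch above: a statically initialized lock array needs no initcall, so it is valid however early it is used.

```c
#include <pthread.h>

/* Small hashed array of locks: 64-bit atomics on CPUs without 64-bit
 * atomic instructions fall back to taking one of these, selected by
 * hashing the atomic variable's address. */
#define NR_LOCKS 16

typedef struct { long long counter; } atomic64_t;

/* Statically initialized (the analogue of __RAW_SPIN_LOCK_UNLOCKED in
 * Stephen's patch), so no init function has to run first. */
static pthread_mutex_t atomic64_lock[NR_LOCKS] = {
	[0 ... (NR_LOCKS - 1)] = PTHREAD_MUTEX_INITIALIZER,
};

static pthread_mutex_t *lock_addr(const atomic64_t *v)
{
	unsigned long addr = (unsigned long)v;

	addr >>= 8;	/* drop low bits so nearby variables spread out */
	return &atomic64_lock[addr % NR_LOCKS];
}

long long atomic64_add_return(long long a, atomic64_t *v)
{
	pthread_mutex_t *lock = lock_addr(v);
	long long ret;

	pthread_mutex_lock(lock);
	ret = (v->counter += a);
	pthread_mutex_unlock(lock);
	return ret;
}
```

The `[0 ... (NR_LOCKS - 1)]` range designator is the same GCC extension the patch relies on; it expands to one initializer per array element at compile time, which is exactly why no runtime loop (and hence no initcall ordering dependency) is needed.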

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation



Re: BUG: spinlock bad magic on CPU#0 on BeagleBone

2012-12-19 Thread Paul Walmsley
On Wed, 19 Dec 2012, Bedia, Vaibhav wrote:

> Current mainline on Beaglebone using the omap2plus_defconfig + 3 build fixes
> is triggering a BUG()

...

> [0.109688] Security Framework initialized
> [0.109889] Mount-cache hash table entries: 512
> [0.112674] BUG: spinlock bad magic on CPU#0, swapper/0/0
> [0.112724]  lock: atomic64_lock+0x240/0x400, .magic: , .owner: 
> /-1, .owner_cpu: 0
> [0.112782] [] (unwind_backtrace+0x0/0xf0) from [] 
> (do_raw_spin_lock+0x158/0x198)
> [0.112813] [] (do_raw_spin_lock+0x158/0x198) from [] 
> (_raw_spin_lock_irqsave+0x4c/0x58)
> [0.112844] [] (_raw_spin_lock_irqsave+0x4c/0x58) from 
> [] (atomic64_add_return+0x30/0x5c)
> [0.112886] [] (atomic64_add_return+0x30/0x5c) from [] 
> (alloc_mnt_ns.clone.14+0x44/0xac)
> [0.112914] [] (alloc_mnt_ns.clone.14+0x44/0xac) from 
> [] (create_mnt_ns+0xc/0x54)
> [0.112951] [] (create_mnt_ns+0xc/0x54) from [] 
> (mnt_init+0x120/0x1d4)
> [0.112978] [] (mnt_init+0x120/0x1d4) from [] 
> (vfs_caches_init+0xe0/0x10c)
> [0.113005] [] (vfs_caches_init+0xe0/0x10c) from [] 
> (start_kernel+0x29c/0x300)
> [0.113029] [] (start_kernel+0x29c/0x300) from [<80008078>] 
> (0x80008078)
> [0.118290] CPU: Testing write buffer coherency: ok
> [0.118968] CPU0: thread -1, cpu 0, socket -1, mpidr 0
> [0.119053] Setting up static identity map for 0x804de2c8 - 0x804de338
> [0.120698] Brought up 1 CPUs

This is probably a memory corruption bug, there's probably some code 
executing early that's writing outside its own data and trashing some 
previously-allocated memory.


- Paul