Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7

2007-10-08 Thread Laurent CARON
David Chinner wrote:
> Can you turn on slab debug and poisoning and see where
> the kernel fails with that? e.g. set:
> 
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y


I was a little worried about leaving those servers in such a bad state,
so I went the "easy" way.

I upgraded from DRBD 0.7.X to the latest svn 8.0.X.

Laurent

PS: Should this bug reappear, I'll change the kernel's config and let
you know the result.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7

2007-10-07 Thread David Chinner
On Thu, Oct 04, 2007 at 09:29:40AM +0200, Laurent Caron wrote:
> 
> Hi,
> 
> I did compile a fresh 2.6.21.7 kernel from kernel.org (no distro patch, 
> ), and latest svn (3062) 0.7.X drbd.
> 
> After just 2 days of uptime, I did experience another crash.
> 
> I wonder if it is an XFS related bug, a DRBD one, or related to XFS on top of 
> DRBD.
> 
> This bug seems to occur with intensive IO operations.
> 
> What do you think about it ?

This still looks like memory corruption of some sort. I'd
suspect DRBD at this point because nobody is reporting this against
other block devices in 2.6.21.

> Oct  3 18:55:23  kernel: Oops: 0002 [#1]
> Oct  3 18:55:23  kernel: SMP 
> Oct  3 18:55:23  kernel: CPU:7
> Oct  3 18:55:23  kernel: EIP:0060:[c016540c]Not tainted VLI
> Oct  3 18:55:23  kernel: EFLAGS: 00010046   (2.6.21-dl380-g5-20071001 #1)
> Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0

Can you turn on slab debug and poisoning and see where
the kernel fails with that? e.g. set:

CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
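
For anyone following along, here is a minimal sketch of checking whether a kernel .config has the two options Dave mentions. The helper name `enabled_options` and the sample config text are illustrative assumptions, not something from this thread; a real check would read the .config of the kernel that is actually running.

```python
import re

def enabled_options(config_text, wanted):
    """Return {option: bool} telling whether each wanted option is set to y or m."""
    set_opts = set()
    for line in config_text.splitlines():
        # Enabled options look like "CONFIG_FOO=y"; disabled ones are
        # written as "# CONFIG_FOO is not set" and do not match here.
        m = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)\s*$", line)
        if m:
            set_opts.add(m.group(1))
    return {opt: opt in set_opts for opt in wanted}

# Hypothetical .config fragment, for illustration only.
SAMPLE = """\
CONFIG_SLAB=y
CONFIG_DEBUG_SLAB=y
# CONFIG_DEBUG_SLAB_LEAK is not set
"""

result = enabled_options(SAMPLE, ["CONFIG_DEBUG_SLAB", "CONFIG_DEBUG_SLAB_LEAK"])
for opt, on in result.items():
    print(opt, "enabled" if on else "NOT enabled")
```

With the sample text above, only CONFIG_DEBUG_SLAB is reported as enabled, so the kernel would need rebuilding with CONFIG_DEBUG_SLAB_LEAK=y as well.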


Crash on 2.6.21.7 Vanilla + DRBD 0.7

2007-10-04 Thread Laurent Caron

Hi,

I did compile a fresh 2.6.21.7 kernel from kernel.org (no distro patch, ), 
and latest svn (3062) 0.7.X drbd.

After just 2 days of uptime, I did experience another crash.

I wonder if it is an XFS related bug, a DRBD one, or related to XFS on top of 
DRBD.

This bug seems to occur with intensive IO operations.

What do you think about it ?

Thanks

Laurent




Oct  3 18:55:23  kernel: Oops: 0002 [#1]
Oct  3 18:55:23  kernel: SMP 
Oct  3 18:55:23  kernel: CPU:7
Oct  3 18:55:23  kernel: EIP:0060:[c016540c]Not tainted VLI
Oct  3 18:55:23  kernel: EFLAGS: 00010046   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0
Oct  3 18:55:23  kernel: eax: f79c2940   ebx: 0015   ecx: 0005   edx: 
65b567b0
Oct  3 18:55:23  kernel: esi: 000a   edi: d5d26000   ebp: f79d03c0   esp: 
d2531c98
Oct  3 18:55:23  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:23  kernel: Process rsync (pid: 22409, ti=d253 task=da1e8070 
task.ti=d253)
Oct  3 18:55:23  kernel: Stack: 0010 02d0 ce9ca0b8 02d0 f79cfe00 
f79d1c00 f79c2940  
Oct  3 18:55:23  kernel: 0001 d2531cd4 ce9ca088 c022aade d5d2601c 0282 
f79cfe00 02d0 
Oct  3 18:55:23  kernel: f79cfe00 c01652e6  0001 c0265a4e 0011 
d2531d60 d7acfb40 
Oct  3 18:55:23  kernel: Call Trace:
Oct  3 18:55:23  kernel: [c022aade] xfs_da_brelse+0x6e/0xb0
Oct  3 18:55:23  kernel: [c01652e6] kmem_cache_alloc+0x46/0x50
Oct  3 18:55:23  kernel: [c0265a4e] kmem_zone_alloc+0x4e/0xc0
Oct  3 18:55:23  kernel: [c027015f] xfs_fs_alloc_inode+0xf/0x20
Oct  3 18:55:23  kernel: [c017bbd6] alloc_inode+0x16/0x170
Oct  3 18:55:23  kernel: [c017bd89] iget_locked+0x59/0x130
Oct  3 18:55:23  kernel: [c023fa38] xfs_iget+0x78/0x160
Oct  3 18:55:23  kernel: [c020a49c] xfs_acl_vget+0x6c/0x160
Oct  3 18:55:23  kernel: [c025b143] xfs_dir_lookup_int+0x93/0xf0
Oct  3 18:55:23  kernel: [c025ea55] xfs_lookup+0x75/0xa0
Oct  3 18:55:23  kernel: [c026d0c2] xfs_vn_lookup+0x52/0x90
Oct  3 18:55:23  kernel: [c016fd08] do_lookup+0x148/0x190
Oct  3 18:55:23  kernel: [c0171cb4] __link_path_walk+0x814/0xe40
Oct  3 18:55:23  kernel: [c0172325] link_path_walk+0x45/0xc0
Oct  3 18:55:23  kernel: [c0172581] do_path_lookup+0x81/0x1c0
Oct  3 18:55:23  kernel: [c01712c3] getname+0xb3/0xe0
Oct  3 18:55:23  kernel: [c0172f8b] __user_walk_fd+0x3b/0x60
Oct  3 18:55:23  kernel: [c016bcdf] vfs_lstat_fd+0x1f/0x50
Oct  3 18:55:23  kernel: [c016bd5f] sys_lstat64+0xf/0x30
Oct  3 18:55:23  kernel: [c01040b0] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:23  kernel: ===
Oct  3 18:55:23  kernel: Code: 10 8b 77 14 01 c2 8b 44 24 30 8b 34 b0 89 77 14 
89 54 8d 14 8d 51 01 89 55 00 8b 44 24 10 8b 77 10 3b 70 5c 72 c0 8b 17 8b 47 
04 <89> 42 04 89 10 83 7f 14 ff c7 07 00 01 10 00 c7 47 04 00 02 20 
Oct  3 18:55:23  kernel: EIP: [c016540c] cache_alloc_refill+0x11c/0x4f0 
SS:ESP 0068:d2531c98
Oct  3 18:55:26  kernel: Oops: 0002 [#2]
Oct  3 18:55:26  kernel: SMP 
Oct  3 18:55:26  kernel: CPU:7
Oct  3 18:55:26  kernel: EIP:0060:[c017bbe0]Not tainted VLI
Oct  3 18:55:26  kernel: EFLAGS: 00210282   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:26  kernel: EIP is at alloc_inode+0x20/0x170
Oct  3 18:55:26  kernel: eax: b4fd89ba   ebx: b4fd89ba   ecx: b4fd89ba   edx: 
b4fd89ba
Oct  3 18:55:26  kernel: esi: f29bb000   edi: f29bb000   ebp: ca743575   esp: 
d6747c64
Oct  3 18:55:26  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:26  kernel: Process imapd (pid: 20054, ti=d6746000 task=e04a20b0 
task.ti=d6746000)
Oct  3 18:55:26  kernel: Stack:  c76fe0dc f29bb000 c017bd89  
 c04abda0 ca743575 
Oct  3 18:55:26  kernel: ca743575 f53b5800 c023fa38 cb2b4524 1b2595f3 0020 
f0dd7400 ded8b7a8 
Oct  3 18:55:26  kernel:  f53b5800 c04abda0 cb2b4524 cb2b4524 ca743575 
 0004 
Oct  3 18:55:26  kernel: Call Trace:
Oct  3 18:55:26  kernel: [c017bd89] iget_locked+0x59/0x130
Oct  3 18:55:26  kernel: [c023fa38] xfs_iget+0x78/0x160
Oct  3 18:55:26  kernel: [c025a697] xfs_trans_iget+0x117/0x190
Oct  3 18:55:26  kernel: [c0243d87] xfs_ialloc+0xc7/0x570
Oct  3 18:55:26  kernel: [c024aabc] xlog_grant_push_ail+0x3c/0x150
Oct  3 18:55:26  kernel: [c025b261] xfs_dir_ialloc+0x81/0x2d0
Oct  3 18:55:26  kernel: [c025855b] xfs_trans_reserve+0xab/0x230
Oct  3 18:55:26  kernel: [c0261aa5] xfs_create+0x395/0x6a0
Oct  3 18:55:26  kernel: [c023eac5] xfs_iunlock+0x85/0xa0
Oct  3 18:55:26  kernel: [c026d6b5] xfs_vn_mknod+0x235/0x360
Oct  3 18:55:26  kernel: [c01705cd] vfs_create+0xdd/0x140
Oct  3 18:55:26  kernel: [c01738ae] open_namei+0x58e/0x5f0
Oct  3 18:55:26  kernel: [c016716e] do_filp_open+0x2e/0x60
Oct  3 18:55:26  kernel: [c0166e4f] get_unused_fd+0x4f/0xb0
Oct  3 18:55:26  kernel: [c01671ea] do_sys_open+0x4a/0xe0
Oct  3 18:55:26  kernel: [c01672bc] sys_open+0x1c/0x20
Oct  3 18:55:26  kernel: [c01040b0] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:26  kernel: ===
Oct  3 18:55:26  kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 57 56 89 c6 53 
8b 40 20 8b 10 85 d2 0f 84 1e 01 00 00 89 f0 ff d2 89 c3 85 db 0f 84 ee 00 00 
00 <89> b3 98 00 00 00 b9 02 00 00 00 0f b6 46 10 8d bb f8 00 00 00 
Oct  3 18:55:26  kernel: EIP: [c017bbe0] alloc_inode+0x20/0x170 SS:ESP 
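
As a reading aid for the "Oops: 0002" lines above, here is a small sketch that decodes the number as an x86 page-fault error code. This assumes both oopses are page faults (the log above does not include a line saying so explicitly); the function name `decode_page_fault_code` is made up for this example.

```python
def decode_page_fault_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }

# The value printed after "Oops:" is hexadecimal.
info = decode_page_fault_code(int("0002", 16))
print(info)
```

Under that assumption, 0002 reads as a kernel-mode write to a page that is not present, which fits the picture of the allocator following a bad pointer.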
