Re: Strange crash on Dell R720xd

2012-10-17 Thread Laurent CARON
On Tue, Oct 16, 2012 at 10:58:49AM -0700, Dan Williams wrote:
> I think this may be a bug in __raid_run_ops that is only possible when
> raid offload and CONFIG_MULTICORE_RAID456 are enabled.  I'm thinking
> the descriptor is completed and recycled to another requester in the
> space between these two events:
> 
> ops_run_compute();
> 
> /* terminate the chain if reconstruct is not set to be run */
> if (tx && !test_bit(STRIPE_OP_RECONSTRUCT, &ops_request))
> async_tx_ack(tx);
> 
> ...don't use the experimental CONFIG_MULTICORE_RAID456 even if you
> leave IOAT DMA disabled.  A rework of the raid operation dma chaining
> is in progress, but may not be ready for a while.
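
For anyone reading the archive without the source handy, the window Dan
describes sits between the compute submission and the point where the stripe
code acks or chains the descriptor. The sketch below is illustrative only,
pieced together from the snippet above and the backtrace; it is not the
upstream drivers/md/raid5.c code, and the argument names are placeholders:

	/* Illustrative sketch of the suspected race, not the upstream code. */
	struct dma_async_tx_descriptor *tx;

	tx = ops_run_compute(sh, percpu); /* submit the compute op to the channel */

	/*
	 * Suspected window: with raid offload plus CONFIG_MULTICORE_RAID456 the
	 * channel can complete this descriptor here, and the DMA driver (ioatdma)
	 * can recycle it for another requester before we get any further.
	 */

	/* terminate the chain if reconstruct is not set to be run */
	if (tx && !test_bit(STRIPE_OP_RECONSTRUCT, &ops_request))
		async_tx_ack(tx);

	/*
	 * Otherwise tx is handed to the following async_gen_syndrome() call as a
	 * dependency; if it was recycled above, the dependency checks in
	 * async_tx_submit() (the BUG_ON at crypto/async_tx/async_tx.c:174 in the
	 * backtrace) see an already acked or re-chained descriptor and fire.
	 */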

Hi,

I usually don't use CONFIG_MULTICORE_RAID456 as it proved to be sluggish
and/or unstable in my experience, so I should be pretty safe leaving
I/O AT DMA disabled for now on those boxes.

Thanks

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange crash on Dell R720xd

2012-10-16 Thread Laurent CARON
On Tue, Oct 16, 2012 at 02:48:25PM +0200, Borislav Petkov wrote:
> On Tue, Oct 16, 2012 at 11:26:01AM +0200, Laurent CARON wrote:
> > On Tue, Oct 16, 2012 at 11:03:53AM +0200, Borislav Petkov wrote:
> > > That's:
> > > 
> > > BUG_ON(async_tx_test_ack(depend_tx) || 
> > > txd_next(depend_tx) ||
> > >   txd_parent(tx));
> > > 
> > > but probably the b0rkage happens up the stack. And this __raid_run_ops
> > > is probably starting the whole TX so maybe we should add
> > > linux-r...@vger.kernel.org to CC. Added.
> > 
> > 
> > Hi,
> > 
> > The machines seem stable after disabling I/O AT DMA at the BIOS level.
> 
> That's a good point because the backtrace goes through I/O AT DMA so it
> could very well be the culprit. Let's add some more people to Cc.
> 
> Vinod/Dan, here's the BUG_ON Laurent is hitting:
> 
> http://marc.info/?l=linux-kernel&m=135033064724794&w=2
> 
> and it has ioat2_tx_submit_unlock in the backtrace. Disabling ioat dma
> in the BIOS makes the issue disappear so ...
> 
> > > What is that "r510" thing in the kernel version? You have your patches
> > > ontop? If yes, please try reproducing this with a kernel.org kernel
> > > without anything else ontop.
> > 
> > My kernel is vanilla from Kernel.org. The -r510 string is because I
> > tried it on a -r510 also.
> 
> Ok, good.
> 
> > > Also, it might be worth trying plain 3.6 to rule out a regression
> > > introduced in the stable 3.6 series.
> > 
> > I tried 3.5.x, 3.6, 3.6.1, 3.6.2 with exactly the same results.
> > 
> > For now, I did create more volumes, rsync lots of data over the network
> > to the disks with no crashes (after disabling I/O AT DMA).
> 
> And when you do this with ioat dma enabled, you get the bug, right? So
> it is reproducible...?

It is 100% reproducible. The only "nondeterministic" point is how long
it takes for the machine to crash.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange crash on Dell R720xd

2012-10-16 Thread Laurent CARON
On Tue, Oct 16, 2012 at 11:03:53AM +0200, Borislav Petkov wrote:
> That's:
> 
> BUG_ON(async_tx_test_ack(depend_tx) || txd_next(depend_tx) ||
>   txd_parent(tx));
> 
> but probably the b0rkage happens up the stack. And this __raid_run_ops
> is probably starting the whole TX so maybe we should add
> linux-r...@vger.kernel.org to CC. Added.
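
For reference, that BUG_ON sits in the dependency-handling path of
async_tx_submit() (crypto/async_tx/async_tx.c:174 in the trace). The annotated
reading below is an interpretation of what each condition guards against,
based only on the snippet quoted above; it is not upstream documentation:

	BUG_ON(async_tx_test_ack(depend_tx) || /* dependency already acked, so the
	                                          engine is free to recycle it      */
	       txd_next(depend_tx)          || /* dependency already has another
	                                          descriptor chained after it       */
	       txd_parent(tx));                /* the new descriptor is already
	                                          linked under some other parent    */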


Hi,

The machines seem stable after disabling I/O AT DMA at the BIOS level.

> What is that "r510" thing in the kernel version? You have your patches
> ontop? If yes, please try reproducing this with a kernel.org kernel
> without anything else ontop.

My kernel is vanilla from Kernel.org. The -r510 string is because I
tried it on a -r510 also.

> Also, it might be worth trying plain 3.6 to rule out a regression
> introduced in the stable 3.6 series.

I tried 3.5.x, 3.6, 3.6.1, 3.6.2 with exactly the same results.

For now, I did create more volumes, rsync lots of data over the network
to the disks with no crashes (after disabling I/O AT DMA).

...snip...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Strange crash on Dell R720xd

2012-10-15 Thread Laurent CARON
Hi,

I'm currently replacing an old system (HP DL380 G5) with new Dell R720xd boxes.
On those new boxes I configured the H310 controller as plain JBOD.

Those boxes crash quite often (after anywhere from 5 minutes to a couple
of hours of uptime).
I have the impression the crashes happen under heavy I/O.

The setup consists of a few md RAID arrays serving as underlying devices
for either a filesystem or drbd (plus lvm on top).

I managed to catch a trace over netconsole:
[ cut here ]
kernel BUG at crypto/async_tx/async_tx.c:174!
invalid opcode:  [#1] SMP 
Modules linked in: drbd lru_cache netconsole iptable_filter ip_tables 
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue 
bonding ipv6 btrfs ioatdma lpc_ich sb_edac dca mfd_core
CPU 0 
Pid: 12580, comm: kworker/u:2 Not tainted 3.6.2-r510-r720xd #1 Dell Inc. 
PowerEdge R720xd
RIP: 0010:[8130f9ab]  [8130f9ab] async_tx_submit+0x29/0xab
RSP: 0018:88100940fb30  EFLAGS: 00010202
RAX: 88100b30aeb0 RBX: 88080b5cf390 RCX: 0029
RDX: 88100940fd00 RSI: 88080b5cf390 RDI: 880809ad0818
RBP: 8808054a7d90 R08: 88080b5cf900 R09: 0001
R10: 1000 R11: 0001 R12: 88100940fd00
R13: 0002 R14: 880809ad0638 R15: 880809ad0818
FS:  () GS:88080fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: ff600400 CR3: 000e4055f000 CR4: 000407f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process kworker/u:2 (pid: 12580, threadinfo 88100940e000, task 
880804850630)
Stack:
 88100940fd00 88100940fc40 0101 8131044b
 0001 0246 0201 a0073a00
 8808054a7d90 8808054a7690 88100940fc40 88080bf9e668
Call Trace:
 [8131044b] ? do_async_gen_syndrome+0x2f3/0x320
 [a0073a00] ? ioat2_tx_submit_unlock+0xac/0xb3 [ioatdma]
 [815e6820] ? ops_complete_compute+0x7b/0x7b
 [81310540] ? async_gen_syndrome+0xc8/0x1d6
 [815e8b9a] ? __raid_run_ops+0x9e7/0xb5a
 [810848f0] ? select_task_rq_fair+0x487/0x74b
 [815e6820] ? ops_complete_compute+0x7b/0x7b
 [8107e40b] ? __wake_up+0x35/0x46
 [8107ca2a] ? async_schedule+0x12/0x12
 [815e8d3f] ? async_run_ops+0x32/0x3e
 [8107cace] ? async_run_entry_fn+0xa4/0x17e
 [8107ca2a] ? async_schedule+0x12/0x12
 [81071cf8] ? process_one_work+0x259/0x381
 [81072312] ? worker_thread+0x2ad/0x3e3
 [81082e50] ? try_to_wake_up+0x1fc/0x20c
 [81072065] ? manage_workers+0x245/0x245
 [81072065] ? manage_workers+0x245/0x245
 [8107746a] ? kthread+0x81/0x89
 [81791034] ? kernel_thread_helper+0x4/0x10
 [810773e9] ? kthread_freezable_should_stop+0x4e/0x4e
 [81791030] ? gs_change+0xb/0xb
Code: 5b c3 41 54 49 89 d4 55 53 48 89 f3 48 8b 6a 08 48 8b 42 10 48 85 ed 48 
89 46 20 48 8b 42 18 48 89 46 28 74 5c f6 45 04 02 74 72 0f 0b eb fe 48 8b 02 
48 8b 48 28 80 e1 40 74 24 31 f6 48 89 d7 
RIP  [8130f9ab] async_tx_submit+0x29/0xab
 RSP 88100940fb30
---[ end trace 64fb561d16a3b535 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 5 seconds..

Do any of you guys have a clue about it ?

Thanks

Laurent

PS: The very same kernel doesn't cause any trouble on R510 hardware.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Crash with XFS on top of DRBD (DRBD 8.0.6 svn / Kernel 2.6.22)

2007-10-29 Thread Laurent Caron

Hi,

I'm back with my crash and oom-killer problems on my DRBD cluster of 2
servers.

I compiled a 2.6.22 kernel with slab debugging turned on.

Here is the last oom-killer message I got on that server.

I couldn't wait until a crash since a lot of users are working on it:

---
kernel: procmail invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
kernel: [c0147dbe] out_of_memory+0x69/0x197
kernel: [c01492e1] __alloc_pages+0x20a/0x294
kernel: [c0149397] __get_free_pages+0x2c/0x3a
kernel: [c011ba87] copy_process+0xa4/0x102d
kernel: [c012aff2] alloc_pid+0x16/0x240
kernel: [c016048d] kmem_cache_alloc+0x80/0x8a
kernel: [c012aff2] alloc_pid+0x16/0x240
kernel: [c011cc57] do_fork+0x9a/0x1c2
kernel: [c01021d1] sys_clone+0x36/0x3b
kernel: [c0103cb2] sysenter_past_esp+0x6b/0xa1
kernel: ===
kernel: Mem-info:
kernel: DMA per-cpu:
kernel: CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: CPU2: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: CPU3: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: CPU4: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: CPU5: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: CPU6: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: CPU7: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 
usd:   0  
kernel: Normal per-cpu:
kernel: CPU0: Hot: hi:  186, btch:  31 usd:  63   Cold: hi:   62, btch:  15 
usd:  51 
kernel: CPU1: Hot: hi:  186, btch:  31 usd:  97   Cold: hi:   62, btch:  15 
usd:  54 
kernel: CPU2: Hot: hi:  186, btch:  31 usd:  26   Cold: hi:   62, btch:  15 
usd:  47 
kernel: CPU3: Hot: hi:  186, btch:  31 usd:   8   Cold: hi:   62, btch:  15 
usd:  61 
kernel: CPU4: Hot: hi:  186, btch:  31 usd:  18   Cold: hi:   62, btch:  15 
usd:  53 
kernel: CPU5: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 
usd:  60 
kernel: CPU6: Hot: hi:  186, btch:  31 usd:   6   Cold: hi:   62, btch:  15 
usd:  57 
kernel: CPU7: Hot: hi:  186, btch:  31 usd:  28   Cold: hi:   62, btch:  15 
usd:  59 
kernel: HighMem per-cpu:
kernel: CPU0: Hot: hi:  186, btch:  31 usd:  98   Cold: hi:   62, btch:  15 
usd:   8 
kernel: CPU1: Hot: hi:  186, btch:  31 usd:   7   Cold: hi:   62, btch:  15 
usd:  12 
kernel: CPU2: Hot: hi:  186, btch:  31 usd:  85   Cold: hi:   62, btch:  15 
usd:   4 
kernel: CPU3: Hot: hi:  186, btch:  31 usd:  72   Cold: hi:   62, btch:  15 
usd:   0 
kernel: CPU4: Hot: hi:  186, btch:  31 usd:  21   Cold: hi:   62, btch:  15 
usd:   6 
kernel: CPU5: Hot: hi:  186, btch:  31 usd:  19   Cold: hi:   62, btch:  15 
usd:  12 
kernel: CPU6: Hot: hi:  186, btch:  31 usd: 175   Cold: hi:   62, btch:  15 
usd:  12 
kernel: CPU7: Hot: hi:  186, btch:  31 usd: 137   Cold: hi:   62, btch:  15 
usd:   2 
kernel: Active:381850 inactive:9157 dirty:256 writeback:97 unstable:0
kernel: free:2519061 slab:22044 mapped:7487 pagetables:2163 bounce:0
kernel: DMA free:3564kB min:68kB low:84kB high:100kB active:0kB inactive:0kB 
present:16256kB pages_scanned:0 all_unreclaimable? yes 
kernel: lowmem_reserve[]: 0 873 12938
kernel: Normal free:7756kB min:3744kB low:4680kB high:5616kB active:116kB 
inactive:0kB present:894080kB pages_scanned:192 all_unreclaimable? yes 
kernel: lowmem_reserve[]: 0 0 96520
kernel: HighMem free:10064924kB min:512kB low:13456kB high:26404kB 
active:1527284kB inactive:36804kB present:12354560kB pages_scanned:0 
all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 0
kernel: DMA: 9*4kB 15*8kB 3*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 
1*2048kB 0*4096kB = 3564kB
kernel: Normal: 1495*4kB 10*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 
1*1024kB 0*2048kB 0*4096kB = 7804kB
kernel: HighMem: 39234*4kB 109305*8kB 80630*16kB 52420*32kB 27267*64kB 
9904*128kB 2822*256kB 871*512kB 377*1024kB 218*2048kB 257*4096kB = 10065264kB
kernel: Swap cache: add 43, delete 42, find 0/0, race 0+0 
kernel: Free swap  = 393204kB
kernel: Total swap = 393208kB
kernel: Free swap:   393204kB
kernel: 3342335 pages of RAM
kernel: 3112959 pages of HIGHMEM
kernel: 224356 reserved pages
kernel: 221558 pages shared
kernel: 1 pages swap cached
kernel: 256 pages dirty
kernel: 97 pages writeback
kernel: 7487 pages mapped
kernel: 22044 pages slab
kernel: 2163 pages pagetables
kernel: Out of memory: kill process 12069 (slapd) score 62386 or a child 
kernel: Killed process 12069 (slapd)
-

Does anyone have any clue about what's happening ?

Thanks

Laurent



-- 
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]

Re: Crash on 2.6.21.7 Vanilla + DRBD 0.7

2007-10-08 Thread Laurent CARON
David Chinner wrote:
> Can you turn on slab debug and poisoning and see where
> the kernel fails with that? e.g. set:
> 
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y


I was a little worried about leaving those servers in such a bad state,
and went the "easy" way.

I upgraded from drbd 0.7.X to the latest svn 8.0.X

Laurent

PS: Should this bug reappear, I'll change the kernel's config and let
you know the result.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Crash on 2.6.21.7 Vanilla + DRBD 0.7

2007-10-04 Thread Laurent Caron

Hi,

I did compile a fresh 2.6.21.7 kernel from kernel.org (no distro patch)
and the latest svn (3062) 0.7.X drbd.

After just 2 days of uptime, I did experience another crash.

I wonder if it is an XFS related bug, a DRBD one, or related to XFS on top of 
DRBD.

This bug seems to occur with intensive IO operations.

What do you think about it ?

Thanks

Laurent




Oct  3 18:55:23  kernel: Oops: 0002 [#1]
Oct  3 18:55:23  kernel: SMP 
Oct  3 18:55:23  kernel: CPU:7
Oct  3 18:55:23  kernel: EIP:0060:[c016540c]Not tainted VLI
Oct  3 18:55:23  kernel: EFLAGS: 00010046   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:23  kernel: EIP is at cache_alloc_refill+0x11c/0x4f0
Oct  3 18:55:23  kernel: eax: f79c2940   ebx: 0015   ecx: 0005   edx: 
65b567b0
Oct  3 18:55:23  kernel: esi: 000a   edi: d5d26000   ebp: f79d03c0   esp: 
d2531c98
Oct  3 18:55:23  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:23  kernel: Process rsync (pid: 22409, ti=d253 task=da1e8070 
task.ti=d253)
Oct  3 18:55:23  kernel: Stack: 0010 02d0 ce9ca0b8 02d0 f79cfe00 
f79d1c00 f79c2940  
Oct  3 18:55:23  kernel: 0001 d2531cd4 ce9ca088 c022aade d5d2601c 0282 
f79cfe00 02d0 
Oct  3 18:55:23  kernel: f79cfe00 c01652e6  0001 c0265a4e 0011 
d2531d60 d7acfb40 
Oct  3 18:55:23  kernel: Call Trace:
Oct  3 18:55:23  kernel: [c022aade] xfs_da_brelse+0x6e/0xb0
Oct  3 18:55:23  kernel: [c01652e6] kmem_cache_alloc+0x46/0x50
Oct  3 18:55:23  kernel: [c0265a4e] kmem_zone_alloc+0x4e/0xc0
Oct  3 18:55:23  kernel: [c027015f] xfs_fs_alloc_inode+0xf/0x20
Oct  3 18:55:23  kernel: [c017bbd6] alloc_inode+0x16/0x170
Oct  3 18:55:23  kernel: [c017bd89] iget_locked+0x59/0x130
Oct  3 18:55:23  kernel: [c023fa38] xfs_iget+0x78/0x160
Oct  3 18:55:23  kernel: [c020a49c] xfs_acl_vget+0x6c/0x160
Oct  3 18:55:23  kernel: [c025b143] xfs_dir_lookup_int+0x93/0xf0
Oct  3 18:55:23  kernel: [c025ea55] xfs_lookup+0x75/0xa0
Oct  3 18:55:23  kernel: [c026d0c2] xfs_vn_lookup+0x52/0x90
Oct  3 18:55:23  kernel: [c016fd08] do_lookup+0x148/0x190
Oct  3 18:55:23  kernel: [c0171cb4] __link_path_walk+0x814/0xe40
Oct  3 18:55:23  kernel: [c0172325] link_path_walk+0x45/0xc0
Oct  3 18:55:23  kernel: [c0172581] do_path_lookup+0x81/0x1c0
Oct  3 18:55:23  kernel: [c01712c3] getname+0xb3/0xe0
Oct  3 18:55:23  kernel: [c0172f8b] __user_walk_fd+0x3b/0x60
Oct  3 18:55:23  kernel: [c016bcdf] vfs_lstat_fd+0x1f/0x50
Oct  3 18:55:23  kernel: [c016bd5f] sys_lstat64+0xf/0x30
Oct  3 18:55:23  kernel: [c01040b0] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:23  kernel: ===
Oct  3 18:55:23  kernel: Code: 10 8b 77 14 01 c2 8b 44 24 30 8b 34 b0 89 77 14 
89 54 8d 14 8d 51 01 89 55 00 8b 44 24 10 8b 77 10 3b 70 5c 72 c0 8b 17 8b 47 
04 89 42 04 89 10 83 7f 14 ff c7 07 00 01 10 00 c7 47 04 00 02 20 
Oct  3 18:55:23  kernel: EIP: [c016540c] cache_alloc_refill+0x11c/0x4f0 
SS:ESP 0068:d2531c98
Oct  3 18:55:26  kernel: Oops: 0002 [#2]
Oct  3 18:55:26  kernel: SMP 
Oct  3 18:55:26  kernel: CPU:7
Oct  3 18:55:26  kernel: EIP:0060:[c017bbe0]Not tainted VLI
Oct  3 18:55:26  kernel: EFLAGS: 00210282   (2.6.21-dl380-g5-20071001 #1)
Oct  3 18:55:26  kernel: EIP is at alloc_inode+0x20/0x170
Oct  3 18:55:26  kernel: eax: b4fd89ba   ebx: b4fd89ba   ecx: b4fd89ba   edx: 
b4fd89ba
Oct  3 18:55:26  kernel: esi: f29bb000   edi: f29bb000   ebp: ca743575   esp: 
d6747c64
Oct  3 18:55:26  kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Oct  3 18:55:26  kernel: Process imapd (pid: 20054, ti=d6746000 task=e04a20b0 
task.ti=d6746000)
Oct  3 18:55:26  kernel: Stack:  c76fe0dc f29bb000 c017bd89  
 c04abda0 ca743575 
Oct  3 18:55:26  kernel: ca743575 f53b5800 c023fa38 cb2b4524 1b2595f3 0020 
f0dd7400 ded8b7a8 
Oct  3 18:55:26  kernel:  f53b5800 c04abda0 cb2b4524 cb2b4524 ca743575 
 0004 
Oct  3 18:55:26  kernel: Call Trace:
Oct  3 18:55:26  kernel: [c017bd89] iget_locked+0x59/0x130
Oct  3 18:55:26  kernel: [c023fa38] xfs_iget+0x78/0x160
Oct  3 18:55:26  kernel: [c025a697] xfs_trans_iget+0x117/0x190
Oct  3 18:55:26  kernel: [c0243d87] xfs_ialloc+0xc7/0x570
Oct  3 18:55:26  kernel: [c024aabc] xlog_grant_push_ail+0x3c/0x150
Oct  3 18:55:26  kernel: [c025b261] xfs_dir_ialloc+0x81/0x2d0
Oct  3 18:55:26  kernel: [c025855b] xfs_trans_reserve+0xab/0x230
Oct  3 18:55:26  kernel: [c0261aa5] xfs_create+0x395/0x6a0
Oct  3 18:55:26  kernel: [c023eac5] xfs_iunlock+0x85/0xa0
Oct  3 18:55:26  kernel: [c026d6b5] xfs_vn_mknod+0x235/0x360
Oct  3 18:55:26  kernel: [c01705cd] vfs_create+0xdd/0x140
Oct  3 18:55:26  kernel: [c01738ae] open_namei+0x58e/0x5f0
Oct  3 18:55:26  kernel: [c016716e] do_filp_open+0x2e/0x60
Oct  3 18:55:26  kernel: [c0166e4f] get_unused_fd+0x4f/0xb0
Oct  3 18:55:26  kernel: [c01671ea] do_sys_open+0x4a/0xe0
Oct  3 18:55:26  kernel: [c01672bc] sys_open+0x1c/0x20
Oct  3 18:55:26  kernel: [c01040b0] sysenter_past_esp+0x5d/0x81
Oct  3 18:55:26  kernel: 

Re: [DRBD-user] Crash on 2.6.22

2007-09-28 Thread Laurent CARON
Stefan Seifert wrote:
> The deadlock also occurs with 0.7.x. A patch for that is floating around.

Here is a transcript from a mail I sent to Lars Ellenberg

It should 'normally' be fixed.

Am I wrong ?

Thanks



On Sun, Sep 16, 2007 at 05:34:01PM +0200, Laurent CARON wrote:
> Lars Ellenberg wrote:
> > On Fri, Sep 14, 2007 at 12:33:01AM +0200, Laurent CARON wrote:
> > > Hi,
> > >
> > > After reading the thread about XFS, drbd 8.0.5 and 2.6.22+ kernels (XFS
> > > filesystem locks on mounting with 0.8.0.5), I wonder if the same applies
> > > to using 0.7.24 and 2.6.22+.
> >
> > basically, yes.
> > so either don't use 0.7 with the kernels showing this behaviour,
> > or wait for some maintenance release,
> > or patch it yourself.
>
>
> You said that this is fixed in the latest 8.x SVN.

it is fixed in 8.0.6

> It is fixed in the latest 0.7.x SVN too ?

now, since you are so pushy today  :)
I just checked in something (r3062),
which should do the job, but is untested by me so far.
please test and report back.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Crash on 2.6.22

2007-09-28 Thread Laurent CARON
Hi,

I did experience a quite strange problem (at least for me) on the first
node of our 2 node cluster.

This is basically an imap/smtp/http proxy server.

One of the imapd processes started to use a lot of cpu, memory... this
morning.

Oomkiller showed up and killed slapd, imapd, amavisd

I then restarted those processes manually, and it went fine.

A few moments later I got the following messages on my ssh terminal:

kernel: Bad page state in process 'swapper'
kernel: Bad page state in process 'swapper'
kernel: page:c1032a40 flags:0x4400 mapping: mapcount:0 count:0
kernel: page:c1032a40 flags:0x4400 mapping: mapcount:0 count:0
kernel: Trying to fix it up, but a reboot is needed
kernel: Trying to fix it up, but a reboot is needed
kernel: Backtrace:
kernel: Backtrace:
kernel: Bad page state in process 'swapper'
kernel: Bad page state in process 'swapper'
kernel: page:c1032a40 flags:0x4400 mapping: mapcount:0 count:0
kernel: page:c1032a40 flags:0x4400 mapping: mapcount:0 count:0
kernel: Trying to fix it up, but a reboot is needed
kernel: Trying to fix it up, but a reboot is needed
kernel: Backtrace:
kernel: Backtrace:


The machine then completely locked up, and did reboot (thanks to the watchdog).

This server is an HP DL380 G5 with 12 GB of memory and 8 SAS disks, a
quite standard box.


The $HOME directories are stored on a drbd (version: 0.7.24
(api:79/proto:74)) partition (with an XFS filesystem).

The only 'non-standard' thing I use is a swap file instead of a swap
partition.

$ free
             total       used       free     shared    buffers     cached
Mem:      12471932    7420364    5051568          0       3984    6680868
-/+ buffers/cache:     735512   11736420
Swap:       393208          0     393208


$ grep swap /etc/fstab
/var/tmp/swapfile   swap   swap   defaults   0   0


Might this be the cause (or one of the causes) of this problem ?

.config is available here: http://zenon.apartia.fr/stuff/config-2.6.22

Thanks

Laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [DRBD-user] Crash on 2.6.22

2007-09-28 Thread Laurent CARON
Hannes Dorbath wrote:
> On 28.09.2007 11:00, Laurent CARON wrote:
>> The $HOME directories are stored on a drbd (version: 0.7.24
>> (api:79/proto:74)) partition (with an XFS filesystem).
> 
> Was the deadlock with 2.6.22 only in 0.8.x? Is 0.7.x fine with 2.6.22?

I only experienced it with 2.6.22.

Since I pulled the latest svn tree of drbd 0.7.X, it should be pretty
deadlock safe.

Isn't it ?

Laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: Server crashes unexpectedly

2007-08-23 Thread Laurent CARON
Frederik Deweerdt wrote:
> On Thu, Aug 23, 2007 at 01:15:12PM +0200, Laurent CARON wrote:
>> Hi,
>>
>> One of my servers crashes randomly.
>>
>> I suspect a filesystem corruption.
> What makes you think so?  I'd check the memory with memtest.


I suspect the filesystem, because it happened to me on 2 other servers
in the past.

A reiserfs3 corruption occurred, making the server crash with the same
kind of symptoms (but that's only a guess).

Checking with memtest asap.

Thanks

Laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PROBLEM: Server crashes unexpectedly

2007-08-23 Thread Laurent CARON
Hi,

One of my servers crashes randomly.

I suspect a filesystem corruption.

Can you please confirm this ?

Thanks

Here is the relevant part from /var/log/syslog



Aug 23 12:10:55 berlin kernel: BUG: unable to handle kernel paging
request at virtual address 74c1803d
Aug 23 12:10:55 berlin kernel: printing eip:
Aug 23 12:10:55 berlin kernel: c014fd25
Aug 23 12:10:55 berlin kernel: *pde = 
Aug 23 12:10:55 berlin kernel: Oops: 0002 [#1]
Aug 23 12:10:55 berlin kernel: SMP
Aug 23 12:10:55 berlin kernel: Modules linked in: xt_helper xt_state
iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack camellia xcbc
dm_mirror hisax
Aug 23 12:10:55 berlin kernel: CPU:1
Aug 23 12:10:55 berlin kernel: EIP:0060:[c014fd25]Not tainted VLI
Aug 23 12:10:55 berlin kernel: EFLAGS: 00010082   (2.6.22-berlin #1)
Aug 23 12:10:55 berlin kernel: EIP is at free_block+0x61/0xfb
Aug 23 12:10:55 berlin kernel: eax: df2c   ebx: 0024   ecx:
d64ae080   edx: 74c18039
Aug 23 12:10:55 berlin kernel: esi: d64ae000   edi: dfe4b880   ebp:
dfe48840   esp: dff9df40
Aug 23 12:10:55 berlin kernel: ds: 007b   es: 007b   fs: 00d8  gs: 
 ss: 0068
Aug 23 12:10:55 berlin kernel: Process events/1 (pid: 8, ti=dff9c000
task=dff8e540 task.ti=dff9c000)
Aug 23 12:10:55 berlin kernel: Stack: 0009  000b
0001 dfe35cd8 dfe35cd4 000b dfe35cc0
Aug 23 12:10:55 berlin kernel: dfe4b880 c014fe37  dfe48840
dfe4b880 dfe48840 c1413a60 
Aug 23 12:10:55 berlin kernel: c0150b36   dffcd4c0
c1413a60 c0150aed c0127dd0 00ff
Aug 23 12:10:55 berlin kernel: Call Trace:
Aug 23 12:10:55 berlin kernel: [c014fe37] drain_array+0x78/0x97
Aug 23 12:10:55 berlin kernel: [c0150b36] cache_reap+0x49/0xe5
Aug 23 12:10:55 berlin kernel: [c0150aed] cache_reap+0x0/0xe5
Aug 23 12:10:55 berlin kernel: [c0127dd0] run_workqueue+0x73/0xf5
Aug 23 12:10:55 berlin kernel: [c01284f2] worker_thread+0x0/0xc6
Aug 23 12:10:55 berlin kernel: [c01285ac] worker_thread+0xba/0xc6
Aug 23 12:10:55 berlin kernel: [c012aaa1]
autoremove_wake_function+0x0/0x35
Aug 23 12:10:55 berlin kernel: [c012a9db] kthread+0x38/0x5d
Aug 23 12:10:55 berlin kernel: [c012a9a3] kthread+0x0/0x5d
Aug 23 12:10:55 berlin kernel: [c01030e7] kernel_thread_helper+0x7/0x10
Aug 23 12:10:55 berlin kernel: ===
Aug 23 12:10:55 berlin kernel: Code: 8b 02 25 00 40 02 00 3d 00 40 02 00
75 03 8b 52 0c 8b 02 84 c0 78 04 0f 0b eb fe 8b 72 1c 8b 54 24 28 8b 7c
95 68 8b 16 8b 46 04 89
 42 04 89 10 c7 06 00 01 10 00 c7 46 04 00 02 20 00 2b 4e 0c
Aug 23 12:10:55 berlin kernel: EIP: [c014fd25] free_block+0x61/0xfb
SS:ESP 0068:dff9df40


Thanks

Laurent
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Oops on 2.6.21 + DRBD + XFS

2007-08-07 Thread Laurent CARON
Christoph Hellwig wrote:
> On Tue, Aug 07, 2007 at 09:31:22AM +1000, David Chinner wrote:
>> On Mon, Aug 06, 2007 at 09:38:19AM +0200, Laurent Caron wrote:
>>> Hi,
>>>
>>> I'm using an XFS filesystem over DRBD for a few weeks on this machine
>>> and did experience an oops.
>> ..
>>> Aug  3 10:59:47 fileserv kernel: [c0164be9] cache_flusharray+0x59/0xd0
>>> Aug  3 10:59:47 fileserv kernel: [c0164d2a] kmem_cache_free+0x5a/0x70
>>> Aug  3 10:59:47 fileserv kernel: [c025fa16] xfs_finish_reclaim+0x36/0x170
>>> Aug  3 10:59:47 fileserv kernel: [c026fe41] xfs_fs_clear_inode+0x91/0xc0
>>> Aug  3 10:59:47 fileserv kernel: [c017c063] clear_inode+0x93/0x140
>>> Aug  3 10:59:47 fileserv kernel: [c017c38a] dispose_list+0x1a/0xd0
>> Can you run with slab debugging turned on?
> 
> And please upgrade to a recent kernel.  I'm pretty sure drbd doesn't
> like the usage of slab memory for bios in older xfs versions.
> 

I upgraded to 2.6.22.

We'll see if it improves.

Thanks
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Oops on 2.6.21 + DRBD + XFS

2007-08-07 Thread Laurent CARON
David Chinner wrote:
> On Mon, Aug 06, 2007 at 09:38:19AM +0200, Laurent Caron wrote:
>> Hi,
>>
>> I'm using an XFS filesystem over DRBD for a few weeks on this machine
>> and did experience an oops.
> 
> ..
>> Aug  3 10:59:47 fileserv kernel: [c0164be9] cache_flusharray+0x59/0xd0
>> Aug  3 10:59:47 fileserv kernel: [c0164d2a] kmem_cache_free+0x5a/0x70
>> Aug  3 10:59:47 fileserv kernel: [c025fa16] xfs_finish_reclaim+0x36/0x170
>> Aug  3 10:59:47 fileserv kernel: [c026fe41] xfs_fs_clear_inode+0x91/0xc0
>> Aug  3 10:59:47 fileserv kernel: [c017c063] clear_inode+0x93/0x140
>> Aug  3 10:59:47 fileserv kernel: [c017c38a] dispose_list+0x1a/0xd0
> 
> Can you run with slab debugging turned on?

I unfortunately can't play much with this machine since it is a
production server.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Oops on 2.6.21 + DRBD + XFS

2007-08-06 Thread Laurent Caron
Hi,

I've been using an XFS filesystem over DRBD for a few weeks on this machine
and experienced an oops.

Aug  3 10:59:47 fileserv kernel: Oops: 0002 [#1] 
Aug  3 10:59:47 fileserv kernel: SMP  
Aug  3 10:59:47 fileserv kernel: CPU:3 
Aug  3 10:59:47 fileserv kernel: EIP:0060:[c0164ed0]Not tainted VLI 
Aug  3 10:59:47 fileserv kernel: EFLAGS: 00010082   (2.6.21-dl380-g5-20070628 
#1) 
Aug  3 10:59:47 fileserv kernel: EIP is at free_block+0x90/0x130 
Aug  3 10:59:47 fileserv kernel: eax: d1c3c000   ebx: 0007   ecx: cb163080  
 edx: 7c57dcba 
Aug  3 10:59:47 fileserv kernel: esi: cb163000   edi: f79f9d40   ebp: f7a1f4b0  
 esp: f7d3fe3c 
Aug  3 10:59:47 fileserv kernel: ds: 007b   es: 007b   fs: 00d8  gs:   ss: 
0068 
Aug  3 10:59:47 fileserv kernel: Process kswapd0 (pid: 240, ti=f7d3e000 
task=f7c0a0b0 task.ti=f7d3e000) 
Aug  3 10:59:47 fileserv kernel: Stack: 0009  001b f79ed440 
0007 f7a1f494 001b f79ed440  
Aug  3 10:59:47 fileserv kernel: f7a1e400 c0164be9  f7a1f480 f79f9d40 
f7a1f480 0246 f7724cc0  
Aug  3 10:59:47 fileserv kernel: 0001 c0164d2a f7724cc0 fa66746c e3234080 
c025fa16 0001 e323409c  
Aug  3 10:59:47 fileserv kernel: Call Trace: 
Aug  3 10:59:47 fileserv kernel: [c0164be9] cache_flusharray+0x59/0xd0 
Aug  3 10:59:47 fileserv kernel: [c0164d2a] kmem_cache_free+0x5a/0x70 
Aug  3 10:59:47 fileserv kernel: [c025fa16] xfs_finish_reclaim+0x36/0x170 
Aug  3 10:59:47 fileserv kernel: [c026fe41] xfs_fs_clear_inode+0x91/0xc0 
Aug  3 10:59:47 fileserv kernel: [c017c063] clear_inode+0x93/0x140 
Aug  3 10:59:47 fileserv kernel: [c017c38a] dispose_list+0x1a/0xd0 
Aug  3 10:59:47 fileserv kernel: [c017c5bd] shrink_icache_memory+0x17d/0x250 
Aug  3 10:59:47 fileserv kernel: [c01522bf] shrink_slab+0x11f/0x180 
Aug  3 10:59:47 fileserv kernel: [c0152737] kswapd+0x357/0x450 
Aug  3 10:59:47 fileserv kernel: [c0132260] autoremove_wake_function+0x0/0x50 
Aug  3 10:59:47 fileserv kernel: [c01523e0] kswapd+0x0/0x450 
Aug  3 10:59:47 fileserv kernel: [c01320ab] kthread+0xbb/0xf0 
Aug  3 10:59:47 fileserv kernel: [c0131ff0] kthread+0x0/0xf0 
Aug  3 10:59:47 fileserv kernel: [c0104cff] kernel_thread_helper+0x7/0x18 
Aug  3 10:59:47 fileserv kernel: === 
Aug  3 10:59:47 fileserv kernel: Code: 01 f2 8b 02 f6 c4 40 0f 85 a4 00 00 00 
8b 02 84 c0 0f 89 a7 00 00 00 8b 72 1c 8b 44 24 28 8b 54 24 0c 8b 7c 82 54 8b 
16 8b 46 04 89 42 04 89 10 8b 5e 0c c7 06 00 01 10 00 c7 46 04 00 02 20 00  
Aug  3 10:59:47 fileserv kernel: EIP: [c0164ed0] free_block+0x90/0x130 SS:ESP 
0068:f7d3fe3c 
---

Thanks

Laurent

PS: Can you please cc me while replying.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Intel Ethernet PRO 100

2005-03-15 Thread Laurent CARON
shafa.hidee wrote:
Hi All,
   Where we can find specs for writing driver for Intel PRO 100 card.
Regards
Shafahidee
 

It's already supported, isn't it?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

