Re: 2.6.23.1: mdadm/raid5 hung/d-state
Dan Williams wrote:
> The following patch, also attached, cleans up cases where the code looks
> at sh->ops.pending when it should be looking at the consistent
> stack-based snapshot of the operations flags.

I tried this patch (against a stock 2.6.23), and it did not work for me. Not only did I/O to the affected RAID5 & XFS partition stop, but I/O to all other disks stopped as well. I was not able to capture any debugging information, but I should be able to do that tomorrow when I can hook a serial console to the machine.

I'm not sure if my problem is identical to these others, as mine only seems to manifest with RAID5+XFS. The RAID rebuilds with no problem, and I've not had any problems with RAID5+ext3.

> ---
>
>  drivers/md/raid5.c |   16 +++++++++-------
>  1 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 496b9a3..e1a3942 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -693,7 +693,8 @@ ops_run_prexor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>  }
>
>  static struct dma_async_tx_descriptor *
> -ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
> +ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
> +		unsigned long pending)
>  {
>  	int disks = sh->disks;
>  	int pd_idx = sh->pd_idx, i;
> @@ -701,7 +702,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>  	/* check if prexor is active which means only process blocks
>  	 * that are part of a read-modify-write (Wantprexor)
>  	 */
> -	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
> +	int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
>
>  	pr_debug("%s: stripe %llu\n", __FUNCTION__,
>  		(unsigned long long)sh->sector);
> @@ -778,7 +779,8 @@ static void ops_complete_write(void *stripe_head_ref)
>  }
>
>  static void
> -ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
> +ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
> +		unsigned long pending)
>  {
>  	/* kernel stack size limits the total number of disks */
>  	int disks = sh->disks;
> @@ -786,7 +788,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>
>  	int count = 0, pd_idx = sh->pd_idx, i;
>  	struct page *xor_dest;
> -	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
> +	int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
>  	unsigned long flags;
>  	dma_async_tx_callback callback;
>
> @@ -813,7 +815,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>  	}
>
>  	/* check whether this postxor is part of a write */
> -	callback = test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending) ?
> +	callback = test_bit(STRIPE_OP_BIODRAIN, &pending) ?
>  		ops_complete_write : ops_complete_postxor;
>
>  	/* 1/ if we prexor'd then the dest is reused as a source
> @@ -901,12 +903,12 @@ static void raid5_run_ops(struct stripe_head *sh, unsigned long pending)
>  		tx = ops_run_prexor(sh, tx);
>
>  	if (test_bit(STRIPE_OP_BIODRAIN, &pending)) {
> -		tx = ops_run_biodrain(sh, tx);
> +		tx = ops_run_biodrain(sh, tx, pending);
>  		overlap_clear++;
>  	}
>
>  	if (test_bit(STRIPE_OP_POSTXOR, &pending))
> -		ops_run_postxor(sh, tx);
> +		ops_run_postxor(sh, tx, pending);
>
>  	if (test_bit(STRIPE_OP_CHECK, &pending))
>  		ops_run_check(sh);

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: 2.6.23.1: mdadm/raid5 hung/d-state
On Tue, 2007-11-06 at 03:19 -0700, BERTRAND Joël wrote:
> 	Done. Here is the obtained output:

Much appreciated.

> [ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
> [ 1260.980606] check 5: state 0x6 toread read write f800ffcffcc0 written
> [ 1260.994808] check 4: state 0x6 toread read write f800fdd4e360 written
> [ 1261.009325] check 3: state 0x1 toread read write written
> [ 1261.244478] check 2: state 0x1 toread read write written
> [ 1261.270821] check 1: state 0x6 toread read write f800ff517e40 written
> [ 1261.312320] check 0: state 0x6 toread read write f800fd4cae60 written
> [ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
> [ 1261.443120] for sector 7629696, rmw=0 rcw=0
[..]

This looks as if the blocks were prepared to be written out, but were never handled in ops_run_biodrain(), so they remain locked forever. The operations flags are all clear, which means handle_stripe thinks nothing else needs to be done.

The following patch, also attached, cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of the operations flags.
---

 drivers/md/raid5.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 496b9a3..e1a3942 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -693,7 +693,8 @@ ops_run_prexor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 }
 
 static struct dma_async_tx_descriptor *
-ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
+		unsigned long pending)
 {
 	int disks = sh->disks;
 	int pd_idx = sh->pd_idx, i;
@@ -701,7 +702,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 	/* check if prexor is active which means only process blocks
 	 * that are part of a read-modify-write (Wantprexor)
 	 */
-	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+	int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
 
 	pr_debug("%s: stripe %llu\n", __FUNCTION__,
 		(unsigned long long)sh->sector);
@@ -778,7 +779,8 @@ static void ops_complete_write(void *stripe_head_ref)
 }
 
 static void
-ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
+		unsigned long pending)
 {
 	/* kernel stack size limits the total number of disks */
 	int disks = sh->disks;
@@ -786,7 +788,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 
 	int count = 0, pd_idx = sh->pd_idx, i;
 	struct page *xor_dest;
-	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+	int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
 	unsigned long flags;
 	dma_async_tx_callback callback;
 
@@ -813,7 +815,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 	}
 
 	/* check whether this postxor is part of a write */
-	callback = test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending) ?
+	callback = test_bit(STRIPE_OP_BIODRAIN, &pending) ?
 		ops_complete_write : ops_complete_postxor;
 
 	/* 1/ if we prexor'd then the dest is reused as a source
@@ -901,12 +903,12 @@ static void raid5_run_ops(struct stripe_head *sh, unsigned long pending)
 		tx = ops_run_prexor(sh, tx);
 
 	if (test_bit(STRIPE_OP_BIODRAIN, &pending)) {
-		tx = ops_run_biodrain(sh, tx);
+		tx = ops_run_biodrain(sh, tx, pending);
 		overlap_clear++;
 	}
 
 	if (test_bit(STRIPE_OP_POSTXOR, &pending))
-		ops_run_postxor(sh, tx);
+		ops_run_postxor(sh, tx, pending);
 
 	if (test_bit(STRIPE_OP_CHECK, &pending))
 		ops_run_check(sh);

raid5: fix unending write sequence

From: Dan Williams <[EMAIL PROTECTED]>
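A short C sketch may help illustrate the failure mode the patch addresses: when each test re-reads the shared `sh->ops.pending` word, a concurrent update between two tests can make the code take inconsistent decisions, whereas a stack-based snapshot taken once at entry stays self-consistent. The flag values and function names below are illustrative inventions for this sketch, not the kernel's.

```c
#include <assert.h>

/* Illustrative flag bits standing in for the STRIPE_OP_* operation flags
 * (values are made up for this sketch, not taken from raid5.h). */
enum { OP_PREXOR = 1u << 0, OP_BIODRAIN = 1u << 1 };

static unsigned long shared_pending;   /* models sh->ops.pending */

/* Buggy pattern: every decision re-reads the live shared word, so a
 * "concurrent" update (modeled by the interleave callback) can slip in
 * between two tests and the decisions disagree. */
static int buggy_decisions(void (*interleave)(void))
{
    int prexor = !!(shared_pending & OP_PREXOR);
    interleave();                        /* another CPU clears a bit here */
    int drain = !!(shared_pending & OP_BIODRAIN);
    return (prexor << 1) | drain;
}

/* Fixed pattern (what the patch does): take one snapshot on the stack and
 * derive every decision from it, immune to concurrent changes. */
static int snapshot_decisions(void (*interleave)(void))
{
    unsigned long pending = shared_pending;  /* consistent snapshot */
    int prexor = !!(pending & OP_PREXOR);
    interleave();
    int drain = !!(pending & OP_BIODRAIN);
    return (prexor << 1) | drain;
}

static void concurrent_clear(void) { shared_pending &= ~OP_BIODRAIN; }
```

With both flags initially set, the buggy path sees the biodrain flag vanish mid-decision (analogous to blocks being prepared but never drained, so they stay locked), while the snapshot path still sees both flags together.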
Re: Stack Trace. Bad?
Jon Nelson wrote:
> Whom should I contact regarding the forcedeth problem?

A post to linux-netdev will get it looked at.

Regards,
Richard
Stack Trace. Bad?
I was testing some network throughput today and ran into this. I'm going to bet it's a forcedeth driver problem, but since it also involves software raid I thought I'd include it. Whom should I contact regarding the forcedeth problem?

The following is only a harmless informational message. Unless you get a _continuous_flood_ of these messages it means everything is working fine. Allocations from irqs cannot be perfectly reliable and the kernel is designed to handle that.

md0_raid5: page allocation failure. order:2, mode:0x20

Call Trace:
 [] __alloc_pages+0x324/0x33d
 [] kmem_getpages+0x66/0x116
 [] fallback_alloc+0x104/0x174
 [] kmem_cache_alloc_node+0x9c/0xa8
 [] __alloc_skb+0x65/0x138
 [] :forcedeth:nv_alloc_rx_optimized+0x4d/0x18f
 [] :forcedeth:nv_napi_poll+0x61f/0x71c
 [] net_rx_action+0xb2/0x1c5
 [] __do_softirq+0x65/0xce
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x2c/0x7d
 [] do_IRQ+0xb6/0xd6
 [] ret_from_intr+0x0/0xa
 [] mempool_free_slab+0x0/0xe
 [] _spin_unlock_irqrestore+0x8/0x9
 [] bitmap_daemon_work+0xee/0x2f3
 [] md_check_recovery+0x22/0x4b9
 [] :raid456:raid5d+0x1b/0x3a2
 [] del_timer_sync+0xc/0x16
 [] schedule_timeout+0x92/0xad
 [] process_timeout+0x0/0x5
 [] schedule_timeout+0x85/0xad
 [] md_thread+0xf2/0x10e
 [] autoremove_wake_function+0x0/0x2e
 [] md_thread+0x0/0x10e
 [] kthread+0x47/0x73
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x73
 [] child_rip+0x0/0x12

Mem-info:
Node 0 DMA per-cpu:
CPU 0: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 1: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: Hot: hi: 186, btch: 31 usd: 115   Cold: hi: 62, btch: 15 usd: 31
CPU 1: Hot: hi: 186, btch: 31 usd: 128   Cold: hi: 62, btch: 15 usd: 56
Active:111696 inactive:116497 dirty:31 writeback:0 unstable:0 free:1850 slab:19676 mapped:3608 pagetables:1217 bounce:0
Node 0 DMA free:3988kB min:40kB low:48kB high:60kB active:232kB inactive:5496kB present:10692kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 994 994
Node 0 DMA32 free:3412kB min:4012kB low:5012kB high:6016kB active:446552kB inactive:460492kB present:1018020kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 DMA: 29*4kB 2*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3988kB
Node 0 DMA32: 419*4kB 147*8kB 19*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3476kB
Swap cache: add 57, delete 57, find 0/0, race 0+0
Free swap  = 979608kB
Total swap = 979832kB
Free swap:       979608kB
262128 pages of RAM
4938 reserved pages
108367 pages shared
0 pages swap cached

--
Jon
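For context on the "order:2" in the failure line above: an order-N page allocation asks the kernel for 2^N physically contiguous pages, and multi-page contiguous requests can fail transiently under fragmentation even when plenty of single pages remain free. A minimal sketch, assuming the common 4 KiB page size:

```c
#include <assert.h>

/* An order-N page allocation requests 2^N physically contiguous pages, so
 * "order:2" means 4 pages, i.e. 16 KiB with 4 KiB pages.  Such requests can
 * fail under fragmentation even when single free pages abound, which is why
 * an occasional failure in the forcedeth RX skb path is survivable noise.
 * SKETCH_PAGE_SIZE is an assumption for illustration, not read from the
 * running kernel. */
#define SKETCH_PAGE_SIZE 4096UL

static unsigned long alloc_order_bytes(unsigned int order)
{
    return SKETCH_PAGE_SIZE << order;  /* 2^order pages worth of bytes */
}
```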
Re: Software raid - controller options
Yes, I must have missed that. I've only been on the mailing list for a week or so. I did go through some of the archives though. I keep my kernel up to date, usually within a few days of a release.

The 3ware and Areca cards sound nice, but I could buy quite a few drives for the price of those cards (for a 12-port card). Which is what made me start seriously considering software raid. Plus, from what I understand, with software raid it is easier to change out server parts than it is with hardware raid, i.e. swapping controllers or motherboard, etc.

After reading a few responses that I have gotten, it sounds like a budget-based *raid* card from a good vendor with good linux support might be the best option to get a good number of ports on a PCIe interface, and have it work well with linux, all while being cheaper than a full-blown hardware raid solution.

Thanks for the info and I will have a look at the cards you mentioned.

Lyle

On Tue, 2007-11-06 at 00:41 -0600, Alberto Alonso wrote:
> You've probably missed a discussion on issues I've been having with
> SATA, software RAID and bad drivers. A clear thing from the responses
> I got is that you really need to use a recent kernel, as they may have
> fixed those problems.
>
> I didn't get clear responses indicating specific cards that are
> known to work well when hard drives fail. But if you can deal with
> a server crashing and then rebooting manually then software RAID
> is the way to go. I've always been able to get the servers back
> online even with the problematic drivers.
>
> I am happy with the 3ware cards and do use their hardware RAID to
> avoid the problems that I've had. With those I've fully tested
> 16 drive systems with 2 arrays using 2 8-port cards. Others have
> recommended the Areca line.
>
> As for cheap "dumb" interfaces I am now using the RocketRAID 2220,
> which gives you 8 ports on a PCI-X. I believe the "built in" RAID
> on those is just firmware based, so you may as well use them to
> show the drives in normal/legacy mode and use software RAID on
> top. Keep in mind I haven't fully tested this solution nor have
> tested for proper functioning when a drive fails.
>
> Another inexpensive card I've used with good results is the Q-stor
> PCI-X card, but I think this is now obsolete.
>
> Hope this helps,
>
> Alberto
>
> On Tue, 2007-11-06 at 05:20 +0300, Lyle Schlueter wrote:
> > Hello,
> >
> > I just started looking into software raid with linux a few weeks ago. I
> > am outgrowing the commercial NAS product that I bought a while back.
> > I've been learning as much as I can, subscribing to this mailing list,
> > reading man pages, experimenting with loopback devices, setting up and
> > expanding test arrays.
> >
> > I have a few questions now that I'm sure someone here will be able to
> > enlighten me about.
> > First, I want to run a 12 drive raid 6; honestly, would I be better off
> > going with true hardware raid like the Areca ARC-1231ML vs software
> > raid? I would prefer software raid just for the sheer cost savings. But
> > what kind of processing power would it take to match or exceed a mid to
> > high-level hardware controller?
> >
> > I haven't seen much, if any, discussion of this, but how many drives are
> > people putting into software arrays? And how are you going about it?
> > Motherboards seem to max out around 6-8 SATA ports. Do you just add SATA
> > controllers? Looking around on newegg (and some googling), 2-port SATA
> > controllers are pretty easy to find, but once you get to 4 ports the
> > cards all seem to include some sort of built-in *raid* functionality.
> > Are there any 4+ port PCI-e SATA controller cards?
> >
> > Are there any specific chipsets/brands of motherboards or controller
> > cards that you software raid veterans prefer?
> >
> > Thank you for your time and any info you are able to give me!
> >
> > Lyle
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Justin Piszcz wrote:
> On Tue, 6 Nov 2007, BERTRAND Joël wrote:
>> Justin Piszcz wrote:
>>> On Tue, 6 Nov 2007, BERTRAND Joël wrote:
>>>> 	Done. Here is the obtained output:
>>>>
>>>> [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written
>>>> [ 1265.941328] check 3: state 0x1 toread read write written
>>>> [ 1265.972129] check 2: state 0x1 toread read write written
>>>>
>>>> 	For information, after crash, I have:
>>>>
>>>> Root poulenc:[/sys/block] > cat /proc/mdstat
>>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>>> md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
>>>>       1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]
>>>>
>>>> 	Regards, JKB
>>>
>>> After the crash it is not 'resyncing'?
>>
>> 	No, it isn't...
>>
>> 	JKB
>
> After any crash/unclean shutdown the RAID should resync; if it doesn't, that's not good. I'd suggest running a raid check.
>
> The 'repair' is supposed to clean it; in some cases (md0=swap) it gets dirty again.
>
> Tue May 8 09:19:54 EDT 2007: Executing RAID health check for /dev/md0...
> Tue May 8 09:19:55 EDT 2007: Executing RAID health check for /dev/md1...
> Tue May 8 09:19:56 EDT 2007: Executing RAID health check for /dev/md2...
> Tue May 8 09:19:57 EDT 2007: Executing RAID health check for /dev/md3...
> Tue May 8 10:09:58 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
> Tue May 8 10:09:58 EDT 2007: 2176
> Tue May 8 10:09:58 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
> Tue May 8 10:09:58 EDT 2007: 0
> Tue May 8 10:09:58 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
> Tue May 8 10:09:58 EDT 2007: 0
> Tue May 8 10:09:58 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
> Tue May 8 10:09:58 EDT 2007: 0
> Tue May 8 10:09:58 EDT 2007: The meta-device /dev/md0 has 2176 mismatched sectors.
> Tue May 8 10:09:58 EDT 2007: Executing repair on /dev/md0
> Tue May 8 10:09:59 EDT 2007: The meta-device /dev/md1 has no mismatched sectors.
> Tue May 8 10:10:00 EDT 2007: The meta-device /dev/md2 has no mismatched sectors.
> Tue May 8 10:10:01 EDT 2007: The meta-device /dev/md3 has no mismatched sectors.
> Tue May 8 10:20:02 EDT 2007: All devices are clean...
> Tue May 8 10:20:02 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
> Tue May 8 10:20:02 EDT 2007: 2176
> Tue May 8 10:20:02 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
> Tue May 8 10:20:02 EDT 2007: 0
> Tue May 8 10:20:02 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
> Tue May 8 10:20:02 EDT 2007: 0
> Tue May 8 10:20:02 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
> Tue May 8 10:20:02 EDT 2007: 0

	I cannot repair this raid volume. I cannot reboot the server without sending stop+A. "init 6" stops at "INIT:". After reboot, md0 is resynchronized.

Regards,

JKB
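The check/repair cycle in the log above is driven through md's sysfs files: write "check" or "repair" to `sync_action`, then read `mismatch_cnt`. A minimal C sketch of those two steps; the `md_dir` parameter is an assumption made for illustration (on a live system it would be something like `/sys/block/md0/md`), injected so the helpers can be exercised against any directory:

```c
#include <stdio.h>
#include <assert.h>

/* Request an md sync action ("check" or "repair") by writing it to the
 * sync_action file under md_dir; returns 0 on success, -1 on error.
 * md_dir is a hypothetical parameter for this sketch -- the real md
 * directory lives under /sys/block/mdX/md. */
static int request_sync_action(const char *md_dir, const char *action)
{
    char path[512];
    snprintf(path, sizeof(path), "%s/sync_action", md_dir);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(action, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Read the mismatch count reported after a check; returns -1 on error. */
static long read_mismatch_cnt(const char *md_dir)
{
    char path[512];
    long cnt;
    snprintf(path, sizeof(path), "%s/mismatch_cnt", md_dir);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &cnt) != 1)
        cnt = -1;
    fclose(f);
    return cnt;
}
```

A nonzero `mismatch_cnt` after a check (2176 in the log) is what prompts the subsequent "repair" write; for swap-backed arrays like the md0 above, a nonzero count can reappear and is not by itself alarming.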
Re: 2.6.23.1: mdadm/raid5 hung/d-state
On Tue, 6 Nov 2007, BERTRAND Joël wrote:
> Justin Piszcz wrote:
>> On Tue, 6 Nov 2007, BERTRAND Joël wrote:
>>> 	Done. Here is the obtained output:
>>>
>>> [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written
>>> [ 1265.941328] check 3: state 0x1 toread read write written
>>> [ 1265.972129] check 2: state 0x1 toread read write written
>>>
>>> 	For information, after crash, I have:
>>>
>>> Root poulenc:[/sys/block] > cat /proc/mdstat
>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>> md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
>>>       1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]
>>>
>>> 	Regards, JKB
>>
>> After the crash it is not 'resyncing'?
>
> 	No, it isn't...
>
> 	JKB

After any crash/unclean shutdown the RAID should resync; if it doesn't, that's not good. I'd suggest running a raid check.

The 'repair' is supposed to clean it; in some cases (md0=swap) it gets dirty again.

Tue May 8 09:19:54 EDT 2007: Executing RAID health check for /dev/md0...
Tue May 8 09:19:55 EDT 2007: Executing RAID health check for /dev/md1...
Tue May 8 09:19:56 EDT 2007: Executing RAID health check for /dev/md2...
Tue May 8 09:19:57 EDT 2007: Executing RAID health check for /dev/md3...
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 2176
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 0
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 0
Tue May 8 10:09:58 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May 8 10:09:58 EDT 2007: 0
Tue May 8 10:09:58 EDT 2007: The meta-device /dev/md0 has 2176 mismatched sectors.
Tue May 8 10:09:58 EDT 2007: Executing repair on /dev/md0
Tue May 8 10:09:59 EDT 2007: The meta-device /dev/md1 has no mismatched sectors.
Tue May 8 10:10:00 EDT 2007: The meta-device /dev/md2 has no mismatched sectors.
Tue May 8 10:10:01 EDT 2007: The meta-device /dev/md3 has no mismatched sectors.
Tue May 8 10:20:02 EDT 2007: All devices are clean...
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 2176
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 0
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 0
Tue May 8 10:20:02 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May 8 10:20:02 EDT 2007: 0
Re: 2.6.23.1: mdadm/raid5 hung/d-state
Justin Piszcz wrote:
> On Tue, 6 Nov 2007, BERTRAND Joël wrote:
>> 	Done. Here is the obtained output:
>>
>> [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written
>> [ 1265.941328] check 3: state 0x1 toread read write written
>> [ 1265.972129] check 2: state 0x1 toread read write written
>>
>> 	For information, after crash, I have:
>>
>> Root poulenc:[/sys/block] > cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
>>       1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]
>>
>> 	Regards, JKB
>
> After the crash it is not 'resyncing'?

	No, it isn't...

	JKB
Re: 2.6.23.1: mdadm/raid5 hung/d-state
On Tue, 6 Nov 2007, BERTRAND Joël wrote:
> 	Done. Here is the obtained output:
>
> [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written
> [ 1265.941328] check 3: state 0x1 toread read write written
> [ 1265.972129] check 2: state 0x1 toread read write written
>
> 	For information, after crash, I have:
>
> Root poulenc:[/sys/block] > cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
>       1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]
>
> 	Regards, JKB

After the crash it is not 'resyncing'?

Justin.
Re: 2.6.23.1: mdadm/raid5 hung/d-state
	Done. Here is the obtained output:

[ 1260.967796] for sector 7629696, rmw=0 rcw=0
[ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1260.980606] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1260.994808] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1261.009325] check 3: state 0x1 toread read write written
[ 1261.244478] check 2: state 0x1 toread read write written
[ 1261.270821] check 1: state 0x6 toread read write f800ff517e40 written
[ 1261.312320] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1261.443120] for sector 7629696, rmw=0 rcw=0
[ 1261.453348] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.491538] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1261.529120] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1261.560151] check 3: state 0x1 toread read write written
[ 1261.599180] check 2: state 0x1 toread read write written
[ 1261.637138] check 1: state 0x6 toread read write f800ff517e40 written
[ 1261.674502] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1261.712589] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1261.864338] for sector 7629696, rmw=0 rcw=0
[ 1261.873475] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.907840] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1261.950770] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1261.989003] check 3: state 0x1 toread read write written
[ 1262.019621] check 2: state 0x1 toread read write written
[ 1262.068705] check 1: state 0x6 toread read write f800ff517e40 written
[ 1262.113265] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1262.150511] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1262.171143] for sector 7629696, rmw=0 rcw=0
[ 1262.179142] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.201905] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1262.252750] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1262.289631] check 3: state 0x1 toread read write written
[ 1262.344709] check 2: state 0x1 toread read write written
[ 1262.400411] check 1: state 0x6 toread read write f800ff517e40 written
[ 1262.437353] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1262.492561] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1262.524993] for sector 7629696, rmw=0 rcw=0
[ 1262.533314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.561900] check 5: state 0x6 toread read write f800ffcffcc0 written
[ 1262.588986] check 4: state 0x6 toread read write f800fdd4e360 written
[ 1262.619455] check 3: state 0x1 toread read write written
[ 1262.671006] check 2: state 0x1 toread read write written
[ 1262.709065] check 1: state 0x6 toread read write f800ff517e40 written
[ 1262.746904] check 0: state 0x6 toread read write f800fd4cae60 written
[ 1262.780203] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
[ 1262.805941] for sector 7629696, rmw=0 rcw=0
[ 1262.815759] handl
Re: Software raid - controller options
In message <[EMAIL PROTECTED]> you wrote:
>
> I had been looking at the Adaptec 2240900-R PCI Express and HighPoint
> RocketRAID 2300 PCI Express. These are both *raid* cards. But if they
> can be used as a regular controller card, they both provide 4 SATA ports
> and are PCI-e. But sounds like the RocketRAID doesn't work with the
> 2.6.22+ kernel (according to newegg reviewers). It sounds like the
> Adaptec works quite nicely though.

Be careful. Ideally, the controller should be supported by drivers that are included with the standard kernel.org Linux tree. If this is not the case, try to find out if you get Linux drivers with *complete* source code.

Both Highpoint (RocketRAID) and Adaptec are known to include binary-only modules in their drivers, even if they seem to provide source code. This is always a PITA and may seriously limit you to specific (usually very old) kernel versions.

> Sounds pretty iffy there. That Adaptec card I mentioned is going for
> about 100 USD. Seems like a lot for 4 ports. But sounds like it works

I returned such a card because I could not get it working with the kernel versions I wanted to run. On the other hand, the Supermicro card worked fine for me out of the box.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
When a woman marries again it is because she detested her first husband.
When a man marries again, it is because he adored his first wife.
                                                        -- Oscar Wilde