Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Jeff Lessem

Dan Williams wrote:
> The following patch, also attached, cleans up cases where the code looks
> at sh->ops.pending when it should be looking at the consistent
> stack-based snapshot of the operations flags.

I tried this patch (against a stock 2.6.23), and it did not work for
me.  Not only did I/O to the affected RAID5 & XFS partition stop, but
also I/O to all other disks.  I was not able to capture any debugging
information, but I should be able to do that tomorrow when I can hook
a serial console to the machine.

I'm not sure if my problem is identical to these others, as mine only
seems to manifest with RAID5+XFS.  The RAID rebuilds with no problem,
and I've not had any problems with RAID5+ext3.

>
>
> ---
>
>  drivers/md/raid5.c |   16 +++++++++-------
>  1 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 496b9a3..e1a3942 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -693,7 +693,8 @@ ops_run_prexor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>  }
>
>  static struct dma_async_tx_descriptor *
> -ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
> +ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
> +   unsigned long pending)
>  {
>int disks = sh->disks;
>int pd_idx = sh->pd_idx, i;
> @@ -701,7 +702,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>/* check if prexor is active which means only process blocks
> * that are part of a read-modify-write (Wantprexor)
> */
> -  int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
> +  int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
>
>pr_debug("%s: stripe %llu\n", __FUNCTION__,
>(unsigned long long)sh->sector);
> @@ -778,7 +779,8 @@ static void ops_complete_write(void *stripe_head_ref)
>  }
>
>  static void
> -ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
> +ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
> +  unsigned long pending)
>  {
>/* kernel stack size limits the total number of disks */
>int disks = sh->disks;
> @@ -786,7 +788,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>
>int count = 0, pd_idx = sh->pd_idx, i;
>struct page *xor_dest;
> -  int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
> +  int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
>unsigned long flags;
>dma_async_tx_callback callback;
>
> @@ -813,7 +815,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
>}
>
>/* check whether this postxor is part of a write */
> -  callback = test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending) ?
> +  callback = test_bit(STRIPE_OP_BIODRAIN, &pending) ?
>ops_complete_write : ops_complete_postxor;
>
>/* 1/ if we prexor'd then the dest is reused as a source
> @@ -901,12 +903,12 @@ static void raid5_run_ops(struct stripe_head *sh, unsigned long pending)
>tx = ops_run_prexor(sh, tx);
>
>if (test_bit(STRIPE_OP_BIODRAIN, &pending)) {
> -  tx = ops_run_biodrain(sh, tx);
> +  tx = ops_run_biodrain(sh, tx, pending);
>overlap_clear++;
>}
>
>if (test_bit(STRIPE_OP_POSTXOR, &pending))
> -  ops_run_postxor(sh, tx);
> +  ops_run_postxor(sh, tx, pending);
>
>if (test_bit(STRIPE_OP_CHECK, &pending))
>ops_run_check(sh);
>
>



Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Dan Williams
On Tue, 2007-11-06 at 03:19 -0700, BERTRAND Joël wrote:
> Done. Here is the obtained output:

Much appreciated.
> 
> [ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
> [ 1260.980606] check 5: state 0x6 toread  read 
>  write f800ffcffcc0 written 
> [ 1260.994808] check 4: state 0x6 toread  read 
>  write f800fdd4e360 written 
> [ 1261.009325] check 3: state 0x1 toread  read 
>  write  written 
> [ 1261.244478] check 2: state 0x1 toread  read 
>  write  written 
> [ 1261.270821] check 1: state 0x6 toread  read 
>  write f800ff517e40 written 
> [ 1261.312320] check 0: state 0x6 toread  read 
>  write f800fd4cae60 written 
> [ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
> [ 1261.443120] for sector 7629696, rmw=0 rcw=0
[..]

This looks as if the blocks were prepared to be written out, but were
never handled in ops_run_biodrain(), so they remain locked forever.  The
operations flags are all clear, which means handle_stripe thinks nothing
else needs to be done.
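
To make the failure mode concrete, here is a minimal userspace sketch (not
the kernel code itself) of why the helper routines must consult the
stack-based snapshot rather than re-read sh->ops.pending, which another
context may already have cleared.  The struct and flag names only loosely
mirror raid5.c, for illustration:

/*
 * Minimal userspace sketch: helpers that re-read the live, shared flags
 * can see them already cleared and take the wrong path, while helpers
 * that use the snapshot taken by raid5_run_ops() stay consistent.
 */
#include <stdio.h>

#define STRIPE_OP_PREXOR   2
#define STRIPE_OP_BIODRAIN 3

struct stripe_head {
    struct { unsigned long pending; } ops;
};

static int test_bit(int nr, const unsigned long *addr)
{
    return (int)((*addr >> nr) & 1UL);
}

/* Buggy variant: re-reads the live, shared field. */
static void ops_run_biodrain_buggy(struct stripe_head *sh)
{
    int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
    printf("buggy biodrain sees prexor=%d (wrong path taken)\n", prexor);
}

/* Fixed variant: uses the snapshot handed down by raid5_run_ops(). */
static void ops_run_biodrain_fixed(unsigned long pending)
{
    int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
    printf("fixed biodrain sees prexor=%d (consistent with claimed ops)\n",
           prexor);
}

int main(void)
{
    struct stripe_head sh = { .ops = { .pending =
        (1UL << STRIPE_OP_PREXOR) | (1UL << STRIPE_OP_BIODRAIN) } };

    /* raid5_run_ops() is handed a consistent snapshot of the flags... */
    unsigned long pending = sh.ops.pending;

    /* ...but another context (handle_stripe) may clear the live field
     * before the helpers run. */
    sh.ops.pending = 0;

    ops_run_biodrain_buggy(&sh);     /* prints prexor=0 */
    ops_run_biodrain_fixed(pending); /* prints prexor=1 */
    return 0;
}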

The following patch, also attached, cleans up cases where the code looks
at sh->ops.pending when it should be looking at the consistent
stack-based snapshot of the operations flags.


---

 drivers/md/raid5.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 496b9a3..e1a3942 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -693,7 +693,8 @@ ops_run_prexor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 }
 
 static struct dma_async_tx_descriptor *
-ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
+unsigned long pending)
 {
int disks = sh->disks;
int pd_idx = sh->pd_idx, i;
@@ -701,7 +702,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
/* check if prexor is active which means only process blocks
 * that are part of a read-modify-write (Wantprexor)
 */
-   int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+   int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
 
pr_debug("%s: stripe %llu\n", __FUNCTION__,
(unsigned long long)sh->sector);
@@ -778,7 +779,8 @@ static void ops_complete_write(void *stripe_head_ref)
 }
 
 static void
-ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx,
+   unsigned long pending)
 {
/* kernel stack size limits the total number of disks */
int disks = sh->disks;
@@ -786,7 +788,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 
int count = 0, pd_idx = sh->pd_idx, i;
struct page *xor_dest;
-   int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+   int prexor = test_bit(STRIPE_OP_PREXOR, &pending);
unsigned long flags;
dma_async_tx_callback callback;
 
@@ -813,7 +815,7 @@ ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
}
 
/* check whether this postxor is part of a write */
-   callback = test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending) ?
+   callback = test_bit(STRIPE_OP_BIODRAIN, &pending) ?
ops_complete_write : ops_complete_postxor;
 
/* 1/ if we prexor'd then the dest is reused as a source
@@ -901,12 +903,12 @@ static void raid5_run_ops(struct stripe_head *sh, unsigned long pending)
tx = ops_run_prexor(sh, tx);
 
if (test_bit(STRIPE_OP_BIODRAIN, &pending)) {
-   tx = ops_run_biodrain(sh, tx);
+   tx = ops_run_biodrain(sh, tx, pending);
overlap_clear++;
}
 
if (test_bit(STRIPE_OP_POSTXOR, &pending))
-   ops_run_postxor(sh, tx);
+   ops_run_postxor(sh, tx, pending);
 
if (test_bit(STRIPE_OP_CHECK, &pending))
ops_run_check(sh);

raid5: fix unending write sequence

From: Dan Williams <[EMAIL PROTECTED]>


---

 drivers/md/raid5.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 496b9a3..e1a3942 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -693,7 +693,8 @@ ops_run_prexor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
 }
 
 static struct dma_async_tx_descriptor *
-ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+ops_run_biodrain(struct stripe_head *sh, struct dm

Re: Stack Trace. Bad?

2007-11-06 Thread Richard Scobie

Jon Nelson wrote:


Whom should I contact regarding the forcedeth problem?


A post to linux-netdev will get it looked at.

Regards,

Richard


Stack Trace. Bad?

2007-11-06 Thread Jon Nelson
I was testing some network throughput today and ran into this.
I'm going to bet it's a forcedeth driver problem but since it also
involves software raid I thought I'd include it.
Whom should I contact regarding the forcedeth problem?

The following is only a harmless informational message.
Unless you get a _continuous_flood_ of these messages it means
everything is working fine. Allocations from irqs cannot be
perfectly reliable and the kernel is designed to handle that.
md0_raid5: page allocation failure. order:2, mode:0x20

Call Trace:
   [] __alloc_pages+0x324/0x33d
 [] kmem_getpages+0x66/0x116
 [] fallback_alloc+0x104/0x174
 [] kmem_cache_alloc_node+0x9c/0xa8
 [] __alloc_skb+0x65/0x138
 [] :forcedeth:nv_alloc_rx_optimized+0x4d/0x18f
 [] :forcedeth:nv_napi_poll+0x61f/0x71c
 [] net_rx_action+0xb2/0x1c5
 [] __do_softirq+0x65/0xce
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x2c/0x7d
 [] do_IRQ+0xb6/0xd6
 [] ret_from_intr+0x0/0xa
   [] mempool_free_slab+0x0/0xe
 [] _spin_unlock_irqrestore+0x8/0x9
 [] bitmap_daemon_work+0xee/0x2f3
 [] md_check_recovery+0x22/0x4b9
 [] :raid456:raid5d+0x1b/0x3a2
 [] del_timer_sync+0xc/0x16
 [] schedule_timeout+0x92/0xad
 [] process_timeout+0x0/0x5
 [] schedule_timeout+0x85/0xad
 [] md_thread+0xf2/0x10e
 [] autoremove_wake_function+0x0/0x2e
 [] md_thread+0x0/0x10e
 [] kthread+0x47/0x73
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x73
 [] child_rip+0x0/0x12

Mem-info:
Node 0 DMA per-cpu:
CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU0: Hot: hi:  186, btch:  31 usd: 115   Cold: hi:   62, btch:  15 usd:  31
CPU1: Hot: hi:  186, btch:  31 usd: 128   Cold: hi:   62, btch:  15 usd:  56
Active:111696 inactive:116497 dirty:31 writeback:0 unstable:0
 free:1850 slab:19676 mapped:3608 pagetables:1217 bounce:0
Node 0 DMA free:3988kB min:40kB low:48kB high:60kB active:232kB
inactive:5496kB present:10692kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 994 994
Node 0 DMA32 free:3412kB min:4012kB low:5012kB high:6016kB
active:446552kB inactive:460492kB present:1018020kB pages_scanned:0
all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 DMA: 29*4kB 2*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3988kB
Node 0 DMA32: 419*4kB 147*8kB 19*16kB 0*32kB 1*64kB 0*128kB 1*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 3476kB
Swap cache: add 57, delete 57, find 0/0, race 0+0
Free swap  = 979608kB
Total swap = 979832kB
 Free swap:   979608kB
262128 pages of RAM
4938 reserved pages
108367 pages shared
0 pages swap cached


-- 
Jon


Re: Software raid - controller options

2007-11-06 Thread Lyle Schlueter
Yes, I must have missed that. I've only been on the mailing list for a
week or so. I did go through some of the archives though. I keep my
kernel up to date, usually within a few days of a release. 

The 3ware and Areca cards sound nice, but I could buy quite a few drives
for the price of those cards (for a 12-port card), which is what made me
start seriously considering software raid. Plus, from what I understand,
with software raid it is easier to change out server parts than it is
with hardware raid, i.e. swapping controllers or motherboard, etc.

After reading a few of the responses I have gotten, it sounds like a
budget *raid* card from a good vendor with good Linux support might be
the best option to get a good number of ports on a PCIe interface and
have it work well with Linux, all while being cheaper than a full-blown
hardware RAID solution.

Thanks for the info and I will have a look at the cards you mentioned.

Lyle


On Tue, 2007-11-06 at 00:41 -0600, Alberto Alonso wrote:
> You've probably missed a discussion on issues I've been having with
> SATA, software RAID and bad drivers. A clear thing from the responses 
> I got is that you really need to use a recent kernel, as they may have
> fixed those problems.
> 
> I didn't get clear responses indicating specific cards that are 
> known to work well when hard drives fail. But if you can deal with
> a server crashing and then rebooting manually then software RAID
> is the way to go. I've always been able to get the servers back
> online even with the problematic drivers.
> 
> I am happy with the 3ware cards and do use their hardware RAID to
> avoid the problems that I've had. With those I've fully tested
> 16 drive systems with 2 arrays using 2 8-port cards. Others have
> recommended the Areca line.
> 
> As for cheap "dumb" interfaces I am now using the RocketRAID 2220,
> which gives you 8 ports on a PCI-X card. I believe the built-in RAID
> on those is just firmware based, so you may as well use them to
> show the drives in normal/legacy mode and use software RAID on
> top. Keep in mind I haven't fully tested this solution, nor have I
> tested for proper functioning when a drive fails.
> 
> Another inexpensive card I've used with good results is the Q-stor
> PCI-X card, but I think this is now obsolete.
> 
> Hope this helps,
> 
> Alberto
> 
> 
> On Tue, 2007-11-06 at 05:20 +0300, Lyle Schlueter wrote:
> > Hello,
> > 
> > I just started looking into software raid with linux a few weeks ago. I
> > am outgrowing the commercial NAS product that I bought a while back.
> > I've been learning as much as I can: subscribing to this mailing list,
> > reading man pages, and experimenting with loopback devices, setting up
> > and expanding test arrays.
> > 
> > I have a few questions now that I'm sure someone here will be able to
> > enlighten me about.
> > First, I want to run a 12 drive raid 6; honestly, would I be better off
> > going with true hardware raid like the areca ARC-1231ML vs software
> > raid? I would prefer software raid just for the sheer cost savings. But
> > what kind of processing power would it take to match or exceed a mid to
> > high-level hardware controller?
> > 
> > I haven't seen much, if any, discussion of this, but how many drives are
> > people putting into software arrays? And how are you going about it?
> > Motherboards seem to max out around 6-8 SATA ports. Do you just add SATA
> > controllers? Looking around on newegg (and some googling) 2-port SATA
> > controllers are pretty easy to find, but once you get to 4 ports the
> > cards all seem to include some sort of built in *raid* functionality.
> > Are there any 4+ port PCI-e SATA controller cards?
> > 
> > Are there any specific chipsets/brands of motherboards or controller
> > cards that you software raid veterans prefer?
> > 
> > Thank you for your time and any info you are able to give me!
> > 
> > Lyle
> > 


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël

Justin Piszcz wrote:



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Justin Piszcz wrote:



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1265.941328] check 3: state 0x1 toread  read 
 write  written 
[ 1265.972129] check 2: state 0x1 toread  read 
 write  written 



For information, after the crash, I have:

Root poulenc:[/sys/block] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
 1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

Regards,

JKB


After the crash it is not 'resyncing'?


No, it isn't...

JKB



After any crash/unclean shutdown the RAID should resync; if it doesn't, 
that's not good, and I'd suggest running a raid check.


The 'repair' is supposed to clean it; in some cases (md0=swap) it gets 
dirty again.


Tue May  8 09:19:54 EDT 2007: Executing RAID health check for /dev/md0...
Tue May  8 09:19:55 EDT 2007: Executing RAID health check for /dev/md1...
Tue May  8 09:19:56 EDT 2007: Executing RAID health check for /dev/md2...
Tue May  8 09:19:57 EDT 2007: Executing RAID health check for /dev/md3...
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 2176
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: The meta-device /dev/md0 has 2176 mismatched sectors.

Tue May  8 10:09:58 EDT 2007: Executing repair on /dev/md0
Tue May  8 10:09:59 EDT 2007: The meta-device /dev/md1 has no mismatched sectors.
Tue May  8 10:10:00 EDT 2007: The meta-device /dev/md2 has no mismatched sectors.
Tue May  8 10:10:01 EDT 2007: The meta-device /dev/md3 has no mismatched sectors.

Tue May  8 10:20:02 EDT 2007: All devices are clean...
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 2176
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0


	I cannot repair this raid volume. I cannot reboot the server without 
sending Stop+A. init 6 stops at "INIT:". After reboot, md0 is resynchronized.


Regards,

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Justin Piszcz



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Justin Piszcz wrote:



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1265.941328] check 3: state 0x1 toread  read 
 write  written 
[ 1265.972129] check 2: state 0x1 toread  read 
 write  written 



For information, after the crash, I have:

Root poulenc:[/sys/block] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
 1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

Regards,

JKB


After the crash it is not 'resyncing'?


No, it isn't...

JKB



After any crash/unclean shutdown the RAID should resync; if it doesn't, 
that's not good, and I'd suggest running a raid check.


The 'repair' is supposed to clean it; in some cases (md0=swap) it gets 
dirty again.


Tue May  8 09:19:54 EDT 2007: Executing RAID health check for /dev/md0...
Tue May  8 09:19:55 EDT 2007: Executing RAID health check for /dev/md1...
Tue May  8 09:19:56 EDT 2007: Executing RAID health check for /dev/md2...
Tue May  8 09:19:57 EDT 2007: Executing RAID health check for /dev/md3...
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 2176
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May  8 10:09:58 EDT 2007: 0
Tue May  8 10:09:58 EDT 2007: The meta-device /dev/md0 has 2176 mismatched sectors.

Tue May  8 10:09:58 EDT 2007: Executing repair on /dev/md0
Tue May  8 10:09:59 EDT 2007: The meta-device /dev/md1 has no mismatched sectors.
Tue May  8 10:10:00 EDT 2007: The meta-device /dev/md2 has no mismatched sectors.
Tue May  8 10:10:01 EDT 2007: The meta-device /dev/md3 has no mismatched sectors.

Tue May  8 10:20:02 EDT 2007: All devices are clean...
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 2176
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md1/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0
Tue May  8 10:20:02 EDT 2007: cat /sys/block/md3/md/mismatch_cnt
Tue May  8 10:20:02 EDT 2007: 0
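
The check/repair cycle in the log above can also be driven directly through
the md sysfs files.  Below is a minimal sketch in C; the device name (md0),
the 5-second polling interval, and firing a repair on any non-zero
mismatch_cnt are illustrative assumptions, not the script that produced the
log:

/*
 * Minimal sketch of a check/repair pass via the md sysfs interface.
 * Run as root on a machine that actually has /dev/md0.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fputs(val, f);
    fclose(f);
    return 0;
}

static int read_str(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return -1; }
    if (!fgets(buf, (int)len, f))
        buf[0] = '\0';
    fclose(f);
    return 0;
}

int main(void)
{
    const char *action   = "/sys/block/md0/md/sync_action";
    const char *mismatch = "/sys/block/md0/md/mismatch_cnt";
    char buf[64];

    /* Start a read-only consistency check. */
    if (write_str(action, "check") < 0)
        return 1;

    /* Poll until the array goes back to "idle". */
    do {
        sleep(5);
        if (read_str(action, buf, sizeof(buf)) < 0)
            return 1;
    } while (strncmp(buf, "idle", 4) != 0);

    /* mismatch_cnt is the value the log above reports. */
    if (read_str(mismatch, buf, sizeof(buf)) < 0)
        return 1;
    printf("mismatch_cnt after check: %s", buf);

    /* A repair pass rewrites parity; on a swap-backed array the count
     * may legitimately come back non-zero on the next check. */
    if (strtol(buf, NULL, 10) > 0)
        return write_str(action, "repair") < 0 ? 1 : 0;
    return 0;
}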


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël

Justin Piszcz wrote:



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1265.941328] check 3: state 0x1 toread  read 
 write  written 
[ 1265.972129] check 2: state 0x1 toread  read 
 write  written 



For information, after the crash, I have:

Root poulenc:[/sys/block] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
 1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

Regards,

JKB


After the crash it is not 'resyncing'?


No, it isn't...

JKB


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Justin Piszcz



On Tue, 6 Nov 2007, BERTRAND Joël wrote:


Done. Here is the obtained output:

[ 1265.899068] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1265.941328] check 3: state 0x1 toread  read 
 write  written 
[ 1265.972129] check 2: state 0x1 toread  read 
 write  written 



For information, after the crash, I have:

Root poulenc:[/sys/block] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdc1[0] sdh1[5] sdg1[4] sdf1[3] sde1[2] sdd1[1]
 1464725760 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

Regards,

JKB


After the crash it is not 'resyncing'?

Justin.


Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël

Done. Here is the obtained output:

[ 1260.967796] for sector 7629696, rmw=0 rcw=0
[ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1260.980606] check 5: state 0x6 toread  read 
 write f800ffcffcc0 written 
[ 1260.994808] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1261.009325] check 3: state 0x1 toread  read 
 write  written 
[ 1261.244478] check 2: state 0x1 toread  read 
 write  written 
[ 1261.270821] check 1: state 0x6 toread  read 
 write f800ff517e40 written 
[ 1261.312320] check 0: state 0x6 toread  read 
 write f800fd4cae60 written 
[ 1261.361030] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1261.443120] for sector 7629696, rmw=0 rcw=0
[ 1261.453348] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.491538] check 5: state 0x6 toread  read 
 write f800ffcffcc0 written 
[ 1261.529120] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1261.560151] check 3: state 0x1 toread  read 
 write  written 
[ 1261.599180] check 2: state 0x1 toread  read 
 write  written 
[ 1261.637138] check 1: state 0x6 toread  read 
 write f800ff517e40 written 
[ 1261.674502] check 0: state 0x6 toread  read 
 write f800fd4cae60 written 
[ 1261.712589] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1261.864338] for sector 7629696, rmw=0 rcw=0
[ 1261.873475] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1261.907840] check 5: state 0x6 toread  read 
 write f800ffcffcc0 written 
[ 1261.950770] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1261.989003] check 3: state 0x1 toread  read 
 write  written 
[ 1262.019621] check 2: state 0x1 toread  read 
 write  written 
[ 1262.068705] check 1: state 0x6 toread  read 
 write f800ff517e40 written 
[ 1262.113265] check 0: state 0x6 toread  read 
 write f800fd4cae60 written 
[ 1262.150511] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1262.171143] for sector 7629696, rmw=0 rcw=0
[ 1262.179142] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.201905] check 5: state 0x6 toread  read 
 write f800ffcffcc0 written 
[ 1262.252750] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1262.289631] check 3: state 0x1 toread  read 
 write  written 
[ 1262.344709] check 2: state 0x1 toread  read 
 write  written 
[ 1262.400411] check 1: state 0x6 toread  read 
 write f800ff517e40 written 
[ 1262.437353] check 0: state 0x6 toread  read 
 write f800fd4cae60 written 
[ 1262.492561] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1262.524993] for sector 7629696, rmw=0 rcw=0
[ 1262.533314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
[ 1262.561900] check 5: state 0x6 toread  read 
 write f800ffcffcc0 written 
[ 1262.588986] check 4: state 0x6 toread  read 
 write f800fdd4e360 written 
[ 1262.619455] check 3: state 0x1 toread  read 
 write  written 
[ 1262.671006] check 2: state 0x1 toread  read 
 write  written 
[ 1262.709065] check 1: state 0x6 toread  read 
 write f800ff517e40 written 
[ 1262.746904] check 0: state 0x6 toread  read 
 write f800fd4cae60 written 
[ 1262.780203] locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0

[ 1262.805941] for sector 7629696, rmw=0 rcw=0
[ 1262.815759] handl

Re: Software raid - controller options

2007-11-06 Thread Wolfgang Denk
In message <[EMAIL PROTECTED]> you wrote:
> 
> I had been looking at the Adaptec 2240900-R PCI Express and HighPoint
> RocketRAID 2300 PCI Express. These are both *raid* cards. But if they
> can be used as a regular controller card, they both provide 4 SATA ports
> and are PCI-e. But sounds like the RocketRAID doesn't work with the
> 2.6.22+ kernel (according to newegg reviewers). It sounds like the
> Adaptec works quite nicely though. 

Be careful. Ideally, the controller should be supported by drivers
that are included with the standard kernel.org Linux tree. If this is
not the case, try to find out if you get Linux drivers with
*complete* source code. Both Highpoint (RocketRAID) and Adaptec are
known to include binary-only modules in their drivers, even if they
seem to provide source code. This is always a PITA and may seriously
limit you to specific (usually very old) kernel versions.

> Sounds pretty iffy there. That Adaptec card I mentioned is going for
> about 100 USD. Seems like a lot for 4 ports. But sounds like it works

I returned such a card because I could not get it working with the
kernel versions I wanted to run.

On the other hand, the Supermicro card worked fine for me out of the
box.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
When a woman marries again it is because she detested her first  hus-
band.  When  a  man  marries again, it is because he adored his first
wife.  -- Oscar Wilde