Re: [RFC] A SCSI fault injection framework using SystemTap.

2008-01-22 Thread K.Tanaka

> The new framework is tested on Fedora 8 (i386) running kernel 2.6.23.12.
> So far, I'm cleaning up the tool set for release, and plan to post it in
> the near future.

Now it's ready. The SCSI fault injection tool is available from the following
site:
https://sourceforge.net/projects/scsifaultinjtst/
If you have any comments, please let me know.

Additionally, the deadlock problem also reproduces on md RAID10. I think the
same cause as the RAID1 deadlock reported earlier is behind it, because
raid10.c is based on raid1.c.
 e.g.
  - The kernel thread for md RAID1 can deadlock when the error handler for
    md RAID1 contends with write access to the md RAID1 array.

I've reproduced the deadlock on RAID10 using this tool together with a small
shell script that injects a fault repeatedly (see the sketch below). But so
far I haven't come up with a good idea for a patch to fix this problem.
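
For reference, the reproduction loop is roughly the following. This is only a
sketch: "inject_scsi_fault.sh" and the device/mount point names are
placeholders, not the actual interface of the released tool set.

#!/bin/sh
# Keep write I/O running against the RAID10 array while repeatedly
# injecting a SCSI fault into one member.
ARRAY_MNT=/mnt/md_test     # placeholder mount point of the md array
MEMBER=/dev/sdc            # placeholder member device to inject into

while :; do
    # background write load on the array
    dd if=/dev/zero of=$ARRAY_MNT/stress bs=1M count=256 conv=fsync &

    # inject a fault while the write is in flight (placeholder command)
    sh ./inject_scsi_fault.sh "$MEMBER"

    wait
    sleep 5    # let the array settle before the next round
done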

-- 

Kenichi TANAKA | Open Source Software Platform Development Division
  | Computers Software Operations Unit, NEC Corporation
  | [EMAIL PROTECTED]



Re: One Large md or Many Smaller md for Better Performance?

2008-01-22 Thread Moshe Yudkowsky

Carlos Carvalho wrote:

> I use reiser3 and xfs. reiser3 is very good with many small files. A
> simple test shows interactively perceptible results: removing large
> files is faster with xfs, removing large directories (ex. the kernel
> tree) is faster with reiser3.


My current main concern about XFS and reiser3 is writebacks. The default 
mode for ext3 is journal, which in case of power failure is more 
robust than the writeback modes of XFS, reiser3, or JFS -- or so I'm 
given to understand.
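
For reference, data journalling is just a mount option, so it is easy to try;
a minimal sketch, with the device and mount point as examples:

# Mount an ext3 filesystem with full data journalling (mainline ext3
# otherwise defaults to data=ordered):
mount -t ext3 -o data=journal /dev/md0 /srv/data

# or permanently, via /etc/fstab:
# /dev/md0   /srv/data   ext3   defaults,data=journal   0 2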


On the other hand, I have a UPS and it should shut down gracefully 
regardless if there's a power failure. I wonder if I'm being too cautious?



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 Keep some secrets/Never tell,
  And they will keep you very well.
-- Michelle Shocked


Raid1, mdadm and nfs that remains in D state

2008-01-22 Thread BERTRAND Joël

Hello,

	I have installed a number of T1000 machines with Debian/testing and the
official 2.6.23.9 Linux kernel. All packages but iscsi come from the Debian
repositories; iscsi was built from the SVN tree. md7 is a raid1 volume over
iscsi, and I can access this device. This morning, one of my T1000s crashed,
and the NFS daemons are stuck in D state:


Root gershwin:[~] > ps auwx | grep NFS
root     17041  0.0  0.0   2064   744 ttyS0    S+   12:33   0:00 grep NFS
Root gershwin:[~] > ps auwx | grep nfs
root     17043  0.0  0.0   2064   744 ttyS0    S+   12:33   0:00 grep nfs
root     18276  0.0  0.0      0     0 ?        D     2007  16:59 [nfsd]
root     18277  0.0  0.0      0     0 ?        D     2007  16:56 [nfsd]
root     18278  0.0  0.0      0     0 ?        D     2007  16:57 [nfsd]
root     18279  0.0  0.0      0     0 ?        D     2007  16:41 [nfsd]
root     18280  0.0  0.0      0     0 ?        D     2007  16:44 [nfsd]
root     18281  0.0  0.0      0     0 ?        D     2007  16:49 [nfsd]
root     18282  0.0  0.0      0     0 ?        D     2007  16:37 [nfsd]
root     18283  0.0  0.0      0     0 ?        D     2007  16:54 [nfsd]
Root gershwin:[~] > dmesg
sp: f800f2bcf3b1 ret_pc: 005e6d54
RPC: raid1d+0x35c/0x1020
l0: f80060b8fa40 l1: 0050 l2: 0006 l3: 0001
l4: f800fde2c8a0 l5: f800fc74dc20 l6: 0007 l7: 
i0: f800fb70c400 i1: f800fde2c8c8 i2: f8006297ee40 i3: f800
i4: 0010 i5: 007a2f00 i6: f800f2bcf4f1 i7: 005f2f50
I7: md_thread+0x38/0x140

BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 80001600 TPC: 0055bff0 TNPC: 0055bff4 Y:  Not tainted
TPC: loop+0x14/0x28
g0: 0020 g1: dffd57408000 g2: 0002a8ba2e81 g3: 
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 
o0: f8009d13d254 o1: f80071755254 o2: 0dac o3: 
o4: 0018d1a6 o5: 00225c52 sp: f800f2bcf3b1 ret_pc: 005e6d54
RPC: raid1d+0x35c/0x1020
l0: f80077d36ce0 l1: 0050 l2: 0006 l3: 0001
l4: f800fde2c8a0 l5: f800f4372ea0 l6: 0007 l7: 
i0: f800fb70c400 i1: f800fde2c8c8 i2: f80091038660 i3: f800
i4: 0010 i5: 007a2f00 i6: f800f2bcf4f1 i7: 005f2f50
I7: md_thread+0x38/0x140

BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 004480001607 TPC: 006803a0 TNPC: 006803a4 Y:  Not tainted
TPC: _spin_unlock_irqrestore+0x28/0x40
g0: f800fed95000 g1:  g2: c0002000 g3: d0002000
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: f800ffcb
o0: f800fee16000 o1:  o2:  o3: f800fee16000
o4:  o5: 00784000 sp: f800f2bceda1 ret_pc: 005a4fb8
RPC: tg3_poll+0x820/0xc40
l0: 042a l1: 0001 l2: f800f79aba00 l3: 01d0
l4: f800fed95700 l5: f800f1091ec0 l6: 01d0 l7: 0001
i0: 01df i1: 0029 i2: 01df i3: 0029
i4: f800fed95794 i5: 94479812 i6: f800f2bcee81 i7: 00609780
I7: net_rx_action+0x88/0x160

BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 009980001602 TPC: 10170100 TNPC: 10170104 Y:  Not tainted
TPC: ipv4_get_l4proto+0x8/0xa0 [nf_conntrack_ipv4]
g0: 1002bb58 g1: 006c g2: f800eba32b0c g3: 10170100
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 0003
o0: f800d69aae00 o1:  o2: f800f2bced24 o3: f800f2bced2f
o4: f800fed95000 o5: f800f2bceec8 sp: f800f2bce411 ret_pc: 10019d7c
RPC: nf_conntrack_in+0xa4/0x580 [nf_conntrack]
l0: 0002 l1: 10175590 l2: 8000 l3: 0002
l4:  l5: 0cbcc8bb l6: 0002 l7: f80062b8f820
i0: 0002 i1: 0003 i2: f800f2bcf080 i3: f800fed95000
i4: 00630260 i5: 00630260 i6: f800f2bce541 i7: 0062517c
I7: nf_iterate+0x84/0xe0

BUG: soft lockup - CPU#6 stuck for 11s! [md7_raid1:5818]
TSTATE: 004480001605 TPC: 10161030 TNPC: 10161034 Y:  Not tainted
TPC: ipt_do_table+0xd8/0x5a0 [ip_tables]
g0: 0001 g1:  g2: c0a80001 g3: 
g4: f800fd52d960 g5: f800020bc000 g6: f800f2bcc000 g7: 0be0
o0: 10180b74 o1: f800f2bcf480 o2:  o3: f800fed95000
o4:  o5: f8005ef72be0 sp: f800f2bce821 ret_pc: 10160fac
RPC: ipt_do_table+0x54/0x5a0 [ip_tables]
l0: 

Re: One Large md or Many Smaller md for Better Performance?

2008-01-22 Thread Tomasz Chmielewski

Moshe Yudkowsky schrieb:
> Carlos Carvalho wrote:
>> I use reiser3 and xfs. reiser3 is very good with many small files. A
>> simple test shows interactively perceptible results: removing large
>> files is faster with xfs, removing large directories (ex. the kernel
>> tree) is faster with reiser3.
>
> My current main concern about XFS and reiser3 is writebacks. The default
> mode for ext3 is journal, which in case of power failure is more robust
> than the writeback modes of XFS, reiser3, or JFS -- or so I'm given to
> understand.

Also, barriers (the barrier=1 option for ext3) are not supported on
filesystems placed on md/dm; it's a bit of a pain.
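
You can see it for yourself by asking for barriers explicitly and watching the
kernel log; a quick sketch (device names are examples):

# request barriers on an ext3 filesystem that sits on an md device
mount -t ext3 -o barrier=1 /dev/md5 /mnt/test

# on these kernels md/dm does not pass barriers through, so the first
# journal commit typically logs something like
# "JBD: barrier-based sync failed on md5 - disabling barriers"
dmesg | grep -i barrier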



--
Tomasz Chmielewski
http://wpkg.org


Re: One Large md or Many Smaller md for Better Performance?

2008-01-22 Thread Bill Davidsen

Moshe Yudkowsky wrote:
> Carlos Carvalho wrote:
>> I use reiser3 and xfs. reiser3 is very good with many small files. A
>> simple test shows interactively perceptible results: removing large
>> files is faster with xfs, removing large directories (ex. the kernel
>> tree) is faster with reiser3.
>
> My current main concern about XFS and reiser3 is writebacks. The default
> mode for ext3 is journal, which in case of power failure is more robust
> than the writeback modes of XFS, reiser3, or JFS -- or so I'm given to
> understand.
>
> On the other hand, I have a UPS and it should shut down gracefully
> regardless if there's a power failure. I wonder if I'm being too
> cautious?



No.

If you haven't actually *tested* the UPS failover code to be sure your 
system is talking to the UPS properly, and that the UPS is able to hold 
up power long enough for a shutdown after the system detects the 
problem, then you don't know if you actually have protection.  Even 
then, if you don't proactively replace batteries on schedule, etc, then 
you aren't as protected as you might wish to be.
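
If you run NUT or something similar, a dry run of the shutdown path is cheap.
A rough sketch, assuming a NUT setup with a UPS named "myups":

# check that the box is really talking to the UPS
upsc myups@localhost ups.status
upsc myups@localhost battery.charge

# force the full "on battery, battery low" shutdown sequence without
# pulling the plug -- do this in a maintenance window
upsmon -c fsd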


And CPU fans fail, capacitors pop, power supplies fail, etc. These are 
things which have happened here in the last ten years. I also had a 
charging circuit in a UPS half-fail (from full wave rectifier to half). 
So the UPS would discharge until it ran out of power, then the system 
would fail hard. By the time I got on site and rebooted the UPS had 
trickle charged and would run the system. After replacing two 
intermittent power supplies in the system, the UPS was swapped on 
general principles and the real problem was isolated.


Shit happens, don't rely on graceful shutdowns (or recovery, have backups).

--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: One Large md or Many Smaller md for Better Performance?

2008-01-22 Thread Iustin Pop
On Tue, Jan 22, 2008 at 05:34:14AM -0600, Moshe Yudkowsky wrote:
> Carlos Carvalho wrote:
>
>> I use reiser3 and xfs. reiser3 is very good with many small files. A
>> simple test shows interactively perceptible results: removing large
>> files is faster with xfs, removing large directories (ex. the kernel
>> tree) is faster with reiser3.
>
> My current main concern about XFS and reiser3 is writebacks. The default
> mode for ext3 is journal, which in case of power failure is more robust
> than the writeback modes of XFS, reiser3, or JFS -- or so I'm given to
> understand.
>
> On the other hand, I have a UPS and it should shut down gracefully
> regardless if there's a power failure. I wonder if I'm being too cautious?

I'm not sure what your actual worry is. It's not like XFS loses
*committed* data on power failure. It may lose data that was never
required to go to disk via fsync()/fdatasync()/sync. If someone is
losing data on power failure, it is the unprotected write cache of the
hard drive that is to blame.

If you have properly-behaved applications, then they know when to do an
fsync, and if XFS returns success on fsync and your Linux is properly
configured (no write-back caches on drives that are not backed by NVRAM,
etc.) then you won't lose data.
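
Concretely, the two things to check are the drive cache and whether the
application forces its writes out. A minimal sketch (device and file names
are examples):

# 1) make sure the drive's volatile write cache is off (or battery-backed)
hdparm -W0 /dev/sda      # disable write caching on this drive
hdparm -W /dev/sda       # query the current setting

# 2) a well-behaved writer forces data out before claiming success;
#    GNU dd can demonstrate this with conv=fsync, which calls fsync()
#    on the output file before exiting
dd if=important.dat of=/mnt/array/important.dat bs=1M conv=fsync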

regards,
iustin


[PATCH] md: constify function pointer tables

2008-01-22 Thread Jan Engelhardt
Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]
---
 drivers/md/md.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index cef9ebd..6295b90 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5033,7 +5033,7 @@ static int md_seq_show(struct seq_file *seq, void *v)
return 0;
 }
 
-static struct seq_operations md_seq_ops = {
+static const struct seq_operations md_seq_ops = {
.start  = md_seq_start,
.next   = md_seq_next,
.stop   = md_seq_stop,
-- 
1.5.3.4



Re: idle array consuming cpu ??!!

2008-01-22 Thread Bill Davidsen

Carlos Carvalho wrote:
> Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
>> On Sunday January 20, [EMAIL PROTECTED] wrote:
>>> A raid6 array with a spare and bitmap is idle: not mounted and with no
>>> IO to it or any of its disks (obviously), as shown by iostat. However
>>> it's consuming cpu: since reboot it used about 11min in 24h, which is
>>> quite a lot even for a busy array (the cpus are fast). The array was
>>> cleanly shutdown so there's been no reconstruction/check or anything else.
>>>
>>> How can this be? Kernel is 2.6.22.16 with the two patches for the
>>> deadlock ([PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
>>> FIX) and the previous one.
>>
>> Maybe the bitmap code is waking up regularly to do nothing.
>>
>> Would you be happy to experiment?  Remove the bitmap with
>>     mdadm --grow /dev/mdX --bitmap=none
>> and see how that affects cpu usage?
>
> Confirmed, removing the bitmap stopped cpu consumption.


Looks like quite a bit of CPU going into idle arrays here, too.

--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 





Re: idle array consuming cpu ??!!

2008-01-22 Thread Carlos Carvalho
Bill Davidsen ([EMAIL PROTECTED]) wrote on 22 January 2008 17:53:
> Carlos Carvalho wrote:
>> Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
>>> On Sunday January 20, [EMAIL PROTECTED] wrote:
>>>> A raid6 array with a spare and bitmap is idle: not mounted and with no
>>>> IO to it or any of its disks (obviously), as shown by iostat. However
>>>> it's consuming cpu: since reboot it used about 11min in 24h, which is
>>>> quite a lot even for a busy array (the cpus are fast). The array was
>>>> cleanly shutdown so there's been no reconstruction/check or anything else.
>>>>
>>>> How can this be? Kernel is 2.6.22.16 with the two patches for the
>>>> deadlock ([PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
>>>> FIX) and the previous one.
>>>
>>> Maybe the bitmap code is waking up regularly to do nothing.
>>>
>>> Would you be happy to experiment?  Remove the bitmap with
>>>     mdadm --grow /dev/mdX --bitmap=none
>>> and see how that affects cpu usage?
>>
>> Confirmed, removing the bitmap stopped cpu consumption.
>
> Looks like quite a bit of CPU going into idle arrays here, too.

I don't mind the cpu time (in the machines where we use it here); what
worries me is that it shouldn't happen when the disks are completely
idle. Looks like there's a bug somewhere.
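
For anyone who wants to quantify it, the per-thread CPU time is easy to watch
from userspace; a sketch ("md5_raid6" is an example name, use whatever
/proc/mdstat shows on your box):

# cumulative CPU time of the md kernel threads
ps -eo pid,comm,time | grep -E 'md[0-9]+_'

# or read utime+stime (fields 14 and 15 of /proc/<pid>/stat, in clock
# ticks, usually 100 per second) and sample it over a day
awk '{print $14 + $15}' /proc/$(pgrep md5_raid6)/stat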


AACRAID driver broken in 2.6.22.x (and beyond?) [WAS: Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN]

2008-01-22 Thread Mike Snitzer
On Jan 22, 2008 12:29 AM, Mike Snitzer [EMAIL PROTECTED] wrote:
> cc'ing Tanaka-san given his recent raid1 BUG report:
> http://lkml.org/lkml/2008/1/14/515
>
> On Jan 21, 2008 6:04 PM, Mike Snitzer [EMAIL PROTECTED] wrote:
>> Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to
>> an aacraid controller) that was acting as the local raid1 member of
>> /dev/md30.
>>
>> Linux MD didn't see a /dev/sdac1 error until I tried forcing the issue by
>> doing a read (with dd) from /dev/md30:
>
> The raid1d thread is locked at line 720 in raid1.c (raid1d+2437); aka
> freeze_array:
>
> (gdb) l *0x2539
> 0x2539 is in raid1d (drivers/md/raid1.c:720).
> 715              * wait until barrier+nr_pending match nr_queued+2
> 716              */
> 717             spin_lock_irq(&conf->resync_lock);
> 718             conf->barrier++;
> 719             conf->nr_waiting++;
> 720             wait_event_lock_irq(conf->wait_barrier,
> 721                                 conf->barrier+conf->nr_pending ==
>                                     conf->nr_queued+2,
> 722                                 conf->resync_lock,
> 723                                 raid1_unplug(conf->mddev->queue));
> 724             spin_unlock_irq(&conf->resync_lock);
>
> Given Tanaka-san's report against 2.6.23 and me hitting what seems to
> be the same deadlock in 2.6.22.16; it stands to reason this affects
> raid1 in 2.6.24-rcX too.

Turns out that the aacraid driver in 2.6.22.x is HORRIBLY BROKEN (when
you pull a drive); it responds to MD's write requests with uptodate=1
(in raid1_end_write_request) for the drive that was pulled!  I've not
looked to see if aacraid has been fixed in newer kernels... are others
aware of any crucial aacraid fixes in 2.6.23.x or 2.6.24?

After the drive was physically pulled, and small periodic writes
continued to the associated MD device, the raid1 MD driver did _NOT_
detect the pulled drive's writes as having failed (verified this with
systemtap).  MD happily thought the write completed to both members
(so MD had no reason to mark the pulled drive faulty; or mark the
raid degraded).
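
For anyone who wants to reproduce the check, a probe along these lines is
enough to watch what the controller reports back to raid1 on each write
completion. This is only a sketch: it assumes raid1 is built as a module and
the 2.6.22-era three-argument bi_end_io signature, so adjust as needed.

# print the error code raid1 sees for every write completion (0 == success)
stap -e '
probe module("raid1").function("raid1_end_write_request")
{
    printf("%s: error=%d\n", execname(), $error)
}'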

Installing an Adaptec-provided 1.1-5[2451] driver enabled raid1 to
work as expected.

That said, I now have a recipe for hitting the raid1 deadlock that
Tanaka first reported over a week ago.  I'm still surprised that all
of this chatter about that BUG hasn't drawn interest/scrutiny from
others!?

regards,
Mike


Problem with raid5 grow/resize (not restripe)

2008-01-22 Thread Peter Rabbitson

Hello,

I cannot seem to extend a raid volume of mine slightly. I issue the command:

mdadm --grow --size=max /dev/md5

It completes and nothing happens. The kernel log is empty; however, the event
counter on the drive is incremented by 3.

Here is what I have (yes, I know that I am only resizing by about 200 MB). Why
am I not able to reach 824.8 GiB?
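
In case it helps, here is a quick way to compare what the superblock thinks is
available with what the array currently uses on each member (a sketch; the
exact field names differ a little between mdadm versions):

for d in /dev/sd[abcd]3; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'Dev Size|Array Size'
done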


Thank you for your help.



[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md5 : active raid5 sda3[4] sdd3[3] sdc3[2] sdb3[1]
      864276480 blocks super 1.1 level 5, 2048k chunk, algorithm 2 [4/4] [UUUU]

md10 : active raid10 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      5353472 blocks 1024K chunks 3 far-copies [4/4] [UUUU]

md1 : active raid1 sdd1[1] sdc1[0] sdb1[3] sda1[2]
      56128 blocks [4/4] [UUUU]

unused devices: <none>
[EMAIL PROTECTED]:~#



[EMAIL PROTECTED]:~# mdadm -D /dev/md5
/dev/md5:
Version : 01.01.03
  Creation Time : Tue Jan 22 03:52:42 2008
 Raid Level : raid5
 Array Size : 864276480 (824.24 GiB 885.02 GB)
  Used Dev Size : 576184320 (274.75 GiB 295.01 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 5
Persistence : Superblock is persistent

Update Time : Wed Jan 23 02:21:47 2008
  State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 2048K

   Name : Thesaurus:Crypta  (local to host Thesaurus)
   UUID : 1decb2d1:ebf16128:a240938a:669b0999
 Events : 5632

    Number   Major   Minor   RaidDevice State
       4       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
[EMAIL PROTECTED]:~#



[EMAIL PROTECTED]:~# fdisk -l /dev/sd[abcd]

Disk /dev/sda: 400.0 GB, 400088457216 bytes
255 heads, 63 sectors/track, 48641 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot  Start End  Blocks   Id  System
/dev/sda1   1   7   56196   fd  Linux raid autodetect
/dev/sda2   8 507 4016250   fd  Linux raid autodetect
/dev/sda3 508   36385   288190035   83  Linux
/dev/sda4   36386   48641    98446320   83  Linux

Disk /dev/sdb: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot  Start End  Blocks   Id  System
/dev/sdb1   1   7   56196   fd  Linux raid autodetect
/dev/sdb2   8 507 4016250   fd  Linux raid autodetect
/dev/sdb3 508   36385   288190035   83  Linux
/dev/sdb4   36386   38913    20306160   83  Linux

Disk /dev/sdc: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1   7   56196   fd  Linux raid autodetect
/dev/sdc2   8 507 4016250   fd  Linux raid autodetect
/dev/sdc3 508   36385   288190035   83  Linux
/dev/sdc4   36386   36483  787185   83  Linux

Disk /dev/sdd: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot  Start End  Blocks   Id  System
/dev/sdd1   1   7   56196   fd  Linux raid autodetect
/dev/sdd2   8 507 4016250   fd  Linux raid autodetect
/dev/sdd3 508   36385   288190035   83  Linux
/dev/sdd4   36386   36483  787185   83  Linux
[EMAIL PROTECTED]:~#



identifying failed disk/s in an array.

2008-01-22 Thread Michael Harris
Hi,

I have just built a RAID 5 array using mdadm, and while it is running fine I
have a question about identifying the order of disks in the array.

In the pre-SATA days you would connect your drives as follows:

Primary Master - HDA
Primary Slave - HDB
Secondary Master - HDC
Secondary Slave - HDD

So if disk HDC failed I would know it was the master disk on the secondary
controller and would replace that drive.

My current setup is as follows:

MB Primary Master (PATA) - Operating System

The array disks are attached to:

MB SATA port 1
MB SATA port 2
PCI card SATA port 1

When I set up the array the OS drive was SDA and the others SDB, SDC, SDD.

Now the problem is that every time I reboot, the drives are sometimes detected
in a different order. Because I mount root via the UUID of the OS disk and the
kernel looks at the superblocks of the raided drives, everything comes up fine.
But I'm worried that if I move the array to another machine and need to do an
mdadm --assemble, I won't know the correct order of the disks. What is more
worrying, if I have a disk fail -- say HDC, for example -- I won't know which
disk HDC is, as it could be any of the disks in the PC. Is there any way to
make it easier to identify which disk is which?

thanks

Mike





Re: identifying failed disk/s in an array.

2008-01-22 Thread Tomasz Chmielewski

Michael Harris schrieb:
> Hi,
>
> I have just built a RAID 5 array using mdadm, and while it is running fine I
> have a question about identifying the order of disks in the array.
>
> In the pre-SATA days you would connect your drives as follows:
>
> Primary Master - HDA
> Primary Slave - HDB
> Secondary Master - HDC
> Secondary Slave - HDD
>
> So if disk HDC failed I would know it was the master disk on the secondary
> controller and would replace that drive.
>
> My current setup is as follows:
>
> MB Primary Master (PATA) - Operating System
>
> The array disks are attached to:
>
> MB SATA port 1
> MB SATA port 2
> PCI card SATA port 1
>
> When I set up the array the OS drive was SDA and the others SDB, SDC, SDD.
>
> Now the problem is that every time I reboot, the drives are sometimes
> detected in a different order. Because I mount root via the UUID of the OS
> disk and the kernel looks at the superblocks of the raided drives,
> everything comes up fine. But I'm worried that if I move the array to
> another machine and need to do an mdadm --assemble, I won't know the
> correct order of the disks. What is more worrying, if I have a disk fail --
> say HDC, for example -- I won't know which disk HDC is, as it could be any
> of the disks in the PC. Is there any way to make it easier to identify
> which disk is which?

If the drives have any LEDs, the most reliable way would be:

dd if=/dev/drive of=/dev/null

Then look at which LED blinks the most.
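
Another option is to record each drive's serial number while the array is
healthy, so a failed member can be matched to a physical drive later. A rough
sketch (smartctl needs smartmontools installed, and /dev/md0 stands in for the
real array device):

# which block devices are currently members of the array
mdadm --detail /dev/md0 | grep '/dev/sd'

# the stable names under /dev/disk/by-id embed model and serial number and
# point at whatever sdX name each drive got this boot
ls -l /dev/disk/by-id/

# or read the serial straight off a drive and write it on the drive tray
smartctl -i /dev/sdb | grep -i serial
hdparm -i /dev/sdb | grep -i serial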


--
Tomasz Chmielewski
http://wpkg.org

