[zfs-discuss] ZFS - how to determine which physical drive to replace

2009-12-12 Thread Paul Bruce
Hi,

I'm just about to build a ZFS system as a home file server in raidz, but I
have one question - pre-empting the need to replace one of the drives if it
ever fails.

How on earth do you determine the actual physical drive that has failed?

I've got the whole zpool status thing worked out, but how do I translate
the c1t0d0, c1t0d1, etc. to a real physical drive?

I can just see myself looking at the 6 drives, and thinking ...
c1t0d1, I think that's *this* one ... eenie meenie miney moe.

P
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS - how to determine which physical drive to replace

2009-12-12 Thread Edward Ned Harvey
This is especially important, because if you have 1 failed drive and you pull
the wrong drive, you now have 2 failed drives. That could destroy the pool
(depending on whether you have raidz1 or raidz2).

Whenever possible, get the hotswappable hardware that will blink a red light
for you, so there can be no mistake. Even if the hardware doesn't blink a
light for you, you can manually cycle between activity and non-activity on
the disks to identify the disk yourself. But if that's not a possibility, if
you have no lights and non-hotswappable disks, then:

Given that you're going to have to power off the system, and given that it's
difficult to map a device name to a physical cable:

I would suggest something like this: While the system is still on, if the
failed drive is at least writable *a little bit*, then you can "dd
if=/dev/zero of=/dev/rdsk/FailedDiskDevice bs=1024 count=1024", and then
after the system is off, you can plug the drives into another system one by
one, read the first 1 MB of each, and see if it's all zeros. (Or instead of
dd'ing zeros, you could echo some text onto the drive, or whatever you think
is easiest.)
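
For example, a rough sketch of that mark-and-check approach (the device names
here are only placeholders; substitute whatever your systems actually call
the disks):

# On the ailing system: write a 1 MB marker of zeros to the failed disk.
# FailedDiskDevice is a placeholder, e.g. c1t3d0s2; the disk is already dead
# to the pool, so clobbering its first megabyte is harmless.
dd if=/dev/zero of=/dev/rdsk/FailedDiskDevice bs=1024 count=1024

# Later, on another system: check each pulled drive in turn.  The drive whose
# first 1 MB reads back as all zeros is the one you marked.  CandidateDevice
# is again a placeholder for whatever name the drive gets on that system.
dd if=/dev/zero of=/tmp/zeros bs=1024 count=1024
dd if=/dev/rdsk/CandidateDevice bs=1024 count=1024 | cmp -s /tmp/zeros - \
  && echo "all zeros: this is the marked (failed) drive" \
  || echo "not all zeros: a different drive"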

Obviously that's not necessarily an option.  If the drive is completely
dead, totally unwritable, then when you plug the drives one-by-one into
another system, it should be easy to identify the failed drive.


Re: [zfs-discuss] ZFS - how to determine which physical drive to replace

2009-12-12 Thread Ed Plese
On Sat, Dec 12, 2009 at 8:17 AM, Paul Bruce p...@cais.com.au wrote:
 Hi,
 I'm just about to build a ZFS system as a home file server in raidz, but I
 have one question - pre-empting the need to replace one of the drives if it
 ever fails.
 How on earth do you determine the actual physical drive that has failed?
 I've got the whole zpool status thing worked out, but how do I translate
 the c1t0d0, c1t0d1, etc. to a real physical drive?
 I can just see myself looking at the 6 drives, and thinking ...
  c1t0d1, I think that's *this* one ... eenie meenie miney moe
 P

As suggested at
http://opensolaris.org/jive/thread.jspa?messageID=416264, you can try
viewing the disk serial numbers with cfgadm:

cfgadm -al -s "select=type(disk),cols=ap_id:info"

You may need to power down the system to view the serial numbers
printed on the disks to match them up, but it beats guessing.
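
A small loop along those lines can pair each disk in the pool with its cfgadm
entry. This is only a sketch: "tank" is a hypothetical pool name, and what the
info column shows depends on the HBA driver (for controllers handled by the
sata framework it typically includes the model and serial number):

# For every cNtNdN device named in zpool status, print the matching cfgadm
# attachment point and its info column (often "Mod: ... FRev: ... SN: ...").
for disk in $(zpool status tank | awk '/c[0-9]+t[0-9]+d[0-9]+/ {print $1}'); do
    cfgadm -al -s "select=type(disk),cols=ap_id:info" | grep "$disk"
done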


Ed Plese


Re: [zfs-discuss] ZFS - how to determine which physical drive to replace

2009-12-12 Thread Patrick O'Sullivan
I've found that when I build a system, it's worth the initial effort
to install drives one by one to see how they get mapped to names. Then
I put labels on the drives and SATA cables. If there were room to
label the actual SATA ports on the motherboard and cards, I would.

While this isn't foolproof, it gives me a bit more reassurance in the
[inevitable] event of a drive failure.

On Sat, Dec 12, 2009 at 9:17 AM, Paul Bruce p...@cais.com.au wrote:
 Hi,
 I'm just about to build a ZFS system as a home file server in raidz, but I
 have one question - pre-empting the need to replace one of the drives if it
 ever fails.
 How on earth do you determine the actual physical drive that has failed?
 I've got the whole zpool status thing worked out, but how do I translate
 the c1t0d0, c1t0d1, etc. to a real physical drive?
 I can just see myself looking at the 6 drives, and thinking ...
  c1t0d1, I think that's *this* one ... eenie meenie miney moe
 P




Re: [zfs-discuss] ZFS - how to determine which physical drive to replace

2009-12-12 Thread Mike Gerdts
On Sat, Dec 12, 2009 at 9:58 AM, Edward Ned Harvey
sola...@nedharvey.com wrote:
 I would suggest something like this:  While the system is still on, if the
 failed drive is at least writable *a little bit* … then you can “dd
 if=/dev/zero of=/dev/rdsk/FailedDiskDevice bs=1024 count=1024” … and then
 after the system is off, you could plug the drives into another system
 one-by-one, and read the first 1M, and see if it’s all zeros.   (Or instead
 of dd zero, you could echo some text onto the drive, or whatever you think
 is easiest.)


How about reading instead?

dd if=/dev/rdsk/$whatever of=/dev/null

If the failed disk generates I/O errors that prevent it from reading
at a rate that causes an LED to blink, you could read from all of the
good disks.  The one that doesn't blink is the broken one.
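
A rough sketch of that (the device list is just an example; use the healthy
members reported by zpool status, and adjust the s2 slice to match how your
disks are labeled):

# Start a background read on every good disk so its activity LED lights up;
# the drive whose LED stays dark is the broken one.
pids=""
for d in c1t0d0 c1t1d0 c1t2d0 c1t4d0 c1t5d0; do
    dd if=/dev/rdsk/${d}s2 of=/dev/null bs=1024k &
    pids="$pids $!"
done
sleep 30        # watch the LEDs during this window
kill $pids      # then stop the background reads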

You can also get the drive serial number with iostat -En:

$ iostat -En
c3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: Hitachi HTS5425 Revision:  Serial No: 080804BB6300HCG Size: 160.04GB 160039305216 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
...

That /should/ be printed on the disk somewhere.
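
For instance, a quick loop to pair each device name with its serial number
(the device list below is just an example; substitute the disks in your pool):

# Print each disk's Model/Serial line from iostat -En, prefixed by its name.
for d in c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0; do
    printf '%s: ' "$d"
    iostat -En "$d" | grep 'Serial No'
done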

-- 
Mike Gerdts
http://mgerdts.blogspot.com/