Re: [CentOS] Question about a hard drive error

2010-11-16 Thread Benjamin Franz
On 11/15/2010 10:41 AM, Gilbert Sebenste wrote:
 Thanks John, I appreciate it! Both are being replaced after a nearby 55
 kV power line shorted to ground and blew a manhole cover 50' into the air,
 damaging a lot of equipment over here, even those on UPSes. Nobody was
 hurt, thank goodness. But, I'll be looking into RAID 5 in the future.

In these days of multi-terabyte drives you should be looking at RAID6 
instead. The chance of a 'double failure' during degraded 
operation/resync is too high to ignore.
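
As a rough back-of-envelope sketch of why (the 2 TB drive size and the
commonly quoted consumer-class unrecoverable read error rate of ~1 per
10^14 bits are assumptions, not figures from this thread):

  RAID5 rebuild of a 6-drive array reads the 5 surviving drives:
    5 x 2 TB = 10 TB ~= 8 x 10^13 bits
  Expected unrecoverable read errors during that rebuild:
    8 x 10^13 bits x 10^-14 errors/bit ~= 0.8

So a single-parity rebuild of drives this size has an uncomfortably good
chance of hitting at least one unreadable sector; RAID6's second parity
is what lets the rebuild survive that.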

-- 
Benjamin Franz


Re: [CentOS] Question about a hard drive error

2010-11-16 Thread Alan Hodgson
On November 16, 2010 08:31:05 am Benjamin Franz wrote:
 On 11/15/2010 10:41 AM, Gilbert Sebenste wrote:
  Thanks John, I appreciate it! Both are being replaced after a nearby 55
  kV power line shorted to ground and blew a manhole cover 50' into the
  air, damaging a lot of equipment over here, even those on UPSes. Nobody
  was hurt, thank goodness. But, I'll be looking into RAID 5 in the
  future.
 
 In these days of multi-terabyte drives you should be looking at RAID6
 instead. The chance of a 'double failure' during degraded
 operation/resync is too high to ignore.

Like almost 100% ...


Re: [CentOS] Question about a hard drive error

2010-11-16 Thread John R Pierce
On 11/16/10 8:31 AM, Benjamin Franz wrote:
 In these days of multi-terabyte drives you should be looking at RAID6
 instead. The chance of a 'double failure' during degraded
 operation/resync is too high to ignore.


In these days of cheap drives, I use RAID10 almost exclusively. And if it's 
at all mission critical, I like to have 1-2 hot spares. If I were 
deploying a new server, and its workload was at all database-centric, 
I'd want to use 2.5" SAS rather than 3.5" SATA drives.

With RAID10, the rebuild time is how long it takes to copy the one 
drive. If you have 6 drives in a RAID10 and one fails, leaving 5, and 
another fails, there's only a 1 in 5 chance of that other failure being 
the mirror of the dead drive. If you have a hot spare, that 
rebuild starts immediately, reducing the window for that dreaded double 
failure to a minimum.
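
If you're doing it with Linux md rather than a hardware controller, a
minimal sketch of that layout (the /dev/sd[b-h] names are placeholders;
adjust for your hardware):

  # 6-drive RAID10 plus one hot spare
  mdadm --create /dev/md0 --level=10 --raid-devices=6 --spare-devices=1 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh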




Re: [CentOS] Question about a hard drive error

2010-11-16 Thread Benjamin Franz
On 11/16/2010 09:25 AM, John R Pierce wrote:

 In these days of cheap drives, I use RAID10 almost exclusively. And if it's
 at all mission critical, I like to have 1-2 hot spares. If I were
 deploying a new server, and its workload was at all database-centric,
 I'd want to use 2.5" SAS rather than 3.5" SATA drives.

 With RAID10, the rebuild time is how long it takes to copy the one
 drive. If you have 6 drives in a RAID10 and one fails, leaving 5, and
 another fails, there's only a 1 in 5 chance of that other failure being
 the mirror of the dead drive. If you have a hot spare, that
 rebuild starts immediately, reducing the window for that dreaded double
 failure to a minimum.


Oh, I agree - and when price is no object, or if write performance is 
the bottleneck, or if you need huge numbers of drives, I love RAID10. 
You can take it to crazy levels of redundancy + performance by going to 
RAID0 layered over multiple three-way RAID1 arrays. Why have multiple 
hot spares when you can go for N2-RAID1 + 0 instead and get a hefty 
performance boost on reads for almost free at even higher reliability?
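
A sketch of that layered layout with Linux md (device names again
placeholders):

  # two three-way mirrors, so each leg survives two drive losses
  mdadm --create /dev/md1 --level=1 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
  mdadm --create /dev/md2 --level=1 --raid-devices=3 /dev/sde /dev/sdf /dev/sdg

  # stripe the two mirrors together for the read/write performance
  mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/md1 /dev/md2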

-- 
Benjamin Franz



Re: [CentOS] Question about a hard drive error

2010-11-16 Thread John R Pierce
On 11/16/10 10:41 AM, Benjamin Franz wrote:
 Oh, I agree - and when price is no object, or if write performance is

The price spread isn't that big of a deal.

A 6-drive RAID6 gives you 4x space, while a 6-drive RAID10 gives you 
3x. Not that big of a deal.

An 8-drive RAID6 gives you 6x space, while an 8-drive RAID10 gives you 
4x space. Not much bigger of a gap.
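
(In general, for N equal-size drives: RAID6 usable space = N - 2 drives,
two-way RAID10 usable space = N / 2 drives. So it's 4x vs 3x at N=6, 6x
vs 4x at N=8, and 10x vs 6x at N=12 - the gap widens as the set grows.)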

RAID sets really shouldn't be much bigger than about 8 drives, 
anyway. Rebuild times for a 12-drive RAID6 would be astronomical.




Re: [CentOS] Question about a hard drive error

2010-11-16 Thread Benjamin Franz
On 11/16/2010 10:47 AM, John R Pierce wrote:

 RAID sets really shouldn't be much bigger than about 8 drives,
 anyway. Rebuild times for a 12-drive RAID6 would be astronomical.


You are OK up to here. Rebuild time for replacement of a failed drive 
scales with drive size, not RAID set size, regardless of whether it is 
RAID1, 5, 6, or 10. It remains roughly the amount of time it takes to 
completely write one drive at full speed (at least unless you run out of 
bus bandwidth - but that takes a lot of drives).
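
As a rough worked example (the 2 TB size and ~100 MB/s sustained rate
are assumptions, not measurements):

  2 TB / 100 MB/s = 2,000,000 MB / 100 MB/s = 20,000 s ~= 5.5 hours

per rebuilt drive, no matter how many other drives are in the set.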

However, system availability/performance is much better for RAID10 than 
for the others during a rebuild, because the rebuild work is isolated 
to the spindles actually involved.
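
With Linux md you can watch that from /proc/mdstat and, if the degraded
array is too sluggish for production, cap the resync rate (the 50000
KB/s-per-device figure is illustrative):

  cat /proc/mdstat
  sysctl -w dev.raid.speed_limit_max=50000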

-- 
Benjamin Franz



Re: [CentOS] Question about a hard drive error

2010-11-15 Thread Gilbert Sebenste
On Wed, 10 Nov 2010, John R Pierce wrote:

 On 11/10/10 6:58 PM, Gilbert Sebenste wrote:
  Hey everyone,

  I just got one of these today:

  Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code =
  0x0800
  Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
  Nov 10 16:07:54 stormy kernel: Add. Sense: Unrecovered read error
  Nov 10 16:07:54 stormy kernel:
  Nov 10 16:07:54 stormy kernel: Info fld=0x0
  Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector
  3896150669

 See where it says dev sda? That's physical drive zero, which has a read
 error on that sector.


  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
  Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)

  My question is this: I have RAID0 set up, but don't really understand
  it well. This is how my disks are set up:

  Filesystem           1K-blocks      Used  Available Use% Mounted on
  /dev/mapper/VolGroup00-LogVol00
                      1886608544 296733484 1492495120  17% /
  /dev/sda1               101086     19877      75990  21% /boot
  tmpfs                  1684312   1204416     479896  72% /dev/shm
 

 That is not how your disks are set up, that's how your FILE SYSTEMS are set up.

Correct, apologies for the incorrect wording.

 That /dev/mapper thing is an LVM volume. You can display the physical volumes
 behind an LVM with the command 'pvs'.

Thank you! That was helpful.

  Which one is having the trouble? Any ideas so I can swap it out?

 RAID0 is not suitable for reliability. If any one drive in the RAID0 fails
 (or is removed), the whole volume has failed and will become unusable.

Thanks John, I appreciate it! Both are being replaced after a nearby 55 
kV power line shorted to ground and blew a manhole cover 50' into the air,
damaging a lot of equipment over here, even those on UPSes. Nobody was 
hurt, thank goodness. But, I'll be looking into RAID 5 in the future.

Gilbert Sebenste
(My opinions only!)


Re: [CentOS] Question about a hard drive error

2010-11-10 Thread John R Pierce
On 11/10/10 6:58 PM, Gilbert Sebenste wrote:
 Hey everyone,

 I just got one of these today:

 Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code =
 0x0800
 Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
 Nov 10 16:07:54 stormy kernel: Add. Sense: Unrecovered read error
 Nov 10 16:07:54 stormy kernel:
 Nov 10 16:07:54 stormy kernel: Info fld=0x0
 Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector
 3896150669

See where it says dev sda? That's physical drive zero, which has a read 
error on that sector.
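
If you want confirmation the drive itself is failing before you swap it,
smartmontools is worth a run (assuming the package is installed; sda
matches the device in your log):

  smartctl -H /dev/sda   # overall health verdict
  smartctl -a /dev/sda   # full attributes; watch Reallocated_Sector_Ct,
                         # Current_Pending_Sector, Offline_Uncorrectable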


 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
 Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)

 My question is this: I have RAID0 set up, but don't really understand
 it well. This is how my disks are set up:

 Filesystem           1K-blocks      Used  Available Use% Mounted on
 /dev/mapper/VolGroup00-LogVol00
                     1886608544 296733484 1492495120  17% /
 /dev/sda1               101086     19877      75990  21% /boot
 tmpfs                  1684312   1204416     479896  72% /dev/shm


That is not how your disks are set up, that's how your FILE SYSTEMS are set up.

That /dev/mapper thing is an LVM volume. You can display the physical 
volumes behind an LVM with the command 'pvs'.
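
For example, to trace that volume back to physical drives (the output
below is illustrative, not from your system):

  pvs
    PV         VG         Fmt  Attr PSize PFree
    /dev/sda2  VolGroup00 lvm2 a-   1.82T    0

  lvs -o +devices   # shows which PV(s) back each LV
  dmsetup ls        # maps dm names to the (253:x) numbers in your log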




 Which one is having the trouble? Any ideas so I can swap it out?


RAID0 is not suitable for reliability. If any one drive in the RAID0 
fails (or is removed), the whole volume has failed and will become unusable.

