2.6.11-rc4 md loops on missing drives

2005-02-15 Thread Brad Campbell
G'day all,
I have just finished my shiny new RAID-6 box. 15 x 250GB SATA drives.
While doing some failure testing (inadvertently due to libata SMART causing command errors) I 
dropped 3 drives out of the array in sequence.
md coped with the first two (as it should), but after the third one dropped out I got the below 
errors spinning continuously in my syslog until I managed to stop the array with mdadm --stop /dev/md0

I'm not really sure how it's supposed to cope with losing more disks than planned, but filling the 
syslog with nastiness is not very polite.

This box takes _ages_ (like between 6 and 10 hours) to rebuild the array, but I'm willing to run some 
tests if anyone has particular RAID-6 stuff they want tested before I put it into service.
I do plan on a couple of days burn-in testing before I really load it up anyway.

The last disk is missing at the moment as I'm short one disk due to a Maxtor dropping its bundle 
after about 5000 hours.

I'm using today's BK kernel plus the libata and libata-dev trees. The drives are all on Promise 
SATA150TX4 controllers.

Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
to infinity and beyond

Existing raid config below. Failing any additional 2 drives due to IO errors triggers 
this issue.
storage1:/home/brad# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Tue Feb 15 22:00:16 2005
     Raid Level : raid6
     Array Size : 3186525056 (3038.91 GiB 3263.00 GB)
    Device Size : 245117312 (233.76 GiB 251.00 GB)
   Raid Devices : 15
  Total Devices : 15
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Feb 15 17:17:36 2005
          State : clean, degraded, resyncing
 Active Devices : 14
Working Devices : 14
 Failed Devices : 1
  Spare Devices : 0

     Chunk Size : 128K

 Rebuild Status : 0% complete

           UUID : 11217f79:ac676966:279f2816:f5678084
         Events : 0.40101

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/devfs/scsi/host0/bus0/target0/lun0/disc
       1       8       16        1      active sync   /dev/devfs/scsi/host1/bus0/target0/lun0/disc
       2       8       32        2      active sync   /dev/devfs/scsi/host2/bus0/target0/lun0/disc
       3       8       48        3      active sync   /dev/devfs/scsi/host3/bus0/target0/lun0/disc
       4       8       64        4      active sync   /dev/devfs/scsi/host4/bus0/target0/lun0/disc
       5       8       80        5      active sync   /dev/devfs/scsi/host5/bus0/target0/lun0/disc
       6       8       96        6      active sync   /dev/devfs/scsi/host6/bus0/target0/lun0/disc
       7       8      112        7      active sync   /dev/devfs/scsi/host7/bus0/target0/lun0/disc
       8       8      128        8      active sync   /dev/devfs/scsi/host8/bus0/target0/lun0/disc
       9       8      144        9      active sync   /dev/devfs/scsi/host9/bus0/target0/lun0/disc
      10       8      160       10      active 

Re: 2.6.11-rc4 md loops on missing drives

2005-02-20 Thread Brad Campbell
Neil Brown wrote:
On Tuesday February 15, [EMAIL PROTECTED] wrote:
G'day all,
I'm not really sure how it's supposed to cope with losing more disks than planned, but filling the 
syslog with nastiness is not very polite.

Thanks for the bug report.  There are actually a few problems relating
to resync/recovery when an array (raid 5 or 6) has lost too many
devices.
This patch should fix them.
I applied your latest array of 9 patches to a vanilla BK kernel and did very, very horrible things 
to it while it was rebuilding. I can confirm that it does indeed tidy up the resync issues.

Ta!
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Raid-6 hang on write.

2005-02-24 Thread Brad Campbell
G'day all,
I have a painful issue with a RAID-6 box. It only manifests itself on a fully complete and synced-up 
array, and I can't reproduce it on an array smaller than the full drives, which means after every 
attempt at debugging I have to endure a 12-hour resync before I try again.

I have a single 3TB array as md0 and on top of that I have an ext3 filesystem. While the array is 
degraded I can read/write to/from it to my heart's content. When it's fully synced up, a dd to the 
filesystem results in a lockup like the following.
dd hangs in a D state, as does any attempt to access the filesystem or /proc/mdstat. I then have to 
reboot the box using the doorbell or alt-sysrq, which results in a dirty array and another 12 hours 
of rebuild. I have not managed to reproduce this issue on a partitioned array smaller than the full 
drives.

I have reproduced this on 2.6.11-rc4-bk4-bk10 and 2.6.11-rc4-mm1 with no other 
patches.
I'm waiting for it to resync again, and I figure I'll try to back up the superblocks; then, when 
it locks up, I should be able to restore the clean superblocks before I start the array, to 
fool it into thinking the array is clean and skip the long resync.

The write *always* sits in get_active_stripe.
I'll continue to hack on it over the weekend, but I thought I'd post it here in case someone spotted 
a thinko I might have missed.

Here is an alt-sysrq-t of the stuck process. It *always* hangs here like this and is completely 
reproducible on a clean synced array.

Feb 24 14:08:50 storage1 kernel: ddD C041D9A0 0   366353 (NOTLB)
Feb 24 14:08:50 storage1 kernel: f60158e0 0086 f6be35e0 c041d9a0 0004 f713a400 0004 
f713a390
Feb 24 14:08:50 storage1 kernel:09a5 02fa1650 09b8 f6be35e0 f6be3704 f6014000 
c1ba60b8 
Feb 24 14:08:50 storage1 kernel:c1ba6060 c0268574 f7d67200 06528800  0b05cf72 
c1ba60c0 f6014000
Feb 24 14:08:50 storage1 kernel: Call Trace:
Feb 24 14:08:50 storage1 kernel:  [c0268574] get_active_stripe+0x224/0x260
Feb 24 14:08:50 storage1 kernel:  [c01113a0] default_wake_function+0x0/0x20
Feb 24 14:08:50 storage1 kernel:  [c026afea] make_request+0x19a/0x2e0
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 kernel:  [c023d702] generic_make_request+0x172/0x210
Feb 24 14:08:50 storage1 kernel:  [c0132399] buffered_rmqueue+0xf9/0x1e0
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 last message repeated 2 times
Feb 24 14:08:50 storage1 kernel:  [c023d802] submit_bio+0x62/0x100
Feb 24 14:08:50 storage1 kernel:  [c01502c4] bio_alloc_bioset+0xe4/0x1c0
Feb 24 14:08:50 storage1 kernel:  [c01503c0] bio_alloc+0x20/0x30
Feb 24 14:08:50 storage1 kernel:  [c014fbfa] submit_bh+0x13a/0x1a0
Feb 24 14:08:50 storage1 kernel:  [c014da38] __bread_slow+0x48/0x80
Feb 24 14:08:50 storage1 kernel:  [c014dd1d] __bread+0x3d/0x50
Feb 24 14:08:50 storage1 kernel:  [c017f818] read_block_bitmap+0x58/0xa0
Feb 24 14:08:50 storage1 kernel:  [c018094d] ext3_new_block+0x17d/0x560
Feb 24 14:08:50 storage1 kernel:  [c014db8a] __find_get_block+0x5a/0xe0
Feb 24 14:08:50 storage1 kernel:  [c0183380] ext3_alloc_branch+0x50/0x2d0
Feb 24 14:08:50 storage1 kernel:  [c0183187] ext3_get_branch+0x67/0xf0
Feb 24 14:08:50 storage1 kernel:  [c018395e] ext3_get_block_handle+0x16e/0x360
Feb 24 14:08:50 storage1 kernel:  [c0185f32] __ext3_get_inode_loc+0x62/0x260
Feb 24 14:08:50 storage1 kernel:  [c0183bb4] ext3_get_block+0x64/0xb0
Feb 24 14:08:50 storage1 kernel:  [c014e546] __block_prepare_write+0x206/0x410
Feb 24 14:08:50 storage1 kernel:  [c014ef64] block_prepare_write+0x34/0x50
Feb 24 14:08:50 storage1 kernel:  [c0183b50] ext3_get_block+0x0/0xb0
Feb 24 14:08:50 storage1 kernel:  [c0184149] ext3_prepare_write+0x69/0x140
Feb 24 14:08:50 storage1 kernel:  [c0183b50] ext3_get_block+0x0/0xb0
Feb 24 14:08:50 storage1 kernel:  [c012d994] add_to_page_cache+0x54/0x80
Feb 24 14:08:50 storage1 kernel:  [c012f966] generic_file_buffered_write+0x1b6/0x630
Feb 24 14:08:50 storage1 kernel:  [c0106bee] timer_interrupt+0x4e/0x100
Feb 24 14:08:50 storage1 kernel:  [c0164682] inode_update_time+0x52/0xe0
Feb 24 14:08:50 storage1 kernel:  [c01300ad] __generic_file_aio_write_nolock+0x2cd/0x500
Feb 24 14:08:50 storage1 kernel:  [c0118c7d] __do_softirq+0x7d/0x90
Feb 24 14:08:50 storage1 kernel:  [c0130592] generic_file_aio_write+0x72/0xe0
Feb 24 14:08:50 storage1 kernel:  [c0181b34] ext3_file_write+0x44/0xd0
Feb 24 14:08:50 storage1 kernel:  [c014b49e] do_sync_write+0xbe/0xf0
Feb 24 14:08:50 storage1 kernel:  [c01645bf] update_atime+0x5f/0xd0
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 kernel:  [c0157017] pipe_read+0x37/0x40
Feb 24 14:08:50 storage1 kernel:  [c014b56f] vfs_write+0x9f/0x120
Feb 24 14:08:50 

Re: Something is broken with SATA RAID ?

2005-03-02 Thread Brad Campbell
J.A. Magallon wrote:
Hi...
I posted this in another mail, but now I can confirm it.
I have a box with a SATA RAID-5, and with 2.6.11-rc3-mm2+libata-dev1 it
works like a charm as a samba server. I dropped 12Gb onto it from an
osx client, and people do backups from W2k boxes, and everything was fine.
With 2.6.11-rc4-mm1, it hangs shortly after the mac starts copying
files. No oops, no messages... It even hung on a local copy (wget),
so I will discard samba as the buggy piece in the puzzle.
I'm going to make a definitive test with rc5-mm1 vs rc5-mm1+libata-dev1.
I already know that plain rc5-mm1 hangs. I have to wait for the md reconstruction
of the 1.2 TB to check rc5-mm1+libata (and no user putting things there...)
But does anyone have a clue about what is happening? I have seen other
reports of RAID-related hangs... Any important change after rc3?
Any important bugfix in libata-dev1? Something broken in -mm?
There was (is) a bug in -rc4-mm1, which may still be there in -rc5-mm1, related to the way RAID-5 and 
RAID-6 write out blocks; it can cause a deadlock in the raid code. Do your processes just hang in 
the D state, and does any access to /proc/mdstat do the same thing?

Can you try with just 2.6.11+libata+libata-dev?
I moved to 2.6.11+libata+libata-dev+netdev and all my problems went away.
CC'd to linux-raid
Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Spare disk could not sleep / standby

2005-03-08 Thread Brad Campbell
Gordon Henderson wrote:
I'm in the middle of building up a new home server - looking at RAID-5 or
6 and 2.6.x, so maybe it's time to look at all this again, but it sounds
like the auto superblock update might thwart it all now...
Nah... As far as I can tell, 20ms after the last write the auto superblock update will mark the 
array as clean. You can then spin the disks down as you normally would after a delay. It's just like 
a normal write. There is an overhead, I guess, in that prior to the next write it's going to mark the 
superblocks as dirty. I wonder in your case if this would spin up *all* the disks at once, or do a 
staged spin-up, given it's going to touch all the disks at the same time?

I have my Raid-6 with ext3 and a commit time of 30s. With an idle system, it really stays idle. 
Nothing touches the disks. If I wanted to spin them down I could do that.
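
For what it's worth, spinning them down would just be the usual hdparm calls, the same as for any 
other drive; a sketch with example device name and timeout (whether these make it through the 
libata SCSI translation of the day is another question):

hdparm -S 242 /dev/sda   # enter standby after 1 hour of idle (values 241-251 map to (n-240)*30 minutes)
hdparm -y /dev/sda       # or drop the drive into standby immediately
hdparm -C /dev/sda       # query the current power state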

The thing I *love* about this feature is that when I do something totally stupid and panic the box, 90% 
of the time I don't need a resync, as the array was marked clean after the last write. Thanks Neil!

Just for yuks, here are a couple of photos of my latest Frankenstein: 3TB of Raid-6 in a midi-tower 
case. I had to re-wire the PSU internally to export an extra 12V rail to an appropriate place, however.

I have been beating Raid-6 senseless for the last week now and doing horrid things to the hardware. 
I'm now completely confident in its stability and ready to use it for production. Thanks HPA!

http://www.wasp.net.au/~brad/nas/nas-front.jpg
http://www.wasp.net.au/~brad/nas/nas-psu.jpg
http://www.wasp.net.au/~brad/nas/nas-rear.jpg
http://www.wasp.net.au/~brad/nas/nas-side.jpg
Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: now on to tuning....

2005-03-09 Thread Brad Campbell
Gordon Henderson wrote:
And do check your disks regularly, although I don't think current version
of smartmontools fully supports sata under the scsi subsystem yet...
Actually, if you are using a UP machine, the libata-dev tree has patches that make this work. I 
believe there may be races on SMP machines however.

All 29 drives get a short test every morning and a long test every Sunday morning. Odd results are 
immediately E-mailed to me by smartd.

storage1:/home/brad# smartctl -A -d ata /dev/sda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       5622
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       20
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   250   248   187    Pre-fail  Always       -       35232
  9 Power_On_Minutes        0x0032   252   252   000    Old_age   Always       -       457h+24m
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       34
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       35
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       1411
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       3
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   194   194   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Convert raid5 to raid1?

2005-03-10 Thread Brad Campbell
John McMonagle wrote:
Was planning on adding a hot spare to my 3 disk raid5 array and was 
thinking if I go to 4 drives I would be better off with 2 raid1 arrays, 
considering the current state of raid5.
I just wonder about the comment "considering the current state of raid5". What might be wrong with 
raid5 currently? I certainly know a number of people (me included) who run several large raid-5 
arrays and don't have any problems.

Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: RAID 5 disks not spinning down

2005-03-12 Thread Brad Campbell
Alexander Stockinger wrote:
Hi all,
I have a linux software RAID 5 running on Debian Sarge with 2.6.7-smp. 
Since I installed the system (several kernel updates ago) the disks of 
the RAID 5 won't stay spun down. Sending them to standby manually 
using hdparm ends up with the disks spinning up within the next minute 
or so.
To further investigate the problem I watched the /proc/diskstats file 
for /dev/md0 and realized that prior to the spinup a single read access 
to the RAID is issued - consequently the disks attached to the RAID spin 
up again...
Does it occur if you have the filesystem on the md device unmounted?
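
If it does, a crude way to catch the culprit is to watch the per-device counters and see whose read 
count ticks over just before the spin-up; a sketch, with hypothetical device names:

watch -d -n 1 "grep -E 'md0|sd[a-d]' /proc/diskstats"   # -d highlights the counters that changed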
Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: EVMS or md?

2005-04-05 Thread Brad Campbell
H. Peter Anvin wrote:
No hiccups, data losses, or missing functionality.  At the end of the
whole ordeal, the filesystem (1 TB, 50% full) was still quite pristine,
and fsck confirmed this.  I was quite pleased :)
I second this. I endured numerous kernel crashes and other lockup/forced-restart issues while 
setting up a 15 drive 3TB raid-6 (crashes not really related to the md subsystem, except for the 
oddity in the -mm kernel, and that was not raid-6 specific). I have popped out various drives and 
caused numerous failures/rebuilds with an ext3 filesystem over 90% full while burn-in testing and not 
experienced one glitch.
It has been used in production now for over a month and is performing flawlessly. I have run it in 
full, 1-disk degraded and 2-disk degraded mode for testing. I certainly consider it stable.

Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: skip raid5 reconstruction

2005-07-08 Thread Brad Campbell

Ming Zhang wrote:

Hi folks

I am testing some HW performance with raid5 with a 2.4.x kernel.

It is really troublesome every time I create a raid5, wait 4 hours for
reconstruction, test some data, and then recreate another one
and wait again. I wonder if there is any hack or option available to
create a raid5 without reconstructing the parity disk. I am just interested
in testing the performance, so I do not care about data correctness at this
stage.


I did a similar thing a while back.
I created the raid and waited for it to sync, then made dd copies of the raid 
superblocks.
When I blew it up I just dd'd the clean superblocks back again (saved a 12 hour 
rebuild time).

Having just thought about what you wrote, I guess you are building the raid in different 
configurations each time, so my method might not be good for you.

Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: skip raid5 reconstruction

2005-07-08 Thread Brad Campbell

Ming Zhang wrote:


I did a similar thing a while back.
I created the raid and waited for it to sync, then made dd copies of the raid 
superblocks.
When I blew it up I just dd'd the clean superblocks back again (saved a 12 hour 
rebuild time).


Interesting to know about this. You just check dmesg to see where the
superblock is, and then dd it out and dd it back later?


Actually, I used blockdev to get the device size and then just copied the last 64kB, from memory.. 
It's a bit hazy now, but that's pretty close I think.
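
For the record, a rough sketch of the trick, assuming 0.90 superblocks (which live in the last 
64KiB-aligned 64KiB chunk of each component device); device and file names are examples only:

DEV=/dev/sdb                               # one component device of the array
SECS=$(blockdev --getsz $DEV)              # device size in 512-byte sectors
SB=$(( (SECS & ~127) - 128 ))              # superblock offset: last 64KiB boundary, 128 sectors back
dd if=$DEV of=sdb-sb.bin bs=512 skip=$SB count=128    # back up the clean superblock
# ...do horrible things to the array, then put the clean superblock back:
dd if=sdb-sb.bin of=$DEV bs=512 seek=$SB count=128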


Regards
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Raid-6 stability gets my vote

2005-07-15 Thread Brad Campbell

G'day all,

This message is really just for future Googlers.
I have been running a 15 disk raid-6 since the 24th of Feb in production and can completely vouch for its 
stability. I have had both simulated and real drive failures and it has handled itself perfectly 
in all cases. Unclean shutdowns and resyncs have all been perfect.
I know it's tagged as stable in any case, but I still get E-mails from people dragging my name from 
Google asking about it, so I place this here for the public record.


storage1:/home/brad# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu Feb 24 14:51:17 2005
     Raid Level : raid6
     Array Size : 3186525056 (3038.91 GiB 3263.00 GB)
    Device Size : 245117312 (233.76 GiB 251.00 GB)
   Raid Devices : 15
  Total Devices : 15
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jul 15 13:27:51 2005
          State : clean
 Active Devices : 15
Working Devices : 15
 Failed Devices : 0
  Spare Devices : 0

Many thanks to hpa and others responsible for raid-6. I have already had a 2-drive failure, and the 
failures were so close together that even a hot spare would not have had time to rebuild.


Also thanks to Neil Brown for a great monitoring and management tool. mdadm and its monitoring to 
E-mail have been invaluable.


I'd also like to thank Maxtor for producing drives that actually have useful S.M.A.R.T. data. Wish 
the other manufacturers would follow suit.


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Recovery raid5 after sata cable failure.

2005-07-18 Thread Brad Campbell

Francisco Zafra wrote:
Hi Neil, 
For some hours now I have been trying to solve it with the latest version:

[EMAIL PROTECTED]:~ # mdadm --version
mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005

With the same results :(

I really don't think it is locked. I dd'd it in an act of desperation and I have
no problems:
[EMAIL PROTECTED]:~ # dd if=/dev/zero of=/dev/sdh bs=1k count=1000
1000+0 records in
1000+0 records out
1024000 bytes transferred in 0.417862 seconds (2450570 bytes/sec)



Asking a silly question perhaps..

fuser /dev/sdh

Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: split RAID1 during backups?

2005-10-24 Thread Brad Campbell

Jeff Breidenbach wrote:


So - I'm thinking of the following backup scenario.  First, remount
/dev/md0 readonly just to be safe. Then mount the two component
partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
component partition, and tell the backup process to work from the
other component partition. Once the backup is complete, point the
webserver back at /dev/md0, unmount the component partitions, then
switch read-write mode back on.


Why not do something like this ?

mount -o remount,ro /dev/md0 /web      # quiesce the filesystem
mdadm --fail /dev/md0 /dev/sdd1        # drop one mirror half out of the array
mdadm --remove /dev/md0 /dev/sdd1
mount -o ro /dev/sdd1 /target          # mount the detached half read-only

do backup from /target here

umount /target
mdadm --add /dev/md0 /dev/sdd1         # re-add it; md will resync the mirror
mount -o remount,rw /dev/md0 /web

That way the web server continues to run from the md..
However, you will endure a rebuild on md0 when you re-add the disk; but given everything is mounted 
read-only you should not practically be doing anything, and if you fail a disk during the rebuild 
the other disk will still be intact.


I second jurriaan's vote for rsync also, but I would be inclined just to let it loose on the whole 
disk rather than break it up into parts.. but then I have heaps of ram too..


Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Raid-6 Rebuild question

2005-11-13 Thread Brad Campbell

G'day all,

Here is an interesting question (well, I think so in any case). I just replaced a failed disk in my 
15 drive Raid-6.


Simply mdadm --add /dev/md0 /dev/sdl

Why, when there is no other activity on the array at all, is it writing to every disk during the 
recovery? I would have assumed it would just read from the others and write to sdl.


This is an iostat -k 5 on that machine while rebuilding

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.00    0.00  100.00    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda 121.08 14187.95   925.30  23552   1536
sdb 127.71 14187.95  1002.41  23552   1664
sdc 125.30 14187.95  1002.41  23552   1664
sdd 122.29 14187.95  1002.41  23552   1664
sde 125.30 14187.95  1002.41  23552   1664
sdf 127.71 14187.95  1002.41  23552   1664
sdg 125.90 14187.95   925.30  23552   1536
sdh 125.30 14187.95   925.30  23552   1536
sdi 134.34 14187.95   925.30  23552   1536
sdj 137.95 14187.95   925.30  23552   1536
sdk 140.36 14187.95  1850.60  23552   3072
sdl  79.52 0.00 14265.06  0  23680
sdm 133.13 14187.95   925.30  23552   1536
sdn 134.34 14187.95   925.30  23552   1536
sdo 133.73 14187.95   925.30  23552   1536
md0   0.00 0.00 0.00  0  0

storage1:/home/brad# cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sdl[15] sdg[6] sda[0] sdo[14] sdn[13] sdm[12] sdk[10] sdj[9] sdi[8] sdh[7] sdf[5] 
sde[4] sdd[3] sdc[2] sdb[1]

  3186525056 blocks level 6, 128k chunk, algorithm 2 [15/14] 
[UUU_UUU]
  []  recovery =  1.8% (4518144/245117312) 
finish=838.3min speed=4782K/sec
unused devices: none

Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Raid-6 Rebuild question

2005-11-13 Thread Brad Campbell

Brad Campbell wrote:

G'day all,

Here is an interesting question( well I think so in any case ). I just 
replaced a failed disk in my 15 drive Raid-6.


Forgot the most important detail (as usual)

bklaptop:~$ ssh storage1 uname -a
Linux storage1 2.6.11.7 #4 Fri Oct 7 20:00:25 GST 2005 i686 GNU/Linux

Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Expanding RAID array?

2006-01-16 Thread Brad Campbell

Neil Brown wrote:


- add disks to convert to raid6.
I don't think this is possible, but you should check the latest
raid reconfig.


It's not. I started work on it in Feb last year, but then real life got in the way 
again.
In the longer term, I think raidreconf as it stands is going to die (mainly because its 
infrastructure relies on the old raidtab architecture). I thought about perhaps porting it as an 
add-on to mdadm, but then I ran out of drives/machines/time to test it on.



Might be supported online with a limited raid6 in which the Q
syndrome (second 'parity' block) isn't rotated among disks.


In theory I would have thought it not that much different from a raid-5 expand, just inserting an 
extra block for the Q syndrome.



- status of RAID6
I believe it is as stable/reliable as raid5.


Mine has been running since Feb last year with fairly moderate use and no 
hiccups.
I have just upgraded to the latest 2.6.15-git on that machine to give some of the newer raid patches 
(like check and repair) a whirl. Seems fairly solid.


Let's just say I've never had any raid-6 related data loss, or even near misses, and it has saved my 
bacon in 2 dual-drive failures in the last year.


Oh, and the new read-and-check code (rather than just rebuilding the parity blocks) shaves about 1 hour 
off what was an 11 hour rebuild time on this particular raid-6. Thanks Neil!


Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: RAID 1 vs RAID 0

2006-01-18 Thread Brad Campbell

Max Waterman wrote:


Still, it seems like it should be a solvable problem...if you order the 
data differently on each disk; for example, in the two disk case, 
putting odd and even numbered 'stripes' on different platters [or sides 
of platters].




The only problem there is determining the internal geometry of the disk, and knowing that each disk 
is probably different. How do you know which logical sector number correlates to which surface, and 
whereabouts on the surface? Just thinking about it makes my brain hurt.


Not like the good old days of the old stepper disks.

Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Two RAID6 questions

2006-01-20 Thread Brad Campbell

John Rowe wrote:


First, can raidreconf grow a RAID6 device? The man page doesn't seem to
mention RAID6 at all.


No, raidreconf has no knowledge of raid-6 at all.


Second, with RAID5 or RAID6 my biggest fear is a system crash whilst the
RAID is writing resulting in dirty blocks. Does RAID6 give some sort of
ECC capability when reconstructing? I'm imagining checking the parity of
a RAID block and if it's wrong assuming each block in turn is dirty,
recalculating it from the first parity and then checking the result
against the second parity.


This is just one of those things.. if you crash while writing, unless you have hardware raid with 
NVRAM you are going to leave the array in an unclean, uncertain state..

The best you can do is re-sync the array, fsck the filesystem and hope for the 
best..

The greatest new feature in md in this regard is that it periodically marks the array as clean while it's 
idle.. It generally ensures that, after a crash, most of the time you don't need a resync on the next 
assemble.
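
For the curious, on kernels new enough to expose it, the delay before an idle array gets marked clean 
can be read and tuned through sysfs; a sketch, with md0 and the value purely as examples:

cat /sys/block/md0/md/safe_mode_delay          # seconds of quiet before the array is marked clean
echo 0.5 > /sys/block/md0/md/safe_mode_delay   # example: mark it clean after half a second of idle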


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Silly stripe cache values

2006-03-25 Thread Brad Campbell

Brad Campbell wrote:

G'day all,

I have a box here.. it has a 2GHz processor and 1.5GB of ram. It runs 
the entire OS over NFS and it's sole purpose in life is to run 15 SATA 
drives in a Raid-6 with ext3 on it, and share that over NFS. Most of 
that ram is sitting completely idle and thus I thought a logical thing 
to do would be to stuff as much of it as possible into the MD subsystem 
to help it cache..


Are there any limits to the values living in /sys/block/md* and what 
might be the tradeoffs (if any) to using what would normally be thought 
stupid amounts of ram for these knobs ?


This box does not get written to often, it's just a media streamer 
mostly.. but if I am writing to it then it chokes just providing a 1Mb 
stream over the network currently. (It's on a 2.6.15-git11 kernel 
currently but I'm just upgrading to 2.6.16 now)


Scratch that.. the limit appears to be 32768 and that works fine..
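
For anyone who lands here from a search later: the knob lives under the array's md directory in sysfs, 
and my rough understanding is that the memory cost is about one page per member disk per cached 
stripe; the numbers below are examples only:

cat /sys/block/md0/md/stripe_cache_size           # current number of cached stripes (256 by default)
echo 8192 > /sys/block/md0/md/stripe_cache_size   # roughly 8192 x 4KiB x 15 disks ~ 480MB on this box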

Google search results increase in accuracy in proportion to the time elapsed since you posted the 
question to the list.. :\


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Recommendations for supported 4-port SATA PCI card ?

2006-04-01 Thread Brad Campbell

Christopher Smith wrote:

Brad Campbell wrote:
I've been running 3 together in one box for about 18 months, and four 
in another for a year now... the on-board BIOS will only pick up 8 
drives, but they work just fine under Linux and recognise all 
connected drives.


What distro and kernel ?


Err.. well, I started with 2.4 originally and I've run 2.6.5, 2.6.9, 2.6.10-bk10, 2.6.15.1 and now 
2.6.16 on both boxes.. never an issue with multiple card support..


It's based on Debian mostly, but to be honest it's not important, as the kernel is always self-compiled, 
as is mdadm..




I tried this about 2 - 3 months ago and had problems whenever more than 
two cards were in my system, even if only a single drive was installed. 
I posted about it here (search for "multiple promise sata150 tx4 cards" 
back in January).


I recall the thread.. don't think I replied then as I did not really have 
anything to say I guess.



I have no complaints about the performance (relatively speaking, of 
course), but I've got the cards in a machine with multiple PCI-X busses, 
so it's not really bottlenecked there.


Perhaps it's related to that. All mine are on ASUS A7V600 motherboards with a single bog-standard PCI 
bus. At one point early on I recall someone stating they were having all sorts of problems with 
those cards and a 66MHz bus..


Are you running the latest BIOS in all the cards? I only ask as the 1st thing I did when mine arrived 
was to upgrade the BIOS.. not that I'm using it, but I thought it might have some impact on the way they 
are set up on the PCI bus.


I have a mate who has another 3 in his box also.. but again he's on an ASUS 
A7V600 motherboard.

I'm looking to build another 15 drive box soon and I was thinking of perhaps a couple of 8 port 
Marvell cards, but then I might just stick with what I know and source some more of the same.


Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: A failed-disk-how-to anywhere?

2006-04-09 Thread Brad Campbell

Martin Stender wrote:

Hi there!

I have two identical disks sitting on a Promise dual-channel IDE 
controller. I guess both disks are primaries then.


One of the disks has failed, so I bought a new disk, took out the 
failed disk, and put in the new one.
That might seem a little naive, and apparently it was, since the system 
won't boot up now.

It boots fine, when only the old, healthy disk is connected.

By the way, all three partitions are raided - including /boot ...

Anyway, I have removed the old disk from the Raid with:
#mdadm /dev/md0 --remove /dev/hde1
#mdadm /dev/md1 --remove /dev/hde2
#mdadm /dev/md2 --remove /dev/hde3

- but the problem persists.

I can't seem to find a decent 'How-To' - so how is this supposed to be 
done?


A little more info would be helpful. How does the machine boot? How are your 
other disks configured?
Are you booting off the Promise board or the on-board controller? (Making assumptions given your Promise 
appears to contain hde, I'm assuming hd[abcd] are on-board somewhere..)


I'm going to take a wild stab in the dark now..

My initial thought would be you have hde and hdg in a raid-1 and nothing on the on-board 
controllers. hde has failed, and when you removed it your controller tried the 1st disk it could find 
(hdg) to boot off.. Bingo.. away we go.
You plug a shiny new disk into hde and now the controller tries to boot off that, except it's blank 
and therefore a no-go.


I'd either try and force the controller to boot off hdg (which might be a controller BIOS option) or 
swap hde and hdg.. then it might boot and let you create your partitions on hdg and then add it back 
into the mirror, as sketched below.
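
Once it boots, a rough sketch of getting the mirror back together (device names follow my guess 
above, so swap them around if your layout differs):

sfdisk -d /dev/hdg | sfdisk /dev/hde   # copy the partition table from the good disk to the new one
mdadm /dev/md0 --add /dev/hde1         # re-add each partition and let md resync the mirrors
mdadm /dev/md1 --add /dev/hde2
mdadm /dev/md2 --add /dev/hde3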


How close did I get ?


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: I need a PCI V2.1 4 port SATA card

2006-06-27 Thread Brad Campbell

Guy wrote:

Hello group,

I am upgrading my disks from old 18 Gig SCSI disks to 300 Gig SATA
disks.  I need a good SATA controller.  My system is old and has PCI V 2.1.
I need a 4-port card, or two 2-port cards.  My system has multiple PCI buses, so
2 cards may give me better performance, but I don't need it.  I will be
using software RAID.  Can anyone recommend a card that is supported by the
current kernel?


I'm using Promise SATA150TX4 cards here in old PCI-based systems. They work great and have been rock 
solid for well in excess of a year of 24/7 hard use. I have 3 in one box and 4 in another.


I'm actually looking at building another 15 disk server now and was hoping to move to something 
quicker using _almost_ commodity hardware.


My current 15 drive RAID-6 server is built around a KT600 board with an AMD Sempron processor and 4 
SATA150TX4 cards. It does the job but it's not the fastest thing around (takes about 10 hours to do 
a check of the array or about 15 to do a rebuild).


I'd love to do something similar with PCI-E or PCI-X and make it go faster (the PCI bus bandwidth is 
the killer); however, I've not seen many affordable PCI-E multi-port cards that are supported yet, and 
PCI-X seems to mean moving to server-class mainboards and the other expenses that come along with 
that.


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Ok to go ahead with this setup?

2006-06-28 Thread Brad Campbell

[EMAIL PROTECTED] wrote:

Mike Dresser wrote:

On Fri, 23 Jun 2006, Molle Bestefich wrote:


Christian Pernegger wrote:

Anything specific wrong with the Maxtors?

I'd watch out regarding the Western Digital disks, apparently they
have a bad habit of turning themselves off when used in RAID mode, for
some reason:
http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/1980/

The MaxLine III's (7V300F0) with VA111630/670 firmware currently time out
on a weekly or less basis.. I'm still testing VA111680 on a 15x300 gig
array


We also see a similar problem on Maxtor 6V250F0 drives: they 'crash' randomly on
a timescale of weeks. The only way to get them back is by power cycling. We tried both a
SuperMicro SATA card (Marvell chip) and a Promise Fastrak; firmware updates from
Maxtor did not fix it yet. We were already forced to exchange all drives at
a customer because he does not want to use Maxtors anymore. Neither do we :(



Whereas I have 28 7Y250M0 drives sitting in a couple of arrays here that have behaved perfectly 
(aside from some grown defects) for over 18000 hours so far. They are *all* sitting on Promise 
SATA150TX4 cards on 2.6 kernels.


I'm looking at another server and another 15 drives at the moment, and it's 
Maxtors I'm looking at.

Everyone has different experiences. I would not touch Seagate with a 10-foot pole (I blew up way too 
many logic boards when I was using them), and I got bitten *badly* by the WD firmware issue with 
RAID (a firmware upgrade fixed that, but it can't replace the data I lost when 3 of them failed at the 
same time and the array got corrupted).


Having said that, it was MaxLineIII 300G drives I was looking at, so perhaps I'll wait a little 
longer and hear some more stories before I drop $$ on 15 of them.


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: trying to brute-force my RAID 5...

2006-07-18 Thread Brad Campbell

Francois Barre wrote:

What are you expecting fdisk to tell you?  fdisk lists partitions and
I suspect you didn't have any partitions on /dev/md0
More likely you want something like
   fsck -n -f /dev/md0

and see which one produces the least noise.


Maybe a simple "file -s /dev/md0" could do the trick, and would only
produce output different from the mere data when the good
configuration is found...

It's more likely to produce output whenever the 1st disk in the array is in the right place, as it will 
just look at the 1st couple of sectors for the superblock.


I'd go with the fsck idea as it will try to inspect the rest of the filesystem 
also.

Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Resize on dirty array?

2006-08-11 Thread Brad Campbell

David Rees wrote:

 I personally prefer to do a long self-test once a week, a month seems
 like a lot of time for something to go wrong.

Unfortunately I found some drives (Seagate 400 PATA) had a rather negative
effect on performance while doing a self-test.


Interesting that you noted negative performance, but I typically
schedule the tests for off-hours anyway where performance isn't
critical.


Personally I have every disk do a short test at 6am Monday-Saturday, and then they *all* (29 of 
them) do a long test every Sunday at 6am.


I figure having all disks do a long test at the same time, rather than staggered, is going to show up 
any pending issues with my PSUs also.


(Been doing this for nearly 2 years now, and it has shown up a couple of drives that were slowly 
growing defects. Nothing a dd if=/dev/zero of=/dev/sdX did not fix, though.)
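
For reference, that schedule is just a line per drive in smartd.conf; a sketch, with the e-mail 
address obviously an example (-s takes a regex over T/MM/DD/d/HH, day-of-week 1-7 with 7 = Sunday):

/dev/sda -d ata -a -m admin@example.org -s (S/../../[1-6]/06|L/../../7/06)
# short self-test 06:00 Monday-Saturday, long self-test 06:00 Sunday, mail me anything odd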


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


RAID-6 check slow..

2006-08-22 Thread Brad Campbell

G'day all,

I have a box with 15 SATA drives in it, they are all on the PCI bus and it's a 
relatively slow machine.

I can extract about 100MB/s combined read speed from these drives with dd.

When reading /dev/md0 with dd I get about 80MB/s, but when I ask it to check the array on a 
completely idle system with echo check > /sys/block/md0/md/sync_action I get a combined read speed 
across all drives of 31.9MB/s.


I'm not that fussed, I guess, but given the system does have extended idle periods it would be nice to 
have a sync or check complete as quickly as the hardware allows. Experience has shown that a rebuild 
after a single disk failure takes 10-12 hours, but the check seems to take forever.
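
For anyone searching later, the knobs involved are below; paths as on this kernel, numbers purely 
examples:

echo check > /sys/block/md0/md/sync_action         # kick off a background consistency check
cat /proc/mdstat                                   # watch its progress
echo 20000 > /proc/sys/dev/raid/speed_limit_min    # example: raise the per-disc floor to ~20MB/s
echo idle > /sys/block/md0/md/sync_action          # abort the check if it gets in the way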


[EMAIL PROTECTED]:~$ cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sda[0] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] 
sde[4] sdd[3] sdc[2] sdb[1]

  3186525056 blocks level 6, 128k chunk, algorithm 2 [15/15] 
[UUU]
  []  resync =  0.1% (458496/245117312) 
finish=1881.9min speed=2164K/sec

unused devices: none

I have included some iostat output running on a 5 second interval and allowed 
30 seconds to stabilise.

Linux storage1 2.6.17.9 #2 Sun Aug 20 17:16:24 GST 2006 i686 GNU/Linux

- snip -

1st a dd from all drives.

storage1:/home/brad# cat t
#!/bin/sh
for i in /dev/sd[abcdefghijklmno] ; do
    echo $i
    dd if=$i of=/dev/null &   # background each dd so all 15 reads run in parallel
done;


avg-cpu:  %user   %nice    %sys %iowait   %idle
           8.80    0.00   58.40   32.80    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  13.00 13312.00 0.00  66560  0
sdb  12.80 13107.20 0.00  65536  0
sdc  12.80 13107.20 0.00  65536  0
sdd  12.80 13107.20 0.00  65536  0
sde  12.80 13107.20 0.00  65536  0
sdf  12.80 13107.20 0.00  65536  0
sdg  12.80 13107.20 0.00  65536  0
sdh  13.00 13312.00 0.00  66560  0
sdi  12.80 13107.20 0.00  65536  0
sdj  13.00 13312.00 0.00  66560  0
sdk  13.00 13312.00 0.00  66560  0
sdl  12.80 13107.20 0.00  65536  0
sdm  17.20 17612.80 0.00  88064  0
sdn  17.20 17612.80 0.00  88064  0
sdo  17.20 17612.80 0.00  88064  0
md0   0.00 0.00 0.00  0  0


- snip -

echo check > /sys/block/md0/md/sync_action

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.80    0.00    6.59    0.00   92.61

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda   5.99  4343.31 0.00  21760  0
sdb   5.99  4343.31 0.00  21760  0
sdc   5.99  4343.31 0.00  21760  0
sdd   5.99  4343.31 0.00  21760  0
sde   5.99  4343.31 0.00  21760  0
sdf   5.99  4343.31 0.00  21760  0
sdg   5.99  4343.31 0.00  21760  0
sdh   5.99  4343.31 0.00  21760  0
sdi   5.99  4343.31 0.00  21760  0
sdj   5.99  4343.31 0.00  21760  0
sdk   5.99  4343.31 0.00  21760  0
sdl   5.99  4343.31 0.00  21760  0
sdm   5.99  4343.31 0.00  21760  0
sdn   5.99  4343.31 0.00  21760  0
sdo   5.99  4343.31 0.00  21760  0
md0   0.00 0.00 0.00  0  0

storage1:/home/brad# grep 0 /proc/sys/dev/raid/*
/proc/sys/dev/raid/speed_limit_max:40
/proc/sys/dev/raid/speed_limit_min:1000

- snip -

dd if=/dev/md0 of=/dev/null

avg-cpu:  %user   %nice    %sys %iowait   %idle
           9.00    0.00   72.60   18.40    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  25.80 11008.00 0.00  55040  0
sdb  25.60 10924.80 0.00  54624  0
sdc  26.00 10956.80 0.00  54784  0
sdd  25.80 10956.80 0.00  54784  0
sde  25.20 11059.20 0.00  55296  0
sdf  26.00 11008.00 0.00  55040  0
sdg  26.20 11008.00 0.00  55040  0
sdh  26.40 11008.00 

Re: RAID-6 check slow..

2006-08-22 Thread Brad Campbell

Neil Brown wrote:


Hmm  nothing obvious.
Have you tried increasing 
  /proc/sys/dev/raid/speed_limit_min:1000

just in case that makes a difference (it shouldn't but you seem to be
down close to that speed).


No difference..


What speed is the raid6 algorithm used - as reported at boot time?
Again, I doubt that is the problem - it should be about 1000 times the
speed you are seeing.


raid6: int32x1    739 MB/s
raid6: int32x2    991 MB/s
raid6: int32x4    636 MB/s
raid6: int32x8    587 MB/s
raid6: mmxx1     1556 MB/s
raid6: mmxx2     2701 MB/s
raid6: sse1x1    1432 MB/s
raid6: sse1x2    2398 MB/s
raid6: using algorithm sse1x2 (2398 MB/s)
md: raid6 personality registered for level 6
raid5: automatically using best checksumming function: pIII_sse
   pIII_sse  :  2345.000 MB/sec
raid5: using function: pIII_sse (2345.000 MB/sec)
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39



What if you try increasing /sys/block/md0/md/stripe_cache_size ?


I already have this at 8192 (which appears to be HUGE for 15 drives, but I've got 1.5GB of ram and 
nothing else using it)



That's all I can think of for now.



Oh well, no stress.. Just thought I'd ask anyway :)

Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: md array numbering is messed up

2006-10-30 Thread Brad Campbell

Michael Tokarev wrote:

Neil Brown wrote:

On Sunday October 29, [EMAIL PROTECTED] wrote:

Hi,

I have 2 arrays whose numbers get inverted, creating havoc, when booting
under different kernels.

I have md0 (raid1) made up of ide drives and md1 (raid5) made up of five
sata drives, when booting with my current ubuntu 2.6.12-9 kernel. When I
try to boot a more recent kernel (2.6.15-26 or 2.6.15-27) the
order is inversed and my sata raid5 array shows up as md0.

My arrays are part of evms volumes that just stop working if the
numbering is inverted.

any clues ?

Your arrays are being started the wrong way.
Do you have an mdadm.conf that lists the arrays?  Can you show us what
it looked like?
If not, do you know how the arrays are started in ubuntu?


My guess is that it's using the mdrun shell script - the same as on Debian.
It's a long story; the thing is quite ugly and messy and does messy things
too, but they say it's compatibility stuff and continue shipping it.

For the OP, the solution is to *create* an mdadm.conf file - in that case
mdrun should hopefully NOT run.


I'd suggest you are probably correct. By default on Ubuntu 6.06:

[EMAIL PROTECTED]:~$ cat /etc/init.d/mdadm-raid
#!/bin/sh
#
# Start any arrays which are described in /etc/mdadm/mdadm.conf and which are
# not running already.
#
# Copyright (c) 2001-2004 Mario Jou/3en [EMAIL PROTECTED]
# Distributable under the terms of the GNU GPL version 2.

MDADM=/sbin/mdadm
MDRUN=/sbin/mdrun
CONFIG=/etc/mdadm/mdadm.conf
DEBIANCONFIG=/etc/default/mdadm

. /lib/lsb/init-functions

test -x $MDADM || exit 0

AUTOSTART=true
test -f $DEBIANCONFIG && . $DEBIANCONFIG

case $1 in
start)
if [ x$AUTOSTART = xtrue ] ; then
if [ ! -f /proc/mdstat ] && [ -x /sbin/modprobe ] ; then
/sbin/modprobe -k md > /dev/null 2>&1
fi
test -f /proc/mdstat || exit 0
log_begin_msg "Starting RAID devices..."
if [ -f $CONFIG ] && [ -x $MDADM ] ; then
$MDADM -A -s
elif [ -x $MDRUN ] ; then
$MDRUN
fi
log_end_msg $?
fi
;;
stop|restart|reload|force-reload)
;;
*)
log_success_msg "Usage: $0 {start|stop|restart|reload|force-reload}"
exit 1
;;
esac

exit 0


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Swapping out for larger disks

2007-05-08 Thread Brad Campbell

G'day all,

I've got 3 arrays here. A 3 drive raid-5, a 10 drive raid-5 and a 15 drive raid-6. They are all 
currently 250GB SATA drives.


I'm contemplating an upgrade to 500GB drives on one or more of the arrays and wondering the best way 
to do the physical swap.


The slow and steady way would be to degrade the array, remove a disk, add the new disk, lather, 
rinse, repeat (roughly as sketched below), after which I could use mdadm --grow. There is the concern 
of running degraded here though (and one of the reasons I'm looking to swap is that some of the disks 
have about 30,000 hours on the clock and are growing the odd defect).
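
That per-disk cycle, sketched roughly (assuming an mdadm recent enough for --grow --size, and with 
device names as examples only):

mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc   # drop one old 250GB disk
# physically swap in the 500GB drive, then:
mdadm /dev/md0 --add /dev/sdc                      # md rebuilds onto the new disk; wait, then repeat
# once every member has been replaced:
mdadm --grow /dev/md0 --size=max                   # use the new per-device capacity
# and finally grow the filesystem, e.g. resize2fs /dev/md0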


I was more wondering about the feasibility of using dd to copy the drive contents to the larger 
drives (then I could do 5 at a time) and working it from there.


It occurs to me though that the superblocks would be in the wrong place on the new drives, and I'm 
wondering if the kernel or mdadm might not find them.


Ideas? Suggestions ?

Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Swapping out for larger disks

2007-05-08 Thread Brad Campbell

David Greaves wrote:


I was more wondering about the feasibility of using dd to copy the drive
contents to the larger drives (then I could do 5 at a time) and working
it from there.



Err, if you can dd the drives, why can't you create a new array and use xfsdump
or equivalent? Is downtime due to copying that bad?


I can only do 5 at a time. (10 slots, 5 source - 5 destination).

I'm not worried about the downtime so much as the constant swapping of disks. This way I can do it 
in 2 or 3 blocks at most.



Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]

2007-06-18 Thread Brad Campbell

Mikael Pettersson wrote:


I don't think sata_promise is the guilty party here. Looks like some
layer above sata_promise got confused about the state of the interface.

But locking up hard after hardreset is a problem of sata_promise, no?


Maybe, maybe not. The original report doesn't specify where/how
the machine hung.


It hangs in the process of trying to power it off. Unmount everything and halt 
the machine.

I've tried halt with and without the -h.

With the -h you can hear the drives spin down, then it tries to spin them up 
again and hangs.

Without the -h it just hangs hard at the point you can see in the photo.


Brad: can you enable sysrq and check if the kernel responds to
sysrq when it appears to hang, and if so, where it's executing?


All my kernels have sysrq enabled. Once the hard reset is displayed on the 
screen everything locks.
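For anyone following along, poking it goes roughly like this (sketch; needs a console or a shell that
still responds):

# make sure the magic SysRq key is enabled
echo 1 > /proc/sys/kernel/sysrq
# Alt-SysRq-t dumps task states, Alt-SysRq-p shows where the CPU is executing;
# the same dumps can be triggered from a live shell:
echo t > /proc/sysrq-trigger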


sata_promise just passes sata_std_hardreset to ata_do_eh.
I've certainly seen EH hardresets work before, so I'm assuming
that something in this particular situation (PHY offlined,
kernel close to shutting down) breaks things.


That is my thought. I thought that on a .22-rc kernel, if I used halt -h and it spun the disks down,
the kernel would detect that and not try to flush their caches - or have I read something
incorrectly?



FWIW, I'm seeing scsi layer accesses (cache flushes) after things
like rmmod sata_promise. They error out and don't seem to cause
any harm, but the fact that they occur at all makes me nervous.


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Software based SATA RAID-5 expandable arrays?

2007-06-20 Thread Brad Campbell

greenjelly wrote:


The options I seek are to be able to start with a 6 Drive array RAID-5
array, then as my demand for more space increases in the future I want to be
able to plug in more drives and incorporate them into the Array without the
need to backup the data.  Basically I need the software to add the
drive/drives to the Array, then Rebuild the array incorporating the new
drives while preserving the data on the original array.


I've got 2 boxes. One has 14 drives and a 480W PSU and the other has 15 drives and a 600W PSU. It's
not rocket science. Put a lot of drives in a box, make sure you have enough SATA ports and power to
go around (watch your peak 12V draw on spin-up in particular) and use Linux md. Easy. Oh, but make
sure the drives stay cool!


For a cheap-o home server (which is what I have) I'd certainly not bother with a dedicated RAID
card. You are not even going to need gigabit Ethernet really. I've got 15 drives on a single PCI bus;
it's as slow as a wet week in May (in the southern hemisphere), but I'm streaming to 3 head units
which total a combined 5MB/s if I'm lucky. Rebuilds can take up to 10 hours though.



QUESTIONS
Since this is a media server and would only be used to serve movies and video to my two machines,
it wouldn't have to be powered up full time (my music consumes less space and will be contained on
two separate machines). Is there a way to considerably lower the power consumption of this server
for the 90% of the time it's not in use?


Yes, don't poll for SMART and spin down the drives when idle (man hdparm). Use S3 sleep and WOL if
you are really clever. (I'm not; my boxes live in a dedicated server room with its own AC, but
that's because I'm nuts.) I also have over 25k hours on the drives because I don't spin them down. I
figure the extra power is a trade-off for drive life. They've got fewer than 50 spin cycles on them
in over 25k hours.
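The spin-down itself is one command per drive; as a sketch (the device name is a placeholder, and the
-S encoding is worth double-checking in man hdparm):

# spin down after ~30 minutes idle (-S 241..251 counts in 30-minute units)
hdparm -S 241 /dev/sdb
# or drop it into standby right now
hdparm -y /dev/sdb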



Can Linux support Drive Arrays of Significant Sizes (4-8 terabytes)?


Yes, easily (6TB here)


Can Linux Software support RAID-5 expandability, allowing me to increase the
number of disks in the array, without the need to backup the media, recreate
the array from scratch and then copy the backup to the machine (something I
will be unable to do)?


Yes, but get a cheap UPS at least (it's cheap insurance).
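The usual sequence is roughly this (sketch; /dev/md0, /dev/sdg, growing a 6-disk array to 7 and the
ext3 resize step are just examples):

# add the new disk as a spare, then reshape the array onto it
mdadm --add /dev/md0 /dev/sdg
mdadm --grow /dev/md0 --raid-devices=7
# after the (long) reshape finishes, enlarge the filesystem, e.g. for ext3:
resize2fs /dev/md0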


I know this is a Linux forum, but I figure many of you guys work with
Windows Server.  If so does Windows 2003 provide the same support for the
requested requirements above?


Why would you even _ask_ ??

Read the man page for mdadm, then read it again (and a third time). Then google for "RAID-5 two
drive failure linux" just to familiarise yourself with the background.


What you are doing has been done before many, many times. There are some well-written sites out
there describing exactly what you want to build, in great detail.


If you are serious about using Windows, I pity you. Linux (actually a combination of the kernel md
layer and mdadm) makes it so easy you'd be nuts to beat your head against the wall with the alternative.


Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Software based SATA RAID-5 expandable arrays?

2007-06-22 Thread Brad Campbell

jahammonds prost wrote:

From: Brad Campbell [EMAIL PROTECTED]

I've got 2 boxes. One has 14 drives and a 480W PSU and the other has 15 drives and a 600W PSU. It's 
not rocket science.


Where did you find reasonably priced cases to hold so many drives? Each of my 
home servers top out at 8 data drives each - plus a 20Gb one to boot from.


For one of them I used a modified CD duplicator case (9 x 5.25" bays) and for the other a nice tall
tower. All except 4 drives are in Supermicro hotswap bays. Aside from the Supermicro bays (which do
look nice and keep the drives very cool) these machines are chewing gum and duct tape jobs.


http://i10.photobucket.com/albums/a109/ytixelprep/F.jpg

Having said that, they are chewing gum and duct tape jobs that have had a downtime of less than 4 
hrs/year over the last 3 years.



Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: Linux Software RAID is really RAID?

2007-06-26 Thread Brad Campbell

Johny Mail list wrote:

Hello list,
I have a little question about software RAID on Linux.
I have installed software RAID on all my Dell SC1425 servers, believing that md RAID was a
robust driver.
Recently I ran a test on one server to check how it copes with a hard drive power failure, so I
powered the server up and, once it had booted to the prompt, I disconnected the power cable of one
SATA hard drive. Normally md should eliminate the failed drive from the logical device it built,
and the server should carry on working as if nothing had happened. Instead the server stopped
responding and I got these messages:
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: port failed to respond (30sec, Status 0xd0)
ata4: soft resetting port

After that my system is frozen. Normally in a basic RAID setup the failed device is disabled in the
logical RAID device (md0) and the array carries on using the remaining disk.


cc to linux-ide added.

Unfortunately this is not an artifact of the Linux RAID driver; rather, it appears to be an issue
with the SATA driver and its error recovery. Details of the kernel, configuration, drives,
controller cards and other relevant parts of the system would be good.


See this URL for the sort of extra information that would be handy.

http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html
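Roughly this sort of thing covers the basics (sketch; sdX is a placeholder for the drive you pulled):

uname -a                   # kernel version
cat /proc/mdstat           # array state
mdadm --detail /dev/md0    # array configuration
dmesg | tail -n 100        # the full context around the ata4 errors
lspci                      # which controller is driving the port
smartctl -a /dev/sdX       # health of the drive that was pulled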


Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams