2.6.11-rc4 md loops on missing drives
G'day all,

I have just finished my shiny new RAID-6 box: 15 x 250GB SATA drives. While doing some failure testing (inadvertent, due to libata SMART causing command errors) I dropped 3 drives out of the array in sequence. md coped with the first two (as it should), but after the third one dropped out I got the errors below spinning continuously in my syslog until I managed to stop the array with mdadm --stop /dev/md0. I'm not really sure how it's supposed to cope with losing more disks than planned, but filling the syslog with nastiness is not very polite.

This box takes _ages_ (between 6 and 10 hours) to rebuild the array, but I'm willing to run some tests if anyone has particular RAID-6 stuff they want tested before I put it into service. I do plan on a couple of days of burn-in testing before I really load it up anyway. The last disk is missing at the moment as I'm short one disk due to a Maxtor dropping its bundle after about 5000 hours.

I'm using today's BK kernel plus the libata and libata-dev trees. The drives are all on Promise SATA150TX4 controllers.

Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 20 KB/sec) for reconstruction.

...to infinity and beyond.

Existing raid config below. Fail any additional 2 drives due to IO errors to cause this issue.
storage1:/home/brad# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Tue Feb 15 22:00:16 2005
     Raid Level : raid6
     Array Size : 3186525056 (3038.91 GiB 3263.00 GB)
    Device Size : 245117312 (233.76 GiB 251.00 GB)
   Raid Devices : 15
  Total Devices : 15
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Tue Feb 15 17:17:36 2005
          State : clean, degraded, resyncing
 Active Devices : 14
Working Devices : 14
 Failed Devices : 1
  Spare Devices : 0
     Chunk Size : 128K
 Rebuild Status : 0% complete
           UUID : 11217f79:ac676966:279f2816:f5678084
         Events : 0.40101

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/devfs/scsi/host0/bus0/target0/lun0/disc
       1       8       16        1      active sync   /dev/devfs/scsi/host1/bus0/target0/lun0/disc
       2       8       32        2      active sync   /dev/devfs/scsi/host2/bus0/target0/lun0/disc
       3       8       48        3      active sync   /dev/devfs/scsi/host3/bus0/target0/lun0/disc
       4       8       64        4      active sync   /dev/devfs/scsi/host4/bus0/target0/lun0/disc
       5       8       80        5      active sync   /dev/devfs/scsi/host5/bus0/target0/lun0/disc
       6       8       96        6      active sync   /dev/devfs/scsi/host6/bus0/target0/lun0/disc
       7       8      112        7      active sync   /dev/devfs/scsi/host7/bus0/target0/lun0/disc
       8       8      128        8      active sync   /dev/devfs/scsi/host8/bus0/target0/lun0/disc
       9       8      144        9      active sync   /dev/devfs/scsi/host9/bus0/target0/lun0/disc
      10       8      160       10      active
Re: 2.6.11-rc4 md loops on missing drives
Neil Brown wrote:
> On Tuesday February 15, [EMAIL PROTECTED] wrote:
>> G'day all,
>> I'm not really sure how it's supposed to cope with losing more disks
>> than planned, but filling the syslog with nastiness is not very polite.
>
> Thanks for the bug report. There are actually a few problems relating
> to resync/recovery when an array (raid5 or raid6) has lost too many
> devices. This patch should fix them.

I applied your latest array of 9 patches to a vanilla BK kernel and did very, very horrible things to it while it was rebuilding. I can confirm that it does indeed tidy up the resync issues.

Ta!
Brad
--
Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so.
   -- Douglas Adams
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Raid-6 hang on write.
G'day all,

I have a painful issue with a RAID-6 box. It only manifests itself on a fully complete and synced-up array, and I can't reproduce it on an array smaller than the entire drives, which means after every attempt at debugging I have to endure a 12 hour resync before I can try again.

I have a single 3TB array as md0, and on top of that I have an ext3 filesystem. While the array is degraded I can read/write to/from it to my heart's content. When it's fully synced up, a dd to the filesystem results in a lockup like the following: dd hangs in a D state, as does any attempt to access the filesystem or /proc/mdstat. I then have to reboot the box using the doorbell or alt-sysrq, which results in a dirty array and another 12 hours of rebuild. I have not managed to reproduce this issue on a partitioned array smaller than the full drives.

I have reproduced this on 2.6.11-rc4-bk4 through -bk10 and on 2.6.11-rc4-mm1 with no other patches. I'm waiting for it to resync again, and I figure I'll try to back up the superblocks; then when it locks up I should be able to restore the clean superblocks before I start the array, to fool it into thinking the array is clean and skip the long resync.

The write *always* sits in get_active_stripe. I'll continue to hack on it over the weekend, but I thought I'd post it here in case someone spots a thinko I might have missed. Here is an alt-sysrq-t of the stuck process. It *always* hangs like this and is completely reproducible on a clean synced array.
Feb 24 14:08:50 storage1 kernel: dd            D C041D9A0     0   366    353 (NOTLB)
Feb 24 14:08:50 storage1 kernel:  f60158e0 0086 f6be35e0 c041d9a0 0004 f713a400 0004 f713a390
Feb 24 14:08:50 storage1 kernel:  09a5 02fa1650 09b8 f6be35e0 f6be3704 f6014000 c1ba60b8
Feb 24 14:08:50 storage1 kernel:  c1ba6060 c0268574 f7d67200 06528800 0b05cf72 c1ba60c0 f6014000
Feb 24 14:08:50 storage1 kernel: Call Trace:
Feb 24 14:08:50 storage1 kernel:  [c0268574] get_active_stripe+0x224/0x260
Feb 24 14:08:50 storage1 kernel:  [c01113a0] default_wake_function+0x0/0x20
Feb 24 14:08:50 storage1 kernel:  [c026afea] make_request+0x19a/0x2e0
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 kernel:  [c023d702] generic_make_request+0x172/0x210
Feb 24 14:08:50 storage1 kernel:  [c0132399] buffered_rmqueue+0xf9/0x1e0
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 last message repeated 2 times
Feb 24 14:08:50 storage1 kernel:  [c023d802] submit_bio+0x62/0x100
Feb 24 14:08:50 storage1 kernel:  [c01502c4] bio_alloc_bioset+0xe4/0x1c0
Feb 24 14:08:50 storage1 kernel:  [c01503c0] bio_alloc+0x20/0x30
Feb 24 14:08:50 storage1 kernel:  [c014fbfa] submit_bh+0x13a/0x1a0
Feb 24 14:08:50 storage1 kernel:  [c014da38] __bread_slow+0x48/0x80
Feb 24 14:08:50 storage1 kernel:  [c014dd1d] __bread+0x3d/0x50
Feb 24 14:08:50 storage1 kernel:  [c017f818] read_block_bitmap+0x58/0xa0
Feb 24 14:08:50 storage1 kernel:  [c018094d] ext3_new_block+0x17d/0x560
Feb 24 14:08:50 storage1 kernel:  [c014db8a] __find_get_block+0x5a/0xe0
Feb 24 14:08:50 storage1 kernel:  [c0183380] ext3_alloc_branch+0x50/0x2d0
Feb 24 14:08:50 storage1 kernel:  [c0183187] ext3_get_branch+0x67/0xf0
Feb 24 14:08:50 storage1 kernel:  [c018395e] ext3_get_block_handle+0x16e/0x360
Feb 24 14:08:50 storage1 kernel:  [c0185f32] __ext3_get_inode_loc+0x62/0x260
Feb 24 14:08:50 storage1 kernel:  [c0183bb4] ext3_get_block+0x64/0xb0
Feb 24 14:08:50 storage1 kernel:  [c014e546] __block_prepare_write+0x206/0x410
Feb 24 14:08:50 storage1 kernel:  [c014ef64] block_prepare_write+0x34/0x50
Feb 24 14:08:50 storage1 kernel:  [c0183b50] ext3_get_block+0x0/0xb0
Feb 24 14:08:50 storage1 kernel:  [c0184149] ext3_prepare_write+0x69/0x140
Feb 24 14:08:50 storage1 kernel:  [c0183b50] ext3_get_block+0x0/0xb0
Feb 24 14:08:50 storage1 kernel:  [c012d994] add_to_page_cache+0x54/0x80
Feb 24 14:08:50 storage1 kernel:  [c012f966] generic_file_buffered_write+0x1b6/0x630
Feb 24 14:08:50 storage1 kernel:  [c0106bee] timer_interrupt+0x4e/0x100
Feb 24 14:08:50 storage1 kernel:  [c0164682] inode_update_time+0x52/0xe0
Feb 24 14:08:50 storage1 kernel:  [c01300ad] __generic_file_aio_write_nolock+0x2cd/0x500
Feb 24 14:08:50 storage1 kernel:  [c0118c7d] __do_softirq+0x7d/0x90
Feb 24 14:08:50 storage1 kernel:  [c0130592] generic_file_aio_write+0x72/0xe0
Feb 24 14:08:50 storage1 kernel:  [c0181b34] ext3_file_write+0x44/0xd0
Feb 24 14:08:50 storage1 kernel:  [c014b49e] do_sync_write+0xbe/0xf0
Feb 24 14:08:50 storage1 kernel:  [c01645bf] update_atime+0x5f/0xd0
Feb 24 14:08:50 storage1 kernel:  [c0127480] autoremove_wake_function+0x0/0x60
Feb 24 14:08:50 storage1 kernel:  [c0157017] pipe_read+0x37/0x40
Feb 24 14:08:50 storage1 kernel:  [c014b56f] vfs_write+0x9f/0x120
Re: Something is broken with SATA RAID ?
J.A. Magallon wrote:
> Hi...
> I posted this in another mail, but now I can confirm it. I have a box
> with a SATA RAID-5, and with 2.6.11-rc3-mm2+libata-dev1 it works like a
> charm as a samba server. I dropped 12Gb onto it from an OSX client,
> people do backups from W2k boxes, and everything was fine.
> With 2.6.11-rc4-mm1, it hangs shortly after the mac starts copying
> files. No oops, no messages... It even hanged on a local copy (wget),
> so I will discard samba as the buggy piece in the puzzle.
> I'm going to make a definitive test with rc5-mm1 vs rc5-mm1+libata-dev1.
> I already know that plain rc5-mm1 hangs. I have to wait for the md
> reconstruction of the 1.2 TB to check rc5-mm1+libata (and no user
> putting things there...)
> But, anyone has a clue about what is happening? I have seen other
> reports of RAID related hangs... Any important change after rc3? Any
> important bugfix in libata-dev1? Something broken in -mm?

There was (is) a bug in -rc4-mm1 that may still be there in -rc5-mm1, related to the way RAID-5 and RAID-6 write out blocks, that can cause a deadlock in the raid code. Do your processes just hang in the D state, and does any access to /proc/mdstat do the same thing?

Can you try with just 2.6.11+libata+libata-dev? I moved to 2.6.11+libata+libata-dev+netdev and all my problems went away.

CC'd to linux-raid.

Regards,
Brad
Re: Spare disk could not sleep / standby
Gordon Henderson wrote:
> I'm in the middle of building up a new home server - looking at RAID-5
> or 6 and 2.6.x, so maybe it's time to look at all this again, but it
> sounds like the auto superblock update might thwart it all now...

Nah... As far as I can tell, 20ms after the last write the auto superblock update will mark the array as clean. You can then spin the disks down as you normally would after a delay. It's just like a normal write. There is an overhead, I guess, in that prior to the next write it's going to mark the superblocks as dirty. I wonder in your case whether this would spin up *all* the disks at once, or do a staged spin-up, given it's going to touch all the disks at the same time?

I have my Raid-6 with ext3 and a commit time of 30s. With an idle system, it really stays idle. Nothing touches the disks. If I wanted to spin them down I could do that.

The thing I *love* about this feature is that when I do something totally stupid and panic the box, 90% of the time I don't need a resync, as the array was marked clean after the last write. Thanks Neil!

Just for yuks, here are a couple of photos of my latest Frankenstein. 3TB of Raid-6 in a Midi-Tower case. I had to re-wire the PSU internally to export an extra 12v rail to an appropriate place, however. I have been beating Raid-6 senseless for the last week, doing horrid things to the hardware. I'm now completely confident in its stability and ready to use it in production. Thanks HPA!

http://www.wasp.net.au/~brad/nas/nas-front.jpg
http://www.wasp.net.au/~brad/nas/nas-psu.jpg
http://www.wasp.net.au/~brad/nas/nas-rear.jpg
http://www.wasp.net.au/~brad/nas/nas-side.jpg

Regards,
Brad
Re: now on to tuning....
Gordon Henderson wrote:
> And do check your disks regularly, although I don't think the current
> version of smartmontools fully supports sata under the scsi subsystem
> yet...

Actually, if you are using a UP machine, the libata-dev tree has patches that make this work. I believe there may be races on SMP machines however. All 29 drives get a short test every morning and a long test every Sunday morning. Odd results are immediately E-mailed to me by smartd.

storage1:/home/brad# smartctl -A -d ata /dev/sda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       5622
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       20
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   250   248   187    Pre-fail  Always       -       35232
  9 Power_On_Minutes        0x0032   252   252   000    Old_age   Always       -       457h+24m
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       34
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       35
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       1411
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       3
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   194   194   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

Regards,
Brad
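For anyone wanting to replicate the scheduled tests described above, a smartd.conf sketch along these lines should do it. The device name and mail address are placeholders, and the directive syntax follows smartd.conf(5); check your smartmontools version supports the -s scheduling directive.

```
# /etc/smartd.conf -- sketch only; adjust devices and address.
# -s (S/../.././02|L/../../7/03): short self-test daily at 02:00,
#    long self-test on Sundays at 03:00.
# -m mails warnings and failed-test results.
/dev/sda -d ata -s (S/../.././02|L/../../7/03) -m admin@example.com
```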
Re: Convert raid5 to raid1?
John McMonagle wrote:
> Was planning on adding a hot spare to my 3 disk raid5 array and was
> thinking that if I go to 4 drives I would be better off with 2 raid1
> arrays, considering the current state of raid5.

I just wonder about the comment "considering the current state of raid5". What might be wrong with raid5 currently? I certainly know a number of people (me included) who run several large raid-5 arrays and don't have any problems.

Brad
Re: RAID 5 disks not spinning down
Alexander Stockinger wrote:
> Hi all,
> I have a linux software RAID 5 running on Debian Sarge with 2.6.7-smp.
> Since I installed the system (several kernel updates ago) the disks of
> the RAID 5 won't stay spun down. Having sent them to standby manually
> using hdparm, the disks spin up again within the next minute or so. To
> further investigate the problem I watched the /proc/diskstats file for
> /dev/md0 and realized that prior to the spinup a single read access to
> the RAID is issued - consequently the disks attached to the raid spin
> up again...

Does it occur if you have the filesystem on the md device unmounted?

Regards,
Brad
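A quick way to check (a sketch, not something from the original thread): snapshot the completed-read counter for md0 in /proc/diskstats and see whether it still climbs once the filesystem is unmounted.

```shell
# Print the completed-read count for md0.
# In /proc/diskstats each line is: major minor name reads_completed ...
# so field 4 is reads completed. Run this twice a minute apart; if the
# number climbs while the filesystem is unmounted, something below the
# filesystem layer is issuing the reads.
awk '$3 == "md0" { print "reads completed:", $4 }' /proc/diskstats
```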
Re: EVMS or md?
H. Peter Anvin wrote:
> No hiccups, data losses, or missing functionality. At the end of the
> whole ordeal, the filesystem (1 TB, 50% full) was still quite pristine,
> and fsck confirmed this. I was quite pleased :)

I second this. I endured numerous kernel crashes and other lockup/forced-restart issues while setting up a 15 drive 3TB raid-6 (the crashes were not really related to the md subsystem, except for the oddity in the -mm kernel, and that was not raid-6 specific). I have popped out various drives and caused numerous failures/rebuilds with an ext3 filesystem over 90% full while burn-in testing and not experienced one glitch. It has been in production now for over a month and is performing flawlessly. I have run it in full, 1-disk and 2-disk degraded mode for testing. I certainly consider it stable.

Brad
Re: skip raid5 reconstruction
Ming Zhang wrote:
> Hi folks
> I am testing some HW performance with raid5 on a 2.4.x kernel. It is
> really troublesome that every time I create a raid5 I wait 4 hours for
> reconstruction, test some data, then recreate another one and wait
> again. I wonder if there is any hack or option available to create a
> raid5 without reconstructing the parity disk. I am only interested in
> testing the performance, so I do not care about data correctness at
> this stage.

I did a similar thing a while back. I created the raid and waited for it to sync, then made dd copies of the raid superblocks. When I blew it up I just dd'd the clean superblocks back again (saved a 12 hour rebuild time).

Having just thought about what you wrote, I guess you are building the raid in different configurations each time, so my method might not be good for you.

Regards,
Brad
Re: skip raid5 reconstruction
Ming Zhang wrote:
>> I did a similar thing a while back. I created the raid and waited for
>> it to sync, then made dd copies of the raid superblocks. When I blew
>> it up I just dd'd the clean superblocks back again (saved a 12 hour
>> rebuild time)
> interesting to know about this. u just check the dmesg and see where
> the sb is and then u dd it out and dd back later?

Actually, I used blockdev to get the device size and then just copied the last 64kb. From memory... it's a bit hazy now, but that's pretty close I think.

Regards,
Brad
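For the record, here is a sketch of that trick. The device name is a placeholder, and it assumes the old v0.90 superblock format, which lives in the last 64 KiB-aligned 64 KiB block of the device (this is my reconstruction of the approach, not Brad's exact commands).

```shell
DEV=/dev/sda                                  # placeholder raid member
SIZE=$(blockdev --getsize64 "$DEV")           # device size in bytes
# v0.90 superblock offset: round the size down to a 64 KiB boundary,
# then back off one 64 KiB block.
OFFSET=$(( SIZE / 65536 * 65536 - 65536 ))
# Back the superblock up while the array is clean...
dd if="$DEV" of=sb-sda.img bs=65536 skip=$(( OFFSET / 65536 )) count=1
# ...and restore it later to skip the resync:
# dd if=sb-sda.img of="$DEV" bs=65536 seek=$(( OFFSET / 65536 )) count=1
```

Obviously only safe if the array layout has not changed between backup and restore, which is exactly the caveat raised above.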
Raid-6 stability gets my vote
G'day all,

This message is really just for future googlers. I have been running a 15 disk raid-6 in production since the 24th of Feb and can completely vouch for its stability. I have had both simulated and real drive failures and it has handled itself perfectly in all cases. Unclean shutdowns and resyncs have all been perfect. I know it's tagged as stable in any case, but I still get E-mails from people dragging my name out of google asking about it, so I place this here for the public record.

storage1:/home/brad# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu Feb 24 14:51:17 2005
     Raid Level : raid6
     Array Size : 3186525056 (3038.91 GiB 3263.00 GB)
    Device Size : 245117312 (233.76 GiB 251.00 GB)
   Raid Devices : 15
  Total Devices : 15
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Fri Jul 15 13:27:51 2005
          State : clean
 Active Devices : 15
Working Devices : 15
 Failed Devices : 0
  Spare Devices : 0

Many thanks to hpa and others responsible for raid-6. I have already had a 2 drive failure, and they were so close together that even a hot spare would not have had time to rebuild. Also thanks to Neil Brown for a great monitoring and management tool. Mdadm and its monitor-to-E-mail mode have been invaluable. I'd also like to thank Maxtor for producing drives that actually have useful S.M.A.R.T. data. Wish the other manufacturers would follow suit.

Brad
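The monitoring-to-E-mail setup mentioned above boils down to a couple of mdadm.conf lines plus mdadm's monitor mode. A sketch only: the address is a placeholder and the UUID is just an example; take yours from mdadm --detail.

```
# /etc/mdadm/mdadm.conf -- sketch; address is a placeholder and the
# UUID should come from `mdadm --detail /dev/md0`.
DEVICE /dev/sd*
ARRAY /dev/md0 UUID=11217f79:ac676966:279f2816:f5678084
MAILADDR admin@example.com
# then run the monitor, e.g.:
#   mdadm --monitor --scan --daemonise
```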
Re: Recovery raid5 after sata cable failure.
Francisco Zafra wrote:
> Hi Neil,
> For some hours now I have been trying to solve it with the latest
> version:
> [EMAIL PROTECTED]:~ # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
> With the same results :(
> I really don't think it is locked. I dd'd it in an act of desperation
> and I have no problems:
> [EMAIL PROTECTED]:~ # dd if=/dev/zero of=/dev/sdh bs=1k count=1000
> 1000+0 records in
> 1000+0 records out
> 1024000 bytes transferred in 0.417862 seconds (2450570 bytes/sec)

Asking a silly question perhaps... fuser /dev/sdh?

Regards,
Brad
Re: split RAID1 during backups?
Jeff Breidenbach wrote:
> So - I'm thinking of the following backup scenario. First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.

Why not do something like this?

mount -o remount,ro /dev/md0 /web
mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mount -o ro /dev/sdd1 /target
(do backup here)
umount /target
mdadm --add /dev/md0 /dev/sdd1
mount -o remount,rw /dev/md0 /web

That way the web server continues to run from the md. You will endure a rebuild on md0 when you re-add the disk, but given everything is mounted read-only you should not practically be doing anything, and if you fail a disk during the rebuild the other disk will still be intact.

I second Jurriaan's vote for rsync also, but I would be inclined just to let it loose on the whole disk rather than break it up into parts... but then I have heaps of ram too.

Regards,
Brad
Raid-6 Rebuild question
G'day all,

Here is an interesting question (well, I think so in any case). I just replaced a failed disk in my 15 drive Raid-6. Simply:

mdadm --add /dev/md0 /dev/sdl

Why, when there is no other activity on the array at all, is it writing to every disk during the recovery? I would have assumed it would just read from the others and write to sdl. This is an iostat -k 5 on that machine while rebuilding:

avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.00    0.00  100.00    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             121.08     14187.95       925.30      23552       1536
sdb             127.71     14187.95      1002.41      23552       1664
sdc             125.30     14187.95      1002.41      23552       1664
sdd             122.29     14187.95      1002.41      23552       1664
sde             125.30     14187.95      1002.41      23552       1664
sdf             127.71     14187.95      1002.41      23552       1664
sdg             125.90     14187.95       925.30      23552       1536
sdh             125.30     14187.95       925.30      23552       1536
sdi             134.34     14187.95       925.30      23552       1536
sdj             137.95     14187.95       925.30      23552       1536
sdk             140.36     14187.95      1850.60      23552       3072
sdl              79.52         0.00     14265.06          0      23680
sdm             133.13     14187.95       925.30      23552       1536
sdn             134.34     14187.95       925.30      23552       1536
sdo             133.73     14187.95       925.30      23552       1536
md0               0.00         0.00         0.00          0          0

storage1:/home/brad# cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sdl[15] sdg[6] sda[0] sdo[14] sdn[13] sdm[12] sdk[10] sdj[9] sdi[8] sdh[7] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      3186525056 blocks level 6, 128k chunk, algorithm 2 [15/14] [UUU_UUU]
      [] recovery =  1.8% (4518144/245117312) finish=838.3min speed=4782K/sec

unused devices: none

Regards,
Brad
Re: Raid-6 Rebuild question
Brad Campbell wrote:
> G'day all,
> Here is an interesting question (well, I think so in any case). I just
> replaced a failed disk in my 15 drive Raid-6.

Forgot the most important detail (as usual):

bklaptop:~$ ssh storage1 uname -a
Linux storage1 2.6.11.7 #4 Fri Oct 7 20:00:25 GST 2005 i686 GNU/Linux

Regards,
Brad
Re: Expanding RAID array?
Neil Brown wrote:
>> - add disks to convert to raid6. I don't think this is possible, but
>> you should check the latest raidreconf.
>
> It's not.

I started work on it in Feb last year, but then real life got in the way again. In the longer term I think raidreconf as it stands is going to die (mainly because its infrastructure relies on the old raidtab architecture). I thought about porting it as an add-on to mdadm, but then I ran out of drives/machines/time to test it on.

> Might be supported online with a limited raid6 in which the Q syndrome
> (second 'parity' block) isn't rotated among disks.

In theory I would have thought it not that much different from a raid-5 expand, just inserting an extra block for the Q syndrome.

> - status of RAID6
> I believe it is as stable/reliable as raid5.

Mine has been running since Feb last year with fairly moderate use and no hiccups. I have just upgraded that machine to the latest 2.6.15-git to give some of the newer raid patches (like check/repair) a whirl. Seems fairly solid. Let's say I've never had any raid-6 related data loss, or even near misses, and it has saved my bacon in 2 dual drive failures in the last year.

Oh, and the new read-and-check code (rather than just rebuilding the parity blocks) shaves about 1 hour off what was an 11 hour rebuild time on this particular raid-6. Thanks Neil!

Regards,
Brad
Re: RAID 1 vs RAID 0
Max Waterman wrote:
> Still, it seems like it should be a solvable problem... if you order
> the data differently on each disk; for example, in the two disk case,
> putting odd and even numbered 'stripes' on different platters [or
> sides of platters].

The only problem there is determining the internal geometry of the disk, and knowing that each disk is probably different. How do you know which logical sector number correlates to which surface, and whereabouts on the surface? Just thinking about it makes my brain hurt. Not like the good old days of the old stepper disks.

Brad
Re: Two RAID6 questions
John Rowe wrote:
> First, can raidreconf grow a RAID6 device? The man page doesn't seem to mention RAID6 at all.

No, raidreconf has no knowledge of raid-6 at all.

> Second, with RAID5 or RAID6 my biggest fear is a system crash whilst the RAID is writing, resulting in dirty blocks. Does RAID6 give some sort of ECC capability when reconstructing? I'm imagining checking the parity of a RAID block and, if it's wrong, assuming each block in turn is dirty, recalculating it from the first parity and then checking the result against the second parity.

This is just one of those things: if you crash while writing, unless you have hardware raid with NVRAM you are going to leave the array in an unclean, uncertain state. The best you can do is re-sync the array, fsck the filesystem and hope for the best. The greatest new feature in md in this regard is that it periodically marks the array as clean while it's idle. That generally ensures that, most of the time, a crash doesn't require a resync on the next assemble.

Brad
Re: Silly stripe cache values
Brad Campbell wrote:
> G'day all, I have a box here. It has a 2GHz processor and 1.5GB of RAM. It runs the entire OS over NFS, and its sole purpose in life is to run 15 SATA drives in a RAID-6 with ext3 on it, and share that over NFS. Most of that RAM is sitting completely idle, so I thought a logical thing to do would be to give as much of it as possible to the md subsystem to help it cache. Are there any limits to the values living under /sys/block/md*, and what might be the tradeoffs (if any) to using what would normally be thought stupid amounts of RAM for these knobs? This box does not get written to often; it's mostly just a media streamer, but if I am writing to it then it currently chokes just providing a 1Mb stream over the network. (It's on a 2.6.15-git11 kernel currently, but I'm just upgrading to 2.6.16 now.)

Scratch that: the limit appears to be 32768, and that works fine. Google search results increase in accuracy proportionally with the elapsed time of a list posting with the question... :\

Brad
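For anyone tuning the same knob: per the kernel's md documentation, the stripe cache pins roughly stripe_cache_size pages per member device, so it is worth checking the arithmetic before writing a large value. A minimal sketch, assuming 4 KiB pages and the 15-drive array from this post:

```shell
#!/bin/sh
# Estimate RAM pinned by the md stripe cache:
# memory ~= stripe_cache_size * PAGE_SIZE * nr_disks (per Documentation/md.txt).
STRIPE_CACHE_SIZE=32768   # value written to /sys/block/md0/md/stripe_cache_size
NR_DISKS=15               # member devices in the array
PAGE_SIZE=4096            # bytes, the usual i686 page size

BYTES=$((STRIPE_CACHE_SIZE * NR_DISKS * PAGE_SIZE))
echo "stripe cache would pin $((BYTES / 1048576)) MiB"
```

Note that at the 32768 ceiling this works out to nearly 2 GiB for 15 drives, which is more than the 1.5GB in this box, so a somewhat smaller value is probably the practical limit here.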
Re: Recommendations for supported 4-port SATA PCI card ?
Christopher Smith wrote:
> Brad Campbell wrote:
>> I've been running 3 together in one box for about 18 months, and four in another for a year now... the on-board BIOS will only pick up 8 drives, but they work just fine under Linux and recognise all connected drives.
> What distro and kernel?

Err.. well, I started with 2.4 originally, and I've run 2.6.5, 2.6.9, 2.6.10-bk10, 2.6.15.1 and now 2.6.16 on both boxes... never an issue with multiple card support. It's based on Debian mostly, but to be honest that's not important, as the kernel is always self-compiled, as is mdadm.

> I tried this about 2-3 months ago and had problems whenever more than two cards were in my system, even if only a single drive was installed. I posted about it here (search for "multiple promise sata150 tx4 cards" back in January).

I recall the thread... I don't think I replied then, as I did not really have anything to say I guess.

> I have no complaints about the performance (relatively speaking, of course), but I've got the cards in a machine with multiple PCI-X busses, so it's not really bottlenecked there. Perhaps it's related to that.

All mine are on ASUS A7V600 motherboards with a single bog-standard PCI bus. At one point early on I recall someone stating they were having all sorts of problems with those cards and a 66MHz bus. Are you running the latest BIOS on all the cards? I only ask because the first thing I did when mine arrived was to upgrade the BIOS... I'm not using it to boot, but I thought it might have some impact on the way they are set up on the PCI bus. I have a mate who has another 3 in his box also... but again, he's on an ASUS A7V600 motherboard.

I'm looking to build another 15-drive box soon, and I was thinking of perhaps a couple of 8-port Marvell cards, but then I might just stick with what I know and source some more of the same.

Regards,
Brad
Re: A failed-disk-how-to anywhere?
Martin Stender wrote:
> Hi there! I have two identical disks sitting on a Promise dual-channel IDE controller. I guess both disks are primaries then. One of the disks has failed, so I bought a new disk, took out the failed disk, and put in the new one. That might seem a little naive, and apparently it was, since the system won't boot up now. It boots fine when only the old, healthy disk is connected. By the way, all three partitions are raided - including /boot ... Anyway, I have removed the old disk from the RAID with:
>   # mdadm /dev/md0 --remove /dev/hde1
>   # mdadm /dev/md1 --remove /dev/hde2
>   # mdadm /dev/md2 --remove /dev/hde3
> - but the problem persists. I can't seem to find a decent 'How-To' - so how is this supposed to be done?

A little more info would be helpful. How does the machine boot? How are your other disks configured? Are you booting off the Promise board or the on-board controller? (Making assumptions given your Promise appears to contain hde, I'm assuming hd[abcd] are on-board somewhere.)

I'm going to take a wild stab in the dark now. My initial thought would be that you have hde and hdg in a raid-1 and nothing on the on-board controllers. hde has failed, and when you removed it your controller tried the first disk it could find (hdg) to boot off. Bingo, away we go. You plug a new shiny disk into hde and now the controller tries to boot off that, except it's blank and therefore a no-go.

I'd either try to force the controller to boot off hdg (which might be a controller BIOS option) or swap hde and hdg... then it might boot and let you create your partitions on hdg and then add it back into the mirror.

How close did I get?

Brad
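Assuming the stab in the dark above is right, once the box boots off the healthy disk the rebuild itself is only a few commands. A dry-run sketch (the hde/hdg names and three-partition layout come from the post; the sfdisk clone step is my assumption, and `run` just echoes each command so nothing is executed until you swap it for real execution):

```shell
#!/bin/sh
# Dry-run sketch: clone the partition table from the surviving disk onto
# the blank replacement, then re-add each partition to its mirror.
# Double-check which disk is which before running for real -- getting
# the sfdisk direction backwards destroys the good disk.
GOOD=/dev/hdg
NEW=/dev/hde

run() { echo "$@"; }   # replace 'echo' with real execution when ready

run "sfdisk -d $GOOD | sfdisk $NEW"     # copy partition layout

for n in 1 2 3; do
    # hde1 -> md0, hde2 -> md1, hde3 -> md2, as in the original post
    run mdadm /dev/md$((n - 1)) --add "${NEW}${n}"
done
```

Watching `cat /proc/mdstat` afterwards shows the resync progress for each mirror.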
Re: I need a PCI V2.1 4 port SATA card
Guy wrote:
> Hello group, I am upgrading my disks from old 18 Gig SCSI disks to 300 Gig SATA disks. I need a good SATA controller. My system is old and has PCI V2.1. I need a 4-port card, or two 2-port cards. My system has multiple PCI buses, so 2 cards may give me better performance, but I don't need it. I will be using software RAID. Can anyone recommend a card that is supported by the current kernel?

I'm using Promise SATA150TX4 cards here in old PCI-based systems. They work great and have been rock solid for well in excess of a year of 24/7 hard use. I have 3 in one box and 4 in another.

I'm actually looking at building another 15-disk server now and was hoping to move to something quicker using _almost_ commodity hardware. My current 15-drive RAID-6 server is built around a KT600 board with an AMD Sempron processor and 4 SATA150TX4 cards. It does the job, but it's not the fastest thing around (it takes about 10 hours to do a check of the array, or about 15 to do a rebuild). I'd love to do something similar with PCI-E or PCI-X and make it go faster (the PCI bus bandwidth is the killer), however I've not seen many affordable PCI-E multi-port cards that are supported yet, and PCI-X seems to mean moving to server-class mainboards and the other expenses that come along with that.

Brad
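The "PCI bus bandwidth is the killer" point is easy to put numbers on: a classic 32-bit/33MHz PCI bus tops out at about 133 MB/s theoretical, shared by every card and drive behind it, before any protocol overhead. A back-of-envelope sketch for the 15-drive box described above:

```shell
#!/bin/sh
# Rough per-drive bandwidth ceiling when all drives share one PCI bus.
BUS_MB_S=133    # 32-bit / 33 MHz PCI theoretical peak, MB/s
DRIVES=15

echo "per-drive ceiling: about $((BUS_MB_S / DRIVES)) MB/s"
```

Under 9 MB/s per drive is well below what even 2006-era SATA disks sustain sequentially, which is why checks and rebuilds on such a box are bus-bound rather than disk-bound.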
Re: Ok to go ahead with this setup?
[EMAIL PROTECTED] wrote:
> Mike Dresser wrote:
>> On Fri, 23 Jun 2006, Molle Bestefich wrote:
>>> Christian Pernegger wrote:
>>>> Anything specific wrong with the Maxtors?
>>> I'd watch out regarding the Western Digital disks; apparently they have a bad habit of turning themselves off when used in RAID mode, for some reason: http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/1980/
>> The MaxLine III's (7V300F0) with VA111630/670 firmware currently time out on a weekly or less basis.. I'm still testing VA111680 on a 15x300 gig array.
> We also see a similar problem on Maxtor 6V250F0 drives: they 'crash' randomly on a timescale of about a week. The only way to get them back is by power cycling. We tried both a SuperMicro SATA card (Marvell chip) and a Promise FastTrak; firmware updates from Maxtor have not fixed it yet. We were already forced to exchange all drives at a customer because he does not want to use Maxtors anymore. Neither do we :(

Whereas I have 28 7Y250M0 drives sitting in a couple of arrays here that have behaved perfectly (aside from some grown defects) for over 18,000 hours so far. They are *all* sitting on Promise SATA150TX4 cards on 2.6 kernels. I'm looking at another server and another 15 drives at the moment, and it's Maxtors I'm looking at.

Everyone has different experience. I would not touch Seagate with a 10-foot pole (I blew up way too many logic boards when I was using them), and I got bitten *badly* by the WD firmware issue with RAID (a firmware upgrade fixed that, but it can't replace the data I lost when 3 of them failed at the same time and the array got corrupted). Having said that, it was MaxLine III 300GB drives I was looking at, so perhaps I'll wait a little longer and hear some more stories before I drop $$ on 15 of them.

Brad
Re: trying to brute-force my RAID 5...
Francois Barre wrote:
>> What are you expecting fdisk to tell you? fdisk lists partitions, and I suspect you didn't have any partitions on /dev/md0. More likely you want something like fsck -n -f /dev/md0 and see which one produces the least noise.
> Maybe a simple file -s /dev/md0 could do the trick, and would only produce output different from the mere data when the good configuration is found...

It's more likely to produce output whenever the first disk in the array is in the right place, as it will just look at the first couple of sectors for the superblock. I'd go with the fsck idea, as it will try to inspect the rest of the filesystem also.

Brad
Re: Resize on dirty array?
David Rees wrote:
>>> I personally prefer to do a long self-test once a week; a month seems like a lot of time for something to go wrong.
>> unfortunately i found some drives (seagate 400 pata) had a rather negative effect on performance while doing self-test.
> Interesting that you noted negative performance, but I typically schedule the tests for off-hours anyway, where performance isn't critical.

Personally, I have every disk do a short test at 6am Monday-Saturday, and then they *all* (29 of them) do a long test every Sunday at 6am. I figure having all disks do a long test at the same time, rather than staggered, is going to show up any pending issues with my PSUs as well. (I've been doing this for nearly 2 years now, and it has shown up a couple of drives that were slowly growing defects. Nothing a dd if=/dev/zero of=/dev/sd(x) did not fix, though.)

Brad
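That schedule maps directly onto smartd's `-s` self-test regex (fields T/MM/DD/d/HH, with weekday 1 = Monday through 7 = Sunday, per the smartd.conf man page). A sketch of one drive's entry, assuming /dev/sda; repeat per disk or use DEVICESCAN:

```
# /etc/smartd.conf sketch -- short test 06:00 Mon-Sat, long test 06:00 Sun
# -a : monitor all SMART attributes and log failures
# -s : self-test schedule regex (S = short, L = long)
/dev/sda -a -s (S/../../[1-6]/06|L/../../7/06)
```

Running every long test simultaneously, as described above, is a deliberate choice here to stress the PSUs; staggering the `/HH` field per drive is the gentler alternative.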
RAID-6 check slow..
G'day all,

I have a box with 15 SATA drives in it; they are all on the PCI bus and it's a relatively slow machine. I can extract about 100MB/s combined read speed from these drives with dd. When reading /dev/md0 with dd I get about 80MB/s, but when I ask it to check the array on a completely idle system with

  echo check > /sys/block/md0/md/sync_action

I get a combined read speed across all drives of 31.9MB/s. I'm not that fussed I guess, given the system does have extended idle periods, but it would be nice to have a sync or check complete as quickly as the hardware allows. Experience has shown that a rebuild of a single disk failure takes 10-12 hours, but the check seems to take forever.

[EMAIL PROTECTED]:~$ cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sda[0] sdo[14] sdn[13] sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      3186525056 blocks level 6, 128k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
      [>....................]  resync =  0.1% (458496/245117312) finish=1881.9min speed=2164K/sec
unused devices: <none>

I have included some iostat output, running on a 5-second interval and allowed 30 seconds to stabilise.

Linux storage1 2.6.17.9 #2 Sun Aug 20 17:16:24 GST 2006 i686 GNU/Linux

- snip - First, a dd from all drives.
storage1:/home/brad# cat t
#!/bin/sh
for i in /dev/sd[abcdefghijklmno] ; do
    echo $i
    dd if=$i of=/dev/null
done

avg-cpu:  %user  %nice   %sys  %iowait  %idle
           8.80   0.00  58.40    32.80   0.00

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      13.00    13312.00        0.00     66560         0
sdb      12.80    13107.20        0.00     65536         0
sdc      12.80    13107.20        0.00     65536         0
sdd      12.80    13107.20        0.00     65536         0
sde      12.80    13107.20        0.00     65536         0
sdf      12.80    13107.20        0.00     65536         0
sdg      12.80    13107.20        0.00     65536         0
sdh      13.00    13312.00        0.00     66560         0
sdi      12.80    13107.20        0.00     65536         0
sdj      13.00    13312.00        0.00     66560         0
sdk      13.00    13312.00        0.00     66560         0
sdl      12.80    13107.20        0.00     65536         0
sdm      17.20    17612.80        0.00     88064         0
sdn      17.20    17612.80        0.00     88064         0
sdo      17.20    17612.80        0.00     88064         0
md0       0.00        0.00        0.00         0         0

- snip - echo check > /sys/block/md0/md/sync_action

avg-cpu:  %user  %nice   %sys  %iowait  %idle
           0.80   0.00   6.59     0.00  92.61

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda       5.99     4343.31        0.00     21760         0
sdb       5.99     4343.31        0.00     21760         0
sdc       5.99     4343.31        0.00     21760         0
sdd       5.99     4343.31        0.00     21760         0
sde       5.99     4343.31        0.00     21760         0
sdf       5.99     4343.31        0.00     21760         0
sdg       5.99     4343.31        0.00     21760         0
sdh       5.99     4343.31        0.00     21760         0
sdi       5.99     4343.31        0.00     21760         0
sdj       5.99     4343.31        0.00     21760         0
sdk       5.99     4343.31        0.00     21760         0
sdl       5.99     4343.31        0.00     21760         0
sdm       5.99     4343.31        0.00     21760         0
sdn       5.99     4343.31        0.00     21760         0
sdo       5.99     4343.31        0.00     21760         0
md0       0.00        0.00        0.00         0         0

storage1:/home/brad# grep 0 /proc/sys/dev/raid/*
/proc/sys/dev/raid/speed_limit_max:40
/proc/sys/dev/raid/speed_limit_min:1000

- snip - dd if=/dev/md0 of=/dev/null

avg-cpu:  %user  %nice   %sys  %iowait  %idle
           9.00   0.00  72.60    18.40   0.00

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      25.80    11008.00        0.00     55040         0
sdb      25.60    10924.80        0.00     54624         0
sdc      26.00    10956.80        0.00     54784         0
sdd      25.80    10956.80        0.00     54784         0
sde      25.20    11059.20        0.00     55296         0
sdf      26.00    11008.00        0.00     55040         0
sdg      26.20    11008.00        0.00     55040         0
sdh      26.40    11008.00
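For what it's worth, md's finish= estimate is just the remaining per-device blocks divided by the current speed; plugging in the figures from the mdstat output in this post confirms the dispiriting ~31-hour number:

```shell
#!/bin/sh
# Reproduce md's finish= estimate: (total - done) blocks / speed, in minutes.
# Numbers taken from the /proc/mdstat output in the post; 1 block = 1 KiB.
TOTAL=245117312   # per-device blocks to check
DONE=458496       # blocks already checked
SPEED=2164        # current rate, K/sec

echo "finish in about $(( (TOTAL - DONE) / SPEED / 60 )) minutes"
```

That lands within a few minutes of the kernel's finish=1881.9min, so the estimate itself is honest; the problem is purely the 2164K/sec check rate.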
Re: RAID-6 check slow..
Neil Brown wrote:
> Hmm, nothing obvious. Have you tried increasing /proc/sys/dev/raid/speed_limit_min (currently 1000), just in case that makes a difference? (It shouldn't, but you seem to be down close to that speed.)

No difference.

> What speed is the raid6 algorithm used - as reported at boot time? Again, I doubt that is the problem - it should be about 1000 times the speed you are seeing.

raid6: int32x1    739 MB/s
raid6: int32x2    991 MB/s
raid6: int32x4    636 MB/s
raid6: int32x8    587 MB/s
raid6: mmxx1     1556 MB/s
raid6: mmxx2     2701 MB/s
raid6: sse1x1    1432 MB/s
raid6: sse1x2    2398 MB/s
raid6: using algorithm sse1x2 (2398 MB/s)
md: raid6 personality registered for level 6
raid5: automatically using best checksumming function: pIII_sse
   pIII_sse  :  2345.000 MB/sec
raid5: using function: pIII_sse (2345.000 MB/sec)
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39

> What if you try increasing /sys/block/md0/md/stripe_cache_size?

I already have this at 8192 (which appears to be HUGE for 15 drives, but I've got 1.5GB of ram and nothing else using it).

> That's all I can think of for now.

Oh well, no stress.. Just thought I'd ask anyway :)

Brad
Re: md array numbering is messed up
Michael Tokarev wrote:
> Neil Brown wrote:
>> On Sunday October 29, [EMAIL PROTECTED] wrote:
>>> Hi, I have 2 arrays whose numbers get inverted, creating havoc, when booting under different kernels. I have md0 (raid1) made up of IDE drives and md1 (raid5) made up of five SATA drives when booting with my current Ubuntu 2.6.12-9 kernel. When I try to boot a more recent kernel (2.6.15-26 or 2.6.15-27) the order is inverted and my SATA raid5 array shows up as md0. My arrays are part of EVMS volumes that just stop working if the numbering is inverted. Any clues?
>> Your arrays are being started the wrong way. Do you have an mdadm.conf that lists the arrays? Can you show us what it looked like? If not, do you know how the arrays are started in Ubuntu?
> My guess is that it's using the mdrun shell script - the same as on Debian. It's a long story; the thing is quite ugly and messy and does messy things too, but they say it's compatibility stuff and continue shipping it. For the OP, the solution is to *create* an mdadm.conf file - in that case mdrun should hopefully NOT run.

I'd suggest you are probably correct. By default on Ubuntu 6.06:

[EMAIL PROTECTED]:~$ cat /etc/init.d/mdadm-raid
#!/bin/sh
#
# Start any arrays which are described in /etc/mdadm/mdadm.conf and which are
# not running already.
#
# Copyright (c) 2001-2004 Mario Jou/3en [EMAIL PROTECTED]
# Distributable under the terms of the GNU GPL version 2.

MDADM=/sbin/mdadm
MDRUN=/sbin/mdrun
CONFIG=/etc/mdadm/mdadm.conf
DEBIANCONFIG=/etc/default/mdadm

. /lib/lsb/init-functions

test -x $MDADM || exit 0

AUTOSTART=true
test -f $DEBIANCONFIG && . $DEBIANCONFIG

case "$1" in
    start)
        if [ "x$AUTOSTART" = "xtrue" ] ; then
            if [ ! -f /proc/mdstat ] && [ -x /sbin/modprobe ] ; then
                /sbin/modprobe -k md > /dev/null 2>&1
            fi
            test -f /proc/mdstat || exit 0
            log_begin_msg "Starting RAID devices..."
            if [ -f $CONFIG ] && [ -x $MDADM ] ; then
                $MDADM -A -s
            elif [ -x $MDRUN ] ; then
                $MDRUN
            fi
            log_end_msg $?
        fi
        ;;
    stop|restart|reload|force-reload)
        ;;
    *)
        log_success_msg "Usage: $0 {start|stop|restart|reload|force-reload}"
        exit 1
        ;;
esac
exit 0

Brad
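For the OP, the mdadm.conf that Michael suggests only needs a DEVICE line plus one ARRAY line per array; `mdadm --detail --scan` will print the ARRAY lines (with UUIDs) for the currently running arrays, and the UUIDs are what pin md0/md1 regardless of discovery order. A sketch (the UUIDs here are placeholders to be filled in from the scan output):

```
# /etc/mdadm/mdadm.conf -- sketch; generate the real ARRAY lines with:
#   mdadm --detail --scan >> /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md0 UUID=<uuid-of-ide-raid1>     # raid1, IDE drives
ARRAY /dev/md1 UUID=<uuid-of-sata-raid5>    # raid5, five SATA drives
```

With this file in place, the init script above takes the `$MDADM -A -s` branch instead of falling through to mdrun.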
Swapping out for larger disks
G'day all,

I've got 3 arrays here: a 3-drive raid-5, a 10-drive raid-5 and a 15-drive raid-6. They are all currently 250GB SATA drives. I'm contemplating an upgrade to 500GB drives on one or more of the arrays, and wondering the best way to do the physical swap.

The slow and steady way would be to degrade the array, remove a disk, add the new disk, lather, rinse, repeat; after which I could use mdadm --grow. There is the concern of running degraded here though (and one of the reasons I'm looking to swap is that some of the disks have about 30,000 hours on the clock and are growing the odd defect).

I was more wondering about the feasibility of using dd to copy the drive contents to the larger drives (then I could do 5 at a time) and working it from there. It occurs to me though that the superblocks would be in the wrong place for the new drives, and I'm wondering if the kernel or mdadm might not find them.

Ideas? Suggestions?

Brad
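The superblock worry is well-founded for the version 0.90 metadata these arrays almost certainly use: md stores the superblock 64 KiB back from the end of the device, rounded down to a 64 KiB boundary, so a plain dd onto a bigger disk leaves it stranded at the old offset, mid-disk, where assembly-time scanning of the new device's end won't look. A sketch of the offset arithmetic (the 250GB/500GB sizes are illustrative round numbers, not exact drive capacities):

```shell
#!/bin/sh
# Where a v0.90 md superblock lives on a device of a given size in bytes:
# round the size down to a 64 KiB boundary, then back off one 64 KiB chunk.
sb_offset() {
    CHUNK=65536
    echo $(( ($1 / CHUNK) * CHUNK - CHUNK ))
}

OLD=$((250 * 1000 * 1000 * 1000))   # illustrative 250GB drive
NEW=$((500 * 1000 * 1000 * 1000))   # illustrative 500GB drive

echo "old drive's superblock offset: $(sb_offset $OLD)"
echo "new drive would be scanned at: $(sb_offset $NEW)"
# After dd, the copy still carries its superblock at the OLD offset,
# roughly half-way down the NEW drive.
```

One common workaround is to restrict the usable size (so md sees a device the same size as the old one) until all members are swapped, but whether that is practical depends on how the devices are presented; treat this sketch as the "why", not a recipe.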
Re: Swapping out for larger disks
David Greaves wrote:
>> I was more wondering about the feasibility of using dd to copy the drive contents to the larger drives (then I could do 5 at a time) and working it from there.
> Err, if you can dd the drives, why can't you create a new array and use xfsdump or equivalent? Is downtime due to copying that bad?

I can only do 5 at a time (10 slots: 5 source, 5 destination). I'm not worried about the downtime so much as the constant swapping of disks. This way I can do it in 2 or 3 batches at most.

Brad
Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]
Mikael Pettersson wrote:
>>> I don't think sata_promise is the guilty party here. Looks like some layer above sata_promise got confused about the state of the interface.
>> But locking up hard after hardreset is a problem of sata_promise, no?
> Maybe, maybe not. The original report doesn't specify where/how the machine hung.

It hangs in the process of trying to power it off. Unmount everything and halt the machine. I've tried halt with and without the -h. With the -h you can hear the drives spin down, then it tries to spin them up again and hangs. Without the -h it just hangs hard where you see in the photo.

> Brad: can you enable sysrq and check if the kernel responds to sysrq when it appears to hang, and if so, where it's executing?

All my kernels have sysrq enabled. Once the hard reset is displayed on the screen, everything locks.

> sata_promise just passes sata_std_hardreset to ata_do_eh. I've certainly seen EH hardresets work before, so I'm assuming that something in this particular situation (PHY offlined, kernel close to shutting down) breaks things.

That is my thought. I thought on a .22-rc kernel, if I used halt -h and it spun the disks down, the kernel would detect that and not try to flush the caches on them - or have I read something incorrectly? FWIW, I'm seeing SCSI layer accesses (cache flushes) after things like rmmod sata_promise. They error out and don't seem to cause any harm, but the fact that they occur at all makes me nervous.

Brad
Re: Software based SATA RAID-5 expandable arrays?
greenjelly wrote:
> The options I seek are to be able to start with a 6-drive RAID-5 array, then as my demand for more space increases in the future I want to be able to plug in more drives and incorporate them into the array without the need to back up the data. Basically I need the software to add the drive/drives to the array, then rebuild the array incorporating the new drives while preserving the data on the original array.

I've got 2 boxes. One has 14 drives and a 480W PSU, and the other has 15 drives and a 600W PSU. It's not rocket science. Put a lot of drives in a box, make sure you have enough SATA ports and power to go around (watch your peak 12V consumption on spin-up really) and use Linux md. Easy. Oh, but make sure the drives stay cool!

For a cheap-o home server (which is what I have) I'd certainly not bother with a dedicated RAID card. You are not even going to need gigabit ethernet really. I've got 15 drives on a single PCI bus; it's as slow as a wet week in May (in the southern hemisphere), but I'm streaming to 3 head units which total a combined 5MB/s if I'm lucky. Rebuilds can take up to 10 hours though.

> QUESTIONS: Since this is a media server, and would only be used to serve movies and video to my two machines, it wouldn't have to be powered up full time (my music consumes less space and will be contained on two separate machines). Is there a way to considerably lower the power consumption of this server the 90% of the time it's not in use?

Yes: don't poll for SMART, and spin down the drives when idle (man hdparm). Use S3 sleep and WOL if you are really clever. (I'm not; my boxes live in a dedicated server room with its own AC, but that's because I'm nuts.) I also have over 25k hours on the drives because I don't spin them down. I figure the extra power is a trade-off for drive life. They've got less than 50 spin cycles on them in over 25k hours.

> Can Linux support drive arrays of significant sizes (4-8 terabytes)?

Yes, easily (6TB here).

> Can Linux software RAID-5 support expandability, allowing me to increase the number of disks in the array without the need to back up the media, recreate the array from scratch and then copy the backup to the machine (something I will be unable to do)?

Yes, but get a cheap UPS at least (it's cheap insurance).

> I know this is a Linux forum, but I figure many of you guys work with Windows Server. If so, does Windows 2003 provide the same support for the requested requirements above?

Why would you even _ask_?? Read the man page for mdadm, then read it again (and a third time). Then google for "raid-5 two drive failure linux" just to familiarise yourself with the background. What you are doing has been done before many, many times. There are some well-written sites out there relating to building exactly what you want to build, in great detail. If you are serious about using Windows, I pity you. Linux (actually a combination of the kernel md layer and mdadm) makes it so easy you'd be nuts to beat your head against the wall with the alternative.

Brad
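The grow-in-place answer above boils down to two mdadm invocations plus a filesystem resize once the reshape finishes. A dry-run sketch (device names and the ext3/resize2fs choice are illustrative; `run` echoes each command rather than executing it, so nothing here touches a real array):

```shell
#!/bin/sh
# Dry-run of expanding a 6-disk raid5 to 7 disks in place.
run() { echo "$@"; }   # replace 'echo' with real execution when ready

run mdadm /dev/md0 --add /dev/sdg            # new disk joins as a spare
run mdadm --grow /dev/md0 --raid-devices=7   # reshape onto all 7 members
# The reshape runs in the background; watch /proc/mdstat. Once it
# completes, grow the filesystem to use the new space, e.g. for ext3:
run resize2fs /dev/md0
```

Note the reshape takes hours and a power loss mid-reshape is the dangerous window, which is why the "get a cheap UPS" advice above is not optional.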
Re: Software based SATA RAID-5 expandable arrays?
jahammonds prost wrote:
>> From: Brad Campbell [EMAIL PROTECTED]
>> I've got 2 boxes. One has 14 drives and a 480W PSU and the other has 15 drives and a 600W PSU. It's not rocket science.
> Where did you find reasonably priced cases to hold so many drives? Each of my home servers tops out at 8 data drives, plus a 20GB one to boot from.

For one of them I used a modified CD duplicator case (nine 5.25" bays), and for the other one I used a nice tall tower. All except 4 drives are in Supermicro hotswap bays. Aside from the Supermicro bays (which do look nice and keep the drives very cool), these machines are chewing gum and duct tape jobs.

http://i10.photobucket.com/albums/a109/ytixelprep/F.jpg

Having said that, they are chewing gum and duct tape jobs that have had a downtime of less than 4 hrs/year over the last 3 years.

Brad
Re: Linux Software RAID is really RAID?
Johny Mail list wrote:
> Hello list, I have a little question about software RAID on Linux. I have installed software RAID on all my DELL SC1425 servers, believing that md raid was a strong driver. Recently I ran some tests on a server to check that it survives a hard drive power failure: I powered up the server and, once it had booted to the prompt, I disconnected the power cable of one SATA hard drive. Normally md should eliminate the failed hard drive from the logical drive it built, and the server should continue to work as if nothing happened. Oddly, the server stopped responding and I got these messages:
>   ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>   ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
>            res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>   ata4: port is slow to respond, please be patient (Status 0xd0)
>   ata4: port failed to respond (30sec, Status 0xd0)
>   ata4: soft resetting port
> After that my system is frozen. Normally in a basic RAID the device is disabled in the logical RAID device (md0) and it just uses the remaining disk.

cc to linux-ide added.

Unfortunately this is not an artifact of the Linux raid driver; rather, it appears to be an issue with the SATA driver and its error recovery. Some information about the kernel, configuration, drives, controller cards and other relevant system details would be good. See this URL for the sort of extra information that would be handy:

http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html

Regards,
Brad