Re: mdadm create to existing raid5
Guy Watkins wrote:
} [EMAIL PROTECTED] On Behalf Of Jon Collette
} } I wasn't thinking and did a mdadm --create to my existing raid5 instead
} } of --assemble. The syncing process ran and now it's not mountable. Is
} } there any way to recover from this?
}
} Maybe. Not really sure. But don't do anything until someone that really
} knows answers!

I agree - yes, maybe. What I think...

} If you did a create with the exact same parameters the data should not
} have changed. But you can't mount, so you must have used different
} parameters.

I'd agree.

} Only 1 disk was written to during the create.

Yep.

} Only that disk was changed.

Yep.

} If you remove the 1 disk and do another create with the original
} parameters and put missing for the 1 disk, your array will be back to
} normal, but degraded. Once you confirm this you can add back the 1 disk.

Yep.

**WARNING** **WARNING** **WARNING**
At this point you are relatively safe (!) but as soon as you do an 'add' and initiate another resync, then if you got it wrong you will have toasted your data completely!!!
**WARNING** **WARNING** **WARNING**

You must be able to determine which disk was written to. I don't know how to do that unless you have the output from mdadm -D during the create/syncing.

Do you know the *exact* command you issued when you did the initial --create?
Do you know the *exact* command you issued when you did the bogus --create?
And what version of mdadm are you using?

Neil said that it's mdadm, not the kernel, that determines which device is initially degraded during a create. We can look at the code and your command line and guess which device mdadm chose. (Getting this wrong won't matter, but it may make recovery quicker.)

Assuming you have a 4-device raid using /dev/sda1, /dev/sdb1, /dev/sdc1, /dev/sdd1, you'll then do something like:

  mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 missing
  (try a mount)
  mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 missing /dev/sdc1 /dev/sdb1
  (try a mount)
  mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sdb1 /dev/sda1 /dev/sdc1 missing
  (try a mount)
  mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sdc1 /dev/sdb1 /dev/sda1 missing
  (try a mount)

etc., etc. So you'll still need to do a trial-and-error assemble. For a simple 4-device array there are 24 permutations - doable by hand; if you have 5 devices then it's 120, 6 is 720 - getting tricky ;)

I'm bored, so I'm going to write a script based on something like this:
http://www.unix.org.ua/orelly/perl/cookbook/ch04_20.htm
Feel free to beat me to it...

The critical thing is that you *must* use 'missing' when doing these trial --create calls.

If we've not explained something very well and you don't understand, then please ask before trying it out...

David
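Something along the lines of this minimal sketch would do it, assuming a 4-device RAID-5 on /dev/sd[a-d]1, a scratch mount point, and mdadm's --run flag to skip the confirmation prompt - all of those are illustrative, and every trial mount is read-only so the attempts stay non-destructive:

  #!/bin/bash
  # Try candidate device orders, one slot 'missing' at a time, and report
  # which layout yields a mountable filesystem.  Names are illustrative.
  MD=/dev/md0
  MNT=/mnt/test

  try_order() {
      mdadm --stop $MD 2>/dev/null
      # 'missing' keeps the array degraded, so no resync is triggered;
      # --run answers mdadm's "really create?" prompt for us
      mdadm --create $MD --run --level=5 --raid-devices=4 "$@" || return
      if mount -o ro $MD $MNT 2>/dev/null; then
          echo "mountable with order: $*"
          umount $MNT
      fi
  }

  # three of the 4 x 24 = 96 combinations; generate the rest the same way
  try_order /dev/sda1 /dev/sdb1 /dev/sdc1 missing
  try_order /dev/sda1 /dev/sdb1 missing /dev/sdd1
  try_order missing /dev/sdb1 /dev/sdc1 /dev/sdd1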
Re: mdadm create to existing raid5
David Greaves wrote:
> For a simple 4 device array there are 24 permutations - doable by hand;
> if you have 5 devices then it's 120, 6 is 720 - getting tricky ;)

Oh, wait - for 4 devices there are 24 permutations, and you need to do it 4 times, substituting 'missing' for each device - so 96 trials. 4320 trials for a 6-device array. Hmm. I've got a 7-device RAID-6 - I think I'll go and make a note of how it's put together... <grin>

Have a look at this section and the linked script. I can't test it until later:
http://linux-raid.osdl.org/index.php/RAID_Recovery
http://linux-raid.osdl.org/index.php/Permute_array.pl

David
Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Guy Watkins wrote:
} -----Original Message-----
} From: [EMAIL PROTECTED] [mailto:linux-raid-[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
} Sent: Thursday, July 12, 2007 1:35 PM
} To: [EMAIL PROTECTED]
} Cc: Tejun Heo; [EMAIL PROTECTED]; Stefan Bader; Phillip Susi; device-mapper
} development; [EMAIL PROTECTED]; [EMAIL PROTECTED];
} linux-raid@vger.kernel.org; Jens Axboe; David Chinner; Andreas Dilger
} Subject: Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for
} devices, filesystems, and dm/md.
}
} On Wed, 11 Jul 2007 18:44:21 EDT, Ric Wheeler said:
} } [EMAIL PROTECTED] wrote:
} } } On Tue, 10 Jul 2007 14:39:41 EDT, Ric Wheeler said:
} } } } All of the high end arrays have non-volatile cache (read: on power
} } } } loss, it is a promise that it will get all of your data out to
} } } } permanent storage). You don't need to ask this kind of array to drain
} } } } the cache. In fact, it might just ignore you if you send it that kind
} } } } of request ;-)
} } }
} } } OK, I'll bite - how does the kernel know whether the other end of that
} } } fiberchannel cable is attached to a DMX-3 or to some no-name product
} } } that may not have the same assurances? Is there an "I'm a high-end
} } } array" bit in the sense data that I'm unaware of?
} }
} } There are ways to query devices (think of hdparm -I in S-ATA/P-ATA
} } drives; SCSI has similar queries) to see what kind of device you are
} } talking to. I am not sure it is worth the trouble to do any automatic
} } detection/handling of this.
} }
} } In this specific case, it is more a case of: when you attach a high end
} } (or mid-tier) device to a server, you should configure it without
} } barriers for its exported LUNs.
}
} I don't have a problem with the sysadmin *telling* the system the other
} end of that fiber cable has characteristics X, Y and Z. What worried me
} was that it looked like conflating "device reported writeback cache" with
} "device actually has enough battery/hamster/whatever backup to flush
} everything on a power loss". (My back-of-envelope calculation shows for a
} worst-case of needing a 1ms seek for each 4K block, a 1G cache can take up
} to 4 1/2 minutes to sync. That's a lot of battery..)
}
} Most hardware RAID devices I know of use the battery to save the cache
} while the power is off. When the power is restored it flushes the cache
} to disk. If the power failure lasts longer than the batteries then the
} cache data is lost, but the batteries last 24+ hours I believe.

Most mid-range and high end arrays actually use that battery to insure that data is all written out to permanent media when the power is lost. I won't go into how that is done, but it clearly would not be a safe assumption to assume that your power outage is only going to be a certain length of time (and if not, you would lose data).

} A big EMC array we had had enough battery power to power about 400 disks
} while the 16 Gig of cache was flushed. I think EMC told me the batteries
} would last about 20 minutes. I don't recall if the array was usable
} during the 20 minutes. We never tested a power failure.
}
} Guy

I worked on the team that designed that big array. At one point, we had an array on loan to a partner who tried to put it in a very small data center. A few weeks later, they brought in an electrician who needed to run more power into the center. It was pretty funny - he tried to find a power button to turn it off and then just walked over and dropped power trying to get the Symm to turn off.
When that didn't work, he was really, really confused ;-)

ric
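As an aside, per-filesystem barrier control in kernels of this era looks roughly like the following - the option names differ between filesystems, so treat this as a sketch and check your mount(8) and filesystem docs before relying on it:

  # XFS enables barriers by default; disable them for a LUN with NV write cache
  mount -o nobarrier /dev/sdb1 /mnt/array
  # ext3 takes an explicit flag instead (0 = off, 1 = on)
  mount -o barrier=0 /dev/sdc1 /mnt/array2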
RE: Software based SATA RAID-5 expandable arrays?
To run it manually:

  echo check > /sys/block/md0/md/sync_action

then you can check the status with:

  cat /proc/mdstat

Or to continually watch it, if you want (kind of boring though :) ):

  watch cat /proc/mdstat

This will refresh every 2 sec.

In my original email I suggested using a crontab so you don't need to remember to do this every once in a while. Run (I did this as root):

  crontab -e

This will allow you to edit your crontab. Now paste this command in there:

  30 2 * * Mon echo check > /sys/block/md0/md/sync_action

If you want you can add comments. I like to comment my stuff since I have lots of stuff in mine; just make sure you have '#' at the front of those lines so your system knows it is just a comment and not a command it should run:

  #check for bad blocks once a week (every Mon at 2:30am)
  #if bad blocks are found, they are corrected from parity information

After you have put this in your crontab, write and quit with this command:

  :wq

It should come back with this:

  [EMAIL PROTECTED] ~]# crontab -e
  crontab: installing new crontab

Now you can look at your cron table (without editing) with this:

  crontab -l

It should return something like this, depending on whether you added comments or how you scheduled your command:

  #check for bad blocks once a week (every Mon at 2:30am)
  #if bad blocks are found, they are corrected from parity information
  30 2 * * Mon echo check > /sys/block/md0/md/sync_action

For more info on crontab and syntax for times (I just did a google and grabbed the first couple links...):
http://www.tech-geeks.org/contrib/mdrone/croncrontab-howto.htm
http://ubuntuforums.org/showthread.php?t=102626&highlight=cron

Cheers,
Dan.

-----Original Message-----
From: Michael [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 12, 2007 5:43 PM
To: Bill Davidsen; Daniel Korstad
Cc: linux-raid@vger.kernel.org
Subject: Re: Software based SATA RAID-5 expandable arrays?

SuSe uses its own version of cron which is different than everything else I have seen, and the documentation is horrible. However, they provide a wonderful X Windows utility that helps set them up... The problem I'm having is figuring out what to run. When I try to run /sys/block/md0/md/sync_action under a prompt it shoots out a "permission denied" even though I am su or logged in as root. Very annoying.

You mention check vs. repair... which brings me to my last issue in setting up this machine. How do you send an email when check, SMART, or a RAID drive fails? How do you auto repair if the check fails? These are the last things I need to do for my Linux server to work right... After I get all of this done, I will change the boot to go to the command prompt and not X Windows, and I will leave it in the corner of my room, hopefully not to be touched for as long as possible.

----- Original Message ----
From: Bill Davidsen [EMAIL PROTECTED]
To: Daniel Korstad [EMAIL PROTECTED]
Cc: Michael [EMAIL PROTECTED]; linux-raid@vger.kernel.org
Sent: Wednesday, July 11, 2007 10:21:42 AM
Subject: Re: Software based SATA RAID-5 expandable arrays?

Daniel Korstad wrote:

You have lots of options. This will be a lengthy response and will give just some ideas for just some of the options...

Just a few thoughts below interspersed with your comments.

For my server, I had started out with a single drive. I later migrated to a RAID 1 mirror (after having to deal with reinstalls after drive failures, I wised up). Since I already had an OS that I wanted to keep, my RAID-1 setup was a bit more involved.
I followed this migration guide to get me there:
http://wiki.clug.org.za/wiki/RAID-1_in_a_hurry_with_grub_and_mdadm

Since you are starting from scratch, it should be easier for you. Most distros will have an installer that will guide you through the process. When you get to hard drive partitioning, look for an "advanced option" or "review and modify partition layout" option or something similar; otherwise it might just make a guess at what you want, and that would not be RAID.

In this advanced partition setup, you will be able to create your RAID. First you make equal size partitions on both physical drives. For example, first carve out a 100M partition on each of the two physical OS drives, then make a RAID-1 md0 from those two partitions, and then make this your /boot. Do this again for the other partitions you want RAIDed.

You can do this for /boot, /var, /home, /tmp, /usr. It can be nice to have the separation in case a user fills /home/foo with crap - this will not affect other parts of the OS - or if the mail spool fills up, it will not hang the OS. The only problem is determining how big to make them during the install.

At a minimum, I would do three partitions: /boot, swap, and /. This means all the others (/var, /home, /tmp, /usr) are in the / partition, but this way you don't have to worry about sizing them all correctly.

For the simplest setup, I would do RAID 1 for
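On Michael's quoted question about getting mail when a check or a drive fails: a minimal sketch, assuming mdadm's monitor mode plus smartd with a working local mailer - the address is illustrative:

  # md events (Fail, DegradedArray, ...) mailed as they happen
  mdadm --monitor --scan --daemonise --mail=admin@example.com --delay=1800

  # in /etc/smartd.conf: scan all disks, run health checks, mail on trouble
  DEVICESCAN -H -m admin@example.com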
[GIT PULL] ioat fixes, raid5 acceleration, and the async_tx api
Linus, please pull from

  git://lost.foo-projects.org/~dwillia2/git/iop ioat-md-accel-for-linus

to receive:

1/ I/OAT performance tweaks and simple fixups. These patches have been in -mm for a few kernel releases as git-ioat.patch

2/ RAID5 acceleration and the async_tx api. These patches have also been in -mm for a few kernel releases as git-md-accel.patch. In addition, they have received field testing as a part of the -iop kernel released via SourceForge[1] since 2.6.18-rc6.

The raid acceleration work can further be subdivided into three logical areas:

- API -
The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources.

- Implementation -
When the raid acceleration work was proposed, Neil laid out the following attack plan:
1/ move the xor and copy operations outside spin_lock(&sh->lock)
2/ find/implement an asynchronous offload api

The raid5_run_ops routine uses the asynchronous offload api (async_tx) and the stripe_operations member of a stripe_head to carry out xor and copy operations asynchronously, outside the lock.

- Driver -
The Intel(R) Xscale IOP series of I/O processors integrate an Xscale core with raid acceleration engines. The iop-adma driver supports the copy and xor capabilities of the 3 IOP architectures iop32x, iop33x, and iop34x.

All the MD changes have been acked-by Neil Brown. For the changes made to net/ I have received David Miller's acked-by. Shannon Nelson has tested the I/OAT changes (due to async_tx support) in his environment and has added his signed-off-by. Herbert Xu has agreed to let the async_tx api be housed under crypto/ with the intent to coordinate efforts as support for transforms like crc32c and raid6-p+q are developed.

To be clear, Shannon Nelson is the I/OAT maintainer, but we agreed that I should coordinate this release to simplify the merge process. Going forward I will be the iop-adma maintainer. For the common bits, dmaengine core and the async_tx api, Shannon and I will coordinate as co-maintainers.

- Credits -
I cannot thank Neil Brown enough for his advice and patience as this code was developed. Jeff Garzik is credited with helping the dmaengine core and async_tx become sane apis. You are credited with the general premise that users of an asynchronous offload engine api should not know or care if an operation is carried out asynchronously or synchronously in software. Andrew Morton is credited with corralling these conflicting git trees in -mm and more importantly imparting encouragement at OLS 2006. Per Andrew's request the md-accel changelogs were fleshed out and the patch set was posted for a final review a few weeks ago[2]. To my knowledge there are no pending review items.

This tree is based on 2.6.22.
Thank you,
Dan

[1] http://sourceforge.net/projects/xscaleiop
[2] http://marc.info/?l=linux-raid&w=2&r=1&s=md-accel&q=b

Andrew Morton (1):
      I/OAT: warning fix

Chris Leech (5):
      ioatdma: Push pending transactions to hardware more frequently
      ioatdma: Remove the wrappers around read(bwl)/write(bwl) in ioatdma
      ioatdma: Remove the use of writeq from the ioatdma driver
      I/OAT: Add documentation for the tcp_dma_copybreak sysctl
      I/OAT: Only offload copies for TCP when there will be a context switch

Dan Aloni (1):
      I/OAT: fix I/OAT for kexec

Dan Williams (20):
      dmaengine: refactor dmaengine around dma_async_tx_descriptor
      dmaengine: make clients responsible for managing channels
      xor: make 'xor_blocks' a library routine for use with async_tx
      async_tx: add the async_tx api
      raid5: refactor handle_stripe5 and handle_stripe6 (v3)
      raid5: replace custom debug PRINTKs with standard pr_debug
      md: raid5_run_ops - run stripe operations outside sh->lock
      md: common infrastructure for running operations with raid5_run_ops
      md: handle_stripe5 - add request/completion logic for async write ops
      md: handle_stripe5 - add request/completion logic for async compute ops
      md: handle_stripe5 - add request/completion logic for async check ops
      md: handle_stripe5 - add request/completion logic for async read ops
      md: handle_stripe5 - add request/completion logic for async expand ops
      md: handle_stripe5 - request io processing in raid5_run_ops
      md: remove raid5 compute_block and compute_parity5
      dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
      iop13xx: surface the iop13xx adma units to the iop-adma driver
      iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver
      ARM: Add drivers/dma to arch/arm/Kconfig
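A quick way to see whether one of these offload drivers actually bound on a given box, assuming the 2.6.22-era dmaengine sysfs attributes (memcpy_count, bytes_transferred) are present in your build - treat the exact paths as an assumption:

  # channels registered by ioatdma / iop-adma show up under the dma class
  ls /sys/class/dma/
  # per-channel offload counters, per the dmaengine core of this era
  cat /sys/class/dma/dma0chan0/memcpy_count
  cat /sys/class/dma/dma0chan0/bytes_transferred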
Re: Software based SATA RAID-5 expandable arrays?
Michael wrote:

RESPONSE
I had everything working, but it is evident that when I installed SuSe the first time, check and repair were not included in the package :( I did not use the ">", I used ">>", as was incorrectly stated in many of the documents I set up from.

Doesn't matter - either will work, and most people just use ">".

The thing that made me suspect check and repair weren't part of SuSe was that typing check or repair at the command prompt responded with nothing other than a message that there was no such command. In addition, man check and man repair were also missing.

One more time: check and repair are not commands, they are character strings! You are using the echo command to write those strings into the control interface in the sysfs area. If you type exactly what people have sent you, it will work.

BROKEN!
I did an auto update of the SuSe machine, which ended up replacing the kernel. They added the new entries to the boot choices, but the mount information was not transferred. SuSe also deleted the original kernel boot setup. When SuSe looked at the drives individually, it found that none of them was recognizable. Therefore, when I woke up this morning and rebooted the machine after the update, I received the errors and it dumped me to a basic prompt with limited ability to do anything. I know I need to manually remount the drives, but it's going to be a challenge since I did not do this in the past. The answer to this question is that I either have to change distros (which I am tempted to do) or fix the current distro. Please do not bother providing any solutions, for I simply have to RTFM (which I haven't had time to do).

I think I am going to re-set up my machines: the first two drives with identical boot partitions, yet not mirrored. I can then manually run a tree copy that updates my second drive as I grow the system, after successful and needed updates. This would then allow me a fallback after any updates, by simply swapping SATA drive cables from the first boot drive to the second. I am assuming this will work. I then can RAID-6 (or 5) in the setup and recopy my files (yes, I haven't deleted them, because I am not confident in my ability with Linux yet). Hopefully I will just simply remount these 4 drives, because they're a simple raid 5 array.

SUSE's COMPLETE FAILURES
This frustration with SuSe - the lack of a simple reliable update utility and the failures I experienced - has discouraged me from using SuSe at all. It's got some amazing tools that keep me from constantly looking up documentation, posting to forums, or going to IRC, but the unreliable upgrade process is a deal breaker for me. It's simply too much work to manually update everything. This project had a simple goal, which was to provide an easy and cheap solution for an unlimited NAS service.

SUPPORT
In addition, SuSe's IRC help channel is among the worst I have encountered. The level of support is often very good, but the level of harassment, flames and simple childish behavior overwhelms almost any attempt at providing any level of support. I have no problem giving back to the community when I learn enough to do so, but I will not be mocked for my inability to understand a new and very in-depth system. In fact, I tend to go to the wonderful Gentoo IRC for my answers. That IRC is amazing, the people patient and encouraging, and the level of knowledge is the best I have experienced. This resource, outside the original incident, has been an amazing resource.
I feel highly confident asking questions about RAID here, because I know you guys are actually RUNNING systems like the one I am attempting to build.

----- Original Message ----
From: Daniel Korstad [EMAIL PROTECTED]
To: big.green.jelly.bean [EMAIL PROTECTED]
Cc: davidsen [EMAIL PROTECTED]; linux-raid linux-raid@vger.kernel.org
Sent: Friday, July 13, 2007 11:22:45 AM
Subject: RE: Software based SATA RAID-5 expandable arrays?

To run it manually:

  echo check > /sys/block/md0/md/sync_action

then you can check the status with:

  cat /proc/mdstat

Or to continually watch it, if you want (kind of boring though :) ):

  watch cat /proc/mdstat

This will refresh every 2 sec.

In my original email I suggested using a crontab so you don't need to remember to do this every once in a while. Run (I did this as root):

  crontab -e

This will allow you to edit your crontab. Now paste this command in there:

  30 2 * * Mon echo check > /sys/block/md0/md/sync_action

If you want you can add comments. I like to comment my stuff since I have lots of stuff in mine; just make sure you have '#' at the front of those lines so your system knows it is just a comment and not a command it should run:

  #check for bad blocks once a week (every Mon at 2:30am)
  #if bad blocks are found, they are corrected from parity information

After you have put this in your crontab, write and quit with this command:

  :wq

It should come back with this:

  [EMAIL PROTECTED] ~]# crontab -e
RE: Software based SATA RAID-5 expandable arrays?
I can't speak for SuSe issues, but I believe there is some confusion on the packages and command syntax. So hang on, we are going for a ride, step by step...

check and repair are not packages per se. You should have a package called echo. If you run this:

  echo 1

you should get a 1 echoed back at you. For example:

  [EMAIL PROTECTED] echo 1
  1

Or anything else you want:

  [EMAIL PROTECTED] echo check
  check

Now all we are doing with this is redirecting with the ">" to another location, /sys/block/md0/md/sync_action

The difference between a double ">>" and a single ">" is that ">>" will append to the end, and the single ">" will replace the contents of the file with the value. For example, I will create a file called foo:

  [EMAIL PROTECTED] tmp]# vi foo

In this file I add two lines of text, foo, then I write and quit with :wq

Now I will take a look at the file I just made with my vi editor...

  [EMAIL PROTECTED] tmp]# cat foo
  foo
  foo

Great, now I run my echo command to send another value to it. First I use the double ">>" to just append:

  [EMAIL PROTECTED] tmp]# echo foo2 >> foo

Now I take another look at the file:

  [EMAIL PROTECTED] tmp]# cat foo
  foo
  foo
  foo2

So, I have my first two text lines with the third line foo2 appended. Now I do this again but use just the single ">" to replace the file with a value:

  [EMAIL PROTECTED] tmp]# echo foo3 > foo

Then I look at it again:

  [EMAIL PROTECTED] tmp]# cat foo
  foo3

Ahh, all the other lines are gone and now I just have foo3. So ">" replaces and ">>" appends.

How does this affect your /sys/block/md0/md/sync_action file? As it turns out, it does not matter. Think of /proc and /sys as pseudo file systems: real-time, memory-resident file systems that track the processes running on your machine and the state of your system.

So first let's go to /sys/block/ and list its contents:

  [EMAIL PROTECTED] ~]# cd /sys/block/
  [EMAIL PROTECTED] block]# ls
  dm-0  dm-3  hda  md1  ram0   ram11  ram14  ram3  ram6  ram9  sdc  sdf  sdi
  dm-1  dm-4  hdc  md2  ram1   ram12  ram15  ram4  ram7  sda   sdd  sdg
  dm-2  dm-5  md0  md3  ram10  ram13  ram2   ram5  ram8  sdb   sde  sdh

This will be different for you since your system will have different hardware and settings - again, a pseudo file system. The dm stuff is my logical volumes, and you might have more or fewer sata drives. The sda, sdb, ... entries were created when I booted the system; if I add another sata drive, another sdj will be created automatically for me. Depending on how many raid devices you have (I have four: /boot, swap, /, and my RAID6 data (md0, md1, md2, md3)), they are listed here too.

So let's go into one. My swap RAID, md1, is small, so let's go to that one and test this out:

  [EMAIL PROTECTED] md1]# ls
  dev  holders  md  range  removable  size  slaves  stat  uevent

Let's go deeper:

  [EMAIL PROTECTED] md1]# cd /sys/block/md1/md/
  [EMAIL PROTECTED] md]# ls
  chunk_size      dev-hdc1          mismatch_cnt  rd0         suspend_lo      sync_speed
  component_size  level             new_dev       rd1         sync_action     sync_speed_max
  dev-hda1        metadata_version  raid_disks    suspend_hi  sync_completed  sync_speed_min

Now let's look at sync_action:

  [EMAIL PROTECTED] md]# cat sync_action
  idle

That is the pseudo file that represents the current state of my RAID md1. So let's run that echo command and then check the state of the RAID:

  [EMAIL PROTECTED] md]# echo check > sync_action
  [EMAIL PROTECTED] md]# cat /proc/mdstat
  Personalities : [raid1] [raid6]
  md1 : active raid1 hdc1[1] hda1[0]
        104320 blocks [2/2] [UU]
        [============>.......]  resync = 62.7% (65664/104320) finish=0.0min speed=65664K/sec

So it is in resync state, and if there are bad blocks they will be corrected from parity.
Now once it is done, let's check that sync_action file again:

  [EMAIL PROTECTED] md]# cat sync_action
  idle

Remember we used the single ">" redirect, so we replaced the contents with the text "check" with our echo command. Once it was done with the resync, my system changed the value back to idle.

What about the double ">>"? Well, it appends to the file, but it has the same overall effect...

  [EMAIL PROTECTED] md]# echo check >> sync_action
  [EMAIL PROTECTED] md]# cat /proc/mdstat
  Personalities : [raid1] [raid6]
  md1 : active raid1 hdc1[1] hda1[0]
        104320 blocks [2/2] [UU]
        [=========>..........]  resync = 49.0% (52096/104320) finish=0.0min speed=52096K/sec

When it is done, the value goes back to idle:

  [EMAIL PROTECTED] md]# cat sync_action
  idle

So, ">" or ">>" does not matter here. And the command you need is echo.

Manipulating the pseudo files in /proc is similar. Say, for example, for security I don't want my box to respond to pings (1 is for true and 0 is for false):

  echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all

In this case, you want the single ">" because you want to replace the current value with 1, not ">>" for append. Also, another pseudo file for turning your linux box into a
Re: mdadm create to existing raid5
The mdadm --create with missing instead of a drive is a good idea. Do you actually say missing, or just leave out a drive? However, doesn't it do a sync every time you create? So wouldn't you run the risk of corrupting another drive each time? Or does it not sync because of saying missing? Too bad I am intent on learning things the hard way.

/etc/mdadm.conf from before I recreated:

  ARRAY /dev/md2 level=raid5 num-devices=4 spares=1 UUID=4f935928:2b7a1633:71d575d6:dab4d6bc

/etc/mdadm.conf after I recreated:

  ARRAY /dev/md1 level=raid5 num-devices=4 UUID=81bdd737:901c0a8f:af38cb94:41c4e3da

Well, before I heard back from you guys, I noticed this problem and, in my fountain of infinite wisdom, I did mdadm --zero-superblock to all my raid drives and created them again, thinking if I got it to look the same it would just fix it. Well, they do look the same now. I am at work or I would give you the new mdadm.conf. I really need to learn patience :(

David Greaves wrote:
> David Greaves wrote:
> > For a simple 4 device array there are 24 permutations - doable by hand;
> > if you have 5 devices then it's 120, 6 is 720 - getting tricky ;)
>
> Oh, wait - for 4 devices there are 24 permutations, and you need to do it
> 4 times, substituting 'missing' for each device - so 96 trials. 4320
> trials for a 6-device array. Hmm. I've got a 7-device RAID-6 - I think
> I'll go and make a note of how it's put together... <grin>
>
> Have a look at this section and the linked script. I can't test it until later:
> http://linux-raid.osdl.org/index.php/RAID_Recovery
> http://linux-raid.osdl.org/index.php/Permute_array.pl
>
> David
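For the record, 'missing' is a literal word standing in for a device path on the command line, and an array created degraded this way has no spare redundancy to resync onto, so no sync runs. Roughly, with illustrative names:

  mdadm --create /dev/md0 --level=5 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 missing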
Re: 3ware 9650 tips
Wouldn't RAID 6 be slower than RAID 5 because of the extra fault tolerance?

http://www.enterprisenetworksandservers.com/monthly/art.php?1754 - a 20% drop according to this article.

His 500GB WD drives are 7200RPM compared to the Raptors' 10K, so his numbers will be slower. Justin, what file system do you have running on the Raptors? I think that's an interesting point made by Joshua.

Justin Piszcz wrote:
> On Fri, 13 Jul 2007, Joshua Baker-LePain wrote:
> > My new system has a 3ware 9650SE-24M8 controller hooked to 24 500GB WD
> > drives. The controller is set up as a RAID6 w/ a hot spare. OS is
> > CentOS 5 x86_64. It's all running on a couple of Xeon 5130s on a
> > Supermicro X7DBE motherboard w/ 4GB of RAM.
> >
> > Trying to stick with a supported config as much as possible, I need to
> > run ext3. As per usual, though, initial ext3 numbers are less than
> > impressive. Using bonnie++ to get a baseline, I get (after doing
> > 'blockdev --setra 65536' on the device):
> >
> > Write: 136MB/s
> > Read: 384MB/s
> >
> > Proving it's not the hardware, with XFS the numbers look like:
> >
> > Write: 333MB/s
> > Read: 465MB/s
> >
> > How many folks are using these? Any tuning tips?
> >
> > Thanks.
> > --
> > Joshua Baker-LePain
> > Department of Biomedical Engineering
> > Duke University
>
> Let's try that again with the right address :)
>
> You are using HW RAID then? Those numbers seem pretty awful for that
> setup. Including linux-raid@ - even though it appears you're running HW
> raid, this is rather peculiar. To give you an example, I get 464MB/s
> write and 627MB/s read with a 10 disk raptor software raid5.
>
> Justin.
Re: 3ware 9650 tips
On Fri, 13 Jul 2007, Joshua Baker-LePain wrote:
> My new system has a 3ware 9650SE-24M8 controller hooked to 24 500GB WD
> drives. The controller is set up as a RAID6 w/ a hot spare. OS is CentOS 5
> x86_64. It's all running on a couple of Xeon 5130s on a Supermicro X7DBE
> motherboard w/ 4GB of RAM.
>
> Trying to stick with a supported config as much as possible, I need to run
> ext3. As per usual, though, initial ext3 numbers are less than impressive.
> Using bonnie++ to get a baseline, I get (after doing 'blockdev --setra
> 65536' on the device):
>
> Write: 136MB/s
> Read: 384MB/s
>
> Proving it's not the hardware, with XFS the numbers look like:
>
> Write: 333MB/s
> Read: 465MB/s
>
> How many folks are using these? Any tuning tips?
>
> Thanks.
> --
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University

Let's try that again with the right address :)

You are using HW RAID then? Those numbers seem pretty awful for that setup. Including linux-raid@ - even though it appears you're running HW raid, this is rather peculiar. To give you an example, I get 464MB/s write and 627MB/s read with a 10 disk raptor software raid5.

Justin.
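For tuning, the generic block-layer knobs people usually reach for on big sequential workloads look like the sketch below - the device name and values are illustrative starting points, not recommendations:

  blockdev --setra 65536 /dev/sda                  # device-level readahead
  echo deadline > /sys/block/sda/queue/scheduler   # try another elevator
  echo 512 > /sys/block/sda/queue/nr_requests      # deeper request queue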
Re-building an array
Hi List,

I am very new to raid, and I am having a problem. I made a raid10 array, but I only used 2 disks. Since then, one failed, and my system crashes with a kernel panic. I copied all the data, and I would like to start over. How can I start from scratch? I need to get rid of my /dev/md0, fully test the discs, and build them over again as raid1?

Thanks!
Rick
Re: 3ware 9650 tips
On Fri, 13 Jul 2007 at 2:35pm, Justin Piszcz wrote:
> On Fri, 13 Jul 2007, Joshua Baker-LePain wrote:
> > My new system has a 3ware 9650SE-24M8 controller hooked to 24 500GB WD
> > drives. [...]
>
> You are using HW RAID then? Those numbers seem pretty awful for that
> setup. Including linux-raid@ - even though it appears you're running HW
> raid, this is rather peculiar.

Yep, hardware RAID -- I need the hot swappability (which, AFAIK, is still an issue with md).

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
Re: Re-building an array
On Fri, 13 Jul 2007, mail wrote:
> Hi List,
>
> I am very new to raid, and I am having a problem. I made a raid10 array,
> but I only used 2 disks. Since then, one failed, and my system crashes
> with a kernel panic. I copied all the data, and I would like to start
> over. How can I start from scratch? I need to get rid of my /dev/md0,
> fully test the discs, and build them over again as raid1?
>
> Thanks!
> Rick

man mdadm, check the --zero-superblock option
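Fleshed out, that start-over sequence might look like the sketch below - device names are illustrative, and badblocks -w is destructive, so only run it on drives whose data you have already copied off:

  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sda1 /dev/sdb1
  badblocks -w -s /dev/sda       # destructive full surface test
  badblocks -w -s /dev/sdb
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1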
[-mm PATCH 1/2] raid5: add the stripe_queue object for tracking raid io requests (take2)
The raid5 stripe cache object, struct stripe_head, serves two purposes:
1/ frontend: queuing incoming requests
2/ backend: transitioning requests through the cache state machine to the backing devices

The problem with this model is that queuing decisions are directly tied to cache availability. There is no facility to determine that a request or group of requests 'deserves' usage of the cache and disks at any given time.

This patch separates the object members needed for queuing from the object members used for caching. The stripe_queue object takes over the incoming bio lists as well as the buffer state flags.

The following fields are moved from struct stripe_head to struct stripe_queue:
  raid5_private_data *raid_conf
  int pd_idx
  spinlock_t lock
  int bm_seq

The following fields are moved from struct r5dev to struct r5_queue_dev:
  sector_t sector
  struct bio *toread, *towrite

This patch lays the groundwork for, but does not implement, the facility to have more queue objects in the system than available stripes; currently this remains a 1:1 relationship. In other words, this patch just moves fields around and does not implement new logic.

--- Performance Data ---

Unit information:
  File size = megabytes
  Blk Size  = bytes
  Num Thr   = number of threads
  Avg Rate  = relative throughput
  CPU%      = relative percentage of CPU used during the test
  CPU Eff   = Rate divided by CPU% - relative throughput per cpu load

Configuration:
  Platform: 1200Mhz iop348 with 4-disk sata_vsc array
  mdadm --create /dev/md0 /dev/sd[abcd] -n 4 -l 5
  mkfs.ext2 /dev/md0
  mount /dev/md0 /mnt/raid
  tiobench --size 2048 --numruns 5 --block 4096 --block 131072 --dir /mnt/raid

Sequential Reads
                 File  Blk     Num  Avg    Maximum  CPU
Identifier       Size  Size    Thr  Rate   (CPU%)   Eff
---------------  ----  ------  ---  -----  -------  -----
2.6.22-iop1      2048  4096    1    -1%    2%       -3%
2.6.22-iop1      2048  4096    2    -37%   -34%     -5%
2.6.22-iop1      2048  4096    4    -22%   -19%     -3%
2.6.22-iop1      2048  4096    8    -3%    -3%      -1%
2.6.22-iop1      2048  131072  1    1%     -1%      2%
2.6.22-iop1      2048  131072  2    -11%   -11%     -1%
2.6.22-iop1      2048  131072  4    25%    20%      4%
2.6.22-iop1      2048  131072  8    8%     6%       2%

Sequential Writes
                 File  Blk     Num  Avg    Maximum  CPU
Identifier       Size  Size    Thr  Rate   (CPU%)   Eff
---------------  ----  ------  ---  -----  -------  -----
2.6.22-iop1      2048  4096    1    26%    29%      -2%
2.6.22-iop1      2048  4096    2    40%    43%      -2%
2.6.22-iop1      2048  4096    4    24%    7%       16%
2.6.22-iop1      2048  4096    8    6%     -11%     19%
2.6.22-iop1      2048  131072  1    66%    65%      0%
2.6.22-iop1      2048  131072  2    41%    33%      6%
2.6.22-iop1      2048  131072  4    23%    -8%      34%
2.6.22-iop1      2048  131072  8    13%    -24%     49%

The read numbers in this take have improved from a 14% average decline to a 5% average decline. However, it is still a mystery why any significant variance is showing up at all, because most reads should completely bypass the stripe cache.
New for take3 is blktrace data for a component disk while running the following:

  for i in `seq 1 5`; do dd if=/dev/zero of=/dev/md0 bs=1024k count=1024; done

Pre-patch:
CPU0 (sda):
  Reads Queued:        7965,    31860KiB    Writes Queued:     437458,    1749MiB
  Read Dispatches:      881,    31860KiB    Write Dispatches:   26405,    1749MiB
  Reads Requeued:         0                 Writes Requeued:        0
  Reads Completed:      881,    31860KiB    Writes Completed:   26415,    1749MiB
  Read Merges:         6955,    27820KiB    Write Merges:      411007,    1644MiB
  Read depth:             2                 Write depth:            2
  IO unplugs:           176                 Timer unplugs:        176

Post-patch:
CPU0 (sda):
  Reads Queued:       36255,   145020KiB    Writes Queued:     437727,    1750MiB
  Read Dispatches:     1960,   145020KiB    Write Dispatches:    6672,    1750MiB
  Reads Requeued:         0                 Writes Requeued:        0
  Reads Completed:     1960,   145020KiB    Writes Completed:    6682,    1750MiB
  Read Merges:        34235,   136940KiB    Write Merges:      430409,    1721MiB
  Read depth:             2                 Write depth:            2
  IO unplugs:           423                 Timer unplugs:        423

It looks like the performance win is coming from improved merging and not from reduced reads as previously assumed. Note that with blktrace enabled the throughput comes in at ~98MB/s compared to ~120MB/s without. Pre-patch throughput hovers at ~85MB/s for this dd command.

Changes in take2:
* leave the flags with the buffers, prevents a data corruption issue whereby stale buffer state
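To reproduce a comparison like this, the capture would look roughly like so - the device name is illustrative, and the blktrace/blkparse options are the common ones:

  blktrace -d /dev/sda -o sda &      # writes sda.blktrace.<cpu> files
  for i in `seq 1 5`; do dd if=/dev/zero of=/dev/md0 bs=1024k count=1024; done
  kill %1                            # stop the trace
  blkparse -i sda | tail -20         # summary counters print at the end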
Re: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3
On Fri, 13 Jul 2007 15:35:42 -0700 Dan Williams [EMAIL PROTECTED] wrote:
> The following patches replace the stripe-queue patches currently in -mm.

I have a little practical problem here: am presently unable to compile anything much due to all the git rejects coming out of git-md-accel.patch.

It'd be appreciated if you could keep on top of that, please. It's a common problem at this time of the kernel cycle. The quilt trees are much worse - Greg's stuff is an unholy mess.

Ho hum.
RE: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3
> -----Original Message-----
> From: Andrew Morton [mailto:[EMAIL PROTECTED]]
>
> > The following patches replace the stripe-queue patches currently in -mm.
>
> I have a little practical problem here: am presently unable to compile
> anything much due to all the git rejects coming out of git-md-accel.patch.
> It'd be appreciated if you could keep on top of that, please. It's a
> common problem at this time of the kernel cycle. The quilt trees are much
> worse - Greg's stuff is an unholy mess. Ho hum.

Sorry, please drop git-md-accel.patch and git-ioat.patch as they have been merged into Linus' tree.
Re: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3
On Fri, 13 Jul 2007 15:57:26 -0700 Williams, Dan J [EMAIL PROTECTED] wrote:
> Sorry, please drop git-md-accel.patch and git-ioat.patch as they have
> been merged into Linus' tree.

But your ongoing maintenance activity will continue to be held in those trees, won't it?
RE: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3
> -----Original Message-----
> From: Andrew Morton [mailto:[EMAIL PROTECTED]]
>
> But your ongoing maintenance activity will continue to be held in those
> trees, won't it?

For now:

  git://lost.foo-projects.org/~dwillia2/git/iop ioat-md-accel-for-linus

is where the latest combined tree is located. However, Shannon Nelson is coming online to own the i/oat driver, so we may need to revisit this situation. We want to avoid the git-ioat/git-md-accel collisions that happened in the past. I will talk with Shannon about how we will coordinate this going forward. The code ownership looks like this:

  ioat dma driver                - Shannon
  net dma offload implementation - Shannon
  dmaengine core                 - shared
  async_tx api                   - shared
  iop-adma dma driver            - Dan
  md-accel implementation        - Dan
Re: 3ware 9650 tips
Joshua Baker-LePain wrote:
> [...]
> Yep, hardware RAID -- I need the hot swappability (which, AFAIK, is
> still an issue with md).

Just out of curiosity - what do you mean by "swappability"? For many years we have been using linux software raid, and we had no problems with swappability of the component drives (in case of drive failures and whatnot). With non-hotswappable drives (old scsi and ide ones), rebooting is needed for the system to recognize the drives. For modern sas/sata drives, I can replace a faulty drive without anyone noticing... Maybe you're referring to something else?

Thanks.

/mjt
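For reference, the usual md hot-swap dance is sketched below; sdc1 is illustrative, and on some controllers you also need to drop the device out of the SCSI layer before pulling it:

  mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1   # kick the bad drive
  # physically swap the drive, partition it to match, then:
  mdadm /dev/md0 --add /dev/sdc1                       # resync starts on its own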
Re: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3
On Fri, 13 Jul 2007 16:28:30 -0700 Williams, Dan J [EMAIL PROTECTED] wrote:
> For now:
>
>   git://lost.foo-projects.org/~dwillia2/git/iop ioat-md-accel-for-linus
>
> is where the latest combined tree is located. [...] The code ownership
> looks like this:
>
>   ioat dma driver                - Shannon
>   net dma offload implementation - Shannon
>   dmaengine core                 - shared
>   async_tx api                   - shared
>   iop-adma dma driver            - Dan
>   md-accel implementation        - Dan

oh my, how scary. I'll go into hiding until the dust has settled. Please send me the git URLs when it's all set up, thanks.
Re: Raid array is not automatically detected.
On Fri, 2007-07-13 at 15:36 -0500, Bryan Christ wrote:
> My apologies if this is not the right place to ask this question.
> Hopefully it is.
>
> I created a RAID5 array with:
>
>   mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>
> mdadm -D /dev/md0 verifies the devices have a persistent superblock, but
> upon reboot /dev/md0 does not get automatically assembled (and hence is
> not an installable/bootable device). I have created several raid1 arrays
> and one raid5 array this way and have never had this problem. In all
> fairness, this is the first time I have used mdadm for the job. Usually,
> I boot to something like SysRescueCD, use raidtools to create my array,
> and then reboot with my Slackware install CD. Anyone know why this might
> be happening?

Are you trying to boot on this raid device? I believe there is a limitation as to what raid type you can boot off of (IIRC, only raid0 and raid1).

--
Zivago Lee [EMAIL PROTECTED]
Re: 3ware 9650 tips
--- Justin Piszcz [EMAIL PROTECTED] wrote:
> To give you an example, I get 464MB/s write and 627MB/s read with a
> 10 disk raptor software raid5.

Is that with the 9650?

Andrew
Re: Raid array is not automatically detected.
I would like for it to be the boot device. I have set up a raid5 mdraid array before and it was automatically accessible as /dev/md0 after every reboot. In this peculiar case, I am having to assemble the array manually before I can access it:

  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Unless I do the above, I cannot access /dev/md0. I've never had this happen before. Usually a cursory glance through dmesg will show that the array was detected, but not so in this case.

Zivago Lee wrote:
> Are you trying to boot on this raid device? I believe there is a
> limitation as to what raid type you can boot off of (IIRC, only raid0
> and raid1).
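Two things worth checking in a case like this, sketched with illustrative paths: whether the array is recorded in mdadm.conf where the boot scripts or initrd can find it, and whether the member partitions are type fd (Linux raid autodetect), which the kernel's own in-kernel autodetection (0.90 superblocks only) needs:

  # record the array for the boot-time assembly scripts
  mdadm --detail --scan >> /etc/mdadm.conf

  # kernel autodetect wants partition type 'fd' on every member
  fdisk -l /dev/sda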