Re: Linux RAID migration
On 8/7/07, saeed bishara [EMAIL PROTECTED] wrote:
> Hi, I'm looking for a method for doing RAID migration while keeping the data available. The migrations I'm interested in are:
> 1. Single drive -> RAID1/RAID5
> 2. RAID1 -> RAID5

1. is a bit complicated, as a raid device on a disk is slightly smaller than the original device. You might need to copy the data manually. 2. should be as simple as (offline) re-creating the raid as raid5.

> - Can I really assume that RAID5 on 2 hdds (degraded mode) will function as raid5?

You should test by using loopback devices and files. But why degraded? A raid5 of two disks should look like raid1.

> - How to build the raid while keeping the contents of an existing drive available?

Normally the array is available while building/rebuilding.

- tuomas
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
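The first migration (single drive -> RAID1) can be sketched as below. This is a hedged outline, not a tested procedure: the device names (/dev/sda1 holding the existing data, /dev/sdb1 the new disk), the filesystem, and the /data mount point are all hypothetical. With RUN unset, the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: migrate a single drive to RAID1 via a degraded array.
# Set RUN=1 to execute for real; by default commands are only printed.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# 1. Create a degraded RAID1 on the new disk only ("missing" slot):
run mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
# 2. Make a filesystem and copy the data over manually (md0 is slightly
#    smaller than the raw partition, hence the copy instead of dd):
run mkfs.ext3 /dev/md0
run mount /dev/md0 /mnt
run cp -a /data/. /mnt/
# 3. Add the original partition; it resyncs while the array stays usable:
run mdadm /dev/md0 -a /dev/sda1
```

The same dry-run pattern works for rehearsing any of the command sequences in this thread.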
Re: raid 1 recovery steps
On 9/24/06, chapman [EMAIL PROTECTED] wrote:
> Can I assume the disk is ok, just needs to be re-added to the array?

Not necessarily. You should look for logs indicating _why_ it was marked bad. If you don't know how long it's been broken, you need some monitoring system, like mdadm or logcheck. This is most commonly caused by bad sectors. If you can re-add it and the resync goes through without complaining, you're probably ok.

> I'm assuming I need to first remove sda1 from the raid then re-add it, correct? If so, what are the specific steps?

mdadm /dev/md0 -r /dev/sda1
mdadm /dev/md0 -a /dev/sda1

> Can this be done safely on a live server without pulling the system down?

Sure.

> How will this affect rebooting once completed - if at all?

It should not have any impact, at least if the boot setup is ok.

> Any gotchas I should look out for?

Things get tricky if the other disk has stealthily gone bad.

- tuomas
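Putting the advice together as one sequence — checking why the member failed before cycling it back in. A sketch only: device names are hypothetical, and the smartctl step is my addition rather than something from the thread. With RUN unset, commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: inspect, then remove and re-add a failed RAID1 member.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

run mdadm --detail /dev/md0    # which member is faulty, array state
run smartctl -a /dev/sda       # reallocated/pending sectors? (assumption)
run dmesg                      # kernel I/O errors around the failure
# If the disk looks sane, cycle it back in; the resync runs live:
run mdadm /dev/md0 -r /dev/sda1
run mdadm /dev/md0 -a /dev/sda1
```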
Re: RAID5 Problem - $1000 reward for help
On 9/15/06, Reza Naima [EMAIL PROTECTED] wrote:
> I've picked up two 500G disks, and am in the process of dd'ing the contents of the raid partitions over. The 2nd failed disk came up just fine, and has been copying the data over without fault. I expect it to finish, but thought I would send this email now. I will include some data that I captured before the system failed.

Note that if there are bad blocks, dd might not be reliable; ddrescue will do a better job of filling the gaps with zeroes (silent data corruption may result). The idea with your problem is to recreate the raid in degraded mode (the replaced drive as missing), something like

mdadm --create /dev/md0 --chunk=256 --layout=left-symmetric --raid-devices=4 /dev/hda3 missing /dev/hdf1 /dev/hdg1

after which, see if the filesystem mounts and is ok; if so, add the remaining drive back to the array. It's recommended to use a script to scrub the raid device regularly, to detect sleeping bad blocks early.
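The whole recovery path can be sketched like this. The mdadm line is the one from the reply above; the ddrescue source/destination devices and log path are hypothetical (the poster's actual drive names aren't known), and nothing here should be run without checking chunk size, layout and device order against your own records first. With RUN unset, commands are only printed.

```shell
#!/bin/sh
# Sketch: copy a failing member with ddrescue, recreate degraded, verify.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# ddrescue turns read errors into zero-filled gaps instead of aborting;
# argument order is: input, output, logfile.
run ddrescue /dev/hde1 /dev/hdf1 /root/hde1.log
# Recreate the array degraded, with the replaced drive as "missing":
run mdadm --create /dev/md0 --chunk=256 --layout=left-symmetric --raid-devices=4 /dev/hda3 missing /dev/hdf1 /dev/hdg1
# Verify read-only before touching anything else:
run fsck -n /dev/md0
run mount -o ro /dev/md0 /mnt
```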
Re: RAID5 Problem - $1000 reward for help
On 9/17/06, Ask Bjørn Hansen [EMAIL PROTECTED] wrote:
>> It's recommended to use a script to scrub the raid device regularly, to detect sleeping bad blocks early.
> What's the best way to do that? dd the full md device to /dev/null?

echo check > /sys/block/md?/md/sync_action

Distros may have cron scripts that do this right. And you need a fairly recent kernel.
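A minimal cron-able scrub script along those lines. A sketch: the SYS variable is parameterized only so the loop can be exercised against a fake tree without real arrays; in production it defaults to /sys.

```shell
#!/bin/sh
# Start a "check" pass on every md array found in sysfs.
# Run from cron, e.g. monthly; needs a reasonably recent kernel.
SYS=${SYS:-/sys}
for action in "$SYS"/block/md*/md/sync_action; do
    [ -e "$action" ] || continue   # skip if no arrays (glob unmatched)
    echo check > "$action"         # full read-and-verify of the array
done
```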
Re: scrub (was: Re: RAID5 Problem - $1000 reward for help)
On 9/17/06, Dexter Filmore [EMAIL PROTECTED] wrote:
>>> It's recommended to use a script to scrub the raid device regularly, to detect sleeping bad blocks early.
>> What's the best way to do that? dd the full md device to /dev/null?
>>
>> echo check > /sys/block/md?/md/sync_action
>>
>> Distros may have cron scripts to do this right. And you need a fairly recent kernel.
> Does this test stress the discs a lot, like a resync? How long does it take? Can I use it on a mounted array?

Yup. Long - think resync. Yup. It practically reads everything, verifies the checksums, and reports bad blocks or inconsistencies.

echo repair > /sys/block/md?/md/sync_action

causes md to fix redundancy blocks if they're out of sync (but at that point you already have another problem, like flaky hardware or so).
Re: access *existing* array from knoppix
>> mdadm --assemble /dev/md0 /dev/hda1 /dev/hdb1 # i think, man mdadm
> Not what I meant: there already exists an array on a file server that was created from the server os, I want to boot that server from knoppix instead and access the array.

That's exactly what --assemble does. It looks at the disks, finds raid components, assembles an array out of them (meaning, it tells the kernel where to find the pieces) and starts it. No? Did you try? Read the manual?
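From a Knoppix shell it can look like this — device names are hypothetical, and --scan is the lazier route that finds the members itself. With RUN unset, commands are only printed.

```shell
#!/bin/sh
# Sketch: assemble a pre-existing array from a live CD, then mount it.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# Either name the members explicitly:
run mdadm --assemble /dev/md0 /dev/hda1 /dev/hdb1
# ...or let mdadm scan all partitions for raid superblocks:
run mdadm --assemble --scan
run mount /dev/md0 /mnt
```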
Re: access *existing* array from knoppix
On 9/14/06, Dexter Filmore [EMAIL PROTECTED] wrote:
> How about you read the rest of the thread, wisecracker?

Sorry. <mailreader-excuse/>
Re: proactive-raid-disk-replacement
On 9/10/06, Bodo Thiesen [EMAIL PROTECTED] wrote:
> So, we need a way to feed back the redundancy from the raid5 to the raid1.
> <snip long explanation>

Sounds awfully complicated to me. Perhaps this is how it works internally, but my 2 cents go to an option to gracefully remove a device (migrating to a spare without losing redundancy) in the kernel (or mdadm). I'm thinking:

mdadm /dev/raid-device -a /dev/new-disk
mdadm /dev/raid-device --graceful-remove /dev/failing-disk

and hopefully also a path to do this instead of kicking (multiple) disks when bad blocks occur.
Re: Please help me save my data
On 9/8/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> So, what I want to do is:
> * Mark the synced spare drive as working and in position 1
> * Assemble the array without the unsynced spare and check if this provides consistent data
> * If it didn't, I want to mark the synced spare as working and in position 3, and try the same thing again
> * When I have it working, I just want to add the unsynced spare and let it sync normally
> * Then I will create a write-intent bitmap to avoid the dangerously long sync times, and also buy a new USB controller hoping that it will solve my problems

You can recreate the raid array with 1 missing disk, like this:

mdadm -C /dev/md1 /dev/sdn1 /dev/sdX1 /dev/sdn1 /dev/sdn1 missing

The ordering is relevant: raid-disks 0,1,2,3,4 or so. Beware, you have to have the block size and symmetry correct, so better back up mdadm --examine and --detail output beforehand. This create op causes no sync (no danger of data overwrites), as there is still one drive missing, but the raid superblocks are rewritten.

(On a sidenote, I'm uncertain whether a bitmap helps in the case of a single-device remove-add cycle? I thought it was only for crashes, at least for now..)

- tuomas
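The "back up --examine and --detail beforehand" step, sketched out. Device names here are invented for illustration (the poster's own were placeholders); the level and raid-devices count in the final command are likewise assumptions to be replaced with the saved values. With RUN unset, commands are only printed.

```shell
#!/bin/sh
# Sketch: save metadata, then recreate a 5-disk raid5 with one missing.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

run mdadm --detail /dev/md1            # array-wide view: order, chunk
for part in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
    run mdadm --examine "$part"        # per-member superblock
done
# Recreate degraded, preserving the original order; no sync is started
# because one slot is missing:
run mdadm -C /dev/md1 --level=5 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 missing
```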
Re: Check/repair on composite RAID
On 9/9/06, Richard Scobie [EMAIL PROTECTED] wrote:
> If I have a RAID 10, comprising a RAID0 /dev/md3 made up of RAID1 /dev/md1 and RAID1 /dev/md2, and I do an:
> echo repair > /sys/block/md3/md/sync_action
> will this run simultaneous repairs on the underlying RAID 1's, or should separate repairs be done to md1 and md2?

check/repair is pointless on raid0, as there is no redundancy. You should run separate checks (repairs) on the underlying devices.
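Acting on that advice means addressing the redundant layers (md1, md2) directly, not the raid0 on top. A sketch; SYS is parameterized only so the loop can be tried against a fake tree, and defaults to /sys.

```shell
#!/bin/sh
# Repair the raid1 devices underneath a raid0-on-raid1 stack.
SYS=${SYS:-/sys}
for md in md1 md2; do
    action="$SYS/block/$md/md/sync_action"
    [ -e "$action" ] || continue
    echo repair > "$action"   # rewrite redundancy where it mismatches
done
```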
Re: Feature Request/Suggestion - Drive Linking
> This way I could get the replacement in and do the resync without actually having to degrade the array first. <snip>
> 2) This sort of brings up a subject I'm getting increasingly paranoid about. It seems to me that if disk 1 develops an unrecoverable error at block 500 and disk 4 develops one at 55,000, I'm going to get a double disk failure as soon as one of the bad blocks is read.

Here's an alternative description. On the first 'unrecoverable' error, the disk is marked as FAILING, which means that a spare is immediately taken into use to replace the failing one. The disk is not kicked, and readable blocks can still be used to rebuild other blocks (from other FAILING disks). The rebuild can be more like a ddrescue-type operation, which is probably a lot faster in the case of raid6, and the disk can be automatically kicked after the sync is done. If there is no read access to the FAILING disk, the rebuild will be faster simply because seeks are avoided on a busy system.

Personally I feel this is a good idea; count my vote in.

- Tuomas
Re: RAID6 fallen apart
> Possibly safer to recreate with two missing if you aren't sure of the order. That way you can look in the array to see if it looks right, or if you have to try a different order.

I'd say it's safer to recreate with all disks, in order to get the resync. Otherwise you risk the all-so-famous silent data corruption on stripes with writes in flight at the time of failure.

Tuomas
Re: RAID6 fallen apart
On 9/3/06, Tuomas Leikola [EMAIL PROTECTED] wrote:
>> Possibly safer to recreate with two missing if you aren't sure of the order. That way you can look in the array to see if it looks right, or if you have to try a different order.
> I'd say it's safer to recreate with all disks, in order to get the resync. Otherwise you risk the all-so-famous silent data corruption on stripes with writes in flight at the time of failure.

Meant to say: after you know the correct order. Sorry.

Tuomas
Re: Resize on dirty array?
On 8/9/06, James Peverill [EMAIL PROTECTED] wrote:
> I'll try the force assemble but it sounds like I'm screwed. It sounds like what happened was that two of my drives developed bad sectors in different places that weren't found until I accessed certain areas (in the case of the first failure) and did the drive rebuild (for the second failure). In the future, is there a way to help prevent this?

This is a common scenario, and I feel it could be helped if md could be told not to drop the disk on the first failure, but rather keep it running in FAILING status (as opposed to FAILED) until all data from it has been evacuated (to a hot spare). This way, if another disk failed during the rebuild, due to another area of the disk, those blocks could be rebuilt using the other failing disk. (Also, this allows the rebuild to mostly be a ddrescue-style copy operation, rather than parity computation.)

Do you guys feel this is feasible? Neil?
Re: trying to brute-force my RAID 5...
On 7/19/06, Sevrin Robstad [EMAIL PROTECTED] wrote:
> I tried file -s /dev/md0 also, and with one of the disks as first disk I got "ext3 filedata (needs journal recovery) (errors)".

Congratulations, you have found your first disk. Does fsck still complain about the magic number?
Re: Two-disk RAID5?
> No. When one of the 2 drives in your RAID5 dies, and all you have for some blocks is parity info, how will the missing data be reconstructed? You could [I suspect] create a 2-disk RAID5 in degraded mode (3rd member missing), but it'll obviously lack redundancy until you add a 3rd disk, which won't add anything to your RAID5 storage capacity.

IMO in a 2-disk raid5, the parity for each block is the same as the data. There is a performance drop, as I suspect md isn't smart enough to read data from both disks, but that's all. When one disk fails, the (lone) parity block is quite enough to reconstruct from. With XOR parity, you can always assume any number of additional disks full of zeroes; it doesn't really change the algorithm.

(Maybe mdadm could/can change a raid1 into raid5 by just changing the superblocks, for the purpose of expanding onto more disks..)

- tuomas
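The zero-disk argument can be checked with plain shell arithmetic — a toy model, nothing md-specific. With one data block per stripe, the XOR parity is the data itself, and XORing in extra all-zero disks changes nothing:

```shell
#!/bin/sh
# Toy check: 2-disk raid5 parity == data, unaffected by all-zero disks.
d=165                    # one data byte (0xA5)
p2=$(( d ))              # 2-disk raid5: parity over {d} is d itself
p3=$(( d ^ 0 ))          # pretend a third, all-zero disk exists
[ "$p2" -eq "$d" ] && [ "$p3" -eq "$d" ] && echo "parity == data"
```

This is why a degraded 2-disk raid5's lone parity block is enough to reconstruct the data, and why the on-disk layout is effectively a mirror.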
Re: Can't mount /dev/md0 after stopping a synchronization
On 4/8/06, Mike Garey [EMAIL PROTECTED] wrote:
> I have one last question though.. When I update /boot/grub/menu.lst while booted from /dev/md0 with both disks available, does this file get written to the MBR on both disks, or do I have to do this manually?

Grub's configuration lives on both mirrors, as it's in the filesystem, not in the MBR. At boot time, grub kind of mounts the filesystem and reads the configuration from there. (Grub doesn't understand the mirror, but it doesn't need to.)
Re: Real Time Mirroring of a NAS
> I'm looking for a way to create a real-time mirror of a NAS. In other words, say I have a 5.5 TB NAS (3ware 16-drive array, RAID-5, 500 GB drives). I want to mirror it in real time to a completely separate 5.5 TB NAS. RSYNCing in the background is not an option. The two NAS boxes need to hold identical data at all times. It is NOT necessary that the data be accessed from both NAS boxes simultaneously. One is simply a backup of the other.
> I don't have any experience with it, but I've often seen DRBD mentioned for just this sort of situation. I'd look into that.

Linux NBD (with md on top) is a simpler solution for the same thing. I'd look into that also :)
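A sketch of the NBD-plus-md variant. The hostname, port and device names are all hypothetical, and the old-style nbd-server/nbd-client invocations should be checked against your version's man pages. With RUN unset, commands are only printed.

```shell
#!/bin/sh
# Sketch: mirror onto a remote box by putting raid1 over an NBD device.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# On the backup NAS: export the storage device over the network.
run nbd-server 2000 /dev/sdb
# On the primary NAS: attach the remote device and mirror onto it.
run nbd-client backup-nas 2000 /dev/nbd0
run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/nbd0
```

Writes then go to both boxes in real time; if the link drops, the nbd0 member is failed out and must be re-added and resynced.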
Re: Can't mount /dev/md0 after stopping a synchronization
On 4/5/06, Mike Garey [EMAIL PROTECTED] wrote:
> I tried booting from /dev/hdc1 (as /dev/md0 in grub) using a 2.6.15 kernel with md and raid1 support built in and this is what I now get:
> md: autodetecting raid arrays
> md: autorun ...
> md: considering hdc1 ...
> md: adding hdc1 ...
> md: created md0
> md: bind:hdc1
> raid1: RAID set md0 active with 1 out of 2 mirrors
> md: ... autorun done.
> Warning: unable to open an initial console
> input: AT Translated Set 2 keyboard as /class/input/input0
> and then at this point, the system just hangs and nothing happens. So I seem to be getting closer.. If I try booting from a kernel without raid1 and md support, but using an initrd with raid1/md modules, then I get the "ALERT! /dev/md0 does not exist. Dropping to a shell!" message. I can't understand why there would be any difference between using a kernel with raid1/md support, or using an initrd image with raid1/md support, but apparently there is. If anyone else has any suggestions, please keep them coming.

Sounds like your initrd could use a command like

mdadm --assemble /dev/md0 /dev/hda1 /dev/hdc1

at some point before mounting the real rootfs. There are many cleaner examples in the list archive, but that should do the trick. It seems your initrd kernel doesn't autostart the raid for some reason (config option?).

Note: you should never do any read/write access to the component disks after creating the raid. I guess you know this already, but some wording seemed suspect.

Can you specify more precisely what the problem is with mounting md0? The log snippet doesn't show any errors about that.
Re: Conflicting Size Numbers -- Bug in mdadm, md, df?
On 3/27/06, andy liebman [EMAIL PROTECTED] wrote:
> Case 1: When we stripe together TWO RAW 3ware RAID-5 devices (i.e., /dev/sdc + /dev/sdd = /dev/md2), df -h tells us that the device is 11 TB in size. df -k tells us that the device is 10741827072 blocks in size, and cat /proc/partitions tells us the md device is 10741958144 blocks in size (a little larger).

This is what you lose in creating a filesystem. df reports the space usable for files; /proc/partitions reports the underlying block device.

> Case 2: When we create a SINGLE partition on each 3ware device using parted, the partitions /dev/sdb1 and /dev/sdc1 are each reported to be 34 blocks smaller than the RAW 3ware devices mentioned above in Case 1.

Partition table + overhead.

> Yet, when we stripe together /dev/sdb1 + /dev/sdc1, we get a Linux md device that is IDENTICAL in size to the Linux md device mentioned above -- 10741958144 blocks. We don't understand why the resulting Linux md device isn't 68 blocks smaller than when we use the raw 3ware device. In the SINGLE partition case, df -h also tells us that the device is 11 TB in size.

I'd suspect the reason is RAID with a 256k chunk size. The resulting block device is rounded down - and 68 blocks isn't that much. Didn't do the math, though.

> Case 3: However, when we use mdadm to stripe together the first partition on each device and also to stripe together the second partition on each device (/dev/sdb1 + /dev/sdc1 = /dev/md1 AND /dev/sdb2 + /dev/sdc2 = /dev/md2), df -h reports that the total size of the two Linux RAID-0 arrays is 0.8 TB LESS than when we stripe together the RAW 3ware devices or when we only have ONE partition.

That seems curious; however, I'd trust the block count in this case. 0.8 TB is a lot.

> And df -k reports that the total block size of the two mdX arrays is 10741694464 blocks, which is 114532 blocks smaller than the size reported for the md device when we have NO partitions, and 132072 blocks smaller than when we have a SINGLE partition.
In addition to chunk-size rounding, you also lose some space to the md superblock and other stuff (bitmap, etc. if you have those). 100k blocks is around what I'd expect. Didn't do the math here either, though.

> We are wondering what these discrepancies mean and whether they could lead to filesystem corruption issues?

Hope I've shed some light on this. They shouldn't cause corruption; the tools simply measure different things. By the way, mdadm 1.x isn't the latest; 2.3.x is.

Tuomas
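The chunk-size rounding is easy to put numbers on — a back-of-the-envelope check with made-up figures (the member size below is invented for illustration, not taken from the poster's arrays):

```shell
#!/bin/sh
# Toy arithmetic: a raid0 member is truncated to a whole number of chunks.
chunk_kib=256                     # 256k chunk, as suspected above
part_kib=5370979200               # hypothetical member size in KiB
usable=$(( part_kib / chunk_kib * chunk_kib ))   # round down to chunks
lost=$(( part_kib - usable ))
echo "usable=$usable lost=$lost"  # at most chunk_kib - 1 KiB lost/member
```

Per member the loss is bounded by one chunk, which is why a few dozen blocks of partition overhead can vanish entirely into the rounding.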
Re: Avoiding resync of RAID1 during creation
On 2/20/06, Bryan Wann [EMAIL PROTECTED] wrote:
>> mdadm --assume-clean
> What version of mdadm was that from? From mdadm(8) in mdadm-1.11.0-4.fc4 on my systems: <cut> I tried with --assume-clean, it still wanted to sync.

The man page I quoted was from 2.3.1 (6 Feb) - relatively new. I tested this with 2 boxes: 1.9.0 starts the resync and 2.3.1 doesn't. Used kernel 2.6.14, although I don't expect that to make much of a difference.

-tuomas
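For reference, the creation command that skips the initial resync under mdadm 2.x looks like this (a sketch; the devices are hypothetical). It is only safe when the members really are identical — e.g. both freshly zeroed — otherwise a later check will report mismatches. With RUN unset the command is only printed.

```shell
#!/bin/sh
# Sketch: create a RAID1 without the initial resync (mdadm 2.x).
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

run mdadm --create /dev/md0 --level=1 --raid-devices=2 --assume-clean /dev/sda1 /dev/sdb1
```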
Re: parallelism of device use in md
On 1/19/06, Neil Brown [EMAIL PROTECTED] wrote:
> The read balancing in raid1 is clunky at best. I've often thought there must be a better way. I've never thought what the better way might be (though I haven't tried very hard). If anyone would like to experiment with the read-balancing code, suggest and test changes, it would be most welcome.

An interesting and desperately complex topic, intertwined with IO schedulers in general. I'll follow with my 2 cents.

The way I see it, there are two fundamentally different approaches: optimize for throughput, or optimize for latency.

When optimizing for latency, the balancer would always choose the device that can serve a request in the shortest time. This is close to what the current code does, although it doesn't seem to account for the devices' pending request queue lengths. (I'd estimate that for a traditional ATA disk, around 2-3 short-seek requests are worth 1 long seek, because of spindle latency.) I'd assume fair in-order service for the latency mode.

When optimizing for throughput, the balancer would choose the device whose total queue completion time would be increased the least. This implies reordering of requests, etc. For a queue depth of 1, the throughput balancer would pick the closest available device as long as the devices are idle; when they are all busy, it would leave the requests in an array-wide queue until one of the devices becomes available, and then dequeue the request that device can serve fastest (or one that has had its deadline exceeded).

Both approaches become difficult when taking device queues into account. The throughput balancer, as described, could just estimate how close the new request is to all the others already queued in each device, and pick the device where it lands nearest the other work. The latency scheduler is probably pretty much useless in this scenario, as its definition breaks down once requests can push each other around. I'd expect it to be useful in the common desktop configuration with no device queues, though.
One thing I'd like to see is more powerful estimates of request cost for a device. It's possible, if not practical, to profile devices for things like spindle latency and sector locations. If this cost-estimation data is accurate enough, per-device queues become less important as performance factors. As it is now, one can only hope that requests that are near LBA-wise are near time-wise, which is not true for most devices.

Yes, I know it's mostly wishful thinking. Measurements would be tricky and would produce complex maps for estimating costs, and (I think) would be virtually impossible to do correctly for anything with device queues. I'd expect that no drives on the market expose this kind of latency-estimation data to the controller or OS. I'd also expect that high-end storage vendors use this very same information in their hardware raid implementations to provide better queuing and load balancing.

Both of the described balancer algorithms can be implemented somewhat easily, and (I'd expect) will work relatively well with common desktop drives. They could be optional (like the IO schedulers currently are), and different cost-estimation algorithms could also be optional (and tunable, if autotuning is out of the question). Unfortunately my kernel hacking skills are too weak for most of this - there needs to be someone else who's interested enough.
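The latency-mode balancer described above can be illustrated with a toy model: two mirrors, each with a head position and a pending-queue length, and a cost estimate of seek distance plus a per-queued-request penalty. All numbers, including the penalty standing in for spindle latency, are made up.

```shell
#!/bin/sh
# Toy latency balancer: dispatch to the mirror with the lower cost.
req=500                      # target sector of the incoming read
pos0=480; queue0=3           # mirror 0: head position, queued requests
pos1=100; queue1=0           # mirror 1
abs() { v=$1; [ "$v" -lt 0 ] && v=$(( -v )); echo "$v"; }
# cost = seek distance + penalty per queued request (penalty is a guess)
cost0=$(( $(abs $(( req - pos0 ))) + queue0 * 200 ))
cost1=$(( $(abs $(( req - pos1 ))) + queue1 * 200 ))
if [ "$cost0" -le "$cost1" ]; then pick=0; else pick=1; fi
echo "dispatch to mirror $pick (costs: $cost0 vs $cost1)"
```

Note how the queue penalty overturns the raw seek-distance decision: mirror 0 is closer LBA-wise, but its backlog makes mirror 1 the faster choice, which is exactly the pending-queue accounting the current code seems to lack.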