Grub vs Lilo
Hi all,

Wondering if anyone can comment on an easy way to get grub to update all components of a raid1 array. I have a raid1 /boot with a raid10 root, and have previously used lilo with the raid-extra-boot option to install to the boot sectors of all component devices. With grub it appears that you can only update non-default devices via the command line. I like being able to just type "lilo" and have everything updated in one hit. Is there a way to do this with grub?

Cheers,
Lewis
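For context, the lilo feature being referred to is the raid-extra-boot directive. A minimal sketch of such a setup (the device names are illustrative assumptions, not taken from the post):

  # /etc/lilo.conf fragment - with a raid1 /boot, also write boot sectors
  # to the listed component devices, so a single "lilo" run updates them all
  boot=/dev/md0
  raid-extra-boot="/dev/hda,/dev/hdc"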
Re: host based mirror distance in a fc-based SAN environment
Stefan Majer wrote:
> Hi,
> I'm curious whether there are any numbers on the distance up to which it is possible to mirror (raid1) two FC LUNs. We have two datacenters with an effective distance of 11km. The fabrics in one datacenter are connected to the fabrics in the other datacenter with 5 dark fibres, each about 11km long. I want to set up servers which mirror their LUNs across the SAN-boxen in both datacenters. On top of this mirrored LUN I put lvm2. So the question is: does anybody have numbers on the distance up to which this method works?

No. But have a look at man mdadm in later versions of mdadm:

  -W, --write-mostly
      Subsequent devices listed in a --build, --create, or --add command
      will be flagged as 'write-mostly'. This is valid for RAID1 only and
      means that the 'md' driver will avoid reading from these devices if
      at all possible. This can be useful if mirroring over a slow link.

  --write-behind=
      Specify that write-behind mode should be enabled (valid for RAID1
      only). If an argument is specified, it will set the maximum number
      of outstanding writes allowed. The default value is 256. A
      write-intent bitmap is required in order to use write-behind mode,
      and write-behind is only attempted on drives marked as write-mostly.

Which suggests that the WAN/LAN latency shouldn't impact you except on failure.

HTH

David
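A minimal sketch of what that suggestion would look like in practice, assuming /dev/mapper/local_lun is the LUN in the local datacenter and /dev/mapper/remote_lun is the one reached over the 11km link (both device names are made up for illustration):

  # raid1 with a write-intent bitmap; the remote LUN is flagged write-mostly
  # so reads stay local, and write-behind lets up to 256 writes to it remain
  # outstanding before the array has to wait for the slow link
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=256 \
        /dev/mapper/local_lun \
        --write-mostly /dev/mapper/remote_lun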
Re: Grub vs Lilo
[EMAIL PROTECTED] said:
> Wondering if anyone can comment on an easy way to get grub to update all components of a raid1 array. I have a raid1 /boot with a raid10 root, and have previously used lilo with the raid-extra-boot option to install to the boot sectors of all component devices. With grub it appears that you can only update non-default devices via the command line. I like being able to just type "lilo" and have everything updated in one hit. Is there a way to do this with grub?

Assuming your /boot is made of hda1 and hdc1:

  grub-install /dev/hda1
  grub-install /dev/hdc1

Jason
Re: Grub vs Lilo
Jason Lunz wrote:
> [EMAIL PROTECTED] said:
>> Wondering if anyone can comment on an easy way to get grub to update all components of a raid1 array. I have a raid1 /boot with a raid10 root, and have previously used lilo with the raid-extra-boot option to install to the boot sectors of all component devices. With grub it appears that you can only update non-default devices via the command line. I like being able to just type "lilo" and have everything updated in one hit. Is there a way to do this with grub?
>
> Assuming your /boot is made of hda1 and hdc1:
>
>   grub-install /dev/hda1
>   grub-install /dev/hdc1

Don't do that. If your hda dies and you try to boot off hdc instead (which will become hda in that case), grub will try to read hdc, which is gone, and will fail.

Most of the time (unless the bootloader is really smart and understands mirroring in full - lilo and grub do not) you want to have THE SAME boot code on both (or more, in the case of 3- or 4-disk mirrors) of your disks, including the BIOS disk codes. After the above two commands, grub will write code that boots from disk 0x80 to hda, and code that boots from disk 0x81 (or 0x82) to hdc. So when your hdc becomes hda, it will not boot.

To solve all this, you have to write a diskmap file and run grub-install twice. Both times, the diskmap should list 0x80 for the device to which you're installing grub. I don't remember the syntax of the diskmap file (or even whether it's really called 'diskmap'), but assuming hda and hdc notation, I mean the following:

  echo "/dev/hda 0x80" > /boot/grub/diskmap
  grub-install /dev/hda1
  echo "/dev/hdc 0x80" > /boot/grub/diskmap   # overwrite it!
  grub-install /dev/hdc1

The thing with all this "my RAID setup works, it is really simple!" talk is: for too many people it does indeed work, so they think it's the good and correct way. But it only works up to the actual failure, which, in most setups, isn't tested. Once something fails, umm... Jason, try removing your hda (pretend it has failed) and booting off hdc to see what I mean ;) (Well yes, a rescue disk will help in that case... hopefully. But that's not RAID, which, when installed properly, will really make a disk failure transparent.)

/mjt
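For what it's worth, in GRUB legacy the map file is /boot/grub/device.map, with entries of the form "(hd0) /dev/hda". A sketch of the two-pass install described above, using that file (same hda1/hdc1 assumption; this is the idea, not a tested recipe):

  # first pass: map the first member as BIOS drive (hd0), i.e. 0x80, and install
  echo "(hd0) /dev/hda" > /boot/grub/device.map
  grub-install /dev/hda1

  # second pass: overwrite the map so the other member is also (hd0),
  # then install again, so both disks carry boot code that reads from 0x80
  echo "(hd0) /dev/hdc" > /boot/grub/device.map
  grub-install /dev/hdc1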
Re: Grub vs Lilo
Michael Tokarev wrote on 26.07.2006 20:00:
> The thing with all this "my RAID setup works, it is really simple!" talk is: for too many people it does indeed work, so they think it's the good and correct way. But it only works up to the actual failure, which, in most setups, isn't tested. Once something fails, umm... Jason, try removing your hda (pretend it has failed) and booting off hdc to see what I mean ;) (Well yes, a rescue disk will help in that case... hopefully. But that's not RAID, which, when installed properly, will really make a disk failure transparent.)
>
> /mjt

Yes Michael, you're right. We use a simple RAID1 config with swap and / on three SCSI disks (2 working, one hot spare) on SuSE 9.3 systems. We had to use lilo to handle booting off any of the two (three) disks. But we had problem after problem until lilo 22.7 came out. With that version of lilo we can pull any disk in any scenario, and the box boots in any case.

We were wondering why we got no response when we asked the groups while in trouble with lilo before 22.7. OK, the RAID driver and the kernel worked fine while resyncing the spare in case of a disk failure (thanks to Neil Brown for that). But if a box had to be rebooted with a failed disk, the situation became worse. And you have to reboot, because hotplug still doesn't work. But nobody seems to care about it, or nobody apart from us has these problems ...

We tested the setup again and again until we found a stable setup which works in _any_ case. OK, we're still missing hotplugging (it seems to be solved for aic79 in 2.6.17; we're testing). But when we tried to discuss these problems (one half of the raid devices goes offline on the controller where the hotplugging occurs) there was no response either. So we came to the conclusion that everybody is working on RAID but nobody cares about the things around it, just as you mentioned. Thanks for that.

Bernd Rieke
Re: Grub vs Lilo
Bernd Rieke wrote:
> Michael Tokarev wrote on 26.07.2006 20:00:
> [...]
>
> Yes Michael, you're right. We use a simple RAID1 config with swap and / on three SCSI disks (2 working, one hot spare) on SuSE 9.3 systems. We had to use lilo to handle booting off any of the two (three) disks. But we had problem after problem until lilo 22.7 came out. With that version of lilo we can pull any disk in any scenario, and the box boots in any case.

Well, a lot of systems here work with root-on-raid1 using lilo-2.2.4 (the Debian package), and with grub. By "work" I mean they really work, i.e. no disk failure prevents the system from working and (re)booting flawlessly (provided the disk is really dead, as opposed to present but failing to read (some) data - in which case the only way out is either to remove it physically or to choose another boot device in the BIOS. But that's an entirely different story, about the (non-existent) really smart boot loader I mentioned in my previous email).

The trick is to set the system up properly. The simple/obvious way (installing grub to hda1 and hdc1) doesn't work when you remove hda, but the more involved way does. Moreover, I'd not let LILO do more guesswork for me (like the raid-extra-boot stuff, or whatever comes with 22.7 - to be honest, I didn't look at it at all, as the Debian package of 2.2.4 (or 22.4?) works for me just fine). Just write the damn thing into the start of mdN (and let the raid code replicate it to all drives, regardless of how many of them there are), after realizing it's really partition number X (with offset Y) on a real disk, and use BIOS code 0x80 for all disk access. That's all. The rest - ensuring all the (boot) partitions are at the same place on every disk, that the disk geometry is the same, etc. - is my duty, and that duty I perform carefully, because I want the disks to be interchangeable.

> We were wondering why we got no response when we asked the groups while in trouble with lilo before 22.7. OK, the RAID driver and the kernel worked fine while resyncing the spare in case of a disk failure (thanks to Neil Brown for that). But if a box had to be rebooted with a failed disk, the situation became worse. And you have to reboot, because hotplug still doesn't work. But nobody seems to care about it, or nobody apart from us has these problems ...

Just curious - when/where did you ask?

[]

> So we came to the conclusion that everybody is working on RAID but nobody cares about the things around it, just as you mentioned. Thanks for that.

I tend to disagree. My statement above refers to the simple advice sometimes given here and elsewhere - "do this and that, it worked for me" - by users who didn't do their homework, who never tested the stuff, who sometimes simply have no idea HOW to test (hopefully that's not an insulting statement - I don't blame them for their lack of knowledge; it's something that isn't really cheap to acquire, after all). The majority of users are of this sort, and they follow each other's advice, again, without testing.
HOWTOs are written by such users as well (as someone mentioned to me in a private email in response to my reply). I mean: the existing software works. It really works. The only thing left is to set it up correctly.

And please, PLEASE, don't treat all this as blaming "bad users". It's not. I learned this stuff the hard way too. After having unbootable remote machines following a disk failure, when everything had seemed to be OK. After screwing up systems using the famous linux raid autodetect stuff everyone loves - when, after I replaced a failed disk with another one which (bad me) had been part of another raid array on another system, the box chose to assemble THAT raid array instead of its own, and overwrote a good disk with data from the new disk that had come out of a testing machine. And so on.

All of that is to say: it's easy to make a mistake and treat the resulting setup as a good one, until shit starts happening. But shit happens very rarely compared with average system usage, so you may never find out that your setup is wrong - and of course you will go on telling others how to do things... :)

/mjt
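For what it's worth, the "write the damn thing into the start of mdN" approach described above corresponds roughly to a lilo.conf fragment like the following (a minimal sketch; the device names, kernel path, and label are illustrative assumptions, not taken from the thread):

  # /etc/lilo.conf sketch - the boot sector is written into the raid1 device
  # itself, so the md layer replicates identical boot code to every member
  # disk; no per-disk MBR guesswork (raid-extra-boot) is used
  boot=/dev/md0              # raid1 holding /boot (assumed)
  raid-extra-boot=none
  root=/dev/md1              # root array (assumed)
  image=/boot/vmlinuz
          label=linux
          read-only

Running "lilo" once then refreshes every member of the mirror in one go - provided, as stressed above, the boot partitions sit at the same offset on every disk.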
Re: host based mirror distance in a fc-based SAN environment
On Wed, Jul 26, 2006 at 07:58:09AM +0200, Stefan Majer wrote:
> Hi,
> I'm curious whether there are any numbers on the distance up to which it is possible to mirror (raid1) two FC LUNs. We have two datacenters with an effective distance of 11km. The fabrics in one datacenter are connected to the fabrics in the other datacenter with 5 dark fibres, each about 11km long.

As you probably already know, with LX (1310nm) GBICs and single-mode fibre you can reach a theoretical limit of about 50km, and you can double that using 1550nm lasers (ZX?).

> I want to set up servers which mirror their LUNs across the SAN-boxen in both datacenters. On top of this mirrored LUN I put lvm2. So the question is: does anybody have numbers on the distance up to which this method works?

The method is independent of the distance: if your FC hardware can do it, then you can. The only thing you should consider (and it is not directly related to distance) is the bandwidth you have between the two sites (i.e. the number of systems that might be using those 5 fibres).

Regards,
L.

--
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
Re: [PATCH] md: new bitmap sysfs interface
On 7/25/06, Paul Clements [EMAIL PROTECTED] wrote:
> This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface that allows the bitmap of an array to be dirtied. The interface is write-only, and is used as follows:
>
>   echo 1000 > /sys/block/md2/md/bitmap
>
> (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk bitmaps of array md2)
>
>   echo 1000-2000 > /sys/block/md1/md/bitmap
>
> (dirty the bits for chunks 1000-2000 in md1's bitmap)
>
> This is useful, for example, in cluster environments where you may need to combine two disjoint bitmaps into one (following a server failure, after a secondary server has taken over the array). By combining the bitmaps on the two servers, a full resync can be avoided (this was discussed on the list back on March 18, 2005, in the "[PATCH 1/2] md bitmap bug fixes" thread).

Hi Paul,

I tracked down the thread you referenced, and these posts (by you) seem to summarize things well:

http://marc.theaimsgroup.com/?l=linux-raid&m=16563016418&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=17515400864&w=2

But for clarity's sake, could you elaborate on the negative implications of not merging the bitmaps on the secondary server? Will the previous primary's dirty blocks get dropped on the floor because the secondary (now the primary) doesn't have awareness of the previous primary's dirty blocks once it activates the raid1?

Also, what is the interface one should use to collect the dirty bits from the primary's bitmap? This bitmap merge can't happen until the primary's dirty bits can be collected, right? Waiting for the failed server to come back to harvest the dirty bits it has seems wrong (why fail over at all?), so I must be missing something.

please advise, thanks,
Mike
ICH7R strip size
Howdy. I realize this list is more focused on Linux software RAID than proprietary RAID controllers, but I've been Googling and can't find an answer that I suspect someone on this list probably knows. If anyone can suggest a better forum for this question, or someplace where the answer is documented, that would be great.

Intel's ICH7R firmware allows creating volumes in various RAID modes where the "strip size" (which I believe to be a misspelling of "stripe size") can be certain multiples of 16KB. IIUC, a RAID stripe is composed of chunks spanning two or more drives. For example, a four-disc RAID-5 array with 64KB chunks would yield a 256KB stripe with 192KB of usable storage per stripe.

I'm trying to optimize I/O performance for a database with 64KB blocks. I understand that RAID-0 multiplies the likelihood of a failure, but that's only a nuisance and not a problem in my situation.

My question: What are the chunk and usable storage sizes per stripe for four discs in RAID-0 on an ICH7R configured for a 128KB strip?

Thank you very much.
--
Jeff Woods [EMAIL PROTECTED]
Re: [PATCH] md: new bitmap sysfs interface
Mike Snitzer wrote:
> I tracked down the thread you referenced, and these posts (by you) seem to summarize things well:
>
> http://marc.theaimsgroup.com/?l=linux-raid&m=16563016418&w=2
> http://marc.theaimsgroup.com/?l=linux-raid&m=17515400864&w=2
>
> But for clarity's sake, could you elaborate on the negative implications of not merging the bitmaps on the secondary server? Will the previous primary's dirty blocks get dropped on the floor because the secondary (now the primary) doesn't have awareness of the previous primary's dirty blocks once it activates the raid1?

Right. At the time of the failover, there were (probably) blocks that were out of sync between the primary and secondary. Now, after you've failed over to the secondary, you've got to overwrite those blocks with data from the secondary in order to make the primary disk consistent again. This requires that you either do a full resync from secondary to primary (if you don't know what differs), or merge the two bitmaps and resync just that data.

> Also, what is the interface one should use to collect the dirty bits from the primary's bitmap?

Whatever you'd like. scp the bitmap file over, or collect the ranges into a file and scp that over, or something similar.

> This bitmap merge can't happen until the primary's dirty bits can be collected, right? Waiting for the failed server to come back to

Right. So, when the primary fails, you start the array on the secondary with a _clean_ bitmap and just its local disk component. Now, whatever gets written while the primary is down gets put into the bitmap on the secondary. When the primary comes back up, you take the dirty bits from it and add them into the secondary's bitmap. Then you insert the primary's disk (via nbd or similar) back into the array and begin a resync. That's the whole reason for this interface: we have to modify the bitmap while the array is active (modifying the bitmap while the array is down is trivial, and certainly doesn't require sysfs :).

> harvest the dirty bits it has seems wrong (why fail over at all?), so I must be missing something.

We fail over immediately. We wait until later to combine the bitmaps and resync the data.

Hope that helps.

--
Paul
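A rough sketch of that sequence on the secondary, using the sysfs interface from the patch (the array name, the nbd device, and the dirty_ranges.txt file are assumptions for illustration; how the ranges are extracted from the old primary's bitmap is left open, as it is in the thread):

  # md1 is already running degraded on the secondary with a clean bitmap.
  # dirty_ranges.txt is assumed to hold one chunk or chunk range per line
  # (e.g. "1000" or "1000-2000"), collected somehow from the old primary.

  # merge the old primary's dirty chunks into the active array's bitmap
  while read range; do
      echo "$range" > /sys/block/md1/md/bitmap
  done < dirty_ranges.txt

  # re-insert the old primary's disk (exported over nbd here) and let md
  # resync only the chunks marked dirty in the combined bitmap
  mdadm /dev/md1 --re-add /dev/nbd0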
Re: [PATCH] md: new bitmap sysfs interface
On 7/26/06, Paul Clements [EMAIL PROTECTED] wrote:
> Mike Snitzer wrote:
>> I tracked down the thread you referenced, and these posts (by you) seem to summarize things well:
>>
>> http://marc.theaimsgroup.com/?l=linux-raid&m=16563016418&w=2
>> http://marc.theaimsgroup.com/?l=linux-raid&m=17515400864&w=2
>>
>> But for clarity's sake, could you elaborate on the negative implications of not merging the bitmaps on the secondary server? Will the previous primary's dirty blocks get dropped on the floor because the secondary (now the primary) doesn't have awareness of the previous primary's dirty blocks once it activates the raid1?
>
> Right. At the time of the failover, there were (probably) blocks that were out of sync between the primary and secondary. Now, after you've failed over to the secondary, you've got to overwrite those blocks with data from the secondary in order to make the primary disk consistent again. This requires that you either do a full resync from secondary to primary (if you don't know what differs), or merge the two bitmaps and resync just that data.

I took more time to read the later posts in the original thread; that, coupled with your detailed response, has helped a lot. Thanks.

>> Also, what is the interface one should use to collect the dirty bits from the primary's bitmap?
>
> Whatever you'd like. scp the bitmap file over, or collect the ranges into a file and scp that over, or something similar.

OK, so regardless of whether you are using an external or internal bitmap, how does one collect the ranges from an array's bitmap? Generally speaking, I think others would have the same (naive) question, given that we need to know what to use as input to the sysfs interface you've kindly provided. If it is left as an exercise to the user, that is fine; I'd imagine neilb will get our backs with a nifty new mdadm flag if need be.

thanks again,
Mike
Re: ICH7R strip size
> My question: What are the chunk and usable storage sizes per stripe for four discs in RAID-0 on an ICH7R configured for a 128KB strip?

raid0 always has 100% usable; configuring it is deciding how much concurrency you want. If your writes are 64K and you have 4 disks, your maximum concurrency comes from 64K per disk (the chunk size, in MD terms), i.e. 256K for a whole stripe. If you want maximum bandwidth but a concurrency of 1, you want each 64K write to keep all the disks busy, which means a 16K chunk size (64K whole stripe).

With raid5, the main additional factor is to try to arrange blind whole-stripe writes if possible. That is, you pay the read-modify-write penalty unless you write a whole stripe at once: (n-1) * chunk size if you can manage it aligned, or at least several whole stripes to amortize the reads needed at the edges...
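To make the arithmetic concrete for the configuration asked about (treating the ICH7R's "strip" as the per-disk chunk, which is an assumption the thread never confirms): with 4 discs in RAID-0 and a 128KB strip, each stripe is 4 x 128KB = 512KB, all of it usable. The MD software-raid analogue would be something like:

  # software-raid analogue of the 4-disc RAID-0 / 128KB-strip layout
  # (device names are illustrative; --chunk is the per-disk amount, in KB)
  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=128 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

For the 64KB database blocks discussed above, a 16KB chunk would instead spread each block across all four discs (4 x 16KB = 64KB per stripe), trading concurrency for per-request bandwidth.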