Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Moshe Yudkowsky wrote:

[] But that's *exactly* what I have -- well, 5GB -- and which failed. I've modified /etc/fstab to use data=journal (even on root, which I thought wasn't supposed to work without a grub option!) and I can power-cycle the system and bring it up reliably afterwards.

Note also that data=journal effectively doubles the write time. It's a bit faster for small writes (because all writes first go into the journal, i.e. into the same place, so no seeking is needed), but for larger writes the journal will fill up, and the data in it must be written out to its proper place to free space for new data. From there, if you keep writing, you will see more than a 2x speed degradation, because of a) the double writes and b) the extra seeking.

/mjt

- To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid1 or raid10 for /boot
On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:

I understand that lilo and grub can only boot partitions that look like a normal single-drive partition. And then I understand that a plain raid10 has a layout which is equivalent to raid1. Can such a raid10 partition be used with grub or lilo for booting? And would there be any advantages in this, for example better disk utilization in the raid10 driver compared with raid1?

A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could - I'm not sure how that would be laid out). RAID-10 uses striping as well as mirroring, and the striping breaks both grub and lilo (and, AFAIK, every other boot manager currently out there).

Cheers, Robin
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Michael Tokarev wrote: Moshe Yudkowsky wrote: [] But that's *exactly* what I have -- well, 5GB -- and which failed. I've modified /etc/fstab to use data=journal (even on root, which I thought wasn't supposed to work without a grub option!) and I can power-cycle the system and bring it up reliably afterwards.

Note also that data=journal effectively doubles the write time. It's a bit faster for small writes (because all writes first go into the journal, i.e. into the same place, so no seeking is needed), but for larger writes the journal will fill up, and the data in it must be written out to its proper place to free space for new data. From there, if you keep writing, you will see more than a 2x speed degradation, because of a) the double writes and b) the extra seeking.

The alternative seems to be that portions of the / file system won't mount because the file system is corrupted on a crash while writing. If I'm reading the man pages, Wikis, READMEs and mailing lists correctly -- not necessarily the case -- the ext3 file system uses the equivalent of data=journal as a default.

The question then becomes what data scheme to use with reiserfs on the remainder of the file system: /usr, /var, /home, and the others. If they can recover on a reboot using fsck and the default configuration of reiserfs, then I have no problem using them. But my understanding is that data can be lost or destroyed if there's a crash during a write; in that case there's little point in running a RAID system that can only collect corrupt data.

Another way to phrase this: unless you're running data-center grade hardware and have absolute confidence in your UPS, you should use data=journal for reiserfs and perhaps avoid XFS entirely.

-- Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe Right in the middle of a large field where there had never been a trench was a shell hole... 8 feet deep by 15 across. On the edge of it was a dead... rat not over twice the size of a mouse.
No wonder the war costs so much. Col. George Patton
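For readers following along, Moshe's change amounts to fstab entries along these lines - a sketch only, with hypothetical md device names; note that on older kernels mounting the root filesystem with data=journal also required passing rootflags=data=journal on the kernel command line (presumably the "grub option" mentioned above):

```
# /etc/fstab (sketch -- device names and mount points are hypothetical)
/dev/md1  /      reiserfs  defaults,data=journal  0 1
/dev/md2  /var   reiserfs  defaults,data=journal  0 2
```

The same data= option names apply to ext3, with the trade-off Michael describes: every block is written twice, once to the journal and once to its final location.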
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Robin, thanks for the explanation. I have a further question.

Robin Hill wrote: Once the file system is mounted then hdX,Y maps according to the device.map file (which may actually bear no resemblance to the drive order at boot - I've had issues with this before). At boot time it maps to the BIOS boot order though, and (in my experience anyway) hd0 will always map to the drive the BIOS is booting from.

At the time that I use grub to write to the MBR, hd2,1 is /dev/sdc1. Therefore, I don't quite understand why this would not work:

grub <<EOF
root (hd2,1)
setup (hd2)
EOF

This would seem to be a command to have the MBR on hd2 written to use the boot data on hd2,1. It's valid when written. Are you saying that it's a command for the MBR on /dev/sdc to find the data on (hd2,1), the location of which might change at any time? That's... a very strange way to write the tool. I thought it would be a command for the MBR on hd2 (sdc) to look at hd2,1 (sdc1) to find its data, regardless of the boot order that caused sdc to be the boot disk.

-- Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe Bring me the head of Prince Charming. -- Robert Sheckley & Roger Zelazny
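The device.map file Robin refers to is a small text file that grub consults once the OS is up - and it is exactly the mapping that may disagree with the BIOS order at boot time. A sketch, with hypothetical drive names:

```
# /boot/grub/device.map (sketch -- hypothetical drives)
(hd0)  /dev/sda
(hd1)  /dev/sdb
(hd2)  /dev/sdc
```

Under this mapping, "root (hd2,1)" at install time means the second partition of /dev/sdc, but the hd2 baked into the MBR is resolved again by the BIOS at every boot.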
Re: raid1 or raid10 for /boot
On Mon, Feb 04, 2008 at 09:17:35AM +0000, Robin Hill wrote: On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote: I understand that lilo and grub can only boot partitions that look like a normal single-drive partition. And then I understand that a plain raid10 has a layout which is equivalent to raid1. Can such a raid10 partition be used with grub or lilo for booting? And would there be any advantages in this, for example better disk utilization in the raid10 driver compared with raid1?

A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could - I'm not sure how that would be laid out). RAID-10 uses striping as well as mirroring, and the striping breaks both grub and lilo (and, AFAIK, every other boot manager currently out there).

Yes, it is understood that raid10,f2 uses striping, but a raid10 with near=2, far=1 does not use striping, and this is what you get if you just run

mdadm --create /dev/md0 -l 10 -n 2 /dev/sda1 /dev/sdb1

best regards keld
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Moshe Yudkowsky wrote: [] If I'm reading the man pages, Wikis, READMEs and mailing lists correctly -- not necessarily the case -- the ext3 file system uses the equivalent of data=journal as a default.

ext3 defaults to data=ordered, not data=journal. ext2 doesn't have a journal at all.

The question then becomes what data scheme to use with reiserfs on the

I'd say don't use reiserfs in the first place ;)

Another way to phrase this: unless you're running data-center grade hardware and have absolute confidence in your UPS, you should use data=journal for reiserfs and perhaps avoid XFS entirely.

By the way, even if you do have a good UPS, there should be some control program for it, to properly shut down your system when the UPS loses AC power. So far, I've seen no such programs...

/mjt
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Eric, thanks very much for your note. I'm becoming very leery of reiserfs at the moment... I'm about to run another series of crash tests.

Eric Sandeen wrote: Justin Piszcz wrote: Why avoid XFS entirely? esandeen, any comments here? Heh; well, it's the meme.

Well, yeah...

Note also that ext3 has the barrier option as well, but it is not enabled by default due to performance concerns. Barriers also affect xfs performance, but enabling them in the non-battery-backed-write-cache scenario is the right thing to do for filesystem integrity.

So if I understand you correctly, you're stating that currently the most reliable fs in its default configuration, in terms of protection against power-loss scenarios, is XFS?

-- Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe There is something fundamentally wrong with a country [USSR] where the citizens want to buy your underwear. -- Paul Theroux
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Eric Sandeen wrote: Moshe Yudkowsky wrote: So if I understand you correctly, you're stating that currently the most reliable fs in its default configuration, in terms of protection against power-loss scenarios, is XFS?

I wouldn't go that far without some real-world poweroff testing, because various fs's are probably more or less tolerant of a write-cache evaporation. I suppose it'd depend on the size of the write cache as well.

I know of no filesystem which is, as you say, tolerant of a write-cache evaporation. If a drive says the data is written but in fact it's not, it's a Bad Drive (tm) and it should be thrown away immediately. Fortunately, almost all modern disk drives don't lie this way. The only thing needed from the filesystem is to tell the drive to flush its cache at the appropriate time, and actually wait for the flush to complete. Barriers (mentioned in this thread) are just another way to do so, in a somewhat more efficient way, but a normal cache flush will do as well -- IFF write caching is enabled in the first place. Note that with some workloads, write caching in the drive actually makes write speed worse, not better -- namely, in the case of massive writes.

Speaking of XFS (and of ext3 with write barriers enabled) - I'm confused here as well, and the answers to my questions didn't help either. As far as I understand, XFS only uses barriers, not regular cache flushes, hence without write barrier support (which is not there for linux software raid, as explained elsewhere) it's unsafe -- and probably the same applies to ext3 with barrier support enabled. But I'm not sure I got it all correctly.

/mjt
Re: raid1 and raid 10 always writes all data to all disks?
Keld Jørn Simonsen wrote: On Sun, Feb 03, 2008 at 10:56:01AM -0500, Bill Davidsen wrote: Keld Jørn Simonsen wrote: I found a sentence in the HOWTO: raid1 and raid 10 always writes all data to all disks. I think this is wrong for raid10; e.g. a raid10,f2 of 4 disks only writes to two of the disks - not all 4 disks. Is that true?

I suspect that really should have read all mirror copies, in the raid10 case.

OK, I changed the text to: raid1 always writes all data to all disks.

Just to be really pedantic, you might say devices instead of disks, since many or most arrays are on partitions. Otherwise I like this, it's much clearer.

raid10 always writes all data to the number of copies that the raid holds. For example, on a raid10,f2 or raid10,o2 of 6 disks, the data will only be written 2 times. Best regards Keld

-- Bill Davidsen [EMAIL PROTECTED] Woe unto the statesman who makes war without a reason that will still be valid when the war is over... Otto von Bismarck
Re: draft howto on making raids for surviving a disk crash
Keld Jørn Simonsen wrote: On Sun, Feb 03, 2008 at 10:53:51AM -0500, Bill Davidsen wrote: Keld Jørn Simonsen wrote: This is intended for the linux raid howto. Please give comments. It is not fully ready. /keld

Howto prepare for a failing disk

6. /etc/mdadm.conf Something here on /etc/mdadm.conf. What would be safe, allowing a system to boot even if a disk has crashed?

Recommend PARTITIONS be used

Thanks Bill for your suggestions, which I have incorporated in the text. However, I do not understand what to do with the remark above. Please explain.

The mdadm.conf file should contain the DEVICE partitions statement to identify all possible partitions regardless of name changes. See man mdadm.conf for more discussion. This protects against udev doing something innovative in device naming.

-- Bill Davidsen [EMAIL PROTECTED] Woe unto the statesman who makes war without a reason that will still be valid when the war is over... Otto von Bismarck
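Bill's suggestion looks something like this in practice - a minimal sketch of /etc/mdadm.conf with a placeholder UUID. The DEVICE partitions line tells mdadm to consider every partition listed in /proc/partitions when assembling, so arrays are still found even if udev renames the underlying devices:

```
# /etc/mdadm.conf (sketch; the UUID below is a placeholder)
DEVICE partitions
ARRAY /dev/md0 UUID=00000000:00000000:00000000:00000000
```

Identifying arrays by UUID rather than by device name is what makes this robust against a crashed or reordered disk.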
Re: In this partition scheme, grub does not find md information?
John Stoffel wrote: [] C'mon, how many of you are programmed to believe that 1.2 is better than 1.0? But when they're not different, just different placements, then it's confusing.

Speaking of the more is better thing... There were quite a few bugs fixed in recent months wrt version 1 superblocks - both in the kernel and in mdadm. The 0.90 format has been stable for a very long time, and unless you're hitting its limits (namely, max 26 drives in an array, no homehost field), there's nothing which makes v1 superblocks better than 0.90 ones. In my view, better = stable first, faster/easier/whatever second.

/mjt
Re: In this partition scheme, grub does not find md information?
On 26 Oct 2007, Neil Brown wrote: On Thursday October 25, [EMAIL PROTECTED] wrote: I also suspect that a *lot* of people will assume that the highest superblock version is the best and should be used for new installs etc.

Grumble... why can't people expect what I want them to expect?

Moshe Yudkowsky wrote: I expect it's because I used 1.2 superblocks (why not use the latest, I said, foolishly...) and therefore the RAID10

David: Aha - an 'in the wild' example of why we should deprecate '0.9, 1.0, 1.1, 1.2' and rename the superblocks to data-version + on-disk-location :)

As the person who started this entire thread ages ago about the *poor* naming convention used for RAID superblocks, I have to agree. I'd much rather see 1.near, 1.far, 1.both or something like that added in. Heck, we don't have to remove the support for the old 1.0, 1.1, 1.2 names either, just make the default be something more user friendly. C'mon, how many of you are programmed to believe that 1.2 is better than 1.0? But when they're not different, just different placements, then it's confusing.

John
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Michael Tokarev wrote: Unfortunately a UPS does not *really* help here. Because unless it has a control program which properly shuts the system down on the loss of input power, and the battery really has the capacity to power the system while it's shutting down (anyone tested this? With a new UPS? And after a year of use, when the battery is not new?) -- unless the UPS actually has the capacity to shut down the system, it will cut the power at an unexpected time, while the disk(s) still have dirty caches...

I'm unsure what you mean here. The Network UPS Tools project http://www.networkupstools.org/ has been supplying software to do this for years. In addition, a number of UPS manufacturers, including APC, one of the larger ones, provide Linux management and monitoring software with the UPS.

As far as worrying whether a one-year-old battery has enough capacity to hold up while the system shuts down, there is no reason why you cannot set it to shut the system down gracefully after maybe 30 seconds of power loss if you feel it is necessary. A reputable brand UPS with a correctly sized battery will have no trouble in this scenario unless the battery is faulty, in which case it will probably be picked up during automated load tests. As long as the manufacturer's battery replacement schedule is followed, genuine replacement batteries are used, and automated regular UPS tests are enabled, the risks of failure are small.

Regards, Richard
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Moshe Yudkowsky wrote: So if I understand you correctly, you're stating that currently the most reliable fs in its default configuration, in terms of protection against power-loss scenarios, is XFS?

I wouldn't go that far without some real-world poweroff testing, because various fs's are probably more or less tolerant of a write-cache evaporation. I suppose it'd depend on the size of the write cache as well.

-Eric
Re: raid1 or raid10 for /boot
On Mon Feb 04, 2008 at 12:21:40PM +0100, Keld Jørn Simonsen wrote: On Mon, Feb 04, 2008 at 09:17:35AM +0000, Robin Hill wrote: On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote: I understand that lilo and grub can only boot partitions that look like a normal single-drive partition. And then I understand that a plain raid10 has a layout which is equivalent to raid1. Can such a raid10 partition be used with grub or lilo for booting? And would there be any advantages in this, for example better disk utilization in the raid10 driver compared with raid1?

A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could - I'm not sure how that would be laid out). RAID-10 uses striping as well as mirroring, and the striping breaks both grub and lilo (and, AFAIK, every other boot manager currently out there).

Yes, it is understood that raid10,f2 uses striping, but a raid10 with near=2, far=1 does not use striping, and this is what you get if you just run mdadm --create /dev/md0 -l 10 -n 2 /dev/sda1 /dev/sdb1

Well yes, if you do a two-disk RAID-10 then (as I said above) you probably end up with a RAID-1 (as you do with a two-disk RAID-5). I don't see how this would work any differently (or better) than a RAID-1 though (and it only serves to confuse things). If you have more than two disks then RAID-10 will _always_ stripe (no matter whether you use the near, far or offset layout - these affect only where the mirrored chunks are put) and grub/lilo will fail to work.

Cheers, Robin
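To see why a two-device raid10 "near" array degenerates to RAID-1 while anything larger stripes, here is a rough sketch in shell of the near=2 placement rule as I understand it (the two copies of chunk k land on devices 2k mod n and 2k+1 mod n) - an illustration only, not md's actual code:

```shell
# Sketch of md raid10 near=2 chunk placement for n raid devices.
# With n=2 every chunk sits on both devices at the same offset -- the
# same on-disk picture as RAID-1 -- while with n>2 consecutive chunks
# land on different device pairs, i.e. the striping that breaks grub/lilo.
layout() {
    n=$1
    for k in 0 1 2 3; do
        d1=$(( (2 * k) % n ))
        d2=$(( (2 * k + 1) % n ))
        off=$(( 2 * k / n ))
        echo "chunk $k -> dev$d1 and dev$d2 at offset $off"
    done
}
echo "n=2 (degenerates to RAID-1):"
layout 2
echo "n=4 (striped, not bootable):"
layout 4
```

Running it shows that for n=2 every chunk mirrors across dev0/dev1 in order, while for n=4 chunk 1 already jumps to dev2/dev3 - data a boot loader reading one disk sequentially would never find.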
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
On Mon Feb 04, 2008 at 05:06:09AM -0600, Moshe Yudkowsky wrote:

Robin, thanks for the explanation. I have a further question.

Robin Hill wrote: Once the file system is mounted then hdX,Y maps according to the device.map file (which may actually bear no resemblance to the drive order at boot - I've had issues with this before). At boot time it maps to the BIOS boot order though, and (in my experience anyway) hd0 will always map to the drive the BIOS is booting from.

At the time that I use grub to write to the MBR, hd2,1 is /dev/sdc1. Therefore, I don't quite understand why this would not work:

grub <<EOF
root (hd2,1)
setup (hd2)
EOF

This would seem to be a command to have the MBR on hd2 written to use the boot data on hd2,1. It's valid when written. Are you saying that it's a command for the MBR on /dev/sdc to find the data on (hd2,1), the location of which might change at any time? That's... a very strange way to write the tool. I thought it would be a command for the MBR on hd2 (sdc) to look at hd2,1 (sdc1) to find its data, regardless of the boot order that caused sdc to be the boot disk.

This is exactly what it does, yes - the hdX,Y are mapped by GRUB onto BIOS disk interfaces (0x80 being the first, 0x81 the second and so on) and it writes (to sdc in this case) the instructions to look on the first partition of BIOS drive 0x82 (whichever drive that ends up being) for the rest of the bootloader. It is a bit of a strange way to work, but it's really the only way it _can_ work (and cover all circumstances). Unfortunately when you start playing with bootloaders you have to get down to the BIOS level, and things weren't written to make sense at that level (after all, when these standards were put in place everyone was booting from a single floppy disk). If EFI becomes more standard then hopefully this will simplify, but we're stuck with things as they are for now.

Cheers, Robin
Re: Linux md and iscsi problems
Good morning.

Quoting Neil Brown [EMAIL PROTECTED]: On Friday February 1, [EMAIL PROTECTED] wrote: Summarizing, I have two questions about the behavior of Linux md with slow devices:

1. Is it possible to modify some kind of time-out parameter on the mdadm tool so the slow device wouldn't be marked as faulty because of its slow performance?

No. md doesn't do timeouts at all. The underlying device does. So if you are getting time-out errors from the iscsi initiator, then you need to change the timeout value used by the iscsi initiator. md has no part to play in this. It just sends a request and eventually gets either 'success' or 'fail'.

1. What seems strange to me is that under the same conditions, running the test only with iscsi, the initiator never fails. I had some problems with iscsi in the past and they were solved by changing the time-out parameters on the initiator side, but now that I have added the md layer I am getting errors with the slow device.

2. Is it possible to control the buffer size of the RAID? In other words, can I control the amount of data I can write to the local disk before I receive an acknowledgment from the slow device when I am using the write-behind option?

No. md/raid1 simply calls 'kmalloc' to get space to buffer each write as the write arrives. If the allocation succeeds, it is used to perform the write lazily. If the allocation fails, the write is performed synchronously. What did you hope to achieve by such tuning? It can probably be added if it is generally useful. NeilBrown

2. The idea here is to implement remote replication; by having a RAID-1 I can create a mirror of my local disk and use it for backup or for any other purposes at a centralized location. Changing the buffer would allow me to improve write performance, so a user will always experience local writing speed while I am still sending data across the network to the mirror device.

Thanks a lot for your time. Juan Aristizabal.
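The write-behind behaviour Neil describes corresponds to mdadm's --write-behind option, which applies to members marked --write-mostly and requires a write-intent bitmap. A hypothetical invocation for the remote-mirror setup Juan describes (device names are assumptions, with the iscsi-backed disk as the slow member):

```
# Sketch (hypothetical devices): local disk mirrored to a slow iscsi member.
# Writes to the --write-mostly device are buffered (up to 256 outstanding)
# so the caller sees local-disk write latency.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal --write-behind=256 \
      /dev/sda1 --write-mostly /dev/sdc1
```

As Neil notes, the per-write buffers come from kmalloc at write time; the --write-behind count bounds outstanding writes, not a byte-sized buffer.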
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
On Mon, 4 Feb 2008, Michael Tokarev wrote:

Moshe Yudkowsky wrote: [] If I'm reading the man pages, Wikis, READMEs and mailing lists correctly -- not necessarily the case -- the ext3 file system uses the equivalent of data=journal as a default.

ext3 defaults to data=ordered, not data=journal. ext2 doesn't have a journal at all.

The question then becomes what data scheme to use with reiserfs on the

I'd say don't use reiserfs in the first place ;)

Another way to phrase this: unless you're running data-center grade hardware and have absolute confidence in your UPS, you should use data=journal for reiserfs and perhaps avoid XFS entirely.

By the way, even if you do have a good UPS, there should be some control program for it, to properly shut down your system when the UPS loses AC power. So far, I've seen no such programs... /mjt

Why avoid XFS entirely? esandeen, any comments here?

Justin.
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Eric Sandeen wrote: [] http://oss.sgi.com/projects/xfs/faq.html#nulls and note that recent fixes have been made in this area (also noted in the faq). Also - the above all assumes that when a drive says it's written/flushed data, it truly has. Modern write-caching drives can wreak havoc with any journaling filesystem, so that's one good reason for a UPS.

Unfortunately a UPS does not *really* help here. Because unless it has a control program which properly shuts the system down on the loss of input power, and the battery really has the capacity to power the system while it's shutting down (anyone tested this? With a new UPS? And after a year of use, when the battery is not new?) -- unless the UPS actually has the capacity to shut down the system, it will cut the power at an unexpected time, while the disk(s) still have dirty caches...

If the drive claims to have metadata safe on disk but actually does not, and you lose power, the data claimed safe will evaporate; there's not much the fs can do. IO write barriers address this by forcing the drive to flush order-critical data before continuing; xfs has them on by default, although they are tested at mount time, and if you have something in between xfs and the disks which does not support barriers (i.e. lvm...) then they are disabled again, with a notice in the logs.

Note also that with linux software raid, barriers are NOT supported.

/mjt
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Eric Sandeen wrote: Justin Piszcz wrote: Why avoid XFS entirely? esandeen, any comments here?

Heh; well, it's the meme. see: http://oss.sgi.com/projects/xfs/faq.html#nulls and note that recent fixes have been made in this area (also noted in the faq)

Actually, continue reading past that specific entry to the next several; it covers all this quite well.

-Eric
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
Justin Piszcz wrote: Why avoid XFS entirely? esandeen, any comments here?

Heh; well, it's the meme. see: http://oss.sgi.com/projects/xfs/faq.html#nulls and note that recent fixes have been made in this area (also noted in the faq)

Also - the above all assumes that when a drive says it's written/flushed data, that it truly has. Modern write-caching drives can wreak havoc with any journaling filesystem, so that's one good reason for a UPS. If the drive claims to have metadata safe on disk but actually does not, and you lose power, the data claimed safe will evaporate; there's not much the fs can do. IO write barriers address this by forcing the drive to flush order-critical data before continuing; xfs has them on by default, although they are tested at mount time and if you have something in between xfs and the disks which does not support barriers (i.e. lvm...) then they are disabled again, with a notice in the logs.

Note also that ext3 has the barrier option as well, but it is not enabled by default due to performance concerns. Barriers also affect xfs performance, but enabling them in the non-battery-backed-write-cache scenario is the right thing to do for filesystem integrity.

-Eric
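For ext3 the barrier behaviour Eric describes is controlled by a mount option; a sketch fstab line with a hypothetical device (on xfs, by contrast, barriers are on by default and are turned off with nobarrier):

```
# /etc/fstab (sketch): explicitly enable write barriers on ext3
/dev/md3  /data  ext3  defaults,barrier=1  0 2
```

As Eric notes above, barriers cost some write performance but are the right default when the drive's write cache is not battery-backed.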
Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?
On Mon, Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

Problem: on reboot, I get an error message:

root (hd0,1) (Moshe comment: as expected)
Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro
Error 15: File not found

Error 15 is a *grub* error. grub is known for its dislike of xfs, so with this whole setup use ext3, rerun grub-install, and you should be fine.
Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

I've managed to get myself into a little problem. Since power hits were taking out the /boot partition, I decided to split /boot out of root. Working from my emergency partition, I copied all files from /root, re-partitioned what had been /root into room for /boot and /root, and then created the drives. This left me with /dev/md/boot, /dev/md/root, and /dev/md/base (everything else). I modified mdadm.conf on the emergency partition, used update-initramfs to make certain that the new md drives would be recognized, and rebooted. This worked as expected.

I then mounted the entire new file system on a mount point, copied the mdadm.conf to that point, did a chroot to that point, and ran update-initramfs so that the non-emergency partition would have the updated mdadm.conf. This worked -- but with complaints about missing the file /proc/modules (which is not present under chroot). If I use the -v option I can see the raid456, raid1, etc. modules loading. I modified menu.lst to make certain that boot=/dev/md/boot, and ran grub (thanks, Robin!) successfully.

Problem: on reboot, I get an error message:

root (hd0,1) (Moshe comment: as expected)
Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro
---^^ Are you sure that's right? Looks like a typo to me.

Cheers, Robin
Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?
Robin Hill wrote: File not found at that point would suggest it can't find the kernel file. The path here should be relative to the root of the partition /boot is on, so if your /boot is its own partition then you should either use kernel /vmlinuz or (the more usual solution from what I've seen) make sure there's a symlink: ln -s . /boot/boot Robin, Thanks very much! ln -s . /boot/boot works to get past this problem. Now it's failed in a different section and complains that it can't find /sbin/init. I'm at the (initramfs) prompt, which I don't ever recall seeing before. I can't mount /dev/md/root on any mount points (invalid arguments even though I'm not supplying any). I've checked /dev/md/root and it does work as expected when I try mounting it while in my emergency partition, and it does contain /sbin/init and the other files and mount points for /var, /boot, /tmp, etc. So this leads me to the question of why /sbin isn't being seen. /sbin is on the device /dev/md/root, and /etc/fstab specifically mounts it at /. I would think /boot would look at an internal copy of /etc/fstab. Is this another side effect of using /boot on its own partition? -- Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe Blessed are the peacemakers, for they shall be mowed down in the crossfire. -- Michael Flynn
Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?
maximilian attems wrote: error 15 is a *grub* error. grub is known for its dislike of xfs, so with this whole setup use ext3 rerun grub-install and you should be fine. I should mention that something *did* change. When attempting to use XFS, grub would give me a note about 18 partitions used (I forget the exact language). This was different from what I'd remembered; when I switched back to using reiserfs, grub reports using 19 partitions. So there's something definitely interesting about XFS and booting. As an additional note, if I use the grub boot-time commands to edit root to read, e.g., root=/dev/sda2 or root=/dev/sdb2, I get the same Error 15 error message. It may be that grub is complaining about grub and reiserfs, but I suspect that it has a true complaint about the file system and what's on the partitions. -- Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe If, after hearing my songs, just one human being is inspired to say something nasty to a friend, it will all have been worthwhile. -- Tom Lehrer
Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)
On Mon, 4 Feb 2008, Michael Tokarev wrote: Eric Sandeen wrote: [] http://oss.sgi.com/projects/xfs/faq.html#nulls and note that recent fixes have been made in this area (also noted in the faq) Also - the above all assumes that when a drive says it's written/flushed data, that it truly has. Modern write-caching drives can wreak havoc with any journaling filesystem, so that's one good reason for a UPS. If the drive claims to have metadata safe on disk but actually does not, and you lose power, the data claimed safe will evaporate; there's not much the fs can do. IO write barriers address this by forcing the drive to flush order-critical data before continuing; xfs has them on by default, although they are tested at mount time, and if you have something in between xfs and the disks which does not support barriers (i.e. lvm...) then they are disabled again, with a notice in the logs. Unfortunately a UPS does not *really* help here. Because unless it has a control program which properly shuts the system down on loss of input power, and the battery really has the capacity to power the system while it's shutting down (anyone tested this? With a new UPS? And after a year of use, when the battery is not new?) -- unless the UPS actually has the capacity to shut the system down, it will cut the power at an unexpected time, while the disk(s) still have dirty caches... You use nut and a large enough UPS to handle the load of the system, and it shuts the machine down just fine. Note also that with linux software raid, barriers are NOT supported. /mjt
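Since barriers are not supported through the md layer, one common mitigation (a sketch, not something stated in the thread; device names are assumptions) is to disable the drives' write caches so that an acknowledged write really is on the platter, at some cost in write performance:

```shell
# Sketch: disable the volatile write cache on each member drive so
# "write completed" means data is on the platter. Device names are
# hypothetical; -W0 is hdparm's write-cache-off flag for ATA drives.
hdparm -W0 /dev/sda
hdparm -W0 /dev/sdb
# Check the kernel log for notices about barriers being disabled at mount:
dmesg | grep -i barrier
```

The hdparm setting does not persist across power cycles on all drives, so it typically goes in a boot script.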
Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote: I've managed to get myself into a little problem. Since power hits were taking out the /boot partition, I decided to split /boot out of root. Working from my emergency partition, I copied all files from /root, re-partitioned what had been /root into room for /boot and /root, and then created the drives. This left me with /dev/md/boot, /dev/md/root, and /dev/md/base (everything else). I modified mdadm.conf on the emergency partition, used update-initramfs to make certain that the new md drives would be recognized, and rebooted. This worked as expected. I then mounted the entire new file system on a mount point, copied the mdadm.conf to that point, did a chroot to that point, and did an update-initramfs so that the non-emergency partition would have the updated mdadm.conf. This worked -- but with complaints about the missing file /proc/modules (which is not present under chroot). If I use the -v option I can see the raid456, raid1, etc. modules loading. I modified menu.lst to make certain that boot=/dev/md/boot, and ran grub (thanks, Robin!) successfully. Problem: on reboot, I get an error message: root (hd0,1) (Moshe comment: as expected) Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected) kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro Error 15: File not found File not found at that point would suggest it can't find the kernel file. The path here should be relative to the root of the partition /boot is on, so if your /boot is its own partition then you should either use kernel /vmlinuz or (the more usual solution from what I've seen) make sure there's a symlink: ln -s . /boot/boot HTH, Robin -- ___ ( ' } | Robin Hill[EMAIL PROTECTED] | / / ) | Little Jim says | // !! | He fallen in de water !! |
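The self-referential symlink Robin suggests can be demonstrated without touching a real boot partition. In this sketch a scratch directory stands in for the /boot partition (an assumption for illustration), and a `boot` link pointing at `.` makes paths like `/boot/vmlinuz-...` resolve whether or not /boot is a separate partition:

```shell
# /tmp/bootdemo stands in for the root of the /boot partition.
mkdir -p /tmp/bootdemo
touch /tmp/bootdemo/vmlinuz-2.6-amd64        # hypothetical kernel image
ln -sfn . /tmp/bootdemo/boot                 # the "ln -s . /boot/boot" trick
ls -l /tmp/bootdemo/boot/vmlinuz-2.6-amd64   # same file, reachable via the boot/ prefix
```

Because the link points at its own directory, a path written for a combined root+boot layout (`/boot/vmlinuz-...`) and one written for a separate /boot partition (`/vmlinuz-...`) both reach the same file.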
when is a disk non-fresh?
Seems the other topic wasn't quite clear... Occasionally a disk is kicked for being non-fresh - what does this mean and what causes it? Dex -- -BEGIN GEEK CODE BLOCK- Version: 3.12 GCS d--(+)@ s-:+ a- C UL++ P+++ L+++ E-- W++ N o? K- w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ b++(+++) DI+++ D- G++ e* h++ r* y? --END GEEK CODE BLOCK-- http://www.vorratsdatenspeicherung.de
using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?
I've managed to get myself into a little problem. Since power hits were taking out the /boot partition, I decided to split /boot out of root. Working from my emergency partition, I copied all files from /root, re-partitioned what had been /root into room for /boot and /root, and then created the drives. This left me with /dev/md/boot, /dev/md/root, and /dev/md/base (everything else). I modified mdadm.conf on the emergency partition, used update-initramfs to make certain that the new md drives would be recognized, and rebooted. This worked as expected. I then mounted the entire new file system on a mount point, copied the mdadm.conf to that point, did a chroot to that point, and did an update-initramfs so that the non-emergency partition would have the updated mdadm.conf. This worked -- but with complaints about the missing file /proc/modules (which is not present under chroot). If I use the -v option I can see the raid456, raid1, etc. modules loading. I modified menu.lst to make certain that boot=/dev/md/boot, and ran grub (thanks, Robin!) successfully. Problem: on reboot, I get an error message: root (hd0,1) (Moshe comment: as expected) Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected) kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro Error 15: File not found Did I miss something? I'm pretty certain this is the procedure I used before. The XFS module is being loaded by update-initramfs, so unless there's a reason that I can't boot md from a boot partition with the XFS file system, then I don't understand what the problem is. Comments welcome -- I'm wedged! -- Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe Many that live deserve death. And some that die deserve life. Can you give it to them? Then do not be too eager to deal out death in judgement. For even the wise cannot see all ends. -- Gandalf (J.R.R. 
Tolkien)
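The chroot-and-rebuild procedure described above can be sketched as follows (device names and mount points are assumptions from the thread, not a verified transcript). Mounting proc inside the chroot is what avoids the "missing /proc/modules" complaint from update-initramfs:

```shell
# Sketch: propagate a new mdadm.conf into the initramfs of a system
# mounted under /mnt. All paths and device names are hypothetical.
mount /dev/md/root /mnt
mount /dev/md/boot /mnt/boot
cp /etc/mdadm/mdadm.conf /mnt/etc/mdadm/mdadm.conf
chroot /mnt /bin/sh
mount -t proc proc /proc       # inside the chroot; gives update-initramfs /proc/modules
update-initramfs -u -v         # rebuild the initramfs with the new mdadm.conf
```

Without the proc mount the rebuild still works, as the poster found, but update-initramfs cannot inspect the running kernel's module list and warns accordingly.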
Re: mdadm 2.6.4 : How i can check out current status of reshaping ?
On Monday February 4, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED]:/# cat /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1] 1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [_] unused devices: <none> ## But how can I see the status of the reshaping? Has it really reshaped, or might it have hung, or is mdadm just not doing anything at all? How long should I wait for the reshaping to finish? ## The reshape hasn't restarted. Did you do that mdadm -w /dev/md1 like I suggested? If so, what happened? Possibly you tried mounting the filesystem before trying the mdadm -w. There seems to be a bug such that doing this would cause the reshape not to restart, and mdadm -w would not help any more. I suggest you: echo 0 > /sys/module/md_mod/parameters/start_ro then stop the array with mdadm -S /dev/md1 (after unmounting if necessary). Then assemble the array again. Then mdadm -w /dev/md1 just to be sure. If this doesn't work, please report exactly what you did, exactly what message you got, and exactly where the message appeared in the kernel log. NeilBrown
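The recovery steps Neil outlines, collected as a sketch (the member-device names are assumptions based on the mdstat output above):

```shell
# Sketch of the suggested recovery sequence; device names are hypothetical.
echo 0 > /sys/module/md_mod/parameters/start_ro   # don't start arrays read-only
umount /dev/md1                                   # if the filesystem was mounted
mdadm -S /dev/md1                                 # stop the array
mdadm --assemble /dev/md1 /dev/sd[bcdef]          # reassemble from its members
mdadm -w /dev/md1                                 # force read-write so the reshape resumes
cat /proc/mdstat                                  # a progress line should now appear for md1
```

When the reshape is actually running, /proc/mdstat shows a `reshape = N%` progress line with an estimated finish time; its absence, as in the output quoted above, means the reshape is not in progress.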
Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?
I wrote: Now it's failed in a different section and complains that it can't find /sbin/init. I'm at the (initramfs) prompt, which I don't ever recall seeing before. I can't mount /dev/md/root on any mount points (invalid arguments even though I'm not supplying any). I've checked /dev/md/root and it does work as expected when I try mounting it while in my emergency partition, and it does contain /sbin/init and the other files and mount points for /var, /boot, /tmp, etc. So this leads me to the question of why /sbin isn't being seen. /sbin is on the device /dev/md/root, and /etc/fstab specifically mounts it at /. I would think /boot would look at an internal copy of /etc/fstab. Is this another side effect of using /boot on its own partition? The answer: I managed to make a mistake in the configuration of grub, in /boot/grub/menu.lst. I'd changed root= from /dev/md/root to /dev/md/boot -- but I really need to give the *root* location, which does not change, rather than the boot location, which is not relevant here. -- Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe The central tenet of Buddhism is not 'Every man for himself.' -- Wanda
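The distinction Moshe describes shows up in a menu.lst entry like the following (a hypothetical example, not the poster's actual file): grub's own `root` command names the partition holding /boot, while the kernel's `root=` parameter must name the real root filesystem.

```
title  Debian GNU/Linux (hypothetical entry)
root   (hd0,1)                          # grub's root: the partition /boot lives on
kernel /vmlinuz-2.6-amd64 root=/dev/md/root ro   # kernel's root: the real root fs
```

Changing the kernel parameter to `root=/dev/md/boot` produces exactly the symptom above: the initramfs mounts the boot partition as /, finds no /sbin/init there, and drops to the (initramfs) prompt.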
Re: when is a disk non-fresh?
On Monday February 4, [EMAIL PROTECTED] wrote: Seems the other topic wasn't quite clear... not necessarily. sometimes it helps to repeat your question. there is a lot of noise on the internet and sometimes important things get missed... :-) Occasionally a disk is kicked for being non-fresh - what does this mean and what causes it? The 'event' count is too small. Every event that happens on an array causes the event count to be incremented. If the event counts on different devices differ by more than 1, then the device with the smaller number is 'non-fresh'. You need to look at the kernel logs from when the array was previously shut down to figure out why it is now non-fresh. NeilBrown
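The per-device event counters Neil describes can be inspected directly from each member's superblock (a sketch; the device names are assumptions). The member whose count lags the others by more than 1 is the one md will call "non-fresh" at assembly time:

```shell
# Sketch: compare event counters across array members. Device names
# are hypothetical; mdadm --examine reads the on-disk superblock.
mdadm --examine /dev/sda1 | grep -i events
mdadm --examine /dev/sdb1 | grep -i events
```

A mismatch usually means one member missed writes, for instance because it dropped out briefly or the array was shut down uncleanly, which is why the kernel log from the previous shutdown is the place to look.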