Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
On Sunday April 30, [EMAIL PROTECTED] wrote:
NeilBrown [EMAIL PROTECTED] wrote:
When an md array has been idle (no writes) for 20msecs it is marked as 'clean'. This delay turns out to be too short for some real workloads. So increase it to 200msec (the time to update the metadata should be a tiny fraction of that) and make it sysfs-configurable.
...
+ safe_mode_delay
+ When an md array has seen no write requests for a certain period
+ of time, it will be marked as 'clean'. When another write
+ request arrives, the array is marked as 'dirty' before the write
+ commences. This is known as 'safe_mode'.
+ The 'certain period' is controlled by this file which stores the
+ period as a number of seconds. The default is 200msec (0.200).
+ Writing a value of 0 disables safemode.
+

Why not make the units milliseconds? Rename this to safe_mode_delay_msecs to remove any doubt.

Because umpteen years ago when I was adding thread-usage statistics to /proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it seconds - a much more obvious unit. See email below. It seems very sensible to me.

...
+ msec = simple_strtoul(buf, &e, 10);
+ if (e == buf || (*e && *e != '\n'))
+     return -EINVAL;
+ msec = (msec * 1000) / scale;
+ if (msec == 0)
+     mddev->safemode_delay = 0;
+ else {
+     mddev->safemode_delay = (msec*HZ)/1000;
+     if (mddev->safemode_delay == 0)
+         mddev->safemode_delay = 1;
+ }
+ return len;

And most of that goes away.

Maybe it could go in a library :-?

NeilBrown

From: Linus Torvalds [EMAIL PROTECTED]
To: Neil Brown [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: PATCH knfsd - stats tidy up.
Date: Tue, 18 Jul 2000 12:21:12 -0700 (PDT)

On Tue, 18 Jul 2000, Neil Brown wrote:
The following patch converts jiffies to milliseconds for output, and also makes the number wrap predictably at 1,000,000 seconds (approximately one fortnight).
If no programs depend on the format, I actually prefer format changes like this to be of the obvious kind. One such obvious kind is the format "0.001", which obviously means 0.001 seconds.

And yes, I'm _really_ sorry that a lot of the old /proc files contain jiffies. Lazy. Ugly. Bad. Much of it my bad.

Doing "0.001" doesn't mean that you have to use floating point; in fact you've done most of the work already in your ms patch. Just splitting things out a bit works well:

    /* gcc knows to combine / and % - generate one divl */
    unsigned int sec = time / HZ, msec = time % HZ;
    msec = (msec * 1000) / HZ;
    sprintf("%d.%03d", sec, msec)

(It's basically the same thing you already do, except it doesn't re-combine the seconds and milliseconds but just prints them out separately. And it has the advantage that if you want to change it to microseconds some day, you can do so very trivially without breaking the format. Plus it's readable as hell.)

Linus

- 
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
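To make the two halves of this exchange concrete, here is a small userspace sketch in C of both directions: parsing a "seconds" string into jiffies (the store side quoted in Neil's patch) and printing jiffies as "S.mmm" (Linus's output split). It is not the md or knfsd code. The function names parse_seconds_to_jiffies() and format_jiffies_as_seconds() are hypothetical, HZ is passed in as a parameter instead of being a kernel constant, and strtoull() stands in for simple_strtoul(). How the patch computes `scale` is elided in the quote; this sketch assumes one plausible approach (drop the decimal point and count the digits after it).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the store side: "S" or "S.fff" (optionally
 * '\n'-terminated) -> milliseconds -> jiffies. 0 means "disable"; a
 * non-zero delay that rounds down to 0 jiffies becomes 1 jiffy.
 * Returns 0 or -EINVAL. */
static int parse_seconds_to_jiffies(const char *buf, unsigned long hz,
                                    unsigned long long *delay)
{
    char digits[10];              /* at most 9 digits keeps the maths in 64 bits */
    size_t n = 0;
    unsigned long long scale = 1; /* 10^(number of digits after the '.') */
    unsigned long long msec;
    int seen_point = 0;
    const char *p;

    assert(hz > 0);
    for (p = buf; *p && *p != '\n'; p++) {
        if (*p == '.' && !seen_point) {
            seen_point = 1;       /* drop the point, remember the scale */
            continue;
        }
        if (!isdigit((unsigned char)*p) || n + 1 >= sizeof(digits))
            return -EINVAL;
        if (seen_point)
            scale *= 10;
        digits[n++] = *p;
    }
    if (n == 0 || (*p == '\n' && p[1] != '\0'))
        return -EINVAL;           /* no digits, or junk after the '\n' */
    digits[n] = '\0';

    msec = strtoull(digits, NULL, 10);
    msec = (msec * 1000) / scale; /* same step as the quoted patch */
    if (msec == 0) {
        *delay = 0;
    } else {
        *delay = (msec * hz) / 1000;
        if (*delay == 0)
            *delay = 1;
    }
    return 0;
}

/* Output side, following Linus's split: whole seconds first, then only
 * the remainder is scaled to milliseconds, printed as "S.mmm". */
static void format_jiffies_as_seconds(unsigned long long time,
                                      unsigned long hz,
                                      char *buf, size_t len)
{
    unsigned long long sec = time / hz, msec = time % hz;

    msec = (msec * 1000) / hz;
    snprintf(buf, len, "%llu.%03llu", sec, msec);
}
```

With hz = 250, writing "0.200" gives 50 jiffies, and formatting 50 jiffies gives back "0.200". A delay that rounds down to zero jiffies, such as "0.001" at that HZ, is bumped to 1 jiffy, as in the quoted patch.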
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
Neil Brown [EMAIL PROTECTED] wrote:
On Sunday April 30, [EMAIL PROTECTED] wrote:
NeilBrown [EMAIL PROTECTED] wrote:
When an md array has been idle (no writes) for 20msecs it is marked as 'clean'. This delay turns out to be too short for some real workloads. So increase it to 200msec (the time to update the metadata should be a tiny fraction of that) and make it sysfs-configurable.
...
+ safe_mode_delay
+ When an md array has seen no write requests for a certain period
+ of time, it will be marked as 'clean'. When another write
+ request arrives, the array is marked as 'dirty' before the write
+ commences. This is known as 'safe_mode'.
+ The 'certain period' is controlled by this file which stores the
+ period as a number of seconds. The default is 200msec (0.200).
+ Writing a value of 0 disables safemode.
+
Why not make the units milliseconds? Rename this to safe_mode_delay_msecs to remove any doubt.
Because umpteen years ago when I was adding thread-usage statistics to /proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it seconds - a much more obvious unit. See email below. It seems very sensible to me.

That's output. It's easier to do the conversion with output. And I guess one could argue that lots of people read /proc files, but few write to them.

Generally I don't think we should be teaching the kernel to accept pretend-floating-point numbers like this, especially when a) delay in milliseconds is such a simple concept and b) it's so easy to go from float to milliseconds in userspace.

Do you really expect that humans (really dumb ones ;)) will be echoing numbers into this file? Or will it mainly be a thing for mdadm to fiddle with?
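Andrew's point (b), that the float-to-milliseconds step is easy in userspace, can be illustrated with a short C sketch. seconds_str_to_msecs() is a hypothetical helper, not part of mdadm; it parses a human-friendly seconds value with strtod() and rounds to the nearest millisecond, so a tool could hand the kernel a plain integer instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not part of mdadm): turn "0.2" or "0.2\n" into
 * 200. strtod() honours LC_NUMERIC, so this assumes the default "C"
 * locale, where the decimal point is '.'. Returns 0 or -EINVAL. */
static int seconds_str_to_msecs(const char *s, long *msecs)
{
    char *end;
    double secs;

    assert(s != NULL && msecs != NULL);
    errno = 0;
    secs = strtod(s, &end);
    if (end == s || errno != 0)
        return -EINVAL;
    /* Reject negatives, NaN, and anything whose ms count might not fit
     * a 32-bit long (2e6 s = 2e9 ms, which is below LONG_MAX). */
    if (!(secs >= 0.0 && secs <= 2e6))
        return -EINVAL;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;
    *msecs = (long)(secs * 1000.0 + 0.5); /* round to the nearest ms */
    return 0;
}
```

The rounding and validation live entirely in the tool, which is the design Andrew is arguing for: the kernel side would then only need to accept an integer.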
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
Neil Brown wrote:
On Sunday April 30, [EMAIL PROTECTED] wrote:
NeilBrown [EMAIL PROTECTED] wrote:
When an md array has been idle (no writes) for 20msecs it is marked as 'clean'. This delay turns out to be too short for some real workloads. So increase it to 200msec (the time to update the metadata should be a tiny fraction of that) and make it sysfs-configurable.
...
+ safe_mode_delay
+ When an md array has seen no write requests for a certain period
+ of time, it will be marked as 'clean'. When another write
+ request arrives, the array is marked as 'dirty' before the write
+ commences. This is known as 'safe_mode'.
+ The 'certain period' is controlled by this file which stores the
+ period as a number of seconds. The default is 200msec (0.200).
+ Writing a value of 0 disables safemode.
+
Why not make the units milliseconds? Rename this to safe_mode_delay_msecs to remove any doubt.
Because umpteen years ago when I was adding thread-usage statistics to /proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it seconds - a much more obvious unit. See email below. It seems very sensible to me.

Either way, all ambiguity is removed if you put the unit in the name. And don't use jiffies, because that obviously is not portable (which sounds like it was Linus' biggest concern).

Once you do that, I don't much care whether you use seconds or milliseconds. Other than to note that many of our units now are ms, especially when they're measuring things at or around the ms order of magnitude. But I'm not aware of many proc values that don't work in integers.
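If the attribute carried its unit in its name and took an integer, the store path shrinks to a single integer conversion. The sketch below assumes a hypothetical attribute named safe_mode_delay_msecs and hypothetical helpers store_delay_msecs() and msecs_to_ticks(); the delay is kept in milliseconds (so no jiffies leak into the interface) and converted to timer ticks only where needed, rounding up in the same spirit as the kernel's msecs_to_jiffies().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical store for an attribute such as safe_mode_delay_msecs:
 * the input is a plain integer count of milliseconds, optionally
 * followed by '\n' (as written by echo). No decimal point, no scale. */
static int store_delay_msecs(const char *buf, unsigned long *delay_msecs)
{
    char *e;
    unsigned long msec;

    assert(buf != NULL && delay_msecs != NULL);
    if (!isdigit((unsigned char)buf[0]))
        return -EINVAL;           /* strtoul alone would accept " -1" */
    errno = 0;
    msec = strtoul(buf, &e, 10);
    if (errno == ERANGE || (*e && *e != '\n'))
        return -EINVAL;
    *delay_msecs = msec;
    return 0;
}

/* Convert to timer ticks only where a timer needs it, rounding up so
 * that a non-zero delay never becomes 0 ticks. */
static unsigned long msecs_to_ticks(unsigned long msecs, unsigned long hz)
{
    return (unsigned long)(((unsigned long long)msecs * hz + 999) / 1000);
}
```

Compared with parsing "0.200", there is no scale to track and the "round up to 1" special case disappears into the conversion helper.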
Re: raid5 resizing
Neil Brown wrote:
On Monday May 1, [EMAIL PROTECTED] wrote:
Hey folks.

There's no point in using LVM on a raid5 setup if all you intend to do in the future is resize the filesystem on it, is there? The new raid5 resizing code takes care of providing the extra space and then, as long as the (say) ext3 filesystem is created with resize_inode, all should be sweet. Right? Or have I missed something crucial here? :)

You are correct. md/raid5 makes the extra space available all by itself.

Further - even if you don't create the filesystem with the right amount of extra metadata space for online resizing, you can resize any ext2/3 filesystem offline, and it doesn't take very long. You just use resize2fs instead of ext2online.

-Mike
Re: Two-disk RAID5?
On Wed, Apr 26, 2006 at 03:22:38PM -0400, Jon Lewis wrote:
On Wed, 26 Apr 2006, Jansen, Frank wrote:
It is not possible to flip a bit to change a set of disks from RAID 1 to RAID 5, as the physical layout is different.

As Tuomas pointed out though, a 2 disk RAID5 is kind of a special case where all you have is data and parity, which is actually also just data.

No, the other way around: RAID1 is a special case of RAID5. The parity of RAID5 with n disks is constructed like[1]:

    parity = disk1 XOR disk2 XOR ... XOR disk n-1

With n = 2, this reduces to:

    parity = disk1 XOR nothing = disk1

Which is just mirroring, which we usually call RAID1.

Seems kind of like a RAID1 with extra overhead. I don't think I've ever heard of a RAID5 implementation willing to handle 3 drives though.

Our own RAID recovery tools can handle that just fine.

Erik

[1] Yes, there's also an algorithm to select which disk is used for parity for what block, but that doesn't change *how* parity is calculated.

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
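Erik's formula can be checked with a few lines of C. This is a toy model, not md's raid5 code: blocks are tiny fixed-size byte arrays (BLOCK is an arbitrary illustrative size), parity is the bytewise XOR of the n-1 data blocks in one stripe, and parity rotation (his footnote) is ignored. With a single data block (n = 2) the parity block comes out as an exact copy of the data, i.e. a mirror.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 8 /* toy block size in bytes, for illustration only */

/* parity = data[0] XOR data[1] XOR ... XOR data[ndata-1], bytewise.
 * For an n-disk stripe, ndata = n - 1. */
static void raid5_parity(unsigned char data[][BLOCK], size_t ndata,
                         unsigned char parity[BLOCK])
{
    assert(ndata > 0);
    memset(parity, 0, BLOCK);
    for (size_t d = 0; d < ndata; d++)
        for (size_t i = 0; i < BLOCK; i++)
            parity[i] ^= data[d][i];
}

/* A lost data block is the XOR of the parity and every surviving block. */
static void raid5_rebuild(unsigned char data[][BLOCK], size_t ndata,
                          size_t lost, const unsigned char parity[BLOCK],
                          unsigned char out[BLOCK])
{
    assert(lost < ndata);
    memcpy(out, parity, BLOCK);
    for (size_t d = 0; d < ndata; d++)
        if (d != lost)
            for (size_t i = 0; i < BLOCK; i++)
                out[i] ^= data[d][i];
}
```

In the two-disk case, rebuilding the only data block just copies the parity block, which is the same thing as reading the other half of a mirror; with more data blocks the same XOR recovers any single lost block.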
Re: md: Change ENOTSUPP to EOPNOTSUPP
Paul Clements wrote:
Gil wrote:
So for those of us using other filesystems (e.g. ext2/3), is there some way to determine whether or not barriers are available?

You'll see something like this in your system log if barriers are not supported:

Apr 3 16:44:01 adam kernel: JBD: barrier-based sync failed on md0 - disabling barriers

Otherwise, assume that they are. But like Neil said, it shouldn't matter to a user whether they are supported or not. Filesystems will work correctly either way.

This seems very important to me to understand thoroughly, so please forgive me if I'm being dense. What I'm not sure of in the above is: for what definition of "working"? For the definition where the code simply doesn't bomb out, or for the stricter definition that, despite write caching at the drive level, there is no point where there could possibly be a data inconsistency between what the filesystem thinks is written and what got written, power loss or no?

My understanding to this point is that with write caching and no barrier support, you would still care, as power loss would give you a window of inconsistency. With the exception of the very minor situation Neil mentioned about the first write through md not being a superblock write...

-Mike
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
On Mon, May 01, 2006 at 03:30:19PM +1000, NeilBrown wrote:
When an md array has been idle (no writes) for 20msecs it is marked as 'clean'. This delay turns out to be too short for some real workloads. So increase it to 200msec (the time to update the metadata should be a tiny fraction of that) and make it sysfs-configurable.

What does this mean, 'too short'? What happens in that case: are the backing block devices still busy writing?

When making this configurable, the help text had better explain what the trade-offs are.

Thanks.

-- 
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
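One way to explore the trade-off asked about here is a toy model built only from the safe_mode description in the patch: a write to a clean array first marks it dirty, and the array is marked clean again after `delay` ms without writes. simulate_safemode(), struct safemode_stats and the write trace are hypothetical; the model counts metadata updates and the time the array spends marked dirty, and does not model md's real timer, write latency, or a delay of 0 (which disables safe mode).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of safe_mode as described in the patch text (not md's code). */
struct safemode_stats {
    unsigned long metadata_updates; /* clean->dirty plus dirty->clean */
    unsigned long dirty_ms;         /* total time the array is marked dirty */
};

/* writes[] holds write times in ms, sorted ascending; delay must be > 0. */
static struct safemode_stats simulate_safemode(const unsigned long *writes,
                                               size_t n, unsigned long delay)
{
    struct safemode_stats s = { 0, 0 };
    unsigned long dirty_since = 0, last_write = 0;
    int dirty = 0;

    assert(delay > 0);
    for (size_t i = 0; i < n; i++) {
        /* Idle for at least `delay` since the last write: the array
         * was marked clean at last_write + delay. */
        if (dirty && writes[i] - last_write >= delay) {
            s.metadata_updates++;
            s.dirty_ms += last_write + delay - dirty_since;
            dirty = 0;
        }
        /* A write to a clean array marks it dirty before proceeding. */
        if (!dirty) {
            s.metadata_updates++;
            dirty_since = writes[i];
            dirty = 1;
        }
        last_write = writes[i];
    }
    /* After the last write the array goes clean once `delay` expires. */
    if (dirty) {
        s.metadata_updates++;
        s.dirty_ms += last_write + delay - dirty_since;
    }
    return s;
}
```

For writes every 50 ms (at 0, 50, 100 and 150 ms), a 20 ms delay gives 8 metadata updates and 80 ms marked dirty, while a 200 ms delay gives 2 updates but 350 ms marked dirty. In this model a short delay adds a metadata update on both sides of nearly every write in a slow trickle, while a long delay widens the window in which a crash would find the array marked dirty.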