Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
On Tuesday May 2, [EMAIL PROTECTED] wrote:
> On Mon, May 01, 2006 at 03:30:19PM +1000, NeilBrown wrote:
> > When a md array has been idle (no writes) for 20msecs it is marked as
> > 'clean'. This delay turns out to be too short for some real
> > workloads. So increase it to 200msec (the time to update the metadata
> > should be a tiny fraction of that) and make it sysfs-configurable.
>
> What does this mean, 'too short'? What happens in that case, backing block
> devices are still busy writing? When making this configurable, the help text
> better explain what the trade offs are.

"too short" means that the update happens often enough to cause a
noticeable performance degradation.

If an application writes steadily every 21msecs (or maybe 30msecs)
then there will be 2 superblock writes and 1 application write every
21msecs, and this causes enough disk io to slow the app down. I guess
all the updates fill up the 21msec space.

With a larger delay - 200msec - you could still get bad situations,
e.g. with the app writing every 210msecs. However 2 superblock
updates plus one app write is a much smaller fraction of 200msecs, so
there shouldn't be as many problems.

Yes, a more detailed explanation should go in Documentation/md.txt

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
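The arithmetic in the reply above can be put into a small sketch. This is a rough model only, resting on an assumption that does not come from the md code: whenever the gap between application writes is longer than the safe-mode delay, every application write costs two superblock updates (one to mark the array dirty, one to mark it clean again). The function name and parameters are made up for illustration.

```c
#include <assert.h>

/*
 * Rough model (an assumption, not taken from the md driver): if the gap
 * between application writes exceeds the safe-mode delay, the array is
 * marked clean after each write and dirty again before the next one,
 * so each application write costs two superblock updates.
 */
static unsigned int sb_updates_per_sec(unsigned int write_interval_ms,
                                       unsigned int safemode_delay_ms)
{
    assert(write_interval_ms > 0);

    /*
     * A delay of 0 disables safe mode, and writes that arrive within
     * the delay never let the array go clean: no extra updates.
     */
    if (safemode_delay_ms == 0 || write_interval_ms <= safemode_delay_ms)
        return 0;

    /* Two superblock updates per application write, integer division. */
    return (2 * 1000) / write_interval_ms;
}
```

Under this model a writer at 21msec intervals against the old 20msec delay sees about 95 superblock updates per second, while the worst case against a 200msec delay (a writer at 210msec intervals) sees about 9.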
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
On Mon, May 01, 2006 at 03:30:19PM +1000, NeilBrown wrote:
> When a md array has been idle (no writes) for 20msecs it is marked as
> 'clean'. This delay turns out to be too short for some real
> workloads. So increase it to 200msec (the time to update the metadata
> should be a tiny fraction of that) and make it sysfs-configurable.

What does this mean, 'too short'? What happens in that case, backing block
devices are still busy writing? When making this configurable, the help text
better explain what the trade offs are.

Thanks.

--
http://www.PowerDNS.com      Open source, database driven DNS Software
http://netherlabs.nl         Open and Closed source services
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
On Sun, 30 Apr 2006, Andrew Morton wrote:
>
> Generally I don't think we should be teaching the kernel to accept
> pretend-floating-point numbers like this, especially when a) "delay in
> milliseconds" is such a simple concept and b) it's so easy to go from float
> to milliseconds in userspace.
>
> Do you really expect that humans (really dumb ones ;)) will be echoing
> numbers into this file? Or will it mainly be a thing for mdadm to fiddle
> with?

I generally hate interfaces that have some "random base".

So "delay in seconds" is not a random base, because "seconds" is a good SI
base unit, and there's not a lot of question about it. But once you start
talking milliseconds or microseconds, I'd actually much rather have a "fake
floating point number" over having different files have different (magic)
base constants.

How do you remember which are milliseconds, which are microseconds, and
which are just seconds?

It should be easy to have a helper function or two that takes a "struct
timeval" and reads/writes a "float".

                Linus
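A userspace sketch of the kind of helper pair suggested above: one function prints a "struct timeval" as a decimal number of seconds, the other parses such a string back, and neither uses floating point. The names timeval_to_str and str_to_timeval are hypothetical, chosen for illustration; nothing here is an existing kernel interface.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: print a timeval as "seconds.microseconds". */
static int timeval_to_str(const struct timeval *tv, char *buf, size_t len)
{
    assert(tv->tv_usec >= 0 && tv->tv_usec < 1000000);
    return snprintf(buf, len, "%ld.%06ld",
                    (long)tv->tv_sec, (long)tv->tv_usec);
}

/*
 * Hypothetical helper: parse "S", "S." or "S.ffffff" (optionally followed
 * by a newline) using integer math only. Returns 0 on success and -1 on
 * malformed input; *tv is left untouched on failure.
 */
static int str_to_timeval(const char *s, struct timeval *tv)
{
    long sec = 0, usec = 0, scale = 100000;
    int digits = 0;

    for (; isdigit((unsigned char)*s); s++, digits++) {
        if (sec > (LONG_MAX - 9) / 10)
            return -1;                  /* would overflow */
        sec = sec * 10 + (*s - '0');
    }
    if (*s == '.') {
        for (s++; isdigit((unsigned char)*s); s++, digits++) {
            /* Digits past the sixth are dropped once scale reaches 0. */
            usec += (*s - '0') * scale;
            scale /= 10;
        }
    }
    if (*s == '\n')
        s++;
    if (digits == 0 || *s != '\0')
        return -1;

    tv->tv_sec = sec;
    tv->tv_usec = usec;
    return 0;
}
```

With these, "0.200" parses to 0 seconds and 200000 microseconds, and formats back as "0.200000". The unit in the file stays "seconds" whatever precision the value needs.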
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
Neil Brown wrote:
> On Sunday April 30, [EMAIL PROTECTED] wrote:
> > NeilBrown <[EMAIL PROTECTED]> wrote:
> > >
> > > When a md array has been idle (no writes) for 20msecs it is marked as
> > > 'clean'. This delay turns out to be too short for some real
> > > workloads. So increase it to 200msec (the time to update the metadata
> > > should be a tiny fraction of that) and make it sysfs-configurable.
> > >
> > > ...
> > >
> > > + safe_mode_delay
> > > + When an md array has seen no write requests for a certain period
> > > + of time, it will be marked as 'clean'. When another write
> > > + request arrive, the array is marked as 'dirty' before the write
> > > + commenses. This is known as 'safe_mode'.
> > > + The 'certain period' is controlled by this file which stores the
> > > + period as a number of seconds. The default is 200msec (0.200).
> > > + Writing a value of 0 disables safemode.
> > > +
> >
> > Why not make the units milliseconds? Rename this to safe_mode_delay_msecs
> > to remove any doubt.
>
> Because umpteen years ago when I was adding thread-usage statistics to
> /proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
> seconds - a much more "obvious" unit. See Email below.
> It seems very sensible to me.

Either way, all ambiguity is removed if you put the unit in the name.
And don't use jiffies because that obviously is not portable (which
sounds like it was Linus' biggest concern).

Once you do that, I don't much care whether you use seconds or
milliseconds. Other than to note that many of our units now are ms,
especially when they're measuring things at or around the ms order of
magnitude. But I'm not aware of so many proc values that don't work in
integers.
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
Neil Brown <[EMAIL PROTECTED]> wrote:
>
> On Sunday April 30, [EMAIL PROTECTED] wrote:
> > NeilBrown <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > > When a md array has been idle (no writes) for 20msecs it is marked as
> > > 'clean'. This delay turns out to be too short for some real
> > > workloads. So increase it to 200msec (the time to update the metadata
> > > should be a tiny fraction of that) and make it sysfs-configurable.
> > >
> > >
> > > ...
> > >
> > > + safe_mode_delay
> > > + When an md array has seen no write requests for a certain period
> > > + of time, it will be marked as 'clean'. When another write
> > > + request arrive, the array is marked as 'dirty' before the write
> > > + commenses. This is known as 'safe_mode'.
> > > + The 'certain period' is controlled by this file which stores the
> > > + period as a number of seconds. The default is 200msec (0.200).
> > > + Writing a value of 0 disables safemode.
> > > +
> >
> > Why not make the units milliseconds? Rename this to safe_mode_delay_msecs
> > to remove any doubt.
>
> Because umpteen years ago when I was adding thread-usage statistics to
> /proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
> seconds - a much more "obvious" unit. See Email below.
> It seems very sensible to me.

That's output. It's easier to do the conversion with output. And I guess
one could argue that lots of people read /proc files, but few write to them.

Generally I don't think we should be teaching the kernel to accept
pretend-floating-point numbers like this, especially when a) "delay in
milliseconds" is such a simple concept and b) it's so easy to go from float
to milliseconds in userspace.

Do you really expect that humans (really dumb ones ;)) will be echoing
numbers into this file? Or will it mainly be a thing for mdadm to fiddle
with?
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
On Sunday April 30, [EMAIL PROTECTED] wrote:
> NeilBrown <[EMAIL PROTECTED]> wrote:
> >
> >
> > When a md array has been idle (no writes) for 20msecs it is marked as
> > 'clean'. This delay turns out to be too short for some real
> > workloads. So increase it to 200msec (the time to update the metadata
> > should be a tiny fraction of that) and make it sysfs-configurable.
> >
> >
> > ...
> >
> > + safe_mode_delay
> > + When an md array has seen no write requests for a certain period
> > + of time, it will be marked as 'clean'. When another write
> > + request arrive, the array is marked as 'dirty' before the write
> > + commenses. This is known as 'safe_mode'.
> > + The 'certain period' is controlled by this file which stores the
> > + period as a number of seconds. The default is 200msec (0.200).
> > + Writing a value of 0 disables safemode.
> > +
>
> Why not make the units milliseconds? Rename this to safe_mode_delay_msecs
> to remove any doubt.

Because umpteen years ago when I was adding thread-usage statistics to
/proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
seconds - a much more "obvious" unit. See Email below.
It seems very sensible to me.

...

> > +        msec = simple_strtoul(buf, &e, 10);
> > +        if (e == buf || (*e && *e != '\n'))
> > +                return -EINVAL;
> > +        msec = (msec * 1000) / scale;
> > +        if (msec == 0)
> > +                mddev->safemode_delay = 0;
> > +        else {
> > +                mddev->safemode_delay = (msec*HZ)/1000;
> > +                if (mddev->safemode_delay == 0)
> > +                        mddev->safemode_delay = 1;
> > +        }
> > +        return len;
>
> And most of that goes away.

Maybe it could go in a library :-?

NeilBrown


From: Linus Torvalds <[EMAIL PROTECTED]>
To: Neil Brown <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Subject: Re: PATCH knfsd - stats tidy up.
Date: Tue, 18 Jul 2000 12:21:12 -0700 (PDT)
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Tue, 18 Jul 2000, Neil Brown wrote:
>
> The following patch converts jiffies to milliseconds for output, and
> also makes the number wrap predicatably at 1,000,000 seconds
> (approximately one fortnight).

If no programs depend on the format, I actually prefer format changes like
this to be of the "obvious" kind. One such obvious kind is the format

        0.001

which obviously means 0.001 seconds.

And yes, I'm _really_ sorry that a lot of the old /proc files contain
jiffies. Lazy. Ugly. Bad. Much of it my bad.

Doing 0.001 doesn't mean that you have to use floating point, in fact
you've done most of the work already in your ms patch, just splitting
things out a bit works well:

        /* gcc knows to combine / and % - generate one "divl" */
        unsigned int sec = time / HZ, msec = time % HZ;
        msec = (msec * 1000) / HZ;
        sprintf(" %d.%03d", sec, msec)

(It's basically the same thing you already do, except it doesn't
re-combine the seconds and milliseconds but just prints them out
separately.. And it has the advantage that if you want to change it to
microseconds some day, you can do so very trivially without breaking the
format. Plus it's readable as hell.)

                Linus
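For reference, a self-contained userspace version of the fragment in the forwarded mail. It is a sketch: the tick rate is passed in as a parameter named hz, standing in for the kernel's HZ constant, and the function name format_ticks is invented here. The arithmetic is the same split into whole seconds and leftover ticks.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a tick count as "seconds.milliseconds" using integer math only.
 * `hz` (ticks per second) is a parameter in this sketch; in the kernel it
 * would be the HZ constant.
 */
static int format_ticks(char *buf, size_t len, unsigned long ticks,
                        unsigned int hz)
{
    unsigned long sec, msec;

    assert(hz > 0);

    sec = ticks / hz;               /* whole seconds */
    msec = ticks % hz;              /* leftover ticks, under one second */
    msec = (msec * 1000) / hz;      /* leftover ticks -> milliseconds */

    return snprintf(buf, len, "%lu.%03lu", sec, msec);
}
```

For example, 1234 ticks at 100 ticks per second come out as "12.340". Moving to microseconds later would only change the scale factor and the field width, not the unit a reader sees.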
Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
NeilBrown <[EMAIL PROTECTED]> wrote:
>
>
> When a md array has been idle (no writes) for 20msecs it is marked as
> 'clean'. This delay turns out to be too short for some real
> workloads. So increase it to 200msec (the time to update the metadata
> should be a tiny fraction of that) and make it sysfs-configurable.
>
>
> ...
>
> + safe_mode_delay
> + When an md array has seen no write requests for a certain period
> + of time, it will be marked as 'clean'. When another write
> + request arrive, the array is marked as 'dirty' before the write
> + commenses. This is known as 'safe_mode'.
> + The 'certain period' is controlled by this file which stores the
> + period as a number of seconds. The default is 200msec (0.200).
> + Writing a value of 0 disables safemode.
> +

Why not make the units milliseconds? Rename this to safe_mode_delay_msecs
to remove any doubt.

> +static ssize_t
> +safe_delay_store(mddev_t *mddev, const char *cbuf, size_t len)
> +{
> +        int scale=1;
> +        int dot=0;
> +        int i;
> +        unsigned long msec;
> +        char buf[30];
> +        char *e;
> +        /* remove a period, and count digits after it */
> +        if (len >= sizeof(buf))
> +                return -EINVAL;
> +        strlcpy(buf, cbuf, len);
> +        buf[len] = 0;
> +        for (i=0; i<len; i++) {
> +                if (dot) {
> +                        if (isdigit(buf[i])) {
> +                                buf[i-1] = buf[i];
> +                                scale *= 10;
> +                        }
> +                        buf[i] = 0;
> +                } else if (buf[i] == '.') {
> +                        dot=1;
> +                        buf[i] = 0;
> +                }
> +        }
> +        msec = simple_strtoul(buf, &e, 10);
> +        if (e == buf || (*e && *e != '\n'))
> +                return -EINVAL;
> +        msec = (msec * 1000) / scale;
> +        if (msec == 0)
> +                mddev->safemode_delay = 0;
> +        else {
> +                mddev->safemode_delay = (msec*HZ)/1000;
> +                if (mddev->safemode_delay == 0)
> +                        mddev->safemode_delay = 1;
> +        }
> +        return len;

And most of that goes away.
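To make the trade-off concrete, here is a userspace sketch of what the quoted store function computes: decimal seconds in, milliseconds out, then milliseconds to ticks with the same two special cases (0 stays 0 and disables safe mode; any non-zero delay is at least one tick). It is not the patch's code. It uses a single pass that accumulates digits rather than deleting the period from a copy of the buffer, it drops digits beyond the third decimal place, and the names parse_seconds_to_msec and msec_to_ticks and the hz parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Parse decimal seconds ("0.200", "2", "1.5\n") into whole milliseconds
 * with integer math only. Returns -1 on malformed or oversized input.
 * Digits beyond the third decimal place are ignored.
 */
static long parse_seconds_to_msec(const char *s)
{
    long sec = 0, frac_ms = 0, scale = 100;
    int digits = 0;

    for (; isdigit((unsigned char)*s); s++, digits++) {
        if (sec > 100000)
            return -1;              /* keeps sec * 1000 within 32 bits */
        sec = sec * 10 + (*s - '0');
    }
    if (*s == '.') {
        for (s++; isdigit((unsigned char)*s); s++, digits++) {
            frac_ms += (*s - '0') * scale;
            scale /= 10;            /* reaches 0 after three digits */
        }
    }
    if (*s == '\n')
        s++;
    if (digits == 0 || *s != '\0')
        return -1;

    return sec * 1000 + frac_ms;
}

/*
 * Convert milliseconds to ticks the way the quoted patch does: 0 stays 0
 * (safe mode disabled), any other delay is rounded up to at least 1 tick.
 * `hz` stands in for the kernel's HZ constant.
 */
static unsigned long msec_to_ticks(unsigned long msec, unsigned int hz)
{
    unsigned long ticks;

    assert(hz > 0);

    if (msec == 0)
        return 0;
    ticks = (msec * hz) / 1000;
    return ticks ? ticks : 1;
}
```

If the file took plain milliseconds, as proposed in the reply above, only the second function would be needed on the kernel side.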