Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-05-08 Thread Neil Brown
On Tuesday May 2, [EMAIL PROTECTED] wrote:
> On Mon, May 01, 2006 at 03:30:19PM +1000, NeilBrown wrote:
> > When a md array has been idle (no writes) for 20msecs it is marked as
> > 'clean'.  This delay turns out to be too short for some real
> > workloads.  So increase it to 200msec (the time to update the metadata
> > should be a tiny fraction of that) and make it sysfs-configurable.
> 
> What does this mean, 'too short'? What happens in that case, backing block
> devices are still busy writing? When making this configurable, the help text
> better explain what the trade offs are.

"too short" means that the update happens often enough to cause a
noticeable performance degradation.

In an application writes steadily very 21msecs (or maybe 30msecs) then
there will be 2 superblock writes and 1 application write every
21msecs, and this causes enough disk io to close the app down. - I
guess all the updates fill up the 21msec space.

With a larger delay - 200msec - you could still get bad situations
e.g. with the app writing every 210msecs.  However 2 superblock
updates plus one app write is a much smaller fraction of 200msecs, so
there shouldn't be as many problems.

Yes, a more detailed explanation should go in Documentation/md.txt

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-05-01 Thread bert hubert
On Mon, May 01, 2006 at 03:30:19PM +1000, NeilBrown wrote:
> When a md array has been idle (no writes) for 20msecs it is marked as
> 'clean'.  This delay turns out to be too short for some real
> workloads.  So increase it to 200msec (the time to update the metadata
> should be a tiny fraction of that) and make it sysfs-configurable.

What does this mean, 'too short'? What happens in that case, backing block
devices are still busy writing? When making this configurable, the help text
better explain what the trade offs are.

Thanks.

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-05-01 Thread Linus Torvalds


On Sun, 30 Apr 2006, Andrew Morton wrote:
> 
> Generally I don't think we should be teaching the kernel to accept
> pretend-floating-point numbers like this, especially when a) "delay in
> milliseconds" is such a simple concept and b) it's so easy to go from float
> to milliseconds in userspace.
> 
> Do you really expect that humans (really dumb ones ;)) will be echoing
> numbers into this file?  Or will it mainly be a thing for mdadm to fiddle
> with?

I generally hate interfaces that have some "random base".

So "delay in seconds" is not a random base, because "seconds" is a good SI 
base unit, and there's not a lot of question about it. But once you start 
talking milliseconds on microseconds, I'd actually much rather have a 
"fake floating point number" over having different files have different 
(magic) base constants. How do you remember which are milliseconds, which 
are microseconds, and which are just seconds?

It should be easy to have a helper function or two that takes a "struct 
timeval" and reads/writes a "float".

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-04-30 Thread Nick Piggin

Neil Brown wrote:


On Sunday April 30, [EMAIL PROTECTED] wrote:


NeilBrown <[EMAIL PROTECTED]> wrote:



When a md array has been idle (no writes) for 20msecs it is marked as
'clean'.  This delay turns out to be too short for some real
workloads.  So increase it to 200msec (the time to update the metadata
should be a tiny fraction of that) and make it sysfs-configurable.


...

+   safe_mode_delay
+ When an md array has seen no write requests for a certain period
+ of time, it will be marked as 'clean'.  When another write
+ request arrive, the array is marked as 'dirty' before the write
+ commenses.  This is known as 'safe_mode'.
+ The 'certain period' is controlled by this file which stores the
+ period as a number of seconds.  The default is 200msec (0.200).
+ Writing a value of 0 disables safemode.
+


Why not make the units milliseconds?  Rename this to safe_mode_delay_msecs
to remove any doubt.



Because umpteen years ago when I was adding thread-usage statistics to
/proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
seconds - a much more "obvious" unit.  See Email below.
It seems very sensible to me.



Either way, all ambiguity is removed if you put the unit in the name. And
don't use jiffies because that obviously is not portable (which sounds like
it was Linus' biggest concern).

Once you do that, I don't much care whether you use seconds or milliseconds.
Other than to note that many of our units now are ms, especially when 
they're
measuring things at or around the ms order of magnitude. But I'm not 
aware of

so many proc values that don't work in integers.

--

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-04-30 Thread Andrew Morton
Neil Brown <[EMAIL PROTECTED]> wrote:
>
>  On Sunday April 30, [EMAIL PROTECTED] wrote:
>  > NeilBrown <[EMAIL PROTECTED]> wrote:
>  > >
>  > > 
>  > > When a md array has been idle (no writes) for 20msecs it is marked as
>  > > 'clean'.  This delay turns out to be too short for some real
>  > > workloads.  So increase it to 200msec (the time to update the metadata
>  > > should be a tiny fraction of that) and make it sysfs-configurable.
>  > > 
>  > > 
>  > > ...
>  > > 
>  > > +   safe_mode_delay
>  > > + When an md array has seen no write requests for a certain period
>  > > + of time, it will be marked as 'clean'.  When another write
>  > > + request arrive, the array is marked as 'dirty' before the write
>  > > + commenses.  This is known as 'safe_mode'.
>  > > + The 'certain period' is controlled by this file which stores the
>  > > + period as a number of seconds.  The default is 200msec (0.200).
>  > > + Writing a value of 0 disables safemode.
>  > > +
>  > 
>  > Why not make the units milliseconds?  Rename this to safe_mode_delay_msecs
>  > to remove any doubt.
> 
>  Because umpteen years ago when I was adding thread-usage statistics to
>  /proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
>  seconds - a much more "obvious" unit.  See Email below.
>  It seems very sensible to me.

That's output.  It's easier to do the conversion with output.  And I guess
one could argue that lots of people read /proc files, but few write to
them.

Generally I don't think we should be teaching the kernel to accept
pretend-floating-point numbers like this, especially when a) "delay in
milliseconds" is such a simple concept and b) it's so easy to go from float
to milliseconds in userspace.

Do you really expect that humans (really dumb ones ;)) will be echoing
numbers into this file?  Or will it mainly be a thing for mdadm to fiddle
with?
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-04-30 Thread Neil Brown
On Sunday April 30, [EMAIL PROTECTED] wrote:
> NeilBrown <[EMAIL PROTECTED]> wrote:
> >
> > 
> > When a md array has been idle (no writes) for 20msecs it is marked as
> > 'clean'.  This delay turns out to be too short for some real
> > workloads.  So increase it to 200msec (the time to update the metadata
> > should be a tiny fraction of that) and make it sysfs-configurable.
> > 
> > 
> > ...
> > 
> > +   safe_mode_delay
> > + When an md array has seen no write requests for a certain period
> > + of time, it will be marked as 'clean'.  When another write
> > + request arrive, the array is marked as 'dirty' before the write
> > + commenses.  This is known as 'safe_mode'.
> > + The 'certain period' is controlled by this file which stores the
> > + period as a number of seconds.  The default is 200msec (0.200).
> > + Writing a value of 0 disables safemode.
> > +
> 
> Why not make the units milliseconds?  Rename this to safe_mode_delay_msecs
> to remove any doubt.

Because umpteen years ago when I was adding thread-usage statistics to
/proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
seconds - a much more "obvious" unit.  See Email below.
It seems very sensible to me.

...
> > +   msec = simple_strtoul(buf, &e, 10);
> > +   if (e == buf || (*e && *e != '\n'))
> > +   return -EINVAL;
> > +   msec = (msec * 1000) / scale;
> > +   if (msec == 0)
> > +   mddev->safemode_delay = 0;
> > +   else {
> > +   mddev->safemode_delay = (msec*HZ)/1000;
> > +   if (mddev->safemode_delay == 0)
> > +   mddev->safemode_delay = 1;
> > +   }
> > +   return len;
> 
> And most of that goes away.

Maybe it could go in a library :-?

NeilBrown



From: Linus Torvalds <[EMAIL PROTECTED]>
To: Neil Brown <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Subject: Re: PATCH knfsd - stats tidy up.
Date: Tue, 18 Jul 2000 12:21:12 -0700 (PDT)
Content-Type: TEXT/PLAIN; charset=US-ASCII



On Tue, 18 Jul 2000, Neil Brown wrote:
> 
> The following patch converts jiffies to milliseconds for output, and
> also makes the number wrap predicatably at 1,000,000 seconds
> (approximately one fortnight).

If no programs depend on the format, I actually prefer format changes like
this to be of the "obvious" kind. One such obvious kind is the format

0.001

which obviously means 0.001 seconds. 

And yes, I'm _really_ sorry that a lot of the old /proc files contain
jiffies. Lazy. Ugly. Bad. Much of it my bad.

Doing 0.001 doesn't mean that you have to use floating point, in fact
you've done most of the work already in your ms patch, just splitting
things out a bit works well:

/* gcc knows to combine / and % - generate one "divl" */
unsigned int sec = time / HZ, msec = time % HZ;
msec = (msec * 1000) / HZ;

sprintf(" %d.%03d", sec, msec)

(It's basically the same thing you already do, except it doesn't
re-combine the seconds and milliseconds but just prints them out
separately.. And it has the advantage that if you want to change it to
microseconds some day, you can do so very trivially without breaking the
format. Plus it's readable as hell.)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-04-30 Thread Andrew Morton
NeilBrown <[EMAIL PROTECTED]> wrote:
>
> 
> When a md array has been idle (no writes) for 20msecs it is marked as
> 'clean'.  This delay turns out to be too short for some real
> workloads.  So increase it to 200msec (the time to update the metadata
> should be a tiny fraction of that) and make it sysfs-configurable.
> 
> 
> ...
> 
> +   safe_mode_delay
> + When an md array has seen no write requests for a certain period
> + of time, it will be marked as 'clean'.  When another write
> + request arrive, the array is marked as 'dirty' before the write
> + commenses.  This is known as 'safe_mode'.
> + The 'certain period' is controlled by this file which stores the
> + period as a number of seconds.  The default is 200msec (0.200).
> + Writing a value of 0 disables safemode.
> +

Why not make the units milliseconds?  Rename this to safe_mode_delay_msecs
to remove any doubt.

> +static ssize_t
> +safe_delay_store(mddev_t *mddev, const char *cbuf, size_t len)
> +{
> + int scale=1;
> + int dot=0;
> + int i;
> + unsigned long msec;
> + char buf[30];
> + char *e;
> + /* remove a period, and count digits after it */
> + if (len >= sizeof(buf))
> + return -EINVAL;
> + strlcpy(buf, cbuf, len);
> + buf[len] = 0;
> + for (i=0; i + if (dot) {
> + if (isdigit(buf[i])) {
> + buf[i-1] = buf[i];
> + scale *= 10;
> + }
> + buf[i] = 0;
> + } else if (buf[i] == '.') {
> + dot=1;
> + buf[i] = 0;
> + }
> + }
> + msec = simple_strtoul(buf, &e, 10);
> + if (e == buf || (*e && *e != '\n'))
> + return -EINVAL;
> + msec = (msec * 1000) / scale;
> + if (msec == 0)
> + mddev->safemode_delay = 0;
> + else {
> + mddev->safemode_delay = (msec*HZ)/1000;
> + if (mddev->safemode_delay == 0)
> + mddev->safemode_delay = 1;
> + }
> + return len;

And most of that goes away.


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-04-30 Thread NeilBrown

When a md array has been idle (no writes) for 20msecs it is marked as
'clean'.  This delay turns out to be too short for some real
workloads.  So increase it to 200msec (the time to update the metadata
should be a tiny fraction of that) and make it sysfs-configurable.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./Documentation/md.txt |9 
 ./drivers/md/md.c  |   54 +++--
 2 files changed, 61 insertions(+), 2 deletions(-)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2006-05-01 15:09:20.0 +1000
+++ ./Documentation/md.txt  2006-05-01 15:10:18.0 +1000
@@ -207,6 +207,15 @@ All md devices contain:
  available.  It will then appear at md/dev-XXX (depending on the
  name of the device) and further configuration is then possible.
 
+   safe_mode_delay
+ When an md array has seen no write requests for a certain period
+ of time, it will be marked as 'clean'.  When another write
+ request arrive, the array is marked as 'dirty' before the write
+ commenses.  This is known as 'safe_mode'.
+ The 'certain period' is controlled by this file which stores the
+ period as a number of seconds.  The default is 200msec (0.200).
+ Writing a value of 0 disables safemode.
+
sync_speed_min
sync_speed_max
  This are similar to /proc/sys/dev/raid/speed_limit_{min,max}

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2006-05-01 15:10:17.0 +1000
+++ ./drivers/md/md.c   2006-05-01 15:10:18.0 +1000
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1968,6 +1969,54 @@ static void analyze_sbs(mddev_t * mddev)
 }
 
 static ssize_t
+safe_delay_show(mddev_t *mddev, char *page)
+{
+   int msec = (mddev->safemode_delay*1000)/HZ;
+   return sprintf(page, "%d.%03d\n", msec/1000, msec%1000);
+}
+static ssize_t
+safe_delay_store(mddev_t *mddev, const char *cbuf, size_t len)
+{
+   int scale=1;
+   int dot=0;
+   int i;
+   unsigned long msec;
+   char buf[30];
+   char *e;
+   /* remove a period, and count digits after it */
+   if (len >= sizeof(buf))
+   return -EINVAL;
+   strlcpy(buf, cbuf, len);
+   buf[len] = 0;
+   for (i=0; isafemode_delay = 0;
+   else {
+   mddev->safemode_delay = (msec*HZ)/1000;
+   if (mddev->safemode_delay == 0)
+   mddev->safemode_delay = 1;
+   }
+   return len;
+}
+static struct md_sysfs_entry md_safe_delay =
+__ATTR(safe_mode_delay, 0644,safe_delay_show, safe_delay_store);
+
+static ssize_t
 level_show(mddev_t *mddev, char *page)
 {
struct mdk_personality *p = mddev->pers;
@@ -2423,6 +2472,7 @@ static struct attribute *md_default_attr
&md_size.attr,
&md_metadata.attr,
&md_new_device.attr,
+   &md_safe_delay.attr,
NULL,
 };
 
@@ -2695,7 +2745,7 @@ static int do_md_run(mddev_t * mddev)
mddev->safemode = 0;
mddev->safemode_timer.function = md_safemode_timeout;
mddev->safemode_timer.data = (unsigned long) mddev;
-   mddev->safemode_delay = (20 * HZ)/1000 +1; /* 20 msec delay */
+   mddev->safemode_delay = (200 * HZ)/1000 +1; /* 200 msec delay */
mddev->in_sync = 1;
 
ITERATE_RDEV(mddev,rdev,tmp)
@@ -4581,7 +4631,7 @@ void md_write_end(mddev_t *mddev)
if (atomic_dec_and_test(&mddev->writes_pending)) {
if (mddev->safemode == 2)
md_wakeup_thread(mddev->thread);
-   else
+   else if (mddev->safemode_delay)
mod_timer(&mddev->safemode_timer, jiffies + 
mddev->safemode_delay);
}
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html