Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-05-01 Thread Neil Brown
On Sunday April 30, [EMAIL PROTECTED] wrote:
 NeilBrown [EMAIL PROTECTED] wrote:
 
  
  When a md array has been idle (no writes) for 20msecs it is marked as
  'clean'.  This delay turns out to be too short for some real
  workloads.  So increase it to 200msec (the time to update the metadata
  should be a tiny fraction of that) and make it sysfs-configurable.
  
  
  ...
  
  +   safe_mode_delay
  + When an md array has seen no write requests for a certain period
  + of time, it will be marked as 'clean'.  When another write
  + request arrives, the array is marked as 'dirty' before the write
  + commences.  This is known as 'safe_mode'.
  + The 'certain period' is controlled by this file which stores the
  + period as a number of seconds.  The default is 200msec (0.200).
  + Writing a value of 0 disables safemode.
  +
 
 Why not make the units milliseconds?  Rename this to safe_mode_delay_msecs
 to remove any doubt.

Because umpteen years ago when I was adding thread-usage statistics to
/proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
seconds - a much more obvious unit.  See Email below.
It seems very sensible to me.

...
  +   msec = simple_strtoul(buf, &e, 10);
  +   if (e == buf || (*e && *e != '\n'))
  +           return -EINVAL;
  +   msec = (msec * 1000) / scale;
  +   if (msec == 0)
  +           mddev->safemode_delay = 0;
  +   else {
  +           mddev->safemode_delay = (msec*HZ)/1000;
  +           if (mddev->safemode_delay == 0)
  +                   mddev->safemode_delay = 1;
  +   }
  +   return len;
 
 And most of that goes away.

Maybe it could go in a library :-?
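
A library helper of the kind suggested above might look roughly like the
following userspace C sketch. It is not the code from the patch (the part
that computes `scale` is elided in the quote). The name
parse_seconds_to_msec, the return convention, and the choice to truncate
anything past the third decimal place are all assumptions made for
illustration.

```c
#include <ctype.h>

/*
 * Hypothetical helper (not from the patch): parse a decimal number of
 * seconds such as "0.200", ".5" or "3" into whole milliseconds using
 * integer arithmetic only. Digits past the third decimal place are
 * truncated. One trailing newline is accepted. Overflow of very large
 * inputs is not checked in this sketch.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_seconds_to_msec(const char *buf, unsigned long *msec_out)
{
    const char *p = buf;
    unsigned long sec = 0, frac = 0;
    int int_digits = 0, frac_digits = 0, kept = 0;

    /* whole seconds */
    while (isdigit((unsigned char)*p)) {
        sec = sec * 10 + (unsigned long)(*p++ - '0');
        int_digits++;
    }

    /* optional fraction: keep at most three digits (milliseconds) */
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p)) {
            if (kept < 3) {
                frac = frac * 10 + (unsigned long)(*p - '0');
                kept++;
            }
            frac_digits++;
            p++;
        }
    }

    /* reject inputs with no digits at all, e.g. "" or "." */
    if (int_digits + frac_digits == 0)
        return -1;

    if (*p == '\n')
        p++;
    if (*p != '\0')
        return -1;

    /* scale a short fraction up to milliseconds: "0.2" means 200 */
    for (; kept < 3; kept++)
        frac *= 10;

    *msec_out = sec * 1000 + frac;
    return 0;
}
```

With such a helper the store function would only need to convert the
resulting milliseconds to jiffies, as the last lines of the quoted code
already do.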

NeilBrown



From: Linus Torvalds [EMAIL PROTECTED]
To: Neil Brown [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: PATCH knfsd - stats tidy up.
Date: Tue, 18 Jul 2000 12:21:12 -0700 (PDT)
Content-Type: TEXT/PLAIN; charset=US-ASCII



On Tue, 18 Jul 2000, Neil Brown wrote:
 
 The following patch converts jiffies to milliseconds for output, and
 also makes the number wrap predictably at 1,000,000 seconds
 (approximately one fortnight).

If no programs depend on the format, I actually prefer format changes like
this to be of the obvious kind. One such obvious kind is the format

0.001

which obviously means 0.001 seconds. 

And yes, I'm _really_ sorry that a lot of the old /proc files contain
jiffies. Lazy. Ugly. Bad. Much of it my bad.

Doing 0.001 doesn't mean that you have to use floating point, in fact
you've done most of the work already in your ms patch, just splitting
things out a bit works well:

/* gcc knows to combine / and % - generate one divl */
unsigned int sec = time / HZ, msec = time % HZ;
msec = (msec * 1000) / HZ;

sprintf(" %d.%03d", sec, msec)

(It's basically the same thing you already do, except it doesn't
re-combine the seconds and milliseconds but just prints them out
separately.. And it has the advantage that if you want to change it to
microseconds some day, you can do so very trivially without breaking the
format. Plus it's readable as hell.)
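
The split described above can be tried out in userspace with a sketch like
the one below. The function name format_ticks and the value chosen for HZ
are assumptions for illustration only; in the kernel HZ is a build-time
constant and the output would go through whatever the /proc handler uses.

```c
#include <stdio.h>

/* Assumption for this sketch: 100 ticks per second. */
#define HZ 100

/*
 * Hypothetical helper: format a tick count as "seconds.milliseconds"
 * without floating point, printing the two parts separately.
 * Returns what snprintf returns.
 */
int format_ticks(char *out, size_t len, unsigned long ticks)
{
    unsigned long sec = ticks / HZ;
    unsigned long msec = ((ticks % HZ) * 1000) / HZ;

    /* %03lu pads the fraction so that 10 ms prints as ".010" */
    return snprintf(out, len, "%lu.%03lu", sec, msec);
}
```

Moving to microseconds later would only mean changing the multiplier and
the field width; a reader that already parses a decimal number of seconds
would not need to change.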

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-05-01 Thread Andrew Morton
Neil Brown [EMAIL PROTECTED] wrote:

  On Sunday April 30, [EMAIL PROTECTED] wrote:
   NeilBrown [EMAIL PROTECTED] wrote:
   

When a md array has been idle (no writes) for 20msecs it is marked as
'clean'.  This delay turns out to be too short for some real
workloads.  So increase it to 200msec (the time to update the metadata
should be a tiny fraction of that) and make it sysfs-configurable.


...

+   safe_mode_delay
+ When an md array has seen no write requests for a certain period
+ of time, it will be marked as 'clean'.  When another write
+ request arrives, the array is marked as 'dirty' before the write
+ commences.  This is known as 'safe_mode'.
+ The 'certain period' is controlled by this file which stores the
+ period as a number of seconds.  The default is 200msec (0.200).
+ Writing a value of 0 disables safemode.
+
   
   Why not make the units milliseconds?  Rename this to safe_mode_delay_msecs
   to remove any doubt.
 
  Because umpteen years ago when I was adding thread-usage statistics to
  /proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
  seconds - a much more obvious unit.  See Email below.
  It seems very sensible to me.

That's output.  It's easier to do the conversion with output.  And I guess
one could argue that lots of people read /proc files, but few write to
them.

Generally I don't think we should be teaching the kernel to accept
pretend-floating-point numbers like this, especially when a) delay in
milliseconds is such a simple concept and b) it's so easy to go from float
to milliseconds in userspace.

Do you really expect that humans (really dumb ones ;)) will be echoing
numbers into this file?  Or will it mainly be a thing for mdadm to fiddle
with?


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-05-01 Thread Nick Piggin

Neil Brown wrote:


On Sunday April 30, [EMAIL PROTECTED] wrote:


NeilBrown [EMAIL PROTECTED] wrote:



When a md array has been idle (no writes) for 20msecs it is marked as
'clean'.  This delay turns out to be too short for some real
workloads.  So increase it to 200msec (the time to update the metadata
should be a tiny fraction of that) and make it sysfs-configurable.


...

+   safe_mode_delay
+ When an md array has seen no write requests for a certain period
+ of time, it will be marked as 'clean'.  When another write
+ request arrives, the array is marked as 'dirty' before the write
+ commences.  This is known as 'safe_mode'.
+ The 'certain period' is controlled by this file which stores the
+ period as a number of seconds.  The default is 200msec (0.200).
+ Writing a value of 0 disables safemode.
+


Why not make the units milliseconds?  Rename this to safe_mode_delay_msecs
to remove any doubt.



Because umpteen years ago when I was adding thread-usage statistics to
/proc/net/rpc/nfsd I used milliseconds and Linus asked me to make it
seconds - a much more obvious unit.  See Email below.
It seems very sensible to me.



Either way, all ambiguity is removed if you put the unit in the name. And
don't use jiffies because that obviously is not portable (which sounds like
it was Linus' biggest concern).

Once you do that, I don't much care whether you use seconds or milliseconds.
Other than to note that many of our units now are ms, especially when they're
measuring things at or around the ms order of magnitude. But I'm not aware of
so many proc values that don't work in integers.



Re: raid5 resizing

2006-05-01 Thread Mike Hardy

Neil Brown wrote:
 On Monday May 1, [EMAIL PROTECTED] wrote:
 
Hey folks.

There's no point in using LVM on a raid5 setup if all you intend to do
in the future is resize the filesystem on it, is there? The new raid5
resizing code takes care of providing the extra space and then as long
as the (say) ext3 filesystem is created with resize_inode all should be
sweet. Right? Or have I missed something crucial here? :)
 
 
 You are correct.  md/raid5 makes the extra space available all by
 itself. 

Further - even if you don't create the filesystem with the right amount
of extra metadata space for online resizing, you can resize any ext2/3
filesystem offline, and it doesn't take very long. You just use
resize2fs instead of ext2online.

-Mike


Re: Two-disk RAID5?

2006-05-01 Thread Erik Mouw
On Wed, Apr 26, 2006 at 03:22:38PM -0400, Jon Lewis wrote:
 On Wed, 26 Apr 2006, Jansen, Frank wrote:
 
 It is not possible to flip a bit to change a set of disks from RAID 1 to
 RAID 5, as the physical layout is different.
 
 As Tuomas pointed out though, a 2 disk RAID5 is kind of a special case 
 where all you have is data and parity which is actually also just data. 

No, the other way around: RAID1 is a special case of RAID5.

The parity of RAID5 with n disks is constructed like[1]:

  parity = disk1 XOR disk2 XOR ... XOR disk n-1

With n = 2, this reduces to:

  parity = disk1 XOR nothing = disk1

Which is just mirroring, which we usually call RAID1.
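
The reduction above is easy to check with a few lines of C. This is only a
sketch of the XOR arithmetic, not md's implementation; the function name
raid5_parity and its signature are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the parity block of one stripe as the
 * byte-wise XOR of its data blocks. `data` holds ndata pointers
 * (ndata = n - 1 for an n-disk array), each to a block of len bytes.
 */
void raid5_parity(uint8_t *parity, const uint8_t *const *data,
                  size_t ndata, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;  /* XOR identity: x ^ 0 == x */

        for (size_t d = 0; d < ndata; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}
```

Called with ndata = 1 (that is, n = 2), the inner loop runs once and the
parity block comes out as a byte-for-byte copy of the single data block.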

 Seems kind of like a RAID1 with extra overhead.  I don't think I've ever 
 heard of a RAID5 implementation willing to handle <3 drives though.

Our own RAID recovery tools can handle that just fine.


Erik

[1] Yes, there's also an algorithm to select which disk is used for
parity for which block, but that doesn't change *how* parity is
calculated.

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands


Re: md: Change ENOTSUPP to EOPNOTSUPP

2006-05-01 Thread Mike Hardy

Paul Clements wrote:
 Gil wrote:
 
 So for those of us using other filesystems (e.g. ext2/3), is there
 some way to determine whether or not barriers are available?
 
 
 You'll see something like this in your system log if barriers are not
 supported:
 
 Apr  3 16:44:01 adam kernel: JBD: barrier-based sync failed on md0 -
 disabling barriers
 
 
 Otherwise, assume that they are. But like Neil said, it shouldn't matter
 to a user whether they are supported or not. Filesystems will work
 correctly either way.

This seems very important to me to understand thoroughly, so please
forgive me if I'm being dense.

What I'm not sure of in the above is which definition of "working" applies.

For the definition where the code simply doesn't bomb out, or for the
stricter definition that despite write caching at the drive level there
is no point where there could possibly be a data inconsistency between
what the filesystem thinks is written and what got written, power loss
or no?

My understanding to this point is that with write caching and no barrier
support, you would still care as power loss would give you a window of
inconsistency.

With the exception of the very minor situation Neil mentioned about the
first write through md not being a superblock write...

-Mike


Re: [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.

2006-05-01 Thread bert hubert
On Mon, May 01, 2006 at 03:30:19PM +1000, NeilBrown wrote:
 When a md array has been idle (no writes) for 20msecs it is marked as
 'clean'.  This delay turns out to be too short for some real
 workloads.  So increase it to 200msec (the time to update the metadata
 should be a tiny fraction of that) and make it sysfs-configurable.

What does 'too short' mean here? What happens in that case: are the backing
block devices still busy writing? When making this configurable, the help text
had better explain what the trade-offs are.

Thanks.

-- 
http://www.PowerDNS.com  Open source, database driven DNS Software 
http://netherlabs.nl  Open and Closed source services