Re: Can't get drives containing spare devices to spindown

2006-06-21 Thread Neil Brown
On Thursday June 22, [EMAIL PROTECTED] wrote:
> >
> Thanks Neil for your quick reply. Would it be possible to elaborate a 
> bit on the problem and the solution? I guess I won't be on 2.6.18 for 
> some time...
> 

When an array has been idle (no writes) for a short time (20 or 200
ms, depending on which kernel you are running), the array is flagged as
'clean', so that a crash or power failure at that point will not require
a full resync.  The 'clean' flag is stored in all superblocks,
including those on the spares, so every transition between clean and
active causes writes to all devices.

Even fairly quiet filesystems see occasional updates (updating atime
on files, or syncing the journal), and each of those causes all devices
to be touched.

Fix:
 1/ Don't set the 'dirty' flag on spares - there really is no need.

However, whenever the dirty bit is changed, the 'events' count is
updated, so doing only the above would let the spares fall far behind
the main devices in their 'events' count, and they would no longer be
treated as part of the array.  So:

 2/ When clearing the dirty flag (and nothing else has happened),
    decrement the events count rather than incrementing it.

Together, these mean that simple dirty/clean transitions do not touch
the spares.
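
You can see the effect from user space by comparing superblocks (just an
illustration - the device names are the ones from this thread and will
differ on your system):

  mdadm --examine /dev/hda1 | grep -E 'State|Events'   # active member
  mdadm --examine /dev/hde1 | grep -E 'State|Events'   # spare

Without the fix, every clean/active transition updates the 'Events'
count on the spares as well; with it, only real array changes do.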

NeilBrown


Re: Can't get drives containing spare devices to spindown

2006-06-21 Thread Marc L. de Bruin

Neil Brown wrote:

> On Thursday June 22, [EMAIL PROTECTED] wrote:
> > Marc L. de Bruin wrote:
> >
> > > Situation: /dev/md0, type raid1, containing 2 active devices
> > > (/dev/hda1 and /dev/hdc1) and 2 spare devices (/dev/hde1 and /dev/hdg1).
> > >
> > > Those two spare 'partitions' are the only partitions on those disks
> > > and therefore I'd like to spin down those disks using hdparm for
> > > obvious reasons (noise, heat). Specifically, 'hdparm -S <value> <device>'
> > > sets the standby (spindown) timeout for a drive; the value is used by
> > > the drive to determine how long to wait (with no disk activity) before
> > > turning off the spindle motor to save power.
> > >
> > > However, it turns out that md actually sort-of prevents those spare
> > > disks from spinning down. I can get them off for about 3 to 4 seconds,
> > > after which they immediately spin up again. Removing the spare devices
> > > from /dev/md0 (mdadm /dev/md0 --remove /dev/hd[eg]1) actually solves
> > > this, but I have no intention of actually removing those devices.
> > >
> > > How can I make sure that I'm actually able to spin down those two
> > > spare drives?
>
> This is fixed in current -mm kernels and the fix should be in 2.6.18.
>
> NeilBrown

Thanks Neil for your quick reply. Would it be possible to elaborate a 
bit on the problem and the solution? I guess I won't be on 2.6.18 for 
some time...


Marc.


Re: Can't get drives containing spare devices to spindown

2006-06-21 Thread Neil Brown
On Thursday June 22, [EMAIL PROTECTED] wrote:
> Marc L. de Bruin wrote:
> 
> > Situation: /dev/md0, type raid1, containing 2 active devices 
> > (/dev/hda1 and /dev/hdc1) and 2 spare devices (/dev/hde1 and /dev/hdg1).
> >
> > Those two spare 'partitions' are the only partitions on those disks
> > and therefore I'd like to spin down those disks using hdparm for
> > obvious reasons (noise, heat). Specifically, 'hdparm -S <value> <device>'
> > sets the standby (spindown) timeout for a drive; the value is used by
> > the drive to determine how long to wait (with no disk activity) before
> > turning off the spindle motor to save power.
> >
> > However, it turns out that md actually sort-of prevents those spare
> > disks from spinning down. I can get them off for about 3 to 4 seconds,
> > after which they immediately spin up again. Removing the spare devices
> > from /dev/md0 (mdadm /dev/md0 --remove /dev/hd[eg]1) actually solves
> > this, but I have no intention of actually removing those devices.
> >
> > How can I make sure that I'm actually able to spin down those two 
> > spare drives?

This is fixed in current -mm kernels and the fix should be in 2.6.18.

NeilBrown


Re: Can't get drives containing spare devices to spindown

2006-06-21 Thread Marc L. de Bruin

Marc L. de Bruin wrote:

> Situation: /dev/md0, type raid1, containing 2 active devices
> (/dev/hda1 and /dev/hdc1) and 2 spare devices (/dev/hde1 and /dev/hdg1).
>
> Those two spare 'partitions' are the only partitions on those disks
> and therefore I'd like to spin down those disks using hdparm for
> obvious reasons (noise, heat). Specifically, 'hdparm -S <value> <device>'
> sets the standby (spindown) timeout for a drive; the value is used by
> the drive to determine how long to wait (with no disk activity) before
> turning off the spindle motor to save power.
>
> However, it turns out that md actually sort-of prevents those spare
> disks from spinning down. I can get them off for about 3 to 4 seconds,
> after which they immediately spin up again. Removing the spare devices
> from /dev/md0 (mdadm /dev/md0 --remove /dev/hd[eg]1) actually solves
> this, but I have no intention of actually removing those devices.
>
> How can I make sure that I'm actually able to spin down those two
> spare drives?


I'm replying to myself here, which may seem pointless, but AFAIK I got
no reply and I still believe this is an interesting issue. :-)


Also, I have some extra info. After doing some research, it seems that
the busy-ness of the filesystem matters too. For example, if I create a
/dev/md1 on /dev/hdb1 and /dev/hdd1 with two spares on /dev/hdf1 and
/dev/hdh1, put a filesystem on /dev/md1, mount it, put the spare drives
to sleep (hdparm -S 5 /dev/hd[fh]1), and leave that filesystem alone
completely, then every few minutes, for no reason obvious to me, those
spare drives will spin up. I can only think of one explanation: the md
subsystem has to put some meta-info (hashes?) about /dev/md1 on the
spare drives.
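
A rough way to watch this (device names as in my example above; hdparm -C
should report the power state without itself waking the drive):

  watch -n 5 'hdparm -C /dev/hdf /dev/hdh'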


If I use the filesystem on /dev/md1 more intensively, that 'every few
minutes' becomes more like 'every 15 or so seconds'.


I may be completely wrong here (I'm no md guru), but maybe someone can
confirm this behaviour? And if so, is there a way to control it? And if
not, what else could be going on here?


For the original problem I can think of a solution: remove the spare
drives from the array, let them spin down, and use the mdadm monitor
feature to trigger a script on a 'Fail' event which adds a spare back
to the array and removes any spin-down timeout from that spare. However,
although this sort-of fixes the problem, there is still a short period
during which the raid1 array is not protected, and if the script fails
for whatever reason, the array might stay unprotected for a long time.
Also, from an architectural point of view, this is really bad and should
not be needed.
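
Something along these lines is what I have in mind (an untested sketch;
the script path and device names are placeholders, and mdadm --monitor
passes the event name and the md device to the program as arguments):

  #!/bin/sh
  # hypothetical handler for:
  #   mdadm --monitor --scan --program /usr/local/sbin/md-event
  EVENT="$1"
  MDDEV="$2"
  case "$EVENT" in
    Fail|DegradedArray)
      mdadm "$MDDEV" --add /dev/hde1   # re-add the parked spare (placeholder)
      hdparm -S 0 /dev/hde             # cancel its standby timeout
      ;;
  esac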


Thanks again for your time,

Marc.


Re: New FAQ entry? (was IBM xSeries stop responding during RAID1 reconstruction)

2006-06-21 Thread Gil
Mark Hahn wrote:
>> There's a much easier/simpler way to set the default scheduler. []
> 
> personally, I don't see any point to worrying about the default,
> compile-time or boot time:
> 
> for f in `find /sys/block/* -name scheduler`; do echo cfq > $f; done

I agree -- if you're talking about changing the I/O scheduler just for
the duration of a resync, this approach is better than changing kernels
or rebooting.

--Gil


Re: New FAQ entry? (was IBM xSeries stop responding during RAID1 reconstruction)

2006-06-21 Thread Mark Hahn
> There's a much easier/simpler way to set the default scheduler. []

Personally, I don't see any point in worrying about the default,
whether at compile time or boot time:

for f in `find /sys/block/* -name scheduler`; do echo cfq > $f; done



[-mm patch] drivers/md/md.c: make code static

2006-06-21 Thread Adrian Bunk
This patch makes needlessly global code static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/md/md.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.17-mm1-full/drivers/md/md.c.old	2006-06-21 22:59:44.0 +0200
+++ linux-2.6.17-mm1-full/drivers/md/md.c	2006-06-21 23:00:02.0 +0200
@@ -175,7 +175,7 @@
 /* Alternate version that can be called from interrupts
  * when calling sysfs_notify isn't needed.
  */
-void md_new_event_inintr(mddev_t *mddev)
+static void md_new_event_inintr(mddev_t *mddev)
 {
atomic_inc(&md_event_count);
wake_up(&md_event_waiters);
@@ -2309,7 +2309,7 @@
  */
 enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active,
		   write_pending, active_idle, bad_word};
-char *array_states[] = {
+static char *array_states[] = {
	"clear", "inactive", "suspended", "readonly", "read-auto", "clean", "active",
	"write-pending", "active-idle", NULL };
 



Re: bitmap status question

2006-06-21 Thread Paul Clements

David Greaves wrote:

> How do I interpret:
>   bitmap: 0/117 pages [0KB], 1024KB chunk
> in the mdstat output?
>
> What does it mean when it's, e.g., 23/117?


This refers to the in-memory bitmap (basically a cache of what's in the 
on-disk bitmap -- it allows bitmap operations to be more efficient).


If it's 23/117, that means 23 of the 117 pages are currently allocated
in the in-memory bitmap. The pages are allocated on demand, and get
freed when they're empty (all zeroes). The in-memory bitmap uses 16 bits
for each bitmap chunk to count all ongoing writes to the chunk, so it's
actually up to 16 times larger than the on-disk bitmap.
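
As a back-of-the-envelope check (assuming 4 KB pages and the 1024 KB
chunk size shown above), 117 pages of 16-bit counters cover:

  echo $((117 * 4096 / 2))         # 239616 chunks
  echo $((117 * 4096 / 2 / 1024))  # ~234 GB of device at 1 MB per chunk

so the total page count scales with the size of the device, while the
allocated count only reflects the regions that currently have non-zero
counters.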


--
Paul


Re: New FAQ entry? (was IBM xSeries stop responding during RAID1 reconstruction)

2006-06-21 Thread Michael Tokarev
Niccolo Rigacci wrote:
[]
> From the command line you can see which schedulers are supported 
> and change it on the fly (remember to do it for each RAID disk):
> 
>   # cat /sys/block/hda/queue/scheduler
>   noop [anticipatory] deadline cfq
>   # echo cfq > /sys/block/hda/queue/scheduler
> 
> Otherwise you can recompile your kernel and set CFQ as the 
> default I/O scheduler (CONFIG_DEFAULT_CFQ=y in Block layer, IO 
> Schedulers, Default I/O scheduler).

There's a much easier/simpler way to set the default scheduler.  As
someone suggested, RTFM Documentation/kernel-parameters.txt: passing
elevator=cfq (or whatever) will do the trick with much less effort than
a kernel recompile.
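
Once booted that way, you can verify that it took effect (hda as in the
example above):

  grep -o 'elevator=[a-z]*' /proc/cmdline
  cat /sys/block/hda/queue/scheduler   # active scheduler shown in [brackets]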

/mjt


Re: New FAQ entry? (was IBM xSeries stop responding during RAID1 reconstruction)

2006-06-21 Thread David Greaves
OK :)

David

Niccolo Rigacci wrote:
> Thanks to several guys on this list, I have solved my problem
> and written this up; could it become a new FAQ entry?
>
>
> Q: Sometimes when a RAID volume is resyncing, the system seems to
> lock up: all disk activity is blocked until the resync is done.
>
> A: This is not strictly a Linux RAID problem; it is a problem with
> the Linux kernel and the disk subsystem: under no circumstances
> should one process get all the disk resources and prevent others
> from accessing them.
>
> You can cap the speed at which RAID reconstruction is done, say
> at 5 MB/s:
>
>   echo 5000 > /proc/sys/dev/raid/speed_limit_max
>
> This is just a workaround: you have to determine, by trial and
> error, the maximum speed that does not lock up your system, and
> you cannot predict what the disk load will be the next time the
> RAID is resyncing for some reason.
>
> Starting from version 2.6, the Linux kernel offers several choices
> of I/O scheduler. The default is the anticipatory scheduler, which
> seems to be sub-optimal under heavy resync load. If your kernel
> has the CFQ scheduler compiled in, use it during the resync.
>
> From the command line you can see which schedulers are supported
> and change the scheduler on the fly (remember to do it for each
> RAID disk):
>
>   # cat /sys/block/hda/queue/scheduler
>   noop [anticipatory] deadline cfq
>   # echo cfq > /sys/block/hda/queue/scheduler
>
> Otherwise you can recompile your kernel and set CFQ as the
> default I/O scheduler (CONFIG_DEFAULT_CFQ=y in Block layer, IO
> Schedulers, Default I/O scheduler).




New FAQ entry? (was IBM xSeries stop responding during RAID1 reconstruction)

2006-06-21 Thread Niccolo Rigacci
Thanks to several guys on this list, I have solved my problem
and written this up; could it become a new FAQ entry?



Q: Sometimes when a RAID volume is resyncing, the system seems to
lock up: all disk activity is blocked until the resync is done.

A: This is not strictly a Linux RAID problem; it is a problem with
the Linux kernel and the disk subsystem: under no circumstances
should one process get all the disk resources and prevent others
from accessing them.

You can cap the speed at which RAID reconstruction is done, say
at 5 MB/s:

  echo 5000 > /proc/sys/dev/raid/speed_limit_max

This is just a workaround: you have to determine, by trial and
error, the maximum speed that does not lock up your system, and
you cannot predict what the disk load will be the next time the
RAID is resyncing for some reason.
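
It can help to note the current values first, so you can restore
them once the resync has finished (200000 is a common default for
the maximum, but check your own system):

  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
  # after the resync, put the old maximum back, e.g.:
  echo 200000 > /proc/sys/dev/raid/speed_limit_max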

Starting from version 2.6, the Linux kernel offers several choices
of I/O scheduler. The default is the anticipatory scheduler, which
seems to be sub-optimal under heavy resync load. If your kernel
has the CFQ scheduler compiled in, use it during the resync.

From the command line you can see which schedulers are supported
and change the scheduler on the fly (remember to do it for each
RAID disk):

  # cat /sys/block/hda/queue/scheduler
  noop [anticipatory] deadline cfq
  # echo cfq > /sys/block/hda/queue/scheduler

Otherwise you can recompile your kernel and set CFQ as the 
default I/O scheduler (CONFIG_DEFAULT_CFQ=y in Block layer, IO 
Schedulers, Default I/O scheduler).
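
To check what your current kernel was built with (assuming your
distro ships the config under /boot):

  grep -E 'CONFIG_DEFAULT_(CFQ|IOSCHED)' /boot/config-$(uname -r)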


-- 
Niccolo Rigacci
Firenze - Italy

Iraq, peace mission: 38475 dead - www.iraqbodycount.net