Re: [PATCH] mdadm 2.5 (Was: ANNOUNCE: mdadm 2.5 - A tool for managing Soft RAID under Linux)

2006-05-29 Thread Neil Brown
On Monday May 29, [EMAIL PROTECTED] wrote:
 On Mon, May 29, 2006 at 12:08:25PM +1000, Neil Brown wrote:
 On Sunday May 28, [EMAIL PROTECTED] wrote:
 Thanks for the patches.  They are greatly appreciated.
 You're welcome
  
  - mdadm-2.3.1-kernel-byteswap-include-fix.patch
  reverts a change introduced with mdadm 2.3.1 for redhat compatibility
  asm/byteorder.h is an architecture dependent file and does more
  stuff than a call to the linux/byteorder/XXX_endian.h
  the fact that not calling asm/byteorder.h does not define
  __BYTEORDER_HAS_U64__ is just an example of issues that might arise.
  if redhat is broken it should be worked around differently than breaking
  mdadm.
 
 I don't understand the problem here.  What exactly breaks with the
 code currently in 2.5?  mdadm doesn't need __BYTEORDER_HAS_U64__, so
 why does not having it defined break anything?
 The comment from the patch says:
   not including asm/byteorder.h will not define __BYTEORDER_HAS_U64__
   causing __fswab64 to be undefined and failure compiling mdadm on
   big_endian architectures like PPC
 
 But again, mdadm doesn't use __fswab64 
 More details please.
 you use __cpu_to_le64 (ie in super0.c line 987)
 
 bms->sync_size = __cpu_to_le64(size);
 
 which in byteorder/big_endian.h is defined as
 
 #define __cpu_to_le64(x) ((__force __le64)__swab64((x)))
 
 but __swab64 is defined in byteorder/swab.h (included by
 byteorder/big_endian.h) as
 
 #if defined(__GNUC__) && (__GNUC__ >= 2) && defined(__OPTIMIZE__)
 #  define __swab64(x) \
 (__builtin_constant_p((__u64)(x)) ? \
 ___swab64((x)) : \
 __fswab64((x)))
 #else
 #  define __swab64(x) __fswab64(x)
 #endif /* OPTIMIZE */


Grrr..

Thanks for the details.  I think I'll just give up and do it myself.
e.g.
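unsigned short swap16(unsigned short in)
{
	int i;
	unsigned short out = 0;

	/* a 16-bit value has two bytes, so two passes reverse it */
	for (i = 0; i < 2; i++) {
		out = (out << 8) | (in & 255);	/* append the low byte of in */
		in = in >> 8;
	}
	return out;
}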

I don't need top performance and at least this should be portable...
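
For the 64-bit case discussed above (the __cpu_to_le64 call in super0.c), the same byte-at-a-time idea could look roughly like this. It is only a sketch, not the code that went into mdadm: swap64 and portable_cpu_to_le64 are made-up names, and the host byte order is probed at run time so that no kernel header is needed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the eight bytes of a 64-bit value. */
uint64_t swap64(uint64_t in)
{
	uint64_t out = 0;
	int i;

	for (i = 0; i < 8; i++) {
		out = (out << 8) | (in & 0xff);	/* append the low byte of in */
		in >>= 8;
	}
	return out;
}

/*
 * Hypothetical stand-in for __cpu_to_le64: return a value whose
 * in-memory representation is little-endian on any host.
 */
uint64_t portable_cpu_to_le64(uint64_t x)
{
	const uint16_t probe = 1;
	unsigned char first_byte;

	/* the first byte of 1 is 1 on little-endian hosts, 0 on big-endian */
	memcpy(&first_byte, &probe, 1);
	return first_byte ? x : swap64(x);
}
```

The run-time probe costs a little on every call, which fits the "no top performance needed" trade-off above.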

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: problems with raid=noautodetect

2006-05-29 Thread Michael Tokarev
Neil Brown wrote:
 On Friday May 26, [EMAIL PROTECTED] wrote:
[]
 If we assume there is a list of devices provided by a (possibly
 default) 'DEVICE' line, then 
 
 DEVICEFILTER   !pattern1 !pattern2 pattern3 pattern4
 
 could mean that any device in that list which matches pattern 1 or 2
 is immediately discarded, any remaining device that matches pattern 3
 or 4 is included, and the remainder are discarded.
 
 The rule could be that the default is to include any devices that
 don't match a !pattern, unless there is a pattern without a '!', in
 which case the default is to reject non-accepted patterns.
 Is that straightforward enough, or do I need an
   order allow,deny
 like apache has?

I'd suggest the following.

All the other devices are included or excluded from the list of devices
to consider based on the last component in the DEVICE line.  Ie. if it
ends up at !dev, all the rest of devices are included.  If it ends up at
dev (w/o !), all the rest are excluded.  If memory serves me right, it's
how squid ACLs work.

There's no need to introduce new keyword.  Given this rule, a line

 DEVICE a b c

will do exactly as it does now.  Line

 DEVICE a b c !d

is somewhat redundant - it's the same as DEVICE !d
Ie, if the list ends up at !stuff, append `partitions' (or *) to it.

Of course mixing !s and !!s is useful, for example to say use all sda* but
not sda1:

 DEVICE !sda1 sda*

(and nothing else).

And the default is to have `DEVICE partitions'.
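
The rule proposed above can be sketched in C as follows. This is an illustration under assumptions, not mdadm code: device_allowed is a made-up name, fnmatch() stands in for whatever matching mdadm would actually use, patterns are tried in order with the first match deciding, and an empty list is treated as "include everything" in place of `DEVICE partitions'.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the proposed DEVICE rule.
 * The first pattern that matches dev decides: "!pat" rejects,
 * a plain pattern accepts.  If nothing matches, the last pattern
 * sets the default: a trailing "!pat" includes everything else,
 * a trailing plain pattern excludes everything else.
 * Returns 1 if dev should be considered, 0 otherwise.
 */
int device_allowed(const char *const *patterns, size_t n, const char *dev)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int negated = (patterns[i][0] == '!');
		const char *pat = patterns[i] + (negated ? 1 : 0);

		if (fnmatch(pat, dev, 0) == 0)
			return !negated;
	}
	if (n == 0)
		return 1;	/* assumption: empty list means include all */
	return patterns[n - 1][0] == '!';
}
```

With the list { "!sda1", "sda*" } this rejects sda1, accepts sda2 and rejects sdb1, matching the `DEVICE !sda1 sda*' example.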

The only possible issue I see here is that with udev, it's possible to
use, say, /dev/disk/by-id/*-like stuff (don't remember exact directory
layout) -- symlinked to /dev/sd* according to the disk serial number or
something like that -- for this to work, mdadm needs to use glob()
internally.
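
A minimal sketch of what using glob() for this might look like, assuming a helper that expands one pattern and hands each resulting path to a callback. The name for_each_match and the callback interface are made up for illustration; only glob() and globfree() are real library calls.

```c
#include <glob.h>
#include <stddef.h>

/*
 * Hypothetical helper, not mdadm code: expand a shell-style pattern
 * with glob(3) and call visit() on every path it matches.
 * Returns the number of matches, or -1 if glob() reports an error.
 */
long for_each_match(const char *pattern, void (*visit)(const char *path))
{
	glob_t g;
	size_t i;
	long count;
	int rc = glob(pattern, 0, NULL, &g);

	if (rc == GLOB_NOMATCH)
		return 0;		/* nothing matched: not an error */
	if (rc != 0) {
		globfree(&g);		/* release any partial results */
		return -1;
	}

	for (i = 0; i < g.gl_pathc; i++)
		if (visit)
			visit(g.gl_pathv[i]);

	count = (long)g.gl_pathc;
	globfree(&g);
	return count;
}
```

A pattern such as a /dev/disk/by-id/ wildcard would then yield the symlink paths, which still have to be resolved to actual devices.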

/mjt


Re: problems with raid=noautodetect

2006-05-29 Thread Luca Berra

On Mon, May 29, 2006 at 12:38:25PM +0400, Michael Tokarev wrote:

Neil Brown wrote:

On Friday May 26, [EMAIL PROTECTED] wrote:

I'd suggest the following.

All the other devices are included or excluded from the list of devices
to consider based on the last component in the DEVICE line.  Ie. if it
ends up at !dev, all the rest of devices are included.  If it ends up at
dev (w/o !), all the rest are excluded.  If memory serves me right, it's
how squid ACLs work.

There's no need to introduce new keyword.  Given this rule, a line


As I said, the new keyword is there to warn about configurations that do
not account for changing device IDs. And if we change the syntax, a new
keyword would make that clearer in case the user tries to use a new
configuration on an old mdadm.


The only possible issue I see here is that with udev, it's possible to
use, say, /dev/disk/by-id/*-like stuff (don't remember exact directory
layout) -- symlinked to /dev/sd* according to the disk serial number or
something like that -- for this to work, mdadm needs to use glob()
internally.


Uhm,
I think we would do better to translate any device found on a DEVICE (or
DEVICEFILTER) line to the corresponding major/minor number and blacklist
based on that.
Nothing prevents someone from having a udev rule that creates a device
node instead of a symlink.
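
Comparing by device number rather than by name could be sketched like this. It is an illustration only, not mdadm code: same_device is a made-up name, and it compares st_rdev as a whole instead of splitting it into major and minor, which amounts to the same test.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: decide whether two paths name the same device
 * by comparing device numbers, so that a symlink and a separately
 * created node for one disk compare equal.
 * Returns 1 if they are the same device, 0 if they differ, and -1 if
 * either path cannot be stat()ed or is not a block or character device.
 */
int same_device(const char *a, const char *b)
{
	struct stat sa, sb;

	/* stat() follows symlinks, so /dev/disk/by-id links resolve here */
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	if (!(S_ISBLK(sa.st_mode) || S_ISCHR(sa.st_mode)) ||
	    !(S_ISBLK(sb.st_mode) || S_ISCHR(sb.st_mode)))
		return -1;

	/* st_rdev carries major and minor; the device type must agree too */
	return !S_ISBLK(sa.st_mode) == !S_ISBLK(sb.st_mode) &&
	       sa.st_rdev == sb.st_rdev;
}
```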

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


Re: raid5 hang on get_active_stripe

2006-05-29 Thread dean gaudet
On Sun, 28 May 2006, Neil Brown wrote:

 The following patch adds some more tracing to raid5, and might fix a
 subtle bug in ll_rw_blk, though it is an incredibly long shot that
 this could be affecting raid5 (if it is, I'll have to assume there is
 another bug somewhere).   It certainly doesn't break ll_rw_blk.
 Whether it actually fixes something I'm not sure.
 
 If you could try with these on top of the previous patches I'd really
 appreciate it.
 
 When you read from /stripe_cache_active, it should trigger a
 (cryptic) kernel message within the next 15 seconds.  If I could get
 the contents of that file and the kernel messages, that should help.

got the hang again... attached is the dmesg with the cryptic messages.  i 
didn't think to grab the task dump this time though.

hope there's a clue in this one :)  but send me another patch if you need 
more data.

-dean

neemlark:/sys/block/md4/md# cat stripe_cache_size 
256
neemlark:/sys/block/md4/md# cat stripe_cache_active 
251
0 preread
plugged
bitlist=0 delaylist=251
neemlark:/sys/block/md4/md# cat stripe_cache_active 
251
0 preread
plugged
bitlist=0 delaylist=251
neemlark:/sys/block/md4/md# echo 512 > stripe_cache_size 
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
292 preread
not plugged
bitlist=0 delaylist=32
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
292 preread
not plugged
bitlist=0 delaylist=32
neemlark:/sys/block/md4/md# cat stripe_cache_active
445
0 preread
not plugged
bitlist=0 delaylist=73
neemlark:/sys/block/md4/md# cat stripe_cache_active
480
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
413
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
13
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
493
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
487
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
405
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
1 preread
not plugged
bitlist=0 delaylist=28
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
84 preread
not plugged
bitlist=0 delaylist=69
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
69 preread
not plugged
bitlist=0 delaylist=56
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
41 preread
not plugged
bitlist=0 delaylist=38
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
10 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
453
3 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
480
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
14 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
477
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
476
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
486
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
480
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
384
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
387
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
462
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
480
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
448
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
501
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
476
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
416
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
386
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
434
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
406
0 preread
not plugged
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
447
0 preread
not plugged
bitlist=0 

[PATCH] md: Fix badness in sysfs_notify caused by md_new_event

2006-05-29 Thread NeilBrown
Here is a patch for a bug in 2.6.17-rc that just came to light.  It
should get into 2.6.17 if possible and so is in 'obviously correct'
form rather than correct final fix form (see comments).
The patch was actually generated against -rc4-mm3, but applies to
-rc4-git9 with a moderate offset for one of the chunks (no fuzz).

Thanks,
NeilBrown


### Comments for Changeset

If an error is reported by a drive in a RAID array
(which is done via bi_end_io - in interrupt context),
we call md_error and md_new_event which calls
sysfs_notify.
However sysfs_notify grabs a mutex and so cannot be called
in interrupt context.

This patch just creates a variant of md_new_event which
avoids the sysfs call, and uses that.
A better fix for later is to arrange for the event to be
called from user-context.

Note: avoiding the sysfs call isn't a problem as an error
will not, by itself, modify the sync_action attribute.
(We do still need to wake_up(&md_event_waiters) as an
error by itself will modify /proc/mdstat).

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2006-05-30 15:14:20.0 +1000
+++ ./drivers/md/md.c   2006-05-30 15:23:26.0 +1000
@@ -172,6 +172,15 @@ void md_new_event(mddev_t *mddev)
 }
 EXPORT_SYMBOL_GPL(md_new_event);
 
+/* Alternate version that can be called from interrupts
+ * when calling sysfs_notify isn't needed.
+ */
+void md_new_event_inintr(mddev_t *mddev)
+{
+	atomic_inc(&md_event_count);
+	wake_up(&md_event_waiters);
+}
+
 /*
  * Enables to iterate over all existing md arrays
  * all_mddevs_lock protects this list.
@@ -4268,7 +4277,7 @@ void md_error(mddev_t *mddev, mdk_rdev_t
 	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
-   md_new_event(mddev);
+   md_new_event_inintr(mddev);
 }
 
 /* seq_file implementation /proc/mdstat */