Re: How to auto rebuild array?

2005-07-14 Thread [EMAIL PROTECTED]
Hi Rui,

 Is that a RAID1 or a RAID5?
 
It's a RAID5, but the problem also happens on RAID1.

 Can you give the output of these commands?
 
 mdadm --misc -D /dev/md3
 mdadm --misc -E /dev/hda4
 mdadm --misc -E /dev/hdb4
 mdadm --misc -E /dev/hde4
 mdadm --misc -E /dev/hdf4
 
 One other thing. Are you sure that all your raid partitions are
 marked 0xfd?
 
They are 0xfd partitions with a persistent superblock.
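
For example, one quick way to double-check the partition type from the shell
(using /dev/hda4 as the example member):

  # the type column should show "fd" (Linux raid autodetect)
  fdisk -l /dev/hda | grep hda4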

 Also attach you mdadm.conf/raidconf file, if you have any...
 
I don't use an /etc/mdadm/mdadm.conf file; the RAID is started during kernel
boot with raid autodetect. That's when the 'kicking non-fresh drive' message
appears. Once this has happened, 'mdadm --detail' no longer shows the kicked
disk in the list of array disks, and the mdadm daemon only gets the name of
the degraded array, not the name of the kicked device :(

As a workaround, I made a script that runs after reboot; it hot-adds the
kicked disks back to the array. It seems to fix the problem.

Regards,
Bart


-
#! /bin/bash
# Workaround for the 'kicking non-fresh drive' problem: after boot, hot-add
# any partition that mdadm reports as a 'mismatch' back into its array.

DEVLIST=`ls /dev/hd??`
for dev in $DEVLIST; do
    # e.g. "/dev/hde4: device 2 in 4 device mismatch raid5 md3.  Use mdadm ..."
    result=`mdadm --query $dev | grep mismatch`
    if [ -n "$result" ]; then
        # field 9 (split on spaces and dots) is the array name, e.g. "md3"
        raid=/dev/`echo $result | awk 'BEGIN {FS="[ .]"} {print $9}'`
        echo "$raid needs $dev added"
        mdadm --add $raid $dev
    fi
done
-




 
 I have the problem that after a power failure I get the message:
 
 Jul 12 15:29:17 kernel: md: created md3
 Jul 12 15:29:17 kernel: md: bind<hda4>
 Jul 12 15:29:17 kernel: md: bind<hdb4>
 Jul 12 15:29:17 kernel: md: bind<hde4>
 Jul 12 15:29:17 kernel: md: bind<hdf4>
 Jul 12 15:29:17 kernel: md: running: <hdf4><hde4><hdb4><hda4>
 Jul 12 15:29:17 kernel: md: kicking non-fresh hde4 from array!
 Jul 12 15:29:17 kernel: md: unbind<hde4>
 
 I understand that hde4 is not 'fresh' and the array needs to be rebuilt,
 but I can only do that with 'mdadm --add /dev/md3 /dev/hde4'. I would
 like to have it turned into a hot-spare, in which case a rebuild would
 start automatically.
 
 This application runs unattended, so there is nobody there to enter
 mdadm commands. How can I make the rebuild start automatically
 (like a hardware RAID card does)?
 


Re: disk failed, operator error: Now can't use RAID

2005-07-14 Thread Neil Brown
On Wednesday July 13, [EMAIL PROTECTED] wrote:
 
 I would very much appreciate suggestions on how to get the raid
 running again.

Remove the 
devices=/dev/hde1,/dev/sdd1,/dev/sdc1,/dev/sdb1,/dev/sda1

line from mdadm.conf (it is wrong and un-needed).

Then
  mdadm -S /dev/md0  # just to be sure
  mdadm -A /dev/md0 -f /dev/sd[abcd]1 /dev/hd[eg]1

and see if that works.
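
(For reference, a minimal mdadm.conf for an array like this could be as
simple as the following; the UUID is only a placeholder, the real value
comes from mdadm -D /dev/md0 once the array is running.)

  DEVICE /dev/sd[abcd]1 /dev/hd[eg]1
  ARRAY /dev/md0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx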

NeilBrown


Re: Oops when starting md multipath on a 2.4 kernel

2005-07-14 Thread James Pearson

Mike Tran wrote:

James Pearson wrote:

We have an existing system running a 2.4.27 based kernel that uses md 
multipath and external fibre channel arrays.


We need to add more internal disks to the system, which means the 
external drives change device names.


When I tried to start the md multipath device using mdadm, the kernel 
Oops'd. Removing the new internal disks and going back to the original 
setup, I can start the multipath device - as this machine is in 
production, I can't do any more tests.


However, I can reproduce the problem on a test system by creating an md 
multipath device on an external SCSI disk, using /dev/sda1, stopping 
the multipath device, rmmod'ing the SCSI driver, plugging in a couple 
of USB storage devices which become /dev/sda and /dev/sdb, and then 
modprobing the SCSI driver, so the original /dev/sda1 is now /dev/sdc1.
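
In command form, the sequence was roughly the following sketch (the SCSI
driver module name is only an example, and mdadm may want --force for a
single-device multipath array):

  mdadm -C /dev/md0 --level=multipath --raid-devices=1 /dev/sda1
  mdadm -S /dev/md0
  rmmod aic7xxx          # unload whichever SCSI driver owns the disk
  # plug in two USB storage devices; they claim /dev/sda and /dev/sdb
  modprobe aic7xxx       # the original disk now appears as /dev/sdc
  mdadm -A -s            # assembling now triggers the Oops below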


When I run 'mdadm -A -s', I get the following Oops:

 [events: 0004]
md: bind<sdc1,1>
md: sdc1's event counter: 0004
md0: former device sda1 is unavailable, removing from array!
md: unbind<sdc1,0>
md: export_rdev(sdc1)
md: RAID level -4 does not need chunksize! Continuing anyway.
md: multipath personality registered as nr 7
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
Unable to handle kernel NULL pointer dereference at virtual address 
0040

 printing eip:
e096527e
*pde = 
Oops: 
CPU:0
EIP:0010:[e096527e]Not tainted
EFLAGS: 00010246
eax: deb62a94   ebx:    ecx: dd65b400   edx: 
esi: 001c   edi: deb62a94   ebp:    esp: dd5fbdbc
ds: 0018   es: 0018   ss: 0018
Process mdadm (pid: 1389, stackpage=dd5fb000)
Stack: dd4c4000 dfa96000 c035ad00  0286 dd4c4000  

   deb62a94 dd5fbe5c dd4c6000 c02a6e10 dd65b400 c035ef1f 007c 

   000a  0002 2e2e c0118b49 2e2e 2e2e 
0286
Call Trace:[c02a6e10] [c0118b49] [c0118cc4] [c024a88c] 
[c024abb6]
  [c0118cc4] [c024907e] [c024b6f2] [c024c60c] [c014a326] 
[c013c483]

  [c013ca18] [c01375ac] [c013ca63] [c01439b6] [c01087c7]

Code: 8b 45 40 85 c0 0f 84 c2 01 00 00 6a 00 ff b4 24 cc 00 00 00

Running through ksymoops gives:

Unable to handle kernel NULL pointer dereference at virtual address 
0040

e096527e
*pde = 
Oops: 
CPU:0
EIP:0010:[e096527e]Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: deb62a94   ebx:    ecx: dd65b400   edx: 
esi: 001c   edi: deb62a94   ebp:    esp: dd5fbdbc
ds: 0018   es: 0018   ss: 0018
Process mdadm (pid: 1389, stackpage=dd5fb000)
Stack: dd4c4000 dfa96000 c035ad00  0286 dd4c4000  

   deb62a94 dd5fbe5c dd4c6000 c02a6e10 dd65b400 c035ef1f 007c 

   000a  0002 2e2e c0118b49 2e2e 2e2e 
0286
Call Trace:[c02a6e10] [c0118b49] [c0118cc4] [c024a88c] 
[c024abb6]
  [c0118cc4] [c024907e] [c024b6f2] [c024c60c] [c014a326] 
[c013c483]

  [c013ca18] [c01375ac] [c013ca63] [c01439b6] [c01087c7]
Code: 8b 45 40 85 c0 0f 84 c2 01 00 00 6a 00 ff b4 24 cc 00 00 00

EIP; e096527e [multipath]multipath_run+2be/6c0   =
Trace; c02a6e10 vsnprintf+2e0/450
Trace; c0118b49 call_console_drivers+e9/f0
Trace; c0118cc4 printk+104/110
Trace; c024a88c device_size_calculation+19c/1f0
Trace; c024abb6 do_md_run+2d6/360
Trace; c0118cc4 printk+104/110
Trace; c024907e bind_rdev_to_array+9e/b0
Trace; c024b6f2 add_new_disk+132/290
Trace; c024c60c md_ioctl+6fc/790
Trace; c014a326 iput+236/240
Trace; c013c483 bdput+93/a0
Trace; c013ca18 blkdev_put+98/a0
Trace; c01375ac fput+bc/e0
Trace; c013ca63 blkdev_ioctl+23/30
Trace; c01439b6 sys_ioctl+216/230
Trace; c01087c7 system_call+33/38
Code;  e096527e [multipath]multipath_run+2be/6c0
 _EIP:
Code;  e096527e [multipath]multipath_run+2be/6c0   =
   0:   8b 45 40  mov0x40(%ebp),%eax   =
Code;  e0965281 [multipath]multipath_run+2c1/6c0
   3:   85 c0 test   %eax,%eax
Code;  e0965283 [multipath]multipath_run+2c3/6c0
   5:   0f 84 c2 01 00 00 je 1cd _EIP+0x1cd e096544b 
[multipath]m

ultipath_run+48b/6c0
Code;  e0965289 [multipath]multipath_run+2c9/6c0
   b:   6a 00 push   $0x0
Code;  e096528b [multipath]multipath_run+2cb/6c0
   d:   ff b4 24 cc 00 00 00  pushl  0xcc(%esp,1)

My /etc/mdadm.conf contains:

DEVICE /dev/sd?1
ARRAY /dev/md0 level=multipath num-devices=1
  UUID=277e4ba5:6c23c087:e17c877c:da642955


Should md multipath be able to handle changes like this with the 
underlying devices?



Thanks

James Pearson


Hi James,

My co-worker and I just happened to run into this problem a few days 
ago. So, I would like to share with you what we know.


The device major/minor numbers no longer match the values recorded in the 
descriptor array in the md superblock. Because of the exception made in 
the current code, the descriptor entries are removed and 

Re: Oops when starting md multipath on a 2.4 kernel

2005-07-14 Thread Lars Marowsky-Bree
On 2005-07-14T11:09:32, James Pearson [EMAIL PROTECTED] wrote:

 Thanks - that patch applies OK to more recent 2.4 kernels and appears to 
 'fix' this problem.
 
 However, if you have a cut down patch that fixes just this problem, then 
 I would appreciate it if you could make it available.

There's also a bugfix needed for 2.4 md multipath, without which data
corruption on failover is essentially guaranteed. I don't have time to redo
the diffs against 2.4 proper, but

-   bh->b_rdev = bh->b_dev;
-   bh->b_rsector = bh->b_blocknr;

are probably the two most important changes to multipath.c:multipathd().

The patch in the SLES8 2.4 kernel is
patches.common/md-multipath-retry-handling - there's also some locking
fixes etc in there.

The problem is our kernel has deviated so much from 2.4, and active
development is now focused on DM mpath in 2.6, that pulling out smaller
chunks and feeding them upstream on 2.4 just isn't worth it :-(


Sincerely,
Lars Marowsky-Brée [EMAIL PROTECTED]

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
 -- Charles Darwin



Re: RAID-5 streaming read performance

2005-07-14 Thread Ming Zhang
On Wed, 2005-07-13 at 23:58 -0400, Dan Christensen wrote:
 David Greaves [EMAIL PROTECTED] writes:
 
  In my setup I get
 
  component partitions, e.g. /dev/sda7: 39MB/s
  raid device /dev/md2: 31MB/s
  lvm device /dev/main/media:   53MB/s
 
  (oldish system - but note that lvm device is *much* faster)
 
 Did you test component device and raid device speed using the
 read-ahead settings tuned for lvm reads?  If so, that's not a fair
 comparison.  :-)
 
  For your entertainment you may like to try this to 'tune' your readahead
  - it's OK to use so long as you're not recording:
 
 Thanks, I played around with that a lot.  I tuned readahead to
 optimize lvm device reads, and this improved things greatly.  It turns
 out the default lvm settings had readahead set to 0!  But by tuning
 things, I could get my read speed up to 59MB/s.  This is with raw
 device readahead 256, md device readahead 1024 and lvm readahead 2048.
 (The speed was most sensitive to the last one, but did seem to depend
 on the other ones a bit too.)
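
(For anyone reproducing this, the readahead values described above can be
set with blockdev; the device names here are just the ones from David's
example, not necessarily the right ones for another setup.)

  blockdev --setra 256 /dev/sda            # raw/component device (units of 512-byte sectors)
  blockdev --setra 1024 /dev/md2           # md device
  blockdev --setra 2048 /dev/main/media    # lvm device
  blockdev --getra /dev/md2                # check the current value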
 
 I separately tuned the raid device read speed.  To maximize this, I
 needed to set the raw device readahead to 1024 and the raid device
 readahead to 4096.  This brought my raid read speed from 59MB/s to
 78MB/s.  Better!  (But note that now this makes the lvm read speed
 look bad.)
 
 My raw device read speed is independent of the readahead setting,
 as long as it is at least 256.  The speed is about 58MB/s.
 
 Summary:
 
 raw device:  58MB/s
 raid device: 78MB/s
 lvm device:  59MB/s
 
 raid still isn't achieving the 106MB/s that I can get with parallel
 direct reads, but at least it's getting closer.
 
 As a simple test, I wrote a program like dd that reads and discards
 64k chunks of data from a device, but which skips 1 out of every four
 chunks (simulating skipping parity blocks).  It's not surprising that
 this program can only read from a raw device at about 75% the rate of
 dd, since the kernel readahead is probably causing the skipped blocks
 to be read anyways (or maybe because the disk head has to pass over
 those sections of the disk anyways).
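
(A rough shell equivalent of that test, not the actual program, just a
sketch with a placeholder device name and length:)

  #!/bin/bash
  # read a device in 64k chunks, skipping every 4th chunk (simulated parity skip)
  dev=/dev/sda
  total=16384                # number of 64k chunks to walk over (~1GB)
  i=0
  while [ $i -lt $total ]; do
      dd if=$dev of=/dev/null bs=64k skip=$i count=3 2>/dev/null
      i=$((i + 4))
  done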
 
 I then ran four copies of this program in parallel, reading from the
 raw devices that make up my raid partition.  And, like md, they only
 achieved about 78MB/s.  This is very close to 75% of 106MB/s.  Again,
 not surprising, since I need to have raw device readahead turned on
 for this to be efficient at all, so 25% of the chunks that pass
 through the controller are ignored.
 
 But I still don't understand why the md layer can't do better.  If I
 turn off readahead of the raw devices, and keep it for the raid
 device, then parity blocks should never be requested, so they
 shouldn't use any bus/controller bandwidth.  And even if each drive is
 only acting at 75% efficiency, the four drives should still be able to
 saturate the bus/controller.  So I can't figure out what's going on
 here.
When reading, I do not think MD reads the parity at all. But since the
parity is distributed across all disks, there may be extra seeks. You might
want to try RAID-4 to see what happens as well.




 
 Is there a way for me to simulate readahead in userspace, i.e. can
 I do lots of sequential asynchronous reads in parallel?
 
 Also, is there a way to disable caching of reads?  Having to clear
 the cache by reading 900M each time slows down testing.  I guess
 I could reboot with mem=100M, but it'd be nice to disable/enable
 caching on the fly.  Hmm, maybe I can just run something like
 memtest which locks a bunch of ram...
After you run your code, check /proc/meminfo; the Cached value might be
much lower than you expect. My feeling is that the Linux page cache
discards all of the cached data once the last file handle is closed.


 
 Thanks for all of the help so far!
 
 Dan



Re: RAID-5 streaming read performance

2005-07-14 Thread Ming Zhang
My problem here: this only applies to sdX, not mdX. Please ignore this.

ming

On Thu, 2005-07-14 at 08:30 -0400, Ming Zhang wrote:
  Also, is there a way to disable caching of reads?  Having to clear
  the cache by reading 900M each time slows down testing.  I guess
  I could reboot with mem=100M, but it'd be nice to disable/enable
  caching on the fly.  Hmm, maybe I can just run something like
  memtest which locks a bunch of ram...
 after you run your code, check /proc/meminfo; the Cached value might be
 much lower than you expect. My feeling is that the Linux page cache
 discards all of the cached data once the last file handle is closed.
 



Re: RAID-5 streaming read performance

2005-07-14 Thread Dan Christensen
Mark Hahn [EMAIL PROTECTED] writes:

 Is there a way for me to simulate readahead in userspace, i.e. can
 I do lots of sequential asynchronous reads in parallel?

 there is async IO, but I don't think this is going to help you much.

 Also, is there a way to disable caching of reads?  Having to clear

 yes: O_DIRECT.

That might disable caching of reads, but it also disables readahead,
so unless I manually use aio to simulate readahead, this isn't going
to solve my problem, which is having to clear the cache before each
test to get relevant results.

I'm really surprised there isn't something in /proc you can use to
clear or disable the cache.  Would be very useful for benchmarking!

Dan



Re: disk failed, operator error: Now can't use RAID

2005-07-14 Thread Hank Barta
On 7/14/05, Neil Brown [EMAIL PROTECTED] wrote:
 On Wednesday July 13, [EMAIL PROTECTED] wrote:
 
  I would very much appreciate suggestions on how to get the raid
  running again.
 
 Remove the
 devices=/dev/hde1,/dev/sdd1,/dev/sdc1,/dev/sdb1,/dev/sda1
 
 line from mdadm.conf (it is wrong and un-needed).
 
 Then
   mdadm -S /dev/md0  # just to be sure
   mdadm -A /dev/md0 -f /dev/sd[abcd]1 /dev/hd[eg]1
 
 and see if that works.

Yes, Thanks!

Results are:

oak:~# mdadm -S /dev/md0
oak:~# mdadm -A /dev/md0 -f /dev/sd[abcd]1 /dev/hd[eg]1
mdadm: forcing event count in /dev/sda1(0) from 1271893 upto 2816178
mdadm: /dev/md0 has been started with 4 drives (out of 5) and 1 spare.
oak:~# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sda1[0] hde1[5] sdd1[3] sdc1[2] sdb1[1]
  781433344 blocks level 5, 32k chunk, algorithm 2 [5/4] [_]
  []  recovery =  0.1% (389320/195358336)
finish=280.4min speed=11585K/sec
unused devices: <none>
oak:~#

Now... After this is through rebuilding, I need to replace the failed
drive (creating one partition and setting it to type 0xFD (Linux raid
autodetect)).

What's the best way to get this in service with one drive as a spare?
Can I convert my current spare (/dev/hde1) to a regular disk and add
the new disk as a spare?

Or should I add the new disk as an active drive and if so, will it be
rebuilt and the spare (/dev/hde1) be relegated back as a spare?

thanks again,
hank

-- 
Beautiful Sunny Winfield, Illinois


Re: RAID-5 streaming read performance

2005-07-14 Thread Mark Hahn
 i also want a way to clear part of the whole page cache by file id. :)

understandably, kernel developers don't give high priority to this sort of
not-useful-for-normal-work feature.

 i also want a way to tell the cache distribution, how many for file A
 and B, 

you should probably try mmaping the file and using mincore.
come to think of it, mmap+madvise might be a sensible way to 
flush pages corresponding to a particular file, as well.

  I'm really surprised there isn't something in /proc you can use to
  clear or disable the cache.  Would be very useful for benchmarking!

I assume you noticed blockdev --flushbufs, no?  it works for me 
(ie, a small, repeated streaming read of a disk device will show 
pagecache speed).

I think the problem is that it's difficult to dissociate readahead,
writebehind and normal lru-ish caching.  there was quite a flurry of 
activity around 2.4.10 related to this, and it left a bad taste in 
everyone's mouth.  I think the main conclusion was that too much fanciness
results in a fragile, more subtle and difficult-to-maintain system 
that performs better, true, but over a narrower range of workloads.

regards, mark hahn
sharcnet/mcmaster.



Raid5 Failure

2005-07-14 Thread David M. Strang

Hello -

I'm currently stuck in a moderately awkward predicament. I have a 28-disk 
software RAID5; at the time I created it I was using EVMS - this was because 
mdadm 1.x didn't support superblock v1 and mdadm 2.x wouldn't compile on my 
system. Everything was working great until I had an unusual kernel error:


Jun 20 02:55:07 abyss last message repeated 33 times
Jun 20 02:55:07 abyss kernel: KERNEL: assertion (flags & MSG_PEEK) failed at 
net/ 59A9F3C
Jun 20 02:55:07 abyss kernel: KERNEL: assertion (flags & MSG_PEEK) failed at 
net/ipv4/tcp.c (1294)


I used to get this error randomly; a reboot would resolve it - the final fix 
was to update the kernel. The reason I even noticed the error this time was 
that I was attempting to access my RAID and some of the data wouldn't come 
up. I did a cat /proc/mdstat and it said 13 of the 28 devices had failed. I 
checked /var/log/kernel and the above message was spamming the log 
repeatedly.


Upon reboot, I fired up EVMSGui to remount the raid - and I received the 
following error messages:


Jul 14 20:17:46 abyss _3_ Engine: engine_ioctl_object: ioctl to object 
md/md0 failed with error code 19: No such device
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sda is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdb is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdc is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdd is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sde is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdf is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdg is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdh is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdi is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdj is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdk is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdl is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Object sdm is 
out of date.
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_analyze_volume: Found 13 stale 
objects in region md/md0.
Jul 14 20:17:47 abyss _0_ MDRaid5RegMgr: sb1_analyze_sb: MD region md/md0 is 
corrupt
Jul 14 20:17:47 abyss _3_ MDRaid5RegMgr: md_fix_dev_major_minor: MD region 
md/md0 is corrupt.
Jul 14 20:17:47 abyss _0_ Engine: plugin_user_message: Message is: 
MDRaid5RegMgr: Region md/md0 : MD superblocks found in object(s) [sda sdb 
sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm ] are not valid.  [sda sdb sdc 
sdd sde sdf sdg sdh sdi sdj sdk sdl sdm ] will not be activated and should 
be removed from the region.


Jul 14 20:17:47 abyss _0_ Engine: plugin_user_message: Message is: 
MDRaid5RegMgr: RAID5 region md/md0 is corrupt.  The number of raid disks for 
a full functional array is 28.  The number of active disks is 15.
Jul 14 20:17:47 abyss _2_ MDRaid5RegMgr: raid5_read: MD Object md/md0 is 
corrupt, data is suspect
Jul 14 20:17:47 abyss _2_ MDRaid5RegMgr: raid5_read: MD Object md/md0 is 
corrupt, data is suspect


I realize this is not the EVMS mailing list; I tried earlier (I've been 
swamped at work) with no success on resolving this issue there. Today, I 
tried mdadm 2.0-devel-2. It compiled w/o issue. I did a mdadm --misc -Q 
/dev/sdm.


-([EMAIL PROTECTED])-(~/mdadm-2.0-devel-2)- # ./mdadm --misc -Q /dev/sdm
/dev/sdm: is not an md array
/dev/sdm: device 134639616 in 28 device undetected raid5 md-1.  Use 
mdadm --examine for more detail.


-([EMAIL PROTECTED])-(~/mdadm-2.0-devel-2)- # ./mdadm --misc -E /dev/sdm
/dev/sdm:
 Magic : a92b4efc
   Version : 01.00
Array UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
  Name : md/md0
 Creation Time : Wed Dec 31 19:00:00 1969
Raid Level : raid5
  Raid Devices : 28

   Device Size : 143374592 (68.37 GiB 73.41 GB)
  Super Offset : 143374632 sectors
 State : clean
   Device UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
   Update Time : Sun Jun 19 14:49:52 2005
  Checksum : 296bf133 - correct
Events : 172758

Layout : left-asymmetric
Chunk Size : 128K

  Array State : Uuuu

After which, I checked on /dev/sdn.

-([EMAIL PROTECTED])-(~/mdadm-2.0-devel-2)- # ./mdadm --misc -Q /dev/sdn
/dev/sdn: is not an md array
/dev/sdn: device 134639616 in 28 device undetected raid5 md-1.  Use 
mdadm --examine for more detail.


-([EMAIL PROTECTED])-(~/mdadm-2.0-devel-2)- # ./mdadm --misc -E /dev/sdn
/dev/sdn:
 Magic : a92b4efc
   Version : 01.00
Array UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
  Name : md/md0
 Creation Time : Wed Dec 31 19:00:00 1969
Raid Level : raid5
  Raid Devices : 28

   Device Size : 

Re: Raid5 Failure

2005-07-14 Thread Neil Brown
On Thursday July 14, [EMAIL PROTECTED] wrote:
 
 It looks like the first 'segment of discs' sda-sdm are all marked clean; 
 while sdn-sdab are marked active.
 
 What can I do to resolve this issue? Any assistance would be greatly 
 appreciated.


Apply the following patch to mdadm-2.0-devel2 (it fixes a few bugs and
particularly makes --assemble work) then try:

mdadm -A /dev/md0 /dev/sd[a-z] /dev/sd

Just list all 28 SCSI devices, I'm not sure what their names are.
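
(For what it's worth, a glob pair like the following would cover 28 disks
named sda through sdab, if that is indeed what they are called:)

  mdadm -A /dev/md0 /dev/sd[a-z] /dev/sda[ab]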

This will quite probably fail.
If it does, try again with 
   --force

NeilBrown



Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./Assemble.c |   13 -
 ./Query.c|   33 +++--
 ./mdadm.h|2 +-
 ./super0.c   |1 +
 ./super1.c   |4 ++--
 5 files changed, 35 insertions(+), 18 deletions(-)

diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~   2005-07-15 10:13:04.0 +1000
+++ ./Assemble.c            2005-07-15 10:37:59.0 +1000
@@ -473,6 +473,7 @@ int Assemble(struct supertype *st, char 
 		if (!devices[j].uptodate)
 			continue;
 		info.disk.number = i;
+		info.disk.raid_disk = i;
 		info.disk.state = desired_state;
 
 		if (devices[j].uptodate &&
@@ -526,7 +527,17 @@ int Assemble(struct supertype *st, char 
 
 	/* Almost ready to actually *do* something */
 	if (!old_linux) {
-		if (ioctl(mdfd, SET_ARRAY_INFO, NULL) != 0) {
+		int rv;
+		if ((vers % 100) >= 1) { /* can use different versions */
+			mdu_array_info_t inf;
+			memset(&inf, 0, sizeof(inf));
+			inf.major_version = st->ss->major;
+			inf.minor_version = st->minor_version;
+			rv = ioctl(mdfd, SET_ARRAY_INFO, &inf);
+		} else
+			rv = ioctl(mdfd, SET_ARRAY_INFO, NULL);
+
+		if (rv) {
 			fprintf(stderr, Name ": SET_ARRAY_INFO failed for %s: %s\n",
 				mddev, strerror(errno));
 			return 1;

diff ./Query.c~current~ ./Query.c
--- ./Query.c~current~  2005-07-07 09:19:53.0 +1000
+++ ./Query.c           2005-07-15 11:38:18.0 +1000
@@ -105,26 +105,31 @@ int Query(char *dev)
 	if (superror == 0) {
 		/* array might be active... */
 		st->ss->getinfo_super(&info, super);
-		mddev = get_md_name(info.array.md_minor);
-		disc.number = info.disk.number;
-		activity = "undetected";
-		if (mddev && (fd = open(mddev, O_RDONLY)) >= 0) {
-			if (md_get_version(fd) >= 9000 &&
-			    ioctl(fd, GET_ARRAY_INFO, &array) >= 0) {
-				if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
-				    makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
-					activity = "active";
-				else
-					activity = "mismatch";
+		if (st->ss->major == 0) {
+			mddev = get_md_name(info.array.md_minor);
+			disc.number = info.disk.number;
+			activity = "undetected";
+			if (mddev && (fd = open(mddev, O_RDONLY)) >= 0) {
+				if (md_get_version(fd) >= 9000 &&
+				    ioctl(fd, GET_ARRAY_INFO, &array) >= 0) {
+					if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
+					    makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
+						activity = "active";
+					else
+						activity = "mismatch";
+				}
+				close(fd);
 			}
-			close(fd);
+		} else {
+			activity = "unknown";
+			mddev = "array";
 		}
-		printf("%s: device %d in %d device %s %s md%d.  Use mdadm --examine for more detail.\n",
+		printf("%s: device %d in %d device %s %s %s.  Use mdadm --examine for more detail.\n",
 		       dev, 
 		       info.disk.number, info.array.raid_disks,
 		       activity,
 		       map_num(pers, info.array.level),
-		       info.array.md_minor);
+		       mddev);
 	}
 	return 0;
 }

diff ./mdadm.h~current~ ./mdadm.h
--- ./mdadm.h~current~  2005-07-07 09:19:53.0 +1000
+++ ./mdadm.h           2005-07-15 10:15:51.0 +1000
@@ -73,7 +73,7 @@ struct mdinfo {
 	mdu_array_info_t	array;
 	mdu_disk_info_t		disk;

Re: Re[2]: Bugreport mdadm-2.0-devel-1

2005-07-14 Thread Neil Brown
On Saturday July 9, [EMAIL PROTECTED] wrote:
 On Thursday July 7, [EMAIL PROTECTED] wrote:
  Hi Neil!
  Thanks much for your help, array creation using devel-2 just works,
  however, the array can't be assembled again after it's stopped:(
 
 Hmm, yeh, nor it can :-(
 
 I'm not sure when I'll have time to look at this (I'm on leave at the
 moment with family visiting and such) but I'll definitely get back to
 you by Thursday if not before.

Sorry for the delay.

The following patch against -devel2 should fix these problems. If (when?)
you get more, please let me know.

NeilBrown

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./Assemble.c |   13 -
 ./Query.c|   33 +++--
 ./mdadm.h|2 +-
 ./super0.c   |1 +
 ./super1.c   |4 ++--
 5 files changed, 35 insertions(+), 18 deletions(-)

diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~   2005-07-15 10:13:04.0 +1000
+++ ./Assemble.c            2005-07-15 10:37:59.0 +1000
@@ -473,6 +473,7 @@ int Assemble(struct supertype *st, char 
 		if (!devices[j].uptodate)
 			continue;
 		info.disk.number = i;
+		info.disk.raid_disk = i;
 		info.disk.state = desired_state;
 
 		if (devices[j].uptodate &&
@@ -526,7 +527,17 @@ int Assemble(struct supertype *st, char 
 
 	/* Almost ready to actually *do* something */
 	if (!old_linux) {
-		if (ioctl(mdfd, SET_ARRAY_INFO, NULL) != 0) {
+		int rv;
+		if ((vers % 100) >= 1) { /* can use different versions */
+			mdu_array_info_t inf;
+			memset(&inf, 0, sizeof(inf));
+			inf.major_version = st->ss->major;
+			inf.minor_version = st->minor_version;
+			rv = ioctl(mdfd, SET_ARRAY_INFO, &inf);
+		} else
+			rv = ioctl(mdfd, SET_ARRAY_INFO, NULL);
+
+		if (rv) {
 			fprintf(stderr, Name ": SET_ARRAY_INFO failed for %s: %s\n",
 				mddev, strerror(errno));
 			return 1;

diff ./Query.c~current~ ./Query.c
--- ./Query.c~current~  2005-07-07 09:19:53.0 +1000
+++ ./Query.c           2005-07-15 11:38:18.0 +1000
@@ -105,26 +105,31 @@ int Query(char *dev)
 	if (superror == 0) {
 		/* array might be active... */
 		st->ss->getinfo_super(&info, super);
-		mddev = get_md_name(info.array.md_minor);
-		disc.number = info.disk.number;
-		activity = "undetected";
-		if (mddev && (fd = open(mddev, O_RDONLY)) >= 0) {
-			if (md_get_version(fd) >= 9000 &&
-			    ioctl(fd, GET_ARRAY_INFO, &array) >= 0) {
-				if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
-				    makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
-					activity = "active";
-				else
-					activity = "mismatch";
+		if (st->ss->major == 0) {
+			mddev = get_md_name(info.array.md_minor);
+			disc.number = info.disk.number;
+			activity = "undetected";
+			if (mddev && (fd = open(mddev, O_RDONLY)) >= 0) {
+				if (md_get_version(fd) >= 9000 &&
+				    ioctl(fd, GET_ARRAY_INFO, &array) >= 0) {
+					if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
+					    makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
+						activity = "active";
+					else
+						activity = "mismatch";
+				}
+				close(fd);
 			}
-			close(fd);
+		} else {
+			activity = "unknown";
+			mddev = "array";
 		}
-		printf("%s: device %d in %d device %s %s md%d.  Use mdadm --examine for more detail.\n",
+		printf("%s: device %d in %d device %s %s %s.  Use mdadm --examine for more detail.\n",
 		       dev, 
 		       info.disk.number, info.array.raid_disks,
 		       activity,
 		       map_num(pers, info.array.level),
-		       info.array.md_minor);
+		       mddev);
 	}
 	return 0;
 }

diff ./mdadm.h~current~ ./mdadm.h
--- ./mdadm.h~current~  2005-07-07 09:19:53.0 +1000
+++ ./mdadm.h           2005-07-15 10:15:51.0 +1000
@@ -73,7 +73,7 @@ struct mdinfo {
 	mdu_array_info_t	array;

Re: RAID-5 streaming read performance

2005-07-14 Thread Dan Christensen
Ming Zhang [EMAIL PROTECTED] writes:

 On Thu, 2005-07-14 at 19:29 -0400, Mark Hahn wrote:

  i also want a way to clear part of the whole page cache by file id. :)
 
  understandably, kernel developers don't give high priority to this sort of
  not-useful-for-normal-work feature.
 agree.

Clearing just part of the page cache sounds too complicated to be
worth it, but clearing it all seems reasonable;  some kernel developers
spend time doing benchmarks too!

  Dan Christensen wrote:
 
   I'm really surprised there isn't something in /proc you can use to
   clear or disable the cache.  Would be very useful for benchmarking!
 
 I assume you noticed blockdev --flushbufs, no?  it works for me 

I had tried this and noticed that it didn't work for files on a
filesystem.  But it does seem to work for block devices.  That's
great, thanks.  I didn't realize the cache was so complicated;
it can be retained for files but not for the block device underlying
those files!  

 a test I did shows that even if you have sda and sdb forming a raid0,
 the page cache for sda and sdb will not be used by raid0. kind of
 funny.

I thought I had noticed raid devices making use of cache from
underlying devices, but a test I just did agrees with your result, for
both RAID-1 and RAID-5.  Again, this seems odd.  Shouldn't the raid
layer take advantage of a block that's already in RAM?  I guess this
won't matter in practice, since you usually don't read from both a
raid device and an underlying device.
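
(For the curious, a minimal version of that kind of test might look like
this; the device names are only examples:)

  blockdev --flushbufs /dev/sda
  blockdev --flushbufs /dev/md0
  dd if=/dev/sda of=/dev/null bs=1M count=500    # warm the page cache for the component
  dd if=/dev/md0 of=/dev/null bs=1M count=500    # still runs at disk speed, not cache speed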

Dan



Re: Raid5 Failure

2005-07-14 Thread David M. Strang

Neil -

You are the man; the array went w/o force - and is rebuilding now!

-([EMAIL PROTECTED])-(/)- # mdadm --detail /dev/md0
/dev/md0:
   Version : 01.00.01
 Creation Time : Wed Dec 31 19:00:00 1969
Raid Level : raid5
Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
   Device Size : 71687296 (68.37 GiB 73.41 GB)
  Raid Devices : 28
 Total Devices : 28
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Thu Jul 14 22:07:18 2005
 State : active, resyncing
Active Devices : 28
Working Devices : 28
Failed Devices : 0
 Spare Devices : 0

Layout : left-asymmetric
Chunk Size : 128K

Rebuild Status : 0% complete

  UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
Events : 172760

    Number   Major   Minor   RaidDevice State
       0       8       0        0       active sync   /dev/evms/.nodes/sda
       1       8      16        1       active sync   /dev/evms/.nodes/sdb
       2       8      32        2       active sync   /dev/evms/.nodes/sdc
       3       8      48        3       active sync   /dev/evms/.nodes/sdd
       4       8      64        4       active sync   /dev/evms/.nodes/sde
       5       8      80        5       active sync   /dev/evms/.nodes/sdf
       6       8      96        6       active sync   /dev/evms/.nodes/sdg
       7       8     112        7       active sync   /dev/evms/.nodes/sdh
       8       8     128        8       active sync   /dev/evms/.nodes/sdi
       9       8     144        9       active sync   /dev/evms/.nodes/sdj
      10       8     160       10       active sync   /dev/evms/.nodes/sdk
      11       8     176       11       active sync   /dev/evms/.nodes/sdl
      12       8     192       12       active sync   /dev/evms/.nodes/sdm
      13       8     208       13       active sync   /dev/evms/.nodes/sdn
      14       8     224       14       active sync   /dev/evms/.nodes/sdo
      15       8     240       15       active sync   /dev/evms/.nodes/sdp
      16      65       0       16       active sync   /dev/evms/.nodes/sdq
      17      65      16       17       active sync   /dev/evms/.nodes/sdr
      18      65      32       18       active sync   /dev/evms/.nodes/sds
      19      65      48       19       active sync   /dev/evms/.nodes/sdt
      20      65      64       20       active sync   /dev/evms/.nodes/sdu
      21      65      80       21       active sync   /dev/evms/.nodes/sdv
      22      65      96       22       active sync   /dev/evms/.nodes/sdw
      23      65     112       23       active sync   /dev/evms/.nodes/sdx
      24      65     128       24       active sync   /dev/evms/.nodes/sdy
      25      65     144       25       active sync   /dev/evms/.nodes/sdz
      26      65     160       26       active sync   /dev/evms/.nodes/sdaa
      27      65     176       27       active sync   /dev/evms/.nodes/sdab


-- David M. Strang

- Original Message - 
From: Neil Brown

To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Thursday, July 14, 2005 9:43 PM
Subject: Re: Raid5 Failure


On Thursday July 14, [EMAIL PROTECTED] wrote:


It looks like the first 'segment of discs' sda-sdm are all marked clean;
while sdn-sdab are marked active.

What can I do to resolve this issue? Any assistance would be greatly
appreciated.



Apply the following patch to mdadm-2.0-devel2 (it fixes a few bugs and
particularly makes --assemble work) then try:

mdadm -A /dev/md0 /dev/sd[a-z] /dev/sd

Just list all 28 SCSI devices, I'm not sure what their names are.

This will quite probably fail.
If it does, try again with
  --force

NeilBrown



Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
./Assemble.c |   13 -
./Query.c|   33 +++--
./mdadm.h|2 +-
./super0.c   |1 +
./super1.c   |4 ++--
5 files changed, 35 insertions(+), 18 deletions(-)

diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~   2005-07-15 10:13:04.0 +1000
+++ ./Assemble.c            2005-07-15 10:37:59.0 +1000
@@ -473,6 +473,7 @@ int Assemble(struct supertype *st, char
 		if (!devices[j].uptodate)
 			continue;
 		info.disk.number = i;
+		info.disk.raid_disk = i;
 		info.disk.state = desired_state;
 
 		if (devices[j].uptodate &&
@@ -526,7 +527,17 @@ int Assemble(struct supertype *st, char
 
 	/* Almost ready to actually *do* something */
 	if (!old_linux) {
-		if (ioctl(mdfd, SET_ARRAY_INFO, NULL) != 0) {
+		int rv;
+		if ((vers % 100) >= 1) { /* can use different versions */
+			mdu_array_info_t inf;
+			memset(&inf, 0, sizeof(inf));
+			inf.major_version = st->ss->major;
+			inf.minor_version = st->minor_version;
+			rv = ioctl(mdfd, SET_ARRAY_INFO, &inf);
+		} else
+			rv = ioctl(mdfd, SET_ARRAY_INFO, NULL);
+
+		if (rv) {
 			fprintf(stderr, Name ": SET_ARRAY_INFO failed for %s: %s\n",
 				mddev, strerror(errno));
 			return 1;

diff ./Query.c~current~ ./Query.c
--- ./Query.c~current~ 2005-07-07 09:19:53.0 +1000
+++ ./Query.c 2005-07-15 11:38:18.0 +1000
@@ -105,26 +105,31 @@ int