Re: Software RAID (non-preempt) server blocking question. (2.6.20.4)

2007-03-30 Thread Justin Piszcz



On Fri, 30 Mar 2007, Neil Brown wrote:


On Thursday March 29, [EMAIL PROTECTED] wrote:




Did you look at cat /proc/mdstat ?? What sort of speed was the check
running at?

Around 44MB/s.

I do use the following optimization, perhaps a bad idea if I want other
processes to 'stay alive'?

echo Setting minimum resync speed to 200MB/s...
echo This improves the resync speed from 2.1MB/s to 44MB/s
echo 200000 > /sys/block/md0/md/sync_speed_min
echo 200000 > /sys/block/md1/md/sync_speed_min
echo 200000 > /sys/block/md2/md/sync_speed_min
echo 200000 > /sys/block/md3/md/sync_speed_min
echo 200000 > /sys/block/md4/md/sync_speed_min



Yes, well

You told it to use up to 200MB/s and the drives are only delivering
44MB/s, so they will be taking nearly all of the available bandwidth.
You shouldn't be too surprised if other things suffer.
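
If the goal is to keep other processes responsive while a check or resync
runs, the ceiling can be lowered as well as the floor; a minimal sketch
(values are only illustrative, both files take KB/s):

for md in /sys/block/md[0-4]/md; do
    echo 5000  > $md/sync_speed_min    # keep at least ~5MB/s so the check still finishes
    echo 30000 > $md/sync_speed_max    # cap at ~30MB/s, leaving bandwidth for normal I/O
done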

NeilBrown



Understood, will reduce this, thanks.

Justin.


Re: is this raid5 OK ?

2007-03-30 Thread Rainer Fuegenstein

hi,

1) the kernel was:
[EMAIL PROTECTED] ~]# uname -a
Linux alfred 2.6.19-1.2288.fc5xen0 #1 SMP Sat Feb 10 16:57:02 EST 2007 
i686 athlon i386 GNU/Linux


now upgraded to:

[EMAIL PROTECTED] ~]# uname -a
Linux alfred 2.6.20-1.2307.fc5xen0 #1 SMP Sun Mar 18 21:59:42 EDT 2007 
i686 athlon i386 GNU/Linux


OS is fedora core 6

[EMAIL PROTECTED] ~]# mdadm --version
mdadm - v2.3.1 - 6 February 2006

2) I got the impression that the old 350W power supply was too weak, so I
replaced it with a 400W version.


3) re-created the raid:

[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hde1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdf1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdg1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdh1
[EMAIL PROTECTED] ~]# mdadm --create --verbose /dev/md0 --level=5 
--raid-devices=4 --spare-devices=0 /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1

mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 390708736K
mdadm: array /dev/md0 started.
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
  1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: none

same as before.

4) did as dan suggested:

[EMAIL PROTECTED] ~]# mdadm -S /dev/md0
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hde1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdf1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdg1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdh1
[EMAIL PROTECTED] ~]# mdadm --create /dev/md0 -n 4 -l 5 /dev/hd[efg]1 missing
mdadm: array /dev/md0 started.
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdg1[2] hdf1[1] hde1[0]
  1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: none
[EMAIL PROTECTED] ~]# mdadm --add /dev/md0 /dev/hdh1
mdadm: added /dev/hdh1
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
  1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
  []  recovery =  0.0% (47984/390708736) 
finish=406.9min speed=15994K/sec


unused devices: none

seems like it's working now - tnx !
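
For reference, the rebuild can be watched or waited on; a small sketch
(nothing here is specific to this array):

watch -n 5 cat /proc/mdstat            # live view of the recovery percentage
mdadm --wait /dev/md0                  # blocks until the recovery/resync finishes
mdadm --detail /dev/md0 | grep -E 'State|Rebuild'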

cu


Re: is this raid5 OK ?

2007-03-30 Thread Bill Davidsen

Rainer Fuegenstein wrote:

hi,

1) the kernel was:
[EMAIL PROTECTED] ~]# uname -a
Linux alfred 2.6.19-1.2288.fc5xen0 #1 SMP Sat Feb 10 16:57:02 EST 2007 
i686 athlon i386 GNU/Linux


now upgraded to:

[EMAIL PROTECTED] ~]# uname -a
Linux alfred 2.6.20-1.2307.fc5xen0 #1 SMP Sun Mar 18 21:59:42 EDT 2007 
i686 athlon i386 GNU/Linux


OS is fedora core 6

[EMAIL PROTECTED] ~]# mdadm --version
mdadm - v2.3.1 - 6 February 2006

2) I got the impression that the old 350W power supply was to weak, I 
replaced it by a 400W version.


3) re-created the raid:

[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hde1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdf1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdg1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdh1
[EMAIL PROTECTED] ~]# mdadm --create --verbose /dev/md0 --level=5 
--raid-devices=4 --spare-devices=0 /dev/hde1 /dev/hdf1 /dev/hdg1 
/dev/hdh1

mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 390708736K
mdadm: array /dev/md0 started.
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
  1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: none

same as before.

4) did as dan suggested:

[EMAIL PROTECTED] ~]# mdadm -S /dev/md0
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hde1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdf1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdg1
[EMAIL PROTECTED] ~]# mdadm --misc --zero-superblock /dev/hdh1
[EMAIL PROTECTED] ~]# mdadm --create /dev/md0 -n 4 -l 5 /dev/hd[efg]1 missing
mdadm: array /dev/md0 started.
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdg1[2] hdf1[1] hde1[0]
  1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: none
[EMAIL PROTECTED] ~]# mdadm --add /dev/md0 /dev/hdh1
mdadm: added /dev/hdh1
[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
  1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
  []  recovery =  0.0% (47984/390708736) 
finish=406.9min speed=15994K/sec


unused devices: none

seems like it's working now - tnx !


This still looks odd; why should it behave like this? I have created a
lot of arrays (when I was doing the RAID5 speed-testing thread) and
never had anything like this. I'd like to see dmesg to check whether an
error was reported.


I think there's more going on: the original post showed the array as up
rather than in some building state, which may also indicate an issue.
What is the partition type of each of these partitions? Perhaps there's
a clue there.
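
Two quick ways to gather that information (a sketch; the device names
follow the ones used earlier in the thread):

fdisk -l /dev/hde /dev/hdf /dev/hdg /dev/hdh | grep '^/dev/hd'   # type fd = Linux raid autodetect
dmesg | grep -iE 'md:|raid|hd[e-h]'                              # any md/raid errors since boot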


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: mdadm: RUN_ARRAY failed: Cannot allocate memory

2007-03-30 Thread Bill Davidsen

Neil Brown wrote:

On Saturday March 24, [EMAIL PROTECTED] wrote:
  
Hello Neil, I found the problem that caused the 'cannot allocate
memory': DON'T use '--bitmap='.

But that said, hmmm, shouldn't mdadm just stop and say
'md: bitmaps not supported for this level.'
like it puts out into dmesg?

Also, I think this message in dmesg is interesting:
'raid0: bad disk number -1 - aborting!'

Hth, JimL
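
For context, the combination that trips this is an internal bitmap on a
level that has no bitmap support (raid0/linear); a sketch with
hypothetical device names:

mdadm --create /dev/md0 --level=0 --raid-devices=2 --bitmap=internal \
      /dev/sda1 /dev/sdb1   # this kind of command produced 'RUN_ARRAY failed: Cannot allocate memory'
mdadm --create /dev/md1 --level=1 --raid-devices=2 --bitmap=internal \
      /dev/sdc1 /dev/sdd1   # internal bitmaps are only supported on levels 1/4/5/6/10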



Yeah mdadm should be fixed too, but this kernel patch should make
it behave a bit better.  I'll queue it for 2.6.22.
  
Given the release cycle, this might fit 2.6.21-rc6 (it is a fix), or
stable 2.6.21.1 if 2.6.21 comes out soon. In any case it could go into
-mm for testing, to be sure it gets pushed at an appropriate time.

Thanks,
NeilBrown


Move test for whether level supports bitmap to correct place.

We need to check for internal-consistency of superblock in
load_super.  validate_super is for inter-device consistency.


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |   42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-03-29 16:42:18.000000000 +1000
+++ ./drivers/md/md.c	2007-03-29 16:49:26.000000000 +1000
@@ -695,6 +695,17 @@ static int super_90_load(mdk_rdev_t *rde
 	rdev->data_offset = 0;
 	rdev->sb_size = MD_SB_BYTES;
 
+	if (sb->state & (1<<MD_SB_BITMAP_PRESENT)) {
+		if (sb->level != 1 && sb->level != 4
+		    && sb->level != 5 && sb->level != 6
+		    && sb->level != 10) {
+			/* FIXME use a better test */
+			printk(KERN_WARNING
+			       "md: bitmaps not supported for this level.\n");
+			goto abort;
+		}
+	}
+
 	if (sb->level == LEVEL_MULTIPATH)
 		rdev->desc_nr = -1;
 	else
@@ -793,16 +804,8 @@ static int super_90_validate(mddev_t *md
 		mddev->max_disks = MD_SB_DISKS;
 
 		if (sb->state & (1<<MD_SB_BITMAP_PRESENT) &&
-		    mddev->bitmap_file == NULL) {
-			if (mddev->level != 1 && mddev->level != 4
-			    && mddev->level != 5 && mddev->level != 6
-			    && mddev->level != 10) {
-				/* FIXME use a better test */
-				printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
-				return -EINVAL;
-			}
+		    mddev->bitmap_file == NULL)
 			mddev->bitmap_offset = mddev->default_bitmap_offset;
-		}
 
 	} else if (mddev->pers == NULL) {
 		/* Insist on good event counter while assembling */
@@ -1059,6 +1062,18 @@ static int super_1_load(mdk_rdev_t *rdev
 		       bdevname(rdev->bdev,b));
 		return -EINVAL;
 	}
+	if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET)) {
+		if (sb->level != cpu_to_le32(1) &&
+		    sb->level != cpu_to_le32(4) &&
+		    sb->level != cpu_to_le32(5) &&
+		    sb->level != cpu_to_le32(6) &&
+		    sb->level != cpu_to_le32(10)) {
+			printk(KERN_WARNING
+			       "md: bitmaps not supported for this level.\n");
+			return -EINVAL;
+		}
+	}
+
 	rdev->preferred_minor = 0xffff;
 	rdev->data_offset = le64_to_cpu(sb->data_offset);
 	atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
@@ -1142,14 +1157,9 @@ static int super_1_validate(mddev_t *mdd
 		mddev->max_disks =  (4096-256)/2;
 
 		if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) &&
-		    mddev->bitmap_file == NULL ) {
-			if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
-			    && mddev->level != 10) {
-				printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
-				return -EINVAL;
-			}
+		    mddev->bitmap_file == NULL )
 			mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
-		}
+
 		if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE)) {
 			mddev->reshape_position = le64_to_cpu(sb->reshape_position);
 			mddev->delta_disks = le32_to_cpu(sb->delta_disks);

--

bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: is this raid5 OK ?

2007-03-30 Thread Rainer Fuegenstein

Bill Davidsen wrote:

This still looks odd, why should it behave like this. I have created a 
lot of arrays (when I was doing the RAID5 speed testing thread), and 
never had anything like this. I'd like to see dmesg to see if there was 
an error reported regarding this.


I think there's more going on, the original post showed the array as up 
rather than some building status, also indicates some issue, perhaps. 
What is the partition type of each of these partitions? Perhaps there's 
a clue there.


partition type is FD (linux raid autodetect) on all disks.

here's some more info:
the hardware is pretty old, an 800MHz ASUS board with AMD cpu and an 
extra onboard promise IDE controller with two channels. the server was 
working well with a 60 GB hda disk (system) and a single 400 GB disk 
(hde) for data. kernel was 2.6.19-1.2288.fc5xen0.


when I added 3 more 400 GB disks (hdf to hdh) and created the raid5, the
server crashed (rebooted, froze, ...) as soon as there was more
activity on the raid (kernel panics indicating trouble with interrupts,
inpage errors etc.). I then upgraded to a 400W power supply, which didn't
help.  I went back to two single (non-raid) 400 GB disks - same problem.


finally, I figured out that the non-xen kernel works without problems.
I've been filling the raid5 for several hours now and the system is still
stable.


I haven't tried to re-create the raid5 using the non-xen kernel, it was 
created using the xen kernel. maybe xen could be the problem ?
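
One way to double-check which kernel is actually running, and to make the
non-xen kernel the default boot entry (a sketch; the path assumes the
stock FC5 grub layout):

uname -r                                         # confirm the running kernel is the non-xen build
grep -nE '^default|^title' /boot/grub/grub.conf  # then point default= at the non-xen entry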


I was wrong in my last post - OS  is actually fedora core 5 (sorry for 
the typo)


current state of the raid5:

[EMAIL PROTECTED] ~]# mdadm --detail --scan
ARRAY /dev/md0 level=raid5 num-devices=4 spares=1 
UUID=e96cd8fe:c56c3438:6d9b6c14:9f0eebda

[EMAIL PROTECTED] ~]# mdadm --misc --detail /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Fri Mar 30 15:55:42 2007
 Raid Level : raid5
 Array Size : 1172126208 (1117.83 GiB 1200.26 GB)
Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Fri Mar 30 20:22:27 2007
  State : active, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

 Layout : left-symmetric
 Chunk Size : 64K

 Rebuild Status : 12% complete

   UUID : e96cd8fe:c56c3438:6d9b6c14:9f0eebda
 Events : 0.26067

    Number   Major   Minor   RaidDevice State
       0       33        1        0      active sync   /dev/hde1
       1       33       65        1      active sync   /dev/hdf1
       2       34        1        2      active sync   /dev/hdg1
       4       34       65        3      spare rebuilding   /dev/hdh1


here's the dmesg of the last reboot (when the raid was already created, 
but still syncing):


Linux version 2.6.20-1.2307.fc5 
([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 
(Red Hat 4.1.1-51)) #1 Sun Mar 18 20:44:48 EDT 2007

BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009f000 end: 
0009f000 type: 1

copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009f000 size: 1000 end: 
000a type: 2
copy_e820_map() start: 000f size: 0001 end: 
0010 type: 2
copy_e820_map() start: 0010 size: 1feec000 end: 
1ffec000 type: 1

copy_e820_map() type is E820_RAM
copy_e820_map() start: 1ffec000 size: 3000 end: 
1ffef000 type: 3
copy_e820_map() start: 1ffef000 size: 0001 end: 
1000 type: 2
copy_e820_map() start: 1000 size: 1000 end: 
2000 type: 4
copy_e820_map() start:  size: 0001 end: 
0001 type: 2

 BIOS-e820:  - 0009f000 (usable)
 BIOS-e820: 0009f000 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 1ffec000 (usable)
 BIOS-e820: 1ffec000 - 1ffef000 (ACPI data)
 BIOS-e820: 1ffef000 - 1000 (reserved)
 BIOS-e820: 1000 - 2000 (ACPI NVS)
 BIOS-e820:  - 0001 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
Using x86 segment limits to approximate NX protection
Entering add_active_range(0, 0, 131052) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 - 4096
  Normal   4096 -   131052
  HighMem131052 -   131052
early_node_map[1] active PFN ranges
0:0 -   131052
On node 0 totalpages: 131052
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 991 pages used for memmap
  Normal zone: 125965 pages, LIFO batch:31
  HighMem zone: 0 pages used for memmap
DMI 2.3 present.
Using APIC driver default
ACPI: RSDP 

Re: is this raid5 OK ?

2007-03-30 Thread Justin Piszcz


On Fri, 30 Mar 2007, Rainer Fuegenstein wrote:


Bill Davidsen wrote:

This still looks odd, why should it behave like this. I have created a lot 
of arrays (when I was doing the RAID5 speed testing thread), and never had 
anything like this. I'd like to see dmesg to see if there was an error 
reported regarding this.


I think there's more going on, the original post showed the array as up 
rather than some building status, also indicates some issue, perhaps. What 
is the partition type of each of these partitions? Perhaps there's a clue 
there.


partition type is FD (linux raid autodetect) on all disks.

here's some more info:
the hardware is pretty old, an 800MHz ASUS board with AMD cpu and an extra 
onboard promise IDE controller with two channels. the server was working well 
with a 60 GB hda disk (system) and a single 400 GB disk (hde) for data. 
kernel was 2.6.19-1.2288.fc5xen0.


when I added 3 more 400 GB disks (hdf to hdh) and created the raid5, the 
server crashed (rebooted, freezed, ...) as soon as there was more activity on 
the raid (kernel panics indicating trouble with interrupts, inpage errors 
etc.) I then upgraded to a 400W power supply, which didn't help.  I went back 
to two single (non-raid) 400 GB disks - same problem.


finally, I figured out that the non-xen kernel works without problems. I'm 
filling the raid5 since several hours now and the system is still stable.


I haven't tried to re-create the raid5 using the non-xen kernel, it was 
created using the xen kernel. maybe xen could be the problem ?


I was wrong in my last post - OS  is actually fedora core 5 (sorry for the 
typo)


PCI: Disabling Via external APIC routing


I will note there is the ominous '400GB' lockup bug with certain Promise
controllers.

With the Promise ATA/133 controllers, in some configurations you will get
a DRQ/lockup no matter what; replacing the card with an ATA/100 model, no
issues.  But I see you have a 20265, which is an ATA/100 Promise chipset.

Just out of curiosity, have you tried writing to, or running badblocks on,
each partition simultaneously? This would simulate (somewhat) the I/O
sent to and received from the drives during a RAID5 rebuild.
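
A rough way to do that in parallel (a sketch; badblocks is read-only by
default, add -w only if the data on the partitions can be destroyed):

for d in /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1; do
    badblocks -sv $d > /tmp/badblocks-$(basename $d).log 2>&1 &
done
wait    # all four drives are exercised at once, roughly like a rebuild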

Justin.



Re: raid5 write performance

2007-03-30 Thread Raz Ben-Jehuda(caro)

Please see below.

On 8/28/06, Neil Brown [EMAIL PROTECTED] wrote:

On Sunday August 13, [EMAIL PROTECTED] wrote:
 well ... me again

 Following your advice

 I added a deadline for every WRITE stripe head when it is created.
 In raid5_activate_delayed I checked if the deadline has expired and if
 not I am setting the sh to preread-active mode.

 This small fix (and a few other places in the code) reduced the amount
 of reads to zero with dd, but with no improvement in throughput. But
 with random access to the raid (buffers aligned to the stripe width and
 with the size of the stripe width) there is an improvement of at least 20%.

 The problem is that a user must know what he is doing, else there would
 be a reduction in performance if the deadline is too long (say 100 ms).

So if I understand you correctly, you are delaying write requests to
partial stripes slightly (your 'deadline') and this is sometimes
giving you a 20% improvement ?

I'm not surprised that you could get some improvement.  20% is quite
surprising.  It would be worth following through with this to make
that improvement generally available.

As you say, picking a time in milliseconds is very error prone.  We
really need to come up with something more natural.
I had hoped that the 'unplug' infrastructure would provide the right
thing, but apparently not.  Maybe unplug is just being called too
often.

I'll see if I can duplicate this myself and find out what is really
going on.

Thanks for the report.

NeilBrown



Neil, hello. I am sorry for the long interval; I was abruptly assigned to
a different project.

1.
I have taken another look at the raid5 delay patch I wrote a while ago.
I ported it to 2.6.17 and tested it; it appears to work, and when used
correctly it eliminates the read penalty.

2. Benchmarks .
   configuration:
I am testing a raid5 of 3 disks with a 1MB chunk size. The IOs are
synchronous and non-buffered (O_DIRECT), 2 MB in size, and always
aligned to the beginning of a stripe. The kernel is 2.6.17. The
stripe_delay was set to 10ms.

Attached is the simple_write code.

command:
  simple_write /dev/md1 2048 0 1000
  simple_write does raw writes (O_DIRECT) sequentially, starting from
offset zero, writing 2048 kilobytes at a time, 1000 times.
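
For anyone without the attachment, roughly the same I/O pattern can be
generated with dd (a sketch, not the original simple_write tool; like the
test itself, it overwrites md1):

dd if=/dev/zero of=/dev/md1 bs=2M count=1000 oflag=direct   # 1000 sequential 2MB O_DIRECT writes from offset 0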

Benchmark Before patch

sda   1848.00   8384.00  50992.00    8384   50992
sdb   1995.00  12424.00  51008.00   12424   51008
sdc   1698.00   8160.00  51000.00    8160   51000
sdd      0.00      0.00      0.00       0       0
md0      0.00      0.00      0.00       0       0
md1    450.00      0.00 102400.00       0  102400


Benchmark After patch

sda    389.11      0.00 128530.69       0  129816
sdb    381.19      0.00 129354.46       0  130648
sdc    383.17      0.00 128530.69       0  129816
sdd      0.00      0.00      0.00       0       0
md0      0.00      0.00      0.00       0       0
md1   1140.59      0.00 259548.51       0  262144

As one can see, no additional reads were done. One can actually
calculate the raid's utilization: (n-1)/n of the aggregate write
throughput (with 1MB writes on each disk).
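
Plugging in the numbers above: each of the three disks sustains roughly
128,500 (write-rate column) of raw writes, so the useful array rate should
be about 2/3 * (3 * 128,500) ~= 257,000 in the same units, which is close
to the ~259,500 reported for md1 after the patch.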


 3. The patch code.
 The kernel tested above was 2.6.17. The patch is against 2.6.20.2
because I noticed big code differences between .17 and .20.x.
The patch was not tested on 2.6.20.2, but it is essentially the same. I
have not (yet) tested degraded mode or any other uncommon paths.


--- linux-2.6.20.2/drivers/md/raid5.c	2007-03-09 20:58:04.000000000 +0200
+++ linux-2.6.20.2-raid/drivers/md/raid5.c	2007-03-30 12:37:55.000000000 +0300
@@ -65,6 +65,7 @@
 #define NR_HASH			(PAGE_SIZE / sizeof(struct hlist_head))
 #define HASH_MASK		(NR_HASH - 1)
 
+
 #define stripe_hash(conf, sect)	(&((conf)->stripe_hashtbl[((sect) >> STRIPE_SHIFT) & HASH_MASK]))
 
 /* bio's attached to a stripe+device for I/O are linked together in bi_sector
@@ -234,6 +235,8 @@
 	sh->sector = sector;
 	sh->pd_idx = pd_idx;
 	sh->state = 0;
+	sh->active_preread_jiffies =
+		msecs_to_jiffies( atomic_read(&conf->deadline_ms) ) + jiffies;
 
 	sh->disks = disks;
 
@@ -628,6 +631,7 @@
 
 	clear_bit(R5_LOCKED, &sh->dev[i].flags);
 	set_bit(STRIPE_HANDLE, &sh->state);
+	sh->active_preread_jiffies = jiffies;
 	release_stripe(sh);
 	return 0;
 }
@@ -1255,8 +1259,11 @@
 		bip = &sh->dev[dd_idx].towrite;
 		if (*bip == NULL && sh->dev[dd_idx].written == NULL)
 			firstwrite = 1;
-	} else{
+	} else {
 		bip = &sh->dev[dd_idx].toread;
+		sh->active_preread_jiffies = jiffies;
+	}
+
 	while (*bip && (*bip)->bi_sector < bi->bi_sector) {
 		if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)

Re: is this raid5 OK ?

2007-03-30 Thread Bill Davidsen

Rainer Fuegenstein wrote:

Bill Davidsen wrote:

This still looks odd, why should it behave like this. I have created 
a lot of arrays (when I was doing the RAID5 speed testing thread), 
and never had anything like this. I'd like to see dmesg to see if 
there was an error reported regarding this.


I think there's more going on, the original post showed the array as 
up rather than some building status, also indicates some issue, 
perhaps. What is the partition type of each of these partitions? 
Perhaps there's a clue there.


partition type is FD (linux raid autodetect) on all disks.

here's some more info:
the hardware is pretty old, an 800MHz ASUS board with AMD cpu and an 
extra onboard promise IDE controller with two channels. the server was 
working well with a 60 GB hda disk (system) and a single 400 GB disk 
(hde) for data. kernel was 2.6.19-1.2288.fc5xen0.


when I added 3 more 400 GB disks (hdf to hdh) and created the raid5, 
the server crashed (rebooted, freezed, ...) as soon as there was more 
activity on the raid (kernel panics indicating trouble with 
interrupts, inpage errors etc.) I then upgraded to a 400W power 
supply, which didn't help.  I went back to two single (non-raid) 400 
GB disks - same problem.


finally, I figured out that the non-xen kernel works without problems. 
I'm filling the raid5 since several hours now and the system is still 
stable.


I haven't tried to re-create the raid5 using the non-xen kernel, it 
was created using the xen kernel. maybe xen could be the problem ?
I think that sounds likely at this point. I have been having issues with
xen FC6 kernels, so perhaps the build or testing environment has changed.


However, I would round up the usual suspects: check that all cables are
tight, check master/slave jumper settings on the drives, etc. Be sure you
have the appropriate cables, 80-conductor where needed. Unless you need
the xen kernel, you might be better off without it for now.
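
One quick indirect check on the cabling is to see which UDMA mode each
drive actually negotiated (a sketch; hdparm marks the active mode with
'*', and anything above UDMA2/33 needs an 80-conductor cable):

for d in /dev/hde /dev/hdf /dev/hdg /dev/hdh; do
    echo == $d; hdparm -i $d | grep -i udma
done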


The rest of your details were complete but didn't give me a clue, sorry.


I was wrong in my last post - OS  is actually fedora core 5 (sorry for 
the typo)


current state of the raid5:

[EMAIL PROTECTED] ~]# mdadm --detail --scan
ARRAY /dev/md0 level=raid5 num-devices=4 spares=1 
UUID=e96cd8fe:c56c3438:6d9b6c14:9f0eebda

[EMAIL PROTECTED] ~]# mdadm --misc --detail /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Fri Mar 30 15:55:42 2007
 Raid Level : raid5
 Array Size : 1172126208 (1117.83 GiB 1200.26 GB)
Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Fri Mar 30 20:22:27 2007
  State : active, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

 Layout : left-symmetric
 Chunk Size : 64K

 Rebuild Status : 12% complete

   UUID : e96cd8fe:c56c3438:6d9b6c14:9f0eebda
 Events : 0.26067

    Number   Major   Minor   RaidDevice State
       0       33        1        0      active sync   /dev/hde1
       1       33       65        1      active sync   /dev/hdf1
       2       34        1        2      active sync   /dev/hdg1
       4       34       65        3      spare rebuilding   /dev/hdh1


here's the dmesg of the last reboot (when the raid was already 
created, but still syncing):

[ since it told me nothing useful I deleted it ]

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: is this raid5 OK ?

2007-03-30 Thread Bill Davidsen

Justin Piszcz wrote:


On Fri, 30 Mar 2007, Rainer Fuegenstein wrote:


Bill Davidsen wrote:

This still looks odd, why should it behave like this. I have created 
a lot of arrays (when I was doing the RAID5 speed testing thread), 
and never had anything like this. I'd like to see dmesg to see if 
there was an error reported regarding this.


I think there's more going on, the original post showed the array as 
up rather than some building status, also indicates some issue, 
perhaps. What is the partition type of each of these partitions? 
Perhaps there's a clue there.


partition type is FD (linux raid autodetect) on all disks.

here's some more info:
the hardware is pretty old, an 800MHz ASUS board with AMD cpu and an 
extra onboard promise IDE controller with two channels. the server 
was working well with a 60 GB hda disk (system) and a single 400 GB 
disk (hde) for data. kernel was 2.6.19-1.2288.fc5xen0.


when I added 3 more 400 GB disks (hdf to hdh) and created the raid5, 
the server crashed (rebooted, freezed, ...) as soon as there was more 
activity on the raid (kernel panics indicating trouble with 
interrupts, inpage errors etc.) I then upgraded to a 400W power 
supply, which didn't help.  I went back to two single (non-raid) 400 
GB disks - same problem.


finally, I figured out that the non-xen kernel works without 
problems. I'm filling the raid5 since several hours now and the 
system is still stable.


I haven't tried to re-create the raid5 using the non-xen kernel, it 
was created using the xen kernel. maybe xen could be the problem ?


I was wrong in my last post - OS  is actually fedora core 5 (sorry 
for the typo)


PCI: Disabling Via external APIC routing


I will note there is the ominous '400GB' lockup bug with certain promise
controllers.

With the Promise ATA/133 controllers in some configurations you will get
a DRQ/lockup no matter what, replacing with an ATA/100 card and no
issues.  But I see you have a 20265 with is an ATA/100 promise/chipset.

Just out of curiosity have you tried writing or running badblocks on
each parition simultaenously, this would simulate (somewhat) the I/O
sent/received to the drives during a RAID5 rebuild.


These are all things which could be related, but any clue why the 
non-xen kernel works?


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979
