Re: mdadm: RUN_ARRAY failed: Cannot allocate memory
On Saturday March 24, [EMAIL PROTECTED] wrote:
> Hello Neil,
> I found the problem that caused the 'cannot allocate memory': DON'T use '--bitmap='.
> But that said, shouldn't mdadm just stop and say 'md: bitmaps not supported for this
> level.', like it puts out into dmesg? I also think this message in dmesg is interesting:
> 'raid0: bad disk number -1 - aborting!'
> Hth, JimL

Yeah, mdadm should be fixed too, but this kernel patch should make it behave a bit
better. I'll queue it for 2.6.22.

Thanks,
NeilBrown

Move test for whether level supports bitmap to correct place.

We need to check for internal consistency of the superblock in load_super.
validate_super is for inter-device consistency.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |   42 ++++++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c       2007-03-29 16:42:18.000000000 +1000
+++ ./drivers/md/md.c   2007-03-29 16:49:26.000000000 +1000
@@ -695,6 +695,17 @@ static int super_90_load(mdk_rdev_t *rde
        rdev->data_offset = 0;
        rdev->sb_size = MD_SB_BYTES;
 
+       if (sb->state & (1<<MD_SB_BITMAP_PRESENT)) {
+               if (sb->level != 1 && sb->level != 4
+                   && sb->level != 5 && sb->level != 6
+                   && sb->level != 10) {
+                       /* FIXME use a better test */
+                       printk(KERN_WARNING
+                              "md: bitmaps not supported for this level.\n");
+                       goto abort;
+               }
+       }
+
        if (sb->level == LEVEL_MULTIPATH)
                rdev->desc_nr = -1;
        else
@@ -793,16 +804,8 @@ static int super_90_validate(mddev_t *md
                mddev->max_disks = MD_SB_DISKS;
 
                if (sb->state & (1<<MD_SB_BITMAP_PRESENT) &&
-                   mddev->bitmap_file == NULL) {
-                       if (mddev->level != 1 && mddev->level != 4
-                           && mddev->level != 5 && mddev->level != 6
-                           && mddev->level != 10) {
-                               /* FIXME use a better test */
-                               printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
-                               return -EINVAL;
-                       }
+                   mddev->bitmap_file == NULL)
                        mddev->bitmap_offset = mddev->default_bitmap_offset;
-               }
 
        } else if (mddev->pers == NULL) {
                /* Insist on good event counter while assembling */
@@ -1059,6 +1062,18 @@ static int super_1_load(mdk_rdev_t *rdev
                       bdevname(rdev->bdev,b));
                return -EINVAL;
        }
+       if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET)) {
+               if (sb->level != cpu_to_le32(1) &&
+                   sb->level != cpu_to_le32(4) &&
+                   sb->level != cpu_to_le32(5) &&
+                   sb->level != cpu_to_le32(6) &&
+                   sb->level != cpu_to_le32(10)) {
+                       printk(KERN_WARNING
+                              "md: bitmaps not supported for this level.\n");
+                       return -EINVAL;
+               }
+       }
+
        rdev->preferred_minor = 0xffff;
        rdev->data_offset = le64_to_cpu(sb->data_offset);
        atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
@@ -1142,14 +1157,9 @@ static int super_1_validate(mddev_t *mdd
                mddev->max_disks =  (4096-256)/2;
 
                if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) &&
-                   mddev->bitmap_file == NULL ) {
-                       if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6 &&
-                           mddev->level != 10) {
-                               printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
-                               return -EINVAL;
-                       }
+                   mddev->bitmap_file == NULL )
                        mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
-               }
+
                if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE)) {
                        mddev->reshape_position = le64_to_cpu(sb->reshape_position);
                        mddev->delta_disks = le32_to_cpu(sb->delta_disks);
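[Editorial sketch, not part of the original mail: a rough reproduction of the trigger JimL describes, with illustrative device names and mdadm 2.x syntax. The failing ingredient, per the thread, is requesting a bitmap on a level that cannot carry one, such as raid0.]

# On an unpatched kernel this combination produced the
# "RUN_ARRAY failed: Cannot allocate memory" error from the subject,
# plus "raid0: bad disk number -1 - aborting!" in dmesg.
mdadm --create /dev/md1 --level=0 --raid-devices=2 \
      --bitmap=internal /dev/sda2 /dev/sdb2

# The same command without --bitmap= assembles and runs normally.
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sda2 /dev/sdb2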
Re: split raid1 into two arrays
The fast way (not redundant): you can mark hdb as failed, then remove it. Then you can create a new array using hdb and a missing device. I used it this way and it worked.

--
--- Dirk Jagdmann    http://cubic.org/~doj - http://llg.cubic.org
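[Editorial sketch of that sequence, assuming the existing mirror is /dev/md0 built from /dev/hda1 and /dev/hdb1; the names are illustrative, and until the new array gets its second disk neither array is redundant.]

# mark the second half of the mirror as failed and pull it out
mdadm /dev/md0 --fail /dev/hdb1
mdadm /dev/md0 --remove /dev/hdb1

# build a new, degraded RAID1 from the freed disk plus a 'missing' slot
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hdb1 missing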
Re: Software RAID (non-preempt) server blocking question. (2.6.20.4)
On Thu, 29 Mar 2007, Neil Brown wrote:

> On Tuesday March 27, [EMAIL PROTECTED] wrote:
>> I ran a check on my SW RAID devices this morning. However, when I did so, I had a few
>> lftp sessions open pulling files. After I executed the check, the lftp processes
>> entered 'D' state and I could do nothing in those processes until the check finished.
>>
>> Is this normal? Should a check block all I/O to the device and put the processes
>> writing to a particular device in 'D' state until it is finished?
>
> No, that shouldn't happen. The 'check' should notice any other disk activity and slow
> down if anything else is happening on the device.
>
> Did the check run to completion? And if so, did the 'lftp' start working normally again?

Yes it did, and the lftp did start working normally again.

> Did you look at cat /proc/mdstat ?? What sort of speed was the check running at?

Around 44MB/s. I do use the following optimization, perhaps a bad idea if I want other
processes to 'stay alive'?

echo Setting minimum resync speed to 200MB/s...
echo This improves the resync speed from 2.1MB/s to 44MB/s
echo 200000 > /sys/block/md0/md/sync_speed_min
echo 200000 > /sys/block/md1/md/sync_speed_min
echo 200000 > /sys/block/md2/md/sync_speed_min
echo 200000 > /sys/block/md3/md/sync_speed_min
echo 200000 > /sys/block/md4/md/sync_speed_min

> NeilBrown
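[Editorial sketch of the related knobs, using the same md device names; the speed values are illustrative, 1000 KB/s being the usual kernel default minimum. sync_action shows whether a check or resync is running, and dropping sync_speed_min back down lets normal I/O take priority again.]

# see what the array is currently doing and how fast
cat /proc/mdstat
cat /sys/block/md0/md/sync_action

# let a running check yield to foreground I/O again
echo 1000   > /sys/block/md0/md/sync_speed_min
# optionally cap the background rate as well
echo 200000 > /sys/block/md0/md/sync_speed_max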
Re: Software RAID (non-preempt) server blocking question. (2.6.20.4)
On Thu, 29 Mar 2007, Henrique de Moraes Holschuh wrote:

> On Thu, 29 Mar 2007, Justin Piszcz wrote:
>>> Did you look at cat /proc/mdstat ?? What sort of speed was the check running at?
>>
>> Around 44MB/s. I do use the following optimization, perhaps a bad idea if I want other
>> processes to 'stay alive'?
>>
>> echo Setting minimum resync speed to 200MB/s...
>> echo This improves the resync speed from 2.1MB/s to 44MB/s
>> echo 200000 > /sys/block/md0/md/sync_speed_min
>> echo 200000 > /sys/block/md1/md/sync_speed_min
>> echo 200000 > /sys/block/md2/md/sync_speed_min
>> echo 200000 > /sys/block/md3/md/sync_speed_min
>> echo 200000 > /sys/block/md4/md/sync_speed_min
>
> md RAID1 resync reacts *extremely* badly to CFQ. Just a data point, you may want to
> check on it. Might mean other RAID types also get screwed, and also that md check is
> also disturbed by CFQ (or disturbs CFQ, whatever).
>
> I reverted everything here to non-CFQ while the RAID did its resync (which fixed all
> issues immediately), and we went back to 2.6.16.x later for other reasons.
>
> --
>   "One disk to rule them all, One disk to find them. One disk to bring them all and in
>   the darkness grind them. In the Land of Redmond where the shadows lie."
>     -- The Silicon Valley Tarot
>   Henrique Holschuh

I am using the AS scheduler; not CFQ.

$ find /sys 2>/dev/null | grep -i scheduler | xargs -n1 cat
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]
noop [anticipatory]

Justin.
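[Editorial sketch for anyone who wants to compare schedulers on their own setup; the disk name is illustrative, and only elevators compiled into the running kernel will be listed.]

# show the available elevators for a disk; the active one is in [brackets]
cat /sys/block/sda/queue/scheduler

# switch that disk to another elevator at runtime, e.g. deadline
echo deadline > /sys/block/sda/queue/scheduler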
Re: Software RAID (non-preempt) server blocking question. (2.6.20.4)
On Thu, 29 Mar 2007, Justin Piszcz wrote:
>> Did you look at cat /proc/mdstat ?? What sort of speed was the check running at?
>
> Around 44MB/s. I do use the following optimization, perhaps a bad idea if I want other
> processes to 'stay alive'?
>
> echo Setting minimum resync speed to 200MB/s...
> echo This improves the resync speed from 2.1MB/s to 44MB/s
> echo 200000 > /sys/block/md0/md/sync_speed_min
> echo 200000 > /sys/block/md1/md/sync_speed_min
> echo 200000 > /sys/block/md2/md/sync_speed_min
> echo 200000 > /sys/block/md3/md/sync_speed_min
> echo 200000 > /sys/block/md4/md/sync_speed_min

md RAID1 resync reacts *extremely* badly to CFQ. Just a data point, you may want to check
on it. Might mean other RAID types also get screwed, and also that md check is also
disturbed by CFQ (or disturbs CFQ, whatever).

I reverted everything here to non-CFQ while the RAID did its resync (which fixed all
issues immediately), and we went back to 2.6.16.x later for other reasons.

--
  "One disk to rule them all, One disk to find them. One disk to bring them all and in
  the darkness grind them. In the Land of Redmond where the shadows lie."
    -- The Silicon Valley Tarot
  Henrique Holschuh
is this raid5 OK ?
hi,

I manually created my first raid5 on four 400 GB PATA harddisks:

[EMAIL PROTECTED] ~]# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 390708736K
mdadm: array /dev/md0 started.

but, mdstat shows:

[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
      1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>

I'm surprised to see that there's one failed device [UUU_] ? shouldn't it read [UUUU] ?

[EMAIL PROTECTED] ~]# mdadm --detail --scan mdadm --misc --detail /dev/md0
mdadm: cannot open mdadm: No such file or directory
/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Mar 29 19:21:29 2007
     Raid Level : raid5
     Array Size : 1172126208 (1117.83 GiB 1200.26 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Mar 29 19:37:07 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 08c98d1b:d0b5614d:d6893163:61d4bf1b
         Events : 0.596

    Number   Major   Minor   RaidDevice State
       0      33        1        0      active sync   /dev/hde1
       1      33       65        1      active sync   /dev/hdf1
       2      34        1        2      active sync   /dev/hdg1
       2       0        0        0      removed

       4      34       65        4      active sync   /dev/hdh1

... and why is there a "removed" entry ?

sorry if these questions are stupid, but this is my first raid5 and I'm a bit worried.

cu
Re: is this raid5 OK ?
On Thu, 29 Mar 2007, Rainer Fuegenstein wrote:

> hi,
>
> I manually created my first raid5 on four 400 GB PATA harddisks:
>
> [EMAIL PROTECTED] ~]# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1
> mdadm: layout defaults to left-symmetric
> mdadm: chunk size defaults to 64K
> mdadm: size set to 390708736K
> mdadm: array /dev/md0 started.
>
> but, mdstat shows:
>
> [EMAIL PROTECTED] ~]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
>       1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> unused devices: <none>
>
> I'm surprised to see that there's one failed device [UUU_] ? shouldn't it read [UUUU] ?
>
> [EMAIL PROTECTED] ~]# mdadm --detail --scan mdadm --misc --detail /dev/md0
> mdadm: cannot open mdadm: No such file or directory
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Thu Mar 29 19:21:29 2007
>      Raid Level : raid5
>      Array Size : 1172126208 (1117.83 GiB 1200.26 GB)
>     Device Size : 390708736 (372.61 GiB 400.09 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Mar 29 19:37:07 2007
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            UUID : 08c98d1b:d0b5614d:d6893163:61d4bf1b
>          Events : 0.596
>
>     Number   Major   Minor   RaidDevice State
>        0      33        1        0      active sync   /dev/hde1
>        1      33       65        1      active sync   /dev/hdf1
>        2      34        1        2      active sync   /dev/hdg1
>        2       0        0        0      removed
>
>        4      34       65        4      active sync   /dev/hdh1
>
> ... and why is there a "removed" entry ?
>
> sorry if these questions are stupid, but this is my first raid5 and I'm a bit worried.
>
> cu

Strange, it should read [UUUU]. Correct, I would run mdadm --zero-superblock on all those
drives and re-create the array (mdadm -S to stop it first, of course, before you do it).

Justin.
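[Editorial sketch of that recovery sequence, spelled out with the same four partitions; double-check device names before running it, since --zero-superblock wipes the md metadata on each member.]

# stop the array first
mdadm -S /dev/md0

# wipe the old md superblocks from every member
mdadm --zero-superblock /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1

# re-create the RAID5 from scratch
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1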
Re: is this raid5 OK ?
On Thursday March 29, [EMAIL PROTECTED] wrote:
> hi,
>
> I manually created my first raid5 on four 400 GB PATA harddisks:
>
> [EMAIL PROTECTED] ~]# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1
> mdadm: layout defaults to left-symmetric
> mdadm: chunk size defaults to 64K
> mdadm: size set to 390708736K
> mdadm: array /dev/md0 started.
>
> but, mdstat shows:
>
> [EMAIL PROTECTED] ~]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
>       1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> unused devices: <none>
>
> I'm surprised to see that there's one failed device [UUU_] ? shouldn't it read [UUUU] ?

It should read UUU_ at first while building the 4th drive (rebuilding a missing drive is
faster than calculating and writing all the parity blocks). But it doesn't seem to be
doing that.

What kernel version? Try the latest 2.6.x.y in that series.

NeilBrown
Re: is this raid5 OK ?
On 3/29/07, Neil Brown [EMAIL PROTECTED] wrote:
> On Thursday March 29, [EMAIL PROTECTED] wrote:
>> hi,
>>
>> I manually created my first raid5 on four 400 GB PATA harddisks:
>>
>> [EMAIL PROTECTED] ~]# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1
>> mdadm: layout defaults to left-symmetric
>> mdadm: chunk size defaults to 64K
>> mdadm: size set to 390708736K
>> mdadm: array /dev/md0 started.
>>
>> but, mdstat shows:
>>
>> [EMAIL PROTECTED] ~]# cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 hdh1[4] hdg1[2] hdf1[1] hde1[0]
>>       1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>>
>> unused devices: <none>
>>
>> I'm surprised to see that there's one failed device [UUU_] ? shouldn't it read [UUUU] ?
>
> It should read UUU_ at first while building the 4th drive (rebuilding a missing drive is
> faster than calculating and writing all the parity blocks). But it doesn't seem to be
> doing that.
>
> What kernel version? Try the latest 2.6.x.y in that series.

I have seen something similar with older versions of mdadm when specifying all the member
drives at once. Does the following kick things into action?

mdadm --create /dev/md0 -n 4 -l 5 /dev/hd[efg]1 missing
mdadm --add /dev/md0 /dev/hdh1

--
Dan
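[Editorial note: if that does kick off a recovery, a quick way to confirm, using the same device names as above; the fourth slot should show as rebuilding rather than removed.]

# /proc/mdstat should now show a recovery in progress for the missing slot
cat /proc/mdstat

# the detail output should likewise report the rebuild and the spare being rebuilt
mdadm --detail /dev/md0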