Re: Reading takes 100% precedence over writes for mdadm+raid5?
On Dec 6, 2007 1:06 AM, Justin Piszcz [EMAIL PROTECTED] wrote:
> On Wed, 5 Dec 2007, Jon Nelson wrote:
>> I saw something really similar while moving some very large (300MB to
>> 4GB) files. I was really surprised to see actual disk I/O (as measured
>> by dstat) be really horrible. Any work-arounds, or just don't perform
>> heavy reads the same time as writes?

What kernel are you using? (Did I miss it in your OP?) The per-device
write throttling in 2.6.24 should help significantly, have you tried the
latest -rc and compared to your current kernel?

-Dave
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: Reading takes 100% precedence over writes for mdadm+raid5?
On Thu, 6 Dec 2007, David Rees wrote:
> On Dec 6, 2007 1:06 AM, Justin Piszcz [EMAIL PROTECTED] wrote:
>> On Wed, 5 Dec 2007, Jon Nelson wrote:
>>> I saw something really similar while moving some very large (300MB to
>>> 4GB) files. I was really surprised to see actual disk I/O (as measured
>>> by dstat) be really horrible. Any work-arounds, or just don't perform
>>> heavy reads the same time as writes?
>
> What kernel are you using? (Did I miss it in your OP?) The per-device
> write throttling in 2.6.24 should help significantly, have you tried the
> latest -rc and compared to your current kernel?
>
> -Dave

2.6.23.9 -- thanks, will try out the latest -rc or wait for 2.6.24!

Justin.
Re: Reading takes 100% precedence over writes for mdadm+raid5?
On 12/6/07, David Rees [EMAIL PROTECTED] wrote:
> On Dec 6, 2007 1:06 AM, Justin Piszcz [EMAIL PROTECTED] wrote:
>> On Wed, 5 Dec 2007, Jon Nelson wrote:
>>> I saw something really similar while moving some very large (300MB to
>>> 4GB) files. I was really surprised to see actual disk I/O (as measured
>>> by dstat) be really horrible. Any work-arounds, or just don't perform
>>> heavy reads the same time as writes?
>
> What kernel are you using? (Did I miss it in your OP?) The per-device
> write throttling in 2.6.24 should help significantly, have you tried the
> latest -rc and compared to your current kernel?

I was using 2.6.22.12 I think (openSUSE kernel). I can try using pretty
much any kernel - I'm preparing to do an unrelated test using 2.6.24-rc4
this weekend. If I remember I'll try to see what disk I/O looks like
there.

-- 
Jon
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 5 2007 19:29, Nix wrote:
>> On Dec 1 2007 06:19, Justin Piszcz wrote:
>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO, if
>>> you use 1.x superblocks with LILO you can't boot)
>>
>> Says who? (Don't use LILO ;-)
>
> Well, your kernels must be on a 0.90-superblocked RAID-0 or RAID-1
> device. It can't handle booting off 1.x superblocks nor RAID-[56] (not
> that I could really hope for the latter).

If the superblock is at the end (which is the case for 0.90 and 1.0),
then the offsets for a specific block on /dev/mdX match the ones for
/dev/sda, so it should be easy to use lilo on 1.0 too, no? (Yes, it will
not work with 1.1 or 1.2.)
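To make the offset argument concrete, here is a small sketch (not from the thread; the 0.90 placement figure of 64 KiB reserved at the end is my reading of the md superblock formats, so treat the exact numbers as an assumption):

```python
# Sketch: where md keeps its superblock per metadata version, and why
# 0.90/1.0 leave the filesystem at offset 0 of the member partition.
# The 64 KiB reservation for 0.90 is an assumption about the format.

SECTOR = 512
RESERVED = 65536 // SECTOR  # sectors: 64 KiB reserved for the 0.90 superblock

def sb090_start(dev_sectors):
    """Start sector of a 0.90 superblock: the last 64 KiB-aligned
    64 KiB chunk of the device."""
    return (dev_sectors & ~(RESERVED - 1)) - RESERVED

def data_starts_at_zero(version):
    """True when block N of /dev/mdX is block N of the member device --
    exactly what a LILO block map computed against /dev/mdX relies on."""
    return version in ("0.90", "1.0")   # superblock at the end of the device
                                        # 1.1/1.2 put it at/near the start
                                        # and shift the data instead

dev = 312581808  # sectors in a ~160 GB partition (example figure)
assert data_starts_at_zero("0.90") and data_starts_at_zero("1.0")
assert not data_starts_at_zero("1.2")
assert 0 < sb090_start(dev) < dev   # superblock lives inside the device
```

With 1.1/1.2 the data offset is nonzero and recorded in the superblock, so sector addresses seen through /dev/mdX no longer equal addresses on the raw partition, which is what breaks LILO.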
Re: assemble vs create an array.......
Thank you. I want to make sure I understand.

1- Does it matter which permutation of drives I use for xfs_repair (as
long as it tells me that the Structure needs cleaning)? When it comes to
linux I consider myself at intermediate level, but I am a beginner when
it comes to raid and filesystem issues.

2- After I do it, assuming that it worked, how do I reintegrate the
'missing' drive while keeping my data?

Thank you for your time.
Dragos

David Greaves wrote:
> Dragos wrote:
>> Thank you for your very fast answers. First I tried 'fsck -n' on the
>> existing array. The answer was that if I wanted to check an XFS
>> partition I should use 'xfs_check'. That seems to say that my array
>> was formatted with XFS, not ReiserFS. Am I correct?
>> Then I tried the different permutations:
>>   mdadm --create /dev/md0 --raid-devices=3 --level=5 missing /dev/sda1 /dev/sdb1
>>   mount /dev/md0 temp
>>   mdadm --stop --scan
>>   mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sda1 missing /dev/sdb1
>>   mount /dev/md0 temp
>>   mdadm --stop --scan
>>   [etc]
>> With some arrays mount reported:
>>   mount: you must specify the filesystem type
>> and with others:
>>   mount: Structure needs cleaning
>> No choice seems to have been successful.
>
> OK, not as good as you could have hoped for.
> Make sure you have the latest xfs tools. You may want to try
> xfs_repair, and you can use the -n (I think - check man page) option.
> You may need to force it to ignore the log.
>
> David
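The brute-force search quoted above generalizes; here is a sketch that only prints the candidate mdadm invocations to try (it deliberately executes nothing, and the device names are the ones from the quoted mail):

```python
# Generate the "mdadm --create" lines for every ordering of the known
# devices plus the "missing" slot. Printing only -- nothing is run.
from itertools import permutations

def candidate_creates(devices, slots=3):
    members = list(devices) + ["missing"] * (slots - len(devices))
    for perm in sorted(set(permutations(members))):
        yield ("mdadm --create /dev/md0 --raid-devices=%d --level=5 %s"
               % (slots, " ".join(perm)))

cmds = list(candidate_creates(["/dev/sda1", "/dev/sdb1"]))
for cmd in cmds:
    print(cmd)
# 2 known devices + 1 missing slot => 3! = 6 orderings to test
```

After each create, mount read-only (or run xfs_repair -n) and record which ordering produces the fewest complaints, as discussed above.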
Re: Reading takes 100% precedence over writes for mdadm+raid5?
Justin Piszcz wrote:
> root  2206     1  4 Dec02 ?  00:10:37 dd if=/dev/zero of=1.out bs=1M
> root  2207     1  4 Dec02 ?  00:10:38 dd if=/dev/zero of=2.out bs=1M
> root  2208     1  4 Dec02 ?  00:10:35 dd if=/dev/zero of=3.out bs=1M
> root  2209     1  4 Dec02 ?  00:10:45 dd if=/dev/zero of=4.out bs=1M
> root  2210     1  4 Dec02 ?  00:10:35 dd if=/dev/zero of=5.out bs=1M
> root  2211     1  4 Dec02 ?  00:10:35 dd if=/dev/zero of=6.out bs=1M
> root  2212     1  4 Dec02 ?  00:10:30 dd if=/dev/zero of=7.out bs=1M
> root  2213     1  4 Dec02 ?  00:10:42 dd if=/dev/zero of=8.out bs=1M
> root  2214     1  4 Dec02 ?  00:10:35 dd if=/dev/zero of=9.out bs=1M
> root  2215     1  4 Dec02 ?  00:10:37 dd if=/dev/zero of=10.out bs=1M
> root  3080  24.6  0.0 10356 1672 ?  D  01:22  5:51 dd if=/dev/md3 of=/dev/null bs=1M
>
> I was curious: when running 10 dd's (which are writing to the RAID 5)
> everything is fine, no issues, then suddenly they all go into D-state
> and give the read 100% priority. Is this normal?

I'm jumping back to the start of this thread, because after reading all
the discussion I noticed that you are mixing apples and oranges here.
Your write programs are going to files in the filesystem, and your read
is going against the raw device. That may explain why you see something I
haven't noticed doing all filesystem I/O.

I am going to do a large rsync to another filesystem in the next two
days; I will turn on some measurements when I do. But if you are just
investigating this behavior, perhaps you could retry with a single read
from a file rather than the device.

[...snip...]

-- 
Bill Davidsen [EMAIL PROTECTED]
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck
external bitmaps.. and more
I came across a situation where external MD bitmaps aren't usable on any
standard Linux distribution unless special (non-trivial) actions are
taken.

First, a small buglet in mdadm, or two. It's not possible to specify
--bitmap= on the assemble command line - the option seems to be ignored.
But it's honored when specified in the config file. Also, mdadm should
probably warn or even refuse to do things (unless --force is given) when
an array being assembled uses an external bitmap but the bitmap file
isn't specified.

Now for something more.. interesting. The thing is that when an external
bitmap is being used for an array, and that bitmap resides on another
filesystem, all common distributions fail to start/mount and to
shutdown/umount arrays/filesystems properly, because all starts/stops are
done in one script, and all mounts/umounts in another, but for bitmaps to
work the two have to be intermixed with each other.

Here's why. Suppose I have an array mdX which uses bitmap /stuff/bitmap,
where /stuff is another, separate filesystem. In this case, during
startup, /stuff should be mounted before bringing up mdX, and during
shutdown, mdX should be stopped before trying to umount /stuff. Or else
during startup mdX will not find /stuff/bitmap, and during shutdown the
/stuff filesystem is busy since mdX is holding a reference to it.

Doing things the simple way doesn't work: if I specify that mdX is to be
mounted as /data in /etc/fstab, then - since mdX hasn't been assembled by
mdadm (due to the missing bitmap) - the system will not start, asking for
the emergency root password... Oh well.

So the only solution for this so far is to convert md array assemble/stop
operations into... MOUNTS/UMOUNTS! And specify all the necessary
information in /etc/fstab - for both arrays and filesystems, with proper
ordering in the order column. Ghrm. Technically speaking it's not
difficult - mount.md and fsck.md wrappers for mdadm are trivial to write
(I even tried that myself - a quick-n-dirty 5-minute hack works). But
it's... ugly.
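Purely for illustration, the fstab encoding might look something like this (device names, the "md" fstype, and the bitmap option are all hypothetical, handled by the mount.md/fsck.md wrappers just mentioned; ordering is expressed by line order and the pass column):

```
# /etc/fstab -- sketch only; every entry below is illustrative
/dev/sdc1  /stuff        ext3  defaults              0  2   # bitmap lives here
/dev/md0   /var/run/md0  md    bitmap=/stuff/bitmap  0  0   # "mount" = assemble
/dev/md0   /data         ext3  defaults              0  2   # the array's own fs
```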
But I don't see any other reasonable solutions. The alternative is
additional scripts to start/stop/mount/umount filesystems residing on or
related to advanced arrays (with external bitmaps in this case) - but
looking at how much code is in current startup scripts around
mounting/fscking, and having in mind that mount/umount do not support an
alternative /etc/fstab, this is umm.. even more ugly...

Comments anyone?

Thanks.

/mjt

P.S. Why external bitmaps in the first place? Well, that's a good
question, and here's a (hopefully good too) answer. When there are
sufficient disk drives available to dedicate some of them to bitmap(s),
and there's a large array(s) with dynamic content (many writes), and the
content is important enough to care about data safety wrt possible power
losses and kernel OOPSes and whatnot, placing the bitmap on another
disk(s) helps a lot with resyncs (it's not about resync speed, it's about
resync general UNRELIABILITY, which is another topic - hopefully the
long-term linux raid gurus will understand me here), but does not slow
down writes hugely due to constant disk seeks when updating the bitmap.
Those seeks tend to have a huge impact on random write performance.
Re: assemble vs create an array.......
[Cc'd to xfs list as it contains something related]

Dragos wrote:
> Thank you. I want to make sure I understand.

[Some background for the XFS list. The talk is about a broken Linux
software raid (the reason for the breakage isn't relevant anymore). The
OP seems to have lost the order of drives in his array, and now tries to
create a new array on top, trying different combinations of drives. The
filesystem there WAS XFS. One point is that linux refuses to mount it,
saying "structure needs cleaning". This all is mostly md-related, but
there are several XFS-related questions and concerns too.]

> 1- Does it matter which permutation of drives I use for xfs_repair (as
> long as it tells me that the Structure needs cleaning)? When it comes
> to linux I consider myself at intermediate level, but I am a beginner
> when it comes to raid and filesystem issues.

The permutation DOES MATTER - for all the devices. Linux, when mounting
an fs, only looks at the superblock of the filesystem, which is usually
located at the beginning of the device. So in each case where linux
actually recognizes the filesystem (instead of seeing complete garbage),
the same device is the first one - i.e., this way you found your first
device. The rest may still be out of order.

Raid5 data is laid out like this (with 3 drives for simplicity; it's
similar with more drives):

        DiskA  DiskB  DiskC
  Blk0  Data0  Data1  P0
  Blk1  P1     Data2  Data3
  Blk2  Data4  P2     Data5
  Blk3  Data6  Data7  P3
  ... and so on ...

where your actual data blocks are Data0, Data1, ... DataN, and the PX are
parity blocks. As long as DiskA remains in this position, the beginning
of the array is the Data0 block -- hence linux sees the beginning of the
filesystem and recognizes it. But you can still switch DiskB and DiskC,
and the rest of the data will be complete garbage; only the data blocks
on DiskA will be in place. So you still need to find the order of the
other drives (you found your first drive, DiskA, already).
Note also that if Data1 block is all-zeros (a situation which is unlikely for a non-empty filesystem), P0 (first parity block) will be exactly the same as Data0, because XORing anything with zeros gives the same anything again (XOR is the operation used to calculate parity blocks in RAID5). So there's still a remote chance you've TWO first disks... What to do is to give repairfs a try for each permutation, but again without letting it to actually fix anything. Just run it in read-only mode and see which combination of drives gives less errors, or no fatal errors (there may be several similar combinations, with the same order of drives but with different drive missing). It's sad that xfs refuses mount when structure needs cleaning - the best way here is to actually mount it and see how it looks like, instead of trying repair tools. Is there some option to force-mount it still (in readonly mode, knowing it may OOPs kernel etc)? I'm not very familiar with xfs yet - it seems to be much faster than ext3 for our workload (mostly databases), and I'm experimenting with it slowly. But this very thread prompted me to think. If I can't force-mount it (or browse it using other ways) as I can almost always do with (somewhat?) broken ext[23] just to examine things, maybe I'm trying it before it's mature enough? ;) Note the smile, but note there's a bit of joke in every joke... :) 2- After I do it, assuming that it worked, how do I reintegrate the 'missing' drive while keeping my data? Just add it back -- mdadm --add /dev/mdX /dev/sdYZ. But don't do that till you actually see your data. /mjt - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: assemble vs create an array.......
Michael Tokarev wrote:
> It's sad that xfs refuses mount when structure needs cleaning - the
> best way here is to actually mount it and see how it looks like,
> instead of trying repair tools. Is there some option to force-mount it
> still (in readonly mode, knowing it may OOPs kernel etc)?

Depends what went wrong, but in general that error means that metadata
corruption was encountered which was sufficient for xfs to abort whatever
it was doing. It's not done lightly; it's likely bailing out because it
had no other choice. You can't force-mount something which is
sufficiently corrupted that xfs can't understand it anymore... IOW you
can't traverse and read corrupted/scrambled metadata; no mount option can
help you. :)

If the shutdown were encountered during use, you could maybe avoid the
bad metadata. If it's during mount, that's probably a more fundamental
problem. Kernel messages when you get the "structure needs cleaning"
error would be a clue as to what it actually hit.

-Eric
Re: raid6 check/repair
On 15:31, Bill Davidsen wrote:
> Thiemo posted metacode which I find appears correct,

It assumes that _exactly_ one disk has bad data, which is hard to verify
in practice. But yes, it's probably the best one can do if both P and Q
happen to be incorrect. IMHO mdadm shouldn't do this automatically
though, and should always keep backup copies of the data it overwrites
with "good" data.

Andre
-- 
The only person who always got his work done by Friday
was Robinson Crusoe
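To see why the exactly-one-bad-disk assumption is load-bearing, here is a small sketch of the RAID-6 math (my own illustration, using the GF(2^8) polynomial 0x11d and generator g = 2 that the kernel's raid6 code uses): with one bad data disk, the P and Q syndromes agree on a unique disk index; with two bad disks the same syndromes admit several explanations, so automatic "repair" would be guessing.

```python
# GF(2^8) arithmetic with the raid6 polynomial 0x11d, generator g = 2.
def gf_mul(a, b):
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return r

def g_pow(n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, 2)
    return r

def pq(data):
    """P = XOR of all data bytes; Q = sum over i of g**i * data[i]."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(g_pow(i), d)
    return p, q

stripe = [0x11, 0x22, 0x33, 0x44]   # one byte per data disk
p, q = pq(stripe)

bad = list(stripe)
bad[2] ^= 0x5a                      # corrupt exactly one data disk
p2, q2 = pq(bad)

# Locate the bad disk: the syndromes are sp = e and sq = g**z * e, so z
# is the unique index satisfying g**z * sp == sq.
sp, sq = p ^ p2, q ^ q2
z = next(i for i in range(len(bad)) if gf_mul(g_pow(i), sp) == sq)
assert z == 2
assert bad[z] ^ sp == stripe[z]     # and sp is the correction itself
```

If two disks were bad, sp and sq would mix two error terms and the index equation above no longer identifies anything, which is Andre's point about keeping backups of whatever gets overwritten.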
[PATCH] (2nd try) force parallel resync
Hello,

here is the second version of the patch. With this version the
sync_thread is also woken up when /sys/block/*/md/sync_force_parallel is
set. Though, I still don't understand why md_wakeup_thread() is not
working.

Signed-off-by: Bernd Schubert [EMAIL PROTECTED]

Index: linux-2.6.22/drivers/md/md.c
===================================================================
--- linux-2.6.22.orig/drivers/md/md.c	2007-12-06 19:51:55.0 +0100
+++ linux-2.6.22/drivers/md/md.c	2007-12-06 19:52:33.0 +0100
@@ -2843,6 +2843,41 @@ __ATTR(sync_speed_max, S_IRUGO|S_IWUSR,
 
 static ssize_t
+sync_force_parallel_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%d\n", mddev->parallel_resync);
+}
+
+static ssize_t
+sync_force_parallel_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	char *e;
+	unsigned long n = simple_strtoul(buf, &e, 10);
+
+	if (!*buf || (*e && *e != '\n') || (n != 0 && n != 1))
+		return -EINVAL;
+
+	mddev->parallel_resync = n;
+
+	if (mddev->sync_thread) {
+		dprintk("md: waking up MD thread %s.\n",
+			mddev->sync_thread->tsk->comm);
+		set_bit(THREAD_WAKEUP, &mddev->sync_thread->flags);
+		wake_up_process(mddev->sync_thread->tsk);
+
+		/* FIXME: why does md_wakeup_thread() not work?,
+		   somehow related to: wake_up(&thread->wqueue);
+		   md_wakeup_thread(mddev->sync_thread); */
+	}
+	return len;
+}
+
+/* force parallel resync, even with shared block devices */
+static struct md_sysfs_entry md_sync_force_parallel =
+__ATTR(sync_force_parallel, S_IRUGO|S_IWUSR,
+       sync_force_parallel_show, sync_force_parallel_store);
+
+static ssize_t
 sync_speed_show(mddev_t *mddev, char *page)
 {
 	unsigned long resync, dt, db;
@@ -2980,6 +3015,7 @@ static struct attribute *md_redundancy_a
 	&md_sync_min.attr,
 	&md_sync_max.attr,
 	&md_sync_speed.attr,
+	&md_sync_force_parallel.attr,
 	&md_sync_completed.attr,
 	&md_suspend_lo.attr,
 	&md_suspend_hi.attr,
@@ -5264,8 +5300,9 @@ void md_do_sync(mddev_t *mddev)
 	ITERATE_MDDEV(mddev2,tmp) {
 		if (mddev2 == mddev)
 			continue;
-		if (mddev2->curr_resync &&
-		    match_mddev_units(mddev,mddev2)) {
+		if (!mddev->parallel_resync &&
+		    mddev2->curr_resync &&
+		    match_mddev_units(mddev,mddev2)) {
 			DEFINE_WAIT(wq);
 			if (mddev < mddev2 && mddev->curr_resync == 2) {
 				/* arbitrarily yield */
Index: linux-2.6.22/include/linux/raid/md_k.h
===================================================================
--- linux-2.6.22.orig/include/linux/raid/md_k.h	2007-12-06 19:51:55.0 +0100
+++ linux-2.6.22/include/linux/raid/md_k.h	2007-12-06 19:52:33.0 +0100
@@ -170,6 +170,9 @@ struct mddev_s
 	int				sync_speed_min;
 	int				sync_speed_max;
 
+	/* resync even though the same disks are shared among md-devices */
+	int				parallel_resync;
+
 	int				ok_start_degraded;
 	/* recovery/resync flags
 	 * NEEDED: we might need to start a resync/recover

-- 
Bernd Schubert
Q-Leap Networks GmbH
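For reference, a loose user-space model of what the new attribute would accept (a Python approximation of the simple_strtoul-based check in the store routine above; the kernel returns -EINVAL for anything but 0 or 1):

```python
import errno

def sync_force_parallel_store(buf):
    """Rough model of the patch's input validation: "0" or "1",
    optionally newline-terminated; everything else is rejected."""
    s = buf.rstrip("\n")
    try:
        n = int(s, 10)
    except ValueError:
        return -errno.EINVAL
    if n not in (0, 1):
        return -errno.EINVAL
    return n                     # the kernel stores n and returns len

assert sync_force_parallel_store("1\n") == 1   # echo 1 > .../sync_force_parallel
assert sync_force_parallel_store("0") == 0
assert sync_force_parallel_store("2\n") == -errno.EINVAL
assert sync_force_parallel_store("on") == -errno.EINVAL
```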
RAID mapper device size wrong after replacing drives
Hi,

I have a problem with my RAID array under Linux after upgrading to larger
drives. I have a machine with Windows and Linux dual-boot which had a
pair of 160GB drives in a RAID-1 mirror with 3 partitions: partition 1 =
Windows boot partition (FAT32), partition 2 = Linux /boot (ext3),
partition 3 = Windows system (NTFS). The Linux root is on a separate
physical drive. The dual boot is via Grub installed on the /boot
partition, and this was all working fine.

But I just upgraded the drives in the RAID pair, replacing them with
500GB drives. I did this by replacing one of the 160s with a new 500 and
letting the RAID copy the drive, splitting the drives out of the RAID
array and increasing the size of the last partition of the 500 (which I
did under Windows, since it's the Windows partition), then replacing the
last 160 with the other 500 and having the RAID controller create a new
array with the two 500s, copying the drive that I'd copied from the 160.

This worked great for Windows, and that now boots and sees a 500GB RAID
drive with all the data intact. However, Linux has a problem and will not
now boot all the way. It reports that the RAID /dev/mapper volume failed
- the partition is beyond the boundaries of the disk. Running fdisk shows
that it is seeing the larger partition, but still sees the size of the
RAID /dev/mapper drive as 160GB. Here is the fdisk output for one of the
physical drives and for the RAID mapper drive:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         625     5018624    b  W95 FAT32
Partition 1 does not end on cylinder boundary.
/dev/sda2             626         637       96390   83  Linux
/dev/sda3   *         638       60802   483264512    7  HPFS/NTFS

Disk /dev/mapper/isw_bcifcijdi_Raid-0: 163.9 GB, 163925983232 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

                              Device Boot  Start    End     Blocks   Id  System
/dev/mapper/isw_bcifcijdi_Raid-0p1             1    625    5018624    b  W95 FAT32
Partition 1 does not end on cylinder boundary.
/dev/mapper/isw_bcifcijdi_Raid-0p2           626    637      96390   83  Linux
/dev/mapper/isw_bcifcijdi_Raid-0p3  *        638  60802  483264512    7  HPFS/NTFS

They differ only in the drive capacity and number of cylinders. I started
to try to run a Linux reinstall, but it reports that the partition table
on the mapper drive is invalid, giving an option to re-initialize it but
saying that doing so will lose all the data on the drive.

So, questions:
1. Where is the drive size information for the RAID mapper drive kept,
   and is there some way to patch it?
2. Is there some way to re-initialize the RAID mapper drive without
   destroying the data on the drive?

Thanks,
Ian
Re: assemble vs create an array.......
On Thu, Dec 06, 2007 at 07:39:28PM +0300, Michael Tokarev wrote:
> What to do is to give repairfs a try for each permutation, but again
> without letting it to actually fix anything. Just run it in read-only
> mode and see which combination of drives gives less errors, or no fatal
> errors (there may be several similar combinations, with the same order
> of drives but with different drive missing).

Ugggh.

> It's sad that xfs refuses mount when structure needs cleaning - the
> best way here is to actually mount it and see how it looks like,
> instead of trying repair tools.

It's self-protection - if you try to write to a corrupted filesystem,
you'll only make the corruption worse. Mounting involves log recovery,
which writes to the filesystem.

> Is there some option to force-mount it still (in readonly mode, knowing
> it may OOPs kernel etc)?

Sure you can:

	mount -o ro,norecovery <dev> <mtpt>

But if you hit corruption it will still shut down on you. If the machine
oopses then that is a bug.

> thread prompted me to think. If I can't force-mount it (or browse it
> using other ways) as I can almost always do with (somewhat?) broken
> ext[23] just to examine things, maybe I'm trying it before it's mature
> enough? ;)

Hehe ;)

For maximum uber-XFS-guru points, learn to browse your filesystem with
xfs_db. :P

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Sat, 1 Dec 2007 06:26:08 -0500 (EST) Justin Piszcz [EMAIL PROTECTED] wrote:
> I am putting a new machine together and I have dual raptor raid 1 for
> the root, which works just fine under all stress tests. Then I have the
> WD 750 GiB drive (not RE2, desktop ones for ~150-160 on sale nowadays).
> I ran the following:
>
> dd if=/dev/zero of=/dev/sdc
> dd if=/dev/zero of=/dev/sdd
> dd if=/dev/zero of=/dev/sde
>
> (as it is always a very good idea to do this with any new disk)
>
> And sometime along the way(?) (i had gone to sleep and let it run),
> this occurred:
>
> [42880.680144] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0x2 frozen

Gee we're seeing a lot of these lately.

> [42880.680231] ata3.00: irq_stat 0x00400040, connection status changed
> [42880.680290] ata3.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
> [42880.680292]          res 40/00:ac:d8:64:54/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
> [42881.841899] ata3: soft resetting port
> [42885.966320] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [42915.919042] ata3.00: qc timeout (cmd 0xec)
> [42915.919094] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
> [42915.919149] ata3.00: revalidation failed (errno=-5)
> [42915.919206] ata3: failed to recover some devices, retrying in 5 secs
> [42920.912458] ata3: hard resetting port
> [42926.411363] ata3: port is slow to respond, please be patient (Status 0x80)
> [42930.943080] ata3: COMRESET failed (errno=-16)
> [42930.943130] ata3: hard resetting port
> [42931.399628] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [42931.413523] ata3.00: configured for UDMA/133
> [42931.413586] ata3: EH pending after completion, repeating EH (cnt=4)
> [42931.413655] ata3: EH complete
> [42931.413719] sd 2:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB)
> [42931.413809] sd 2:0:0:0: [sdc] Write Protect is off
> [42931.413856] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> [42931.413867] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> Usually when I see this sort of thing with another box I have full of
> raptors, it was due to a bad raptor and I never saw it again after I
> replaced the disk that it happened on, but that was using the Intel
> P965 chipset. For this board, it is a Gigabyte GSP-P35-DS4 (Rev 2.0)
> and I have all of the drives (2 raptors, 3 750s) connected to the Intel
> ICH9 southbridge. I am going to do some further testing, but does this
> indicate a bad drive? Bad cable? Bad connector? As you can see above,
> /dev/sdc stopped responding for a little bit and then the kernel reset
> the port. Why is this though? What is the likely root cause? Should I
> replace the drive? Obviously this is not normal and cannot be good at
> all; the idea is to put these drives in a RAID5, and if one is going to
> time out that is going to cause the array to go degraded and thus be
> worthless in a raid5 configuration. Can anyone offer any insight here?

It would be interesting to try 2.6.21 or 2.6.22.
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Thu, 6 Dec 2007, Andrew Morton wrote:
> On Sat, 1 Dec 2007 06:26:08 -0500 (EST) Justin Piszcz [EMAIL PROTECTED] wrote:
>> I am putting a new machine together and I have dual raptor raid 1 for
>> the root, which works just fine under all stress tests. Then I have
>> the WD 750 GiB drives (not RE2, desktop ones for ~150-160 on sale
>> nowadays).
>> [full dmesg excerpt and description snipped]
>> Can anyone offer any insight here?
>
> It would be interesting to try 2.6.21 or 2.6.22.

This was due to NCQ issues (disabling it fixed the problem).

Justin.
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Thu, 6 Dec 2007 17:38:08 -0500 (EST) Justin Piszcz [EMAIL PROTECTED] wrote:
> On Thu, 6 Dec 2007, Andrew Morton wrote:
>> [full dmesg excerpt and description snipped]
>>
>> It would be interesting to try 2.6.21 or 2.6.22.
>
> This was due to NCQ issues (disabling it fixed the problem).

I cannot locate any further email discussion on this topic. Disabling NCQ
at either compile time or runtime is not a fix, and further work should
be done here to make the kernel run acceptably on that hardware.
Re: [PATCH] (2nd try) force parallel resync
On Thursday December 6, [EMAIL PROTECTED] wrote:

Hello, here is the second version of the patch. With this version, setting /sys/block/*/md/sync_force_parallel also wakes up the sync_thread. Though, I still don't understand why md_wakeup_thread() is not working.

Could you give a little more detail on why you want this? When do you want multiple arrays on the same device to sync at the same time? What exactly is the hardware like?

md threads generally run for a little while to perform some task, then stop and wait to be needed again. md_wakeup_thread says "you are needed again".

The resync/recovery thread is a bit different. It just runs md_do_sync once. md_wakeup_thread is not really meaningful in that context. What you want is:

	wake_up(&resync_wait);

That will get any thread that is waiting for some other array to resync to wake up and see if something needs to be done.

NeilBrown
[PATCH 001 of 3] md: raid6: Fix mktable.c
From: H. Peter Anvin [EMAIL PROTECTED]

Make both mktables.c and its output CodingStyle compliant. Update the copyright notice.

Signed-off-by: H. Peter Anvin [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/mktables.c |   43 +++++++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)

diff .prev/drivers/md/mktables.c ./drivers/md/mktables.c
--- .prev/drivers/md/mktables.c	2007-12-03 14:47:09.000000000 +1100
+++ ./drivers/md/mktables.c	2007-12-03 14:56:06.000000000 +1100
@@ -1,13 +1,10 @@
-#ident "$Id: mktables.c,v 1.2 2002/12/12 22:41:27 hpa Exp $"
-/* ----------------------------------------------------------------------- *
+/* -*- linux-c -*- ------------------------------------------------------- *
  *
- *   Copyright 2002 H. Peter Anvin - All Rights Reserved
+ *   Copyright 2002-2007 H. Peter Anvin - All Rights Reserved
  *
- *   This program is free software; you can redistribute it and/or modify
- *   it under the terms of the GNU General Public License as published by
- *   the Free Software Foundation, Inc., 53 Temple Place Ste 330,
- *   Bostom MA 02111-1307, USA; either version 2 of the License, or
- *   (at your option) any later version; incorporated herein by reference.
+ *   This file is part of the Linux kernel, and is made available under
+ *   the terms of the GNU General Public License version 2 or (at your
+ *   option) any later version; incorporated herein by reference.
  *
  * ----------------------------------------------------------------------- */
@@ -73,8 +70,8 @@ int main(int argc, char *argv[])
 		for (j = 0; j < 256; j += 8) {
 			printf("\t\t");
 			for (k = 0; k < 8; k++)
-				printf("0x%02x, ", gfmul(i, j+k));
-			printf("\n");
+				printf("0x%02x,%c", gfmul(i, j + k),
+				       (k == 7) ? '\n' : ' ');
 		}
 		printf("\t},\n");
 	}
@@ -83,47 +80,41 @@ int main(int argc, char *argv[])
 	/* Compute power-of-2 table (exponent) */
 	v = 1;
 	printf("\nconst u8 __attribute__((aligned(256)))\n"
-	       "raid6_gfexp[256] =\n"
-	       "{\n");
+	       "raid6_gfexp[256] =\n" "{\n");
 	for (i = 0; i < 256; i += 8) {
 		printf("\t");
 		for (j = 0; j < 8; j++) {
-			exptbl[i+j] = v;
-			printf("0x%02x, ", v);
+			exptbl[i + j] = v;
+			printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
 			v = gfmul(v, 2);
 			if (v == 1)
 				v = 0;	/* For entry 255, not a real entry */
 		}
-		printf("\n");
 	}
 	printf("};\n");

 	/* Compute inverse table x^-1 == x^254 */
 	printf("\nconst u8 __attribute__((aligned(256)))\n"
-	       "raid6_gfinv[256] =\n"
-	       "{\n");
+	       "raid6_gfinv[256] =\n" "{\n");
 	for (i = 0; i < 256; i += 8) {
 		printf("\t");
 		for (j = 0; j < 8; j++) {
-			v = gfpow(i+j, 254);
-			invtbl[i+j] = v;
-			printf("0x%02x, ", v);
+			invtbl[i + j] = v = gfpow(i + j, 254);
+			printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
 		}
-		printf("\n");
 	}
 	printf("};\n");

 	/* Compute inv(2^x + 1) (exponent-xor-inverse) table */
 	printf("\nconst u8 __attribute__((aligned(256)))\n"
-	       "raid6_gfexi[256] =\n"
-	       "{\n");
+	       "raid6_gfexi[256] =\n" "{\n");
 	for (i = 0; i < 256; i += 8) {
 		printf("\t");
 		for (j = 0; j < 8; j++)
-			printf("0x%02x, ", invtbl[exptbl[i+j]^1]);
-		printf("\n");
+			printf("0x%02x,%c", invtbl[exptbl[i + j] ^ 1],
+			       (j == 7) ? '\n' : ' ');
 	}
-	printf("};\n\n");
+	printf("};\n");

 	return 0;
 }
[PATCH 000 of 3] md: a few little patches
Following 3 patches for md provide some code tidyup and a small functionality improvement. They do not need to go into 2.6.24, but are definitely appropriate for 2.6.25-rc1. (Patches made against 2.6.24-rc3-mm2)

Thanks,
NeilBrown

 [PATCH 001 of 3] md: raid6: Fix mktable.c
 [PATCH 002 of 3] md: raid6: clean up the style of raid6test/test.c
 [PATCH 003 of 3] md: Update md bitmap during resync.
Re: RAID mapper device size wrong after replacing drives
I think you would have more luck posting this to [EMAIL PROTECTED] - I think that is where support for device mapper happens.

NeilBrown

On Thursday December 6, [EMAIL PROTECTED] wrote:

Hi, I have a problem with my RAID array under Linux after upgrading to larger drives. I have a machine with Windows and Linux dual-boot which had a pair of 160GB drives in a RAID-1 mirror with 3 partitions: partition 1 = Windows boot partition (FAT32), partition 2 = Linux /boot (ext3), partition 3 = Windows system (NTFS). The Linux root is on a separate physical drive. The dual boot is via Grub installed on the /boot partition, and this was all working fine.

But I just upgraded the drives in the RAID pair, replacing them with 500GB drives. I did this by replacing one of the 160s with a new 500 and letting the RAID copy the drive, splitting the drives out of the RAID array and increasing the size of the last partition of the 500 (which I did under Windows since it's the Windows partition), then replacing the last 160 with the other 500 and having the RAID controller create a new array with the two 500s, copying the drive that I'd copied from the 160.

This worked great for Windows, and that now boots and sees a 500GB RAID drive with all the data intact. However, Linux has a problem and will not now boot all the way. It reports that the RAID /dev/mapper volume failed - the partition is beyond the boundaries of the disk. Running fdisk shows that it is seeing the larger partition, but still sees the size of the RAID /dev/mapper drive as 160GB. Here is the fdisk output for one of the physical drives and for the RAID mapper drive:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         625     5018624    b  W95 FAT32
Partition 1 does not end on cylinder boundary.
/dev/sda2             626         637       96390   83  Linux
/dev/sda3   *         638       60802   483264512    7  HPFS/NTFS

Disk /dev/mapper/isw_bcifcijdi_Raid-0: 163.9 GB, 163925983232 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

                             Device Boot  Start    End     Blocks   Id  System
/dev/mapper/isw_bcifcijdi_Raid-0p1            1    625    5018624    b  W95 FAT32
Partition 1 does not end on cylinder boundary.
/dev/mapper/isw_bcifcijdi_Raid-0p2          626    637      96390   83  Linux
/dev/mapper/isw_bcifcijdi_Raid-0p3   *      638  60802  483264512    7  HPFS/NTFS

They differ only in the drive capacity and number of cylinders. I started to try to run a Linux reinstall, but it reports that the partition table on the mapper drive is invalid, giving an option to re-initialize it but saying that doing so will lose all the data on the drive.

So questions:
1. Where is the drive size information for the RAID mapper drive kept, and is there some way to patch it?
2. Is there some way to re-initialize the RAID mapper drive without destroying the data on the drive?

Thanks,
Ian

--
View this message in context: http://www.nabble.com/RAID-mapper-device-size-wrong-after-replacing-drives-tf4958354.html#a14200241
Sent from the linux-raid mailing list archive at Nabble.com.
[PATCH 002 of 3] md: raid6: clean up the style of raid6test/test.c
From: H. Peter Anvin [EMAIL PROTECTED]
Date: Fri, 26 Oct 2007 11:22:42 -0700

Clean up the coding style in raid6test/test.c. Break it apart into subfunctions to make the code more readable.

Signed-off-by: H. Peter Anvin [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid6test/test.c |  115 +++++++++++++++++++++++++++--------------------------
 1 file changed, 68 insertions(+), 47 deletions(-)

diff .prev/drivers/md/raid6test/test.c ./drivers/md/raid6test/test.c
--- .prev/drivers/md/raid6test/test.c	2007-12-03 14:57:55.000000000 +1100
+++ ./drivers/md/raid6test/test.c	2007-12-03 14:57:55.000000000 +1100
@@ -1,12 +1,10 @@
 /* -*- linux-c -*- ------------------------------------------------------- *
  *
- *   Copyright 2002 H. Peter Anvin - All Rights Reserved
+ *   Copyright 2002-2007 H. Peter Anvin - All Rights Reserved
  *
- *   This program is free software; you can redistribute it and/or modify
- *   it under the terms of the GNU General Public License as published by
- *   the Free Software Foundation, Inc., 53 Temple Place Ste 330,
- *   Bostom MA 02111-1307, USA; either version 2 of the License, or
- *   (at your option) any later version; incorporated herein by reference.
+ *   This file is part of the Linux kernel, and is made available under
+ *   the terms of the GNU General Public License version 2 or (at your
+ *   option) any later version; incorporated herein by reference.
  *
  * ----------------------------------------------------------------------- */
@@ -30,67 +28,87 @@
 char *dataptrs[NDISKS];
 char data[NDISKS][PAGE_SIZE];
 char recovi[PAGE_SIZE], recovj[PAGE_SIZE];

-void makedata(void)
+static void makedata(void)
 {
 	int i, j;

-	for ( i = 0 ; i < NDISKS ; i++ ) {
-		for ( j = 0 ; j < PAGE_SIZE ; j++ ) {
+	for (i = 0; i < NDISKS; i++) {
+		for (j = 0; j < PAGE_SIZE; j++)
 			data[i][j] = rand();
-		}
+
 		dataptrs[i] = data[i];
 	}
 }

+static char disk_type(int d)
+{
+	switch (d) {
+	case NDISKS-2:
+		return 'P';
+	case NDISKS-1:
+		return 'Q';
+	default:
+		return 'D';
+	}
+}
+
+static int test_disks(int i, int j)
+{
+	int erra, errb;
+
+	memset(recovi, 0xf0, PAGE_SIZE);
+	memset(recovj, 0xba, PAGE_SIZE);
+
+	dataptrs[i] = recovi;
+	dataptrs[j] = recovj;
+
+	raid6_dual_recov(NDISKS, PAGE_SIZE, i, j, (void **)&dataptrs);
+
+	erra = memcmp(data[i], recovi, PAGE_SIZE);
+	errb = memcmp(data[j], recovj, PAGE_SIZE);
+
+	if (i < NDISKS-2 && j == NDISKS-1) {
+		/* We don't implement the DQ failure scenario, since it's
+		   equivalent to a RAID-5 failure (XOR, then recompute Q) */
+		erra = errb = 0;
+	} else {
+		printf("algo=%-8s  faila=%3d(%c)  failb=%3d(%c)  %s\n",
+		       raid6_call.name,
+		       i, disk_type(i),
+		       j, disk_type(j),
+		       (!erra && !errb) ? "OK" :
+		       !erra ? "ERRB" :
+		       !errb ? "ERRA" : "ERRAB");
+	}
+
+	dataptrs[i] = data[i];
+	dataptrs[j] = data[j];
+
+	return erra || errb;
+}
+
 int main(int argc, char *argv[])
 {
-	const struct raid6_calls * const * algo;
+	const struct raid6_calls *const *algo;
 	int i, j;
-	int erra, errb;
+	int err = 0;

 	makedata();

-	for ( algo = raid6_algos ; *algo ; algo++ ) {
-		if ( !(*algo)->valid || (*algo)->valid() ) {
+	for (algo = raid6_algos; *algo; algo++) {
+		if (!(*algo)->valid || (*algo)->valid()) {
 			raid6_call = **algo;

 			/* Nuke syndromes */
 			memset(data[NDISKS-2], 0xee, 2*PAGE_SIZE);

 			/* Generate assumed good syndrome */
-			raid6_call.gen_syndrome(NDISKS, PAGE_SIZE, (void **)&dataptrs);
+			raid6_call.gen_syndrome(NDISKS, PAGE_SIZE,
+						(void **)&dataptrs);

-			for ( i = 0 ; i < NDISKS-1 ; i++ ) {
-				for ( j = i+1 ; j < NDISKS ; j++ ) {
-					memset(recovi, 0xf0, PAGE_SIZE);
-					memset(recovj, 0xba, PAGE_SIZE);
-
-					dataptrs[i] = recovi;
-					dataptrs[j] = recovj;
-
-					raid6_dual_recov(NDISKS, PAGE_SIZE, i, j, (void **)&dataptrs);
-
-					erra = memcmp(data[i], recovi, PAGE_SIZE);
-					errb = memcmp(data[j], recovj, PAGE_SIZE);
-
-					if ( i < NDISKS-2 && j == NDISKS-1 ) {
[PATCH 003 of 3] md: Update md bitmap during resync.
Currently an md array with a write-intent bitmap does not update that bitmap to reflect successful partial resync. Rather, the entire bitmap is updated when the resync completes.

This is because there is no guarantee that resync requests will complete in order, and tracking each request individually is unnecessarily burdensome.

However there is value in regularly updating the bitmap, so add code to periodically pause while all pending sync requests complete, then update the bitmap. Doing this only every few seconds (the same as the bitmap update time) does not noticeably affect resync performance.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c         |   34 +++++++++++++++++++++++++++++-----
 ./drivers/md/raid1.c          |    1 +
 ./drivers/md/raid10.c         |    2 ++
 ./drivers/md/raid5.c          |    3 +++
 ./include/linux/raid/bitmap.h |    3 +++
 5 files changed, 38 insertions(+), 5 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c	2007-12-03 14:58:48.000000000 +1100
+++ ./drivers/md/bitmap.c	2007-12-03 14:59:00.000000000 +1100
@@ -1342,14 +1342,38 @@ void bitmap_close_sync(struct bitmap *bi
 	 */
 	sector_t sector = 0;
 	int blocks;
-	if (!bitmap) return;
+	if (!bitmap)
+		return;
 	while (sector < bitmap->mddev->resync_max_sectors) {
 		bitmap_end_sync(bitmap, sector, &blocks, 0);
-/*
-		if (sector < 500) printk("bitmap_close_sync: sec %llu blks %d\n",
-					 (unsigned long long)sector, blocks);
-*/
 		sector += blocks;
 	}
 }

+void bitmap_cond_end_sync(struct bitmap *bitmap, sector_t sector)
+{
+	sector_t s = 0;
+	int blocks;
+
+	if (!bitmap)
+		return;
+	if (sector == 0) {
+		bitmap->last_end_sync = jiffies;
+		return;
+	}
+	if (time_before(jiffies, (bitmap->last_end_sync
+				  + bitmap->daemon_sleep * HZ)))
+		return;
+	wait_event(bitmap->mddev->recovery_wait,
+		   atomic_read(&bitmap->mddev->recovery_active) == 0);
+
+	sector &= ~((1ULL << CHUNK_BLOCK_SHIFT(bitmap)) - 1);
+	s = 0;
+	while (s < sector && s < bitmap->mddev->resync_max_sectors) {
+		bitmap_end_sync(bitmap, s, &blocks, 0);
+		s += blocks;
+	}
+	bitmap->last_end_sync = jiffies;
+}

 static void bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed)

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c	2007-12-03 14:58:48.000000000 +1100
+++ ./drivers/md/raid10.c	2007-12-03 14:58:10.000000000 +1100
@@ -1670,6 +1670,8 @@ static sector_t sync_request(mddev_t *md
 	if (!go_faster && conf->nr_waiting)
 		msleep_interruptible(1000);

+	bitmap_cond_end_sync(mddev->bitmap, sector_nr);
+
 	/* Again, very different code for resync and recovery.
 	 * Both must result in an r10bio with a list of bios that
 	 * have bi_end_io, bi_sector, bi_bdev set,

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2007-12-03 14:58:48.000000000 +1100
+++ ./drivers/md/raid1.c	2007-12-03 14:58:10.000000000 +1100
@@ -1684,6 +1684,7 @@ static sector_t sync_request(mddev_t *md
 	if (!go_faster && conf->nr_waiting)
 		msleep_interruptible(1000);

+	bitmap_cond_end_sync(mddev->bitmap, sector_nr);
 	raise_barrier(conf);

 	conf->next_resync = sector_nr;

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2007-12-03 14:58:48.000000000 +1100
+++ ./drivers/md/raid5.c	2007-12-03 14:58:10.000000000 +1100
@@ -4333,6 +4333,9 @@ static inline sector_t sync_request(mdde
 		return sync_blocks * STRIPE_SECTORS; /* keep things rounded to whole stripes */
 	}
+
+	bitmap_cond_end_sync(mddev->bitmap, sector_nr);
+
 	pd_idx = stripe_to_pdidx(sector_nr, conf, raid_disks);
 	sh = wait_for_inactive_cache(conf, sector_nr, raid_disks, pd_idx);

diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h
--- .prev/include/linux/raid/bitmap.h	2007-12-03 14:58:48.000000000 +1100
+++ ./include/linux/raid/bitmap.h	2007-12-03 14:58:10.000000000 +1100
@@ -244,6 +244,8 @@ struct bitmap {
 					    */
 	unsigned long daemon_lastrun; /* jiffies of last run */
 	unsigned long daemon_sleep; /* how many seconds between updates? */
+	unsigned long last_end_sync; /* when we lasted called end_sync to
+				      * update bitmap with resync progress */

 	atomic_t pending_writes; /* pending writes to the bitmap file */
 	wait_queue_head_t write_wait;

@@ -275,6 +277,7 @@ void bitmap_endwrite(struct bitmap *bitm
 int bitmap_start_sync(struct bitmap *bitmap,
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On 6 Dec 2007, Jan Engelhardt verbalised:

On Dec 5 2007 19:29, Nix wrote:

On Dec 1 2007 06:19, Justin Piszcz wrote:

RAID1, 0.90.03 superblocks (in order to be compatible with LILO; if you use 1.x superblocks with LILO you can't boot)

Says who? (Don't use LILO ;-)

Well, your kernels must be on a 0.90-superblocked RAID-0 or RAID-1 device. It can't handle booting off 1.x superblocks nor RAID-[56] (not that I could really hope for the latter).

If the superblock is at the end (which is the case for 0.90 and 1.0), then the offsets for a specific block on /dev/mdX match the ones for /dev/sda, so it should be easy to use lilo on 1.0 too, no?

Sure, but you may have to hack /sbin/lilo to convince it to create the superblock there at all. It's likely to recognise that this is an md device without a v0.90 superblock and refuse to continue. (But I haven't tested it.)

--
`The rest is a tale of post and counter-post.' --- Ian Rawlings describes USENET