Spontaneous rebuild
[Please CC me on replies as I'm not subscribed]

Hello!

I've been experimenting with software RAID a bit lately, using two
external 500GB drives. One is connected via USB, one via FireWire. It is
set up as a RAID5 with LVM on top so that I can easily add more drives
when I run out of space.

About a day after the initial setup, things went belly up. First, EXT3
reported strange errors:

EXT3-fs error (device dm-0): ext3_new_block: Allocating block in system zone - blocks from 106561536, length 1
EXT3-fs error (device dm-0): ext3_new_block: Allocating block in system zone - blocks from 106561537, length 1
...

There were literally hundreds of these, and they came back immediately
when I reformatted the array. So I tried ReiserFS, which worked fine for
about a day. Then I got errors like these:

ReiserFS: warning: is_tree_node: node level 0 does not match to the expected one 2
ReiserFS: dm-0: warning: vs-5150: search_by_key: invalid format found in block 69839092. Fsck?
ReiserFS: dm-0: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [6 10 0x0 SD]

Again, hundreds. So I ran badblocks on the LVM volume, and it reported
some bad blocks near the end. Running badblocks on the md array worked,
so I recreated the LVM setup and attributed the failures to undervolting
experiments I had been doing (this is my old laptop running as a server).

Anyway, the problems are back. To test my theory that everything is
alright with the CPU running within its specs, I removed one of the
drives while copying some large files yesterday. Initially, everything
seemed to work out nicely, and by the morning the rebuild had finished.
Again, I unmounted the filesystem and ran badblocks -svn on the LVM
volume. It ran without gripes for some hours, but just now I saw that md
had started to rebuild the array again out of the blue:

Dec  1 20:04:49 quassel kernel: usb 4-5.2: reset high speed USB device using ehci_hcd and address 4
Dec  2 01:06:02 quassel kernel: md: data-check of RAID array md0
Dec  2 01:06:02 quassel kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Dec  2 01:06:02 quassel kernel: md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for data-check.
Dec  2 01:06:02 quassel kernel: md: using 128k window, over a total of 488383936 blocks.
Dec  2 03:57:24 quassel kernel: usb 4-5.2: reset high speed USB device using ehci_hcd and address 4

I'm not sure the USB resets are related to the problem - device 4-5.2 is
part of the array, but I get these sometimes at random intervals and they
don't seem to hurt normally. Besides, the first one was long before the
rebuild started, and the second one long afterwards.

Any ideas why md is rebuilding the array? And could this be related to
the bad blocks problem I had first? badblocks is still running; I'll post
an update when it is finished. In the meantime, mdadm --detail /dev/md0
and mdadm --examine /dev/sd[bc]1 don't give me any clues as to what went
wrong: both disks are marked "active sync", and the whole array is
"active, recovering".

Before I forget, I'm running 2.6.23.1 with this config:
http://stud4.tuwien.ac.at/~e0626486/config-2.6.23.1-hrt3-fw

Thanks,
Oliver
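A side note on the log above: md calls this a "data-check", not a
rebuild, and a check only starts when something writes "check" to the
array's sync_action file. Distro cron jobs are a common trigger (Debian's
/etc/cron.d/mdadm runs checkarray early on the first Sunday of the month,
which would fit the 01:06 timestamp). A minimal sketch for confirming
this, assuming the array is md0:

    cat /sys/block/md0/md/sync_action    # prints "check" during a data-check
    grep -rs checkarray /etc/cron.d/     # look for a scheduled check job
    cat /sys/block/md0/md/mismatch_cnt   # blocks that disagreed, once done
    echo idle > /sys/block/md0/md/sync_action   # aborts a running pass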
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
Justin Piszcz wrote:
> I am putting a new machine together and I have dual raptor RAID 1 for
> the root, which works just fine under all stress tests. Then I have the
> WD 750 GB drives (not RE2, the desktop ones for ~$150-160 on sale
> nowadays). I ran the following:
>
> dd if=/dev/zero of=/dev/sdc
> dd if=/dev/zero of=/dev/sdd
> dd if=/dev/zero of=/dev/sde
>
> (as it is always a very good idea to do this with any new disk)
>
> And sometime along the way (I had gone to sleep and let it run), this
> occurred:
>
> [42880.680144] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0x2 frozen
> [42880.680231] ata3.00: irq_stat 0x00400040, connection status changed
> [42880.680290] ata3.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
> [42880.680292]          res 40/00:ac:d8:64:54/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
> [42881.841899] ata3: soft resetting port
> [42885.966320] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [42915.919042] ata3.00: qc timeout (cmd 0xec)
> [42915.919094] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
> [42915.919149] ata3.00: revalidation failed (errno=-5)
> [42915.919206] ata3: failed to recover some devices, retrying in 5 secs
> [42920.912458] ata3: hard resetting port
> [42926.411363] ata3: port is slow to respond, please be patient (Status 0x80)
> [42930.943080] ata3: COMRESET failed (errno=-16)
> [42930.943130] ata3: hard resetting port
> [42931.399628] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [42931.413523] ata3.00: configured for UDMA/133
> [42931.413586] ata3: EH pending after completion, repeating EH (cnt=4)
> [42931.413655] ata3: EH complete
> [42931.413719] sd 2:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB)
> [42931.413809] sd 2:0:0:0: [sdc] Write Protect is off
> [42931.413856] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> [42931.413867] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> Usually when I see this sort of thing with another box I have full of
> raptors, it was due to a bad raptor, and I never saw it again after I
> replaced the disk it happened on, but that was using the Intel P965
> chipset. For this board, a Gigabyte GA-P35-DS4 (Rev 2.0), I have all of
> the drives (2 raptors, 3 750s) connected to the Intel ICH9 southbridge.
> I am going to do some further testing, but does this indicate a bad
> drive? Bad cable? Bad connector?

Could be any of the above.

> As you can see above, /dev/sdc stopped responding for a little bit and
> then the kernel reset the port.

It looks like the first thing that happened is that the controller
reported it lost the SATA link, and then the drive didn't respond until
it was bashed with a few hard resets.

> Why is this though? What is the likely root cause? Should I replace
> the drive? Obviously this is not normal and cannot be good at all. The
> idea is to put these drives in a RAID5, and if one is going to time
> out, that is going to cause the array to go degraded and thus be
> worthless in a RAID5 configuration. Can anyone offer any insight here?
>
> Thank you,
>
> Justin.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
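One way to narrow down cable/connector vs. drive from the drive's own
counters (a sketch, assuming smartmontools is installed; attribute names
vary somewhat by vendor):

    # CRC errors are counted by the drive but almost always indicate the
    # cable or connector; reallocated/pending sectors point at the media
    smartctl -A /dev/sdc | egrep -i 'Reallocated_Sector|Current_Pending|UDMA_CRC'
    smartctl -l error /dev/sdc    # the drive's internal error log, if any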
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
Jan Engelhardt wrote:
> On Dec 1 2007 06:26, Justin Piszcz wrote:
>> I ran the following:
>>
>> dd if=/dev/zero of=/dev/sdc
>> dd if=/dev/zero of=/dev/sdd
>> dd if=/dev/zero of=/dev/sde
>>
>> (as it is always a very good idea to do this with any new disk)
>
> Why would you care about what's on the disk? fdisk, mkfs and the
> day-to-day operation will overwrite it _anyway_. (If you think the disk
> is not empty, you should look at it and copy off all usable warez
> beforehand :-)

Do you not test your drives for minimum functionality before using them?

Also, if you have the tools to check for reallocated sectors before and
after doing this, that's a good idea as well. S.M.A.R.T. is your friend.
And when writing /dev/zero to a drive, if it craps out you have less
emotional attachment to the data.

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
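A minimal sketch of that before/after check (assuming smartmontools, and
that /dev/sdc is the new, empty disk):

    # capture the remap-related counters, write the whole disk, compare
    smartctl -A /dev/sdc | egrep -i 'Reallocated|Pending' > /tmp/sdc.before
    dd if=/dev/zero of=/dev/sdc bs=1M
    smartctl -A /dev/sdc | egrep -i 'Reallocated|Pending' > /tmp/sdc.after
    diff /tmp/sdc.before /tmp/sdc.after  # any increase means sectors were remapped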
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Sat, 1 Dec 2007, Janek Kozicki wrote:
> Justin Piszcz said:     (by the date of Sat, 1 Dec 2007 07:23:41 -0500 (EST))
>
>>> dd if=/dev/zero of=/dev/sdc
>>
>> The purpose is that with any new disk it's good to write to all the
>> blocks and let the drive do all of the re-mapping before you put
>> 'real' data on it. Let it crap out or fail before I put my data on it.
>
> Better use badblocks. It writes data, then reads it back afterwards.
> In this example the data is semi-random (quicker than /dev/urandom ;)
>
> badblocks -c 10240 -s -w -t random -v /dev/sdc
>
> --
> Janek Kozicki                                                         |

Will give this a shot and see if I can reproduce the error, thanks.
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
Justin Piszcz said:     (by the date of Sat, 1 Dec 2007 07:23:41 -0500 (EST))

>> dd if=/dev/zero of=/dev/sdc
>
> The purpose is that with any new disk it's good to write to all the
> blocks and let the drive do all of the re-mapping before you put
> 'real' data on it. Let it crap out or fail before I put my data on it.

Better use badblocks. It writes data, then reads it back afterwards.
In this example the data is semi-random (quicker than /dev/urandom ;)

badblocks -c 10240 -s -w -t random -v /dev/sdc

--
Janek Kozicki                                                         |
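A variant of that command worth considering (a sketch; the -o flag and
the SMART check are additions, and -w destroys everything on the disk):

    # write-mode test is destructive; log any blocks badblocks finds to a file
    badblocks -c 10240 -s -w -t random -v -o /tmp/sdc.bad /dev/sdc
    # a healthy drive remaps silently, leaving the log empty; then see
    # whether the remap counter moved
    smartctl -A /dev/sdc | grep -i reallocated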
Re: raid5 reshape/resync
----- Message from [EMAIL PROTECTED] -----
    Date: Thu, 29 Nov 2007 16:48:47 +1100
    From: Neil Brown <[EMAIL PROTECTED]>
Reply-To: Neil Brown <[EMAIL PROTECTED]>
 Subject: Re: raid5 reshape/resync
      To: Nagilum <[EMAIL PROTECTED]>
      Cc: linux-raid@vger.kernel.org

> Hi,
> I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
> I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0).
> During that reshape (at around 4%) /dev/sdd reported read errors and
> went offline.

Sad.

> I replaced /dev/sdd with a new drive and tried to reassemble the array
> (/dev/sdd was shown as removed and now as spare).

There must be a step missing here. Just because one drive goes offline,
that doesn't mean that you need to reassemble the array. It should just
continue with the reshape until that is finished. Did you shut the
machine down, or did it crash, or what?

> Assembly worked but it would not run unless I used --force.

That suggests an unclean shutdown. Maybe it did crash?

I started the reshape and went out. When I came back the controller was
beeping (indicating the erroneous disk). I tried to log on but I could
not get in. The machine was responding to pings but that was about it
(no ssh or xdm login worked). So I hard-rebooted. I booted into a rescue
root; the /etc/mdadm/mdadm.conf didn't yet include the new disk, so the
raid was missing one disk and not started. Since I didn't know exactly
what was going on, I --re-added sdf (the new disk) and tried to resume
reshaping. A second into that, the read failure on /dev/sdd was
reported. So I stopped md0 and shut down to verify the read error with
another controller. After I had verified that, I replaced /dev/sdd with
a new drive and put in the broken drive as /dev/sdg, just in case.

> Since I'm always reluctant to use force I put the bad disk back in,
> this time as /dev/sdg. I re-added the drive and could run the array.
> The array started to resync (since the disk can be read until 4%) and
> then I marked the disk as failed. Now the array is "active, degraded,
> recovering":

It should have restarted the reshape from wherever it was up to, so it
should have hit the read error almost immediately. Do you remember where
it started the reshape from? If it restarted from the beginning that
would be bad.

It must have continued where it left off, since the reshape position in
all superblocks was at about 4%.

Did you just "--assemble" all the drives or did you do something else?

Sorry for being a bit inexact here: I didn't actually have to use
--assemble; when booting into the rescue root the raid came up with
/dev/sdd and /dev/sdf removed. I just had to --re-add /dev/sdf.

> unusually low, which seems to indicate a lot of seeking, as if two
> operations are happening at the same time.

Well, reshape is always slow, as it has to read from one part of the
drive and write to another part of the drive.

Actually it was resyncing at the minimum speed; I managed to crank the
speed up to >20MB/s by adjusting /sys/block/md0/md/sync_speed_min.

> Can someone relieve my doubts as to whether md does the right thing here?
> Thanks,

I believe it is doing "the right thing".

----- End message from [EMAIL PROTECTED] -----

Ok, so the reshape tried to continue without the failed drive and after
that resynced to the new spare, as I would expect. Unfortunately the
result is a mess. On top of the RAID5 I have dm-crypt and LVM.

Hmm. This I would not expect.

Although dm-crypt and LVM don't appear to have a problem, the
filesystems on top are a mess now.

Can you be more specific about what sort of "mess" they are in?

Sure. So here is the vg-layout:

nas:~# lvdisplay vg01
  --- Logical volume ---
  LV Name                /dev/vg01/lv1
  VG Name                vg01
  LV UUID                4HmzU2-VQpO-vy5R-Wdys-PmwH-AuUg-W02CKS
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                512.00 MB
  Current LE             128
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:1

  --- Logical volume ---
  LV Name                /dev/vg01/lv2
  VG Name                vg01
  LV UUID                4e2ZB9-29Rb-dy4M-EzEY-cEIG-Nm1I-CPI0kk
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                7.81 GB
  Current LE             2000
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

  --- Logical volume ---
  LV Name                /dev/vg01/lv3
  VG Name                vg01
  LV UUID                YQRd0X-5hF8-2dd3-GG4v-wQLH-WGH0-ntGgug
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                1.81 TB
  Current LE             474735
  Segments               1
  Allocation             inherit
  Rea
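As an aside, the sync_speed_min tuning mentioned above is plain sysfs
(a sketch; the array is assumed to be md0, and values are in KB/s):

    cat /proc/mdstat                        # reshape/resync progress and speed
    cat /sys/block/md0/md/sync_speed_min    # the floor, typically 1000
    echo 20000 > /sys/block/md0/md/sync_speed_min   # force at least ~20MB/s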
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Sat, 1 Dec 2007, Jan Engelhardt wrote:
> On Dec 1 2007 06:26, Justin Piszcz wrote:
>> I ran the following:
>>
>> dd if=/dev/zero of=/dev/sdc
>> dd if=/dev/zero of=/dev/sdd
>> dd if=/dev/zero of=/dev/sde
>>
>> (as it is always a very good idea to do this with any new disk)
>
> Why would you care about what's on the disk? fdisk, mkfs and the
> day-to-day operation will overwrite it _anyway_. (If you think the disk
> is not empty, you should look at it and copy off all usable warez
> beforehand :-)

The purpose is that with any new disk it's good to write to all the
blocks and let the drive do all of the re-mapping before you put 'real'
data on it. Let it crap out or fail before I put my data on it.

Justin.
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Sat, 1 Dec 2007, Jan Engelhardt wrote:
> On Dec 1 2007 07:12, Justin Piszcz wrote:
>> On Sat, 1 Dec 2007, Jan Engelhardt wrote:
>>> On Dec 1 2007 06:19, Justin Piszcz wrote:
>>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO; if
>>>> you use 1.x superblocks with LILO you can't boot)
>>>
>>> Says who? (Don't use LILO ;-)
>>
>> I like LILO :)
>
> LILO cares much less about disk layout / filesystems than GRUB does, so
> I would have expected LILO to cope with all sorts of superblocks. OTOH
> I would suspect GRUB to only handle 0.90 and 1.0, where the MD SB is at
> the end of the disk <=> the filesystem SB is at the very beginning.
>
>>>> So two questions:
>>>>
>>>> 1) If it rebuilt by itself, how come it only rebuilt /dev/md0?
>>>
>>> So md1/md2 was NOT rebuilt?
>>
>> Correct.
>
> Well, it should, after they are re-added using -a. If they still don't,
> then perhaps another resync is in progress.

There was nothing in progress; md0 was synced up and md1/md2 were degraded.
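A quick way to confirm that nothing is running (a sketch; array names as
in the thread):

    # "idle" means no resync/recover/check is in progress on the array
    cat /sys/block/md1/md/sync_action
    cat /sys/block/md2/md/sync_action
    cat /proc/mdstat    # degraded mirrors show [U_] and no recovery line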
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 1 2007 07:12, Justin Piszcz wrote:
> On Sat, 1 Dec 2007, Jan Engelhardt wrote:
>> On Dec 1 2007 06:19, Justin Piszcz wrote:
>>
>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO; if
>>> you use 1.x superblocks with LILO you can't boot)
>>
>> Says who? (Don't use LILO ;-)
>
> I like LILO :)

LILO cares much less about disk layout / filesystems than GRUB does, so
I would have expected LILO to cope with all sorts of superblocks. OTOH I
would suspect GRUB to only handle 0.90 and 1.0, where the MD SB is at
the end of the disk <=> the filesystem SB is at the very beginning.

>>> So two questions:
>>>
>>> 1) If it rebuilt by itself, how come it only rebuilt /dev/md0?
>>
>> So md1/md2 was NOT rebuilt?
>
> Correct.

Well, it should, after they are re-added using -a. If they still don't,
then perhaps another resync is in progress.
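One way to see why md considered md0 clean while md1/md2 stayed degraded
is to compare the per-device event counters (a sketch; partition names
taken from the thread):

    # if the event counts on both halves of a mirror match, md can skip
    # the resync entirely when the missing disk reappears
    mdadm --examine /dev/sda1 | grep -i events
    mdadm --examine /dev/sdb1 | grep -i events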
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Dec 1 2007 06:26, Justin Piszcz wrote:
> I ran the following:
>
> dd if=/dev/zero of=/dev/sdc
> dd if=/dev/zero of=/dev/sdd
> dd if=/dev/zero of=/dev/sde
>
> (as it is always a very good idea to do this with any new disk)

Why would you care about what's on the disk? fdisk, mkfs and the
day-to-day operation will overwrite it _anyway_. (If you think the disk
is not empty, you should look at it and copy off all usable warez
beforehand :-)
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Sat, 1 Dec 2007, Jan Engelhardt wrote:
> On Dec 1 2007 06:19, Justin Piszcz wrote:
>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO; if
>> you use 1.x superblocks with LILO you can't boot)
>
> Says who? (Don't use LILO ;-)

I like LILO :)

>> , and then:
>>
>> /dev/sda1+sdb1 <-> /dev/md0 <-> swap
>> /dev/sda2+sdb2 <-> /dev/md1 <-> /boot (ext3)
>> /dev/sda3+sdb3 <-> /dev/md2 <-> / (xfs)
>>
>> All works fine, no issues...
>>
>> Quick question though: I turned off the machine, disconnected /dev/sda
>> from the machine, and booted from /dev/sdb; no problems, it shows up
>> as a degraded RAID1. Turn the machine off. Re-attach the first drive.
>> When I boot, my first partition either re-synced by itself or it was
>> never degraded; why is this?
>
> If md0 was not touched (written to) after you disconnected sda, it also
> should not be in a degraded state.
>
>> So two questions:
>>
>> 1) If it rebuilt by itself, how come it only rebuilt /dev/md0?
>
> So md1/md2 was NOT rebuilt?

Correct.

>> 2) If it did not rebuild, is it because the kernel knows it does not
>>    need to re-calculate parity etc. for swap?
>
> Kernel does not know what's inside an md usually. And it should not try
> to be smart.

Ok.

>> I had to:
>>
>> mdadm /dev/md1 -a /dev/sda2
>> and
>> mdadm /dev/md2 -a /dev/sda3
>>
>> To rebuild the /boot and /, which worked fine. I am just curious
>> though why it works like this; I figured it would be all or nothing.
>
> Devices are not automatically re-added. Who knows, maybe you inserted a
> different disk into sda which you don't want to be overwritten.

Makes sense, I just wanted to confirm that it was normal.

>> More info:
>>
>> Not using ANY initramfs/initrd images; everything is compiled into one
>> kernel image (makes things MUCH simpler, and the expected device
>> layout etc. is always the same, unlike with initrd).
>
> My expected device layout is also always the same, _with_ initrd. Why?
> Simply because mdadm.conf is copied to the initrd, and mdadm will use
> your defined order.

That is another way as well; people seem to be divided.
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 1 2007 06:19, Justin Piszcz wrote:
> RAID1, 0.90.03 superblocks (in order to be compatible with LILO; if
> you use 1.x superblocks with LILO you can't boot)

Says who? (Don't use LILO ;-)

> , and then:
>
> /dev/sda1+sdb1 <-> /dev/md0 <-> swap
> /dev/sda2+sdb2 <-> /dev/md1 <-> /boot (ext3)
> /dev/sda3+sdb3 <-> /dev/md2 <-> / (xfs)
>
> All works fine, no issues...
>
> Quick question though: I turned off the machine, disconnected /dev/sda
> from the machine, and booted from /dev/sdb; no problems, it shows up
> as a degraded RAID1. Turn the machine off. Re-attach the first drive.
> When I boot, my first partition either re-synced by itself or it was
> never degraded; why is this?

If md0 was not touched (written to) after you disconnected sda, it also
should not be in a degraded state.

> So two questions:
>
> 1) If it rebuilt by itself, how come it only rebuilt /dev/md0?

So md1/md2 was NOT rebuilt?

> 2) If it did not rebuild, is it because the kernel knows it does not
>    need to re-calculate parity etc. for swap?

Kernel does not know what's inside an md usually. And it should not try
to be smart.

> I had to:
>
> mdadm /dev/md1 -a /dev/sda2
> and
> mdadm /dev/md2 -a /dev/sda3
>
> To rebuild the /boot and /, which worked fine. I am just curious
> though why it works like this; I figured it would be all or nothing.

Devices are not automatically re-added. Who knows, maybe you inserted a
different disk into sda which you don't want to be overwritten.

> More info:
>
> Not using ANY initramfs/initrd images; everything is compiled into one
> kernel image (makes things MUCH simpler, and the expected device
> layout etc. is always the same, unlike with initrd).

My expected device layout is also always the same, _with_ initrd. Why?
Simply because mdadm.conf is copied to the initrd, and mdadm will use
your defined order.
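For the record, pinning the arrays in mdadm.conf so the boot-time order
is deterministic is a one-liner (a sketch; update-initramfs is
Debian-specific, other distros use mkinitrd or similar):

    # append ARRAY lines keyed by UUID, so device renames don't matter
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf
    # rebuild the initrd so the copy used at boot matches
    update-initramfs -u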
Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
I am putting a new machine together and I have dual raptor RAID 1 for
the root, which works just fine under all stress tests. Then I have the
WD 750 GB drives (not RE2, the desktop ones for ~$150-160 on sale
nowadays). I ran the following:

dd if=/dev/zero of=/dev/sdc
dd if=/dev/zero of=/dev/sdd
dd if=/dev/zero of=/dev/sde

(as it is always a very good idea to do this with any new disk)

And sometime along the way (I had gone to sleep and let it run), this
occurred:

[42880.680144] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0x2 frozen
[42880.680231] ata3.00: irq_stat 0x00400040, connection status changed
[42880.680290] ata3.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
[42880.680292]          res 40/00:ac:d8:64:54/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
[42881.841899] ata3: soft resetting port
[42885.966320] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[42915.919042] ata3.00: qc timeout (cmd 0xec)
[42915.919094] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[42915.919149] ata3.00: revalidation failed (errno=-5)
[42915.919206] ata3: failed to recover some devices, retrying in 5 secs
[42920.912458] ata3: hard resetting port
[42926.411363] ata3: port is slow to respond, please be patient (Status 0x80)
[42930.943080] ata3: COMRESET failed (errno=-16)
[42930.943130] ata3: hard resetting port
[42931.399628] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[42931.413523] ata3.00: configured for UDMA/133
[42931.413586] ata3: EH pending after completion, repeating EH (cnt=4)
[42931.413655] ata3: EH complete
[42931.413719] sd 2:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB)
[42931.413809] sd 2:0:0:0: [sdc] Write Protect is off
[42931.413856] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[42931.413867] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Usually when I see this sort of thing with another box I have full of
raptors, it was due to a bad raptor, and I never saw it again after I
replaced the disk it happened on, but that was using the Intel P965
chipset. For this board, a Gigabyte GA-P35-DS4 (Rev 2.0), I have all of
the drives (2 raptors, 3 750s) connected to the Intel ICH9 southbridge.

I am going to do some further testing, but does this indicate a bad
drive? Bad cable? Bad connector? As you can see above, /dev/sdc stopped
responding for a little bit and then the kernel reset the port. Why is
this though? What is the likely root cause? Should I replace the drive?
Obviously this is not normal and cannot be good at all. The idea is to
put these drives in a RAID5, and if one is going to time out, that is
going to cause the array to go degraded and thus be worthless in a
RAID5 configuration.

Can anyone offer any insight here?

Thank you,

Justin.
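If the errors recur, the drive's own logs and a long self-test can help
separate the drive from the cable/controller (a sketch, assuming
smartmontools):

    smartctl -l error /dev/sdc      # drive-side error log
    smartctl -t long /dev/sdc       # surface scan run by the drive itself
    # wait the announced duration, then:
    smartctl -l selftest /dev/sdc   # a failing LBA here points at the media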
Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
Quick question,

Setup a new machine last night with two Raptor 150 disks. Setup RAID1 as
I do everywhere else, 0.90.03 superblocks (in order to be compatible
with LILO; if you use 1.x superblocks with LILO you can't boot), and
then:

/dev/sda1+sdb1 <-> /dev/md0 <-> swap
/dev/sda2+sdb2 <-> /dev/md1 <-> /boot (ext3)
/dev/sda3+sdb3 <-> /dev/md2 <-> / (xfs)

All works fine, no issues...

Quick question though: I turned off the machine, disconnected /dev/sda
from the machine, and booted from /dev/sdb; no problems, it shows up as
a degraded RAID1. Turn the machine off. Re-attach the first drive. When
I boot, my first partition either re-synced by itself or it was never
degraded; why is this?

So two questions:

1) If it rebuilt by itself, how come it only rebuilt /dev/md0?
2) If it did not rebuild, is it because the kernel knows it does not
   need to re-calculate parity etc. for swap?

I had to:

mdadm /dev/md1 -a /dev/sda2
and
mdadm /dev/md2 -a /dev/sda3

To rebuild the /boot and /, which worked fine. I am just curious though
why it works like this; I figured it would be all or nothing.

More info:

Not using ANY initramfs/initrd images; everything is compiled into one
kernel image (makes things MUCH simpler, and the expected device layout
etc. is always the same, unlike with initrd).

Justin.
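For reference, creating such a mirror with the old superblock format is
just a flag (a sketch; device names as in the message above, and the
command destroys whatever is on the partitions):

    # --metadata=0.90 keeps the superblock at the end of the partition,
    # which is what lets LILO read /boot as if it were plain ext3
    mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=0.90 \
          /dev/sda2 /dev/sdb2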