I'm currently running Linux version 2.6.18-6-amd64 (Debian
2.6.18.dfsg.1-18etch3) ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115
(prerelease) (Debian 4.1.1-21)) #1 SMP Thu Apr 24 03:57:46 UTC 2008.
Up until now, things have been working fine with the two software raid5
arrays I've got running via mdadm. I had just replaced a failed disk in
the second array (/dev/md1) and decided to update the system with
"apt-get update" and "apt-get upgrade", which all proceeded normally
(including updating to the latest kernel image as seen above). After
the update, I restarted the box to finish the process, and upon getting
back into the system I noticed the first array was running in degraded
mode with a disk missing. Upon inspecting /proc/partitions I found that
/dev/sdm didn't have any partitions listed at all:
---
major minor  #blocks  name
   8     0  312571224 sda
   8     1    6835626 sda1
   8     2          1 sda2
   8     5    6040408 sda5
   8     6  299692543 sda6
   8    16  488386584 sdb
   8    17  488287611 sdb1
   8    32  488386584 sdc
   8    33  488287611 sdc1
   8    48  488386584 sdd
   8    49  488287611 sdd1
   8    64  488386584 sde
   8    65  488287611 sde1
   8    80  488386584 sdf
   8    81  488287611 sdf1
   8    96  488386584 sdg
   8    97  488287611 sdg1
   8   112  488386584 sdh
   8   113  488287611 sdh1
   8   128  488386584 sdi
   8   129  488287611 sdi1
   8   144  488386584 sdj
   8   145  488287611 sdj1
   8   160  244198584 sdk
   8   161  244147806 sdk1
   8   176  244198584 sdl
   8   177  244147806 sdl1
   8   192  488386584 sdm
   8   208  244198584 sdn
   8   209  244147806 sdn1
   9     0 4394587392 md0
   9     1  488295424 md1
 253     0 4882878464 dm-0
---
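For anyone wanting to spot this kind of thing without eyeballing the list,
here is a minimal Python sketch that reports whole disks with no partition
entries. The function name disks_without_partitions and the shortened sample
are illustrative only, and it assumes sd* device naming as in the listing
above.

```python
import re

def disks_without_partitions(text):
    """Return sd* whole-disk names that have no partition rows."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        # The header row is skipped because "major" is not a number.
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    disks = [n for n in names if re.fullmatch(r"sd[a-z]+", n)]
    parts = [n for n in names if re.fullmatch(r"sd[a-z]+\d+", n)]
    return [d for d in disks
            if not any(re.fullmatch(re.escape(d) + r"\d+", p) for p in parts)]

# Shortened sample in the /proc/partitions layout (illustrative subset).
sample = """\
major minor  #blocks  name

   8   176  244198584 sdl
   8   177  244147806 sdl1
   8   192  488386584 sdm
   8   208  244198584 sdn
   8   209  244147806 sdn1
"""

print(disks_without_partitions(sample))  # ['sdm']
```

On a live system the same function could be fed the contents of
/proc/partitions instead of the sample string.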
Here's the layout of the RAID arrays at that point:
---
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdk1[0] sdn1[2] sdl1[1]
488295424 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md0 : active raid5 sdb1[0] sdg1[9] sdh1[8] sdi1[7] sdj1[6] sdf1[4]
sde1[3] sdd1[2] sdc1[1]
4394587392 blocks level 5, 64k chunk, algorithm 2 [10/9] [U_]
unused devices: <none>
---
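The degraded state shows up in the [10/9] counter (ten members expected, nine
active). A rough Python sketch for pulling that out of mdstat-style text
follows; degraded_arrays is a made-up helper name, and it only assumes the
"[expected/active]" field seen in the output above.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+)\s*:", line)
        if header:
            # Remember which array the following status line belongs to.
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
    return result

# Sample built from the device lists and counters quoted in this post.
mdstat_sample = """\
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdk1[0] sdn1[2] sdl1[1]
      488295424 blocks level 5, 64k chunk, algorithm 2 [3/3]
md0 : active raid5 sdb1[0] sdg1[9] sdh1[8] sdi1[7] sdj1[6] sdf1[4] sde1[3] sdd1[2] sdc1[1]
      4394587392 blocks level 5, 64k chunk, algorithm 2 [10/9]
"""

print(degraded_arrays(mdstat_sample))  # {'md0': (10, 9)}
```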
So I figured I'd check the drive in fdisk, which actually found the
partition to exist:
---
Disk /dev/sdm: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdm1               1       60789   488287611   fd  Linux raid autodetect
---
I tried switching out the SATA cable with a new one, which had no effect.
Moving the drive to a different port on the controller card didn't
affect it either.
A self-test via smartctl (from smartmontools) didn't turn anything up,
so I just decided to go back into fdisk and write the partition table to
disk. I didn't make any changes to the table. I just went in, printed
the list to make sure it was there, then wrote to disk. After doing
this, the partition appeared in /proc/partitions as seen here:
---
major minor  #blocks  name
   8     0  312571224 sda
   8     1    6835626 sda1
   8     2          1 sda2
   8     5    6040408 sda5
   8     6  299692543 sda6
   8    16  488386584 sdb
   8    17  488287611 sdb1
   8    32  488386584 sdc
   8    33  488287611 sdc1
   8    48  488386584 sdd
   8    49  488287611 sdd1
   8    64  488386584 sde
   8    65  488287611 sde1
   8    80  488386584 sdf
   8    81  488287611 sdf1
   8    96  488386584 sdg
   8    97  488287611 sdg1
   8   112  488386584 sdh
   8   113  488287611 sdh1
   8   128  488386584 sdi
   8   129  488287611 sdi1
   8   144  488386584 sdj
   8   145  488287611 sdj1
   8   160  244198584 sdk
   8   161  244147806 sdk1
   8   176  244198584 sdl
   8   177  244147806 sdl1
   8   192  488386584 sdm
   8   193  488287611 sdm1
   8   208  244198584 sdn
   8   209  244147806 sdn1
   9     1  488295424 md1
---
So with the partition back in working order I attempted to start the
array with sdm1 included, but md kicked the drive out as "non-fresh".
Rather than risk corrupting data, I just re-added
the drive to the array and let it rebuild. Everything appeared to be
working fine after the rebuild, but upon restarting the box one more
time to see what would happen, the partition had once again vanished.
Going over the dmesg output, it's clear the system can see the partition:
---
SCSI device sdm: 976773168 512-byte hdwr sectors (500108 MB)
sdm: Write Protect is off
sdm: Mode Sense: 00 3a 00 00
SCSI device sdm: drive cache: write back
SCSI device sdm: 976773168 512-byte hdwr sectors (500108 MB)
sdm: Write Protect is off
sdm: Mode Sense: 00 3a 00 00
SCSI device sdm: drive cache: write back
sdm: sdm1
sd 12:0:0:0: Attached scsi disk sdm
---
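For what it's worth, the sizes reported by dmesg, fdisk and /proc/partitions
all agree with each other, so the partition table itself looks sane. Here is
the arithmetic as a small Python sketch. One assumption on my part: the
partition starts at sector 63 (one track in, the usual DOS convention), which
fdisk's cylinder units don't show directly. The helper name partition_blocks
is just for illustration.

```python
SECTOR = 512  # bytes per sector, per the dmesg and fdisk output

disk_sectors = 976773168   # dmesg: "976773168 512-byte hdwr sectors"
disk_bytes = 500107862016  # fdisk: "500107862016 bytes"
disk_blocks = 488386584    # /proc/partitions: sdm, in 1 KiB blocks
part_blocks = 488287611    # fdisk and /proc/partitions: sdm1

def partition_blocks(end_cyl, heads, sectors_per_track, start_sector):
    """Size in 1 KiB blocks of a partition ending on a cylinder boundary."""
    sectors_per_cyl = heads * sectors_per_track       # 255 * 63 = 16065
    end_sector = end_cyl * sectors_per_cyl            # first sector past the end
    return (end_sector - start_sector) * SECTOR // 1024

# Whole-disk figures: sectors -> bytes and sectors -> 1 KiB blocks.
print(disk_sectors * SECTOR == disk_bytes)            # True
print(disk_sectors * SECTOR // 1024 == disk_blocks)   # True

# Partition figure: cylinders 1..60789, assumed start at sector 63.
print(partition_blocks(60789, 255, 63, 63) == part_blocks)  # True
```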
Any ideas?