Re: Vanishing RAID autodetect partition

2008-05-10 Thread ketonom

The problem seems to have been fixed after I updated from etch to lenny.






Vanishing RAID autodetect partition

2008-05-09 Thread ketonom
I'm currently running Linux version 2.6.18-6-amd64 (Debian 
2.6.18.dfsg.1-18etch3) ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 
(prerelease) (Debian 4.1.1-21)) #1 SMP Thu Apr 24 03:57:46 UTC 2008


Up until now, things have been working fine with the two software 
raid5 arrays I've got running via mdadm.  I had just replaced a failed 
disk in the second array (/dev/md1) and decided to update the system 
with "apt-get update" and "apt-get upgrade", both of which proceeded 
normally (including the update to the latest kernel image seen above).  
After the update, I restarted the box to finish the process, and upon 
getting back into the system I noticed the first array was running in 
degraded mode with a disk missing.  Inspecting /proc/partitions, I 
found that /dev/sdm had no partitions listed at all:

---
major minor  #blocks  name

   8     0  312571224 sda
   8     1    6835626 sda1
   8     2          1 sda2
   8     5    6040408 sda5
   8     6  299692543 sda6
   8    16  488386584 sdb
   8    17  488287611 sdb1
   8    32  488386584 sdc
   8    33  488287611 sdc1
   8    48  488386584 sdd
   8    49  488287611 sdd1
   8    64  488386584 sde
   8    65  488287611 sde1
   8    80  488386584 sdf
   8    81  488287611 sdf1
   8    96  488386584 sdg
   8    97  488287611 sdg1
   8   112  488386584 sdh
   8   113  488287611 sdh1
   8   128  488386584 sdi
   8   129  488287611 sdi1
   8   144  488386584 sdj
   8   145  488287611 sdj1
   8   160  244198584 sdk
   8   161  244147806 sdk1
   8   176  244198584 sdl
   8   177  244147806 sdl1
   8   192  488386584 sdm
   8   208  244198584 sdn
   8   209  244147806 sdn1
   9     0 4394587392 md0
   9     1  488295424 md1
 253     0 4882878464 dm-0
---

Here's the layout of the RAID arrays at that point:
---
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdk1[0] sdn1[2] sdl1[1]
 488295424 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid5 sdb1[0] sdg1[9] sdh1[8] sdi1[7] sdj1[6] sdf1[4]
      sde1[3] sdd1[2] sdc1[1]
      4394587392 blocks level 5, 64k chunk, algorithm 2 [10/9] [UUUUU_UUUU]

unused devices: <none>

---
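For anyone who wants to follow along, these are roughly the commands 
I've been using to inspect the arrays; nothing exotic, just standard 
mdadm and procfs, and I'm reconstructing the exact invocations from 
memory:

---
# overall RAID status (same as the listing above)
cat /proc/mdstat

# per-array detail; shows which slot is missing from md0
mdadm --detail /dev/md0
---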

So I figured I'd check the drive in fdisk, which could actually see 
the partition:

---
Disk /dev/sdm: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdm1               1       60789   488287611   fd  Linux raid autodetect

---
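(That listing is from interactive fdisk; the equivalent 
non-interactive checks, in case anyone wants to reproduce this, would 
be something along these lines:)

---
# print the partition table without entering interactive mode
fdisk -l /dev/sdm

# or dump it in a reusable form with sfdisk
sfdisk -d /dev/sdm
---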

I tried swapping the SATA cable for a new one, which had no effect, 
and moving the drive to a different port on the controller card didn't 
change anything either.  A self-test via smartctl (from smartmontools) 
didn't turn anything up, so I decided to go back into fdisk and write 
the partition table to disk.  I made no changes to the table; I just 
went in, printed the list to make sure it was there, then wrote it 
out.  After doing this, the partition appeared in /proc/partitions as 
seen here:

---
major minor  #blocks  name

   8     0  312571224 sda
   8     1    6835626 sda1
   8     2          1 sda2
   8     5    6040408 sda5
   8     6  299692543 sda6
   8    16  488386584 sdb
   8    17  488287611 sdb1
   8    32  488386584 sdc
   8    33  488287611 sdc1
   8    48  488386584 sdd
   8    49  488287611 sdd1
   8    64  488386584 sde
   8    65  488287611 sde1
   8    80  488386584 sdf
   8    81  488287611 sdf1
   8    96  488386584 sdg
   8    97  488287611 sdg1
   8   112  488386584 sdh
   8   113  488287611 sdh1
   8   128  488386584 sdi
   8   129  488287611 sdi1
   8   144  488386584 sdj
   8   145  488287611 sdj1
   8   160  244198584 sdk
   8   161  244147806 sdk1
   8   176  244198584 sdl
   8   177  244147806 sdl1
   8   192  488386584 sdm
   8   193  488287611 sdm1
   8   208  244198584 sdn
   8   209  244147806 sdn1
   9     1  488295424 md1
---
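(In hindsight, writing the unchanged table from fdisk presumably just 
forced the kernel to re-read the partition table.  If it happens 
again, I may try a plain re-read first; I haven't verified that this 
behaves any differently, but something like this should do it:)

---
# ask the kernel to re-read the partition table without rewriting it
blockdev --rereadpt /dev/sdm

# confirm the partition shows up again
grep sdm /proc/partitions
---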

So with the partition back in working order, I attempted to start the 
array with sdm1 included, but the kernel kicked the partition out as 
non-fresh.  Rather than risk corrupting data, I simply re-added the 
drive to the array and let it rebuild.  Everything appeared to be 
working fine after the rebuild, but upon restarting the box one more 
time to see what would happen, the partition had once again vanished.
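
For reference, the re-add and rebuild went roughly like this; I'm 
quoting the commands from memory, so treat this as a sketch rather 
than a transcript:

---
# put the disk back into the array; md rebuilds onto sdm1
mdadm /dev/md0 --add /dev/sdm1

# watch the resync progress
watch cat /proc/mdstat
---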


Going over the dmesg output, I can see that the kernel detects the 
partition:
---
SCSI device sdm: 976773168 512-byte hdwr sectors (500108 MB)
sdm: Write Protect is off
sdm: Mode Sense: 00 3a 00 00
SCSI device sdm: drive cache: write back
SCSI device sdm: 976773168 512-byte hdwr sectors (500108 MB)
sdm: Write Protect is off
sdm: Mode Sense: 00 3a 00 00
SCSI device sdm: drive cache: write back
sdm: sdm1
sd 12:0:0:0: Attached scsi disk sdm
---
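
One more data point I can gather next time the partition shows up: 
whether the md superblock on sdm1 is still intact.  If I understand 
mdadm correctly, something like this should tell me:

---
# inspect the md superblock on the member partition
mdadm --examine /dev/sdm1

# and filter the kernel's view of the disk from the boot log
dmesg | grep sdm
---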

Any ideas?

