Hi there,
I've got a very scary problem at the moment, which I will try to explain as
best I can. Briefly, one of my hard drives has stopped responding, although
it was working up until yesterday. Yesterday I swapped out a hard drive for
another one and in the process took all the drives out (but ultimately put
them all back in the same order).
OK... the lowdown is this:
1. The BIOS reports the HDD name as "GDC GD!6 0JB-0 DEA#", whereas it SHOULD
be something like "WDC WD1600JB-00xxxx" (it is a WD 160 GB drive).
2. After the HDD fiasco yesterday (once I had removed the drives the system
wouldn't boot at ALL, giving an "operating system not found" error, EVEN
AFTER I TESTED REPLACING THEM ALL EXACTLY AS THEY HAD BEEN!) I booted into a
CD Linux, chrooted, edited my lilo files etc. and ran lilo. I finally got it
to boot, but weirdly enough only if I had the SuSE install CD in the drive
and chose to boot from the hard drive. The bootloader is there, but it just
sits and sits and does nothing when I switch the computer on.
Anyway, during that process I did notice the hard drive name being reported
incorrectly. Incidentally, this is /dev/hdd: it has one partition, using the
ext2 filesystem, and it is connected as the secondary slave, on the SAME
CHANNEL as my DVD-RW (is this a problem?). I have not yet tested the
possibility of a faulty cable.
After all that stuffing around, when I DID get it to boot I got all kinds of
hard drive errors, and all it wanted me to do was log in and do some manual
fscking. So I logged in and edited the fstab to comment out the references to
the extra drive. Once it booted, I uncommented them and mounted both
successfully (and the drive name returned to normal, as far as I recall).
Today, however, I was creating some DVD ISOs when I opened konqueror and the
computer froze. So I rebooted, and it came up with errors. Fair enough, I
thought: there's bound to be some filesystem damage when a running mkisofs
session just gets terminated, and that was on hdb (not hdd, the drive in
question). So I did the fsck stuff, and then it decided that it wouldn't boot
until I dealt with /dev/hdd1.
3. I attempted an fsck of /dev/hdd1 (the 160 GB drive in question) and it
said something about it not containing a valid superblock. Whatever the exact
message was, it basically asked if I wanted to continue, with a default
option of 'no'. Playing it safe, I quit, commented the drive out of the fstab
and rebooted. Then I did an fdisk -l and it reported the drive as being
22 GB. I tried to mount it and it said the device had a bad superblock (or
something about magic numbers). NOW when I fdisk -l it I get:
fdisk -l :
Disk /dev/hdd: 22.6 GB, 22600835072 bytes
255 heads, 63 sectors/track, 2747 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hdd1             1     19457 156288321   83  Linux
When I try to mount it:
mount: wrong fs type, bad option, bad superblock on /dev/hdd1,
or too many mounted file systems
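
To spell out why that fdisk output looks wrong to me, here is a quick sanity
check of the numbers in Python. This is only arithmetic on the figures printed
above; the one assumption is that the "Blocks" column is in 1 KiB units.

```python
# Figures copied from the fdisk -l output for /dev/hdd.
DISK_BYTES = 22600835072        # disk size as reported by fdisk
CYLINDER_BYTES = 16065 * 512    # "Units = cylinders of 16065 * 512"
PARTITION_END_CYLINDER = 19457  # "End" column for /dev/hdd1
PARTITION_BLOCKS = 156288321    # "Blocks" column (assumed to be 1 KiB each)

# How many whole cylinders fit in the size the disk now claims to have.
disk_cylinders = DISK_BYTES // CYLINDER_BYTES

# Size of the partition according to the partition table.
partition_bytes = PARTITION_BLOCKS * 1024

# Byte offset at which the partition table says /dev/hdd1 ends.
partition_end_bytes = PARTITION_END_CYLINDER * CYLINDER_BYTES

print("cylinders on the disk as reported:", disk_cylinders)
print("partition size in GB: %.1f" % (partition_bytes / 1e9))
print("partition ends past the reported end of the disk:",
      partition_end_bytes > DISK_BYTES)
```

So the partition table still describes a roughly 160 GB partition, but the
drive is only admitting to 2747 cylinders, which would put the end of
/dev/hdd1 far beyond the end of the disk as it is currently detected.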
So basically I am stumped.
Usually I solve most of my problems by a combination of research and
experimentation, but this is NOT something I feel I should be taking an
inexpert approach to, given the stakes (160 GB of data, most of it not backed
up). So if anyone can suggest directions I should look in or approaches I
could take, I'd be very grateful.
I realise that the problem may not even lie in Linux, since the BIOS reports
exactly the same corrupted name as Linux does. I looked for a SMART option in
the BIOS but couldn't find one.
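
For what it's worth, I compared the corrupted name with the expected one byte
by byte. The sketch below is Python; the expected string is only my guess at
the real model name, so only the first 15 characters are compared. As far as
I understand it, the drive sends its identity as 16-bit words with two
characters per word, which is why I also look at the reported size in 16-bit
halves. The idea that one bit is being dropped is an assumption on my part,
not a diagnosis.

```python
REPORTED = "GDC GD!6 0JB-0 DEA#"     # name shown by the BIOS and by Linux
EXPECTED_PREFIX = "WDC WD1600JB-00"  # assumption: known part of the real name

def differing_bytes(reported, expected):
    """Return (index, xor) for every position where the two strings differ."""
    return [(i, ord(r) ^ ord(e))
            for i, (r, e) in enumerate(zip(reported, expected))
            if r != e]

diffs = differing_bytes(REPORTED, EXPECTED_PREFIX)
for index, xor in diffs:
    print("char %2d differs, xor = 0x%02x" % (index, xor))

# Size check: the sector count fdisk implies, with bit 12 of each 16-bit
# half switched back on (hypothetical: assumes that bit is the one lost).
REPORTED_SECTORS = 22600835072 // 512
BIT12_BOTH_WORDS = 0x10001000
candidate_sectors = REPORTED_SECTORS | BIT12_BOTH_WORDS
print("reported sectors:  0x%08x" % REPORTED_SECTORS)
print("candidate sectors: 0x%08x = %.1f GB"
      % (candidate_sectors, candidate_sectors * 512 / 1e9))
```

Every differing character sits at an even position and differs by the same
single bit (0x10), and turning the matching bit back on in the size gives
about 160 GB. That might point at one bad data line (cable, connector or the
drive's own electronics) rather than at the filesystem, but I can't confirm
that from here.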
Where do I go from here? Change the cable? Is it bad to have the drive paired
with an optical drive? (It's been working fine that way for months.)
Or has something hideously bad happened, i.e. the drive is dying or has
died?
Many thanks,
James
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html