RE: Hard Disk problems

2006-04-02 Thread Gayn Winters
 [mailto:[EMAIL PROTECTED] On Behalf Of Shane Ambler
 Sent: Saturday, April 01, 2006 3:10 AM
 To: FreeBSD Mailing Lists
 Subject: Hard Disk problems
 
 
 A few days ago I started getting some disk errors and can't seem to find
 a reference on how to fix them (other than the obvious reformat).
 
 
 The daily security run output contains the following (abbreviated)
 
 Checking setuid files and devices:
 find: /usr/ports/databases/db43/work/db-4.3.28/db: Input/output error
 find: /usr/ports/devel/git/Makefile: Input/output error
 
 ~ repeated 32 times for different files (thankfully all in the ports tree)
 
 tower.home.com kernel log messages:
  ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102367
  ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
 
 These two error codes are repeated a total of 38 times, all with the same LBA.
 
 If I start in single-user mode and run fsck, it takes about half an hour to
 get through and repeats similar errors many times for just about every check
 it does.
 
 Running # fsck -y > fsckout (while in multiuser mode) gives the output below,
 followed by the dmesg output since boot:
 
  cat fsckout 
 ** /dev/ad0s1a (NO WRITE)
 ** Last Mounted on /
 ** Root file system
 ** Phase 1 - Check Blocks and Sizes
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 ** Phase 5 - Check Cyl groups
 2259 files, 44188 used, 82651 free (251 frags, 10300 blocks, 0.2% fragmentation)
 ** /dev/ad0s1e (NO WRITE)
 ** Last Mounted on /tmp
 ** Phase 1 - Check Blocks and Sizes
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 ** Phase 5 - Check Cyl groups
 591 files, 4501 used, 122338 free (242 frags, 15262 blocks, 0.2% fragmentation)
 ** /dev/ad0s1f (NO WRITE)
 ** Last Mounted on /usr
 ** Phase 1 - Check Blocks and Sizes
 
 CANNOT READ BLK: 135486944
 UNEXPECTED SOFT UPDATE INCONSISTENCY
 
 CONTINUE? yes
 
 THE FOLLOWING DISK SECTORS COULD NOT BE READ: 135486944, 135486945,
 135486946, 135486947, 135486948, 135486949, 135486950, 135486951, 135486952,
 135486953, 135486954, 135486955, 135486956, 135486957, 135486958, 135486959,
 135486960, 135486961, 135486962, 135486963, 135486964, 135486965, 135486966,
 135486967, 135486968, 135486969, 135486970,
 ** Phase 2 - Check Pathnames
 UNALLOCATED  I=5049385  OWNER=squid MODE=100600
 SIZE=15032 MTIME=Apr  1 21:07 2006
 FILE=/local/squid/cache/00/26/26C2
 
 UNEXPECTED SOFT UPDATE INCONSISTENCY
 
 REMOVE? no
 
 UNALLOCATED  I=5049875  OWNER=squid MODE=100600
 SIZE=10825 MTIME=Apr  1 21:07 2006
 FILE=/local/squid/cache/00/26/26CA
 
 UNEXPECTED SOFT UPDATE INCONSISTENCY
 
 REMOVE? no
 
 UNALLOCATED  I=5049896  OWNER=squid MODE=100600
 SIZE=15008 MTIME=Apr  1 21:07 2006
 FILE=/local/squid/cache/00/26/26D1
 
 UNEXPECTED SOFT UPDATE INCONSISTENCY
 
 REMOVE? no
 
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 LINK COUNT FILE I=5740857  OWNER=squid MODE=0
 SIZE=0 MTIME=Apr  1 21:09 2006  COUNT 0 SHOULD BE -1
 ADJUST? no
 
 LINK COUNT FILE I=5792561  OWNER=squid MODE=0
 SIZE=0 MTIME=Apr  1 21:07 2006  COUNT 0 SHOULD BE -1
 ADJUST? no
 
 LINK COUNT FILE I=5875155  OWNER=squid MODE=0
 SIZE=0 MTIME=Apr  1 21:09 2006  COUNT 0 SHOULD BE -1
 ADJUST? no
 
 LINK COUNT FILE I=5970461  OWNER=squid MODE=0
 SIZE=0 MTIME=Apr  1 21:09 2006  COUNT 0 SHOULD BE -1
 ADJUST? no
 
 ** Phase 5 - Check Cyl groups
 SUMMARY INFORMATION BAD
 SALVAGE? no
 
 ALLOCATED FRAGS 1936880-1936911 MARKED FREE
 ALLOCATED FRAGS 1936976-1936983 MARKED FREE
 BLK(S) MISSING IN BIT MAPS
 SALVAGE? no
 
 ALLOCATED FILE 5740857 MARKED FREE
 ALLOCATED FRAG 22922007 MARKED FREE
 ALLOCATED FILE 5792561 MARKED FREE
 ALLOCATED FILE 5856663 MARKED FREE
 ALLOCATED FILE 5875155 MARKED FREE
 ALLOCATED FRAG 23448111 MARKED FREE
 ALLOCATED FILE 5970461 MARKED FREE
 ALLOCATED FRAG 23889647 MARKED FREE
 ALLOCATED FILE 6077762 MARKED FREE
 ALLOCATED FRAG 24353503 MARKED FREE
 ALLOCATED FRAGS 26021808-26021813 MARKED FREE
 ALLOCATED FRAGS 26301688-26301690 MARKED FREE
 1534559 files, 15746410 used, 21222026 free (2172530 frags, 2381187 blocks, 5.9% fragmentation)
 ** /dev/ad0s1d (NO WRITE)
 ** Last Mounted on /var
 ** Phase 1 - Check Blocks and Sizes
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 UNREF FILE I=8278  OWNER=mysql MODE=100600
 SIZE=0 MTIME=Apr  1 19:13 2006
 CLEAR? no
 
 UNREF FILE I=8301  OWNER=mysql MODE=100600
 SIZE=0 MTIME=Apr  1 19:13 2006
 CLEAR? no
 
 UNREF FILE I=8306  OWNER=mysql MODE=100600
 SIZE=0 MTIME=Apr  1 19:13 2006
 CLEAR? no
 
 UNREF FILE I=25696  OWNER=root MODE=140666
 SIZE=0 MTIME=Apr  1 19:13 2006
 CLEAR? no
 
 ** Phase 5 - Check Cyl groups
 3681 files, 59732 used, 67107 free (1275 frags, 8229 blocks, 1.0% fragmentation)
 
  cat dmesg output
 ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
 
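The report above comes down to two observations: the kernel keeps failing on the same LBA, and the sectors fsck cannot read form one unbroken run. A minimal Python sketch for checking both from saved output rather than by eye might look like this. It assumes log lines in the status=NN<FLAGS> error=NN<FLAGS> LBA=N layout that the FreeBSD ata driver prints; tally_failures and collapse_ranges are hypothetical helper names, and the sample input is only the two kernel lines and the fsck sector list quoted in this message.

```python
import re
from collections import Counter

# Sample input: the two kernel messages quoted in the report. In practice,
# read this text from a saved copy of the system log instead.
LOG = """\
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102367
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
"""

# The sectors fsck reported as unreadable on /dev/ad0s1f (135486944..135486970).
FSCK_SECTORS = list(range(135486944, 135486971))

# Captures the device name and the LBA of a READ_DMA failure. search() is
# used instead of match() so a syslog timestamp/host prefix is tolerated.
FAILURE_RE = re.compile(r"(\w+): FAILURE - READ_DMA .* LBA=(\d+)")


def tally_failures(text: str) -> Counter:
    """Count READ_DMA failures per (device, LBA) pair."""
    counts: Counter = Counter()
    for line in text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def collapse_ranges(sectors: list[int]) -> list[tuple[int, int]]:
    """Group sector numbers into (first, last) runs of consecutive values."""
    runs: list[tuple[int, int]] = []
    for sector in sorted(set(sectors)):
        if runs and sector == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], sector)  # extend the current run
        else:
            runs.append((sector, sector))  # start a new run
    return runs


if __name__ == "__main__":
    for (device, lba), count in sorted(tally_failures(LOG).items()):
        print(f"{device} LBA={lba}: {count} failure(s)")
    print(collapse_ranges(FSCK_SECTORS))
```

Fed a full log instead of the two sample lines, a single LBA collecting dozens of hits (38 in this report) points at one localized bad area rather than errors spread across the disk, though on its own it says nothing about whether more sectors will fail later.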

RE: Hard Disk problems

2006-04-02 Thread Gayn Winters
 You'll probably want to
 reread the section in the Handbook on Moving to a Larger Disk, since
 this is a good time to rethink the sizes of your partitions.

Sorry, this info is in FAQs 9.1 and 9.2, not in the Handbook.
http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/disks.html

-gayn

Bristol Systems Inc.
714/532-6776
www.bristolsystems.com 


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard Disk problems

2006-04-02 Thread Shane Ambler
On 3/4/06 2:49 AM, Gayn Winters [EMAIL PROTECTED] wrote:

Snip

 ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102393
 
 -- 
 
 Shane Ambler
 
 Looks to me like your disk subsystem is dying.  Most likely it is just
 the disk ad0.  If you don't have a good backup, do that immediately.
 Get a new disk in there and test it thoroughly (with the manufacturer's
 diagnostics). If all is well, restore to it.  You'll probably want to
 reread the section in the Handbook on Moving to a Larger Disk, since
 this is a good time to rethink the sizes of your partitions.
 
 Incidentally, you can just install the new disk (as ad1), install FBSD
 on it, and dump|restore from ad0 to ad1.
 
 Once restored, you'll still have to clean up the damage.  This is easier
 if your new disk has a separate partition for user data, since you
 can use a fresh install of the OS, the ports, etc. and worry about
 repairing the user data as best you can.
 
 Good luck!
 
 -gayn

Thanks.

I was kinda thinking that might be the case. Space isn't an issue (it's a
120GB drive); this is mostly a testing/learning server at home. It runs squid
and a DNS cache for home use (my other half does a lot of auto-surfing to try
and make a few bucks), plus apache/mysql for testing web development.

The files that showed up with I/O errors are all in /usr/ports, so no probs
there. I should be able to copy what is readable across to another drive
without any real loss, and any worthwhile data there is easy to replace.

I am fairly new to *nix and was looking to see if I could learn more about
disaster recovery. I thought there might be a chance that it was just bad
sectors that weren't getting mapped out automagically, and that I could learn
to fix it manually without reformatting. Now I know that if I see this happen
again, I should just replace the disk as soon as I can.

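Any attempt to deal with a bad sector by hand would first need the fsck sector numbers and the kernel LBAs on the same scale. The numbers in this thread suggest (an inference, not something stated by either poster) that fsck reports sectors relative to the start of /dev/ad0s1f, while the kernel reports absolute LBAs on ad0: the first unreadable sector, 135486944, pairs with LBA=139102367, and the last, 135486970, pairs with the LBA=139102393 in the dmesg excerpt quoted in this reply. A minimal Python check of that hypothesis, with hypothetical helper names:

```python
# Assumption (inferred from the numbers in this thread, not confirmed):
#   kernel LBA = fsck sector on /dev/ad0s1f + fixed offset of ad0s1f within ad0

# First and last sectors fsck could not read on /dev/ad0s1f.
FSCK_FIRST, FSCK_LAST = 135486944, 135486970
# LBAs reported by the ata driver in the kernel messages quoted in the thread.
LBA_FIRST, LBA_LAST = 139102367, 139102393


def implied_offset(fsck_sector: int, lba: int) -> int:
    """Offset that would turn a partition-relative sector into an absolute LBA."""
    return lba - fsck_sector


def to_absolute(fsck_sector: int, offset: int) -> int:
    """Map a partition-relative sector to an absolute LBA, given an offset."""
    return fsck_sector + offset


if __name__ == "__main__":
    first = implied_offset(FSCK_FIRST, LBA_FIRST)
    last = implied_offset(FSCK_LAST, LBA_LAST)
    print(first, last, first == last)  # 3615423 3615423 True
    # With a consistent offset, the whole fsck run maps onto this LBA span.
    print(to_absolute(FSCK_FIRST, first), to_absolute(FSCK_LAST, first))
```

Both ends of the run give the same difference (3615423 sectors), which is what you would expect if that value is the combined offset of the slice and the ad0s1f partition. The fdisk and bsdlabel output for ad0 would be the way to confirm it rather than trusting the arithmetic alone.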

-- 

Shane Ambler
Sales Department
007Marketing.com
[EMAIL PROTECTED]

