Hi all,

I'm having trouble locating a bad sector on a gstriped file system. Smartd has been nagging about this single bad sector for months now; no new ones appear to have shown up. It's about time I look into this...

I got as far as finding the sector number within the partition involved; my attempts are detailed after the problem description. I tried newfs-ing the file system (it's my /tmp, so there's nothing of relevance on it), but newfs-ing doesn't seem to have marked the sector bad. Is there anything wrong with: newfs -U -o time /dev/stripe/tmp ? I ran that from single-user mode after unmounting all file systems.

I tried opening the filesystem with fsdb, but it can't open the partition, only the striped file-system - how do I determine which sector I'm dealing with on a striped fs? And how do I write to it to have it marked as a bad sector?

I'm not sure whether this error means my disk is at the end of its life. Smartd has been spamming me with this single error about the same sector for months now (every half hour!), and it's only the third error in the disk's SMART log. If I understand the smartmontools docs correctly, this could well be caused by the sector not having been written to all this time, which seems plausible to me; it's near the end of a mostly empty /tmp...

From the power-on lifetime it appears the disk is already a bit over two years old, and it's been on pretty much 24/7. Maybe it is time to replace it (probably with a server-grade model).

Time for some data.

The disk is an:
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST3200822A
Serial Number:    3LJ020SJ
Firmware Version: 3.01


smartctl says:

Error 3 occurred at disk power-on lifetime: 18356 hours (764 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 30 ed 61 40  Error: UNC at LBA = 0x0061ed30 = 6417712

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 20 1f ed 61 40 00      15:42:14.650  READ DMA EXT
  25 00 40 9f e6 61 40 00      15:42:14.419  READ DMA EXT
  25 00 40 df f1 61 40 00      15:42:14.293  READ DMA EXT
  25 00 40 5f e6 61 40 00      15:42:14.049  READ DMA EXT
  25 00 40 5f e9 61 40 00      15:42:13.795  READ DMA EXT

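As a cross-check on the LBA that smartctl reports, here is a minimal Python sketch that rebuilds it from the SN/CL/CH/DH register bytes shown above. It assumes the classic 28-bit layout (low nibble of DH holds LBA bits 24-27) and that the high-order bytes of the 48-bit READ DMA EXT command, which the log does not show, are zero. The function name is made up for illustration.

```python
def lba_from_registers(sn, cl, ch, dh):
    """Combine ATA register bytes into an LBA (assumed 28-bit layout)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Registers after the failed command: SN=30 CL=ed CH=61 DH=40
error_lba = lba_from_registers(0x30, 0xED, 0x61, 0x40)

# Last READ DMA EXT before the error: SC=20 SN=1f CL=ed CH=61 DH=40
read_start = lba_from_registers(0x1F, 0xED, 0x61, 0x40)
read_count = 0x20  # sector count register of that command

print(error_lba)
# Does the failing sector fall inside the range that command was reading?
print(read_start <= error_lba < read_start + read_count)
```

If the assumptions hold, the failing sector lies inside the 32-sector read that triggered the error, which at least agrees with the log.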

According to fdisk and bsdlabel that's on partition e of slice 1:

# fdisk -s /dev/ad0
/dev/ad0: 387621 cyl 16 hd 63 sec
Part        Start        Size Type Flags
   1:          63   390716802 0xa5 0x80

So the bad sector is at 6417712 - 63 = 6417649 in /dev/ad0s1.

# bsdlabel /dev/ad0s1
# /dev/ad0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   524288        0    4.2BSD     2048 16384 32776
  b:  4194304   524288      swap
  c: 390716802        0    unused        0     0         # "raw" part, don't edit
  d:  1048576  4718592    4.2BSD     2048 16384     8
  e:  1048576  5767168    4.2BSD     2048 16384     8
  f: 20971520  6815744    4.2BSD     2048 16384 28552
  g: 362929538 27787264    4.2BSD     2048 16384 28552

So the bad sector is 6417649 - 5767168 = 650481 in partition /dev/ad0s1e, at around 62% of its total size. This is where I started to get lost...
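The subtraction above can be double-checked with a short Python sketch. The numbers are copied from the fdisk and bsdlabel output; the variable and function names are only illustrative, and the "c" partition is left out because it covers the whole slice.

```python
BAD_LBA = 6417712      # absolute LBA reported by smartctl
SLICE_START = 63       # start of ad0s1 according to fdisk

# partition: (size, offset) in 512-byte sectors, from bsdlabel
PARTITIONS = {
    "a": (524288, 0),
    "b": (4194304, 524288),
    "d": (1048576, 4718592),
    "e": (1048576, 5767168),
    "f": (20971520, 6815744),
    "g": (362929538, 27787264),
}

def locate(lba):
    """Return (partition letter, sector offset within it, partition size)."""
    in_slice = lba - SLICE_START
    for name, (size, offset) in PARTITIONS.items():
        if offset <= in_slice < offset + size:
            return name, in_slice - offset, size
    raise ValueError("sector is not inside any listed partition")

name, sector, size = locate(BAD_LBA)
percent = round(100.0 * sector / size, 1)
print(name, sector, percent)
```

This only confirms the arithmetic; it says nothing yet about where the sector ends up inside the stripe.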

I set up partition ad0s1e to be used in /dev/stripe/tmp:

# gstripe list tmp
Geom name: tmp
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 4096
ID: 1982480573
Providers:
1. Name: stripe/tmp
   Mediasize: 1073733632 (1.0G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad0s1e
   Mediasize: 536870912 (512M)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 0
2. Name: ad1s1e
   Mediasize: 536870912 (512M)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 1


I tried the following (using -r to keep it from marking my file systems dirty while I was testing):

# fsdb -r /dev/ad0s1e
** /dev/ad0s1e (NO WRITE)
Cannot find file system superblock

LOOK FOR ALTERNATE SUPERBLOCKS? no

fsdb: cannot set up file system `/dev/ad0s1e'
Exit 1

and:

# fsdb -r /dev/stripe/tmp
** /dev/stripe/tmp (NO WRITE)
Examining file system `/dev/stripe/tmp'
Last Mounted on /tmp
current inode: directory
I=2 MODE=40777 SIZE=512
        BTIME=Feb  9 12:01:18 2008 [0 nsec]
        MTIME=Feb  9 12:54:41 2008 [0 nsec]
        CTIME=Feb  9 12:54:41 2008 [0 nsec]
        ATIME=Feb  9 13:23:07 2008 [0 nsec]
OWNER=root GRP=wheel LINKCNT=7 FLAGS=0 BLKCNT=4 GEN=7a46458d
fsdb (inum: 2)>

I figured the findblk command would give me the inode of the problem area (although there won't be one if there are no files in that sector I think?), but I'm dealing with sectors striped across two disks... I have no idea which "block number" would be appropriate. The disk containing the bad sector is apparently the first in the stripe, that much I gathered.
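For what it's worth, here is a rough Python sketch of how the sector on ad0s1e might map onto /dev/stripe/tmp. It assumes gstripe lays data out as plain RAID0: stripe-sized chunks handed out round-robin starting at consumer number 0, with data starting at offset 0 of each consumer. That layout is an assumption on my part and not something the output above confirms. The stripe size, sector size, consumer count and fsize are taken from the gstripe and bsdlabel output; the function name is hypothetical.

```python
SECTOR_SIZE = 512      # from gstripe list
STRIPE_SIZE = 4096     # from gstripe list
NUM_CONSUMERS = 2      # ad0s1e (Number: 0) and ad1s1e (Number: 1)
FRAG_SIZE = 2048       # fsize from bsdlabel

def component_to_stripe_sector(component_sector, consumer_number):
    """Map a sector on one consumer to a sector on the striped provider.

    Assumes round-robin RAID0 chunking starting at consumer 0.
    """
    sectors_per_chunk = STRIPE_SIZE // SECTOR_SIZE
    row, within = divmod(component_sector, sectors_per_chunk)
    chunk = row * NUM_CONSUMERS + consumer_number
    return chunk * sectors_per_chunk + within

stripe_sector = component_to_stripe_sector(650481, 0)
byte_offset = stripe_sector * SECTOR_SIZE
fragment = byte_offset // FRAG_SIZE

print(stripe_sector, byte_offset, fragment)
```

Whether findblk wants the sector number or the fragment number is something I'd still have to check in fsdb(8); the sketch prints both.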

So, how to continue?

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions