Bad sector on a gstripe

Alban Hertroys Sat, 09 Feb 2008 06:03:44 -0800

Hi all,

I'm having trouble locating a bad sector on a gstriped file system. Smartd has been nagging about this single bad sector for months now; no new ones seem to have appeared. It's about time I looked into this...

I got as far as knowing the sector number in the partition involved; I detailed my attempts after the problem description. I tried newfs-ing the file system (it's my /tmp, so there's nothing of relevance on it), but newfs-ing doesn't seem to have marked the sector bad. Is anything wrong with: newfs -U -o time /dev/stripe/tmp ? I ran that from single-user mode after unmounting all file systems.

I tried opening the file system with fsdb, but it can't open the partition, only the striped file system. How do I determine which sector I'm dealing with on a striped file system? And how do I write to it to have it marked as a bad sector?

I'm not sure whether this error means my disk is at the end of its life. Smartd has been spamming me with this single error about the same sector for months now (every half hour!), and it's only the third error in the disk's SMART log. If I understand the smartmontools documentation correctly, this could well be caused by the sector not having been written to all this time, which seems plausible to me; it's near the end of a mostly empty /tmp...

Judging by the power-on lifetime, the disk is already a little over two years old, and it has been on pretty much 24/7. Maybe it is time to replace it (probably with a server model).


Time for some data.

The disk is an:
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST3200822A
Serial Number:    3LJ020SJ
Firmware Version: 3.01


smartctl says:

Error 3 occurred at disk power-on lifetime: 18356 hours (764 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 30 ed 61 40  Error: UNC at LBA = 0x0061ed30 = 6417712

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 20 1f ed 61 40 00      15:42:14.650  READ DMA EXT
  25 00 40 9f e6 61 40 00      15:42:14.419  READ DMA EXT
  25 00 40 df f1 61 40 00      15:42:14.293  READ DMA EXT
  25 00 40 5f e6 61 40 00      15:42:14.049  READ DMA EXT
  25 00 40 5f e9 61 40 00      15:42:13.795  READ DMA EXT
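For reference, the LBA in the "Error: UNC" line can be rebuilt from the register dump. This Python sketch assumes the standard ATA 28-bit LBA register layout (SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27), which is enough here because the LBA fits in 28 bits; the function name is purely illustrative.

```python
# Sketch: rebuild a 28-bit LBA from the ATA task-file registers printed
# by smartctl. Assumed layout: SN = bits 0-7, CL = bits 8-15,
# CH = bits 16-23, low nibble of DH = bits 24-27 (DH bit 6 = LBA mode).
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Register values copied from the "Error 3" entry (SN=30 CL=ed CH=61 DH=40).
lba = lba28_from_registers(sn=0x30, cl=0xED, ch=0x61, dh=0x40)
print(hex(lba), lba)  # 0x61ed30 6417712
```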


According to fdisk and bsdlabel that's on partition e of slice 1:

# fdisk -s /dev/ad0
/dev/ad0: 387621 cyl 16 hd 63 sec
Part        Start        Size Type Flags
   1:          63   390716802 0xa5 0x80

So the bad sector is at 6417712 - 63 = 6417649 in /dev/ad0s1.
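The same subtraction as a small Python sketch; the helper name is hypothetical, and the bounds check simply confirms that the LBA falls inside slice 1 as reported by fdisk -s.

```python
# Sketch: convert an absolute disk LBA into a sector number relative to
# an MBR slice, using the Start/Size columns printed by `fdisk -s`.
def slice_relative(lba: int, start: int, size: int) -> int:
    if not start <= lba < start + size:
        raise ValueError(f"LBA {lba} is outside this slice")
    return lba - start

# Slice 1 of ad0: start 63, size 390716802 (from the fdisk output).
print(slice_relative(6417712, start=63, size=390716802))  # 6417649
```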

# bsdlabel /dev/ad0s1
# /dev/ad0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   524288        0    4.2BSD     2048 16384 32776
  b:  4194304   524288      swap
  c: 390716802        0    unused        0     0         # "raw" part, don't edit
  d:  1048576  4718592    4.2BSD     2048 16384     8
  e:  1048576  5767168    4.2BSD     2048 16384     8
  f: 20971520  6815744    4.2BSD     2048 16384 28552
  g: 362929538 27787264    4.2BSD     2048 16384 28552

So the bad sector is sector 6417649 - 5767168 = 650481 of partition /dev/ad0s1e, at around 62% of its total size. This is where I started to get lost...
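Finding the partition can also be done mechanically from the bsdlabel table. In this hypothetical Python helper the size/offset pairs are copied from the listing above (in 512-byte sectors), and partition c is left out because it spans the whole slice.

```python
# Sketch: find which bsdlabel partition contains a slice-relative sector.
# Partition c covers the whole slice, so it is deliberately omitted.
LABEL = {  # letter: (size, offset), both in 512-byte sectors
    "a": (524288, 0),
    "b": (4194304, 524288),
    "d": (1048576, 4718592),
    "e": (1048576, 5767168),
    "f": (20971520, 6815744),
    "g": (362929538, 27787264),
}

def locate(sector: int) -> tuple[str, int, float]:
    """Return (partition letter, partition-relative sector, fraction)."""
    for letter, (size, offset) in LABEL.items():
        if offset <= sector < offset + size:
            rel = sector - offset
            return letter, rel, rel / size
    raise ValueError(f"sector {sector} is not inside any listed partition")

letter, rel, frac = locate(6417649)
print(letter, rel, f"{frac:.0%}")  # e 650481 62%
```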


I set up partition ad0s1e to be used in /dev/stripe/tmp:

# gstripe list tmp
Geom name: tmp
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 4096
ID: 1982480573
Providers:
1. Name: stripe/tmp
   Mediasize: 1073733632 (1.0G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: ad0s1e
   Mediasize: 536870912 (512M)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 0
2. Name: ad1s1e
   Mediasize: 536870912 (512M)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 1
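The sizes in this listing can be checked against an assumed layout: gstripe keeps its metadata in the last sector of each consumer, and every consumer contributes the same whole number of stripes. Under that assumption (the helper name is hypothetical), the provider size comes out exactly as reported, which suggests the data starts at offset 0 of each consumer.

```python
# Sketch, assuming gstripe reserves the last sector of each consumer for
# metadata and uses the same whole number of stripes from every consumer.
def stripe_provider_size(consumer_sizes, sector_size, stripe_size):
    usable = min(consumer_sizes) - sector_size  # drop the metadata sector
    usable -= usable % stripe_size              # round down to whole stripes
    return usable * len(consumer_sizes)

# Two 512M consumers, 512-byte sectors, 4096-byte stripes.
size = stripe_provider_size([536870912, 536870912], 512, 4096)
print(size)  # 1073733632
```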

I tried the following (using -r to prevent fsdb from marking my file systems dirty while I was testing):


# fsdb -r /dev/ad0s1e
** /dev/ad0s1e (NO WRITE)
Cannot find file system superblock

LOOK FOR ALTERNATE SUPERBLOCKS? no

fsdb: cannot set up file system `/dev/ad0s1e'
Exit 1

and:

# fsdb -r /dev/stripe/tmp
** /dev/stripe/tmp (NO WRITE)
Examining file system `/dev/stripe/tmp'
Last Mounted on /tmp
current inode: directory
I=2 MODE=40777 SIZE=512
        BTIME=Feb  9 12:01:18 2008 [0 nsec]
        MTIME=Feb  9 12:54:41 2008 [0 nsec]
        CTIME=Feb  9 12:54:41 2008 [0 nsec]
        ATIME=Feb  9 13:23:07 2008 [0 nsec]
OWNER=root GRP=wheel LINKCNT=7 FLAGS=0 BLKCNT=4 GEN=7a46458d
fsdb (inum: 2)>
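A likely reason fsdb rejects /dev/ad0s1e: with a 4096-byte stripe size, consecutive 4 KiB chunks of /dev/stripe/tmp alternate between ad0s1e and ad1s1e, so neither partition on its own looks like a file system. This sketch assumes the usual round-robin gstripe layout and a UFS2 file system, whose primary superblock normally sits at byte offset 65536 of the device; the function name is illustrative. It shows that offset landing at byte 32768 of consumer 0, not at 65536 where a superblock search on the bare partition would look.

```python
# Sketch, assuming a round-robin gstripe layout: provider stripe k lives
# on consumer (k % ndisks), as stripe number (k // ndisks) of that consumer.
def provider_to_consumer(offset, ndisks=2, stripe_size=4096):
    stripe, within = divmod(offset, stripe_size)
    disk, row = stripe % ndisks, stripe // ndisks
    return disk, row * stripe_size + within

# 65536 is the usual byte offset of a UFS2 primary superblock.
print(provider_to_consumer(65536))  # (0, 32768)
```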

I figured the findblk command would give me the inode of the problem area (although I think there won't be one if no file occupies that sector?), but I'm dealing with sectors striped across two disks... I have no idea which "block number" would be appropriate. The disk containing the bad sector is apparently the first in the stripe; that much I gathered.
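Going from a sector on ad0s1e to a sector on /dev/stripe/tmp means inverting gstripe's round-robin layout. This Python sketch assumes what the gstripe list output suggests: 2 consumers, 4096-byte stripes, ad0s1e as consumer number 0, and data starting at offset 0 of each consumer. The result is a 512-byte sector number (and byte offset) on the stripe provider; which units fsdb's findblk expects is not checked here, so converting that to a file system block number is a separate step.

```python
# Sketch, assuming a round-robin gstripe layout (provider stripe k lives
# on consumer k % ndisks) with data starting at offset 0 of each consumer.
SECTOR = 512

def consumer_to_provider(disk, sector, ndisks=2, stripe_size=4096):
    """Map (consumer number, consumer sector) to (provider sector, byte)."""
    offset = sector * SECTOR
    row, within = divmod(offset, stripe_size)  # stripe index on this disk
    provider_offset = (row * ndisks + disk) * stripe_size + within
    return provider_offset // SECTOR, provider_offset

# Bad sector 650481 on ad0s1e, which gstripe lists as consumer Number 0.
sector, byte_offset = consumer_to_provider(disk=0, sector=650481)
print(sector, byte_offset)  # 1300961 666092032
```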


So, how to continue?

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
