Re: A tool for remapping bad sectors in CURRENT?
Pieter de Goeje writes:
> Dag-Erling Smørgrav writes:
> > And if you're comfortable *writing* kernel code, I would suggest
> > implementing WORF in geom_mirror :)
> I am intrigued, what is this WORF you speak of?

Write On Read Failure. It means that if you can't read a sector but you have (or can recreate) a copy of the data that's supposed to be on it, you rewrite that data to force the disk to reallocate the sector.

I've done this manually several times (dd'ed a sector from the other disk in a mirror). I believe I even posted the procedure at some point; I'll check my archive.

DES
--
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
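To make the WORF idea concrete, here is a minimal sketch in Python. It is a toy simulation only, not geom_mirror code: the `FlakyDisk` class and `mirror_read` function are hypothetical names invented for illustration, and "reallocation" is modelled simply as a write clearing the bad-sector flag.

```python
class FlakyDisk:
    """In-memory stand-in for a disk (hypothetical, for illustration only).

    Sectors listed in `bad` raise an error on read until they are rewritten.
    """

    def __init__(self, sectors, bad=()):
        self.sectors = list(sectors)
        self.bad = set(bad)

    def read(self, lba):
        if lba in self.bad:
            raise IOError(f"read error at LBA {lba}")
        return self.sectors[lba]

    def write(self, lba, data):
        self.sectors[lba] = data
        # Assumption: a successful write makes the drive remap the sector.
        self.bad.discard(lba)


def mirror_read(disks, lba):
    """Read `lba` from the first disk that can deliver it.

    Every disk that failed before a good copy was found gets the good data
    written back to it (write on read failure).
    """
    failed = []
    for disk in disks:
        try:
            data = disk.read(lba)
        except IOError:
            failed.append(disk)
            continue
        for bad_disk in failed:
            bad_disk.write(lba, data)
        return data
    raise IOError(f"LBA {lba} is unreadable on every mirror component")


first = FlakyDisk([b"aaaa", b"bbbb"], bad={1})
second = FlakyDisk([b"aaaa", b"bbbb"])
print(mirror_read([first, second], 1))
print(first.read(1))
```

The second `print` only succeeds because the failed read on `first` was repaired from `second`; a real implementation would also have to deal with writes that fail.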
Re: A tool for remapping bad sectors in CURRENT?
Dag-Erling Smørgrav wrote:
> Miroslav Lachman <000.f...@quip.cz> writes:
> > As you can see, there are really two different numbers LBA=79725056 in
> > messages and LBA = 0x04c0826f = 79725167 in SMART log.
> I don't know how comfortable you are reading kernel code, but I would
> suggest looking through the atadisk driver to see why the numbers are
> different. And if you're comfortable *writing* kernel code, I would
> suggest implementing WORF in geom_mirror :)

As I wrote to Pieter, I am not a C programmer, so I cannot read kernel code. I was a poor web developer before I turned into a sysadmin about 5 years ago. My programming knowledge ends with PHP / SQL / JS and SH coding :)

Miroslav Lachman
Re: A tool for remapping bad sectors in CURRENT?
On Thursday 18 March 2010 12:11:07 Dag-Erling Smørgrav wrote:
> And if you're comfortable *writing* kernel code, I would suggest
> implementing WORF in geom_mirror :)

I am intrigued, what is this WORF you speak of? Google says it's a certain character from a popular sci-fi show...

- Pieter
Re: A tool for remapping bad sectors in CURRENT?
Miroslav Lachman <000.f...@quip.cz> writes:
> As you can see, there are really two different numbers LBA=79725056 in
> messages and LBA = 0x04c0826f = 79725167 in SMART log.

I don't know how comfortable you are reading kernel code, but I would suggest looking through the atadisk driver to see why the numbers are different. And if you're comfortable *writing* kernel code, I would suggest implementing WORF in geom_mirror :)

DES
--
Dag-Erling Smørgrav - d...@des.no
Re: A tool for remapping bad sectors in CURRENT?
Gary Jennejohn wrote:
> On Wed, 17 Mar 2010 12:41:33 +0100
> Miroslav Lachman <000.f...@quip.cz> wrote:
> > I absolutely don't understand how you get the number 4 (it is some magic
> > for me :]) but it works!
> [...]
> Umm, it's standard C code: 1 << 2 = 4. It's a power of 2, in this case 2
> squared.

I am not a C programmer, so I didn't understand the syntax. Now it makes sense. Thank you again for the explanation.

Miroslav Lachman
Re: A tool for remapping bad sectors in CURRENT?
Dag-Erling Smørgrav wrote:
> Miroslav Lachman <000.f...@quip.cz> writes:
> > Dag-Erling Smørgrav writes:
> > > Uh, 79725167 - 63 = 79725104 and 79725104 - 39845888 = 39879216. How
> > > did you arrive at 39879105?
> > I am sorry, it was my confusion.
> > My calculation was for *LBA=79725056* reported in messages:
> >
> > ad4: FAILURE - READ_DMA status=51 error=40 LBA=79725056
> off-by-111... Are you sure 'smartctl -l error' reports only one error?

There is really only one error. The example from my e-mail is half a year old, but the disk has been running fine since then. The error occurred during the initial gmirror sync. No more errors have shown up since I rewrote the disk with zeros.

As you can see, there are really two different numbers: LBA=79725056 in messages and LBA = 0x04c0826f = 79725167 in the SMART log.

r...@edith ~/# zcat /var/log/messages.3.bz2 | grep LBA
Sep 23 23:58:00 edith kernel: ad4: FAILURE - READ_DMA status=51 error=40 LBA=79725056

-

r...@edith ~/# smartctl -l error /dev/ad4
smartctl version 5.38 [amd64-portbld-freebsd7.2] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 6f 82 c0 44  Error: UNC at LBA = 0x04c0826f = 79725167

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 00 82 c0 44 00  25d+23:23:36.710  READ DMA
  c8 00 00 00 81 c0 44 00  25d+23:23:36.710  READ DMA
  c8 00 00 00 80 c0 44 00  25d+23:23:36.710  READ DMA
  c8 00 00 00 7f c0 44 00  25d+23:23:36.710  READ DMA
  c8 00 00 00 7e c0 44 00  25d+23:23:36.710  READ DMA

Miroslav Lachman
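One possible reading of the two numbers, offered as a sketch and not as something confirmed from the atadisk source: with 28-bit LBA addressing the sector address is spread over the SN, CL, CH and the low four bits of the DH register. Decoding the registers from the SMART log above gives both LBAs, which suggests (an assumption on my part) that the kernel message shows the start of the failing READ DMA request while SMART shows the exact sector that could not be read. The function name is hypothetical.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from ATA task-file register values.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23 and the low nibble of
    DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the error: SN=6f CL=82 CH=c0 DH=44
smart_lba = lba28_from_registers(0x6F, 0x82, 0xC0, 0x44)

# First command in the list: SN=00 CL=82 CH=c0 DH=44
request_lba = lba28_from_registers(0x00, 0x82, 0xC0, 0x44)

print(smart_lba)                 # 79725167, as printed by smartctl
print(request_lba)               # 79725056, as printed in /var/log/messages
print(smart_lba - request_lba)   # 111, the "off-by-111"
```

If that reading is right, the SMART value is the one to use when looking for the affected file, and the difference of 111 is just the position of the bad sector inside the request.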
Re: A tool for remapping bad sectors in CURRENT?
On Wed, 17 Mar 2010 12:41:33 +0100
Miroslav Lachman <000.f...@quip.cz> wrote:
> I absolutely don't understand how you get the number 4 (it is some magic
> for me :]) but it works!
>
> fsdb (inum: 3)> blocks
> Blocks for inode 3:
> Direct blocks:
> 3001 (1 frag)
>
> 3001 * 4 = 12004
>
> fsdb (inum: 3)> findblk 12004
> 12004: data block of inode 3
>
> Thank you for this hint!

Umm, it's standard C code: 1 << 2 = 4. It's a power of 2, in this case 2 squared.

---
Gary Jennejohn
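As for why the shift is 2 and not 5, a hedged guess: in FFS the block numbers printed by fsdb count fragments, not full blocks, so the shift would follow from the fragment size. The sketch below assumes 512-byte sectors and a 2048-byte fragment size (the usual default with 16 kB blocks); neither value is stated in this thread, and the function name is made up for illustration.

```python
def fsbtodb_shift(frag_size, sector_size=512):
    """Return the shift that converts fragment numbers to sector numbers.

    Both sizes are in bytes; their ratio must be a power of two.
    """
    ratio, remainder = divmod(frag_size, sector_size)
    if remainder or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("fragment size must be a power-of-two multiple of the sector size")
    return ratio.bit_length() - 1


shift = fsbtodb_shift(2048)      # assumed fragment size
print(shift, 1 << shift)         # 2 4

full_block = fsbtodb_shift(16384)
print(full_block, 1 << full_block)  # 5 32, the factor that did not work
```

Checking the real fsize with ffsinfo or dumpfs on the file system in question would confirm or refute this.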
Re: A tool for remapping bad sectors in CURRENT?
Miroslav Lachman <000.f...@quip.cz> writes:
> Dag-Erling Smørgrav writes:
> > Uh, 79725167 - 63 = 79725104 and 79725104 - 39845888 = 39879216. How
> > did you arrive at 39879105?
> I am sorry, it was my confusion.
> My calculation was for *LBA=79725056* reported in messages:
>
> ad4: FAILURE - READ_DMA status=51 error=40
> LBA=79725056

off-by-111... Are you sure 'smartctl -l error' reports only one error?

DES
--
Dag-Erling Smørgrav - d...@des.no
Re: A tool for remapping bad sectors in CURRENT?
Gary Jennejohn wrote:
> On Sun, 14 Mar 2010 17:18:45 +0100
> Miroslav Lachman <000.f...@quip.cz> wrote:
> > Gary Jennejohn wrote:
> > > On Sun, 14 Mar 2010 10:55:19 +0100
> > > Miroslav Lachman <000.f...@quip.cz> wrote:
> > >
> > > [big snip]
> > > > fsdb (inum: 3)> blocks
> > > > Blocks for inode 3:
> > > > Direct blocks:
> > > > 3001 (1 frag)
> > > >
> > > > fsdb (inum: 3)> findblk 3001
> > > > fsdb (inum: 3)>
> > > >
> > > > findblk did not return inode 3!
> > >
> > > This is almost guaranteed to be a file system block and not
> > > a disk block.
> >
> > Do you mean the number 3001?
> > I am sorry for my ignorance, but it is not clear to me from the fsdb
> > manpage which "blocks" are FS blocks and which are disk blocks.
> >
> > And how can I use (calculate with) these numbers?
> >
> > How can I get the right number to pass to the findblk command (in the
> > example above) to give me back inode 3?
> >
> > If the FS block is 16384 bytes, then it means 16384/512 = 32 disk
> > blocks per FS block.
> >
> > If 3001 is an FS block, then it means 3001*32 = 96032 as the disk block
> > number. Am I right?
> >
> > fsdb (inum: 3)> findblk 96032
> > fsdb (inum: 3)>
> >
> > Again - findblk did not return inode 3.
> >
> > So what is the exact formula to get the right findblk number and then
> > the right inode number as the result of the findblk command?
> >
> > I am still lost in terms (words) and numbers :(
>
> Well, it's pretty hairy. Looking at findblk() it does this to go from
> disk block to file system block (this is greatly simplified)
>
>     file_system_blockno = disk_blockno >> fs_fsbtodb;
>
> So conversely, you'd do disk_blockno = file_system_blockno << fs_fsbtodb.
>
> You can get this information using
> "ffsinfo -l 0x001 -o some_file /dev/ataXY" (using ahci) and grep'ing for
> fsbtodb in some_file. The 0x001 means to only dump the first super block.
>
> I looked at a file system which has default 16kB file system blocks and
> fsbtodb is 2 ==> *multiply file_system_block by 4 not 32*. This is
> probably because it's a multiple of a 4kB block, which is the smallest
> usable file system block size AFAIK.
>
> BTW looking at the code leads me to conclude that fsdb will not print
> out anything if the disk block you're trying to find has never been
> allocated to an inode ==> unused disk block, safe to overwrite. This
> assumes that you calculated the disk block correctly.

I absolutely don't understand how you get the number 4 (it is some magic for me :]) but it works!

fsdb (inum: 3)> blocks
Blocks for inode 3:
Direct blocks:
3001 (1 frag)

3001 * 4 = 12004

fsdb (inum: 3)> findblk 12004
12004: data block of inode 3

Thank you for this hint!

Miroslav Lachman
Re: A tool for remapping bad sectors in CURRENT?
Dag-Erling Smørgrav wrote:
> Miroslav Lachman <000.f...@quip.cz> writes:
> > The LBA of bad sector is *79725167* [...] s1 starts 63 sectors from
> > the beginning of the drive and /var/db has offset 39845888. So am I
> > right that I need to find block number *39879105* by findblk command?
> Uh, 79725167 - 63 = 79725104 and 79725104 - 39845888 = 39879216. How
> did you arrive at 39879105?

I am sorry, it was my confusion. My calculation was for *LBA=79725056* reported in messages:

ad4: FAILURE - READ_DMA status=51 error=40 LBA=79725056

79725056 - 63 - 39845888 = *39879105*

Your calculation is for the LBA reported by the SMART log:

40 51 00 6f 82 c0 44 Error: UNC at LBA = 0x04c0826f = *79725167*

That's why I get a different result ;) I must pay more attention to the numbers next time!

It is interesting that there are two different LBAs for the "same" error (they appeared at the same time).

Miroslav Lachman
Re: A tool for remapping bad sectors in CURRENT?
Miroslav Lachman <000.f...@quip.cz> writes:
> As I write in my first post to this thread, I already tried fsdb +
> findblk, but without success. Findblk did not returned any inode.
> Maybe the meaning of block is of different size or something else I
> can't understand.

AFAICT, "block" is a disk block (i.e. 512-byte sector in most cases) relative to the start of the partition.

> The LBA of bad sector is *79725167* [...] s1 starts 63 sectors from
> the beginning of the drive and /var/db has offset 39845888. So am I
> right that I need to find block number *39879105* by findblk command?

Uh, 79725167 - 63 = 79725104 and 79725104 - 39845888 = 39879216. How did you arrive at 39879105?

DES
--
Dag-Erling Smørgrav - d...@des.no
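The arithmetic above can be sketched as a small helper. The function name is hypothetical; the numbers are the ones from this thread (slice start 63, /var/db partition offset 39845888), and all values are in 512-byte sectors.

```python
def partition_relative_block(lba, slice_start, partition_offset):
    """Convert an absolute disk LBA into a block number relative to the
    start of a partition inside a slice (all values in sectors)."""
    block = lba - slice_start - partition_offset
    if block < 0:
        raise ValueError("LBA lies before the start of this partition")
    return block


SLICE_START = 63            # s1 starts 63 sectors into the drive
VAR_DB_OFFSET = 39845888    # offset of the /var/db partition within s1

# LBA from the SMART log
print(partition_relative_block(79725167, SLICE_START, VAR_DB_OFFSET))  # 39879216

# LBA from the kernel message
print(partition_relative_block(79725056, SLICE_START, VAR_DB_OFFSET))  # 39879105
```

The two results differ by the same 111 sectors as the two reported LBAs, so which one is passed to findblk depends on which LBA is taken as the real location of the bad sector.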
Re: A tool for remapping bad sectors in CURRENT?
On Sun, 14 Mar 2010 17:18:45 +0100
Miroslav Lachman <000.f...@quip.cz> wrote:
> Gary Jennejohn wrote:
> > On Sun, 14 Mar 2010 10:55:19 +0100
> > Miroslav Lachman <000.f...@quip.cz> wrote:
> >
> > [big snip]
> >> fsdb (inum: 3)> blocks
> >> Blocks for inode 3:
> >> Direct blocks:
> >> 3001 (1 frag)
> >>
> >> fsdb (inum: 3)> findblk 3001
> >> fsdb (inum: 3)>
> >>
> >> findblk did not return inode 3!
> >>
> >
> > This is almost guaranteed to be a file system block and not
> > a disk block.
>
> Do you mean the number 3001?
> I am sorry for my ignorance, but it is not clear to me from the fsdb
> manpage which "blocks" are FS blocks and which are disk blocks.
>
> And how can I use (calculate with) these numbers?
>
> How can I get the right number to pass to the findblk command (in the
> example above) to give me back inode 3?
>
> If the FS block is 16384 bytes, then it means 16384/512 = 32 disk blocks
> per FS block.
>
> If 3001 is an FS block, then it means 3001*32 = 96032 as the disk block
> number. Am I right?
>
> fsdb (inum: 3)> findblk 96032
> fsdb (inum: 3)>
>
> Again - findblk did not return inode 3.
>
> So what is the exact formula to get the right findblk number and then
> the right inode number as the result of the findblk command?
>
> I am still lost in terms (words) and numbers :(
>

Well, it's pretty hairy. Looking at findblk() it does this to go from disk block to file system block (this is greatly simplified)

    file_system_blockno = disk_blockno >> fs_fsbtodb;

So conversely, you'd do disk_blockno = file_system_blockno << fs_fsbtodb.

You can get this information using "ffsinfo -l 0x001 -o some_file /dev/ataXY" (using ahci) and grep'ing for fsbtodb in some_file. The 0x001 means to only dump the first super block.

I looked at a file system which has default 16kB file system blocks and fsbtodb is 2 ==> *multiply file_system_block by 4 not 32*. This is probably because it's a multiple of a 4kB block, which is the smallest usable file system block size AFAIK.

BTW looking at the code leads me to conclude that fsdb will not print out anything if the disk block you're trying to find has never been allocated to an inode ==> unused disk block, safe to overwrite. This assumes that you calculated the disk block correctly.

---
Gary Jennejohn
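The two shift formulas can be tried out in a few lines of Python. This is only a sketch of the simplified conversion described in this message, not the fsdb source; the function names are invented, and the shift of 2 is the value reported here for one particular file system, so check your own with ffsinfo before relying on it.

```python
def fs_block_to_disk_block(fs_block, fsbtodb):
    """disk_blockno = file_system_blockno << fs_fsbtodb"""
    return fs_block << fsbtodb


def disk_block_to_fs_block(disk_block, fsbtodb):
    """file_system_blockno = disk_blockno >> fs_fsbtodb"""
    return disk_block >> fsbtodb


FSBTODB = 2  # value seen on the 16kB-block file system mentioned above

# Block 3001 reported by "blocks" for inode 3
print(fs_block_to_disk_block(3001, FSBTODB))    # 12004

# Going the other way: every sector 12004..12007 maps back to block 3001
print(disk_block_to_fs_block(12004, FSBTODB))   # 3001
print(disk_block_to_fs_block(12007, FSBTODB))   # 3001
```

Because the right shift discards the low bits, any sector inside the same file system block leads back to the same block number, which is convenient when starting from a bad-sector LBA.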
Re: A tool for remapping bad sectors in CURRENT?
Gary Jennejohn wrote:
> On Sun, 14 Mar 2010 10:55:19 +0100
> Miroslav Lachman <000.f...@quip.cz> wrote:
>
> [big snip]
> > fsdb (inum: 3)> blocks
> > Blocks for inode 3:
> > Direct blocks:
> > 3001 (1 frag)
> >
> > fsdb (inum: 3)> findblk 3001
> > fsdb (inum: 3)>
> >
> > findblk did not return inode 3!
>
> This is almost guaranteed to be a file system block and not
> a disk block.

Do you mean the number 3001? I am sorry for my ignorance, but it is not clear to me from the fsdb manpage which "blocks" are FS blocks and which are disk blocks.

And how can I use (calculate with) these numbers?

How can I get the right number to pass to the findblk command (in the example above) to give me back inode 3?

If the FS block is 16384 bytes, then it means 16384/512 = 32 disk blocks per FS block.

If 3001 is an FS block, then it means 3001*32 = 96032 as the disk block number. Am I right?

fsdb (inum: 3)> findblk 96032
fsdb (inum: 3)>

Again - findblk did not return inode 3.

So what is the exact formula to get the right findblk number and then the right inode number as the result of the findblk command?

I am still lost in terms (words) and numbers :(

Miroslav Lachman
Re: A tool for remapping bad sectors in CURRENT?
On Sun, 14 Mar 2010 10:55:19 +0100
Miroslav Lachman <000.f...@quip.cz> wrote:

[big snip]
> fsdb (inum: 3)> blocks
> Blocks for inode 3:
> Direct blocks:
> 3001 (1 frag)
>
> fsdb (inum: 3)> findblk 3001
> fsdb (inum: 3)>
>
> findblk did not returned inode 3!
>

This is almost guaranteed to be a file system block and not a disk block.

---
Gary Jennejohn
Re: A tool for remapping bad sectors in CURRENT?
Dag-Erling Smørgrav wrote:
> Miroslav Lachman <000.f...@quip.cz> writes:
> > So... can somebody with enough knowledge write some docs / script how
> > to find the affected file based on LBA read error from messages /
> > SMART log?
> ZFS will tell you straight away, but I guess if you used ZFS, you
> wouldn't be asking :)

Yes, but we have ZFS on only two servers; the others are using UFS2 (some with gmirror, some with gjournal).

> For FFS, you can unmount the file system (boot from a CD or memory
> stick or whatever if that file system is / or /usr), run fsdb on the
> failing disk, use findblk to look up the inode number for the file that
> contains the bad sector. Note that you have to convert the LBA to an
> offset relative to the start of the partition.

As I wrote in my first post to this thread, I already tried fsdb + findblk, but without success. findblk did not return any inode. Maybe "block" means a different size, or there is something else I don't understand.

So can you please show me some real-world example?

I have one from the past:

__
/var/log/messages:

Sep 23 23:58:00 edith kernel: ad4: FAILURE - READ_DMA status=51 error=40 LBA=79725056
Sep 23 23:58:00 edith kernel: GEOM_MIRROR: Request failed (error=5). ad4[READ(offset=40819228672, length=131072)]

__
SMART log:

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 6f 82 c0 44  Error: UNC at LBA = 0x04c0826f = 79725167

The LBA of the bad sector is *79725167*

__
Information about disk slices:

sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 209712447 (102398 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 254/ sector 63
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 209712510, size 1743807555 (851468 Meg), flag 0
        beg: cyl 1023/ head 255/ sector 63;
        end: cyl 1023/ head 254/ sector 63

__
According to the LBA and the size of s1, I think the error is in s1

# /dev/mirror/gm0s1:
8 partitions:
#          size     offset  fstype  [fsize bsize bps/cpg]
  a:    2097152          0          /
  b:   25165824    2097152  swap
  c:  209712447          0
  d:   12582912   27262976          /var
  e:  146800640   39845888          /var/db
  f:   16777216  186646528          /usr
  g:    6288703  203423744          /tmp

And LBA 79725056 is on */var/db* (between offset 39845888 and 186646528)

__
s1 starts 63 sectors from the beginning of the drive and /var/db has offset 39845888. So am I right that I need to find block number *39879105* with the findblk command?

LBA err - s1 start - /var/db offset = findblk inside /dev/mirror/gm0s1e
79725056 - 63 - 39845888 = 39879105

__
/# fsdb -r /dev/mirror/gm0s1e
** /dev/mirror/gm0s1e (NO WRITE)
Examining file system '/dev/mirror/gm0s1e'
Last Mounted on /var/db
current inode: directory
I=2 MODE=40755 SIZE=512
        BTIME=May 1 08:07:23 2009 [0 nsec]
        MTIME=Sep 24 15:52:01 2009 [0 nsec]
        CTIME=Sep 24 15:52:01 2009 [0 nsec]
        ATIME=Sep 24 16:24:34 2009 [0 nsec]
OWNER=root GRP=wheel LINKCNT=11 FLAGS=0 BLKCNT=4 GEN=4ebc65fc
findblk 39879105
findblk 39879106
findblk 39879107
findblk 39879108
.
.

I tried more than 256 incrementing block numbers, but findblk didn't find any inode! (length=131072 in the error message means 256 sectors, right?)

So there must be some misunderstanding on my part, and that's why I am asking for some step-by-step documentation or a script for "how to find a file by LBA read error message".

I tried fsdb + findblk on well-known data, but again without success. I created the file /tmp/test.txt, which has inum 3, then I used fsdb on gm0s1f (gm0s1f is mounted as /tmp). The command "inode 3" at the fsdb prompt returned information about this file and the command "blocks" returned 3001 as the block number, but the command "findblk 3001" returned nothing instead of inum 3! Where is the error? What am I doing wrong?

__
~/# echo test > /tmp/test.txt
~/# ls -i /tmp/test.txt
3 /tmp/test.txt

~/# fsdb -r /dev/mirror/gm0s1f
** /dev/mirror/gm0s1f (NO WRITE)
Examining file system '/dev/mirror/gm0s1f'
Last Mounted on /tmp
current inode: directory
I=2 MODE=41777 SIZE=512
        BTIME=Feb 7 18:32:22 2008 [0 nsec]
        MTIME=Mar 14 10:33:22 2010 [0 nsec]
        CTIME=Mar 14 10:33:22 2010 [0 nsec]
        ATIME=Mar 14 10:33:35 2010 [0 nsec]
OWNER=root GRP=wheel LINKCNT=7 FLAGS=0 BLKCNT=4 GEN=3f7c9384
fsdb (inum: 2)> inode 3
current inode: regular file
I=3 MODE=100644 SIZE=5
        BTIME=Mar 14 10:33:22 2010 [0 nsec]
        MTIME=Mar 14 10:33:22 2010 [0 nsec]
        CTIME=Mar 14 10:33:22 2010 [0 nsec]
        ATIME=Mar 14 10:33:22 2010 [0 nsec]
OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=4 GEN=45c26de1
fsdb (inum: 3)> blocks
Blocks for inode 3:
Direct blocks:
3001 (1 frag)

fsdb (inum: 3)> findblk 3001
fsdb (inum: 3)>

findblk did not return inode 3!

> Unfortunately, you can't e
Re: A tool for remapping bad sectors in CURRENT?
Miroslav Lachman <000.f...@quip.cz> writes:
> So... can somebody with enough knowledge write some docs / script how
> to find the affected file based on LBA read error from messages /
> SMART log?

ZFS will tell you straight away, but I guess if you used ZFS, you wouldn't be asking :)

For FFS, you can unmount the file system (boot from a CD or memory stick or whatever if that file system is / or /usr), run fsdb on the failing disk, use findblk to look up the inode number for the file that contains the bad sector. Note that you have to convert the LBA to an offset relative to the start of the partition.

Unfortunately, you can't easily go from inode to file name; you have to mount the file system and use something like find -inum.

DES
--
Dag-Erling Smørgrav - d...@des.no
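For the last step (inode number to file name), the same search that find -inum performs can be sketched in Python. This is only an illustration of walking a mounted file system and comparing inode numbers; the function name and the example mount point are hypothetical, and on a real system find -inum is simpler.

```python
import os


def find_by_inode(root, inum):
    """Return all paths under `root` whose inode number equals `inum`.

    Only entries on the same device as `root` are reported, because inode
    numbers are unique only within one file system.
    """
    root_dev = os.lstat(root).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is not accessible
            if st.st_ino == inum and st.st_dev == root_dev:
                matches.append(path)
    return matches


if __name__ == "__main__":
    # Example: look for inode 3 on the file system mounted at /tmp
    for path in find_by_inode("/tmp", 3):
        print(path)
```

A hard-linked file shows up once per name, which is the desired behaviour here since every name refers to the same damaged data.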
Re: A tool for remapping bad sectors in CURRENT?
Dag-Erling Smørgrav wrote:
> Miroslav Lachman <000.f...@quip.cz> writes:
> > Yes, rewriting by dd or any other way works for reallocating or
> > clearing pending sectors counter, but in server environment
> In a server environment, you'd be a fool not to have some sort of
> redundancy set up.

I am using gmirror on low-end servers, so rewriting some sectors on one disk drive is useless, and in this case I prefer a resync of the whole gmirror (but it is a long run - about 10 hours on a busy server).

> > I need to know the affected file, as it can be for example database
> > file and then it is a big problem! Rewriting the sector inside InnoDB
> > ib_data file can cause DB crash, data loss etc.
> How is that different from *not* rewriting the sector? If there's a bad
> sector somewhere in your data, your database is still going to crash.

It is not about "different", it is about "I need to know the affected file" so that I know what action should be taken. If it is some logfile, I can delete it and then rewrite the sector. If it is some "normal" unchanged file, I can restore it from backup. If it is a database file, I must take some special action: for example, stop the DB engine, try to repair/fix the DB file, dump & restore, etc.

So the first step is to find out which file is affected, then take some action AND rewrite the sector with dd to reallocate it (or replace the drive).

So... can somebody with enough knowledge write some docs / a script showing how to find the affected file based on the LBA read error from messages / the SMART log?

Miroslav Lachman
Re: A tool for remapping bad sectors in CURRENT?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 11.03.2010 16:21, Dag-Erling Smørgrav wrote: > Miroslav Lachman <000.f...@quip.cz> writes: >> Yes, rewriting by dd or any other way works for reallocating or >> clearing pending sectors counter, but in server environment > > In a server environment, you'd be a fool not to have some sort of > redundancy set up. > >> I need to know the affected file, as it can be for example database >> file and then it is a big problem! Rewriting the sector inside InnoDB >> ib_data file can cause DB crash, data loss etc. > > How is that different from *not* rewriting the sector? If there's a bad > sector somewhere in your data, your database is still going to crash. Only if he hasn't listened to your first advice and set it up on a non-redundant IO solution. If he's set it up on proper hardware, he'll just get a friendly mail about replacing said disk next time he's in the serverroom with a new fresh hostpare. //Svein - -- - +---+--- /"\ |Svein Skogen | sv...@d80.iso100.no \ / |Solberg Østli 9| PGP Key: 0xE5E76831 X|2020 Skedsmokorset | sv...@jernhuset.no / \ |Norway | PGP Key: 0xCE96CE13 | | sv...@stillbilde.net ascii | | PGP Key: 0x58CD33B6 ribbon |System Admin | svein-listm...@stillbilde.net Campaign|stillbilde.net | PGP Key: 0x22D494A4 +---+--- |msn messenger: | Mobile Phone: +47 907 03 575 |sv...@jernhuset.no | RIPE handle:SS16503-RIPE - +---+--- If you really are in a hurry, mail me at svein-mob...@stillbilde.net This mailbox goes directly to my cellphone and is checked even when I'm not in front of my computer. 
- Picture Gallery: https://gallery.stillbilde.net/v/svein/ - -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.12 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkuZCyQACgkQODUnwSLUlKS2ngCgqF+bE4SqHC39lYAoMpQG1Ysb IzcAoLusP1O4LV0CDoq3GSXjV3YGDLDk =Ljac -END PGP SIGNATURE- ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: A tool for remapping bad sectors in CURRENT?
Miroslav Lachman <000.f...@quip.cz> writes:
> Yes, rewriting by dd or any other way works for reallocating or
> clearing pending sectors counter, but in server environment

In a server environment, you'd be a fool not to have some sort of redundancy set up.

> I need to know the affected file, as it can be for example database
> file and then it is a big problem! Rewriting the sector inside InnoDB
> ib_data file can cause DB crash, data loss etc.

How is that different from *not* rewriting the sector? If there's a bad sector somewhere in your data, your database is still going to crash.

DES
--
Dag-Erling Smørgrav - d...@des.no
Re: A tool for remapping bad sectors in CURRENT?
On Mon, 08.03.2010 at 13:09:19 +0200, Eugene Dzhurinsky wrote:
> On Mon, Mar 08, 2010 at 12:52:43PM +0200, Eugene Dzhurinsky wrote:
> > dd if=/dev/ad4 of=/dev/null skip=222342559 bs=512 count=1
> > dd: /dev/ad4: Input/output error
> > 0+0 records in
> > 0+0 records out
> > 0 bytes transferred in 2.351940 secs (0 bytes/sec)
> >
> > dd if=/dev/zero of=/dev/ad4 seek=222342559 bs=512 count=1
> > dd: /dev/ad4: Operation not permitted
> >
> > Should I do it in single mode?
>
> sysctl kern.geom.debugflags=0x10
>
> Did the trick, I was able to write directly to the sector, and now it seems to
> work well. No remaps recorded thus, but no errors so far.

It's too late now, but you really should have gone with something like

# recoverdisk /dev/ad4 /dev/ad4

this will re-write all blocks on the disk, it may fail at reading block 222342559, but there's a chance that the disk error correction gets it right after a couple of times.

Regards,
Uli
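For anyone repeating the dd procedure quoted above, note that the read test uses skip= (an offset into the input) while the rewrite uses seek= (an offset into the output). A small sketch that only builds the two command lines for a given LBA follows; it runs nothing, the helper name is made up, and the second command destroys the contents of that sector when executed.

```python
def dd_commands(device, lba, sector_size=512):
    """Return (read_cmd, write_cmd) for a single sector. Nothing is executed."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    # Read test: skip= counts blocks of the *input* device.
    read_cmd = f"dd if={device} of=/dev/null skip={lba} bs={sector_size} count=1"
    # DESTRUCTIVE when run: seek= counts blocks of the *output* device and
    # the sector is overwritten with zeros.
    write_cmd = f"dd if=/dev/zero of={device} seek={lba} bs={sector_size} count=1"
    return read_cmd, write_cmd


read_cmd, write_cmd = dd_commands("/dev/ad4", 222342559)
print(read_cmd)
print(write_cmd)
```

As reported in the quoted text, the write was refused with "Operation not permitted" until kern.geom.debugflags was set to 0x10.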
Re: A tool for remapping bad sectors in CURRENT?
08.03.2010 13:29, Eugeny N Dzhurinsky wrote:
> Hello, all!
>
> Recently I've started to see the following logs in messages:
>
> Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
> Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Offline uncorrectable sectors
>
> smartctl did really show that something is wrong with my HDD, but still no
> remaps - just read errors.
>
> SMART Self-test log structure revision number 1
> Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline  Completed: read failure  60%        1198             222342559
> # 2  Extended offline  Completed: read failure  60%        1187             222342557
> # 3  Extended offline  Completed: read failure  60%        1180             222342559
> # 4  Short offline     Completed without error  00%        1178             -
> # 5  Extended offline  Aborted by host          90%        1178             -
>
> and
>
> ATTRIBUTE_NAME         FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
> ...
> Reallocated_Sector_Ct  0x0033  100    100    036     Pre-fail  Always   -            0
> ...
>
> Now can I find out which file owns the LBAs 222342557 and 222342559 ? How do I
> force remapping of these sectors? I assume that I have to write something
> directly to the sectors?

use mhdd
Re: A tool for remapping bad sectors in CURRENT?
Eugene Dzhurinsky wrote:
> On Mon, Mar 08, 2010 at 12:21:44PM +0100, Miroslav Lachman wrote:
> > Eugeny N Dzhurinsky wrote:
> > We have this problem from time to time on bunch of machines. As we are
> > using gmirror, the easiest way is to force re-synchronization (rewrite)
> > of the whole drive. The problem is when there are Pending unreadable
> > sectors on both drives - it ends up with read error and some file(s) are
> > corrupted, but there is no easy way (on FreeBSD) to find what file.
> >
> > I tried it in the past with fsdb / findblk, but it does not work as I
> > expect or I do not fully understand the needed calculations with slices
> > + partitions offsets / LBAs and right meaning of the term "block". It
> > seems there are several meaning in different contexts.
> >
> > It would be nice if somebody with enough FS / GEOM knowledge can write
> > some HowTo or shell script to do the calculations and operations to find
> > file containing bad sector(s) and put it in FAQ, Handbook, or Wiki.
>
> Miroslav, thank you for the suggestion - but I am not using gmirror, that HDD
> is the one on my laptop. However suggestions about using dd to write something
> into bad block to force IDE controller do it's service stuff about remapping
> seems did the trick. And I was able to not calculate LBA but use it as block
> offset, which seemed to be correct way :)

Yes, rewriting with dd or any other tool works for reallocating the sector or clearing the pending-sectors counter, but in a server environment I need to know the affected file, as it can be, for example, a database file, and then it is a big problem! Rewriting a sector inside the InnoDB ib_data file can cause a DB crash, data loss, etc.

Miroslav Lachman
Re: A tool for remapping bad sectors in CURRENT?
On Mon, Mar 08, 2010 at 12:21:44PM +0100, Miroslav Lachman wrote:
> Eugeny N Dzhurinsky wrote:
> We have this problem from time to time on bunch of machines. As we are
> using gmirror, the easiest way is to force re-synchronization (rewrite)
> of the whole drive. The problem is when there are Pending unreadable
> sectors on both drives - it ends up with read error and some file(s) are
> corrupted, but there is no easy way (on FreeBSD) to find what file.
>
> I tried it in the past with fsdb / findblk, but it does not work as I
> expect or I do not fully understand the needed calculations with slices
> + partitions offsets / LBAs and right meaning of the term "block". It
> seems there are several meaning in different contexts.
>
> It would be nice if somebody with enough FS / GEOM knowledge can write
> some HowTo or shell script to do the calculations and operations to find
> file containing bad sector(s) and put it in FAQ, Handbook, or Wiki.

Miroslav, thank you for the suggestion - but I am not using gmirror; that HDD is the one in my laptop. However, the suggestion to use dd to write something into the bad block, forcing the IDE controller to do its remapping work, seems to have done the trick. And I did not have to calculate anything: I could use the LBA directly as the block offset, which seemed to be the correct way :)

--
Eugene N Dzhurinsky
Re: A tool for remapping bad sectors in CURRENT?
On Mon, 8 Mar 2010, Miroslav Lachman wrote:
> Eugeny N Dzhurinsky wrote:
> > Hello, all!
> >
> > Recently I've started to see the following logs in messages:
> >
> > Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
> > Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Offline uncorrectable sectors
> >
> > smartctl did really show that something is wrong with my HDD, but
> > still no remaps - just read errors.
> >
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Completed: read failure       60%      1198         222342559
> > # 2  Extended offline    Completed: read failure       60%      1187         222342557
> > # 3  Extended offline    Completed: read failure       60%      1180         222342559
> > # 4  Short offline       Completed without error       00%      1178         -
> > # 5  Extended offline    Aborted by host               90%      1178         -
> >
> > and
> >
> > ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> > ...
> > Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> > ...
> >
> > Now can I find out which file owns the LBAs 222342557 and 222342559?
> > How do I force remapping of these sectors? I assume that I have to
> > write something directly to the sectors?
>
> We have this problem from time to time on a bunch of machines. As we
> are using gmirror, the easiest way is to force re-synchronization
> (rewrite) of the whole drive. The problem is when there are pending
> unreadable sectors on both drives - it ends up with a read error and
> some file(s) are corrupted, but there is no easy way (on FreeBSD) to
> find which file.

*cough* zfs *cough*

I believe this kind of silent corruption is precisely what ZFS was
designed to prevent. Even though you do have a mirror, how do you know
which copy is the correct one? If one drive re-allocates the sector
silently, what is the recovery method? If gmirror synchronizes, how do
you make sure that the *good* copy is the one synchronized? You'll
notice it eventually if you see it in a garbled file, but how does the
filesystem handle it?

> I tried it in the past with fsdb / findblk, but it does not work as I
> expect, or I do not fully understand the needed calculations with
> slice + partition offsets / LBAs and the right meaning of the term
> "block". It seems to have several meanings in different contexts.
>
> It would be nice if somebody with enough FS / GEOM knowledge could
> write a HowTo or shell script to do the calculations and operations
> to find the file containing the bad sector(s) and put it in the FAQ,
> Handbook, or Wiki.
Re: A tool for remapping bad sectors in CURRENT?
Eugeny N Dzhurinsky wrote:
> Hello, all!
>
> Recently I've started to see the following logs in messages:
>
> Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
> Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Offline uncorrectable sectors
>
> smartctl did really show that something is wrong with my HDD, but
> still no remaps - just read errors.
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       60%      1198         222342559
> # 2  Extended offline    Completed: read failure       60%      1187         222342557
> # 3  Extended offline    Completed: read failure       60%      1180         222342559
> # 4  Short offline       Completed without error       00%      1178         -
> # 5  Extended offline    Aborted by host               90%      1178         -
>
> and
>
> ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> ...
> Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> ...
>
> Now can I find out which file owns the LBAs 222342557 and 222342559?
> How do I force remapping of these sectors? I assume that I have to
> write something directly to the sectors?

We have this problem from time to time on a bunch of machines. As we
are using gmirror, the easiest way is to force re-synchronization
(rewrite) of the whole drive. The problem is when there are pending
unreadable sectors on both drives - it ends up with a read error and
some file(s) are corrupted, but there is no easy way (on FreeBSD) to
find which file.

I tried it in the past with fsdb / findblk, but it does not work as I
expect, or I do not fully understand the needed calculations with
slice + partition offsets / LBAs and the right meaning of the term
"block". It seems to have several meanings in different contexts.

It would be nice if somebody with enough FS / GEOM knowledge could
write a HowTo or shell script to do the calculations and operations
to find the file containing the bad sector(s) and put it in the FAQ,
Handbook, or Wiki.

Miroslav Lachman
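
The offset arithmetic asked about in this message could be sketched roughly as below. This is only an illustration under stated assumptions, not a tested recovery procedure: 512-byte sectors are assumed, the partition start (63) and fragment size (2048) are made-up sample values that would really come from the disk label and from dumpfs, and both function names are invented for this example. As far as the fsdb manual page describes it, findblk expects block numbers relative to the start of the partition rather than absolute LBAs, which is what the first helper computes.

```shell
#!/bin/sh
# Hypothetical helpers; names and sample values are illustrative only.

# Convert an absolute LBA into a sector number relative to the partition.
#   $1 = absolute LBA (e.g. from the SMART self-test log)
#   $2 = first sector of the partition (assumed known from the disk label)
lba_to_partition_sector() {
    echo $(( $1 - $2 ))
}

# Convert a partition-relative sector into a filesystem fragment number.
#   $1 = partition-relative sector
#   $2 = sector size in bytes (512 assumed in this thread)
#   $3 = fragment size in bytes (assumed value; check dumpfs output)
sector_to_fragment() {
    echo $(( $1 * $2 / $3 ))
}

# Sample run with the LBA from this thread and made-up partition values.
rel=$(lba_to_partition_sector 222342559 63)
frag=$(sector_to_fragment "$rel" 512 2048)
echo "partition-relative sector: $rel"
echo "filesystem fragment:       $frag"
```

With nested slices and partitions the start offsets would have to be added together before subtracting, which is probably where the confusion about the meaning of "block" comes from.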
Re: A tool for remapping bad sectors in CURRENT?
On Mon, Mar 08, 2010 at 12:52:43PM +0200, Eugene Dzhurinsky wrote:
> dd if=/dev/ad4 of=/dev/null skip=222342559 bs=512 count=1
> dd: /dev/ad4: Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes transferred in 2.351940 secs (0 bytes/sec)
>
> dd if=/dev/zero of=/dev/ad4 seek=222342559 bs=512 count=1
> dd: /dev/ad4: Operation not permitted
>
> Should I do it in single-user mode?

    sysctl kern.geom.debugflags=0x10

did the trick. I was able to write directly to the sector, and now it
seems to work well. No remaps recorded thus far, but no errors either.

Thanks a lot!

--
Eugene N Dzhurinsky
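
For reference, the sequence that worked in this message can be collected into a small dry-run helper. This is a sketch only: the function name is invented, 512-byte sectors are assumed, and it deliberately prints the commands instead of running them, since the write step overwrites whatever was stored in that sector.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands, does not execute them.
#   $1 = device (e.g. /dev/ad4)
#   $2 = LBA of the unreadable sector, assuming 512-byte sectors
print_rewrite_commands() {
    dev=$1
    lba=$2
    # Allow writes to a device that is in use (the flag used in this thread).
    echo "sysctl kern.geom.debugflags=0x10"
    # Read the sector first to confirm that it really fails.
    echo "dd if=$dev of=/dev/null skip=$lba bs=512 count=1"
    # Overwrite the sector with zeros so the drive can reallocate it.
    echo "dd if=/dev/zero of=$dev seek=$lba bs=512 count=1"
    # Read it again to check that the error is gone.
    echo "dd if=$dev of=/dev/null skip=$lba bs=512 count=1"
}

print_rewrite_commands /dev/ad4 222342559
```

Reviewing the printed lines before pasting them into a root shell leaves a chance to double-check the device name and LBA; restoring kern.geom.debugflags to its previous value afterwards is presumably a good idea as well.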
Re: A tool for remapping bad sectors in CURRENT?
On Mon, Mar 08, 2010 at 10:51:22AM +, Poul-Henning Kamp wrote:
> I would suggest you boot single-user and run
>
>     mdmfs -s 1m md /tmp
>     recoverdisk -w /tmp/_.wl /dev/ad4 /dev/ad4
>
> That will find out how many bad sectors you have and try to recover
> the contents of them if possible, leave it running as long as you
> care for.
>
> If you interrupt it, the /tmp/_.wl file will contain a list of areas
> not yet successfully read/written.

Well, I just want to force the IDE drive to remap things :)

--
Eugene N Dzhurinsky
Re: A tool for remapping bad sectors in CURRENT?
On Mon, Mar 08, 2010 at 12:31:24PM +0200, Alexander Motin wrote:
> You may try to overwrite these sectors with dd. It should trigger sector
> reallocation. To be sure, you may read them before and after the write.

    dd if=/dev/ad4 of=/dev/null skip=222342559 bs=512 count=1
    dd: /dev/ad4: Input/output error
    0+0 records in
    0+0 records out
    0 bytes transferred in 2.351940 secs (0 bytes/sec)

    dd if=/dev/zero of=/dev/ad4 seek=222342559 bs=512 count=1
    dd: /dev/ad4: Operation not permitted

Should I do it in single-user mode?

--
Eugene N Dzhurinsky
Re: A tool for remapping bad sectors in CURRENT?
In message <20100308102918.ga5...@localhost>, Eugeny N Dzhurinsky writes:
>Now can I find out which file owns the LBAs 222342557 and 222342559?
>How do I force remapping of these sectors? I assume that I have to write
>something directly to the sectors?

I would suggest you boot single-user and run

    mdmfs -s 1m md /tmp
    recoverdisk -w /tmp/_.wl /dev/ad4 /dev/ad4

That will find out how many bad sectors you have and try to recover
the contents of them if possible, leave it running as long as you
care for.

If you interrupt it, the /tmp/_.wl file will contain a list of areas
not yet successfully read/written.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
p...@freebsd.org         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: A tool for remapping bad sectors in CURRENT?
Eugeny N Dzhurinsky wrote:
> Recently I've started to see the following logs in messages:
>
> Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
> Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Offline uncorrectable sectors
>
> smartctl did really show that something is wrong with my HDD, but
> still no remaps - just read errors.
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       60%      1198         222342559
> # 2  Extended offline    Completed: read failure       60%      1187         222342557
> # 3  Extended offline    Completed: read failure       60%      1180         222342559
> # 4  Short offline       Completed without error       00%      1178         -
> # 5  Extended offline    Aborted by host               90%      1178         -
>
> and
>
> ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> ...
> Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> ...
>
> Now can I find out which file owns the LBAs 222342557 and 222342559?
> How do I force remapping of these sectors? I assume that I have to
> write something directly to the sectors?

You may try to overwrite these sectors with dd. It should trigger sector
reallocation. To be sure, you may read them before and after the write.

--
Alexander Motin
A tool for remapping bad sectors in CURRENT?
Hello, all!

Recently I've started to see the following logs in messages:

Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
Mar 8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Offline uncorrectable sectors

smartctl did really show that something is wrong with my HDD, but
still no remaps - just read errors.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%      1198         222342559
# 2  Extended offline    Completed: read failure       60%      1187         222342557
# 3  Extended offline    Completed: read failure       60%      1180         222342559
# 4  Short offline       Completed without error       00%      1178         -
# 5  Extended offline    Aborted by host               90%      1178         -

and

ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
...

Now can I find out which file owns the LBAs 222342557 and 222342559?
How do I force remapping of these sectors? I assume that I have to
write something directly to the sectors?

Thank you all in advance!

--
Eugene N Dzhurinsky
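
Since the self-test log reports the same LBA more than once, it may help to reduce it to the distinct failing sectors first. The following is a minimal sketch, assuming the LBA is always the last field of a "read failure" line as in the log quoted in this message; the function name is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: reads a SMART self-test log on stdin and prints each
# distinct LBA found on a "read failure" line (assumed to be the last field).
failing_lbas() {
    awk '/read failure/ && $NF ~ /^[0-9]+$/ { print $NF }' | sort -n -u
}

# Sample input copied from the log in this message; on a live system it
# would come from something like: smartctl -l selftest /dev/ad4
failing_lbas <<'EOF'
# 1  Extended offline    Completed: read failure       60%      1198         222342559
# 2  Extended offline    Completed: read failure       60%      1187         222342557
# 3  Extended offline    Completed: read failure       60%      1180         222342559
# 4  Short offline       Completed without error       00%      1178         -
# 5  Extended offline    Aborted by host               90%      1178         -
EOF
```

Note that each self-test stops at the first error it meets, so the list only covers the sectors found so far; further bad sectors may show up in later tests.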