Re: How to mark a block as invalid ?
Hello,

Thank you all for your answers.

First I would like to understand better what's happening. According to what I read, there are no blocks on the disk itself; the documentation refers to sectors. Then the OS, here OpenBSD, formats it with a block size. So from a physical point of view I have faulty sectors on my disk, right?

I bought this disk about a year ago. And OK, I write to a few files (~1000 rrd files) every minute, all year long. I'm surprised that you guys ask me to throw the disk away because a few blocks out of thousands are faulty.

Smartmontools doesn't complain about my disk:

# smartctl -H /dev/wd1c
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

# smartctl -l selftest /dev/wd1c
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Interrupted (host reset)  90%        25468            -

Josh, do you know which program I should use in the sleuthkit suite? I'm very interested in knowing which files I've lost. And do you know which badblocks option I should use to follow your suggestion?

In the end I will follow your recommendations and buy a new disk for my data. But I'll keep this one for a test server, and of course I'll put a red sticker on the disk and write "faulty disk" on it.

PS: Lee, your mail server is rejecting my mails: Access denied (in reply to MAIL FROM command)

From: Josh Grosse j...@jggimi.homeip.net
To: misc@openbsd.org misc@openbsd.org
Sent: Sunday 18 August 2013 3:26
Subject: Re: How to mark a block as invalid ?

On Sat, Aug 17, 2013 at 10:51:36PM +0100, Mik J wrote:
> Hello,
> In my message log file I have
> /bsd: wd1g: uncorrectable data error reading fsbn 27690576 of 27690560-27690591 (wd1 bn 1951859792; cn 121497 tn 166 sn 29), retrying
> I used the badblocks utility and checked the whole disk, and only this block number is faulty. I tried to overwrite it with zeros but no luck, impossible.
> Since I believe my disk is ok, I would like it to avoid reallocating the block number 27690576. How can I do it?

The OS reported that it failed to read the bad block -- therefore, the block is allocated -- to a file, directory, socket, or fifo. As the data within the block is now lost, the only recovery is from backup. I don't believe the OS has any built-in tools that can determine ownership from a block number.

The error could also have been produced if you or another admin just happened to be reading blocks outside of filesystem control, such as using dd(1) with if=/dev/... In that case, the block might be unallocated.

I haven't used it in some time, but if I recall correctly, sysutils/sleuthkit may be helpful in identifying block ownership.

The badblocks program from sysutils/e2fsprogs can be helpful in forcing a drive to reallocate the LBA from its set of spare blocks. The data will be lost, but the bad block LBA will become good again.
Re: How to mark a block as invalid ?
On Sun, Aug 18, 2013 at 01:00:07PM +0100, Mik J wrote:
> First I would like to understand better what's happening. According to what I read, there are no blocks on the disk itself; the documentation refers to sectors. Then the OS, here OpenBSD, formats it with a block size. So from a physical point of view I have faulty sectors on my disk, right?

My apologies for the confusion. On the drive hardware, one may consider sector, block, and LBA (Logical Block Address) as equivalent terms for the same thing -- an addressable storage location on the hard drive. This is different from filesystem data units -- in FFS, clusters and fragments -- which contain multiple sectors, and which you may be confusing with "block".

> I bought this disk about a year ago. And OK, I write to a few files (~1000 rrd files) every minute, all year long. I'm surprised that you guys ask me to throw the disk away because a few blocks out of thousands are faulty.
> Smartmontools doesn't complain about my disk:
>
> # smartctl -H /dev/wd1c
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> # smartctl -l selftest /dev/wd1c
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log structure revision number 1
> Num  Test_Description  Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline     Interrupted (host reset)  90%        25468            -

The short offline test is a test of drive electronics and mechanics, but does not test media. And your short offline test did not complete successfully; it was stopped prior to completion. A long offline test is a read test of media. Both tests are done by the drive electronics.

> Josh, do you know which program I should use in the sleuthkit suite? I'm very interested in knowing which files I've lost.

Unfortunately, it's been too long, and there are many sleuthkit utilities. You might consider running sleuthkit.org's Autopsy webserver, which automates the toolset. I've used it once or twice, and I would recommend it to occasional sleuthkit users.

> And do you know which badblocks option I should use to follow your suggestion?
> In the end I will follow your recommendations and buy a new disk for my data. But I'll keep this one for a test server, and of course I'll put a red sticker on the disk and write "faulty disk" on it.

Bad sectors (bad blocks, bad LBAs) can occur for a wide variety of reasons. The risk to continued use of the drive is only that your particular reason cannot be determined with assurance.

Drive electronics will automatically replace bad sectors from a built-in set of spares. To do that, the drive has to note that the sector is bad (the drive's electronics may report this to smartmontools as a non-zero pending sector count) and then receive a new write to the bad sector.

The badblocks -w option writes and reads a variety of data patterns to each sector on the drive (or to a subset of the drive), and this activity can clear bad blocks by causing the drive to substitute spare sector(s) for bad ones, using the same Logical Block Address (LBA) as the unusable sectors.

There is a badblocks -n option which conducts a non-destructive write test. I have not used it, but I assume it restores the contents after the pattern has been written and read.
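As a rough illustration of the -w and -n choice described above, here is a minimal Python sketch that only builds a badblocks command line for a single block range; it never runs it. The helper name badblocks_argv, the device path /dev/rwd1c, and the choice of -b 512 (so that block numbers line up with 512-byte sectors) are assumptions for illustration, not something stated in this thread. The block number is the "wd1 bn" value from the kernel message.

```python
def badblocks_argv(device, first_block, last_block, destructive=False, block_size=512):
    """Build (but do not run) a badblocks command line for one block range.

    badblocks takes its range as: device last_block first_block.
    -w is the destructive write test, -n the non-destructive read-write
    test; the two cannot be combined, so exactly one is chosen here.
    """
    if first_block > last_block:
        raise ValueError("first_block must not exceed last_block")
    mode = "-w" if destructive else "-n"
    return ["badblocks", mode, "-s", "-v", "-b", str(block_size),
            device, str(last_block), str(first_block)]


# Hypothetical device path; the absolute block number comes from the log line.
BAD_BLOCK = 1951859792
argv = badblocks_argv("/dev/rwd1c", BAD_BLOCK, BAD_BLOCK, destructive=True)
print(" ".join(argv))
```

Printing the command instead of executing it is deliberate: with -w the data in the range is destroyed, so the line should be reviewed by hand before anyone runs it.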
Re: How to mark a block as invalid ?
On 08/18/13 08:00, Mik J wrote:
> Hello,
> Thank you all for your answers.
> First I would like to understand better what's happening. According to what I read, there are no blocks on the disk itself; the documentation refers to sectors. Then the OS, here OpenBSD, formats it with a block size. So from a physical point of view I have faulty sectors on my disk, right?
> I bought this disk about a year ago. And OK, I write to a few files (~1000 rrd files) every minute, all year long. I'm surprised that you guys ask me to throw the disk away because a few blocks out of thousands are faulty.
> Smartmontools doesn't complain about my disk
[snip]

Once a disk sprouts an error, it cannot be trusted. This has always been true, but in an era of multi-hundred-gigabyte disks, any tiny particles floating around inside a disk are like large rocks pummeling its insides.

You MAY be OK and have just a few bad sectors, but the uneasy question is: will you get more, how many, and where? Not to mention all the other failures that can happen.

I have had disks with bad sectors that I mapped out by never touching the files where the bad spots were, and had the disk live for five years. I've also seen a case where a friend showed me a disk that had one bad sector on it, and tens of thousands the next day.

You are dancing on a volcano. I hope it doesn't erupt on you. Make backups. Rsync is a good friend. Really.

--STeve Andre'
Re: How to mark a block as invalid ?
2013/8/17 Mik J mikyde...@yahoo.fr:
> I used the badblocks utility and checked the whole disk, and only this block number is faulty.

What do the smartmontools tell you?

> Since I believe my disk is ok

It is not. Do you have backups?

Best
Martin
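For reference, the single self-test log row posted elsewhere in this thread can be read mechanically. The following Python sketch is only an illustration: the regular expression is an assumption that handles just "Short offline" and "Extended offline" rows of this shape, and the field names are made up for the example. It shows that the one logged test was interrupted with 90% still remaining, so it says nothing about the media.

```python
import re

# Row copied from the smartctl -l selftest output posted in the thread.
ROW = "# 1 Short offline Interrupted (host reset) 90% 25468 -"

# Assumed layout: number, test name, status text, percent remaining,
# power-on hours at the time of the test, LBA of first error (or "-").
ROW_RE = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)$"
)


def parse_selftest_row(row):
    """Split one self-test log row into named fields."""
    match = ROW_RE.match(row.strip())
    if match is None:
        raise ValueError("unrecognised self-test row")
    fields = match.groupdict()
    return {
        "test": fields["test"],
        "status": fields["status"],
        "remaining_percent": int(fields["remaining"]),
        "power_on_days": int(fields["hours"]) / 24,
        "finished": fields["status"].startswith("Completed"),
    }


summary = parse_selftest_row(ROW)
print(summary)
```

A row whose status begins with "Completed" and whose remaining percentage is 0 would be the thing to look for after a long (extended) test.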
Re: How to mark a block as invalid ?
On Sat, Aug 17, 2013 at 10:51:36PM +0100, Mik J wrote:
> Hello,
> In my message log file I have
> /bsd: wd1g: uncorrectable data error reading fsbn 27690576 of 27690560-27690591 (wd1 bn 1951859792; cn 121497 tn 166 sn 29), retrying
> I used the badblocks utility and checked the whole disk, and only this block number is faulty. I tried to overwrite it with zeros but no luck, impossible.
> Since I believe my disk is ok, I would like it to avoid reallocating the block number 27690576. How can I do it?

The OS reported that it failed to read the bad block -- therefore, the block is allocated -- to a file, directory, socket, or fifo. As the data within the block is now lost, the only recovery is from backup. I don't believe the OS has any built-in tools that can determine ownership from a block number.

The error could also have been produced if you or another admin just happened to be reading blocks outside of filesystem control, such as using dd(1) with if=/dev/... In that case, the block might be unallocated.

I haven't used it in some time, but if I recall correctly, sysutils/sleuthkit may be helpful in identifying block ownership.

The badblocks program from sysutils/e2fsprogs can be helpful in forcing a drive to reallocate the LBA from its set of spare blocks. The data will be lost, but the bad block LBA will become good again.
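The numbers in the kernel message can be cross-checked with a little arithmetic. The Python sketch below is an illustration under two assumptions that the thread does not state: that fsbn counts sectors from the start of partition g while "wd1 bn" counts from the start of the disk, and that the reported geometry is 255 heads by 63 sectors per track. The function and variable names are invented for the example.

```python
import re

# Kernel message quoted verbatim from the thread.
LOG = ("wd1g: uncorrectable data error reading fsbn 27690576 of "
       "27690560-27690591 (wd1 bn 1951859792; cn 121497 tn 166 sn 29), retrying")

PATTERN = re.compile(
    r"fsbn (?P<fsbn>\d+) of (?P<first>\d+)-(?P<last>\d+) "
    r"\(\w+ bn (?P<bn>\d+); cn (?P<cn>\d+) tn (?P<tn>\d+) sn (?P<sn>\d+)\)"
)


def parse_disk_error(line):
    """Return the numeric fields of the disk error line as integers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a recognised disk error line")
    return {name: int(value) for name, value in match.groupdict().items()}


fields = parse_disk_error(LOG)

# Assumption: fsbn is partition-relative, bn is disk-relative, so the
# difference is where partition g starts on the disk (in sectors).
partition_offset = fields["bn"] - fields["fsbn"]

# Assumption: 255 heads x 63 sectors per track. Rebuilding the absolute
# block number from cylinder/track/sector checks that assumption.
HEADS, SECTORS_PER_TRACK = 255, 63
bn_from_chs = (fields["cn"] * HEADS + fields["tn"]) * SECTORS_PER_TRACK + fields["sn"]

# Number of sectors in the transfer that failed.
sectors_in_transfer = fields["last"] - fields["first"] + 1

print(partition_offset, bn_from_chs == fields["bn"], sectors_in_transfer)
```

Under those assumptions the partition starts at sector 1924169216, the cylinder/track/sector triple reproduces bn 1951859792 exactly, and the failed transfer covers 32 sectors, of which the message singles out one. The absolute number (bn) is the one to use with tools that address the whole disk.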