Re: [reiserfs-list] 'let the hdd remap the bad blocks'
Oleg Drokin writes:

Basically you'd better search for this on HDD vendors' sites. What's going on can be described simply this way: You write some block to the HDD. If the HDD decides the block is bad for some reason and remapping is allowed (usually by turning on SMART), the block is written to a different on-platter location and the drive adds one more entry to its remapped-blocks list. Next time you read this block, the drive consults its remapped-blocks list and, if the block is remapped, reads it from the new location with correct content. The described mechanism works for writing.

Actually I've seen something that looks like remapping on read, though I have no meaningful explanation for that (except that they may have some extra redundant info stored when you write data to disk, so that if a sector cannot be read, its content is restored from that redundant information and the sector is then remapped). And this process takes a lot of time.

My Fujitsu MAH-3182MP drive (SCSI actually) had ARRE enabled as shipped, but ARWE disabled, for reasons I cannot tell, not even from the data book (PDF). That's Automatic Remap on Read/Write Error. I'm not sure what it really means, but if the drive really remaps on a read error, it's going to leak a block whenever power is lost in the middle of a block write: the next time that block is read, it fails and gets remapped. So I switched that to do ARWE.

IDE users are not so lucky unless their vendor provides them with a tool (and not many ship raw floppy images; many have some multi-MB Windoze tools just to write a few hundred kByte to a floppy disk...)

-- Matthias Andree
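For readers who prefer code, the write-time remapping Oleg describes can be sketched as a toy model in Python. This is only an illustration: the class name `ToyDrive`, the spare-pool layout and the `bad` set are invented for this sketch, and real drive firmware is vendor-specific and not visible to the host.

```python
class ToyDrive:
    """Toy model of a drive that remaps bad sectors on write (hypothetical)."""

    def __init__(self, sectors, spares, bad=()):
        self.media = {}          # physical location -> data
        self.bad = set(bad)      # physical locations that fail on write
        self.remap = {}          # logical sector -> spare location
        # Spare locations live past the end of the user-visible area.
        self.spares = list(range(sectors, sectors + spares))

    def _physical(self, lba):
        # Consult the remapped-blocks list first.
        return self.remap.get(lba, lba)

    def write(self, lba, data):
        loc = self._physical(lba)
        while loc in self.bad:
            if not self.spares:
                raise IOError("no spare sectors left")
            loc = self.spares.pop(0)   # take the next unused spare
            self.remap[lba] = loc      # one more entry in the remap list
        self.media[loc] = data

    def read(self, lba):
        return self.media[self._physical(lba)]


drive = ToyDrive(sectors=100, spares=4, bad={7})
drive.write(7, b"hello")
print(drive.remap)    # {7: 100}
print(drive.read(7))  # b'hello'
```

The host keeps addressing sector 7 throughout; only the drive knows the data now lives in a spare location.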
[reiserfs-list] Memory Corruption
Hi,

I'm chasing a weird memory corruption problem on a ppc64 system. The first byte of a slab_t structure keeps getting stepped on (zeroed, actually). This happens during a testcase that copies a large file called junk between file systems (a mix of ext2 and reiser) on a 2.4.13 kernel. I know that's REALLY REALLY old, but it's what's in SuSE's SLES-7 release that we have customers running...

In every case, the page immediately preceding the slab_t has exactly the same data in it, and it looks like some kind of directory structure (note the presence of the word junk, along with .. and . towards the end).

C00037008E00: FD8C0600 FE8C0600 FF8C0600 008D0600
C00037008E10: 018D0600 028D0600 038D0600 048D0600
C00037008E20: 058D0600 068D0600 078D0600 088D0600
C00037008E30: 098D0600 0A8D0600 0B8D0600 0C8D0600
C00037008E40: 0D8D0600 0E8D0600 0F8D0600 108D0600
C00037008E50: 118D0600 128D0600 138D0600 148D0600
C00037008E60: 158D0600 168D0600 178D0600 188D0600
C00037008E70: 198D0600 1A8D0600 1B8D0600 1C8D0600
C00037008E80: 1D8D0600 1E8D0600 1F8D0600 208D0600
C00037008E90: 218D0600 228D0600 238D0600 248D0600 ! # $
C00037008EA0: 258D0600 268D0600 278D0600 288D0600 % ' (
C00037008EB0: 298D0600 2A8D0600 2B8D0600 2C8D0600 ) * + ,
C00037008EC0: 2D8D0600 2E8D0600 2F8D0600 308D0600 - . / 0
C00037008ED0: 318D0600 328D0600 338D0600 348D0600 1 2 3 4
C00037008EE0: 358D0600 368D0600 378D0600 388D0600 5 6 7 8
C00037008EF0: 398D0600 3A8D0600 3B8D0600 3C8D0600 9 : ;
C00037008F00: 3D8D0600 3E8D0600 3F8D0600 408D0600 = ?
C00037008F10: 418D0600 428D0600 438D0600 448D0600 A B C D
C00037008F20: 458D0600 468D0600 478D0600 488D0600 E F G H
C00037008F30: 498D0600 4A8D0600 4B8D0600 4C8D0600 I J K L
C00037008F40: 4D8D0600 4E8D0600 4F8D0600 508D0600 M N O P
C00037008F50: 518D0600 528D0600 538D0600 548D0600 Q R S T
C00037008F60: 558D0600 A481 0100 0020F906 U
C00037008F70: B377493D wI=
C00037008F80: C377493D C377493D 907C0300 3200 wI= wI= | 2
C00037008F90: 0100 0100 0200 4400
C00037008FA0: 0200 0100 38000400 8
C00037008FB0: 80F1A501 0200 0300 3400 0
C00037008FC0: 6A756E6B 2E2E junk..
C00037008FD0: 2E00 ED4174F0 0300 .At
C00037008FE0: 4800 H
C00037008FF0: 91B2103D B377493D B377493D 0100= wI= wI=

The byte immediately following that gets zeroed. It sure looks to me like someone is going over the end of a buffer. The question is, does anyone recognize that data structure?!?!?!

Thanks!!!

Dave B
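One way to get a feel for the dump above is to decode a few of its words. The Python sketch below is not part of the original report; it assumes each 8-digit group is a 32-bit value stored little-endian (the helper name `le32` is made up here), and uses only the first two rows plus the bytes that spell out the file name.

```python
import struct


def le32(word_hex):
    """Interpret 8 hex digits from the dump as a little-endian 32-bit value."""
    return struct.unpack("<I", bytes.fromhex(word_hex))[0]


# First two rows of the dump, copied verbatim.
first_rows = [
    "FD8C0600 FE8C0600 FF8C0600 008D0600",
    "018D0600 028D0600 038D0600 048D0600",
]
values = [le32(word) for row in first_rows for word in row.split()]
print([hex(v) for v in values])

# Check whether each value is exactly the previous one plus 1.
consecutive = all(b - a == 1 for a, b in zip(values, values[1:]))
print(consecutive)  # True

# The bytes at C00037008FC0, read as ASCII.
print(bytes.fromhex("6A756E6B2E2E"))  # b'junk..'
```

Read this way, the top of the page is a run of consecutive numbers starting at 0x68cfd. Both ext2 and reiserfs store their on-disk structures little-endian, so on a big-endian ppc64 machine this layout may suggest the page holds raw on-disk data such as a list of block numbers, but that is a guess rather than a diagnosis.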
Re: [reiserfs-list] 'let the hdd remap the bad blocks'
Oleg Drokin wrote:

Hello!

Basically you'd better search for this on HDD vendors' sites. What's going on can be described simply this way: You write some block to the HDD. If the HDD decides the block is bad for some reason and remapping is allowed (usually by turning on SMART), the block is written to a different on-platter location and the drive adds one more entry to its remapped-blocks list. Next time you read this block, the drive consults its remapped-blocks list and, if the block is remapped, reads it from the new location with correct content. The described mechanism works for writing.

Actually I've seen something that looks like remapping on read, though I have no meaningful explanation for that (except that they may have some extra redundant info stored when you write data to disk, so that if a sector cannot be read, its content is restored from that redundant information and the sector is then remapped). And this process takes a lot of time.

Bye,
Oleg

On Mon, Aug 19, 2002 at 03:58:30PM +0100, Newsmail wrote:

Hello Hans and Oleg,

maybe it's an off-topic question, but Hans always talks about leaving the hard disk to remap the bad blocks by itself. Could you explain in a few words how all this works, what happens afterwards, and since when it has existed, or do you have any special URL explaining this?

thx in advance,
greg

Just taking a guess, many hard drives have difficult and time-consuming procedures that they can go through to read a troublesome block. These can take 20-30 seconds. Probably, if they have to go through these procedures, once they finally succeed the smart vendors remap the block.

Hans
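Hans's guess (retry hard, and remap once the read finally succeeds) can be written down as a small Python sketch. Everything here is hypothetical: the function name `read_with_recovery`, the `read_once` callback and the attempt limit are invented for illustration, and no vendor's actual recovery procedure is being described.

```python
def read_with_recovery(read_once, max_attempts=8):
    """Retry a sector read and say whether remapping looks advisable.

    read_once(attempt) is a hypothetical hook that returns the sector
    data, or None if that attempt failed.
    Returns (data, remap_suggested).
    """
    for attempt in range(1, max_attempts + 1):
        data = read_once(attempt)
        if data is not None:
            # Needing more than one attempt marks the sector as troublesome,
            # so a cautious drive would copy the data to a spare location.
            return data, attempt > 1
    raise IOError("unrecoverable read error")


# A sector that only reads successfully on the third attempt.
flaky = lambda attempt: b"payload" if attempt >= 3 else None
print(read_with_recovery(flaky))    # (b'payload', True)

# A healthy sector reads on the first attempt and is left where it is.
healthy = lambda attempt: b"payload"
print(read_with_recovery(healthy))  # (b'payload', False)
```

The point of the sketch is the ordering: the data is recovered first, and the decision to remap follows from how much effort the recovery took.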
Re: [reiserfs-list] Memory Corruption
On Mon, 2002-08-19 at 12:50, Dave Boutcher wrote:

Hi,

I'm chasing a weird memory corruption problem on a ppc64 system. The first byte of a slab_t structure keeps getting stepped on (zeroed, actually). This happens during a testcase that copies a large file called junk between file systems (a mix of ext2 and reiser) on a 2.4.13 kernel. I know that's REALLY REALLY old, but it's what's in SuSE's SLES-7 release that we have customers running...

In every case, the page immediately preceding the slab_t has exactly the same data in it, and it looks like some kind of directory structure (note the presence of the word junk, along with .. and . towards the end).

Any chance the test case involves renames?

-chris