Re: [reiserfs-list] 'let the hdd remap the bad blocks'

2002-08-19 Thread Matthias Andree

Oleg Drokin [EMAIL PROTECTED] writes:

Basically you'd better search for this on HDD vendors' sites.
What's going on can be described simply this way:
You write some block to the HDD. If the HDD decides the block is bad for some
reason and remapping is allowed (usually by turning on SMART), the block is
written to a different on-platter location and the drive adds one more entry
to its remapped-blocks list. Next time you read this block, the drive consults
its remapped-blocks list and, if the block is remapped, reads it from the new
location with correct content.
The described mechanism works for writing.
Actually I've seen something that looks like remapping on read, though
I have no meaningful explanation for that (except that they may have some
extra redundant info stored when you write data to disk, so that if a sector
cannot be read, its content is restored from that redundant information and
the sector is then remapped). And this process takes a lot of time.
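
To make the write-side mechanism above concrete, here is a minimal toy sketch in Python. It is not how any real firmware works; the class name `Drive`, the spare-pool layout, and the `bad` set are all assumptions made up for illustration.

```python
class Drive:
    """Toy model of a drive with a spare-sector pool and a remapped-blocks list."""

    def __init__(self, n_sectors, n_spares, bad=()):
        self.media = {}           # physical location -> data stored there
        self.bad = set(bad)       # physical locations that fail when written
        self.remap = {}           # logical block -> spare physical location
        # Hypothetical layout: spares live just past the user-visible sectors.
        self.spares = list(range(n_sectors, n_sectors + n_spares))

    def _physical(self, lba):
        # Consult the remapped-blocks list first, else use the original location.
        return self.remap.get(lba, lba)

    def write(self, lba, data):
        loc = self._physical(lba)
        if loc in self.bad:
            if not self.spares:
                raise IOError("no spare sectors left")
            loc = self.spares.pop(0)   # take the next free spare
            self.remap[lba] = loc      # add one more entry to the list
        self.media[loc] = data

    def read(self, lba):
        return self.media[self._physical(lba)]


d = Drive(n_sectors=100, n_spares=4, bad={7})
d.write(7, b"hello")   # sector 7 is bad, so the data lands in a spare
d.write(8, b"world")   # sector 8 is fine and is written in place
assert d.read(7) == b"hello"
assert d.remap == {7: 100}
```

The caller only ever sees logical block 7; the redirection to the spare is invisible, which is why the filesystem does not need its own bad-block list for this case.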

My Fujitsu MAH-3182MP drive (SCSI actually) had ARRE enabled as it
shipped, but AWRE disabled, for reasons I cannot tell, not even from the
data book (PDF). Those are Automatic Read/Write Reallocation Enabled. I'm
not sure what ARRE really means, but if the drive really remaps on a read
error, then after a power loss in the middle of a block write it's going
to leak a block the next time this block is read. So I switched that to do
AWRE. IDE users are not so lucky unless their vendor provides them with
a tool (and not many ship raw floppy images; many have some multi-MB
Windoze tools just to write a few hundred kByte to a floppy disk...)

-- 
Matthias Andree



[reiserfs-list] Memory Corruption

2002-08-19 Thread Dave Boutcher

Hi,

I'm chasing a weird memory corruption problem on a ppc64 system.  The
first byte of a slab_t structure keeps getting stepped on (zeroed,
actually.)  This happens during a testcase that copies a large file
called junk between file systems (a mix of ext2 and reiser) on a
2.4.13 kernel.  I know that's REALLY REALLY old, but it's what's in
SuSE's SLES-7 release that we have customers running...

In every case, the page immediately preceding the slab_t has exactly the
same data in it, and it looks like some kind of directory structure
(note the presence of the word junk, along with .. and . towards
the end.)
C00037008E00: FD8C0600 FE8C0600 FF8C0600 008D0600 
C00037008E10: 018D0600 028D0600 038D0600 048D0600 
C00037008E20: 058D0600 068D0600 078D0600 088D0600 
C00037008E30: 098D0600 0A8D0600 0B8D0600 0C8D0600 
C00037008E40: 0D8D0600 0E8D0600 0F8D0600 108D0600 
C00037008E50: 118D0600 128D0600 138D0600 148D0600 
C00037008E60: 158D0600 168D0600 178D0600 188D0600 
C00037008E70: 198D0600 1A8D0600 1B8D0600 1C8D0600 
C00037008E80: 1D8D0600 1E8D0600 1F8D0600 208D0600 
C00037008E90: 218D0600 228D0600 238D0600 248D0600 !  #   $   
C00037008EA0: 258D0600 268D0600 278D0600 288D0600 %  '   (   
C00037008EB0: 298D0600 2A8D0600 2B8D0600 2C8D0600 )   *   +   ,   
C00037008EC0: 2D8D0600 2E8D0600 2F8D0600 308D0600 -   .   /   0   
C00037008ED0: 318D0600 328D0600 338D0600 348D0600 1   2   3   4   
C00037008EE0: 358D0600 368D0600 378D0600 388D0600 5   6   7   8   
C00037008EF0: 398D0600 3A8D0600 3B8D0600 3C8D0600 9   :   ;  
C00037008F00: 3D8D0600 3E8D0600 3F8D0600 408D0600 =  ?  
C00037008F10: 418D0600 428D0600 438D0600 448D0600 A   B   C   D   
C00037008F20: 458D0600 468D0600 478D0600 488D0600 E   F   G   H   
C00037008F30: 498D0600 4A8D0600 4B8D0600 4C8D0600 I   J   K   L   
C00037008F40: 4D8D0600 4E8D0600 4F8D0600 508D0600 M   N   O   P   
C00037008F50: 518D0600 528D0600 538D0600 548D0600 Q   R   S   T   
C00037008F60: 558D0600 A481 0100 0020F906 U   
C00037008F70:    B377493D  wI=
C00037008F80: C377493D C377493D 907C0300 3200  wI= wI= |  2   
C00037008F90: 0100 0100 0200 4400
C00037008FA0: 0200  0100 38000400 8   
C00037008FB0: 80F1A501 0200 0300 3400 0   
C00037008FC0: 6A756E6B  2E2E  junk..  
C00037008FD0: 2E00  ED4174F0 0300 .At 
C00037008FE0: 4800    H   
C00037008FF0: 91B2103D B377493D B377493D 0100= wI= wI=

The byte immediately following that gets zeroed.  It sure looks to me
like someone is going over the end of a buffer.
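
One way to look at the dump above is to decode a row as 32-bit words. The sketch below is only a reading aid: treating the words as little-endian is an assumption (the message does not say what the data is), and the variable names are made up for illustration. Read that way, the first row is a run of consecutive integers, and the bytes at C00037008FC0 are the ASCII string "junk".

```python
import struct

# First row of the dump, copied verbatim from the message.
row = bytes.fromhex("FD8C0600 FE8C0600 FF8C0600 008D0600")

# Assumption: the words are little-endian 32-bit values.
words_le = struct.unpack("<4I", row)
print(words_le)  # (429309, 429310, 429311, 429312)

# Each word is exactly one more than the previous one.
consecutive = all(b - a == 1 for a, b in zip(words_le, words_le[1:]))
print(consecutive)  # True

# The bytes near the end of the page spell the file name.
name = bytes.fromhex("6A756E6B")
print(name)  # b'junk'
```

A long run of consecutive little-endian numbers would be consistent with a list of on-disk block pointers, but that is a guess and not something the dump itself proves.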

The question is, does anyone recognize that data structure?!?!?!

Thanks!!!

Dave B






Re: [reiserfs-list] 'let the hdd remap the bad blocks'

2002-08-19 Thread Hans Reiser

Oleg Drokin wrote:

Hello!

   Basically you'd better search for this on HDD vendors' sites.
   What's going on can be described simply this way:
   You write some block to the HDD. If the HDD decides the block is bad for some
   reason and remapping is allowed (usually by turning on SMART), the block is
   written to a different on-platter location and the drive adds one more entry
   to its remapped-blocks list. Next time you read this block, the drive consults
   its remapped-blocks list and, if the block is remapped, reads it from the new
   location with correct content.
   The described mechanism works for writing.
   Actually I've seen something that looks like remapping on read, though
   I have no meaningful explanation for that (except that they may have some
   extra redundant info stored when you write data to disk, so that if a sector
   cannot be read, its content is restored from that redundant information and
   the sector is then remapped). And this process takes a lot of time.

Bye,
Oleg
On Mon, Aug 19, 2002 at 03:58:30PM +0100, Newsmail wrote:
  

Hello Hans and Oleg,
maybe it's an off-topic question, but Hans always talks about letting the
hard disk remap the bad blocks by itself. Could you explain in a few
words how all this works, what happens afterwards, and since when it has
existed, or do you have any special URL explaining this?
thx in advance,
greg







  

Just taking a guess, many hard drives have difficult and time-consuming 
procedures that they can go through to read a troublesome block.  These 
can take 20-30 seconds.  Probably if they have to go through these 
procedures, once they finally succeed the smart vendors remap the block.

Hans





Re: [reiserfs-list] Memory Corruption

2002-08-19 Thread Chris Mason

On Mon, 2002-08-19 at 12:50, Dave Boutcher wrote:
 Hi,
 
 I'm chasing a weird memory corruption problem on a ppc64 system.  The
 first byte of a slab_t structure keeps getting stepped on (zeroed,
 actually.)  This happens during a testcase that copies a large file
 called junk between file systems (a mix of ext2 and reiser) on a
 2.4.13 kernel.  I know that's REALLY REALLY old, but it's what's in
 SuSE's SLES-7 release that we have customers running...
 
 In every case, the page immediately preceding the slab_t has exactly the
 same data in it, and it looks like some kind of directory structure
 (note the presence of the word junk, along with .. and . towards
 the end.)

Any chance the test case involves renames?

-chris