Hi

Jonas Jensen wrote:

> One of my reiserfs disks became corrupted last week, and it's still causing
> me problems. I'll try to describe it in full detail, hoping that this problem
> can be fixed for good.
>
> The disk in question is a Linux software raid 0 partition on 2x40GB Maxtor
> IDE100 drives on a Promise FastTrak100 controller (which has raid support,
> but I use Linux software raid instead).
> When this started, my kernel was 2.4.8-ac9, and the machine had an uptime of
> about 1 month running this kernel without problems.
>
> I wanted to clean up a bit, then ls started to act weird -- it could list the
> file names in my directories, but it failed to stat most of the files (I run
> ls -F --color).
>
> In my syslog I got:
>
> Sep 25 16:06:13 monsterbob kernel: hdg: timeout waiting for DMA
> Sep 25 16:06:13 monsterbob kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> Sep 25 16:06:13 monsterbob kernel: hdg: status timeout: status=0x80 { Busy }
> Sep 25 16:06:13 monsterbob kernel: hdg: drive not ready for command
> Sep 25 16:06:15 monsterbob kernel: ide3: reset: success
> Sep 25 16:06:20 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
> Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?
> Sep 25 16:06:20 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [2091 2092 0x0 SD]
> Sep 25 16:09:43 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
> Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?
> Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2100 0x0 SD]
> Sep 25 16:09:43 monsterbob kernel: is_leaf: free space seems wrong: level=1, nr_items=1, free_space=0 rdkey
> Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11749. Fsck?
> Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2101 0x0 SD]
> [etc...]
>
> From what I can see, there was first a problem because my disks were sleeping
> and they didn't spin up fast enough. Perhaps hdg was removed from my striped
> raid or something, which confused reiserfs a lot.
>
> I unmounted the partition, hoping that it would work when I remounted it, but
> it failed:
>
> [root@monsterbob root]# mount /mnt/disk
> mount: Not a directory
>
> In my syslog I got:
>
> reiserfs: checking transaction log (device 09:00) ...
> is_tree_node: node level 6425 does not match to the expected one 4
> vs-5150: search_by_key: invalid format found in block 150545. Fsck?...
> vs-13040: reiserfs_read_inode2: i/o failure occurred trying to find stat data
> of [1 2 0x0 SD]
> Using r5 hash to sort names
> is_tree_node: node level 6425 does not match to the expected one 4
> vs-5150: search_by_key: invalid format found in block 150545. Fsck?
> vs-2140: finish_unfinished: search_by_key returned -2
> ReiserFS version 3.6.25
>
> I upgraded my kernel to 2.4.9-ac14, then I did reiserfsck with
> reiserfsprogs-3.x.0j, but it segfaulted. reiserfsprogs-3.x.0k-pre10 worked,
> so I did reiserfsck --rebuild-tree /dev/md0
> and this fixed it. The disk worked for a few hours, then exactly the same
> thing happened while the disks were spinning up.
> While writing this, I'm doing rebuild-tree again, but it seems that this
> "cure" doesn't last very long.
>
> It seems to me that I have a problem with my IDE somewhere below reiserfs
> that needs to be worked out. However, it still seems to be a bug in reiserfs
> that corrupts my filesystem when it gets confused, instead of just giving up
> so it would work the next time I remounted the partition.
>

IMHO, when hardware starts to fail, it is time to think about replacing it.
Reiserfs has no way to know when it should give up. It sends correct data to
the disk, and the broken hardware writes it wrong. So who corrupted the data?
The worst thing in your case is (as it looks to me) that you do not have
unreadable blocks in certain places; instead, the hard disk fails randomly.
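
One rough way to check whether the reported blocks cluster in one place or are
scattered is to pull the block numbers out of the vs-5150 messages in a syslog
like the one quoted above. This is only a minimal Python sketch: the function
name bad_blocks and the sample text are made up for illustration, and the
pattern assumes the message wording shown in that log.

```python
import re

# Matches the block number in a reiserfs "vs-5150" message. Whitespace and
# mail quote markers (">") are allowed between words so that messages wrapped
# across lines by a mailer still match. The wording is assumed from the log.
VS5150 = re.compile(
    r"vs-5150: search_by_key: invalid format[\s>]+found[\s>]+in block (\d+)"
)


def bad_blocks(log_text):
    """Return the block numbers reported by vs-5150 messages, in log order."""
    return [int(n) for n in VS5150.findall(log_text)]


# Sample lines copied from the syslog excerpt in this thread.
sample = (
    "Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: "
    "invalid format found in block 8801. Fsck?\n"
    "Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: "
    "invalid format found in block 11746. Fsck?\n"
    "Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: "
    "invalid format found in block 11749. Fsck?\n"
)

if __name__ == "__main__":
    print(bad_blocks(sample))
```

A handful of block numbers proves nothing by itself, but comparing the list
across several failures would show whether the same blocks keep coming back.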

Anyway, the next time your data becomes available, you should find a way to
back it up onto reliable hardware.

Thanks,
vs


>
> Hoping this can be solved,
> Jonas Jensen
>
> PS: please CC me as I don't subscribe to this list.
