[UM-LINUX] Fedora RAID5 Problems

Ray Chen Wed, 21 Jun 2006 08:24:49 -0700

Hello,

I've got a problem with the software raid I've been running for the past few months, and was hoping you guys could help me out. Sorry for the long e-mail, but there's a small bit of history that people will probably ask me for anyway.

One disk of my raid5 went bad, and after replacing it with a new one, I requested mdadm to rebuild the device. This was on an FC4 machine. The machine ended up freezing, and I was forced to pull the plug. This left me with a dirty, degraded array. And, even though I have a dedicated system disk, my machine wouldn't boot because the kernel would auto-detect the array, and would halt again before I could get a shell. Even with the raid=noautodetect kernel option.

So, I booted with a Knoppix 5.0.1 CD. I was able to rebuild the array, and successfully run fsck on the device with no problems. I figured FC4 would work normally. Nada.

Soon after mounting the devices, my system would freeze again. This was confusing because Knoppix was able to use the device just fine. I eventually decided to do a fresh install of FC5. I booted with the Knoppix CD, mounted the raid, and backed up my system disk onto the raid.

Installing FC5 was no help. Any process that accessed the mounted array would become unresponsive. Eventually, I decided that it was because of the following line from "/bin/ps":


USER    PID %CPU %MEM  VSZ RSS TTY  STAT START   TIME COMMAND
root   1329  0.0  0.0    0   0 ?    D<   Jun19   0:00 [md0_raid5]
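
For anyone who wants to pick rows like this out of a longer listing, here is a minimal Python sketch. It assumes the 11-column "ps aux" layout shown above and the state letters documented in the procps ps man page. The helper names decode_stat and blocked_processes are made up for illustration.

```python
# Hypothetical helpers: decode a procps STAT field and list D-state rows.
# The code tables follow the "PROCESS STATE CODES" section of the ps man page.
STATE_CODES = {
    "D": "uninterruptible sleep (usually I/O)",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped",
    "Z": "zombie",
}
FLAG_CODES = {
    "<": "high priority",
    "N": "low priority",
    "L": "pages locked in memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "foreground process group",
}


def decode_stat(stat):
    """Split a STAT value such as 'D<' into (state, [flags])."""
    state = STATE_CODES.get(stat[0], "unknown state")
    flags = [FLAG_CODES.get(ch, "unknown flag") for ch in stat[1:]]
    return state, flags


def blocked_processes(ps_output):
    """Return (pid, command) for every row whose STAT starts with 'D'."""
    rows = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps any embedded spaces.
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[7].startswith("D"):
            rows.append((int(fields[1]), fields[10]))
    return rows


sample = """USER    PID %CPU %MEM  VSZ RSS TTY  STAT START   TIME COMMAND
root   1329  0.0  0.0    0   0 ?    D<   Jun19   0:00 [md0_raid5]"""

print(decode_stat("D<"))
print(blocked_processes(sample))
```

The header row is skipped because its eighth field is the literal "STAT", which does not start with "D".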

I assume that the high-priority uninterruptible sleep state is a bad thing. After scouring Google for some help, I found this:


        echo t > /proc/sysrq-trigger

And, after investigating the output, this looked suspect:

md0_raid5     D E2E06800  3260  1329     11          1345   421 (L-TLB)
f7384ed4 00000011 00000003 e2e06800 003d1ec8 00000001 0000000a f7d60258
f7d60130 f7f480b0 c1a0b160 e2e06800 003d1ec8 00000000 00000000 f7e127c0
c01490ed 00000020 f7f480b0 00000000 c1a0bac0 00000002 f7384efc c01347a4
Call Trace:
 [<c01490ed>] mempool_alloc+0x37/0xd3
 [<c01347a4>] prepare_to_wait+0x12/0x4c
 [<c0289349>] md_super_wait+0xa8/0xbd
 [<c0134693>] autoremove_wake_function+0x0/0x2d
 [<c0289ca5>] md_update_sb+0x107/0x159
 [<c028c088>] md_check_recovery+0x161/0x3c3
 [<f8a40b8c>] raid5d+0x10/0x113 [raid5]
 [<c02f2267>] _spin_lock_irqsave+0x9/0xd
 [<c028c99e>] md_thread+0xed/0x104
 [<c0134693>] autoremove_wake_function+0x0/0x2d
 [<c028c8b1>] md_thread+0x0/0x104
 [<c01345b1>] kthread+0x9d/0xc9
 [<c0134514>] kthread+0x0/0xc9
 [<c0102005>] kernel_thread_helper+0x5/0xb

Every other process was stopped in _spin_unlock_irq, schedule_timeout, or something else that made sense. But mempool_alloc? Am I interpreting the output incorrectly? Does anybody know what the two hex numbers represent after the function name?
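
In case it helps with reading the trace, here is a minimal Python sketch that splits entries like the ones above into their parts. It assumes the usual kernel symbol format, symbol+0xOFFSET/0xSIZE, where the first number is the offset of the address into the function and the second is the function's total length. That reading is an assumption here, not something confirmed in this thread. The function name parse_trace_line and the dictionary keys are made up for illustration.

```python
import re

# Matches lines such as:
#   [<c01490ed>] mempool_alloc+0x37/0xd3
#   [<f8a40b8c>] raid5d+0x10/0x113 [raid5]
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_trace_line(line):
    """Return the parts of one call-trace entry, or None if it does not match."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    addr = int(match.group("addr"), 16)
    offset = int(match.group("offset"), 16)
    return {
        "address": addr,
        "symbol": match.group("symbol"),
        "offset": offset,                      # assumed: bytes into the function
        "size": int(match.group("size"), 16),  # assumed: function length in bytes
        "module": match.group("module"),       # None for built-in kernel code
        "function_start": addr - offset,       # follows from the assumption above
    }


entry = parse_trace_line(" [<c01490ed>] mempool_alloc+0x37/0xd3")
print(entry["symbol"], hex(entry["function_start"]), entry["offset"], entry["size"])
```

Under that assumption, the first entry would mean address c01490ed lies 0x37 bytes into mempool_alloc, which is 0xd3 bytes long.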


Any help or insight would be appreciated,
Ray Chen
