On Wed, Jun 21, 2006 at 11:24:06AM -0400, Ray Chen wrote:
> me with a dirty, degraded array.  And, even though I have a dedicated 
> system disk, my machine wouldn't boot because the kernel would auto-detect 
> the array, and would halt again before I could get a shell.  Even with the 
> raid=noautodetect kernel option.

You can boot with the kernel option 'init=/bin/sh' to avoid this problem.
It's a nice little login trick that lets you bypass all sorts of
annoyances... <grin>
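For example, at the GRUB prompt you'd edit the kernel line to something
like this (the kernel version and root device here are just
illustrative, substitute your own):

```shell
# At the GRUB edit prompt, append init=/bin/sh to the kernel line:
#   kernel /boot/vmlinuz-2.6.17 ro root=/dev/hda1 init=/bin/sh
#
# You land in a bare shell with root mounted read-only; remount it
# read-write if you need to edit any configs:
mount -o remount,rw /
```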

> Soon after mounting the devices, my system would freeze again.  This 
> was confusing because Knoppix was able to use the device just fine.  I 
> eventually decided to do a fresh install of FC5.  I booted with the 
> Knoppix CD, mounted the raid, and backed up my system disk onto the raid.

Can you try booting with the init=/bin/sh trick?  That will tell you
whether the problem is triggered by the rc.* scripts or by the kernel
itself, which gives you another useful data point.
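From that bare shell you can then poke at the array by hand and see
whether the box still wedges (the md device and member disks below are
guesses, substitute your own layout):

```shell
# /proc isn't mounted yet under init=/bin/sh; mount it so mdstat works:
mount -t proc proc /proc

# Assemble the array manually instead of letting rc.sysinit do it.
# /dev/md0 and the member devices are assumptions -- adjust as needed.
mdadm --assemble /dev/md0 /dev/hda1 /dev/hdb1 /dev/hdc1

# Check whether the array came up (and whether md0_raid5 hangs again):
cat /proc/mdstat
```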

> Installing FC5 was no help.  Any process that accessed the mounted 
> array would become unresponsive.  Eventually, I decided that it was 
> because of the following line from "/bin/ps":
> 
> USER    PID %CPU %MEM  VSZ RSS TTY  STAT START   TIME COMMAND
> root   1329  0.0  0.0    0   0 ?    D<   Jun19   0:00 [md0_raid5]
> 
> I assume that the high-priority uninterruptible sleep state is a bad 
> thing.  After scouring google for some help, I found this:
> 
>       echo t > /proc/sysrq-trigger
> 
> And, after investigating the output, this looked suspect:
> 
> md0_raid5     D E2E06800  3260  1329     11          1345   421 (L-TLB)
> f7384ed4 00000011 00000003 e2e06800 003d1ec8 00000001 0000000a f7d60258
> f7d60130 f7f480b0 c1a0b160 e2e06800 003d1ec8 00000000 00000000 f7e127c0
> c01490ed 00000020 f7f480b0 00000000 c1a0bac0 00000002 f7384efc c01347a4
> Call Trace:
>  [<c01490ed>] mempool_alloc+0x37/0xd3
>  [<c01347a4>] prepare_to_wait+0x12/0x4c
>  [<c0289349>] md_super_wait+0xa8/0xbd
>  [<c0134693>] autoremove_wake_function+0x0/0x2d
>  [<c0289ca5>] md_update_sb+0x107/0x159
>  [<c028c088>] md_check_recovery+0x161/0x3c3
>  [<f8a40b8c>] raid5d+0x10/0x113 [raid5]
>  [<c02f2267>] _spin_lock_irqsave+0x9/0xd
>  [<c028c99e>] md_thread+0xed/0x104
>  [<c0134693>] autoremove_wake_function+0x0/0x2d
>  [<c028c8b1>] md_thread+0x0/0x104
>  [<c01345b1>] kthread+0x9d/0xc9
>  [<c0134514>] kthread+0x0/0xc9
>  [<c0102005>] kernel_thread_helper+0x5/0xb
> 
> Every other process was stopped in _spin_unlock_irq, schedule_timeout, or 
> something else that made sense.  But mempool_alloc?  Am I interpreting 
> the output incorrectly?  Does anybody know what the two hex numbers 
> represent after the function name?

At least one of the hex numbers is the offset into that function,
so "mempool_alloc+0x37/0xd3" means 0x37 bytes into the function
mempool_alloc.  I'd assumed the other number had something to do with
the current stack frame, but I couldn't find any docs on this other
than Documentation/sysrq.txt.  So I got annoyed and traced the code
from handle_sysrq() in drivers/char/sysrq.c through the arch-specific
stuff in arch/${ARCH}/kernel/traps.c, and eventually to
__print_symbol() in kernel/kallsyms.c.  It turns out the first hex is
the offset (yay), and the second is the total length of the function,
so in my example above you are 0x37 bytes into mempool_alloc, which is
0xd3 bytes long.  Not that any of this really helps you, but you're
right that it's weird it's getting hung in mempool_alloc.
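If you want to decode one of those entries by hand, the offset/length
pair is just hex, so plain shell arithmetic does it (and if your kernel
was built with debug info, `addr2line -e vmlinux <address>` will map
the raw addresses on the stack lines back to source lines):

```shell
# Decode a trace entry like "mempool_alloc+0x37/0xd3":
# first hex = offset into the function, second hex = function length.
sym='mempool_alloc+0x37/0xd3'
name=${sym%%+*}         # -> mempool_alloc
rest=${sym#*+}          # -> 0x37/0xd3
off=$(( ${rest%%/*} ))  # -> 55  (0x37)
len=$(( ${rest##*/} ))  # -> 211 (0xd3)
printf '%s: %d bytes into a %d-byte function\n' "$name" "$off" "$len"
```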

- Rob