On Jan 7, 2010, at 11:32 AM, Michael Moore wrote:
> In continuing to look at the 100% CPU usage (kernel loop) Randy had
> written about previously I've narrowed the issue down a little. It
> appears related to cancellation of operations when a write() call
> is blocking and I/O has been retried.
>
> While on our cluster the retries were caused by congestion, I am
> re-creating the congestion by killing an I/O server. The test C program
> I'm using just loops around 4k writes to a PVFS file. If,
> while the program is executing, I kill a PVFS I/O server, the write hangs
> (expectedly). About 30% of the time, when I try to kill the
> process doing the writing, it spikes to 100% CPU usage and is not
> killable. Also, every time I try to kill the writing process,
> pvfs2-client-core segfaults with something similar to:
>
> [E 11:58:09.724121] PVFS2 client: signal 11, faulty address is 0x41ec,
> from 0x8050b51
> [E 11:58:09.725403] [bt] pvfs2-client-core [0x8050b51]
> [E 11:58:09.725427] [bt] pvfs2-client-core(main+0xe48) [0x8052498]
> [E 11:58:09.725436] [bt] /lib/libc.so.6(__libc_start_main+0xdc)
> [0x75ee9c]
> [E 11:58:09.725444] [bt] pvfs2-client-core [0x804a381]
> [E 11:58:09.740133] Child process with pid 2555 was killed by an
> uncaught signal 6
>
> In the cases where the CPU usage becomes 100% (and the process can't be
> terminated), the for() loop in PINT_client_io_cancel strangely segfaults
> on exactly iteration 31. The value of sm_p->u.io.context_count is
> in the hundreds, so there are a significant number of jobs left to cancel.
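For reference, the core of the test program described above can be sketched roughly as follows. This is a reconstruction, not the original test: the path and the finite loop count are assumptions added to keep the sketch self-contained (the original looped indefinitely against a file on a PVFS mount).

```c
/* Sketch of the write-loop test: write 4 KiB blocks to a file
 * `count` times. The original test looped indefinitely against
 * a PVFS-mounted path; the finite count here is an assumption.
 * Returns total bytes written, or -1 on error. */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static ssize_t run_write_loop(const char *path, long count)
{
    char buf[4096];
    memset(buf, 'x', sizeof(buf));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t total = 0;
    for (long i = 0; i < count; i++) {
        /* On PVFS, killing an I/O server mid-run makes one of
         * these write() calls block while the client retries. */
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n != (ssize_t)sizeof(buf)) {
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return total;
}
```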
Hi Michael,
Are you guys using infiniband by chance? Do you have a stack trace with
debugging symbols where the pvfs2-client-core segfault occurs? That might be
useful for narrowing things down.
>
> The real issue is the 30% of the time when the process gets stuck in the
> kernel waiting for a downcall. With some additional debugging, the
> process's write() call is clearly stuck in the while(1) loop in
> wait_for_cancellation_downcall(). The function's assumption is that
> either the request will timeout or it will be serviced after one
> iteration of the loop. However, in this situation neither occurs. The
> schedule_timeout() call immediately returns with a signal pending, but
> the op is never serviced, so it spins indefinitely.
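To make the failure mode concrete, here is a simplified userspace model of that loop's logic. All the names below are stand-ins, not the actual kernel source: the point is just that when a signal is pending, the modeled schedule_timeout() returns immediately with time remaining, and if the op is never marked serviced, neither exit condition can ever fire.

```c
/* Simplified userspace model of the spin described above.
 * Names are stand-ins for the real pvfs2 kernel code. */
#include <stdbool.h>

struct fake_op {
    bool serviced;   /* set when the downcall is handled */
    bool purged;     /* set when pvfs2-client-core dies */
};

/* Models schedule_timeout(): with a signal pending, the task is
 * woken immediately and nonzero "time left" is returned. */
static long fake_schedule_timeout(long timeout, bool signal_pending)
{
    return signal_pending ? timeout : 0;
}

/* Returns the iteration on which the loop exits, or -1 if it
 * would spin past max_iters (the reported hang). */
static int wait_for_cancellation_model(struct fake_op *op,
                                       bool signal_pending,
                                       int max_iters)
{
    for (int i = 0; i < max_iters; i++) {
        if (op->serviced)
            return i;                 /* normal exit: op serviced */
        long left = fake_schedule_timeout(100, signal_pending);
        if (left == 0)
            return i;                 /* exit: timeout expired */
        /* signal pending, op not serviced: loop back forever */
    }
    return -1;
}
```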
Those infinite looping conditionals have bugged me for a while now (there's one
in wait_for_matching_downcall too). We should probably go through and
systematically replace all of them. Let's try to fix your bug first though.
I'm actually surprised that the process goes to 100% CPU in this case. You're
right that the op is not getting serviced, so the first if conditional won't
break out of the while loop. But the schedule_timeout should only return
non-zero if the task gets woken up, and it only gets woken up in
purge_waiting_ops() when the pvfs2-client-core segfaults. And that should only
happen once. One thing you might try is to change the first if conditional
from:
if (op_state_serviced(op))
to:
if (op_state_serviced(op) || op_state_purged(op))
That will allow purged ops to exit the while loop. Could you share your
debugging output and modified code?
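For clarity, the suggested change amounts to letting a purged op leave the wait loop as well as a serviced one. A small userspace sketch of just that exit check (again with stand-in names, not the real kernel code):

```c
/* Sketch of the suggested exit check: an op should leave the
 * wait loop once it is either serviced or purged (purged meaning
 * pvfs2-client-core died and purge_waiting_ops() marked it).
 * Names are stand-ins for the real kernel code. */
#include <stdbool.h>

struct fake_op {
    bool serviced;
    bool purged;
};

static bool op_state_serviced_model(const struct fake_op *op)
{
    return op->serviced;
}

static bool op_state_purged_model(const struct fake_op *op)
{
    return op->purged;
}

/* Old check: only serviced ops exit, so a purged op spins. */
static bool should_exit_old(const struct fake_op *op)
{
    return op_state_serviced_model(op);
}

/* Suggested check: purged ops exit too. */
static bool should_exit_new(const struct fake_op *op)
{
    return op_state_serviced_model(op) || op_state_purged_model(op);
}
```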
Thanks,
-sam
>
> Has anyone else seen the issue with client-core segfaulting on every
> cancel op? Should the kernel wait_for_cancellation_downcall() be changed
> to not allow indefinite looping?
>
> Thanks,
> Michael
> _______________________________________________
> Pvfs2-developers mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers