On Thu, Jan 07, 2010 at 03:07:17PM -0600, Sam Lang wrote:
>
> On Jan 7, 2010, at 2:29 PM, Michael Moore wrote:
>
> > On Thu, Jan 07, 2010 at 12:30:00PM -0600, Sam Lang wrote:
> >>
> >> On Jan 7, 2010, at 11:32 AM, Michael Moore wrote:
> >>
> >>> In continuing to look at the 100% CPU usage (kernel loop) Randy had written about previously, I've narrowed the issue down a little. It appears related to cancellation of operations when a write() call is blocking and I/O has been retried.
> >>>
> >>> While on our cluster the retries were caused by congestion, here I am re-creating the retries by killing an I/O server. The test C program I'm using just loops around 4k writes to a PVFS file. If, while the program is executing, I kill a PVFS I/O server, the write hangs (as expected). About 30% of the time, when I try to kill the process doing the writing, it spikes to 100% CPU usage and is not killable. Also, every time I try to kill the writing process, pvfs2-client-core segfaults with something similar to:
> >>>
> >>> [E 11:58:09.724121] PVFS2 client: signal 11, faulty address is 0x41ec, from 0x8050b51
> >>> [E 11:58:09.725403] [bt] pvfs2-client-core [0x8050b51]
> >>> [E 11:58:09.725427] [bt] pvfs2-client-core(main+0xe48) [0x8052498]
> >>> [E 11:58:09.725436] [bt] /lib/libc.so.6(__libc_start_main+0xdc) [0x75ee9c]
> >>> [E 11:58:09.725444] [bt] pvfs2-client-core [0x804a381]
> >>> [E 11:58:09.740133] Child process with pid 2555 was killed by an uncaught signal 6
> >>>
> >>> In the cases where the CPU usage hits 100% (and the process can't be terminated), the for() loop in PINT_client_io_cancel strangely segfaults during exactly iteration 31. The value of sm_p->u.io.context_count is in the hundreds, so there are a significant number of jobs left to cancel.
> >>
> >> Hi Michael,
> >>
> >> Are you guys using infiniband by chance? Do you have a stack trace with debugging symbols where the pvfs2-client-core segfault occurs? That might be useful for narrowing things down.
> >>
> > No infiniband in use, vanilla ethernet.
> >
> > Stack trace with symbols:
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0xb7fe06d0 (LWP 2069)]
> > PINT_client_io_cancel (id=5594) at src/client/sysint/client-state-machine.c:524
> > 524         if (cur_ctx->msg_send_in_progress)
> > (gdb)
> > (gdb)
> > (gdb) bt
> > #0  PINT_client_io_cancel (id=5594) at src/client/sysint/client-state-machine.c:524
> > #1  0x0804b926 in service_operation_cancellation (vfs_request=0xa073d98) at src/apps/kernel/linux/pvfs2-client-core.c:407
> > #2  0x0804f141 in handle_unexp_vfs_request (vfs_request=0xa073d98) at src/apps/kernel/linux/pvfs2-client-core.c:2959
> > #3  0x08050c7b in process_vfs_requests () at src/apps/kernel/linux/pvfs2-client-core.c:3148
> > #4  0x08052498 in main (argc=12, argv=0xbfe0d224) at src/apps/kernel/linux/pvfs2-client-core.c:3561
> > (gdb)
>
> What version of PVFS is this? I'm having trouble matching up the lines to anything interesting.
> -sam
>
Sorry, it's Orange-Branch out of CVS.

Michael

> >
> >
> >>> The real issue is the 30% of the time when the process gets stuck in the kernel waiting for a downcall. With some additional debugging, the process's write() call is clearly stuck in the while(1) loop in wait_for_cancellation_downcall(). The function's assumption is that either the request will time out or it will be serviced after one iteration of the loop. However, in this situation neither occurs. The schedule_timeout() call immediately returns with a signal pending, but the op is never serviced, so it spins indefinitely.
> >>
> >> Those infinite looping conditionals have bugged me for a while now (there's one in wait_for_matching_downcall too). We should probably go through and systematically replace all of them. Let's try to fix your bug first though.
> >>
> >> I'm actually surprised that the process goes to 100% CPU in this case. You're right that the op is not getting serviced, so the first if conditional won't break out of the while loop. But the schedule_timeout should only return non-zero if the task gets woken up, and it only gets woken up in purge_waiting_ops() when the pvfs2-client-core segfaults. And that should only happen once. One thing you might try is to change the first if conditional from:
> >>
> >>     if (op_state_serviced(op))
> >>
> >> to:
> >>
> >>     if (op_state_serviced(op) || op_state_purged(op))
> >>
> >> That will allow purged ops to exit the while loop. Could you share your debugging output and modified code?
> >
> > No special code other than some gossip_debugs. The kernel debugging right before and including the cancel is:
> >
> > Jan 7 15:15:25 vm04 kernel: Alloced OP (d0c48020: 875 OP_FILE_IO)
> > Jan 7 15:15:25 vm04 kernel: GET op d0c48020 -> buffer_index 0
> > Jan 7 15:15:25 vm04 kernel: pvfs2_file_write: copy_to_user 1 nr_segs 1, offset: 3129344 total_size: 4096
> > Jan 7 15:15:25 vm04 kernel: pvfs_bufmap_copy_iovec_from_user: index 0, size 4096
> > Jan 7 15:15:25 vm04 kernel: First character (integer value) in pvfs_bufmap_copy_from_user: 63
> > Jan 7 15:15:25 vm04 kernel: pvfs2: service_operation: pvfs2_file_write d0c48020
> > Jan 7 15:15:25 vm04 kernel: pvfs2: operation posted by process: pvfs_rand_write, pid: 2076
> > Jan 7 15:15:25 vm04 kernel: Releasing OP (d0c98060: 874)
> > Jan 7 15:15:25 vm04 kernel: client-core: reading op tag 875 OP_FILE_IO
> > Jan 7 15:15:55 vm04 kernel: *** operation interrupted by a signal (tag 875, op d0c48020)
> > Jan 7 15:15:55 vm04 kernel: Interrupted: Removed op d0c48020 from htable_ops_in_progress
> > Jan 7 15:15:55 vm04 kernel: pvfs2: service_operation pvfs2_file_write returning: -4 for d0c48020.
> > Jan 7 15:15:55 vm04 kernel: pvfs2_cancel_op_in_progress called on tag 875
> > Jan 7 15:15:55 vm04 kernel: Alloced OP (da5000a0: 876 OP_CANCEL)
> > Jan 7 15:15:55 vm04 kernel: Attempting PVFS2 operation cancellation of tag 875
> > Jan 7 15:15:55 vm04 kernel: pvfs2: service_operation: pvfs2_cancel da5000a0
> > Jan 7 15:15:55 vm04 kernel: pvfs2: operation posted by process: pvfs_rand_write, pid: 2076
> > Jan 7 15:15:55 vm04 kernel: *** operation didn't time out: da5000a0, jiffies_left: 20000
> > Jan 7 15:15:57 vm04 last message repeated 3 times
> > Jan 7 15:15:58 vm04 kernel: *** operation didn't time out: da5000a0, jiffies_left: 19999
> > Jan 7 15:15:58 vm04 kernel: *** operation didn't time out: da5000a0, jiffies_left: 20000
> > Jan 7 15:15:58 vm04 last message repeated 59 times
> > ............
> >
> >
> > The pvfs2-client debug log (from an earlier failure) shows that my earlier statement about segfaulting on the 31st iteration was false. This one goes into the 70s.
> >
> > [D 14:35:53.231333] cancel_op_in_progress called
> > [D 14:35:53.231342] cancelling I/O req 0x88c7f98 from tag 1128
> > [D 14:35:53.231357] PINT_client_io_cancel id 7350
> > [D 14:35:53.231368] job iteration [0]
> > [D 14:35:53.231377] [0] Posting cancellation of type: BMI Send (Request)
> > [D 14:35:53.231386] [0] Posting cancellation of type: BMI Recv (Response)
> > [D 14:35:53.231396] [0] Posting cancellation of type: FLOW
> > [D 14:35:53.231431] [0] Posting cancellation of type: BMI Recv (Write Ack)
> > [D 14:35:53.231444] job iteration [1]
> > [D 14:35:53.231452] [1] Posting cancellation of type: BMI Send (Request)
> > [D 14:35:53.231461] [1] Posting cancellation of type: BMI Recv (Response)
> > [D 14:35:53.231469] [1] Posting cancellation of type: BMI Recv (Write Ack)
> > [D 14:35:53.231502] job iteration [2]
> > [D 14:35:53.231510] [2] Posting cancellation of type: BMI Send (Request)
> > [D 14:35:53.231519] [2] Posting cancellation of type: BMI Recv (Response)
> > [D 14:35:53.231528] [2] Posting cancellation of type: FLOW
> > [D 14:35:53.231536] [2] Posting cancellation of type: BMI Recv (Write Ack)
> > [D 14:35:53.231545] job iteration [3]
> > [D 14:35:53.231553] [3] Posting cancellation of type: BMI Send (Request)
> > [D 14:35:53.231562] [3] Posting cancellation of type: BMI Recv (Response)
> > [D 14:35:53.231571] [3] Posting cancellation of type: FLOW
> > [D 14:35:53.231579] [3] Posting cancellation of type: BMI Recv (Write Ack)
> > [D 14:35:53.231588] job iteration [4]
> > [D 14:35:53.231596] [4] Posting cancellation of type: BMI Send (Request)
> > [D 14:35:53.231605] [4] Posting cancellation of type: BMI Recv (Response)
> > [D 14:35:53.231614] [4] Posting cancellation of type: FLOW
> > [D 14:35:53.231622] [4] Posting cancellation of type: BMI Recv (Write Ack)
> > [D 14:35:53.231628] job iteration [5]
> > [D 14:35:53.231666] [5] Posting cancellation of type: BMI Send (Request)
> > [D 14:35:53.231675] [5] Posting cancellation of type: BMI Recv (Response)
> > [D 14:35:53.231684] [5] Posting cancellation of type: FLOW
> > [D 14:35:53.231692] [5] Posting cancellation of type: BMI Recv (Write Ack)
> > [D 14:35:53.231701] job iteration [6]
> > [D 14:35:53.231709] [6] Posting cancellation of type: BMI Send (Request)
> > [D 14:35:53.231718] [6] Posting cancellation of type: BMI Recv (Response)
> > [D 14:35:53.231727] [6] Posting cancellation of type: FLOW
> > [D 14:35:53.231735] [6] Posting cancellation of type: BMI Recv (Write Ack)
> > ............
> > <snip>
> > ............
> > [D 14:35:53.235128] job iteration [69]
> > [D 14:35:53.235136] [69] Posting cancellation of type: FLOW
> > [D 14:35:53.235145] [69] Posting cancellation of type: BMI Recv (Write Ack)
> > [D 14:35:53.235153] job iteration [70]
> > [D 14:35:53.235161] job iteration [71]
> > [D 14:35:53.235169] job iteration [72]
> > [D 14:35:53.235177] job iteration [73]
> > [E 14:35:53.235212] PVFS2 client: signal 11, faulty address is 0xb8c0c8, from 0xb26900
> > [E 14:35:53.235810] [bt] /usr/local/pvfs/lib/libpvfs2.so(PINT_client_io_cancel+0xe0) [0xb26900]
> > [E 14:35:53.235828] [bt] pvfs2-client-core [0x804b926]
> > [E 14:35:53.235836] [bt] pvfs2-client-core [0x804f141]
> > [E 14:35:53.235844] [bt] pvfs2-client-core [0x8050c7b]
> > [E 14:35:53.235852] [bt] pvfs2-client-core(main+0xe48) [0x8052498]
> >
> >
> > I'll give the pending_ops change a try and post the results.
> >
> > Michael
> >
> >>
> >> Thanks,
> >>
> >> -sam
> >>
> >>>
> >>> Has anyone else seen the issue with client-core segfaulting on every cancel op? Should the kernel wait_for_cancellation_downcall() be changed to not allow indefinite looping?
> >>>
> >>> Thanks,
> >>> Michael
> >>> _______________________________________________
> >>> Pvfs2-developers mailing list
> >>> [email protected]
> >>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> >>
>
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
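
For anyone picking this thread up later, here is a rough sketch of the shape of the wait_for_cancellation_downcall() loop being discussed, with the op_state_purged() check Sam suggests plus an explicit retry bound so that a pending signal on a never-serviced op cannot spin at 100% CPU forever. This is an illustration, not code from the Orange-Branch tree: the op type, the retry bound, and the timeout argument are assumptions; only the state macros and the overall while-loop shape come from the discussion above.

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/errno.h>

#define MAX_CANCEL_RETRIES 10   /* hypothetical upper bound, not from the tree */

/* sketch only: pvfs2_kernel_op_t, op_state_serviced() and op_state_purged()
 * stand in for the module's own type and macros referenced in the thread */
static int wait_for_cancellation_downcall_sketch(pvfs2_kernel_op_t *op,
                                                 long timeout_jiffies)
{
    int retries = 0;

    while (retries++ < MAX_CANCEL_RETRIES)      /* was: while (1) */
    {
        set_current_state(TASK_INTERRUPTIBLE);

        /* serviced OR purged ops are done waiting (Sam's suggested change) */
        if (op_state_serviced(op) || op_state_purged(op))
            break;

        /*
         * With a signal already pending, schedule_timeout() returns
         * immediately with time left; without the retry bound above,
         * that is exactly the spin Michael describes.
         */
        if (schedule_timeout(timeout_jiffies) == 0)
            break;                              /* genuine timeout: give up */
    }
    set_current_state(TASK_RUNNING);

    return op_state_serviced(op) ? 0 : -EINTR;
}

The bound only makes the worst case finite; the op_state_purged() check is what lets the loop exit cleanly once pvfs2-client-core dies and purge_waiting_ops() marks the op.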
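
And, for reproducing the hang, a minimal stand-in for the kind of test program described at the top of the thread: a plain POSIX loop of 4 KB writes to a file on a PVFS mount. The path and the program itself are hypothetical (this is not the pvfs_rand_write tool named in the logs). Kill an I/O server while it runs to make write() block, then kill the program to exercise the cancellation path.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    /* default path is a placeholder; pass your own PVFS mount point */
    const char *path = (argc > 1) ? argv[1] : "/mnt/pvfs2/writetest.dat";
    char buf[4096];
    int fd;

    /* '?' is ASCII 63, matching the "First character (integer value)" kernel log above */
    memset(buf, '?', sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* loop 4 KB writes until interrupted or an error comes back */
    for (;;) {
        if (write(fd, buf, sizeof(buf)) < 0) {
            perror("write");
            break;
        }
    }

    close(fd);
    return 0;
}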
