Thanks Phil, that appears to solve the problem! I tested it against both 
head and the orange branch and didn't see any of the infinite looping or 
client segfaults. I tested it without any of the other changes, so it 
looks like that patch alone resolves the issue.

Michael

On Fri, Jan 15, 2010 at 03:28:54PM -0500, Phil Carns wrote:
> Hi Michael,
> 
> I just tried your test case on a clean trunk build here and was able to 
> reproduce the pvfs2-client-core segfault 100% of the time on my box.
> 
> The problem, in a nutshell, is that pvfs2-client-core was trying to cancel 
> a small-io operation using logic that is only appropriate for a normal 
> I/O operation, in turn causing some memory corruption.
> 
> Can you try out the fix and see if it solves the problem for you?  The 
> patch is attached, or you can pull it from cvs trunk.
> 
> You might want to try that change by itself (without the op purged 
> change) first and go from there.  Some of the other issues you ran into 
> may have been an after-effect from the cancel problem.
> 
> -Phil
> 
> 
> Michael Moore wrote:
> > On Thu, Jan 07, 2010 at 06:48:19PM -0500, Michael Moore wrote:
> >> On Thu, Jan 07, 2010 at 03:07:17PM -0600, Sam Lang wrote:
> >>> On Jan 7, 2010, at 2:29 PM, Michael Moore wrote:
> >>>
> >>>> On Thu, Jan 07, 2010 at 12:30:00PM -0600, Sam Lang wrote:
> >>>>> On Jan 7, 2010, at 11:32 AM, Michael Moore wrote:
> >>>>>
> >>>>>> In continuing to look at the 100% CPU usage (kernel loop) Randy had 
> >>>>>> written about previously, I've narrowed the issue down a little. It 
> >>>>>> appears related to cancellation of operations when a write() call 
> >>>>>> is blocking and I/O has been retried. 
> >>>>>>
> >>>>>> While on our cluster the retries were caused by congestion, I am 
> >>>>>> re-creating the congestion by killing an I/O server. The test C 
> >>>>>> program I'm using just loops around writes of 4k to a PVFS file. If, 
> >>>>>> while the program is executing, I kill a PVFS I/O server the write 
> >>>>>> hangs (expectedly). About 30% of the time when I try to kill the 
> >>>>>> process doing the writing, it spikes to 100% CPU usage and is not 
> >>>>>> killable. Also, every time I try to kill the writing process 
> >>>>>> pvfs2-client-core segfaults with something similar to:
> >>>>>>
> >>>>>> [E 11:58:09.724121] PVFS2 client: signal 11, faulty address is 0x41ec, 
> >>>>>> from 0x8050b51
> >>>>>> [E 11:58:09.725403] [bt] pvfs2-client-core [0x8050b51]
> >>>>>> [E 11:58:09.725427] [bt] pvfs2-client-core(main+0xe48) [0x8052498]
> >>>>>> [E 11:58:09.725436] [bt] /lib/libc.so.6(__libc_start_main+0xdc) 
> >>>>>> [0x75ee9c]
> >>>>>> [E 11:58:09.725444] [bt] pvfs2-client-core [0x804a381]
> >>>>>> [E 11:58:09.740133] Child process with pid 2555 was killed by an 
> >>>>>> uncaught signal 6
> >>>>>>
> >>>>>> In the cases when the CPU usage becomes 100% (and the process can't be 
> >>>>>> terminated), the for() loop in PINT_client_io_cancel strangely 
> >>>>>> segfaults on exactly iteration 31. The value of 
> >>>>>> sm_p->u.io.context_count is in the hundreds, so there are a 
> >>>>>> significant number of jobs left to cancel.
> >>>>> Hi Michael,
> >>>>>
> >>>>> Are you guys using infiniband by chance?  Do you have a stack trace 
> >>>>> with debugging symbols where the pvfs2-client-core segfault occurs?  
> >>>>> That might be useful for narrowing things down.
> >>>>>
> >>>> No infiniband in use, vanilla ethernet.
> >>>>
> >>>> Stack trace with symbols:
> >>>> Program received signal SIGSEGV, Segmentation fault.
> >>>> [Switching to Thread 0xb7fe06d0 (LWP 2069)]
> >>>> PINT_client_io_cancel (id=5594) at 
> >>>> src/client/sysint/client-state-machine.c:524
> >>>> 524              if (cur_ctx->msg_send_in_progress)
> >>>> (gdb) 
> >>>> (gdb) 
> >>>> (gdb) bt
> >>>> #0  PINT_client_io_cancel (id=5594)
> >>>>    at src/client/sysint/client-state-machine.c:524
> >>>> #1  0x0804b926 in service_operation_cancellation (vfs_request=0xa073d98)
> >>>>    at src/apps/kernel/linux/pvfs2-client-core.c:407
> >>>> #2  0x0804f141 in handle_unexp_vfs_request (vfs_request=0xa073d98)
> >>>>    at src/apps/kernel/linux/pvfs2-client-core.c:2959
> >>>> #3  0x08050c7b in process_vfs_requests ()
> >>>>    at src/apps/kernel/linux/pvfs2-client-core.c:3148
> >>>> #4  0x08052498 in main (argc=12, argv=0xbfe0d224)
> >>>>    at src/apps/kernel/linux/pvfs2-client-core.c:3561
> >>>> (gdb) 
> > 
> > In addition to the above segfault, I'm also seeing one for vfs_request 
> > in process_vfs_requests. The vfs_request segfault is the one depicted in 
> > the attached logs. The segfault occurs the first time the function 
> > accesses vfs_request:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08050cd7 in process_vfs_requests ()
> >     at src/apps/kernel/linux/pvfs2-client-core.c:3125
> > 3125                    if (vfs_request->num_ops == 1 &&
> > 
> > Again, these are orange branch CVS line numbers. As a side note, I get a 
> > configure error when trying to build the head version: 
> > "config.status: error: cannot find input file: 
> > src/io/bmi/bmi_zoid/module.mk.in"
> > 
> > It's obvious that vfs_request is not a valid address, but I 
> > haven't figured out where it's getting mucked with or thrown away. Note 
> > that these logs are from after making the op_state_purged(op) change 
> > recommended below. That change seems to have prevented the kernel 
> > infinite loop issue, as you indicated it would.
> > 
> > I don't think I'm doing anything special from the application side, but 
> > I'm also attaching the application I'm using to reproduce this issue.
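> > 
> > (For anyone without the attachment, a rough sketch of that kind of test 
> > program; the path and buffer contents are just examples, not the exact 
> > attached code:)
> > 
> >     #include <fcntl.h>
> >     #include <stdio.h>
> >     #include <string.h>
> >     #include <unistd.h>
> > 
> >     int main(void)
> >     {
> >         char buf[4096];
> >         /* example path on a PVFS2 mount point */
> >         int fd = open("/mnt/pvfs2/testfile",
> >                       O_WRONLY | O_CREAT | O_TRUNC, 0644);
> >         if (fd < 0) { perror("open"); return 1; }
> > 
> >         memset(buf, '?', sizeof(buf));
> > 
> >         /* loop forever doing 4k writes; kill an I/O server while this 
> >          * runs, then try to kill this process to trigger the cancel */
> >         for (;;)
> >         {
> >             if (write(fd, buf, sizeof(buf)) < 0)
> >             {
> >                 perror("write");
> >                 break;
> >             }
> >         }
> > 
> >         close(fd);
> >         return 0;
> >     }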
> > 
> > Thanks,
> > Michael
> > 
> >>> What version of PVFS is this?  I'm having trouble matching up the lines 
> >>> to anything interesting.
> >>> -sam
> >>>
> >> Sorry, Orange-Branch out of CVS.
> >>
> >> Michael
> >>
> >>>>
> >>>>>> The real issue is the 30% of the time when the process gets stuck in 
> >>>>>> the kernel waiting for a downcall. With some additional debugging, the 
> >>>>>> process's write() call is clearly stuck in the while(1) loop in 
> >>>>>> wait_for_cancellation_downcall(). The function's assumption is that 
> >>>>>> either the request will time out or it will be serviced after one 
> >>>>>> iteration of the loop. However, in this situation neither occurs. The 
> >>>>>> schedule_timeout() call immediately returns with a signal pending, but 
> >>>>>> the op is never serviced, so it spins indefinitely.
> >>>>> Those infinite looping conditionals have bugged me for a while now 
> >>>>> (there's one in wait_for_matching_downcall too).  We should probably go 
> >>>>> through and systematically replace all of them.  Let's try to fix your 
> >>>>> bug first though.
> >>>>>
> >>>>> I'm actually surprised that the process goes to 100% cpu in this case.  
> >>>>> You're right that the op is not getting serviced, so the first if 
> >>>>> conditional won't break out of the while loop.  But the 
> >>>>> schedule_timeout should only return non-zero if the task gets woken up, 
> >>>>> and it only gets woken up in purge_waiting_ops() when the 
> >>>>> pvfs2-client-core segfaults.  And that should only happen once.  One 
> >>>>> thing you might try is to change the first if conditional from:
> >>>>>
> >>>>>        if (op_state_serviced(op))
> >>>>>
> >>>>> to:
> >>>>>
> >>>>>        if (op_state_serviced(op) || op_state_purged(op))
> >>>>>
> >>>>> That will allow purged ops to exit the while loop.  Could you share 
> >>>>> your debugging output and modified code?
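> >>>>> 
> >>>>> (For context, roughly where that check sits in the loop; this is just 
> >>>>> a schematic sketch with approximate names, not the exact kernel 
> >>>>> source:)
> >>>>> 
> >>>>>     while (1)
> >>>>>     {
> >>>>>         set_current_state(TASK_INTERRUPTIBLE);
> >>>>> 
> >>>>>         spin_lock(&op->lock);
> >>>>>         /* also break out once the op has been purged, e.g. after 
> >>>>>          * pvfs2-client-core dies and purge_waiting_ops() runs */
> >>>>>         if (op_state_serviced(op) || op_state_purged(op))
> >>>>>         {
> >>>>>             spin_unlock(&op->lock);
> >>>>>             ret = 0;
> >>>>>             break;
> >>>>>         }
> >>>>>         spin_unlock(&op->lock);
> >>>>> 
> >>>>>         /* with a signal pending, schedule_timeout() returns 
> >>>>>          * immediately with time remaining; without the purged check 
> >>>>>          * above, an op that is never serviced spins here forever */
> >>>>>         if (!schedule_timeout(timeout_jiffies))
> >>>>>         {
> >>>>>             ret = -ETIMEDOUT;
> >>>>>             break;
> >>>>>         }
> >>>>>     }
> >>>>>     set_current_state(TASK_RUNNING);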
> >>>> No special code other than some gossip_debugs. The
> >>>> kernel debugging right before and including the cancel is:
> >>>>
> >>>> Jan  7 15:15:25 vm04 kernel: Alloced OP (d0c48020: 875 OP_FILE_IO)
> >>>> Jan  7 15:15:25 vm04 kernel: GET op d0c48020 -> buffer_index 0
> >>>> Jan  7 15:15:25 vm04 kernel: pvfs2_file_write: copy_to_user 1 nr_segs 1, 
> >>>> offset: 3129344 total_size: 4096
> >>>> Jan  7 15:15:25 vm04 kernel: pvfs_bufmap_copy_iovec_from_user: index 0, 
> >>>> size 4096
> >>>> Jan  7 15:15:25 vm04 kernel: First character (integer value) in 
> >>>> pvfs_bufmap_copy_from_user: 63
> >>>> Jan  7 15:15:25 vm04 kernel: pvfs2: service_operation: pvfs2_file_write 
> >>>> d0c48020
> >>>> Jan  7 15:15:25 vm04 kernel: pvfs2: operation posted by process: 
> >>>> pvfs_rand_write, pid: 2076
> >>>> Jan  7 15:15:25 vm04 kernel: Releasing OP (d0c98060: 874)
> >>>> Jan  7 15:15:25 vm04 kernel: client-core: reading op tag 875 OP_FILE_IO
> >>>> Jan  7 15:15:55 vm04 kernel: *** operation interrupted by a signal (tag 
> >>>> 875, op d0c48020)
> >>>> Jan  7 15:15:55 vm04 kernel: Interrupted: Removed op d0c48020 from 
> >>>> htable_ops_in_progress
> >>>> Jan  7 15:15:55 vm04 kernel: pvfs2: service_operation pvfs2_file_write 
> >>>> returning: -4 for d0c48020.
> >>>> Jan  7 15:15:55 vm04 kernel: pvfs2_cancel_op_in_progress called on tag 
> >>>> 875
> >>>> Jan  7 15:15:55 vm04 kernel: Alloced OP (da5000a0: 876 OP_CANCEL)
> >>>> Jan  7 15:15:55 vm04 kernel: Attempting PVFS2 operation cancellation of 
> >>>> tag 875
> >>>> Jan  7 15:15:55 vm04 kernel: pvfs2: service_operation: pvfs2_cancel 
> >>>> da5000a0
> >>>> Jan  7 15:15:55 vm04 kernel: pvfs2: operation posted by process: 
> >>>> pvfs_rand_write, pid: 2076
> >>>> Jan  7 15:15:55 vm04 kernel: *** operation didn't time out: da5000a0, 
> >>>> jiffies_left: 20000
> >>>> Jan  7 15:15:57 vm04 last message repeated 3 times
> >>>> Jan  7 15:15:58 vm04 kernel: *** operation didn't time out: da5000a0, 
> >>>> jiffies_left: 19999
> >>>> Jan  7 15:15:58 vm04 kernel: *** operation didn't time out: da5000a0, 
> >>>> jiffies_left: 20000
> >>>> Jan  7 15:15:58 vm04 last message repeated 59 times
> >>>> ............
> >>>>
> >>>>
> >>>> The pvfs2-client debug log (from an earlier failure) proves that my 
> >>>> earlier statement about segfaulting on the 31st iteration was false. 
> >>>> This one goes into the 70s.
> >>>>
> >>>> [D 14:35:53.231333] cancel_op_in_progress called
> >>>> [D 14:35:53.231342] cancelling I/O req 0x88c7f98 from tag 1128
> >>>> [D 14:35:53.231357] PINT_client_io_cancel id 7350
> >>>> [D 14:35:53.231368] job iteration [0]
> >>>> [D 14:35:53.231377] [0] Posting cancellation of type: BMI Send (Request)
> >>>> [D 14:35:53.231386] [0] Posting cancellation of type: BMI Recv 
> >>>> (Response)
> >>>> [D 14:35:53.231396] [0] Posting cancellation of type: FLOW
> >>>> [D 14:35:53.231431] [0] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> [D 14:35:53.231444] job iteration [1]
> >>>> [D 14:35:53.231452] [1] Posting cancellation of type: BMI Send (Request)
> >>>> [D 14:35:53.231461] [1] Posting cancellation of type: BMI Recv 
> >>>> (Response)
> >>>> [D 14:35:53.231469] [1] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> [D 14:35:53.231502] job iteration [2]
> >>>> [D 14:35:53.231510] [2] Posting cancellation of type: BMI Send (Request)
> >>>> [D 14:35:53.231519] [2] Posting cancellation of type: BMI Recv 
> >>>> (Response)
> >>>> [D 14:35:53.231528] [2] Posting cancellation of type: FLOW
> >>>> [D 14:35:53.231536] [2] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> [D 14:35:53.231545] job iteration [3]
> >>>> [D 14:35:53.231553] [3] Posting cancellation of type: BMI Send (Request)
> >>>> [D 14:35:53.231562] [3] Posting cancellation of type: BMI Recv 
> >>>> (Response)
> >>>> [D 14:35:53.231571] [3] Posting cancellation of type: FLOW
> >>>> [D 14:35:53.231579] [3] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> [D 14:35:53.231588] job iteration [4]
> >>>> [D 14:35:53.231596] [4] Posting cancellation of type: BMI Send (Request)
> >>>> [D 14:35:53.231605] [4] Posting cancellation of type: BMI Recv 
> >>>> (Response)
> >>>> [D 14:35:53.231614] [4] Posting cancellation of type: FLOW
> >>>> [D 14:35:53.231622] [4] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> [D 14:35:53.231628] job iteration [5]
> >>>> [D 14:35:53.231666] [5] Posting cancellation of type: BMI Send (Request)
> >>>> [D 14:35:53.231675] [5] Posting cancellation of type: BMI Recv 
> >>>> (Response)
> >>>> [D 14:35:53.231684] [5] Posting cancellation of type: FLOW
> >>>> [D 14:35:53.231692] [5] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> [D 14:35:53.231701] job iteration [6]
> >>>> [D 14:35:53.231709] [6] Posting cancellation of type: BMI Send (Request)
> >>>> [D 14:35:53.231718] [6] Posting cancellation of type: BMI Recv 
> >>>> (Response)
> >>>> [D 14:35:53.231727] [6] Posting cancellation of type: FLOW
> >>>> [D 14:35:53.231735] [6] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> ............
> >>>> <snip>
> >>>> ............
> >>>> [D 14:35:53.235128] job iteration [69]
> >>>> [D 14:35:53.235136] [69] Posting cancellation of type: FLOW
> >>>> [D 14:35:53.235145] [69] Posting cancellation of type: BMI Recv (Write 
> >>>> Ack)
> >>>> [D 14:35:53.235153] job iteration [70]
> >>>> [D 14:35:53.235161] job iteration [71]
> >>>> [D 14:35:53.235169] job iteration [72]
> >>>> [D 14:35:53.235177] job iteration [73]
> >>>> [E 14:35:53.235212] PVFS2 client: signal 11, faulty address is 0xb8c0c8, 
> >>>> from 0xb26900
> >>>> [E 14:35:53.235810] [bt] /usr/local/pvfs/lib/libpvfs2.so(PINT_client_io_cancel+0xe0) [0xb26900]
> >>>> [E 14:35:53.235828] [bt] pvfs2-client-core [0x804b926]
> >>>> [E 14:35:53.235836] [bt] pvfs2-client-core [0x804f141]
> >>>> [E 14:35:53.235844] [bt] pvfs2-client-core [0x8050c7b]
> >>>> [E 14:35:53.235852] [bt] pvfs2-client-core(main+0xe48) [0x8052498]
> >>>>
> >>>>
> >>>> I'll give the op_state_purged change a try and post the results.
> >>>>
> >>>> Michael
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>> -sam
> >>>>>
> >>>>>> Has anyone else seen the issue with client-core segfaulting on every 
> >>>>>> cancel op? Should the kernel wait_for_cancellation_downcall() be 
> >>>>>> changed 
> >>>>>> to not allow indefinite looping? 
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Michael
> 

> Index: src/client/sysint/client-state-machine.c
> ===================================================================
> RCS file: /projects/cvsroot/pvfs2-1/src/client/sysint/client-state-machine.c,v
> retrieving revision 1.103
> diff -a -u -p -r1.103 client-state-machine.c
> --- src/client/sysint/client-state-machine.c  7 Oct 2009 13:48:49 -0000       1.103
> +++ src/client/sysint/client-state-machine.c  15 Jan 2010 20:15:16 -0000
> @@ -483,6 +483,7 @@ PVFS_error PINT_client_io_cancel(PVFS_sy
>      PVFS_error ret = -PVFS_EINVAL;
>      PINT_smcb *smcb = NULL;
>      PINT_client_sm *sm_p = NULL;
> +    PINT_client_sm *sm_base_p = NULL;
>  
>      gossip_debug(GOSSIP_CLIENT_DEBUG,
>              "PINT_client_io_cancel id %lld\n",lld(id));
> @@ -507,6 +508,18 @@ PVFS_error PINT_client_io_cancel(PVFS_sy
>      {
>       /* op already completed; nothing to cancel. */
>          return 0;
> +    }
> +    
> +    /* We also don't cancel small I/O operations as posted by
> +     * sys-small-io.sm.  Check the corresponding flag.  We have 
> +     * to jump to the base frame rather than the current frame for this
> +     * information because small-io may have pushed a msgpairarray.
> +     */ 
> +    sm_base_p = PINT_sm_frame(smcb, (-(smcb->frame_count -1)));
> +    if(sm_base_p->u.io.small_io)
> +    {
> +        gossip_debug(GOSSIP_CANCEL_DEBUG,  "skipping cancellation of small I/O operation.\n");
> +        return(0);
>      }
>  
>      /* if we fall to here, the I/O operation is still in flight */

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
