Hi Phil,

We're still seeing some issues around cancellation. One case I've noticed, 
but am finding hard to replicate, is when the sys-io state machine is in 
the unstuff_xfer_msgpair state and has jumped to pvfs2_msgpairarray_sm. 
In that state there will be a similar issue with a non-I/O frame on the 
stack, correct? In the cases I've seen, gibberish context counts like the 
ones below get printed, followed by a segfault when accessing cur_ctx.

[D 15:51:00.658599] PINT_client_io_cancel id 7707
[D 15:51:00.658639] base frame is at index: -1
[D 15:51:00.658648] PINT_client_io_cancel: sm_p->u.io.context_count: 8958368
[D 15:51:00.658657] PINT_client_io_cancel: iteration i: 0

#0  PINT_client_io_cancel (id=7707) 
    at src/client/sysint/client-state-machine.c:548
#1  0x0804baf7 in service_operation_cancellation (vfs_request=0x85227e0) 
    at src/apps/kernel/linux/pvfs2-client-core.c:407
#2  0x0804f311 in handle_unexp_vfs_request (vfs_request=0x85227e0) 
    at src/apps/kernel/linux/pvfs2-client-core.c:2980
#3  0x08050f1f in process_vfs_requests ()
    at src/apps/kernel/linux/pvfs2-client-core.c:3180
#4  0x080527a8 in main (argc=10, argv=0xbfa14434)
    at src/apps/kernel/linux/pvfs2-client-core.c:3593

I notice there are also jumps for io_getattr and io_datafile_size, which 
would put other frames on the stack. Should the code after the small-io 
check just use the base frame pointer instead of sm_p?
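
To make the question concrete, here is a minimal sketch in C of the lookup 
being proposed. Everything in it (struct sm_stack, push_frame, 
io_state_from_base, cancel_io, the FRAME_* tags) is hypothetical and is not 
PVFS2's actual frame-stack API. It only assumes that each frame records 
which state machine it belongs to and that the stack remembers the index of 
the operation's base frame. The cancel path reads the I/O state through the 
base frame and checks its tag, instead of trusting whatever frame happens to 
be on top (which could belong to msgpairarray, getattr, or datafile_size), 
and it bails out when the base index is -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame tags, one per nested state machine that can be
 * pushed while an I/O operation is in flight. Not PVFS2's real names. */
enum frame_kind {
    FRAME_IO,
    FRAME_MSGPAIRARRAY,
    FRAME_GETATTR,
    FRAME_DATAFILE_SIZE
};

/* Hypothetical per-machine state; only the I/O frame has a context count. */
struct io_state      { int context_count; };
struct msgpair_state { int count; };

/* A frame records which machine it belongs to and points at that
 * machine's own state, so readers can check the tag before casting. */
struct frame {
    enum frame_kind kind;
    void *state;
};

#define MAX_FRAMES 8

struct sm_stack {
    struct frame frames[MAX_FRAMES];
    int depth;      /* number of frames currently pushed */
    int base_index; /* index of the operation's base frame, or -1 if unknown */
};

/* Push a frame; returns its index, or -1 if the stack is full. */
static int push_frame(struct sm_stack *s, enum frame_kind kind, void *state)
{
    if (s->depth >= MAX_FRAMES)
        return -1;
    s->frames[s->depth].kind = kind;
    s->frames[s->depth].state = state;
    return s->depth++;
}

/* Locate the I/O state through the base frame instead of the top frame.
 * Returns NULL if the base index is invalid (e.g. -1) or the base frame
 * is not an I/O frame, rather than reinterpreting some other machine's
 * state as an I/O state. */
static struct io_state *io_state_from_base(const struct sm_stack *s)
{
    if (s->base_index < 0 || s->base_index >= s->depth)
        return NULL;
    const struct frame *f = &s->frames[s->base_index];
    if (f->kind != FRAME_IO)
        return NULL;
    return (struct io_state *)f->state;
}

/* Hypothetical cancel entry point: returns how many contexts would be
 * cancelled, or -1 if no valid I/O frame could be found. */
static int cancel_io(const struct sm_stack *s)
{
    const struct io_state *io = io_state_from_base(s);
    if (io == NULL)
        return -1;
    return io->context_count;
}
```

With a msgpairarray frame pushed on top of the I/O frame, cancel_io still 
reads the I/O context count via the base frame. With a base index of -1 (as 
in the log above) it returns -1 instead of dereferencing a bogus context 
count.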

Thanks,
Michael

On Wed, Jan 20, 2010 at 08:01:41AM -0600, Phil Carns wrote:
> Great!  Thanks for testing it out.
> 
> -Phil
> 
> Michael Moore wrote:
> > Thanks Phil, that appears to solve the problem! I tested it against both 
> > head and the orange branch and didn't see any of the infinite looping or 
> > client segfaults. I tested it without any of the other changes, so it 
> > looks like that patch alone resolves the issue.
> > 
> > Michael
> > 
> > On Fri, Jan 15, 2010 at 03:28:54PM -0500, Phil Carns wrote:
> >> Hi Michael,
> >>
> >> I just tried your test case on a clean trunk build here and was able to 
> >> reproduce the pvfs2-client-core segfault 100% of the time on my box.
> >>
> >> The problem in a nutshell is that pvfs2-client-core was trying to cancel 
> >> a small-io operation using logic that is only appropriate for a normal 
> >> I/O operation, which in turn caused memory corruption.
> >>
> >> Can you try out the fix and see if it solves the problem for you?  The 
> >> patch is attached, or you can pull it from cvs trunk.
> >>
> >> You might want to try that change by itself (without the op purged 
> >> change) first and go from there.  Some of the other issues you ran into 
> >> may have been an after-effect from the cancel problem.
> >>
> >> -Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
