[EMAIL PROTECTED] wrote on Wed, 13 Feb 2008 21:25 -0600:
> What happens when you restart the client daemon? Does the segfault occur
> with bmi_tcp?
Yeah, I'm seeing the same thing with TCP: 1 client, 1 combined
metadata+data server, kernel 2.6.24-rc6. A few runs of "ls -la /pvfs"
will crash pvfs2-client-core, which is then restarted automatically.
The backtrace is similar. Valgrind doesn't report anything earlier than
the point where things go bad in Troy's traces.
==7517== Invalid read of size 8
==7517== at 0x4C6989E: qlist_empty (quicklist.h:117)
==7517== by 0x4C697DD: PINT_sm_frame (state-machine-fns.c:595)
==7517== by 0x4C270AB: completion_list_retrieve_completed (client-state-machine.c:140)
==7517== by 0x4C281DB: PINT_client_state_machine_testsome (client-state-machine.c:694)
==7517== by 0x4C285EB: PVFS_sys_testsome (client-state-machine.c:907)
==7517== by 0x407BED: process_vfs_requests (pvfs2-client-core.c:2943)
==7517== by 0x40A284: main (pvfs2-client-core.c:3379)
==7517== Address 0x53BD870 is 80 bytes inside a block of size 176 free'd
==7517== at 0x4A0560B: free (vg_replace_malloc.c:233)
==7517== by 0x4C696D7: PINT_smcb_free (state-machine-fns.c:551)
==7517== by 0x4C2768A: PINT_client_state_machine_post (client-state-machine.c:395)
==7517== by 0x4C29FE5: PVFS_isys_getattr (sys-getattr.sm:211)
==7517== by 0x403D94: post_getattr_request (pvfs2-client-core.c:558)
==7517== by 0x408648: handle_unexp_vfs_request (pvfs2-client-core.c:2708)
==7517== by 0x407D70: process_vfs_requests (pvfs2-client-core.c:2990)
==7517== by 0x40A284: main (pvfs2-client-core.c:3379)
(Parse the second half of this first:)
handle_unexp_vfs_request posts a getattr. PVFS_isys_getattr allocates
a new smcb. PINT_client_state_machine_post starts the state machine,
which must have finished immediately through a successful acache
lookup, and then PINT_client_state_machine_post frees the smcb.
(Now the top half:)
Later, testsome decides it has a completed smcb: the same one that was
freed above. Although that may not be related.
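To make the suspected sequence concrete, here is a minimal, hypothetical C model of it. Every name below (the struct smcb fields, completion_list, sm_post, sm_testsome) is invented for illustration and is not the real PVFS2 code. It assumes the bug is an ownership mix-up: an smcb that completes immediately ends up on the completion list, and the post path frees it anyway, leaving testsome holding a dangling pointer. The sketch shows the guarded version, where post frees only an smcb that nobody else still references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a state machine control block (smcb).
 * Not the real PVFS2 struct; only the fields this sketch needs. */
struct smcb {
    int op_id;
    bool on_completion_list; /* set when queued for testsome */
};

#define MAX_COMPLETED 16

/* Hypothetical completion list, loosely modelled on s_completion_list[]. */
static struct smcb *completion_list[MAX_COMPLETED];
static int completion_count = 0;

static void completion_list_add(struct smcb *smcb)
{
    assert(completion_count < MAX_COMPLETED);
    completion_list[completion_count++] = smcb;
    smcb->on_completion_list = true;
}

/* Hypothetical post: the state machine "finishes immediately" (e.g. an
 * acache hit). If it was queued, ownership passes to sm_testsome.
 * Freeing unconditionally here, as the valgrind trace suggests happened,
 * would leave a dangling pointer in completion_list[] for sm_testsome
 * to read. Returns true if the smcb was freed here. */
static bool sm_post(struct smcb *smcb, bool queue_on_completion)
{
    if (queue_on_completion)
        completion_list_add(smcb);
    if (!smcb->on_completion_list) {
        free(smcb);
        return true;
    }
    return false;
}

/* Hypothetical testsome: drains the list, reporting each op id and
 * freeing each smcb exactly once. Returns the number retrieved. */
static int sm_testsome(int *ids_out, int max)
{
    int n = 0;
    while (completion_count > 0 && n < max) {
        struct smcb *smcb = completion_list[--completion_count];
        ids_out[n++] = smcb->op_id;
        free(smcb);
    }
    return n;
}
```

Either ownership rule would work (post frees, or testsome frees, never both); the sketch only makes the hand-off explicit, which is the invariant the trace suggests is being broken.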
This is CVS HEAD from _before_ the big cleanup Sam did today. Are we
forgetting to initialize smcb->frames somewhere in this path? I'm
looking back through the history for suspicious changes.
There's Phil's memmove() fix on s_completion_list[] from Jan 15, but
that's clearly a bug fix, and it's probably not getting triggered
either. There are also a bunch of locking changes, which look harmless.
Needs more debugging.
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers