Troy,
Could you also send the stack trace from gdb where the segfault
occurs? That's going to be the most useful info for us.
Thanks,
-sam
On Feb 13, 2008, at 4:24 PM, Troy Benjegerdes wrote:
http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
or
http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz
This looks pretty bad:
[D 16:10:47.741903] [SM Entering]: (0x10103cf0) sysdev_unexp_sm:post (status: 0)
[D 16:10:47.741929] [SM Exiting]: (0x10103cf0) sysdev_unexp_sm:post (error code: 0), (action: DEFERRED)
[D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for test)(-1073741839)
[D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to inlined completion
[D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL
Maybe we should have an assert in this code??
More info from a gdb trace:
(gdb) print smcb
$1 = (PINT_smcb *) 0x10119bd8
(gdb) print *smcb
$2 = {stackptr = 0, current_state = 0x100ea3e4, state_stack = {0x100ea3c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  frames = {next = 0x10119c00, prev = 0x10119c00}, frame_count = 1,
  op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>, op = 5, op_id = 0, parent_smcb = 0x0,
  op_terminate = 1, op_cancelled = 0, children_running = 0, op_completed = 1, context = 0,
  terminate_fn = 0x1005ce74 <client_state_machine_terminate>, user_ptr = 0x0}
(gdb) print limit
$3 = 64
(gdb) print i
$4 = 0
(gdb) list PINT_sm_frame
586  * Params: pointer to smcb, stack index
587  * Returns: pointer to frame
588  * Synopsis: returns a frame off of the frame stack
589  */
590 void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
591 {
592     struct PINT_frame_s *frame_entry;
593     struct qlist_head *next;
594
595     if(qlist_empty(&smcb->frames))
(gdb)
596     {
597         gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
598                      "FRAME GET smcb %p index %d -> frame: NULL\n",
599                      smcb, index);
600         return NULL;
601     }
602     else
603     {
604         int i = 0;
605
(gdb)
606         next = smcb->frames.next;
607         while(i < index)
608         {
609             next = next->next;
610         }
611         frame_entry = qlist_entry(next, struct PINT_frame_s, link);
612         return frame_entry->frame;
613     }
614 }
615
(gdb) print smcb->frames
$5 = {next = 0x10119c00, prev = 0x10119c00}
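Two things stand out in the listing above. First, the `while(i < index)` loop never increments `i`, so any call with `index > 0` on a non-empty list would spin forever; it is harmless in this trace only because `index` is 0. Second, as Troy suggests, nothing checks that `index` is within the list. The following is a hedged sketch of a corrected walk, not the PVFS2 fix: the `qlist_*` helpers and the trimmed `struct PINT_frame_s` are hypothetical minimal stand-ins for the real qlist.h and state-machine headers, and `sm_frame_lookup` (which takes the list head directly instead of an smcb) is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for a circular, doubly linked list;
 * the real PVFS2 qlist.h provides its own versions. */
struct qlist_head {
    struct qlist_head *next, *prev;
};

/* Recover the enclosing struct from a pointer to its embedded link. */
#define qlist_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void qlist_init(struct qlist_head *head)
{
    head->next = head->prev = head;
}

static void qlist_add_tail(struct qlist_head *entry, struct qlist_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

static int qlist_empty(const struct qlist_head *head)
{
    return head->next == head;
}

/* Trimmed-down frame entry: only the fields this sketch needs. */
struct PINT_frame_s {
    void *frame;
    struct qlist_head link;
};

/* Return the frame at position `index` (0 = first entry), or NULL if the
 * list is empty. `i` advances on every step, and the assert fires if the
 * walk wraps back to the head, i.e. index >= number of entries. */
static void *sm_frame_lookup(struct qlist_head *frames, int index)
{
    struct qlist_head *next;
    int i;

    assert(index >= 0);
    if (qlist_empty(frames))
        return NULL;

    next = frames->next;
    for (i = 0; i < index; i++) {
        next = next->next;
        assert(next != frames);
    }
    return qlist_entry(next, struct PINT_frame_s, link)->frame;
}
```

With assertions enabled, an index past the end aborts at the first step that reaches the list head instead of returning the head reinterpreted as a frame. In an `NDEBUG` build the check compiles away, so a production version would probably want an explicit bounds check that returns NULL.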
All I get from this is that the frames qlist has a single entry,
state_stack[4]. Not sure how it got so deep into there. Likely
some sort of memory corruption, or we have a fairly major
undiscovered SM bug on our hands.
If you can reproduce this at will, a -g build run with all debugging
enabled would be especially helpful. The debug log might show
something curious.
The other approach is to run under valgrind and cross your fingers
that it finds something interesting.
-- Pete
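Pete's suspicion of memory corruption or an SM bug could be tested with a cheap invariant: walk `smcb->frames`, count the linked entries, and compare the result with `frame_count`, bounding the walk so a corrupted list is reported rather than looped over. One detail worth checking this way: in the listing above, the `FRAME GET ... -> frame: NULL` debug line is only printed on the `qlist_empty()` branch, so at the logged call (smcb 0x10119bd8, index 0) the list appears to have been empty even though gdb shows `frame_count = 1`. This is a sketch under assumptions: `qlist_count_bounded`, `frames_consistent`, and `CHECK_FRAMES` are hypothetical helpers, and `struct qlist_head` is a minimal stand-in for the real PVFS2 type.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a circular, doubly linked list head (hypothetical;
 * the real PVFS2 qlist type lives in its own header). */
struct qlist_head {
    struct qlist_head *next, *prev;
};

/* Count the entries linked after `head`, walking until we return to it.
 * Returns -1 if a NULL link is hit or more than `max_steps` entries are
 * seen, either of which indicates a corrupted (non-circular) list. */
static int qlist_count_bounded(const struct qlist_head *head, int max_steps)
{
    const struct qlist_head *p;
    int n = 0;

    for (p = head->next; p != head; p = p->next) {
        if (p == NULL || ++n > max_steps)
            return -1;
    }
    return n;
}

/* Hypothetical debug invariant: the recorded frame_count should equal the
 * number of frames actually linked on the list. */
static int frames_consistent(const struct qlist_head *frames,
                             int frame_count, int max_steps)
{
    return frame_count >= 0 &&
           qlist_count_bounded(frames, max_steps) == frame_count;
}

/* Debug-build hook: abort at the first point the two disagree. */
#define CHECK_FRAMES(frames, count) \
    assert(frames_consistent((frames), (count), 1024))
```

Calling a check like `CHECK_FRAMES` after each frame push and pop would stop at the first point where the list and the counter diverge, which is likely more informative than the eventual segfault.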
(gdb) info locals
i = 0
new_list_index = 0
tmp_completion_list = {0x0 <repeats 256 times>}
sm_p = (PINT_client_sm *) 0x0
__PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
(gdb) print op_id_array
$5 = (PVFS_sys_op_id *) 0xfff7d710
(gdb) print op_id_array[0]
$7 = 34
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers