http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash
or
http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client-log-crash.gz

This looks pretty bad:

[D 16:10:47.741903] [SM Entering]: (0x10103cf0) sysdev_unexp_sm:post (status: 0)
[D 16:10:47.741929] [SM Exiting]: (0x10103cf0) sysdev_unexp_sm:post (error code: 0), (action: DEFERRED)
[D 16:10:47.741953] Posted PVFS_DEV_UNEXPECTED (375) (waiting for test)(-1073741839)
[D 16:10:47.741977] [-] reposted unexp req [0x10117fc0] due to inlined completion
[D 16:10:47.742001] FRAME GET smcb 0x10119bd8 index 0 -> frame: NULL

Maybe we should have an assert in this code, so an empty frame list fails loudly instead of returning NULL?
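
For what it's worth, here is a rough sketch of what such an assert could look like. It is not the PVFS2 code: the list type, struct smcb_sketch, struct frame_entry and frame_get_checked() are all hypothetical stand-ins, and the only idea borrowed from the real smcb is that it carries both a frames list and a frame_count. The sketch asserts that the two agree before walking the list, and it uses a counted for loop for the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PVFS2 qlist and smcb types. */
struct list_head { struct list_head *next, *prev; };

struct frame_entry {
    struct list_head link;   /* linkage into smcb->frames */
    void *frame;             /* payload handed back to the caller */
};

struct smcb_sketch {
    struct list_head frames; /* circular list; empty when it points at itself */
    int frame_count;         /* bookkeeping that should match the list */
};

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Returns the frame at 'index', or NULL if the index is out of range.
 * Aborts if frame_count and the list disagree about being empty. */
static void *frame_get_checked(struct smcb_sketch *smcb, int index)
{
    struct list_head *pos;
    struct frame_entry *entry;
    int i;

    /* The proposed assert: a nonzero count with an empty list is a bug. */
    assert(list_is_empty(&smcb->frames) == (smcb->frame_count == 0));

    if (index < 0 || index >= smcb->frame_count)
        return NULL;

    pos = smcb->frames.next;
    for (i = 0; i < index; i++)   /* advance exactly 'index' links */
        pos = pos->next;

    /* Recover the containing entry from its embedded link. */
    entry = (struct frame_entry *)((char *)pos - offsetof(struct frame_entry, link));
    return entry->frame;
}
```

With an smcb like the one in the gdb output (frame_count = 1 but a list head that reports empty), the assert would abort at the lookup rather than hand a NULL frame to the caller.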

More info from a gdb session:

(gdb) print smcb
$1 = (PINT_smcb *) 0x10119bd8
(gdb) print *smcb
$2 = {stackptr = 0, current_state = 0x100ea3e4, state_stack = {0x100ea3c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
   0x0}, frames = {next = 0x10119c00, prev = 0x10119c00}, frame_count = 1,
op_get_state_machine = 0x1005cce0 <client_op_state_get_machine>, op = 5, op_id = 0, parent_smcb = 0x0, op_terminate = 1, op_cancelled = 0, children_running = 0, op_completed = 1, context = 0, terminate_fn = 0x1005ce74 <client_state_machine_terminate>, user_ptr = 0x0}
(gdb) print limit
$3 = 64
(gdb) print i
$4 = 0
(gdb) list PINT_sm_frame
586      * Params: pointer to smcb, stack index
587      * Returns: pointer to frame
588      * Synopsis: returns a frame off of the frame stack
589      */
590     void *PINT_sm_frame(struct PINT_smcb *smcb, int index)
591     {
592         struct PINT_frame_s *frame_entry;
593         struct qlist_head *next;
594
595         if(qlist_empty(&smcb->frames))
(gdb)
596         {
597             gossip_debug(GOSSIP_STATE_MACHINE_DEBUG,
598                          "FRAME GET smcb %p index %d -> frame: NULL\n",
599                          smcb, index);
600             return NULL;
601         }
602         else
603         {
604             int i = 0;
605
(gdb)
606             next = smcb->frames.next;
607             while(i < index)
608             {
609                 next = next->next;
610             }
611             frame_entry = qlist_entry(next, struct PINT_frame_s, link);
612             return frame_entry->frame;
613         }
614     }
615
(gdb) print smcb->frames
$5 = {next = 0x10119c00, prev = 0x10119c00}



All I get from this is that the frames qlist has a single entry,
state_stack[4].  Not sure how it got that deep in there.  Likely
some sort of memory corruption, or we have a fairly major
undiscovered state machine (SM) bug on our hands.
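
One way to cross-check where frames.next points is to do the offset arithmetic on the addresses gdb printed. The sketch below is only a model: struct smcb32 is hypothetical, and it assumes a 32-bit target (the addresses in the trace fit in 32 bits), members laid out in the order gdb printed them, and 4-byte pointers in state_stack. Under those assumptions the frames member sits 40 bytes into the smcb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of the leading PINT_smcb members, in the
 * order gdb printed them. Pointers are modelled as uint32_t so the
 * layout is the same on any build host. */
struct list_head32 { uint32_t next, prev; };

struct smcb32 {
    int32_t  stackptr;
    uint32_t current_state;
    uint32_t state_stack[8];
    struct list_head32 frames;
};

/* Address of the frames list head for an smcb at 'smcb_addr'. */
static uint32_t frames_head_addr(uint32_t smcb_addr)
{
    return smcb_addr + (uint32_t)offsetof(struct smcb32, frames);
}

/* A circular list head whose next pointer equals its own address is empty. */
static int list_head_is_self(uint32_t smcb_addr, uint32_t frames_next)
{
    return frames_next == frames_head_addr(smcb_addr);
}
```

Plugging in smcb = 0x10119bd8 and frames.next = 0x10119c00 from the trace, the model puts the list head at 0x10119bd8 + 0x28 = 0x10119c00, the same value as frames.next. If the assumed layout is right, that would mean the head points back at itself and the list is empty despite frame_count = 1, which would match the "frame: NULL" log line. If the real struct is padded or ordered differently, this does not hold.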

If you can reproduce this at will, a -g build run with all
debugging enabled would be especially helpful.  Maybe the debug log
would show something curious.

The other approach is to run under valgrind and hope it finds
something interesting.

                -- Pete

(gdb) info locals
i = 0
new_list_index = 0
tmp_completion_list = {0x0 <repeats 256 times>}
sm_p = (PINT_client_sm *) 0x0
__PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
(gdb) print op_id_array
$5 = (PVFS_sys_op_id *) 0xfff7d710
(gdb) print op_id_array[0]
$7 = 34

