[EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 17:14 -0600:
> I'm getting a sig11 with the power5 client. Here's a bunch of debugging
> info. Where do I go next?
>
> [D 16:57:30.033896] BMI_post_sendunexpected_list: addr: 269231512, count: 1, total_size: 52, tag: 15
> [D 16:57:30.033926]    element 0: offset: 0x1013d390, size: 52
> [D 16:57:30.033955] post_send: sq 0x100c7f60 len 52 peer da13:3345.
> [D 16:57:30.033984] encourage_send_waiting_buffer: sq 0x100c7f60 sent EAGER len 52.
> [D 16:57:30.034019] ib_check_cq: send to da13:3345 completed locally: sq 0x100c7f60 -> SQ_WAITING_USER_TEST.
> [D 16:57:30.034047] test_sq: sq 0x100c7f60 completed 52 to da13:3345.
> [D 16:57:30.034162] ib_check_cq: recv from da13:3345 len 104 type MSG_EAGER_SEND credit 1.
> [D 16:57:30.034191] encourage_recv_incoming: recv eager len 104.
> [D 16:57:30.034216] encourage_recv_incoming: matched rq 0x100d8790 now RQ_EAGER_WAITING_USER_TEST.
> [D 16:57:30.034246] encourage_recv_incoming: early registration not needed, dereg after eager.
> [D 16:57:30.034276] memcache_deregister: dec refcount [0] 0x10146930 len 8224 (via 0x10146930 len 8224) refcnt now 1.
> [D 16:57:30.034307] test_rq: rq 0x100d8790 completed 88 from da13:3345.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -134410208 (LWP 6302)]
> completion_list_retrieve_completed (op_id_array=0xfff7d710,
>    user_ptr_array=0xfff7d310, error_code_array=0xfff7d410, limit=64,
>    out_count=0xfff7d2f0) at ../src/client/sysint/client-state-machine.c:141
> 141                 op_id_array[i] = sm_p->sys_op_id;
> (gdb)
> (gdb)
> (gdb)
> (gdb)
> (gdb) bt
> #0  completion_list_retrieve_completed (op_id_array=0xfff7d710,
>    user_ptr_array=0xfff7d310, error_code_array=0xfff7d410, limit=64,
>    out_count=0xfff7d2f0) at ../src/client/sysint/client-state-machine.c:141
> #1  0x100441b4 in PINT_client_state_machine_testsome (op_id_array=0xfff7d710,
>    op_count=0xfff7d2f0, user_ptr_array=0xfff7d310,
>    error_code_array=0xfff7d410, timeout_ms=10)
>    at ../src/client/sysint/client-state-machine.c:694
> #2  0x10010c00 in process_vfs_requests ()
>    at ../src/apps/kernel/linux/pvfs2-client-core.c:2943
> #3  0x100120f4 in main (argc=<value optimized out>, argv=0xfff7dc74)
>    at ../src/apps/kernel/linux/pvfs2-client-core.c:3379
> (gdb) print sm_p
> $1 = (PINT_client_sm *) 0x0
> (gdb)
> $2 = (PINT_client_sm *) 0x0
> (gdb) list
> 136             assert(smcb);
> 137
> 138             if (i < limit)
> 139             {
> 140                 sm_p = PINT_sm_frame(smcb, PINT_FRAME_CURRENT);
> 141                 op_id_array[i] = sm_p->sys_op_id;
> 142                 error_code_array[i] = sm_p->error_code;
> 143
> 144                 if (user_ptr_array)
> 145                 {
> (gdb) print smcb
> No symbol "smcb" in current context.
> (gdb) list -
> 126
> 127         gen_mutex_lock(&s_completion_list_mutex);
> 128         for(i = 0; i < s_completion_list_index; i++)
> 129         {
> 130             if (s_completion_list[i] == NULL)
> 131             {
> 132                 continue;
> 133             }
> 134
> 135             smcb = s_completion_list[i];
> (gdb) print s_completion_list[0]
> $3 = (PINT_smcb *) 0x100da450
> (gdb) print *s_completion_list[0]
> $4 = {stackptr = 0, current_state = 0x100b0068, state_stack = {0x100aff90,
>    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, frames = {next = 0x100da478,
>    prev = 0x100da478}, frame_count = 1,
>  op_get_state_machine = 0x10043b80 <client_op_state_get_machine>, op = 5,
>  op_id = 0, parent_smcb = 0x0, op_terminate = 1, op_cancelled = 0,
>  children_running = 0, op_completed = 1, context = 0,
>  terminate_fn = 0x100452a0 <client_state_machine_terminate>, user_ptr = 0x0}

All I get from this is that the frames qlist has a single entry, and
it points at state_stack[4].  I'm not sure how it got that deep into
the structure.  Likely it is some sort of memory corruption, or we
have a fairly major undiscovered SM bug on our hands.

If you can repeat this at will, a -g build run with all debugging
enabled would be especially helpful.  Maybe the debug log would
show something curious.

The other approach is to run under valgrind and cross fingers it
finds something interesting.

                -- Pete

> (gdb) info locals
> i = 0
> new_list_index = 0
> tmp_completion_list = {0x0 <repeats 256 times>}
> sm_p = (PINT_client_sm *) 0x0
> __PRETTY_FUNCTION__ = "completion_list_retrieve_completed"
> (gdb) print op_id_array
> $5 = (PVFS_sys_op_id *) 0xfff7d710
> (gdb) print op_id_array[0]
> $7 = 34
>
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
