According to your stack trace, the correct way to call mca_pml_ob1_dump
is with the communicator from the PMPI call. Thus, this call succeeded:
(gdb) call mca_pml_ob1_dump(0xed932d0, 1)
$1 = 0
I should have been clearer: the output does not appear in gdb but on the
output stream of your application. If you run your application by hand with
mpirun, the output should be on the terminal where you started mpirun. If
you start your job with a batch scheduler, the output should be in the
output file associated with your job.
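A minimal C sketch of why the two comm values in the stack trace differ. It assumes, consistent with frame #8 listing comm=0x5d30a68 alongside other by-reference arguments, that the Fortran wrapper ompi_recv_f receives a pointer to the Fortran INTEGER handle and translates it (via MPI_Comm_f2c or an equivalent internal lookup) before calling PMPI_Recv; only the translated value (0xed932d0) points at a real communicator object. Every toy_* name and the fixed-size table are hypothetical simplifications, not Open MPI code.

```c
/* Toy model (hypothetical names throughout) of how a Fortran MPI call
 * reaches the C layer. This is not Open MPI source code. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the library's internal communicator object. */
typedef struct toy_comm {
    char name[32];
    int rank;
    int size;
} toy_comm_t;

/* A Fortran handle is a plain INTEGER (MPI_Fint in the real C API). */
typedef int toy_fint;

/* Hypothetical handle table: a Fortran handle is an index into it. */
#define TOY_MAX_COMMS 8
static toy_comm_t *toy_table[TOY_MAX_COMMS];

/* Register a communicator and return its Fortran handle (cf. MPI_Comm_c2f). */
static toy_fint toy_comm_c2f(toy_comm_t *comm) {
    assert(comm != NULL);
    for (toy_fint i = 0; i < TOY_MAX_COMMS; ++i) {
        if (toy_table[i] == comm) return i;   /* already registered */
        if (toy_table[i] == NULL) {           /* first free slot */
            toy_table[i] = comm;
            return i;
        }
    }
    return -1;                                /* table full */
}

/* Translate a Fortran handle back to the C object (cf. MPI_Comm_f2c). */
static toy_comm_t *toy_comm_f2c(toy_fint handle) {
    if (handle < 0 || handle >= TOY_MAX_COMMS) return NULL;
    return toy_table[handle];
}

/* Debugger-callable dump helper: it dereferences `comm`, so it is only
 * safe with a real communicator pointer. Output goes to the process's
 * own stderr, not to the debugger. Returns 0 on success. */
static int toy_comm_dump(const toy_comm_t *comm, int verbose) {
    fprintf(stderr, "Communicator %s rank %d of %d\n",
            comm->name, comm->rank, comm->size);
    if (verbose) {
        fprintf(stderr, "  object at %p\n", (const void *)comm);
    }
    return 0;
}

/* Shape of a Fortran binding: every argument arrives by reference, so
 * `comm` is the address of an INTEGER handle, not a communicator. */
static toy_comm_t *toy_recv_f(const toy_fint *comm) {
    /* Dereference and translate the handle before calling the C layer;
     * the returned pointer is what a C-level frame would show as `comm`. */
    return toy_comm_f2c(*comm);
}
```

In this model, handing &fhandle (the address of the INTEGER) to toy_comm_dump instead of the translated pointer would make it read the integer's memory as if it were a communicator structure, which is consistent with the segfault on the opal_output line inside mca_pml_ob1_dump.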
On Fri, Apr 6, 2018 at 12:53 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
> On Apr 5, 2018, at 4:11 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> I attach to the processes with gdb and do a "call mca_pml_ob1_dump(comm,
> 1)". This allows the debugger to make a call to our function and output
> internal information about the library status.
> OK - after a number of missteps, I recompiled openmpi with debugging mode
> active, reran the executable (didn’t recompile, just using the new
> library), and got the comm pointer by attaching to the process and looking
> at the stack trace:
> #0 0x00002b8a7599c42b in ibv_poll_cq (cq=0xec66010, num_entries=256,
> wc=0x7ffdea76d680) at /usr/include/infiniband/verbs.h:1272
> #1 0x00002b8a759a8194 in poll_device (device=0xebc5300, count=0) at
> #2 0x00002b8a759a871f in progress_one_device (device=0xebc5300) at
> #3 0x00002b8a759a87be in btl_openib_component_progress () at
> #4 0x00002b8a64b9da42 in opal_progress () at runtime/opal_progress.c:222
> #5 0x00002b8a76c2c199 in ompi_request_wait_completion (req=0xec22600) at
> #6 0x00002b8a76c2d642 in mca_pml_ob1_recv (addr=0x2b8a8a99bf20,
> count=5423600, datatype=0x2b8a64832b80, src=1, tag=200, comm=0xed932d0,
> status=0x385dd90) at pml_ob1_irecv.c:135
> #7 0x00002b8a6454c857 in PMPI_Recv (buf=0x2b8a8a99bf20, count=5423600,
> type=0x2b8a64832b80, source=1, tag=200, comm=0xed932d0, status=0x385dd90)
> at precv.c:79
> #8 0x00002b8a6428ca7c in ompi_recv_f (buf=0x2b8a8a99bf20
> count=0x7ffdea770eb4, datatype=0x2d43bec, source=0x7ffdea770a38,
> tag=0x2d43bf0, comm=0x5d30a68, status=0x385dd90, ierr=0x7ffdea770a3c)
> at precv_f.c:85
> #9 0x000000000042887b in m_recv_z (comm=..., node=-858993460, zvec=Cannot
> access memory at address 0x2d
> ) at mpi.F:680
> #10 0x000000000123e0f1 in fileio::outwav (io=..., wdes=..., w=Cannot
> access memory at address 0x2d
> ) at fileio.F:952
> #11 0x0000000002abfd8f in vamp () at main.F:4204
> #12 0x00000000004139de in main ()
> #13 0x0000003f0c81ed1d in __libc_start_main () from /lib64/libc.so.6
> #14 0x00000000004138e9 in _start ()
> The comm value is different in ompi_recv_f and in the frames below it, so I
> tried both. With the value from the lower-level functions I get nothing useful:
> (gdb) call mca_pml_ob1_dump(0xed932d0, 1)
> $1 = 0
> and with the value from ompi_recv_f I get a segfault:
> (gdb) call mca_pml_ob1_dump(0x5d30a68, 1)
> Program received signal SIGSEGV, Segmentation fault.
> 0x00002b8a76c26d0d in mca_pml_ob1_dump (comm=0x5d30a68, verbose=1) at
> 577 opal_output(0, "Communicator %s [%p](%d) rank %d recv_seq %d
> num_procs %lu last_probed %lu\n",
> The program being debugged was signaled while in a function called from GDB.
> GDB remains in the frame where the signal was received.
> To change this behavior use "set unwindonsignal on".
> Evaluation of the expression containing the function
> (mca_pml_ob1_dump) will be abandoned.
> When the function is done executing, GDB will silently stop it.
> Should this have worked, or am I doing something wrong?
> users mailing list