Hello,
I'm currently working on getting a new RDMA device to work with Open MPI.

During this work I reached a point where the program crashes but does not
leave enough information for me to find the root cause.

I tried running gdb on the command line (mpirun -np 2 ring_c), but that
didn't help, since it debugs mpirun itself and not ring_c.

Then I tried the --debug option, but it fails because it only works with a
limited list of debuggers, which unfortunately are not available in the
Fedora repo.

This is (part of) the crash log I see when running ring_c:
[fc28-2:01086] *** Process received signal ***
[fc28-2:01086] Signal: Segmentation fault (11)
[fc28-2:01086] Signal code: Invalid permissions (2)
[fc28-2:01086] Failing at address: 0x211dfc0
[fc28-2:01087] *** Process received signal ***
[fc28-2:01087] Signal: Segmentation fault (11)
[fc28-2:01087] Signal code: Invalid permissions (2)
[fc28-2:01087] Failing at address: 0x2656fc0
[fc28-2:01086] [ 0] /lib64/libpthread.so.0(+0x11fc0)[0x7f2dd089afc0]
[fc28-2:01086] [ 1] [0x211dfc0]
[fc28-2:01086] *** End of error message ***
[fc28-2:01087] [ 0] /lib64/libpthread.so.0(+0x11fc0)[0x7fcfc7553fc0]
[fc28-2:01087] [ 1] [0x2656fc0]
[fc28-2:01087] *** End of error message ***
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------

And here are the command line parameters I'm using:
btl_base_verbose = 100
btl_openib_verbose = 100
btl = openib,self
btl_openib_receive_queues = P,4096,8,6,4
btl_openib_cpc_include = rdmacm
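
For reference, a rough sketch of how these settings would look when passed
to mpirun as --mca flags. The form of the invocation is an assumption: the
same values could equally come from an MCA parameter file or from
environment variables. The -np 2 and ring_c parts are taken from the gdb
attempt above. MCA_ARGS and CMD are just illustrative variable names, and
the snippet only builds and prints the command rather than running it.

```shell
# Assumption: the parameters are given on the command line via --mca.
MCA_ARGS="--mca btl_base_verbose 100"
MCA_ARGS="$MCA_ARGS --mca btl_openib_verbose 100"
MCA_ARGS="$MCA_ARGS --mca btl openib,self"
MCA_ARGS="$MCA_ARGS --mca btl_openib_receive_queues P,4096,8,6,4"
MCA_ARGS="$MCA_ARGS --mca btl_openib_cpc_include rdmacm"

# Full command line; -np 2 and ring_c match the run described earlier.
CMD="mpirun -np 2 $MCA_ARGS ring_c"

# Print the command so it can be checked before it is actually run.
echo "$CMD"
```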

I'd appreciate any help here.

Thanks,
Yuval
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
