Hi all - I've been trying to debug a segfault in OpenMPI 3.1.2, and in the 
process I noticed that 3.1.3 is out, so I thought I'd test it.  However, with 
3.1.3 the code (LAMMPS) hangs very early, while processing its input.  I'm 
running 16 tasks on a single 16-core node, with InfiniBand (which it may be 
using, even though it's only one node).  Attaching gdb to the 16 hung processes 
shows that 15 of them are in PMPI_Cart_create (input.cpp line 243), while one 
is stuck in an earlier Bcast (input.cpp line 222) and is presumably the task 
that is actually hung.  Its backtrace:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00002b7303ac7f6f in opal_progress () from 
#0  0x00002b7303ac7f6f in opal_progress () from 
#1  0x00002b73022e0675 in ompi_request_default_wait () from 
#2  0x00002b73023286ee in ompi_coll_base_bcast_intra_generic () from 
#3  0x00002b7302328b67 in ompi_coll_base_bcast_intra_binomial () from 
#4  0x00002b731881670c in ompi_coll_tuned_bcast_intra_dec_fixed () from 
#5  0x00002b73022f5b19 in PMPI_Bcast () from 
#6  0x0000000000483fdb in LAMMPS_NS::Input::file (this=0x3a62eb0) at 
#7  0x000000000040bc48 in main (argc=<optimized out>, argv=<optimized out>) at 
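
For anyone who wants to compare all ranks at once, here is a minimal sketch of how backtraces like the one above could be collected from every process on the node in a single pass. It is not necessarily how the trace above was obtained. The helper names are made up for illustration, and the executable name passed in (for example "lmp") is an assumption that has to match the actual LAMMPS binary. It relies only on the standard `pgrep -x` and `gdb -p PID -batch -ex CMD` invocations.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to `pid`, prints a
    backtrace for every thread, then detaches and exits (-batch)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def find_pids(process_name):
    """Return the PIDs of processes whose name matches exactly (pgrep -x)."""
    result = subprocess.run(
        ["pgrep", "-x", process_name], capture_output=True, text=True
    )
    return [int(tok) for tok in result.stdout.split()]


def dump_backtraces(process_name):
    """Print one backtrace block per matching process."""
    for pid in find_pids(process_name):
        print(f"==== PID {pid} ====")
        out = subprocess.run(
            gdb_backtrace_cmd(pid), capture_output=True, text=True
        )
        print(out.stdout)
```

Calling `dump_backtraces("lmp")` on the compute node (as the user who owns the job) would print one block per rank, which makes it easy to spot the single rank that is in a different collective from the others.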

The compilation is pretty straightforward.  CentOS 7 stock gcc (4.8.5), CentOS 
stock IB support and "--with-verbs --with-ofi" flags to configure.  My first 
thought was to add --enable-debug and --enable-mem-debug, but with those 
configure options on, the process does not hang.

Does anyone have any suggestions for investigating further?
