OK, that would be great -- thanks. Recompiling Open MPI with --enable-debug turns on several debugging/sanity checks inside Open MPI, and it also enables debugging symbols. Hence, if you can get a failure with a debug Open MPI build, it might give you a core file that you can use to get a more detailed stack trace, poke around to see if there's a NULL pointer somewhere, etc.
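A rough sketch of that workflow as a POSIX shell script follows. It only defines helper functions; nothing runs until you call them. Everything site-specific here is a hypothetical placeholder, not something from this thread: the source tree in $OMPI_SRC, the separate debug install prefix in $OMPI_PREFIX, the application ./my_app, the 4-process run, and the assumption that gdb is installed. The core file's name and location also vary by system (core, core.<pid>, or captured by systemd-coredump), so pass its actual path to inspect_core.

```shell
#!/bin/sh
# Sketch only: OMPI_SRC, OMPI_PREFIX, and APP are hypothetical placeholders;
# override them in the environment to match your site.
OMPI_SRC="${OMPI_SRC:-$HOME/openmpi-3.1.1}"
OMPI_PREFIX="${OMPI_PREFIX:-$HOME/ompi-3.1.1-debug}"
APP="${APP:-./my_app}"

# Step 1: build a separate Open MPI install with internal sanity checks
# and debugging symbols (--enable-debug). Runs in a subshell so the
# caller's working directory is left unchanged.
build_debug_ompi() {
    (
        cd "$OMPI_SRC" || exit 1
        ./configure --prefix="$OMPI_PREFIX" --enable-debug &&
            make -j 8 all install
    )
}

# Step 2: put the debug install first in PATH, allow core dumps in this
# shell, and launch the (hypothetical) application on 4 processes.
# Rebuild the application with this install's wrapper compilers first.
run_with_core_dumps() {
    PATH="$OMPI_PREFIX/bin:$PATH"
    export PATH
    ulimit -c unlimited
    mpirun -np 4 "$APP"
}

# Step 3: load a core file (path given as $1) in gdb and print a backtrace.
inspect_core() {
    gdb -batch -ex bt "$APP" "$1"
}
```

For example, `build_debug_ompi`, then `run_with_core_dumps` until the failure reproduces, then `inspect_core core.12345` (a made-up file name). To poke around interactively instead, drop `-batch -ex bt`, then use `frame N` and `print some_pointer` inside gdb. On multi-node runs the core-size limit may need raising on every node, not just the one where mpirun is started.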
> On Jul 11, 2018, at 11:03 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
>
>> On Jul 11, 2018, at 9:58 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
>>
>>> On Jul 10, 2018, at 5:15 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
>>>
>>> What are useful steps I can do to debug? Recompile with --enable-debug?
>>> Are there any other versions that are worth trying? I don’t recall this
>>> error happening before we switched to 3.1.0.
>>>
>>> thanks,
>>> Noam
>>
>> It appears that the problem is there with OpenMPI 3.1.1, but not 2.1.3. Of
>> course I can’t be 100% sure, since it’s nondeterministic, but 3 runs died
>> after 0-3 iterations with 3.1.1, and I did 3 runs of 10 iterations each with
>> 2.1.3.
>
> After more extensive testing it’s clear that it still happens with 2.1.3, but
> much less frequently. I’m going to try to get more detailed info with
> version 3.1.1, where it’s easier to reproduce.
>
> Noam
>
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628 F +1 202 404 7546
> https://www.nrl.navy.mil
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users

--
Jeff Squyres
jsquy...@cisco.com