Ah, I was misled by the subject. Can you provide more information about "hangs", and your environment?
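Off the top of my head, the most useful starting points are usually the ompi_info output and the exact launch line that hangs, e.g. (a sketch only -- the filenames and the launch line below are just placeholders):

    # Version and build configuration of the Open MPI that is actually on your PATH
    ompi_info > ompi_info.txt
    ompi_info --all > ompi_info_all.txt

    # The exact way IMB-MPI1 is launched under Slurm, e.g.:
    mpirun -np 1024 ./IMB-MPI1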
You previously cited:

- E5-2697A v4 CPUs and Mellanox ConnectX-3 FDR InfiniBand
- Slurm
- Open MPI v3.0.0
- IMB-MPI1

Can you send the information listed here: https://www.open-mpi.org/community/help/

BTW, given that you fixed the last error by growing the tmpdir size (admittedly, we should probably have a better error message here, and shouldn't just segv like you were seeing -- I'll open a bug on that), you can probably remove "--mca btl ^vader" and similar CLI options. vader and sm were [probably?] failing because their memory-mapped backing files ran the filesystem out of space and Open MPI did not handle that well. Meaning: in general, you don't want to turn off shared-memory support, because it will likely always be the fastest path for on-node communication.

> On Nov 30, 2017, at 11:10 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>
> Dear Jeff,
>
> I'm using Open MPI as shipped by OpenHPC, so I'll upgrade 1.10 to
> 1.10.7 when they do. But it isn't 1.10 that is failing for me; it is
> Open MPI 3.0.0.
>
> Regards, Götz
>
> On Thu, Nov 30, 2017 at 4:24 PM, Jeff Squyres (jsquyres)
> <jsquy...@cisco.com> wrote:
>> Can you upgrade to 1.10.7? That's the last release in the v1.10 series, and
>> it has all the latest bug fixes.
>>
>>> On Nov 30, 2017, at 9:53 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> I have managed to solve the first part of this problem. It was caused
>>> by the quota on /tmp, which is where Open MPI's session directory was
>>> stored. There is an XFS default quota of 100MB to prevent users from
>>> filling up /tmp. Instead of an over-quota message, the result was the
>>> Open MPI crash with a bus error.
>>>
>>> After setting TMPDIR in Slurm, I was finally able to run IMB-MPI1 with
>>> 1024 cores and Open MPI 1.10.6.
>>>
>>> But now for the new problem: with Open MPI 3.0.0, the same test
>>> (IMB-MPI1, 1024 cores, 32 nodes) hangs after about 30 minutes of
>>> runtime. Any idea on this?
>>>
>>> Regards, Götz Waschk
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com

--
Jeff Squyres
jsquy...@cisco.com
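P.S. On the tmpdir workaround: setting TMPDIR in the Slurm batch script works presumably because the Slurm-launched daemons inherit the submitting environment; if I remember the parameter name right, the orte_tmpdir_base MCA parameter does the same thing directly (check your ompi_info --all output to confirm). A rough batch-script sketch -- the scratch path, node/task counts, and launch line are placeholders, not anything from your setup:

    #!/bin/bash
    #SBATCH --nodes=32
    #SBATCH --ntasks=1024

    # Put Open MPI's session directory (shared-memory backing files, etc.)
    # somewhere other than the quota-limited /tmp. Placeholder path; it is
    # assumed to exist and be writable on every compute node.
    export TMPDIR=/scratch/$USER

    mpirun ./IMB-MPI1

    # Equivalent alternative without touching TMPDIR (assumed parameter name;
    # verify it appears in ompi_info --all on your install):
    # mpirun --mca orte_tmpdir_base /scratch/$USER ./IMB-MPI1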