Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
I'm in an airport right now and can't easily check, but instead of using mmap memory (which treats shared memory as a file), you could tell Open MPI to use SYSV shared memory. IIRC that isn't treated like a file. Look for a selection mechanism via an MCA param in the sm or vader BTLs - run

Re: [OMPI users] Setting bind-to none as default via environment?

2015-11-20 Thread Dave Love
[Another old one.] Nick Papior writes:
> Sure, I guess you should use numa to check how the latency/distance is for the other processors to not select a _bad_ node?
> I am not sure.
> I can see difficulties in my above post for huge numa nodes, but for 8-10 cores per

Re: [OMPI users] Ubuntu/Debian packages for recent version (for Travis CI support)

2015-11-20 Thread Dave Love
[I seem never to have sent this, the end of which might have been relevant before I replied again.] Jeff Hammond writes:
> Dave:
>
> Regarding http://www.open-mpi.org/community/lists/users/2015/11/27981.php...
>
> The ARMCI-MPI unit test tests/mpi/test_mpi_subarray_accs

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Saurabh T
> For what it's worth, that's Open MPI creating a chunk of shared memory for use with on-server communication. It shows up as a "file", but it's really shared memory.
> You can disable sm and/or vader, but your on-server message passing performance will be significantly lower.
> Is

Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-20 Thread Gilles Gouaillardet
Currently, Open MPI creates a file in the temporary directory and then mmaps it. An obvious requirement is that the temporary directory must have enough free space for that file (this might be an issue on some diskless nodes). A simple alternative could be to try /tmp, and if there is not enough space,
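[Editor's sketch, not Open MPI code.] The create-then-mmap pattern described above, with a free-space check and a fallback across candidate directories, can be illustrated roughly as follows. The function name `create_backing_segment` and the candidate list are hypothetical; the message does not say which directory would be tried after /tmp, so the list is left to the caller.

```python
import mmap
import os
import shutil
import tempfile


def create_backing_segment(size, candidates=None):
    """Create a file of `size` bytes in the first usable directory and mmap it.

    `candidates` is a hypothetical, caller-supplied list of directories to try
    in order; it defaults to the system temporary directory only.
    Returns (mmap object, path of the backing file).
    """
    if candidates is None:
        candidates = [tempfile.gettempdir()]

    for directory in candidates:
        if not os.path.isdir(directory):
            continue  # skip directories that do not exist on this node
        # Advisory check: the filesystem must have room for the backing file.
        if shutil.disk_usage(directory).free < size:
            continue
        fd, path = tempfile.mkstemp(prefix="shared_segment_", dir=directory)
        try:
            os.ftruncate(fd, size)        # give the file its full length
            segment = mmap.mmap(fd, size)  # map it; mmap keeps its own handle
        except BaseException:
            os.close(fd)
            os.unlink(path)
            raise
        os.close(fd)
        return segment, path

    raise OSError("no candidate directory has %d bytes free" % size)
```

The free-space check is only advisory: another process can consume the space between the check and the first write to the mapping, so a real implementation would still need to handle failure at write time.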

Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-20 Thread Dave Love
Jeff Hammond writes:
>> Doesn't mpich have the option to use sysv memory? You may want to try that
>>
> MPICH? Look, I may have earned my way onto Santa's naughty list more than a few times, but at least I have the decency not to post MPICH questions to the

[OMPI users] local directory for shmem etc. (was: help understand unhelpful ORTE error message)

2015-11-20 Thread Dave Love
Martin Siegert writes:
>> In particular, is there a way to cause shm to not use the global filesystem? I see this issue comes up a lot and I read the list archives, but the warning message (
>> https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/
>>

Re: [OMPI users] help understand unhelpful ORTE error message

2015-11-20 Thread Dave Love
[There must be someone better to answer this, but since I've seen it:] Jeff Hammond writes:
> I have no idea what this is trying to tell me. Help?
>
> jhammond@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
> [nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG:

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
Wouldn't be a bad idea to fail a little better, ya. Perhaps a good show-help message. Sent from my phone. No type good. On Nov 20, 2015, at 5:52 AM, Gilles Gouaillardet wrote: Jeff, should we check ulimit in vader/sm btl
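[Editor's sketch, not Open MPI code.] The kind of ulimit check being discussed can be illustrated with Python's `resource` module: compare the size of the segment about to be created against the soft `RLIMIT_FSIZE` limit and report a readable error instead of dying on SIGXFSZ. The helper names and the message text are hypothetical. Note that `getrlimit` reports the limit in bytes, whereas the shell's `ulimit -f` reports it in blocks.

```python
import resource


def segment_fits(segment_bytes, soft_limit):
    """Return True if a file of `segment_bytes` is allowed under `soft_limit`.

    `soft_limit` is a byte count, or resource.RLIM_INFINITY for no limit.
    """
    return soft_limit == resource.RLIM_INFINITY or segment_bytes <= soft_limit


def check_file_size_limit(segment_bytes):
    """Raise a descriptive error if the current soft file-size limit is too small."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    if not segment_fits(segment_bytes, soft):
        # Hypothetical message text, standing in for a show-help style message.
        raise RuntimeError(
            "shared-memory backing file needs %d bytes but the file size "
            "limit is %d bytes; raise the limit or pick another mechanism"
            % (segment_bytes, soft)
        )
```

Calling `check_file_size_limit(n)` before creating an `n`-byte backing file turns the signal into an ordinary, explainable failure.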

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
For what it's worth, that's Open MPI creating a chunk of shared memory for use with on-server communication. It shows up as a "file", but it's really shared memory. You can disable sm and/or vader, but your on-server message passing performance will be significantly lower. Is there a reason