Re: [OMPI users] How does binding option affect network traffic?

2014-08-29 Thread tmishima
Hi, your cluster is very similar to ours, where Torque and OpenMPI are installed. I would use this command line: #PBS -l nodes=2:ppn=12 ; mpirun --report-bindings -np 16 ... Here --map-by socket:pe=1 and -bind-to core are assumed as the default settings. Then, you can run 10 jobs independently and
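A minimal Torque job script in the spirit of this suggestion might look like the sketch below; the executable name ./my_app is a placeholder, not taken from the post:

    #!/bin/bash
    #PBS -l nodes=2:ppn=12
    cd $PBS_O_WORKDIR
    # Per the note above, --map-by socket:pe=1 and -bind-to core are taken to be
    # the defaults here, so only --report-bindings is added to verify the layout.
    mpirun --report-bindings -np 16 ./my_app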

Re: [OMPI users] How does binding option affect network traffic?

2014-08-29 Thread Ralph Castain
Should be okay. I suspect you are correct in that something isn't right in the fabric. On Fri, Aug 29, 2014 at 1:06 PM, McGrattan, Kevin B. Dr. < kevin.mcgrat...@nist.gov> wrote: > I am able to run all 15 of my jobs simultaneously; 16 MPI processes per > job; mapping by socket and binding to

Re: [OMPI users] How does binding option affect network traffic?

2014-08-29 Thread McGrattan, Kevin B. Dr.
I am able to run all 15 of my jobs simultaneously; 16 MPI processes per job; mapping by socket and binding to socket. On a given socket, 6 MPI processes from 6 separate mpiruns share the 6 cores, or at least I assume they are sharing. The load for all CPUs and all processes is 100%. I
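For reference, a command line matching this description (16 ranks per job, mapped by socket and bound to socket) would look roughly like the sketch below; ./my_job stands in for each of the 15 executables:

    mpirun --map-by socket --bind-to socket --report-bindings -np 16 ./my_job
    # With 6 cores per socket, six such mpiruns bound to the same socket end up
    # time-sharing its cores, which matches the 100% load described above.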

Re: [OMPI users] How does binding option affect network traffic?

2014-08-29 Thread Ralph Castain
On Aug 29, 2014, at 10:51 AM, McGrattan, Kevin B. Dr. wrote: > Thanks for the tip. I understand how using the --cpuset option would help me > in the example I described. However, suppose I have multiple users submitting > MPI jobs of various sizes? I wouldn't know a

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-08-29 Thread Matt Thompson
Ralph, Here you go: (1080) $ /discover/nobackup/mathomp4/MPI/gcc_4.9.1-openmpi_1.8.2-debug/bin/mpirun --leave-session-attached --debug-daemons --mca oob_base_verbose 10 -np 8 ./helloWorld.182-debug.x [borg01x142:29232] mca: base: components_register: registering oob components [borg01x142:29232]
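For context, a debug run like the one above is presumably launched from inside a SLURM interactive allocation; a typical sequence (node and task counts are illustrative only, not from the post) would be something like:

    salloc -N 2 -n 8        # interactive allocation; sizes are illustrative
    mpirun --leave-session-attached --debug-daemons \
           --mca oob_base_verbose 10 -np 8 ./helloWorld.182-debug.x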

Re: [OMPI users] How does binding option affect network traffic?

2014-08-29 Thread McGrattan, Kevin B. Dr.
Thanks for the tip. I understand how using the --cpuset option would help me in the example I described. However, suppose I have multiple users submitting MPI jobs of various sizes? I wouldn't know a priori which cores were in use and which weren't. I always assumed that this is what these
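As an illustration of the difficulty, pinning two concurrent runs to disjoint core sets by hand might look roughly like the sketch below (this assumes mpirun's --cpu-set option and makes up the core ranges; it is not taken from the thread):

    # user A restricts a job to cores 0-7, user B to cores 8-15
    mpirun --cpu-set 0-7  --bind-to core -np 8 ./app_a
    mpirun --cpu-set 8-15 --bind-to core -np 8 ./app_b
    # Neither user can choose such ranges a priori without knowing what everyone
    # else on the node is already running, which is exactly the issue raised here.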

Re: [OMPI users] Weird error with OMPI 1.6.3

2014-08-29 Thread Ralph Castain
Yeah, the old 1.6 series didn't do a very good job of auto-detecting the number of sockets. I believe there is an MCA param for telling it how many there are, which is probably what you'd need to use. On Aug 29, 2014, at 9:40 AM, Maxime Boissonneault wrote: > It

Re: [OMPI users] Issues with OpenMPI 1.8.2, GCC 4.9.1, and SLURM Interactive Jobs

2014-08-29 Thread Ralph Castain
Okay, something quite weird is happening here. I can't replicate using the 1.8.2 release tarball on a slurm machine, so my guess is that something else is going on here. Could you please rebuild the 1.8.2 code with --enable-debug on the configure line (assuming you haven't already done so),
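A debug rebuild along these lines is usually just a reconfigure and reinstall; the install prefix below is a placeholder:

    ./configure --prefix=$HOME/openmpi-1.8.2-debug --enable-debug
    make -j4 && make install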

Re: [OMPI users] Weird error with OMPI 1.6.3

2014-08-29 Thread Maxime Boissonneault
It is still there in 1.6.5 (we also have it). I am just wondering if there is something wrong in our installation that makes MPI unable to detect that there are two sockets per node if we do not include an npernode directive. Maxime On 2014-08-29 12:31, Ralph Castain wrote: No, it isn't

Re: [OMPI users] Weird error with OMPI 1.6.3

2014-08-29 Thread Ralph Castain
No, it isn't - but we aren't really maintaining the 1.6 series any more. You might try updating to 1.6.5 and see if it remains there. On Aug 29, 2014, at 9:12 AM, Maxime Boissonneault wrote: > It looks like > -npersocket 1 > > cannot be used alone. If I

Re: [OMPI users] Weird error with OMPI 1.6.3

2014-08-29 Thread Maxime Boissonneault
It looks like -npersocket 1 cannot be used alone. If I do mpiexec -npernode 2 -npersocket 1 ls -la then I get no error message. Is this expected behavior? Maxime On 2014-08-29 11:53, Maxime Boissonneault wrote: Hi, I am having a weird error with OpenMPI 1.6.3. I run a non-MPI command

[OMPI users] Weird error with OMPI 1.6.3

2014-08-29 Thread Maxime Boissonneault
Hi, I am having a weird error with OpenMPI 1.6.3. I run a non-MPI command just to exclude any code error. Here is the error I get (I run with set -x to get the exact commands that are run). ++ mpiexec -npersocket 1 ls -la
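Putting the two messages in this thread side by side, the reported behaviour boils down to:

    mpiexec -npersocket 1 ls -la                 # fails with an error
    mpiexec -npernode 2 -npersocket 1 ls -la     # runs without complaint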

Re: [OMPI users] open shmem optimization

2014-08-29 Thread Shamis, Pavel
Hi Timur, I don't think this is an apples-to-apples comparison. In the OpenSHMEM world, "MPI_waitall" would be mapped to shmem_quiet(). Even with this mapping, shmem_quiet() has *stronger* completion semantics than MPI_waitall. Quiet guarantees that the data was delivered to a

Re: [OMPI users] How does binding option affect network traffic?

2014-08-29 Thread Reuti
Hi, On 28.08.2014 at 20:50, McGrattan, Kevin B. Dr. wrote: > My institute recently purchased a linux cluster with 20 nodes; 2 sockets per > node; 6 cores per socket. OpenMPI v 1.8.1 is installed. I want to run 15 > jobs. Each job requires 16 MPI processes. For each job, I want to use two >
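For what it's worth, the totals in this description add up exactly:

    20 nodes x 2 sockets x 6 cores/socket = 240 cores
    15 jobs  x 16 MPI processes/job       = 240 processes

so the 15 jobs fill the cluster only if every MPI rank effectively gets a core to itself.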

[OMPI users] open shmem optimization

2014-08-29 Thread Timur Ismagilov
Hello! What parameters can I tune to increase performance (scalability) for my app (all-to-all pattern with message size = constant/nnodes)? I can read this FAQ for MPI, but is it correct for shmem? I have 2 programs doing the same thing (with the same input); each node sends messages (message size =