Re: [OMPI users] mpi4py+OpenMPI: Qs about submitting bugs and examples

2016-10-31 Thread r...@open-mpi.org
> On Oct 31, 2016, at 10:39 AM, Jason Maldonis wrote: > > Hello everyone, > > I am using mpi4py with OpenMPI for a simulation that uses dynamic resource > allocation via `mpi_spawn_multiple`. I've been working on this problem for > about 6 months now and I have some

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
> It's our foot, and we have been doing a good job of shooting it. ;-) > > -- bennet > > > > > On Fri, Oct 28, 2016 at 7:18 PM, r...@open-mpi.org <r...@open-mpi.org> wrote: >> FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the >> OMPI BoF me

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI BoF meeting at SC’16, for those who can attend. Will try to explain the rationale as well as the mechanics of the options > On Oct 11, 2016, at 8:09 AM, Dave Love wrote: > > Gilles

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI BoF meeting at SC’16, for those who can attend > On Oct 11, 2016, at 8:16 AM, Dave Love wrote: > > Wirawan Purwanto writes: > >> Instead of the scenario above, I was

Re: [OMPI users] MCA compilation later

2016-10-28 Thread r...@open-mpi.org
You don’t need any of the hardware - you just need the headers. Things like libfabric and libibverbs are all publicly available, and so you can build all that support even if you cannot run it on your machine. Once your customer installs the binary, the various plugins will check for their
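
A hedged sketch of such a build (the --with-verbs and --with-libfabric option names are the usual ones, but confirm against ./configure --help for your version; the header packages' install prefix will vary by distribution):

  $ ./configure --prefix=/opt/openmpi-2.0.0 --with-verbs=/usr --with-libfabric=/usr
  $ make -j8 install
  # on the customer's machine, plugins whose hardware is absent simply are not selected;
  # ompi_info will still show them as built:
  $ ompi_info | grep btl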

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
iet > Here are the relevant Slurm configuration options that could conceivably > change the behavior from system to system: > SelectType = select/cons_res > SelectTypeParameters = CR_CPU > > > On 10/27/2016 01:17 PM, r...@open-mpi.org <mailto:r...@open-mpi.org>

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
nd=core env | grep BIND > SLURM_CPU_BIND_VERBOSE=quiet > SLURM_CPU_BIND_TYPE=mask_cpu: > SLURM_CPU_BIND_LIST=0x,0x > SLURM_CPU_BIND=quiet,mask_cpu:0x,0x > SLURM_CPU_BIND_VERBOSE=quiet > SLURM_CPU_BIND_TYPE=mask_cpu: > SLURM_CPU_BIND_LIST=0x,0x > SLURM_CPU_BIND=quiet,m

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
Hey Andy Is there a SLURM envar that would tell us the binding option from the srun cmd line? We automatically bind when direct launched due to user complaints of poor performance if we don’t. If the user specifies a binding option, then we detect that we were already bound and don’t do it.
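
For reference, a quick way to see what srun exports for a given binding request (a sketch; the exact set of SLURM_CPU_BIND_* variables depends on the Slurm version and cpu-bind plugin in use):

  $ srun -N 2 --cpu_bind=core env | grep SLURM_CPU_BIND
  SLURM_CPU_BIND_VERBOSE=quiet
  SLURM_CPU_BIND_TYPE=mask_cpu:
  ...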

[OMPI users] Supercomputing 2016: Birds-of-a-Feather meetings

2016-10-24 Thread r...@open-mpi.org
Hello all This year, we will again be hosting Birds-of-a-Feather meetings for Open MPI and PMIx. Open MPI: Wed, Nov 16th, 5:15-7pm http://sc16.supercomputing.org/presentation/?id=bof103=sess322 PMIx: Wed, Nov 16th,

Re: [OMPI users] how to tell if pmi or pmi2 is being used?

2016-10-13 Thread r...@open-mpi.org
If you are using mpirun, then neither PMI1 nor PMI2 is involved at all. ORTE has its own internal mechanism for handling wireup. > On Oct 13, 2016, at 10:43 AM, David Shrader wrote: > > Hello All, > > I'm using Open MPI 1.10.3 with Slurm and would like to ask how do I find
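
A hedged illustration of the two launch paths under Slurm (assumes Open MPI was configured with PMI support for the direct-launch case, and that your Slurm build provides the pmi2 plugin):

  # via mpirun: ORTE does the wireup, PMI is not involved
  $ salloc -N 2 mpirun -n 8 ./a.out
  # direct launch: srun's PMI-2 interface is used instead
  $ salloc -N 2 srun --mpi=pmi2 -n 8 ./a.out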

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-03 Thread r...@open-mpi.org
FWIW: the socket option seems to work fine for me: $ mpirun -n 12 -map-by socket:pe=2 -host rhc001 --report-bindings hostname [rhc001:200408] MCW rank 1 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
: > > Ralph, > > My guess is that ptl.c comes from PSM lib ... > > Cheers, > > Gilles > > On Thursday, September 29, 2016, r...@open-mpi.org <mailto:r...@open-mpi.org> > <r...@open-mpi.org <mailto:r...@open-mpi.org>> wrote: > Spawn definitely

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
Spawn definitely does not work with srun. I don’t recognize the name of the file that segfaulted - what is “ptl.c”? Is that in your manager program? > On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet > wrote: > > Hi, > > I do not expect spawn can work with

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread r...@open-mpi.org
This isn’t an issue with the SLURM integration - this is the problem of our OOB not correctly picking the right subnet for connecting back to mpirun. In this specific case, you probably want -mca btl_tcp_if_include em4 -mca oob_tcp_if_include em4 since it is the em4 network that ties the
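
Concretely, something along these lines (em4 is the interface from this user's system; substitute whichever interface reaches the node running mpirun):

  $ mpirun -np 16 -mca btl_tcp_if_include em4 -mca oob_tcp_if_include em4 ./a.out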

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
ine. > > Aborting. > -------------- > > and when I type "ls" the directory > "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless > there's a different directory I need to look for? >

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
Maybe I’m missing something, but “mpirun -n 1” doesn’t include the name of an application to execute. The error message prior to that error indicates that you have some cruft sitting in your tmpdir. You just need to clean it out - look for something that starts with “openmpi” > On Sep 22,
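
A minimal sketch of both fixes (the session-directory name is the one reported earlier in this thread; look in your own tmpdir to see what is actually there, and note that ./my_app is just a placeholder for your executable):

  $ ls ${TMPDIR:-/tmp} | grep openmpi
  openmpi-sessions-501@Justins-MacBook-Pro-2_0
  $ rm -rf ${TMPDIR:-/tmp}/openmpi-sessions-501@Justins-MacBook-Pro-2_0
  $ mpirun -n 1 ./my_app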

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
I’m pruning this email thread so I can actually read the blasted thing :-) Guys: you are off in the wilderness chasing ghosts! Please stop. When I say that Torque uses an “ordered” file, I am _not_ saying that all the host entries of the same name have to be listed consecutively. I am saying

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
If you are correctly analyzing things, then there would be an issue in the code. When we get an allocation from a resource manager, we set a flag indicating that it is “gospel” - i.e., that we do not directly sense the number of cores on a node and set the #slots equal to that value. Instead,

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
only two nodes with 16 slots each are available and i request > -l nodes=3:ppn=1 > i guess this is a different scheduler configuration, and i cannot change that. > > Could you please have a look at this ? > > Cheers, > > Gilles > > On 9/7/2016 11:15 PM, r...@open-mp

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
You aren’t looking in the right place - there is an “openmpi” directory underneath that one, and the mca_xxx libraries are down there > On Sep 7, 2016, at 7:43 AM, Oswin Krause > wrote: > > Hi Gilles, > > I do not have this library. Maybe this helps
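
For example, assuming an install prefix of /usr/local (adjust to match your build), a Torque-enabled install would be expected to show the tm components there:

  $ ls /usr/local/lib/openmpi | grep _tm
  mca_plm_tm.so
  mca_ras_tm.so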

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
The usual cause of this problem is that the nodename in the machinefile is given as a00551, while Torque is assigning the node name as a00551.science.domain. Thus, mpirun thinks those are two separate nodes and winds up spawning an orted on its own node. You might try ensuring that your
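
An illustration of the mismatch, using the hostnames and slot count from this thread (the fix is simply to make the two spellings agree, e.g. by using the fully qualified name in your hostfile):

  # hostfile entry as written:
  a00551 slots=16
  # node name as Torque reports it:
  a00551.science.domain
  # use the same (fully qualified) form in the hostfile:
  a00551.science.domain slots=16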

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
a.out < test.in > > Please see attached for the outputs. > > Thank you Ralph. I am willing to provide whatever information you need. > > From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > &

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
> From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> <r...@open-mpi.org <mailto:r...@open-mpi.org>> > Sent: Tuesday, August 30, 2016 12:56:33 PM > To: Open MP

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
(ORTE_PROC_MY_NAME), > fd, ORTE_NAME_PRINT(dst_name)); > > /* don't do this if the dst vpid is invalid or the fd is negative! */ > if (ORTE_VPID_INVALID == dst_name->vpid || fd < 0) { > return ORTE_SUCCESS; > } > > /*OPAL_OU

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-29 Thread r...@open-mpi.org
Rank 18 has cleared MPI_Init > Rank 10 has cleared MPI_Init > Rank 11 has cleared MPI_Init > Rank 12 has cleared MPI_Init > Rank 13 has cleared MPI_Init > Rank 17 has cleared MPI_Init > Rank 19 has cleared MPI_Init > > Thanks, > > Dr. Jingchao Zhang > Holland Comp

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-27 Thread r...@open-mpi.org
- this print statement will tell me what I need to know. Thanks! Ralph > On Aug 25, 2016, at 8:19 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it > wasn't in last night's tarball. >

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread r...@open-mpi.org
hanks, > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> <

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
debug_info.txt is attached. > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> <r...@open-mpi.o

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-24 Thread r...@open-mpi.org
om/open-mpi/ompi/issues/341> :-) > > In any case, thanks for the information about the default params file -- I > won't worry too much about modifying it then. > > Andy > > I > On 08/23/2016 08:08 PM, r...@open-mpi.org wrote: >> I’ve never heard of that, and ca

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
just hanged. > > --Jingchao > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-23 Thread r...@open-mpi.org
I’ve never heard of that, and cannot imagine what it has to do with the resource manager. Can you point to where you heard that one? FWIW: we don’t ship OMPI with anything in the default mca params file, so somebody must have put it in there for you. > On Aug 23, 2016, at 4:48 PM, Andy Riebs
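
For reference, that file lives under the install prefix (a sketch assuming a /usr/local prefix; adjust for your installation):

  $ cat /usr/local/etc/openmpi-mca-params.conf
  # a stock install ships this file with comments only;
  # any active "name = value" lines were added locally or by a packager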

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
ka-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@open-mpi.org> <r...@open-mpi.org <mailto:r...@open-mpi.org>> > Sent: Monday, August 22,

Re: [OMPI users] OS X El Capitan 10.11.6 ld: symbol(s) not found for architecture x86_64

2016-08-23 Thread r...@open-mpi.org
I’m confused - you keep talking about MPICH, but the symbol you are looking for is from OMPI. You cannot mix the two MPI libraries - is that what you are trying to do? > On Aug 23, 2016, at 1:30 PM, Richard G French wrote: > > Thanks for the suggestion, Doug - but I

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
issue and fix it. > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org > <mailto:users-boun...@lists.open-mpi.org>> on behalf of r...@open-mpi.org > <mailto:r...@op

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
1.0/lib/libmpi.so.20 > #8 0x005c5b5d in LAMMPS_NS::Input::file() () at ../input.cpp:203 > #9 0x005d4236 in main () at ../main.cpp:31 > > Thanks, > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
Hmmm...perhaps we can break this out a bit? The stdin will be going to your rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast? Can you first verify that the input is being correctly delivered to rank=0? This will help us isolate if the problem is in the IO

Re: [OMPI users] Problems with mpirun in openmpi-1.8.1 and -2.0.0

2016-08-19 Thread r...@open-mpi.org
The rdma error sounds like something isn’t right with your machine’s Infiniband installation. The cross-version problem sounds like you installed both OMPI versions into the same location - did you do that?? If so, then that might be the root cause of both problems. You need to install them in
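
A hedged sketch of keeping the two versions from stepping on each other (the prefixes are illustrative):

  $ cd openmpi-1.8.1 && ./configure --prefix=/opt/openmpi/1.8.1 && make install
  $ cd ../openmpi-2.0.0 && ./configure --prefix=/opt/openmpi/2.0.0 && make install
  # then select exactly one per job:
  $ export PATH=/opt/openmpi/2.0.0/bin:$PATH
  $ export LD_LIBRARY_PATH=/opt/openmpi/2.0.0/lib:$LD_LIBRARY_PATH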

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
> On Aug 12, 2016, at 1:48 PM, Reuti <re...@staff.uni-marburg.de> wrote: > > > Am 12.08.2016 um 21:44 schrieb r...@open-mpi.org <mailto:r...@open-mpi.org>: > >> Don’t know about the toolchain issue - I use those same versions, and don’t >> have a

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
> On Aug 12, 2016, at 12:15 PM, Reuti <re...@staff.uni-marburg.de> wrote: > >> >> Am 12.08.2016 um 16:52 schrieb r...@open-mpi.org <mailto:r...@open-mpi.org>: >> >> IIRC, the rationale behind adding the check was that someone using SGE >> want

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread r...@open-mpi.org
Sorry for the delay - I had to catchup on some other things before I could come back to checking this one. Took me awhile to track this down, but the change is in test for master: https://github.com/open-mpi/ompi/pull/1958 Once complete, I’ll set it up for inclusion in v2.0.1 Thanks for

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
IIRC, the rationale behind adding the check was that someone using SGE wanted to specify a custom launch agent, and we were overriding it with qrsh. However, the check is incorrect as that MCA param cannot be NULL. I have updated this on master - can you see if this fixes the problem for you?
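
For context, the parameter being discussed is the rsh/qrsh launch agent; a user can set it explicitly, for example (hedged - verify the exact parameter name for your version with ompi_info):

  $ mpirun -mca plm_rsh_agent /path/to/custom_launcher -np 4 ./a.out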

Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-12 Thread r...@open-mpi.org
Just as a suggestion: most of us are leery of opening Word attachments on mailing lists. I’d suggest sending this to us as plain text if you want us to read it. > On Aug 12, 2016, at 4:03 AM, Debendra Das wrote: > > I have installed OpenMPI-2.0.0 in 5 systems with
