Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-12 Thread r...@open-mpi.org
Just as a suggestion: most of us are leery of opening Word attachments on mailing lists. I’d suggest sending this to us as plain text if you want us to read it. > On Aug 12, 2016, at 4:03 AM, Debendra Das wrote: > > I have installed OpenMPI-2.0.0 in 5 systems with

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
IIRC, the rationale behind adding the check was that someone using SGE wanted to specify a custom launch agent, and we were overriding it with qrsh. However, the check is incorrect as that MCA param cannot be NULL. I have updated this on master - can you see if this fixes the problem for you?
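The custom launch agent mentioned above is selected via an MCA parameter; a minimal sketch, assuming the usual `plm_rsh_agent` parameter name for this build (verify with `ompi_info`), with the command printed rather than executed:

```shell
# Sketch only: request a custom launch agent via the (assumed) plm_rsh_agent
# MCA parameter. The agent path is hypothetical; the command is echoed so
# this is safe to run without an SGE cluster.
launch_agent='/usr/bin/rsh'   # hypothetical custom agent
cmd="mpirun --mca plm_rsh_agent $launch_agent -np 4 ./a.out"
echo "$cmd"
```

With the fix described above, setting this parameter should no longer be silently overridden by qrsh under SGE.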

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread r...@open-mpi.org
Sorry for the delay - I had to catch up on some other things before I could come back to checking this one. Took me awhile to track this down, but the change is in test for master: https://github.com/open-mpi/ompi/pull/1958 Once complete, I’ll set it up for inclusion in v2.0.1 Thanks for

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
> On Aug 12, 2016, at 12:15 PM, Reuti <re...@staff.uni-marburg.de> wrote: > >> >> Am 12.08.2016 um 16:52 schrieb r...@open-mpi.org: >> >> IIRC, the rationale behind adding the check was that someone using SGE >> want

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
> On Aug 12, 2016, at 1:48 PM, Reuti <re...@staff.uni-marburg.de> wrote: > > > Am 12.08.2016 um 21:44 schrieb r...@open-mpi.org: > >> Don’t know about the toolchain issue - I use those same versions, and don’t >> have a

Re: [OMPI users] Several threads making progress - How to disable them

2016-08-04 Thread r...@open-mpi.org
Yep, there are indeed two progress threads running - and no, you cannot disable them. They are, however, “blocked” so they aren’t eating any cycles during normal operation unless an event that requires their attention wakes them up. So they shouldn’t interfere with your app. > On Aug 4, 2016,

Re: [OMPI users] Performance Issues on SMP Workstation

2017-02-01 Thread r...@open-mpi.org
Simple test: replace your executable with “hostname”. If you see multiple hosts come out on your cluster, then you know why the performance is different. > On Feb 1, 2017, at 2:46 PM, Andy Witzig wrote: > > Honestly, I’m not exactly sure what scheme is being used. I am

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The way we handled the MCA param that specifies the launch agent (ssh, rsh, or whatever) was modified, and I don’t think the change is correct. It basically says that we don’t look for qrsh unless the MCA param has been

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
https://github.com/open-mpi/ompi/pull/1960/files > > Glenn > > On Fri, Feb 3, 2017 at 10:56 AM, r...@open-mpi.org wrote: > I do see a diff between 2.0.1 and 2.0.2

Re: [OMPI users] MPI_Comm_spawn question

2017-02-03 Thread r...@open-mpi.org
We know v2.0.1 has problems with comm_spawn, and so you may be encountering one of those. Regardless, there is indeed a timeout mechanism in there. It was added because people would execute a comm_spawn, and then would hang and eat up their entire allocation time for nothing. In v2.0.2, I see

Re: [OMPI users] MPI_Comm_spawn question

2017-01-31 Thread r...@open-mpi.org
What version of OMPI are you using? > On Jan 31, 2017, at 7:33 AM, elistrato...@info.sgu.ru wrote: > > Hi, > > I am trying to write trivial master-slave program. Master simply creates > slaves, sends them a string, they print it out and exit. Everything works > just fine, however, when I add a

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-20 Thread r...@open-mpi.org
> On Jan 19, 2017, at 5:29 PM, r...@open-mpi.org wrote: > > I’ll create a patch that you can try - if it works okay, we can commit it > >> On Jan 18, 2017, at 3:29 AM, William Hay <w@ucl.ac.uk> wrote: >> >> On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@ope

Re: [OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21 Thread r...@open-mpi.org
Can you provide a backtrace with line numbers from a debug build? We don’t get much testing with lsf, so it is quite possible there is a bug in there. > On Feb 21, 2017, at 7:39 PM, Hammond, Simon David (-EXP) > wrote: > > Hi OpenMPI Users, > > Has anyone successfully

Re: [OMPI users] OpenMPI and Singularity

2017-02-20 Thread r...@open-mpi.org
If there are diagnostics you would like, I can try to > provide those. I will be gone starting Thu for a week. > > -- bennet > > > > > On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org wrote: >> I -think- that is correct, but you may

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-17 Thread r...@open-mpi.org
Mark - this is now available in master. Will look at what might be required to bring it to 2.0 > On Feb 15, 2017, at 5:49 AM, r...@open-mpi.org wrote: > > >> On Feb 15, 2017, at 5:45 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote: >> >> On Wed, 15 Feb 2017, r.

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-17 Thread r...@open-mpi.org
Depends on the version, but if you are using something in the v2.x range, you should be okay with just one installed version > On Feb 17, 2017, at 4:41 AM, Mark Dixon wrote: > > Hi, > > We have some users who would like to try out openmpi MPI_THREAD_MULTIPLE > support

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-17 Thread r...@open-mpi.org
iegmar, > > > meanwhile, feel free to manually apply the attached patch > > > > Cheers, > > > Gilles > > > On 2/16/2017 8:09 AM, r...@open-mpi.org wrote: >> I guess it was the next nightly tarball, but not next commit. However, it >> was almost c

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
The embedded Singularity support hasn’t made it into the OMPI 2.x release series yet, though OMPI will still work within a Singularity container anyway. Compatibility across the container boundary is always a problem, as your examples illustrate. If the system is using one OMPI version and the

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
port, it would still run, > but it would fall back to non-verbs communication, so it would just be > commensurately slower. > > Let me know if I've garbled things. Otherwise, wish me luck, and have > a good weekend! > > Thanks, -- bennet > > > > On Fri, Feb

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
> On Fri, 17 Feb 2017, r...@open-mpi.org wrote: > >> Depends on the version, but if you are using something in the v2.x range, >> you should be okay with just one installed version > > Thanks Ralph. > > How good is MPI_THREAD_MULTIPLE support these days an

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread r...@open-mpi.org
You might want to try using the DVM (distributed virtual machine) mode in ORTE. You can start it on an allocation using the “orte-dvm” cmd, and then submit jobs to it with “mpirun --hnp <foo>”, where foo is either the contact info printed out by orte-dvm, or the name of the file you told orte-dvm to
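The DVM workflow above can be sketched as follows; `dvm_uri.txt` and `./my_task` are placeholder names, the `--report-uri` option and the `file:` URI form are assumptions to verify against your ORTE version, and the commands are printed rather than executed:

```shell
# 1) start the DVM once per allocation, telling it to write its contact
#    info to a file; 2) submit jobs to the running DVM with mpirun --hnp.
start_cmd='orte-dvm --report-uri dvm_uri.txt &'
submit_cmd='mpirun --hnp file:dvm_uri.txt -np 4 ./my_task'
echo "$start_cmd"
echo "$submit_cmd"
```

Each mpirun submission then reuses the already-running daemons instead of launching a fresh set, which is what makes this attractive for many short tasks.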

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
From the mpirun man page: Open MPI employs a three-phase procedure for assigning process locations and ranks: mapping (assigns a default location to each process), ranking (assigns an MPI_COMM_WORLD rank value to each process), and binding (constrains each process to run on specific
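The three phases correspond to three separate mpirun options; a sketch (the option values are illustrative and `./app` is a placeholder, with the command printed rather than executed):

```shell
# One option per phase: --map-by picks locations, --rank-by assigns
# MPI_COMM_WORLD ranks, --bind-to constrains where each process may run.
cmd='mpirun -np 8 --map-by socket --rank-by core --bind-to core ./app'
echo "$cmd"
```

Keeping the three options mentally separate is the key to predicting placements: changing `--rank-by` reorders ranks without moving any process, while changing `--map-by` moves the processes themselves.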

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
rank 7 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..] “span” causes ORTE to treat all the sockets etc. as being on a single giant node. HTH Ralph > On Feb 23, 2017, at 6:38 AM, r...@open-mpi.org wr

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
andling of application > exceptions across processes) > > So it is good to hear there is progress. > > On Feb 18, 2017 7:43 AM, r...@open-mpi.org wrote: > We have been m

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-13 Thread r...@open-mpi.org
clarified the logic in the OMPI master repo. However, I don’t know how long it will be before a 2.0.3 release is issued, so GridEngine users might want to locally fix things in the interim. > On Feb 12, 2017, at 1:52 PM, r...@open-mpi.org wrote: > > Yeah, I’ll fix it this week. Th

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - the logic is looking expressly for values > 1 as we hadn’t anticipated this use-case. I can make that change. I’m off to a workshop for the next day or so, but can probably do this on the plane. > On Feb 15, 2017,

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
> On Feb 15, 2017, at 5:45 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote: > > On Wed, 15 Feb 2017, r...@open-mpi.org wrote: > >> Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - >> the logic is looking expressly for values > 1 as w

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-12 Thread r...@open-mpi.org
Yeah, I’ll fix it this week. The problem is that you can’t check the source as being default as the default is ssh - so the only way to get the current code to check for qrsh is to specify something other than the default ssh (it doesn’t matter what you specify - anything will get you past the

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
If we knew what line in that file was causing the compiler to barf, we could at least address it. There is probably something added in recent commits that is causing problems for the compiler. So checking to see what commit might be triggering the failure would be most helpful. > On Feb 15,

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
n when in batch environment but you may also want > to upgrade to Open MPI 2.0.2. > > Howard > > r...@open-mpi.org wrote on Wed., Feb 15, 2017 at 07:49: > Nothing immediate comes to mi

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
mpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.lo > > loki openmpi-master 148 find > openmpi-master-201702100209-51def91-Linux.x86_64.64_cc -name pmix_esh.lo > loki openmpi-master 149 > > Which files do you need? Which command

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
re job.sh is: >> >> #!/bin/bash -l >> module load openmpi/2.0.1-icc >> mpirun -np 1 ./manager 4 >> >>> On 15 February 2017 at 17:58, r...@open-mpi.org wrote: >>> The cmd line looks fine - whe

Re: [OMPI users] Specify the core binding when spawning a process

2017-02-15 Thread r...@open-mpi.org
Sorry for slow response - was away for awhile. What version of OMPI are you using? > On Feb 8, 2017, at 1:59 PM, Allan Ma wrote: > > Hello, > > I'm designing a program on a dual socket system that needs the parent process > and spawned child process to be at least

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Nothing immediate comes to mind - all sbatch does is create an allocation and then run your script in it. Perhaps your script is using a different “mpirun” command than when you type it interactively? > On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina > wrote: >
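One way to test the suspicion above is to make the batch script report which mpirun it resolves and compare that against an interactive shell; this is a hypothetical variant of the job.sh from the thread (the script is only written out here, not submitted):

```shell
# Write a diagnostic version of the job script: `type mpirun` prints the
# full path of the mpirun the batch environment would actually use.
cat > job_check.sh <<'EOF'
#!/bin/bash -l
module load openmpi/2.0.1-icc
type mpirun
mpirun -np 1 ./manager 4
EOF
grep 'mpirun' job_check.sh
```

If the path printed under sbatch differs from the one printed by `type mpirun` in an interactive shell, the module environment is not being reproduced in the batch job.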

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-17 Thread r...@open-mpi.org
As I recall, the problem was that qrsh isn’t available on the backend compute nodes, and so we can’t use a tree for launch. If that isn’t true, then we can certainly adjust it. > On Jan 17, 2017, at 9:37 AM, Mark Dixon wrote: > > Hi, > > While commissioning a new

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-19 Thread r...@open-mpi.org
I’ll create a patch that you can try - if it works okay, we can commit it > On Jan 18, 2017, at 3:29 AM, William Hay <w@ucl.ac.uk> wrote: > > On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@open-mpi.org wrote: >> As I recall, the problem was that qrsh isn’t ava

Re: [OMPI users] Problems with mpirun in openmpi-1.8.1 and -2.0.0

2016-08-19 Thread r...@open-mpi.org
The rdma error sounds like something isn’t right with your machine’s Infiniband installation. The cross-version problem sounds like you installed both OMPI versions into the same location - did you do that?? If so, then that might be the root cause of both problems. You need to install them in
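A sketch of the separate-prefix layout this advice implies (all paths are illustrative):

```shell
# Build each Open MPI version into its own prefix, e.g.:
#   ./configure --prefix=$HOME/ompi-1.8.1 && make all install
#   ./configure --prefix=$HOME/ompi-2.0.0 && make all install
# then point a given shell at exactly one of them:
OMPI=$HOME/ompi-2.0.0
export PATH="$OMPI/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "using: $OMPI"
```

Because the two versions never share a lib directory, a stale plugin from one release can no longer be picked up by the other's mpirun.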

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
Hmmm...perhaps we can break this out a bit? The stdin will be going to your rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast? Can you first verify that the input is being correctly delivered to rank=0? This will help us isolate if the problem is in the IO

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 4:58 AM, Angel de Vicente <ang...@iac.es> wrote: > > Hi, > > "r...@open-mpi.org" <r...@open-mpi.org> writes: >> You might want to try using the DVM (distributed virtual machine) >> mode in ORTE. You can start it on an alloca

Re: [OMPI users] State of the DVM in Open MPI

2017-02-28 Thread r...@open-mpi.org
Hi Reuti The DVM in master seems to be fairly complete, but several organizations are in the process of automating tests for it so it gets more regular exercise. If you are using a version in OMPI 2.x, those are early prototype - we haven’t updated the code in the release branches. The more

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 9:39 AM, Reuti wrote: > > >> Am 27.02.2017 um 18:24 schrieb Angel de Vicente : >> >> […] >> >> For a small group of users if the DVM can run with my user and there is >> no restriction on who can use it or if I somehow can

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread r...@open-mpi.org
Thanks, > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org > Sent: Tuesday, August 30, 2016 12:56:33 PM > To: Open MP

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
If you are correctly analyzing things, then there would be an issue in the code. When we get an allocation from a resource manager, we set a flag indicating that it is “gospel” - i.e., that we do not directly sense the number of cores on a node and set the #slots equal to that value. Instead,

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
only two nodes with 16 slots each are available and i request > -l nodes=3:ppn=1 > i guess this is a different scheduler configuration, and i cannot change that. > > Could you please have a look at this ? > > Cheers, > > Gilles > > On 9/7/2016 11:15 PM, r...@open-mp

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
I’m pruning this email thread so I can actually read the blasted thing :-) Guys: you are off in the wilderness chasing ghosts! Please stop. When I say that Torque uses an “ordered” file, I am _not_ saying that all the host entries of the same name have to be listed consecutively. I am saying

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-29 Thread r...@open-mpi.org
Rank 18 has cleared MPI_Init > Rank 10 has cleared MPI_Init > Rank 11 has cleared MPI_Init > Rank 12 has cleared MPI_Init > Rank 13 has cleared MPI_Init > Rank 17 has cleared MPI_Init > Rank 19 has cleared MPI_Init > > Thanks, > > Dr. Jingchao Zhang > Holland Comp

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
(ORTE_PROC_MY_NAME), > fd, ORTE_NAME_PRINT(dst_name)); > > /* don't do this if the dst vpid is invalid or the fd is negative! */ > if (ORTE_VPID_INVALID == dst_name->vpid || fd < 0) { > return ORTE_SUCCESS; > } > > /*OPAL_OU

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
The usual cause of this problem is that the nodename in the machinefile is given as a00551, while Torque is assigning the node name as a00551.science.domain. Thus, mpirun thinks those are two separate nodes and winds up spawning an orted on its own node. You might try ensuring that your
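The mismatch described above can be sketched directly; the names below are the ones from the thread:

```shell
# Sketch: the comparison mpirun effectively makes. If the machinefile holds
# the short name while Torque reports the FQDN, the strings differ and
# mpirun counts them as two separate nodes.
mf_name="a00551"                 # entry as written in the machinefile
rm_name="a00551.science.domain"  # node name as assigned by Torque
if [ "$mf_name" = "$rm_name" ]; then
  echo "names match: one node"
else
  echo "mismatch: '$mf_name' vs '$rm_name' -> treated as two nodes"
fi
# Using the same form on both sides removes the ambiguity:
echo "short forms: ${mf_name%%.*} vs ${rm_name%%.*}"
```

The practical fix is to make the machinefile entries use the same form (short or fully qualified) that the resource manager reports.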

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
You aren’t looking in the right place - there is an “openmpi” directory underneath that one, and the mca_xxx libraries are down there > On Sep 7, 2016, at 7:43 AM, Oswin Krause > wrote: > > Hi Gilles, > > I do not have this library. Maybe this helps

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
a.out < test.in > > Please see attached for the outputs. > > Thank you Ralph. I am willing to provide whatever information you need. > > From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
Spawn definitely does not work with srun. I don’t recognize the name of the file that segfaulted - what is “ptl.c”? Is that in your manager program? > On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet > wrote: > > Hi, > > I do not expect spawn can work with

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
: > > Ralph, > > My guess is that ptl.c comes from PSM lib ... > > Cheers, > > Gilles > > On Thursday, September 29, 2016, r...@open-mpi.org wrote: > Spawn definitely

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-03 Thread r...@open-mpi.org
FWIW: the socket option seems to work fine for me: $ mpirun -n 12 -map-by socket:pe=2 -host rhc001 --report-bindings hostname [rhc001:200408] MCW rank 1 bound to socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread r...@open-mpi.org
This isn’t an issue with the SLURM integration - this is the problem of our OOB not correctly picking the right subnet for connecting back to mpirun. In this specific case, you probably want -mca btl_tcp_if_include em4 -mca oob_tcp_if_include em4 since it is the em4 network that ties the
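Put together, the suggested command looks like this sketch (`./app` and the process count are placeholders; `em4` is the interface named in the thread, and the command is printed rather than executed):

```shell
# Restrict both the out-of-band (OOB) channel and the TCP BTL to the em4
# network that connects the compute nodes back to mpirun.
cmd='mpirun -np 16 --mca btl_tcp_if_include em4 --mca oob_tcp_if_include em4 ./app'
echo "$cmd"
```

Setting only the BTL parameter is the common mistake: the OOB wireup uses its own interface selection, so both parameters need to agree.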

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
ine. > > Aborting. > -------------- > > and when I type "ls" the directory > "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless > there's a different directory I need to look for? >

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
Maybe I’m missing something, but “mpirun -n 1” doesn’t include the name of an application to execute. The error message prior to that error indicates that you have some cruft sitting in your tmpdir. You just need to clean it out - look for something that starts with “openmpi” > On Sep 22,

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-27 Thread r...@open-mpi.org
- this print statement will tell me what I need to know. Thanks! Ralph > On Aug 25, 2016, at 8:19 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it > wasn't in last night's tarball. >

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
issue and fix it. > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org

Re: [OMPI users] OS X El Capitan 10.11.6 ld: symbol(s) not found for architecture x86_64

2016-08-23 Thread r...@open-mpi.org
I’m confused - you keep talking about MPICH, but the symbol you are looking for is from OMPI. You cannot mix the two MPI libraries - is that what you are trying to do? > On Aug 23, 2016, at 1:30 PM, Richard G French wrote: > > Thanks for the suggestion, Doug - but I

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
ka-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org > Sent: Monday, August 22,

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
just hanged. > > --Jingchao > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-23 Thread r...@open-mpi.org
I’ve never heard of that, and cannot imagine what it has to do with the resource manager. Can you point to where you heard that one? FWIW: we don’t ship OMPI with anything in the default mca params file, so somebody must have put it in there for you. > On Aug 23, 2016, at 4:48 PM, Andy Riebs

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
1.0/lib/libmpi.so.20 > #8 0x005c5b5d in LAMMPS_NS::Input::file() () at ../input.cpp:203 > #9 0x005d4236 in main () at ../main.cpp:31 > > Thanks, > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
debug_info.txt is attached. > > Dr. Jingchao Zhang > Holland Computing Center > University of Nebraska-Lincoln > 402-472-6400 > From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-24 Thread r...@open-mpi.org
om/open-mpi/ompi/issues/341> :-) > > In any case, thanks for the information about the default params file -- I > won't worry too much about modifying it then. > > Andy > > I > On 08/23/2016 08:08 PM, r...@open-mpi.org wrote: >> I’ve never heard of that, and ca

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
Hey Andy Is there a SLURM envar that would tell us the binding option from the srun cmd line? We automatically bind when direct launched due to user complaints of poor performance if we don’t. If the user specifies a binding option, then we detect that we were already bound and don’t do it.

Re: [OMPI users] MCA compilation later

2016-10-28 Thread r...@open-mpi.org
You don’t need any of the hardware - you just need the headers. Things like libfabric and libibverbs are all publicly available, and so you can build all that support even if you cannot run it on your machine. Once your customer installs the binary, the various plugins will check for their

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-04 Thread r...@open-mpi.org
ave us from > having to write some Slurm prolog scripts to effect that. > > Thanks Ralph! > > On 11/01/2016 11:36 PM, r...@open-mpi.org wrote: >> Ah crumby!! We already solved this on master, but it cannot be backported to >> the 1.10 series w

[OMPI users] Supercomputing 2016: Birds-of-a-Feather meetings

2016-10-24 Thread r...@open-mpi.org
Hello all This year, we will again be hosting Birds-of-a-Feather meetings for Open MPI and PMIx. Open MPI: Wed, Nov 16th, 5:15-7pm http://sc16.supercomputing.org/presentation/?id=bof103=sess322 PMIx: Wed, Nov 16th,

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread r...@open-mpi.org
It looks like the library may not have been fully installed on that node - can you see if the prefix location is present, and that the LD_LIBRARY_PATH on that node is correctly set? The referenced component did not exist prior to the 2.0 series, so I’m betting that your LD_LIBRARY_PATH isn’t
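Both conditions can be checked on the suspect node with a quick sketch (`/opt/openmpi-2.0.1` is a placeholder for your actual install prefix):

```shell
# Verify that the install prefix exists on this node and that
# LD_LIBRARY_PATH points at its lib directory; both must hold on every node.
prefix=${OMPI_PREFIX:-/opt/openmpi-2.0.1}
if [ -d "$prefix/lib/openmpi" ]; then
  status="prefix present: $prefix"
else
  status="prefix missing on this node: $prefix"
fi
case ":$LD_LIBRARY_PATH:" in
  *":$prefix/lib:"*) ld_status="LD_LIBRARY_PATH includes $prefix/lib" ;;
  *)                 ld_status="LD_LIBRARY_PATH does NOT include $prefix/lib" ;;
esac
echo "$status"
echo "$ld_status"
```

Running this via the same remote-login path mpirun uses (e.g. `ssh <node> 'sh check.sh'`) matters, since non-interactive shells often skip the files that set LD_LIBRARY_PATH.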

Re: [OMPI users] malloc related crash inside openmpi

2016-11-24 Thread r...@open-mpi.org
016, at 2:31 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> > wrote: > > >> On Nov 23, 2016, at 5:26 PM, r...@open-mpi.org >> wrote: >> >> It looks like the library may not have been fully installed on that node -

Re: [OMPI users] mpi4py+OpenMPI: Qs about submitting bugs and examples

2016-10-31 Thread r...@open-mpi.org
> On Oct 31, 2016, at 10:39 AM, Jason Maldonis wrote: > > Hello everyone, > > I am using mpi4py with OpenMPI for a simulation that uses dynamic resource > allocation via `mpi_spawn_multiple`. I've been working on this problem for > about 6 months now and I have some

Re: [OMPI users] MCA compilation later

2016-10-31 Thread r...@open-mpi.org
w set of header files? > > (I'm a bit surprised only header files are necessary. Shouldn't the plugin > require at least runtime linking with a low-level transport library?) > > -Sean > > -- > Sean Ahern > Computational Engineering International > 919-363-0883 > > On

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
nd=core env | grep BIND > SLURM_CPU_BIND_VERBOSE=quiet > SLURM_CPU_BIND_TYPE=mask_cpu: > SLURM_CPU_BIND_LIST=0x,0x > SLURM_CPU_BIND=quiet,mask_cpu:0x,0x > SLURM_CPU_BIND_VERBOSE=quiet > SLURM_CPU_BIND_TYPE=mask_cpu: > SLURM_CPU_BIND_LIST=0x,0x > SLURM_CPU_BIND=quiet,m

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
iet > Here are the relevant Slurm configuration options that could conceivably > change the behavior from system to system: > SelectType = select/cons_res > SelectTypeParameters = CR_CPU > > > On 10/27/2016 01:17 PM, r...@open-mpi.org

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI BoF meeting at SC’16, for those who can attend > On Oct 11, 2016, at 8:16 AM, Dave Love wrote: > > Wirawan Purwanto writes: > >> Instead of the scenario above, I was

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI BoF meeting at SC’16, for those who can attend. Will try to explain the rationale as well as the mechanics of the options > On Oct 11, 2016, at 8:09 AM, Dave Love wrote: > > Gilles

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
r/bin/openmpiWiFiBulb > > But, i want use hostfile only.. > kindly help me. > > > On Fri, Nov 4, 2016 at 5:00 PM, r...@open-mpi.org wrote: > you mistyped the option - it is “--map-by node”.

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
you mistyped the option - it is “--map-by node”. Note the space between “by” and “node” - you had typed it with a “-“ instead of a “space” > On Nov 4, 2016, at 4:28 AM, Mahesh Nanavalla > wrote: > > Hi all, > > I am using openmpi-1.10.3,using quad core

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
ining processors. >> >> Just a thought, -- bennet >> >> >> >> >> >> On Fri, Nov 4, 2016 at 8:39 AM, Mahesh Nanavalla >> <mahesh.nanavalla...@gmail.com> wrote: >>> s... >>> >>> Thanks for responding me

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-01 Thread r...@open-mpi.org
> Subject: Re: [OMPI users] Slurm binding not propagated to MPI jobs > > Hi Ralph, > > I haven't played around in this code, so I'll flip the question over to the > Slurm list, and report back here when I learn anything. > > Cheers > Andy > > On 10/27/2016 01:44

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
> It's our foot, and we have been doing a good job of shooting it. ;-) > > -- bennet > > > > > On Fri, Oct 28, 2016 at 7:18 PM, r...@open-mpi.org wrote: >> FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the >> OMPI BoF me

Re: [OMPI users] how to tell if pmi or pmi2 is being used?

2016-10-13 Thread r...@open-mpi.org
If you are using mpirun, then neither PMI1 nor PMI2 is involved at all. ORTE has its own internal mechanism for handling wireup. > On Oct 13, 2016, at 10:43 AM, David Shrader wrote: > > Hello All, > > I'm using Open MPI 1.10.3 with Slurm and would like to ask how do I find

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-07 Thread r...@open-mpi.org
Hi Christof Sorry if I missed this, but it sounds like you are saying that one of your procs abnormally terminates, and we are failing to kill the remaining job? Is that correct? If so, I just did some work that might relate to that problem that is pending in PR #2528:

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-08 Thread r...@open-mpi.org
etc. to try to catch signals. Would that be useful ? I need a >>>> moment to figure out how to do this, but I can definitively try. >>>> >>>> Some remark: During "make install" from the git repo I see a >>>> >>>> WARNING! Commo

Re: [OMPI users] device failed to appear .. Connection timed out

2016-12-08 Thread r...@open-mpi.org
Sounds like something didn’t quite get configured right, or maybe you have a library installed that isn’t quite setup correctly, or... Regardless, we generally advise building from source to avoid such problems. Is there some reason not to just do so? > On Dec 8, 2016, at 6:16 AM, Daniele
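For reference, a minimal from-source build follows the usual configure/make pattern. The version number and install prefix below are placeholders — substitute your own:

```shell
# Unpack, configure with a private prefix, build, and install.
tar xf openmpi-2.0.1.tar.bz2
cd openmpi-2.0.1
./configure --prefix=$HOME/opt/openmpi-2.0.1
make -j4 all
make install

# Make the new install the one your shell and jobs actually find:
export PATH=$HOME/opt/openmpi-2.0.1/bin:$PATH
export LD_LIBRARY_PATH=$HOME/opt/openmpi-2.0.1/lib:$LD_LIBRARY_PATH
```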

Re: [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux

2017-01-10 Thread r...@open-mpi.org
I think there is some relevant discussion here: https://github.com/open-mpi/ompi/issues/1569 It looks like Gilles had (at least at one point) a fix for master when enable-heterogeneous, but I don’t know if that was committed. > On Jan 9, 2017, at

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-02 Thread r...@open-mpi.org
Fix is on the way: https://github.com/open-mpi/ompi/pull/2498 Thanks Ralph > On Dec 1, 2016, at 10:49 AM, r...@open-mpi.org wrote: > > Yeah, that’s a bug - we’ll have to address it > > Thanks > Ralph > >> On Nov 2

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
I think you have confused “slot” with a physical “core”. The two have absolutely nothing to do with each other. A “slot” is nothing more than a scheduling entry in which a process can be placed. So when you --rank-by slot, the ranks are assigned round-robin by scheduler entry - i.e., you
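The difference is easiest to see by holding the mapping policy fixed and varying only the ranking policy. A sketch, assuming two hosts named n0 and n1 and a placeholder executable a.out — the exact placement report depends on your topology:

```shell
# Same mapping (round-robin across nodes), two different rank orderings.
# --report-bindings prints where each rank actually landed.
mpirun -np 4 --host n0,n0,n1,n1 --map-by node --rank-by slot --report-bindings ./a.out
mpirun -np 4 --host n0,n0,n1,n1 --map-by node --rank-by node --report-bindings ./a.out
```

Comparing the two reports shows that ranking walks the scheduling slots, not physical cores.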

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-01 Thread r...@open-mpi.org
Yeah, that’s a bug - we’ll have to address it Thanks Ralph > On Nov 28, 2016, at 9:29 AM, Noel Rycroft wrote: > > I'm seeing different behaviour between Open MPI 1.8.4 and 2.0.1 with regards > to signal propagation. > > With version 1.8.4 mpirun seems to propagate
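The behavior under discussion can be reproduced with a sketch like the following; my_app is a placeholder for any long-running MPI program:

```shell
# Launch in the background, then signal mpirun directly.
# mpirun is expected to forward the signal to all ranks before exiting.
mpirun -np 4 ./my_app &
MPIRUN_PID=$!
sleep 5
kill -TERM "$MPIRUN_PID"
wait "$MPIRUN_PID"
echo "mpirun exited with status $?"
```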

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
algorithm (the only thing I changed in my > examples) and this results in ranks being assigned differently? > > Thanks again, > David > > On 11/30/2016 01:23 PM, r...@open-mpi.org wrote: >> I think you have confused “slot” with a physical “core”. The two have >> absolutel

Re: [OMPI users] OpenMPI + InfiniBand

2016-12-23 Thread r...@open-mpi.org
Also check to ensure you are using the same version of OMPI on all nodes - this message usually means that a different version was used on at least one node. > On Dec 23, 2016, at 1:58 AM, gil...@rist.or.jp wrote: > > Serguei, > > > this looks like a very different issue, orted cannot be
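A quick consistency check across nodes can rule this out; the hostnames below are placeholders:

```shell
# Confirm every node resolves the same mpirun and reports the same OMPI version.
for h in node01 node02 node03; do
  echo "== $h =="
  ssh "$h" 'which mpirun; ompi_info | grep "Open MPI:"'
done
```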

[OMPI users] Release of OMPI v1.10.5

2016-12-19 Thread r...@open-mpi.org
The Open MPI Team, representing a consortium of research, academic, and industry partners, is pleased to announce the release of Open MPI version 1.10.5. v1.10.5 is a bug fix release that includes an important performance regression fix. All users are encouraged to upgrade to v1.10.5 when

Re: [OMPI users] OpenMPI-2.1.0 problem with executing orted when using SGE

2017-03-22 Thread r...@open-mpi.org
Sorry folks - for some reason (probably timing for getting 2.1.0 out), the fix for this got pushed to v2.1.1 - see the PR here: https://github.com/open-mpi/ompi/pull/3163 > On Mar 22, 2017, at 7:49 AM, Reuti wrote: >

Re: [OMPI users] Performance degradation of OpenMPI 1.10.2 when oversubscribed?

2017-03-27 Thread r...@open-mpi.org
I’m confused - mpi_yield_when_idle=1 is precisely the “oversubscribed” setting. So why would you expect different results? > On Mar 27, 2017, at 3:52 AM, Jordi Guitart wrote: > > Hi Ben, > > Thanks for your feedback. As described here >
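These are two spellings of the same setting — oversubscribing enables yielding implicitly, while the MCA parameter sets it explicitly. A sketch with a placeholder executable:

```shell
# Allow more processes than slots; idle ranks yield the CPU in both cases.
mpirun -np 16 --oversubscribe ./a.out
mpirun -np 16 --mca mpi_yield_when_idle 1 ./a.out
```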

Re: [OMPI users] Communicating MPI processes running in Docker containers in the same host by means of shared memory?

2017-03-26 Thread r...@open-mpi.org
There are a couple of things you’d need to resolve before worrying about code: * IIRC, there is a separate ORTE daemon in each Docker container since OMPI thinks these are separate nodes. So you’ll first need to find some way those daemons can “discover” that they are on the same physical node.

Re: [OMPI users] How to launch ompi-server?

2017-03-19 Thread r...@open-mpi.org
Well, your initial usage looks correct - you don’t launch ompi-server via mpirun. However, it sounds like there is probably a bug somewhere if it hangs as you describe. Scratching my head, I can only recall less than a handful of people ever using these MPI functions to cross-connect jobs, so
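For anyone following along, the intended usage pattern is roughly the following sketch — the URI file path and the two applications are placeholders:

```shell
# Start the standalone rendezvous server (NOT via mpirun) and record its URI:
ompi-server --report-uri /tmp/ompi-server.uri &

# Point both jobs at that URI so MPI_Publish_name / MPI_Lookup_name
# can cross-connect them:
mpirun -np 2 --ompi-server file:/tmp/ompi-server.uri ./server_app &
mpirun -np 2 --ompi-server file:/tmp/ompi-server.uri ./client_app
```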
