Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread r...@open-mpi.org
You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge" to your environment > On Jun 22, 2017, at 7:28 AM, John Hearns via users > wrote: > > Michael, try > --mca plm_rsh_agent ssh > > I've been fooling with this myself recently, in the context of a PBS cluster > > On
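
A minimal sketch combining both suggestions (application name and rank count are placeholders):

    # steer the launcher away from SLURM and disable the munge security component
    export OMPI_MCA_plm=rsh
    export OMPI_MCA_sec=^munge
    mpirun -np 4 ./my_app

    # or, per invocation, force ssh as the remote launch agent
    mpirun --mca plm_rsh_agent ssh -np 4 ./my_app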

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-19 Thread r...@open-mpi.org
"aborttest" process > remains after all the signals are sent. > > On 19 Jun 2017 at 10:10, r...@open-mpi.org wrote: > >> >> That is typical behavior when you throw something into "sleep" - not much we >> can do about it, I >> think. >

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
PI calls, the shells should not be aborted. > > And users might have several layers of shells in between mpirun and the > executable. > > So now I will look for the latest version of Open MPI that has the 1.4.3 > behavior. > > Sincerely, > > Ted Sussman > >

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
and never stops. > 2.1.1 process 1 doesn't abort, but stops after it is finished sleeping > > Sincerely, > > Ted Sussman > > On 15 Jun 2017 at 9:18, r...@open-mpi.org wrote: > >> Here is how the system is working: >> >> Master: each process is put in

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
Here is how the system is working: Master: each process is put into its own process group upon launch. When we issue a “kill”, however, we only issue it to the individual process (instead of the process group that is headed by that child process). This is probably a bug as I don’t believe that

Re: [OMPI users] "undefined reference to `MPI_Comm_create_group'" error message when using Open MPI 1.6.2

2017-06-09 Thread r...@open-mpi.org
looks for ORTE file(s) on the hard disks of > compute nodes. > > Now I know that I can install Open MPI in a shared directory. But is it > possible to make executable files that don't look for any Open MPI's files on > disk? > > Arham > > > From: "r...@op

Re: [OMPI users] Node failure handling

2017-06-09 Thread r...@open-mpi.org
It has been awhile since I tested it, but I believe the --enable-recovery option might do what you want. > On Jun 8, 2017, at 6:17 AM, Tim Burgess wrote: > > Hi! > > So I know from searching the archive that this is a repeated topic of > discussion here, and

Re: [OMPI users] "undefined reference to `MPI_Comm_create_group'" error message when using Open MPI 1.6.2

2017-06-09 Thread r...@open-mpi.org
Sure - just configure OMPI with “--enable-static --disable-shared” > On Jun 9, 2017, at 5:50 AM, Arham Amouie via users > wrote: > > Thank you very much. Could you please answer another somewhat related > question? I'd like to know if ORTE could be linked statically
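
A hedged sketch of such a static-only build (the install prefix and make job count are illustrative):

    ./configure --prefix=/opt/openmpi-static --enable-static --disable-shared
    make -j4 && make install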

Re: [OMPI users] Issue in installing PMIx

2017-06-07 Thread r...@open-mpi.org
I guess I should also have clarified - I tested with PMIx v1.1.5 as that is the latest in the 1.1 series. > On Jun 7, 2017, at 8:23 PM, r...@open-mpi.org wrote: > > It built fine for me - on your configure path-to-pmix, what did you tell it? > It wants the path supplied as when yo

Re: [OMPI users] Issue in installing PMIx

2017-06-07 Thread r...@open-mpi.org
It built fine for me - on your configure path-to-pmix, what did you tell it? It wants the path supplied as when you configured pmix itself > On Jun 7, 2017, at 2:50 PM, Marc Cooper <marccooper2...@gmail.com> wrote: > > OpenMPI 2.1.1 and PMIx v1.1 > > On 7 June 2017 at 11:

Re: [OMPI users] Issue in installing PMIx

2017-06-07 Thread r...@open-mpi.org
Ummm...what version of OMPI and PMIx are you talking about? > On Jun 6, 2017, at 2:20 PM, Marc Cooper wrote: > > Hi, > > I've been trying to install PMIx external to OpenMPI, with separate libevent > and hwloc. My configuration script is > > ./configure --prefix=

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
i:2,exin --mca plm_base_verbose 5 hostname > mpiexec -np 1 --host exin --mca plm_base_verbose 5 hostname > mpiexec -np 1 --host exin ldd ./hello_1_mpi > > if Open MPI is not installed on a shared filesystem (NFS for example), please > also double check > both install we

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
This behavior is as-expected. When you specify "-host foo,bar”, you have told us to assign one slot to each of those nodes. Thus, running 3 procs exceeds the number of slots you assigned. You can tell it to set the #slots to the #cores it discovers on the node by using “-host foo:*,bar:*” I
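
For example (hostnames and executable are placeholders; the ":*" form follows the note above):

    # one slot per listed host, so at most 2 ranks
    mpirun -np 2 --host foo,bar ./a.out

    # let each host's slot count default to its discovered core count
    mpirun -np 3 --host foo:*,bar:* ./a.out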

Re: [OMPI users] Closing pipes associated with repeated MPI comm spawns

2017-05-29 Thread r...@open-mpi.org
n 2.1.0. Should have clarified that initially, sorry. Running on > Ubuntu 12.04.5. > > On Fri, Apr 28, 2017 at 10:29 AM, r...@open-mpi.org > <r...@open-mpi.org> > wrote: > What version of OMPI are you using

Re: [OMPI users] How to launch ompi-server?

2017-05-27 Thread r...@open-mpi.org
- if you'd like me to open a bug report for this one > somewhere, just let me know. > > -Adam > > On Sun, Mar 19, 2017 at 2:46 PM, r...@open-mpi.org > <r...@open-mpi.org> wrote: > Well, your initial usage looks

Re: [OMPI users] MPI_Comm_accept()

2017-05-27 Thread r...@open-mpi.org
ate the quick turnaround. > > On Tue, Mar 14, 2017 at 10:24 AM, r...@open-mpi.org > <r...@open-mpi.org> > wrote: > I don’t see an issue right away, though I know it has been brought up before. > I hope to res

Re: [OMPI users] pmix, lxc, hpcx

2017-05-26 Thread r...@open-mpi.org
You can also get around it by configuring OMPI with “--disable-pmix-dstore” > On May 26, 2017, at 3:02 PM, Howard Pritchard wrote: > > Hi John, > > In the 2.1.x release stream a shared memory capability was introduced into > the PMIx component. > > I know nothing about
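
A sketch of that workaround at configure time (the prefix is illustrative):

    ./configure --prefix=/opt/openmpi-2.1.1 --disable-pmix-dstore
    make -j4 && make install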

Re: [OMPI users] IBM Spectrum MPI problem

2017-05-19 Thread r...@open-mpi.org
If I might interject here before lots of time is wasted. Spectrum MPI is an IBM -product- and is not free. What you are likely running into is that their license manager is blocking you from running, albeit without a really nice error message. I’m sure that’s something they are working on. If

Re: [OMPI users] Can OpenMPI support torque and slurm at the same time?

2017-05-10 Thread r...@open-mpi.org
Certainly. Just make sure you have the headers for both on the node where you build OMPI so we build the required components. Then we will auto-detect which one we are running under, so nothing further is required > On May 10, 2017, at 11:41 AM, Belgin, Mehmet >
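
A hedged configure sketch, assuming both sets of headers are visible on the build node (the Torque path is illustrative and may be omitted if the headers live in default locations):

    ./configure --with-slurm --with-tm=/opt/torque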

Re: [OMPI users] [OMPI USERS] Jumbo frames

2017-05-05 Thread r...@open-mpi.org
If you are looking to use TCP packets, then you want to set the send/recv buffer size in the TCP btl, not the openib one, yes? Also, what version of OMPI are you using? > On May 5, 2017, at 7:16 AM, Alberto Ortiz wrote: > > Hi, > I have a program running with

Re: [OMPI users] Closing pipes associated with repeated MPI comm spawns

2017-04-28 Thread r...@open-mpi.org
What version of OMPI are you using? > On Apr 28, 2017, at 8:26 AM, Austin Herrema wrote: > > Hello all, > > I am using mpi4py in an optimization code that iteratively spawns an MPI > analysis code (fortran-based) via "MPI.COMM_SELF.Spawn" (I gather that this > is not an

Re: [OMPI users] OpenMPI 2.1.0: FAIL: opal_path_nfs

2017-04-26 Thread r...@open-mpi.org
> On 04/26/2017 05:54 PM, r...@open-mpi.org wrote: >> You can probably safely ignore it. >> >>> On Apr 26, 2017, at 2:29 PM, Prentice Bisbal <pbis...@pppl.gov> wrote: >>> >>> I'm trying to build OpenMPI 2.1.0 with GCC 5.4.0 on CentOS 6.8. Afte

Re: [OMPI users] OpenMPI 2.1.0: FAIL: opal_path_nfs

2017-04-26 Thread r...@open-mpi.org
You can probably safely ignore it. > On Apr 26, 2017, at 2:29 PM, Prentice Bisbal wrote: > > I'm trying to build OpenMPI 2.1.0 with GCC 5.4.0 on CentOS 6.8. After working > around the '-Lyes/lib' errors I reported in my previous post, opal_path_nfs > fails during 'make

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
Sure - there is always an MCA param for everything: OMPI_MCA_rmaps_base_oversubscribe=1 > On Apr 25, 2017, at 2:10 PM, Eric Chamberland > <eric.chamberl...@giref.ulaval.ca> wrote: > > On 25/04/17 04:36 PM, r...@open-mpi.org wrote: >> add --oversubscribe to the cmd
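
For example, either of these allows oversubscription (rank count and program are placeholders):

    OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 ./my_app
    mpirun -n 8 --oversubscribe ./my_app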

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
behavior. I can cope with this (I > don't usually restrict my runs to a subset of the nodes). > > George. > > > On Tue, Apr 25, 2017 at 4:53 PM, r...@open-mpi.org > <r...@open-mpi.org> wrote:

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
NKNOWN > dancer28: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > dancer29: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > dancer30: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > dancer31: flags=0x10 slots=8 max_slots=0

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
e=UNKNOWN > dancer30: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > dancer31: flags=0x10 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN > ===== > -- > There are not enough slots available in the s

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
g:22463] [[40219,0],0] plm:base:launch_apps for job [40219,1] > [zorg:22463] [[40219,0],0] hostfile: checking hostfile > /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes > [zorg:22463] [[40219,0],0] plm:base:launch wiring up iof for job [40219,1] > [zorg:22463

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
tations, such as LAM/MPI). A localhost-only node list is > # created by the RAS component named "localhost" if no other RAS > # components were able to find any hosts to run on (this behavior can > # be disabled by excluding the localhost RAS component by specifying > # the valu

Re: [OMPI users] In preparation for 3.x

2017-04-25 Thread r...@open-mpi.org
What is in your hostfile? > On Apr 25, 2017, at 11:39 AM, Eric Chamberland > wrote: > > Hi, > > just testing the 3.x branch... I launch: > > mpirun -n 8 echo "hello" > > and I get: > >

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-24 Thread r...@open-mpi.org
./../../../../../../../..] > [pascal-3-07:21027] ... > [../../../../../B./B./B./B./B.][../../../../../../../../../..] > [pascal-3-07:21027] ... > [../../../../../../../../../..][B./B./B./B./B./../../../../..] > [pascal-3-07:21027] ... > [../../../../../../../../../..][../../../../../B./B./B./B./B./] >

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-22 Thread r...@open-mpi.org
0[core 4[hwt 0]]: > [B./B./B./B./B./../../../../..][../../../../../../../../../..] > from OpenMPI directly? > > Cheers and thanks again, > > Ado > > On 13.04.2017 17:34, r...@open-mpi.org wrote: >> Yeah, we need libnuma to set the memory binding. There is a param

Re: [OMPI users] fatal error for openmpi-master-201704200300-ded63c with SuSE Linux and gcc-6.3.0

2017-04-20 Thread r...@open-mpi.org
This is a known issue due to something in the NVIDIA library and it’s interactions with hwloc. Your tarball tag indicates you should have the attempted fix in it, so likely that wasn’t adequate. See https://github.com/open-mpi/ompi/pull/3283 for the

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-13 Thread r...@open-mpi.org
You can always specify a particular number of cpus to use for each process by adding it to the map-by directive: mpirun -np 8 --map-by ppr:2:socket:pe=5 --use-hwthread-cpus -report-bindings --mca plm_rsh_agent "qrsh" ./myid would map 2 processes to each socket, binding each process to 5 HTs

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-12 Thread r...@open-mpi.org
004 is on pascal-3-00: MP thread #0002(pid 07514), > 025, Cpus_allowed_list: > 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39 > MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), > 029, Cpus_allowed_list: > 1,3,5,7,9,11,13,15,17,19,21,23

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-10 Thread r...@open-mpi.org
I’m not entirely sure I understand your reference to “real cores”. When we bind you to a core, we bind you to all the HT’s that comprise that core. So, yes, with HT enabled, the binding report will list things by HT, but you’ll always be bound to the full core if you tell us bind-to core The

Re: [OMPI users] No more default core binding since 2.0.2?

2017-04-10 Thread r...@open-mpi.org
> On Apr 10, 2017, at 1:37 AM, Reuti <re...@staff.uni-marburg.de> wrote: > >> >> On 10.04.2017 at 01:58, r...@open-mpi.org wrote: >> >> Let me try to clarify. If you launch a job that has only 1 or 2 processes in >

Re: [OMPI users] No more default core binding since 2.0.2?

2017-04-09 Thread r...@open-mpi.org
Let me try to clarify. If you launch a job that has only 1 or 2 processes in it (total), then we bind to core by default. This is done because a job that small is almost always some kind of benchmark. If there are more than 2 processes in the job (total), then we default to binding to NUMA (if
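
To make the placement explicit rather than relying on these defaults, something like the following can be used (a sketch; rank counts are arbitrary):

    mpirun -np 2 --bind-to core --report-bindings ./my_app
    mpirun -np 16 --bind-to numa --report-bindings ./my_app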

Re: [OMPI users] No more default core binding since 2.0.2?

2017-04-09 Thread r...@open-mpi.org
> On Apr 9, 2017, at 1:49 PM, Reuti <re...@staff.uni-marburg.de> wrote: > > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi, > > On 09.04.2017 at 16:35, r...@open-mpi.org wrote: > >> There has been no change in th

Re: [OMPI users] Install openmpi.2.0.2 with certain option

2017-04-04 Thread r...@open-mpi.org
--without-cuda --without-slurm should do the trick > On Apr 4, 2017, at 4:49 AM, Andrey Shtyrov via users > wrote: > > Dear openmpi communite, > > I am need to install openmpi.2.0.2 on sistem with slurm, and cuda, without > support it. > > I have tried write
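
A minimal configure sketch based on that advice (the prefix is illustrative):

    ./configure --prefix=$HOME/openmpi-2.0.2 --without-cuda --without-slurm
    make -j4 && make install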

Re: [OMPI users] Performance degradation of OpenMPI 1.10.2 when oversubscribed?

2017-03-27 Thread r...@open-mpi.org
I’m confused - mpi_yield_when_idle=1 is precisely the “oversubscribed” setting. So why would you expect different results? > On Mar 27, 2017, at 3:52 AM, Jordi Guitart wrote: > > Hi Ben, > > Thanks for your feedback. As described here >

Re: [OMPI users] OpenMPI-2.1.0 problem with executing orted when using SGE

2017-03-22 Thread r...@open-mpi.org
Sorry folks - for some reason (probably timing for getting 2.1.0 out), the fix for this got pushed to v2.1.1 - see the PR here: https://github.com/open-mpi/ompi/pull/3163 > On Mar 22, 2017, at 7:49 AM, Reuti wrote: >

Re: [OMPI users] How to launch ompi-server?

2017-03-19 Thread r...@open-mpi.org
Well, your initial usage looks correct - you don’t launch ompi-server via mpirun. However, it sounds like there is probably a bug somewhere if it hangs as you describe. Scratching my head, I can only recall less than a handful of people ever using these MPI functions to cross-connect jobs, so
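
For reference, the usual pattern is to start ompi-server on its own and point each mpirun at its URI - a hedged sketch, assuming the standard --report-uri/--ompi-server options (the file name is illustrative):

    ompi-server --report-uri /tmp/ompi-server.uri
    mpirun -np 1 --ompi-server file:/tmp/ompi-server.uri ./server_app
    mpirun -np 1 --ompi-server file:/tmp/ompi-server.uri ./client_app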

Re: [OMPI users] MPI_Comm_accept()

2017-03-14 Thread r...@open-mpi.org
pect this to > be fixed? > > Thanks. > > On Mon, Mar 13, 2017 at 10:45 AM, r...@open-mpi.org > <r...@open-mpi.org> > wrote: > You should consider it a bug for now - it won’t work in the 2.0 series, and I

Re: [OMPI users] MPI_Comm_accept()

2017-03-13 Thread r...@open-mpi.org
You should consider it a bug for now - it won’t work in the 2.0 series, and I don’t think it will work in the upcoming 2.1.0 release. Probably will be fixed after that. > On Mar 13, 2017, at 5:17 AM, Adam Sylvester wrote: > > As a follow-up, I tried this with Open MPI

Re: [OMPI users] OpenMPI in docker container

2017-03-11 Thread r...@open-mpi.org
Past attempts have indicated that only TCP works well with Docker - if you want to use OPA, you’re probably better off using Singularity as your container. http://singularity.lbl.gov/ The OMPI master has some optimized integration for Singularity, but 2.0.2 will

Re: [OMPI users] MPI for microcontrolles without OS

2017-03-08 Thread r...@open-mpi.org
y - that's > why I'm forced to find a solution. > About this ported version - was it working properly? > > Thanks in advance, > Mateusz Tasz > > > 2017-03-08 18:23 GMT+01:00 r...@open-mpi.org <r...@open-mpi.org>: >> OpenMPI has been ported to microcontrollers before, b

Re: [OMPI users] MPI for microcontrolles without OS

2017-03-08 Thread r...@open-mpi.org
OpenMPI has been ported to microcontrollers before, but it does require at least a minimal OS to provide support (e.g., TCP for communications). Most IoT systems already include an OS on them for just that reason. I personally have OMPI running on a little Edison board using the OS that comes

Re: [OMPI users] Issues with different IB adapters and openmpi 2.0.2

2017-02-28 Thread r...@open-mpi.org
The root cause is that the nodes are defined as “heterogeneous” because the difference in HCAs causes a difference in selection logic. For scalability purposes, we don’t circulate the choice of PML as that isn’t something mpirun can “discover” and communicate. One option we could pursue is to

Re: [OMPI users] State of the DVM in Open MPI

2017-02-28 Thread r...@open-mpi.org
Hi Reuti The DVM in master seems to be fairly complete, but several organizations are in the process of automating tests for it so it gets more regular exercise. If you are using a version in OMPI 2.x, those are early prototype - we haven’t updated the code in the release branches. The more

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 9:39 AM, Reuti wrote: > > >> Am 27.02.2017 um 18:24 schrieb Angel de Vicente : >> >> […] >> >> For a small group of users if the DVM can run with my user and there is >> no restriction on who can use it or if I somehow can

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 4:58 AM, Angel de Vicente <ang...@iac.es> wrote: > > Hi, > > "r...@open-mpi.org" <r...@open-mpi.org> writes: >> You might want to try using the DVM (distributed virtual machine) >> mode in ORTE. You can start it on an alloca

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread r...@open-mpi.org
You might want to try using the DVM (distributed virtual machine) mode in ORTE. You can start it on an allocation using the “orte-dvm” cmd, and then submit jobs to it with “mpirun --hnp ”, where foo is either the contact info printed out by orte-dvm, or the name of the file you told orte-dvm to
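
A hedged sketch of that workflow (the URI file name and executables are illustrative, and option spellings may differ between releases):

    orte-dvm --report-uri dvm.uri &        # start the DVM across the allocation
    mpirun --hnp file:dvm.uri -np 4 ./task_a
    mpirun --hnp file:dvm.uri -np 2 ./task_b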

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
rank 7 bound to socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/BB/../../../../../../../..] “span” causes ORTE to treat all the sockets etc. as being on a single giant node. HTH Ralph > On Feb 23, 2017, at 6:38 AM, r...@open-mpi.org wr

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
From the mpirun man page: ** Open MPI employs a three-phase procedure for assigning process locations and ranks: mapping Assigns a default location to each process ranking Assigns an MPI_COMM_WORLD rank value to each process binding Constrains each process to run on specific
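
As an illustration of the three phases being controlled independently (a sketch; the values are arbitrary):

    mpirun -np 8 --map-by socket --rank-by slot --bind-to core ./my_app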

Re: [OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21 Thread r...@open-mpi.org
Can you provide a backtrace with line numbers from a debug build? We don’t get much testing with lsf, so it is quite possible there is a bug in there. > On Feb 21, 2017, at 7:39 PM, Hammond, Simon David (-EXP) > wrote: > > Hi OpenMPI Users, > > Has anyone successfully

Re: [OMPI users] OpenMPI and Singularity

2017-02-20 Thread r...@open-mpi.org
If there are diagnostics you would like, I can try to > provide those. I will be gone starting Thu for a week. > > -- bennet > > > > > On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org <r...@open-mpi.org> wrote: >> I -think- that is correct, but you may

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
andling of application > exceptions across processes) > > So it is good to hear there is progress. > > On Feb 18, 2017 7:43 AM, "r...@open-mpi.org" > <r...@open-mpi.org> wrote: > We have been m

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
> On Fri, 17 Feb 2017, r...@open-mpi.org wrote: > >> Depends on the version, but if you are using something in the v2.x range, >> you should be okay with just one installed version > > Thanks Ralph. > > How good is MPI_THREAD_MULTIPLE support these days an

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
port, it would still run, > but it would fall back to non-verbs communication, so it would just be > commensurately slower. > > Let me know if I've garbled things. Otherwise, wish me luck, and have > a good weekend! > > Thanks, -- bennet > > > > On Fri, Feb

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
The embedded Singularity support hasn’t made it into the OMPI 2.x release series yet, though OMPI will still work within a Singularity container anyway. Compatibility across the container boundary is always a problem, as your examples illustrate. If the system is using one OMPI version and the

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-17 Thread r...@open-mpi.org
Mark - this is now available in master. Will look at what might be required to bring it to 2.0 > On Feb 15, 2017, at 5:49 AM, r...@open-mpi.org wrote: > > >> On Feb 15, 2017, at 5:45 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote: >> >> On Wed, 15 Feb 2017, r.

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-17 Thread r...@open-mpi.org
iegmar, > > > meanwhile, feel free to manually apply the attached patch > > > > Cheers, > > > Gilles > > > On 2/16/2017 8:09 AM, r...@open-mpi.org wrote: >> I guess it was the next nightly tarball, but not next commit. However, it >> was almost c

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-17 Thread r...@open-mpi.org
Depends on the version, but if you are using something in the v2.x range, you should be okay with just one installed version > On Feb 17, 2017, at 4:41 AM, Mark Dixon wrote: > > Hi, > > We have some users who would like to try out openmpi MPI_THREAD_MULTIPLE > support
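
For a v2.x build, that support is enabled at configure time - a sketch (the prefix is illustrative):

    ./configure --prefix=/opt/openmpi-2.x-mt --enable-mpi-thread-multiple

Applications then request it by passing MPI_THREAD_MULTIPLE to MPI_Init_thread.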

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
mpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.lo > > loki openmpi-master 148 find > openmpi-master-201702100209-51def91-Linux.x86_64.64_cc -name pmix_esh.lo > loki openmpi-master 149 > > Which files do you need? Which command

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
re job.sh is: >> >> #!/bin/bash -l >> module load openmpi/2.0.1-icc >> mpirun -np 1 ./manager 4 >> >> >> >> >> >> >> >>> On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote: >>> The cmd line looks fine - whe

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
n when in batch environment but you may also want > to upgrade to Open MPI 2.0.2. > > Howard > > r...@open-mpi.org <r...@open-mpi.org> wrote on Wed, 15 Feb 2017 at 07:49: > Nothing immediate comes to mi

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
If we knew what line in that file was causing the compiler to barf, we could at least address it. There is probably something added in recent commits that is causing problems for the compiler. So checking to see what commit might be triggering the failure would be most helpful. > On Feb 15,

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Nothing immediate comes to mind - all sbatch does is create an allocation and then run your script in it. Perhaps your script is using a different “mpirun” command than when you type it interactively? > On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina > wrote: >
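
One quick way to check that suspicion from inside the batch script (a sketch built on the job script quoted elsewhere in the thread):

    #!/bin/bash -l
    module load openmpi/2.0.1-icc
    type mpirun        # confirm which mpirun the batch environment resolves
    mpirun --version
    mpirun -np 1 ./manager 4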

Re: [OMPI users] Specify the core binding when spawning a process

2017-02-15 Thread r...@open-mpi.org
Sorry for slow response - was away for awhile. What version of OMPI are you using? > On Feb 8, 2017, at 1:59 PM, Allan Ma wrote: > > Hello, > > I'm designing a program on a dual socket system that needs the parent process > and spawned child process to be at least

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
> On Feb 15, 2017, at 5:45 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote: > > On Wed, 15 Feb 2017, r...@open-mpi.org wrote: > >> Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - >> the logic is looking expressly for values > 1 as w

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - the logic is looking expressly for values > 1 as we hadn’t anticipated this use-case. I can make that change. I’m off to a workshop for the next day or so, but can probably do this on the plane. > On Feb 15, 2017,

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - the logic is looking expressly for values > 1 as we hadn’t anticipated this use-case. I can make that change. I’m off to a workshop for the next day or so, but can probably do this on the plane. > On Feb 15, 2017,

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-13 Thread r...@open-mpi.org
clarified the logic in the OMPI master repo. However, I don’t know how long it will be before a 2.0.3 release is issued, so GridEngine users might want to locally fix things in the interim. > On Feb 12, 2017, at 1:52 PM, r...@open-mpi.org wrote: > > Yeah, I’ll fix it this week. Th

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-12 Thread r...@open-mpi.org
Yeah, I’ll fix it this week. The problem is that you can’t check the source as being default as the default is ssh - so the only way to get the current code to check for qrsh is to specify something other than the default ssh (it doesn’t matter what you specify - anything will get you past the

Re: [OMPI users] MPI_Comm_spawn question

2017-02-03 Thread r...@open-mpi.org
We know v2.0.1 has problems with comm_spawn, and so you may be encountering one of those. Regardless, there is indeed a timeout mechanism in there. It was added because people would execute a comm_spawn, and then would hang and eat up their entire allocation time for nothing. In v2.0.2, I see

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
https://github.com/open-mpi/ompi/pull/1960/files > > > Glenn > > On Fri, Feb 3, 2017 at 10:56 AM, r...@open-mpi.org > <r...@open-mpi.org> wrote: > I do see a diff between 2.0.1 and 2.0.2

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The way we handled the MCA param that specifies the launch agent (ssh, rsh, or whatever) was modified, and I don’t think the change is correct. It basically says that we don’t look for qrsh unless the MCA param has been

Re: [OMPI users] Performance Issues on SMP Workstation

2017-02-01 Thread r...@open-mpi.org
Simple test: replace your executable with “hostname”. If you see multiple hosts come out on your cluster, then you know why the performance is different. > On Feb 1, 2017, at 2:46 PM, Andy Witzig wrote: > > Honestly, I’m not exactly sure what scheme is being used. I am
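
I.e., something like the following; if several different hostnames come back, the comparison run is spanning multiple nodes rather than the single workstation:

    mpirun -np 8 hostname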

Re: [OMPI users] MPI_Comm_spawn question

2017-01-31 Thread r...@open-mpi.org
What version of OMPI are you using? > On Jan 31, 2017, at 7:33 AM, elistrato...@info.sgu.ru wrote: > > Hi, > > I am trying to write trivial master-slave program. Master simply creates > slaves, sends them a string, they print it out and exit. Everything works > just fine, however, when I add a

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-20 Thread r...@open-mpi.org
> On Jan 19, 2017, at 5:29 PM, r...@open-mpi.org wrote: > > I’ll create a patch that you can try - if it works okay, we can commit it > >> On Jan 18, 2017, at 3:29 AM, William Hay <w@ucl.ac.uk> wrote: >> >> On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@ope

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-17 Thread r...@open-mpi.org
As I recall, the problem was that qrsh isn’t available on the backend compute nodes, and so we can’t use a tree for launch. If that isn’t true, then we can certainly adjust it. > On Jan 17, 2017, at 9:37 AM, Mark Dixon wrote: > > Hi, > > While commissioning a new

Re: [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux

2017-01-10 Thread r...@open-mpi.org
I think there is some relevant discussion here: https://github.com/open-mpi/ompi/issues/1569 It looks like Gilles had (at least at one point) a fix for master when enable-heterogeneous, but I don’t know if that was committed. > On Jan 9, 2017, at

Re: [OMPI users] OpenMPI + InfiniBand

2016-12-23 Thread r...@open-mpi.org
Also check to ensure you are using the same version of OMPI on all nodes - this message usually means that a different version was used on at least one node. > On Dec 23, 2016, at 1:58 AM, gil...@rist.or.jp wrote: > > Serguei, > > > this looks like a very different issue, orted cannot be
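
A quick hedged check across nodes (hostnames are placeholders):

    for h in node01 node02 node03; do
        ssh $h 'mpirun --version; which mpirun'
    done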

[OMPI users] Release of OMPI v1.10.5

2016-12-19 Thread r...@open-mpi.org
The Open MPI Team, representing a consortium of research, academic, and industry partners, is pleased to announce the release of Open MPI version 1.10.5. v1.10.5 is a bug fix release that includes an important performance regression fix. All users are encouraged to upgrade to v1.10.5 when

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-08 Thread r...@open-mpi.org
etc. to try to catch signals. Would that be useful ? I need a >>>> moment to figure out how to do this, but I can definitively try. >>>> >>>> Some remark: During "make install" from the git repo I see a >>>> >>>> WARNING! Commo

Re: [OMPI users] device failed to appear .. Connection timed out

2016-12-08 Thread r...@open-mpi.org
Sounds like something didn’t quite get configured right, or maybe you have a library installed that isn’t quite setup correctly, or... Regardless, we generally advise building from source to avoid such problems. Is there some reason not to just do so? > On Dec 8, 2016, at 6:16 AM, Daniele

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-07 Thread r...@open-mpi.org
Hi Christof Sorry if I missed this, but it sounds like you are saying that one of your procs abnormally terminates, and we are failing to kill the remaining job? Is that correct? If so, I just did some work that might relate to that problem that is pending in PR #2528:

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-02 Thread r...@open-mpi.org
Fix is on the way: https://github.com/open-mpi/ompi/pull/2498 Thanks Ralph > On Dec 1, 2016, at 10:49 AM, r...@open-mpi.org wrote: > > Yeah, that’s a bug - we’ll have to address it > > Thanks > Ralph > >> On Nov 2

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-01 Thread r...@open-mpi.org
Yeah, that’s a bug - we’ll have to address it Thanks Ralph > On Nov 28, 2016, at 9:29 AM, Noel Rycroft wrote: > > I'm seeing different behaviour between Open MPI 1.8.4 and 2.0.1 with regards > to signal propagation. > > With version 1.8.4 mpirun seems to propagate

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
algorithm (the only thing I changed in my > examples) and this results in ranks being assigned differently? > > Thanks again, > David > > On 11/30/2016 01:23 PM, r...@open-mpi.org wrote: >> I think you have confused “slot” with a physical “core”. The two have >> absolutel

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
I think you have confused “slot” with a physical “core”. The two have absolutely nothing to do with each other. A “slot” is nothing more than a scheduling entry in which a process can be placed. So when you --rank-by slot, the ranks are assigned round-robin by scheduler entry - i.e., you

Re: [OMPI users] malloc related crash inside openmpi

2016-11-24 Thread r...@open-mpi.org
016, at 2:31 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> > wrote: > > >> On Nov 23, 2016, at 5:26 PM, r...@open-mpi.org >> wrote: >> >> It looks like the library may not have been fully installed on that node -

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread r...@open-mpi.org
It looks like the library may not have been fully installed on that node - can you see if the prefix location is present, and that the LD_LIBRARY_PATH on that node is correctly set? The referenced component did not exist prior to the 2.0 series, so I’m betting that your LD_LIBRARY_PATH isn’t
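
A hedged sketch of that check (replace the node name and install prefix with your own):

    ssh badnode 'echo $LD_LIBRARY_PATH'
    ssh badnode 'ls /opt/openmpi-2.0/lib/openmpi | head'    # the 2.0-series components should be present here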

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-04 Thread r...@open-mpi.org
ave us from > having to write some Slurm prolog scripts to effect that. > > Thanks Ralph! > > On 11/01/2016 11:36 PM, r...@open-mpi.org wrote: >> Ah crumby!! We already solved this on master, but it cannot be backported to >> the 1.10 series w

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
ining processors. >> >> Just a thought, -- bennet >> >> >> >> >> >> On Fri, Nov 4, 2016 at 8:39 AM, Mahesh Nanavalla >> <mahesh.nanavalla...@gmail.com> wrote: >>> s... >>> >>> Thanks for responding me

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
r/bin/openmpiWiFiBulb > > But, I want to use the hostfile only. > Kindly help me. > > > On Fri, Nov 4, 2016 at 5:00 PM, r...@open-mpi.org > <r...@open-mpi.org> wrote: > you mistyped the option - it is “--map-by node”.

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
you mistyped the option - it is “--map-by node”. Note the space between “by” and “node” - you had typed it with a “-“ instead of a “space” > On Nov 4, 2016, at 4:28 AM, Mahesh Nanavalla > wrote: > > Hi all, > > I am using openmpi-1.10.3,using quad core
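
So the corrected invocation would look something like this (hostfile name and rank count are placeholders; the executable is the one from the question):

    mpirun -np 4 --map-by node --hostfile myhostfile /usr/bin/openmpiWiFiBulb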

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-01 Thread r...@open-mpi.org
> Subject: Re: [OMPI users] Slurm binding not propagated to MPI jobs > > Hi Ralph, > > I haven't played around in this code, so I'll flip the question over to the > Slurm list, and report back here when I learn anything. > > Cheers > Andy > > On 10/27/2016 01:44

Re: [OMPI users] MCA compilation later

2016-10-31 Thread r...@open-mpi.org
w set of header files? > > (I'm a bit surprised only header files are necessary. Shouldn't the plugin > require at least runtime linking with a low-level transport library?) > > -Sean > > -- > Sean Ahern > Computational Engineering International > 919-363-0883 > > On
