Re: [OMPI users] Error in file base/plm_base_launch_support.c: OPAL_HWLOC_TOPO

2018-07-21 Thread r...@open-mpi.org
More than likely the problem is the difference in hwloc versions - sounds like the topology to/from xml is different between the two versions, and the older one doesn’t understand the new one. > On Jul 21, 2018, at 12:04 PM, Brian Smith > wrote: > > Greetings, > > I'm having trouble getting

[MTT users] MTT docs now online

2018-07-12 Thread r...@open-mpi.org
We finally got the bugs out and the documentation for the Python MTT implementation is now online at https://open-mpi.github.io/mtt . It also picked up the Perl stuff, but we’ll just ignore that little detail :-) Thanks to Akshaya Jagannadharao for all the hard work

Re: [OMPI users] Locking down TCP ports used

2018-07-07 Thread r...@open-mpi.org
I suspect the OOB is working just fine and you are seeing the TCP/btl opening the other ports. There are two TCP elements at work here: the OOB (which sends management messages between daemons) and the BTL (which handles the MPI traffic). In addition to what you provided, you also need to

Re: [OMPI users] Verbose output for MPI

2018-07-04 Thread r...@open-mpi.org
Try adding PMIX_MCA_ptl_base_verbose=10 to your environment > On Jul 4, 2018, at 8:51 AM, Maksym Planeta > wrote: > > Thanks for quick response, > > I tried this out and I do get more output: https://pastebin.com/JkXAYdM4. But > the line I need does not appear in the output. > > On 04/07/18
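The suggested setting is an ordinary environment variable; a minimal sketch of applying it (the variable name is taken from the reply above):

```shell
# Ask PMIx's transport layer (ptl) framework for verbose output; any
# process launched from this shell (e.g. mpirun) inherits the setting.
export PMIX_MCA_ptl_base_verbose=10
```

Re-running the failing mpirun command with this set should log the PMIx connection attempts to stderr.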

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread r...@open-mpi.org
> On Jun 22, 2018, at 8:25 PM, r...@open-mpi.org wrote: > > > >> On Jun 22, 2018, at 7:31 PM, Gilles Gouaillardet >> mailto:gilles.gouaillar...@gmail.com>> wrote: >> >> Carlos, >> >> By any chance, could >> >> mpiru

Re: [OMPI users] Disable network interface selection

2018-06-22 Thread r...@open-mpi.org
> On Jun 22, 2018, at 7:31 PM, Gilles Gouaillardet > wrote: > > Carlos, > > By any chance, could > > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ... > > work for you ? > > Which Open MPI version are you running ? > > > IIRC, subnets are internally translated
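As a sketch, the same MCA parameter can also be set through the environment rather than the command line — Open MPI reads any variable carrying the `OMPI_MCA_` prefix as an MCA parameter (the subnet value is copied from the thread):

```shell
# Equivalent to passing "--mca oob_tcp_if_exclude 192.168.100.0/24" to
# mpirun: the OMPI_MCA_ prefix maps the variable to the MCA parameter.
export OMPI_MCA_oob_tcp_if_exclude=192.168.100.0/24
```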

Re: [OMPI users] new core binding issues?

2018-06-22 Thread r...@open-mpi.org
Afraid I’m not familiar with that option, so I really don’t know :-( > On Jun 22, 2018, at 10:13 AM, Noam Bernstein > wrote: > >> On Jun 22, 2018, at 1:00 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> >> wrote: >> >> I suspect it is okay. Keep in min

Re: [OMPI users] new core binding issues?

2018-06-22 Thread r...@open-mpi.org
I suspect it is okay. Keep in mind that OMPI itself is starting multiple progress threads, so that is likely what you are seeing. The binding pattern in the mpirun output looks correct as the default would be to map-by socket and you asked that we bind-to core. > On Jun 22, 2018, at 9:33 AM,

Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-19 Thread r...@open-mpi.org
nment > On Jun 19, 2018, at 2:08 AM, Maksym Planeta > wrote: > > But what about remote connections parameter? Why is it not set? > > On 19/06/18 00:58, r...@open-mpi.org wrote: >> I’m not entirely sure I understand what you are trying to do. The >> PMIX_SERVER

Re: [OMPI users] Enforcing specific interface and subnet usage

2018-06-18 Thread r...@open-mpi.org
I’m not entirely sure I understand what you are trying to do. The PMIX_SERVER_URI2 envar tells local clients how to connect to their local PMIx server (i.e., the OMPI daemon on that node). This is always done over the loopback device since it is a purely local connection that is never used for

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
l terminate the job. >> ---------- >> >> That makes sense, I guess. >> >> I'll keep you posted as to what happens with 3.0.0 and downgrading SLURM. >> >> >> Thanks,

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
ORTE_SUCCESS > > > At one point, I am almost certain that OMPI mpirun did work, and I am > at a loss to explain why it no longer does. > > I have also tried the 3.1.1rc1 version. I am now going to try 3.0.0, > and we'll try downgrading SLURM to a prior version. >

Re: [OMPI users] srun works, mpirun does not

2018-06-18 Thread r...@open-mpi.org
TIME NODES > NODELIST(REASON) > 158 standard bash bennet R 14:30 1 cav01 > [bennet@cavium-hpc ~]$ srun hostname > cav01.arc-ts.umich.edu > [ repeated 23 more times ] > > As always, your help is much appreciated, > > -- bennet > > On S

Re: [OMPI users] Fwd: srun works, mpirun does not

2018-06-17 Thread r...@open-mpi.org
Add --enable-debug to your OMPI configure cmd line, and then add --mca plm_base_verbose 10 to your mpirun cmd line. For some reason, the remote daemon isn’t starting - this will give you some info as to why. > On Jun 17, 2018, at 9:07 AM, Bennet Fauber wrote: > > I have a compiled binary

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-08 Thread r...@open-mpi.org
is solved. Thanks very much Ralph and Artem for your > help! > > -- bennet > > > On Thu, Jun 7, 2018 at 11:06 AM r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wrote: > Odd - Artem, do you have any suggestions? > > >

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-07 Thread r...@open-mpi.org
don't have a saved build log, but I can rebuild this and save the > build logs, in case any information in those logs would help. > > I will also mention that we have, in the past, used the > --disable-dlopen and --enable-shared flags, which we did not use here. > Just in case that makes

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-07 Thread r...@open-mpi.org
rm/mpi_pmix.so -> ./mpi_pmix_v2.so > -rwxr-xr-x 1 root root 828232 May 30 15:20 > /opt/slurm/lib64/slurm/mpi_pmix_v2.so > > > Let me know if anything else would be helpful. > > Thanks,-- bennet > > On Thu, Jun 7, 2018 at 8:56 AM, r...@open-mpi.org wrot

Re: [OMPI users] Fwd: OpenMPI 3.1.0 on aarch64

2018-06-07 Thread r...@open-mpi.org
You didn’t show your srun direct launch cmd line or what version of Slurm is being used (and how it was configured), so I can only provide some advice. If you want to use PMIx, then you have to do two things: 1. Slurm must be configured to use PMIx - depending on the version, that might be

Re: [OMPI users] --oversubscribe option

2018-06-06 Thread r...@open-mpi.org
I’m not entirely sure what you are asking here. If you use oversubscribe, we do not bind your processes and you suffer some performance penalty for it. If you want to run one process/thread and retain binding, then do not use --oversubscribe and instead use --use-hwthread-cpus > On Jun 6,

Re: [OMPI users] A hang in Rmpi at PMIx_Disconnect

2018-06-04 Thread r...@open-mpi.org
bsolutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -- > And the process hangs as wel

Re: [OMPI users] A hang in Rmpi at PMIx_Disconnect

2018-06-04 Thread r...@open-mpi.org
ient disconnect >> [localhost.localdomain:11646] ext2x:client disconnect >> >> In your example it's only called once per process. >> >> Do you have any suspicion where the second call comes from? Might this be >> the reason for the hang? >> >> Than

Re: [OMPI users] A hang in Rmpi at PMIx_Disconnect

2018-06-04 Thread r...@open-mpi.org
Try running the attached example dynamic code - if that works, then it likely is something to do with how R operates. simple_spawn.c Description: Binary data > On Jun 4, 2018, at 3:43 AM, marcin.krotkiewski > wrote: > > Hi, > > I have some problems running R + Rmpi with OpenMPI 3.1.0 +

Re: [OMPI users] slurm configuration override mpirun command line process mapping

2018-05-17 Thread r...@open-mpi.org
mpirun takes the #slots for each node from the slurm allocation. Your hostfile (at least, what you provided) retained that information and shows 2 slots on each node. So the original allocation _and_ your constructed hostfile are both telling mpirun to assign 2 slots on each node. Like I

Re: [OMPI users] slurm configuration override mpirun command line process mapping

2018-05-16 Thread r...@open-mpi.org
The problem here is that you have made an incorrect assumption. In the older OMPI versions, the -H option simply indicated that the specified hosts were available for use - it did not imply the number of slots on that host. Since you have specified 2 slots on each host, and you told mpirun to

Re: [OMPI users] OpenMPI 3.0.1 - mpirun hangs with 2 hosts

2018-05-14 Thread r...@open-mpi.org
You got that error because the orted is looking for its rank on the cmd line and not finding it. > On May 14, 2018, at 12:37 PM, Max Mellette wrote: > > Hi Gus, > > Thanks for the suggestions. The correct version of openmpi seems to be > getting picked up; I also

Re: [OMPI users] Openmpi-3.1.0 + slurm (fixed)

2018-05-08 Thread r...@open-mpi.org
Good news - thanks! > On May 8, 2018, at 7:02 PM, Bill Broadley wrote: > > > Sorry all, > > Chris S over on the slurm list spotted it right away. I didn't have the > MpiDefault set to pmix_v2. > > I can confirm that Ubuntu 18.04, gcc-7.3, openmpi-3.1.0, pmix-2.1.1, and

Re: [OMPI users] Memory leak with pmix_finalize not being called

2018-05-04 Thread r...@open-mpi.org
Ouch - trivial fix. I’m inserting it into PMIx and will provide it to the OMPI team > On May 4, 2018, at 5:20 AM, Saurabh T wrote: > > This is with valgrind 3.0.1 on a Centos 6 system. It appears pmix_finalize > isnt called and this reports leaks from valgrind despite the

Re: [OMPI users] openmpi/slurm/pmix

2018-04-25 Thread r...@open-mpi.org
> On Apr 25, 2018, at 8:16 AM, Michael Di Domenico <mdidomeni...@gmail.com> > wrote: > > On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org <r...@open-mpi.org> wrote: >> Looks like the problem is that you didn’t wind up with the external PMIx. >>

Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread r...@open-mpi.org
and 3) we no longer support Windows. You could try using the cygwin port instead. > On Apr 23, 2018, at 7:52 PM, Nathan Hjelm wrote: > > Two things. 1) 1.4 is extremely old and you will not likely get much help > with it, and 2) the c++ bindings were deprecated in MPI-2.2

Re: [OMPI users] openmpi/slurm/pmix

2018-04-23 Thread r...@open-mpi.org
Hi Michael Looks like the problem is that you didn’t wind up with the external PMIx. The component listed in your error is the internal PMIx one which shouldn’t have built given that configure line. Check your config.out and see what happened. Also, ensure that your LD_LIBRARY_PATH is

Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread r...@open-mpi.org
Also, I note from the screenshot that you appear to be running on Windows with a Windows binary. Correct? > On Apr 23, 2018, at 7:08 AM, Jeff Squyres (jsquyres) > wrote: > > Can you send all the information listed here: > >https://www.open-mpi.org/community/help/ >

Re: [OMPI users] ARM/Allinea DDT

2018-04-11 Thread r...@open-mpi.org
This inadvertently was sent directly to me, and Ryan asked if I could please post it on his behalf - he needs to get fully approved on the mailing list. Ralph > On Apr 11, 2018, at 1:19 PM, Ryan Hulguin wrote: > > Hello Charlie Taylor, > > I have replied to your

Re: [OMPI users] ARM/Allinea DDT

2018-04-11 Thread r...@open-mpi.org
You probably should provide a little more info here. I know the MPIR attach was broken in the v2.x series, but we fixed that - could be something remains broken in OMPI 3.x. FWIW: I doubt it's an Allinea problem. > On Apr 11, 2018, at 11:54 AM, Charles A Taylor wrote: > > >

Re: [OMPI users] OpenMPI 3.0.0 on RHEL-7

2018-03-07 Thread r...@open-mpi.org
So long as it is libevent-devel-2.0.22, you should be okay. You might want to up PMIx to v1.2.5 as Slurm 16.05 should handle that okay. OMPI v3.0.0 has PMIx 2.0 in it, but should be okay with 1.2.5 last I checked (but it has been awhile and I can’t swear to it). > On Mar 7, 2018, at 2:03 PM,

Re: [OMPI users] libopen-pal not found

2018-03-02 Thread r...@open-mpi.org
Not that I’ve heard - you need to put it in your LD_LIBRARY_PATH > On Mar 2, 2018, at 10:15 AM, Mahmood Naderan wrote: > > Hi, > After a successful installation of opmi v3 with cuda enabled, I see that ldd > can not find a right lib file although it exists.
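A minimal sketch of the LD_LIBRARY_PATH fix; `/opt/ompi/lib` is a placeholder for wherever the CUDA-enabled build installed `libopen-pal`:

```shell
# Prepend the Open MPI lib directory so the runtime linker can resolve
# libopen-pal; preserve any existing path components after it.
export LD_LIBRARY_PATH="/opt/ompi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```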

Re: [OMPI users] Cannot compile 1.10.2 under CentOS 7 rdma-core-devel-13-7.el7.x86_64

2018-02-28 Thread r...@open-mpi.org
Not unless you have a USNIC card in your machine! > On Feb 28, 2018, at 8:08 AM, William T Jones <w.t.jo...@nasa.gov> wrote: > > Thank you! > > Will that have any adverse side effects? > Performance penalties? > > On 02/28/2018 10:57 AM, r...@open-mpi.org

Re: [OMPI users] Cannot compile 1.10.2 under CentOS 7 rdma-core-devel-13-7.el7.x86_64

2018-02-28 Thread r...@open-mpi.org
Add --without-usnic > On Feb 28, 2018, at 7:50 AM, William T Jones wrote: > > I realize that OpenMPI 1.10.2 is quite old, however, for compatibility I > am attempting to compile it after a system upgrade to CentOS 7. > > This system does include infiniband and I have

Re: [OMPI users] Using OMPI Standalone in a Windows/Cygwin Environment

2018-02-26 Thread r...@open-mpi.org
There are a couple of problems here. First the “^tcp,self,sm” is telling OMPI to turn off all three of those transports, which probably leaves you with nothing. What you really want is to restrict to shared memory, so your param should be “-mca btl self,sm”. This will disable all transports

[OMPI users] Fabric manager interactions: request for comments

2018-02-05 Thread r...@open-mpi.org
Hello all The PMIx community is starting work on the next phase of defining support for network interactions, looking specifically at things we might want to obtain and/or control via the fabric manager. A very preliminary draft is shown here:

Re: [OMPI users] Setting mpirun default parameters in a file

2018-01-10 Thread r...@open-mpi.org
Set the MCA param “rmaps_base_oversubscribe=1” in your default MCA param file, or in your environment > On Jan 10, 2018, at 4:42 AM, Florian Lindner wrote: > > Hello, > > a recent openmpi update on my Arch machine seems to have enabled > --nooversubscribe, as described
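The default MCA parameter file is plain `name = value` text; a sketch using the conventional per-user location (`$HOME/.openmpi/mca-params.conf` — adjust if your installation reads a different path):

```shell
# Persist the oversubscribe default so every subsequent mpirun picks it up.
mkdir -p "$HOME/.openmpi"
echo "rmaps_base_oversubscribe = 1" >> "$HOME/.openmpi/mca-params.conf"
```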

Re: [OMPI users] latest Intel CPU bug

2018-01-05 Thread r...@open-mpi.org
tel, but i am getting pissed > with the hysteria around this issue. > > Gilles > > > On 1/5/2018 3:54 PM, John Chludzinski wrote: > That article gives the best technical assessment I've seen of Intel's > architecture bug. I noted the discussion's subject and thought I'd

Re: [OMPI users] latest Intel CPU bug

2018-01-04 Thread r...@open-mpi.org
mit to architectural security blunders and > reveal publicly embarrassing policies until forced to disclose by a > governmental agency being exploited by a foreign power is another example > that shines a harsh light on their ‘best practices’ line. There are many more > like this. Inte

Re: [OMPI users] latest Intel CPU bug

2018-01-04 Thread r...@open-mpi.org
a “problem”. * containers and VMs don’t fully resolve the problem - the only solution other than the patches is to limit allocations to single users on a node HTH Ralph > On Jan 3, 2018, at 10:47 AM, r...@open-mpi.org wrote: > > Well, it appears from that article that the primary imp

Re: [OMPI users] latest Intel CPU bug

2018-01-03 Thread r...@open-mpi.org
Well, it appears from that article that the primary impact comes from accessing kernel services. With an OS-bypass network, that shouldn’t happen all that frequently, and so I would naively expect the impact to be at the lower end of the reported scale for those environments. TCP-based systems,

Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread r...@open-mpi.org
t; has a big impact. > > Thanks again, and merry Christmas! > - Brian > > > On Fri, Dec 22, 2017 at 1:53 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> > <r...@open-mpi.org <mailto:r...@open-mpi.org>> wrote: > Actually, that message is telling you

Re: [OMPI users] Q: Binding to cores on AWS?

2017-12-22 Thread r...@open-mpi.org
Actually, that message is telling you that binding to core is available, but that we cannot bind memory to be local to that core. You can verify the binding pattern by adding --report-bindings to your cmd line. > On Dec 22, 2017, at 11:58 AM, Brian Dobbins wrote: > > >

Re: [OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-20 Thread r...@open-mpi.org
cesses could not be completed due to > an internal error - the locale of the following process was > not set by the mapper code: > ... > > > Kind regards > > Siegmar > > > On 12/20/17 09:22, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote: >> I just c

Re: [OMPI users] openmpi-v3.0.0: error for --map-by

2017-12-20 Thread r...@open-mpi.org
I just checked the head of both the master and 3.0.x branches, and they both work fine: $ mpirun --map-by ppr:1:socket:pe=1 date [rhc001:139231] SETTING BINDING TO CORE [rhc002.cluster:203672] SETTING BINDING TO CORE Wed Dec 20 00:20:55 PST 2017 Wed Dec 20 00:20:55 PST 2017 Tue Dec 19 18:37:03

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-19 Thread r...@open-mpi.org
gestions, or just other experiences are welcome. Also, if > anyone is interested in the tmpdir spank plugin, you can contact me. We are > happy to share. > > Best and Merry Christmas to all, > > Charlie Taylor > UF Research Computing > > > >> On Dec 18, 2017, a

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-18 Thread r...@open-mpi.org
We have had reports of applications running faster when executing under OMPI’s mpiexec versus when started by srun. Reasons aren’t entirely clear, but are likely related to differences in mapping/binding options (OMPI provides a very large range compared to srun) and optimization flags provided

Re: [OMPI users] OMPI 3.0.0 crashing at mpi_init on OS X using Fortran

2017-12-11 Thread r...@open-mpi.org
FWIW: I just cloned the v3.0.x branch to get the latest 3.0.1 release candidate, built and ran it on Mac OSX High Sierra. Everything built and ran fine for both C and Fortran codes. You might want to test the same - could be this was already fixed. > On Dec 11, 2017, at 12:43 PM, Ricardo

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-12-11 Thread r...@open-mpi.org
dled PMIx. > > $ gcc -I/tmp/build/openmpi-3.0.0/opal/mca/pmix/pmix2x/pmix/include \ > pmix-test.c > pmix-test.c:95:2: error: #error "not version 3" > #error "not version 3" > ^ > > But the config.log generated when using the internal version of P

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-11-29 Thread r...@open-mpi.org
r: *** JOB 116 ON bn1 CANCELLED AT 2017-11-29T08:42:54 *** > > Any thoughts you might have on this would be very much appreciated. > > Thanks, -- bennet > > > > On Fri, Nov 17, 2017 at 11:45 PM, Howard Pritchard <hpprit...@gmail.com> > wrote: >> Hello Benn

Re: [OMPI users] signal handling with mpirun

2017-11-21 Thread r...@open-mpi.org
Try upgrading to the v3.0, or at least to the latest in the v2.x series. The v1.10 series is legacy and no longer maintained. > On Nov 21, 2017, at 8:20 AM, Kulshrestha, Vipul > wrote: > > Hi, > > I am finding that on Ctrl-C, mpirun immediately stops and does

Re: [OMPI users] --map-by

2017-11-21 Thread r...@open-mpi.org
ern within the current context. > On Nov 21, 2017, at 5:34 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> > wrote: > >> >> On Nov 20, 2017, at 7:02 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> >> wrote: >> >> So there are two

Re: [OMPI users] --map-by

2017-11-20 Thread r...@open-mpi.org
> On Nov 16, 2017, at 7:08 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> > wrote: > > >> On Nov 16, 2017, at 9:49 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> >> wrote: >> >> Do not include the “bind-to core” option. The mapping directive

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-11-16 Thread r...@open-mpi.org
What Charles said was true but not quite complete. We still support the older PMI libraries but you likely have to point us to wherever slurm put them. However, we definitely recommend using PMIx as you will get a faster launch. Sent from my iPad > On Nov 16, 2017, at 9:11 AM, Bennet Fauber

Re: [OMPI users] --map-by

2017-11-16 Thread r...@open-mpi.org
Do not include the “bind-to core” option. The mapping directive already forces that. Sent from my iPad > On Nov 16, 2017, at 7:44 AM, Noam Bernstein > wrote: > > Hi all - I’m trying to run mixed MPI/OpenMP, so I ideally want binding of > each MPI process to a

Re: [OMPI users] Can't connect using MPI Ports

2017-11-09 Thread r...@open-mpi.org
I did a quick check across the v2.1 and v3.0 OMPI releases and both failed, though with different signatures. Looks like a problem in the OMPI dynamics integration (i.e., the PMIx library looked like it was doing the right things). I’d suggest filing an issue on the OMPI github site so someone

Re: [OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls

2017-11-08 Thread r...@open-mpi.org
for 6 processes. I observe a failure to > exit mpirun around 25-30% of the time with 2 processes, causing an > inconsistent hang in both my example program and my larger application. > > -Nik > > On Nov 8, 2017 11:40, "r...@open-mpi.org <mailto:r...@open-mpi.org>&q

Re: [OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls

2017-11-08 Thread r...@open-mpi.org
problem. > > Thanks, > Nik > > 2017-11-07 19:00 GMT-07:00 r...@open-mpi.org <mailto:r...@open-mpi.org> > <r...@open-mpi.org <mailto:r...@open-mpi.org>>: > Glad to hear it has already been fixed :-) > > Thanks! > >> On Nov 7, 2017, at 4

Re: [OMPI users] Can't connect using MPI Ports

2017-11-06 Thread r...@open-mpi.org
> On Nov 6, 2017, at 7:46 AM, Florian Lindner <mailingli...@xgm.de> wrote: > > Am 05.11.2017 um 20:57 schrieb r...@open-mpi.org: >> >>> On Nov 5, 2017, at 6:48 AM, Florian Lindner <mailingli...@xgm.de >>> <mailto:mailingli...@xgm.de>> wrote

Re: [OMPI users] Can't connect using MPI Ports

2017-11-05 Thread r...@open-mpi.org
> On Nov 5, 2017, at 6:48 AM, Florian Lindner <mailingli...@xgm.de> wrote: > > Am 04.11.2017 um 00:05 schrieb r...@open-mpi.org <mailto:r...@open-mpi.org>: >> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not >> sure it was ever fixed,

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
ner <mailingli...@xgm.de> wrote: > > > Am 03.11.2017 um 16:18 schrieb r...@open-mpi.org: >> What version of OMPI are you using? > > 2.1.1 @ Arch Linux. > > Best, > Florian > ___ > users mailing list > users@lists.o

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
What version of OMPI are you using? > On Nov 3, 2017, at 7:48 AM, Florian Lindner wrote: > > Hello, > > I'm working on a sample program to connect two MPI communicators launched > with mpirun using Ports. > > Firstly, I use MPI_Open_port to obtain a name and write that

Re: [OMPI users] [OMPI devel] Open MPI 2.0.4rc2 available for testing

2017-11-02 Thread r...@open-mpi.org
I would suggest also considering simply updating to v3.0, or at least to v2.1. I’d rather not play “whack-a-mole” with the Sun compiler again :-( > On Nov 2, 2017, at 6:06 AM, Howard Pritchard wrote: > > HI Siegmar, > > Could you check if you also see a similar problem

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread r...@open-mpi.org
Two questions: 1. are you running this on node04? Or do you have ssh access to node04? 2. I note you are building this against an old version of PMIx for some reason. Does it work okay if you build it with the embedded PMIx (which is 2.0)? Does it work okay if you use PMIx v1.2.4, the latest

Re: [OMPI users] IpV6 Openmpi mpirun failed

2017-10-19 Thread r...@open-mpi.org
> On Wed, Oct 18, 2017 at 3:52 PM, Mukkie <mukunthh...@gmail.com > <mailto:mukunthh...@gmail.com>> wrote: > Thanks for your suggestion. However my firewall's are already disabled on > both the machines. > > Cordially, > Muku. > > On Wed, Oct 18, 2017 at 2:3

Re: [OMPI users] IpV6 Openmpi mpirun failed

2017-10-18 Thread r...@open-mpi.org
Looks like there is a firewall or something blocking communication between those nodes? > On Oct 18, 2017, at 1:29 PM, Mukkie wrote: > > Adding a verbose output. Please check for failed and advise. Thank you. > > [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host

Re: [OMPI users] Failed to register memory (openmpi 2.0.2)

2017-10-18 Thread r...@open-mpi.org
Put “oob=tcp” in your default MCA param file > On Oct 18, 2017, at 9:00 AM, Mark Dixon wrote: > > Hi, > > We're intermittently seeing messages (below) about failing to register memory > with openmpi 2.0.2 on centos7 / Mellanox FDR Connect-X 3 and the vanilla IB > stack

Re: [OMPI users] Controlling spawned process

2017-10-06 Thread r...@open-mpi.org
Couple of things you can try: * add --oversubscribe to your mpirun cmd line so it doesn’t care how many slots there are * modify your MPI_INFO to be “host”, “node0:22” so it thinks there are more slots available It’s possible that the “host” info processing has a bug in it, but this will

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread r...@open-mpi.org
No problem - glad you were able to work it out! > On Oct 5, 2017, at 11:22 PM, Anthony Thyssen <a.thys...@griffith.edu.au> > wrote: > > Sorry r...@open-mpi.org <mailto:r...@open-mpi.org> as Gilles Gouaillardet > pointed out to me the problem wasn't OpenMPI,

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-03 Thread r...@open-mpi.org
Can you try a newer version of OMPI, say the 3.0.0 release? Just curious to know if we perhaps “fixed” something relevant. > On Oct 3, 2017, at 5:33 PM, Anthony Thyssen wrote: > > FYI... > > The problem is discussed further in > > Redhat Bugzilla: Bug 1321154 -

Re: [OMPI users] OMPI users] upgraded mpi and R and now cannot find slots

2017-10-03 Thread r...@open-mpi.org
You can add it to the default MCA param file, if you want - /etc/openmpi-mca-params.conf > On Oct 3, 2017, at 12:44 PM, Jim Maas <jimmaa...@gmail.com> wrote: > > Thanks RHC where do I put that so it will be in the environment? > > J > > On 3 October 2017

Re: [OMPI users] OMPI users] upgraded mpi and R and now cannot find slots

2017-10-03 Thread r...@open-mpi.org
As Gilles said, we default to slots = cores, not HTs. If you want to treat HTs as independent cpus, then you need to add OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1 in your environment. > On Oct 3, 2017, at 7:27 AM, Jim Maas wrote: > > Tried this and got this error, and
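A minimal sketch of the environment setting named in the reply:

```shell
# Count each hardware thread as an independent cpu, so the per-node slot
# default becomes #hwthreads rather than #cores.
export OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1
```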

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-02 Thread r...@open-mpi.org
One thing I can see is that the local host (where mpirun executed) shows as “node21” in the allocation, while all others show their FQDN. This might be causing some confusion. You might try adding "--mca orte_keep_fqdn_hostnames 1” to your cmd line and see if that helps. > On Oct 2, 2017, at

Re: [OMPI users] OpenMPI 3.0.0, compilation using Intel icc 11.1 on Linux, error when compiling pmix_mmap

2017-10-02 Thread r...@open-mpi.org
99/xsh/time.h.html, should be > declared in , which is definitely #included by > opal/threads/condition.h. > > Since this error occurred with Intel 11.x but didn't occur with later > versions of the Intel compiler, I'm wondering if the Intel 11.x compiler > suite didn't support

Re: [OMPI users] Fwd: OpenMPI does not obey hostfile

2017-09-26 Thread r...@open-mpi.org
That is correct. If you don’t specify a slot count, we auto-discover the number of cores on each node and set #slots to that number. If an RM is involved, then we use what they give us Sent from my iPad > On Sep 26, 2017, at 8:11 PM, Anthony Thyssen > wrote: > >

Re: [OMPI users] Fwd: Make All error regarding either "Conflicting" or "Previous Declaration" among others

2017-09-19 Thread r...@open-mpi.org
Err...you might want to ask the MPICH folks. This is the Open MPI mailing list :-) > On Sep 19, 2017, at 7:38 AM, Aragorn Inocencio > wrote: > > Good evening, > > Thank you for taking the time to develop and assist in the use of this tool. > > I am trying to

Re: [OMPI users] Honor host_aliases file for tight SGE integration

2017-09-15 Thread r...@open-mpi.org
Hi Reuti As far as I am concerned, you SGE users “own” the SGE support - so feel free to submit a patch! Ralph > On Sep 13, 2017, at 9:10 AM, Reuti wrote: > > Hi, > > I wonder whether it came ever to the discussion, that SGE can have a similar > behavior like

Re: [OMPI users] OpenMPI 1.10.5 oversubscribing cores

2017-09-08 Thread r...@open-mpi.org
What you probably want to do is add --cpu-list a,b,c... to each mpirun command, where each one lists the cores you want to assign to that job. > On Sep 8, 2017, at 6:46 AM, twu...@goodyear.com wrote: > > > I posted this question last year and we ended up not upgrading to the newer > openmpi.

Re: [OMPI users] Slot count parameter in hostfile ignored

2017-09-08 Thread r...@open-mpi.org
> >>> >>> Cheers, >>> >>> >>> Gilles >>> >>> >>> On 9/8/2017 4:19 PM, Maksym Planeta wrote: >>>> Indeed mpirun shows slots=1 per node, but I create allocation with >>>> --ntasks-per-node 24,

Re: [OMPI users] Slot count parameter in hostfile ignored

2017-09-07 Thread r...@open-mpi.org
My best guess is that SLURM has only allocated 2 slots, and we respect the RM regardless of what you say in the hostfile. You can check this by adding --display-allocation to your cmd line. You probably need to tell slurm to allocate more cpus/node. > On Sep 7, 2017, at 3:33 AM, Maksym

Re: [OMPI users] Setting LD_LIBRARY_PATH for orted

2017-08-22 Thread r...@open-mpi.org
I’m afraid not - that only applies the variable to the application, not the daemons. Truly, your only real option is to put something in your .bashrc since you cannot modify the configure. Or, if you are running in a managed environment, you can ask to have your resource manager forward your

Re: [OMPI users] mpirun 2.1.1 refuses to start a Torque 6.1.1.1 job if I change the scheduler to Maui 3.3.1

2017-08-09 Thread r...@open-mpi.org
sounds to me like your maui scheduler didn’t provide any allocated slots on the nodes - did you check $PBS_NODEFILE? > On Aug 9, 2017, at 12:41 PM, A M wrote: > > > Hello, > > I have just ran into a strange issue with "mpirun". Here is what happened: > > I successfully

Re: [OMPI users] error building openmpi-v2.* with SUN C 5.15 on SuSE Linux

2017-08-08 Thread r...@open-mpi.org
Should be fixed for 2.x here: https://github.com/open-mpi/ompi/pull/4054 > On Jul 31, 2017, at 5:56 AM, Siegmar Gross > wrote: > > Hi, > > I've been able to install openmpi-v2.0.x-201707270322-239c439 and >

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread r...@open-mpi.org
?? Doesn't that tell pbs to allocate 1 node with 2 slots on it? I don't see where you get 4. Sent from my iPad > On Jul 31, 2017, at 10:00 AM, Mahmood Naderan wrote: > > OK. The next question is how to use it with torque (PBS)? currently we write > this directive > >

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
gt; From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Reuti > Sent: Wednesday, July 26, 2017 9:25 AM > To: Open MPI Users <users@lists.open-mpi.org> > Subject: Re: [OMPI users] Questions about integration with resource > distribution systems > > >>

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
ry. Is that correct? I am > doubtful, because I don’t understand how mpirun gets access to information > about RAM requirement. > > qsub –pe orte 8 –b y –V –l m_mem_free=40G –cwd mpirun –np 8 a.out > > > Regards, > Vipul > > >   <> > From: users [mai

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-25 Thread r...@open-mpi.org
> On Jul 25, 2017, at 3:48 PM, Kulshrestha, Vipul > wrote: > > I have several questions about integration of openmpi with resource queuing > systems. > > 1. > I understand that openmpi supports integration with various resource > distribution systems such as

Re: [OMPI users] Issue handling SIGUSR1 in OpenMPI

2017-07-25 Thread r...@open-mpi.org
ment > 'received SIGUSR1 signal' twice, but it prints just once. What am I missing > here? > > > > On 25 July 2017 at 11:17, r...@open-mpi.org <mailto:r...@open-mpi.org> > <r...@open-mpi.org <mailto:r...@open-mpi.org>> wrote: > I’m afraid we don

Re: [OMPI users] Issue handling SIGUSR1 in OpenMPI

2017-07-25 Thread r...@open-mpi.org
I’m afraid we don’t currently support that use-case. We forward signals sent by the user to mpiexec (i.e., the user “hits” mpiexec with a signal), but we don’t do anything to support an application proc attempting to raise a signal and asking it to be propagated. If you are using OMPI master,

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-30 Thread r...@open-mpi.org
ing application context files. > > How can I do this? > > Sincerely, > > Ted Sussman > > On 29 Jun 2017 at 19:09, r...@open-mpi.org wrote: > >> >> It´s a difficult call to make as to which is the correct behavior. In >> Example 1, you are executing

Re: [OMPI users] OpenMPI 2.1.1, --map-to socket, application context files

2017-06-29 Thread r...@open-mpi.org
It’s a difficult call to make as to which is the correct behavior. In Example 1, you are executing a single app_context that has two procs in it. In Example 2, you are executing two app_contexts, each with a single proc in it. Now some people say that the two should be treated the same, with

Re: [OMPI users] Node failure handling

2017-06-27 Thread r...@open-mpi.org
Okay, this should fix it - https://github.com/open-mpi/ompi/pull/3771 <https://github.com/open-mpi/ompi/pull/3771> > On Jun 27, 2017, at 6:31 AM, r...@open-mpi.org wrote: > > Actually, the error message is coming from mpirun to indicate that it lost > connection to one (or mo

Re: [OMPI users] Node failure handling

2017-06-27 Thread r...@open-mpi.org
use they lose connection to a peer. I was not aware that this > capability exists in the master version of ORTE, but if it does then it makes > our life easier. > > George. > > > On Tue, Jun 27, 2017 at 6:14 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> > <r...

Re: [OMPI users] Node failure handling

2017-06-26 Thread r...@open-mpi.org
e job. > -------------- > [bud96:20652] [[8878,0],0] orted_cmd: received halt_vm cmd > [bud96:20652] [[8878,0],0] orted_cmd: all routes and children gone - exiting > ``` > > On 27 June 2017 at 12:19, r...@open-

Re: [OMPI users] waiting for message either from MPI communicator or from TCP socket

2017-06-25 Thread r...@open-mpi.org
I suspect nobody is answering because the question makes no sense to us. You are implying that the TCP socket is outside of MPI since it isn’t tied to a communicator. If you want to setup a non-MPI communication path between two procs and monitor it separate from the MPI library, you can

Re: [OMPI users] --host works but --hostfile does not

2017-06-22 Thread r...@open-mpi.org
From “man mpirun” - note that not specifying “slots=N” in a hostfile defaults to slots=#cores on that node (as it states in the text): Specifying Host Nodes Host nodes can be identified on the mpirun command line with the -host option or in a hostfile. For example,
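A sketch of a hostfile with explicit slot counts (host names and counts are placeholders); omitting `slots=N` falls back to the per-node core count described above:

```shell
# Create a hostfile; each line names a host and its slot count.
cat > myhostfile <<'EOF'
node01 slots=4
node02 slots=4
EOF
```

Something like `mpirun -hostfile myhostfile -np 8 ./a.out` would then place four ranks on each node.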

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread r...@open-mpi.org
I gather you are using OMPI 2.x, yes? And you configured it --with-pmi=, then moved the executables/libs to your workstation? I suppose I could state the obvious and say “don’t do that - just rebuild it”, and I fear that (after checking the 2.x code) you really have no choice. OMPI v3.0 will
