Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Charles A Taylor via users
Sure… + ./configure --build=x86_64-redhat-linux-gnu \ --host=x86_64-redhat-linux-gnu \ --program-prefix= \ --disable-dependency-tracking \ --prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \ --exec-prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \ --bindir=/apps/mpi/intel/2019.1.144

Re: [OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users
g it. :( Charlie Taylor UF Research Computing > > Tim > > From: users <mailto:users-boun...@lists.open-mpi.org>> On Behalf Of Charles A Taylor via > users > Sent: Thursday, June 20, 2019 8:55 AM > To: Open MPI Users <mailto:users@lists.open-mpi.org>>

[OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users
OpenMPI probably has one of the largest and most complete configure+build systems I’ve ever seen. I’m surprised, however, that it doesn’t pick up the use of the Intel compilers and modify the command line parameters as needed. ifort: command line warning #10006: ignoring unknown option '-pipe'
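One possible workaround for the warning above is to drop the unsupported flag from the compiler flags before configure runs. The sketch below assumes the `-pipe` option arrives through a flags variable such as FFLAGS (for example from packaging defaults); the post does not say where it comes from, and `strip_flag` is a hypothetical helper, not part of Open MPI.

```shell
# Hypothetical helper: print the given flags with every exact
# occurrence of one unwanted flag removed.
strip_flag() {
    unwanted="$1"; shift
    out=""
    for f in "$@"; do
        [ "$f" = "$unwanted" ] && continue
        out="${out:+$out }$f"
    done
    printf '%s\n' "$out"
}

# Example values only; real flags would come from the build environment.
FFLAGS="-O2 -g -pipe"
# Word splitting is intentional so each flag becomes its own argument.
FFLAGS=$(strip_flag -pipe $FFLAGS)
echo "FFLAGS=$FFLAGS"
```

The filtered variable would then be exported before invoking ./configure, so ifort never sees the option it does not recognize.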

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Charles A Taylor via users
This looks a lot like a problem I had with OpenMPI 3.1.2. I thought the fix landed in 4.0.0 but you might want to check the code to be sure there wasn’t a regression in 4.1.x. Most of our codes are still running 3.1.2 so I haven’t built anything beyond 4.0.0 which definitely included the f

[OMPI users] Error initializing an UCX / OpenFabrics device. #6300

2019-03-22 Thread Charles A Taylor
Anyone else running into the issue below with OpenMPI 4.0.0? https://github.com/open-mpi/ompi/issues/6300 (Error initializing an UCX / OpenFabrics device) I’m hitting it and don’t really see why. I posted to the bug but maybe I need to just o

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-17 Thread Charles A Taylor
t 4, 2018, at 5:39 PM, Charles A Taylor wrote: > > > We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for > that matter) built with UCX support. The leak shows up > whether the “ucx” PML is specified for the run or not. The applications in > question are are

Re: [OMPI users] issue compiling openmpi 3.2.1 with pmi and slurm

2018-10-10 Thread Charles A Taylor
In our config the "--with-pmi" points to the slurm “prefix” dir not the slurm libdir. The options below work for us with SLURM installed in “/opt/slurm”. I’ll note that after sharing this config with regard to another issue, it was recommended to drop the “/usr” in the “--with-foo=/usr” options
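To illustrate the prefix-versus-libdir distinction described above, here is a minimal sketch that turns a library directory into the install prefix before it is handed to "--with-pmi". The helper name `pmi_prefix_from` is hypothetical, and "/opt/slurm" is simply the install location mentioned in the post.

```shell
# Hypothetical helper: if the path ends in /lib or /lib64, strip that
# component so the result is the install prefix; otherwise print it unchanged.
pmi_prefix_from() {
    case "$1" in
        */lib64) printf '%s\n' "${1%/lib64}" ;;
        */lib)   printf '%s\n' "${1%/lib}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Build the configure option from a libdir-style path.
echo "--with-pmi=$(pmi_prefix_from /opt/slurm/lib64)"
```

The resulting option would be passed to ./configure along with whatever other options the site uses.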

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-06 Thread Charles A Taylor
milar problem previously in when > configuring against an external PMIx library. The configure >> script produces (or did) a "-L/usr/lib” instead of a "-L/usr/lib64” > resulting in unresolved PMIx routines when linking. >> That was with OpenMPI 2.1.2. We now include a lib

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-06 Thread Charles A Taylor
see if that was fixed for 3.x or not. I should have also mentioned in my previous post that HPC_CUDA_DIR=NO meaning that CUDA support has been excluded from these builds (in case anyone was wondering). Thanks for the feedback, Charlie > > Cheers, > > Gilles > On Fri, Oct 5, 2018 at

[OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-04 Thread Charles A Taylor
We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for that matter) built with UCX support. The leak shows up whether the “ucx” PML is specified for the run or not. The applications in question are arepo and gizmo but I have no reason to believe that others are not af

Re: [OMPI users] OpenMPI + PMIx + SLURM

2018-07-01 Thread Charles A Taylor
, srun --mpi=pmix_v1 Apologies for the wasted bandwidth. Regards, Charlie > On Jun 28, 2018, at 8:14 AM, Charles A Taylor wrote: > > There is a name for my pain and it is “OpenMPI + PMIx”. :) > > I’m looking at upgrading SLURM from 16.05.11 to 17.11.05 (bear with me, t

[OMPI users] OpenMPI + PMIx + SLURM

2018-07-01 Thread Charles A Taylor
There is a name for my pain and it is “OpenMPI + PMIx”. :) I’m looking at upgrading SLURM from 16.05.11 to 17.11.05 (bear with me, this is not a SLURM question). After building SLURM 17.11.05 with ‘--with-pmix=/opt/pmix/1.1.5:/opt/pmix/2.1/1’ and installing a test instance, I see $ srun --mp

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
> Best, > P. > > > On Thu, Jun 14, 2018 at 3:25 PM Charles A Taylor <mailto:chas...@ufl.edu>> wrote: > Hmmm. ompi_info only shows the ucx pml. I don’t see any “transports”.

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
nts, you should likely give this a try and see if > it works for you. > > The libfabric / verbs combo *may* work, but I don't know how robust the verbs > libfabric support was in the v1.5 release series. > > >> On Jun 14, 2018, at 10:01 AM, Charles A Taylor

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Thank you, Jeff. The ofi MTL with the verbs provider seems to be working well at the moment. I’ll need to let it run a day or so before I know whether we can avoid the deadlocks experienced with the straight openib BTL. I’ve also built in UCX support so I’ll be trying that next. Again, than

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Hi Matias, Thanks for the response. As of a couple of hours ago we are running: libfabric-devel-1.5.3-1.el7.x86_64 libfabric-1.5.3-1.el7.x86_64 As for the provider, I saw that one but just listed “verbs”. I’ll go with the “verbs;ofi_rxm” going forward. Regards, Charlie > On Jun 1

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
l. > > Could you run your app with > > export OMPI_MCA_mtl_base_verbose=100 > > and post the output? > > It would also help if you described the system you are using : OS > interconnect cpu type etc. > > Howard > > Charles A Taylor mailto:chas...@ufl.edu&
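The debugging request quoted above relies on Open MPI's convention that an MCA parameter can be set through an environment variable named with an `OMPI_MCA_` prefix. A small sketch of that mapping follows; the helper name `mca_env_name` is made up for illustration.

```shell
# Hypothetical helper: map an MCA parameter name to the environment
# variable name Open MPI reads for it.
mca_env_name() {
    printf 'OMPI_MCA_%s\n' "$1"
}

# Export the verbosity setting requested in the thread.
export "$(mca_env_name mtl_base_verbose)=100"

# Show what is now set in the environment.
env | grep '^OMPI_MCA_mtl_base_verbose='
```

The same parameter can also be given on the launcher command line, for example `mpirun --mca mtl_base_verbose 100 ./app`, where `./app` stands in for the real application.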

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
t you might want to run the libfabric fi_info command to see what > capabilities you picked up from the libfabric RPMs. > > Next you may well not actually be using the OFI mtl. > > Could you run your app with > > export OMPI_MCA_mtl_base_verbose=100 > > and post the outp

[OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Because of the issues we are having with OpenMPI and the openib BTL (questions previously asked), I’ve been looking into what other transports are available. I was particularly interested in OFI/libfabric support but cannot find any information on it more recent than a reference to the usNIC BT

[OMPI users] OpenMPI + gadget/gizmo/arepo

2018-05-23 Thread Charles A Taylor
I feel a little funny posting this but I have observed this problem now over three different versions of OpenMPI (1.10.2, 2.0.3, 3.0.0) and have refrained from asking about it before now because we always had a work-around. That may not be the case now and I feel like I’m missing something obvio

Re: [OMPI users] openmpi/slurm/pmix

2018-04-24 Thread Charles A Taylor
Hi Gilles, Yes, I did. It was ignored AFAICT. I did not look for the reason - only so many hours in the day. Regards, Charlie > On Apr 24, 2018, at 8:07 AM, wrote: > > Charles, > > have you tried to configure --with-pmix-libdir=/.../lib64 ? > > Cheers, > > Gilles > > - Origin

Re: [OMPI users] openmpi/slurm/pmix

2018-04-24 Thread Charles A Taylor
I’ll add that when building OpenMPI 3.0.0 with an external PMIx, I found that the OpenMPI configure script only looks in “lib” for the pmix library but the pmix configure/build uses “lib64” (as it should on a 64-bit system) so the configure script falls back to the internal PMIx. As Robert
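Before configuring against an external PMIx, it can help to confirm which library directory the PMIx install actually used. The following is only a sketch: `find_pmix_libdir` is a hypothetical helper, "/opt/pmix" is an example prefix, and the `libpmix.so*` file name pattern is an assumption about how the shared library is named.

```shell
# Hypothetical helper: report the first of lib64/lib under a prefix
# that contains a libpmix shared library; fail if neither does.
find_pmix_libdir() {
    prefix="$1"
    for d in lib64 lib; do
        if ls "${prefix}/${d}"/libpmix.so* >/dev/null 2>&1; then
            printf '%s\n' "${prefix}/${d}"
            return 0
        fi
    done
    return 1
}

# Example prefix only; override PMIX_PREFIX for a real install.
find_pmix_libdir "${PMIX_PREFIX:-/opt/pmix}" || echo "no libpmix found" >&2
```

If the library turns out to live in "lib64", the "--with-pmix-libdir" option suggested elsewhere in this thread is the natural thing to try, although the reply in the thread reports that it appeared to be ignored.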

Re: [OMPI users] ARM/Allinea DDT

2018-04-16 Thread Charles A Taylor
those who replied, Charlie > On Apr 11, 2018, at 2:54 PM, Charles A Taylor wrote: > > > Contacting ARM seems a bit difficult so I thought I would ask here. We rely > on DDT for debugging but it doesn’t work with OpenMPI 3.x and I can’t find > anything about them having p

Re: [OMPI users] ARM/Allinea DDT

2018-04-12 Thread Charles A Taylor
a problem. > >> On Apr 11, 2018, at 11:54 AM, Charles A Taylor wrote: >> >> >> Contacting ARM seems a bit difficult so I thought I would ask here. We rely >> on DDT for debugging but it doesn’t work with OpenMPI 3.x and I can’t find >> anything about t

[OMPI users] ARM/Allinea DDT

2018-04-11 Thread Charles A Taylor
Contacting ARM seems a bit difficult so I thought I would ask here. We rely on DDT for debugging but it doesn’t work with OpenMPI 3.x and I can’t find anything about them having plans to support it. Anyone know if ARM DDT has plans to support newer versions of OpenMPI? Charlie Taylor UF Resea

Re: [OMPI users] OpenMPI 3.0.0 on RHEL-7

2018-03-08 Thread Charles A Taylor
; to up PMIx to v1.2.5 as Slurm 16.05 should handle that okay. OMPI v3.0.0 has >> PMIx 2.0 in it, but should be okay with 1.2.5 last I checked (but it has >> been awhile and I can’t swear to it). >> >> >>> On Mar 7, 2018, at 2:03 PM, Charles A Taylor wrote: >&

[OMPI users] OpenMPI 3.0.0 on RHEL-7

2018-03-07 Thread Charles A Taylor
Hi Distro: RHEL-7 (7.4) SLURM: 16.05.11 PMIx: 1.1.5 Trying to build OpenMPI 3.0.0 for our RHEL7 systems but running into what might be a configure script issue more than a real incompatibility problem. Configuring with the following, --with-slurm=/opt/slurm --with-pmix=/opt/pmix --with-e

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-19 Thread Charles A Taylor
> Or one could tell OMPI to do what you really want it to do using map-by and > bind-to options, perhaps putting them in the default MCA param file. Nod. Agreed, but far too complicated for 98% of our users. > > Or you could enable cgroups in slurm so that OMPI sees the binding envelope - > i

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-19 Thread Charles A Taylor
Hi All, I’m glad to see this come up. We’ve used OpenMPI for a long time and switched to SLURM (from torque+moab) about 2.5 years ago. At the time, I had a lot of questions about running MPI jobs under SLURM and good information seemed to be scarce - especially regarding “srun”. I’ll just b

Re: [OMPI users] OMPI 2.1.2 and SLURM compatibility

2017-11-16 Thread Charles A Taylor
Hi Bennet, Three things... 1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2. 2. You will need slurm 16.05 or greater built with --with-pmix 2a. You will need pmix 1.1.5 which you can get from github. (https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_pmix_tarballs&d=DwIFaQ&c=pZJP
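The Slurm version requirement stated in this post (16.05 or greater) can be checked with a small version comparison. This is a sketch under assumptions: it relies on `sort -V` from GNU coreutils, `version_ge` is a hypothetical helper, and the version string is a sample value rather than output captured from a real system.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal
# to version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Sample value only; on a real system this might be taken from `srun --version`.
slurm_version="17.11.05"

if version_ge "$slurm_version" "16.05"; then
    echo "Slurm $slurm_version meets the 16.05 minimum"
else
    echo "Slurm $slurm_version is older than 16.05"
fi
```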

Re: [OMPI users] PMIx + OpenMPI

2017-08-07 Thread Charles A Taylor
Many thanks to all who replied and especially to Artem Polyakov of Mellanox who provided a slurm-15.08.13 specific pmix patch. That patch applied and built cleanly against the 15.08.13 tarball and better yet, it works. Regards, Charles A. Taylor UF Research Computing > On Aug 6, 2017, a

Re: [OMPI users] PMIx + OpenMPI

2017-08-06 Thread Charles A Taylor
g 6, 2017, at 7:43 AM, Gilles Gouaillardet > wrote: > > Charles, > > did you build Open MPI with the external PMIx ? > iirc, Open MPI 2.0.x does not support cross version PMIx > > Cheers, > > Gilles > > On Sun, Aug 6, 2017 at 7:59 PM, Charles A Taylor wr

Re: [OMPI users] PMIx + OpenMPI

2017-08-06 Thread Charles A Taylor
> On Aug 6, 2017, at 6:53 AM, Charles A Taylor wrote: > > > Anyone successfully using PMIx with OpenMPI and SLURM? I have, > > 1. Installed an “external” version (1.1.5) of PMIx. > 2. Patched SLURM 15.08.13 with the SchedMD-provided PMIx patch (results in an > m

[OMPI users] PMIx + OpenMPI

2017-08-06 Thread Charles A Taylor
Anyone successfully using PMIx with OpenMPI and SLURM? I have, 1. Installed an “external” version (1.1.5) of PMIx. 2. Patched SLURM 15.08.13 with the SchedMD-provided PMIx patch (results in an mpi_pmix plugin along the lines of mpi_pmi2). 3. Built OpenMPI 2.0.1 (tried 2.0.3 as well). However,