Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Charles A Taylor via users
Sure…
+ ./configure --build=x86_64-redhat-linux-gnu \
    --host=x86_64-redhat-linux-gnu \
    --program-prefix= \
    --disable-dependency-tracking \
    --prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \
    --exec-prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \

Re: [OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users
g it. :( Charlie Taylor UF Research Computing > > Tim > > From: users <mailto:users-boun...@lists.open-mpi.org>> On Behalf Of Charles A Taylor via > users > Sent: Thursday, June 20, 2019 8:55 AM > To: Open MPI Users <mailto:users@lists.open-mpi.org>>

[OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users
OpenMPI probably has one of the largest and most complete configure+build systems I’ve ever seen. I’m surprised, however, that it doesn’t pick up the use of the Intel compilers and modify the command line parameters as needed. ifort: command line warning #10006: ignoring unknown option '-pipe'

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Charles A Taylor via users
the fix. See… - Apply patch for memory leak associated with UCX PML. -https://github.com/openucx/ucx/issues/2921 -https://github.com/open-mpi/ompi/pull/5878 Charles Taylor UF Research Computing > On Jun 19, 2019, at 2:26 PM, Noam Bernstein via users > wrote: > >> On Jun 19

[OMPI users] Error initializing an UCX / OpenFabrics device. #6300

2019-03-22 Thread Charles A Taylor
Anyone else running into the issue below with OpenMPI 4.0.0? https://github.com/open-mpi/ompi/issues/6300 (Error initializing an UCX / OpenFabrics device) I’m hitting it and don’t really see why. I posted to the bug but maybe I need to just

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-17 Thread Charles A Taylor
t 4, 2018, at 5:39 PM, Charles A Taylor wrote: > > > We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for > that matter) built with UCX support. The leak shows up > whether the “ucx” PML is specified for the run or not. The applications in > question are

Re: [OMPI users] issue compiling openmpi 3.2.1 with pmi and slurm

2018-10-10 Thread Charles A Taylor
In our config the "--with-pmi" points to the slurm “prefix” dir not the slurm libdir. The options below work for us with SLURM installed in “/opt/slurm”. I’ll note that after sharing this config with regard to another issue, it was recommended to drop the “/usr” in the “--with-foo=/usr” options
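
A minimal sketch of the point above, assuming SLURM is installed under /opt/slurm as in the post. The helper name slurm_pmi_args is hypothetical and not part of Open MPI or SLURM; --with-slurm and --with-pmi are Open MPI configure options. The check only rejects a path whose last component is lib or lib64, which is the mistake the post describes.

```shell
# Hypothetical helper (not part of Open MPI or SLURM): print configure
# arguments that point --with-pmi at the SLURM install prefix.
slurm_pmi_args() {
    prefix="${1%/}"                 # drop one trailing slash, if any
    case "${prefix##*/}" in
        lib|lib64)
            # The post says --with-pmi wants the prefix, not the libdir.
            echo "expected the SLURM prefix, got a libdir: $prefix" >&2
            return 1
            ;;
    esac
    echo "--with-slurm --with-pmi=$prefix"
}

# Assumed install location, taken from the post above.
slurm_pmi_args /opt/slurm
```

The printed arguments can then be passed to ./configure together with whatever other options the site uses.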

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-06 Thread Charles A Taylor
a similar problem previously when > configuring against an external PMIx library. The configure >> script produces (or did) a "-L/usr/lib" instead of a "-L/usr/lib64" > resulting in unresolved PMIx routines when linking. >> That was with OpenMPI 2.1.2. We now include a lib

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-06 Thread Charles A Taylor
ed to see if that was fixed for 3.x or not. I should have also mentioned in my previous post that HPC_CUDA_DIR=NO meaning that CUDA support has been excluded from these builds (in case anyone was wondering). Thanks for the feedback, Charlie > > Cheers, > > Gilles > On Fri, Oct 5, 2018

[OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-04 Thread Charles A Taylor
We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for that matter) built with UCX support. The leak shows up whether the “ucx” PML is specified for the run or not. The applications in question are arepo and gizmo but I have no reason to believe that others are not

Re: [OMPI users] OpenMPI + PMIx + SLURM

2018-07-01 Thread Charles A Taylor
with, srun --mpi=pmix_v1 Apologies for the wasted bandwidth. Regards, Charlie > On Jun 28, 2018, at 8:14 AM, Charles A Taylor wrote: > > There is a name for my pain and it is “OpenMPI + PMIx”. :) > > I’m looking at upgrading SLURM from 16.05.11 to 17.11.0

[OMPI users] OpenMPI + PMIx + SLURM

2018-07-01 Thread Charles A Taylor
There is a name for my pain and it is “OpenMPI + PMIx”. :) I’m looking at upgrading SLURM from 16.05.11 to 17.11.05 (bear with me, this is not a SLURM question). After building SLURM 17.11.05 with ‘--with-pmix=/opt/pmix/1.1.5:/opt/pmix/2.1/1’ and installing a test instance, I see $ srun

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
> Best, > P. > > > On Thu, Jun 14, 2018 at 3:25 PM Charles A Taylor <chas...@ufl.edu> wrote: > Hmmm. ompi_info only shows the ucx pml. I don’t see any “transports”. > Will they show up somewhere or are they documented? Right now it looks

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
nts, you should likely give this a try and see if > it works for you. > > The libfabric / verbs combo *may* work, but I don't know how robust the verbs > libfabric support was in the v1.5 release series. > > >> On Jun 14, 2018, at 10:01 AM, Charles A Taylor wrot

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Thank you, Jeff. The ofi MTL with the verbs provider seems to be working well at the moment. I’ll need to let it run a day or so before I know whether we can avoid the deadlocks experienced with the straight openib BTL. I’ve also built in UCX support so I’ll be trying that next. Again,

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Hi Matias, Thanks for the response. As of a couple of hours ago we are running: libfabric-devel-1.5.3-1.el7.x86_64 libfabric-1.5.3-1.el7.x86_64 As for the provider, I saw that one but just listed “verbs”. I’ll go with the “verbs;ofi_rxm” going forward. Regards, Charlie > On Jun

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
l. > > Could you run your app with > > export OMPI_MCA_mtl_base_verbose=100 > > and post the output? > > It would also help if you described the system you are using : OS > interconnect cpu type etc. > > Howard > > Charles A Taylor mailto:chas...@ufl.edu&g

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
t want to run the libfabric fi_info command to see what > capabilities you picked up from the libfabric RPMs. > > Next you may well not actually be using the OFI mtl. > > Could you run your app with > > export OMPI_MCA_mtl_base_verbose=100 > > and post the output? >
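
One way to apply the suggestion quoted above, as a sketch only. Open MPI reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the wrapper below sets mtl_base_verbose for a single command instead of exporting it for the whole shell. The function name with_mtl_verbose is hypothetical.

```shell
# Hypothetical wrapper: run one command with MTL selection logging enabled.
# Using env keeps the setting scoped to that command only.
with_mtl_verbose() {
    env OMPI_MCA_mtl_base_verbose=100 "$@"
}
```

It would be used as, for example, `with_mtl_verbose mpirun -np 2 ./app`, where ./app stands in for the real application.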

[OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Because of the issues we are having with OpenMPI and the openib BTL (questions previously asked), I’ve been looking into what other transports are available. I was particularly interested in OFI/libfabric support but cannot find any information on it more recent than a reference to the usNIC

[OMPI users] OpenMPI + gadget/gizmo/arepo

2018-05-23 Thread Charles A Taylor
I feel a little funny posting this but I have observed this problem now over three different versions of OpenMPI (1.10.2, 2.0.3, 3.0.0) and have refrained from asking about it before now because we always had a work-around. That may not be the case now and I feel like I’m missing something

Re: [OMPI users] openmpi/slurm/pmix

2018-04-24 Thread Charles A Taylor
Hi Gilles, Yes, I did. It was ignored AFAICT. I did not look for the reason - only so many hours in the day. Regards, Charlie > On Apr 24, 2018, at 8:07 AM, wrote: > > Charles, > > have you tried to configure --with-pmix-libdir=/.../lib64 ? >

Re: [OMPI users] openmpi/slurm/pmix

2018-04-24 Thread Charles A Taylor
I’ll add that when building OpenMPI 3.0.0 with an external PMIx, I found that the OpenMPI configure script only looks in “lib” for the pmix library but the pmix configure/build uses “lib64” (as it should on a 64-bit system) so the configure script falls back to the internal PMIx. As Robert
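
The lib versus lib64 mismatch described above can be checked before running configure. The following is a diagnostic sketch only, not a fix: pmix_libdir is a hypothetical helper that reports which directory under a PMIx prefix actually holds libpmix, assuming the shared library is named libpmix.so with an optional version suffix.

```shell
# Hypothetical helper: print the directory under a PMIx prefix that
# contains libpmix.so*, checking lib64 first and then lib.
pmix_libdir() {
    prefix="$1"
    for d in lib64 lib; do
        # An unmatched glob stays literal, so test each expansion for existence.
        for f in "$prefix/$d"/libpmix.so*; do
            if [ -e "$f" ]; then
                echo "$prefix/$d"
                return 0
            fi
        done
    done
    echo "no libpmix found under $prefix" >&2
    return 1
}
```

If this prints a lib64 path, the fallback to the internal PMIx described in the post is a likely outcome, and the summary at the end of the configure output is worth checking.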

Re: [OMPI users] ARM/Allinea DDT

2018-04-12 Thread Charles A Taylor
blem. > >> On Apr 11, 2018, at 11:54 AM, Charles A Taylor <chas...@ufl.edu> wrote: >> >> >> Contacting ARM seems a bit difficult so I thought I would ask here. We rely >> on DDT for debugging but it doesn’t work with OpenMPI 3.x and I can’t find >>

[OMPI users] ARM/Allinea DDT

2018-04-11 Thread Charles A Taylor
Contacting ARM seems a bit difficult so I thought I would ask here. We rely on DDT for debugging but it doesn’t work with OpenMPI 3.x and I can’t find anything about them having plans to support it. Anyone know if ARM DDT has plans to support newer versions of OpenMPI? Charlie Taylor UF

Re: [OMPI users] OpenMPI 3.0.0 on RHEL-7

2018-03-08 Thread Charles A Taylor
t-devel-2.0.22, you should be okay. You might want >> to up PMIx to v1.2.5 as Slurm 16.05 should handle that okay. OMPI v3.0.0 has >> PMIx 2.0 in it, but should be okay with 1.2.5 last I checked (but it has >> been awhile and I can’t swear to it). >> >> >>>

[OMPI users] OpenMPI 3.0.0 on RHEL-7

2018-03-07 Thread Charles A Taylor
Hi Distro: RHEL-7 (7.4) SLURM: 16.05.11 PMIx: 1.1.5 Trying to build OpenMPI 3.0.0 for our RHEL7 systems but running into what might be a configure script issue more than a real incompatibility problem. Configuring with the following, --with-slurm=/opt/slurm --with-pmix=/opt/pmix

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-19 Thread Charles A Taylor
> Or one could tell OMPI to do what you really want it to do using map-by and > bind-to options, perhaps putting them in the default MCA param file. Nod. Agreed, but far too complicated for 98% of our users. > > Or you could enable cgroups in slurm so that OMPI sees the binding envelope - >
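
For the quoted suggestion about putting map-by and bind-to defaults in an MCA parameter file, a sketch under stated assumptions: the parameter names rmaps_base_mapping_policy and hwloc_base_binding_policy are the ones used by the Open MPI releases of this period and should be confirmed with `ompi_info --all` on the installed version. The value "core" is illustrative only, and write_binding_defaults is a hypothetical helper.

```shell
# Hypothetical helper: append default mapping and binding policies to an
# Open MPI MCA parameter file so users need not pass them to mpirun.
write_binding_defaults() {
    conf="$1"
    mkdir -p "$(dirname "$conf")"
    cat >> "$conf" <<'EOF'
# Intended to match: mpirun --map-by core --bind-to core (illustrative values)
rmaps_base_mapping_policy = core
hwloc_base_binding_policy = core
EOF
}

# Example (not run here): per-user defaults file read by Open MPI.
# write_binding_defaults "$HOME/.openmpi/mca-params.conf"
```

A site-wide default would go in the openmpi-mca-params.conf file under the etc directory of the Open MPI install prefix instead.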

Re: [OMPI users] OpenMPI & Slurm: mpiexec/mpirun vs. srun

2017-12-19 Thread Charles A Taylor
Hi All, I’m glad to see this come up. We’ve used OpenMPI for a long time and switched to SLURM (from torque+moab) about 2.5 years ago. At the time, I had a lot of questions about running MPI jobs under SLURM and good information seemed to be scarce - especially regarding “srun”. I’ll just

Re: [OMPI users] PMIx + OpenMPI

2017-08-07 Thread Charles A Taylor
Many thanks to all who replied and especially to Artem Polyakov of Mellanox who provided a slurm-15.08.13 specific pmix patch. That patch applied and built cleanly against the 15.08.13 tarball and better yet, it works. Regards, Charles A. Taylor UF Research Computing > On Aug 6, 2017, a

Re: [OMPI users] PMIx + OpenMPI

2017-08-06 Thread Charles A Taylor
g 6, 2017, at 7:43 AM, Gilles Gouaillardet > <gilles.gouaillar...@gmail.com> wrote: > > Charles, > > did you build Open MPI with the external PMIx ? > iirc, Open MPI 2.0.x does not support cross version PMIx > > Cheers, > > Gilles > > On Sun, Aug 6, 2017

Re: [OMPI users] PMIx + OpenMPI

2017-08-06 Thread Charles A Taylor
> On Aug 6, 2017, at 6:53 AM, Charles A Taylor <chas...@ufl.edu> wrote: > > > Anyone successfully using PMIx with OpenMPI and SLURM? I have, > > 1. Installed an “external” version (1.1.5) of PMIx. > 2. Patched SLURM 15.08.13 with the SchedMD-provided PMIx patch

[OMPI users] PMIx + OpenMPI

2017-08-06 Thread Charles A Taylor
Anyone successfully using PMIx with OpenMPI and SLURM? I have, 1. Installed an “external” version (1.1.5) of PMIx. 2. Patched SLURM 15.08.13 with the SchedMD-provided PMIx patch (results in an mpi_pmix plugin along the lines of mpi_pmi2). 3. Built OpenMPI 2.0.1 (tried 2.0.3 as well). However,