Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Charles A Taylor via users
Sure…

+ ./configure 
  --build=x86_64-redhat-linux-gnu \
  --host=x86_64-redhat-linux-gnu \
  --program-prefix= \
  --disable-dependency-tracking \
  --prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \
  --exec-prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \
  --bindir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/bin \
  --sbindir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/sbin \
  --sysconfdir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/etc \
  --datadir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/share \
  --includedir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/include \
  --libdir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/lib64 \
  --libexecdir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/libexec \
  --localstatedir=/var \
  --sharedstatedir=/var/lib \
  --mandir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/share/man \
  --infodir=/apps/mpi/intel/2019.1.144/openmpi/4.0.1/share/info \
  C=icc CXX=icpc FC=ifort 'FFLAGS=-O2 -g -warn -m64' LDFLAGS= \
  --enable-static \
  --enable-orterun-prefix-by-default \
  --with-slurm=/opt/slurm \
  --with-pmix=/opt/pmix/3.1.2 \
  --with-pmi=/opt/slurm \
  --with-libevent=external \
  --with-hwloc=external \
  --without-verbs \
  --with-libfabric \
  --with-ucx \
  --with-mxm=no \
  --with-cuda=no \
  --enable-openib-udcm \
  --enable-openib-rdmacm
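
FWIW, a quick way to confirm what a finished build actually picked up (a
sketch, assuming the install prefix above; component names can differ across
Open MPI versions):

  # list the PMI/PMIx-related components that were built
  /apps/mpi/intel/2019.1.144/openmpi/4.0.1/bin/ompi_info | grep -i pmix

  # or check what configure itself concluded
  grep -i pmi config.log | head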


> On Jun 20, 2019, at 12:49 PM, Jeff Squyres (jsquyres) via users wrote:
> 
> Ok.
> 
> Perhaps we still missed something in the configury.
> 
> Worst case, you can:
> 
> $ ./configure CPPFLAGS=-I/usr/include/slurm ...rest of your configure params...
> 
> That will add the -I to CPPFLAGS, and the value you set will be preserved
> in the top few lines of config.log.
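
(A sketch of that check: the full invocation is echoed near the top of
config.log, so after re-running configure you can verify the flag was
recorded with

  grep '\$ ./configure' config.log

which should print the command line including CPPFLAGS.)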
> 
> 
> 
> On Jun 20, 2019, at 12:25 PM, Carlson, Timothy S wrote:
>> 
>> As of recent versions, you need to use both --with-slurm and --with-pmi2.
>> 
>> While the configure output suggests that pmi2 is picked up as part of Slurm,
>> that is not in fact true; you need to tell it about pmi2 explicitly.
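
(For context, what the PMI support buys you is direct launch under Slurm; a
usage sketch, assuming Slurm's pmi2 plugin is installed and ./my_mpi_app is a
placeholder for your binary:

  srun --mpi=pmi2 -N 2 -n 4 ./my_mpi_app

You can list the plugins your Slurm knows about with "srun --mpi=list".)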
>> 
>> From: users On Behalf Of Noam Bernstein via users
>> Sent: Thursday, June 20, 2019 9:16 AM
>> To: Jeff Squyres (jsquyres)
>> Cc: Noam Bernstein; Open MPI User's List
>> Subject: Re: [OMPI users] OpenMPI 4 and pmi2 support
>> 
>> 
>> 
>> 
>> On Jun 20, 2019, at 11:54 AM, Jeff Squyres (jsquyres) wrote:
>> 
>> On Jun 14, 2019, at 2:02 PM, Noam Bernstein via users wrote:
>> 
>> 
>> Hi Jeff - do you remember this issue from a couple of months ago?  
>> 
>> Noam: I'm sorry, I totally missed this email.  My INBOX is a continual 
>> disaster.  :-(
>> 
>> No problem.  We’re running with mpirun for now.
>> 
>> 
>> 
>> Unfortunately, the failure to find pmi.h is still happening.  I just tried 
>> with 4.0.1 (not rc), and I still run into the same error (failing to find 
>> #include <pmi.h> when compiling opal/mca/pmix/s1/mca_pmix_s1_la-pmix_s1.lo):
>> make[2]: Entering directory 
>> `/home_tin/bernadm/configuration/110_compile_mpi/OpenMPI/openmpi-4.0.1/opal/mca/pmix/s1'
>> CC   mca_pmix_s1_la-pmix_s1.lo
>> pmix_s1.c:29:17: fatal error: pmi.h: No such file or directory
>> #include <pmi.h>
>>                 ^
>> compilation terminated.
>> make[2]: *** [mca_pmix_s1_la-pmix_s1.lo] Error 1
>> make[2]: Leaving directory 
>> `/home_tin/bernadm/configuration/110_compile_mpi/OpenMPI/openmpi-4.0.1/opal/mca/pmix/s1'
>> make[1]: *** [all-recursive] Error 1
>> make[1]: Leaving directory 
>> `/home_tin/bernadm/configuration/110_compile_mpi/OpenMPI/openmpi-4.0.1/opal'
>> make: *** [all-recursive] Error 1
>> 
>> I looked back earlier in this thread, and I don't see the version of SLURM 
>> that you're using.  What version is it?
>> 
>> 18.08, provided for our CentOS 7.6-based Rocks through the slurm roll, so 
>> not compiled by me.
>> 
>> 
>> 
>> Is there a pmi2.h in the SLURM installation (i.e., not pmi.h)?
>> 
>> Or is the problem that -I/usr/include/slurm is not passed to the compile 
>> line (per your output, below)?
>> 
>> /usr/include/slurm has both pmi.h and pmi2.h, but (from what I could tell 
>> when trying to manually reproduce what make is doing)
>> -I/usr/include/slurm 
>> is not being passed when compiling those files.
>> 
>> 
>> 
>> When I dig into what libtool is trying to do, I get (once I remove the 
>> --silent flag):
>> 
>> (FWIW, you can also "make V=1" to have it show you all this detail)
>> 
>> I’ll check that, to confirm that I’m correct about it not being passed.
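
(A sketch of that check, using the failing component from the log above:

  cd opal/mca/pmix/s1
  make V=1 mca_pmix_s1_la-pmix_s1.lo

This should print the full compile command, where a missing
-I/usr/include/slurm would be visible.)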
>> 
>> 
>> Noam
>> 
>> U.S. NAVAL RESEARCH LABORATORY
>> Noam Bernstein, Ph.D.
>> Center for Materials Physics and Technology
>> U.S. Naval Research Laboratory
>> T +1 202 404 8628  F +1 202 404 7546
>> https://www.nrl.navy.mil
> 
> 
> -- 
> Jeff

Re: [OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users


> On Jun 20, 2019, at 12:10 PM, Carlson, Timothy S wrote:
> 
> I’ve never seen that error and have built some flavor of this combination 
> dozens of times.  What version of Intel Compiler and what version of OpenMPI 
> are you trying to build?

[chasman@login4 gizmo-mufasa]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, 
Version 19.0.1.144 Build 20181018
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

OpenMPI 4.0.1 

It is probably something I/we are doing that is throwing the configure script 
and macros off.  We include some version (7.3.0 in this case) of gcc in our 
command and library paths because icpc needs the GNU headers for certain 
things.  Perhaps the configure script is picking that up and thinks we are 
using GNU compilers.
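
One thing I plan to look at (an assumption on my part, not something I have
confirmed): autoconf decides "GNU or not" by testing whether the compiler
defines __GNUC__, and icc/icpc define __GNUC__ for gcc compatibility, so the
C/C++ side can legitimately test as GNU.  I also notice that the configure
line I posted in the pmi2 thread says C=icc rather than CC=icc; if that is
not a transcription error, the C compiler would have defaulted to gcc.  A
quick check of what configure decided:

  grep ac_cv_c_compiler_gnu config.log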

I’ll have to look more closely now that I know I’m the only one seeing it.  :(

Charlie Taylor
UF Research Computing


>  
> Tim
>  
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Charles A Taylor 
> via users
> Sent: Thursday, June 20, 2019 8:55 AM
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Charles A Taylor <chas...@ufl.edu>
> Subject: [OMPI users] Intel Compilers
>  
> OpenMPI probably has one of the largest and most complete configure+build 
> systems I’ve ever seen.
>  
> I’m surprised, however, that it doesn’t pick up the use of the Intel 
> compilers and modify the command-line parameters as needed.
>  
> ifort: command line warning #10006: ignoring unknown option '-pipe'
> ifort: command line warning #10157: ignoring option '-W'; argument is of wrong type
> ifort: command line warning #10006: ignoring unknown option '-fparam=ssp-buffer-size=4'
> ifort: command line warning #10006: ignoring unknown option '-pipe'
> ifort: command line warning #10157: ignoring option '-W'; argument is of wrong type
> ifort: command line warning #10006: ignoring unknown option '-fparam=ssp-buffer-size=4'
> ifort: command line warning #10006: ignoring unknown option '-pipe'
> ifort: command line warning #10157: ignoring option '-W'; argument is of wrong type
> ifort: command line warning #10006: ignoring unknown option '-fparam=ssp-buffer-size=4'
>  
> Maybe I’m missing something.
>  
> Regards,
>  
> Charlie Taylor
> UF Research Computing


[OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users
OpenMPI probably has one of the largest and most complete configure+build 
systems I’ve ever seen.  

I’m surprised, however, that it doesn’t pick up the use of the Intel 
compilers and modify the command-line parameters as needed.

ifort: command line warning #10006: ignoring unknown option '-pipe'
ifort: command line warning #10157: ignoring option '-W'; argument is of wrong type
ifort: command line warning #10006: ignoring unknown option '-fparam=ssp-buffer-size=4'
ifort: command line warning #10006: ignoring unknown option '-pipe'
ifort: command line warning #10157: ignoring option '-W'; argument is of wrong type
ifort: command line warning #10006: ignoring unknown option '-fparam=ssp-buffer-size=4'
ifort: command line warning #10006: ignoring unknown option '-pipe'
ifort: command line warning #10157: ignoring option '-W'; argument is of wrong type
ifort: command line warning #10006: ignoring unknown option '-fparam=ssp-buffer-size=4'

Maybe I’m missing something.
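
(If it helps anyone else, a workaround sketch; the flag values here are
examples, not a recommendation: pass flags explicitly so configure does not
fall back to GCC-flavored defaults from the build environment:

  ./configure CC=icc CXX=icpc FC=ifort \
    CFLAGS='-O2 -g' CXXFLAGS='-O2 -g' FFLAGS='-O2 -g' ...
)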

Regards,

Charlie Taylor
UF Research Computing

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Charles A Taylor via users
This looks a lot like a problem I had with OpenMPI 3.1.2.  I thought the fix 
landed in 4.0.0, but you might want to check the code to be sure there wasn’t 
a regression in 4.1.x.  Most of our codes are still running 3.1.2, so I 
haven’t built anything beyond 4.0.0, which definitely included the fix.

See…

- Apply patch for memory leak associated with UCX PML.
- https://github.com/openucx/ucx/issues/2921
- https://github.com/open-mpi/ompi/pull/5878
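
(A quick way to check the versions you are actually running against, a
sketch assuming both tools are in your PATH:

  ompi_info --version   # Open MPI version
  ucx_info -v           # UCX version and build configuration
)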

Charles Taylor
UF Research Computing


> On Jun 19, 2019, at 2:26 PM, Noam Bernstein via users wrote:
> 
>> On Jun 19, 2019, at 2:00 PM, John Hearns via users wrote:
>> 
>> Noam, it may be a stupid question. Could you try running slabtop as the 
>> program executes?
> 
> The top SIZE usage is this line:
>    OBJS   ACTIVE   USE  OBJ SIZE   SLABS  OBJ/SLAB  CACHE SIZE  NAME
> 5937540  5937540  100%     0.09K  141370        42     565480K  kmalloc-96
> which seems to be growing continuously. However, it’s much smaller than the 
> drop in free memory.  It gets to around 1 GB after tens of seconds (500 MB 
> here), but the overall free memory is dropping by about 1 GB / second, so 
> tens of GB over the same time.
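
(A back-of-envelope check on that, not from the original message: 5937540
objects at 96 bytes each is

  echo $((5937540 * 96 / 1024))   # -> 556644 KB, about 0.53 GB

which is consistent with the reported 565480K cache size, and with the point
that this cache is far smaller than the tens of GB going missing.)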
> 
>> 
>> Also, 'watch cat /proc/meminfo' is a good diagnostic
> 
> Other than MemFree dropping, I don’t see much. Here’s a diff, 10 seconds 
> apart:
> 2,3c2,3
> < MemFree:54229400 kB
> < MemAvailable:   54271804 kB
> ---
> > MemFree:45010772 kB
> > MemAvailable:   45054200 kB
> 19c19
> < AnonPages:  22063260 kB
> ---
> > AnonPages:  22526300 kB
> 22,24c22,24
> < Slab: 851380 kB
> < SReclaimable:  87100 kB
> < SUnreclaim:   764280 kB
> ---
> > Slab:1068208 kB
> > SReclaimable:  89148 kB
> > SUnreclaim:   979060 kB
> 31c31
> < Committed_AS:   34976896 kB
> ---
> > Committed_AS:   34977680 kB
> 
> MemFree has dropped by 9 GB, but as far as I can tell nothing else has 
> increased by anything near as much, so I don’t know where the memory is going.
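
(One way to narrow that down, a sketch and only a guess at where to look:
watch the big consumers side by side, since memory that shows up in none of
AnonPages, Cached, Buffers, Slab, PageTables, or HugePages is often pinned
by a driver and not itemized in /proc/meminfo:

  grep -E 'MemFree|AnonPages|^Cached|Buffers|Slab|PageTables|HugePages_Total' /proc/meminfo
)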
> 
>   Noam
> 
> 
> 
> U.S. NAVAL RESEARCH LABORATORY
> 
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628  F +1 202 404 7546
> https://www.nrl.navy.mil 
> 

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users