Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
Thanks, but this is not really helping.

Could you please build a Minimal, Reproducible Example as described at
https://stackoverflow.com/help/minimal-reproducible-example ?
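
For reference, this class of error can often be reproduced in a dozen
lines. Below is a hedged, hypothetical sketch (not your code): it passes
MPI_Irecv a count of a non-default integer kind, much like the
integer(kind=C_LONG) :: size declaration in the snippet quoted later in
this thread. With "use mpi", the count and request dummy arguments are
default INTEGER, so such a call may fail to match any specific procedure:

! hypothetical sketch, not the original code
program irecv_resolution
  use mpi
  use, intrinsic :: iso_c_binding, only: C_LONG
  implicit none
  complex :: buf(4)
  integer(kind=C_LONG) :: count   ! non-default integer kind
  integer :: req, ierr

  call MPI_Init(ierr)
  count = 4_C_LONG
  ! the mpi module declares the count dummy as default INTEGER, so this
  ! call may not resolve to any specific procedure at compile time
  call MPI_Irecv(buf, count, MPI_COMPLEX, 0, 0, MPI_COMM_WORLD, req, ierr)
  call MPI_Cancel(req, ierr)
  call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
  call MPI_Finalize(ierr)
end program irecv_resolution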

Cheers,

Gilles

On Mon, Aug 19, 2019 at 7:19 PM Sangam B via users
 wrote:
>
> Hi,
>
> Here is the sample program snippet:
>
> 
> #include "intrinsic_sizes.h"
> #include "redef.h"
>
> module module1_m
>
>   use mod1_m, only:  some__example2
>   use mod2_m, only:  some__example3
>   use mod3_m, only:  some__example4
>
>   use mpi
>   use, intrinsic :: iso_c_binding
>
> implicit none
>
>   private
>
>   public :: some__example___memory
>
>   type, public, extends(some__example5) :: some__example6
>  logical, public :: some__example7 = .False.
>  class(some__example8), private, pointer :: some__example9
>contains
>
> ...
> ...
> end type some__example6
>
> contains
> 
> some_pure_functions here
> 
>
>
> subroutine recv(this,lmb)
> class(some__example6), intent(inout) ::  this
> integer, intent(in) :: lmb(2,2)
>
> integer :: cs3, ierr
> integer(kind=C_LONG) :: size
>
> ! receive only from buffer at different process
> if(this%is_bf_referred) return
>
> cs3=this%uspecifier%get_recv_buff_3rd_dim_size(this%xb,this%vwb,lmb)
> if(cs3.eq.0) return ! nothing to recv
>
> size = this%size_dim(this%gi)*this%size_dim(this%gj)*cs3
> if(this%is_exchange_off) then
>call this%update_stats(size)
>this%bf(:,:,1:cs3) = cmplx(0.,0.)
> else
>call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,&
> this%nrank,this%tag,this%comm_xvw,this%request,ierr)
> end if
>   end subroutine recv
>
>
> Hope this helps.
>
> On Mon, Aug 19, 2019 at 3:21 PM Gilles Gouaillardet via users 
>  wrote:
>>
>> Thanks,
>>
>> and your reproducer is ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On Mon, Aug 19, 2019 at 6:42 PM Sangam B via users
>>  wrote:
>> >
>> > Hi,
>> >
>> > OpenMPI is configured as follows:
>> >
>> > export CC=`which clang`
>> > export CXX=`which clang++`
>> > export FC=`which flang`
>> > export F90=`which flang`
>> >
>> > ../configure --prefix=/sw/openmpi/3.1.1/aocc20hpcx210-mpifort 
>> > --enable-mpi-fortran --enable-mpi-cxx --without-psm --without-psm2 
>> > --without-knem --without-libfabric --without-lsf --with-verbs=/usr 
>> > --with-mxm=/sw/hpcx/hpcx-v2.1.0-gcc-MLNX_OFED_LINUX-4.3-1.0.1.0-redhat7.4-x86_64/mxm
>> >
>> >
>> > ..
>> >
>> > On Mon, Aug 19, 2019 at 2:43 PM Sangam B  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I get the following error if the application is compiled with openmpi-3.1.1:
>> >>
>> >> mpifort -O3 -march=native -funroll-loops -finline-aggressive -flto 
>> >> -J./bin/obj_amd64aocc20 -std=f2008 -O3 -march=native -funroll-loops 
>> >> -finline-aggressive -flto -fallow-fortran-gnu-ext -ffree-form 
>> >> -fdefault-real-8 example_program.F90
>> >> F90-S-0155-Could not resolve generic procedure mpi_irecv ( 
>> >> example_program.F90  : 97)
>> >>   0 inform,   0 warnings,   1 severes, 0 fatal for recv
>> >>
>> >> Following is the line causing this error:
>> >>
>> >> call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,&
>> >> this%nrank,this%tag,this%comm_xvw,this%request,ierr)
>> >>
>> >> The program has the following module mentioned at the beginning:
>> >>  use mpi
>> >>
>> >> The openmpi has the following module files in the lib folder:
>> >> $ ls *.mod
>> >> mpi_ext.mod  mpi_f08_ext.mod  mpi_f08_interfaces.mod  mpi_f08_types.mod
>> >> pmpi_f08_interfaces.mod  mpi_f08_callbacks.mod
>> >> mpi_f08_interfaces_callbacks.mod  mpi_f08.mod  mpi.mod
>> >>
>> >> The same program works with Intel MPI (gcc/intel as base compilers),
>> >> but fails with OpenMPI, whether gcc-8.1.0 or AOCC is used as the base
>> >> compiler. What could be the reason for it?
>> >>
>> >> ..
>> >
>> > ___
>> > users mailing list
>> > users@lists.open-mpi.org
>> > https://lists.open-mpi.org/mailman/listinfo/users
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
I am not questioning whether you are facing an issue with Open MPI or not.
I am just asking for "the same application" (read minimal source code)
so I can reproduce the issue, investigate it and hopefully help you.

Meanwhile, try rebuilding Open MPI with '-fdefault-real-8' in your
FCFLAGS (since this is what you are using to build your app) and see
whether it helps.

Cheers,

Gilles


On Mon, Aug 19, 2019 at 7:06 PM Sangam B via users
 wrote:
>
> Hi,
>
> I've tried both the gcc-8.1.0 and AOCC-2.0 compilers with openmpi-3.1.1. It
> fails with both compilers.
>
> The posted error message was from OpenMPI-3.1.1 + the AOCC-2.0 compiler.
>
> To cross-check whether it is a problem with OpenMPI or the base compiler, I
> compiled the same application with Intel MPI using (1) intel and (2) gcc as
> the base compiler. It works in both cases.
>
> --
>
>
> On Mon, Aug 19, 2019 at 3:25 PM Gilles Gouaillardet via users 
>  wrote:
>>
>> One more thing ...
>>
>> Your initial message mentioned a failure with gcc 8.2.0, but your
>> follow-up message mentions LLVM compiler.
>>
>> So which compiler did you use to build Open MPI that fails to build your 
>> test ?
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On Mon, Aug 19, 2019 at 6:49 PM Gilles Gouaillardet
>>  wrote:
>> >
>> > Thanks,
>> >
>> > and your reproducer is ?
>> >
>> > Cheers,
>> >
>> > Gilles
>> >
>> > On Mon, Aug 19, 2019 at 6:42 PM Sangam B via users
>> >  wrote:
>> > >
>> > > Hi,
>> > >
>> > > OpenMPI is configured as follows:
>> > >
>> > > export CC=`which clang`
>> > > export CXX=`which clang++`
>> > > export FC=`which flang`
>> > > export F90=`which flang`
>> > >
>> > > ../configure --prefix=/sw/openmpi/3.1.1/aocc20hpcx210-mpifort 
>> > > --enable-mpi-fortran --enable-mpi-cxx --without-psm --without-psm2 
>> > > --without-knem --without-libfabric --without-lsf --with-verbs=/usr 
>> > > --with-mxm=/sw/hpcx/hpcx-v2.1.0-gcc-MLNX_OFED_LINUX-4.3-1.0.1.0-redhat7.4-x86_64/mxm
>> > >
>> > >
>> > > ..
>> > >
>> > > On Mon, Aug 19, 2019 at 2:43 PM Sangam B  wrote:
>> > >>
>> > >> Hi,
>> > >>
>> > >> I get the following error if the application is compiled with openmpi-3.1.1:
>> > >>
>> > >> mpifort -O3 -march=native -funroll-loops -finline-aggressive -flto 
>> > >> -J./bin/obj_amd64aocc20 -std=f2008 -O3 -march=native -funroll-loops 
>> > >> -finline-aggressive -flto -fallow-fortran-gnu-ext -ffree-form 
>> > >> -fdefault-real-8 example_program.F90
>> > >> F90-S-0155-Could not resolve generic procedure mpi_irecv ( 
>> > >> example_program.F90  : 97)
>> > >>   0 inform,   0 warnings,   1 severes, 0 fatal for recv
>> > >>
>> > >> Following is the line causing this error:
>> > >>
>> > >> call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,&
>> > >> this%nrank,this%tag,this%comm_xvw,this%request,ierr)
>> > >>
>> > >> The program has the following module mentioned at the beginning:
>> > >>  use mpi
>> > >>
>> > >> The openmpi has the following module files in the lib folder:
>> > >> $ ls *.mod
>> > >> mpi_ext.mod  mpi_f08_ext.mod  mpi_f08_interfaces.mod  mpi_f08_types.mod
>> > >> pmpi_f08_interfaces.mod  mpi_f08_callbacks.mod
>> > >> mpi_f08_interfaces_callbacks.mod  mpi_f08.mod  mpi.mod
>> > >>
>> > >> The same program works with Intel MPI (gcc/intel as base compilers),
>> > >> but fails with OpenMPI, whether gcc-8.1.0 or AOCC is used as the base
>> > >> compiler. What could be the reason for it?
>> > >>
>> > >> ..
>> > >
>> > > ___
>> > > users mailing list
>> > > users@lists.open-mpi.org
>> > > https://lists.open-mpi.org/mailman/listinfo/users
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
One more thing ...

Your initial message mentioned a failure with gcc 8.2.0, but your
follow-up message mentions LLVM compiler.

So which compiler did you use to build Open MPI that fails to build your test ?


Cheers,

Gilles

On Mon, Aug 19, 2019 at 6:49 PM Gilles Gouaillardet
 wrote:
>
> Thanks,
>
> and your reproducer is ?
>
> Cheers,
>
> Gilles
>
> On Mon, Aug 19, 2019 at 6:42 PM Sangam B via users
>  wrote:
> >
> > Hi,
> >
> > OpenMPI is configured as follows:
> >
> > export CC=`which clang`
> > export CXX=`which clang++`
> > export FC=`which flang`
> > export F90=`which flang`
> >
> > ../configure --prefix=/sw/openmpi/3.1.1/aocc20hpcx210-mpifort 
> > --enable-mpi-fortran --enable-mpi-cxx --without-psm --without-psm2 
> > --without-knem --without-libfabric --without-lsf --with-verbs=/usr 
> > --with-mxm=/sw/hpcx/hpcx-v2.1.0-gcc-MLNX_OFED_LINUX-4.3-1.0.1.0-redhat7.4-x86_64/mxm
> >
> >
> > ..
> >
> > On Mon, Aug 19, 2019 at 2:43 PM Sangam B  wrote:
> >>
> >> Hi,
> >>
> > >> I get the following error if the application is compiled with openmpi-3.1.1:
> >>
> >> mpifort -O3 -march=native -funroll-loops -finline-aggressive -flto 
> >> -J./bin/obj_amd64aocc20 -std=f2008 -O3 -march=native -funroll-loops 
> >> -finline-aggressive -flto -fallow-fortran-gnu-ext -ffree-form 
> >> -fdefault-real-8 example_program.F90
> >> F90-S-0155-Could not resolve generic procedure mpi_irecv ( 
> >> example_program.F90  : 97)
> >>   0 inform,   0 warnings,   1 severes, 0 fatal for recv
> >>
> >> Following is the line causing this error:
> >>
> >> call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,&
> >> this%nrank,this%tag,this%comm_xvw,this%request,ierr)
> >>
> > >> The program has the following module mentioned at the beginning:
> > >>  use mpi
> > >>
> > >> The openmpi has the following module files in the lib folder:
> > >> $ ls *.mod
> > >> mpi_ext.mod  mpi_f08_ext.mod  mpi_f08_interfaces.mod  mpi_f08_types.mod
> > >> pmpi_f08_interfaces.mod  mpi_f08_callbacks.mod
> > >> mpi_f08_interfaces_callbacks.mod  mpi_f08.mod  mpi.mod
> > >>
> > >> The same program works with Intel MPI (gcc/intel as base compilers),
> > >> but fails with OpenMPI, whether gcc-8.1.0 or AOCC is used as the base
> > >> compiler. What could be the reason for it?
> >>
> >> ..
> >
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
Thanks,

and your reproducer is ?

Cheers,

Gilles

On Mon, Aug 19, 2019 at 6:42 PM Sangam B via users
 wrote:
>
> Hi,
>
> OpenMPI is configured as follows:
>
> export CC=`which clang`
> export CXX=`which clang++`
> export FC=`which flang`
> export F90=`which flang`
>
> ../configure --prefix=/sw/openmpi/3.1.1/aocc20hpcx210-mpifort 
> --enable-mpi-fortran --enable-mpi-cxx --without-psm --without-psm2 
> --without-knem --without-libfabric --without-lsf --with-verbs=/usr 
> --with-mxm=/sw/hpcx/hpcx-v2.1.0-gcc-MLNX_OFED_LINUX-4.3-1.0.1.0-redhat7.4-x86_64/mxm
>
>
> ..
>
> On Mon, Aug 19, 2019 at 2:43 PM Sangam B  wrote:
>>
>> Hi,
>>
>> I get the following error if the application is compiled with openmpi-3.1.1:
>>
>> mpifort -O3 -march=native -funroll-loops -finline-aggressive -flto 
>> -J./bin/obj_amd64aocc20 -std=f2008 -O3 -march=native -funroll-loops 
>> -finline-aggressive -flto -fallow-fortran-gnu-ext -ffree-form 
>> -fdefault-real-8 example_program.F90
>> F90-S-0155-Could not resolve generic procedure mpi_irecv ( 
>> example_program.F90  : 97)
>>   0 inform,   0 warnings,   1 severes, 0 fatal for recv
>>
>> Following is the line causing this error:
>>
>> call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,&
>> this%nrank,this%tag,this%comm_xvw,this%request,ierr)
>>
>> The program has the following module mentioned at the beginning:
>>  use mpi
>>
>> The openmpi has the following module files in the lib folder:
>> $ ls *.mod
>> mpi_ext.mod  mpi_f08_ext.mod  mpi_f08_interfaces.mod  mpi_f08_types.mod
>> pmpi_f08_interfaces.mod  mpi_f08_callbacks.mod
>> mpi_f08_interfaces_callbacks.mod  mpi_f08.mod  mpi.mod
>>
>> The same program works with Intel MPI (gcc/intel as base compilers),
>> but fails with OpenMPI, whether gcc-8.1.0 or AOCC is used as the base
>> compiler. What could be the reason for it?
>>
>> ..
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
Hi,

Can you please post a full but minimal example that evidences the issue?

Also please post your Open MPI configure command line.

Cheers,


Gilles 

Sent from my iPod

> On Aug 19, 2019, at 18:13, Sangam B via users  
> wrote:
> 
> Hi,
> 
> I get the following error if the application is compiled with openmpi-3.1.1:
> 
> mpifort -O3 -march=native -funroll-loops -finline-aggressive -flto 
> -J./bin/obj_amd64aocc20 -std=f2008 -O3 -march=native -funroll-loops 
> -finline-aggressive -flto -fallow-fortran-gnu-ext -ffree-form 
> -fdefault-real-8 example_program.F90
> F90-S-0155-Could not resolve generic procedure mpi_irecv ( 
> example_program.F90  : 97)
>   0 inform,   0 warnings,   1 severes, 0 fatal for recv
> 
> Following is the line causing this error:
> 
> call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,&
> this%nrank,this%tag,this%comm_xvw,this%request,ierr)
> 
> The program has the following module mentioned at the beginning:
>  use mpi
> 
> The openmpi has the following module files in the lib folder:
> $ ls *.mod
> mpi_ext.mod  mpi_f08_ext.mod  mpi_f08_interfaces.mod  mpi_f08_types.mod
> pmpi_f08_interfaces.mod  mpi_f08_callbacks.mod
> mpi_f08_interfaces_callbacks.mod  mpi_f08.mod  mpi.mod
> 
> The same program works with Intel MPI (gcc/intel as base compilers),
> but fails with OpenMPI, whether gcc-8.1.0 or AOCC is used as the base
> compiler. What could be the reason for it?
> 
> ..
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] OMPI was not built with SLURM's PMI support

2019-08-08 Thread Gilles GOUAILLARDET via users
Hi,

You need to

configure --with-pmi ...


Cheers,

Gilles

On August 8, 2019, at 11:28 PM, Jing Gong via users  
wrote:

 

Hi,


Recently our Slurm system has been upgraded to 19.0.5. I tried to recompile 
openmpi v3.0 due to the bug reported in


https://bugs.schedmd.com/show_bug.cgi?id=6993


The configure flags are:


$./configure --enable-shared --enable-static --with-slurm --with-pmix


and the output of ompi_info is as follows:


$ ompi_info -a |grep pmix
  Configure command line: '--enable-shared' '--enable-static' '--with-slurm' 
'--with-pmix'


   MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.0.0)
 MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.0.0)
   MCA pmix base: ---
   MCA pmix base: parameter "pmix" (current value: "", data source: 
default, level: 2 user/detail, type: string)
  Default selection set of components for the pmix
framework (<none> means use all components that can be found)
   MCA pmix base: ---
   MCA pmix base: parameter "pmix_base_verbose" (current value: 
"error", data source: default, level: 8 dev/detail, type: int)
  Verbosity level for the pmix framework (default: 0)
   MCA pmix base: parameter "pmix_base_async_modex" (current value: 
"false", data source: default, level: 9 dev/all, type: bool)
   MCA pmix base: parameter "pmix_base_collect_data" (current value: 
"true", data source: default, level: 9 dev/all, type: bool)
   MCA pmix base: parameter "pmix_base_exchange_timeout" (current 
value: "-1", data source: default, level: 3 user/all, type: int)
 MCA pmix pmix2x: ---
 MCA pmix pmix2x: parameter "pmix_pmix2x_silence_warning" (current 
value: "false", data source: default, level: 4 tuner/basic, type: bool)

But when I srun the openmpi program, I got errors like




$ srun -n 4 ./a.out


--
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
Local abort before MPI_INIT completed completed successfully, but am not able 
to aggregate error messages, and not able to guarantee that all other processes 
were killed!
===


How can I check if openmpi is built with PMI support?


Thanks a lot. /Jing 





___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI 2.1.1 bug on Ubuntu 18.04.2 LTS

2019-08-01 Thread Gilles Gouaillardet via users

Junchao,


Is the issue related to https://github.com/open-mpi/ompi/pull/4501 ?


Jeff,


you might have to configure with --enable-heterogeneous to evidence the 
issue




Cheers,


Gilles

On 8/2/2019 4:06 AM, Jeff Squyres (jsquyres) via users wrote:

I am able to replicate the issue on a stock Ubuntu 18.04 install with their 
Open MPI package.

But if I compile my own Open MPI 2.1.1, it works fine.
Also, if I compile my own Open MPI 2.1.6, it works fine.

I filed a bug at Ubuntu about this:

 https://bugs.launchpad.net/ubuntu/+source/xubuntu-meta/+bug/1838684




On Aug 1, 2019, at 2:33 PM, Zhang, Junchao  wrote:

$ aptitude versions libopenmpi-dev
Package libopenmpi-dev:
i   2.1.1-8  bionic  500
Package libopenmpi-dev:i386:
p   2.1.1-8 bionic  500

$ sudo apt-get install libopenmpi-dev=2.1.6
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Version '2.1.6' for 'libopenmpi-dev' was not found

--Junchao Zhang


On Thu, Aug 1, 2019 at 1:15 PM Jeff Squyres (jsquyres)  
wrote:
Does the bug exist in Open MPI v2.1.6?


On Jul 31, 2019, at 2:19 PM, Zhang, Junchao via users 
 wrote:

Hello,
   I met a bug with OpenMPI 2.1.1 distributed in the latest Ubuntu 18.04.2 LTS. 
It happens with self to self send/recv using MPI_ANY_SOURCE for message 
matching.  See the attached test code.  You can reproduce it even with one 
process.
   It is a severe bug. Since this Ubuntu is widely used and has long term 
support, could it be somehow fixed?
   Thanks a lot.

--Junchao Zhang
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


--
Jeff Squyres
jsquy...@cisco.com




___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] When is it save to free the buffer after MPI_Isend?

2019-07-27 Thread Gilles Gouaillardet via users
Carlos,

MPI_Isend() does not automatically free the buffer after it sends the message
(it simply cannot do that, since the buffer might be pointing to a global
variable or to the stack).

Can you please extract a reproducer from your program ?

Out of curiosity, what if you insert a (useless) MPI_Wait() like this?

MPI_Test(req, &flag, &status);
if (flag){
    MPI_Wait(req, MPI_STATUS_IGNORE);
    free(buffer);
}
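
For illustration, a self-contained version of that pattern looks like the
hedged Fortran sketch below (names are illustrative, not from Carlos's
program; MPI_PROC_NULL is used so the send completes without a matching
receive):

! hedged sketch of the Isend/Test/Wait/free pattern, with an
! allocatable buffer standing in for the malloc'ed C buffer
program isend_then_free
  use mpi
  implicit none
  real, allocatable :: buffer(:)
  integer :: req, ierr
  logical :: flag

  call MPI_Init(ierr)
  allocate(buffer(1000))
  buffer = 1.0
  call MPI_Isend(buffer, size(buffer), MPI_REAL, MPI_PROC_NULL, 0, &
                 MPI_COMM_WORLD, req, ierr)
  call MPI_Test(req, flag, MPI_STATUS_IGNORE, ierr)
  ! release the buffer only once the request has completed
  if (.not. flag) call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
  deallocate(buffer)
  call MPI_Finalize(ierr)
end program isend_then_free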

Cheers,

Gilles

On Sun, Jul 28, 2019 at 5:45 AM carlos aguni via users
 wrote:
>
> Hi Jeff,
>
> Thank you for your reply.
>
> If I don't free, the program completes, but I'm not sure whether MPI_Isend
> automatically frees the buffer after it sends the message. Does it?
>
> I put a long sleep at the end to check the memory used using pmap.
>
> The pmap command reported I'm using around 2GB, so I'm guessing it isn't
> freeing it.
>
> Is there anything I could try?
>
> Regards,
> C.
>
> On Mon, Jul 22, 2019 at 10:59 AM Jeff Squyres (jsquyres)  
> wrote:
>>
>> > On Jul 21, 2019, at 11:31 AM, carlos aguni via users 
>> >  wrote:
>> >
>> > MPI_Isend()
>> > ... some stuff..
>> > flag = 0;
>> > MPI_Test(req, &flag, &status);
>> > if (flag){
>> > free(buffer);
>> > }
>> >
>> > After the free() i'm getting errors like:
>> > [[58327,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send] 
>> > mca_btl_tcp_frag_send: writev error (0x2b9daf474000, 12800)
>> > Bad address(1)
>> > [[58327,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send] 
>> > mca_btl_tcp_frag_send: writev error (0x2b9daf473ee8, 19608)
>> > Bad address(1)
>> > pml_ob1_sendreq.c:308 FATAL
>>
>> Do you get the same error if you don't free()?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] How is the rank determined (Open MPI and Podman)

2019-07-22 Thread Gilles Gouaillardet via users

Adrian,


An option is to involve the modex: each task would OPAL_MODEX_SEND() its
own namespace ID, and then OPAL_MODEX_RECV() the one from its peers and
decide whether CMA support can be enabled.


Cheers,


Gilles

On 7/22/2019 4:53 PM, Adrian Reber via users wrote:

I had a look at it and I'm not sure if it really makes sense.

In btl_vader_{put,get}.c it would be easy to check for the user
namespace ID of the other process, but the function would then just
return OPAL_ERROR a bit earlier instead of as a result of
process_vm_{read,write}v(). Nothing would really change.

A better place for the check would be mca_btl_vader_check_single_copy()
but I do not know if at this point the PID of the other processes is
already known. Not sure if I can check for the user namespace ID of the
other processes.

Any recommendations how to do this?

Adrian

On Sun, Jul 21, 2019 at 03:08:01PM -0400, Nathan Hjelm wrote:

Patches are always welcome. What would be great is a nice big warning that CMA 
support is disabled because the processes are on different namespaces. Ideally 
all MPI processes should be on the same namespace to ensure the best 
performance.

-Nathan


On Jul 21, 2019, at 2:53 PM, Adrian Reber via users  
wrote:

For completeness I am mentioning my results also here.

Mounting file systems in the container only works if user namespaces are
used, and even if the user IDs are all the same (in each container and on
the host), for ptrace to be allowed the kernel also checks whether the
processes are in the same user namespace (in addition to being owned by
the same user). This check - same user namespace - fails, and so
process_vm_readv() and process_vm_writev() will also fail.

So Open MPI's checks are currently not enough to detect if 'cma' can be
used. Checking for the same user namespace would also be necessary.

Is this a use case important enough to accept a patch for it?

Adrian


On Fri, Jul 12, 2019 at 03:42:15PM +0200, Adrian Reber via users wrote:
Gilles,

thanks again. Adding '--mca btl_vader_single_copy_mechanism none' helps
indeed.

The default seems to be 'cma' and that seems to use process_vm_readv()
and process_vm_writev(). That seems to require CAP_SYS_PTRACE, but
telling Podman to give the process CAP_SYS_PTRACE with '--cap-add=SYS_PTRACE'
does not seem to be enough. Not sure yet if this related to the fact
that Podman is running rootless. I will continue to investigate, but now
I know where to look. Thanks!

Adrian


On Fri, Jul 12, 2019 at 06:48:59PM +0900, Gilles Gouaillardet via users wrote:
Adrian,

Can you try
mpirun --mca btl_vader_copy_mechanism none ...

Please double check the MCA parameter name, I am AFK

IIRC, the default copy mechanism used by vader directly accesses the remote 
process address space, and this requires some permission (ptrace?) that might 
be dropped by podman.

Note Open MPI might not detect both MPI tasks run on the same node because of 
podman.
If you use UCX, then btl/vader is not used at all (pml/ucx is used instead)


Cheers,

Gilles

Sent from my iPod


On Jul 12, 2019, at 18:33, Adrian Reber via users  
wrote:

So upstream Podman was really fast and merged a PR which makes my
wrapper unnecessary:

Add support for --env-host : https://github.com/containers/libpod/pull/3557

As commented in the PR I can now start mpirun with Podman without a
wrapper:

$ mpirun --hostfile ~/hosts --mca orte_tmpdir_base /tmp/podman-mpirun podman 
run --env-host --security-opt label=disable -v 
/tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id --net=host mpi-test 
/home/mpi/ring
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 0 has completed ring
Rank 0 has completed MPI_Barrier
Rank 1 has completed ring
Rank 1 has completed MPI_Barrier

This is example was using TCP and on an InfiniBand based system I have
to map the InfiniBand devices into the container.

$ mpirun --mca btl ^openib --hostfile ~/hosts --mca orte_tmpdir_base 
/tmp/podman-mpirun podman run --env-host -v 
/tmp/podman-mpirun:/tmp/podman-mpirun --security-opt label=disable 
--userns=keep-id --device /dev/infiniband/uverbs0 --device 
/dev/infiniband/umad0 --device /dev/infiniband/rdma_cm --net=host mpi-test 
/home/mpi/ring
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 0 has completed ring
Rank 0 has completed MPI_Barrier
Rank 1 has completed ring
Rank 1 has completed MPI_Barrier

This is all running without root and only using Podman's rootless
support.

Running multiple processes on one system, however, still gives me an
error. If I disable vader I guess that Open MPI is using TCP for
localhost communication and that works. But with vader it fails.

The first error message I get is a segfault:

[test1:1] *** Process received signal ***
[test1:1] Signal: Segmentation fault (11)
[test1:1] Signal code: Address not mapped (1)
[test1:1] Failing at address: 0x7fb7b1552010
[test1:1] [ 0] /lib64/libpthread.so.0(+0x12d80

Re: [OMPI users] How is the rank determined (Open MPI and Podman)

2019-07-12 Thread Gilles Gouaillardet via users
>> thanks for pointing out the environment variables. I quickly created a
>> wrapper which tells Podman to re-export all OMPI_ and PMIX_ variables
>> (grep "\(PMIX\|OMPI\)"). Now it works:
>> 
>> $ mpirun --hostfile ~/hosts ./wrapper -v /tmp:/tmp --userns=keep-id 
>> --net=host mpi-test /home/mpi/hello
>> 
>> Hello, world (2 procs total)
>>--> Process #   0 of   2 is alive. ->test1
>>--> Process #   1 of   2 is alive. ->test2
>> 
>> I need to tell Podman to mount /tmp from the host into the container. As
>> I am running rootless, I also need to tell Podman to use the same user ID
>> in the container as outside (so that the Open MPI files in /tmp can be
>> shared), and I am also running without a network namespace.
>> 
>> So this is now with the full Podman-provided isolation except the
>> network namespace. Thanks for your help!
>> 
>>Adrian
>> 
>>> On Thu, Jul 11, 2019 at 04:47:21PM +0900, Gilles Gouaillardet via users 
>>> wrote:
>>> Adrian,
>>> 
>>> 
>>> the MPI application relies on some environment variables (they typically
>>> start with OMPI_ and PMIX_).
>>> 
>>> The MPI application internally uses a PMIx client that must be able to
>>> contact a PMIx server located on the same host (that is included in
>>> mpirun and the orted daemon(s) spawned on the remote hosts).
>>> 
>>> 
>>> If podman provides some isolation between the app inside the container (e.g.
>>> /home/mpi/hello)
>>> 
>>> and the outside world (e.g. mpirun/orted), that won't be an easy ride.
>>> 
>>> 
>>> Cheers,
>>> 
>>> 
>>> Gilles
>>> 
>>> 
>>>> On 7/11/2019 4:35 PM, Adrian Reber via users wrote:
>>>> I did a quick test to see if I can use Podman in combination with Open
>>>> MPI:
>>>> 
>>>> [test@test1 ~]$ mpirun --hostfile ~/hosts podman run 
>>>> quay.io/adrianreber/mpi-test /home/mpi/hello
>>>> 
>>>>  Hello, world (1 procs total)
>>>> --> Process #   0 of   1 is alive. ->789b8fb622ef
>>>> 
>>>>  Hello, world (1 procs total)
>>>> --> Process #   0 of   1 is alive. ->749eb4e1c01a
>>>> 
>>>> The test program (hello) is taken from 
>>>> https://raw.githubusercontent.com/openhpc/ohpc/obs/OpenHPC_1.3.8_Factory/tests/mpi/hello.c
>>>> 
>>>> 
>>>> The problem with this is that each process thinks it is process 0 of 1
>>>> instead of
>>>> 
>>>>  Hello, world (2 procs total)
>>>> --> Process #   1 of   2 is alive.  ->test1
>>>> --> Process #   0 of   2 is alive.  ->test2
>>>> 
>>>> My question is: how is the rank determined? What resources do I need to
>>>> have in my container to correctly determine the rank?
>>>> 
>>>> This is Podman 1.4.2 and Open MPI 4.0.1.
>>>> 
>>>>Adrian
>>>> ___
>>>> users mailing list
>>>> users@lists.open-mpi.org
>>>> https://lists.open-mpi.org/mailman/listinfo/users
>>>> 
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
> 
>Adrian
> 
> -- 
> Adrian Reber http://lisas.de/~adrian/
> The data on your hard drive is out of balance.
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] How is the rank determined (Open MPI and Podman)

2019-07-11 Thread Gilles Gouaillardet via users

Adrian,


the MPI application relies on some environment variables (they typically 
start with OMPI_ and PMIX_).


The MPI application internally uses a PMIx client that must be able to
contact a PMIx server located on the same host (that is included in
mpirun and the orted daemon(s) spawned on the remote hosts).


If podman provides some isolation between the app inside the container 
(e.g. /home/mpi/hello)


and the outside world (e.g. mpirun/orted), that won't be an easy ride.


Cheers,


Gilles


On 7/11/2019 4:35 PM, Adrian Reber via users wrote:

I did a quick test to see if I can use Podman in combination with Open
MPI:

[test@test1 ~]$ mpirun --hostfile ~/hosts podman run 
quay.io/adrianreber/mpi-test /home/mpi/hello

  Hello, world (1 procs total)
 --> Process #   0 of   1 is alive. ->789b8fb622ef

  Hello, world (1 procs total)
 --> Process #   0 of   1 is alive. ->749eb4e1c01a

The test program (hello) is taken from 
https://raw.githubusercontent.com/openhpc/ohpc/obs/OpenHPC_1.3.8_Factory/tests/mpi/hello.c


The problem with this is that each process thinks it is process 0 of 1
instead of

  Hello, world (2 procs total)
 --> Process #   1 of   2 is alive.  ->test1
 --> Process #   0 of   2 is alive.  ->test2

My question is: how is the rank determined? What resources do I need to have
in my container to correctly determine the rank?

This is Podman 1.4.2 and Open MPI 4.0.1.

Adrian
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Naming scheme of PSM2 and Vader shared memory segments

2019-07-07 Thread Gilles Gouaillardet via users

Sebastian,


the PSM2 shared memory segment name is set by the PSM2 library, and my
understanding is that Open MPI has no control over it.


If you believe the root cause of the crash is related to a non-unique
PSM2 shared memory segment name, I guess you should report this at
https://github.com/intel/opa-psm2




Below is a snippet from ptl_am/am_reqrep_shmem.c



Cheers,


Gilles


psm2_error_t psmi_shm_create(ptl_t *ptl_gen)
{
    // ...
    snprintf(shmbuf,
             sizeof(shmbuf),
             "/psm2_shm.%ld%016lx%d",
             (long int) getuid(),
             ep->epid,
             iterator);
    amsh_keyname = psmi_strdup(NULL, shmbuf);
    // ...
    shmfd = shm_open(amsh_keyname, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);



On 7/5/2019 4:13 AM, Kraus, Sebastian via users wrote:

Hi all,
is anyone around there who could explain to me how the naming scheme for the
PSM2 and Vader shared memory segments is constructed?
I am curious if there is a possibility to influence the naming scheme via
run-time parameters. I am confronted with a situation where distinct
SLURM jobs of the same user on the same node randomly segfault. I suppose that
the problem is connected with the non-unique naming
scheme of the PSM2 shared memory segments (as determined by openmpi/SLURM).
The PSM segments show the following naming convention:
/dev/shm/psm2_shm.[user_id][some_mask]
Unfortunately, the values of the mask do not change for distinct SLURM jobs.
Instead, the names of the Vader segments show uniqueness for
different process ids:
/dev/shm/vader_segment.[nodename].[some_process-mask].[SLURM_STEPID]

An example:

Vader segments:
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.5
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.3
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.1
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.7
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.6
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.0
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.2
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93e1.4
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.7
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.5
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.1
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.4
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.3
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.0
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.6
-rw------- 1 XXX YYY 4.1M Jul  4 19:09 /dev/shm/vader_segment.nodename.93650001.2

PSM2 segments:
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.11764850007ffe00
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.11764850006ffc00
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.11764850005ffa00
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.11764850003ff600
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.11764850002ff400
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.11764850001ff200
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.1176485ff000
-rw------- 1 XXX YYY 4.2M Jul  4 19:09 /dev/shm/psm2_shm.11764850004ff800

Thanks for your time and support
Sebastian


Sebastian Kraus

Technische Universität Berlin
Fakultät II
Institut für Chemie
Sekretariat C3
Straße des 17. Juni 135
10623 Berlin
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Problems with MPI_Comm_spawn

2019-07-02 Thread Gilles Gouaillardet via users

Thanks for the report,


this is indeed a bug I fixed at https://github.com/open-mpi/ompi/pull/6790

meanwhile, you can manually download and apply the patch at 
https://github.com/open-mpi/ompi/pull/6790.patch



Cheers,


Gilles


On 7/3/2019 1:30 AM, Gyevi-Nagy László via users wrote:


Hi,

I had some issues with spawning processes in Fortran. I currently use
Open MPI v4.0.1. I've looked into it and stumbled upon a few errors in
ompi_comm_spawn_f().


If 8-byte integers are used, when the function is called with 
array_of_errcodes=MPI_ERRCODES_IGNORE, the array c_array_of_errcodes 
does not get allocated but it is freed before the function returns. 
Also, when array_of_errcodes is not MPI_ERRCODES_IGNORE, size (the 
size of the calling communicator) * sizeof(int) bytes are allocated 
for c_array_of_errcodes, but it is filled - correctly - with maxprocs 
(the number of spawned processes) integers in mpi_comm_spawn().
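
For illustration, a hedged sketch of the call patterns that exercise both
reported paths ('./child' is hypothetical; any MPI program that also
disconnects from its parent would do):

! hypothetical sketch, assuming a child binary ./child exists
program spawn_errcodes
  use mpi
  implicit none
  integer :: intercomm, ierr
  integer :: errcodes(2)

  call MPI_Init(ierr)
  ! path 1: MPI_ERRCODES_IGNORE (the case that is freed but never allocated)
  call MPI_Comm_spawn('./child', MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0, &
                      MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, ierr)
  call MPI_Comm_disconnect(intercomm, ierr)
  ! path 2: an error-code array, which gets filled with maxprocs entries
  call MPI_Comm_spawn('./child', MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0, &
                      MPI_COMM_SELF, intercomm, errcodes, ierr)
  call MPI_Comm_disconnect(intercomm, ierr)
  call MPI_Finalize(ierr)
end program spawn_errcodes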


I also had trouble with -map-by node and mpi_comm_spawn. This bug has 
since been fixed (Oct 6, 2018, commit 51acbf7) but I can only see it 
on the master branch. Could you tell me when it will be available in a 
stable release?


Thank you in advance,

László Gyevi-Nagy


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Possible bugs in MPI_Neighbor_alltoallv()

2019-06-27 Thread Gilles Gouaillardet via users

Thanks Junchao,


I issued https://github.com/open-mpi/ompi/pull/6782 in order to fix this 
(and the alltoallw variant as well)


Meanwhile, you can manually download and apply the patch at 
https://github.com/open-mpi/ompi/pull/6782.patch




Cheers,


Gilles

On 6/28/2019 1:10 PM, Zhang, Junchao via users wrote:

Hello,
  When I do MPI_Neighbor_alltoallv or MPI_Ineighbor_alltoallv, I find that
when either outdegree or indegree is zero, OpenMPI will return an error.
The suspicious code is at pneighbor_alltoallv.c / pineighbor_alltoallv.c:


101         } else if ((NULL == sendcounts) || (NULL == sdispls) ||
102             (NULL == recvcounts) || (NULL == rdispls) ||
103             MPI_IN_PLACE == sendbuf || MPI_IN_PLACE == recvbuf) {
104             return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME);
105         }

Apparently, the counts/displs error-checking should only be done when
degree != 0.
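
For illustration, a legal zero-degree call can be set up as in the hedged
Fortran sketch below (in C one may then pass NULL for the count/displacement
arrays, since they are never dereferenced when both degrees are zero):

! hedged sketch: a topology with zero in/out degrees
program zero_degree
  use mpi
  implicit none
  integer :: comm_graph, ierr
  integer :: empty(0)           ! zero-size: no neighbors at all
  real    :: sbuf(1), rbuf(1)   ! never touched when degrees are zero

  call MPI_Init(ierr)
  call MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 0, empty, empty, &
       0, empty, empty, MPI_INFO_NULL, .false., comm_graph, ierr)
  call MPI_Neighbor_alltoallv(sbuf, empty, empty, MPI_REAL, &
       rbuf, empty, empty, MPI_REAL, comm_graph, ierr)
  call MPI_Comm_free(comm_graph, ierr)
  call MPI_Finalize(ierr)
end program zero_degree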


Thanks.
--Junchao Zhang

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] undefined reference error related to ucx

2019-06-25 Thread Gilles Gouaillardet via users

Passant,


The support for UCX 1.6 has been backported into the v4.0.x branch, but
only after Open MPI 4.0.1 was released; it will be available in Open MPI 4.0.2.


Meanwhile, you can manually download and apply the patch at 
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/6748.patch




Cheers,


Gilles

On 6/25/2019 5:51 PM, Passant A. Hafez via users wrote:

Thanks Gilles!

The thing is I'm having this error
ud_iface.c:271  UCX Assertion `qp_init_attr.cap.max_inline_data >= 
UCT_UD_MIN_INLINE' failed
and core files.

I looked that up, and it was suggested at
https://github.com/openucx/ucx/issues/3336 that UCX 1.6 might solve this
issue, so I tried the pre-release version just to check if it would.




All the best,
--
Passant


From: users  on behalf of Gilles Gouaillardet via 
users 
Sent: Tuesday, June 25, 2019 11:27 AM
To: Open MPI Users
Cc: Gilles Gouaillardet
Subject: Re: [OMPI users] undefined reference error related to ucx

Passant,

UCX 1.6.0 is not yet officially released, and it seems Open MPI
(4.0.1) does not support it yet, and some porting is needed.

Cheers,

Gilles

On Tue, Jun 25, 2019 at 5:13 PM Passant A. Hafez via users
 wrote:

Hello,


I'm trying to build ompi 4.0.1 with external ucx 1.6.0 but I'm getting


../../../opal/.libs/libopen-pal.so: undefined reference to 
`uct_ep_create_connected'
collect2: error: ld returned 1 exit status

configure line for ompi
./configure --prefix=/opt/ompi401_ucx16 --with-slurm --with-hwloc=internal 
--with-pmix=internal --enable-shared --enable-static --with-x 
--with-ucx=/opt/ucx-1.6.0

configure line for ucx
./configure --prefix=/opt/ucx-1.6.0


What could be the reason?






All the best,
--
Passant
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] undefined reference error related to ucx

2019-06-25 Thread Gilles Gouaillardet via users
Passant,

UCX 1.6.0 is not yet officially released, and it seems Open MPI
(4.0.1) does not support it yet, and some porting is needed.

Cheers,

Gilles

On Tue, Jun 25, 2019 at 5:13 PM Passant A. Hafez via users
 wrote:
>
> Hello,
>
>
> I'm trying to build ompi 4.0.1 with external ucx 1.6.0 but I'm getting
>
>
> ../../../opal/.libs/libopen-pal.so: undefined reference to 
> `uct_ep_create_connected'
> collect2: error: ld returned 1 exit status
>
> configure line for ompi
> ./configure --prefix=/opt/ompi401_ucx16 --with-slurm --with-hwloc=internal 
> --with-pmix=internal --enable-shared --enable-static --with-x 
> --with-ucx=/opt/ucx-1.6.0
>
> configure line for ucx
> ./configure --prefix=/opt/ucx-1.6.0
>
>
> What could be the reason?
>
>
>
>
>
>
> All the best,
> --
> Passant
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] error running mpirun command

2019-05-03 Thread Gilles Gouaillardet via users
Eric,

Which version of Open MPI are you using? How many hosts are in your hostsfile?

The error message suggests this could be a bug within Open MPI, and a
potential workaround for you would be to try
mpirun -np 84 --hostfile hostsfile --mca routed direct ./openmpi_hello.c

You might also want to double check all your hosts can access each
other with TCP/IP and on all ports (e.g. no firewall should be
running)

Cheers,

Gilles


On Sat, May 4, 2019 at 9:41 AM Eric F. Alemany via users
 wrote:
>
> Hello everyone,
>
> I am new to Open MPI, please forgive my beginner mistakes. I read
> through the FAQ on the open-mpi.org website and built a small cluster (9 nodes -
> including a master node).
> I thought I followed the instructions accordingly, but I am having an issue
> running a simple mpirun.
>
> $ mpirun -np 84 --hostfile hostsfile ./openmpi_hello.c
>
> mpirun: Forwarding signal 20 to job
> --
> ORTE does not know how to route a message to the specified daemon
> located on the indicated node:
>
>   my node:   phaser-manager
>   target node:  radonc-phaser01
>
> This is usually an internal programming error that should be
> reported to the developers. In the meantime, a workaround may
> be to set the MCA param routed=direct on the command line or
> in your environment. We apologize for the problem.
> —
>
> I don't understand the meaning of the error message. I can share more of my
> configuration files if someone would be interested in helping me.
>
> Thank you in advance for your help.
>
>
> Best,
> Eric
>
> _
>
> Eric F.  Alemany
> System Administrator for Research
>
> IRT
> Division of Radiation & Cancer  Biology
> Department of Radiation Oncology
>
> Stanford University School of Medicine
> Stanford, California 94305
>
> Tel:1-650-498-7969  No Texting
> Fax:1-650-723-7382
>
>
>
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] 3.0.4, 4.0.1 build failure on OSX Mojave with LLVM

2019-04-24 Thread Gilles Gouaillardet via users
John,

what if you move some parameters to CPPFLAGS and CXXCPPFLAGS (see the
new configure command line below)

Cheers,

Gilles

'/Users/cary/projects/ulixesall-llvm/builds/openmpi-4.0.1/nodl/../configure'
\
--prefix=/Volumes/GordianStorage/opt/contrib-llvm7_appleclang/openmpi-4.0.1-nodl
\
CC='/Volumes/GordianStorage/opt/clang+llvm-7.0.0-x86_64-apple-darwin/bin/clang'
\
CXX='/Volumes/GordianStorage/opt/clang+llvm-7.0.0-x86_64-apple-darwin/bin/clang++'
\
   FC='/opt/homebrew/bin/gfortran-6' \
   F77='/opt/homebrew/bin/gfortran-6' \
   CFLAGS='-fvisibility=default -mmacosx-version-min=10.10 -fPIC -pipe' \
   
CPPFLAGS='-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include
-DSTDC_HEADERS' \
  CXXFLAGS='-std=c++11 -stdlib=libc++ -fvisibility=default
-mmacosx-version-min=10.10 -fPIC -pipe' \
  
CXXCPPFLAGS='-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include'
\
   FCFLAGS='-fPIC -pipe' \
   --enable-static \
   --with-pic \
   --disable-dlopen \
   --enable-mpirun-prefix-by-default

On Wed, Apr 24, 2019 at 7:01 PM Jeff Squyres (jsquyres) via users
 wrote:
>
> Can you send at least your config.log file?  It would be good to know why the 
> "STDC" test is reporting that this setup does not have STDC headers when it 
> actually does.
>
>
> > On Apr 23, 2019, at 8:14 PM, John R. Cary  wrote:
> >
> > It appears that the problem is with AC_HEADER_STDC, which is reporting
> > that this setup does not have stdc headers when in fact it does.
> >
> > Overriding with
> >
> >  CFLAGS='-fvisibility=default -mmacosx-version-min=10.10 
> > -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include
> >  -fPIC -pipe -DSTDC_HEADERS -DOPAL_STDC_HEADERS'
> >
> > in particular, the last two defines, fixes this.
> >
> > .John
> >
> >
> >
> > On 4/23/2019 4:59 PM, Jeff Squyres (jsquyres) wrote:
> >> The version of LLVM that I have installed on my Mac 10.14.4 is:
> >>
> >> $ where clang
> >> /usr/bin/clang
> >> $ clang --version
> >> Apple LLVM version 10.0.1 (clang-1001.0.46.4)
> >> Target: x86_64-apple-darwin18.5.0
> >> Thread model: posix
> >> InstalledDir: 
> >> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> >>
> >> I don't know how that compares to upstream clang v7.x...?
> >>
> >> More below.
> >>
> >>
> >>
> >>> On Apr 23, 2019, at 6:26 PM, John R. Cary via users 
> >>>  wrote:
> >>>
> >>> The failure is
> >>>
> >>> In file included from 
> >>> /Users/cary/projects/ulixesall-llvm/builds/openmpi-4.0.1/nodl/../ompi/datatype/ompi_datatype_external.c:29:
> >>> In file included from 
> >>> /Users/cary/projects/ulixesall-llvm/builds/openmpi-4.0.1/nodl/../ompi/communicator/communicator.h:38:
> >>> In file included from 
> >>> /Users/cary/projects/ulixesall-llvm/builds/openmpi-4.0.1/nodl/../ompi/errhandler/errhandler.h:33:
> >>> ../../ompi/include/mpi.h:397:9: error: unknown type name 'ptrdiff_t'
> >>> typedef OMPI_MPI_AINT_TYPE MPI_Aint;
> >>> ^
> >>> ../../opal/include/opal_config.h:1911:28: note: expanded from macro 
> >>> 'OMPI_MPI_AINT_TYPE'
> >>> #define OMPI_MPI_AINT_TYPE ptrdiff_t
> >>>
> >>>
> >>> Is there a known fix for this?
> >>>
> >>> Thx...John Cary
> >>>
> >>>
> >>> More info:
> >>>
> >>> Configured with
> >>>
> >> A few notes on your configure line:
> >>
> >>> '/Users/cary/projects/ulixesall-llvm/builds/openmpi-4.0.1/nodl/../configure'
> >>>  \
> >>> --prefix=/Volumes/GordianStorage/opt/contrib-llvm7_appleclang/openmpi-4.0.1-nodl
> >>>  \
> >>> CC='/Volumes/GordianStorage/opt/clang+llvm-7.0.0-x86_64-apple-darwin/bin/clang'
> >>>  \
> >>> CXX='/Volumes/GordianStorage/opt/clang+llvm-7.0.0-x86_64-apple-darwin/bin/clang++'
> >>>  \
> >> The MPI C++ bindings are no longer built by default, and you're not 
> >> enabling them (via --enable-mpi-cxx), so you don't need to specify CXX or 
> >> CXXFLAGS here.
> >>
> >>>   FC='/opt/homebrew/bin/gfortran-6' \
> >>>   F77='/opt/homebrew/bin/gfortran-6' \
> >> F77 is ignored these days; FC is the only one that matters.
> >>
> >>>   CFLAGS='-fvisibility=default -mmacosx-version-min=10.10 
> >>> -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include
> >>>  -fPIC -pipe -DSTDC_HEADERS' \
> >> Do you need all these CFLAGS?  E.g., does clang not -I that directory by 
> >> default (I don't actually know if it is necessary or not)?  What does 
> >> -DSTDC_HEADERS do?
> >>
> >>>   CXXFLAGS='-std=c++11 -stdlib=libc++ -fvisibility=default 
> >>> -mmacosx-version-min=10.10 
> >>> -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include
> >>>  -fPIC -pipe' \
> >>>   FCFLAGS='-fPIC -pipe' \
> >>>   --enable-static \
> >>>   --with-pic \
> >>>   --disable-dlopen \
> >>>   --enable-mpirun-prefix-by-default
> >> I don't have that version of clang, 

Re: [OMPI users] fatal error: ac_nonexistent.h: No such file or directory (openmpi-4.0.0)

2019-04-20 Thread Gilles Gouaillardet via users
The root cause is that configure cannot run a simple Fortran program
(see the relevant log below).

I suggest you

export LD_LIBRARY_PATH=/share/apps/gcc-5.4.0/lib64:$LD_LIBRARY_PATH
and then try again.

Cheers,

Gilles

configure:44254: checking Fortran value of selected_int_kind(4)
configure:44281: /share/apps/gcc-5.4.0/bin/gfortran -o conftest
-L/share/apps/gcc-5.4.0/lib64  conftest.f90  -lz >&5
configure:44281: $? = 0
configure:44281: ./conftest
./conftest: /usr/lib64/libgfortran.so.3: version `GFORTRAN_1.4' not
found (required by ./conftest)
configure:44281: $? = 1
configure: program exited with status 1
configure: failed program was:
|   program main
|
| open(8, file="conftest.out")
| write(8, fmt="(I5)") selected_int_kind(4)
| close(8)
|
|   end
configure:44303: result: no
configure:44314: WARNING: Could not determine KIND value of C_SIGNED_CHAR
configure:44316: WARNING: See config.log for more details
configure:44318: error: Cannot continue

On Sun, Apr 21, 2019 at 9:32 AM Guido granda muñoz via users
 wrote:
>
> Hello openmpi users,
> I recently got an error during the instalation of openmpi-4.0.0. The error 
> showed up during the configuration.
> My configuration options are the following:
>   $ ./configure 
> --prefix=/home/guido/libraries/compiled_with_gcc-5.4.0/openmpi-4.0.0 
> --enable-fortran=all FC=/share/apps/gcc-5.4.0/bin/gfortran 
> CC=/share/apps/gcc-5.4.0/bin/gcc CXX=/share/apps/gcc-5.4.0/bin/g++ 
> CPPFLAGS=-I/share/apps/gcc-5.4.0/include LDFLAGS=-L/share/apps/gcc-5.4.0/lib64
> The gcc version used is : gcc version 5.4.0
>
> I got the following error:
>
> 1)
> configure:7047: /share/apps/gcc-5.4.0/bin/gcc -E 
> -I/share/apps/gcc-5.4.0/include conftest.c
> conftest.c:10:28: fatal error: ac_nonexistent.h: No such file or directory
> compilation terminated.
> 2)
> conftest.c:88:19: fatal error: /cuda.h: No such file or directory
>
> The full content of my log file can be download from:
>
> https://pastebin.com/9Gu62FTT?fbclid=IwAR3ajK2-S2t-wzRxBsq1g0_Fw54YC2kr0YwPuxO-xhJKKpb0QEY7uZkpirA
>
> Information about my system:
>
> $ uname -a
> Linux mouruka.crya.privado 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 
> 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> I hope you can help me figure out what is wrong with my configuration
> options or if something else
> is wrong. I am not an expert, so please answer accordingly.
>
> Cheers,
>
>
> --
> Guido
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users