Re: [OMPI users] How to enable vprotocol + pessimistic message logging?

2019-01-22 Thread Aurelien Bouteiller
Hey Kiril, 

Indeed, the pessimist message logging does not support threaded accesses. That
test is, however, overly cautious if you initialize with MPI_THREAD_MULTIPLE but
then do not perform concurrent accesses. Please verify that the NPB3.3 benchmarks
call MPI_INIT_THREAD(MPI_THREAD_SINGLE), as other initializations will set
MPI_THREAD_MULTIPLE implicitly.
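
For reference, a minimal C sketch (illustrative only, not taken from NPB) showing
how to request the single-threaded level and check what the library actually provided:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Request the single-threaded level; the pessimist vprotocol refuses to
     * load when the effective thread level is MPI_THREAD_MULTIPLE. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_SINGLE, &provided);
    if (provided != MPI_THREAD_SINGLE)
        printf("provided thread level is %d, not MPI_THREAD_SINGLE\n", provided);
    MPI_Finalize();
    return 0;
}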

Best,
Aurelien

> On Jan 22, 2019, at 12:13, Kiril Dichev  wrote:
> 
> Hi,
> 
> I’m doing some research on message logging protocols. It seems that Vprotocol 
> in Open MPI can wrap around communication calls and log messages, if enabled. 
> Unfortunately, when I try to use it with Open MPI 4.0.0, I get an error:
> 
> mpirun --mca vprotocol pessimist --mca vprotocol_pessimist_priority 10 -n 4 $HOME/NPB3.3-MPI/bin/cg.B.4
> …
> vprotocol_pessimist: component_init: threads are enabled, and not supported 
> by vprotocol pessimist fault tolerant layer, will not load
> …
> 
> Unfortunately, it seems that actually disabling multi-threading is not 
> possible in 4.0.0 (MPI_THREAD_MULTIPLE is always used during compilation, and 
> in contrast to the README file, --enable-mpi-thread-multiple or 
> --disable-mpi-thread-multiple are not recognised as options). 
> 
> I’m pretty much stuck. Should I then give up on the VProtocol as unusable at
> the moment?
> 
> Thanks,
> Kiril
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-22 Thread Cabral, Matias A
Hi Matt,

There seem to be two different issues here:

a)  The warning message comes from the openib btl. Given that Omnipath has a
verbs API and you have the necessary libraries on your system, the openib btl
finds itself as a potential transport and prints the warning during its init
(the openib btl is on its way to deprecation). You may try explicitly asking for
the vader btl, given that you are running on shared memory: -mca btl self,vader
-mca pml ob1. Or better, explicitly build without openib: ./configure --with-verbs=no …
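
For instance, with the hello-world binary from later in this thread, that would
look something like:

mpirun -np 4 -mca pml ob1 -mca btl self,vader ./helloWorld.mpi3.SLES12.OMPI400.exe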

b)  Not my field of expertise, but you may have a conflict with the external
components you are using: --with-pmix=/usr/nlocal/pmix/2.1
--with-libevent=/usr. You may try not specifying these and instead using the
ones provided by OMPI.
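
For instance, a configure sketch along those lines (other options elided; adjust
to your environment):

./configure --with-verbs=no   # no --with-pmix / --with-libevent, so the bundled PMIx and libevent are used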

_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Matt Thompson
Sent: Tuesday, January 22, 2019 6:04 AM
To: Open MPI Users 
Subject: Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

Well,

By turning off UCX compilation per Howard, things get a bit better in that 
something happens! It's not a good something, as it seems to die with an 
infiniband error. As this is an Omnipath system, is OpenMPI perhaps seeing 
libverbs somewhere and compiling it in? To wit:

(1006)(master) $ mpirun -np 4 ./helloWorld.mpi3.SLES12.OMPI400.exe
--
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.

  Local host:  borgc129
  Local adapter:   hfi1_0
  Local port:  1

--
--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   borgc129
  Local device: hfi1_0
--
Compiler Version: Intel(R) Fortran Intel(R) 64 Compiler for applications 
running on Intel(R) 64, Version 18.0.5.274 Build 20180823
MPI Version: 3.1
MPI Library Version: Open MPI v4.0.0, package: Open MPI mathomp4@discover23 
Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12, 2018
[borgc129:260830] *** An error occurred in MPI_Barrier
[borgc129:260830] *** reported by process [140736833716225,46909632806913]
[borgc129:260830] *** on communicator MPI_COMM_WORLD
[borgc129:260830] *** MPI_ERR_OTHER: known error not in list
[borgc129:260830] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[borgc129:260830] ***and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
Image  PCRoutineLineSource
helloWorld.mpi3.S  0040A38E  for__signal_handl Unknown  Unknown
libpthread-2.22.s  2B9CCB20  Unknown   Unknown  Unknown
libpthread-2.22.s  2B9C90CD  pthread_cond_wait Unknown  Unknown
libpmix.so.2.1.11  2AAAB1D780A1  PMIx_AbortUnknown  Unknown
mca_pmix_ext2x.so  2AAAB1B3AA75  ext2x_abort   Unknown  Unknown
mca_ess_pmi.so 2AAAB1724BC0  Unknown   Unknown  Unknown
libopen-rte.so.40  2C3E941C  orte_errmgr_base_ Unknown  Unknown
mca_errmgr_defaul  2AAABC401668  Unknown   Unknown  Unknown
libmpi.so.40.20.0  2B3CDBC4  ompi_mpi_abortUnknown  Unknown
libmpi.so.40.20.0  2B3BB1EF  ompi_mpi_errors_a Unknown  Unknown
libmpi.so.40.20.0  2B3B99C9  ompi_errhandler_i Unknown  Unknown
libmpi.so.40.20.0  2B3E4576  MPI_Barrier   Unknown  Unknown
libmpi_mpifh.so.4  2B15EE53  MPI_Barrier_f08   Unknown  Unknown
libmpi_usempif08.  2ACE7732  mpi_barrier_f08_  Unknown  Unknown
helloWorld.mpi3.S  0040939F  Unknown   Unknown  Unknown
helloWorld.mpi3.S  0040915E  Unknown   Unknown  Unknown
libc-2.22.so   2BBF96D5  __libc_start_main Unknown  Unknown
helloWorld.mpi3.S  00409069  Unknown   Unknown  Unknown

On Sun, Jan 20, 2019 at 4:19 PM Howard Pritchard <hpprit...@gmail.com> wrote:
Hi Matt

Definitely do not include the ucx option for an Omnipath cluster. Actually, if
you accidentally installed ucx in its default location on the system, switch to
this config option:

--with-ucx=no
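
For instance, keeping your other configure options (the prefix here is a placeholder):

./configure --with-ucx=no --prefix=/path/to/install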

Otherwise you will hit

https://github.com/openucx/ucx/issues/750

Howard


Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote
on Sat, Jan 19, 2019 at 18:41:
Matt,

There are two ways of using PMIx

- if you use mpirun, then the MPI app (e.g. the PMIx client) will talk
to mpirun and orted daemons (e.g. the PMIx server)
- if you use SLURM srun, then the MPI app will directly talk to the
PMIx server provided by SLURM. 

[OMPI users] How to enable vprotocol + pessimistic message logging?

2019-01-22 Thread Kiril Dichev
Hi,

I’m doing some research on message logging protocols. It seems that Vprotocol 
in Open MPI can wrap around communication calls and log messages, if enabled. 
Unfortunately, when I try to use it with Open MPI 4.0.0, I get an error:

mpirun --mca vprotocol pessimist --mca vprotocol_pessimist_priority 10 -n 4 $HOME/NPB3.3-MPI/bin/cg.B.4
…
vprotocol_pessimist: component_init: threads are enabled, and not supported by 
vprotocol pessimist fault tolerant layer, will not load
…

Unfortunately, it seems that actually disabling multi-threading is not possible 
in 4.0.0 (MPI_THREAD_MULTIPLE is always used during compilation, and in 
contrast to the README file, --enable-mpi-thread-multiple or 
--disable-mpi-thread-multiple are not recognised as options). 
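
As a sanity check, ompi_info reports the thread support a given build provides,
e.g.:

ompi_info | grep -i thread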

I’m pretty much stuck. Should I then give up on the VProtocol as unusable at
the moment?

Thanks,
Kiril
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-22 Thread Matt Thompson
Well,

By turning off UCX compilation per Howard, things get a bit better in that
something happens! It's not a good something, as it seems to die with an
infiniband error. As this is an Omnipath system, is OpenMPI perhaps seeing
libverbs somewhere and compiling it in? To wit:

(1006)(master) $ mpirun -np 4 ./helloWorld.mpi3.SLES12.OMPI400.exe
--
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA
parameter
to true.

  Local host:  borgc129
  Local adapter:   hfi1_0
  Local port:  1

--
--
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   borgc129
  Local device: hfi1_0
--
Compiler Version: Intel(R) Fortran Intel(R) 64 Compiler for applications
running on Intel(R) 64, Version 18.0.5.274 Build 20180823
MPI Version: 3.1
MPI Library Version: Open MPI v4.0.0, package: Open MPI mathomp4@discover23
Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12, 2018
[borgc129:260830] *** An error occurred in MPI_Barrier
[borgc129:260830] *** reported by process [140736833716225,46909632806913]
[borgc129:260830] *** on communicator MPI_COMM_WORLD
[borgc129:260830] *** MPI_ERR_OTHER: known error not in list
[borgc129:260830] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
will now abort,
[borgc129:260830] ***and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
Image  PCRoutineLineSource
helloWorld.mpi3.S  0040A38E  for__signal_handl Unknown  Unknown
libpthread-2.22.s  2B9CCB20  Unknown   Unknown  Unknown
libpthread-2.22.s  2B9C90CD  pthread_cond_wait Unknown  Unknown
libpmix.so.2.1.11  2AAAB1D780A1  PMIx_AbortUnknown  Unknown
mca_pmix_ext2x.so  2AAAB1B3AA75  ext2x_abort   Unknown  Unknown
mca_ess_pmi.so 2AAAB1724BC0  Unknown   Unknown  Unknown
libopen-rte.so.40  2C3E941C  orte_errmgr_base_ Unknown  Unknown
mca_errmgr_defaul  2AAABC401668  Unknown   Unknown  Unknown
libmpi.so.40.20.0  2B3CDBC4  ompi_mpi_abortUnknown  Unknown
libmpi.so.40.20.0  2B3BB1EF  ompi_mpi_errors_a Unknown  Unknown
libmpi.so.40.20.0  2B3B99C9  ompi_errhandler_i Unknown  Unknown
libmpi.so.40.20.0  2B3E4576  MPI_Barrier   Unknown  Unknown
libmpi_mpifh.so.4  2B15EE53  MPI_Barrier_f08   Unknown  Unknown
libmpi_usempif08.  2ACE7732  mpi_barrier_f08_  Unknown  Unknown
helloWorld.mpi3.S  0040939F  Unknown   Unknown  Unknown
helloWorld.mpi3.S  0040915E  Unknown   Unknown  Unknown
libc-2.22.so   2BBF96D5  __libc_start_main Unknown  Unknown
helloWorld.mpi3.S  00409069  Unknown   Unknown  Unknown

On Sun, Jan 20, 2019 at 4:19 PM Howard Pritchard 
wrote:

> Hi Matt
>
> Definitely do not include the ucx option for an omnipath cluster.
> Actually, if you accidentally installed ucx in its default location on the
> system, switch to this config option:
>
> --with-ucx=no
>
> Otherwise you will hit
>
> https://github.com/openucx/ucx/issues/750
>
> Howard
>
>
> Gilles Gouaillardet  wrote on Sat, Jan 19, 2019 at 18:41:
>
>> Matt,
>>
>> There are two ways of using PMIx
>>
>> - if you use mpirun, then the MPI app (e.g. the PMIx client) will talk
>> to mpirun and orted daemons (e.g. the PMIx server)
>> - if you use SLURM srun, then the MPI app will directly talk to the
>> PMIx server provided by SLURM. (note you might have to srun
>> --mpi=pmix_v2 or something)
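
For instance, a direct-launch sketch under SLURM (the plugin name depends on the
PMIx your SLURM was built against):

srun --mpi=pmix_v2 -n 4 ./helloWorld.mpi3.SLES12.OMPI400.exe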
>>
>> In the former case, it does not matter whether you use the embedded or
>> external PMIx.
>> In the latter case, Open MPI and SLURM have to use compatible PMIx
>> libraries, and you can either check the cross-version compatibility
>> matrix,
>> or build Open MPI with the same PMIx used by SLURM to be on the safe
>> side (not a bad idea IMHO).
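
For example, two things one might check (the PMIx path is a placeholder for
wherever SLURM's PMIx is installed):

srun --mpi=list                                 # list the PMIx plugins SLURM provides
./configure --with-pmix=/path/to/slurm/pmix ... # build Open MPI against that same PMIx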
>>
>>
>> Regarding the hang, I suggest you try different things
>> - use mpirun in a SLURM job (e.g. sbatch instead of salloc so mpirun
>> runs on a compute node rather than on a frontend node)
>> - try something even simpler such as mpirun hostname (both with sbatch
>> and salloc)
>> - explicitly specify the network to be used for the wire-up. you can
>> for example mpirun --mca oob_tcp_if_include 192.168.0.0/24 if this is
>> the network subnet by which all the nodes (e.g. compute nodes and
>> frontend node if you use salloc) communicate.
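
A minimal batch sketch combining those suggestions (node count and subnet are
placeholders for your site):

#!/bin/bash
#SBATCH -N 2
# mpirun now runs on a compute node rather than the frontend; pin the wire-up network
mpirun --mca oob_tcp_if_include 192.168.0.0/24 hostname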
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sat, Jan 19, 2019 at 3:31 AM Matt Thompson