Re: [OMPI users] can not compile Openmpi-3.0

2018-04-02 Thread Gilles Gouaillardet
og here: https://drive.google.com/drive/u/0/folders/0B6O-L5Y7BiGJfmQ4N2FpblBEcFNxaDZnaGpsUFFEUlotVWFjajR0UFFHNk5aYlhoSHVTWkU On Tue, Apr 3, 2018 at 9:47 AM, Gilles Gouaillardet <gil...@rist.or.jp <mailto:gil...@rist.or.jp>> wrote: Can you please compress and attach your config.lo

Re: [OMPI users] can not compile Openmpi-3.0

2018-04-02 Thread Gilles Gouaillardet
Can you please compress and attach your config.log ? You might also want to double check you can compile *and* run a simple C hello world program Cheers, Gilles On 4/3/2018 1:06 PM, abhisek Mondal wrote: Hi, I need some help regarding compiling Openmpi-3.0. I have perfectly working C
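
A minimal version of the sanity check Gilles suggests (a plain, non-MPI C hello world, since configure only needs a working C toolchain at this stage; the file name hello.c is just an example):

  /* hello.c - verify the C compiler and runtime work at all */
  #include <stdio.h>

  int main(void) {
      printf("hello, world\n");
      return 0;
  }

  gcc hello.c -o hello && ./hello

If this already fails, the Open MPI configure error is a compiler/environment problem rather than an Open MPI one.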

Re: [OMPI users] can not compile Openmpi-3.0

2018-04-02 Thread Gilles Gouaillardet
PURPOSE. I have gcc and gcc-c++ installed. On Tue, Apr 3, 2018 at 10:08 AM, Gilles Gouaillardet <gil...@rist.or.jp <mailto:gil...@rist.or.jp>> wrote: This is the relevant part related to this error configure:6620: checking whether we are cross compiling configure:

Re: [OMPI users] running mpi program between my PC and an ARM-architektur raspberry

2018-04-03 Thread Gilles Gouaillardet
Let me shed a different light on that. Once in a while, I run Open MPI between x86_64 and sparcv9, and it works quite well as far as I am concerned. Note this is the master branch, and I never try older nor releases branches. Note you likely need to configure Open MPI with

Re: [OMPI users] OpenMPI slow with Infiniband

2018-03-21 Thread Gilles Gouaillardet
Supun, did you configure Open MPI with --disable-dlopen ? It was previously reported that this option disables the patcher (memory registration), which impacts performance negatively. If yes, then I suggest you reconfigure (and rebuild) without this option and see if it helps Cheers,

Re: [OMPI users] Memory Leak in 3.1.2 + UCX

2018-10-05 Thread Gilles Gouaillardet
Charles, are you saying that even if you mpirun --mca pml ob1 ... (e.g. force the ob1 component of the pml framework) the memory leak is still present ? As a side note, we strongly recommend avoiding configure --with-FOO=/usr; instead, configure --with-FOO should be used (otherwise you will end
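
To illustrate the configure convention Gilles recommends (ucx is only a placeholder for any --with-FOO option):

  ./configure --with-ucx=/usr ...   # not recommended (see the explanation above)
  ./configure --with-ucx ...        # preferred when FOO lives in the default system paths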

Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Gilles Gouaillardet
Andrei, you can mpirun --mca btl ^openib ... in order to "disable" infiniband Cheers, Gilles On Mon, Nov 12, 2018 at 9:52 AM Andrei Berceanu wrote: > > The node has an IB card, but it is a stand-alone node, disconnected from the > rest of the cluster. > I am using OMPI to communicate

Re: [OMPI users] issue compiling openmpi 3.2.1 with pmi and slurm

2018-10-10 Thread Gilles Gouaillardet
I dug a bit into the configury logic and found ./configure --prefix=/usr/local/ --with-cuda --with-slurm --with-pmi=/usr/local/slurm should do the trick, if not ./configure --prefix=/usr/local/ --with-cuda --with-slurm --with-pmi=/usr/local/slurm --with-pmi-libdir=/usr/local/slurm/lib64

Re: [OMPI users] Cannot run MPI code on multiple cores with PBS

2018-10-04 Thread Gilles Gouaillardet
In this case, some Open MPI plugins are missing some third party libraries, so you would have to ldd all the plugins (e.g. the .so files) located in /lib/openmpi in order to evidence any issue. Cheers, Gilles On Thu, Oct 4, 2018 at 4:34 PM John Hearns via users wrote: > > Michele one tip:
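
A sketch of that ldd check, assuming Open MPI was installed under <prefix> (replace with your actual install path):

  for f in <prefix>/lib/openmpi/*.so; do echo "== $f"; ldd "$f" | grep "not found"; done

Any plugin printing a "not found" line is missing a third party library at run time.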

Re: [OMPI users] Fwd: Minimum time between MPI_Bcast or MPI_Reduce calls?

2019-01-18 Thread Gilles Gouaillardet
Jeff, that could be a copy/paste error and/or an email client issue. The syntax is mpirun --mca variable value ... (short hyphen, short hyphen, m, c, a) The error message is about the missing —-mca executable (long hyphen, short hyphen, m, c, a) This is most likely the root cause of this

Re: [OMPI users] Help Getting Started with Open MPI and PMIx and UCX

2019-01-19 Thread Gilles Gouaillardet
Matt, There are two ways of using PMIx - if you use mpirun, then the MPI app (e.g. the PMIx client) will talk to mpirun and orted daemons (e.g. the PMIx server) - if you use SLURM srun, then the MPI app will directly talk to the PMIx server provided by SLURM. (note you might have to srun
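
The two launch paths, as commands (pmix_v2 is only an example module name; srun --mpi=list shows what your SLURM build actually provides):

  mpirun -np 4 ./a.out               # PMIx server provided by mpirun/orted
  srun --mpi=list                    # list the PMI/PMIx modules SLURM offers
  srun --mpi=pmix_v2 -n 4 ./a.out    # PMIx server provided by SLURM itself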

Re: [OMPI users] Unable to build Open MPI with external PMIx library support

2018-12-17 Thread Gilles Gouaillardet
Eduardo, By config.log, we mean the config.log automatically generated by your configure command (e.g. not the output of the configure command) this is a huge file, so please compress it Cheers, Gilles this file should start with This file contains any messages produced by compilers

Re: [OMPI users] questions about attribute caching

2018-12-15 Thread Gilles Gouaillardet
Hi, Your understanding is incorrect: "Attributes are local to the process and specific to the communicator to which they are attached." (per https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-1.1/node119.htm) Keep in mind an attribute is often a pointer, and really bad things can
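
A minimal C sketch of why this matters: the cached attribute is typically a pointer into the local process, so it is only meaningful where it was set (MPI_COMM_NULL_COPY_FN / MPI_COMM_NULL_DELETE_FN are the predefined no-op callbacks):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      int keyval, flag, *val;
      int data = 42;                      /* lives in this process only */
      MPI_Init(&argc, &argv);
      MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                             &keyval, NULL);
      MPI_Comm_set_attr(MPI_COMM_WORLD, keyval, &data);  /* caches a local pointer */
      MPI_Comm_get_attr(MPI_COMM_WORLD, keyval, &val, &flag);
      if (flag) printf("cached value: %d\n", *val);      /* valid locally only */
      MPI_Comm_free_keyval(&keyval);
      MPI_Finalize();
      return 0;
  }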

Re: [OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-20 Thread Gilles Gouaillardet
Adam, Are you using btl/tcp (e.g. plain TCP/IP) for internode communications ? Or are you using libfabric on top of the latest EC2 drivers ? There is no flow control in btl/tcp, which means, for example, if all your nodes send messages to rank 0, that can create a lot of unexpected messages on

Re: [OMPI users] Querying/limiting OpenMPI memory allocations

2018-12-20 Thread Gilles Gouaillardet
ranks at these >> points. So, does your comment about using the coll/sync module apply in >> this case? I'm not familiar with this module - is this something I specify >> at OpenMPI compile time or a runtime option that I enable? >> >> Thanks for the detailed help.

Re: [OMPI users] number of exchange messages

2018-12-12 Thread Gilles Gouaillardet
Hi, Open MPI does not do this out of the box. MPI profilers can achieve that. Some are commercial (ITAC from Intel, MAP from Allinea/ARM) and some are free (Score-P). Another option is to build your own mini-profiler with the PMPI interface. You can find an example at
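
A minimal sketch of such a PMPI mini-profiler (only MPI_Send is intercepted here; a real tool would wrap more calls and aggregate across ranks):

  /* count_send.c - compile and link ahead of the application objects */
  #include <mpi.h>
  #include <stdio.h>

  static long send_count = 0;

  int MPI_Send(const void *buf, int count, MPI_Datatype dtype,
               int dest, int tag, MPI_Comm comm) {
      send_count++;                                          /* bookkeeping */
      return PMPI_Send(buf, count, dtype, dest, tag, comm);  /* do the real send */
  }

  int MPI_Finalize(void) {
      int rank;
      PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
      printf("rank %d issued %ld MPI_Send calls\n", rank, send_count);
      return PMPI_Finalize();
  }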

Re: [OMPI users] singularity support

2018-12-12 Thread Gilles Gouaillardet
My understanding is that MPI tasks will be launched inside a singularity container. In a typical environment, mpirun spawns an instance of orted on each node, and then each orted daemon (or mpirun on the local node) forks the MPI tasks (a.out). With singularity, orted would fork a

Re: [OMPI users] questions about attribute caching

2018-12-16 Thread Gilles Gouaillardet
e MPI_Comm_dup() once with the help of attribute, > instead of duplicating the MPI_COMM_WORLD whenever there is a sequential > execution. Am I right? > > Best wishes! > > Gilles Gouaillardet 于2018年12月16日周日 上午8:14写道: >> >> Hi, >> >> Your understanding is i

Re: [OMPI users] MPI_Reduce_Scatter Segmentation Fault with Intel 2019 Update 1 Compilers on OPA-1

2018-12-04 Thread Gilles Gouaillardet
Thanks for the report. As far as I am concerned, this is a bug in the IMB benchmark, and I issued a PR to fix that https://github.com/intel/mpi-benchmarks/pull/11 Meanwhile, you can manually download and apply the patch at https://github.com/intel/mpi-benchmarks/pull/11.patch Cheers,
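
The manual step, spelled out (run from the top of the mpi-benchmarks source tree; -p1 assumes the usual a/ b/ prefixes of a GitHub patch):

  wget https://github.com/intel/mpi-benchmarks/pull/11.patch
  patch -p1 < 11.patch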

Re: [OMPI users] MPI_Reduce_Scatter Segmentation Fault with Intel 2019 Update 1 Compilers on OPA-1

2018-12-04 Thread Gilles Gouaillardet
Thanks Mikhail, You have a good point. With the current semantic used in the IMB benchmark, this cannot be equivalent to MPI_Reduce() of N bytes followed by MPI_Scatterv() of N bytes. So this is indeed a semantical question : what should be a MPI_Reduce_scatter() of N bytes equivalent

Re: [OMPI users] OpenMPI2 + slurm

2018-11-23 Thread Gilles Gouaillardet
Lothar, it seems you did not configure Open MPI with --with-pmi= If SLURM was built with PMIx support, then another option is to use that. First, srun --mpi=list will show you the list of available MPI modules, and then you could srun --mpi=pmix_v2 ... MPI_Hellow If you believe that should be

Re: [OMPI users] Building OpenMPI with Lustre support using PGI fails

2018-11-27 Thread Gilles Gouaillardet
Folks, sorry for the late follow-up. The config.log was indeed sent offline. Here is the relevant part : configure:294375: checking for required lustre data structures configure:294394: pgcc -O -DNDEBUG   -Iyes/include -c conftest.c PGC-S-0040-Illegal use of symbol, u_int64_t

Re: [OMPI users] Increasing OpenMPI RMA win attach region count.

2019-01-10 Thread Gilles Gouaillardet
Jeff, At first glance, a comment in the code suggests the rationale is to minimize the number of allocations and hence the time spent registering the memory. Cheers, Gilles Jeff Hammond wrote: >Why is this allocated statically? I dont understand the difficulty of a >dynamically allocates

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-10 Thread Gilles Gouaillardet
Eduardo, You have two options to use OmniPath - “directly” via the psm2 mtl mpirun --mca pml cm --mca mtl psm2 ... - “indirectly” via libfabric mpirun --mca pml cm --mca mtl ofi ... I do invite you to try both. By explicitly requesting the mtl you will avoid potential conflicts. libfabric is used

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-09 Thread Gilles Gouaillardet
Eduardo, The first part of the configure command line is for an install in /usr, but then there is ‘--prefix=/opt/openmpi/4.0.0’ and this is very fishy. You should also use ‘--with-hwloc=external’. How many nodes are you running on and which interconnect are you using ? What if you mpirun --mca pml

Re: [OMPI users] How do I build 3.1.0 (or later) with mellanox's libraries

2018-09-14 Thread Gilles Gouaillardet
Alan, Can you please compress and post your config.log ? My understanding of the mentioned commit is it does not build the reachable/netlink component if libnl version 1 is used (by third party libs such as mxm). I do not believe it should abort configure Cheers, Gilles On Saturday, September

Re: [OMPI users] rmaps_base_oversubscribe Option in Open MPI 4.0

2019-01-27 Thread Gilles Gouaillardet
Ben, This is a bug that will be fixed in 4.0.1 (it is already fixed in the v4.0.x branch) meanwhile, you can add rmaps_base_mapping_policy=numa:OVERSUBSCRIBE in your openmpi-mca-params.conf. Note the default policy is to bind to NUMA domain if there are more than two MPI tasks, and bind

Re: [OMPI users] OpenMPI 3 without network connection

2019-01-28 Thread Gilles Gouaillardet
Patrick, Does “no network is available” means the lo interface (localhost 127.0.0.1) is not even available ? Cheers, Gilles On Monday, January 28, 2019, Patrick Bégou < patrick.be...@legi.grenoble-inp.fr> wrote: > Hi, > > I fall in a strange problem with OpenMPI 3.1 installed on a CentOS7 >

Re: [OMPI users] stack overflow in alloca() for Java programs in openmpi-master with pgcc-18.4

2019-03-26 Thread Gilles Gouaillardet
Siegmar, Is this issue specific to the PGI compiler ? What if you ulimit -s before invoking mpirun, is that good enough to work around the problem ? Cheers, Gilles On Tue, Mar 26, 2019 at 6:32 PM Siegmar Gross wrote: > > Hi, > > I've installed openmpi-v4.0.x-201903220241-97aa434 and >

Re: [OMPI users] Possible buffer overflow on Recv rank

2019-03-27 Thread Gilles Gouaillardet
Carlos, can you post a trimmed version of your code that evidences the issue ? Keep in mind that if you want to write MPI code that is correct with respect to the standard, you should assume MPI_Send() might block until a matching receive is posted. Cheers, Gilles Sent from my iPod > On
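
A minimal C sketch of the point about MPI_Send: a symmetric send-then-receive exchange may deadlock once messages exceed the eager threshold, and MPI_Sendrecv is one portable fix (the 2-rank setup and message size are arbitrary):

  #include <mpi.h>
  #define N (1 << 20)

  int main(int argc, char **argv) {
      static double sbuf[N], rbuf[N];
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      int peer = 1 - rank;   /* assumes exactly 2 ranks */

      /* Unsafe pattern: both ranks may block in MPI_Send waiting for a receive
       *   MPI_Send(sbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
       *   MPI_Recv(rbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
       */

      /* Safe: the combined call cannot deadlock */
      MPI_Sendrecv(sbuf, N, MPI_DOUBLE, peer, 0,
                   rbuf, N, MPI_DOUBLE, peer, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      MPI_Finalize();
      return 0;
  }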

Re: [OMPI users] Issues compiling HPL with OMPIv4.0.0

2019-04-03 Thread Gilles Gouaillardet
Do not get fooled by the symlinks to opal_wrapper ! opal_wrapper checks how it is invoked (e.g. check argv[0] in main()) and the behavior is different if it is invoked as mpicc, mpiCC, mpifort and others. If the error persists with mpicc, you can manually extract the mpicc command line, and
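
One way to extract that command line without digging through the build logs (--showme is built into Open MPI's wrapper compilers; hello.c is a placeholder):

  mpicc --showme hello.c -o hello    # print the full underlying compiler command
  mpicc --showme:compile             # compile flags only
  mpicc --showme:link                # link flags only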

Re: [OMPI users] Using strace with Open MPI on Cray

2019-03-30 Thread Gilles Gouaillardet
Christoph, I do not know how to fix this, but here are some suggestions/thoughts - do you need the -f flag ? if not, just remove it - what if you mpirun strace -o /dev/null ... ? - if the former works, then you might want to redirect the strace output to a local file (mpirun wrapper.sh, in which

Re: [OMPI users] Network performance over TCP

2019-03-23 Thread Gilles Gouaillardet
t would >>> allow Open MPI to use multiple ports similar to what iperf is doing? >>> >>> Thanks. >>> -Adam >>> >>> On Mon, Jul 10, 2017 at 9:31 PM, Adam Sylvester >>> wrote: >>> >>>> Thanks again Gilles. Ahh, better yet

Re: [OMPI users] Building PMIx and Slurm support

2019-02-24 Thread Gilles Gouaillardet
this fix included in the PMIx 2.2.2 > https://github.com/pmix/pmix/releases/tag/v2.2.2 ? > > > > > All the best, > > > ____ > From: users on behalf of Gilles > Gouaillardet > Sent: Sunday, February 24, 2019 4:09 AM > To:

Re: [OMPI users] Building PMIx and Slurm support

2019-02-23 Thread Gilles Gouaillardet
ore Laboratory (KSL) > King Abdullah University of Science and Technology > Building 1, Al-Khawarizmi, Room 0123 > Mobile : +966 (0) 55-247-9568 > Mobile : +20 (0) 106-146-9644 > Office : +966 (0) 12-808-0367 > > > From: users on

Re: [OMPI users] Building PMIx and Slurm support

2019-03-03 Thread Gilles Gouaillardet
Daniel, On 3/4/2019 3:18 PM, Daniel Letai wrote: So unless you have a specific reason not to mix both, you might also give the internal PMIx a try. Does this hold true for libevent too? Configure complains if libevent for openmpi is different than the one used for the other tools. I am

Re: [OMPI users] Building PMIx and Slurm support

2019-03-03 Thread Gilles Gouaillardet
mon". Neither seem to pull from the correct path. > > > Regards, > > Dani_L. > > > On 2/24/19 3:09 AM, Gilles Gouaillardet wrote: > > Passant, > > you have to manually download and apply > https://github.com/pmix/pmix/commit/2e2f4445b45eac5a3fcbd409c81

Re: [OMPI users] Building PMIx and Slurm support

2019-03-03 Thread Gilles Gouaillardet
: Sent from my iPhone On 3 Mar 2019, at 16:31, Gilles Gouaillardet wrote: Daniel, PMIX_MODEX and PMIX_INFO_ARRAY have been removed from PMIx 3.1.2, and Open MPI 4.0.0 was not ready for this. You can either use the internal PMIx (3.0.2), or try 4.0.1rc1 (with the external PMIx 3.1.2

Re: [OMPI users] Building PMIx and Slurm support

2019-02-23 Thread Gilles Gouaillardet
Hi, PMIx has cross-version compatibility, so as long as the PMIx library used by SLURM is compatible with the one (internal or external) used by Open MPI, you should be fine. If you want to minimize the risk of cross-version incompatibility, then I encourage you to use the same (and hence

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Gilles Gouaillardet
5 1.10.4 "make check" problems w/OpenMPI 3.1.3 -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 This is on GPFS. I'll try it on XFS to see if it makes any difference. On 2/16/19 11:57 PM, Gilles Gouaillardet wrote: Ryan, What filesystem are you running on ? Open MPI defaults to the omp

Re: [OMPI users] MPI_Comm_spawn leads to pipe leak and other errors

2019-03-17 Thread Gilles Gouaillardet
FWIW I could observe some memory leaks on both mpirun and MPI task 0 with the latest master branch. So I guess mileage varies depending on available RAM and number of iterations. Sent from my iPod > On Mar 17, 2019, at 20:47, Riebs, Andy wrote: > > Thomas, your test case is somewhat similar

Re: [OMPI users] Building PMIx and Slurm support

2019-03-11 Thread Gilles Gouaillardet
4/19 8:28 AM, Gilles Gouaillardet wrote: Daniel, On 3/4/2019 3:18 PM, Daniel Letai wrote: So unless you have a specific reason not to mix both, you might also give the internal PMIx a try. Does this hold true for libevent too? Configure complains if libevent for openmpi is different than the

Re: [OMPI users] Building PMIx and Slurm support

2019-03-11 Thread Gilles Gouaillardet
with that? Before that, when we tried to launch MPI apps directly with srun, we got the error message saying Slurm missed the PMIx support, that's why we proceeded with the installation. All the best, -- Passant On Mar 12, 2019 6:53 AM, Gilles Gouaillardet wrote: Passant, I built a similar

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Gilles Gouaillardet
anding the logs) Cheers, Gilles On 3/12/2019 1:41 AM, Michael Di Domenico wrote: On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet wrote: You can force mpirun --mca pml ob1 ... And btl/vader (shared memory) will be used for intra node communications ... unless MPI tasks are from dif

Re: [OMPI users] Building PMIx and Slurm support

2019-03-12 Thread Gilles Gouaillardet
and Technology Building 1, Al-Khawarizmi, Room 0123 Mobile : +966 (0) 55-247-9568 Mobile : +20 (0) 106-146-9644 Office : +966 (0) 12-808-0367 From: users on behalf of Gilles Gouaillardet Sent: Tuesday, March 12, 2019 8:22 AM To: users@lists.open-mpi.org

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Gilles Gouaillardet
Michael, You can mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca mtl_base_verbose 10 ... It might show that pml/cm and mtl/psm2 are used. In that case, then yes, the OmniPath library is used even for intra node communications. If this library is optimized for intra node,

Re: [OMPI users] Double free in collectives

2019-03-14 Thread Gilles Gouaillardet
Jeff, The first location is indeed in ompi_coll_libnbc_iallreduce() Lee Ann, thanks for the bug report, for the time being, can you please give the attached patch a try ? Cheers, Gilles FWIW NBC_Schedule_request() sets handle->tmpbuf = tmpbuf and call NBC_Start(handle, schedule)

Re: [OMPI users] What's the right approach to run a singleton MPI+OpenMP process

2019-02-16 Thread Gilles Gouaillardet
Simone, If you want to run a single MPI task, you can either - mpirun -np 1 ./a.out (this is the most standard option) - ./a.out (this is the singleton mode. Note a.out will fork an orted daemon under the hood, this is necessary for example if your app will MPI_Comm_spawn(). -

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-16 Thread Gilles Gouaillardet
k which tests are being run by that. > > > > Edgar > > > >> -----Original Message- > >> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles > >> Gouaillardet > >> Sent: Saturday, February 16, 2019 1:49 AM > >

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-17 Thread Gilles Gouaillardet
Thanks Bart, I opened https://github.com/open-mpi/ompi/issues/6394 to track this issue, and we should follow up there from now on. FWIW, I added a more minimal example, and a possible fix. Cheers, Gilles On 2/18/2019 12:43 AM, Bart Janssens wrote: I just tried on master (commit

Re: [OMPI users] v3.1.3 cannot suppress load errors

2019-02-11 Thread Gilles Gouaillardet
Jingchao, The error message is not coming from Open MPI but from the PMIx component. adding the following line in pmix-mca-params.conf should do the trick mca_base_component_show_load_errors=0 Cheers, Gilles On 2/12/2019 7:31 AM, Jingchao Zhang wrote: Hi, We have both psm and psm2

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-15 Thread Gilles Gouaillardet
Ryan, Can you export OMPI_MCA_io=^ompio and try again after you made sure this environment variable is passed by srun to the MPI tasks ? We have identified and fixed several issues specific to the (default) ompio component, so that could be a valid workaround until the next release. Cheers,

Re: [OMPI users] Segfault with OpenMPI 4 and dynamic window

2019-02-16 Thread Gilles Gouaillardet
Bart, It looks like a bug that involves the osc/rdma component. Meanwhile, you can mpirun --mca osc ^rdma ... Cheers, Gilles On Sat, Feb 16, 2019 at 8:43 PM b...@bartjanssens.org wrote: > > Hi, > > Running the following test code on two processes: > > #include > #include > #include > >

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-18 Thread Gilles Gouaillardet
BEGIN PGP SIGNED MESSAGE- Hash: SHA1 This is on GPFS. I'll try it on XFS to see if it makes any difference. On 2/16/19 11:57 PM, Gilles Gouaillardet wrote: Ryan, What filesystem are you running on ? Open MPI defaults to the ompio component, except on Lustre filesystem where ROMIO is

Re: [OMPI users] OpenMPI 3 without network connection

2019-01-28 Thread Gilles Gouaillardet
Patrick, The root cause is we do not include the localhost interface by default for OOB communications. You should be able to run with mpirun --mca oob_tcp_if_include lo -np 4 hostname Cheers, Gilles On 1/28/2019 11:02 PM, Patrick Bégou wrote: Hi, I fall in a strange problem with

Re: [OMPI users] OpenMPI 3 without network connection

2019-01-28 Thread Gilles Gouaillardet
will include this fix, meanwhile, you can either remove the virbr0 interface or use the workaround I previously described Cheers, Gilles On 1/29/2019 1:56 PM, Gilles Gouaillardet wrote: Patrick, The root cause is we do not include the localhost interface by default for OOB communications

Re: [OMPI users] Open MPI installation problem

2019-01-25 Thread Gilles Gouaillardet
Great point from David ! As a side note, you can configure --enable-mpirun-prefix-by-default ... && make install If you choose to do so, you will not have to set LD_LIBRARY_PATH since it will be "built in" the Open MPI binaries/libraries (via the -rpath linker option) Cheers, Gilles On Sat,

Re: [OMPI users] TCP usage in MPI singletons

2019-04-17 Thread Gilles Gouaillardet
Daniel, If your MPI singleton will never MPI_Comm_spawn(), then you can use the isolated mode like this OMPI_MCA_ess_singleton_isolated=true ./program You can also save some ports by blacklisting the btl/tcp component OMPI_MCA_ess_singleton_isolated=true OMPI_MCA_pml=ob1

Re: [OMPI users] Best way to send on mpi c, architecture dependent data type

2019-03-13 Thread Gilles Gouaillardet
Sergio, I think your best option here is to install an aarch64 (read 64bits ARM) distro on your raspberry pi 3. /* FWIW, the default raspberry distro is 32bits so it can run on all raspberry pi models */ If you cannot/do not wish to go that way, make sure Open MPI is built with

Re: [OMPI users] relocating an installation

2019-04-09 Thread Gilles Gouaillardet
Dave, Can you please post the configure command line of the Open MPI you are trying to relocate ? Cheers, Gilles Dave Love wrote: >Reuti writes: > >> export OPAL_PREFIX= >> >> to point it to the new location of installation before you start `mpiexec`. > >Thanks; that's now familiar, and I
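
For reference, the relocation recipe quoted above looks roughly like this (the path is an example, and PATH/LD_LIBRARY_PATH may also need to point at the new location):

  export OPAL_PREFIX=/new/location/of/openmpi
  export PATH=$OPAL_PREFIX/bin:$PATH
  export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH
  mpiexec -np 4 ./a.out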

Re: [OMPI users] undefined reference error related to ucx

2019-06-25 Thread Gilles Gouaillardet via users
Passant, UCX 1.6.0 is not yet officially released, and it seems Open MPI (4.0.1) does not support it yet, and some porting is needed. Cheers, Gilles On Tue, Jun 25, 2019 at 5:13 PM Passant A. Hafez via users wrote: > > Hello, > > > I'm trying to build ompi 4.0.1 with external ucx 1.6.0 but

Re: [OMPI users] error running mpirun command

2019-05-03 Thread Gilles Gouaillardet via users
Eric, which version of Open MPI are you using ? how many hosts in your hostsfile ? The error message suggests this could be a bug within Open MPI, and a potential workaround for you would be to try mpirun -np 84 --hostfile hostsfile --mca routed direct ./openmpi_hello.c You might also want to

Re: [OMPI users] Possible bugs in MPI_Neighbor_alltoallv()

2019-06-27 Thread Gilles Gouaillardet via users
Thanks Junchao, I issued https://github.com/open-mpi/ompi/pull/6782 in order to fix this (and the alltoallw variant as well) Meanwhile, you can manually download and apply the patch at https://github.com/open-mpi/ompi/pull/6782.patch Cheers, Gilles On 6/28/2019 1:10 PM, Zhang,

Re: [OMPI users] How it the rank determined (Open MPI and Podman)

2019-07-11 Thread Gilles Gouaillardet via users
Adrian, the MPI application relies on some environment variables (they typically start with OMPI_ and PMIX_). The MPI application internally uses a PMIx client that must be able to contact a PMIx server (that is included in mpirun and the orted daemon(s) spawned on the remote hosts).

Re: [OMPI users] How it the rank determined (Open MPI and Podman)

2019-07-12 Thread Gilles Gouaillardet via users
t;--> Process # 0 of 2 is alive. ->test1 >>--> Process # 1 of 2 is alive. ->test2 >> >> I need to tell Podman to mount /tmp from the host into the container, as >> I am running rootless I also need to tell Podman to use the same user ID &

Re: [OMPI users] Problems with MPI_Comm_spawn

2019-07-02 Thread Gilles Gouaillardet via users
Thanks for the report, this is indeed a bug I fixed at https://github.com/open-mpi/ompi/pull/6790 meanwhile, you can manually download and apply the patch at https://github.com/open-mpi/ompi/pull/6790.patch Cheers, Gilles On 7/3/2019 1:30 AM, Gyevi-Nagy László via users wrote: Hi, I

Re: [OMPI users] Naming scheme of PSM2 and Vader shared memory segments

2019-07-07 Thread Gilles Gouaillardet via users
Sebastian, the PSM2 shared memory segment name is set by the PSM2 library and my understanding is that Open MPI has no control over it. If you believe the root cause of the crash is related to non unique PSM2 shared memory segment name, I guess you should report this at

Re: [OMPI users] fatal error: ac_nonexistent.h: No such file or directory (openmpi-4.0.0)

2019-04-20 Thread Gilles Gouaillardet via users
The root cause is configure cannot run a simple Fortran program (see the relevant log below) I suggest you export LD_LIBRARY_PATH=/share/apps/gcc-5.4.0/lib64:$LD_LIBRARY_PATH and then try again. Cheers, Gilles configure:44254: checking Fortran value of selected_int_kind(4) configure:44281:

Re: [OMPI users] 3.0.4, 4.0.1 build failure on OSX Mojave with LLVM

2019-04-24 Thread Gilles Gouaillardet via users
John, what if you move some parameters to CPPFLAGS and CXXCPPFLAGS (see the new configure command line below) Cheers, Gilles '/Users/cary/projects/ulixesall-llvm/builds/openmpi-4.0.1/nodl/../configure' \ --prefix=/Volumes/GordianStorage/opt/contrib-llvm7_appleclang/openmpi-4.0.1-nodl \

Re: [OMPI users] undefined reference error related to ucx

2019-06-25 Thread Gilles Gouaillardet via users
tps://github.com/openucx/ucx/issues/3336 that the UCX 1.6 might solve this issue, so I tried the pre-release version to just check if it will. All the best, -- Passant From: users on behalf of Gilles Gouaillardet via users Sent: Tuesday, June 25, 2019 11

Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
ch compiler did you use to build Open MPI that fails to build your >> test ? >> >> >> Cheers, >> >> Gilles >> >> On Mon, Aug 19, 2019 at 6:49 PM Gilles Gouaillardet >> wrote: >> > >> > Thanks, >> > >> > a

Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
gt; size = this%size_dim(this%gi)*this%size_dim(this%gj)*cs3 > if(this%is_exchange_off) then >call this%update_stats(size) >this%bf(:,:,1:cs3) = cmplx(0.,0.) > else >call MPI_Irecv(this%bf(:,:,1:cs3),size,MPI_COMPLEX_TYPE,& > t

Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
One more thing ... Your initial message mentioned a failure with gcc 8.2.0, but your follow-up message mentions LLVM compiler. So which compiler did you use to build Open MPI that fails to build your test ? Cheers, Gilles On Mon, Aug 19, 2019 at 6:49 PM Gilles Gouaillardet wrote: > >

Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
Hi, Can you please post a full but minimal example that evidences the issue? Also please post your Open MPI configure command line. Cheers, Gilles Sent from my iPod > On Aug 19, 2019, at 18:13, Sangam B via users > wrote: > > Hi, > > I get following error if the application is compiled

Re: [OMPI users] Error with OpenMPI: Could not resolve generic procedure mpi_irecv

2019-08-19 Thread Gilles Gouaillardet via users
Thanks, and your reproducer is ? Cheers, Gilles On Mon, Aug 19, 2019 at 6:42 PM Sangam B via users wrote: > > Hi, > > OpenMPI is configured as follows: > > export CC=`which clang` > export CXX=`which clang++` > export FC=`which flang` > export F90=`which flang` > > ../configure

Re: [OMPI users] When is it save to free the buffer after MPI_Isend?

2019-07-27 Thread Gilles Gouaillardet via users
Carlos, MPI_Isend() does not automatically free the buffer after it sends the message. (it simply cannot do it since the buffer might be pointing to a global variable or to the stack). Can you please extract a reproducer from your program ? Out of curiosity, what if you insert a (useless)
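
A minimal C sketch of the ownership rule behind this: the caller keeps the buffer, and may only free or reuse it after the request completes via MPI_Wait/MPI_Test (assumes at least 2 ranks):

  #include <mpi.h>
  #include <stdlib.h>

  int main(int argc, char **argv) {
      int rank;
      MPI_Request req;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      double *buf = malloc(1000 * sizeof(double));   /* owned by the application */

      if (rank == 0) {
          MPI_Isend(buf, 1000, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
          /* freeing or overwriting buf here would be wrong: the send may be in flight */
          MPI_Wait(&req, MPI_STATUS_IGNORE);         /* completion point */
      } else if (rank == 1) {
          MPI_Recv(buf, 1000, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      }
      free(buf);                                     /* safe only after completion */
      MPI_Finalize();
      return 0;
  }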

Re: [OMPI users] OpenMPI 2.1.1 bug on Ubuntu 18.04.2 LTS

2019-08-01 Thread Gilles Gouaillardet via users
Juanchao, Is the issue related to https://github.com/open-mpi/ompi/pull/4501 ? Jeff, you might have to configure with --enable-heterogeneous to evidence the issue Cheers, Gilles On 8/2/2019 4:06 AM, Jeff Squyres (jsquyres) via users wrote: I am able to replicate the issue on a

Re: [OMPI users] OMPI was not built with SLURM's PMI support

2019-08-08 Thread Gilles GOUAILLARDET via users
Hi, You need to configure --with-pmi ... Cheers, Gilles On August 8, 2019, at 11:28 PM, Jing Gong via users wrote: Hi, Recently our Slurm system has been upgraded to 19.0.5. I tried to recompile openmpi v3.0 due to the bug reported in https://bugs.schedmd.com/show_bug.cgi?id=6993

Re: [OMPI users] How is the rank determined (Open MPI and Podman)

2019-07-22 Thread Gilles Gouaillardet via users
that Podman is running rootless. I will continue to investigate, but now I know where to look. Thanks! Adrian On Fri, Jul 12, 2019 at 06:48:59PM +0900, Gilles Gouaillardet via users wrote: Adrian, Can you try mpirun --mca btl_vader_copy_mechanism none ... Please double check the MCA

Re: [OMPI users] mpirun --output-filename behavior

2019-10-31 Thread Gilles Gouaillardet via users
Joseph, you can achieve this via an agent (and it works with DDT too) For example, the nostderr script below redirects each MPI task's stderr to /dev/null (so it is not forwarded to mpirun) $ cat nostderr #!/bin/sh exec 2> /dev/null exec "$@" and then you can simply $ mpirun --mca

Re: [OMPI users] MPI_Iallreduce with multidimensional Fortran array

2019-11-13 Thread Gilles Gouaillardet via users
Camille, your program is only valid with an MPI library that features MPI_SUBARRAYS_SUPPORTED, and this is not (yet) the case in Open MPI. A possible fix is to use an intermediate contiguous buffer   integer, allocatable, dimension(:,:,:,:) :: tmp   allocate( tmp(N,N,N,N) ) and then

Re: [OMPI users] mca_oob_tcp_recv_handler: invalid message type: 15

2019-12-11 Thread Gilles Gouaillardet via users
Guido, This error message is from MPICH and not Open MPI. Make sure your environment is correct and the shared filesystem is mounted on the compute nodes. Cheers, Gilles Sent from my iPod > On Dec 12, 2019, at 1:44, Guido granda muñoz via users > wrote: > > Hi, > after following the

Re: [OMPI users] Parameters at run time

2019-10-20 Thread Gilles Gouaillardet via users
Raymond, In the case of UCX, you can mpirun --mca pml_base_verbose 10 ... If the pml/ucx component is used, then your app will run over UCX. If the pml/ob1 component is used, then you can mpirun --mca btl_base_verbose 10 ... btl/self should be used for communications to itself. if btl/uct

Re: [OMPI users] Deadlock in netcdf tests

2019-10-25 Thread Gilles Gouaillardet via users
Orion, thanks for the report. I can confirm this is indeed an Open MPI bug. FWIW, a workaround is to disable the fcoll/vulcan component. That can be achieved by mpirun --mca fcoll ^vulcan ... or OMPI_MCA_fcoll=^vulcan mpirun ... I also noted the tst_parallel3 program crashes with the

Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-28 Thread Gilles Gouaillardet via users
Charles, unless you expect yes or no answers, can you please post a simple program that evidences the issue you are facing ? Cheers, Gilles On 10/29/2019 6:37 AM, Garrett, Charles via users wrote: Does anyone have any idea why this is happening?  Has anyone seen this problem before?

Re: [OMPI users] speed of model is slow with openmpi

2019-11-27 Thread Gilles Gouaillardet via users
Your gfortran command line strongly suggests your program is serial and does not use MPI at all. Consequently, mpirun will simply spawn 8 identical instances of the very same program, and no speed up should be expected (but you can expect some slow down and/or file corruption). If you

Re: [OMPI users] mpirun --output-filename behavior

2019-11-01 Thread Gilles GOUAILLARDET via users
via users wrote: Gilles, Thanks for your suggestions! I just tried both of them, see below: On 11/1/19 1:15 AM, Gilles Gouaillardet via users wrote: > Joseph, > > > you can achieve this via an agent (and it works with DDT too) > > > For example, the nostderr script

Re: [OMPI users] Optimized and portable Open MPI packaging in Guix

2019-12-20 Thread Gilles Gouaillardet via users
Ludovic, in order to figure out which interconnect is used, you can mpirun --mca pml_base_verbose 10 --mca mtl_base_verbose 10 --mca btl_base_verbose 10 ... the output might be a bit verbose, so here are a few tips on how to get it step by step first, mpirun --mca pml_base_verbose 10 ... in

Re: [OMPI users] Read from file performance degradation when increasing number of processors in some cases

2020-03-06 Thread Gilles Gouaillardet via users
Hi, The log filenames suggest you are always running on a single node, is that correct ? Do you create the input file on the tmpfs once and for all, or before each run? Can you please post your mpirun command lines? If you did not bind the tasks, can you try again mpirun --bind-to core ...

Re: [OMPI users] Read from file performance degradation whenincreasing number of processors in some cases

2020-03-06 Thread Gilles Gouaillardet via users
s on a resource: Bind to:CORE Node: compute-0 #processes: 2 #cpus: 1 You can override this protection by adding the "overload-allowed" option to your binding directive. — I will solve this and get back to you soon. Best regards, Al

Re: [OMPI users] OpenMPI 4.0.2 with PGI 19.10, will not build with hcoll

2020-01-25 Thread Gilles Gouaillardet via users
Thanks Jeff for the information and sharing the pointer. FWIW, this issue typically occurs when libtool pulls the -pthread flag from libhcoll.la that was compiled with a GNU compiler. The simplest workaround is to remove libhcoll.la (so libtool simply links with libhcoll.so and does not pull any

Re: [OMPI users] HELP: openmpi is not using the specified infiniband interface !!

2020-01-14 Thread Gilles Gouaillardet via users
Soporte, The error message is from MPICH! If you intend to use Open MPI, fix your environment first Cheers, Gilles Sent from my iPod > On Jan 15, 2020, at 7:53, SOPORTE MODEMAT via users > wrote: > > Hello everyone. > > I would like somebody help me to figure out how can I make that

Re: [OMPI users] file/process write speed is not scalable

2020-04-09 Thread Gilles Gouaillardet via users
Note there could be some NUMA-IO effect, so I suggest you compare running every MPI tasks on socket 0, to running every MPI tasks on socket 1 and so on, and then compared to running one MPI task per socket. Also, what performance do you measure? - Is this something in line with the
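
A hedged sketch of the placement comparison suggested here (option spellings can vary slightly between Open MPI releases, and ./io_bench is a placeholder for your benchmark):

  mpirun -np 8 --map-by socket --bind-to core ./io_bench   # ranks spread round-robin across sockets
  mpirun -np 8 --map-by core   --bind-to core ./io_bench   # ranks packed onto the first cores (typically one socket)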

Re: [OMPI users] Hwlock library problem

2020-04-15 Thread Gilles Gouaillardet via users
that the > "-Wl," before "-force_load” might be a workable solution, but I don’t > understand what to change in order to accomplish this. Might you have some > suggestions. > > > > > > On Apr 15, 2020, at 0:06, Gilles Gouaillardet via users > wrote: >

Re: [OMPI users] Hwlock library problem

2020-04-15 Thread Gilles Gouaillardet via users
Sorry for your trouble. > > > > > On Apr 16, 2020, at 11:49, Gilles Gouaillardet via users > > wrote: > > > > Paul, > > > > My ifort eval license on OSX has expired so I cannot test myself, > > sorry about that. > > > > It has been rep

Re: [OMPI users] Hwlock library problem

2020-04-15 Thread Gilles Gouaillardet via users
g. > link_static_flag="" > > After running “make clean”, I ran make again and found the same error. Any > ideas? > > FCLD libmpi_usempif08.la > ld: library not found for -lhwloc > make[2]: *** [libmpi_usempif08.la] Error 1 > make[1]: *** [all-recursive] Error

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Gilles GOUAILLARDET via users
Collin, Do you have any data to back up your claim? As long as MPI-IO is used to perform file I/O, the Fortran bindings overhead should be hardly noticeable. Cheers, Gilles On April 6, 2020, at 23:22, Collin Strassburger via users wrote: Hello,   Just a quick comment on this; is your

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Gilles Gouaillardet via users
David, I suggest you rely on well established benchmarks such as IOR or iozone. As already pointed by Edgar, you first need to make sure you are not benchmarking your (memory) cache by comparing the bandwidth you measure vs the performance you can expect from your hardware. As a side note,

Re: [OMPI users] How to prevent linking in GPFS when it is present

2020-03-29 Thread Gilles Gouaillardet via users
Jonathon, GPFS is used by both the ROMIO component (that comes from MPICH) and the fs/gpfs component that is used by ompio (native Open MPI MPI-IO so to speak). you should be able to disable both by running ac_cv_header_gpfs_h=no configure --without-gpfs ... Note that Open MPI is modular

Re: [OMPI users] OMPI v2.1.5 with Slurm

2020-04-21 Thread Gilles Gouaillardet via users
Levi, as a workaround, have you tried using mpirun instead of direct launch (e.g. srun) ? Note you are using pmix 1.2.5, so you likely want to srun --mpi=pmix_v1 Also, as reported by the logs 1. [nodeA:12838] OPAL ERROR: Error in file pmix3x_client.c at line 112 there is something
