Re: [OMPI users] Another OpenMPI 5.0.1. Installation Failure

2024-04-21 Thread Gilles Gouaillardet via users
Hi, Is there any reason why you do not build the latest 5.0.2 package? Anyway, the issue could be related to an unknown filesystem. Do you get a meaningful error if you manually run /.../test/util/opal_path_nfs? If not, can you share the output of mount | cut -f3,5 -d' ' Cheers, Gilles On

Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently

2024-04-15 Thread Gilles Gouaillardet via users
Greg, If Open MPI was built with UCX, your jobs will likely use UCX (and the shared memory provider) even if running on a single node. You can mpirun --mca pml ob1 --mca btl self,sm ... if you want to avoid using UCX. What is a typical mpirun command line used under the hood by your "make test"?

Re: [OMPI users] Subject: Clarification about mpirun behavior in Slurm jobs

2024-02-24 Thread Gilles Gouaillardet via users
Christopher, I do not think Open MPI explicitly asks SLURM which cores have been assigned on each node. So if you are planning to run multiple jobs on the same node, your best bet is probably to have SLURM use cpusets. Cheers, Gilles On Sat, Feb 24, 2024 at 7:25 AM Christopher Daley via users

Re: [OMPI users] Seg error when using v5.0.1

2024-01-31 Thread Gilles Gouaillardet via users
Hi, please open an issue on GitHub at https://github.com/open-mpi/ompi/issues and provide the requested information. If the compilation failed when configured with --enable-debug, please share the logs. The name of the WRF subroutine suggests the crash might occur in MPI_Comm_split(), if so,

Re: [OMPI users] OpenMPI 5.0.1 Installation Failure

2024-01-26 Thread Gilles Gouaillardet via users
Hi, Please open a GitHub issue at https://github.com/open-mpi/ompi/issues and provide the requested information Cheers, Gilles On Sat, Jan 27, 2024 at 12:04 PM Kook Jin Noh via users < users@lists.open-mpi.org> wrote: > Hi, > > > > I’m installing OpenMPI 5.0.1 on Archlinux 6.7.1. Everything

Re: [OMPI users] Binding to thread 0

2023-09-08 Thread Gilles Gouaillardet via users
Luis, you can pass the --bind-to hwthread option in order to bind on the first thread of each core Cheers, Gilles On Fri, Sep 8, 2023 at 8:30 PM Luis Cebamanos via users < users@lists.open-mpi.org> wrote: > Hello, > > Up to now, I have been using numerous ways of binding with wrappers >

Re: [OMPI users] MPI_Init_thread error

2023-07-25 Thread Gilles Gouaillardet via users
Aziz, When using direct run (e.g. srun), OpenMPI has to interact with SLURM. This is typically achieved via PMI2 or PMIx. You can srun --mpi=list to list the available options on your system. If PMIx is available, you can srun --mpi=pmix ... If only PMI2 is available, you need to make sure Open

Re: [OMPI users] libnuma.so error

2023-07-19 Thread Gilles Gouaillardet via users
Luis, That can happen if a component is linked with libnuma.so: Open MPI will fail to open it and try to fall back on another one. You can run ldd on the mca_*.so components in the /.../lib/openmpi directory to figure out which is using libnuma.so and assess if it is needed or not. Cheers,

Re: [OMPI users] OpenMPI crashes with TCP connection error

2023-06-16 Thread Gilles Gouaillardet via users
Kurt, I think Joachim was also asking for the command line used to launch your application. Since you are using Slurm and MPI_Comm_spawn(), it is important to understand whether you are using mpirun or srun FWIW, --mpi=pmix is a srun option. you can srun --mpi=list to find the available

Re: [OMPI users] Issue with Running MPI Job on CentOS 7

2023-05-31 Thread Gilles Gouaillardet via users
Open MPI 1.6.5 is an antique version and you should not expect any support with it. Instead, I suggest you try the latest one, rebuild your app and try again. FWIW, that kind of error occurs when the MPI library does not match mpirun That can happen when mpirun and libmpi.so come from different

Re: [OMPI users] psec warning when launching with srun

2023-05-20 Thread Gilles Gouaillardet via users
Christof, Open MPI switching to the internal PMIx is a bug I addressed in https://github.com/open-mpi/ompi/pull/11704 Feel free to manually download and apply the patch, you will then need recent autotools and run ./autogen.pl --force Another option is to manually edit the configure file Look

Re: [OMPI users] Issue with unexpected IP address in OpenMPI

2023-03-27 Thread Gilles Gouaillardet via users
Todd, Similar issues were also reported when there is Network Address Translation (NAT) between hosts, and that occurred when using kvm/qemu virtual machines running on the same host. First you need to list the available interfaces on both nodes. Then try to restrict to a single interface that is

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-05 Thread Gilles Gouaillardet via users
Arun, First Open MPI selects a pml for **all** the MPI tasks (for example, pml/ucx or pml/ob1) Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct, btl/vader) is used for each pair of MPI tasks (tasks on the same node will use btl/vader, tasks on different nodes will use

Re: [OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy

2023-02-28 Thread Gilles Gouaillardet via users
Rob, Do you invoke mpirun from **inside** the container? IIRC, mpirun is generally invoked from **outside** the container, could you try this if not already the case? The error message is from SLURM, so this is really a SLURM vs singularity issue. What if you srun -N 2 -n 2 hostname

Re: [OMPI users] ucx configuration

2023-01-11 Thread Gilles Gouaillardet via users
You can pick one test, make it standalone, and open an issue on GitHub. How does (vanilla) Open MPI compare to your vendor Open MPI based library? Cheers, Gilles On Wed, Jan 11, 2023 at 10:20 PM Dave Love via users < users@lists.open-mpi.org> wrote: > Gilles Gouaillardet via user

Re: [OMPI users] ucx configuration

2023-01-07 Thread Gilles Gouaillardet via users
Dave, If there is a bug you would like to report, please open an issue at https://github.com/open-mpi/ompi/issues and provide all the required information (in this case, it should also include the UCX library you are using and how it was obtained or built). Cheers, Gilles On Fri, Jan 6, 2023

Re: [OMPI users] Question about "mca" parameters

2022-11-29 Thread Gilles Gouaillardet via users
Hi, Simply add btl = tcp,self If the openib error message persists, try also adding osc_rdma_btls = ugni,uct,ucp or simply osc = ^rdma Cheers, Gilles On 11/29/2022 5:16 PM, Gestió Servidors via users wrote: Hi, If I run “mpirun --mca btl tcp,self --mca allow_ib 0 -n 12

Re: [OMPI users] CephFS and striping_factor

2022-11-28 Thread Gilles Gouaillardet via users
Hi Eric, Currently, Open MPI does not provide specific support for CephFS. MPI-IO is either implemented by ROMIO (imported from MPICH, it does not support CephFS today) or the "native" ompio component (that also does not support CephFS today). A proof of concept for CephFS in ompio might

Re: [OMPI users] Run on dual-socket system

2022-11-26 Thread Gilles Gouaillardet via users
Arham, It should be balanced: the default mapping is to allocate NUMA packages round robin. You can mpirun --report-bindings -n 28 true to have Open MPI report the bindings or mpirun --tag-output -n 28 grep Cpus_allowed_list /proc/self/status to have each task report which physical cpu it is

Re: [OMPI users] users Digest, Vol 4818, Issue 1

2022-11-14 Thread Gilles Gouaillardet via users
;machinefile" (vs. a copy-and-pasted "em dash")? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Gilles Gouaillardet via users Sent: Sunday, November 13, 2022 9:18 PM To: Open MPI Users Cc: Gilles Gouaillardet Subject: Re: [OMPI users

Re: [OMPI users] [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application

2022-11-13 Thread Gilles Gouaillardet via users
There is a typo in your command line. You should use --mca (minus minus) instead of -mca Also, you can try --machinefile instead of -machinefile Cheers, Gilles There are not enough slots available in the system to satisfy the 2 slots that were requested by the application: –mca On Mon, Nov

Re: [OMPI users] lots of undefined symbols compiling a hello-world

2022-11-05 Thread Gilles Gouaillardet via users
Chris, Did you double check libopen-rte.so.40 and libopen-pal.so.40 are installed in /mnt/software/o/openmpi/4.1.4-ct-test/lib? If they are not present, it means your install is busted and you should try to reinstall it. Cheers, Gilles On Sat, Nov 5, 2022 at 3:42 AM Chris Taylor via users <

Re: [OMPI users] ifort and openmpi

2022-09-15 Thread Gilles Gouaillardet via users
Volker, https://ntq1982.github.io/files/20200621.html (mentioned in the ticket) suggests that patching the generated configure file can do the trick. We already patch the generated configure file in autogen.pl (in the patch_autotools_output subroutine), so I guess that could be enhanced to

Re: [OMPI users] Hardware topology influence

2022-09-13 Thread Gilles Gouaillardet via users
Lucas, the number of MPI tasks started by mpirun is either - explicitly passed via the command line (e.g. mpirun -np 2306 ...) - equal to the number of available slots, and this value is either a) retrieved from the resource manager (such as a SLURM allocation) b) explicitly set in a

Re: [OMPI users] Using MPI in Toro unikernel

2022-07-24 Thread Gilles Gouaillardet via users
Matias, Assuming you run one MPI task per unikernel, and two unikernels share nothing, it means that inter-node communication cannot be performed via shared memory or kernel features (such as xpmem or knem). That also implies communication is likely using the loopback interface which is much

Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Gilles Gouaillardet via users
> cout << "intercommunicator interClient=" << interClient << endl; > > After connection from a third party client it returns "c403" (in hex). > > Both 8406 and c403 are negative integer in dec. > > I don't know if it is "norm

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-24 Thread Gilles Gouaillardet via users
Sorry if I did not make my intent clear. I was basically suggesting to hack the Open MPI and PMIx wrappers to hostname() and remove the problematic underscores to make the regx components a happy panda again. Cheers, Gilles - Original Message - > I think the files suggested by Gilles

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick, you will likely also need to apply the same hack to opal_net_get_hostname() in opal/util/net.c Cheers, Gilles On Thu, Jun 16, 2022 at 7:30 PM Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Patrick, > > I am not sure Open MPI can do that out of the

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick, I am not sure Open MPI can do that out of the box. Maybe hacking pmix_net_get_hostname() in opal/mca/pmix/pmix3x/pmix/src/util/net.c can do the trick. Cheers, Gilles On Thu, Jun 16, 2022 at 4:24 PM Patrick Begou via users < users@lists.open-mpi.org> wrote: > Hi all, > > we are

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread Gilles Gouaillardet via users
Scott, I am afraid this test is inconclusive since stdout is processed by mpirun. What if you mpirun -np 1 touch /tmp/xyz abort (since it will likely hang) and ls -l /tmp/xyz In my experience on mac, this kind of hangs can happen if you are running a firewall and/or the IP of your host does

Re: [OMPI users] mpi-test-suite shows errors on openmpi 4.1.x

2022-05-03 Thread Gilles Gouaillardet via users
Alois, Thanks for the report. FWIW, I am not seeing any errors on my Mac with Open MPI from brew (4.1.3) How many MPI tasks are you running? Can you please confirm you can evidence the error with mpirun -np ./mpi_test_suite -d MPI_TYPE_MIX_ARRAY -c 0 -t collective Also, can you try the same

Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Gilles Gouaillardet via users
You can first double check you call MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...) and that the provided level is MPI_THREAD_MULTIPLE as you requested. Cheers, Gilles On Fri, Apr 22, 2022, 21:45 Angel de Vicente via users < users@lists.open-mpi.org> wrote: > Hello, > > I'm running out of ideas, and
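
A minimal C sketch of that check (illustrative only, not code from the thread):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        /* request full thread support for the hybrid MPI+OpenMP case */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            /* the library downgraded the request: calling MPI concurrently
               from several OpenMP threads is not safe with this build */
            fprintf(stderr, "got thread level %d, wanted %d\n",
                    provided, MPI_THREAD_MULTIPLE);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Finalize();
        return 0;
    }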

Re: [OMPI users] help with M1 chip macOS openMPI installation

2022-04-21 Thread Gilles Gouaillardet via users
Cici, I do not think the Intel C compiler is able to generate native code for the M1 (aarch64). The best case scenario is it would generate code for x86_64 and then Rosetta would be used to translate it to aarch64 code, and this is a very downgraded solution. So if you really want to stick to

Re: [OMPI users] Is there a MPI routine that returns the value of "npernode" being used?

2022-04-02 Thread Gilles Gouaillardet via users
Ernesto, Not directly. But you can use MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...) and then MPI_Comm_size(...) on the "returned" communicator. Cheers, Gilles On Sun, Apr 3, 2022 at 5:52 AM Ernesto Prudencio via users < users@lists.open-mpi.org> wrote: > Thanks, > > > > Ernesto. > >
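
A short C sketch of that approach (illustrative only): the size of the MPI_COMM_TYPE_SHARED communicator is the number of ranks on the local node, i.e. the effective "npernode" value.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm node_comm;
        int node_size;

        MPI_Init(&argc, &argv);
        /* group the ranks that share this node into one communicator */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);
        MPI_Comm_size(node_comm, &node_size);   /* ranks per node */
        printf("ranks on this node: %d\n", node_size);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }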

Re: [OMPI users] 101 question on MPI_Bcast()

2022-04-02 Thread Gilles Gouaillardet via users
Ernesto, MPI_Bcast() has no barrier semantic. It means the root rank can return after the message is sent (kind of eager send) and before it is received by other ranks. Cheers, Gilles On Sat, Apr 2, 2022, 09:33 Ernesto Prudencio via users < users@lists.open-mpi.org> wrote: > I have an
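
A small C sketch of the point above (illustrative only): if the algorithm relies on every rank having received the data before the root moves on, an explicit barrier is needed.

    #include <mpi.h>

    void bcast_then_sync(int *buf, int count, MPI_Comm comm)
    {
        MPI_Bcast(buf, count, MPI_INT, 0, comm);  /* no barrier semantic:
                                                     the root may return early */
        MPI_Barrier(comm);                        /* explicit synchronization */
    }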

Re: [OMPI users] Need help for troubleshooting OpenMPI performances

2022-03-24 Thread Gilles Gouaillardet via users
Patrick, In the worst case scenario, requiring MPI_THREAD_MULTIPLE support can disable some fast interconnect and make your app fallback on IPoIB or similar. And in that case, Open MPI might prefer a suboptimal IP network which can impact the overall performances even more. Which threading

Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

2022-03-14 Thread Gilles Gouaillardet via users
Ernesto. > > > > *From:* users *On Behalf Of *Gilles > Gouaillardet via users > *Sent:* Monday, March 14, 2022 2:22 AM > *To:* Open MPI Users > *Cc:* Gilles Gouaillardet > *Subject:* Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning > value 15 > >

Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

2022-03-14 Thread Gilles Gouaillardet via users
is not an issue on situation 3, when I explicitly point the >runtime mpi to be 4.0.3 compiled with INTEL (even though I compiled the >application and openmpi 4.1.2 with GNU, and I link the application with >openmpi 4.1.2) > > > > Best, > > > > Ernesto.

Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

2022-03-14 Thread Gilles Gouaillardet via users
Ernesto, the coll/tuned module (that should handle collective subroutines by default) has a known issue when matching but non identical signatures are used: for example, one rank uses one vector of n bytes, and another rank uses n bytes. Is there a chance your application might use this pattern?
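
A hedged C illustration of the kind of pattern meant here (not the application's actual code): both calls describe the same number of bytes, so the signatures match, but they are not identical.

    #include <mpi.h>

    void mixed_signature_bcast(char *buf, int nbytes, int rank, MPI_Comm comm)
    {
        if (rank == 0) {
            /* one contiguous datatype covering nbytes bytes ... */
            MPI_Datatype block;
            MPI_Type_contiguous(nbytes, MPI_BYTE, &block);
            MPI_Type_commit(&block);
            MPI_Bcast(buf, 1, block, 0, comm);
            MPI_Type_free(&block);
        } else {
            /* ... versus nbytes elements of MPI_BYTE on the other ranks */
            MPI_Bcast(buf, nbytes, MPI_BYTE, 0, comm);
        }
    }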

Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-17 Thread Gilles Gouaillardet via users
Angel, Infiniband detection likely fails before checking expanded verbs. Please compress and post the full configure output Cheers, Gilles On Fri, Feb 18, 2022 at 12:02 AM Angel de Vicente via users < users@lists.open-mpi.org> wrote: > Hi, > > I'm trying to compile the latest OpenMPI version

Re: [OMPI users] libmpi_mpifh.so.40 - error

2022-01-30 Thread Gilles Gouaillardet via users
Hari, What does ldd solver.exe (or whatever your solver exe file is called) report? Cheers, Gilles On Mon, Jan 31, 2022 at 2:09 PM Hari Devaraj via users < users@lists.open-mpi.org> wrote: > Hello, > > I am trying to run a FEA solver exe file. > I get this error message: > > error while

Re: [OMPI users] RES: OpenMPI - Intel MPI

2022-01-27 Thread Gilles Gouaillardet via users
Thanks Ralph, Now I get what you had in mind. Strictly speaking, you are making the assumption that Open MPI performance matches the system MPI performances. This is generally true for common interconnects and/or those that feature providers for libfabric or UCX, but not so for "exotic"

Re: [OMPI users] RES: OpenMPI - Intel MPI

2022-01-26 Thread Gilles Gouaillardet via users
al MPI implementations may > simplify the mechanics of setting it up, but it is definitely not required. > > > On Jan 26, 2022, at 8:55 PM, Gilles Gouaillardet via users < > users@lists.open-mpi.org> wrote: > > Brian, > > FWIW > > Keep in mind that when

Re: [OMPI users] RES: OpenMPI - Intel MPI

2022-01-26 Thread Gilles Gouaillardet via users
Brian, FWIW Keep in mind that when running a container on a supercomputer, it is generally recommended to use the supercomputer MPI implementation (fine tuned and with support for the high speed interconnect) instead of the one of the container (generally a vanilla MPI with basic support for TCP

Re: [OMPI users] Creating An MPI Job from Procs Launched by a Different Launcher

2022-01-25 Thread Gilles Gouaillardet via users
You need a way for your process to exchange information so MPI_Init() can work. One option is to have your custom launcher implement a PMIx server https://pmix.github.io If you choose this path, you will likely want to use the Open PMIx reference implementation https://openpmix.github.io

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-25 Thread Gilles Gouaillardet via users
gives the error > mentioned below. > > Best, > Matthias > > On 25.01.22 at 07:17, Gilles Gouaillardet via users wrote: > > Matthias, > > > > do you run the MPI application with mpirun or srun? > > > > The error log suggests you are using srun, and SL

Re: [OMPI users] Open MPI + Slurm + lmod

2022-01-24 Thread Gilles Gouaillardet via users
Matthias, do you run the MPI application with mpirun or srun? The error log suggests you are using srun, and SLURM only provides PMI support. If this is the case, then you have three options: - use mpirun - rebuild Open MPI with PMI support as Ralph previously explained - use SLURM PMIx:

Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector

2021-12-16 Thread Gilles Gouaillardet via users
rank, n, m, v_glob); and also resize rtype so the second element starts at v_glob[3][0] => upper bound = (3*sizeof(int)) By the way, since this question is not Open MPI specific, sites such as Stack Overflow are a better fit. Cheers, Gilles On Thu, Dec 16, 2021 at 6:46 PM Gilles

Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector

2021-12-16 Thread Gilles Gouaillardet via users
Jonas, Assuming v_glob is what you expect, you will need to `MPI_Type_create_resized()` the received type so the block received from process 1 will be placed at the right position (v_glob[3][1] => upper bound = ((4*3+1) * sizeof(int)) Cheers, Gilles On Thu, Dec 16, 2021 at 6:33 PM Jonas
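
A generic C sketch of the resize idea (not the exact layout discussed in the thread): gather one column per rank into a row-major nrows x nprocs matrix on the root, resizing the strided receive type so consecutive contributions start one int apart.

    #include <mpi.h>

    void gather_columns(const int *col, int *matrix, int nrows, int nprocs,
                        MPI_Comm comm)
    {
        MPI_Datatype column, column_resized;

        /* nrows blocks of 1 int, strided by the row length */
        MPI_Type_vector(nrows, 1, nprocs, MPI_INT, &column);
        /* shrink the extent to one int so rank i lands in column i */
        MPI_Type_create_resized(column, 0, (MPI_Aint) sizeof(int),
                                &column_resized);
        MPI_Type_commit(&column_resized);

        MPI_Gather(col, nrows, MPI_INT,        /* each rank sends a plain column */
                   matrix, 1, column_resized,  /* root stores it strided */
                   0, comm);

        MPI_Type_free(&column);
        MPI_Type_free(&column_resized);
    }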

Re: [OMPI users] Reserving slots and filling them after job launch with MPI_Comm_spawn

2021-11-03 Thread Gilles Gouaillardet via users
Kurt, Assuming you built Open MPI with tm support (default if tm is detected at configure time, but you can configure --with-tm to have it abort if tm support is not found), you should not need to use a hostfile. As a workaround, I would suggest you try to mpirun --map-by node -np 21 ...

Re: [OMPI users] Newbie Question.

2021-11-01 Thread Gilles Gouaillardet via users
Hi Ben, have you tried export OMPI_MCA_common_ucx_opal_mem_hooks=1 Cheers, Gilles On Mon, Nov 1, 2021 at 9:22 PM bend linux4ms.net via users < users@lists.open-mpi.org> wrote: > Ok, I a am newbie supporting the a HPC project and learning about MPI. > > I have the following portion of a

Re: [OMPI users] Cannot build working Open MPI 4.1.1 with NAG Fortran/clang on macOS (but I could before!)

2021-10-28 Thread Gilles Gouaillardet via users
Matt, did you build the same Open MPI 4.1.1 from an official tarball with the previous NAG Fortran? did you run autogen.pl (--force) ? Just to be sure, can you rerun the same test with the previous NAG version? When using static libraries, you can try manually linking with -lopen-orted-mpir

Re: [OMPI users] OpenMPI 4.1.1, CentOS 7.9, nVidia HPC-SDk, build hints?

2021-09-30 Thread Gilles Gouaillardet via users
Carl, I opened https://github.com/open-mpi/ompi/issues/9444 to specifically track the issue related to the op/avx component TL;DR nvhpc compilers can compile AVX512 intrinsics (so far so good), but do not define at least one of these macros __AVX512BW__ __AVX512F__ __AVX512VL__ and Open MPI is

Re: [OMPI users] OpenMPI 4.1.1, CentOS 7.9, nVidia HPC-SDk, buildhints?

2021-09-30 Thread Gilles Gouaillardet via users
Ray, there is a typo, the configure option is --enable-mca-no-build=op-avx Cheers, Gilles - Original Message - Added --enable-mca-no-build=op-avx to the configure line. Still dies in the same place. CCLD mca_op_avx.la

Re: [OMPI users] OpenMPI 4.1.1, CentOS 7.9, nVidia HPC-SDk, build hints?

2021-09-29 Thread Gilles Gouaillardet via users
Ray, note there is a bug in nvc compilers since 21.3 (it has been reported and is documented at https://github.com/open-mpi/ompi/issues/9402) For the time being, I suggest you use gcc, g++ and nvfortran FWIW, the AVX2 issue is likely caused by nvc **not** defining some macros (that are both

Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-09-13 Thread Gilles Gouaillardet via users
ly for preliminary > > testing, and does not merit re-enabling these providers in > > this notebook. > > > > Thank you very much for the clarification. > > > > Regards, > > Jorge. > > > > - Mensaje original - > >> De: "Gilles Gouaillar

Re: [OMPI users] cross-compilation documentation seems to be missing

2021-09-07 Thread Gilles Gouaillardet via users
Hi Jeff, Here is a sample file I used some time ago (some definitions might be missing though ...) In order to automatically generate this file - this is a bit of a chicken-and-egg problem - you can run configure -c on the RISC-V node. It will generate a config.cache file. Then

Re: [OMPI users] OpenMPI-4.0.5 and MPI_spawn

2021-08-26 Thread Gilles Gouaillardet via users
],0]) is on host: unknown!   BTLs attempted: vader tcp self Your MPI job is now going to abort; sorry. [fsc08:465159] [[45369,2],27] ORTE_ERROR_LOG: Unreachable in file dpm/dpm.c at line 493 On Thu, 2021-08-26 at 14:30 +0900, Gilles Gouaillardet via users wrote: Franco, I am surprised UCX

Re: [OMPI users] OpenMPI-4.0.5 and MPI_spawn

2021-08-25 Thread Gilles Gouaillardet via users
Franco, I am surprised UCX gets selected since there is no Infiniband network. There used to be a bug that lead UCX to be selected on shm/tcp systems, but it has been fixed. You might want to give a try to the latest versions of Open MPI (4.0.6 or 4.1.1) Meanwhile, try to mpirun --mca pml ^ucx

Re: [OMPI users] vectorized reductions

2021-07-20 Thread Gilles Gouaillardet via users
to evaluate them and report the performance numbers. On 7/20/2021 11:00 PM, Dave Love via users wrote: Gilles Gouaillardet via users writes: One motivation is packaging: a single Open MPI implementation has to be built, that can run on older x86 processors (supporting only SSE) and the latest

Re: [OMPI users] vectorized reductions

2021-07-19 Thread Gilles Gouaillardet via users
One motivation is packaging: a single Open MPI implementation has to be built, that can run on older x86 processors (supporting only SSE) and the latest ones (supporting AVX512). The op/avx component will select at runtime the most efficient implementation for vectorized reductions. On Mon, Jul
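
A minimal C sketch of the general runtime-dispatch idea (not Open MPI's actual op/avx code): every variant is compiled into the library, and the best one the running CPU supports is picked at runtime.

    #include <stddef.h>

    static void sum_scalar(float *a, const float *b, size_t n)
    { for (size_t i = 0; i < n; i++) a[i] += b[i]; }
    /* stand-ins: real variants would use _mm256_* / _mm512_* intrinsics */
    static void sum_avx2(float *a, const float *b, size_t n)
    { sum_scalar(a, b, n); }
    static void sum_avx512(float *a, const float *b, size_t n)
    { sum_scalar(a, b, n); }

    typedef void (*sum_fn)(float *, const float *, size_t);

    sum_fn pick_sum(void)
    {
    #if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
        if (__builtin_cpu_supports("avx512f")) return sum_avx512;
        if (__builtin_cpu_supports("avx2"))    return sum_avx2;
    #endif
        return sum_scalar;   /* safe default on older or non-x86 CPUs */
    }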

Re: [OMPI users] how to suppress "libibverbs: Warning: couldn't load driver ..." messages?

2021-06-23 Thread Gilles Gouaillardet via users
Hi Jeff, Assuming you did **not** explicitly configure Open MPI with --disable-dlopen, you can try mpirun --mca pml ob1 --mca btl vader,self ... Cheers, Gilles On Thu, Jun 24, 2021 at 5:08 AM Jeff Hammond via users < users@lists.open-mpi.org> wrote: > I am running on a single node and do not

Re: [OMPI users] (Fedora 34, x86_64-pc-linux-gnu, openmpi-4.1.1.tar.gz): PML ucx cannot be selected

2021-05-28 Thread Gilles Gouaillardet via users
Jorge, pml/ucx used to be selected when no fast interconnect was detected (since ucx provides drivers for both TCP and shared memory). These providers are now disabled by default, so unless your machine has a supported fast interconnect (such as Infiniband), pml/ucx cannot be used out of the box

Re: [OMPI users] [EXTERNAL] Linker errors in Fedora 34 Docker container

2021-05-25 Thread Gilles Gouaillardet via users
Howard, I have a recollection of a similar issue that only occurs with the latest flex (that requires its own library to be passed to the linker). I cannot remember if this was a flex packaging issue, or if we ended up recommending to downgrade flex to a known to work version. The issue

Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04?

2021-04-08 Thread Gilles Gouaillardet via users
takes place. > I also request you to plan for an early 4.1.1rc2 release at least by June 2021. > > With Regards, > S. Biplab Raut > > -Original Message- > From: Gilles Gouaillardet > Sent: Thursday, April 1, 2021 8:31 AM > To: Raut, S Biplab > Subject:

Re: [OMPI users] Building Open-MPI with Intel C

2021-04-07 Thread Gilles Gouaillardet via users
Michael, orted is able to find its dependencies to the Intel runtime on the host where you sourced the environment. However, it is unlikely able to do it on a remote host For example ssh ... ldd `which orted` will likely fail. An option is to use -rpath (and add the path to the Intel runtime).

Re: [OMPI users] HWLOC icc error

2021-03-23 Thread Gilles Gouaillardet via users
Luis, this file is never compiled when an external hwloc is used. Please open a github issue and include all the required information Cheers, Gilles On Tue, Mar 23, 2021 at 5:44 PM Luis Cebamanos via users wrote: > > Hello, > > Compiling OpenMPI 4.0.5 with Intel 2020 I came across this

Re: [OMPI users] [External] Help with MPI and macOS Firewall

2021-03-18 Thread Gilles Gouaillardet via users
Matt, you can either mpirun --mca btl self,vader ... or export OMPI_MCA_btl=self,vader mpirun ... you may also add btl = self,vader in your /etc/openmpi-mca-params.conf and then simply mpirun ... Cheers, Gilles On Fri, Mar 19, 2021 at 5:44 AM Matt Thompson via users wrote: > > Prentice,

Re: [OMPI users] config: gfortran: "could not run a simple Fortran program"

2021-03-07 Thread Gilles Gouaillardet via users
Anthony, Did you make sure you can compile a simple fortran program with gfortran? and gcc? Please compress and attach both openmpi-config.out and config.log, so we can diagnose the issue. Cheers, Gilles On Mon, Mar 8, 2021 at 6:48 AM Anthony Rollett via users wrote: > > I am trying to

Re: [OMPI users] Stable and performant openMPI version for Ubuntu20.04 ?

2021-03-04 Thread Gilles Gouaillardet via users
On top of XPMEM, try to also force btl/vader with mpirun --mca pml ob1 --mca btl vader,self ... On Fri, Mar 5, 2021 at 8:37 AM Nathan Hjelm via users wrote: > > I would run the v4.x series and install xpmem if you can > (http://github.com/hjelmn/xpmem). You will need to build with >

Re: [OMPI users] MPI executable fails on ArchLinux on Termux

2021-02-25 Thread Gilles Gouaillardet via users
yes, you need to (re)build Open MPI from source in order to try this trick. On 2/26/2021 3:55 PM, LINUS FERNANDES via users wrote: No change. What do you mean by running configure? Are you expecting me to build OpenMPI from source? On Fri, 26 Feb 2021, 11:16 Gilles Gouaillardet via users

Re: [OMPI users] MPI executable fails on ArchLinux on Termux

2021-02-25 Thread Gilles Gouaillardet via users
>>>> Errno==13 is EACCESS, which generically translates to "permission denied". >>>> Since you're running as root, this suggests that something outside of >>>> your local environment (e.g., outside of that immediate layer of >>>

Re: [OMPI users] MPI executable fails on ArchLinux on Termux

2021-02-25 Thread Gilles Gouaillardet via users
s which I > obviously can't on Termux since it doesn't support OpenJDK. > > On Thu, 25 Feb 2021, 13:37 Gilles Gouaillardet via users, > wrote: >> >> Is SELinux running on ArchLinux under Termux? >> >> On 2/25/2021 4:36 PM, LINUS FERNANDES via users wrote: >>

Re: [OMPI users] MPI executable fails on ArchLinux on Termux

2021-02-25 Thread Gilles Gouaillardet via users
Is SELinux running on ArchLinux under Termux? On 2/25/2021 4:36 PM, LINUS FERNANDES via users wrote: Yes, I did not receive this in my inbox since I set to receive digest. ifconfig output: dummy0: flags=195 mtu 1500         inet6 fe80::38a0:1bff:fe81:d4f5 prefixlen 64 scopeid 0x20    

Re: [OMPI users] MPI executable fails on ArchLinux on Termux

2021-02-24 Thread Gilles Gouaillardet via users
Can you run ifconfig or ip addr in both Termux and ArchLinux for Termux? On 2/25/2021 2:00 PM, LINUS FERNANDES via users wrote: Why do I see the following error messages when executing |mpirun| on ArchLinux for Termux? The same program executes on Termux without any glitches.

Re: [OMPI users] weird mpi error report: Type mismatch between arguments

2021-02-17 Thread Gilles Gouaillardet via users
Diego, IIRC, you now have to build your gfortran 10 apps with -fallow-argument-mismatch Cheers, Gilles - Original Message - Dear OPENMPI users, i'd like to notify you a strange issue that arised right after installing a new up-to-date version of Linux (Kubuntu 20.10, with

Re: [OMPI users] GROMACS with openmpi

2021-02-11 Thread Gilles Gouaillardet via users
This is not an Open MPI question, and hence not a fit for this mailing list. But here we go: first, try cmake -DGMX_MPI=ON ... if it fails, try cmake -DGMX_MPI=ON -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx . .. Cheers, Gilles - Original Message - Hi, MPI

Re: [OMPI users] OpenMPI 4.1.0 misidentifies x86 capabilities

2021-02-10 Thread Gilles Gouaillardet via users
Max, at configure time, Open MPI detects the *compiler* capabilities. In your case, your compiler can emit AVX512 code. (and fwiw, the tests are only compiled and never executed) Then at *runtime*, Open MPI detects the *CPU* capabilities. In your case, it should not invoke the functions

Re: [OMPI users] OMPI 4.1 in Cygwin packages?

2021-02-04 Thread Gilles Gouaillardet via users
packages? > > > > Do we know if this was definitely fixed in v4.1.x? > > > > On Feb 4, 2021, at 7:46 AM, Gilles Gouaillardet via users > > wrote: > > > > Martin, > > > > this is a connectivity issue reported by the btl/tcp component. > >

Re: [OMPI users] OMPI 4.1 in Cygwin packages?

2021-02-04 Thread Gilles Gouaillardet via users
Martin, this is a connectivity issue reported by the btl/tcp component. You can try restricting the IP interface to a subnet known to work (and with no firewall) between both hosts mpirun --mca btl_tcp_if_include 192.168.0.0/24 ... If the error persists, you can mpirun --mca

Re: [OMPI users] Debugging a crash

2021-01-29 Thread Gilles Gouaillardet via users
Diego, the mpirun command line starts 2 MPI tasks, but the error log mentions rank 56, so unless there is a copy/paste error, this is highly suspicious. I invite you to check the filesystem usage on this node, and make sure there is a similar amount of available space in /tmp and /dev/shm (or

Re: [OMPI users] Error with building OMPI with PGI

2021-01-19 Thread Gilles Gouaillardet via users
Passant, unless this is a copy paste error, the last error message reads plus zero three, which is clearly an unknown switch (plus uppercase o three is a known one) At the end of the configure, make sure Fortran bindings are generated. If the link error persists, you can ldd

Re: [OMPI users] 4.1 mpi-io test failures on lustre

2021-01-18 Thread Gilles Gouaillardet via users
Dave, On 1/19/2021 2:13 AM, Dave Love via users wrote: Generally it's not surprising if there's a shortage of effort when outside contributions seem unwelcome. I've tried to contribute several times. The final attempt wasted two or three days, after being encouraged to get the port of

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-11 Thread Gilles Gouaillardet via users
over TCP, where we use the socket timeout to prevent deadlocks. As you > already did quite a few communicator duplications and other collective > communications before you see the timeout, we need more info about this. As > Gilles indicated, having the complete output might help. What is

Re: [OMPI users] Confusing behaviour of compiler wrappers

2021-01-09 Thread Gilles Gouaillardet via users
Sajid, I believe this is a Spack issue and Open MPI cannot do anything about it (long story short, `module load openmpi-xyz` does not set the environment for the (spack) external `xpmem` library). I updated the spack issue with some potential workarounds you might want to give a try. Cheers,

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-08 Thread Gilles Gouaillardet via users
Daniel, Can you please post the full error message and share a reproducer for this issue? Cheers, Gilles On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users wrote: > > Hi all. > > Actually I'm implementing an algorithm that creates a process grid and > divides it into row and column

Re: [OMPI users] MPMD hostfile: executables on same hosts

2020-12-21 Thread Gilles Gouaillardet via users
Vineet, probably *not* what you expect, but I guess you can try $ cat host-file host1 slots=3 host2 slots=3 host3 slots=3 $ mpirun -hostfile host-file -np 2 ./EXE1 : -np 1 ./EXE2 : -np 2 ./EXE1 : -np 1 ./EXE2 : -np 2 ./EXE1 : -np 1 ./EXE2 Cheers, Gilles On Mon, Dec 21, 2020 at 10:26 PM

Re: [OMPI users] MPI_type_free question

2020-12-14 Thread Gilles Gouaillardet via users
the memory used by rank 0 before (blue) and after (red) the patch. > > Thanks > > Patrick > > > Le 10/12/2020 à 10:15, Gilles Gouaillardet via users a écrit : > > Patrick, > > > First, thank you very much for sharing the reproducer. > > > Yes, please open

Re: [OMPI users] MPI_type_free question

2020-12-10 Thread Gilles Gouaillardet via users
binaries for the 4 runs and OpenMPI 3.1 (same behavior with 4.0.5). The code is in attachment. I'll try to check type deallocation as soon as possible. Patrick On 04/12/2020 at 01:34, Gilles Gouaillardet via users wrote: Patrick, based on George's idea, a simpler check is to retrieve

Re: [OMPI users] MPI_type_free question

2020-12-04 Thread Gilles Gouaillardet via users
r Omnipath based. I will have to investigate too but not sure it is the same problem. Patrick On 04/12/2020 at 01:34, Gilles Gouaillardet via users wrote: Patrick, based on George's idea, a simpler check is to retrieve the Fortran index via the (standard) MPI_Type_c2f() function after

Re: [OMPI users] MPI_type_free question

2020-12-03 Thread Gilles Gouaillardet via users
Patrick, based on George's idea, a simpler check is to retrieve the Fortran index via the (standard) MPI_Type_c2f() function after you create a derived datatype. If the index keeps growing forever even after you MPI_Type_free(), then this clearly indicates a leak. Unfortunately, this
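
A small C sketch of that check (illustrative only): create and free a derived datatype in a loop and watch the Fortran index; if it keeps growing, the handle is being leaked.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        for (int i = 0; i < 10; i++) {
            MPI_Datatype t;
            MPI_Type_contiguous(4, MPI_INT, &t);
            MPI_Type_commit(&t);
            printf("iteration %d: fortran index %d\n", i, (int)MPI_Type_c2f(t));
            MPI_Type_free(&t);  /* the index should be recycled next time around */
        }
        MPI_Finalize();
        return 0;
    }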

Re: [OMPI users] Parallel HDF5 low performance

2020-12-03 Thread Gilles Gouaillardet via users
> > I was tracking this problem for several weeks but not looking in the > right direction (testing NFS server I/O, network bandwidth.) > > I think we will now move definitively to modern OpenMPI implementations. > > Patrick > > On 03/12/2020 at 09:06, Gilles

Re: [OMPI users] Parallel HDF5 low performance

2020-12-03 Thread Gilles Gouaillardet via users
Patrick, In recent Open MPI releases, the default component for MPI-IO is ompio (and no more romio) unless the file is on a Lustre filesystem. You can force romio with mpirun --mca io ^ompio ... Cheers, Gilles On 12/3/2020 4:20 PM, Patrick Bégou via users wrote: Hi, I'm using an

Re: [OMPI users] Unable to run complicated MPI Program

2020-11-28 Thread Gilles Gouaillardet via users
Dean, That typically occurs when some nodes have multiple interfaces, and several nodes have a similar IP on a private/unused interface. I suggest you explicitly restrict the interface Open MPI should be using. For example, you can mpirun --mca btl_tcp_if_include eth0 ... Cheers, Gilles On

Re: [OMPI users] 4.0.5 on Linux Pop!_OS

2020-11-07 Thread Gilles Gouaillardet via users
Paul, a "slot" is explicitly defined in the error message you copy/pasted: "If none of a hostfile, the --host command line parameter, or an RM is present, Open MPI defaults to the number of processor cores" The error message also lists 4 ways on how you can move forward, but you should first

Re: [OMPI users] ompe support for filesystems

2020-10-31 Thread Gilles Gouaillardet via users
Hi Ognen, MPI-IO is implemented by two components: - ROMIO (from MPICH) - ompio ("native" Open MPI MPI-IO, default component unless running on Lustre) Assuming you want to add support for a new filesystem in ompio, first step is to implement a new component in the fs framework. The framework is

Re: [OMPI users] Anyone try building openmpi 4.0.5 w/ llvm 11

2020-10-22 Thread Gilles Gouaillardet via users
Alan, thanks for the report, I addressed this issue in https://github.com/open-mpi/ompi/pull/8116 As a temporary workaround, you can apply the attached patch. FWIW, f18 (shipped with LLVM 11.0.0) is still in development and uses gfortran under the hood. Cheers, Gilles On Wed, Oct 21, 2020 at

Re: [OMPI users] mpirun on Kubuntu 20.4.1 hangs

2020-10-21 Thread Gilles Gouaillardet via users
Hi Jorge, If a firewall is running on your nodes, I suggest you disable it and try again Cheers, Gilles On Wed, Oct 21, 2020 at 5:50 AM Jorge SILVA via users wrote: > > Hello, > > I installed kubuntu20.4.1 with openmpi 4.0.3-0ubuntu in two different > computers in the standard way. Compiling

Re: [OMPI users] Issue with shared memory arrays in Fortran

2020-08-24 Thread Gilles Gouaillardet via users
Patrick, Thanks for the report and the reproducer. I was able to confirm the issue with python and Fortran, but - I can only reproduce it with pml/ucx (read --mca pml ob1 --mca btl tcp,self works fine) - I can only reproduce it with bcast algorithm 8 and 9 As a workaround, you can keep using

Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-09 Thread Gilles Gouaillardet via users
John, I am not sure you will get much help here with a kernel crash caused by a tweaked driver. About HPL, you are more likely to get better performance with P and Q closer (e.g. 4x8 is likely better than 2x16 or 1x32). Also, HPL might have better performance with one MPI task per node and a
