Re: [OMPI users] ucx configuration

2023-01-11 Thread Gilles Gouaillardet via users
You can pick one test, make it standalone, and open an issue on GitHub. How does (vanilla) Open MPI compare to your vendor's Open MPI-based library? Cheers, Gilles On Wed, Jan 11, 2023 at 10:20 PM Dave Love via users < users@lists.open-mpi.org> wrote: > Gilles Gouaillardet via user

Re: [OMPI users] CephFS and striping_factor

2022-11-28 Thread Gilles Gouaillardet via users
Hi Eric, Currently, Open MPI does not provide specific support for CephFS. MPI-IO is implemented either by ROMIO (imported from MPICH; it does not support CephFS today) or by the "native" ompio component (which also does not support CephFS today). A proof of concept for CephFS in ompio might
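For reference, you can check which MPI-IO implementation a build offers and force one or the other; a sketch, where ./my_io_app is a placeholder and the ROMIO component name varies across Open MPI releases:
    # list the MPI-IO components available in this build
    ompi_info | grep "MCA io"
    # force ROMIO (component name depends on the release, e.g. romio321 or romio341)
    mpirun --mca io romio321 -n 4 ./my_io_app
    # or force the native ompio component
    mpirun --mca io ompio -n 4 ./my_io_app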

Re: [OMPI users] Question about "mca" parameters

2022-11-29 Thread Gilles Gouaillardet via users
Hi, Simply add btl = tcp,self If the openib error message persists, try also adding osc_rdma_btls = ugni,uct,ucp or simply osc = ^rdma Cheers, Gilles On 11/29/2022 5:16 PM, Gestió Servidors via users wrote: Hi, If I run “mpirun --mca btl tcp,self --mca allow_ib 0 -n 12
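For reference, these settings normally go into an MCA parameter file rather than onto the command line; a sketch, assuming the per-user file location:
    # $HOME/.openmpi/mca-params.conf (create it if it does not exist)
    btl = tcp,self
    # only if the openib message persists:
    osc_rdma_btls = ugni,uct,ucp
    # or simply disable osc/rdma altogether:
    osc = ^rdma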

Re: [OMPI users] Run on dual-socket system

2022-11-26 Thread Gilles Gouaillardet via users
Arham, It should be balanced: the default mapping is to allocate NUMA packages round-robin. You can mpirun --report-bindings -n 28 true to have Open MPI report the bindings, or mpirun --tag-output -n 28 grep Cpus_allowed_list /proc/self/status to have each task report which physical cpu it is

Re: [OMPI users] lots of undefined symbols compiling a hello-world

2022-11-05 Thread Gilles Gouaillardet via users
Chris, Did you double-check that libopen-rte.so.40 and libopen-pal.so.40 are installed in /mnt/software/o/openmpi/4.1.4-ct-test/lib? If they are not present, your install is broken and you should reinstall it. Cheers, Gilles On Sat, Nov 5, 2022 at 3:42 AM Chris Taylor via users <
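A quick way to check; a sketch, where ./hello_world stands in for the test binary:
    # are the libraries actually installed where the build expects them?
    ls /mnt/software/o/openmpi/4.1.4-ct-test/lib/libopen-rte.so.40 \
       /mnt/software/o/openmpi/4.1.4-ct-test/lib/libopen-pal.so.40
    # what does the binary resolve at run time?
    ldd ./hello_world | grep -E "open-rte|open-pal"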

Re: [OMPI users] users Digest, Vol 4818, Issue 1

2022-11-14 Thread Gilles Gouaillardet via users
;machinefile" (vs. a copy-and-pasted "em dash")? -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Gilles Gouaillardet via users Sent: Sunday, November 13, 2022 9:18 PM To: Open MPI Users Cc: Gilles Gouaillardet Subject: Re: [OMPI users

Re: [OMPI users] [OMPI devel] There are not enough slots available in the system to satisfy the 2 slots that were requested by the application

2022-11-13 Thread Gilles Gouaillardet via users
There is a typo in your command line. You should use --mca (minus minus) instead of -mca. Also, you can try --machinefile instead of -machinefile. Cheers, Gilles There are not enough slots available in the system to satisfy the 2 slots that were requested by the application: –mca On Mon, Nov
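For example, retyped by hand with plain ASCII hyphens (the hostfile name and program are placeholders):
    mpirun --mca btl tcp,self --machinefile hosts -n 2 ./a.out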

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-05 Thread Gilles Gouaillardet via users
Arun, First, Open MPI selects a pml for **all** the MPI tasks (for example, pml/ucx or pml/ob1). Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct, btl/vader) is used for each pair of MPI tasks (tasks on the same node will use btl/vader, tasks on different nodes will use
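For example (./my_app is a placeholder; on Open MPI 5.x the shared-memory btl is named sm rather than vader):
    # force pml/ob1, with shared memory intra-node and TCP inter-node
    mpirun --mca pml ob1 --mca btl vader,self,tcp -n 4 ./my_app
    # or force pml/ucx for everything
    mpirun --mca pml ucx -n 4 ./my_app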

Re: [OMPI users] Open MPI 4.0.3 outside as well as inside a SimpleFOAM container: step creation temporarily disabled, retrying Requested nodes are busy

2023-02-28 Thread Gilles Gouaillardet via users
Rob, Do you invoke mpirun from **inside** the container? IIRC, mpirun is generally invoked from **outside** the container; could you try that if it is not already the case? The error message is from SLURM, so this is really a SLURM vs. Singularity issue. What if you srun -N 2 -n 2 hostname
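Something along these lines, where the image and solver names are placeholders:
    # sanity check: can SLURM itself start tasks on both nodes?
    srun -N 2 -n 2 hostname
    # then launch from the host, one container instance per MPI task
    mpirun -n 2 singularity exec ./simplefoam.sif ./my_solver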

Re: [OMPI users] Issue with unexpected IP address in OpenMPI

2023-03-27 Thread Gilles Gouaillardet via users
Todd, Similar issues were also reported when there is Network Address Translation (NAT) between hosts, and that occurred when using KVM/QEMU virtual machines running on the same host. First, you need to list the available interfaces on both nodes. Then try to restrict Open MPI to a single interface that is
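A sketch, assuming eth0 is the interface that is routable between both nodes:
    # list the interfaces on each node
    ip -o addr show
    # restrict both the byte transfer layer and the out-of-band channel to it
    mpirun --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 -n 2 ./my_app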

Re: [OMPI users] libnuma.so error

2023-07-19 Thread Gilles Gouaillardet via users
Luis, That can happen if a component is linked with libnuma.so: Open MPI will fail to open it and fall back on another one. You can run ldd on the mca_*.so components in the /.../lib/openmpi directory to figure out which ones are using libnuma.so and assess whether they are needed or not. Cheers,
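A sketch, with /path/to/ompi standing in for your install prefix:
    # flag every Open MPI component that pulls in libnuma.so
    for f in /path/to/ompi/lib/openmpi/mca_*.so; do
        ldd "$f" | grep -q libnuma && echo "$f"
    done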

Re: [OMPI users] Issue with Running MPI Job on CentOS 7

2023-05-31 Thread Gilles Gouaillardet via users
Open MPI 1.6.5 is an antique version and you should not expect any support for it. Instead, I suggest you try the latest release, rebuild your app, and try again. FWIW, that kind of error occurs when the MPI library does not match mpirun. That can happen when mpirun and libmpi.so come from different
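A quick consistency check (./my_app is a placeholder):
    # do mpirun and the library the app was linked against come from the same install?
    which mpirun
    mpirun --version
    ldd ./my_app | grep libmpi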

Re: [OMPI users] MPI_Init_thread error

2023-07-25 Thread Gilles Gouaillardet via users
Aziz, When using a direct run (e.g. srun), Open MPI has to interact with SLURM. This is typically achieved via PMI2 or PMIx. You can srun --mpi=list to list the available options on your system. If PMIx is available, you can srun --mpi=pmix ... If only PMI2 is available, you need to make sure Open
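For example (./my_app is a placeholder):
    # list the PMI flavors this SLURM build supports
    srun --mpi=list
    # if pmix is listed, use it for a direct run
    srun --mpi=pmix -n 4 ./my_app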

Re: [OMPI users] OpenMPI crashes with TCP connection error

2023-06-16 Thread Gilles Gouaillardet via users
Kurt, I think Joachim was also asking for the command line used to launch your application. Since you are using Slurm and MPI_Comm_spawn(), it is important to understand whether you are using mpirun or srun. FWIW, --mpi=pmix is an srun option. You can srun --mpi=list to find the available

Re: [OMPI users] psec warning when launching with srun

2023-05-20 Thread Gilles Gouaillardet via users
Christof, Open MPI switching to the internal PMIx is a bug I addressed in https://github.com/open-mpi/ompi/pull/11704 Feel free to manually download and apply the patch; you will then need recent autotools and to run ./autogen.pl --force. Another option is to manually edit the configure file. Look
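A sketch of the patch route, assuming GitHub's .patch view of the pull request and your usual configure options:
    curl -LO https://github.com/open-mpi/ompi/pull/11704.patch
    patch -p1 < 11704.patch
    ./autogen.pl --force        # needs recent autoconf/automake/libtool
    ./configure --prefix=$HOME/ompi && make -j install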

Re: [OMPI users] OpenMPI 5.0.1 Installation Failure

2024-01-26 Thread Gilles Gouaillardet via users
Hi, Please open a GitHub issue at https://github.com/open-mpi/ompi/issues and provide the requested information. Cheers, Gilles On Sat, Jan 27, 2024 at 12:04 PM Kook Jin Noh via users < users@lists.open-mpi.org> wrote: > Hi, > I'm installing OpenMPI 5.0.1 on Arch Linux 6.7.1. Everything

Re: [OMPI users] Seg error when using v5.0.1

2024-01-31 Thread Gilles Gouaillardet via users
Hi, Please open an issue on GitHub at https://github.com/open-mpi/ompi/issues and provide the requested information. If the compilation failed when configured with --enable-debug, please share the logs. The name of the WRF subroutine suggests the crash might occur in MPI_Comm_split(); if so,

Re: [OMPI users] Subject: Clarification about mpirun behavior in Slurm jobs

2024-02-24 Thread Gilles Gouaillardet via users
Christopher, I do not think Open MPI explicitly asks SLURM which cores have been assigned on each node. So if you are planning to run multiple jobs on the same node, your best bet is probably to have SLURM use cpusets. Cheers, Gilles On Sat, Feb 24, 2024 at 7:25 AM Christopher Daley via users
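A minimal sketch of cgroup-based core confinement on the SLURM side (assuming a cgroup-enabled SLURM build; consult the slurm.conf and cgroup.conf man pages before changing anything):
    # slurm.conf
    TaskPlugin=task/affinity,task/cgroup
    # cgroup.conf
    ConstrainCores=yes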

Re: [OMPI users] "MCW rank 0 is not bound (or bound to all available processors)" when running multiple jobs concurrently

2024-04-15 Thread Gilles Gouaillardet via users
Greg, If Open MPI was built with UCX, your jobs will likely use UCX (and the shared-memory provider) even if running on a single node. You can mpirun --mca pml ob1 --mca btl self,sm ... if you want to avoid using UCX. What is a typical mpirun command line used under the hood by your "make test"?
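For example (./my_test is a placeholder; on Open MPI 4.x the shared-memory btl is named vader rather than sm):
    # check whether this build includes UCX at all
    ompi_info | grep -i ucx
    # bypass UCX and use shared memory even on a single node
    mpirun --mca pml ob1 --mca btl self,sm -n 4 ./my_test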

Re: [OMPI users] Another OpenMPI 5.0.1 Installation Failure

2024-04-21 Thread Gilles Gouaillardet via users
Hi, Is there any reason why you do not build the latest 5.0.2 package? Anyway, the issue could be related to an unknown filesystem. Do you get a meaningful error if you manually run /.../test/util/opal_path_nfs? If not, can you share the output of mount | cut -f3,5 -d' '? Cheers, Gilles On
