Re: [OMPI users] mpi program gets stuck

2022-11-29 Thread Jeff Squyres (jsquyres) via users
(we've conversed a bit off-list; bringing this back to the list with a good subject to differentiate it from other digest threads) I'm glad the tarball I provided (that included the PMIx fix) resolved running "uptime" for you. Can you try running a plain C MPI program instead of a Python MPI

Re: [OMPI users] CephFS and striping_factor

2022-11-29 Thread Edgar Gabriel via users
I can also offer to help if there are any questions regarding the ompio code, but I do not have the bandwidth/resources to do that myself, and more importantly, I do not have a platform to test the new component.
Edgar
From: users On Behalf Of Jeff Squyres

[OMPI users] mpi program gets stuck

2022-11-29 Thread timesir via users
see also: https://pastebin.com/s5tjaUkF
(py3.9) ➜ /share cat hosts
192.168.180.48 slots=1
192.168.60.203 slots=1
1. This command now runs correctly using your openmpi-gitclone-pr11096.tar.bz2
(py3.9) ➜ /share mpirun -n 2 --machinefile hosts --mca plm_base_verbose 100 --mca rmaps_base_verbose

Re: [OMPI users] CephFS and striping_factor

2022-11-29 Thread Jeff Squyres (jsquyres) via users
More specifically, Gilles created a skeleton "ceph" component in this draft pull request:
https://github.com/open-mpi/ompi/pull/11122
If anyone has any cycles to work on it and develop it beyond the skeleton that is currently there, that would be great!
--
Jeff Squyres
jsquy...@cisco.com

Re: [OMPI users] Question about "mca" parameters

2022-11-29 Thread Jeff Squyres (jsquyres) via users
Also, you probably want to add "vader" into your BTL specification. Although the name is counter-intuitive, "vader" in Open MPI v3.x and v4.x is the shared memory transport. Hence, if you run with "btl=tcp,self", you are only allowing MPI processes to talk via the TCP stack or process
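Applied to the setup in this thread, Jeff's suggestion amounts to listing all three transports. The fragment below is a sketch that assumes the settings are kept in the per-user MCA parameter file ($HOME/.openmpi/mca-params.conf); the same value can instead be given on the command line as "--mca btl tcp,vader,self". It uses the v3.x/v4.x component name "vader" mentioned above.

```
# $HOME/.openmpi/mca-params.conf (assumed per-user location)
# tcp:   network transport between processes on different nodes
# vader: shared-memory transport between processes on the same node (Open MPI v3.x/v4.x)
# self:  loopback, for a process sending to itself
btl = tcp,vader,self
```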

Re: [OMPI users] Question about "mca" parameters

2022-11-29 Thread Gilles Gouaillardet via users
Hi,
Simply add
    btl = tcp,self
If the openib error message persists, try also adding
    osc_rdma_btls = ugni,uct,ucp
or simply
    osc = ^rdma
Cheers,
Gilles
On 11/29/2022 5:16 PM, Gestió Servidors via users wrote:
Hi, If I run “mpirun --mca btl tcp,self --mca allow_ib 0 -n 12
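Gilles's "name = value" lines follow the syntax of an MCA parameter file. A minimal sketch, assuming the per-user file location $HOME/.openmpi/mca-params.conf (the system-wide file is normally $prefix/etc/openmpi-mca-params.conf). Each line is equivalent to passing "--mca name value" to mpirun or setting OMPI_MCA_name=value in the environment. The leading "^" in "^rdma" excludes the named component instead of selecting it.

```
# $HOME/.openmpi/mca-params.conf (assumed per-user location)
btl = tcp,self
# Only if the openib warning persists, add ONE of the following two lines:
osc_rdma_btls = ugni,uct,ucp
# osc = ^rdma
```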

[OMPI users] Question about "mca" parameters

2022-11-29 Thread Gestió Servidors via users
Hi, if I run "mpirun --mca btl tcp,self --mca allow_ib 0 -n 12 ./my_program", I am able to disable some "extra" info in the output file, such as:
The OpenFabrics (openib) BTL failed to initialize while trying to allocate some locked memory. This typically can indicate that the memlock limits are set