The linked pastebin includes the following version information:

[1,0]<stdout>:package:Open MPI spackapps@eu-c7-042-03 Distribution
[1,0]<stdout>:ompi:version:full:4.0.2
[1,0]<stdout>:ompi:version:repo:v4.0.2
[1,0]<stdout>:ompi:version:release_date:Oct 07, 2019
[1,0]<stdout>:orte:version:full:4.0.2
[1,0]<stdout>:orte:version:repo:v4.0.2
[1,0]<stdout>:orte:version:release_date:Oct 07, 2019
[1,0]<stdout>:opal:version:full:4.0.2
[1,0]<stdout>:opal:version:repo:v4.0.2
[1,0]<stdout>:opal:version:release_date:Oct 07, 2019
[1,0]<stdout>:mpi-api:version:full:3.1.0
[1,0]<stdout>:ident:4.0.2
Best

Christoph

----- Original Message -----
From: "Open MPI Users" <users@lists.open-mpi.org>
To: "Open MPI Users" <users@lists.open-mpi.org>
Cc: "Ralph Castain" <r...@open-mpi.org>
Sent: Thursday, 3 February, 2022 00:22:30
Subject: Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

Errr... what version of OMPI are you using?

> On Feb 2, 2022, at 3:03 PM, David Perozzi via users <users@lists.open-mpi.org> wrote:
>
> Hello,
>
> I'm trying to run a code implemented with Open MPI and OpenMP (for threading) on a large cluster that uses LSF for job scheduling and dispatch. The problem with LSF is that it is not very straightforward to allocate and bind the right number of threads to an MPI rank inside a single node. Therefore, I have to create a rankfile myself as soon as the (a priori unknown) resources are allocated.
>
> So, after my job gets dispatched, I run:
>
> mpirun -n "$nslots" -display-allocation -nooversubscribe --map-by core:PE=1 --bind-to core mpi_allocation/show_numactl.sh >mpi_allocation/allocation_files/allocation.txt
>
> where show_numactl.sh consists of just one line:
>
> { hostname; numactl --show; } | sed ':a;N;s/\n/ /;ba'
>
> If I ask for 16 slots, in blocks of 4 (i.e., bsub -n 16 -R "span[block=4]"), I get something like:
>
> ====================== ALLOCATED NODES ======================
> eu-g1-006-1: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
> eu-g1-009-2: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
> eu-g1-002-3: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
> eu-g1-005-1: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> eu-g1-006-1 policy: default preferred node: current physcpubind: 16 cpubind: 1 nodebind: 1 membind: 0 1 2 3 4 5 6 7
> eu-g1-006-1 policy: default preferred node: current physcpubind: 24 cpubind: 1 nodebind: 1 membind: 0 1 2 3 4 5 6 7
> eu-g1-006-1 policy: default preferred node: current physcpubind: 32 cpubind: 2 nodebind: 2 membind: 0 1 2 3 4 5 6 7
> eu-g1-002-3 policy: default preferred node: current physcpubind: 21 cpubind: 1 nodebind: 1 membind: 0 1 2 3 4 5 6 7
> eu-g1-002-3 policy: default preferred node: current physcpubind: 22 cpubind: 1 nodebind: 1 membind: 0 1 2 3 4 5 6 7
> eu-g1-009-2 policy: default preferred node: current physcpubind: 0 cpubind: 0 nodebind: 0 membind: 0 1 2 3 4 5 6 7
> eu-g1-009-2 policy: default preferred node: current physcpubind: 1 cpubind: 0 nodebind: 0 membind: 0 1 2 3 4 5 6 7
> eu-g1-009-2 policy: default preferred node: current physcpubind: 2 cpubind: 0 nodebind: 0 membind: 0 1 2 3 4 5 6 7
> eu-g1-002-3 policy: default preferred node: current physcpubind: 19 cpubind: 1 nodebind: 1 membind: 0 1 2 3 4 5 6 7
> eu-g1-002-3 policy: default preferred node: current physcpubind: 23 cpubind: 1 nodebind: 1 membind: 0 1 2 3 4 5 6 7
> eu-g1-006-1 policy: default preferred node: current physcpubind: 52 cpubind: 3 nodebind: 3 membind: 0 1 2 3 4 5 6 7
> eu-g1-009-2 policy: default preferred node: current physcpubind: 3 cpubind: 0 nodebind: 0 membind: 0 1 2 3 4 5 6 7
> eu-g1-005-1 policy: default preferred node: current physcpubind: 90 cpubind: 5 nodebind: 5 membind: 0 1 2 3 4 5 6 7
> eu-g1-005-1 policy: default preferred node: current physcpubind: 91 cpubind: 5 nodebind: 5 membind: 0 1 2 3 4 5 6 7
> eu-g1-005-1 policy: default preferred node: current physcpubind: 94 cpubind: 5 nodebind: 5 membind: 0 1 2 3 4 5 6 7
> eu-g1-005-1 policy: default preferred node: current physcpubind: 95 cpubind: 5 nodebind: 5 membind: 0 1 2 3 4 5 6 7
>
> After that, I parse this allocation file in Python and create a hostfile and a rankfile (a minimal sketch of this step is included after this message).
>
> The hostfile reads:
>
> eu-g1-006-1
> eu-g1-009-2
> eu-g1-002-3
> eu-g1-005-1
>
> The rankfile:
>
> rank 0=eu-g1-006-1 slot=16,24,32,52
> rank 1=eu-g1-009-2 slot=0,1,2,3
> rank 2=eu-g1-002-3 slot=21,22,19,23
> rank 3=eu-g1-005-1 slot=90,91,94,95
>
> Following Open MPI's man pages and FAQs, I then run my application using
>
> mpirun -n "$nmpiproc" --rankfile mpi_allocation/hostfiles/rankfile --mca rmaps_rank_file_physical 1 ./build/"$executable_name" true "$input_file"
>
> where the bash variables are passed in directly in the bsub command (I basically run bsub -n 16 -R "span[block=4]" "my_script.sh num_slots num_thread_per_rank executable_name input_file").
>
> Now, this procedure sometimes works just fine and sometimes doesn't. When it fails, the problem is that I don't get any error message (I noticed that if an error is made inside the rankfile, one does not get any error). Strangely, for 16 slots and four threads per rank (so 4 MPI ranks), it seems to work better if the 8 slots are allocated on two nodes than if 4 slots land on 4 different nodes. My goal is to run the application with 256 slots and 32 threads per rank (the cluster has mainly AMD EPYC based nodes).
>
> The ompi information of the nodes running a failed job and the rankfile for that failed job can be found at https://pastebin.com/40f6FigH and the allocation file at https://pastebin.com/jeWnkU40
>
> Do you see any problem with my procedure? Why is it failing seemingly randomly? Can I somehow get more information about what's failing from mpirun?
>
> I hope I haven't omitted too much information but, in case, just ask and I'll provide more details.
>
> Cheers,
>
> David
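[Editor's note: a minimal sketch, in Python, of the parsing step David describes (turning the allocation.txt written by show_numactl.sh into a hostfile and a rankfile with physical core ids). The file paths, the THREADS_PER_RANK constant, and the function names are illustrative assumptions, not his actual script:]

#!/usr/bin/env python3
"""Sketch: build a hostfile and a physical-core rankfile from the
allocation.txt produced by the mpirun/show_numactl.sh step above.
Paths, constants and function names are assumptions for illustration."""

import re
import sys

THREADS_PER_RANK = 4  # assumed: 4 cores per MPI rank, as in the 16-slot example

# Matches lines such as:
#   eu-g1-006-1 policy: default preferred node: current physcpubind: 16 cpubind: 1 ...
LINE_RE = re.compile(r"^(\S+)\s+policy:.*?physcpubind:\s+(\d+)")


def parse_allocation(path):
    """Return {hostname: [physical cpu ids]} in order of first appearance."""
    cpus_per_host = {}
    with open(path) as f:
        for line in f:
            m = LINE_RE.match(line)
            if not m:  # skips the "ALLOCATED NODES" banner and blank lines
                continue
            host, cpu = m.group(1), int(m.group(2))
            cpus_per_host.setdefault(host, []).append(cpu)
    return cpus_per_host


def write_host_and_rank_files(cpus_per_host, hostfile_path, rankfile_path):
    """Write one hostfile line per node and one rankfile line per block of
    THREADS_PER_RANK physical cores bound on that node."""
    rank = 0
    with open(hostfile_path, "w") as hf, open(rankfile_path, "w") as rf:
        for host, cpus in cpus_per_host.items():
            hf.write(host + "\n")
            for i in range(0, len(cpus), THREADS_PER_RANK):
                block = cpus[i:i + THREADS_PER_RANK]
                rf.write("rank {}={} slot={}\n".format(
                    rank, host, ",".join(str(c) for c in block)))
                rank += 1


if __name__ == "__main__":
    alloc = sys.argv[1] if len(sys.argv) > 1 else "mpi_allocation/allocation_files/allocation.txt"
    write_host_and_rank_files(parse_allocation(alloc),
                              "mpi_allocation/hostfiles/hostfile",
                              "mpi_allocation/hostfiles/rankfile")

[Because the generated rankfile contains physical core ids from numactl, the final mpirun call in the message passes --mca rmaps_rank_file_physical 1 so the slot numbers are interpreted as physical rather than logical CPU indices.]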