Yes, I understand. I use mpirun myself because I want more control. However, when providing our developed code to regular users, I think I should change all the "mpirun" calls to "srun" in the batch scripts, so that users who rely on Slurm to allocate jobs on the cluster will not encounter unexpected behavior.
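As a rough sketch of what I have in mind (./our_code is just a placeholder for our application, and I am assuming sbatch accepts the same options I pass to salloc below), the user-facing batch script would look something like:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=2
#SBATCH --gpus-per-node=4
#SBATCH --gpu-mps
#SBATCH --time=24:00:00

# let Slurm handle the mapping and binding instead of mpirun
srun ./our_code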

Chang

On 10/11/21 1:49 PM, Ralph Castain via users wrote:
Oh my - that is a pretty strong statement. It depends on what you are trying to do, and whether or not Slurm offers a mapping pattern that matches. mpirun tends to have a broader range of options, which is why many people use it. It also means that your job script is portable and not locked to a specific RM, which is important to quite a few users.

However, if Slurm has something you can use/like and you don't need to worry about portability, then by all means use it.

Just don't assume that everyone fits in that box :-)


On Oct 11, 2021, at 10:40 AM, Chang Liu via users <users@lists.open-mpi.org> wrote:

OK thank you. Seems that srun is a better option for normal users.

Chang

On 10/11/21 1:23 PM, Ralph Castain via users wrote:
Sorry, your output wasn't clear about cores vs hwthreads. Apparently, your Slurm config is set up to use hwthreads as independent cpus - what you are calling "logical cores", which is a little confusing. No, mpirun has no knowledge of what mapping pattern you passed to salloc. For one thing, we don't have any good way of obtaining config information - e.g., that Slurm is treating hwthreads as cpus - so we can't really interpret what Slurm might have done.
Given this clarification, you can probably get what you want with:
mpirun --use-hwthread-cpus --map-by hwthread:pe=2 ...
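If you want to verify the result, you can probably add --report-bindings so each rank reports what it is bound to, or just rerun your grep test with those options, e.g.:

mpirun --use-hwthread-cpus --map-by hwthread:pe=2 --report-bindings bash -c "cat /proc/self/status|grep Cpus_allowed_list"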
On Oct 11, 2021, at 7:35 AM, Chang Liu via users <users@lists.open-mpi.org> wrote:

This is not what I need. The CPU can run 4 threads per core, so "--bind-to core" results in one process occupying 4 logical cores.

I want one process to occupy 2 logical cores, so two processes sharing a physical core.

I guess there is a way to do that by playing with the mapping options. I just want to know whether this is a bug in mpirun, or whether this feature for interacting with Slurm was never implemented.

Chang

On 10/11/21 10:07 AM, Ralph Castain via users wrote:
You just need to tell mpirun that you want your procs to be bound to cores, not socket (which is the default).
Add "--bind-to core" to your mpirun cmd line
On Oct 10, 2021, at 11:17 PM, Chang Liu via users <users@lists.open-mpi.org> wrote:

Yes they are. This is an interactive job from

salloc -N 1 --ntasks-per-node=64 --cpus-per-task=2 --gpus-per-node=4 --gpu-mps --time=24:00:00

Chang

On 10/11/21 2:09 AM, Åke Sandgren via users wrote:
On 10/10/21 5:38 PM, Chang Liu via users wrote:
OMPI v4.1.1-85-ga39a051fd8

% srun bash -c "cat /proc/self/status|grep Cpus_allowed_list"
Cpus_allowed_list:      58-59
Cpus_allowed_list:      106-107
Cpus_allowed_list:      110-111
Cpus_allowed_list:      114-115
Cpus_allowed_list:      16-17
Cpus_allowed_list:      36-37
Cpus_allowed_list:      54-55
...

% mpirun bash -c "cat /proc/self/status|grep Cpus_allowed_list"
Cpus_allowed_list:      0-127
Cpus_allowed_list:      0-127
Cpus_allowed_list:      0-127
Cpus_allowed_list:      0-127
Cpus_allowed_list:      0-127
Cpus_allowed_list:      0-127
Cpus_allowed_list:      0-127
...
Was that run in the same batch job? If not, the data is useless.



--
Chang Liu
Staff Research Physicist
+1 609 243 3438
c...@pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA
