Yes, I understand. I use mpirun myself because I want to have more
control. However, when providing our developed code to normal users, I
think I should change all the "mpirun" calls to "srun" in the batch
scripts, so that users, who rely on Slurm to allocate jobs on the
cluster, do not run into unexpected behavior.
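For example, a minimal batch script along these lines (just a sketch;
./our_code is a placeholder executable name, and the #SBATCH options
simply reuse the salloc settings quoted further down in this thread):

  #!/bin/bash
  #SBATCH -N 1
  #SBATCH --ntasks-per-node=64
  #SBATCH --cpus-per-task=2
  #SBATCH --gpus-per-node=4
  #SBATCH --gpu-mps
  #SBATCH --time=24:00:00

  # srun picks up the task count and cpus-per-task from the
  # allocation, so users need no extra mapping flags here.
  srun ./our_code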
Chang
On 10/11/21 1:49 PM, Ralph Castain via users wrote:
Oh my - that is a pretty strong statement. It depends on what you are
trying to do, and whether or not Slurm offers a mapping pattern that
matches. mpirun tends to have a broader range of options, which is why
many people use it. It also means that your job script is portable and
not locked to a specific RM, which is important to quite a few users.
However, if Slurm has something you can use/like and you don't need to
worry about portability, then by all means one should use it.
Just don't assume that everyone fits in that box :-)
On Oct 11, 2021, at 10:40 AM, Chang Liu via users
<users@lists.open-mpi.org> wrote:
OK, thank you. It seems that srun is a better option for normal users.
Chang
On 10/11/21 1:23 PM, Ralph Castain via users wrote:
Sorry, your output wasn't clear about cores vs. hwthreads. Apparently,
your Slurm config is set up to use hwthreads as independent cpus -
what you are calling "logical cores" - which is a little confusing.
No, mpirun has no knowledge of what mapping pattern you passed to
salloc. We don't have any good way of obtaining config information,
for one thing - e.g., that Slurm is treating hwthreads as cpus. So we
can't really interpret what they might have done.
Given this clarification, you can probably get what you want with:
mpirun --use-hwthread-cpus --map-by hwthread:pe=2 ...
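As a quick (not definitive) check of the result, one can rerun the same
Cpus_allowed_list test quoted further down in this thread, adding Open
MPI's --report-bindings to print a human-readable binding summary:

  mpirun --use-hwthread-cpus --map-by hwthread:pe=2 --report-bindings \
      bash -c "cat /proc/self/status | grep Cpus_allowed_list"

Each rank should then be restricted to a two-hwthread range (similar to
the srun output quoted below) rather than the full 0-127.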
On Oct 11, 2021, at 7:35 AM, Chang Liu via users
<users@lists.open-mpi.org> wrote:
This is not what I need. The CPU can run 4 threads per core, so
"--bind-to core" results in one process occupying 4 logical cores.
I want one process to occupy 2 logical cores, so that two processes
share a physical core.
I guess there is a way to do that by playing with the mapping. I just
want to know whether this is a bug in mpirun, or whether this feature
for interacting with Slurm was never implemented.
Chang
On 10/11/21 10:07 AM, Ralph Castain via users wrote:
You just need to tell mpirun that you want your procs to be bound to
cores, not to the socket (which is the default).
Add "--bind-to core" to your mpirun command line.
On Oct 10, 2021, at 11:17 PM, Chang Liu via users
<users@lists.open-mpi.org> wrote:
Yes, they are. This is an interactive job from
salloc -N 1 --ntasks-per-node=64 --cpus-per-task=2
--gpus-per-node=4 --gpu-mps --time=24:00:00
Chang
On 10/11/21 2:09 AM, Åke Sandgren via users wrote:
On 10/10/21 5:38 PM, Chang Liu via users wrote:
OMPI v4.1.1-85-ga39a051fd8
% srun bash -c "cat /proc/self/status|grep Cpus_allowed_list"
Cpus_allowed_list: 58-59
Cpus_allowed_list: 106-107
Cpus_allowed_list: 110-111
Cpus_allowed_list: 114-115
Cpus_allowed_list: 16-17
Cpus_allowed_list: 36-37
Cpus_allowed_list: 54-55
...
% mpirun bash -c "cat /proc/self/status|grep Cpus_allowed_list"
Cpus_allowed_list: 0-127
Cpus_allowed_list: 0-127
Cpus_allowed_list: 0-127
Cpus_allowed_list: 0-127
Cpus_allowed_list: 0-127
Cpus_allowed_list: 0-127
Cpus_allowed_list: 0-127
...
Was that run in the same batch job? If not, the data is useless.
--
Chang Liu
Staff Research Physicist
+1 609 243 3438
c...@pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA