Hi Prentice,

you are right, and I looked into the wrapper script (not my code; I had
never worked on it before).
In fact, the MPI processes are spawned on the backend nodes; the only process
remaining on the login/frontend node is the spawner process.

The wrapper checks the load on the nodes and then creates a corresponding 
hostfile:
Host nrm214:  current load 0.53  =>  96 slots left
Host nrm215:  current load 0.14  =>  96 slots left
Host nrm212:  current load 0.09  =>  96 slots left
Host nrm213:  current load 0.13  =>  96 slots left

Used hosts:
nrm214 0   (current load is: 0.53)
nrm215 0   (current load is: 0.14)
nrm212 2.0   (current load is: 0.09)
nrm213 0   (current load is: 0.13)

Writing to /tmp/mw445520/login_60004/hostfile-613910

Contents:
nrm212:2

And then spawns the job:
Command: /opt/intel/impi/2018.4.274/compilers_and_libraries/linux/mpi/bin64/mpirun 
-launcher ssh -machinefile /tmp/mw445520/login_60004/hostfile-63375 -np 2 <code>
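For readers unfamiliar with such wrappers, the selection logic above can be
sketched roughly as follows. This is a hypothetical minimal version, not the
actual wrapper: the host names, load values, and the 2-slot count are taken
from the log above, but the per-node load probing (which the real script
would do, e.g. via ssh and /proc/loadavg) is replaced by hard-coded values.

```shell
#!/bin/sh
# Hypothetical sketch of a load-aware hostfile generator (NOT the actual
# wrapper). The real script would probe each node's load, e.g. with
# "ssh $host cat /proc/loadavg"; here the values from the log are hard-coded.
NP=2
hostfile=$(mktemp /tmp/hostfile-XXXXXX)

# Pick the least-loaded host and give it all $NP slots, written in the
# "host:slots" machinefile format that Intel MPI's mpirun accepts.
printf '%s\n' \
    "nrm214 0.53" \
    "nrm215 0.14" \
    "nrm212 0.09" \
    "nrm213 0.13" |
  sort -g -k2 | head -n 1 |
  awk -v np="$NP" '{ printf "%s:%d\n", $1, np }' > "$hostfile"

cat "$hostfile"   # prints "nrm212:2"

# The real wrapper would then spawn the job, along the lines of:
# mpirun -launcher ssh -machinefile "$hostfile" -np "$NP" <code>
```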


I hope this clears things up a bit.


Best
Marcus

On 27.04.2021 at 17:48, Prentice Bisbal wrote:
But won't that first process be able to use 100% of a core? What if enough 
users do this such that every core is at 100% utilization? Or, what if the 
application is MPI + OpenMP? In that case, that one process on the login node 
could spawn multiple threads that use the remaining cores on the login node.

Prentice

On 4/26/21 2:01 AM, Marcus Wagner wrote:
Hi,

we also have a wrapper script, together with a number of "MPI backends".
If mpiexec is called on a login node, only the first process is started on
the login node; the rest run on the MPI backends.

Best
Marcus

On 25.04.2021 at 09:46, Patrick Begou wrote:
Hi,

I have also seen a cluster setup where the mpirun and mpiexec commands were
replaced by a shell script that just says "please use srun or sbatch...".

Patrick

On 24/04/2021 at 10:03, Ole Holm Nielsen wrote:
On 24-04-2021 04:37, Cristóbal Navarro wrote:
Hi Community,
I have a set of users still not so familiar with slurm, and yesterday
they bypassed srun/sbatch and just ran their CPU program directly on
the head/login node thinking it would still run on the compute node.
I am aware that I will need to teach them some basic usage, but in
the meantime, how have you solved this type of user-behavior
problem? Is there a preferred way to restrict the master/login
resources, or actions, to the regular users?

We restrict user limits in /etc/security/limits.conf so users can't
run very long or very big tasks on the login nodes:

# Normal user limits
*               hard    cpu             20
*               hard    rss             50000000
*               hard    data            50000000
*               soft    stack           40000000
*               hard    stack           50000000
*               hard    nproc           250

/Ole






--
Dipl.-Inf. Marcus Wagner

IT Center
Group: Systemgruppe Linux
Department: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

Social media channels of the IT Center:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ
