Hi Prentice,

you are right, so I looked into the wrapper script (it is not my part of the setup; I have never worked on it myself). The MPI processes are in fact spawned on the backend nodes; the only process remaining on the login/frontend node is the spawner process.
The wrapper checks the load on the nodes and then creates a corresponding hostfile:

    Host nrm214: current load 0.53 => 96 slots left
    Host nrm215: current load 0.14 => 96 slots left
    Host nrm212: current load 0.09 => 96 slots left
    Host nrm213: current load 0.13 => 96 slots left
    Used hosts:
      nrm214 0   (current load is: 0.53)
      nrm215 0   (current load is: 0.14)
      nrm212 2.0 (current load is: 0.09)
      nrm213 0   (current load is: 0.13)
    Writing to /tmp/mw445520/login_60004/hostfile-613910
    Contents:
    nrm212:2

It then spawns the job:

    Command: /opt/intel/impi/2018.4.274/compilers_and_libraries/linux/mpi/bin64/mpirun -launcher ssh -machinefile /tmp/mw445520/login_60004/hostfile-63375 -np 2 <code>

I hope to have cleared things up a little bit.

Best
Marcus
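For illustration, here is a minimal sketch of what such a load-check-and-spawn wrapper can look like. The host names, the 96-slot count, and the mpirun path are taken from the output above; the load query via ssh, the temp-file handling, and the single-host selection are simplified assumptions of mine, not our actual script:

    #!/bin/bash
    # Minimal sketch of a load-aware mpirun wrapper -- NOT the actual
    # site script; load query and hostfile handling are simplified.
    BACKENDS="nrm212 nrm213 nrm214 nrm215"
    SLOTS_PER_HOST=96
    NP=$1; shift                     # rank count, then the user's command

    HOSTFILE=$(mktemp /tmp/hostfile-XXXXXX)

    # Pick the backend with the lowest 1-minute load average.
    best=""; best_load=999
    for h in $BACKENDS; do
        load=$(ssh "$h" "cut -d' ' -f1 /proc/loadavg")
        echo "Host $h: current load $load => $SLOTS_PER_HOST slots left"
        if awk -v a="$load" -v b="$best_load" 'BEGIN { exit !(a < b) }'; then
            best=$h; best_load=$load
        fi
    done

    echo "$best:$NP" > "$HOSTFILE"
    echo "Writing to $HOSTFILE; contents: $(cat "$HOSTFILE")"

    # Spawn the job on the chosen backend; only this wrapper (the
    # spawner process) remains on the login node.
    exec /opt/intel/impi/2018.4.274/compilers_and_libraries/linux/mpi/bin64/mpirun \
        -launcher ssh -machinefile "$HOSTFILE" -np "$NP" "$@"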
On 27.04.2021 at 17:48, Prentice Bisbal wrote:

But won't that first process be able to use 100% of a core? What if enough users do this such that every core is at 100% utilization? Or, what if the application is MPI + OpenMP? In that case, that one process on the login node could spawn multiple threads that use the remaining cores on the login node.

Prentice

On 4/26/21 2:01 AM, Marcus Wagner wrote:

Hi,

we also have a wrapper script, together with a number of "MPI backends". If mpiexec is called on the login nodes, only the first process is started on the login node; the rest runs on the MPI backends.

Best
Marcus

On 25.04.2021 at 09:46, Patrick Begou wrote:

Hi,

I also saw a cluster setup where the mpirun and mpiexec commands were replaced by a shell script just saying "please use srun or sbatch..." (see the sketch after this quoted thread).

Patrick

On 24/04/2021 at 10:03, Ole Holm Nielsen wrote:

On 24-04-2021 04:37, Cristóbal Navarro wrote:

Hi Community,
I have a set of users still not so familiar with Slurm, and yesterday they bypassed srun/sbatch and just ran their CPU program directly on the head/login node, thinking it would still run on a compute node. I am aware that I will need to teach them some basic usage, but in the meanwhile, how have you solved this type of user behavior? Is there a preferred way to restrict the master/login resources, or actions, to the regular users?

We restrict user limits in /etc/security/limits.conf so users can't run very long or very big tasks on the login nodes:

    # Normal user limits
    * hard cpu    20
    * hard rss    50000000
    * hard data   50000000
    * soft stack  40000000
    * hard stack  50000000
    * hard nproc  250

/Ole
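As for the replacement approach Patrick describes, the stub can be as simple as a short script installed ahead of the real binary in the login nodes' PATH. A minimal sketch; the path and the exact wording are my own assumptions, not taken from any of the clusters mentioned:

    #!/bin/sh
    # Hypothetical /usr/local/bin/mpirun stub for login nodes, in the
    # spirit of the setup Patrick describes -- not an actual site script.
    echo "mpirun is disabled on the login nodes." >&2
    echo "Please submit your job with sbatch, or launch it interactively with srun." >&2
    exit 1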
--
Dipl.-Inf. Marcus Wagner

IT Center
Group: Systemgruppe Linux
Department: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

Social media channels of the IT Center:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ