Dear all,

we have found that a regular "squeue -w `hostname`" on our nodes was
responsible. (We need this to clean up processes after a job finishes.)

So, although the answer to this query is only a few lines, slurmctld
sends the whole job list to the squeue process. In our case this
amounts to some 10-15 MB per call. Across 2000 nodes, that quickly adds
up to tens of gigabytes of traffic per polling round.
We also have users who are polling the status of their jobs in a loop on
the login node.

As a first workaround, we now have a wrapper script that simply
executes squeue on the master node via ssh.
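The wrapper can be little more than forwarding all arguments over ssh.
A minimal sketch, assuming a master host named "master" reachable via
passwordless ssh and squeue installed at /usr/bin/squeue (both are
assumptions, not our exact setup):

```shell
#!/bin/sh
# Sketch of a squeue wrapper for the compute nodes: instead of querying
# slurmctld over the network, forward the call to the master node,
# where squeue talks to slurmctld locally.
# "master" and /usr/bin/squeue are assumed names; adjust for the site.
squeue_remote() {
    host=${SQUEUE_HOST:-master}
    # BatchMode avoids hanging on a password prompt inside job scripts
    ssh -o BatchMode=yes "$host" /usr/bin/squeue "$@"
}

# In the installed wrapper the last line would simply be:
#   squeue_remote "$@"
```

This keeps the RPC traffic between squeue and slurmctld on the master
node itself, so only the (small) formatted output crosses the network.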

> Are you utilizing message aggregation?

This is a good idea (I had assumed it would be used automatically). I
have configured it now and will check the network traffic.
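For reference, message aggregation is switched on in slurm.conf via
MsgAggregationParams; the window values below are placeholders to be
tuned per site, not a recommendation:

```
# slurm.conf -- hold up to 24 messages or 200 ms before forwarding
# (example values only; tune WindowMsgs/WindowTime for the site)
MsgAggregationParams=WindowMsgs=24,WindowTime=200
```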

Thank you,

Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640
