Dear all,

We have found that a regular "squeue -w `hostname`" on our nodes was
responsible. (We need this to clean up processes after a job.)

So, although the answer to that query is only a few lines, slurmctld
sends the whole job list to the squeue process. In our case this
amounts to some 10-15 MB per call. Doing this from 2000 nodes creates
considerable traffic.
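A rough back-of-the-envelope estimate of the load, using the figures above (the polling interval is an assumption on my part, not something measured):

```python
# Estimate of squeue-induced traffic from the numbers above.
reply_mb = 12.5      # ~10-15 MB job list per squeue reply
nodes = 2000         # compute nodes, each running squeue
interval_s = 60      # hypothetical polling interval per node

mb_per_round = reply_mb * nodes          # data per polling round
mb_per_second = mb_per_round / interval_s

print(f"{mb_per_round:.0f} MB per round, ~{mb_per_second:.0f} MB/s sustained")
```

With a one-minute interval that is already on the order of hundreds of MB/s leaving slurmctld, before any user polling on the login nodes is counted.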
We also have users who are polling the status of their jobs in a loop on
the login node.

As a first workaround, we now use a wrapper script that simply
executes squeue on the master node via ssh.
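The wrapper is trivial; a sketch of what such a script might look like ("master" is a placeholder hostname, and the wrapper path is illustrative):

```shell
# Hypothetical squeue wrapper installed on the compute nodes:
# forward the query to the master node over ssh instead of having
# every node talk to slurmctld directly.
cat > /tmp/squeue-wrapper <<'EOF'
#!/bin/sh
exec ssh master /usr/bin/squeue "$@"
EOF
chmod +x /tmp/squeue-wrapper
```

This keeps the per-node epilog cleanup working while the RPC cost is paid only once, on the master node itself.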

> Are you utilizing message aggregation?
> http://slurm.schedmd.com/slurm.conf.html#OPT_MsgAggregationParams

This is a good idea (I had assumed it would be enabled automatically).
I have configured it now and will check the network traffic.
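For reference, enabling it is a one-line slurm.conf change; the window values below are purely illustrative, not recommendations:

```
# slurm.conf (illustrative values; tune per site)
MsgAggregationParams=WindowMsgs=24,WindowTime=200
```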

Thank you,
Ulf

-- 
___________________________________________________________________
Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640      WWW:  http://www.tu-dresden.de/zih
