I would also check that the addresses configured for the nodes in slurm.conf are correct (e.g. that NodeName and NodeAddr match for each node).
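
As a rough aid for that check, here is a minimal Python sketch that lists the NodeName/NodeAddr pairs found in slurm.conf text so they can be reviewed by eye. The function name node_entries and the sample addresses are made up for illustration. It assumes each node definition fits on one line and does not expand hostlist expressions such as taurusi[3001-3180].

```python
def node_entries(conf_text):
    """Return (NodeName, NodeAddr) pairs found in slurm.conf text.

    Assumption: each node definition sits on one line, without
    backslash continuations. Hostlist expressions are returned
    unexpanded.
    """
    entries = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line.lower().startswith("nodename="):
            continue
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value   # keywords are case-insensitive
        name = fields["nodename"]
        if name.upper() == "DEFAULT":         # defaults line, not a real node
            continue
        # NodeAddr falls back to NodeHostname, which falls back to NodeName.
        addr = fields.get("nodeaddr", fields.get("nodehostname", name))
        entries.append((name, addr))
    return entries


# Hypothetical sample; the addresses are invented for illustration.
sample = """
NodeName=taurusi3033 NodeAddr=10.0.0.33 CPUs=16
NodeName=taurusi3034 CPUs=16
"""
for name, addr in node_entries(sample):
    print(name, addr)
```

In practice you would pass in the contents of your real slurm.conf and compare each reported address against what DNS or /etc/hosts gives for that node.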

Quoting Danny Auble <[email protected]>:
Ulf, I would verify that slurm.conf is identical on every node.
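
One way to make that comparison concrete is to hash each node's copy of the file and group nodes by digest. The sketch below is only an illustration: the function name group_by_digest is made up, and it assumes you have already collected the file contents per node by whatever means your site uses.

```python
import hashlib
from collections import defaultdict


def group_by_digest(configs):
    """Group node names by the SHA-256 digest of their slurm.conf.

    configs: dict mapping node name -> file contents as bytes
    (how the contents are gathered is left to the caller).
    """
    groups = defaultdict(list)
    for node, content in configs.items():
        groups[hashlib.sha256(content).hexdigest()].append(node)
    return dict(groups)


# Hypothetical contents, for illustration only.
configs = {
    "taurusi3033": b"ClusterName=taurus\n",
    "taurusi3084": b"ClusterName=taurus\n",
    "taurusi3137": b"ClusterName=taurus \n",  # differs by a trailing space
}
for digest, nodes in group_by_digest(configs).items():
    print(digest[:12], sorted(nodes))
```

More than one group in the result means at least one node has a diverging copy.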

On February 4, 2015 3:41:35 AM PST, Ulf Markwardt <[email protected]> wrote:
Dear all,

we see messages like this:

grep "wrong node" /var/log/slurm/slurmctld.log

[2015-02-04T04:27:05.591] error: Registered job 11923579.0 on wrong node taurusi3033
[2015-02-04T04:27:05.591] error: Registered job 11900205.4294967294 on wrong node taurusi3033
[2015-02-04T04:27:05.591] error: Registered job 11925038.0 on wrong node taurusi3033
[2015-02-04T08:59:23.360] error: Registered job 11923729.0 on wrong node taurusi3019
[2015-02-04T09:23:23.143] error: Registered job 11923729.0 on wrong node taurusi3107
[2015-02-04T11:01:58.993] error: Batch completion for job 11923075 sent from wrong node (taurusi3178 rather than taurusi3084), ignored request
[2015-02-04T11:28:31.198] error: Batch completion for job 11925657 sent from wrong node (taurusi3137 rather than taurusi1235), ignored request
[2015-02-04T12:17:06.055] error: Registered job 11925657.0 on wrong node taurusi3137

What can possibly have gone wrong here? I have no clue!
(Slurm 14.11.03)

Thank you
Ulf

--
___________________________________________________________________
Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640      WWW:  http://www.tu-dresden.de/zih


--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support
