Ulf, I would verify that slurm.conf is the same on every node.
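
One way to check this, as a rough sketch rather than a tested recipe: collect a checksum of slurm.conf from each node and list the nodes that disagree with the majority. The helper name find_odd_nodes is made up for illustration. The path /etc/slurm/slurm.conf and passwordless ssh are assumptions about your setup, so adjust them as needed.

```shell
#!/bin/sh
# Hypothetical helper: reads "<checksum> <node>" lines on stdin and prints
# every node whose checksum differs from the most common checksum.
find_odd_nodes() {
    awk '
        { sum[$2] = $1; count[$1]++ }
        END {
            best = ""; max = 0
            # Pick the checksum shared by the largest number of nodes.
            for (s in count) if (count[s] > max) { max = count[s]; best = s }
            # Report every node that does not carry that checksum.
            for (n in sum) if (sum[n] != best) print n
        }
    ' | sort
}

# Example collection loop (commented out because it needs cluster access).
# Node names are taken from the quoted log; the config path is an assumption.
# for n in taurusi3033 taurusi3019 taurusi3107; do
#     printf '%s %s\n' \
#         "$(ssh "$n" md5sum /etc/slurm/slurm.conf | cut -d" " -f1)" "$n"
# done | find_odd_nodes
```

An empty result means all sampled nodes carry the same file. If two checksums are equally common the "majority" is ambiguous, so in that case look at the raw checksum list instead.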

On February 4, 2015 3:41:35 AM PST, Ulf Markwardt <[email protected]> wrote:
>Dear all,
>
>we see messages like this:
>
> grep "wrong node" /var/log/slurm/slurmctld.log
>
>[2015-02-04T04:27:05.591] error: Registered job 11923579.0 on wrong
>node taurusi3033
>[2015-02-04T04:27:05.591] error: Registered job 11900205.4294967294 on
>wrong node taurusi3033
>[2015-02-04T04:27:05.591] error: Registered job 11925038.0 on wrong
>node taurusi3033
>[2015-02-04T08:59:23.360] error: Registered job 11923729.0 on wrong
>node taurusi3019
>[2015-02-04T09:23:23.143] error: Registered job 11923729.0 on wrong
>node taurusi3107
>[2015-02-04T11:01:58.993] error: Batch completion for job 11923075 sent
>from wrong node (taurusi3178 rather than taurusi3084), ignored request
>[2015-02-04T11:28:31.198] error: Batch completion for job 11925657 sent
>from wrong node (taurusi3137 rather than taurusi1235), ignored request
>[2015-02-04T12:17:06.055] error: Registered job 11925657.0 on wrong
>node taurusi3137
>
>What can possibly have gone wrong here? I have no clue!
>(Slurm 14.11.03)
>
>Thank you
>Ulf
>
>--
>___________________________________________________________________
>Dr. Ulf Markwardt
>
>Technische Universität Dresden
>Center for Information Services and High Performance Computing (ZIH)
>01062 Dresden, Germany
>
>Phone: (+49) 351/463-33640 WWW: http://www.tu-dresden.de/zih
