Hi!

I have configured slurm-2.6.5 with two nodes, and it works fine with serial
jobs.
But when I submit MPI jobs, my controller node always ends up in the
IDLE+COMPLETING state,
so every time I have to reset it manually:
sudo scontrol update nodename=node1 state=down reason=hung_proc
sudo scontrol update nodename=node1 state=resume
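
To save typing, the two commands above could be wrapped in a small shell
function. This is only a sketch of the workaround, not a fix: reset_node and
DRY_RUN are names I made up for illustration, and only the scontrol calls
themselves are the ones I actually run.

```shell
#!/bin/sh
# Sketch: reset a node stuck in COMPLETING by marking it DOWN, then RESUME.
# reset_node (function name) and DRY_RUN (variable) are hypothetical.
reset_node() {
    node="$1"
    for args in "state=down reason=hung_proc" "state=resume"; do
        # $args is left unquoted on purpose so it splits into separate arguments.
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command instead of executing it.
            echo sudo scontrol update "nodename=$node" $args
        else
            sudo scontrol update "nodename=$node" $args
        fi
    done
}
```

Usage would be `reset_node node1`, or `DRY_RUN=1` set beforehand to only print
the commands.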

I checked my slurmctld.log file for errors, and it shows:

error:  A non superuser 106 tried to complete batch job 23
error:  Security violation, NODE_REGISTER RPC from uid=106

One more thing: for any serial job, the node stays in the COMPLETING state
for the time given in the batch script (--time=00:15:00). For example, I gave
15 minutes, so even after the job has completed, the node still shows the
COMPLETING state until the 15 minutes are up, and only then becomes IDLE. Why?

Could anyone help me?


Thanks
Nagendra
