Hi,

I have configured slurm-2.6.5 with two nodes, and it works fine with serial jobs. But when I submit MPI jobs, my controller node always goes into the IDLE+COMPLETING state, so every time I have to manually run:

    sudo scontrol update nodename=node1 state=down reason=hung_proc
    sudo scontrol update nodename=node1 state=resume
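
For reference, the manual reset above can be wrapped in a small shell function. This is only a sketch of the workaround, not a fix: the function name reset_node and the DRY_RUN variable are made up for illustration and are not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): marks a node down, then resumes it,
# using the same two scontrol commands as the manual workaround.
# Set DRY_RUN=1 to print the commands instead of running them.
reset_node() {
    node="$1"
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # Resume only if marking the node down succeeded.
    $run sudo scontrol update nodename="$node" state=down reason=hung_proc &&
    $run sudo scontrol update nodename="$node" state=resume
}
```

Calling it as `DRY_RUN=1 reset_node node1` prints the two commands without touching the cluster; `reset_node node1` runs them.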
I checked my slurmctld.log file for errors, and it shows:

    error: A non superuser 106 tried to complete batch job 23
    error: Security violation, NODE_REGISTER RPC from uid=106

One more thing: for any serial job, the node stays in the COMPLETING state for the time given in the batch script. For example, with --time=00:15:00 the node still shows COMPLETING for the full 15 minutes even though the job has already finished, and only after that does the state become IDLE. Why?

Could anyone help me?

Thanks,
Nagendra
