Hi,
We've got an RHEL 6.5 system running Slurm 14.11.1. After hundreds of
thousands of test runs, we're trying to clean it up for real users. We
shut down Slurm and removed the StateSaveLocation files and the SlurmdSpoolDir
files, but when we bring it back up and set the partitions to UP, jobs are
held with "resource not available". With slurmctld -vvvvvvv, this is
what we see:
[2015-01-06T21:42:38.293] debug2: found 96 usable nodes from config containing noden[00-95]
[2015-01-06T21:42:38.293] debug3: _pick_best_nodes: job 2 idle_nodes 96 share_nodes 96
[2015-01-06T21:42:38.295] debug3: JobId=2 required nodes not avail
Even a simple "srun -N1 hostname" hits this. We ARE using slurmdbd with
MySQL, but we assumed that would only affect accounting results. Any
guidance on what we should be doing to reset the world?
Andy
--
Andy Riebs
Hewlett-Packard Company
High Performance Computing
+1 404 648 9024
My opinions are not necessarily those of HP