Hi,

We've got an RHEL 6.5 system running Slurm 14.11.1. After hundreds of thousands of test runs, we're trying to clean it up for real users. We shut down Slurm and removed the StateSaveLocation files and the SlurmdSpoolDir files, but when we bring Slurm back up and set the partitions to UP, jobs are held with "resource not available". With slurmctld -vvvvvvv, this is what we see:

[2015-01-06T21:42:38.293] debug2: found 96 usable nodes from config containing noden[00-95]
[2015-01-06T21:42:38.293] debug3: _pick_best_nodes: job 2 idle_nodes 96 share_nodes 96
[2015-01-06T21:42:38.295] debug3: JobId=2 required nodes not avail
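
For anyone sifting through similar slurmctld debug output, here is a minimal Python sketch that pulls out only the entries mentioning one job. It assumes every entry starts with a bracketed timestamp like the ones above; the helper names split_entries and entries_for_job are made up for illustration and are not part of Slurm.

```python
import re

# Assumption: each log entry begins with a bracketed timestamp such as
# [2015-01-06T21:42:38.293], as in the excerpt above.
STAMP = r"\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+\]"
ENTRY = re.compile(r"(" + STAMP + r") (\w+): (.*)", re.S)


def split_entries(text):
    """Split raw log text into entries, even when two entries share a line."""
    pieces = re.split(r"\s*(?=" + STAMP + r")", text)
    return [p.strip() for p in pieces if p.strip()]


def entries_for_job(text, job_id):
    """Return (level, message) pairs whose message mentions the given job."""
    # Covers the two spellings seen in the excerpt: "job 2" and "JobId=2".
    wanted = re.compile(r"\b(?:job |JobId=)" + str(job_id) + r"\b")
    hits = []
    for entry in split_entries(text):
        m = ENTRY.match(entry)
        if m and wanted.search(m.group(3)):
            hits.append((m.group(2), m.group(3)))
    return hits
```

Calling entries_for_job(log_text, 2) on the excerpt above would keep the two debug3 entries and drop the debug2 entry, which does not name a job.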

Even a simple "srun -N1 hostname" hits this. We ARE using slurmdbd with MySQL, but we are assuming that this would only affect accounting results. Any guidance on what we should be doing to reset the world?

Andy

--
Andy Riebs
Hewlett-Packard Company
High Performance Computing
+1 404 648 9024
My opinions are not necessarily those of HP
