Hi folks,

We are hitting an odd bug with Slurm where a project's jobs won't start
because Slurm seems to think that they have no quota (GrpCPUMins)
available.

Here's the data we have, with all numbers normalised to hours:

  GrpCPUMins  71200
  CPURunMins  58348
  Raw Usage   12967

So that means GrpCPUMins - CPURunMins - RawUsage = -115 hours.

Our user accounting system, which imports usage info from Slurm logs and
makes it available to users via some commands, indicates a total usage
of just 1,471 CPU hours.

This is not as bad as yesterday, when they were about 250 hours over, so
the overrun has shrunk as their running jobs have continued.

This first started on 2.6.1, so we brought forward our upgrade to 2.6.5,
but it's still persisting. One caveat is that the nodes running their
jobs are still running 2.6.1 slurmds; we didn't feel brave enough to
restart those with running jobs on them. ;-)

To me it's reminiscent of this bug:

http://bugs.schedmd.com/show_bug.cgi?id=392

except our problem persists across a slurmctld (and slurmdbd) restart. :-(

Any ideas?

cheers,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
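
P.S. For anyone wanting to re-check the headroom figure quoted above, here is a minimal sketch of the subtraction in Python. It only reproduces the arithmetic from this message and is not a description of how Slurm evaluates the limit internally; the function name `remaining_cpu_hours` is made up for this example, and all three inputs are assumed to be already normalised to hours.

```python
def remaining_cpu_hours(grp_cpu_mins, cpu_run_mins, raw_usage):
    """Headroom left under the group limit (hypothetical helper).

    All arguments are assumed to be in hours, as in the message above.
    A negative result means the project appears to be over its limit.
    """
    return grp_cpu_mins - cpu_run_mins - raw_usage


# Figures quoted in the message.
headroom = remaining_cpu_hours(71200, 58348, 12967)
print(headroom)  # -115
```

A negative value here matches the symptom described: the project looks as though it has no quota left even though the separately reported total usage is far lower.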
