-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

We are hitting an odd bug with Slurm where a project's jobs won't start
because Slurm seems to think that they have no quota (GrpCPUMins)
available.

Here's the data we have with all numbers normalised to hours:

GrpCPUMins 71200
CPURunMins 58348
Raw Usage 12967

So that means GrpCPUMins-CPURunMins-RawUsage = -115 hours.
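
As a quick sanity check of that arithmetic, here is a minimal Python
sketch. The function name is made up for illustration, and the formula
is simply the subtraction above, which may not be exactly how slurmctld
evaluates the limit internally. The second call plugs in the 1,471 hour
figure reported by our own accounting system, on the assumption that it
measures the same thing as Raw Usage.

```python
# All values are in CPU hours, as quoted in this message.
GRP_CPU_MINS = 71200   # GrpCPUMins limit for the project
CPU_RUN_MINS = 58348   # CPURunMins: time still committed to running jobs
RAW_USAGE = 12967      # Raw Usage as reported by Slurm
ACCT_USAGE = 1471      # usage according to our own accounting system


def remaining_hours(limit, committed, used):
    """Hypothetical helper: quota left after running jobs and past usage."""
    return limit - committed - used


# Headroom using Slurm's own Raw Usage figure.
remaining = remaining_hours(GRP_CPU_MINS, CPU_RUN_MINS, RAW_USAGE)
print(remaining)  # -115

# Headroom if the accounting system's figure were used instead.
remaining_acct = remaining_hours(GRP_CPU_MINS, CPU_RUN_MINS, ACCT_USAGE)
print(remaining_acct)  # 11381
```

With Slurm's figure the project is 115 hours over; with the accounting
system's figure it would still have well over 11,000 hours free.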

Our user accounting system, which imports usage info from Slurm logs
and makes it available to users via some commands, indicates a total
usage of just 1,471 CPU hours.

This is not as bad as yesterday, when they were about 250 hours over,
so as their running jobs have continued their usage has gone down.

This first started on 2.6.1 and so we brought forward our upgrade to
2.6.5 but it's still persisting.

One caveat is that the nodes running their running jobs are still
running 2.6.1 slurmd's - we didn't feel brave enough to restart those
with running jobs on them.. ;-)

To me it's reminiscent of this bug:

http://bugs.schedmd.com/show_bug.cgi?id=392

except our problem persists across a slurmctld (and slurmdbd) restart. :-(

Any ideas?

cheers,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlLMlX8ACgkQO2KABBYQAh+s2ACfeC7fkvhmrG2twles8RM4b3cV
kVoAnRVgJzIPgGVAIhDpVb9ZaBO06Ptk
=ONcc
-----END PGP SIGNATURE-----
