Hi Stuart,
Exceedingly late reply! I'm very sorry; I missed your question completely, and only
found it just now while searching for something else. It's probably too late to
address your issue, but hopefully it might help someone else.
Yes, the negative balance in sbank was something we didn't anticipate properly.
It actually doesn't make sense to combine data from two sources together like
that (local usage data which can decay, and historical slurmdbd usage which
always increases).
So we (recently) changed sbank to only use the local non-dbd usage values.
That will give an accurate bank balance, in that the local usage values are what
slurm itself uses to decide if a job has enough TRES CPU balance to start (or if
the job will have to wait until the usage decays or is reset).
https://github.com/paddydoyle/slurm-bank
(sbank can be used as a simple sreport wrapper with the '-s -mm-dd' if you
still want something like the previous sbank behaviour)
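
To illustrate why the old combined calculation could go negative, here is a
minimal sketch in Python with made-up numbers. It is not sbank's actual code,
and it is a simplified model of the decay rather than Slurm's implementation;
the function names, the limit, and the 7-day half-life are all assumptions
chosen for the example.

```python
def decayed_usage(usage, elapsed_days, half_life_days):
    """Usage remaining after exponential half-life decay (simplified model)."""
    return usage * 0.5 ** (elapsed_days / half_life_days)


def balance(limit, usage):
    """Remaining balance for a given limit and usage figure."""
    return limit - usage


# Hypothetical numbers: a limit of 1000 CPU-hours and a 7-day half-life.
limit = 1000.0
half_life_days = 7.0

# 1000 CPU-hours are used on day 0, then another 500 on day 7.
# Local usage: the first chunk has decayed for one half-life by day 7.
local_usage = decayed_usage(1000.0, 7.0, half_life_days) + 500.0

# Historical usage (as a report over the whole period would show it)
# never decays, so it only ever increases.
historical_usage = 1000.0 + 500.0

local_balance = balance(limit, local_usage)
historical_balance = balance(limit, historical_usage)

print(f"balance from local (decayed) usage: {local_balance}")
print(f"balance from historical usage:      {historical_balance}")
```

In this toy example the second batch of jobs was allowed to start because the
decayed local usage had dropped below the limit, yet a balance computed from
the historical total ends up below zero.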
Paddy
On Sun, May 31, 2015 at 06:21:53AM -0700, Stuart Rankin wrote:
>
> Hi Paddy,
>
> Out of curiosity, have you made further modifications to sbank to stop
> balances becoming negative after they reach zero but the association usage
> decays and the actual usage as seen by sreport is allowed to continue to
> increase?
>
> Best regards
>
> Stuart
>
> On 23/04/15 10:24, Paddy Doyle wrote:
> > This works reasonably well for us, but we are investigating other tweaks. We
> > would actually like to emphasise accounting/reporting more than restricting
> > use; we want to encourage as much usage of the systems as possible, and
> > don't want project limits to prevent people from running jobs when the
> > resources are idle.
> > In our current sbank setup, people's `available balance' can still drop to 0
> > when their usage is high enough, and before the half-life decays it, and so
> > with a 0 balance their jobs will wait in the queue until their usage decays
> > enough.
> > We are investigating the expired/normal QoS approach, to see if that will
> > give us a better system usage.
>
> --
> Dr. Stuart Rankin
>
> Senior System Administrator
> High Performance Computing Service
> University of Cambridge
> Email: sj...@cam.ac.uk
> Tel: (+)44 1223 763517
>
--
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/