With our previous scheduling system, we had a submit-time feedback mechanism that could provide feedback to the user, and optionally reject the submission of a job, depending on how severe the problem was.
We had checks for a number of things, but this one is probably the most relevant:

> - check_historical_job_mem_usage
> -- Getting info from the database, and grouping historical jobs by node/processor requests and memory requests, warn if:
> --- There are at least jobcount_min (currently 10) jobs in the history for this profile
> --- one or more of the following are true:
> ---- memory usage < (memory requested * mem_undercommit_factor) (currently 0.8 factor)
> ---- memory usage > (memory requested * mem_overcommit_factor) (currently 1.1 factor)
> ---- memory request < minimum RAM recommendation (currently 100MB)

We found that immediate feedback at submit time seemed to be helpful to our users. We logged the messages, and the number of these instances appeared to decline over time.

Of course, we still haven't re-implemented it since our transition to SLURM, but that's not the point.

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu

On 01/22/2015 03:37 AM, Loris Bennett wrote:
>
> Hi,
>
> We run nodes in mixed mode and thus have issues with users overestimating
> their memory requirements. We send out summaries of memory usage
> regularly via email, but not all users act on this information.
>
> So I was wondering whether anyone has done any work on implementing a
> mechanism for dynamically determining the memory needed based on the
> memory actually used by previous jobs? This seems like it might be
> feasible as a part of a submit-plugin for users who go over a threshold
> of a certain number of submitted jobs within a certain time interval.
>
> Any thoughts on this?
>
> Cheers,
>
> Loris
>
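For anyone wanting to reproduce the check_historical_job_mem_usage logic described above, here is a minimal sketch in Python. It is not our original implementation. The JobRecord type and the in-memory history list are hypothetical stand-ins for the database query. The thresholds are the ones quoted above. Aggregating a profile's historical usage as the mean of each job's memory usage is an assumption on my part; the description doesn't say how usage was combined.

```python
from collections import defaultdict
from dataclasses import dataclass

# Thresholds as quoted in the check description.
JOBCOUNT_MIN = 10
MEM_UNDERCOMMIT_FACTOR = 0.8
MEM_OVERCOMMIT_FACTOR = 1.1
MIN_RAM_MB = 100


@dataclass
class JobRecord:
    """Hypothetical stand-in for one historical job row from the database."""
    nodes: int
    procs: int
    mem_requested_mb: float
    mem_used_mb: float


def group_history(history):
    """Group memory usage by profile: (node, processor, memory) request."""
    profiles = defaultdict(list)
    for job in history:
        key = (job.nodes, job.procs, job.mem_requested_mb)
        profiles[key].append(job.mem_used_mb)
    return profiles


def check_historical_job_mem_usage(history, nodes, procs, mem_requested_mb):
    """Return warning strings for a new request; empty list means no warning."""
    usages = group_history(history).get((nodes, procs, mem_requested_mb), [])
    # Only judge profiles with enough history behind them.
    if len(usages) < JOBCOUNT_MIN:
        return []
    # Assumption: summarize the profile by mean usage across its past jobs.
    mean_used = sum(usages) / len(usages)
    warnings = []
    if mean_used < mem_requested_mb * MEM_UNDERCOMMIT_FACTOR:
        warnings.append(
            f"past jobs with this profile used {mean_used:.0f} MB on average, "
            f"below {MEM_UNDERCOMMIT_FACTOR:.0%} of the {mem_requested_mb:.0f} MB requested"
        )
    if mean_used > mem_requested_mb * MEM_OVERCOMMIT_FACTOR:
        warnings.append(
            f"past jobs with this profile used {mean_used:.0f} MB on average, "
            f"above {MEM_OVERCOMMIT_FACTOR:.0%} of the {mem_requested_mb:.0f} MB requested"
        )
    if mem_requested_mb < MIN_RAM_MB:
        warnings.append(
            f"memory request of {mem_requested_mb:.0f} MB is below the "
            f"{MIN_RAM_MB} MB minimum recommendation"
        )
    return warnings


if __name__ == "__main__":
    # Twelve past jobs that each asked for 4000 MB but used only about 1000 MB.
    history = [JobRecord(1, 8, 4000, 1000) for _ in range(12)]
    for msg in check_historical_job_mem_usage(history, 1, 8, 4000):
        print("WARNING:", msg)
```

Returning a list of messages rather than a pass/fail flag leaves the submit-time hook free to decide whether a given warning is just shown to the user or severe enough to reject the job.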
