To follow up on this - we made the modification to job_mgr.c so that the QOS of a running job can be changed. We have a python daemon that ensures each user gets a fair share of high QOS jobs.
Things seem to run fine. However, the multi-factor priority plugin does not update the priority of a RUNNING job whose QOS has been changed. I think Moe mentioned this would happen. Nevertheless, we worked around it, and things seem to function well enough for our purposes. So thanks again!

mike

On Sun, Jul 24, 2011 at 9:22 AM, Mike Schachter <[email protected]> wrote:
> Thanks everyone for the input! The use case for our lab is
> such that we're ok with having our own copy of the slurm code
> that has this very minor modification. From reading the responses,
> it obviously doesn't make sense for other people's setups.
>
> On top of allowing jobs to have a modifiable QOS, we have
> a python daemon that guarantees each user an equal number
> of "high" QOS jobs, and the preempt/qos plugin turned on
> so as to enforce this. So far it seems to work, but a lot more
> testing is required to make sure.
>
> Thanks again!
>
> mike
>
>
>
> On Fri, Jul 22, 2011 at 5:47 PM, Aaron Knister <[email protected]> wrote:
>> I have to second Phil's point, as it's relevant to where I'd like to go
>> with SLURM on my cluster. Perhaps it could be a configurable parameter
>> in slurm.conf -- something like AllowUserChangeQOS?
>>
>> On Fri, Jul 22, 2011 at 11:09 AM, Eckert, Phil <[email protected]> wrote:
>>>
>>> At our site we have QOSes defined that will allow jobs to
>>> start much sooner than others, but with the penalty of being
>>> preempted when jobs with higher-priority QOSes are submitted.
>>>
>>> If the QOS of a running job is allowed to be modified, users
>>> will be able to game the system. They can submit a job with
>>> the preemptable QOS and then modify it to a non-preemptable
>>> QOS after it starts running. I can also imagine other QOS
>>> definitions that might be abused in this manner.
>>>
>>> I do not think it would be wrong to allow root/admin to modify
>>> it, but it would cause problems (for us anyway) if the owner
>>> of the job were allowed to do this.
>>>
>>> Phil Eckert
>>> LLNL
>>>
>>>
>>> On 7/21/11 9:49 AM, "[email protected]" <[email protected]> wrote:
>>>
>>> > Mike,
>>> >
>>> > Here is some updated information. This patch will make accounting
>>> > inconsistent, since we don't create a new job record saying how long
>>> > the job ran under each QOS. If that capability is important,
>>> > it would require some SLURM development work to change the accounting.
>>> >
>>> > Moe Jette
>>> > SchedMD LLC
>>> >
>>> > Quoting [email protected]:
>>> >
>>> >> We believe that removing that test will be fine, but it would take
>>> >> some work to be certain. Removing the test could break some
>>> >> QOS-related functionality.
>>> >>
>>> >> Quoting Mike Schachter <[email protected]>:
>>> >>
>>> >>> I just posted a question about this yesterday, but it might
>>> >>> be better on a separate thread for archival purposes.
>>> >>>
>>> >>> I want to change the QOS of a job that is in the RUNNING state,
>>> >>> but line 5882 of slurmctld/job_mgr.c (version 2.2.7) is preventing
>>> >>> me from doing so:
>>> >>>
>>> >>>     } else if (job_specs->qos) {
>>> >>>         slurmdb_qos_rec_t qos_rec;
>>> >>>         if (!IS_JOB_PENDING(job_ptr))  // i want to remove this
>>> >>>             error_code = ESLURM_DISABLED;
>>> >>>         else {
>>> >>>
>>> >>> Is there any danger in removing that check and allowing the QOS
>>> >>> to be changed for a running job?
>>> >>>
>>> >>> mike
>>> >>>
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>> --
>> Aaron Knister
>> Systems Administrator
>> Division of Information Technology
>> University of Maryland, Baltimore County
>> [email protected]
>>
