Thanks everyone for the input! For our lab's use case, we're OK with maintaining our own copy of the SLURM code with this very minor modification. From reading the responses, it clearly doesn't make sense for other people's setups.
On top of allowing jobs to have a modifiable QOS, we have a Python daemon that guarantees each user an equal number of "high" QOS jobs, with the preempt/qos plugin turned on to enforce this. So far it seems to work, but a lot more testing is required to make sure.

Thanks again!

 mike

On Fri, Jul 22, 2011 at 5:47 PM, Aaron Knister <[email protected]> wrote:
> I have to second Phil's point as it's relevant to where I'd like to go with
> SLURM on my cluster. Perhaps it could be a configurable parameter in
> slurm.conf-- something like AllowUserChangeQOS?
>
> On Fri, Jul 22, 2011 at 11:09 AM, Eckert, Phil <[email protected]> wrote:
>>
>> At our site we have qos's defined that will allow jobs to
>> start much sooner than others, but with the penalty of being
>> preempted when higher priority qos's jobs are submitted.
>>
>> If the qos of the running job is allowed to be modified, this
>> will allow users to game the system. They can submit the job
>> with the preemptable qos and then modify it to a non-preemptable
>> qos after it starts running. I can also imagine other qos
>> definitions that might be abused in this manner.
>>
>> I do not think it would be wrong to allow root/admin to modify
>> it, but it would cause problems (for us anyway) if the owner
>> of the job were allowed to do this.
>>
>> Phil Eckert
>> LLNL
>>
>>
>> On 7/21/11 9:49 AM, "[email protected]" <[email protected]> wrote:
>>
>> > Mike,
>> >
>> > Here is some updated information. This patch will make accounting
>> > inconsistent since we don't create a new job record that says the job
>> > ran for so much time under each QOS. If that capability is important,
>> > it would require some SLURM development work to change the accounting.
>> >
>> > Moe Jette
>> > SchedMD LLC
>> >
>> > Quoting [email protected]:
>> >
>> >> We believe that removing that test will be fine, but it would take some
>> >> work to be certain. Removing the test could break some QOS-related
>> >> functionality.
>> >>
>> >> Quoting Mike Schachter <[email protected]>:
>> >>
>> >>> I just posted a question yesterday about this, but might
>> >>> be better on a separate thread for archival purposes.
>> >>>
>> >>> I want to change the QOS of a job that is a RUNNING state,
>> >>> but line 5882 of slurmctld/job_mgr.c (version 2.2.7) is preventing
>> >>> me from doing so:
>> >>>
>> >>>     } else if (job_specs->qos) {
>> >>>         slurmdb_qos_rec_t qos_rec;
>> >>>         if (!IS_JOB_PENDING(job_ptr)) // i want to remove this
>> >>>             error_code = ESLURM_DISABLED;
>> >>>         else {
>> >>>
>> >>> Is there any danger in removing that check, and allowing qos to
>> >>> be changed for a running job?
>> >>>
>> >>> mike
>> >>>
>> >
>>
>
> --
> Aaron Knister
> Systems Administrator
> Division of Information Technology
> University of Maryland, Baltimore County
> [email protected]
>
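
For the archive, here is a minimal sketch of the kind of bookkeeping a daemon like the one described at the top of this message might do. It is not our actual daemon: the function names, the two QOS names ("high" and "low"), the per-user quota, and the assumption that every job sits in one of those two QOS levels are all hypothetical. It only plans which jobs should change QOS and builds the matching `scontrol update` argument list; nothing is executed.

```python
from collections import defaultdict


def plan_qos_updates(jobs, quota, high="high", low="low"):
    """Plan QOS changes so each user holds at most `quota` jobs in `high`.

    jobs: iterable of (job_id, user, current_qos) tuples, in the order the
          daemon should favor them (hypothetical input format).
    Returns a list of (job_id, new_qos) pairs for jobs whose QOS must change.
    """
    high_count = defaultdict(int)  # jobs already granted `high`, per user
    updates = []
    for job_id, user, qos in jobs:
        if high_count[user] < quota:
            wanted = high
            high_count[user] += 1
        else:
            wanted = low
        if qos != wanted:
            updates.append((job_id, wanted))
    return updates


def scontrol_command(job_id, qos):
    """Build the argument list for changing one job's QOS via scontrol."""
    return ["scontrol", "update", f"JobId={job_id}", f"QOS={qos}"]


if __name__ == "__main__":
    # Hypothetical queue snapshot: user "a" has three jobs, user "b" has one.
    snapshot = [(1, "a", "low"), (2, "a", "low"), (3, "b", "high"), (4, "a", "high")]
    for job_id, qos in plan_qos_updates(snapshot, quota=1):
        print(" ".join(scontrol_command(job_id, qos)))
```

With the running-job check in job_mgr.c left in place, the `scontrol update` call would be rejected for any job that is not pending, which is why the sketch only makes sense together with the modification discussed in this thread.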
