[slurm-dev] How to allow "cleanup" work to be done when a job is cancelled?
My cluster uses Slurm's preemption feature to allow jobs to run on otherwise idle resources but still allow owners of compute hardware to have on-demand access to their equipment. However, preemption means a cancelled job.

One of our users is asking if there is a method to have his job do cleanup work when it is cancelled/preempted. Right now the user simply has cleanup code at the end of the job script, but of course this won't run if the job dies or is cancelled.

Any idea what I can suggest to this person to have their "cleanup" code always run when the job ends for any reason?

--
Jeff White
HPC Systems Engineer
Information Technology Services - WSU
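One approach that is often suggested for this kind of question (it is not confirmed anywhere in this thread) is to trap the termination signal in the batch script itself. The sketch below assumes that Slurm delivers SIGTERM to the batch shell before SIGKILL when the job is cancelled or preempted, and that the handler finishes within whatever window the site allows (for example KillWait, or a preemption GraceTime). SCRATCH_DIR and the `sleep` workload are hypothetical placeholders.

```shell
#!/bin/sh
#SBATCH --job-name=cleanup-demo

# Hypothetical scratch area; stands in for whatever the job must tidy up.
SCRATCH_DIR="$(mktemp -d)"

cleanup() {
    # Stop the workload if it is still running, then remove temporary files.
    kill "$WORK_PID" 2>/dev/null || true
    rm -rf "$SCRATCH_DIR"
    echo "cleanup done"
}

# Run cleanup whenever the script exits, whether normally or via a signal.
trap cleanup EXIT
# Turn SIGTERM/SIGINT into an ordinary exit so the EXIT trap fires.
trap 'exit 143' TERM
trap 'exit 130' INT

# Placeholder workload. It runs in the background because the shell only
# acts on a trapped signal once the foreground command returns; `wait`
# is interrupted immediately when a trapped signal arrives.
sleep 1 &
WORK_PID=$!
wait "$WORK_PID"
```

Because cleanup hangs off the EXIT trap, the same code runs when the job finishes normally, so the user does not need a second copy at the end of the script. It cannot help if the process receives SIGKILL directly or the node itself fails.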
[slurm-dev] Re: Launching a VMWare Virtual Machine
Thanks John for your response. Unfortunately, setting the -l in the header did not keep the VMWare window open. Do you or anyone else have any other suggestions?

Thanks,
Sean

On Fri, Jun 2, 2017 at 10:59 AM, John Hearns wrote:
> Sean,
> this sounds like the difference between interactive and non-interactive
> shells.
>
> When you log in directly to the node, you have an interactive shell and
> the environment is set up, and /etc/profile.d scripts are sourced.
> Someone will be along in a minute with the correct answer, however try
> submitting with #!/bin/bash -l
>
> On 2 June 2017 at 01:18, Sean M wrote:
>
>> Greetings,
>>
>> I am trying to schedule a VMWare VM to start automatically but once the
>> slurm script is submitted and executed, VMWare launches, its window
>> appears, and closes immediately without launching the VM. When I run VMWare
>> with "nogui", the VM also does not run. For these cases, there are no
>> errors in the VMWare or slurm logs. Also, if I schedule just VMWare to
>> open, it opens but requires human interaction to launch the VM, which is
>> not feasible for my use case.
>>
>> On my base case, I have two machines: my node is running Ubuntu Desktop
>> 17 and the controller Ubuntu Server.
>>
>> I have tried two methods.
>> Method 1: My controller submits a script with the following command:
>> vmrun -T ws start
>>
>> Method 2: My controller executes a bash script on the node. The node's
>> bash script has the following command:
>> vmrun -T ws start
>>
>> Both methods have the same result: the VMWare window appears briefly and
>> then closes. The VM launches perfectly if I execute Method 2's bash script
>> directly on the node; the bash script is owned by the same user and group
>> with root access on the node and controller and has 777 rights. Here is a
>> weird thing, if I change method 1's script (on the same line) to ssh into
>> the node and launch the vmrun command, the VM successfully starts
>> automatically. The ssh solution is not ideal because I will not know in the
>> future which node will get the job. Any suggestions on how to resolve this
>> issue?
>>
>> Thanks!
>> Sean
[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)
Hi Douglas,

Thanks for the insight. That may actually be desirable, and it's good to know that it's supported like that. I'll speak with my supervisors about what policies they want given these new details. I appreciate all of your help.

Jacob Chappell

On Mon, Jun 5, 2017 at 10:20 AM, Douglas Jacobsen wrote:
> I believe you can use fairshare without decaying usage, the fairshares
> will only decline over time is all. This may mean that a user that
> consumes a large portion of their share early may have trouble getting
> priority later.
>
> On Jun 5, 2017 7:10 AM, "Jacob Chappell" wrote:
>
> Hi Douglas,
>
> It'd be nice to have the ability to incorporate recent usage into the
> priority, but it seems like I can't do both that *and* have hard limits
> right? I think hard limits are most important between the two. I should
> just be able to set the FairshareWeight to 0 to ignore that component in
> the priority, but still enforce the limits with the GrpMins parameters
> right?
>
> Thanks,
> Jacob Chappell
>
> On Mon, Jun 5, 2017 at 9:05 AM, Douglas Jacobsen wrote:
>
>> Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris
>> mentioned, based on the decay rate half life. In your scenario however, it
>> seems like not decaying usage would make sense.
>>
>> Are you wanting to consider recent usage when making priority decisions?
>>
>> On Jun 5, 2017 5:53 AM, "Douglas Jacobsen" wrote:
>>
>>> I think you could still set GrpTRESRunMins on an account or association
>>> to set hard quotas.
>>>
>>> On Jun 5, 2017 5:21 AM, "Jacob Chappell" wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> Thank you very much for the details and clarification. It's unfortunate
>>>> that you can't have both fairshare and fixed quotas. I'll pass this
>>>> information along to my supervisors.
>>>>
>>>> Jacob Chappell
>>>>
>>>> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:
>>>>
>>>>> On 03/06/17 07:03, Jacob Chappell wrote:
>>>>>
>>>>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm can
>>>>> > accomplish this for me. If so how?
>>>>>
>>>>> This was how we used to run prior to switching to fair-share.
>>>>>
>>>>> Basically you set:
>>>>>
>>>>> PriorityDecayHalfLife=0
>>>>>
>>>>> which stops the values decaying over time so once they hit their limit
>>>>> that's it.
>>>>>
>>>>> We also set:
>>>>>
>>>>> PriorityUsageResetPeriod=QUARTERLY
>>>>>
>>>>> so that limits would reset on the quarter boundaries. This was because
>>>>> we used to have fixed quarterly allocations for projects.
>>>>>
>>>>> We went to fair-share because of a change of the funding model for us
>>>>> meant previous rules were removed and so we could go to fair-share which
>>>>> meant a massive improvement in utilisation (compute nodes were no longer
>>>>> idle with jobs waiting but unable to run because of being out of quota).
>>>>>
>>>>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>>>>
>>>>> All the best,
>>>>> Chris
>>>>> --
>>>>> Christopher Samuel    Senior Systems Administrator
>>>>> Melbourne Bioinformatics - The University of Melbourne
>>>>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
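As a rough sketch of the fixed-quota setup Chris describes in the quoted message, the relevant slurm.conf lines might look like the fragment below. The two values for PriorityDecayHalfLife and PriorityUsageResetPeriod come straight from his message; the PriorityType and AccountingStorageEnforce lines are assumptions about what a site would also need, and the account name and CPU-minute figure in the sacctmgr comment are hypothetical.

```
# slurm.conf fragment (sketch only; check against the site's existing config)
PriorityType=priority/multifactor     # assumption: multifactor priority plugin in use
PriorityDecayHalfLife=0               # from the thread: recorded usage never decays
PriorityUsageResetPeriod=QUARTERLY    # from the thread: usage resets on quarter boundaries
AccountingStorageEnforce=limits       # assumption: required for Grp* limits to be enforced

# The quota itself is stored in the accounting database, for example
# (hypothetical account name and value):
#   sacctmgr modify account where name=projA set GrpTRESMins=cpu=600000
```

With decay disabled, an account that reaches its GrpTRESMins value should stay blocked until the next reset period, which appears to be the behaviour the original question asks for.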
[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)
I believe you can use fairshare without decaying usage, the fairshares will only decline over time is all. This may mean that a user that consumes a large portion of their share early may have trouble getting priority later.

On Jun 5, 2017 7:10 AM, "Jacob Chappell" wrote:

Hi Douglas,

It'd be nice to have the ability to incorporate recent usage into the priority, but it seems like I can't do both that *and* have hard limits right? I think hard limits are most important between the two. I should just be able to set the FairshareWeight to 0 to ignore that component in the priority, but still enforce the limits with the GrpMins parameters right?

Thanks,
Jacob Chappell

On Mon, Jun 5, 2017 at 9:05 AM, Douglas Jacobsen wrote:

> Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris mentioned,
> based on the decay rate half life. In your scenario however, it seems like
> not decaying usage would make sense.
>
> Are you wanting to consider recent usage when making priority decisions?
>
> On Jun 5, 2017 5:53 AM, "Douglas Jacobsen" wrote:
>
>> I think you could still set GrpTRESRunMins on an account or association
>> to set hard quotas.
>>
>> On Jun 5, 2017 5:21 AM, "Jacob Chappell" wrote:
>>
>>> Hi Chris,
>>>
>>> Thank you very much for the details and clarification. It's unfortunate
>>> that you can't have both fairshare and fixed quotas. I'll pass this
>>> information along to my supervisors.
>>>
>>> Jacob Chappell
>>>
>>> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:
>>>
>>>> On 03/06/17 07:03, Jacob Chappell wrote:
>>>>
>>>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm can
>>>> > accomplish this for me. If so how?
>>>>
>>>> This was how we used to run prior to switching to fair-share.
>>>>
>>>> Basically you set:
>>>>
>>>> PriorityDecayHalfLife=0
>>>>
>>>> which stops the values decaying over time so once they hit their limit
>>>> that's it.
>>>>
>>>> We also set:
>>>>
>>>> PriorityUsageResetPeriod=QUARTERLY
>>>>
>>>> so that limits would reset on the quarter boundaries. This was because
>>>> we used to have fixed quarterly allocations for projects.
>>>>
>>>> We went to fair-share because of a change of the funding model for us
>>>> meant previous rules were removed and so we could go to fair-share which
>>>> meant a massive improvement in utilisation (compute nodes were no longer
>>>> idle with jobs waiting but unable to run because of being out of quota).
>>>>
>>>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>>>
>>>> All the best,
>>>> Chris
>>>> --
>>>> Christopher Samuel    Senior Systems Administrator
>>>> Melbourne Bioinformatics - The University of Melbourne
>>>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)
Hi Douglas,

It'd be nice to have the ability to incorporate recent usage into the priority, but it seems like I can't do both that *and* have hard limits right? I think hard limits are most important between the two. I should just be able to set the FairshareWeight to 0 to ignore that component in the priority, but still enforce the limits with the GrpMins parameters right?

Thanks,
Jacob Chappell

On Mon, Jun 5, 2017 at 9:05 AM, Douglas Jacobsen wrote:

> Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris mentioned,
> based on the decay rate half life. In your scenario however, it seems like
> not decaying usage would make sense.
>
> Are you wanting to consider recent usage when making priority decisions?
>
> On Jun 5, 2017 5:53 AM, "Douglas Jacobsen" wrote:
>
>> I think you could still set GrpTRESRunMins on an account or association
>> to set hard quotas.
>>
>> On Jun 5, 2017 5:21 AM, "Jacob Chappell" wrote:
>>
>>> Hi Chris,
>>>
>>> Thank you very much for the details and clarification. It's unfortunate
>>> that you can't have both fairshare and fixed quotas. I'll pass this
>>> information along to my supervisors.
>>>
>>> Jacob Chappell
>>>
>>> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:
>>>
>>>> On 03/06/17 07:03, Jacob Chappell wrote:
>>>>
>>>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm can
>>>> > accomplish this for me. If so how?
>>>>
>>>> This was how we used to run prior to switching to fair-share.
>>>>
>>>> Basically you set:
>>>>
>>>> PriorityDecayHalfLife=0
>>>>
>>>> which stops the values decaying over time so once they hit their limit
>>>> that's it.
>>>>
>>>> We also set:
>>>>
>>>> PriorityUsageResetPeriod=QUARTERLY
>>>>
>>>> so that limits would reset on the quarter boundaries. This was because
>>>> we used to have fixed quarterly allocations for projects.
>>>>
>>>> We went to fair-share because of a change of the funding model for us
>>>> meant previous rules were removed and so we could go to fair-share which
>>>> meant a massive improvement in utilisation (compute nodes were no longer
>>>> idle with jobs waiting but unable to run because of being out of quota).
>>>>
>>>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>>>
>>>> All the best,
>>>> Chris
>>>> --
>>>> Christopher Samuel    Senior Systems Administrator
>>>> Melbourne Bioinformatics - The University of Melbourne
>>>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)
Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris mentioned, based on the decay rate half life. In your scenario however, it seems like not decaying usage would make sense.

Are you wanting to consider recent usage when making priority decisions?

On Jun 5, 2017 5:53 AM, "Douglas Jacobsen" wrote:

> I think you could still set GrpTRESRunMins on an account or association
> to set hard quotas.
>
> On Jun 5, 2017 5:21 AM, "Jacob Chappell" wrote:
>
>> Hi Chris,
>>
>> Thank you very much for the details and clarification. It's unfortunate
>> that you can't have both fairshare and fixed quotas. I'll pass this
>> information along to my supervisors.
>>
>> Jacob Chappell
>>
>> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel wrote:
>>
>>> On 03/06/17 07:03, Jacob Chappell wrote:
>>>
>>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm can
>>> > accomplish this for me. If so how?
>>>
>>> This was how we used to run prior to switching to fair-share.
>>>
>>> Basically you set:
>>>
>>> PriorityDecayHalfLife=0
>>>
>>> which stops the values decaying over time so once they hit their limit
>>> that's it.
>>>
>>> We also set:
>>>
>>> PriorityUsageResetPeriod=QUARTERLY
>>>
>>> so that limits would reset on the quarter boundaries. This was because
>>> we used to have fixed quarterly allocations for projects.
>>>
>>> We went to fair-share because of a change of the funding model for us
>>> meant previous rules were removed and so we could go to fair-share which
>>> meant a massive improvement in utilisation (compute nodes were no longer
>>> idle with jobs waiting but unable to run because of being out of quota).
>>>
>>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>>
>>> All the best,
>>> Chris
>>> --
>>> Christopher Samuel    Senior Systems Administrator
>>> Melbourne Bioinformatics - The University of Melbourne
>>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)
I think you could still set GrpTRESRunMins on an account or association to set hard quotas.

On Jun 5, 2017 5:21 AM, "Jacob Chappell" wrote:

> Hi Chris,
>
> Thank you very much for the details and clarification. It's unfortunate
> that you can't have both fairshare and fixed quotas. I'll pass this
> information along to my supervisors.
>
> Jacob Chappell
>
> On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel wrote:
>
>> On 03/06/17 07:03, Jacob Chappell wrote:
>>
>> > Sorry, that was a mouthful, but important. Does anyone know if Slurm can
>> > accomplish this for me. If so how?
>>
>> This was how we used to run prior to switching to fair-share.
>>
>> Basically you set:
>>
>> PriorityDecayHalfLife=0
>>
>> which stops the values decaying over time so once they hit their limit
>> that's it.
>>
>> We also set:
>>
>> PriorityUsageResetPeriod=QUARTERLY
>>
>> so that limits would reset on the quarter boundaries. This was because
>> we used to have fixed quarterly allocations for projects.
>>
>> We went to fair-share because of a change of the funding model for us
>> meant previous rules were removed and so we could go to fair-share which
>> meant a massive improvement in utilisation (compute nodes were no longer
>> idle with jobs waiting but unable to run because of being out of quota).
>>
>> NOTE: You can't have both fairshare and hard quotas at the same time.
>>
>> All the best,
>> Chris
>> --
>> Christopher Samuel    Senior Systems Administrator
>> Melbourne Bioinformatics - The University of Melbourne
>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)
Hi Chris,

Thank you very much for the details and clarification. It's unfortunate that you can't have both fairshare and fixed quotas. I'll pass this information along to my supervisors.

Jacob Chappell

On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel wrote:

> On 03/06/17 07:03, Jacob Chappell wrote:
>
> > Sorry, that was a mouthful, but important. Does anyone know if Slurm can
> > accomplish this for me. If so how?
>
> This was how we used to run prior to switching to fair-share.
>
> Basically you set:
>
> PriorityDecayHalfLife=0
>
> which stops the values decaying over time so once they hit their limit
> that's it.
>
> We also set:
>
> PriorityUsageResetPeriod=QUARTERLY
>
> so that limits would reset on the quarter boundaries. This was because
> we used to have fixed quarterly allocations for projects.
>
> We went to fair-share because of a change of the funding model for us
> meant previous rules were removed and so we could go to fair-share which
> meant a massive improvement in utilisation (compute nodes were no longer
> idle with jobs waiting but unable to run because of being out of quota).
>
> NOTE: You can't have both fairshare and hard quotas at the same time.
>
> All the best,
> Chris
> --
> Christopher Samuel    Senior Systems Administrator
> Melbourne Bioinformatics - The University of Melbourne
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545