[slurm-dev] How to allow "cleanup" work to be done when a job is cancelled?

2017-06-05 Thread Jeff White


My cluster uses Slurm's preemption feature to allow jobs to run on 
otherwise idle resources but still allow owners of compute hardware to 
have on-demand access to their equipment.  However, preemption means jobs 
get cancelled.  One of our users is asking whether there is a way to have 
his job do cleanup work when it is cancelled/preempted.  Right now the 
user simply has cleanup code at the end of the job script but of course 
this won't run if the job dies or is cancelled.


Any idea what I can suggest to this person to have their "cleanup" code 
always run when the job ends for any reason?
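
One commonly suggested pattern (a sketch, not from this thread; it assumes
a bash job script and that the site's KillWait/GraceTime window leaves the
trap enough time to run) is to trap the catchable signals Slurm sends at
cancellation or preemption:

```shell
#!/bin/bash
# Optional: ask Slurm to signal the batch shell (B:) 60 seconds
# before the time limit is reached.
#SBATCH --signal=B:USR1@60

SCRATCH=$(mktemp -d)

cleanup() {
    # The user's cleanup commands go here.
    rm -rf "$SCRATCH"
    echo "cleanup ran"
}

# Run cleanup on normal exit and on the catchable signals Slurm sends
# when a job is cancelled or preempted (SIGTERM; SIGKILL follows after
# KillWait/GraceTime seconds and cannot be trapped, so keep cleanup fast).
trap 'cleanup; trap - EXIT; exit 1' TERM USR1
trap cleanup EXIT

# Run the payload in the background and wait for it: bash delivers traps
# while waiting, but not while a foreground child is running.
sleep 2 &   # stand-in for the real work, e.g. srun ./compute
wait $!
```

With this in place, scancel (which delivers SIGTERM before SIGKILL) gives
the script a chance to tidy up before it is killed outright.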


--
Jeff White
HPC Systems Engineer
Information Technology Services - WSU


[slurm-dev] Re: Launching a VMWare Virtual Machine

2017-06-05 Thread Sean M
Thanks, John, for your response. Unfortunately, setting -l in the header
did not keep the VMWare window open. Do you or anyone else have any other
suggestions?

Thanks,
Sean

On Fri, Jun 2, 2017 at 10:59 AM, John Hearns  wrote:

> Sean,
> this sounds like the difference between interactive and non-interactive
> shells.
>
> When you log in directly to the node, you have an interactive shell:
> the environment is set up and the /etc/profile.d scripts are sourced.
> Someone will be along in a minute with the correct answer; meanwhile, try
> submitting with #!/bin/bash -l
>
>
>
>
> On 2 June 2017 at 01:18, Sean M  wrote:
>
>> Greetings,
>>
>> I am trying to schedule a VMWare VM to start automatically, but once the
>> slurm script is submitted and executed, VMWare launches, its window
>> appears, and then it closes immediately without launching the VM. When I
>> run VMWare with "nogui", the VM also does not run. In both cases there are
>> no errors in the VMWare or slurm logs. Also, if I schedule just VMWare to
>> open, it opens but requires human interaction to launch the VM, which is
>> not feasible for my use case.
>>
>> In my base case I have two machines: my node runs Ubuntu Desktop
>> 17 and the controller runs Ubuntu Server.
>>
>> I have tried two methods.
>> Method 1: My controller submits a script with the following command:
>> vmrun -T ws start 
>>
>> Method 2: My controller executes a bash script on the node. The node's
>> bash script has the following command:
>> vmrun -T ws start 
>>
>> Both methods have the same result: the VMWare window appears briefly and
>> then closes. The VM launches perfectly if I execute Method 2's bash script
>> directly on the node; the script is owned by the same user and group
>> (with root access) on both the node and the controller and has 777
>> permissions. Here is the weird part: if I change Method 1's script (on the
>> same line) to ssh into the node and launch the vmrun command, the VM
>> starts automatically as expected. The ssh solution is not ideal because I
>> will not know in the future which node will get the job. Any suggestions
>> on how to resolve this issue?
>>
>> Thanks!
>> Sean
>>
>
>


[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Jacob Chappell
Hi Douglas,

Thanks for the insight. That may actually be desirable, and it's good to
know that it's supported like that. I'll speak with my supervisors about
what policies they want, given these new details.

I appreciate everyone's help.

Jacob Chappell

On Mon, Jun 5, 2017 at 10:20 AM, Douglas Jacobsen 
wrote:

> I believe you can use fairshare without decaying usage; the fairshare
> values will simply decline over time.  This may mean that a user who
> consumes a large portion of their share early may have trouble getting
> priority later.
>
> On Jun 5, 2017 7:10 AM, "Jacob Chappell"  wrote:
>
> Hi Douglas,
>
> It'd be nice to have the ability to incorporate recent usage into the
> priority, but it seems like I can't do both that *and* have hard limits,
> right? I think hard limits are the more important of the two. I should
> just be able to set PriorityWeightFairshare to 0 to ignore that component
> in the priority, but still enforce the limits with the GrpTRESMins
> parameters, right?
>
> Thanks,
> Jacob Chappell
>
> On Mon, Jun 5, 2017 at 9:05 AM, Douglas Jacobsen 
> wrote:
>
>> Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris
>> mentioned, based on the decay half-life.  In your scenario, however, it
>> seems like not decaying usage would make sense.
>>
>> Are you wanting to consider recent usage when making priority decisions?
>>
>> On Jun 5, 2017 5:53 AM, "Douglas Jacobsen"  wrote:
>>
>>> I think you could still set GrpTRESRunMins on an account or association
>>> to set hard quotas.
>>>
>>> On Jun 5, 2017 5:21 AM, "Jacob Chappell"  wrote:
>>>
 Hi Chris,

 Thank you very much for the details and clarification. It's unfortunate
 that you can't have both fairshare and fixed quotas. I'll pass this
 information along to my supervisors.

 Jacob Chappell

 On Sun, Jun 4, 2017 at 7:55 PM, Christopher Samuel <
 sam...@unimelb.edu.au> wrote:

>
> On 03/06/17 07:03, Jacob Chappell wrote:
>
> > Sorry, that was a mouthful, but important. Does anyone know if Slurm
> > can accomplish this for me? If so, how?
>
> This was how we used to run prior to switching to fair-share.
>
> Basically you set:
>
> PriorityDecayHalfLife=0
>
> which stops the values decaying over time so once they hit their limit
> that's it.
>
> We also set:
>
> PriorityUsageResetPeriod=QUARTERLY
>
> so that limits would reset on the quarter boundaries.  This was because
> we used to have fixed quarterly allocations for projects.
>
> We went to fair-share because a change in our funding model removed the
> previous rules, and the switch brought a massive improvement in
> utilisation (compute nodes were no longer idle with jobs waiting,
> unable to run because they were out of quota).
>
> NOTE: You can't have both fairshare and hard quotas at the same time.
>
> All the best,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  Melbourne Bioinformatics - The University of Melbourne
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>


>
>
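
For reference, the no-decay hard-cap recipe discussed above amounts to a
few slurm.conf settings plus a per-account limit. This is a sketch only:
the account name and values are hypothetical, and the fairshare weight
parameter is spelled PriorityWeightFairshare in slurm.conf.

```
# slurm.conf -- usage never decays, so accrued TRES-minutes act as a
# hard cap until the quarterly reset:
#   PriorityDecayHalfLife=0
#   PriorityUsageResetPeriod=QUARTERLY
#   PriorityWeightFairshare=0   # keep the limits, ignore fairshare priority

# Impose the cap on an account (name and value are hypothetical):
sacctmgr modify account research set GrpTRESMins=cpu=1000000

# Or limit the TRES-minutes of concurrently running jobs instead:
sacctmgr modify account research set GrpTRESRunMins=cpu=100000
```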


[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
I believe you can use fairshare without decaying usage; the fairshare
values will simply decline over time.  This may mean that a user who
consumes a large portion of their share early may have trouble getting
priority later.




[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Jacob Chappell
Hi Douglas,

It'd be nice to have the ability to incorporate recent usage into the
priority, but it seems like I can't do both that *and* have hard limits,
right? I think hard limits are the more important of the two. I should
just be able to set PriorityWeightFairshare to 0 to ignore that component
in the priority, but still enforce the limits with the GrpTRESMins
parameters, right?

Thanks,
Jacob Chappell



[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris mentioned,
based on the decay half-life.  In your scenario, however, it seems like
not decaying usage would make sense.

Are you wanting to consider recent usage when making priority decisions?



[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
I think you could still set GrpTRESRunMins on an account or association to
set hard quotas.



[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Jacob Chappell
Hi Chris,

Thank you very much for the details and clarification. It's unfortunate
that you can't have both fairshare and fixed quotas. I'll pass this
information along to my supervisors.

Jacob Chappell
