Re: [slurm-users] wckey specification error

2018-05-01 Thread Mahmood Naderan
Thanks Trevor for pointing out that there is an option for this in
slurm.conf. Although I had previously grepped for *wc* and found
nothing, the correct name is TrackWCKey, which was set to "yes" by
default. After setting it to "no", the error disappeared.
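
For anyone who hits the same error, the change is roughly this (a minimal
sketch; note that TrackWCKey also exists as a slurmdbd.conf parameter, and
the daemons need to pick the change up, e.g. via "scontrol reconfigure" or a
restart):

[slurm.conf]
# stop tracking/requiring wckeys on job submission
TrackWCKey=no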

About the comments on Rocks and the Slurm roll... in my experience,
Rocks 7 is very good and the unofficial Slurm roll provided by Werner
is also very good. They are worth a try. Although I had some
experience with a manual Slurm installation on an Ubuntu cluster some
years ago, the automatic installation via the roll was very nice
indeed! All the commands and configurations can be extracted from the
roll, so there is nothing obscure about it. The few issues that are
specific to the roll, e.g. installation, go directly to Werner; most
of the other questions relate to Slurm itself, for example accounting
and similar things.


Regards,
Mahmood





On Tue, May 1, 2018 at 9:35 PM, Cooper, Trevor  wrote:
>
>> On May 1, 2018, at 2:58 AM, John Hearns  wrote:
>>
>> Rocks 7 is now available, which is based on CentOS 7.4
>> I hate to be uncharitable, but I am not a fan of Rocks. I speak from 
>> experience, having installed my share of Rocks clusters.
>> The philosophy just does not fit in with the way I look at the world.
>>
>> Anyway, to install extra software on Rocks you need a 'Roll'. Mahmood, it looks
>> like you are using this Roll:
>> https://sourceforge.net/projects/slurm-roll/
>> It seems pretty modern as it installs Slurm 17.11.3
>>
>>
>> On 1 May 2018 at 11:40, Chris Samuel  wrote:
>> On Tuesday, 1 May 2018 2:45:21 PM AEST Mahmood Naderan wrote:
>>
>> > The wckey explanation in the manual [1] is not meaningful at the
>> > moment. Can someone explain that?
>>
>> I've never used it, but it sounds like you've configured your system to 
>> require
>> it (or perhaps Rocks has done that?).
>>
>> https://slurm.schedmd.com/wckey.html
>>
>> Good luck,
>> Chris
>> --
>>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>>
>>
>
> The slurm-roll hosted on sourceforge is developed and supported by Werner
> Saar, not by the developers of Rocks and/or other Rocks application rolls
> (e.g. SDSC).
>
> There is ample documentation on sourceforge[1] on how to configure your Rocks 
> cluster to properly deploy the slurm-roll components and update your Slurm 
> configuration.
>
> There is also an active discussion group for the slurm-roll on sourceforge[2] 
> where Werner supports users of the slurm-roll for Rocks.
>
> While we don't use Werner's slurm-roll on our Rocks/Slurm-based systems I
> have installed it on a test system and can say that it works as
> expected/documented.
>
> In the default configuration WCKeys were NOT enabled, so this is something
> that you must have added to your Slurm configuration.
>
> If you don't need the WCKeys capability of Slurm perhaps you could simply 
> disable it in your Slurm configuration.
>
> Hope this helps,
> Trevor
>
> [1] - 
> https://sourceforge.net/projects/slurm-roll/files/release-7.0.0-17.11.05/slurm-roll.pdf
> [2] - https://sourceforge.net/p/slurm-roll/discussion/
>
> --
> Trevor Cooper
> HPC Systems Programmer
> San Diego Supercomputer Center, UCSD
> 9500 Gilman Drive, 0505
> La Jolla, CA 92093-0505
>



Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread Christopher Samuel

On 02/05/18 10:15, R. Paul Wiegand wrote:

> Yes, I am sure they are all the same.  Typically, I just scontrol
> reconfig; however, I have also tried restarting all daemons.


Understood. Any diagnostics in the slurmd logs when trying to start
a GPU job on the node?


> We are moving to 7.4 in a few weeks during our downtime.  We had a
> QDR -> OFED version constraint -> Lustre client version constraint
> issue that delayed our upgrade.


I feel your pain.  BTW, RHEL 7.5 is out now, so you'll need that if
you want current security fixes.


> Should I just wait and test after the upgrade?


Well, 17.11.6 will be out by then, and that will include a fix for a
deadlock that some sites hit occasionally, so that will be worth throwing
into the mix too.  Do read the RELEASE_NOTES carefully though,
especially if you're using slurmdbd!

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Yes, I am sure they are all the same.  Typically, I just scontrol reconfig;
however, I have also tried restarting all daemons.

We are moving to 7.4 in a few weeks during our downtime.  We had a QDR ->
OFED version constraint -> Lustre client version constraint issue that
delayed our upgrade.

Should I just wait and test after the upgrade?

On Tue, May 1, 2018, 19:56 Christopher Samuel  wrote:

> On 02/05/18 09:31, R. Paul Wiegand wrote:
>
> > Slurm 17.11.0 on CentOS 7.1
>
> That's quite old (on both fronts, RHEL 7.1 is from 2015), we started on
> that same Slurm release but didn't do the GPU cgroup stuff until a later
> version (17.11.3 on RHEL 7.4).
>
> I don't see anything in the NEWS file about relevant cgroup changes
> though (there is a cgroup affinity fix but that's unrelated).
>
> You do have identical slurm.conf, cgroup.conf,
> cgroup_allowed_devices_file.conf etc on all the compute nodes too?
> Slurmd and slurmctld have both been restarted since they were
> configured?
>
> All the best,
> Chris
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
>


Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread Christopher Samuel

On 02/05/18 09:31, R. Paul Wiegand wrote:


> Slurm 17.11.0 on CentOS 7.1


That's quite old (on both fronts, RHEL 7.1 is from 2015), we started on
that same Slurm release but didn't do the GPU cgroup stuff until a later
version (17.11.3 on RHEL 7.4).

I don't see anything in the NEWS file about relevant cgroup changes
though (there is a cgroup affinity fix but that's unrelated).

You do have identical slurm.conf, cgroup.conf,
cgroup_allowed_devices_file.conf etc on all the compute nodes too?
Slurmd and slurmctld have both been restarted since they were
configured?
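
A quick way to check is to checksum the files across the nodes; a sketch
only, assuming clush (or pdsh) is available and the configs live in the
usual /etc/slurm directory:

  clush -b -w 'evc[1-10]' 'md5sum /etc/slurm/slurm.conf /etc/slurm/cgroup.conf /etc/slurm/gres.conf /etc/slurm/cgroup_allowed_devices_file.conf'

If the checksums match everywhere (and slurmd has been restarted since the
last change) you can rule that out.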

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread Kevin Manalo
Chris,

Thanks for the correction there, that /dev/nvidia* isn’t needed in 
[cgroup_allowed_devices_file.conf] for constraining GPU devices.

-Kevin

From: slurm-users  on behalf of "R. Paul 
Wiegand" 
Reply-To: "p...@tesseract.org" , Slurm User Community List 

Date: Tuesday, May 1, 2018 at 7:34 PM
To: Slurm User Community List 
Subject: Re: [slurm-users] GPU / cgroup challenges

Slurm 17.11.0 on CentOS 7.1

On Tue, May 1, 2018, 19:26 Christopher Samuel wrote:
On 02/05/18 09:23, R. Paul Wiegand wrote:

> I thought including the /dev/nvidia* would whitelist those devices
> ... which seems to be the opposite of what I want, no?  Or do I
> misunderstand?

No, I think you're right there, we don't have them listed and cgroups
constrains it correctly (nvidia-smi says no devices when you don't
request any GPUs).

Which version of Slurm are you on?

cheers,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Slurm 17.11.0 on CentOS 7.1

On Tue, May 1, 2018, 19:26 Christopher Samuel  wrote:

> On 02/05/18 09:23, R. Paul Wiegand wrote:
>
> > I thought including the /dev/nvidia* would whitelist those devices
> > ... which seems to be the opposite of what I want, no?  Or do I
> > misunderstand?
>
> No, I think you're right there, we don't have them listed and cgroups
> constrains it correctly (nvidia-smi says no devices when you don't
> request any GPUs).
>
> Which version of Slurm are you on?
>
> cheers,
> Chris
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
>


Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Thanks Chris.  I do have ConstrainDevices turned on.  Are the
differences in your cgroup_allowed_devices_file.conf relevant in this case?

On Tue, May 1, 2018, 19:23 Christopher Samuel  wrote:

> On 02/05/18 09:00, Kevin Manalo wrote:
>
> > Also, I recall appending this to the bottom of
> >
> > [cgroup_allowed_devices_file.conf]
> > ..
> > Same as yours
> > ...
> > /dev/nvidia*
> >
> > There was a SLURM bug issue that made this clear, not so much in the
> website docs.
>
> That shouldn't be necessary, all we have for this is..
>
> The relevant line from our cgroup.conf:
>
> [...]
> # Constrain devices via cgroups (to limit access to GPUs etc)
> ConstrainDevices=yes
> [...]
>
> Our entire cgroup_allowed_devices_file.conf:
>
> /dev/null
> /dev/urandom
> /dev/zero
> /dev/sda*
> /dev/cpu/*/*
> /dev/pts/*
> /dev/ram
> /dev/random
> /dev/hfi*
>
>
> This is on RHEL7.
>
> --
>   Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
>


Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Thanks Kevin!

Indeed, nvidia-smi in an interactive job tells me that I can get access to
the device when I should not be able to.

I thought including the /dev/nvidia* would whitelist those devices ...
which seems to be the opposite of what I want, no?  Or do I misunderstand?

Thanks,
Paul

On Tue, May 1, 2018, 19:00 Kevin Manalo  wrote:

> Paul,
>
> Having recently set this up, this was my test: when you make a single-GPU
> request from inside an interactive run (salloc ... --gres=gpu:1 srun --pty
> bash), you should only see the GPU assigned to you via 'nvidia-smi'.
>
> When gres is unset you should see
>
> nvidia-smi
> No devices were found
>
> Otherwise, if you ask for 1 of 2, you should only see 1 device.
>
> Also, I recall appending this to the bottom of
>
> [cgroup_allowed_devices_file.conf]
> ..
> Same as yours
> ...
> /dev/nvidia*
>
> There was a SLURM bug issue that made this clear, not so much in the
> website docs.
>
> -Kevin
>
>
> On 5/1/18, 5:28 PM, "slurm-users on behalf of R. Paul Wiegand" <
> slurm-users-boun...@lists.schedmd.com on behalf of rpwieg...@gmail.com>
> wrote:
>
> Greetings,
>
> I am setting up our new GPU cluster, and I seem to have a problem
> configuring things so that the devices are properly walled off via
> cgroups.  Our nodes each have two GPUs; however, if --gres is unset, or
> set to --gres=gpu:0, I can access both GPUs from inside a job.
> Moreover, if I ask for just 1 GPU then unset the CUDA_VISIBLE_DEVICES
> environmental variable, I can access both GPUs.  From my
> understanding, this suggests that it is *not* being protected under
> cgroups.
>
> I've read the documentation, and I've read through a number of threads
> where people have resolved similar issues.  I've tried a lot of
> configurations, but to no avail. Below I include some snippets of
> relevant (current) parameters; however, I also am attaching most of
> our full conf files.
>
> [slurm.conf]
> ProctrackType=proctrack/cgroup
> TaskPlugin=task/cgroup
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> JobAcctGatherType=jobacct_gather/linux
> AccountingStorageTRES=gres/gpu
> GresTypes=gpu
>
> NodeName=evc1 CPUs=32 RealMemory=191917 Sockets=2 CoresPerSocket=16
> ThreadsPerCore=1 State=UNKNOWN NodeAddr=ivc1 Weight=1 Gres=gpu:2
>
> [gres.conf]
> NodeName=evc[1-10] Name=gpu File=/dev/nvidia0
> COREs=0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
> NodeName=evc[1-10] Name=gpu File=/dev/nvidia1
> COREs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
>
> [cgroup.conf]
> ConstrainDevices=yes
>
> [cgroup_allowed_devices_file.conf]
> /dev/null
> /dev/urandom
> /dev/zero
> /dev/sda*
> /dev/cpu/*/*
> /dev/pts/*
>
> Thanks,
> Paul.
>
>
>


Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread Christopher Samuel

On 02/05/18 09:00, Kevin Manalo wrote:


> Also, I recall appending this to the bottom of
>
> [cgroup_allowed_devices_file.conf]
> ..
> Same as yours
> ...
> /dev/nvidia*
>
> There was a SLURM bug issue that made this clear, not so much in the
> website docs.


That shouldn't be necessary, all we have for this is..

The relevant line from our cgroup.conf:

[...]
# Constrain devices via cgroups (to limit access to GPUs etc)
ConstrainDevices=yes
[...]

Our entire cgroup_allowed_devices_file.conf:

/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/dev/ram
/dev/random
/dev/hfi*


This is on RHEL7.
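
For what it's worth, you can also see the constraint directly from inside a
job by looking at the device cgroup the step landed in. A rough sketch
(assumes cgroup v1 with the devices controller mounted in the usual place,
so treat the paths as assumptions):

  grep devices /proc/self/cgroup
  cat /sys/fs/cgroup/devices/$(awk -F: '/devices/ {print $3}' /proc/self/cgroup)/devices.list

Entries of the form "c 195:N ..." correspond to /dev/nvidiaN (the NVIDIA
driver typically uses major 195), so with ConstrainDevices=yes you should
only see the card(s) actually granted to the job.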

--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] GPU / cgroup challenges

2018-05-01 Thread Kevin Manalo
Paul, 

Having recently set this up, this was my test: when you make a single-GPU
request from inside an interactive run (salloc ... --gres=gpu:1 srun --pty
bash), you should only see the GPU assigned to you via 'nvidia-smi'.

When gres is unset you should see 

nvidia-smi
No devices were found

Otherwise, if you ask for 1 of 2, you should only see 1 device.
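
End to end, the check looks roughly like this (a sketch; adjust the salloc
options to taste):

  # no GPUs requested: the device cgroup should hide both cards
  salloc srun --pty bash
  nvidia-smi                    # expect: No devices were found

  # one GPU requested: only that card should be visible, even after
  # clearing CUDA_VISIBLE_DEVICES, if the cgroup is doing its job
  salloc --gres=gpu:1 srun --pty bash
  unset CUDA_VISIBLE_DEVICES
  nvidia-smi                    # expect: exactly one GPU listed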

Also, I recall appending this to the bottom of 

[cgroup_allowed_devices_file.conf]
..
Same as yours
...
/dev/nvidia*

There was a SLURM bug issue that made this clear, not so much in the website 
docs.

-Kevin


On 5/1/18, 5:28 PM, "slurm-users on behalf of R. Paul Wiegand" 
 wrote:

Greetings,

I am setting up our new GPU cluster, and I seem to have a problem
configuring things so that the devices are properly walled off via
cgroups.  Our nodes each have two GPUs; however, if --gres is unset, or
set to --gres=gpu:0, I can access both GPUs from inside a job.
Moreover, if I ask for just 1 GPU then unset the CUDA_VISIBLE_DEVICES
environmental variable, I can access both GPUs.  From my
understanding, this suggests that it is *not* being protected under
cgroups.

I've read the documentation, and I've read through a number of threads
where people have resolved similar issues.  I've tried a lot of
configurations, but to no avail. Below I include some snippets of
relevant (current) parameters; however, I also am attaching most of
our full conf files.

[slurm.conf]
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/linux
AccountingStorageTRES=gres/gpu
GresTypes=gpu

NodeName=evc1 CPUs=32 RealMemory=191917 Sockets=2 CoresPerSocket=16
ThreadsPerCore=1 State=UNKNOWN NodeAddr=ivc1 Weight=1 Gres=gpu:2

[gres.conf]
NodeName=evc[1-10] Name=gpu File=/dev/nvidia0
COREs=0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NodeName=evc[1-10] Name=gpu File=/dev/nvidia1
COREs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

[cgroup.conf]
ConstrainDevices=yes

[cgroup_allowed_devices_file.conf]
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*

Thanks,
Paul.




[slurm-users] GPU / cgroup challenges

2018-05-01 Thread R. Paul Wiegand
Greetings,

I am setting up our new GPU cluster, and I seem to have a problem
configuring things so that the devices are properly walled off via
cgroups.  Our nodes each have two GPUs; however, if --gres is unset, or
set to --gres=gpu:0, I can access both GPUs from inside a job.
Moreover, if I ask for just 1 GPU then unset the CUDA_VISIBLE_DEVICES
environmental variable, I can access both GPUs.  From my
understanding, this suggests that it is *not* being protected under
cgroups.

I've read the documentation, and I've read through a number of threads
where people have resolved similar issues.  I've tried a lot of
configurations, but to no avail. Below I include some snippets of
relevant (current) parameters; however, I also am attaching most of
our full conf files.

[slurm.conf]
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/linux
AccountingStorageTRES=gres/gpu
GresTypes=gpu

NodeName=evc1 CPUs=32 RealMemory=191917 Sockets=2 CoresPerSocket=16
ThreadsPerCore=1 State=UNKNOWN NodeAddr=ivc1 Weight=1 Gres=gpu:2

[gres.conf]
NodeName=evc[1-10] Name=gpu File=/dev/nvidia0
COREs=0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NodeName=evc[1-10] Name=gpu File=/dev/nvidia1
COREs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

[cgroup.conf]
ConstrainDevices=yes

[cgroup_allowed_devices_file.conf]
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*

Thanks,
Paul.


cgroup_allowed_devices_file.conf
Description: Binary data


cgroup.conf
Description: Binary data


gres.conf
Description: Binary data


slurm.conf
Description: Binary data


Re: [slurm-users] wckey specification error

2018-05-01 Thread Cooper, Trevor

> On May 1, 2018, at 2:58 AM, John Hearns  wrote:
> 
> Rocks 7 is now available, which is based on CentOS 7.4
> I hate to be uncharitable, but I am not a fan of Rocks. I speak from 
> experience, having installed my share of Rocks clusters.
> The philosophy just does not fit in with the way I look at the world.
> 
> Anyway, to install extra software on Rocks you need a 'Roll'. Mahmood, it looks
> like you are using this Roll:
> https://sourceforge.net/projects/slurm-roll/
> It seems pretty modern as it installs Slurm 17.11.3
> 
> 
> On 1 May 2018 at 11:40, Chris Samuel  wrote:
> On Tuesday, 1 May 2018 2:45:21 PM AEST Mahmood Naderan wrote:
> 
> > The wckey explanation in the manual [1] is not meaningful at the
> > moment. Can someone explain that?
> 
> I've never used it, but it sounds like you've configured your system to 
> require 
> it (or perhaps Rocks has done that?).
> 
> https://slurm.schedmd.com/wckey.html
> 
> Good luck,
> Chris
> -- 
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
> 
> 

The slurm-roll hosted on sourceforge is developed and supported by Werner Saar,
not by the developers of Rocks and/or other Rocks application rolls (e.g. SDSC).

There is ample documentation on sourceforge[1] on how to configure your Rocks 
cluster to properly deploy the slurm-roll components and update your Slurm 
configuration.

There is also an active discussion group for the slurm-roll on sourceforge[2] 
where Werner supports users of the slurm-roll for Rocks.

While we don't use Werner's slurm-roll on our Rocks/Slurm-based systems, I have
installed it on a test system and can say that it works as expected/documented.

In the default configuration WCKeys were NOT enabled, so this is something that
you must have added to your Slurm configuration.

If you don't need the WCKeys capability of Slurm perhaps you could simply 
disable it in your Slurm configuration.

Hope this helps,
Trevor

[1] - 
https://sourceforge.net/projects/slurm-roll/files/release-7.0.0-17.11.05/slurm-roll.pdf
[2] - https://sourceforge.net/p/slurm-roll/discussion/

--
Trevor Cooper
HPC Systems Programmer
San Diego Supercomputer Center, UCSD
9500 Gilman Drive, 0505
La Jolla, CA 92093-0505



Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-05-01 Thread Nate Coraor
Thanks Andy,

I've been able to confirm that in my case, any jobs that ran for at least
30 minutes (puppet's run interval) would lose their cgroups, and that the
time those cgroups disappear corresponds exactly with puppet runs. I am not
sure whether this change of the cgroup to root is what causes the oom event
that Slurm detects - I looked through
src/plugins/task/cgroup/task_cgroup_memory.c and the memory cgroup
documentation, and it's not clear to me what would happen if you've created
the oom event listener on a specific cgroup and that cgroup disappears. But
since I disabled puppet overnight, jobs running longer than 30 minutes are
completing, and cgroups are persisting, whereas before that, they were not.
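
For anyone trying to reproduce this, a rough way to catch the cgroup
vanishing under a running job (a sketch; assumes cgroup v1 and the default
Slurm cgroup hierarchy, and the job id / user below are made up):

  JOBID=12345                      # hypothetical job id
  JOBUID=$(id -u someuser)         # uid of the job's owner (hypothetical)
  CG=/sys/fs/cgroup/memory/slurm/uid_${JOBUID}/job_${JOBID}
  while [ -d "$CG" ]; do sleep 30; done
  date; echo "memory cgroup for job $JOBID is gone"

Comparing that timestamp with the puppet agent's run log should make the
correlation easy to see.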

--nate

On Mon, Apr 30, 2018 at 5:47 PM, Andy Georges  wrote:

>
>
> > On 30 Apr 2018, at 22:37, Nate Coraor  wrote:
> >
> > Hi Shawn,
> >
> > I'm wondering if you're still seeing this. I've recently enabled
> task/cgroup on 17.11.5 running on CentOS 7 and just discovered that jobs
> are escaping their cgroups. For me this is resulting in a lot of jobs
> ending in OUT_OF_MEMORY that shouldn't, because it appears slurmd thinks
> the oom-killer has triggered when it hasn't. I'm not using GRES or devices,
> only:
>
> I am not sure that you are making the correct conclusion here.
>
> There is a known cgroups issue, due to
>
> https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
>
> Relevant part:
>
> The memory controller has a long history. A request for comments for the
> memory
> controller was posted by Balbir Singh [1]. At the time the RFC was posted
> there were several implementations for memory control. The goal of the
> RFC was to build consensus and agreement for the minimal features required
> for memory control. The first RSS controller was posted by Balbir Singh[2]
> in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of
> the
> RSS controller. At OLS, at the resource management BoF, everyone suggested
> that we handle both page cache and RSS together. Another request was raised
> to allow user space handling of OOM. The current memory controller is
> at version 6; it combines both mapped (RSS) and unmapped Page
> Cache Control [11].
>
> Are the jobs killed prematurely? If not, then you ran into the above.
>
> Kind regards.
> — Andy
>


Re: [slurm-users] wckey specification error

2018-05-01 Thread John Hearns
I quickly downloaded that roll and unpacked the RPMs.
I cannot quite see how Slurm is configured, so to my shame I gave up (I did
say that Rocks was not my thing).
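
(In case anyone else wants to poke at it without installing: this is roughly
how to look inside the roll, as a sketch only; the ISO file name is made up
and mounting needs root.)

  mount -o loop slurm-roll-7.0.0-17.11.05.iso /mnt      # mount the roll ISO
  find /mnt -name 'slurm-*.rpm'                         # list the packaged RPMs
  rpm2cpio $(find /mnt -name 'slurm-17*.rpm' | head -1) | cpio -idmv    # unpack one into the cwd
  rpm -qp --scripts $(find /mnt -name 'slurm-17*.rpm' | head -1)        # show its install scripts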

On 1 May 2018 at 11:58, John Hearns  wrote:

> Rocks 7 is now available, which is based on CentOS 7.4
> I hate to be uncharitable, but I am not a fan of Rocks. I speak from
> experience, having installed my share of Rocks clusters.
> The philosophy just does not fit in with the way I look at the world.
>
> Anyway, to install extra software on Rocks you need a 'Roll'. Mahmood,
> it looks like you are using this Roll:
> https://sourceforge.net/projects/slurm-roll/
> It seems pretty modern as it installs Slurm 17.11.3
>
>
>
>
> On 1 May 2018 at 11:40, Chris Samuel  wrote:
>
>> On Tuesday, 1 May 2018 2:45:21 PM AEST Mahmood Naderan wrote:
>>
>> > The wckey explanation in the manual [1] is not meaningful at the
>> > moment. Can someone explain that?
>>
>> I've never used it, but it sounds like you've configured your system to
>> require
>> it (or perhaps Rocks has done that?).
>>
>> https://slurm.schedmd.com/wckey.html
>>
>> Good luck,
>> Chris
>> --
>>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>>
>>
>>
>


Re: [slurm-users] wckey specification error

2018-05-01 Thread Chris Samuel
On Tuesday, 1 May 2018 2:45:21 PM AEST Mahmood Naderan wrote:

> The wckey explanation in the manual [1] is not meaningful at the
> moment. Can someone explain that?

I've never used it, but it sounds like you've configured your system to require 
it (or perhaps Rocks has done that?).

https://slurm.schedmd.com/wckey.html

Good luck,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC