Re: [slurm-users] RHEL8 support - Missing Symbols in SelectType libraries

2019-10-28 Thread Philip Kovacs
> On Monday, October 28, 2019, 03:18:06 PM EDT, Brian Andrus wrote: > I spoke too soon. > While I can successfully build/run slurmctld, slurmd is failing because ALL of > the SelectType libraries are missing symbols. > Example from select_cons_tres.so: > # slurmd > slurmd: error:

Re: [slurm-users] OverMemoryKill Not Working?

2019-10-28 Thread Mike Mosley
Thanks again to the folks who responded to my post. I finally managed to get jobs to be terminated after exceeding their memory allocation. Here is the configuration I used: slurm.conf: EnforcePartLimits=ALL TaskPlugin=task/cgroup JobAcctGatherType=jobacct_gather/cgroup

Re: [slurm-users] RHEL8 support - Missing Symbols in SelectType libraries

2019-10-28 Thread Brian Andrus
I spoke too soon. While I can successfully build/run slurmctld, slurmd is failing because ALL of the SelectType libraries are missing symbols. Example from select_cons_tres.so: # slurmd slurmd: error: plugin_load_from_file: dlopen(/usr/lib64/slurm/select_cons_tres.so):

Re: [slurm-users] tie a reservation to a QoS?

2019-10-28 Thread Kurt H Maier
On Mon, Oct 28, 2019 at 06:40:48PM +, Tina Friedrich wrote: > That's fine and all sounds nice but doesn't precisely help me solve my > problem - which is how to ensure that people can't access both > scheduling priority and a reservation at the same time. I'm not going to > be able to

Re: [slurm-users] tie a reservation to a QoS?

2019-10-28 Thread Tina Friedrich
That's fine and all sounds nice but doesn't precisely help me solve my problem - which is how to ensure that people can't access both scheduling priority and a reservation at the same time. I'm not going to be able to change our terms for co-investment on the grounds of can't figure out how to

Re: [slurm-users] RHEL8 support

2019-10-28 Thread Brian Andrus
Ok, I had been planning on getting around to it, so this prompted me to do so. Yes, I can get slurm 19.05.3 to build (and package) under CentOS 8. There are some caveats, however, since many repositories and packages have changed. The only bits I don't have yet are the nvml, pmix and ucx I

Re: [slurm-users] tie a reservation to a QoS?

2019-10-28 Thread Bill Wichser
One thing we changed years ago was to think about things differently. While researchers are in fact buying nodes for the cluster, it's rarely the case that they get any rights to "their" nodes. Instead they are buying CPU time in an equivalent way but it gets averaged over 30 days. We

[slurm-users] tie a reservation to a QoS?

2019-10-28 Thread Tina Friedrich
Hello, is there a possibility to tie a reservation to a QoS (instead of an account or user), or enforce a QoS for jobs submitted into a reservation? The problem I'm trying to solve is - some of our resources are bought on a co-investment basis. As part of that, the 'owning' group can get very

Re: [slurm-users] Running job is canceled when starting a new job from queue

2019-10-28 Thread Uwe Seher
Hello! I cannot find any hints of oom-kills, but it is systemd, so I may need a little more time searching. We have 128GB of memory on the node and, as far as we know, the tasks do not use it up to the limit; dependencies have also worked fine with the same tasks. Monitoring does not show any problems with memory.

Re: [slurm-users] Running job is canceled when starting a new job from queue

2019-10-28 Thread Lech Nieroda
Hello Uwe, when the requested time limit of a job runs out, the job is cancelled and terminated with signal SIGTERM (15), and later with SIGKILL (9) if that should fail; the job then gets the state "TIMEOUT". However, job 161 gets killed immediately by SIGKILL and gets the state "FAILED". That

[slurm-users] Running job is canceled when starting a new job from queue

2019-10-28 Thread Uwe Seher
Hello group! While running our first jobs I got a strange issue when running multiple jobs on a single partition. The partition is a single node with 32 cores and 128GB memory. There is a queue with three jobs; each should use 15 cores, memory usage is not important. As planned 2 jobs are running,

[slurm-users] Observation on SchedMD issue 6787: Add EndTime, CompletingTime to output of `scontrol completing`

2019-10-28 Thread Kevin Buckley
Hi there, in SchedMD issue 6787 (https://bugs.schedmd.com/show_bug.cgi?id=6787), there was a patch, supplied by Doug Jacobsen, that altered the output of `scontrol completing` to be akin to the following (have cut-and-pasted Chris Samuel's example from the issue ticket) when run from the command

Re: [slurm-users] RHEL8 support

2019-10-28 Thread Benjamin Redling
On 28/10/2019 08.26, Bjørn-Helge Mevik wrote: > Taras Shapovalov writes: > >> Do I understand correctly that Slurm19 is not compatible with rhel8? It is >> not in the list https://slurm.schedmd.com/platforms.html > > It says > > "RedHat Enterprise Linux 7 (RHEL7), CentOS 7, Scientific Linux 7

Re: [slurm-users] RHEL8 support

2019-10-28 Thread Bjørn-Helge Mevik
Taras Shapovalov writes: > Do I understand correctly that Slurm19 is not compatible with rhel8? It is > not in the list https://slurm.schedmd.com/platforms.html It says "RedHat Enterprise Linux 7 (RHEL7), CentOS 7, Scientific Linux 7 (and newer)" Perhaps that includes RHEL8, and CentOS 8, not