[slurm-users] How to config slurm so that I can retrieve all of the history job logs.

2020-05-06 Thread Hongyi Zhao
Hi, I configured Slurm with the following log settings:
werner@ubuntu-01:~$ scontrol show config | grep -i logfile
SlurmctldLogFile  = /var/log/slurm/slurmctld.log
SlurmdLogFile     = /var/log/slurm/slurmd.log
SlurmSchedLogFile = /var/log/slurm/slurmsched.log
But this still can
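
A minimal sketch of the slurm.conf lines behind that scontrol output, plus one way to dig a finished job's records out of those log files (the job id 1234 is purely hypothetical):

    # slurm.conf excerpt matching the paths shown above
    SlurmctldLogFile=/var/log/slurm/slurmctld.log
    SlurmdLogFile=/var/log/slurm/slurmd.log
    SlurmSchedLogFile=/var/log/slurm/slurmsched.log

    # pull one job's entries out of the controller log
    grep 'JobId=1234' /var/log/slurm/slurmctld.log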

Re: [slurm-users] [EXT] Re: Limit the number of GPUS per user per partition

2020-05-06 Thread Sean Crosby
Do you have other limits set? The QoS is hierarchical, and especially a partition QoS can override other QoS. What's the output of sacctmgr show qos -p and scontrol show part? Sean -- Sean Crosby | Senior DevOps HPC Engineer and HPC Team Lead Research Computing Services | Business Services The
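
A short sketch of the diagnostics being asked for, runnable as-is except for the partition name, which is an assumption:

    # all QoS definitions in parseable form, to spot a partition QoS overriding the user QoS
    sacctmgr show qos -p
    # the partition definition, including any QOS= it enforces ("gpu" is a guessed name)
    scontrol show partition gpu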

Re: [slurm-users] Do not upgrade mysql to 5.7.30!

2020-05-06 Thread Marcus Wagner
Yeah, and I found the reason. It seems that (at least for the MySQL procedure get_parent_limits) MySQL 5.7.30 returns NULL where MySQL 5.7.29 returned an empty string. Running MySQL < 5.7.30 is a bad idea, as there exist two remotely exploitable bugs with a CVSS score of 9.8! (see also
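
A tiny illustration of the behaviour change being described, not the actual statement inside get_parent_limits; it only shows how a NULL where an empty string used to be poisons string handling in MySQL:

    mysql -e "SELECT CONCAT('limit=', '') AS with_empty, CONCAT('limit=', NULL) AS with_null;"
    # with_empty stays 'limit=', with_null collapses to NULL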

Re: [slurm-users] Job Step Resource Requests are Ignored

2020-05-06 Thread Maria Semple
That's great! Thanks David! On Wed, May 6, 2020 at 11:35 AM David Braun wrote: > i'm not sure I understand the problem. If you want to make sure the > preamble and postamble run even if the main job doesn't run you can use '-d' > > from the man page > > -d, --dependency= > Defer

Re: [slurm-users] Job Step Resource Requests are Ignored

2020-05-06 Thread David Braun
I'm not sure I understand the problem. If you want to make sure the preamble and postamble run even if the main job doesn't run you can use '-d'. From the man page: -d, --dependency= Defer the start of this job until the specified dependencies have been
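
A hedged sketch of that pattern with three separate submissions, so the preamble and postamble run no matter what happens to the main job (the script names and the afterok/afterany choices are assumptions):

    pre=$(sbatch --parsable preamble.sh)
    main=$(sbatch --parsable --dependency=afterok:$pre main.sh)
    # afterany fires once the main job finishes, whether it succeeded, failed or was cancelled
    sbatch --dependency=afterany:$main postamble.sh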

[slurm-users] Do not upgrade mysql to 5.7.30!

2020-05-06 Thread Dustin Lang
Hi, Ubuntu has made mysql 5.7.30 the default version. At least with Ubuntu 16.04, this causes severe problems with Slurm dbd (v 17.x, 18.x, and 19.x; not sure about 20). Reverting to mysql 5.7.29 seems to make everything work okay again. cheers, --dustin

Re: [slurm-users] Job Step Resource Requests are Ignored

2020-05-06 Thread Maria Semple
Hi Chris, I think my question isn't quite clear, but I'm also pretty confident the answer is no at this point. The idea is that the script is sort of like a template for running a job, and an end user can submit a custom job with their own desired resource requests which will end up filling in

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread Mark Hahn
> Is there no way to set or define a custom variable like at node level and
You could use a per-node Feature for this, but a partition would also work.
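
A minimal sketch of the per-node Feature approach (the feature name "applic" and the node range are made up):

    # slurm.conf: tag the nodes the license is tied to
    NodeName=node[01-04] Feature=applic ...

    # submission: pin the job to nodes carrying that feature
    sbatch --constraint=applic job.sh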

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
Is there no way to set or define a custom variable, say at node level, and then pass the same variable in the job request so that it will land only on those nodes? Regards, Navin. On Wed, May 6, 2020, 21:04 Renfro, Michael wrote: > Ok, then regular license accounting won’t work. > >

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread Renfro, Michael
Ok, then regular license accounting won’t work. Somewhat tested, but should work or at least be a starting point. Given a job number JOBID that’s already running with this license on one or more nodes: sbatch -w $(scontrol show job JOBID | grep ' NodeList=' | cut -d= -f2) -N 1 should start a
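
The same suggestion spelled out as a runnable sketch (JOBID and job.sh are placeholders):

    # reuse the node list of the job already holding the license,
    # so the new job is pinned to the node(s) already running it
    nodes=$(scontrol show job JOBID | grep ' NodeList=' | cut -d= -f2)
    sbatch -w "$nodes" -N 1 job.sh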

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
To explain with more detail: a job may be submitted based on cores at any time, and it will go to random nodes, but limited to 4 nodes only (the license has some intelligence: it counts the nodes, and once it reaches 4 it will not allow any more nodes). Yes, it doesn't depend on the no

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread Renfro, Michael
To make sure I’m reading this correctly, you have a software license that lets you run jobs on up to 4 nodes at once, regardless of how many CPUs you use? That is, you could run any one of the following sets of jobs:
- four 1-node jobs,
- two 2-node jobs,
- one 1-node and one 3-node job,
- two

Re: [slurm-users] "sacctmgr add cluster" crashing slurmdbd

2020-05-06 Thread Marcus Wagner
Sorry, I forgot: we use Slurm 18.08.7, by the way. I just saw, in an earlier coredump, that there is another (earlier) line involved: 2136: if (row2[ASSOC2_REQ_MTPJ][0]) and the corresponding mysql response was:

Re: [slurm-users] [EXT] Re: Limit the number of GPUS per user per partition

2020-05-06 Thread Theis, Thomas
Still have the same issue after I updated the user and QoS. The command I am using: ‘sacctmgr modify qos normal set MaxTRESPerUser=gres/gpu=2’. I restarted the services. Unfortunately I am still able to saturate the cluster with jobs. We have a cluster of 10 nodes, each with 4 GPUs, for a total of 40
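
A short sketch for checking that the limit actually landed where Slurm will look for it (the partition name "gpu" is a guess):

    # is the per-user GPU cap really stored on the QoS?
    sacctmgr show qos normal -p
    # does the partition enforce that QoS at all?
    scontrol show partition gpu | grep -i qos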

Re: [slurm-users] "sacctmgr add cluster" crashing slurmdbd

2020-05-06 Thread Marcus Wagner
Hi, same here :/ The segfault happens after the procedure call in mysql:
call get_parent_limits('assoc_table', 'rwth0515', 'rcc', 0);
select @par_id, @mj, @mja, @mpt, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos;
The mysql answer is:

[slurm-users] Slurm memory units

2020-05-06 Thread Killian Murphy
Hi all. I'm probably making a rookie error here...which 'megabyte' (powers of 1000 or 1024) does the Slurm documentation refer to in, for example, the slurm.conf documentation for RealMemory and the sbatch documentation for `--mem`? Most of our nodes have the same physical memory configuration.

Re: [slurm-users] Slurm memory units

2020-05-06 Thread Killian Murphy
More investigation, and your message, have confirmed for me that it's all working in powers of 1024 (which is what I would expect, although the use of the word 'megabytes' in the docs is a little misleading, I think...). So, our nodes have 187 GiB total memory, and we need to re-jig our user
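
The unit conversion behind that figure, as a quick sketch; the authoritative RealMemory value is whatever the node itself reports:

    # 187 GiB expressed in MiB, the unit RealMemory and --mem are interpreted in (powers of 1024, per this thread)
    echo $(( 187 * 1024 ))    # 191488
    # print the node's own view, ready to paste into slurm.conf
    slurmd -C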

Re: [slurm-users] how to restrict jobs

2020-05-06 Thread navin srivastava
Thanks Michael. Actually, one application's license is node-based and we have a 4-node license (not fixed nodes). We have several nodes, but when a job lands on any 4 random nodes it runs on those nodes only. After that it fails if it goes to other nodes. Can we define a custom variable and set it on

Re: [slurm-users] Slurm memory units

2020-05-06 Thread Peter Kjellström
On Wed, 6 May 2020 10:42:46 +0100 Killian Murphy wrote: > Hi all. > > I'm probably making a rookie error here...which 'megabyte' (powers of > 1000 or 1024) does the Slurm documentation refer to in, for example, > the slurm.conf documentation for RealMemory and the sbatch > documentation for

Re: [slurm-users] "sacctmgr add cluster" crashing slurmdbd

2020-05-06 Thread Ben Polman
On 06-05-2020 07:38, Chris Samuel wrote: We are experiencing exactly the same problem after the mysql upgrade to 5.7.30; moving the database to an old mysql server running 5.6 solves the problem. Most likely downgrading mysql to 5.7.29 will work as well. I have no clue which change in mysql-server is

Re: [slurm-users] Job Step Resource Requests are Ignored

2020-05-06 Thread Chris Samuel
On Tuesday, 5 May 2020 11:00:27 PM PDT Maria Semple wrote:
> Is there no way to achieve what I want then? I'd like the first and last job
> steps to always be able to run, even if the second step needs too many
> resources (based on the cluster).
That should just work.
#!/bin/bash
#SBATCH -c 2
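
A hedged sketch of the kind of batch script under discussion, expanding Chris's fragment: three srun steps inside one small allocation, with the middle step deliberately asking for more CPUs than the job owns (the step commands are placeholders):

    #!/bin/bash
    #SBATCH -c 2

    srun -n 1 ./preamble.sh     # fits in the allocation
    srun -c 4 ./main_work.sh    # over-asks; this is the step whose requests are at issue
    srun -n 1 ./postamble.sh    # should still run afterwards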

Re: [slurm-users] Job Step Resource Requests are Ignored

2020-05-06 Thread Maria Semple
Hi Chris, Thanks for the tip about the memory units, I'll double check that I'm using them. Is there no way to achieve what I want then? I'd like the first and last job steps to always be able to run, even if the second step needs too many resources (based on the cluster). As a side note, do you