Hi,
I configured Slurm with the following log settings:
werner@ubuntu-01:~$ scontrol show config | grep -i logfile
SlurmctldLogFile        = /var/log/slurm/slurmctld.log
SlurmdLogFile           = /var/log/slurm/slurmd.log
SlurmSchedLogFile       = /var/log/slurm/slurmsched.log
But this still can
Do you have other limits set? QoS is hierarchical, and a partition QoS in
particular can override other QoS.
What's the output of
sacctmgr show qos -p
and
scontrol show part
Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The
Yeah,
and I found the reason. It seems that (at least for the MySQL stored
procedure get_parent_limits) MySQL 5.7.30 returns NULL where MySQL 5.7.29
returned an empty string.
Running MySQL < 5.7.30 is a bad idea, though, as there are two remotely
exploitable bugs with a CVSS score of 9.8!
(see also
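If you want to check your own server, here is a hedged repro (the
association names come from the procedure call quoted elsewhere in this
thread; substitute your own, and slurm_acct_db is the default accounting
database name):

# run as a DB user with access to the Slurm accounting database
mysql slurm_acct_db -e "call get_parent_limits('assoc_table', 'rwth0515', 'rcc', 0); select @mtpj;"
# 5.7.29 returns an empty string; 5.7.30 returns NULL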
That's great! Thanks David!
On Wed, May 6, 2020 at 11:35 AM David Braun wrote:
> I'm not sure I understand the problem. If you want to make sure the
> preamble and postamble run even if the main job doesn't run you can use '-d'
>
> from the man page
>
> -d, --dependency=<dependency_list>
> Defer
I'm not sure I understand the problem. If you want to make sure the
preamble and postamble run even if the main job doesn't run you can use '-d'
from the man page
-d, --dependency=<dependency_list>
Defer the start of this job until the
specified dependencies have been satisfied.
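To make that concrete, a rough sketch (script names are illustrative):
submit the preamble, hang the main job off it, and give the postamble an
afterany dependency so it runs whether or not the main job succeeds.
--parsable makes sbatch print just the job ID:

pre=$(sbatch --parsable preamble.sh)
main=$(sbatch --parsable -d afterok:$pre main.sh)
# afterany fires once the main job terminates, for any reason
sbatch -d afterany:$main postamble.sh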
Hi,
Ubuntu has made MySQL 5.7.30 the default version. At least on Ubuntu
16.04, this causes severe problems with slurmdbd (versions 17.x, 18.x, and
19.x; not sure about 20.x). Reverting to MySQL 5.7.29 seems to make
everything work okay again.
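In case it saves someone time, a hedged sketch of one way to do the
downgrade (the exact 5.7.29 package version string is an assumption for
16.04; check apt-cache madison mysql-server on your system):

sudo apt-get install --allow-downgrades \
    mysql-server=5.7.29-0ubuntu0.16.04.1 mysql-client=5.7.29-0ubuntu0.16.04.1
# hold the packages so the next upgrade doesn't pull 5.7.30 back in
sudo apt-mark hold mysql-server mysql-client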
cheers,
--dustin
Hi Chris,
I think my question isn't quite clear, but I'm also pretty confident the
answer is no at this point. The idea is that the script is sort of like a
template for running a job, and an end user can submit a custom job with
their own desired resource requests which will end up filling in
You could use a per-node Feature for this, but a partition would also work.
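For example, a rough sketch (the feature name 'applic' and the node names
are illustrative):

# slurm.conf: tag the nodes that hold the license with a feature
NodeName=node[01-04] Features=applic
# then constrain jobs to those nodes at submission time
sbatch --constraint=applic job.sh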
Is there no way to set or define a custom variable at node level, and then
pass the same variable in the job request, so that the job will land on
those nodes only?
Regards
Navin
On Wed, May 6, 2020, 21:04 Renfro, Michael wrote:
> Ok, then regular license accounting won’t work.
Ok, then regular license accounting won’t work.
Somewhat tested, but should work or at least be a starting point. Given a job
number JOBID that’s already running with this license on one or more nodes:
sbatch -w $(scontrol show job JOBID | grep ' NodeList=' | cut -d= -f2) -N 1
should start a
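Spelled out as a tiny script (JOBID and followup.sh are placeholders):

# grab the node list the licensed job is already running on...
nodes=$(scontrol show job "$JOBID" | grep ' NodeList=' | cut -d= -f2)
# ...and pin the new one-node job to those nodes
sbatch -w "$nodes" -N 1 followup.sh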
To explain in more detail:
Jobs will be submitted per core at any time, and they can go to any random
nodes, but the license is limited to 4 nodes only. (The license has some
intelligence: it counts the nodes, and once it reaches 4 it will not allow
any more nodes. And yes, it doesn't depend on the number of cores.)
To make sure I’m reading this correctly, you have a software license that lets
you run jobs on up to 4 nodes at once, regardless of how many CPUs you use?
That is, you could run any one of the following sets of jobs:
- four 1-node jobs,
- two 2-node jobs,
- one 1-node and one 3-node job,
- two
Sorry, I forgot: by the way, we use Slurm 18.08.7.
I just saw, in an earlier coredump, that there is another (earlier) line
involved:
2136: if (row2[ASSOC2_REQ_MTPJ][0])
If MySQL now returns NULL instead of an empty string there, dereferencing
row2[ASSOC2_REQ_MTPJ][0] is exactly the kind of place slurmdbd would
segfault. The corresponding MySQL response was:
Still have the same issue after updating the user and the QoS.
The command I am using:
'sacctmgr modify qos normal set MaxTRESPerUser=gres/gpu=2'
I restarted the services. Unfortunately I am still able to saturate the
cluster with jobs.
We have a cluster of 10 nodes, each with 4 GPUs, for a total of 40 GPUs.
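Two things worth checking (a hedged sketch; 'normal' is the QoS from your
command):

# confirm the limit actually landed on the QoS
sacctmgr show qos normal format=Name,MaxTRESPU
# limits are only enforced if this includes 'limits'
scontrol show config | grep AccountingStorageEnforce

Jobs also have to actually run under that QoS (their own, or the
partition's) for the limit to bite.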
Hi, same here :/
the segfault happens after the procedure call in mysql:
call get_parent_limits('assoc_table', 'rwth0515', 'rcc', 0);
select @par_id, @mj, @mja, @mpt, @msj, @mwpj, @mtpj, @mtpn, @mtmpj,
       @mtrm, @def_qos_id, @qos, @delta_qos;
The mysql answer is:
Hi all.
I'm probably making a rookie error here...which 'megabyte' (powers of 1000
or 1024) does the Slurm documentation refer to in, for example, the
slurm.conf documentation for RealMemory and the sbatch documentation for
`--mem`?
Most of our nodes have the same physical memory configuration.
More investigation, plus your message, has confirmed for me that it's all
working in powers of 1024 (which is what I would expect, although the use
of the word 'megabytes' in the docs is a little misleading, I think...).
So, our nodes have 187 GiB total memory, and we need to re-jig our user
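For reference, the arithmetic I'm now trusting (assuming RealMemory and
--mem are both MiB, i.e. powers of 1024):

echo $((187 * 1024))   # 191488, so a 187 GiB node would get RealMemory=191488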
Thanks Michael.
Actually, one application's license is node-based, and we have a 4-node
license (not fixed nodes). We have several nodes, but once a job lands on
any 4 random nodes it runs on those nodes only; after that it fails if it
goes to other nodes.
Can we define a custom variable and set it on
On Wed, 6 May 2020 10:42:46 +0100
Killian Murphy wrote:
> Hi all.
>
> I'm probably making a rookie error here...which 'megabyte' (powers of
> 1000 or 1024) does the Slurm documentation refer to in, for example,
> the slurm.conf documentation for RealMemory and the sbatch
> documentation for
On 06-05-2020 07:38, Chris Samuel wrote:
We are experiencing exactly the same problem after the MySQL upgrade to
5.7.30; moving the database to an old MySQL server running 5.6 solves the
problem. Most likely downgrading MySQL to 5.7.29 will work as well.
I have no clue which change in mysql-server is
On Tuesday, 5 May 2020 11:00:27 PM PDT Maria Semple wrote:
> Is there no way to achieve what I want then? I'd like the first and last job
> steps to always be able to run, even if the second step needs too many
> resources (based on the cluster).
That should just work.
#!/bin/bash
#SBATCH -c 2
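Expanding that fragment into a complete sketch (the step commands are
placeholders; the middle step over-asks on purpose):

#!/bin/bash
#SBATCH -c 2
srun -c 2 ./preamble    # fits the allocation, runs
srun -c 4 ./main        # asks for more CPUs than the job has; the step
                        # fails, but the batch script keeps going
srun -c 2 ./postamble   # still runs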
Hi Chris,
Thanks for the tip about the memory units; I'll double-check that I'm
using them.
Is there no way to achieve what I want then? I'd like the first and last
job steps to always be able to run, even if the second step needs too many
resources (based on the cluster).
As a side note, do you