Thanks Erik.
Last night I made the changes. I defined the following in slurm.conf, on all
the nodes as well as on the Slurm server:
TmpFS=/lscratch
NodeName=node[01-10] CPUs=44 RealMemory=257380 Sockets=2
CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=160 State=UNKNOWN
Feature=P4000 Gres=gpu:2
These
Who owns the munge directory and key? Is it the right uid/gid? Is the munge
daemon running?
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia
On Thu, 16 Apr 2020 at 04:57, Dean
Hi Slurm-Users,
Hope this post finds all of you healthy and safe amidst the ongoing COVID-19
craziness. We've got a strange error state that occurs when we enable
preemption, and we need help diagnosing what is wrong. I'm not sure if we
are missing a default value or other necessary configuration,
/etc/munge is 700
/etc/munge/munge.key is 400
On Wed, Apr 15, 2020 at 12:11 PM Riebs, Andy wrote:
> Two trivial things to check:
>
> 1. Permissions on /etc/munge and /etc/munge/munge.key
>
> 2. Is munged running on the problem node?
>
> Andy
>
> *From:* slurm-users
The default value for TmpDisk is 0, so if you want local scratch available on a
node, the amount of TmpDisk space must be defined in the node configuration in
slurm.conf.
Example:
NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4
ThreadsPerCore=1 RealMemory=24099
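For instance, appending TmpDisk (in MB) to a node definition like the one above, with an illustrative value of 51200 (50 GB of local scratch):

```
TmpFS=/scratch
NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4
ThreadsPerCore=1 RealMemory=24099 TmpDisk=51200
```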
I’d check NTP, as your encode time seems odd to me.
On Wed, 15 Apr 2020 at 19:59, Dean Schulze wrote:
> I've installed two new nodes onto my slurm cluster. One node works, but
> the other one complains about an invalid credential for munge. I've
> verified that the munge.key is the same as on
Two trivial things to check:
1. Permissions on /etc/munge and /etc/munge/munge.key
2. Is munged running on the problem node?
Andy
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Dean Schulze
Sent: Wednesday, April 15, 2020 1:57 PM
To: Slurm User Community
I've installed two new nodes onto my slurm cluster. One node works, but
the other one complains about an invalid credential for munge. I've
verified that the munge.key is the same as on all other nodes with
sudo cksum /etc/munge/munge.key
I recopied a munge.key from a node that works. I've
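For reference, a standard way to check both of the items above from a working host (assuming passwordless SSH; "badnode" is a placeholder for the problem node):

```
# Is munged running on the problem node?
ssh badnode systemctl status munge

# Ownership and permissions of the munge directory and key
ssh badnode stat -c '%U:%G %a %n' /etc/munge /etc/munge/munge.key

# Encode a credential locally and decode it on the problem node;
# "Success" means the keys match and the clocks agree
munge -n | ssh badnode unmunge
```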
The more flexible way to do this is with QoS (PreemptType=preempt/qos). You'll
need to have accounting enabled, and you'll probably want qos listed in
AccountingStorageEnforce. Once you do that, you create a "shared" QoS for the
scavenger jobs and a QoS for each group that buys into resources. Assign
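A minimal sketch of that setup (names like "scavenger" and "groupA" are placeholders; exact options are worth checking against the sacctmgr man page):

```
# slurm.conf
PreemptType=preempt/qos
PreemptMode=REQUEUE
AccountingStorageEnforce=associations,qos

# Low-priority QoS for scavenger jobs, and a per-group QoS
# that is allowed to preempt it
sacctmgr add qos scavenger Priority=10
sacctmgr add qos groupA Priority=100 Preempt=scavenger
sacctmgr modify user someuser set qos+=groupA
```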
Dear all,
the official Slurm documentation says: "Trigger events are not processed
instantly, but a check is performed for trigger events on a periodic basis
(currently every 15 seconds)."
https://slurm.schedmd.com/strigger.html
Is it possible to reduce this time?
Thank you.
Hi,
just wanted to update this thread in case someone has a similar problem.
I managed to achieve this result by:
- changing SelectTypeParameters=CR_Core_Memory to
SelectTypeParameters=CR_CPU_Memory
- giving the low-priority job a very high "nice" value, near the end of the
scale
With this
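In case it helps anyone, the "nice" part can be set at submit time; the value below is just an example near the top of the documented range (Slurm accepts nice adjustments up to 2147483645):

```
sbatch --nice=2000000000 lowprio_job.sh
```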
Thank you Erik.
Is it mandatory to define the local scratch on all the compute nodes, or is
defining it only on the Slurm server enough?
Also, should TmpDisk be defined in MB, or can it be defined in GB as well?
While requesting --tmp, can we use the value in GB?
Regards
Navin.
On Tue, Apr 14, 2020