Re: [slurm-users] How to request for the allocation of scratch.

2020-04-15 Thread navin srivastava
Thanks Erik. Last night i made the changes. i defined in slurm.conf on all the nodes as well as on the slurm server. TmpFS=/lscratch NodeName=node[01-10] CPUs=44 RealMemory=257380 Sockets=2 CoresPerSocket=22 ThreadsPerCore=1 TmpDisk=160 State=UNKNOWN Feature=P4000 Gres=gpu:2 These

Re: [slurm-users] [EXTERNAL] Re: Munge decode failing on new node

2020-04-15 Thread Sean Crosby
Who owns the munge directory and key? Is it the right uid/gid? Is the munge daemon running? -- Sean Crosby | Senior DevOps/HPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Thu, 16 Apr 2020 at 04:57, Dean
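A quick way to run Sean's checks, assuming a systemd host and the stock munge paths (a sketch, not from the original message):

    ls -ld /etc/munge            # should be owned by munge:munge
    ls -l /etc/munge/munge.key   # should be munge:munge, mode 400
    id munge                     # confirm the uid/gid match the other nodes
    systemctl status munge       # is munged running?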

[slurm-users] Need help configuring 3 tier priority/multifactor preemption in cluster

2020-04-15 Thread Joshua Sonstroem
Hi Slurm-Users, Hope this post finds all of you healthy and safe amidst the ongoing COVID-19 craziness. We've got a strange error state that occurs when we enable preemption, and we need help diagnosing what is wrong. I'm not sure if we are missing a default value or other necessary configuration,
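For orientation, a minimal sketch of the slurm.conf parameters involved in multifactor priority with preemption (illustrative values, not the poster's actual configuration):

    PriorityType=priority/multifactor
    PreemptType=preempt/partition_prio    # or preempt/qos
    PreemptMode=REQUEUE                   # or CANCEL, SUSPEND, ...
    AccountingStorageEnforce=limits,qos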

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Dean Schulze
/etc/munge is 700; /etc/munge/munge.key is 400. On Wed, Apr 15, 2020 at 12:11 PM Riebs, Andy wrote: > Two trivial things to check: > > 1. Permissions on /etc/munge and /etc/munge.key > > 2. Is munged running on the problem node? > > > > Andy > > > > *From:* slurm-users
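If the modes are right but the ownership is not, something like this would restore the expected state (a sketch, assuming the standard munge user exists on the node):

    sudo chown -R munge:munge /etc/munge
    sudo chmod 700 /etc/munge
    sudo chmod 400 /etc/munge/munge.key
    sudo systemctl restart munge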

Re: [slurm-users] How to request for the allocation of scratch.

2020-04-15 Thread Ellestad, Erik
The default value for TmpDisk is 0, so if you want local scratch available on a node, the amount of TmpDisk space must be defined in the node configuration in slurm.conf. Example: NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=24099
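A sketch of how such a node line might look once TmpDisk is added (the TmpDisk value is illustrative; Erik's original example is truncated above):

    NodeName=TestNode01 CPUs=8 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=24099 TmpDisk=102400   # 100 GB, expressed in MB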

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Carlos Fenoy
I’d check NTP, as your encode time seems odd to me. On Wed, 15 Apr 2020 at 19:59, Dean Schulze wrote: > I've installed two new nodes onto my slurm cluster. One node works, but > the other one complains about an invalid credential for munge. I've > verified that the munge.key is the same as on
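The usual way to surface the timestamps Carlos means is to decode a credential across nodes; unmunge prints ENCODE_TIME and DECODE_TIME, so clock skew shows up immediately (a sketch, assuming SSH access; "good-node" is a placeholder):

    munge -n | ssh good-node unmunge    # encode locally, decode remotely
    date; ssh good-node date            # compare the clocks directly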

Re: [slurm-users] Munge decode failing on new node

2020-04-15 Thread Riebs, Andy
Two trivial things to check: 1. Permissions on /etc/munge and /etc/munge.key 2. Is munged running on the problem node? Andy From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Dean Schulze Sent: Wednesday, April 15, 2020 1:57 PM To: Slurm User Community
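On a systemd-based node, Andy's second check might look like this (a sketch):

    systemctl status munge              # is munged running?
    sudo systemctl enable --now munge   # if not, start it and enable at boot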

[slurm-users] Munge decode failing on new node

2020-04-15 Thread Dean Schulze
I've installed two new nodes onto my slurm cluster. One node works, but the other one complains about an invalid credential for munge. I've verified that the munge.key is the same as on all other nodes with: sudo cksum /etc/munge/munge.key. I recopied a munge.key from a node that works. I've
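A quick way to compare the key checksum across every node at once (a sketch, assuming SSH access; the host names are placeholders):

    for h in node01 node02 node03; do
        ssh "$h" sudo cksum /etc/munge/munge.key
    done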

Re: [slurm-users] [External] Another question about partition and node allocation

2020-04-15 Thread Michael Robbert
The more flexible way to do this is with QoS (PreemptType=preempt/qos). You'll need to have accounting enabled, and you'll probably want qos listed in AccountingStorageEnforce. Once you do that, you create a "shared" QoS for the scavenger jobs and a QoS for each group that buys into resources. Assign
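A sketch of the QoS setup Michael describes, via sacctmgr (the names "scavenger" and "grp_a" are illustrative):

    # low-priority QoS for scavenger jobs, requeued when preempted
    sacctmgr add qos scavenger
    sacctmgr modify qos scavenger set Priority=0 PreemptMode=requeue
    # a QoS for a contributing group, allowed to preempt scavenger jobs
    sacctmgr add qos grp_a
    sacctmgr modify qos grp_a set Priority=100 Preempt=scavenger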

[slurm-users] Reduce trigger execution time

2020-04-15 Thread Nicolò Parmiggiani
Dear all, the official Slurm documentation says: "Trigger events are not processed instantly, but a check is performed for trigger events on a periodic basis (currently every 15 seconds)." https://slurm.schedmd.com/strigger.html Is it possible to reduce this time? Thank you.
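For reference, triggers are registered with strigger, and the 15-second interval governs how often slurmctld acts on them. An illustrative example (the script path is hypothetical):

    # fire the script when any node goes down
    strigger --set --node --down --program=/usr/local/sbin/notify_down.sh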

Re: [slurm-users] clarification on slurm scheduler and the "nice" parameter

2020-04-15 Thread Matteo F
Hi, just wanted to update this thread in case someone has a similar problem. I managed to achieve this result by: - changing SelectTypeParameters=CR_Core_Memory to SelectTypeParameters=CR_CPU_Memory - giving the low-priority job a very high "nice" value, near the end of the scale. With this
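For reference, the nice value can be set per job at submission time (a sketch; the script name is hypothetical):

    sbatch --nice=100000 lowprio_job.sh   # large positive nice = lower priority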

Re: [slurm-users] How to request for the allocation of scratch.

2020-04-15 Thread navin srivastava
Thank you, Erik. Is it not mandatory to define the local scratch on all the compute nodes? Is defining it only on the slurm server enough? Also, should TmpDisk be defined in MB, or can it be defined in GB as well? When requesting --tmp, can we use the value in GB? Regards, Navin. On Tue, Apr 14, 2020
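On the units question: TmpDisk in slurm.conf is always megabytes, while --tmp at submission defaults to MB but accepts K/M/G/T suffixes. A sketch (the job script is hypothetical):

    # slurm.conf: TmpDisk=160000 means ~160 GB
    sbatch --tmp=100G job.sh    # request 100 GB of local scratch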