Is there a way to diagnose whether I/O to the
/cm/shared/apps/slurm/var/cm/statesave directory (used for job status) on the
NFS storage is the cause of the socket errors?
Which values or thresholds from the nfsiostat command would indicate that the
NFS storage is the bottleneck?
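One rough way to check the storage side independently of nfsiostat is to time small synchronous writes on the same NFS export and compare the numbers with a local disk. The sketch below is only an illustration: the function names and the `latency_probe_` file prefix are made up for this example, and it is probably safer to point it at a scratch directory on the same export than at the live statesave directory itself. It gives no threshold either; it only shows whether write+fsync latency on the share is much higher than on local disk.

```python
import os
import statistics
import sys
import tempfile
import time


def fsync_write_latencies(directory, samples=20, size=4096):
    """Time `samples` small write+fsync operations in `directory` (seconds each)."""
    payload = b"\0" * size
    latencies = []
    for _ in range(samples):
        # Hypothetical probe file; it is removed again after each measurement.
        fd, path = tempfile.mkstemp(dir=directory, prefix="latency_probe_")
        try:
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # push the data to the server, not just the client cache
            latencies.append(time.perf_counter() - start)
        finally:
            os.close(fd)
            os.unlink(path)
    return latencies


def summarize(latencies):
    """Reduce the raw timings to a few figures that are easy to compare."""
    return {
        "samples": len(latencies),
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


if __name__ == "__main__":
    # Usage: python probe.py /path/on/the/nfs/export
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(summarize(fsync_write_latencies(target)))
```

Running it once against the NFS export and once against a local directory on the head node gives two sets of figures to compare; what counts as "too slow" will still depend on the site.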
From: Buckley, Ronan
Sent
-users/2019-June/003534.html
My take is that there is no answer to the question, each site is different.
Best Regards
mg.
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Buckley, Ronan
Sent: Tuesday, 25 June 2019 11:17
To: 'slurm-users@lists.schedmd.com'
mailto:slurm
Hi,
Since configuring a backup slurm controller (including moving the
StateSaveLocation from a local disk to an NFS share), we are seeing these errors
in the slurmctld logs on a regular basis:
Socket timed out on send/recv operation
It sometimes occurs when a job array is started and squeue
Hi,
Does restarting the slurmctld daemon on a slurm head node affect running slurm
jobs on the compute nodes in any way?
Rgds
Hi,
I want to increase the MaxJobCount in the slurm.conf file from its default
value of 10,000. I want to increase it to 250,000.
The online documentation says:
MaxJobCount
The maximum number of jobs Slurm can have in its active database at one time.
Set the values of MaxJobCount and MinJobAge
Hi,
I want to increase the MaxArraySize in the slurm.conf file from its default
value of 1001. I want to increase it to 1.
Is it a case of just adding "MaxArraySize=1" to the slurm.conf file and
then running "scontrol reconfigure" to update slurm.conf ?
Will this update affect running
Disabling the firewall service on the centos client allows the ‘srun hostname’
command to run.
From: Buckley, Ronan
Sent: Tuesday, July 17, 2018 12:00 PM
To: 'Slurm User Community List'
Subject: RE: [slurm-users] 'srun hostname' hangs on the command line
Hi Carlos, Is there a way to test
already run an ssh into a node and run the
hostname command manually.
On 17 July 2018 at 09:50, Buckley, Ronan
mailto:ronan.buck...@dell.com>> wrote:
Yes I do.
From: slurm-users
[mailto:slurm-users-boun...@lists.schedmd.com]
On Behalf
-root user?
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Buckley, Ronan
Sent: Tuesday, 17 July 2018 12:53 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] 'srun hostname' hangs on the command line
Hi All,
Verbos
Hi All,
Verbose mode doesn't show much.
I hashed out the hostnames.
Any ideas/suggestions?
# srun hostname
^Csrun: interrupt (one more within 1 sec to abort)
srun: task 0: unknown
^Z
[1]+ Stopped srun hostname
#
# srun -v hostname
srun: defined options for program `srun'
srun:
Hi all,
Slurm accounting commands like sstat and sacct report information but sreport
always reports no information, even though by default it works on my VM.
What am I missing?
Rgds
Ronan
Hi All,
Commands like sacct and sreport provide blank information:
# sreport cluster utilization
Cluster Utilization 2018-06-04T00:00:00 - 2018-06-04T23:59:59
Use reported in TRES Minutes
f file, as well. You will have to create the accounts and
users using sacctmgr, and possibly QOSs, depending on what you'd like to do.
It's not difficult, but there are a number of small steps.
There's a document online that walks you through the process.
Paul.
> On May 28, 2018, at 10:31
Hi All,
I need to enable SLURM accounting so that I can use commands like sacct,
sstat, sreport, etc. It looks like SLURM accounting was not enabled by default.
From reading the online documentation, all I have to do is uncomment the
following lines in /etc/slurm/slurm.conf:
Has anyone any experience of setting up users that can cancel jobs?
From: Buckley, Ronan
Sent: Wednesday, April 18, 2018 9:06 AM
To: 'slurm-users@lists.schedmd.com'
Subject: RE: SLURM Operator Role (to cancel SLURM Jobs)
According to the online documentation:
"When using the Slurm db, user
Hi,
I have given 4 users the operator role and they are all part of the coordinator
accounts. However, when I su to the users in question, they get a permission
denied error when trying to cancel a job.
What am I missing?
Ronan
17 matches