Re: [slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-25 Thread Buckley, Ronan
Is there a way to diagnose whether I/O to the /cm/shared/apps/slurm/var/cm/statesave directory (used for job state) on the NFS storage is the cause of the socket errors? What values or thresholds from the nfsiostat command would indicate that the NFS storage is the bottleneck?
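One way to approach the question above is to time small synchronous writes directly in the StateSaveLocation directory and compare the numbers with a local disk. The sketch below is a hypothetical helper (`probe_write_latency` is not part of Slurm or nfs-utils). It assumes that a write, fsync, rename cycle is a reasonable stand-in for state-save traffic, which may not match slurmctld exactly. Consistently higher median latency, or large spikes in the maximum, on the NFS path relative to local disk would point at the share. No universal threshold is implied.

```python
import os
import statistics
import sys
import tempfile
import time


def probe_write_latency(directory: str, samples: int = 20, size: int = 4096) -> dict:
    """Time write + fsync + rename cycles of a small file in `directory`.

    Assumption: this only approximates how a state-save directory is used;
    it is a diagnostic sketch, not slurmctld's own code path.
    """
    payload = os.urandom(size)
    latencies = []
    for i in range(samples):
        tmp_path = os.path.join(directory, f".latency_probe_{os.getpid()}_{i}.new")
        final_path = os.path.join(directory, f".latency_probe_{os.getpid()}_{i}")
        start = time.perf_counter()
        with open(tmp_path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # force the data out to the server/disk
        os.replace(tmp_path, final_path)  # atomic rename over the final name
        latencies.append(time.perf_counter() - start)
        os.remove(final_path)  # clean up so the directory is left as found
    return {
        "samples": samples,
        "median_ms": statistics.median(latencies) * 1000.0,
        "max_ms": max(latencies) * 1000.0,
    }


if __name__ == "__main__":
    # Pass the directory to test; defaults to the local temp dir for comparison.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(probe_write_latency(target))
```

For example, run it once with the statesave path as the argument and once with no argument, ideally at a time when the socket errors are being logged. The probe creates and removes hidden files in the target directory.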

Re: [slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-25 Thread Buckley, Ronan
-users/2019-June/003534.html My take is that there is no answer to the question; each site is different. Best regards, mg. From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Buckley, Ronan Sent: Tuesday, 25 June 2019 11:17 To: 'slurm-users@lists.schedmd.com'

[slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-25 Thread Buckley, Ronan
Hi, Since configuring a backup Slurm controller (including moving the StateSaveLocation from a local disk to an NFS share), we are seeing this error in the slurmctld logs on a regular basis: "Socket timed out on send/recv operation". It sometimes occurs when a job array is started and squeue
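To check whether the timeouts cluster around job array starts, one option is to bucket the error by minute and compare those buckets with submission times. This is a minimal sketch that assumes the ISO 8601 timestamp prefix with milliseconds (`[YYYY-MM-DDTHH:MM:SS.mmm]`, as produced by `LogTimeFormat=iso8601_ms`). If your log lines look different, adjust the regular expression. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (LogTimeFormat=iso8601_ms):
#   [2019-06-24T10:15:32.123] error: ... Socket timed out on send/recv operation
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}(?:\.\d+)?\]\s+(.*)$")
NEEDLE = "Socket timed out on send/recv operation"


def timeouts_per_minute(lines) -> Counter:
    """Count lines containing the timeout message, keyed by "YYYY-MM-DDTHH:MM"."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and NEEDLE in match.group(2):
            counts[match.group(1)] += 1
    return counts
```

Usage: pass an open file handle for the log named by `SlurmctldLogFile`, then print `sorted(counts.items())` and line the busy minutes up against `sacct` submit times for the job arrays.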

[slurm-users] Slurm: Socket timed out on send/recv operation - slurm 17.02.2

2019-06-24 Thread Buckley, Ronan
Hi, Since configuring a backup Slurm controller (including moving the StateSaveLocation from a local disk to an NFS share), we are seeing this error in the slurmctld logs on a regular basis: "Socket timed out on send/recv operation". It sometimes occurs when a job array is started and squeue

[slurm-users] service slurmctld restart

2019-01-31 Thread Buckley, Ronan
Hi, Does restarting the slurmctld daemon on a slurm head node affect running slurm jobs on the compute nodes in any way? Rgds

[slurm-users] Increase MaxJobCount in slurm.conf

2019-01-31 Thread Buckley, Ronan
Hi, I want to increase the MaxJobCount in the slurm.conf file from its default value of 10,000. I want to increase it to 250,000. The online documentation says: MaxJobCount The maximum number of jobs Slurm can have in its active database at one time. Set the values of MaxJobCount and MinJobAge
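A rough way to size MaxJobCount is to remember that finished jobs remain in slurmctld memory for about MinJobAge seconds (default 300) before they are purged. Under that assumption, the record count is approximately pending + running + (completion rate × MinJobAge). The helpers and workload numbers below are hypothetical and only illustrate the arithmetic. Also note that the slurm.conf man page states MaxJobCount cannot be changed with `scontrol reconfigure` and only takes effect when slurmctld restarts. Confirm this for 17.02.

```python
def estimate_job_records(pending: int, running: int,
                         completions_per_sec: float,
                         min_job_age: int = 300) -> int:
    """Rough steady-state number of job records held by slurmctld.

    Model (an assumption): finished jobs linger for about MinJobAge seconds
    before purge, so their backlog is roughly completion rate x MinJobAge.
    """
    lingering = completions_per_sec * min_job_age
    return int(pending + running + lingering)


def fits_max_job_count(max_job_count: int, **workload) -> bool:
    """True if the estimated record count stays within MaxJobCount."""
    return estimate_job_records(**workload) <= max_job_count


# Hypothetical workload: 50,000 pending, 2,000 running, 10 jobs finishing per second.
workload = dict(pending=50_000, running=2_000, completions_per_sec=10.0)
print(estimate_job_records(**workload))         # 55000
print(fits_max_job_count(10_000, **workload))   # False
print(fits_max_job_count(250_000, **workload))  # True
```

Treat the estimate as a lower bound. Purging happens periodically rather than continuously, and memory use on the controller grows with the number of records.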

[slurm-users] Increase MaxArraySize in slurm.conf

2019-01-29 Thread Buckley, Ronan
Hi, I want to increase the MaxArraySize in the slurm.conf file from its default value of 1001. I want to increase it to 1. Is it just a case of adding "MaxArraySize=1" to the slurm.conf file and then running "scontrol reconfigure" to reload slurm.conf? Will this update affect running
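Two constraints from the slurm.conf documentation are worth checking before changing MaxArraySize. First, the largest usable task index is one less than MaxArraySize, because indices start at 0. Second, MaxJobCount should be much larger than MaxArraySize. The documented upper bound is 4000001, and 0 disables job arrays, but check the man page for your version. The helper names below are hypothetical, and "much larger" is simplified here to "larger".

```python
MAX_ARRAY_SIZE_LIMIT = 4_000_001  # documented upper bound; verify for your version


def max_task_index(max_array_size: int) -> int:
    """Largest usable array task index: one less than MaxArraySize."""
    return max_array_size - 1


def check_array_settings(max_array_size: int, max_job_count: int) -> list:
    """Return human-readable problems with a proposed MaxArraySize."""
    problems = []
    if max_array_size < 0 or max_array_size > MAX_ARRAY_SIZE_LIMIT:
        problems.append(f"MaxArraySize must be between 0 and {MAX_ARRAY_SIZE_LIMIT}")
    if max_array_size >= max_job_count:
        problems.append("MaxJobCount should be much larger than MaxArraySize")
    return problems


print(max_task_index(1001))               # 1000, i.e. --array=0-1000
print(check_array_settings(1001, 10_000)) # []
```

In practice, raising MaxArraySize usually means revisiting MaxJobCount at the same time, because each running array task is tracked as its own job record.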

Re: [slurm-users] 'srun hostname' hangs on the command line

2018-07-17 Thread Buckley, Ronan
Disabling the firewall service on the CentOS client allows the ‘srun hostname’ command to run. From: Buckley, Ronan Sent: Tuesday, July 17, 2018 12:00 PM To: 'Slurm User Community List' Subject: RE: [slurm-users] 'srun hostname' hangs on the command line Hi Carlos, Is there a way to test
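The firewall fix fits how srun works: srun listens on ports on the submitting host, and the compute-node daemons connect back to those ports. By default the ports are ephemeral. The slurm.conf parameter `SrunPortRange` restricts them to a fixed range, which a firewall can then allow instead of being disabled entirely. This is a minimal reachability check to run from a compute node toward the client while `srun` is waiting. The function name and any host or port you pass are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timed out), or unreachable
        return False
```

A `False` result for a port inside the configured SrunPortRange, with the firewall enabled, would suggest opening that range on the client.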

Re: [slurm-users] 'srun hostname' hangs on the command line

2018-07-17 Thread Buckley, Ronan
already run an ssh into a node and run the hostname command manually. On 17 July 2018 at 09:50, Buckley, Ronan <ronan.buck...@dell.com> wrote: Yes I do. From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf

Re: [slurm-users] 'srun hostname' hangs on the command line

2018-07-17 Thread Buckley, Ronan
-root user? From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Buckley, Ronan Sent: Tuesday, 17 July 2018 12:53 AM To: slurm-users@lists.schedmd.com Subject: [slurm-users] 'srun hostname' hangs on the command line Hi All, Verbos

[slurm-users] 'srun hostname' hangs on the command line

2018-07-16 Thread Buckley, Ronan
Hi All, Verbose mode doesn't show much. I hashed out the hostnames. Any ideas/suggestions? # srun hostname ^Csrun: interrupt (one more within 1 sec to abort) srun: task 0: unknown ^Z [1]+ Stopped srun hostname # # srun -v hostname srun: defined options for program `srun' srun:

[slurm-users] sreport reports blank information

2018-06-15 Thread Buckley, Ronan
Hi all, Slurm accounting commands like sstat and sacct report information, but sreport always reports no information, even though it works with the default setup on my VM. What am I missing? Rgds Ronan

[slurm-users] cluster not registered

2018-06-05 Thread Buckley, Ronan
Hi All, Commands like sacct and sreport provide blank information: # sreport cluster utilization Cluster Utilization 2018-06-04T00:00:00 - 2018-06-04T23:59:59 Use reported in TRES Minutes
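The thread title points to the cluster not being registered in the accounting database, which would leave sreport with nothing to report. Assuming slurmdbd is running, you can list the registered clusters with `sacctmgr show cluster` and add a missing one with `sacctmgr -i add cluster <name>`. The name should match ClusterName in slurm.conf. The wrapper below is a hypothetical convenience, and it defaults to a dry run.

```python
import shlex
import subprocess


def register_cluster(cluster_name: str, dry_run: bool = True) -> list:
    """Add `cluster_name` to the accounting database via sacctmgr.

    -i (--immediate) commits without the interactive confirmation prompt.
    With dry_run=True the command is only printed.
    """
    cmd = ["sacctmgr", "-i", "add", "cluster", cluster_name]
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd
```

sreport reads usage rollups that slurmdbd computes periodically. Even after registration, reports may stay empty until the next rollup has run.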

Re: [slurm-users] Enable SLURM Accounting

2018-05-28 Thread Buckley, Ronan
f file, as well. You will have to create the accounts and users using sacctmgr, and possibly QOSs, depending on what you'd like to do. It's not difficult, but there are a number of small steps. There's a document online that walks you through the process. Paul. > On May 28, 2018, at 10:31

[slurm-users] Enable SLURM Accounting

2018-05-28 Thread Buckley, Ronan
Hi All, I need to enable SLURM accounting so that I can use commands like sacct, sstat, sreport, etc. It looks like SLURM accounting was not enabled by default. From reading the online documentation, all I have to do is uncomment the following lines in /etc/slurm/slurm.conf:
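Uncommenting lines in slurm.conf is only part of the setup. slurmdbd has its own configuration file, and accounts and users are created with sacctmgr. On the slurm.conf side, this is a small sketch that reports which accounting-related keys are still commented out or left at their `none` defaults. The parameter names are real slurm.conf keys. The helper names, and the choice of which keys to check, are assumptions.

```python
def active_settings(conf_text: str) -> dict:
    """Map lower-cased slurm.conf keys to values, ignoring commented-out text."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                settings[key.lower()] = value  # parameter names are case-insensitive
    return settings


def accounting_gaps(conf_text: str) -> list:
    """List accounting-related keys that are missing or still set to 'none'."""
    s = active_settings(conf_text)
    gaps = []
    if s.get("accountingstoragetype", "accounting_storage/none") == "accounting_storage/none":
        gaps.append("AccountingStorageType")  # needed for sacct/sreport records
    if s.get("jobacctgathertype", "jobacct_gather/none") == "jobacct_gather/none":
        gaps.append("JobAcctGatherType")  # needed for sstat resource usage
    if "clustername" not in s:
        gaps.append("ClusterName")
    return gaps
```

For example, `accounting_gaps(open("/etc/slurm/slurm.conf").read())` should return an empty list once AccountingStorageType is set to `accounting_storage/slurmdbd` and a gather plugin such as `jobacct_gather/linux` is configured.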

Re: [slurm-users] SLURM Operator Role (to cancel SLURM Jobs)

2018-04-20 Thread Buckley, Ronan
Does anyone have experience setting up users who can cancel jobs? From: Buckley, Ronan Sent: Wednesday, April 18, 2018 9:06 AM To: 'slurm-users@lists.schedmd.com' Subject: RE: SLURM Operator Role (to cancel SLURM Jobs) According to the online documentation: "When using the Slurm db, user

[slurm-users] SLURM Operator Role (to cancel SLURM Jobs)

2018-04-17 Thread Buckley, Ronan
Hi, I have given 4 users the operator role and they are all part of the coordinator accounts. However, when I su to the users in question, they get a permission denied error when trying to cancel a job. What am I missing? Ronan
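As the documentation quoted in the follow-up suggests, admin levels are stored in the Slurm database. They only matter when slurmctld is using slurmdbd for accounting. This sketch builds the sacctmgr commands that set a user's AdminLevel and then display the user record so the Admin column can be checked. `alice` and the helper names are hypothetical. Run the commands with `subprocess.run(..., check=True)` as a Slurm administrator.

```python
VALID_LEVELS = {"None", "Operator", "Administrator"}


def set_admin_level_cmd(user: str, level: str = "Operator") -> list:
    """sacctmgr command that sets a user's AdminLevel in the accounting database."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown admin level: {level}")
    return ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
            "set", f"adminlevel={level}"]


def show_user_cmd(user: str) -> list:
    """sacctmgr command whose output should include the user's Admin column."""
    return ["sacctmgr", "show", "user", user]
```

If `sacctmgr show user` already lists the users as Operator and scancel is still denied, the next thing to confirm is whether slurmctld is actually connected to slurmdbd.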