[slurm-users] Forcing CPU bindings

2018-05-31 Thread Sean Crosby
Hi, When a user requests all of the GPUs on a system, but less than the total number of CPUs, the CPU bindings aren't ideal:

[root@host ~]# nvidia-smi topo -m
        GPU0  GPU1  GPU2  GPU3  mlx5_3  mlx5_1  mlx5_2  mlx5_0  CPU Affinity
GPU0     X    PHB   SYS   SYS   SYS     PHB     SYS     PHB
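For reference, GPU-to-core binding is normally declared in gres.conf; a minimal sketch with hypothetical node name, device files and core ranges (take the real affinity from nvidia-smi topo -m):

# gres.conf - pin each GPU to the cores on its local socket
NodeName=host Name=gpu Type=v100 File=/dev/nvidia0 Cores=0-17
NodeName=host Name=gpu Type=v100 File=/dev/nvidia1 Cores=18-35

Jobs can then ask for strict binding with srun --gres-flags=enforce-binding, at the cost of pending if the bound cores are busy.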

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Sean Crosby
Hi Alex, What's the actual content of your gres.conf file? Seems to me that you have a trailing comma after the location of the nvidia device. Our gres.conf has:

NodeName=gpuhost[001-077] Name=gpu Type=p100 File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=gpuhost[001-077] Name=gpu

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Sean Crosby
Hi Andrés, Did you recompile OpenMPI after updating to SLURM 19.05? Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physics University of Melbourne On Thu, 6 Jun 2019 at 20:11, Andrés Marín Díaz mailto:ama
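If a rebuild is needed, a minimal sketch of the OpenMPI configure step against the new Slurm (the install prefixes here are hypothetical):

./configure --with-slurm --with-pmi=/usr/local/slurm/latest --prefix=/usr/local/openmpi
make -j && make install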

Re: [slurm-users] 19.05.0 x11 in sbatch

2019-05-29 Thread Sean Crosby
The X11 forwarding code has been revamped, and no longer relies on libssh2 to function. However, support for --x11 alongside sbatch has been removed, as the new forwarding code relies on the allocating salloc or srun command to process the forwarding. Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Re

Re: [slurm-users] Issue with x11

2019-05-14 Thread Sean Crosby
Hi Mahmood, To get native X11 working with SLURM, we had to add this config to sshd_config on the login node (your rocks7 host) X11UseLocalhost no You'll then need to restart sshd Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing
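For reference, the change described above, applied on the login node (rocks7 in this thread):

# /etc/ssh/sshd_config
X11UseLocalhost no

systemctl restart sshd    # or: service sshd restart on older systems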

Re: [slurm-users] Issue with x11

2019-05-15 Thread Sean Crosby
Hi Mahmood, I've never tried using the native X11 of SLURM without being ssh'ed into the submit node. Can you try ssh'ing with X11 forwarding to rocks7 (i.e. ssh -X user@rocks7) from a different machine, and then try your srun --x11 command? Sean -- Sean Crosby Senior DevOpsHPC Engineer

Re: [slurm-users] Limit Number of Jobs Per User Per Partition

2019-04-20 Thread Sean Crosby
Hi Eric, Look at partition QOS - https://slurm.schedmd.com/SLUG15/Partition_QOS.pdf The QoS options are MaxJobsPerUser and MaxSubmitPerUser (and also PerAccount versions) Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP
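A sketch of how that fits together (QoS and partition names are placeholders for illustration):

# create a QoS carrying the per-user limits
sacctmgr add qos shortjobs
sacctmgr modify qos shortjobs set MaxJobsPerUser=10 MaxSubmitJobsPerUser=20

# attach it to the partition in slurm.conf
PartitionName=short Nodes=node[01-10] QOS=shortjobs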

Re: [slurm-users] Give priority to specific server

2019-07-14 Thread Sean Crosby
individually. The default value is 1. Add Weight=1000 to the serv1 line, and serv2 should be given the job first. Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physics University of Melbourne On Sun, 14 Jul 2019
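A sketch of the slurm.conf change being described (CPU and memory values are placeholders; Slurm fills the lowest-Weight nodes first):

NodeName=serv1 CPUs=32 RealMemory=128000 Weight=1000
NodeName=serv2 CPUs=32 RealMemory=128000 Weight=1

Run scontrol reconfigure (or restart slurmctld) after editing so the new weights are picked up.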

Re: [slurm-users] pam_slurm_adopt and memory constraints?

2019-07-17 Thread Sean Crosby
se_uid session required pam_unix.so Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physics University of Melbourne On Wed, 17 Jul 2019 at 21:05, Andy Georges mailto:andy.geor...@ugent.be>> wr
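The snippet above is cut off; for context, a minimal sketch of the relevant /etc/pam.d/sshd fragment on a compute node (module order and options vary by site, so treat this as an assumption rather than Sean's exact config):

# adopt incoming ssh sessions into the user's running job (last entry in the account stack)
account    required     pam_slurm_adopt.so
# standard session modules follow
session    required     pam_unix.so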

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Sean Crosby
Hi David, What does: scontrol show node orange01 scontrol show node orange02 show? Just to see if there's a default node weight hanging around, and if your weight changes have been picked up. Sean -- Sean Crosby Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services

Re: [slurm-users] Question about memory allocation

2019-12-17 Thread Sean Crosby
What services did you restart after changing the slurm.conf? Did you do an scontrol reconfigure? Do you have any reservations? scontrol show res Sean On Tue, 17 Dec. 2019, 10:35 pm Mahmood Naderan, mailto:mahmood...@gmail.com>> wrote: >Your running job is requesting 6 CPUs per node (4 nodes,

Re: [slurm-users] Question about memory allocation

2019-12-17 Thread Sean Crosby
Hi Mahmood, Your running job is requesting 6 CPUs per node (4 nodes, 6 CPUs per node). That means 6 CPUs are being used on node hpc. Your queued job is requesting 5 CPUs per node (4 nodes, 5 CPUs per node). In total, if it was running, that would require 11 CPUs on node hpc. But hpc only has

Re: [slurm-users] srun: Error generating job credential

2019-10-08 Thread Sean Crosby
Looking at the SLURM code, it looks like it is failing with a call to getpwuid_r on the ctld What is (on slurm-master): getent passwd turing getent passwd 1000 Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Platform Services | Business Services CoEPP Research

Re: [slurm-users] Slurm 19.05 X11-forwarding

2020-02-27 Thread Sean Crosby
-- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Wed, 26 Feb 2020 at 20:52, Pär Lundö mailto:par.lu...@foi.se>> wrote: Hi, Thank you for your quick replies. Please bear with m

Re: [slurm-users] [EXTERNAL] Re: Munge decode failing on new node

2020-04-15 Thread Sean Crosby
Who owns the munge directory and key? Is it the right uid/gid? Is the munge daemon running? -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Thu, 16 Apr 2020 at 04:57, Dean

Re: [slurm-users] [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres

2020-04-16 Thread Sean Crosby
Hi Lisa, cons_tres is part of Slurm 19.05 and higher. As you are using Slurm 18.08, it won't be there. The select plugin for 18.08 is cons_res. Is there a reason why you're using an old Slurm? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services
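For reference, the corresponding slurm.conf lines (the SelectTypeParameters value is just a common choice, not the only one):

# Slurm 18.08
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Slurm 19.05 and later
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory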

Re: [slurm-users] [EXT] Re: Limit the number of GPUS per user per partition

2020-05-06 Thread Sean Crosby
Do you have other limits set? The QoS is hierarchical, and especially partition QoS can override other QoS. What's the output of sacctmgr show qos -p and scontrol show part Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services

Re: [slurm-users] [EXT] Re: Limit the number of GPUS per user per partition

2020-05-05 Thread Sean Crosby
Hi Thomas, That value should be sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4 Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Wed, 6 May 2020 at 04:53, Theis

Re: [slurm-users] [EXT] Slurmd problem on client

2020-08-24 Thread Sean Crosby
Hi Lars, Do the regular slurm commands work from the client? e.g. squeue scontrol show part If they don't, it would be a sign of communication problems. Is there a software firewall running on the master/client? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research

Re: [slurm-users] [EXT] Slurmd problem on client

2020-08-24 Thread Sean Crosby
Make sure slurmd on the client is stopped, and then run it in verbose mode in the foreground e.g. /usr/local/slurm/latest/sbin/slurmd -D -v Then post the output -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University

Re: [slurm-users] [EXT] Jobs Immediately Fail for Certain Users

2020-07-07 Thread Sean Crosby
$? 1 Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Wed, 8 Jul 2020 at 01:14, Jason Simms wrote:

Re: [slurm-users] [EXT] Weird issues with slurm's Priority

2020-07-07 Thread Sean Crosby
it is set? Using (e.g. scontrol show job 337475 or sacct -j 337475 -o Timelimit) Sean > > Thanks again > > On Tue, Jul 7, 2020 at 11:39 AM Sean Crosby > wrote: > >> Hi, >> >> What you have described is how the backfill scheduler works. If a lower >> prior

Re: [slurm-users] [EXT] Weird issues with slurm's Priority

2020-07-08 Thread Sean Crosby
timelimit accurately) means that cores will go idle when there are jobs that could use them. If you're happy with that, then all is fine. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010

Re: [slurm-users] [EXT] Weird issues with slurm's Priority

2020-07-07 Thread Sean Crosby
starting in its original time. In your example job list, can you also list the requested times for each job? That will show if it is the backfill scheduler doing what it is designed to do. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business

Re: [slurm-users] [EXT] Re: Module "pam_slurm_adopt"

2020-07-01 Thread Sean Crosby
You have to install the pam-devel package on the server you use to build Slurm on. You'll then need to configure and then make. Then you'll be able to make the files in the contrib/pam_slurm_adopt folder Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing
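A sketch of the build sequence being described (package manager and paths are assumptions for a RHEL-style system):

yum install pam-devel            # libpam-dev on Debian/Ubuntu
cd slurm-<version>
./configure                      # plus whatever options you normally use
make
cd contribs/pam_slurm_adopt
make && make install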

Re: [slurm-users] [EXT] Set a per-cluster default limit of the number of active cores per user at a time

2020-06-19 Thread Sean Crosby
different QoS names for all the partitions across all of your clusters, and set the limits on the QoS? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Sat, 20 Jun 2020 at 07

Re: [slurm-users] [EXT] Re: [EXTERNAL] Re: trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes

2020-11-30 Thread Sean Crosby
communicating with the other Slurmd's e.g. from SRVGRIDSLURM01 do nc -z SRVGRIDSLURM02 6818 || echo Cannot communicate nc -z srvgridslurm03 6818 || echo Cannot communicate Replace 6818 with the port you get from the scontrol show config command earlier Sean -- Sean Crosby | Senior DevOpsHPC
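For reference, the SlurmdPort referred to above can be read on any node with:

scontrol show config | grep SlurmdPort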

Re: [slurm-users] [EXT] job_submit.lua - choice of error on failure / job_desc.gpus?

2020-12-04 Thread Sean Crosby
slurm.user_msg("--gpus-per-task option requires --tasks specification") return ESLURM_BAD_TASK_COUNT end end end end end end end Let me know if you improve it please? We're always on the h

Re: [slurm-users] [EXT] job_submit.lua - choice of error on failure / job_desc.gpus?

2020-12-07 Thread Sean Crosby
Hi Loris, We have a completely separate test system, complete with a few worker nodes, separate slurmctld/slurmdbd, so we can test Slurm upgrades etc. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne

Re: [slurm-users] [EXT] GPU Jobs with Slurm

2021-01-14 Thread Sean Crosby
MSpace=yes ConstrainSwapSpace=yes ConstrainDevices=yes TaskAffinity=no CgroupMountpoint=/sys/fs/cgroup The ConstrainDevices=yes is the key to stopping jobs from having access to GPUs they didn't request. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computin
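The snippet above is cut off; a minimal cgroup.conf sketch along those lines (treat it as an assumption about the full file, and note it also needs TaskPlugin=task/cgroup in slurm.conf):

CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
ConstrainDevices=yes
TaskAffinity=no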

Re: [slurm-users] [EXT] wrong number of jobs used

2021-01-19 Thread Sean Crosby
ows that for this node, it has 72 cores and 1.5TB RAM (the CfgTRES part), and currently jobs are using 72 cores, and 442GB RAM. I would run the same command on 4 or 5 of the nodes on your cluster, and we'll have a better idea about what's going on. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC T

Re: [slurm-users] [EXT] slurm/munge problem: invalid credentials

2020-12-16 Thread Sean Crosby
to contact the new compute node on SlurmdPort. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Wed, 16 Dec 2020 at 03:48, Olaf Gellert wrote:

Re: [slurm-users] [EXT] Re: Is there a scontrol ping slurmdbd?

2021-06-10 Thread Sean Crosby
We use sacctmgr list stats for our Slurmdbd check. Our Nagios check is:

RESULT=$(/usr/local/slurm/latest/bin/sacctmgr list stats)
if [ $? -ne 0 ]
then
  echo "ERROR: cannot connect to database"
  exit 2
fi
echo "$RESULT" | head -n 4
exit 0

Sean From:

Re: [slurm-users] [EXT] incorrect number of cpu's being reported in srun job

2021-06-17 Thread Sean Crosby
Hi Sid, On our cluster, it performs just like your PBS cluster. $ srun -N 1 --cpus-per-task 8 --time 01:00:00 --mem 2g --partition physicaltest -q hpcadmin --pty python3 srun: job 27060036 queued and waiting for resources srun: job 27060036 has been allocated resources Python 3.6.8 (default,

Re: [slurm-users] [EXT] rejecting jobs that exceed QOS limits

2021-05-28 Thread Sean Crosby
Hi Paul, Try sacctmgr modify qos gputest set flags=DenyOnLimit Sean From: slurm-users on behalf of Paul Raines Sent: Saturday, 29 May 2021 12:48 To: slurm-users@lists.schedmd.com Subject: [EXT] [slurm-users] rejecting jobs that exceed QOS limits External

Re: [slurm-users] [EXT] How to determine (on the ControlMachine) which cores/gpus are assigned to a job?

2021-02-05 Thread Sean Crosby
Licenses=(null) Network=(null) Note the CPU_IDs and GPU IDX in the output Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Fri, 5 Feb 2021 at 02:01, Thomas Zeiser

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
What's the output of ss -lntp | grep $(pidof slurmdbd) on your dbd host? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Tue, 6 Apr 2021 at 05:00, wrote: > *

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
Interesting. It looks like slurmdbd is not opening the 6819 port What does ss -lntp | grep 6819 show? Is something else using that port? You can also stop the slurmdbd service and run it in debug mode using slurmdbd -D -vvv Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
The other thing I notice for my slurmdbd.conf is that I have DbdAddr=localhost DbdHost=localhost You can try changing your slurmdbd.conf to set those 2 values as well to see if that gets slurmdbd to listen on port 6819 Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research
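For reference, the slurmdbd.conf fragment being described (6819 is the default DBD port):

DbdAddr=localhost
DbdHost=localhost
DbdPort=6819

Restart slurmdbd afterwards and re-check with ss -lntp | grep 6819.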

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
It looks like your attachment of sinfo -R didn't come through It also looks like your dbd isn't set up correctly Can you also show the output of sacctmgr list cluster and scontrol show config | grep ClusterName Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
on all your nodes. It needs to be owned by user slurm ls -lad /var/spool/slurmd Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Tue, 6 Apr 2021 at 20:37, Sean Crosby wrote

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
If that doesn't work, try changing AccountingStorageHost in slurm.conf to localhost as well. For your worker nodes, your nodes are all in drain state. Show the output of scontrol show node wn001. It will give you the reason why the node is drained. Sean -- Sean Crosby | Senior DevOpsHPC
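Once the drain reason is resolved, a drained node can be returned to service with (node name as in the thread):

scontrol update NodeName=wn001 State=RESUME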

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
I just checked my cluster and my spool dir is SlurmdSpoolDir=/var/spool/slurm (i.e. without the d at the end) It doesn't really matter, as long as the directory exists and has the correct permissions on all nodes -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
This will try connecting to port 6819 on the host 10.0.0.100, and output nothing if the connection works, and would output Connection not working otherwise I would also test this on the DBD server itself -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business
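The command being described appears to have been cut from the snippet; reconstructed from the description, it would be something like:

nc -z 10.0.0.100 6819 || echo Connection not working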

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
Comment out the lines

AccountingStorageUser=slurm
AccountingStoragePass=/run/munge/munge.socket.2

You shouldn't need those lines. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Mon

Re: [slurm-users] [EXT] Re: [External] maxRSS and aveRSS

2021-03-12 Thread Sean Crosby
On Sat, 13 Mar 2021 at 08:48, Prentice Bisbal wrote: > It sounds like you're confusing job steps and tasks. For an MPI program, > tasks and MPI ranks are the same

Re: [slurm-users] [EXT] Job ended with OUT_OF_MEMORY even though MaxRSS and MaxVMSize are under the ReqMem value

2021-03-16 Thread Sean Crosby
avid Chin, PhD (he/him) Sr. SysAdmin, URCF, Drexel > dw...@drexel.edu 215.571.4335 (o) > For URCF support: urcf-supp...@drexel.edu > https://proteusmaster.urcf.drexel.edu/urcfwiki > github:prehensilecode > > > -- > *From:* slurm-user

Re: [slurm-users] [EXT] Job ended with OUT_OF_MEMORY even though MaxRSS and MaxVMSize are under the ReqMem value

2021-03-15 Thread Sean Crosby
What are your Slurm settings - what's the values of ProctrackType JobAcctGatherType JobAcctGatherParams and what's the contents of cgroup.conf? Also, what version of Slurm are you using? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business
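For context, one common combination of those settings for cgroup-based accounting and memory enforcement (a sketch of a typical setup, not the poster's actual config):

# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup

# cgroup.conf
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes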

Re: [slurm-users] [EXT] Is it possible to set a default QOS per partition?

2021-03-01 Thread Sean Crosby
r QoS, set the OverPartQOS flag, and get the users to specify that QoS. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Tue, 2 Mar 2021 at 08:24, Stack Korora wrote:
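A sketch of the two pieces mentioned (partition and QoS names are placeholders):

# slurm.conf: the partition QoS is applied to every job in the partition
PartitionName=gpu Nodes=gpu[01-04] QOS=gpu-part

# a QoS that should win over the partition QoS must carry the OverPartQOS flag
sacctmgr modify qos premium set flags=OverPartQOS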

Re: [slurm-users] [EXT] slurmctld error

2021-04-08 Thread Sean Crosby
resolution works. You have set the names in Slurm to be wn001-wn044, so every node has to be able to resolve those names. Hence the check using ping Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria

Re: [slurm-users] [EXT] slurmctld error

2021-04-08 Thread Sean Crosby
node Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne, Victoria 3010 Australia On Thu, 8 Apr 2021 at 16:38, Ioannis Botsis wrote:

Re: [slurm-users] [EXT] [Beginner, SLURM 20.11.2] Unable to allocate resources when specifying gres in srun or sbatch

2021-04-12 Thread Sean Crosby
>AllocTRES= >CapWatts=n/a >CurrentWatts=0 AveWatts=0 >ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s >Comment=(null) > > > > > On Sun, Apr 11, 2021 at 2:03 AM Sean Crosby > wrote: >> Hi Cristobal, >> >> My hunch is it is due

Re: [slurm-users] [EXT] [Beginner, SLURM 20.11.2] Unable to allocate resources when specifying gres in srun or sbatch

2021-04-11 Thread Sean Crosby
Hi Cristobal, My hunch is it is due to the default memory/CPU settings. Does it work if you do srun --gres=gpu:A100:1 --cpus-per-task=1 --mem=10G nvidia-smi Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University

Re: [slurm-users] [EXT] How to determine (on the ControlMachine) which cores/gpus are assigned to a job?

2021-02-12 Thread Sean Crosby
~]# cat /sys/fs/cgroup/cpuset/slurm/uid_11470/job_24115684/cpuset.cpus 58 I will keep searching. I know we capture the real CPU ID as well, using daemons running on the worker nodes, and we feed that into Ganglia. Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing

Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-27 Thread Sean Crosby
Hi Amjad, Make sure you have qos in the config entry AccountingStorageEnforce e.g. AccountingStorageEnforce=associations,limits,qos,safe Sean From: slurm-users on behalf of Amjad Syed Sent: Friday, 27 August 2021 20:28 To: slurm-us...@schedmd.com Subject:

Re: [slurm-users] [EXT] Re: EXTERNAL-Re: [External] scancel gpu jobs when gpu is not requested

2021-08-30 Thread Sean Crosby
Hi Fritz, job_submit_lua.so gets made upon compilation of Slurm if you have the lua-devel package installed at the time of configure/make. Sean From: slurm-users on behalf of Ratnasamy, Fritz Sent: Tuesday, 31 August 2021 15:05 To: Slurm User Community List

Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Sean Crosby
...@gmail.com>> wrote: Hi Sean, Thanks for the suggestion, seems to work now. Majid On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby mailto:scro...@unimelb.edu.au>> wrote: Hi Amjad, Make sure you have qos in the config entry AccountingStorageEnforce e.g. AccountingStorageEnforce=associa

Re: [slurm-users] [EXT] User association with partition and Qos

2021-08-31 Thread Sean Crosby
s root ? Can this be an issue Amjad On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby mailto:scro...@unimelb.edu.au>> wrote: What does sacctmgr show for the user you added to have access to the QoS, and what does Slurm show for the partition config? sacctmgr show account withassoc -p scontr

Re: [slurm-users] [EXT] Re: Missing data in sreport for a time period in slurm

2021-10-21 Thread Sean Crosby
: Thursday, 21 October 2021 21:54 To: slurm-users@lists.schedmd.com ; Sean Crosby Subject: Re: [EXT] Re: [slurm-users] Missing data in sreport for a time period in slurm External email: Please exercise caution Hi Sean, After changing those values yesterday

Re: [slurm-users] [EXT] Re: Missing data in sreport for a time period in slurm

2021-10-18 Thread Sean Crosby
sreport keeps a track of when it has done the last rollup calculations in the database. Open MySQL for your Slurm accounting database, do select * from slurm_acct_db.clustername_last_ran_table; where slurm_acct_db is your accounting database name (slurm_acct_db is default), and clustername is
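A sketch of the check being described (database and cluster names are placeholders; the timestamps are stored as unix epoch values):

select * from slurm_acct_db.mycluster_last_ran_table;

-- column names from memory, so verify with SHOW COLUMNS first
select from_unixtime(hourly_rollup), from_unixtime(daily_rollup), from_unixtime(monthly_rollup)
  from slurm_acct_db.mycluster_last_ran_table;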

Re: [slurm-users] [EXT] Re: Missing data in sreport for a time period in slurm

2021-10-18 Thread Sean Crosby
; Sean Crosby Subject: Re: [EXT] Re: [slurm-users] Missing data in sreport for a time period in slurm External email: Please exercise caution Dear All, By checking the value of last ran table, hourly rollup shows today's date

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Sean Crosby
Did you run ./configure (with any other options you normally use) make make install on your DBD server after you installed the mariadb-devel package? From: slurm-users on behalf of Giuseppe G. A. Celano Sent: Saturday, 4 December 2021 10:07 To: Slurm User

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Sean Crosby
n of Mariadb are you using? Brian Andrus On 12/3/2021 4:20 PM, Giuseppe G. A. Celano wrote: After installation of libmariadb-dev, I have reinstalled the entire slurm with ./configure + options, make, and make install. Still, accounting_storage_mysql.so is missing. On Sat, Dec 4, 2021 at 12:24 A

Re: [slurm-users] problem building pam_slurm_adopt

2021-07-14 Thread Sean Crosby
Hi Mike, To build pam_slurm_adopt, you need the pam-devel package installed on the node you're building Slurm on. On RHEL, it's pam-devel, and Debian it's libpam-dev Once you have installed that, do ./configure again, and then you should be able to make the pam_slurm_adopt Sean

Re: [slurm-users] [EXT] slurmctld.log over 500 MB

2021-07-27 Thread Sean Crosby
Hi Felix, From one of the recent Slurm user group meetings, the recommended way to logrotate the Slurm logs is to send SIGUSR2. My logrotate entry is:

/var/log/slurm/slurmctld.log {
    compress
    missingok
    nocopytruncate
    nocreate
    delaycompress
    nomail
    notifempty
    noolddir
    rotate 5
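The entry above is cut off before the postrotate section; one way to deliver the SIGUSR2 mentioned, assuming slurmctld runs under systemd (a sketch, not necessarily the poster's exact stanza):

    postrotate
        systemctl kill -s SIGUSR2 slurmctld
    endscript
}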

Re: [slurm-users] How to open a slurm support case

2022-03-24 Thread Sean Crosby
Hi Jeff, The support system is here - https://bugs.schedmd.com/ Create an account, log in, and when creating a request, select your site from the Site selection box. Sean From: slurm-users on behalf of Jeffrey R. Lang Sent: Friday, 25 March 2022 08:48 To:

Re: [slurm-users] [EXT] Re: systemctl enable slurmd.service Failed to execute operation: No such file or directory

2022-01-31 Thread Sean Crosby
Did you build Slurm yourself from source? If so, when you build from source, on that node, you need to have the munge-devel package installed (munge-devel on EL systems, libmunge-dev on Debian) You then need to set up munge with a shared munge key between the nodes, and have the munge daemon
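A sketch of the munge setup being described (hostnames are placeholders):

# on one node, create the key, then copy the same file to every node
/usr/sbin/create-munge-key
chown munge:munge /etc/munge/munge.key && chmod 400 /etc/munge/munge.key
systemctl enable --now munge

# verify a credential created on one host decodes on another
munge -n | ssh othernode unmunge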

Re: [slurm-users] [EXT] Strange sbatch error with 21.08.2&5

2022-01-14 Thread Sean Crosby
Any error in slurmd.log on the node or slurmctld.log on the ctl? Sean From: slurm-users on behalf of Wayne Hendricks Sent: Saturday, 15 January 2022 16:04 To: slurm-us...@schedmd.com Subject: [EXT] [slurm-users] Strange sbatch error with 21.08.2&5 External

Re: [slurm-users] Temporary Stop User Submission

2023-05-25 Thread Sean Crosby
Hi Willy, sacctmgr modify account slurmaccount user=baduser set maxjobs=0 Sean From: slurm-users on behalf of Markuske, William Sent: Friday, 26 May 2023 09:16 To: slurm-users@lists.schedmd.com Subject: [EXT] [slurm-users] Temporary Stop User Submission
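To lift the block again later, the limit is cleared by setting it to -1:

sacctmgr modify account slurmaccount user=baduser set maxjobs=-1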

Re: [slurm-users] [EXT] --mem is not limiting the job's memory

2023-06-22 Thread Sean Crosby
On the worker node, check if cgroups are mounted:

grep cgroup /proc/mounts

(normally it's in /sys/fs/cgroup). Then check if Slurm is setting up the cgroup:

find /sys/fs/cgroup | grep slurm

e.g.

[root@spartan-gpgpu164 ~]# find /sys/fs/cgroup/memory | grep slurm
/sys/fs/cgroup/memory/slurm

Re: [slurm-users] [EXT] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Sean Crosby
slurmctld runs as the user slurm, whereas slurmd runs as root. Make sure the permissions on /app/slurm-24.0.8/lib/slurm allow the user slurm to read the files e.g. you could do (as root) sudo -u slurm ls /app/slurm-24.0.8/lib/slurm and see if the slurm user can read the directory (as well as
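If that check fails, one possible fix (path as in the thread; adjust to your install) is to make the tree world-readable, or chown it to the slurm user:

chmod -R o+rX /app/slurm-24.0.8/lib/slurm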