[slurm-dev] Re: How to strictly limit the memory per CPU
On 02/11/17 14:34, 马银萍 wrote:
> It means that he used only one CPU and asked for 125G of memory, so he
> used most of the memory on that node, which will affect other users'
> jobs; this is invalid.
> So is there any way to strictly limit the average memory per CPU so
> that users can't override it? Or any way to disable --mem and
> --mem-per-cpu?

I believe you can restrict the amount of memory jobs can use via the TRES functionality:

https://slurm.schedmd.com/tres.html

It's not something we do here, though.

Best of luck!
Chris

--
Christopher Samuel
Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: sam...@unimelb.edu.au
Phone: +61 (0)3 903 55545
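For the specific case of capping memory per CPU, there is also the MaxMemPerCPU parameter in slurm.conf; with it set, a job asking for a large --mem gets its CPU allocation scaled up to match, rather than one core hogging a node's memory. A minimal sketch (the 4096 MB value is only an example):

```
# slurm.conf: limit jobs to ~4 GB per allocated CPU (example value).
# A job requesting more memory per CPU will have its CPU count
# increased so the ratio is respected.
MaxMemPerCPU=4096
```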
[slurm-dev] Re: Qos limits associations and AD auth
On 18/10/17 16:27, Nadav Toledo wrote:
> About B: the reason is I don't want to manually add each user to the
> Slurm database (sacctmgr create user...)

I'm afraid you don't really have an option there: if you want to use the slurmdbd limits then you're going to need to add the users to the database. You could script the addition/removal to avoid having to do it all by hand.

Regarding the idea of a web portal - I think what is being suggested is not using AD but instead having your own LDAP server for the cluster which is populated via a web portal. If you are tied to using AD (which it sounds like you are) then that's not really an option for you.

All the best,
Chris
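A minimal sketch of scripting those additions (the usernames and account name are placeholders; sacctmgr's -i flag skips the confirmation prompt). Printing the commands first makes a handy dry run before piping them to a real shell:

```shell
#!/bin/sh
# Dry run: print the sacctmgr commands that would add each user
# to slurmdbd under a given account (names here are examples).
for u in alice bob carol; do
    echo "sacctmgr -i create user name=$u account=myproject"
done
```

Once the output looks right, the echo can be dropped (or the output piped to sh) to actually create the associations.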
[slurm-dev] Re: mysql job_table and step_table growth
On 19/10/17 05:24, Douglas Meyer wrote:
> We have job_table purge set for 61 days and step_table for 11. Seems
> to have no impact.

So you have this in slurmdbd.conf?

PurgeJobAfter=61days
PurgeStepAfter=11days

Anything in the logs when you start up slurmdbd? What does this say?

sacctmgr list config | fgrep Purge

cheers,
Chris
[slurm-dev] Re: mysql job_table and step_table growth
On 14/10/17 00:24, Doug Meyer wrote:
> The job_table.idb and step_table.idb do not clear as part of
> day-to-day slurmdbd.conf
>
> Have slurmdbd.conf set to purge after 8 weeks but this does not appear
> to be working.

Anything in your slurmdbd logs?
[slurm-dev] Re: Slurm 17.02.7 and PMIx
On 05/10/17 11:27, Christopher Samuel wrote:
> PMIX v1.2.2: Slurm complains and tells me it wants v2.

I think that was due to a config issue on the system I was helping out with; after having to install some extra packages (like a C++ compiler) to get other things working, I can no longer reproduce this issue.

So next outage they get we can add PMIx support to Slurm (my test build compiled OK).

cheers,
Chris
[slurm-dev] Re: Tasks distribution
On 09/10/17 22:11, Sysadmin CAOS wrote:
> Now, after that, should srun distribute correctly my tasks as mpirun
> does, right?

No, srun will distribute the tasks however Slurm wants to; remember it's the MPI implementation's job to listen to what the resource manager tells it to do, not the other way around. So the issue here is getting Slurm to allocate nodes in the way you wish.

On my cluster I see:

srun: Warning: can't honor --ntasks-per-node set to 4 which doesn't match the
requested tasks 17 with the number of requested nodes 5. Ignoring --ntasks-per-node.

That's Slurm 16.05.8. Do you see the same?

Did you try both having CR_Pack_Nodes *and* specifying this?

-n 17 --ntasks-per-node=4

cheers,
Chris
[slurm-dev] Re: Camacho Barranco, Roberto <rcamachobarra...@utep.edu> ssirimu...@utep.edu
On 10/10/17 07:21, Suman Sirimulla wrote:
> We have installed and configured Slurm on our cluster, but are unable
> to start the slurmctld daemon. We followed the instructions
> (https://slurm.schedmd.com/troubleshoot.html) and tried to stop and
> restart it multiple times but it's still not working. Please see the
> error below.

Check your slurmctld.log, that should have hints about why it won't start.

cheers!
Chris
[slurm-dev] Slurm 17.02.7 and PMIx
Hi folks,

Just wondering if anyone here has had any success getting Slurm to compile with PMIx support? I'm trying 17.02.7 and I find that with PMIx I get either:

PMIx v1.2.2: Slurm complains and tells me it wants v2.

PMIx v2.0.1: Slurm can't find it because the header files are not where it is looking for them, and when I do a symlink hack to make PMIx detection work it then fails to compile, with:

/bin/sh ../../../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../slurm -I../../../.. -I../../../../src/common -I/usr/include -I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2 -g -O0 -pthread -Wall -g -O0 -fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo -MD -MP -MF .deps/mpi_pmix_v2_la-pmixp_client.Tpo -c -o mpi_pmix_v2_la-pmixp_client.lo `test -f 'pmixp_client.c' || echo './'`pmixp_client.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../.. -I../../../../slurm -I../../../.. -I../../../../src/common -I/usr/include -I/usr/local/pmix/latest/include -DHAVE_PMIX_VER=2 -g -O0 -pthread -Wall -g -O0 -fno-strict-aliasing -MT mpi_pmix_v2_la-pmixp_client.lo -MD -MP -MF .deps/mpi_pmix_v2_la-pmixp_client.Tpo -c pmixp_client.c -fPIC -DPIC -o .libs/mpi_pmix_v2_la-pmixp_client.o
pmixp_client.c: In function '_set_procdatas':
pmixp_client.c:468:24: error: request for member 'size' in something not a structure or union
  kvp->value.data.array.size = count;
pmixp_client.c:482:24: error: request for member 'array' in something not a structure or union
  kvp->value.data.array.array = (pmix_info_t *)info;
make[4]: *** [mpi_pmix_v2_la-pmixp_client.lo] Error 1

So I'm guessing that I'm missing something, but the documentation for PMIx in Slurm seems pretty much non-existent. :-(

Anyone had any luck with this?

cheers,
Chris
[slurm-dev] Re: Upgrading Slurm
On 04/10/17 20:51, Gennaro Oliva wrote:
> If you are talking about Slurm I would backup the configuration files
> also.

Not directly Slurm related, but don't forget to install and configure etckeeper first. It puts your /etc/ directory under git version control and will commit changes before and after any package upgrade/install/removal, so you have a good history of changes made. I'm assuming that the Slurm config files in the Debian package are under /etc, so that will be helpful to you for this.

> Anyway there have been a lot of major changes in SLURM and in Debian
> since 2013 (Wheezy release date), so be prepared that it will be no
> picnic.

The Debian package name also changed from slurm-llnl to slurm-wlm at some point, so missing the intermediate release may result in that not transitioning properly.

To be honest I would never use a distro's packages for Slurm; I'd always install it centrally (NFS exported to compute nodes) to keep things simple. That way you decouple your Slurm version from the OS and can keep it up to date (or keep it on a known working version).

All the best!
Chris
[slurm-dev] Re: Setting up Environment Modules package
On 05/10/17 03:11, Mike Cammilleri wrote:
> 2. Install Environment Modules packages in a location visible to the
> entire cluster (NFS or similar), including the compute nodes, and the
> user then includes their 'module load' commands in their actual slurm
> submit scripts since the command would be available on the compute
> nodes - loading software (either local or from network locations
> depending on what they're loading) visible to the nodes

This is what we do; the management node for the cluster exports its /usr/local read-only to the rest of the cluster.

We also have in our taskprolog.sh:

echo export BASH_ENV=/etc/profile.d/module.sh

to try and ensure that bash shells have modules set up, just in case. :-)
[slurm-dev] Re: Upgrading Slurm
On 04/10/17 17:12, Loris Bennett wrote:
> Ole's pages on Slurm are indeed very useful (Thanks, Ole!). I just
> thought I'd point out that the limitation on only upgrading by 2 major
> versions is for the case where you are upgrading a production system
> and don't want to lose any running jobs.

The on-disk format for spooled jobs may also change between releases, so you probably want to keep that in mind as well.
[slurm-dev] Re: Is PriorityUsageResetPeriod really required for hard limits?
On 29/09/17 06:34, Jacob Chappell wrote:
> Hi all. The slurm.conf documentation says that if decayed usage is
> disabled, then PriorityUsageResetPeriod must be set to some value. Is
> this really true? What is the technical reason for this requirement if
> so? Can we set this period to sometime far into the future to have
> effectively an infinite period (no reset)?

Basically this is because once a user exceeds something like their maximum CPU run time limit they will never be able to run jobs again unless you either decay or reset usage.
[slurm-dev] Re: Limiting SSH sessions to cgroups?
On 21/09/17 00:29, Jacob Chappell wrote:
> I still have one weird issue. I'm probably missing another setting
> somewhere. The cgroup that the SSH session is adopted into does not
> seem to include the /dev files.

That's something I can't help with I'm afraid; we're still on RHEL6. In a job there I see:

$ cat /proc/$$/cgroup
4:cpuacct:/slurm/uid_500/job_6900206/step_0/task_0
3:memory:/slurm/uid_500/job_6900206/step_0
2:cpuset:/slurm/uid_500/job_6900206/step_0
1:freezer:/slurm/uid_500/job_6900206/step_0

and in my SSH session I see:

$ cat /proc/$$/cgroup
4:cpuacct:/slurm/uid_500/job_6900206/step_extern/task_0
3:memory:/slurm/uid_500/job_6900206/step_extern/task_0
2:cpuset:/slurm/uid_500/job_6900206/step_extern
1:freezer:/slurm/uid_500/job_6900206/step_extern

I'm about to start travelling for the Slurm User Group in a few hours, so I'll be off-air for quite a while.

Good luck!

All the best,
Chris
[slurm-dev] Re: Accounting using LDAP ?
On 20/09/17 17:14, Loris Bennett wrote:
> Is the user management system homegrown or something more generally
> available?

Both: it was started as a project at $JOB-1 and open-sourced.

http://karaage.readthedocs.org/

The current main developer no longer works in HPC (as $JOB-1 folded years after I left for here) but he's still looking after this, as he's helping the university out in his spare time. He's currently moving from deploying it via Debian packages to using Docker.

We have our own custom module for Karaage for project creation, as we needed a lot more information from applicants than what it captures by default; but that's the nice thing, it is modular. It also includes Shibboleth support.

All the best!
Chris
[slurm-dev] Re: Accounting using LDAP ?
On 20/09/17 03:03, Carlos Lijeron wrote:
> I'm trying to enable accounting on our SLURM configuration, but our
> cluster is managed by Bright Management which has its own LDAP for
> users and groups. When setting up SLURM accounting, I don't know how
> to make the connection between the users and groups from the LDAP as
> opposed to the local UNIX.

Slurm just uses the host's NSS config for that, so as long as the OS can see the users and groups then slurmdbd will be able to see them too.

*However*, you _still_ need to manually create users in slurmdbd to ensure that they can run jobs, but that's a separate issue from whether slurmdbd can resolve users in LDAP.

I would hope that Bright would have the ability to do that for you rather than having you handle it manually, but that's a question for Bright.

Best of luck,
Chris
[slurm-dev] Re: Limiting SSH sessions to cgroups?
On 20/09/17 06:39, Jacob Chappell wrote:
> Thanks everyone who has replied. I am trying to get pam_slurm_adopt.so
> implemented. Does it work with batch jobs?

It does indeed, we use it as well. Do you have:

PrologFlags=contain

set? From slurm.conf:

Contain
    At job allocation time, use the ProcTrack plugin to create a job
    container on all allocated compute nodes. This container may be
    used for user processes not launched under Slurm control, for
    example the PAM module may place processes launched through a
    direct user login into this container. Setting Contain implicitly
    sets the Alloc flag.
[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 14/09/17 16:04, Lachlan Musicman wrote:
> It's worth noting that before this change cgroups couldn't get down to
> the thread level. We would only consume at the core level - ie, all
> jobs would get an even number of cpus - jobs that requested an odd
> number of cpus (threads) would be rounded up to the next even number.

Did you have this set too (either explicitly or implicitly)?

CR_ONE_TASK_PER_CORE
    Allocate one task per core by default. Without this option, by
    default one task will be allocated per thread on nodes with more
    than one ThreadsPerCore configured.

cheers!
Chris
[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 14/09/17 11:07, Lachlan Musicman wrote:
> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

OK, so this is saying that Slurm is seeing:

8 CPUs
1 board
1 socket per board
4 cores per socket
2 threads per core

which is what lscpu also describes the node as.

Whereas the config that it thinks it should have is:

8 CPUs
1 board
8 sockets per board
1 core per socket
1 thread per core

which to me looks like what you would expect with just CPUs=8 in the config and nothing else.

I guess a couple of questions:

1) Have you restarted slurmctld and slurmd everywhere?
2) Can you confirm that slurm.conf is the same everywhere?
3) What does slurmd -C report?

cheers!
Chris
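For reference, a slurm.conf node definition spelling out the layout the hardware actually reports would look something like this (the node name is a placeholder; slurmd -C prints the exact line for your node, which you can paste in):

```
# slurm.conf: describe the real topology instead of a bare CPUs=8
NodeName=node01 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
```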
[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 14/09/17 11:07, Lachlan Musicman wrote:
> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

Hmm, are you virtualised by some chance? If so it might be that the VM layer is lying to the guest about the actual hardware layout.

What does "lscpu" say?

cheers,
Chris
[slurm-dev] Re: On the need for slurm uid/gid consistency
On 13/09/17 04:53, Phil K wrote:
> I'm hoping someone can provide an explanation as to why slurm
> requires uid/gid consistency across nodes, with emphasis on the need
> for the 'SlurmUser' to be uid/gid-consistent.

I think this is a consequence of the use of MUNGE, rather than being inherent in Slurm itself.

https://dun.github.io/munge/

# It allows a process to authenticate the UID and GID of another
# local or remote process within a group of hosts having common
# users and groups

Gory details are in the munged(8) manual page:

https://github.com/dun/munge/wiki/Man-8-munged

But I think the core of the matter is:

# When a credential is validated, munged first checks the
# message authentication code to ensure the credential has
# not been subsequently altered. Next, it checks the embedded
# UID/GID restrictions to determine whether the requesting
# client is allowed to decode it.

So if the UIDs and GIDs of the user differ across systems then it appears it will not allow the receiver to validate the message.

cheers,
Chris
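An easy way to check this in practice is the standard MUNGE round-trip test: encode a credential on one node and decode it on another (the remote hostname below is a placeholder):

```
# Create a credential locally and validate it on a remote node.
# A STATUS of "Success" in the unmunge output means the keys match
# and the UID/GID resolve consistently on both hosts.
munge -n | ssh node02 unmunge
```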
[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 13/09/17 10:47, Lachlan Musicman wrote:
> Chris, how does this sacrifice performance? If none of my software
> (bioinformatics/perl) is HT, surely I'm sacrificing capacity by
> leaving one thread unused as jobs take an entire core?

A HT is not a core, so if you are running multiple processes on a single core then you will have some form of extra contention - how much of an impact that has will depend on your application mix and your hardware generation. As ever, benchmark and see if you gain more than you lose with that method.

For HPC work, which tends to be compute bound, the usual advice is to disable HT in the BIOS, but for I/O-bound things you may not be so badly off.

Hope that helps!
Chris
[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 13/09/17 07:22, Patrick Goetz wrote:
> All I have to say to this is: um, what?

My take has always been that ThreadsPerCore is really for HPC workloads where you've decided not to disable HT full stop, but want to allocate full cores to each task and then let the code have 2 threads per Slurm task (for HPC often that's the same as an MPI rank).

> So, moving to a specific implementation example, the question is
> should this configuration work properly? I do want to include memory
> in the resource allocation calculations, if possible. Hence:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
> NodeName=n[001-048] CPUs=16 RealMemory=61500 State=UNKNOWN
>
> Is this going to work as expected?

I would think so; basically you're saying you're willing to sacrifice performance and consider each HT unit a core to run a job on.

All the best,
Chris
[slurm-dev] Re: Exceeded job memory limit problem
On 06/09/17 17:38, Sema Atasever wrote:
> I tried the line of code that you recommended but the code still
> generates an error unfortunately.

We've seen issues where using:

JobAcctGatherType=jobacct_gather/linux

gathers incorrect values for jobs (in our experience MPI ones).

We constrain jobs via cgroups and have found that using the cgroup plugin for this means jobs are no longer killed incorrectly.

Using cgroups in Slurm is a definite win for us, so I would suggest looking into it if you've not already done so.

All the best,
Chris
[slurm-dev] Re: Jobs cancelled "DUE TO TIME LIMIT" long before actual timelimit
On 30/08/17 04:34, Brian W. Johanson wrote:
> Any idea on what would cause this?

It looks like the job *step* hit the time limit, not the job itself.

Could you try the sacct command without the -X flag to see what the time limit for the step was according to Slurm, please?

$ sacct -S 071417 -a --format JobID%20,State%20,timelimit,Elapsed,ExitCode -j 1695151

cheers,
Chris
[slurm-dev] Re: Fair resource scheduling
On 25/08/17 03:03, Patrick Goetz wrote:
> 1. When users submit (say) 8 long running single core jobs, it doesn't
> appear that Slurm attempts to consolidate them on a single node (each
> of our nodes can accommodate 16 tasks).

How much memory have you configured for your nodes, and how much memory are these single-CPU jobs asking for? That's one thing that can make Slurm need to start jobs on other nodes.

You can also tell it to pack single-CPU jobs onto nodes at the other end of the cluster with this:

pack_serial_at_end
    If used with the select/cons_res plugin then put serial jobs at
    the end of the available nodes rather than using a best fit
    algorithm. This may reduce resource fragmentation for some
    workloads.

cheers,
Chris
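As a sketch of where that flag goes (assuming you're already on cons_res; the CR_Core_Memory choice below is just an example - the flag is appended to whatever CR_* setting you use):

```
# slurm.conf
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,pack_serial_at_end
```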
[slurm-dev] Re: Delete jobs from slurmctld runtime database
On 22/08/17 01:59, Selch, Brigitte (FIDF) wrote:
> That's the reason for my question.

I'm not aware of any way to do that, and I would advise against mucking around in the Slurm MySQL database directly.

The idea of slurmdbd is to have a comprehensive view of all jobs (within its expiry parameters), and removing them will likely break its statistics and probably do Bad Things(tm).

Here be dragons...
[slurm-dev] Re: CGroups, Threads as CPUs, TaskPlugins
On 15/08/17 09:41, Lachlan Musicman wrote:
> I guess I'm not 100% sure what I'm looking for, but I do see that
> there is a
>
> 1:name=systemd:/user.slice/user-0.slice/session-373.scope
>
> in /proc/self/cgroup

Something is wrong in your config then. It should look something like:

4:cpuacct:/slurm/uid_3959/job_6779703/step_9/task_1
3:memory:/slurm/uid_3959/job_6779703/step_9/task_1
2:cpuset:/slurm/uid_3959/job_6779703/step_9
1:freezer:/slurm/uid_3959/job_6779703/step_9

for /proc/${PID_OF_PROC}/cgroup

I notice you have /proc/self - that will be the shell you are running in for your SSH session and not the job!

cheers,
Chris
[slurm-dev] Re: Proctrack cgroup; documentation bug
On 14/08/17 08:55, Lachlan Musicman wrote:
> Was it here I read that proctrack/linuxproc was better than
> proctrack/cgroup?

I think you're thinking of JobAcctGatherType, but even then our experience was that jobacct_gather/cgroup was more accurate.
[slurm-dev] Re: RebootProgram - who uses it?
On 07/08/17 17:57, Aaron Knister wrote:
> Good grief. "reboot" is a legacy tool?!?! I've about had enough of
> systemd.

FWIW reboot is provided by the init system implementation (for instance on RHEL6 it's from upstart), and /sbin/reboot is only optional in the FHS. Only /sbin/shutdown is required by the FHS:

http://www.pathname.com/fhs/2.2/fhs-3.14.html

On proprietary UNIX versions, reboot (not guaranteed to be in /sbin; it was /etc/reboot on Ultrix 4 and /usr/sbin/reboot on Solaris) may not run shutdown scripts either (e.g. Solaris) - you'd want to use shutdown for that.

cheers,
Chris
[slurm-dev] Re: RebootProgram - who uses it?
On 07/08/17 14:08, Lachlan Musicman wrote:
> In slurm.conf, there is a RebootProgram - does this need to be a
> direct link to a bin or can it be a command?

We have:

RebootProgram = /sbin/reboot

Works for us.

cheers,
Chris
[slurm-dev] Re: Multifactor Priority Plugin for Small clusters
On 03/07/17 16:02, Loris Bennett wrote:
> I don't think you can achieve what you want with Fairshare and
> Multifactor Priority. Fairshare looks at distributing resources fairly
> between users over a *period* of time. At any *point* in time it is
> perfectly possible for all the resources to be allocated to one user.

Loris is quite right about this, but it is possible to impose limits on a project if you choose to use slurmdbd.

First you need to set up accounting:

https://slurm.schedmd.com/accounting.html

then you can set limits:

https://slurm.schedmd.com/resource_limits.html

Best of luck!
Chris
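Once accounting is in place, limits are set with sacctmgr. A hedged sketch (the account name and values are placeholders, and GrpTRES needs a reasonably recent Slurm - older releases spelled it GrpCPUs):

```
# Cap an account at 64 CPUs in use at once, and 20 running jobs
sacctmgr modify account myproject set GrpTRES=cpu=64
sacctmgr modify account myproject set MaxJobs=20
```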
[slurm-dev] Re: How to get Qos limits
On 07/06/17 03:08, Kowshik Thopalli wrote:
> I wish to know the max number of jobs that as a user I can run, that
> is MaxJobsPerUser. I will be thankful if you can actually provide the
> commands that I will have to execute.

You probably want:

sacctmgr list user ${USER} format=MaxJobsPerUser

For a more general view you would do:

sacctmgr list user ${USER} withassoc

Hope this helps,
Chris
[slurm-dev] Re: srun - replacement for --x11?
On 06/06/17 23:46, Edward Walter wrote:
> Doesn't that functionality come from a SPANK plugin?
> https://github.com/hautreux/slurm-spank-x11

Yes, that's the one we use. Works nicely. Provides the --x11 option for srun.

All the best,
Chris
[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)
On 03/06/17 07:03, Jacob Chappell wrote:
> Sorry, that was a mouthful, but important. Does anyone know if Slurm
> can accomplish this for me? If so, how?

This was how we used to run prior to switching to fair-share. Basically you set:

PriorityDecayHalfLife=0

which stops the values decaying over time, so once users hit their limit that's it. We also set:

PriorityUsageResetPeriod=QUARTERLY

so that limits would reset on the quarter boundaries. This was because we used to have fixed quarterly allocations for projects.

We went to fair-share because a change in our funding model meant the previous rules were removed, and moving to fair-share gave us a massive improvement in utilisation (compute nodes were no longer idle with jobs waiting but unable to run because of being out of quota).

NOTE: You can't have both fair-share and hard quotas at the same time.

All the best,
Chris
[slurm-dev] Re: sinfo
On 25/05/17 05:12, Will French wrote:
> We have an alias set up that shows free and allocated nodes grouped by
> feature:
>
> sinfo --format="%55N %.35f %.6a %.10A"

Nice! Here's an alternative that is more useful in our setup, which groups nodes by reason and GRES:

sinfo --format="%60N %.15G %.30E %.10A"

The reason can be quite long, but there doesn't seem to be a way to just show the status as down/drain/idle/etc.

cheers,
Chris
[slurm-dev] Re: Job ends successfully but spawned processes still run?
On 24/05/17 13:45, Lachlan Musicman wrote:
> Not yet - that's part of the next update cycle :/

Ah well, that might help, along with pam_slurm_adopt so that users SSH'ing into nodes they have jobs on are put into a cgroup of theirs on that node. Helps catch any legacy SSH-based MPI launchers (and other naughtiness).

Good luck!
Chris
[slurm-dev] Re: Job ends successfully but spawned processes still run?
Hiya,

On 24/05/17 13:10, Lachlan Musicman wrote:
> Occasionally I'll see a bunch of processes "running" (sleeping) on a
> node well after the job they are associated with has finished.
>
> How does this happen - does slurm not make sure all processes spawned
> by a job have finished at completion?

Are you not using cgroups for enforcement? Usually that picks everything up.

cheers,
Chris
[slurm-dev] Re: discrepancy between node config and # of cpus found
On 20/05/17 07:46, Jeff Avila wrote:
> Yes, I did give that a try, though it didn't seem to make any
> difference to the error messages I got.

Have you also set DefMemPerCPU and checked how much RAM is allocated to the jobs?

Remember that you can have free cores but no free memory on a node, and then Slurm isn't going to put more jobs there (unless you tell it to ignore memory, which is not likely to end well).

All the best,
Chris
[slurm-dev] Re: LDAP required?
On 13/04/17 18:32, Janne Blomqvist wrote:
> 15.08 should also work with enumeration disabled, except for
> AllowGroups/DenyGroups partition specifications.

I'm pretty sure this was what we got stuck on, and so had to drop AD.

> So how do you manage user accounts? Just curious if someone has a sane
> middle ground between integrating with the organization user account
> system (AD or whatever) and DIY.

We use some software that was developed at my previous employer, called Karaage, to manage our projects, including allowing project leaders to invite members, talking to LDAP and integrating with slurmdbd.

Sadly my previous employer shut down at the end of 2015 (long after I left, I hasten to add!) and the person who was doing a lot of that work has moved on to other things and only tinkers with the code base.

That said, there are 2 different HPC organisations inside the university using it, and the other group use it with Shibboleth integration so that people with AAF (Australian Access Federation) credentials can auth to the web interface with their institutional ID (though of course it still creates a separate LDAP account for them).

https://github.com/Karaage-Cluster

All the best,
Chris
[slurm-dev] Re: LDAP required?
On 13/04/17 01:47, Jeff White wrote: > +1 for Active Directory bashing. I wasn't intending to "bash" AD here, just that the AD that we were trying to use (and I suspect the one that Lachlan might be talking to) has tens of thousands of accounts in it and we just could not get the Slurm->sssd->AD chain to work reliably enough to run a production system. This was with both sssd trying to enumerate the whole domain and also (before that) trying to get Slurm to work without sssd enumeration. Smaller AD domains might work more reliably, but that's not where we sit, so we fell back to using our own LDAP server with Karaage to manage project/account applications, adding people to slurmdbd, etc. All the best, Chris -- Christopher Samuel, Senior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: LDAP required?
On 11/04/17 16:05, Lachlan Musicman wrote: > Our auth actually backs onto an Active Directory domain You have my sympathies. That caused us no end of headaches when we tried that on a cluster I help out on and in the end we gave up and fell back to running our own LDAP to make things reliable again. +1 for running your own LDAP. I would seriously look at a cluster toolkit for running nodes, especially if it supports making a single image that your compute nodes then netboot. That way you know everything is consistent. Best of luck, Chris -- Christopher SamuelSenior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: Jobs submitted simultaneously go on the same GPU
On 10/04/17 21:08, Oliver Grant wrote: > We did not have a gres.conf file. I've created one: > cat /cm/shared/apps/slurm/var/etc/gres.conf > # Configure support for our four GPU > NodeName=node[001-018] Name=gpu File=/dev/nvidia[0-3] > > I've read about "global" and "per-node" gres.conf, but I don't know how > to implement them or if I need to? Yes you do. Here's an (anonymised) example from a cluster that I help with that has both GPUs and MICs on various nodes.

# We will have GPU & KNC nodes so add the GPU & MIC GresTypes to manage them
GresTypes=gpu,mic

# Node definitions for nodes with GPUs
NodeName=thing-gpu[001-005] Weight=3000 NodeAddr=thing-gpu[001-005] RealMemory=254000 CoresPerSocket=6 Sockets=2 Gres=gpu:k80:4

# Node definitions for nodes with Xeon Phi
NodeName=thing-knc[01-03] Weight=2000 NodeAddr=thing-knc[01-03] RealMemory=126000 CoresPerSocket=10 Sockets=2 ThreadsPerCore=2 Gres=mic:5110p:2

You'll also need to restart slurmctld & all slurmds to pick up this new config, I don't think "scontrol reconfigure" will deal with this. Best of luck, Chris -- Christopher Samuel, Senior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
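Those node definitions live in slurm.conf; the gres.conf side can then be a single shared ("global") file using NodeName= lines. A sketch, with node names and device paths illustrative and matching the anonymised example above:

```
# Shared gres.conf: each line applies only to the nodes it names
NodeName=thing-gpu[001-005] Name=gpu Type=k80   File=/dev/nvidia[0-3]
NodeName=thing-knc[01-03]   Name=mic Type=5110p File=/dev/mic[0-1]
```

A "per-node" gres.conf is the same file without the NodeName= field, with each node's copy listing only its own devices.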
[slurm-dev] Distinguishing past jobs that waited due to dependencies vs resources?
Hi folks, We're looking at wait times on our clusters historically but would like to be able to distinguish jobs that had long wait times due to dependencies rather than just waiting for resources (or because the user had too many other jobs in the queue at that time). A quick 'git grep' of the source code after reading 'man sacct' and not finding anything (also running 'sacct -e' and not seeing anything useful there either) doesn't offer much hope. Anyone else dealing with this? We're on 16.05.x at the moment with slurmdbd. All the best, Chris -- Christopher SamuelSenior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: Randomly jobs failures
On 11/04/17 17:42, Andrea del Monaco wrote: > [2017-04-11T08:22:03+02:00] error: Error opening file > /cm/shared/apps/slurm/var/cm/statesave/job.830332/script, No such file > or directory > [2017-04-11T08:22:03+02:00] error: Error opening file > /cm/shared/apps/slurm/var/cm/statesave/job.830332/environment, No such > file or directory I would suggest that you are looking at transient NFS failures (which may not be logged). Are you using NFSv3 or v4 to talk to the NFS server and what are the OS's you are using for both? cheers, Chris -- Christopher SamuelSenior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: Scheduling jobs according to the CPU load
On 22/03/17 08:35, kesim wrote: > You are right. Many thanks for correcting. Just note that load average is not necessarily the same as CPU load. If you have tasks blocked for I/O they will contribute to load average but will not be using much CPU at all. So, for instance, on one of our compute nodes a Slurm job can ask for 1 core, start 100 tasks doing heavy I/O, they all use the same 1 core and get the load average to 100 but the other 31 cores on the node are idle and can quite safely be used for HPC work. The manual page for "uptime" on RHEL7 describes it thus: # System load averages is the average number of processes that # are either in a runnable or uninterruptable state. A process # in a runnable state is either using the CPU or waiting to use # the CPU. A process in uninterruptable state is waiting for # some I/O access, eg waiting for disk. All the best, Chris -- Christopher SamuelSenior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
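A quick way to see those numbers yourself on Linux, reading /proc directly:

```shell
# Load average counts runnable *and* uninterruptible (D-state) tasks,
# so heavy I/O inflates it without using much CPU. The first three
# fields of /proc/loadavg are the 1/5/15-minute averages.
cut -d' ' -f1-3 /proc/loadavg
```

Comparing these against actual per-core CPU utilisation (e.g. from top) shows whether a high load is CPU pressure or just blocked I/O.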
[slurm-dev] Re: Fwd: Scheduling jobs according to the CPU load
On 19/03/17 23:25, kesim wrote: > I have 11 nodes and declared 7 CPUs per node. My setup is such that all > desktop belongs to group members who are using them mainly as graphics > stations. Therefore from time to time an application is requesting high > CPU usage. In this case I would suggest you carve off 3 cores via cgroups for interactive users and give Slurm the other 7 to parcel out to jobs, by ensuring that Slurm starts within a cgroup dedicated to those 7 cores. This is similar to the "boot CPU set" concept that SGI came up with (at least I'd not come across anyone doing that before them). To be fair, this is not really Slurm's problem to solve: Linux already gives you the tools to do this, it's just that people don't realise cgroups can be used this way. Your use case is valid, but it isn't really HPC, and you can't really blame Slurm for not catering to this. It can use cgroups to partition cores to jobs precisely so it doesn't need to care what the load average is - it knows the kernel is ensuring the cores the jobs want are not being stomped on by other tasks. Best of luck! Chris -- Christopher Samuel, Senior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
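A rough sketch of the core-carving idea (cgroup v1 paths; the core split and the process being moved are illustrative, and on a real system Slurm's own cgroup setup in cgroup.conf must agree with the split):

```shell
# Run as root: dedicate cores 7-9 of a 10-core box to interactive use
mkdir /sys/fs/cgroup/cpuset/desktop
echo 7-9 > /sys/fs/cgroup/cpuset/desktop/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/desktop/cpuset.mems
# Move the desktop session leader (e.g. the display manager) into it;
# children inherit the cpuset, leaving cores 0-6 for Slurm jobs
echo "$(pidof -s gdm)" > /sys/fs/cgroup/cpuset/desktop/tasks
```

The same idea applies in reverse: start slurmd inside a cpuset limited to the cores you want Slurm to manage.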
[slurm-dev] Re: reporting used memory with job Accounting or Completion plugins?
On 11/03/17 09:36, Chris Samuel wrote: > If you use the slurmdbd accounting database then yes, you get information > about memory usage (both RSS and VM). > > Have a look at the sacct manual page and look for MaxRSS and MaxVM. I should mention that for jobs that trigger job steps with srun you can also monitor them as the job is going with 'sstat' (rather than just post-mortem with sacct). All the best, Chris -- Christopher SamuelSenior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
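For example (the job and step IDs here are made up; the format fields are the same ones sacct accepts):

```shell
# Live high-water marks for running step 0 of job 1234567;
# after the job finishes, sacct reports the same fields
sstat -j 1234567.0 --format=JobID,MaxRSS,MaxVMSize
```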
[slurm-dev] Re: Storage of job submission and working directory paths
On 08/03/17 08:15, Chad Cropper wrote: > Am I missing something? Why is it that the DBD cannot store these 2 > pieces of information? I suspect it's just not been requested, I'd suggest opening a feature request at: http://bugs.schedmd.com/ Report the bug ID back as it would be useful to us here too. All the best, Chris -- Christopher SamuelSenior Systems Administrator Melbourne Bioinformatics - The University of Melbourne Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
[slurm-dev] Re: slurmctld not pinging at regular interval
On 17/02/17 05:36, Allan Streib wrote: > t-019 is one of my nodes that's frequently "down" according to slurm but > really isn't. What is that "Can't find an address" about? DNS lookups > seem to be working fine in a shell on the same machine. This looks to be an issue when Slurm wants to forward messages and tries to find hosts in slurm.conf: src/common/forward.c - _forward_thread():

	/* repeat until we are sure the message was sent */
	while ((name = hostlist_shift(hl))) {
		if (slurm_conf_get_addr(name, &addr) == SLURM_ERROR) {
			error("forward_thread: can't find address for host "
			      "%s, check slurm.conf", name);
			slurm_mutex_lock(&fwd_struct->forward_mutex);
			mark_as_failed_forward(&fwd_struct->ret_list, name,
					       SLURM_UNKNOWN_FORWARD_ADDR);
			free(name);
			if (hostlist_count(hl) > 0) {
				slurm_mutex_unlock(&fwd_struct->forward_mutex);
				continue;
			}
			goto cleanup;
		}

It would be interesting to know if increasing your TreeWidth to 256 would help (basically turning off forwarding, if I'm reading it right):

TreeWidth
    Slurmd daemons use a virtual tree network for communications. TreeWidth specifies the width of the tree (i.e. the fanout). On architectures with a front end node running the slurmd daemon, the value must always be equal to or greater than the number of front end nodes which eliminates the need for message forwarding between the slurmd daemons. On other architectures the default value is 50, meaning each slurmd daemon can communicate with up to 50 other slurmd daemons and over 2500 nodes can be contacted with two message hops. The default value will work well for most clusters. Optimal system performance can typically be achieved if TreeWidth is set to the square root of the number of nodes in the cluster for systems having no more than 2500 nodes or the cube root for larger systems. The value may not exceed 65533.

If that helps, then I suspect this is a possible transient DNS failure.
All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: New User Creation Issue
On 16/02/17 09:45, Lachlan Musicman wrote: [partitions down in slurm.conf] > What's the reasoning behind this? So that you can test the cluster still > works with debug before jobs start getting submitted and failing? Yeah, pretty much! It's a continuation from when running Torque+Moab/Maui here and at VPAC before that - we would always start Moab paused so we could check out what impact any changes had to our queues & priorities before starting jobs running. Measure twice, cut once. cheers! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: New User Creation Issue
On 16/02/17 07:09, Katsnelson, Joe wrote: > Just curious is there a way to restart slurm to get the below working > without impacting the current jobs that are running? You should be able to restart slurmctld with running jobs quite safely, if you are paranoid (like me) then just mark partitions down first so you know Slurm won't be trying to start any jobs just when you shut it down. We also have all our partitions (other than our debug one reserved for sysadmins) marked as "State=DOWN" in slurm.conf so that they won't start jobs when slurmctld is brought back up again. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
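In slurm.conf terms, that arrangement looks something like this (partition and node names are invented):

```
# Debug partition for sysadmins comes up ready for test jobs
PartitionName=debug Nodes=node[001-002] AllowGroups=sysadmin State=UP
# Production partition starts DOWN; once happy, bring it up with:
#   scontrol update PartitionName=main State=UP
PartitionName=main Nodes=node[001-100] Default=YES State=DOWN
```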
[slurm-dev] Re: New User Creation Issue
Hi Joe, On 27/01/17 09:01, Katsnelson, Joe wrote: > sacctmgr list clusters Sorry I missed this before! Can you check that the machine where your slurmdbd is running can connect to 172.16.0.1 on port 6817 please? If it can't then that'll be the reason why you need to restart. cheers, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Job priority/cluster utilization help
On 08/02/17 11:19, Vicker, Darby (JSC-EG311) wrote: > Sorry for the long post but not sure how to get adequate help without > providing a lot of detail. Any recommendations on configuring the > scheduler to help these jobs run and increase the cluster utilization > would be appreciated. My one thought after a quick scan is that both the jobs you mention are listed as reason "Priority" and there's a higher priority job 1772 in the list before them. You might want to look at your backfill settings to see whether it's looking far enough down the queue to see these. Perhaps an alternative idea would be to instead of using features use partitions and then have people submit to all partitions (there is a plugin for that, though we use a submit filter instead to accomplish the same). That way Slurm should consider each job against each partition (set of architectures) individually. Best of luck! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: 16.05.8 bug with memory handling?
On 28/01/17 09:04, Bill Broadley wrote: > I have a very simple script: > #!/bin/bash -l > #SBATCH --mem=35 > #SBATCH --time=96:00:00 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=4 > > srun date > > It acts exactly like I expect with srun: [...] > So 4 CPUs are allocated, but the binary is run once. If I used -n 4 I'd > expect the binary to be run 4 times. But the srun you are using to launch the script doesn't ask for any resources, so it'll get the default, which I would expect to be 1 core, and so the srun inside the script will just use that 1 core. > But now with sbatch: That's more peculiar, I would expect it to get 1 task with 4 CPUs associated and then just run date once, with access to the 4 cores. Before your script does "srun date" add:

env | fgrep SLURM
echo ''
scontrol show job -dd ${SLURM_JOB_ID}
echo ''

In my Slurm environment it does just what I expect, runs date once with 4 cores allocated; to wit, the environment has:

SLURM_JOB_CPUS_PER_NODE=4

and scontrol shows:

[...] NumNodes=1 NumCPUs=4 NumTasks=1 CPUs/Task=4 ReqB:S:C:T=0:0:*:* [...]

Best of luck, Chris -- Christopher Samuel, Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Daytime Interactive jobs
On 28/01/17 04:08, Skouson, Gary B wrote: > We'd like to have some nodes available during the workday exclusively > for interactive or debug jobs. These are fairly small, short running > jobs. I'd like to make these nodes available for other jobs at night > and on weekends. With Slurm you can have reservations that always start a certain distance into the future:

TIME_FLOAT
    The reservation start time is relative to the current time and moves forward through time (e.g. a StartTime=now+10minutes will always be 10 minutes in the future).

They added that for us and we use it to reserve a node 24x7 for jobs of less than 2 hours (we have a reasonable amount of those to be able to justify this). But I can't see from the docs how that would work with the DAILY flag to get it to repeat at the same time each day. :-( Might be a feature request. cheers, Chris -- Christopher Samuel, Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
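As a sketch, that floating-reservation trick might be set up along these lines (the node name, owner and duration are illustrative; check the exact syntax against your scontrol man page):

```
# A reservation that always starts 2 hours in the future: jobs asking
# for less than 2 hours fit on the node before the reservation would
# begin, so it stays free for short jobs only
scontrol create reservation ReservationName=short Users=root \
    StartTime=now+120minutes Duration=infinite \
    Flags=TIME_FLOAT Nodes=node001
```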
[slurm-dev] Re: New User Creation Issue
On 24/01/17 08:34, Katsnelson, Joe wrote: > I’m having an issue creating new users on our cluster. After > running the below commands slurm has to be restarted in order for that > user to be able to run sbatch. Otherwise they get an error When I've seen that before it's been because the slurmdbd cannot connect back to slurmctld to send RPCs on the IP address that slurmctld has registered with slurmdbd. What does this say? sacctmgr list clusters cheers, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Job temporary directory
On 23/01/17 08:40, Lachlan Musicman wrote: > We use the SPANK plugin found here > > https://github.com/hpc2n/spank-private-tmp > > and find it works very well. +1 to that, though we had to customise it to our environment (it breaks when your nodes are diskless and your scratch area is a high-performance parallel filesystem shared across all nodes). https://github.com/vlsci/spank-private-tmp All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
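For reference, SPANK plugins like this are wired in via plugstack.conf; roughly like the following - note the plugin path and the option names are assumptions from my reading of that plugin's README, so check against the options your build actually takes:

```
# /etc/slurm/plugstack.conf (path and options are assumptions)
required /usr/lib64/slurm/private-tmpdir.so base=/tmp/slurm mount=/tmp
```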
[slurm-dev] Re: mail job status to user
On 16/01/17 15:56, Ryan Novosielski wrote: > I think you are right actually. Might have also been configurable > system-wide. The difference though, still, is that you don't have to > provide an e-mail address, so you could share a script like this and it > would work for anyone without modifying it. You don't need to provide an email address to Slurm either: --mail-user= User to receive email notification of state changes as defined by --mail-type. The default value is the submitting user. Our Postfix config rewrites their username to their registered email address that's stored in LDAP. cheers, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: mail job status to user
On 10/01/17 18:56, Ole Holm Nielsen wrote: > For the record: Torque will always send mail if a job is aborted It's been a few years since I've used Torque so I don't remember that behaviour. Thanks for the info! -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: mail job status to user
On 14/01/17 09:28, Steven Lo wrote: > Is it true that there is no configurable parameter to achieve what we > want to do, and users need to specify it in either the sbatch command line or the submission script? Not that I'm aware of. A submit filter would let you set that up though. All the best, Chris -- Christopher Samuel, Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
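If you go the submit filter route, a job_submit/lua sketch might look like the following - note the mail_type bit values are assumptions taken from slurm.h (MAIL_JOB_END=0x0002, MAIL_JOB_FAIL=0x0004), so verify them, and the availability of the mail_type field, against your Slurm version:

```lua
-- job_submit.lua: default jobs to mail on END+FAIL when the user
-- requested nothing (assumes JobSubmitPlugins=lua in slurm.conf)
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.mail_type == 0 then
        job_desc.mail_type = 6  -- MAIL_JOB_END | MAIL_JOB_FAIL
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```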
[slurm-dev] Re: Prolog behavior with and without srun
On 10/01/17 10:57, Christopher Samuel wrote: > If you are unlucky enough to have SSH based job launchers then you would > also look at the BYU contributed pam_slurm_adopt Actually this is useful even without that as it allows users to SSH into a node they have a job on and not disturb the cores allocated to other jobs on the node, just their own. You could argue that this is more elegant though, to add an interactive shell job step to a running job:

[samuel@barcoo ~]$ srun --jobid=6522365 --pty -u ${SHELL} -i -l
[samuel@barcoo010 ~]$ cat /proc/$$/cgroup
4:cpuacct:/slurm/uid_500/job_6522365/step_1/task_0
3:memory:/slurm/uid_500/job_6522365/step_1
2:cpuset:/slurm/uid_500/job_6522365/step_1
1:freezer:/slurm/uid_500/job_6522365/step_1

-- Christopher Samuel, Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Prolog behavior with and without srun
On 06/01/17 15:08, Vicker, Darby (JSC-EG311) wrote: > Among other things we want the prolog and epilog scripts to clean up any > stray processes. I would argue that a much better way to do that is to use Slurm's cgroups support, which will contain a job's processes in a cgroup, allowing it to kill off only those processes (and not miss any) when the job ends. If you are unlucky enough to have SSH-based job launchers then you should also look at the BYU-contributed pam_slurm_adopt, which will put those tasks into the cgroup of that user's job on the node they are trying to SSH into. You do need PrologFlags=contain for that, to ensure that all jobs get an "extern" step on job creation for these processes to be adopted into. We use both here with great success. All the best, Chris -- Christopher Samuel, Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
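The two pieces of configuration involved are small (the pam.d fragment goes in the compute nodes' sshd stack; its placement relative to your other account lines may vary by distro):

```
# slurm.conf: create the "extern" step for every job so SSH sessions
# have a cgroup to be adopted into
PrologFlags=contain

# /etc/pam.d/sshd on compute nodes:
account    required    pam_slurm_adopt.so
```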
[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm
On 04/01/17 10:29, Koziol, Lucas wrote: > I want to have 1 batch script, where I reserve a certain large number > of CPUs, and then run multiple 1-CPU tasks from within this single > script. The reason being that I do several cycles of these tasks, and > I need to process the outputs between tasks. OK, I'm not sure how Slurm will behave with multiple srun's and cons_res and CR_LLN but it's still worth a shot. Best of luck! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm
On 04/01/17 10:19, Koziol, Lucas wrote: > Will Slurm read a local slurm.conf file or in my home directory? No, I'm afraid not, it's a global configuration thing. > The default slurm.conf file I can't modify. I can ask the admins here > to modify it I I have to though. I strongly believe that will be necessary, sorry! Best of luck, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Question about -m cyclic and --exclusive options to slurm
On 04/01/17 04:20, Koziol, Lucas wrote: > The hope was that all 16 tasks would run on Node 1, and 16 tasks would > run on Node 2. Unfortunately what happens is that all 32 jobs get > assigned to Node 1. I thought –m cyclic was supposed to avoid this. You're only running a single task at a time, so it's a bit hard for srun to distribute 1 task over multiple nodes. :-) The confusion is, I suspect, that job steps (an srun instance) are not the same as tasks (individual processes launched in a job step). The behaviour in the manual page is for things like MPI jobs where you want to distribute the many ranks (tasks) over nodes/sockets/cores in a particular way - in this instance a single srun might be launching 10's through to 100,000's of tasks (or more) at once. What might work better for you is to use a job array for your work instead of a single Slurm job and then have this in your slurm.conf: SelectType=select/cons_res SelectTypeParameters=CR_LLN This should get Slurm to distribute the job array elements across nodes picking the least loaded (allocated) node in each case. Job arrays are documented here: https://slurm.schedmd.com/job_array.html Hope this helps! All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
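As a sketch, the job-array version of the many-1-CPU-tasks pattern (script and input names are invented):

```shell
#!/bin/bash
# Submit with: sbatch --array=1-32 run_task.sh
# Each array element is an independent 1-CPU job; with CR_LLN the
# elements get spread across the least-loaded nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

./my_task input.${SLURM_ARRAY_TASK_ID}
```

Processing outputs between cycles can then be done with a follow-up job using --dependency=afterok on the array job.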
[slurm-dev] Re: SLURM reports much higher memory usage than really used
On 16/12/16 10:33, Kilian Cavalotti wrote: > I remember Danny recommending to use jobacct_gather/linux over > jobacct_gather/cgroup, because "cgroup adds quite a bit of overhead > with very little benefit". > > Did that change? We took that advice but reverted because of this issue (from memory). -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: SLURM reports much higher memory usage than really used
On 16/12/16 02:15, Stefan Doerr wrote: > If I check on "top" indeed it shows all processes using the same amount > of memory. Hence if I spawn 10 processes and you sum usages it would > look like 10x the memory usage. Do you have: JobAcctGatherType=jobacct_gather/linux or: JobAcctGatherType=jobacct_gather/cgroup If the former, try the latter and see if it helps get better numbers (we went to the former after suggestions from SchedMD but from highly unreliable memory had to revert due to similar issues to those you are seeing). Best of luck, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: job arrays, fifo queueing not wanted
On 14/12/16 04:57, Michael Miller wrote: > thank you for your answer. I do not need round-robin - I need some > mechanism that allows both/multiple job arrays to share the resources. We set CPU limits per association per cluster: ${SACCTMGR} -i modify account set grpcpus=256 where cluster=snowy So no project can use more than 256 cores at once. You can also do that for nodes, GrpCpuMins (product of cores and runtime), etc. This only makes sense if your cluster is going to be overcommitted most of the time though, otherwise you may have jobs pending due to limits with idle resources. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] RE: Wrong behaviour of "--tasks-per-node" flag
On 19/11/16 03:38, Manuel Rodríguez Pascual wrote: > sbatch --ntasks=16 --tasks-per-node=2 --wrap 'mpiexec ./helloWorldMPI' If your MPI stack properly supports Slurm shouldn't that be: sbatch --ntasks=16 --tasks-per-node=2 --wrap 'srun ./helloWorldMPI' ? Otherwise you're at the mercy of what your mpiexec chooses to do. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] RE: Wrong behaviour of "--tasks-per-node" flag
On 28/10/16 18:20, Manuel Rodríguez Pascual wrote: > -bash-4.2$ sbatch --ntasks=16 --tasks-per-node=2 test.sh Could you post the content of your batch script too please? We're not seeing this on 16.05.5, but I can't be sure I'm correctly replicating what you are seeing. cheers, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Gres issue
On 17/11/16 11:31, Christopher Samuel wrote: > It depends on the library used to pass options, Oops - that should be parse, not pass. Need more caffeine.. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Gres issue
On 17/11/16 00:04, Michael Di Domenico wrote: > this might be nothing, but i usually call --gres with an equals > > srun --gres=gpu:k10:8 > > i'm not sure if the equals is optional or not It depends on the library used to pass options, I'm used to it being mandatory but apparently with Slurm it's not - just tested it out and using: --gres mic results in my job being scheduled on a Phi node with OFFLOAD_DEVICES=0 set in its environment. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Using slurm to control container images?
On 16/11/16 12:23, Lachlan Musicman wrote: > Has anyone tried Shifter out and has there been any movement on this? I > presume the licensing issues remain. We've got both Shifter and Singularity set up at VLSCI for users. https://www.vlsci.org.au/documentation/running_jobs/shifter/ https://www.vlsci.org.au/documentation/running_jobs/singularity/ The important thing to recognise for both is that they are *NOT* Docker, but they are both able to use Docker containers. Shifter imports directly from the Docker registry (or from other registries that you configure) and lives entirely on your HPC system, Singularity needs to be installed on the users system and configured, but they can do the conversion there and there is no central public repo of converted containers (a plus if you need a private container, a minus if you're going to end up with hundreds of copies). Having private containers is on the roadmap for Shifter. Shifter also integrates with Slurm. All the best! Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: How to account how many cpus/gpus per node has been allocated to a specific job?
On 09/11/16 13:07, Ran Du wrote: > However, the scheduler must have information about separate > allocated number on each node, or they cannot track how many resources > left on each node. The question is, if SLURM keep these separate numbers > in files(e.g. log files or database), or just keep them in memory. I am > going to read other docs info, to see if there is any lead. I strongly suspect it's only held in memory by slurmctld whilst running (and in its state files of course). Unfortunately it doesn't even appear in "scontrol show job --detail" from what I can see. :-( Here are the lines from a test job of mine:

TRES=cpu=2,mem=8G,node=2
Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
Nodes=barcoo[069-070] CPU_IDs=0 Mem=4096
MinCPUsNode=1 MinMemoryNode=4G MinTmpDiskNode=0
Features=(null) Gres=mic:1 Reservation=(null)

All the best, Chris -- Christopher Samuel, Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: How to account how many cpus/gpus per node has been allocated to a specific job?
On 09/11/16 12:15, Ran Du wrote: > Thanks a lot for your reply. However, it's not what I want to > get. For the example of Job 6449483, it is allocated with only one node, > what if it was allocated with multiple nodes? I'd like to get the > accounting statistics about how many CPUs/GPUs separately on each node, > but not the sum number on all nodes. Oh sorry, that's my fault, I completely misread what you were after and managed to invert your request! I don't know if that information is included in the accounting data. I believe the allocation is uniform across the nodes, for instance:

$ sbatch --gres=mic:1 --mem=4g --nodes=2 --wrap /bin/true

resulted in:

$ sacct -j 6449484 -o jobid%20,jobname,alloctres%20,allocnodes,allocgres
               JobID    JobName            AllocTRES AllocNodes  AllocGRES
-------------------- ---------- -------------------- ---------- ----------
             6449484       wrap  cpu=2,mem=8G,node=2          2      mic:2
       6449484.batch      batch  cpu=1,mem=4G,node=1          1      mic:2
      6449484.extern     extern  cpu=2,mem=8G,node=2          2      mic:2

The only oddity there is that the batch step is of course only on the first node, but it says it was allocated 2 GRES. I suspect that's just a symptom of Slurm only keeping a total number. I don't think Slurm can give you an uneven GRES allocation, but the SchedMD folks would need to confirm that I'm afraid. All the best, Chris -- Christopher Samuel, Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Re:
On 09/11/16 09:50, Lachlan Musicman wrote: > I don't know Chris, I think that /dev/null would rate tbh. :) Ah, but that's a file (OK character special device), not a directory. ;-) -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Re:
Hi there,

On 08/11/16 21:21, Alexandre Strube wrote:
> For example, on slurm 10.05.6, the example config file says:
>
> StateSaveLocation=/tmp
>
> Which is not the best place to write sensitive information, but it will
> for sure be there and will be writable by the slurm user.

Frankly, using /tmp seems like a *really* bad idea to me. The reason is that (depending on system configuration) it may be cleaned up either on a reboot (either by scripts or by using non-persistent tmpfs) or periodically by tmpwatch or similar scripts.

So if you've got jobs queued for any period of time, that information will be lost.

We build from source and use:

StateSaveLocation = /var/spool/slurm/jobs

but the decision is yours where exactly to put it. /tmp, though, is almost certainly the second worst place (after /dev/shm).

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
[slurm-dev] Re: sinfo man page
On 08/11/16 11:42, Lachlan Musicman wrote:
>
> %C  Number of CPUs by state in the format "allocated/idle/other/total".
> Do not use this with a node state option ("%t" or "%T") or the
> different node states will be placed on separate lines.
>
> I presume I am doing something wrong?

I think it's a bug (possibly just in the manual page), as the -N option says there:

 -N, --Node
      Print information in a node-oriented format with one line per
      node. The default is to print information in a partition-oriented
      format. This is ignored if the --format option is specified.

Except it's not being ignored when you use --format (-o).

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
[slurm-dev] Re: Passing binding information
On 02/11/16 02:01, Riebs, Andy wrote: > Interesting -- thanks for the info Chris. No worries, it's a bit sad I think, but I can understand it. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Passing binding information
On 01/11/16 05:43, Andy Riebs wrote: > Does anyone have any recent experience with this code who can answer the > questions? Unfortunately it looks like all SchedMD folks have dropped off the mailing list (apart from posting announcements), presumably due to workload. You may want to contact them directly. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Slurm versions 16.05.6 and 17.02.0-pre3 are now available
On 29/10/16 00:58, Peixin Qiao wrote: > Will the slurm version 16.05.6 support ubuntu 16.04? If you build it from source I suspect any moderately recent version will work there. If you are asking about the Ubuntu packaged version, then that's a question for Canonical, not SchedMD. :-) All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: How to restart a job "(launch failed requeued held)"
On 28/10/16 08:44, Lachlan Musicman wrote: > So I checked the system, noticed that one node was drained, resumed it. > Then I tried both > > scontrol requeue 230591 > scontrol resume 230591 What happens if you "scontrol hold" it first before "scontrol release"'ing it? -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Query number of cores allocated per node for a job
Hi all,

I can't help but think I'm missing something blindingly obvious, but does anyone know how to find out how Slurm has distributed a job in terms of cores per node? In other words, if I submit:

sbatch --ntasks=64 --wrap sleep 60

on a system with (say) 16-core nodes where nodes are already running disparate numbers of jobs using variable cores, how do I see which cores on which nodes Slurm has allocated to my running job?

I know I can go and poke around with cgroups, but is there a way to get that out of squeue, sstat or sacct?

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
[slurm-dev] Re: slurm_load_partitions: Unable to contact slurm controller (connect failure)
On 25/10/16 10:05, Peixin Qiao wrote: > > I installed slurm-llnl on Debian on one computer. When I ran slurmctld > and slurmd, I got the error: > slurm_load_partitions: Unable to contact slurm controller (connect failure). Check your firewall rules to ensure that those connections aren't getting blocked, and also check that the hostname correctly resolves. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: sreport "duplicate" lines
On 21/10/16 13:07, Andrew Elwell wrote: > Yep, and for that particular account, not all of the members are > showing twice - I can't work out what causes it Looks like you've somehow created partition specific associations for some people - not something we do at all. I suspect that's what's triggering the different display in sreport, a line per association/partition. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: sreport "duplicate" lines
On 21/10/16 12:29, Andrew Elwell wrote:
> When running sreport (both 14.11 and 16.05) I'm seeing "duplicate"
> user info with different timings. Can someone say what's being added
> up separately here - it seems to be summing something differently for
> me and I can't work out what makes it split into two:

Not that it helps, but I don't see the same here for me with 16.05.5.

# sreport cluster AccountUtilizationByUser start=2016-07-01 end=2016-10-01 account=vlsci user=samuel cluster=avoca -t h

Cluster/Account/User Utilization 2016-07-01T00:00:00 - 2016-09-30T23:59:59 (7948800 secs)
Use reported in TRES Hours

  Cluster  Account   Login      Proper Name     Used  Energy
--------- -------- -------- ----------------- ------- -------
    avoca    vlsci   samuel   Christopher Sa+   15103       0

--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
[slurm-dev] Re: Send notification email
On 06/10/16 03:07, Fanny Pagés Díaz wrote: > Oct 5 11:34:52 compute-0-3 postfix/smtp[6469]: connect to > 10.8.52.254[10.8.52.254]:25: Connection refused So you are blocked from connecting to the mail server you are trying to talk to on port 25/tcp (SMTP) - you need to get that opened up. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Best way to control synchronized clocks in cluster?
On 07/10/16 01:17, Per Lönnborg wrote:
> But what is the preferred way to check that the compute nodes on our
> cluster have correct time, and if not, see to it that Slurm doesn't
> allocate these nodes to perform tasks?

We run NTP everywhere - we have to, because GPFS depends on correct clocks as well, and if they're out of step then GPFS will stop working on the node, making Slurm the least of your worries. :-)

So just run ntpd.

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
[slurm-dev] Re: Send notification email
On 03/10/16 23:39, Fanny Pagés Díaz wrote:
> I have a slurm running in the same HPC cluster server, but I need to send
> all notifications using my corporate mail server, which is running on
> another server on my internal network. I don't need to use the local
> postfix installed on the slurm server.

The most reliable solution will be to configure Postfix to send emails via the corporate server. All our clusters quite deliberately send using our own mail server.

We set:

  relayhost   (to say where to relay email via)
  myorigin    (to set the system name to its proper FQDN)
  alias_maps  (to add an LDAP lookup to rewrite users' email to the value in LDAP)

But really this isn't a Slurm issue; it's a host config issue for Postfix.

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
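[Editor's note: a minimal main.cf sketch of the relay setup described above. The hostnames and the LDAP map file are placeholders; the parameter names (relayhost, myorigin, alias_maps) are standard Postfix ones.]

```ini
# /etc/postfix/main.cf -- relay all outbound mail via the corporate server
# (hostnames below are placeholders; adjust for your site)

# where to relay email via; brackets skip the MX lookup
relayhost = [mail.example.com]

# the system's proper FQDN on outbound mail
myorigin = cluster.example.com

# optional: an LDAP lookup to rewrite local users to their real addresses
alias_maps = hash:/etc/aliases, ldap:/etc/postfix/ldap-aliases.cf
```

After editing, "postfix reload" picks up the change.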
[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?
On 29/09/16 01:16, John DeSantis wrote: > We get the same snippet when our logrotate takes action against the > cltdlog: Does your slurmctld restart then too? -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Invalid Protocol Version
On 28/09/16 16:25, Barbara Krasovec wrote: > Yes, this worked! Thank you very much for your help! My pleasure! -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: CGroups
On 26/09/16 16:51, Lachlan Musicman wrote: > Does this mean that it's now considered acceptable to run cgroups for > ProcTrackType? We've been running with that on all our x86 clusters since we switched to Slurm, haven't seen an issue yet. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
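[Editor's note: a sketch of the config being discussed. ProctrackType=proctrack/cgroup is what the thread is about; pairing it with task/cgroup and the two Constrain* settings is a common choice, not a requirement.]

```ini
# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf -- what the cgroup plugins actually enforce
ConstrainCores=yes
ConstrainRAMSpace=yes
```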
[slurm-dev] Re: Invalid Protocol Version
On 27/09/16 23:54, Barbara Krasovec wrote:
> The version of the client and server is the same. I guess the problem is
> in the slurmctld state file, where the slurm protocol version of some
> worker nodes must be wrong.

I suspect this is bug 3050 - we hit it for frontend nodes on BlueGene/Q and reported it against that, but I've seen the same symptom that you've hit on x86 clusters too - as has Ulf from Dresden.

https://bugs.schedmd.com/show_bug.cgi?id=3050

Tim is checking the code for the generic case.

> Creating a clean state file would fix the problem but also kill the
> jobs. Is there another way to fix this? Enforce the correct version in
> node_mgr.c and job_mgr.c? Restarting the services doesn't help at all.

Actually you don't lose jobs - but you do lose the reason for nodes being offline. What we did was:

1) save the state of offline nodes and make a script to restore it via scontrol
2) shut down slurmctld and all slurmds
3) move the node_stat* files out of the way
4) start up slurmd again
5) start up slurmctld
6) run the script created at step 1

Hope that helps!

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?
On 26/09/16 17:48, Philippe wrote:
> [2016-09-26T08:02:16.582] Terminate signal (SIGINT or SIGTERM) received

So that's some external process sending one of those two signals to slurmctld; it's not something it's choosing to do at all. We've never seen this.

One other question - you've got the shutdown log from slurmctld and the start log of a slurmd - what happens when slurmctld starts up? That might be your clue about why your jobs are getting killed.

--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?
On 26/09/16 17:48, Philippe wrote: > [2016-09-26T08:01:44.792] debug: slurmdbd: Issue with call > DBD_CLUSTER_CPUS(1407): 4294967295(This cluster hasn't been added to > accounting yet) Not related - but it looks like whilst it's been told to talk to slurmdbd you haven't added the cluster to slurmdbd with "sacctmgr" yet so I suspect all your accounting info is getting lost. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?
On 27/09/16 17:40, Philippe wrote: > /usr/sbin/invoke-rc.d --quiet slurm-llnl reconfig >/dev/null I think you want to check whether that's really restarting it or just doing an "scontrol reconfigure" which won't (shouldn't) restart it. -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Bug in node suspend/resume config code with scontrol reconfigure in 16.05.x (bugzilla #3078)
Hi folks,

A heads up for those using, or looking to use, node suspend/resume (aka power saving, aka elastic computing) in 16.05.x.

The slurmctld will lose your list of excluded nodes/partitions on an:

scontrol reconfigure

and will then treat all nodes as being eligible for power control, putting them into a bad state. :-(

This is Slurm bug:

https://bugs.schedmd.com/show_bug.cgi?id=3078

which has been hit separately by two friends of mine at different places, one of whom I'm helping out with elastic computing/cloudburst.

Hopefully this saves someone else from losing sleep over this!

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
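[Editor's note: for context, the exclusion lists in question are the SuspendExcNodes/SuspendExcParts slurm.conf parameters. A sketch with illustrative values (node names, partition name and program paths are placeholders):]

```ini
# slurm.conf -- power saving / elastic computing (illustrative values)
SuspendTime=600
SuspendProgram=/usr/local/sbin/node_suspend
ResumeProgram=/usr/local/sbin/node_resume

# the lists that slurmctld forgets on "scontrol reconfigure" (bug 3078):
SuspendExcNodes=login[01-02]
SuspendExcParts=debug
```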
[slurm-dev] Re: external slurmdbd for multiple clusters
On 23/09/16 19:12, Miguel Gila wrote:
> We also do this, run a number of Slurm clusters attached to a single
> slurmdbd. One issue with this setup is that once you decommission
> a cluster, it needs to be removed from the DB somehow, otherwise your
> DB grows beyond reasonable size...

For accounting reasons we can't do that; consequently our slurmdbd is around 17GB in size.

To give you an idea, the largest table is the job step table for our BlueGene/Q, which takes almost 7GB for pushing 19 million job steps. The next largest is the job step table for one of our x86 clusters, which is around 3GB for 8 million job steps.

Neither causes us any issues these days (we used to have a problem when, for complicated historical reasons, slurmdbd was running on a 32-bit VM and could run out of memory). Admittedly we do have beefy database servers. :-)

All the best,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
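[Editor's note: for sites that can discard old data (unlike the poster above), slurmdbd does support retention settings that bound database growth. A sketch with illustrative values; see the slurmdbd.conf man page for the full list:]

```ini
# slurmdbd.conf -- purge old records to bound database growth
PurgeJobAfter=12months
PurgeStepAfter=2months
PurgeEventAfter=1month
PurgeSuspendAfter=1month
```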
[slurm-dev] Re: external slurmdbd for multiple clusters
On 02/09/16 18:54, Paddy Doyle wrote: > We currently have the dbd on 16.05.4 and have some 15.x clusters still > pointing > to it fine. I can't recall exactly, but in the past we may have even had 2 > major > releases behind pointing to an up-to-date dbd... I'm not sure how far back you > can go, but I suspect 14.x talking to a 16.x dbd would be fine. Slurm supports 2 major releases behind. So a 16.05.x slurmdbd should talk to 15.08.x and 14.11.x slurmctld's but *not* 14.03.x. All the best, Chris -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: Slurm array scheduling question
On 21/09/16 14:15, Christopher Benjamin Coffey wrote:
> I'd like to get some feedback on this please from other sites and the
> developers if possible. Thank you!

The best I can offer is this from the job array documentation for Slurm:

# When a job array is submitted to Slurm, only one job record is
# created. Additional job records will only be created when the
# state of a task in the job array changes, typically when a task
# is allocated resources or its state is modified using the
# scontrol command.

In 2.6.x I think there was a record for each element, but this was cleaned up in a later release (can't remember when, sorry!).

Hope that helps,
Chris
--
 Christopher Samuel, Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
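[Editor's note: as a concrete illustration of the behaviour quoted above, a job array is submitted once (one job record) and each element gets its own index at run time. A minimal sketch; the echo is just a placeholder payload, and the :-0 defaults simply let the script also run outside Slurm:]

```shell
#!/bin/bash
#SBATCH --array=1-100
#SBATCH --ntasks=1
# Slurm exports the element's index to each task when it is
# eventually allocated resources.
echo "element ${SLURM_ARRAY_TASK_ID:-0} of array job ${SLURM_ARRAY_JOB_ID:-0}"
```

Submitting this with sbatch creates the single job record the documentation describes; further records appear only as elements start or are modified.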