[slurm-dev] slurmctld causes slurmdbd to seg fault
Hi,

We have been having some problems with NFS mounts via Infiniband getting dropped by nodes. We ended up switching our main admin server, which provides NFS and Slurm, from one machine to another. Now, however, if slurmdbd is started, as soon as slurmctld starts, slurmdbd seg faults. In slurmdbd.log we have

  slurmdbd: error: We have more allocated time than is possible (7724741 > 7012800) for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
  slurmdbd: error: We have more time than is possible (7012800+36720+0)(7049520) > 7012800 for cluster soroban(1948) from 2017-10-17T16:00:00 - 2017-10-17T17:00:00 tres 1
  slurmdbd: Warning: Note very large processing time from hourly_rollup for soroban: usec=46390426 began=17:08:17.777
  Segmentation fault (core dumped)

and the corresponding output of strace is

  fstat(3, {st_mode=S_IFREG|0600, st_size=871270, ...}) = 0
  write(3, "[2017-10-17T17:09:04.168] Warnin"..., 132) = 132
  +++ killed by SIGSEGV (core dumped) +++

We're running 17.02.7. Any ideas?

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
[slurm-dev] Re: file and directory permissions
Hi Marcus, Marcus Wagner <wag...@itc.rwth-aachen.de> writes: > Hello, everyone. > > I'm also fairly new to slurm, still in a conceptual rather than a test or > productive phase. Currently I am still trying to find out where to create which > files and directories, on the host or in a network directory. > I'm a little confused about the description in the manpage of slurm.conf. > For example, the JobCheckpointDir should be accessible from both the primary and > backup controller. Now it is clear (at least I believe) that this has to be done > in the NCCR, for example. If the primary controller goes down, the backup > controller must be able to access it. > On the other hand, SlurmctldPidFile should also be available on both the primary > and backup controller. Since this usually resides in /var/run, I assume that this > should be a local path. It should also be unique on every controller. > The manpage is not quite clear in its description. Your understanding is correct. > What about the SlurmctldLogFile, for example? Theoretically, both > could write to the same file. We have everything local except for the config files and the statesave location. > If anyone has advice or would like to tell me how it was solved at your site, > I would be very happy. > > best > Marcus Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Upgrading Slurm
Hi Elisabetta, Elisabetta Falivene <e.faliv...@ilabroma.com> writes: > Upgrading Slurm > > Thank you all for the useful advice! > > So the 'jump' could not be a problem if there are no running jobs > (which is my case as you guessed). Surely I'll report how it went > doing it. I would like to do some tests on a virtual machine, but > really can't imagine how to replicate the exact situation of a 7Tb > cluster locally... > > Just some other questions. How would you do the upgrade in the safest > way? Letting aptitude do its job? Would you go to Debian 9? And must the nodes > be upgraded in the same way one by one? If no jobs are running, I would just let aptitude get on with it. If there are no other reasons not to, I would upgrade to Debian 9. In this case, your version of Slurm will be 16.05 and thus not too old. > Let's think about the worst case: the upgrade nukes slurm. I don't really > know this machine's configuration well. Would you back up something > else besides the database before upgrading? The only other thing I back up is the statesave directory, but this is only interesting if you are upgrading while jobs are running. In your case, only the database is worth backing up, and even then, that's only really interesting if you need the old data for statistical purposes, or you need to maintain, say, fairshare information across the upgrade. In bocca al lupo! Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
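For reference, a pre-upgrade backup along the lines discussed above could look something like this. This is only a sketch: the database name 'slurm_acct_db', the statesave path, and the init-script invocations are assumptions about a typical Debian setup and need to be adjusted to the local configuration.

```shell
# Quiesce the daemons before dumping, so nothing writes to the DB or
# statesave directory during the backup.
/etc/init.d/slurmctld stop
/etc/init.d/slurmdbd stop

# Dump the accounting database (name is site-specific).
mysqldump slurm_acct_db > slurm_acct_db.sql

# Archive the StateSaveLocation (path is site-specific).
tar czf statesave-backup.tar.gz /var/lib/slurm-llnl/slurmctld
```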
[slurm-dev] Re: Upgrading Slurm
Hi Elisabetta, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes: > On 10/03/2017 03:29 PM, Elisabetta Falivene wrote: >> I've been asked to upgrade our slurm installation. I have a slurm 2.3.4 on a >> Debian 7.0 wheezy cluster (1 master + 8 nodes). I've not installed it so I'm a >> bit confused about how to do this and how to proceed without destroying >> anything. >> >> I was thinking to upgrade at least to Jessie (Debian 8) but what about Slurm? >> I've read carefully the upgrading section >> (https://slurm.schedmd.com/quickstart_admin.html) of the doc, reading that the >> upgrade must be done incrementally and not jumping from 2.3.4 to 17, for >> example. > > Yes, you may jump max 2 versions per upgrade. > Quoting https://slurm.schedmd.com/quickstart_admin.html#upgrade > >> Slurm daemons will support RPCs and state files from the two previous minor >> releases (e.g. a version 16.05.x SlurmDBD will support slurmctld daemons and >> commands with a version of 16.05.x, 15.08.x or 14.11.x). > > >> Still it is not clear to me precisely how to do this. How would you proceed if >> asked to upgrade a cluster you just don't know anything about? What would you >> check? What version of o.s. and slurm would you choose? What would you backup? >> And how would you proceed? >> >> Any info is gold! Thank you > > My 2 cents of information: > > My Slurm Wiki explains how to upgrade Slurm on CentOS 7: > https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm > > Probably the general method is the same for Debian. Ole's pages on Slurm are indeed very useful (Thanks, Ole!). I just thought I'd point out that the limitation of only upgrading by 2 versions is for the case that you are upgrading a production system and don't want to lose any running jobs. If you are upgrading the whole operating system, you are probably planning a downtime anyway and so there won't be any such jobs. 
In this case, there shouldn't in theory be a problem - although I must admit that I wouldn't be that surprised if converting the database from 2.3.4 to, say, 17.02.7 didn't go 100% smoothly. However, Debian users who just rely on Debian packages are always going to face this problem of large version jumps between Debian releases, and so it would be useful for the community to know how well this works. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'
Marcin Stolarek <stolarek.mar...@gmail.com> writes: > I think that all you needed was to set the node state to DOWN/FAIL and > then RESUME without actually rebooting the node. Did you try this? I > remember that in the FAQ this was used for jobs stuck in the CG state. I think I may have tried this, but without success. However, due to another issue I was forced to reboot the server running slurmctld and the problem is now resolved. Incidentally, it also solved my problem that idle nodes were not being powered down. So I guess "Have you tried turning it off and then on again?" is still often a valid suggestion. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
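For reference, Marcin's suggestion corresponds to something like the following (node name taken from the earlier thread; the reason text is arbitrary):

```shell
# Mark the node down, which should requeue or cancel the stuck job,
# then bring it back without a reboot.
scontrol update NodeName=node001 State=DOWN Reason="stuck in CONFIGURING"
scontrol update NodeName=node001 State=RESUME
```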
[slurm-dev] Re: Accounting using LDAP ?
Hi Chris, Christopher Samuel <sam...@unimelb.edu.au> writes: > On 20/09/17 15:53, Loris Bennett wrote: > >> Having said that, the only scenario I can see being easily automated is >> one where each user only has one association, namely with their Unix >> group, and everyone has equal shares. This is our set up, but as soon >> as you have, say, users with multiple associations and/or membership in some >> associations confers more shares automation becomes very difficult. > > The user management system we use adds/removes users to accounts (which > map to projects in our lingo) whenever a user is added/removed to a > project as well as creating/deleting them. Users can change their > default project which changes their default account in Slurm. Is the user management system homegrown or something more generally available? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Accounting using LDAP ?
Christopher Samuel <sam...@unimelb.edu.au> writes: > On 20/09/17 03:03, Carlos Lijeron wrote: > >> I'm trying to enable accounting on our SLURM configuration, but our >> cluster is managed by Bright Management which has its own LDAP for users >> and groups. When setting up SLURM accounting, I don't know how to make >> the connection between the users and groups from the LDAP as opposed to >> the local UNIX. > > Slurm just uses the host's NSS config for that, so as long as the OS can > see the users and groups then slurmdbd will be able to see them too. > > *However*, you _still_ need to manually create users in slurmdbd to > ensure that they can run jobs, but that's a separate issue to whether > slurmdbd can resolve users in LDAP. > > I would hope that Bright would have the ability to do that for you > rather than having you handle it manually, but that's a question for Bright. Our version, Bright Cluster Manager 5.2, doesn't have any features to help set up accounting in Slurm, but then again it's a pretty old version and things may have changed. Having said that, the only scenario I can see being easily automated is one where each user only has one association, namely with their Unix group, and everyone has equal shares. This is our set up, but as soon as you have, say, users with multiple associations, and/or membership in some associations confers more shares, automation becomes very difficult. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
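The manual slurmdbd bookkeeping Chris mentions is done with sacctmgr. A minimal sketch, with placeholder cluster, account, and user names:

```shell
# Create an account (here standing in for a Unix group / project) and
# add a user to it. The -i flag answers the confirmation prompt
# automatically; all names are placeholders.
sacctmgr -i add account dept01 Description="Department 01"
sacctmgr -i add user alice Account=dept01
```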
[slurm-dev] Does powering down as suspend action still work?
Hi,

Can anyone confirm that powering off nodes still works as a suspend action in 16.05 and/or 17.02?

Cheers,

Loris

BTW, the example of slurmctld logging contains the line:

  [May 02 15:31:25] Power save mode 0 nodes

Given that the code now reads

  if (((now - last_log) > 600) && (susp_total > 0)) {
          info("Power save mode: %d nodes", susp_total);

I assume that the line shown can no longer appear.

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
[slurm-dev] Suspend stopped working - debug flag?
Hi, We have been powering down idle nodes for many years now. However, at some point recently, this seems to have stopped working. I can't pinpoint exactly when the problem started, as the cluster is usually full and so the situation in which nodes should be powered down doesn't occur very often. To try to debug the problem I have set DebugFlags=Power but don't get any logging information about node suspension. The man page for 'slurm.conf' says that this flag provides debugging for the power management plugin, but does this refer to the mechanism for capping the power used by nodes? If so, what debug flags should I be using? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'
Loris Bennett <loris.benn...@fu-berlin.de> writes:

> Hi,
>
> I have a node which is powered on and to which I have sent a job. The
> output of sinfo is
>
>   PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>   test         up 7-00:00:00      1   mix~ node001
>
> The output of squeue is
>
>   JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
>   1795993    test  7_single loris CF 24:29     1 node001
>
> I don't understand the node state 'mix~'. If at all, I would only
> expect it to exist very briefly between 'idle~' and 'mix#'. The '~' is
> certainly incorrect, as the node is not in a power-saving state, which
> in our case is powered-off.
>
> This problem may have existed in 16.05.10-2, but currently we are using
> 17.02.7. All other nodes in the cluster apart from one are functioning
> normally.
>
> Does anyone have any idea what we might be doing wrong?

I still don't know what the problem was, but I got the node back into a sensible state by setting the state to FAIL, rebooting the node, and then setting the state to RESUME.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Behaviour of Partition setting MaxTime
Hi Greg, Greg Wickham <greg.wick...@kaust.edu.sa> writes: > Hi, > > What is the behaviour when either root or the SlurmUser update the > duration of an unprivileged user's running job to exceed the "MaxTime" > setting of the partition? > > The documentation includes the text "This limit does not apply to jobs > executed by SlurmUser or user root." however what about jobs executed > by normal users that are extended by SlurmUser or root? You don't say where the text you quote is from, but I have often updated the 'TimeLimit' of a user job to extend its run-time beyond the 'MaxTime' of the partition. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'
Hi Lyn,

Unfortunately, rebooting the node makes no difference to the state of the node. The job gets re-queued and the node goes back to 'mix~'. What baffles me is that there is obviously some sort of communication problem between the slurmctld on the admin node and the slurmd on the compute node, but I can't find anything in the log files to indicate what's going wrong.

Cheers,

Loris

Lyn Gerner <schedulerqu...@gmail.com> writes:

> Hi Loris,
>
> At least with earlier releases, I've not found a way to act directly upon the
> job. However, if it's possible to down the node, that should requeue (or
> cancel) the job.
>
> Best,
> Lyn
>
> On Tue, Sep 12, 2017 at 3:40 AM, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>
> Hi,
>
> I have a node which is powered on and to which I have sent a job. The
> output of sinfo is
>
>   PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>   test         up 7-00:00:00      1   mix~ node001
>
> The output of squeue is
>
>   JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
>   1795993    test  7_single loris CF 24:29     1 node001
>
> I don't understand the node state 'mix~'. If at all, I would only
> expect it to exist very briefly between 'idle~' and 'mix#'. The '~' is
> certainly incorrect, as the node is not in a power-saving state, which
> in our case is powered-off.
>
> This problem may have existed in 16.05.10-2, but currently we are using
> 17.02.7. All other nodes in the cluster apart from one are functioning
> normally.
>
> Does anyone have any idea what we might be doing wrong?
>
> Cheers,
>
> Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
[slurm-dev] Job stuck in CONFIGURING, node is 'mix~'
Hi,

I have a node which is powered on and to which I have sent a job. The output of sinfo is

  PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
  test         up 7-00:00:00      1   mix~ node001

The output of squeue is

  JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
  1795993    test  7_single loris CF 24:29     1 node001

I don't understand the node state 'mix~'. If at all, I would only expect it to exist very briefly between 'idle~' and 'mix#'. The '~' is certainly incorrect, as the node is not in a power-saving state, which in our case is powered-off.

This problem may have existed in 16.05.10-2, but currently we are using 17.02.7. All other nodes in the cluster apart from one are functioning normally.

Does anyone have any idea what we might be doing wrong?

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
[slurm-dev] sreport job sizesbyaccount over all accounts?
Hi,

I can do the following:

  $ sreport job sizesbyaccount -T mem -t hours start=2017-07-24 end=2017-07-24 grouping=18,42,90
  Job Sizes 2017-07-24T00:00:00 - 2017-07-24T00:59:59 (3600 secs)
  TRES type is mem
  Time reported in Hours

  Cluster   Account   0-17 TRES  18-41 TRES  42-89 TRES  >= 90 TRES  % of cluster
  --------- --------  ---------  ----------  ----------  ----------  ------------
  cluster   dept01      1281631       19200      122880           0        58.30%
  cluster   dept02            7           0           0           0         2.87%
  cluster   dept03       849353           0           0           0        34.78%
  cluster   dept04        88752           0           0           0         3.63%
  cluster   dept05        10240           0           0           0         0.42%

However, I'd really just like to have the sums for the various TRES groups over all departments (and then compare these with the values for other time periods). I'm going to read the data into R, so I can do the roll-up there, but I wondered whether I can get the information directly from Slurm.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
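Short of an sreport option for this, one hedged alternative to the R roll-up is to sum the columns of sreport's parsable output in the shell. In this sketch a here-string with the field layout of the table above stands in for the real `sreport -p` call, whose exact output format is an assumption:

```shell
# Sum each TRES-size grouping over all accounts. In practice 'data'
# would come from something like:
#   sreport -p job sizesbyaccount -T mem -t hours \
#           start=2017-07-24 end=2017-07-24 grouping=18,42,90
data='cluster|dept01|1281631|19200|122880|0|58.30%|
cluster|dept02|7|0|0|0|2.87%|
cluster|dept03|849353|0|0|0|34.78%|'

echo "$data" | awk -F'|' '
  { s1 += $3; s2 += $4; s3 += $5; s4 += $6 }
  END { printf "0-17: %d  18-41: %d  42-89: %d  >=90: %d\n", s1, s2, s3, s4 }'
# prints: 0-17: 2130991  18-41: 19200  42-89: 122880  >=90: 0
```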
[slurm-dev] Re: Elapsed time for slurm job
Dear Sema, You need to set up accounting first: https://slurm.schedmd.com/accounting.html You obviously won't have data for jobs which ran before accounting was set up. When you have done this, you will be able to do something like sacct -j 123456 -o jobid,elapsed for subsequent jobs. Read 'man sacct' for more info. Regards Loris Sema Atasever <s.atase...@gmail.com> writes: > Re: [slurm-dev] Re: Elapsed time for slurm job > > Dear Loris, > > When i try this command (sacct -o 2893,elapsed) i get this error message > unfortunately: > > SLURM accounting storage is disabled > > How to solve this problem? > > Regards, Sema. > > On Mon, Jul 24, 2017 at 4:25 PM, Loris Bennett <loris.benn...@fu-berlin.de> > wrote: > > Sema Atasever <s.atase...@gmail.com> writes: > > > Elapsed time for slurm job > > > > Dear Friends, > > > > How can i retrieve elapsed time if the slurm job has completed? > > > > Thanks in advance. > > sacct -o jobid,elapsed > > See 'man sacct' or 'sacct -e' for the full list of fields. > > Cheers, > > Loris > > -- > Dr. Loris Bennett (Mr.) > ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de > > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Elapsed time for slurm job
Sema Atasever <s.atase...@gmail.com> writes: > Elapsed time for slurm job > > Dear Friends, > > How can i retrieve elapsed time if the slurm job has completed? > > Thanks in advance. sacct -o jobid,elapsed See 'man sacct' or 'sacct -e' for the full list of fields. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: ANNOUNCE: A collection of Slurm tools
Hi Ole,

Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:

> As a small contribution to the Slurm community, I've moved my collection of
> Slurm tools to GitHub at https://github.com/OleHolmNielsen/Slurm_tools. These
> are tools which I feel make the daily cluster monitoring and management a
> little easier.
>
> The following Slurm tools are available:
>
> * pestat: Prints a Slurm cluster node status with 1 line per node and job info.
> * slurmreportmonth: Generate monthly accounting statistics from Slurm using the sreport command.
> * showuserjobs: Print the current node status and batch job status broken down into userids.
> * slurmibtopology: Infiniband topology tool for Slurm.
> * Slurm trigger scripts.
> * Scripts for managing nodes.
> * Scripts for managing jobs.
>
> The tools "pestat" and "slurmibtopology" have previously been announced to this
> list, but future updates will be on GitHub only.
>
> I would also like to mention our Slurm deployment HowTo guide at
> https://wiki.fysik.dtu.dk/niflheim/SLURM
>
> /Ole

Thanks for sharing your tools. Here are some brief comments:

- psjob/psnode: The USERLIST variable makes the commands a bit brittle, since ps will fail if you pass an unknown username.

- showuserjobs: Doesn't handle usernames longer than 8 chars (we have longer names). Also, the grouping doesn't seem quite correct. As shown in the example below, not all the users of a group appear under the group total for the appropriate group:

    Username     Jobs  CPUs  Jobs  CPUs  Group    Further info
    ===========  ====  ====  ====  ====  =======  ============
    GRAND_TOTAL   168  1089    55   451  ALL      running+idle=1540 CPUs 29 users
    GROUP_TOTAL    56   349    10   119  group01  running+idle=468 CPUs 8 users
    user01         27   324     4    52  group02  One, User
    GROUP_TOTAL    27   324     4    52  group02  running+idle=376 CPUs 1 users
    user02         29   174     1     6  group01  Two, User
    GROUP_TOTAL     5   148    18   208  group03  running+idle=356 CPUs 4 users
    user03          3   120    16   176  group03  Three, User
    user04          1   196     3    48  group01  Four, User
    ...
In general, maybe it would be good to have a common config file, where things such as paths to binaries, USERLIST and username lengths are defined. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: srun can't use variables in a batch script after upgrade
Hi Dennis, Dennis Tants <dennis.ta...@zarm.uni-bremen.de> writes: > Hello Loris, > > Am 10.07.2017 um 07:39 schrieb Loris Bennett: >> Hi Dennis, >> >> Dennis Tants <dennis.ta...@zarm.uni-bremen.de> writes: >> >>> Hi list, >>> >>> I am a little bit lost right now and would appreciate your help. >>> We have a little cluster with 16 nodes running with SLURM and it is >>> doing everything we want, except a few >>> little things I want to improve. >>> >>> So that is why I wanted to upgrade our old SLURM 15.X (don't know the >>> exact version) to 17.02.4 on my test machine. >>> I just deleted the old version completely with 'yum erase slurm-*' >>> (CentOS 7 btw.) and build the new version with rpmbuild. >>> Everything went fine so I started configuring a new slurm[dbd].conf. >>> This time I also wanted to integrate backfill instead of FIFO >>> and also use accounting (just to know which person uses the most >>> resources). Because we had no databases yet I started >>> slurmdbd and slurmctld without problems. >>> >>> Everything seemed fine with a simple mpi hello world test on one and two >>> nodes. >>> Now I wanted to enhance the script a bit more and include working in the >>> local directory of the nodes which is /work. >>> To get everything up and running I used the script which I attached for >>> you (it also includes the output after running the script). >>> It should basically just copy all data to /work/tants/$SLURM_JOB_NAME >>> before doing the mpi hello world. >>> But it seems that srun does not know $SLURM_JOB_NAME even though it is >>> there. >>> /work/tants belongs to the correct user and has rwx permissions. >>> >>> So did I just configure something wrong or what happened here? Nearly >>> the same example is working on our cluster with >>> 15.X. The script is only for testing purposes, thats why there are so >>> many echo commands in there. >>> If you see any mistake or can recommend better configurations I would >>> glady hear them. 
>>> Should you need any more information I will provide them. >>> Thank you for your time! >> Shouldn't the variable be $SBATCH_JOB_NAME? >> >> Cheers, >> >> Loris > > when I use "echo $SLURM_JOB_NAME" it will tell me the name I specified > with #SBATCH -J. > It is not working with srun in this version (it was working in 15.x). > > However, when I now use "echo $SBATCH_JOB_NAME" it is just a blank > variable. As told by someone from the list, > I used the command "env" to verify which variables are available. This > list includes SLURM_JOB_NAME > with the name I specified. So $SLURM_JOB_NAME shouldn't be a problem. > > Thank you for your suggestion though. > Any other hints? > > Best regards, > Dennis The manpage of srun says the following:

  SLURM_JOB_NAME
          Same as -J, --job-name except within an existing allocation, in
          which case it is ignored to avoid using the batch job's name as
          the name of each job step.

This sounds like it might mean that if you submit a job script via sbatch and in this script call srun, the variable might not be defined. However, the wording is a bit unclear and I have never tried this myself.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
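Given that manpage wording, one defensive sketch is to expand the variable in the batch shell itself, before srun is involved, so nothing depends on how srun handles SLURM_JOB_NAME for steps. The paths and program name below are taken loosely from Dennis's description and are illustrative only:

```shell
#!/bin/bash
#SBATCH -J hello_test
# Expand SLURM_JOB_NAME once in the batch shell, where it is set by
# sbatch, rather than relying on srun's per-step handling of it.
WORKDIR=/work/tants/${SLURM_JOB_NAME}
mkdir -p "$WORKDIR"
cp -r "$SLURM_SUBMIT_DIR"/. "$WORKDIR"
cd "$WORKDIR"
srun ./mpi_hello
```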
[slurm-dev] Re: slurm 17.2.06 min memory problem
Hi Roy, Roe Zohar <roezo...@gmail.com> writes: > slurm 17.2.06 min memory problem > > Hi all, > I have installed the last Slurm version and I have noticed a strange behavior > with the memory allocated for jobs. > In my slurm conf I am having: > SelectTypeParameters=CR_LLN,CR_CPU_Memory > > Now, when I am sending a new job without giving it a --mem amount, it > automatically assigns it all the server memory, which means I am getting only > one job per server. > > I had to add DefMemPerCPU in order to get around that. > > Anybody know why that is? > > Thanks, > Roy What value of SelectType are you using? Note also that CR_LLN schedules jobs to the least loaded nodes, and so until all nodes have one job, you will not get more than one job per node. See 'man slurm.conf'. Regards Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
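For reference, the workaround Roy describes would look something like this in slurm.conf. The SelectType line and the 2000 MB default are illustrative assumptions, not his actual config:

```
SelectType=select/cons_res
SelectTypeParameters=CR_LLN,CR_CPU_Memory
# Without a default, a job submitted without --mem may be allocated all
# of a node's memory, blocking further jobs on that node.
DefMemPerCPU=2000
```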
[slurm-dev] Re: srun can't use variables in a batch script after upgrade
Hi Dennis, Dennis Tants <dennis.ta...@zarm.uni-bremen.de> writes: > Hi list, > > I am a little bit lost right now and would appreciate your help. > We have a little cluster with 16 nodes running with SLURM and it is > doing everything we want, except a few > little things I want to improve. > > So that is why I wanted to upgrade our old SLURM 15.X (don't know the > exact version) to 17.02.4 on my test machine. > I just deleted the old version completely with 'yum erase slurm-*' > (CentOS 7 btw.) and build the new version with rpmbuild. > Everything went fine so I started configuring a new slurm[dbd].conf. > This time I also wanted to integrate backfill instead of FIFO > and also use accounting (just to know which person uses the most > resources). Because we had no databases yet I started > slurmdbd and slurmctld without problems. > > Everything seemed fine with a simple mpi hello world test on one and two > nodes. > Now I wanted to enhance the script a bit more and include working in the > local directory of the nodes which is /work. > To get everything up and running I used the script which I attached for > you (it also includes the output after running the script). > It should basically just copy all data to /work/tants/$SLURM_JOB_NAME > before doing the mpi hello world. > But it seems that srun does not know $SLURM_JOB_NAME even though it is > there. > /work/tants belongs to the correct user and has rwx permissions. > > So did I just configure something wrong or what happened here? Nearly > the same example is working on our cluster with > 15.X. The script is only for testing purposes, thats why there are so > many echo commands in there. > If you see any mistake or can recommend better configurations I would > glady hear them. > Should you need any more information I will provide them. > Thank you for your time! Shouldn't the variable be $SBATCH_JOB_NAME? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Length of possible SlurmDBD without HA
Hi, On the Slurm FAQ page https://slurm.schedmd.com/faq.html it says the following: 52. How critical is configuring high availability for my database? Consider if you really need mysql failover. Short outage of slurmdbd is not a problem, because slurmctld will store all data in memory and send it to slurmdbd when it's back operating. The slurmctld daemon will also cache all user limits and fair share information. I was wondering how long a "short outage" can be. Presumably this is determined by the amount of free memory on the server running slurmctld, the number of jobs, and the amount of memory required per job. So roughly how much memory will be required per job? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Slurm query
"suprita.bot...@wipro.com" <suprita.bot...@wipro.com> writes:

> Hi,
>
> Just wanted to know, what is the meaning of * in the partition name?
>
> When we type the command 'sinfo', the o/p comes as:
>
>   [root@punehpcdl01 ~]# sinfo
>   PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>   debug*       up   infinite      1   idle punehpcdl01

It means that 'debug' is the default partition. See 'man sinfo'.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Multifactor Priority Plugin for Small clusters
Hi Sourabh, sourabh shinde <sourabhshinde2...@gmail.com> writes: > Re: [slurm-dev] Re: Multifactor Priority Plugin for Small clusters > > Thank you guys for your reply. > > @Loris : Yes, i had a look at Gang Scheduling which does not fit my > requirements. In my case, a job which is scheduled should complete > its execution and then the next job should start. This is what i need. > > @Chris : I had already set up accounting, but the resource limits were > new. I have set limits on QOS and Users now, and the constraints are working > well. > > @Ole : The Wiki page was really helpful. :) > > Also, would modifying the multifactor logic really help in my > case? If not, what else can i do in order to get at least close to > what i need to achieve (the example referred to in my previous post)? You may need to rethink what you are trying to achieve. You seem to expect that the priority of a job intrinsically has something to do with the resources allocated to the job. This may be the case if you define your factors appropriately, but primarily the two are not connected and the priority just determines which order jobs should be started in at a given point in time. From your example, what you seem to want is that three users, each with a different degree of what I'll call "importance", can all start jobs at the same time, but, depending on the amount of "importance", they can use different numbers of nodes. With multifactor fairshare, the priorities would have to be essentially equal and you would have to restrict the number of nodes for the different degrees of "importance". This brings with it other problems, such as what happens if only the user with the lowest "importance" has jobs in the queue. Can he or she use all the nodes, or do some remain idle in case a more "important" job comes along? I think if you and your users can accept the idea of fairshare over a period rather than at every point in time, you might save yourself a great deal of time and trouble with Slurm. 
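If hard per-"importance" node caps are acceptable despite the idle-node problem described above, one way to express them is via QOS limits. This is a sketch only: it assumes TRES-based QOS limits are available in the Slurm version in use, and the QOS names and values are taken from the example rather than a real configuration:

```shell
# Cap the number of nodes each user of a QOS may hold at once;
# -i skips the confirmation prompt.
sacctmgr -i modify qos low    set MaxTRESPerUser=node=1
sacctmgr -i modify qos normal set MaxTRESPerUser=node=3
sacctmgr -i modify qos high   set MaxTRESPerUser=node=8
```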
Regards Loris > Thanks and Regards > Sourabh > > Regards, > Sourabh Shinde > +49 176 4569 5546 > sourabhshinde.cf > > On Mon, Jul 3, 2017 at 8:02 AM, Loris Bennett <loris.benn...@fu-berlin.de> > wrote: > > Hi Sourabh, > > sourabh shinde <sourabhshinde2...@gmail.com> writes: > > > Multifactor Priority Plugin for Small clusters > > > > Hello Everyone, > > > > I am new to SLURM and trying to run it locally on my PC. I am using > > Multifactor plugin to assign priorities for the job. The problem is > > multi factor doesn’t work as needed on small clusters. I tried > > assigning weightage to the factors as per my need but the scheduler > > always schedule the job on FIFO basis. > > > > I am trying to find some alternative where making changes to the > > priority plugin code could make it work on small clusters. > > > > for e.g > > > > If I have 12 nodes on my cluster, and if 3 users A,B and C with QOS > > low, normal and high respectively submit their job for execution. I > > want that SLURM should assign not all nodes to the User A. Atleast 1 > > node should be assigned to the users B and C which are having low and > > normal priority. how can I achieve this ? > > > > PS: Gang scheduling and preemption are not possible in my case. > > > > Any help would be appreciated. > > > > Thanks in advance. > > > > Regards, > > Sourabh Shinde > > I don't think you can achieve what you want with Fairshare and > Multifactor Priority. Fairshare looks at distributing resources fairly > between users over a *period* of time. At any *point* in time it is > perfectly possible for all the resources to be allocated to one user. > It is only over time that the allocation of resources will average out > to correspond to how you have configured the shares. > > If you only have a small amount of resources and a small number of > users, this may not work very well. Have you looked at Gang scheduling > without premption? > > Cheers, > > Loris > > -- > Dr. Loris Bennett (Mr.) 
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de > > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] sacct: --unit applied to NNodes
Hi, With version 16.05.10-2, the option '--units' gets applied incorrectly to the column 'NNodes':

$ sacct -u user1234 -o jobid,nnodes,ncpus,reqmem,maxrss,elapsed -S 2017-07-01 --units=G
       JobID NNodes  NCPUS ReqMem   MaxRSS     Elapsed
------------ ------ ------ ------ -------- -----------
1601832       0.00G     16    4Gc          11-01:00:30
1601832.bat+  0.00G      1    4Gc    0.01G 11-01:00:30
1601832.0     0.00G      9    4Gc    7.42G 11-01:00:28
1699682       0.00G     16    4Gc             16:52:49
1699682.0     0.00G      3    4Gc             16:52:48

Has this been fixed in more recent versions? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
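Until a fixed version is available, one workaround might be to drop '--units' altogether and convert just the memory columns oneself, leaving NNodes untouched. A minimal sketch with awk, using hypothetical MaxRSS values in sacct's default kilobyte notation (the sample values are invented):

```shell
# Hypothetical MaxRSS values as sacct prints them without --units
# (kilobyte counts with a trailing K); convert them to gigabytes by
# hand, which is all that --units=G should be doing for this column.
printf '%s\n' 7780K 4161536K |
  awk '{ sub(/K$/, "", $1); printf "%.2fG\n", $1 / (1024 * 1024) }'
```

This prints 0.01G and 3.97G for the two sample values, without any other column being reinterpreted.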
[slurm-dev] Rewarding good memory requirement estimation on shared nodes?
Hi, When nodes are being shared, it is desirable for users to estimate memory requirements as accurately as possible. One way to do this would be to have a cron job bump up the priority of the jobs of users who have a high average value of MaxRSS/ReqMem. A more elegant way would be to add a component to the multifactor priority plugin which would do the same thing. Is there any way to do this, short of writing one's own version? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
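The cron-job variant could be sketched roughly as follows. The sacct field names (user, reqmem, maxrss) are real, but the sample records are invented here; a live run would feed the pipeline something like `sacct -a -n -P -o user,reqmem,maxrss` instead:

```shell
# Average MaxRSS/ReqMem per user from pipe-separated sacct-style records.
# The three records are hypothetical; ReqMem is a per-core/per-node GB
# figure ("4Gc"/"8Gn") and MaxRSS is in megabytes, for simplicity.
printf '%s\n' 'user1|4Gc|2048M' 'user1|4Gc|4096M' 'user2|8Gn|4096M' |
  awk -F'|' '{
    req  = $2 + 0            # leading number of the ReqMem field, in GB
    sub(/M$/, "", $3)
    used = $3 / 1024         # MaxRSS in GB
    sum[$1] += used / req
    n[$1]++
  }
  END { for (u in n) printf "%s %.2f\n", u, sum[u] / n[u] }' |
  sort
```

A cron job could then feed such per-user ratios into some priority adjustment, e.g. via the nice value of pending jobs, though which knob to turn is exactly the open question of the post.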
[slurm-dev] Re: Multifactor Priority Plugin for Small clusters
Hi Sourabh, sourabh shinde <sourabhshinde2...@gmail.com> writes: > Multifactor Priority Plugin for Small clusters > > Hello Everyone, > > I am new to SLURM and trying to run it locally on my PC. I am using > Multifactor plugin to assign priorities for the job. The problem is > multi factor doesn’t work as needed on small clusters. I tried > assigning weightage to the factors as per my need but the scheduler > always schedule the job on FIFO basis. > > I am trying to find some alternative where making changes to the > priority plugin code could make it work on small clusters. > > for e.g > > If I have 12 nodes on my cluster, and if 3 users A,B and C with QOS > low, normal and high respectively submit their job for execution. I > want that SLURM should assign not all nodes to the User A. Atleast 1 > node should be assigned to the users B and C which are having low and > normal priority. how can I achieve this ? > > PS: Gang scheduling and preemption are not possible in my case. > > Any help would be appreciated. > > Thanks in advance. > > Regards, > Sourabh Shinde I don't think you can achieve what you want with Fairshare and Multifactor Priority. Fairshare looks at distributing resources fairly between users over a *period* of time. At any *point* in time it is perfectly possible for all the resources to be allocated to one user. It is only over time that the allocation of resources will average out to correspond to how you have configured the shares. If you only have a small amount of resources and a small number of users, this may not work very well. Have you looked at Gang scheduling without preemption? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Dry run upgrade procedure for the slurmdbd database
Hi Ole, We have also upgraded in place continuously from 2.2.4 to currently 16.05.10 without any problems. As I mentioned previously, it can be handy to make a copy of the statesave directory, once the daemons have been stopped. However, if you want to know how long the upgrade might take, then yours is a good approach. What is your use case here? Do you want to inform the users about the length of the outage with regard to job submission? Cheers, Loris Lachlan Musicman <data...@gmail.com> writes: > Re: [slurm-dev] Dry run upgrade procedure for the slurmdbd database > > We did it in place, worked as noted on the tin. It was less painful > than I expected. TBH, your procedures are admirable, but you shouldn't > worry - it's a relatively smooth process. > > cheers > L. > > -- > "Mission Statement: To provide hope and inspiration for collective action, to > build collective power, to achieve collective transformation, rooted in grief > and rage but pointed towards vision and dreams." > > - Patrisse Cullors, Black Lives Matter founder > > On 26 June 2017 at 20:04, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote: > > We're planning to upgrade Slurm 16.05 to 17.02 soon. The most critical step > seems to me to be the upgrade of the slurmdbd database, which may also take > tens of minutes. > > I thought it's a good idea to test the slurmdbd database upgrade locally on > a drained compute node in order to verify both correctness and the time > required. > > I've developed the dry run upgrade procedure documented in the Wiki page > https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm > > Question 1: Would people who have real-world Slurm upgrade experience kindly > offer comments on this procedure? > > My testing was actually successful, and the database conversion took less > than 5 minutes in our case. > > A crucial step is starting the slurmdbd manually after the upgrade. But how > can we be sure that the database conversion has been 100% completed? 
> > Question 2: Can anyone confirm that the output "slurmdbd: debug2: Everything > rolled up" indeed signifies that conversion is complete? > > Thanks, > Ole > > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?
Michael Jennings <m...@lanl.gov> writes: > On Thursday, 22 June 2017, at 04:19:04 (-0600), > Loris Bennett wrote: > >> rpmbuild --rebuild --with=slurm --without=torque pdsh-2.26-4.el6.src.rpm > > Remove the equals signs. I have no problems building pdsh 2.29 via: > > rpmbuild --rebuild --with slurm --without torque pdsh-2.29-1.el7.src.rpm > > for EL5, EL6, and EL7. Thanks, Michael. That did the trick. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?
Hi Ole, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes: > You may want to throw in a uniq command in case the user runs multiple jobs on > some nodes: > > # squeue -u user123 -h -o "%N" | tr '\n' , | xargs scontrol show > hostlistsorted > b[135,135,135] > > This gives a better list: > > # squeue -u user123 -h -o "%N" | uniq | tr '\n' , | xargs scontrol show > hostlistsorted > b135 > > BTW, if you enter a non-existent user, the output is an unexpected error > message > and a long help info :-) > > /Ole I have just realised that pdsh, which was what I wanted the consolidated list for, has a Slurm module, which knows about Slurm jobs. I followed your instructions here: https://wiki.fysik.dtu.dk/niflheim/SLURM#pdsh-parallel-distributed-shell with some modifications for EPEL6. However, in the 'rebuild' line rpmbuild --rebuild --with=slurm --without=torque pdsh-2.26-4.el6.src.rpm fails with --with=slurm: unknown option The page https://github.com/grondo/pdsh implies it should be rpmbuild --rebuild --with-slurm --without-torque pdsh-2.26-4.el6.src.rpm but this also fails: --with-slurm: unknown option Any ideas what I'm doing wrong? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?
Hi, Kent Engström <k...@nsc.liu.se> writes: > "Loris Bennett" <loris.benn...@fu-berlin.de> writes: >> Hi, >> >> I can generate a list of node lists on which the jobs of a given user >> are running with the following: >> >> $ squeue -u user123 -h -o "%N" >> node[006-007,014,016,021,024] >> node[012,094] >> node[005,008-011,013,015,026,095,097-099] >> >> I would like to merge these node lists to obtain >> >> node[005-016,021,024,026,094-095,097-099] >> >> I can do the following: >> >> $ squeue -u user123 -h -o "%N" | xargs -I {} scontrol show hostname {} | >> sed ':a;N;$!ba;s/\n/,/g' | xargs scontrol show hostlistsorted >> node[005-016,021,024,026,094-095,097-099] >> >> Would it be worth adding an option to allow the delimiter in the output >> of 'scontrol show hostname' to be changed from an newline to, say, a >> comma? That would permit easier manipulation of node lists without >> one having to google the appropiate sed magic. > > Hi, > > slighly off topic, but if you are willing to install and use an external > program that is not part of SLURM itself, I might perhaps be allowed to > advertise the python-hostlist package? > > Your example would be: > > squeue -u user123 -h -o "%N" | hostlist -c - > > (read as "Collapse several hostlist into one, and take the input from > stdin"). > > You find it at: > https://pypi.python.org/pypi/python-hostlist > https://www.nsc.liu.se/~kent/python-hostlist/ > > Best Regards, > -- > Kent Engström, National Supercomputer Centre > k...@nsc.liu.se, +46 13 28 In fact, I had already spotted your ad from 2010: https://groups.google.com/forum/#!topic/slurm-devel/n6x2WgGmDls but was wondering whether there might be any interest in having a more tightly integrated solution. I am not averse to installing an external program and I do rather like Python, despite having done most of my programming in Perl. However, my experience of installing Python software for users is that the package management is somewhat fragmented and brittle.
Nevertheless, I was able to install your package (with only minor moaning from pip) and it works fine. Thanks, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?
Hi Jens, Well golfed. I hadn't realised that 'hostlistsorted' will take multiple sorted lists and resort them. Cheers, Loris Jens Dreger <jens.dre...@physik.fu-berlin.de> writes: > I think > > squeue -u user123 -h -o "%N" | tr '\n' , | xargs scontrol show > hostlistsorted > > should also do it... Slightly better to remember ;) > > On Thu, Jun 22, 2017 at 02:59:11AM -0600, Loris Bennett wrote: >> >> Hi, >> >> I can generate a list of node lists on which the jobs of a given user >> are running with the following: >> >> $ squeue -u user123 -h -o "%N" >> node[006-007,014,016,021,024] >> node[012,094] >> node[005,008-011,013,015,026,095,097-099] >> >> I would like to merge these node lists to obtain >> >> node[005-016,021,024,026,094-095,097-099] >> >> I can do the following: >> >> $ squeue -u user123 -h -o "%N" | xargs -I {} scontrol show hostname {} | >> sed ':a;N;$!ba;s/\n/,/g' | xargs scontrol show hostlistsorted >> node[005-016,021,024,026,094-095,097-099] >> >> Would it be worth adding an option to allow the delimiter in the output >> of 'scontrol show hostname' to be changed from an newline to, say, a >> comma? That would permit easier manipulation of node lists without >> one having to google the appropiate sed magic. >> -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Controlling the output of 'scontrol show hostlist'?
Hi, I can generate a list of node lists on which the jobs of a given user are running with the following: $ squeue -u user123 -h -o "%N" node[006-007,014,016,021,024] node[012,094] node[005,008-011,013,015,026,095,097-099] I would like to merge these node lists to obtain node[005-016,021,024,026,094-095,097-099] I can do the following: $ squeue -u user123 -h -o "%N" | xargs -I {} scontrol show hostname {} | sed ':a;N;$!ba;s/\n/,/g' | xargs scontrol show hostlistsorted node[005-016,021,024,026,094-095,097-099] Would it be worth adding an option to allow the delimiter in the output of 'scontrol show hostname' to be changed from a newline to, say, a comma? That would permit easier manipulation of node lists without one having to google the appropriate sed magic. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
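For anyone reading along without a Slurm installation to hand, the expansion step that 'scontrol show hostname' performs can be approximated in plain shell. This is only a sketch for a single "prefix[ranges]" pattern (GNU seq is assumed for the zero-padding) and is no substitute for scontrol's full hostlist grammar:

```shell
expand_hostlist() {
  # Expand e.g. node[005-007,014] into one hostname per line -- a rough
  # stand-in for `scontrol show hostname`, handling only one bracket group.
  prefix=${1%%\[*}
  if [ "$prefix" = "$1" ]; then
    echo "$1"              # no brackets: already a plain hostname
    return
  fi
  body=${1#*\[}
  body=${body%\]}
  echo "$body" | tr ',' '\n' | while IFS=- read -r a b; do
    seq -w "$a" "${b:-$a}" | sed "s/^/$prefix/"
  done
}

expand_hostlist 'node[005-007,014]'
```

Here `expand_hostlist 'node[005-007,014]'` prints node005, node006, node007 and node014, one per line.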
[slurm-dev] Re: ExitCode 139
Hi Djibril, Djibril Mboup <djibril.mb...@aims-senegal.org> writes: > Re: [slurm-dev] Re: ExitCode 139 > > I see M Loris, but I don't know why I got this error. Does it mean I don't > have enough memory to execute my code? You can see my batch script below: > > #SBATCH --partition=x > #SBATCH --account=x > #SBATCH --nodes=2 > #SBATCH --ntasks=4 > #SBATCH --cpus-per-task=20 > #SBATCH --time=01:00:00 > #SBATCH --exclusive > > srun hostname -s| sort -u > mpd.hosts > > mpiexec.hydra -f mpd.hosts -perhost $nb_cpu -n $SLURM_NTASKS ./code -c > config.info I can't tell how much memory your program needs just by looking at the batch script. It depends on what the program does and, possibly, what parameters you pass to it in 'config.info'. The error could be due to the program running out of memory, but it could also be due to your program doing something wrong, such as trying to write beyond the bounds of an array. This is probably unrelated, but the value of --cpus-per-task is quite high. Do the nodes have 20 CPUs each? Cheers, Loris > On 21 June 2017 at 05:45, Loris Bennett <loris.benn...@fu-berlin.de> wrote: > > Hi Djibril, > > Djibril Mboup <djibril.mb...@aims-senegal.org> writes: > > > ExitCode 139 > > > > Hello, > > Since yesterday, I have got an error after submitting a job. The exit code > 139:0 remind you something. > > Thanks > > Try searching for "exit code 139" with your favourite search engine. > You will find that it indicates that your program experienced a > segmentation fault. > > Cheers, > > Loris > > -- > Dr. Loris Bennett (Mr.) > ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de > > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: ExitCode 139
Hi Djibril, Djibril Mboup <djibril.mb...@aims-senegal.org> writes: > ExitCode 139 > > Hello, > Since yesterday, I have got an error after submitting a job. The exit code > 139:0 remind you something. > Thanks Try searching for "exit code 139" with your favourite search engine. You will find that it indicates that your program experienced a segmentation fault. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
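The 139 is simply the shell's encoding of the fatal signal: 128 plus the signal number, and SIGSEGV is signal 11. This can be checked without a real crash:

```shell
# A child process that dies from SIGSEGV (signal 11) is reported by the
# parent shell as exit status 128 + 11 = 139 -- the same status Slurm
# records for the job.
sh -c 'kill -SEGV $$' 2>/dev/null
echo $?   # prints 139
```

So an ExitCode of 139:0 means the user's program segfaulted; Slurm is only the messenger.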
[slurm-dev] Re: Long delay starting slurmdbd after upgrade to 17.02
Hi Ole, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes: > On 06/20/2017 04:32 PM, Loris Bennett wrote: >> We do our upgrades while full production is up and running. We just stop >> the Slurm daemons, dump the database and copy the statesave directory >> just in case. We then do the update, and finally restart the Slurm >> daemons. We only lost jobs once during an upgrade back around 2.2.6 or >> so, but that was due a rather brittle configuration provided by our >> vendor (the statesave path contained the Slurm version), rather than >> Slurm itself and was before we had acquired any Slurm expertise >> ourselves. > > 1. When you refer to "daemons", do you mean slurmctld, slurmdbd as well as > slurmd on all compute nodes? AFAIK, the recommended procedure upgrading and > restarting in this order: 1) slurmdbd, 2) slurmctld, 3) slurmd on nodes. We don't stop slurmd on the nodes. The nodes only get the new Slurm version on the next reboot. The documentation mentions the possibility of this kind of rolling upgrade and we haven't had any problems with it. > 2. When you mention statesave, I suppose this is what you refer to: > # scontrol show config | grep -i statesave > StateSaveLocation = /var/spool/slurmctld Yes, that's right. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Long delay starting slurmdbd after upgrade to 17.02
Hi Nick, We do our upgrades while full production is up and running. We just stop the Slurm daemons, dump the database and copy the statesave directory just in case. We then do the update, and finally restart the Slurm daemons. We only lost jobs once during an upgrade back around 2.2.6 or so, but that was due a rather brittle configuration provided by our vendor (the statesave path contained the Slurm version), rather than Slurm itself and was before we had acquired any Slurm expertise ourselves. Paul: How do you pause the jobs? SIGSTOP all the user processes on the cluster? Cheers, Loris Paul Edmon <ped...@cfa.harvard.edu> writes: > If you follow the guide on the Slurm website you shouldn't have many > problems. We've made it standard practice here to set all partitions to DOWN > and suspend all the jobs when we do upgrades. This has led to > far greater stability. So we haven't lost any jobs in an upgrade. The only > weirdness we have seen is if jobs exit while the DB upgrade is going. > Sometimes it can leave residual jobs in the DB that weren't properly closed > out. This is why we pause all the jobs as it makes it such that we don't end > up with jobs exiting before the DB is back. In 16.05+ you have the: > > sacctmgr show runawayjobs > > Feature which can clean up all those orphan jobs. So its not as much a > concern anymore. > > Beyond that we follow the guide at the bottom of this page: > > https://slurm.schedmd.com/quickstart_admin.html > > I haven't tried going two major versions at once though. The docs indicate > that it should work fine. We generally try to keep pace with current stable. > > Given that you only have 100,000 jobs your upgrade should probably go fairly > quick. I could imagine around 10-15 minutes. Our DB has several million jobs > and it takes about 30 min to an hour depending on what > operations are being done. 
> > -Paul Edmon- > > On 06/20/2017 09:37 AM, Nicholas McCollum wrote: > > I'm about to update 15.08 to the latest SLURM in August and would appreciate > any notes you have on the process. > > I'm especially interested in maintaining the DB as well as associations. I'd > also like to keep the pending job list if possible. > > I've only got around 100,000 jobs in the DB so far, since January. > > Thanks > > Nick McCollum > Alabama Supercomputer Authority > > On Jun 20, 2017 8:07 AM, Paul Edmon <ped...@cfa.harvard.edu> wrote: > > Yeah, that sounds about right. Changes between major versions can take > quite a bit of time. In the past I've seen upgrades take 2-3 hours for > the DB. > > As for ways to speed it up. Putting the DB on newer hardware if you > haven't already helps quite a bit (depends on architecture as to how > much gain you will get, we went from AMD Abu Dhabi to Intel Broadwell > and saw a factor of 3-4 speed improvement). Upgrading to the latest > version of MariaDB if you are on an old version of MySQL can get you > about 30-40%. > > Doing all of these whittled our DB upgrade times for major upgrades to > about 30 min or so. > > Beyond that I imagine some more specific DB optimization tricks could be > done, but I'm not a DB admin so I won't venture to say. > > -Paul Edmon- > > On 06/20/2017 08:42 AM, Tim Fora wrote: > > Hi, > > > > Upgraded from 15.08 to 17.02. It took about one hour for slurmdbd to > > start. Logs show most of the time was spent on this step and other table > > changes: > > > > adding column admin_comment after account in table > > > > Does this sound right? Any ideas to help things speed up. > > > > Thanks, > > Tim > > > > > > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
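Regarding pausing the jobs: Slurm itself can do this with 'scontrol suspend' and 'scontrol resume' (real subcommands), which is presumably what Paul means rather than raw SIGSTOPs. A hedged sketch that only *generates* the commands, so they can be inspected before being piped into sh; the job IDs are invented, and a live run would feed the pipeline `squeue -h -t RUNNING -o %A` instead:

```shell
# Turn a list of running job IDs into scontrol suspend commands.
# The three IDs here are examples standing in for squeue output.
printf '%s\n' 1001 1002 1007 |
  awk '{ print "scontrol suspend " $1 }'
```

Swapping "suspend" for "resume" after the upgrade would undo it; generating rather than executing the commands leaves room to double-check before touching production jobs.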
[slurm-dev] Re: Long delay starting slurmdbd after upgrade to 17.02
Hi Tim, Tim Fora <tf...@riseup.net> writes: > Hi, > > Upgraded from 15.08 to 17.02. It took about one hour for slurmdbd to > start. Logs show most of the time was spent on this step and other table > changes: > > adding column admin_comment after account in table > > Does this sound right? Any ideas to help things speed up. It probably depends a great deal on how many entries you have in your database and what sort of hardware you have. We are up to around 1.6 million jobs and have never purged anything. I seem to remember the last update between major releases taking long enough to allow me to get slightly uneasy, but not long enough for me to really worry, so I guess it was probably around 10-15 minutes. Our CPUs are around 6 years old, but the DB is on an SSD. HTH Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Can't get formatted sinfo to work...
Hi Mehmet, "Belgin, Mehmet" <mehmet.bel...@oit.gatech.edu> writes: > I’m troubleshooting an issue that causes NHC to fail to offline a bad > node. The node offline script uses formatted “sinfo" to identify the > node status, which returns blank for some reason. Interestingly, sinfo > works without custom formatting. > > Could this be due to a bug in the current version (17.02.4)? Would > someone mind trying the following commands in an older Slurm version > to compare the output? > > [root@devel-vcomp1 nhc]# sinfo --version > slurm 17.02.4 > > [root@devel-vcomp1 nhc]# sinfo -o '%t %E' -hn `hostname` > > (NOTHING!) > > [root@devel-vcomp1 nhc]# sinfo -hn `hostname` > test up infinite 0 n/a > vtest* up infinite 0 n/a > > (OK) > > Thanks! > > -Mehmet > Seem to work as expected with our version:

[root@node003 ~]# sinfo --version
slurm 16.05.10-2

[root@node003 ~]# sinfo -o '%t %E' -hn `hostname`
mix none

[root@node003 ~]# sinfo -hn `hostname`
test     up     3:00:00  0  n/a
main*    up  14-00:00:0  1  mix  node003
gpu      up  14-00:00:0  0  n/a

HTH, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Slurm accounting problem with GPFS
> Am 09.06.2017 um 12:02 schrieb Loris Bennett: >> >> Hi Marcel, >> >> Marcel Sommer <marcelsommer...@gmail.com> writes: >> >>> Slurm accounting problem with GPFS >>> >>> Hi, >>> >>> we are running slurm 2.6.5 and we have a master and a backup >>> controller configuration. We use the filetxt plugin and the accounting >>> logfile is stored in a folder on a central filesystem (GPFS). >>> >>> The problem that we have is that the accounting stuck after a couple >>> of days. When I restart the slurm daemon it works fine but a few days >>> later the problem comes again. >>> >>> Have you any suggestions? >>> >>> Cheers, >>> Marcel >> >> The combination of GPFS, filetxt, and, in particular, such an old version >> of Slurm is probably quite rare, so I suspect not many people will be >> able to help you. Unless, that is, it is a known problem, in which case >> it has probably been fixed in a later version. In addition, it now says >> the following on the Slurm download page: >> >> Due to a security vulnerability (CVE-2016-10030), all versions of >> Slurm prior to 15.08.13 or 16.05.8 are no longer available. >> >> So you need to do an update anyway. And as the intermediate versions >> are now no longer available, you basically just need to set up Slurm >> again from scratch. >> >> Sorry about that, >> >> Loris >> "marcelsommer...@gmail.com" <marcelsommer...@gmail.com> writes: > Hi Loris, > > thank you for the quick reply. Unfortunately this old version comes from > Ubuntu 14.04 what we have installed on the nodes. > > OK, the easier solution for us is to try to install an newer slurm > version on this OS. > > ...or can a slurmdbd backend solve this problem? Hard to say, but even if it does fix this problem, you will still be using a very old version with a serious security problem (although if you are not using a prolog script, you won't be affected). If I were you, I would install a fairly current version. 
You will get a lot of new functionality and more people on the mailing list will be able to help you if you do have any issues. But, of course, it's your call. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Slurm accounting problem with GPFS
Hi Marcel, Marcel Sommer <marcelsommer...@gmail.com> writes: > Slurm accounting problem with GPFS > > Hi, > > we are running slurm 2.6.5 and we have a master and a backup > controller configuration. We use the filetxt plugin and the accounting > logfile is stored in a folder on a central filesystem (GPFS). > > The problem that we have is that the accounting stuck after a couple > of days. When I restart the slurm daemon it works fine but a few days > later the problem comes again. > > Have you any suggestions? > > Cheers, > Marcel The combination of GPFS, filetxt, and, in particular, such an old version of Slurm is probably quite rare, so I suspect not many people will be able to help you. Unless, that is, it is a known problem, in which case it has probably been fixed in a later version. In addition, it now says the following on the Slurm download page: Due to a security vulnerability (CVE-2016-10030), all versions of Slurm prior to 15.08.13 or 16.05.8 are no longer available. So you need to do an update anyway. And as the intermediate versions are now no longer available, you basically just need to set up Slurm again from scratch. Sorry about that, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: understanding of Purge in Slurmdb.conf
Hi Rohan, Rohan Gadalkar <rohangadal...@gmail.com> writes: > understanding of Purge in Slurmdb.conf > > Hello Slurm Team, > > I am new-bee to the world of SLURM. I was going through the > Slurmdb.conf page, where I came across the PurgeEventAfter etc. as > mentioned below. > > Unable to understand the below things, as in each of the below most of > the lines are copied. > > I would request you to share any kind of diagrammatic explanation > which will clear the confusion in understanding this topic. > > Below is the link and topics which I want you to explain for me. > > https://slurm.schedmd.com/slurmdbd.conf.html > > PurgeEventAfter;PurgeJobAfter;PurgeResvAfter;PurgeStepAfter; > PurgeSuspendAfter;PurgeTXNAfter;PurgeUsageAfter > > Looking forward to your KB, as it will help me and my colleagues to > understand this. You really need to be more specific about what you don't understand. The documentation you refer to seems to me to be fairly clear. As described, the parameters just allow you to set various time periods after which various types of entries in the database will be purged. I'm not sure how a diagram would help in this case. Regards Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
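As a concrete illustration of what those parameters do, a slurmdbd.conf fragment might look like the following. The parameter names are the real ones from the man page; the retention periods are arbitrary examples, not recommendations:

```
PurgeEventAfter=1month      # node up/down/drain events
PurgeJobAfter=12months      # job accounting records
PurgeStepAfter=1month       # per-step records, usually the bulkiest table
PurgeResvAfter=1month       # reservation records
PurgeSuspendAfter=1month    # suspend/resume intervals
PurgeTXNAfter=12months      # accounting transactions (sacctmgr changes)
PurgeUsageAfter=24months    # hourly/daily/monthly usage rollups
```

Each record type is deleted once it is older than the given period; leaving a parameter unset means the corresponding records are kept forever.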
[slurm-dev] Re: srun - replacement for --x11?
Edward Walter <ewal...@cs.cmu.edu> writes: > On 06/06/2017 05:29 AM, Loris Bennett wrote: >> >> Hi, >> >> We used to tell users that they could specify the '--x11' option >> to run a graphical application interactively within a Slurm job. >> With version 16.05.10-2 this option is no longer available. >> >> Is the canonical solution now to use the scripts given here: >> >>https://slurm.schedmd.com/faq.html#terminal >> >> (or one of the various modifications/forks)? >> > Doesn't that functionality come from a spank plugin? > https://github.com/hautreux/slurm-spank-x11 > > Hope that helps. > > -Ed It may well do, but the last commit is from 11th December 2014. Up to now I thought '--x11' was the shiny new replacement for the SPANK plugin. I have tried https://github.com/jabl/sinteractive.git but it didn't really work for me: $ sinteractive Waiting for JOBID 1551288 to start No screen session found. No screen session found. No screen session found. No screen session found. There is no screen to be detached matching slurm1551288. Connection to node001 closed. I guess I'll just have to try out some of the other forks, etc. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] srun - replacement for --x11?
Hi, We used to tell users that they could specify the '--x11' option to run a graphical application interactively within a Slurm job. With version 16.05.10-2 this option is no longer available. Is the canonical solution now to use the scripts given here: https://slurm.schedmd.com/faq.html#terminal (or one of the various modifications/forks)? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Wrong Python version used in batch MPI job
Hi, My system Python is version 2.6.6. Using RedHat Software Collections I have successfully built a program using Python 3.5.1 and Intel's MPI. When I run a job with scl enable python35 bash module load gpaw/test gpaw -P 4 test via Slurm, I get the following error: File "/cm/shared/apps/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpiexec", line 187 except EOFError, e: ^ SyntaxError: invalid syntax This is because the mpiexec script is written for Python 2 but is being interpreted by Python 3. Has anyone had a similar issue and come up with a solution? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
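The underlying mechanism can be demonstrated without Slurm or SCL at all. If mpiexec's shebang is something like `#!/usr/bin/env python` (an assumption; it may also hard-code a path), then whichever `python` is first in PATH wins, and `scl enable python35` prepends a Python 3 bin directory. A self-contained illustration with a fake interpreter:

```shell
# Put a fake "python" earlier in PATH and show that `env python`
# resolves to it -- the same way scl's PATH change can redirect a
# `#!/usr/bin/env python` script from Python 2 to Python 3.
dir=$(mktemp -d)
printf '#!/bin/sh\necho fake-python\n' > "$dir/python"
chmod +x "$dir/python"
PATH="$dir:$PATH" /usr/bin/env python   # prints fake-python
rm -rf "$dir"
```

One workaround might therefore be to invoke mpiexec with its interpreter pinned explicitly (e.g. `python2 $(which mpiexec)`), or to enter the scl environment only inside the job step that needs Python 3, though both are untested suggestions.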
[slurm-dev] Re: Multinode MATLAB jobs
Hi Benjamin, Benjamin Redling <benjamin.ra...@uni-jena.de> writes: > Hi, > > Am 31.05.2017 um 10:39 schrieb Loris Bennett: >> Does any one know whether one can run multinode MATLAB jobs with Slurm >> using only the Distributed Computing Toolbox? Or do I need to be >> running a Distributed Computing Server too? > > if you can get a hand on the overpriced and underwhelming DCS (at least > up to the 2016b Linux variant the mdce service neither has startup > scripts with LSB tags, nor systemd units; only the very first annoyance), > the following might be a consolation: > " > Access to all eligible licensed toolboxes or blocksets with a single > server license on the distributed computing resource > " > https://www.mathworks.com/products/distriben/features.html > > > (We currently use DCS without Slurm integration and thous are bad > citizens considering the license pool we have to share. > But running DCS without scheduler integration is bad in many ways. e.g. > proper security levels don't cooperate with plain LDAP, default security > runs job as root [hello, inaccessible NFS shares] so it seems users > either start single node parallel jobs apart from DCS or DCS-Slurm > integration is mandatory and you get all the benefits -- license count, > security level, multi-node) Thanks for the information. Currently we only have one user wanting to run jobs on more cores than we have on individual nodes and I also need to check how his code scales before shelling out for DCS. We are also bad citizens in that we allow some interactive and batch usage of MATLAB licenses from the same pool. To stop jobs failing because the licenses are all in use, jobs have to specify a reservation containing the available licenses. This is updated regularly by a cron job which parses the output of the license manager. It works, but it's a bit of a nasty hack. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Multinode MATLAB jobs
John DeSantis <desan...@usf.edu> writes: > Loris, > >>> Does any one know whether one can run multinode MATLAB jobs with Slurm > > I completely missed the _multinode_ part. Feel free to ignore, and sorry to > all for the noise in > the list! No problem. The bit about having separate cluster profiles was new to me, so I still learned something :-) Loris > John DeSantis > > John DeSantis wrote: >> >> Loris, >> >>> Does any one know whether one can run multinode MATLAB jobs with Slurm >>> using only the >>> Distributed Computing Toolbox? Or do I need to be running a Distributed >>> Computing Server >>> too? >> >> Our users are able to use only the Distributed Computing Toolbox by ensuring >> that they: >> >> 1.) Request a single node with the desired number of processors per parpool >> [0]; 2.) Ensure >> that a separate cluster profile is created with each job. >> >> By taking the two steps above, users can submit multiple jobs without MATLAB >> crashing stating >> that a pool is already open. >> >> [0] Nodes in our cluster depending on their age have between 12-24 >> processors available. If >> a user wants a parpool of 24, they must request either a constraint or a >> combination of -N 1 >> and --ntasks-per-node=24, for example. >> >> HTH, John DeSantis >> >> Loris Bennett wrote: >> >>> Hi, >> >>> Does any one know whether one can run multinode MATLAB jobs with Slurm >>> using only the >>> Distributed Computing Toolbox? Or do I need to be running a Distributed >>> Computing Server >>> too? >> >>> Cheers, >> >>> Loris >> >> > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Multinode MATLAB jobs
Hi, Does anyone know whether one can run multinode MATLAB jobs with Slurm using only the Distributed Computing Toolbox? Or do I need to be running a Distributed Computing Server too? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] RE: Slurm job priorities
equals 2**63, i.e. 1 larger than the largest signed 64-bit integer, which looks like some sort of overflow or type mismatch. Unless anyone else has any ideas, I would be tempted to say that your database is borked and you need to start over again. Sorry not to be more helpful :-( Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] RE: Slurm job priorities
Hi David, Baker D.J. <d.j.ba...@soton.ac.uk> writes: > Hi Loris, > > Thank you for your reply. Below is the output from the sprio -n command -- > David > > [djb1@blue34 slurm]$ sprio -n
> JOBID  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS
> 25992  0.0717171  nan  0.0010979  1.000  0.000
> 25993  0.0717113  nan  0.0010979  1.000  0.000
> 25994  0.0716807  nan  0.0010979  1.000  0.000
> 25995  0.0716741  nan  0.0010979  1.000  0.000
> 25996  0.0716667  nan  0.0010979  1.000  0.000
> 25997  0.0716592  nan  0.0010979  1.000  0.000
> 25999  0.0104456  nan  0.0005946  1.000  0.000
> 26000  0.0102257  nan  0.0005946  1.000  0.000
> 26001  0.0098041  nan  0.0010979  1.000  0.000
> 26003  0.0095379  nan  0.0005946  1.000  0.000
> 26004  0.0094436  nan  0.0005946  1.000  0.000
> 26005  0.0094114  nan  0.0005946  1.000  0.000
> 26006  0.0093742  nan  0.0005946  1.000  0.000
> 26007  0.0091526  nan  0.0005946  1.000  0.000
> 26008  0.0091154  nan  0.0005946  1.000  0.000
> 26009  0.0090832  nan  0.0005946  1.000  0.000
> 26010  0.0087988  nan  0.0005946  1.000  0.000
> 26011  0.0087054  nan  0.0005946  1.000  0.000
> 26012  0.0086119  nan  0.0005946  1.000  0.000
> 26014  0.0054638  nan  0.0005946  1.000  0.000
> 26016  0.0026513  nan  0.0005946  1.000  0.000
> 25988  0.0717221  nan  0.0010979  1.000  0.000
What about sshare -la ? That should show you something about how the fairshare value is calculated from the raw shares and the CPU usage. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] RE: Slurm job priorities
Hi David, Baker D.J. <d.j.ba...@soton.ac.uk> writes: > Hi Loris, > > Thank you again for your comments. I thought that I understood the > situation better, and I have followed your basic model re setting up > shares. So, for example, I have users in the group "research" and > added shares accordingly...see below. In other words all the users in > the research group have an equal share of the pie. On the other hand I > see that "sprio" is still reporting "nan" for the fairshare. Have I > missed something fundamental here? I even waited some time before > submitting another test job, however the situation was unchanged. > > Some clues would really be appreciated, please. > > Best regards, > David > > [djb1@blue34 slurm]$ sacctmgr list assoc tree format=account,user,fairshare
> Account    User      Share
> ---------- --------- -----
> root                 1
> root       root      1
> gpuusers             2
> gpuusers   djb1      1
> gpuusers   hpc       1
> research             15
> research   ab24g12   1
> research   cica1d14  1
> research   djb1      1
> research   dpm1u13   1
> research   gtj1y12   1
> research   hpc       1
> research   icw       1
> research   jag1g13   1
> research   jec1f12   1
> research   lmr1u16   1
> research   mb1a10    1
> research   mjp1m12   1
> research   ph1m12    1
> research   srw1g10   1
> research   tp1v09    1
>
> [djb1@blue34 slurm]$ sprio -l
> JOBID  USER     PRIORITY    AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE  TRES
> 25992  mjp1m12  -922337203  63   nan        1        1000       0    0
> 25993  mjp1m12  -922337203  63   nan        1        1000       0    0
> 25994  mjp1m12  -922337203  63   nan        1        1000       0    0
> 25995  mjp1m12  -922337203  63   nan        1        1000       0    0
> 25996  mjp1m12  -922337203  63   nan        1        1000       0    0
> 25997  mjp1m12  -922337203  63   nan        1        1000       0    0
> 25988  mjp1m12  -922337203  63   nan        1        1000       0    0
> 25999  djb1     -922337203  2    nan        1        1000       0    0
> 26000  djb1     -922337203  2    nan        1        1000       0    0
>
What does sprio -n show (this shows the normalised, i.e. unweighted, priority factors)? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] RE: Slurm job priorities
Hi David, Baker D.J. <d.j.ba...@soton.ac.uk> writes: > Hi Loris, > > Thank you for your reply. The output from "sprio -l" is: >
> JOBID  USER     PRIORITY    AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE  TRES
> 25988  mjp1m12  -922337203  2    nan        1        1000       0    0
> 25992  mjp1m12  -922337203  2    nan        1        1000       0    0
> 25993  mjp1m12  -922337203  2    nan        1        1000       0    0
> 25994  mjp1m12  -922337203  2    nan        1        1000       0    0
> 25995  mjp1m12  -922337203  2    nan        1        1000       0    0
> 25996  mjp1m12  -922337203  2    nan        1        1000       0    0
> 25997  mjp1m12  -922337203  2    nan        1        1000       0    0
>
> I've also attached a copy of our slurm.conf, if that helps. Any advice that > you could give us would be appreciated. A value of 'nan' for 'FAIRSHARE' is not what you want. I suspect you haven't set up any shares. What does the following produce? sacctmgr list assoc tree format=account,user,fairshare For me this looks something like:
Account    User    Share
---------- ------- -----
root               1
root       root    1
bcp                169
biology            15
group01            3
group01    alice   1
group01    bob     1
group01    carol   1
group02            1
group02    dave    1
...
For each user and account you need to set up the shares. Check the official 'sacctmgr' page: https://slurm.schedmd.com/sacctmgr.html Ole Holm Nielsen also has some helpful information on the following page: https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting In general it can be a bit of a faff setting up and maintaining shares. All our users have equal shares and only belong to one account, so when we add a user, we just automatically increment all the shares up to the top of the hierarchy and decrement correspondingly when the user is deleted. HTH Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Slurm job priorities
Hi David, Baker D.J. <d.j.ba...@soton.ac.uk> writes: > Hello, > > I guess that may be a simple question for someone more experienced with slurm > scheduling than us. When jobs are queuing in our cluster we find that we get > a lot of these messages in our slurmctld.log > > error: Job 25766 priority exceeds 32 bits > > I cannot find any mention or discussion of this type of error in the mailing > list archives, and so I wondered if someone could please explain how to > prevent these errors. We have tried reducing the fair share > component to no avail…. > > PriorityWeightAge=1000 > > PriorityWeightFairshare=10 > > PriorityWeightJobSize=1000 > > PriorityWeightPartition=1000 > > PriorityWeightQOS=1 # don't use the qos factor > > Best regards, > > David You would need to show us a little more information. The weights are just that - weights. If you had, say, a partition with a very large priority, then multiplying it by 1000 could push the total priority over the size of a 32-bit integer. What kinds of values does 'sprio -l' show? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Nodes in state 'down*' despite slurmd running
Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes: > On 04/05/2017 03:59 PM, Loris Bennett wrote: > >> We are running 16.05.10-2 with power-saving. However, we have noticed a >> problem recently when nodes are woken up in order to start a job. The >> node will go from 'idle~' to, say, 'mixed#', but then the job will fail >> and the node will be put in 'down*'. We have turned up the log level to >> 'debug' with the DebugFlag 'Power', but this hasn't produced anything >> relevant. The problem is, however, resolved if the node is rebooted. >> >> Thus, there seems to be some disturbance of the communication between >> the slurmd on the woken node and the slurmctld on the administration >> node. Does anyone have any idea what might be going on? > > We have seen something similar with Slurm 16.05.10. > > How many nodes are in your network? If there are more than about 400 devices > in > the network, you must tune the kernel ARP cache of the slurmctld server, see > https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-arp-cache-for-large-networks Thanks for the link, but we have fewer than 120 nodes, so we are a long way from the 512-device limit. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Nodes in state 'down*' despite slurmd running
Hi, We are running 16.05.10-2 with power-saving. However, we have noticed a problem recently when nodes are woken up in order to start a job. The node will go from 'idle~' to, say, 'mixed#', but then the job will fail and the node will be put in 'down*'. We have turned up the log level to 'debug' with the DebugFlag 'Power', but this hasn't produced anything relevant. The problem is, however, resolved if the node is rebooted. Thus, there seems to be some disturbance of the communication between the slurmd on the woken node and the slurmctld on the administration node. Does anyone have any idea what might be going on? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: MaxSubmitPU
Hi Danny, Danny Marc Rotscher <danny.rotsc...@tu-dresden.de> writes: > Hello, > > when I want to add the MaxSubmitPU parameter to one of my qos, it fails with > the following error output: > > sacctmgr modify qos where name=interactive set MaxSubmitPU=1 > Unknown option: MaxSubmitPU=1 > Use keyword 'where' to modify condition > > Does anybody have a solution for my problem? > > Kind regards, > Danny Looking at the man page, but not having tried it out, I would guess that it should be sacctmgr modify qos where name=interactive set MaxSubmitJobsPerUser=1 The shortened form 'MaxSubmitPU' is probably just used for display. HTH Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] error: chdir(/var/log): Permission denied
Hi, I've updated to 16.05.9 and everything seems to be working fine. However, when slurmctld is started, in the file /var/log/slurmctld I get the error [2017-03-03T10:45:13.096] error: chdir(/var/log): Permission denied As I say, everything seems to be working, so is this error an, er, error? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Slurm version 17.02.0 is now available
Danny Auble <d...@schedmd.com> writes: > After 9 months of development we are pleased to announce the availability of > Slurm version 17.02.0. > > A brief description of what is contained in this release and other notes about > it is contained below. For a fuller description please consult the > RELEASE_NOTES file available in the source. > > Thanks to all involved! > > Slurm downloads are available from https://schedmd.com/downloads.php. This link currently (09:50 CET) just returns the following: [an error occurred while processing this directive] Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Power outage causes wrong reports
Hi Lucas, Lucas Vuotto <l.vuott...@gmail.com> writes: > Hi all, > sreport was showing that a user was using more CPU hours per week > than available. After checking the output of sacct, we found that some > jobs from an array didn't end: > > $ sacct -j 69204 -o jobid%-14,state%6,start,elapsed,end >
> JobID           State   Start                Elapsed      End
> --------------  ------  -------------------  -----------  -------------------
> 69204_[1-1000]  FAILED  2016-11-09T17:46:50  00:00:00     2016-11-09T17:46:50
> 69204_1         FAILED  2016-11-09T17:46:44  71-20:25:55  Unknown
> 69204_2         FAILED  2016-11-09T17:46:44  71-20:25:55  Unknown
> [...]
> 69204_295       FAILED  2016-11-09T17:46:46  71-20:25:53  Unknown
> 69204_296       FAILED  2016-11-09T17:46:46  71-20:25:53  Unknown
> 69204_297       FAILED  2016-11-09T17:46:46  00:00:00     2016-11-09T17:46:46
> [...]
> 69204_999       FAILED  2016-11-09T17:46:50  00:00:00     2016-11-09T17:46:50
> > It seems that somehow those jobs got stuck (~72 days after > 2016-11-09 is today, 2017-01-20, and that's why the reports are wrong). > scancel says that 69204 is an invalid job id. > > Any idea on how to fix this? We're thinking about deleting the entries > of those jobs in the DB. Is it safe to run "arbitrary" commands in the > DB, bypassing slurmdbd? > > Thanks in advance. The following might also be useful: https://groups.google.com/d/msg/slurm-devel/nf7JxV91F40/KUsS1AmyWRYJ The code heuristically decides how to deal with inconsistencies in the database and produces an SQL script to fix them as well as a second script to roll back the changes. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Permissible updates
Hi, Some years ago, I had a problem with understanding when updates are possible: https://groups.google.com/forum/#!topic/slurm-devel/CNu9iDbQl7U As then, the documentation says Slurm permits upgrades of up to two major or minor updates (e.g. 14.03.x or 14.11.x to 16.08.x) without loss of jobs or other state information I still read this as "two major or *two* minor updates", even though I know that's not what's meant. I think it would be clearer to write: Slurm permits upgrades between any two versions whose major release numbers differ by two or less (e.g. 14.11.x or 15.08.x to 16.05.x) without loss of jobs or other state information I have updated Slurm a few times already and I am a native English speaker, but I still stumble over the current wording. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Standard suspend/resume scripts?
Lachlan Musicman <data...@gmail.com> writes: > Re: [slurm-dev] Standard suspend/resume scripts? > > If you are looking to suspend and resume jobs, use scontrol: > > scontrol suspend > scontrol resume > > https://slurm.schedmd.com/scontrol.html > > The docs you are pointing to look more like taking nodes offline in times of > low usage? Yes, because that's what I'm interested in ;-) Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Standard suspend/resume scripts?
Hi, I was looking around on the web for standard scripts to use for SuspendProgram and ResumeProgram, but didn't find much other than the following: https://slurm.schedmd.com/power_save.html Would 'node_shutdown' need to do much more than ssh $host shutdown -P now and 'node_start' more than something like ipmitool -H $host chassis power on ? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
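[Editor's note: a minimal sketch of what such scripts might look like, based only on the commands suggested above. One assumption worth noting: Slurm passes SuspendProgram/ResumeProgram a hostlist expression (e.g. "node[001-004]"), which a real script would first expand with `scontrol show hostnames`; here plain hostnames are assumed, and with DRYRUN=1 (the default below) the commands are only printed, never executed.]

```shell
#!/bin/bash
# Hypothetical node_shutdown / node_start helpers. With DRYRUN=1 (default
# here) the ssh/ipmitool commands are printed instead of run, so the logic
# can be exercised without touching any hardware.
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi
}

node_shutdown() {
    # Power off each node via an in-band shutdown.
    for host in "$@"; do
        run ssh "$host" shutdown -P now
    done
}

node_start() {
    # Power on each node via its BMC.
    for host in "$@"; do
        run ipmitool -H "$host" chassis power on
    done
}

node_shutdown node001
node_start node001
```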
[slurm-dev] Re: sacctmgr case insensitive
Hi Daniel, Daniel Ruiz Molina <daniel.r...@caos.uab.es> writes: > Hi, > > I'm adding users to accounts in accounting information. However, some users in > my > system have capital letters and when I try to add them to their account, > sacctmgr returns this message: "There is no uid for user 'MY_USER' Are you > sure > you want to continue?". > Then, if I click "y", the user is added to its account but its name has been > changed to all lower case (I could check with "sacctmgr list user" and > "sacctmgr > list account"), so I suppose there is no relationship between the real user (with > capital letters) and the user "modified" in sacctmgr. > > How could I solve this (avoiding, of course, changing user names in the system)? > > Thanks. > As it says in the man page for 'sacctmgr': user The login name. Only lowercase usernames are supported. If you are importing the usernames from another system, you could filter them in some way. We import from a central university LDAP server to our own LDAP server and can thus tweak the attributes or add attributes, such as 'loginShell'. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Setting a partition QOS, etc
Hi David, Baker D.J. <d.j.ba...@soton.ac.uk> writes: > Hello, > > This is hopefully a very simple set of questions for someone. I’m evaluating > slurm with a view to replacing our existing torque/moab system, and I’ve been > reading about defining partitions and QoSs. I like the idea of being able to > use > a QoS to throttle user activity -- for example to set maxcpus/user, > maxjobs/user > and maxnodes/user, etc, etc. Also I’m going to define a very simple set of > partitions to reflect the different types of nodes in the cluster. For example > > Batch – normal compute nodes > > Highmem – high memory nodes > > Gpu – gpu nodes We have a similar range of hardware, albeit with three different categories of memory, but we decided against setting these up as separate partitions. The disadvantage is that small memory jobs can potentially clog up the large memory nodes; the advantage is that small memory jobs can use the large memory nodes if they would otherwise be empty. > So presumably it makes sense to associate the “normal” QOS with the batch > queue > and define throttling limits as needs. Then define corresponding QoSs for the > highmem and gpu partitions. In this respect do the QOS definitions override > any > definitions on the PartitionName line? For example does QOS Maxwall override > MaxTime? The hierarchy of the limits is given here: https://slurm.schedmd.com/resource_limits.html However, unless you have specific needs, having limits defined on both the partitions and QOS might be overkill. If, as you say later, you have a heterogeneous job mix, you probably also have a heterogeneous user base, some of whom might find the setup confusing. For that reason, I would start with a fairly simple configuration and only add to that as the need arises. > Also I suspect I’ll need to define a test queue with a high level of > throttling > to enable users to get a limited number of small test jobs through the system > quickly. 
In this respect does it make sense for my batch and test partitions > to > overlap either partially or completely? At any one time the test partition > will > only take a few resources out of the pool of normal compute nodes? We originally had a separate test partition, but have now moved to a 'short' QOS on the main batch partition which increases the priority for a limited number of jobs with a short maximum run-time. If you have overlapping batch and test partitions, the batch jobs can clog the test nodes, although you could have different priorities for each partition. > Another issue is that we do have a large mix of small and large jobs. In our > torque/moab cluster we make use of the XFACTOR component to make sure that > small > jobs don’t get starved out of the system. I don’t think there is an analog of > this parameter in slurm, and so I need to understand how to enable smaller > jobs > to compete with the larger jobs and not get starved out. Using slurm I > understand that the backfill mechanism and priority flags like > PriorityFavorSmall=NO and SMALL_RELATIVE_TO_TIME can help the situation. What > are your thoughts? We also have a very heterogeneous job mix, but don't have any problem with small jobs starving. On the contrary, as we share nodes, small jobs with moderate memory requirements have an advantage, as there are always a few cores available somewhere in the cluster, even when it is quite full. For this reason we favour large jobs slightly. > Your advice on the above points would be appreciated, please. > > Best regards, > > David Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Daytime Interactive jobs
"Vicker, Darby (JSC-EG311)" <darby.vicke...@nasa.gov> writes: [snip (54 lines)] > In the end, I just took the debug nodes out of the normal partition. In other > words, we have debug nodes 24/7. This was the simplest thing to do to avoid > the > redefinition of partitions via cron as Gary suggested. Our cluster has grown > quite a bit since we first set up this debug standing reservation so having > dedicated debug nodes isn't as big of a deal for us now. But if there is an > elegant way to accomplish the same setup under slurm, I would appreciate > knowing > how to do that. [snip (27 lines)] We used to have dedicated partition with a couple of test/debug nodes. Now, however, we have moved to a single partition and have defined a QOS for short run-times which has an much larger priority weight than the standard QOS. This allows users to, say, run tests of large MPI jobs. The total number of jobs a user can have in the test/debug QOS is limited. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] RE: A little bit help from my slurm-friends
David WALTER <david.wal...@ens.fr> writes: > Dear Loris, > > Thanks for your response ! > > I'm going to look at these features in slurm.conf. I only configured > the CPUs, Sockets per node. Do you have any example or link to > explain to me how it works and what I can use? It's not very complicated. A feature is just a label, so if you had some nodes with Intel processors and some with AMD, you could attach the features, e.g. NodeName=node[001,002] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=intel NodeName=node[003,004] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=amd Users then just request the required CPU type in their batch scripts as a constraint, e.g: #SBATCH --constraint="intel" > My goal is to respond to people's needs and launch their jobs as fast as > possible without losing time when one partition is idle whereas the > others are fully loaded. The easiest way to avoid the problem you describe is to just have one partition. If you have multiple partitions, the users have to understand what the differences are so that they can choose sensibly. > That's why I thought the fair share factor was the best solution Fairshare won't really help you with the problem that one partition might be full while another is empty. It will just affect the ordering of jobs in the full partition, although the weight of the partition term in the priority expression can affect the relative attractiveness of the partitions. In general, however, I would suggest you start with a simple set-up. You can always add to it later to address specific issues as they arise. For instance, you could start with one partition and two QOS: one for normal jobs and one for test jobs. The latter could have a higher priority, but only a short maximum run-time and possibly a low maximum number of jobs per user. Cheers, Loris -- Dr. Loris Bennett (Mr.) 
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: A little bit help from my slurm-friends
Hello David, David WALTER <david.wal...@ens.fr> writes: > Hello everyone, > > I need some advice or some good practices as I’m a new SLURM administrator… in > fact a new cluster manager ! > > Everything is OK, jobs running well etc… But now I would like to configure > priority on jobs to improve the efficiency of my cluster. I see I have to > activate “Multifactor Priority plugin” to get rid of the FIFO's default > behavior > of SLURM. > > So there are 6 factors and the fair share one interests me, but do you have > any advice? I’m managing a small cluster (I think), 40 nodes, with 4 different > generations (and different hardware) and I would like to optimize it. For now > I > set 4 partitions, 1 per generation that may be not the best solution ? An alternative would be to have just one partition and to distinguish the machines via 'features' defined in slurm.conf. It depends a bit on how different the machines are and how interested in these differences the users are. > Do you think I can just use the “job size” and “partition” and maybe the “age” > factors ? Maybe you need more information ? I would have thought that in general you want to use 'fairshare' as well, but that obviously depends on what you are trying to achieve. > In any case thanks for your help > > David Regards Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: where to find completed job execution command
Sean McGrath <smcg...@tchpc.tcd.ie> writes: > Hi, > > On Thu, Jan 05, 2017 at 02:29:11PM -0800, Prasad, Bhanu wrote: > >> Hi, >> >> >> Is there a convenient command like `scontrol show job id` to check more info >> of jobs that are completed > > Not to my knowledge. > >> >> or any command to check the sbatch command run in that particular job > > How we do this is with the slurmctld epilog script: > > EpilogSlurmctld=/etc/slurm/slurm.epilogslurmctld > > Which does the following: > > /usr/bin/scontrol show job=$SLURM_JOB_ID > > $recordsdir/$SLURM_JOBID.record > > The `scontrol show jobid=` record is saved to the file system for future > reference if it is needed. It might be worth using the option '--oneliner' to print out the record in a single line. You could then parse it more easily for, say, inserting the data into a table in a database. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
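[Editor's note: to illustrate why '--oneliner' makes the record easier to parse, here is a hedged sketch of pulling individual fields out of such a one-line record. The record below is a shortened, made-up example, and the approach assumes the values contain no spaces (a JobName with spaces would need more care).]

```shell
#!/bin/bash
# Sketch: split a one-line `scontrol show job --oneliner` style record into
# key=value tokens and extract single fields. The record is made up.
record='JobId=4242 JobName=test UserId=alice(1000) RunTime=00:10:00 Command=/home/alice/job.sh'

get_field() {
    # Print the value of field $2 from record $1 (one token per line,
    # then strip the "key=" prefix).
    echo "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

jobid=$(get_field "$record" JobId)
cmd=$(get_field "$record" Command)
echo "job $jobid ran $cmd"
```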
[slurm-dev] Re: Unrestricted use of a node
Loris Bennett <loris.benn...@fu-berlin.de> writes: > Hi, > > Ulf Markwardt <ulf.markwa...@tu-dresden.de> writes: > >> Dear all, >> >> we are using CR_Core_Memory, granularity of our jobs is cores, so: >> shared nodes. And all is well, jobs get killed once they use too much >> memory, cgroups are in place. >> >> But. >> A user wants to have a node explicitly, not caring about number of CPU >> cores and amount of RAM in that specific node (ranging e.g. from 12 >> cores to 24, and from 32 to 256 GB), but he wants to use ALL resources. >> >> At the moment, I see no way to tell this Slurm. - OK, I can ask for 24 >> cores and 64 GB in a node, but then I do not get the chance to run on 12 >> cores/32 GB. >> >> Is there already a parameter in Slurm to handle this? >> >> Thanks, >> Ulf > > Wouldn't the sbatch option > > --exclusive > > help? D'oh. This obviously isn't what you want. I somehow overlooked the point about using all the resources available ("ALL" just wasn't in caps enough ;-) for me). However, on a system with shared nodes, I would have thought that if the jobs can run on only 12 cores, throughput would be generally increased by always specifying that rather than anything larger. That way you reduce wait times for entire nodes with more cores and you usually get better scaling with, say, two parallel 12-core jobs than with one 24-core job. Obviously in your specific case, this may not be true. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Unrestricted use of a node
Hi, Ulf Markwardt <ulf.markwa...@tu-dresden.de> writes: > Dear all, > > we are using CR_Core_Memory, granularity of our jobs is cores, so: > shared nodes. And all is well, jobs get killed once they use too much > memory, cgroups are in place. > > But. > A user wants to have a node explicitly, not caring about number of CPU > cores and amount of RAM in that specific node (ranging e.g. from 12 > cores to 24, and from 32 to 256 GB), but he wants to use ALL resources. > > At the moment, I see no way to tell this Slurm. - OK, I can ask for 24 > cores and 64 GB in a node, but then I do not get the chance to run on 12 > cores/32 GB. > > Is there already a parameter in Slurm to handle this? > > Thanks, > Ulf Wouldn't the sbatch option --exclusive help? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Slurm license management question
slurm_licence_available_string = ",".join(licence_strings_available)

scontrol_string = scontrol + \
    ' update reservationname=licenses_' + vendor + \
    ' licenses=' + slurm_licence_available_string

if args.initialise:
    print(slurm_licence_total_string)
    continue

if args.dryrun:
    print(scontrol_string)
    continue

# Actually update the reservation
os.system(scontrol_string)

# Strings used for testing
#
#string = 'Users of MATLAB_Distrib_Comp_Engine: (Total of 16 licenses issued; Total of 0 licenses in use)'
#string = 'Users of Wavelet_Toolbox: (Error: 2 licenses, unsupported by licensed server)'

---

-- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Impact to jobs when reconfiguring partitions?
Tuo Chen Peng <tp...@nvidia.com> writes: > I thought ‘scontrol update’ command is for letting slurmctld to pick up any > change in slurm.conf. > > But after reading the manual again, it seems this command is instead to change > the setting at runtime, instead of reading any change from slurm.conf. > > So is restarting slurmctld the only way to let it pick up changes in > slurm.conf? No. You can also do scontrol reconfigure This does not restart slurmctld. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Slurm license management question
Baker D.J. <d.j.ba...@soton.ac.uk> writes: > Hello, > > Looking at the Slurm documentation I see that it is possible to handle basic > license management (this is the link http://slurm.schedmd.com/licenses.html). > In > other words software licenses can be treated as a resource, however things > appear to be fairly rudimentary at the moment – at least that’s my impression. > We are used to doing license management in moab, and if we don’t have that > properly implemented is it not the end of the world, however not ideal. > > One situation that we would like to be able to deal with is a FlexLM 3 server > redundancy situation. So, for example, our Comsol licenses are served out in > this fashion. Is this something that slurm can deal with, and, if so, how can > it > be done? Any advice including slurm’s shortcomings and/or future plans in > this > respect would be useful, please. > > Best regards, > > David We have licenses, such as Intel compiler licenses, which can be used both interactively outside the queuing system and within Slurm jobs. We use a script which parses the output of the FlexLM manager and modifies a reservation in which the licenses are defined. This is run as a cron job once a minute. It's a bit of a kludge and obviously won't work well if there is a lot of contention for licenses. I can post the code if anyone is interested. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] scontrol: update multiple jobs?
Hi, The update jobs section of the manpage for scontrol 15.08.8 says JobId= Identify the job(s) to be updated. The job_list may be a comma separated list of job IDs. However, trying this, I get the following error: $ scontrol update jobid=1135541,1135542 timelimit=+1:00:00 scontrol: error: Invalid job ID 1135541,1135542 Is this a documentation error? Does the syntax work for more recent versions? -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
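[Editor's note: until the comma-separated job_list form works, a shell loop over the IDs is the obvious workaround. A sketch follows; the scontrol commands are echoed rather than executed here, so the loop can be tried safely, and the job IDs are the ones from the message above.]

```shell
#!/bin/bash
# Workaround sketch: issue one scontrol call per job ID instead of passing
# a comma-separated job_list. Drop the `echo` to run the commands for real.
cmds=$(for jobid in 1135541 1135542; do
    echo scontrol update jobid="$jobid" timelimit=+1:00:00
done)
echo "$cmds"
```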
[slurm-dev] Re: Jobs which started and completed within an interval
"Loris Bennett" <loris.benn...@fu-berlin.de> writes: > Hi, > > Is it possible to find jobs which both started and completed in a given > interval? > > I am investigating an incident, during which an abnormally high load > occurred on one of our storage servers. To this end I would like to > know whether the beginning and and of any jobs correspond to the > beginning and end of the high-load period. > > I can do something like > > sacct -S 2016-07-13T22:20 -E 2016-07-14T06:20 -s RUNNING -X | grep COMPLETED > > to get jobs which were running in the period and subsequently completed, > but this includes jobs which were running both before and after the > period in question. As this specific question didn't elicit any responses, I would be interested in answers to these more general ones: Do you try to relate events within your system to specific, possibly misbehaving jobs? If so, how? If not, why not? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: number of processes in slurm job
Husen R <hus...@gmail.com> writes: > Re: [slurm-dev] Re: number of processes in slurm job > > Hi, > > Thanks for your reply ! > > I use this sbatch script > > #!/bin/bash > #SBATCH -J mm6kn2_03 > #SBATCH -o 6kn203-%j.out > #SBATCH -A necis > #SBATCH -N 3 > #SBATCH -n 16 > #SBATCH --time=05:30:00 > > mpirun ./mm.o 6000 You need to tell 'mpirun' how many processes to start. If you do not, probably all available cores will be used. So it looks like you have 6 cores per node and thus 'mpirun' starts 18 processes. You should write something like mpirun -np ${SLURM_NTASKS} ./mm.o 6000 Cheers, Loris > regards, > > Husen > > On Tue, Jul 12, 2016 at 1:21 PM, Loris Bennett > <loris.benn...@fu-berlin.de> wrote: > > Husen R <hus...@gmail.com> writes: > > > number of processes in slurm job > > > > > > Hi all, > > > > I tried to run a job on 3 nodes (N=3) with 16 number of processes > > (n=16) but slurm automatically changes that n value to 18 (n=18). > > > > I also tried to use other combination of n values that are not equally > > devided by N but Slurm automatically changes those n values to values > > that are equally devided by N. > > > > How to change this behavior ? > > I need to use a specific value of n for experimental purpose. > > > > Thank you in advance. > > > > Regards, > > > > Husen > > > You need to give more details about what you did. How did you set the > number of processes? > > Cheers, > > Loris > > -- > Dr. Loris Bennett (Mr.) > ZEDAT, Freie Universität Berlin Email > loris.benn...@fu-berlin.de > > > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: number of processes in slurm job
Husen R <hus...@gmail.com> writes: > number of processes in slurm job > > Hi all, > > I tried to run a job on 3 nodes (N=3) with 16 number of processes > (n=16) but slurm automatically changes that n value to 18 (n=18). > > I also tried to use other combination of n values that are not equally > devided by N but Slurm automatically changes those n values to values > that are equally devided by N. > > How to change this behavior ? > I need to use a specific value of n for experimental purpose. > > Thank you in advance. > > Regards, > > Husen You need to give more details about what you did. How did you set the number of processes? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Output of 'sinfo -Nel' not aggregated
Hi Chris, Christopher Samuel <sam...@unimelb.edu.au> writes: > On 30/06/16 17:37, Loris Bennett wrote: > >> With version slurm 15.08.8, the node-oriented output of 'sinfo' is not >> longer aggregated. Instead I get a line for each node, even if the data >> for multiple nodes are the same, > > I think it's a deliberate change, and the it's the manual page > that is out of step. Looks like it changed around 15.08.4. > > $ git describe eafec3c0c0cb977361a5b10388d5469136e1ef38 > slurm-15-08-3-1-94-geafec3c > > commit eafec3c0c0cb977361a5b10388d5469136e1ef38 > Author: Morris Jette <je...@schedmd.com> > Date: Mon Nov 23 15:48:15 2015 -0800 > > sinfo: Print each node one separate line with -N option > > diff --git a/src/sinfo/sinfo.c b/src/sinfo/sinfo.c > index 2613629..78f2857 100644 > --- a/src/sinfo/sinfo.c > +++ b/src/sinfo/sinfo.c > @@ -736,6 +736,9 @@ static bool _match_node_data(sinfo_data_t *sinfo_ptr, > node_info_t *node_ptr) > { > uint32_t tmp = 0; > > + if (params.node_flag) > + return false; > + > if (params.match_flags.hostnames_flag && > (hostlist_find(sinfo_ptr->hostnames, >node_ptr->node_hostname) == -1)) Thanks for looking into the issue. I'm sure there are good reasons to want one line per node, but equally I thought the aggregated view was quite useful even though I've only got just over 100 nodes. Surely those with many thousands of nodes would like the option of having a more compact view. Or do they obtain similar information in a completely different way? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Output of 'sinfo -Nel' not aggregated
Hi, With version slurm 15.08.8, the node-oriented output of 'sinfo' is no longer aggregated. Instead I get a line for each node, even if the data for multiple nodes are the same, e.g.

$ sinfo -Nel
Thu Jun 30 09:28:43 2016
NODELIST  NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT FEATURES REASON
gpu01         1 gpu       down    12 2:6:1  18000        0      1 (null)   Not responding
node001       1 test      idle~   12 2:6:1  42000        0      1 ram48gb  none
node002       1 test      idle~   12 2:6:1  42000        0      1 ram48gb  none
node003       1 main*     mixed   12 2:6:1  18000        0      1 ram24gb  none
node004       1 main*     mixed   12 2:6:1  18000        0      1 ram24gb  none
node005       1 main*     mixed   12 2:6:1  18000        0      1 ram24gb  none
node006       1 main*     mixed   12 2:6:1  18000        0      1 ram24gb  none

As I remember and as the man page indicates, this should be

NODELIST      NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT FEATURES REASON
gpu01             1 gpu       down    12 2:6:1  18000        0      1 (null)   Not responding
node[001-002]     2 test      idle~   12 2:6:1  42000        0      1 ram48gb  none
node[003-006]     4 main*     mixed   12 2:6:1  18000        0      1 ram24gb  none

Is this a bug and, if so, has it already been fixed? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: License manager and interactively used licenses
"Loris Bennett" <loris.benn...@fu-berlin.de> writes: > "Loris Bennett" <loris.benn...@fu-berlin.de> > writes: > >> Hi Roshan, >> >> Yes, you're right - this will work for us. So the update tweaks the >> number of licences available and presumably extends the reservation by >> another 30 sec, so that you have essentially an infinite reservation >> holding, at any given time, the currently available number of >> licenses. Clever. >> >> Thanks again, >> >> Loris > > [snip (103 lines)] > > I have run into a problem setting up the initial reservation. > > How do I set it up just for licenses such that any user with any account > can use it? It seems that either 'Users' or 'Accounts' must be > specified. Never mind, I figured it out. Not specifying 'Users' and specifying 'Accounts=root' works. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Timeout before resource becomes available
Hi, One of our users was carrying out some tests and running some very short jobs with a TimeLimit of 60s. However, because one of the nodes had to be booted, which takes a couple of minutes, the jobs were terminated with TIMEOUT as the state. I am aware that we can set BatchStartTimeout to a larger value, but wouldn't it make more sense if the run-time for the job only started to accumulate, once the slurmd on the node became available? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
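In the meantime, the workaround mentioned above is a one-line change in slurm.conf; the value here is just an example, chosen to cover the couple of minutes a node needs to boot:

```
# slurm.conf fragment: let a batch job launch wait up to 5 minutes for a
# powered-down node to boot, instead of the short default.
BatchStartTimeout=300
```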
[slurm-dev] Re: How to get rid of "zombie" jobs?
Hello Steffen, Steffen Grunewald <steffen.grunew...@aei.mpg.de> writes: > Hello all, > > I've got a rather newly setup cluster, which at the moment is completely idle > ("squeue" doesn't return anything.) > > From the testing phases, a couple of now unused accounts and associations are > left, which I'd like to get rid of: > > [root@login ~]# sacctmgr show assoc >ClusterAccount User Partition Share GrpJobs GrpTRES > GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode > MaxSubmit MaxWall MaxTRESMins QOS Def QOS > GrpTRESRunMin > -- -- -- -- - --- - > - --- - --- - -- > - --- - - > - > [...] >clusterdefault 1 > > normal >clusterdefaulttom1 > > normal > [...] > [root@login ~]# sacctmgr delete user name=tom account=default > Error with request: Job(s) active, cancel job(s) before remove > JobID = 15498 C = clusterA = defaultU = tom > JobID = 15500 C = clusterA = defaultU = tom > JobID = 15501 C = clusterA = defaultU = tom > JobID = 15502 C = clusterA = defaultU = tom > JobID = 15503 C = clusterA = defaultU = tom > JobID = 15504 C = clusterA = defaultU = tom > JobID = 15505 C = clusterA = defaultU = tom > JobID = 15506 C = clusterA = defaultU = tom > JobID = 15508 C = clusterA = defaultU = tom > JobID = 15509 C = clusterA = defaultU = tom > [root@login ~]# scontrol show jobid -dd 15500 > slurm_load_jobs error: Invalid job id specified > [root@login ~]# sacct -j 15500 >JobIDJobName PartitionAccount AllocCPUS State ExitCode > -- -- -- -- -- > 15500intel-test partitiondefault 48RUNNING 0:0 > > > Is there a "gold standard" way to repair this? I don't think there is a "gold standard" for this. You probably just have to go into the database and fix it yourself. A while ago I posted some code to fix anomalous jobs. It was intended to make the data plausible (e.g. 
by adding a missing completion date for a job with status "RUNNING" which no longer exists), and not for deleting jobs completely, but it might help: https://groups.google.com/forum/#!msg/slurm-devel/nf7JxV91F40/KUsS1AmyWRYJ Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Incorrect handling of non-ASCII characters
Hi Gary, Gary Brown <gbr...@adaptivecomputing.com> writes: > Re: [slurm-dev] Re: Incorrect handling of non-ASCII characters > > Another option is to use the alternative spelling of Schroedinger, which is > perfectly acceptable German. Personally I don't think it is acceptable - and I think that here in Germany in most contexts it would be considered strange to replace umlauts. It might be OK in the area of HPC, but only because expectations of user-friendliness are quite low. In my view we are decades beyond the point where restricting characters to those available in the 7-bit ASCII set is acceptable. Just imagine Italian had become the dominant language in the USA instead of English - Slurm might think your name is "Gari Brovvn". In fact, I would prefer incorrect justification with umlauts to correct justification without umlauts. [snip (57 lines)] Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Incorrect handling of non-ASCII characters
Hi, With Slurm 15.08.8, sreport does not handle non-ASCII characters in the 'Proper Name' column properly:

Top 3 Users 2016-05-31T00:00:00 - 2016-05-31T23:59:59 (86400 secs)
Use reported in TRES Minutes

Cluster  Login   Proper Name  Account     Used Energy
-------- ------- ------------ -------- ------- ------
soroban  albert  Einstein     physics  2304000
soroban  erwin   Schrödinger  physics  2005690
soroban  werner  Heisenberg   physics  1396800

The presence of the umlaut in 'Schrödinger' causes the name to be justified incorrectly. In addition, all the lines from the line with the column names to the final line of data have an additional space at the end of the line. The terminal space is not much of a problem, but it would be nice if the justification problem could be fixed. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
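The misalignment looks like padding by byte count rather than character count: in UTF-8 the 'ö' occupies two bytes, so 'Schrödinger' is one byte longer than its visible width, and a C-style %-Ns pad then comes up one column short per umlaut. A quick check, assuming a UTF-8 environment:

```shell
#!/bin/sh
# 'ö' is a two-byte sequence in UTF-8, so byte-based field padding
# misaligns by one column for each umlaut in the name.
printf 'ö' | wc -c            # byte count: 2
printf 'Schrödinger' | wc -c  # 12 bytes for 11 visible characters
```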
[slurm-dev] Re: More tasks than allocated CPUs
Hi Yaron, Yaron Weitz <yar...@mail.huji.ac.il> writes: > slurm-dev > > Hi, > > I'm new to the Slurm. I have been working with it for the past 7 > months. We sometimes have a situation where a job generates more > tasks than the number of CPU's allocated to it. > I don't know if the cause is the code of the running job or something > to do with the use or configuration of the Slurm. We have a cluster > of Ubuntu 14.04 servers and slurm-llnl version 2.6.5-1 from the Ubuntu > repos. > > Thanks, > Yaron On our system this is usually a result of user error. Particularly if people don't use the environment variable ${SLURM_NTASKS} in their batch scripts, they may end up requesting a number of cores, but passing a different number to their MPI launcher for the number of processes to start. However, your version of Slurm is quite old, so it is conceivable that you are being bitten by a probably long-fixed bug. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] BadConstraints - node list not recalculated
Hi, The 'Reason' field for a pending job has changed from 'Priority' to 'BadConstraints'. This seems to be because the status of one of the nodes in the node list reported by 'scontrol show job' has changed to 'draining'. The job itself just specifies the number of tasks required, not specific nodes. Shouldn't the scheduler just be able to replace the draining node with another node in the projected node list? This is happening with version 15.08.8. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: How to get command of a running/pending job
Benjamin Redling <benjamin.ra...@uni-jena.de> writes: > On 05/17/2016 10:02, Loris Bennett wrote: >> >> Benjamin Redling >> <benjamin.ra...@uni-jena.de> writes: >> >>> On 2016-05-13 05:58, Husen R wrote: >>>> Does slurm provide feature to get command that being executed/will be >>>> executed by running/pending jobs ? >>> >>> scontrol show --detail job >>> or >>> scontrol show -d job >>> >>> Benjamin >> >> Which version does this? 15.08.8 just seems to show the 'Command' entry, >> which is the file containing the actual command. > > An older one. I see. I made the mistake before (squeue -n ...): > I assumed slurm commands/parameters don't change (all over the board). > > (Will I ever be able to depend on _any_ script I write, or any parameter > I thought I knew? > What other surprises will await me after an upgrade? And where are these > major changes documented? > Who thinks changing parameter semantics is a good idea?) > > This is not something were slurm shines. I haven't really been bitten by such changes. My main gripe with the Slurm tools is the inconsistency of the interfaces, e.g. output columns: squeue -o " %.18i" sacct -o jobid%18 or selection according to nodes squeue -w node001 sacct -N node001 This is obviously not a real problem, but it is a daily annoyance. So in that sense, I do think that changing the parameter semantics would be a good idea, but only once and only if the options become harmonised across all the tools! Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: How to get command of a running/pending job
Benjamin Redling <benjamin.ra...@uni-jena.de> writes: > On 2016-05-13 05:58, Husen R wrote: >> Does slurm provide feature to get command that being executed/will be >> executed by running/pending jobs ? > > scontrol show --detail job > or > scontrol show -d job > > Benjamin Which version does this? 15.08.8 just seems to show the 'Command' entry, which is the file containing the actual command. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: General question - Where can I find Slurm docs in German?
Hi Brian, Brian Gilmer <bfgil...@gmail.com> writes: > General question - Where can I find Slurm docs in German? I don't think there is anything official, but I provide information in both English and German for the system I am involved in running: https://www.zedat.fu-berlin.de/HPC/SorobanQueueingSystem The documentation is not very extensive, although it is extended occasionally, and is somewhat specific to our site, but it may be of help to you. Any mistakes in both the English and the German versions are probably mine. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] squeue shows job running on node in state 'idle~'
Hi, I have a job shown as running by 'squeue': $ squeue -w node086 JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 1234567 main abcdef user1234 R 10-09:32:34 1 node086 However with 'sinfo' I can see that the node has been powered off: $ sinfo PARTITION AVAIL TIMELIMIT NODES STATE NODELIST test up3:00:00 2 idle~ node[001-002] main*up 14-00:00:0 1 idle~ node086 ... This is the second time I have seen this phenomenon since updating to version 15.08.8 a month ago. Is this a bug or can this just happen if a job just crashes in an odd enough way? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Fair share priority stopped working
Hi Nirmal, Nirmal Seenu <n19...@gmail.com> writes: > Fair share priority stopped working > > Hi, > > I just noticed that the fair share priority stopped working in the last few > days > and would appreciate any help in debugging this problem. I am running Slurm > version 14.11.11 on Centos 7.2. > > I am not sure when it stopped working but the only thing that I changed was > PriorityDecayHalfLife=00:10:00 and PriorityUsageResetPeriod=WEEKLY. The > following is the current values that I have set -- the initial value when fair > share was working fine: > > PriorityType=priority/multifactor > PriorityDecayHalfLife=00:01:00 I would think that this value for PriorityDecayHalfLife is much too short. The CPU-time usage will decay very rapidly, so the contribution to the priority will be similar for both heavy users and those who don't consume much CPU-time. I would guess you want a value more like a single-digit number of days. Cheers, Loris > PriorityUsageResetPeriod=NONE > PriorityWeightFairshare=1 > PriorityWeightAge=100 > PriorityWeightPartition=1 > PriorityWeightJobSize=1 > PriorityMaxAge=7-0 > > Everything seems to be fine on the database side: > > [root@tcs-bcm-1 ~]# sacctmgr list assoc tree > format=cluster,account,user,fairshare > Cluster Account User Share > -- -- - > slurm_clu+ root 1 > slurm_clu+ root root 1 > slurm_clu+ dev 50 > slurm_clu+ dev c 1 > slurm_clu+ r 1 > slurm_clu+ r a2 1 > slurm_clu+ r a1 1 > slurm_clu+ r b 1 > slurm_clu+ r d 1 > slurm_clu+ r e 1 > slurm_clu+ r j2 1 > slurm_clu+ r j1 1 > slurm_clu+ r m4 1 > slurm_clu+ r m3 1 > slurm_clu+ r m2 1 > slurm_clu+ r m1 1 > slurm_clu+ r r 1 > slurm_clu+ r s 1 > slurm_clu+ r t 1 > > [root@tcs-bcm-1 ~]# sprio -l | head > JOBID USER PRIORITY AGE FAIRSHARE JOBSIZE PARTITION QOS NICE > 1378456 j1 385 10 0 276 100 0 0 > 1378457 j1 385 10 0 276 100 0 0 > 1378458 j1 385 10 0 276 100 0 0 > > Relevant log entry when I restarted both slurmdbd and slurm: > /var/log/slurmctld: 
[2016-03-22T17:47:13.533] Running as primary controller > [2016-03-22T17:47:13.533] Registering slurmctld at port 6817 with slurmdbd. > [2016-03-22T17:47:17.817] > SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=0 > > /var/log/slurmdbd: > [2016-03-22T17:46:53.733] Accounting storage MYSQL plugin loaded > [2016-03-22T17:46:53.735] error: chdir(/var/log): Permission denied > [2016-03-22T17:46:53.735] chdir to /var/tmp > [2016-03-22T17:46:53.744] slurmdbd version 14.11.11 started > [2016-03-22T17:46:57.010] DBD_JOB_START: cluster not registered > [2016-03-22T17:47:01.910] DBD_STEP_START: cluster not registered > > Thanks in advance for your help! > Nirmal > -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
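The effect of the half-life can be made concrete with the decay formula remaining = usage × 0.5^(t / half-life); the awk sketch below compares the configured one-minute half-life with an illustrative seven-day one:

```shell
#!/bin/sh
# With PriorityDecayHalfLife=00:01:00, usage recorded an hour ago has been
# halved 60 times -- effectively erased -- so heavy and light users end up
# with almost identical fairshare contributions.
decay_factor() {
    # $1: elapsed minutes, $2: half-life in minutes
    awk -v t="$1" -v h="$2" 'BEGIN { printf "%.3g\n", 0.5 ^ (t / h) }'
}
decay_factor 60 1      # one-minute half-life: ~8.67e-19 of usage remains
decay_factor 60 10080  # seven-day half-life: ~0.996 remains
```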
[slurm-dev] Re: What cluster provisioning system do you use?
Hi Bjørn-Helge, Bjørn-Helge Mevik <b.h.me...@usit.uio.no> writes: > I apologize for the slightly off-topic subject, but I could not think of > a better forum to ask. If you know of a more proper place to ask this, > I'd be happy to know about it. > > We are currently in the design fase for a new cluster that is going to > be set up next year. We have so far used Rocks (on top of CentOS) for > cluster provisioning. However, Rocks don't support CentOS >= 7, and it > doesn't look like it will in the near future. Also for other reasons, > we are looking for alternatives to Rocks. > > So, what are you using for cluster provisioning? > > - Rocks? > - A different provisioning tool? > - A locally developed solution? We currently use Bright Cluster Manager, but are looking to move away from this due to cost, lack of an update path from our current set-up, and the fact that the integration with Slurm locked us to version 2.2.7 for a long time until we decided to do without the integration and installed an up-to-date version. I am currently setting up a test cluster and shall be looking at - Warewulf - DRBL - maybe xCat I would also be interested in other options. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Exporting environment variables by default?
Hi, I'm using 15.08.8. Am I correct in thinking that environment variables which are to be evaluated by a job must be passed via sbatch's option '--export' and that it is not possible to define variables centrally within the Slurm configuration? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: User education tools for fair share
Hi Chris, Christopher Samuel <sam...@unimelb.edu.au> writes: > Hi folks, > > We've just migrated to fairshare and one of the things we've been > puzzling over is how to show users what their fairshare status is. > > With quotas it was pretty easy, we had a bar-graph showing how far > through the quarter they were, and another bar-graph per project that > showed the percentage of quota burnt so far this quarter. > > After 6 years of running like that it's hurting our heads to think > differently about how to display it. > > It's also complicated as we are using Fair Tree (thanks Ryan et. al!) > and so we think we should show users their priorities back up the tree. > > I'm even wondering if we should not worry about showing them that and > instead just educate them about the priority of queued jobs instead. > > How do other sites handle this? > > All the best, > Chris We use fairshare without Fair Tree and with all users having the same number of shares. Occasionally we have users complaining about the system being unfair, particularly when other users are able to profit from backfill. The problem is that users often just look at the number of jobs someone is able to run, regardless of the resources being used. To help the user understand their current fairshare/priority status, I usually point them to 'sprio', generally in the following incantation: sprio -l | sort -nk3 to get the jobs sorted by priority. Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Fw:Slurm question for help
温圣召 <wenshengz...@yeah.net> writes: > Fw:Slurm question for help > > Dear Sir/Madam: > > I'm using slurm to build a small cluster. My munge, slurmctl, slurmdbd and slurmd all run as root. > I use srun to submit jobs with the --uid= option. > r...@yq01-sys-hic-k4007.yq01.baidu.com matrixMulCUBLAS]# srun --comment=wsz_111 --account="testAccount" -N1 --chdir=/home/wenshengzhao/ --uid=wenshengzhao ./testbatch --- this works ok > == > wenshengz...@yq01-sys-hic-k4007.yq01.baidu.com matrixMulCUBLAS]$ srun --comment=wsz_111 --account="testAccount" -N1 --chdir=/home/wenshengzhao/ --uid=root ./testbatch -- does not work, error info: srun: error: Unable to allocate resources: Invalid user id > == > t...@yq01-sys-hic-k4007.yq01.baidu.com root]$ srun --comment=wsz_111 --account="testAccount" -N2 --chdir=/home/wenshengzhao/ --uid=wenshengzhao ./testbatch -- does not work, error info: srun: error: Unable to allocate resources: Invalid user id > > How can I solve this problem? > > I am looking forward to your reply Have you added the user 'wenshengzhao' to the accounting information? If not, have a look at the "Database Configuration" section on the following page http://slurm.schedmd.com/accounting.html Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] squeue: Collapsing running array jobs?
Hi, I'm using Slurm 15.08.4 and in the man page for 'squeue' it says -r, --array Display one job array element per line. Without this option, the display will be optimized for use with job arrays (pending job array elements will be combined on one line of output with the array index values printed using a regular expression). Is there any way of having *running* job array elements collapsed to a single line per job? Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Accounting
Hi Jeff, Jeff White <jeff.wh...@wsu.edu> writes: > I'm working on getting accounting set up on a new SLURM instance. The > cluster is working, slurmdbd is running, database is configured, sacct > spits out some job info, all appears to be working. Good, I built a > thing and it seems to work. Now the hard part: what do I do with it? > > * What exactly is an "account" in SLURM speak? We have well-defined > groups already and I don't want my users to need to specify an account > or anything of the such with their jobs. What do I need to do (if > anything) to have accounting use purely users and groups and no > manually-defined "accounts"? My understanding is that it is a collection of resource restrictions. If you have well-defined groups, then an account will correspond to a group. The account model is, however, more general, because, say, one person could run jobs in various projects which all have different CPU-time budgets and/or priorities. However, I also just have research groups and they correspond 1-to-1 with my accounts. The accounts are arranged in a hierarchy (via the parent organisation property) which corresponds to the organigram of the university institutes and departments. If you are using fairshare, you then need to set the shares per entity in the organigram. As all our users are created equal, this means adding a user to a group, incrementing the shares of the group, incrementing the shares of the institute, and incrementing the shares of the department. When a user leaves the group, this obviously all has to be done in reverse. Because this is a bit of a chore and quite error-prone, we use a wrapper around sacctmgr to automate this which is integrated into our user-lifecycle-management mechanism. > * The whole JobComp explanation in the documentation isn't clear to > me. What does accounting to slurmdbd /not/ provide that setting > JobComp to log elsewhere would? Why can't slurmdbd be used for > everything? It can. 
> Here's some parts of the config, let me know if you want more: > > # grep AccountingStorage /etc/slurm/slurm.conf > #AccountingStorageEnforce=0 > AccountingStorageHost=slurm-p1n01.mgmt.kamiak.example.edu > #AccountingStorageLoc= > #AccountingStoragePass= > #AccountingStoragePort= > AccountingStorageType=accounting_storage/slurmdbd > #AccountingStorageUser= > > # grep JobCompType /etc/slurm/slurm.conf > #JobCompType=jobcomp/slurmdbd If you are using AccountingStorageType=accounting_storage/slurmdbd my understanding is that you don't need to set JobComp, as this provides only a subset of the data you get from accounting storage. HTH Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
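The add-a-user chore described above (bump the shares of the group, institute and department in lockstep) can be sketched as a loop over the account chain. The account names and the resulting share counts below are purely illustrative, and this is not our actual wrapper; it only prints the sacctmgr calls it would make:

```shell
#!/bin/sh
# Sketch: emit the sacctmgr calls needed to add one user and increment the
# shares all the way up the account hierarchy. Swap 'echo' for the real
# command once the output looks right.
add_user_with_shares() {
    user=$1; shift
    # Remaining args: account chain from group up to department,
    # each given as "account:new_share_count".
    echo "sacctmgr -i add user $user account=${1%%:*}"
    for pair in "$@"; do
        echo "sacctmgr -i modify account name=${pair%%:*} set fairshare=${pair##*:}"
    done
}
add_user_with_shares alice grp_smith:5 inst_physics:23 dept_science:87
```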
[slurm-dev] Re: Backfill parameters
Hi Ulf, Ulf Markwardt <ulf.markwa...@tu-dresden.de> writes: > Dear all, > > I have a problem with a large reservation in a few hours, ~1700 > long-running jobs waiting to start afterwards and my short job (srun -t > 1 hostname) with priority of 1 that would fill any gap... > > > "sdiag" always shows a value of about 100 as "Last depth cycle" for > backfilling. Does that mean that it only looks at the first 100 jobs? > I thought, bf_continue should take care of this, so that the next > backfilling test starts where the last has finished. > > At the moment we have 15.08.6 running with: > SchedulerParameters=bf_interval=30,bf_max_job_test=2000,bf_window=7200,default_queue_depth=5000,bf_continue,sched_interval=120,defer > (Some values might be too high for production, but I was desperate to ge > my job running...) Is your bf_window at least as large as the timelimit on the partition in question? If not, see the info about bf_window on the slurm.conf manpage. > Can anybody give me a hint on how to change this so that my low priority > job gets scheduled? > > Thanks a lot, > Ulf > > PS. As soon as I give this job a Nice=-200 it starts, but that is not > the way I want it :-) Cheers, Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
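For reference, bf_window is given in minutes, so the bf_window=7200 above covers only five days; a partition with a longer time limit would need something like the following illustrative fragment (values are examples, not a recommendation):

```
# slurm.conf fragment: bf_window (in minutes) should be at least as long as
# the longest partition TimeLimit, e.g. 10080 minutes for a 7-day limit.
SchedulerParameters=bf_interval=30,bf_max_job_test=2000,bf_window=10080,bf_continue,sched_interval=120
```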
[slurm-dev] Question mark in MaxRSS
Hi, With version 15.08.4, 'sacct' gives me values of MaxRSS which contain '16?': $ sacct -o jobid,maxrss,state -S 2015-01-29T09:00 -E 2015-01-29T10:00 -s CD JobID MaxRSS State -- -- 612354 16? COMPLETED 612354.batch 13722480K COMPLETED 613334 16? COMPLETED 613334.batch179580K COMPLETED 613337 16? COMPLETED 613337.batch 8776K COMPLETED 613337.0 3772344K CANCELLED+ This also applies to jobs run under older versions of Slurm. As far as I recall the fields used to be empty, as they are when the option '-X' is given: $ sacct -o jobid,maxrss,state -S 2015-01-29T09:00 -E 2015-01-29T10:00 -s CD -X JobID MaxRSS State -- -- 612354 COMPLETED 613334 COMPLETED 613337 COMPLETED Is this a bug in 'sacct' or do I have a local issue? Regards Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
[slurm-dev] Re: sreport/sacct: discrepancy between utilization and CPUTime
Hi Carlos, Carlos Fenoy <mini...@gmail.com> writes: > Re: [slurm-dev] sreport/sacct: discrepancy between utilization and CPUTime > > Hi Loris, > > Can you check when did the job actually started or ended? It may be that the > job > spans 2 days, and that is the reason the the sreport is reporting less time. Yes, you are correct. If I choose a start time before the jobs started, I get the same result with sreport as I do with sacct: $ sreport cluster UserUtilizationByAccount user=bsp start=2016-01-14T21:00:00 end=2016-01-15T04:30:00 -t hours Cluster/User/Account Utilization 2016-01-14T21:00:00 - 2016-01-15T04:59:59 (28800 secs) Use reported in TRES Hours Cluster Login Proper Name Account Used Energy - - --- --- -- -- soroban bspBeispiel agexample 6 0 Thanks for the hint, Loris > Regards, > Carlos > > On Fri, Jan 22, 2016 at 9:31 AM, Loris Bennett <loris.benn...@fu-berlin.de> > wrote: > > Hi, > > Using version 15.08.4 I am looking at the value 'Used' from sreport and > comparing this with the corresponding 'CPUTime' from sacct: > > $ sreport cluster UserUtilizationByAccount user=bsp start=2016-01-15 -t > hours > > > > Cluster/User/Account Utilization 2016-01-15T00:00:00 - 2016-01-21T23:59:59 > (604800 secs) > Use reported in TRES Seconds > > > > Cluster Login Proper Name Account Used Energy > - - --- --- -- -- > soroban bsp Beispiel agexample 4 0 > > $ sacct -S 2016-01-15 -u bsp -o jobid,cputime,state > JobID CPUTime State > -- -- > 954088 05:49:45 COMPLETED > 954088.batch 05:49:45 COMPLETED > > Rounding aside, why is the 'Used' value given by report lower than > 'CPUTime' given by sacct? > > Regards > > Loris > > -- > Dr. Loris Bennett (Mr.) > ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
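The truncation Carlos describes can be reproduced by hand: a report window only credits the overlap between the job's runtime and the window, i.e. min(job_end, window_end) − max(job_start, window_start). A sketch with made-up epoch values:

```shell
#!/bin/sh
# Usage credited to a report window = overlap of [job_start, job_end]
# with [window_start, window_end]; time outside the window is attributed
# to the neighbouring period instead.
overlap_secs() {
    awk -v js="$1" -v je="$2" -v ws="$3" -v we="$4" 'BEGIN {
        lo = (js > ws) ? js : ws
        hi = (je < we) ? je : we
        d = (hi > lo) ? hi - lo : 0
        print d
    }'
}
# Job runs 22:00-04:00 (6 h) but the window starts at midnight (86400):
# only 4 h (14400 s) are counted.
overlap_secs 79200 100800 86400 172800
```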
[slurm-dev] sreport/sacct: discrepancy between utilization and CPUTime
Hi, Using version 15.08.4 I am looking at the value 'Used' from sreport and comparing this with the corresponding 'CPUTime' from sacct: $ sreport cluster UserUtilizationByAccount user=bsp start=2016-01-15 -t hours Cluster/User/Account Utilization 2016-01-15T00:00:00 - 2016-01-21T23:59:59 (604800 secs) Use reported in TRES Seconds Cluster Login Proper Name Account Used Energy - - --- --- -- -- soroban bspBeispiel agexample 4 0 $ sacct -S 2016-01-15 -u bsp -o jobid,cputime,state JobIDCPUTime State -- -- 954088 05:49:45 COMPLETED 954088.batch 05:49:45 COMPLETED Rounding aside, why is the 'Used' value given by report lower than 'CPUTime' given by sacct? Regards Loris -- Dr. Loris Bennett (Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de