[slurm-dev] Re: Scripting Account Provisioning

2016-06-13 Thread Thomas M. Payerle
I have a couple of Perl packages available on CPAN, Slurm::Sacctmgr and Slurm::Sshare, to provide wrappers around the sacctmgr and sshare commands, respectively. These will parse the output of query subcommands (e.g. sacctmgr show XXX) and present it as Perl-ish data structures, as
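(For context, wrappers like these build on sacctmgr's machine-parseable output; a rough illustration of the kind of query being wrapped, with the format fields chosen here only as examples:)

    # Pipe-delimited, header-free output that is easy to split into per-record hashes
    sacctmgr --parsable2 --noheader show account format=Account,Description,Organization
    sacctmgr --parsable2 --noheader show assoc format=Cluster,Account,User,Partition,GrpTRES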

[slurm-dev] Re: Interesting unexpected consequences

2016-07-19 Thread Thomas M. Payerle
Sacctmgr usage can be a bit tricky. Although there are "user", "account", etc. entities that can be manipulated with sacctmgr, much of the time you are dealing with "association" entities. And although sacctmgr will allow you to view associations, modifying associations is handled by
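(A rough illustration of the distinction: associations can be listed directly, but their limits are changed by modifying the user or account side of the association; names and limits below are only examples.)

    # View association records (cluster/account/user/partition tuples)
    sacctmgr show assoc format=Cluster,Account,User,Partition,GrpTRES

    # Change a limit on one association by addressing it through the user entity
    sacctmgr modify user where name=jdoe account=physics set GrpTRESMins=cpu=100000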

[slurm-dev] Re: Oversubscription and running job priority

2016-08-03 Thread Thomas M. Payerle
On Tue, 2 Aug 2016, Joshua Baker-LePain wrote: On Mon, 1 Aug 2016 at 1:16pm, Thomas M. Payerle wrote: So jobs of "paying" customers can cut ahead of jobs of "non-paying" and "scavenger" users in the queue, and will bump "scavenger" jobs. They will
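(One common way to get this kind of tiering is QOS priority plus preemption; a rough sketch under that assumption, with QOS names and values invented, and requiring PreemptType=preempt/qos in slurm.conf:)

    # High-priority QOS for paying groups, allowed to preempt scavenger jobs
    sacctmgr add qos paying Priority=1000 Preempt=scavenger

    # Bottom-of-the-queue scavenger QOS whose jobs are requeued when preempted
    sacctmgr add qos scavenger Priority=0 PreemptMode=requeue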

[slurm-dev] Re: Setting a partition QOS, etc

2017-02-01 Thread Thomas M. Payerle
If you decide to go the single partition model, you can use the "Weight" parameter in slurm.conf to cause the standard nodes to be used in preference to the high-mem and GPU nodes. So jobs only end up on high-mem or GPU nodes if they requested a lot of memory or a GPU, or if the cluster is
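(A rough slurm.conf sketch of that idea; node names and values are invented, and nodes with the lowest Weight are allocated first:)

    NodeName=std[001-064]  Weight=1  ...
    NodeName=bigmem[01-04] Weight=10 RealMemory=1024000 ...
    NodeName=gpu[01-08]    Weight=10 Gres=gpu:2 ...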

[slurm-dev] Re: Accounting and limits

2017-02-13 Thread Thomas M. Payerle
I also have a bunch of Perl packages for wrappers around sshare and sacctmgr on CPAN. The sshare package comes with an "sbalance" script which presents the sshare info in a user-friendly format and will work with common accounting setups. See

[slurm-dev] Re: Accounting and limits

2017-02-13 Thread Thomas M. Payerle
Yes, I agree with you. The TRES stuff is nice, but it seems to have been added somewhat quickly, and in particular I do not see any way to view that information. I am looking forward to applying your sshare patch. (I attempted once to write a patch similar to yours but had issues I could not resolve

[slurm-dev] Re: sinfo like behavior from scontrol

2016-10-12 Thread Thomas M. Payerle
Something like sacctmgr show user withassoc -p user=USERNAME --noheader | awk -F'|' '{ print $6}' should give you what you want. On Wed, 12 Oct 2016, Eggleston, Nicholas J. wrote: I've got kind of a weird question for you all. I'm in a place where I need to know a list of all the

[slurm-dev] Re: Node selection for serial tasks

2016-12-08 Thread Thomas M. Payerle
I've not used it, but when looking at the srun man page (for something else) I noticed a -r / --relative option which sounds like it might be what you are looking for. On Thu, 8 Dec 2016, Nigella Sanders wrote: Hi Michael, Actually, that was my first try but it didn't work. srun finds it

[slurm-dev] Re: Node selection for serial tasks

2016-12-09 Thread Thomas M. Payerle
done wait Setting -r2 will confine all the tasks to torus6003, as expected. Regards, Nigella 2016-12-08 15:27 GMT+01:00 Thomas M. Payerle <paye...@umd.edu>: I've not used it, but when looking at the srun man page (for something else) noticed a -r / --relative option which sounds
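(A rough sketch of using --relative inside an allocation to pin serial steps to a particular node; node counts, indices, and commands are only illustrative, and node numbering starts at 0:)

    #!/bin/bash
    #SBATCH --nodes=4 --ntasks-per-node=2
    # Run two serial steps on the third node of the allocation
    srun -N1 -n1 -r 2 ./serial_task input_a &
    srun -N1 -n1 -r 2 ./serial_task input_b &
    wait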

[slurm-dev] Re: uniqueuser or process cleanup for shared nodes.

2016-12-21 Thread Thomas M. Payerle
I am unaware of anything analogous to 'uniqueuser' but cleanup is not that bad. We have prolog and epilog scripts to handle it. The prolog script creates a "jobtrack" directory if not already there, and creates a file named after the jobid containing the uid of the user running the job (e.g. echo $SLURM_JOB_UID
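(A minimal sketch of that kind of prolog/epilog bookkeeping; the jobtrack path and the cleanup policy here are assumptions, not the exact scripts described:)

    #!/bin/bash
    # prolog: record the uid of each job that lands on this node
    mkdir -p /var/spool/jobtrack
    echo "$SLURM_JOB_UID" > "/var/spool/jobtrack/$SLURM_JOB_ID"

    #!/bin/bash
    # epilog: drop this job's record; if the user has no other tracked jobs
    # on the node, kill any processes they left behind
    rm -f "/var/spool/jobtrack/$SLURM_JOB_ID"
    if ! grep -qxs "$SLURM_JOB_UID" /var/spool/jobtrack/*; then
        pkill -u "$SLURM_JOB_UID"
    fi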

[slurm-dev] Re: Re: Best Way to Schedule Jobs based on predetermined Lists

2017-04-10 Thread Thomas M. Payerle
On 2017-04-05 16:00, maviko.wag...@fau.de wrote: Hello Dani and Thomas, [ ...] However I think I did not specify clearly enough what my cluster looks like and what I'm trying to achieve. Compared to a regular HPC cluster, my testing cluster consists of as few as 5 nodes (each having the

[slurm-dev] Re: Best Way to Schedule Jobs based on predetermined Lists

2017-04-04 Thread Thomas M. Payerle
You can define nodes with "features", and then at job submission time require specific features. E.g., if you had some nodes with Ethernet only, some with QDR InfiniBand, and some with FDR InfiniBand, you could define the QDR and FDR nodes to have the feature qdr or fdr, respectively. Then, e.g. a
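(A rough sketch of that setup; node names are invented:)

    # slurm.conf: tag nodes with a feature describing their interconnect
    NodeName=node[001-032] Feature=qdr ...
    NodeName=node[033-064] Feature=fdr ...

    # At submission time, require FDR nodes
    sbatch --constraint=fdr job.sh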

[slurm-dev] Re: integrating slurm limits with project and organization information

2017-07-18 Thread Thomas M. Payerle
Basically, you create an allocation account for each of the organizations and projects, with the projects having as "parent" the allocation account corresponding to the organization they belong to. Then set the appropriate limits on the allocation accounts, and create associations for users
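(A rough sacctmgr sketch of that hierarchy; account names, users, and limits are only illustrative:)

    # Organization-level account, then a project account underneath it
    sacctmgr add account eng_dept Description="Engineering"
    sacctmgr add account proj_cfd parent=eng_dept Description="CFD project"

    # A limit set on the project account applies to all of its associations
    sacctmgr modify account proj_cfd set GrpTRESMins=cpu=500000

    # Associate a user with the project account
    sacctmgr add user jdoe account=proj_cfd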

[slurm-dev] Re: #SBATCH --time= not always overriding default?

2017-06-30 Thread Thomas M. Payerle
Also, I believe the #SBATCH ... flags in a script must come before any non-blank/non-comment lines in the script file in order to be honored by Slurm. So in something like "#!/bin/bash WDIR=$PWD #SBATCH -t 1:00", the -t 1:00 will get ignored by sbatch (see the sketch below). On Thu, 29 Jun 2017, Lachlan Musicman wrote:
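(To illustrate, a minimal pair of scripts; the time limit value is only an example:)

    # Ignored: an assignment precedes the #SBATCH line
    #!/bin/bash
    WDIR=$PWD
    #SBATCH -t 1:00

    # Honored: all #SBATCH lines come before the first non-comment line
    #!/bin/bash
    #SBATCH -t 1:00
    WDIR=$PWD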

[slurm-dev] Re: Purpose of requesting nodes in SLURM

2017-06-19 Thread Thomas M. Payerle
Your confusion seems to be stemming from misunderstanding job arrays, not the --nodes and --ntasks-per-node options. Job arrays are basically a shortcut for submitting large numbers of similar jobs at once. The "sbatch array.sub" command basically submits 4 jobs, each job being allocated 1
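(A rough sketch of what such an array.sub could look like; the array size and command are assumptions:)

    #!/bin/bash
    #SBATCH --array=1-4
    #SBATCH --nodes=1 --ntasks-per-node=1
    # Each array element runs as a separate job with its own allocation
    ./my_program input.$SLURM_ARRAY_TASK_ID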

[slurm-dev] Re: Is PriorityUsageResetPeriod really required for hard limits?

2017-10-04 Thread Thomas M. Payerle
On Tue, 3 Oct 2017, Christopher Samuel wrote: On 29/09/17 06:34, Jacob Chappell wrote: Hi all. The slurm.conf documentation says that if decayed usage is disabled, then PriorityUsageResetPeriod must be set to some value. Is this really true? What is the technical reason for this requirement

[slurm-dev] RE: Node always going to DRAIN state with reason=Low TmpDisk

2017-10-10 Thread Thomas M. Payerle
I think Uwe was on the right track. It looks to me like the problem node is somehow thinking TmpFS=/tmp rather than /local/scratch. That seems to be consistent with what is being reported (TmpDisk=500). I would check the slurm.conf/scontrol show config output on the problem node and
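(A rough way to check, assuming /local/scratch is the intended scratch filesystem; the TmpDisk size below is only an example:)

    # On the problem node, confirm which filesystem slurmd is using for TmpDisk
    scontrol show config | grep -i TmpFS

    # slurm.conf lines that should agree across the cluster
    TmpFS=/local/scratch
    NodeName=node123 TmpDisk=800000 ...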

[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Thomas M. Payerle
On Wed, 4 Oct 2017, Mike Cammilleri wrote: Hi Everyone, I'm in search of a best practice for setting up Environment Modules for our Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 yet). We're a small group and had no explicit need for this in the beginning, but as