[slurm-dev] Re: Wrong device order in CUDA_VISIBLE_DEVICES

2017-11-03 Thread Kilian Cavalotti
Hi Maik, On Fri, Nov 3, 2017 at 2:14 AM, Maik Schmidt wrote: > It is my understanding that when ConstrainDevices is not set to "yes", SLURM > uses the so called "Minor Number" (nvidia-smi -q | grep Minor) that is the > number in the device name (/dev/nvidia0 -> ID 0
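For reference, a quick way to cross-check NVML's minor numbers against the device nodes on a given machine (output obviously varies per node):
-- 8< --
$ nvidia-smi -q | grep -E 'Minor Number|Bus Id'   # NVML's view: minor number and PCI bus id per GPU
$ ls -l /dev/nvidia[0-9]*                         # device nodes /dev/nvidiaN, where N is the minor number
-- >8 --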

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-30 Thread Kilian Cavalotti
Hi Dave, On Fri, Oct 27, 2017 at 3:57 PM, Dave Sizer wrote: > Kilian, when you specify your CPU bindings in gres.conf, are you using the > same IDs that show up in nvidia-smi? Yes: $ srun -p gpu -c 4 --gres gpu:1 --pty bash sh-114-01 $ cat /etc/slurm/gres.conf name=gpu
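The gres.conf quoted above is cut off; purely for illustration, a file of that general shape with CPU bindings expressed against the same device IDs nvidia-smi reports could look like this (types, device paths and core ranges are made up; releases of that era spell the binding CPUs=, newer ones Cores=):
-- 8< --
Name=gpu Type=gtx File=/dev/nvidia0 CPUs=0-7
Name=gpu Type=gtx File=/dev/nvidia1 CPUs=8-15
-- >8 --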

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-27 Thread Kilian Cavalotti
On Fri, Oct 27, 2017 at 12:45 PM, Dave Sizer wrote: > Also, supposedly adding the "--accel-bind=g" option to srun will do this, > though we are observing that this is broken and causes jobs to hang. > > Can anyone confirm this? Not really, it doesn't seem to be hanging for

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-27 Thread Kilian Cavalotti
Hi Michael, On Fri, Oct 27, 2017 at 4:44 AM, Michael Di Domenico wrote: > as an aside, is there some tool which provides the optimal mapping of > CPU id's to GPU cards? We use nvidia-smi: -- 8<
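The actual command is snipped above; one nvidia-smi subcommand that reports this kind of mapping is the topology matrix:
-- 8< --
$ nvidia-smi topo -m   # prints inter-GPU link types and a "CPU Affinity" column for each GPU
-- >8 --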

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-26 Thread Kilian Cavalotti
Hi Dave, On Wed, Oct 25, 2017 at 9:23 PM, Dave Sizer wrote: > For some reason, we are observing that the preferred CPUs defined in > gres.conf for GPU devices are being ignored when running jobs. That is, in > our gres.conf we have gpu resource lines, such as: > > Name=gpu

[slurm-dev] Re: Per-job tmp directories and namespaces

2017-08-14 Thread Kilian Cavalotti
On Thu, Aug 10, 2017 at 10:31 AM, Kilian Cavalotti <kilian.cavalotti.w...@gmail.com> wrote: > Do you use cgroups in your Slurm setup with pam_systemd on nodes? And > if so, did you notice any issue with cgroups? For what it's worth, I just checked again with Slurm 17.02 and CentOS

[slurm-dev] Re: Per-job tmp directories and namespaces

2017-08-10 Thread Kilian Cavalotti
Hi Bill, On Thu, Aug 10, 2017 at 5:33 AM, Bill Barth wrote: > If you add the same line from /etc/pam.d/system-auth (or your OS’s > equivalent) to /etc/pam.d/slurm, then srun- and sbatch-initiated shells and > processes will also have the directory properly set up.
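The exact line is not quoted here; assuming the thread is about pam_systemd (as the follow-up above suggests), the addition to /etc/pam.d/slurm would be a session entry along these lines, mirrored from system-auth:
-- 8< --
# /etc/pam.d/slurm -- assumed pam_systemd session line
session    optional    pam_systemd.so
-- >8 --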

[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?

2017-06-23 Thread Kilian Cavalotti
Hi Ole, On Fri, Jun 23, 2017 at 1:26 AM, Ole Holm Nielsen wrote: > Yes, ClusterShell has indeed lots of features and compares favorably to > PDSH. I've added a brief description in my Slurm Wiki > https://wiki.fysik.dtu.dk/niflheim/SLURM#clustershell, please comment

[slurm-dev] Re: Controlling the output of 'scontrol show hostlist'?

2017-06-22 Thread Kilian Cavalotti
On Thu, Jun 22, 2017 at 2:11 AM, Kent Engström wrote: > slightly off topic, but if you are willing to install and use an external > program that is not part of SLURM itself, I might perhaps be allowed to > advertise the python-hostlist package? And I'd like to also advertise the
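For completeness, python-hostlist also ships a hostlist(1) command-line tool; a typical call (node names made up, flag from memory; check hostlist --help):
-- 8< --
$ hostlist --expand n[1-3,7]   # expand a hostlist expression into individual node names
-- >8 --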

[slurm-dev] Re: Looking for distributions of wait times for jobs submitted over the past year

2017-06-15 Thread Kilian Cavalotti
Hi Barry, On Thu, Jun 15, 2017 at 9:16 AM, Barry Moore wrote: > Does anyone have a script or knowledge of how to query wait times for Slurm > jobs in the last year or so? With the help of histogram.py from https://github.com/bitly/data_hacks, you can have a one-liner: $
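The actual one-liner is cut off above; a rough equivalent (not the original; the date range and field handling are assumptions) would pull Submit/Start out of sacct and feed the wait in seconds to histogram.py:
-- 8< --
$ sacct -a -X -n -P -S 2016-06-15 -E 2017-06-15 -o Submit,Start \
    | awk -F'|' '$2 != "Unknown" && $2 != "None"' \
    | while IFS='|' read -r submit start; do
        # wait time = start - submit, in seconds (GNU date)
        echo $(( $(date -d "$start" +%s) - $(date -d "$submit" +%s) ));
      done | histogram.py
-- >8 --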

[slurm-dev] Re: Some questions about using slurm to dispatch GPU jobs

2017-04-19 Thread Kilian Cavalotti
On Wed, Apr 19, 2017 at 12:52 AM, gaoxinglong9...@163.com wrote: > Hi: > I have 5 nodes and there are 4 K80 NVIDIA GPUs on each of them. Now I want to > use 6 GPUs to execute some tasks, and I want to use 4 of the first node and 2 of > the second node, Can I do this by using

[slurm-dev] Re: LDAP required?

2017-04-13 Thread Kilian Cavalotti
Hi Janne, On Thu, Apr 13, 2017 at 1:32 AM, Janne Blomqvist wrote: > Should work as of 16.05 unless you have some very peculiar setup. IIRC I > submitted some patch to get rid of the enumeration entirely, but > apparently SchedMD has customers who have multiple groups

[slurm-dev] Re: How to allow Epilog script to run for job that is cancelled

2017-04-13 Thread Kilian Cavalotti
Hi Wensheng, On Thu, Apr 13, 2017 at 6:23 AM, Wensheng Deng wrote: > Hi, several months ago when I started learning Slurm and reading through the > web pages, I made this picture to help myself understanding the *prolog and > *epilog interactions with job steps. Please see the

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Kilian Cavalotti
On Thu, Dec 15, 2016 at 11:47 PM, Douglas Jacobsen wrote: > > There are other good reasons to use jobacct_gather/cgroup, in particular if > memory enforcement is used, jobacct_gather/linux will cause a job to be > terminated if the summed memory exceeds the limit, which is OK
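For reference, the switch being discussed is a slurm.conf setting; a minimal sketch (the polling interval is just an example):
-- 8< --
# gather per-job usage from the job's cgroups instead of summing /proc entries
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=30
-- >8 --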

[slurm-dev] Re: Restrict users to see only jobs of their groups

2016-11-01 Thread Kilian Cavalotti
On Tue, Nov 1, 2016 at 7:34 AM, Taras Shapovalov wrote: > Yeah, PrivateData does not really help here. Would be useful to see in the > future such PrivateData options like 'group' or 'account'. I'd suggest submitting this as an enhancement request at

[slurm-dev] Re: About gres.conf configuration for all compute nodes

2016-08-17 Thread Kilian Cavalotti
I guess the best way is to have the same file on all the nodes. Even if each node has a different GRES configuration, each node will find its own relevant entries in the file thanks to the NodeName= parameter. Much easier to manage this way. Cheers, -- Kilian
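A sketch of such a shared file (node names, GPU types and counts are made up; recent releases accept device ranges on one line, older ones need one line per device file):
-- 8< --
# distributed unchanged to every node; each slurmd only uses the lines
# whose NodeName= matches its own hostname
NodeName=gpu-[01-04] Name=gpu Type=k80   File=/dev/nvidia[0-3]
NodeName=gpu-[05-06] Name=gpu Type=titan File=/dev/nvidia[0-1]
-- >8 --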

[slurm-dev] Re: About gres.conf configuration for all compute nodes

2016-08-17 Thread Kilian Cavalotti
Hi Giuseppe, It needs to be on every single node in your Slurm cluster: http://slurm.schedmd.com/gres.conf.html """ DESCRIPTION gres.conf is an ASCII file which describes the configuration of generic resources on each compute node. Each node must contain a gres.conf file if generic resources

[slurm-dev] Re: srun hanging when requesting --gres after cgroups configured, 15.08, CentOS 7

2016-08-12 Thread Kilian Cavalotti
On Fri, Aug 12, 2016 at 2:26 PM, Ryan Novosielski wrote: > Curious, to anyone reading this — anyone think that this could be the same > bug: https://bugs.schedmd.com/show_bug.cgi?id=2493 Do you see any sign of a segfault in the node's dmesg? Cheers, -- Kilian

[slurm-dev] Re: srun hanging when requesting --gres after cgroups configured, 15.08, CentOS 7

2016-08-12 Thread Kilian Cavalotti
On Thu, Aug 11, 2016 at 9:44 PM, Ryan Novosielski wrote: > [pid 11767] > open("/sys/fs/cgroup/devices/slurm/uid_109366/job_5377709/devices.allow", > O_WRONLY) = 10 > [pid 11767] write(10, "501835 rwm", 10) = -1 EINVAL (Invalid argument) > [pid 11767] close(10)

[slurm-dev] Re: srun hanging when requesting --gres after cgroups configured, 15.08, CentOS 7

2016-08-11 Thread Kilian Cavalotti
On Thu, Aug 11, 2016 at 12:46 PM, Ryan Novosielski wrote: > I’ll try adding the Gres debugging, but is there some way to figure out what > this alleged device “819275” is (this number will change with each job). Weird, indeed. /dev/nv* devices should be 195:x, and slurmd

[slurm-dev] Re: srun hanging when requesting --gres after cgroups configured, 15.08, CentOS 7

2016-08-11 Thread Kilian Cavalotti
Hi Ryan, You probably shouldn't have /dev/nvidia* devices listed in cgroup_allowed_devices_file.conf; Slurm will automatically manage them and add them to the list of authorized devices in a cgroup when a job starts. For reference, our cgroup_allowed_devices_file.conf contains this: /dev/null
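The rest of the file is cut off above; for reference, a typical cgroup_allowed_devices_file.conf (along the lines of the example in the Slurm documentation, not necessarily the one quoted) looks something like:
-- 8< --
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
-- >8 --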

[slurm-dev] Re: CUDA_VISIBLE_DEVICES always set to 0

2016-06-21 Thread Kilian Cavalotti
Hi Tom, On Tue, Jun 21, 2016 at 5:44 AM, Tom Deakin wrote: > I’m having trouble getting SLURM to choose the 2nd GPU on this node. > If I then run srun --gres=gpu:gtx580 I get CUDA_VISIBLE_DEVICES=0 > If I also run srun --gres=gpu:gtx680 I get CUDA_VISIBLE_DEVICES=0

[slurm-dev] Re: Difficulty using reboot_nodes or similar for maintenance, SLURM 15.08

2016-05-10 Thread Kilian Cavalotti
On Tue, May 10, 2016 at 9:54 AM, Ryan Novosielski wrote: > Problem is that the nodes come back to service without running NHC (and > are idle for the number of seconds required to be assigned work, > whatever tiny amount that is). There's a SLURM 16.05 fix to make sure >

[slurm-dev] Re: Difficulty using reboot_nodes or similar for maintenance, SLURM 15.08

2016-05-10 Thread Kilian Cavalotti
Hi there, Did you guys try to use LBNL's NHC? https://github.com/mej/nhc You can set it up to check filesystem mounts, and Slurm has hooks to run it to verify that nodes are ready for production. Cheers, -- Kilian
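The Slurm hooks mentioned are plain slurm.conf settings; a minimal sketch (path and interval are assumptions):
-- 8< --
# run NHC periodically on every node; NHC drains a node when a check fails
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=300
HealthCheckNodeState=ANY
-- >8 --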

[slurm-dev] Re: squeue and nodelist format

2016-05-09 Thread Kilian Cavalotti
Hi Nicholas, On Mon, May 9, 2016 at 6:33 AM, Eggleston, Nicholas J. wrote: > Given that I'm writing in Python right now, Michael's solution looks freaking > brilliant. Python really is the best all around language, my apologies to C. I should have mentioned it, but

[slurm-dev] Re: squeue and nodelist format

2016-05-07 Thread Kilian Cavalotti
And there's the awesome clustershell [1], which outperforms (and has more features than) anything I know of. It can do this, and much more: $ nodeset -e edrcompute-42-[12-14,16] edrcompute-42-12 edrcompute-42-13 edrcompute-42-14 edrcompute-42-16 $ nodeset -e -S '\n' edrcompute-42-[12-14,16]

[slurm-dev] Re: dynamic gres scheduling

2016-04-15 Thread Kilian Cavalotti
Hi Daniel, On Wed, Feb 3, 2016 at 3:33 AM, Daniel Letai wrote: > The question is - does slurm also use the dev files to track the > availability of the cards? > > I do not wish to drain any nodes with failing cards - just let slurm know > about this dynamically so jobs

[slurm-dev] Re: Patch for health check during slurmd start

2016-03-02 Thread Kilian Cavalotti
On Wed, Mar 2, 2016 at 10:12 AM, wrote: > We want to introduce a new behavior in the way slurmd uses the > HealthCheckProgram. The idea is to avoid a race condition between the first > HealthCheckProgram run and the node accepting jobs. The slurmd daemon will > initialize and

[slurm-dev] Re: GRES for both K80 GPU's

2016-02-11 Thread Kilian Cavalotti
Hi Michael, This is not currently possible, but there is a feature request for it. See http://bugs.schedmd.com/show_bug.cgi?id=1725 for details. Cheers, -- Kilian

[slurm-dev] Re: Need for recompiling openmpi built with --with-pmi?

2016-02-04 Thread Kilian Cavalotti
Hi all, I would like to revive this old thread, as we've been bitten by this also when moving from 14.11 to 15.08. On Mon, Oct 5, 2015 at 4:38 AM, Bjørn-Helge Mevik wrote: > We have verified that we can compile openmpi (1.8.6) against slurm > 14.03.7 (with the .la files

[slurm-dev] Re: Need for recompiling openmpi built with --with-pmi?

2016-02-04 Thread Kilian Cavalotti
On Thu, Feb 4, 2016 at 2:56 PM, wrote: > They have already been removed for the next major release (version 16.05). > See: > https://github.com/SchedMD/slurm/commit/a49ce346ff1deda34865da45f9958df23158dff7 Very good, thanks Moe! Cheers, -- Kilian

[slurm-dev] Re: srun mytask & when using cgroups

2015-12-16 Thread Kilian Cavalotti
Hi Ewan, My bet is that one of the job resources is entirely consumed by the first step, so the 2nd one waits in queue. It's likely memory; maybe you have a DefMemPerCpu setting in your slurm.conf? You can try to request say 4G for your whole job and then 2G for each srun step, they should both
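A sketch of that layout with made-up sizes and a hypothetical ./mytask: give the whole job 4G and each backgrounded step half of it, so both can run side by side:
-- 8< --
#!/bin/bash
#SBATCH --mem=4G              # whole-job allocation
srun --mem=2G ./mytask A &    # first step, capped at half the job's memory
srun --mem=2G ./mytask B &    # second step can now start instead of queueing
wait
-- >8 --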

[slurm-dev] Re: Combing fair share with GrpCPUMins caps

2015-11-30 Thread Kilian Cavalotti
Hi Chris, On Sun, Nov 29, 2015 at 9:43 PM, Christopher Samuel wrote: > We're looking at seeing if we can combine fair share with our existing > quota system that uses GrpCPUMins. > > However, for fair share a decay factor is strongly suggested and I worry > that there is

[slurm-dev] Re: Limiting the number of nodes per user

2015-11-20 Thread Kilian Cavalotti
Hi Vsevolod, On Tue, Nov 17, 2015 at 11:43 PM, Vsevolod Nikonorov wrote: > Is it possible to limit a number of nodes allocated simultaneously to a given > user? Yes, that would be the MaxNodesPerUser QOS limit. Please note that there are some limitations here: if 2 jobs
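For reference, setting that limit on a QOS looks like this (QOS name and value are placeholders; newer releases express the same limit as MaxTRESPerUser=node=N):
-- 8< --
$ sacctmgr modify qos normal set MaxNodesPerUser=4   # verify afterwards with: sacctmgr show qos
-- >8 --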

[slurm-dev] Re: Disk I/O as consumable?

2015-09-10 Thread Kilian Cavalotti
On Tue, Sep 8, 2015 at 5:01 AM, Marcin Stolarek wrote: > using specified mountpoint, but... that's not a real IOPS threshold. Currently > I don't know of any Linux mechanism that allows limiting a process to a specified > number of I/O operations per second. At our side we've

[slurm-dev] Re: Disk I/O as consumable?

2015-09-10 Thread Kilian Cavalotti
Oh I don't think it's specific to Lustre. You can limit I/O on any kind of filesystem using the blkio controller in a cgroup. See https://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/ for example. Cheers, -- Kilian
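A minimal raw-sysfs sketch of the same idea (cgroup v1, run as root; the device numbers and cgroup name are made up; 8:16 is /dev/sdb):
-- 8< --
# cap reads from block device 8:16 at 10 MB/s for every task in the "throttled" group
mkdir -p /sys/fs/cgroup/blkio/throttled
echo "8:16 10485760" > /sys/fs/cgroup/blkio/throttled/blkio.throttle.read_bps_device
echo $$ > /sys/fs/cgroup/blkio/throttled/tasks
-- >8 --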

[slurm-dev] Re: sreport inconsistency

2015-07-28 Thread Kilian Cavalotti
Hi Eric, If you use slurmdbd, that usually means you have runaway jobs in the Slurm DB, i.e., jobs that are not running anymore (don't show up in squeue), but don't have an end date and/or are still considered running in sacct. Phil Eckert posted a perl script to detect such jobs some time ago:

[slurm-dev] Re: CUDA_VISIBLE_DEVICES always set to 0 and mismatch with cgroups

2015-06-09 Thread Kilian Cavalotti
On Tue, Jun 9, 2015 at 6:19 AM, Roche Ewan ewan.ro...@epfl.ch wrote: It'll be interesting to see how many codes break if we get that chance to change the 0-based numbering in a future CUDA release. All of them? ;) AFAIK, the current idea is to provide a switch that would allow one to choose

[slurm-dev] Re: CUDA_VISIBLE_DEVICES always set to 0 and mismatch with cgroups

2015-06-08 Thread Kilian Cavalotti
Hi Ewan, On Mon, Jun 8, 2015 at 2:39 AM, Roche Ewan ewan.ro...@epfl.ch wrote: The underlying problem seems to be that SLURM isn’t correctly setting CUDA_VISIBLE_DEVICES to match the device allowed by the cgroup. Slurm actually does the right thing. The real culprit here is the NVML. So for

[slurm-dev] Re: Slurm and docker/containers

2015-05-20 Thread Kilian Cavalotti
Hi Michael, On Wed, May 20, 2015 at 7:37 AM, Michael Jennings m...@lbl.gov wrote: Unfortunately the demand for Docker is growing rapidly, largely due to papers such as this one: http://arxiv.org/pdf/1410.0846.pdf which tout Docker images as a prudent deliverable for research scientists

[slurm-dev] Re: Slurm and docker/containers

2015-05-19 Thread Kilian Cavalotti
On Tue, May 19, 2015 at 9:28 AM, Michael Jennings m...@lbl.gov wrote: What Chris is asking for, I *think*, is what we're looking for as well -- anyone who has figured out a way to allow users to execute jobs inside user-supplied (or at least user-specified) Docker containers. It would be

[slurm-dev] Re: Slurm and docker/containers

2015-05-19 Thread Kilian Cavalotti
Hi David, On Tue, May 19, 2015 at 1:40 PM, David Bigagli da...@schedmd.com wrote: You can create a user inside a docker machine just like any other and then just ssh to it. You can, but nothing forces you to. :) I guess it's a matter of how much you trust your users, then. Cheers, -- Kilian

[slurm-dev] Re: Nested cgroup messages

2015-03-17 Thread Kilian Cavalotti
Hi Bjørn-Helge, I think this was fixed in commit 837c360 [1], which is in 14.11.x versions. [1] https://github.com/SchedMD/slurm/commit/837c360f671142f36a434235a7c8488631e481de Cheers, -- Kilian On Tue, Mar 17, 2015 at 6:50 AM, Bjørn-Helge Mevik b.h.me...@usit.uio.no wrote: While testing

[slurm-dev] Re: Array job: get number of array tasks in batch script

2015-03-14 Thread Kilian Cavalotti
On Sat, Mar 14, 2015 at 6:26 AM, Jason Bacon jwba...@tds.net wrote: How about SLURM_ARRAY_MIN_TASK_ID, SLURM_ARRAY_MAX_TASK_ID and SLURM_ARRAY_NUM_TASKS? That looks even better. I kind of overlooked what Moe pointed out, that task IDs don't have to be contiguous. So max_id and num_tasks
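For reference, a batch-script sketch using array-range variables of this kind; the names below are the ones documented in recent sbatch man pages, so older releases may not define all of them:
-- 8< --
#!/bin/bash
#SBATCH --array=1-10
# which task am I, out of how many, and over which id range
echo "task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT} (ids ${SLURM_ARRAY_TASK_MIN}-${SLURM_ARRAY_TASK_MAX})"
-- >8 --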

[slurm-dev] Re: Array job: get number of array tasks in batch script

2015-03-13 Thread Kilian Cavalotti
On Fri, Mar 13, 2015 at 6:34 PM, Jason Bacon jwba...@tds.net wrote: by creating a new variable such as SLURM_ARRAY_NUM_TASKS? On Fri, Mar 13, 2015 at 7:17 PM, Moe Jette je...@schedmd.com wrote: How about this for a name? SLURM_ARRAY_MAX_TASK_ID I like SLURM_ARRAY_NUM_TASKS better, it's more

[slurm-dev] Re: Single gres.conf file and multiple GPUs

2015-01-14 Thread Kilian Cavalotti
Hi Jared, On Wed, Jan 14, 2015 at 2:14 PM, Jared David Baker jared.ba...@uwyo.edu wrote: NodeName=loren[01-60] Name=gpu Type=k20x File=/dev/nvidia[0-3] I don't think you can aggregate multiple GPUs on a single line (at least that was the case in 14.03). So you would have to split it up over 4
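A sketch of the split-up version, one line per device, reusing the node range and type from the quoted line:
-- 8< --
NodeName=loren[01-60] Name=gpu Type=k20x File=/dev/nvidia0
NodeName=loren[01-60] Name=gpu Type=k20x File=/dev/nvidia1
NodeName=loren[01-60] Name=gpu Type=k20x File=/dev/nvidia2
NodeName=loren[01-60] Name=gpu Type=k20x File=/dev/nvidia3
-- >8 --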

[slurm-dev] Re: totalCpuAllocateTime or totalNodeTime etc.

2014-12-26 Thread Kilian Cavalotti
Hi Sefa, This is not currently implemented, but it's being discussed here: http://bugs.schedmd.com/show_bug.cgi?id=858 Cheers, -- Kilian

[slurm-dev] Re: scontrol update partition

2014-11-06 Thread Kilian Cavalotti
Hi Sefa, On Thu, Nov 6, 2014 at 1:04 AM, Sefa Arslan sefa.ars...@tubitak.gov.tr wrote: In order to update the node list of a partition, I use a command like scontrol update partition=part1 nodes=node[A-B,F,K-H,...] Is there a way to add/remove a single node from a partition without

[slurm-dev] Re: Trouble with Prolog scripts

2014-10-07 Thread Kilian Cavalotti
Hi Ian, That doesn't answer your question about prolog scripts, but for that sort of check, we use NHC (http://warewulf.lbl.gov/trac/wiki/Node%20Health%20Check). It integrates very well with Slurm and provides all sorts of ready-to-use checks. Cheers, -- Kilian

[slurm-dev] Re: Implementing fair-share policy using BLCR

2014-09-23 Thread Kilian Cavalotti
Hi, On Tue, Sep 23, 2014 at 7:18 AM, Yann Sagon ysa...@gmail.com wrote: To lower the problem of having to deal with two queues, you can specify the two queues like that when you submit a job : --partition=queue1,queue2 and the first one that is free is selected. You can even define an env

[slurm-dev] Re: Building MySQL support within RPMs

2014-09-12 Thread Kilian Cavalotti
Hi Brian, On Fri, Sep 12, 2014 at 9:58 AM, Brian B for...@gmail.com wrote: I am trying to set up my slurm setup to use MySQL. I installed via pre-compiled RPMs but I am having trouble actually loading the plugin as it isn’t being installed from the RPMs I currently have. I see documentation
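For reference, the usual recipe is to have the MySQL/MariaDB development headers installed when the RPMs are built, so the accounting_storage_mysql plugin gets compiled; a sketch (package and tarball names vary by distribution and release):
-- 8< --
$ yum install mariadb-devel          # mysql-devel on older distributions
$ rpmbuild -ta slurm-*.tar.bz2       # rebuild the RPMs with the headers present
$ rpm -qlp ~/rpmbuild/RPMS/x86_64/slurm-*.rpm | grep accounting_storage_mysql
-- >8 --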

[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Kilian Cavalotti
Hi Jesse, Just a shot in the dark, but do you use task affinity or CPU binding? Cheers, -- Kilian

[slurm-dev] Re: cgroup freezer throwing Device or resource busy upon job cancel or kill - 14.03.6

2014-08-13 Thread Kilian Cavalotti
On Tue, Aug 12, 2014 at 6:56 PM, Trey Dockendorf treyd...@tamu.edu wrote: This is slurm-14.03.6 running CentOS 6.5 kernel 2.6.32-431.23.3.el6.x86_64 Exact same behavior here, same Slurm version and same kernel. Cheers, -- Kilian

[slurm-dev] Re: cgroup freezer throwing Device or resource busy upon job cancel or kill - 14.03.6

2014-08-13 Thread Kilian Cavalotti
On Wed, Aug 13, 2014 at 10:00 AM, David Bigagli da...@schedmd.com wrote: For some reason at the first attempt rmdir(2) returns EBUSY. Would writing to memory.force_empty before calling rmdir() help? See http://lxr.free-electrons.com/source/Documentation/cgroups/memory.txt?v=2.6.32#L269
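For reference, the knob David mentions is used like this before removing the job's memory cgroup (uid/job numbers are made up):
-- 8< --
# ask the kernel to reclaim charged pages so the empty cgroup can be removed
echo 0 > /sys/fs/cgroup/memory/slurm/uid_1000/job_12345/memory.force_empty
rmdir /sys/fs/cgroup/memory/slurm/uid_1000/job_12345
-- >8 --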

[slurm-dev] Re: SLURM_PTY_WIN_COL=62, not resizing

2014-07-09 Thread Kilian Cavalotti
On Tue, Jul 8, 2014 at 1:54 PM, je...@schedmd.com wrote: It looks like two places needed trivial changes (changed from 8 to 16 bit fields). See: https://github.com/SchedMD/slurm/commit/9bd58eec0b511fb7e054ca87dcb0a65938253f5f Thanks! I guess this will be in 14.03.5? Cheers, -- Kilian

[slurm-dev] MaxNodesPU acts as MaxCPUsPU in QOS

2014-03-11 Thread Kilian Cavalotti
Hi all, I'm currently seeing a behavior I don't understand using MaxNodesPerUser in a QoS setting. The sacctmgr(1) documentation states: SPECIFICATIONS FOR QOS [...] MaxNodesPerUser Maximum number of nodes each user is able to use. I'm using the following QOS: # sacctmgr