[slurm-users] wckey specification error

2018-04-30 Thread Mahmood Naderan
Hi, I cannot figure out why the following MPI script failed to start. [siadati@rocks7 ~]$ sacctmgr list association format=partition,account,user,grptres | grep siadati othersem1siadati cpu=6,mem=8G [siadati@rocks7 ~]$ cat slurm_script.sh #!/bin/bash #SBATCH --output=test.out

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Andy Georges
> On 30 Apr 2018, at 22:37, Nate Coraor wrote: > > Hi Shawn, > > I'm wondering if you're still seeing this. I've recently enabled task/cgroup > on 17.11.5 running on CentOS 7 and just discovered that jobs are escaping > their cgroups. For me this is resulting in a lot of

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Nate Coraor
Never mind - it appears to happen when puppet runs. I have no hand in that, so I'll kick it to those admins and report back with what I find. I ruled out slurm by simply creating a non-slurm cgroup, with e.g. `cgcreate -g memory:test`, and that cgroup also disappeared unexpectedly. --nate On

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Nate Coraor
Hi Shawn, I'm wondering if you're still seeing this. I've recently enabled task/cgroup on 17.11.5 running on CentOS 7 and just discovered that jobs are escaping their cgroups. For me this is resulting in a lot of jobs ending in OUT_OF_MEMORY that shouldn't, because it appears slurmd thinks the

Re: [slurm-users] New Billing TRES Issue

2018-04-30 Thread Roberts, John E.
Hi, Unfortunately that can't be a solution in my running production environment for a number of reasons. I did consider it ( Thanks! -John On 4/30/18, 2:40 AM, "slurm-users on behalf of Bjørn-Helge Mevik" wrote:

Re: [slurm-users] sacctmgr - bug listing accounts?

2018-04-30 Thread Ole Holm Nielsen
Hi Loris, On 04/30/2018 01:09 PM, Loris Bennett wrote: Your example of how to use 'Organisation' to setup separate groups within one department is illuminating. However, I am still unable to set up 'geochemie' as a sibling of 'geophysik' and a child of 'geowiss': $ sacctmgr list acc where

Re: [slurm-users] sacctmgr - bug listing accounts?

2018-04-30 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > Hi Loris, > > On 04/30/2018 10:12 AM, Loris Bennett wrote: >> Thanks, I should have spotted that, although I don't understand the >> difference between 'parent' and 'organisation' and in fact asked this >> question: >> >>

Re: [slurm-users] sacctmgr - bug listing accounts?

2018-04-30 Thread Ole Holm Nielsen
Hi Loris, On 04/30/2018 10:12 AM, Loris Bennett wrote: Thanks, I should have spotted that, although I don't understand the difference between 'parent' and 'organisation' and in fact asked this question: https://groups.google.com/forum/#!topic/slurm-users/f1vftgIRcVk on the subject recently.

Re: [slurm-users] sacctmgr - bug listing accounts?

2018-04-30 Thread Loris Bennett
Hi Simon, Simon Flood writes: > Hi Loris, > > On 27/04/18 13:46, Loris Bennett wrote: >> Hi, >> >> If I dump my account structure with sacctmgr, I get >> >>Parent - 'geowiss' >>Account - 'geochemie':Fairshare=2 >>Account - >>

Re: [slurm-users] Jobs in pending state

2018-04-30 Thread Zohar Roe MLM
Hi Paul, Thanks for the reply. Just went over the backfill options again. It looks reasonable that after a certain number of jobs in the first cluster, the other one doesn't even get tested since there are too many jobs to backfill in the first cluster. I will try to look at the

Re: [slurm-users] New Billing TRES Issue

2018-04-30 Thread Bjørn-Helge Mevik
"Roberts, John E." writes: > So now the issue remains on why I can’t use decimals to bill for time… As a workaround, perhaps you can just scale up all numbers so you get integers. -- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of
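
The scaling workaround suggested in the message above can be sketched roughly as follows. This is only an illustration, not anything taken from the thread or from Slurm itself: the weight names and values are hypothetical, and `scale_to_integers` is a made-up helper. It finds the smallest common factor that turns every decimal weight into an integer while keeping the ratios between them unchanged.

```python
from fractions import Fraction
from math import lcm  # variadic lcm requires Python 3.9+


def scale_to_integers(weights):
    """Return (factor, scaled), where scaled holds integer weights.

    Each weight is parsed from its decimal string form so that 0.25
    becomes exactly 1/4 rather than a binary floating-point approximation.
    """
    fracs = {name: Fraction(str(value)) for name, value in weights.items()}
    # The least common multiple of the denominators is the smallest
    # multiplier that makes every weight a whole number.
    factor = lcm(*(f.denominator for f in fracs.values()))
    scaled = {name: int(f * factor) for name, f in fracs.items()}
    return factor, scaled


# Hypothetical billing weights, chosen only for illustration.
weights = {"CPU": 1.0, "Mem": 0.25, "GRES/gpu": 2.5}
factor, scaled = scale_to_integers(weights)
print(factor, scaled)
```

If weights are scaled this way, any limit or report expressed in the same billing units would presumably need to be multiplied by the same factor to stay comparable.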