[slurm-dev] Re: One CPU always reserved for one GPU

2016-06-03 Thread Diego Zuccato
n't, it could be seen as a feature request :) -- Diego Zuccato Servizi Informatici Dip. di Fisica e Astronomia (DIFA) - Università di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 mail: diego.zucc...@unibo.it 5x1000 AI GIOVANI RICERCATORI DELL'UNIVERSITÀ DI BOLOGNA Cod

[slurm-dev] Re: Increase size of running job/correcting incorrect resource allocations?

2016-05-31 Thread Diego Zuccato
he job to the allocated CPUs. This way the real state and the scheduler's view are the same even with misbehaving jobs (we've had 3rd party executables that automatically used every CPU in the system). *Way* less need to watch the cluster closely :)

[slurm-dev] Re: AllowGroups and AD

2016-02-11 Thread Diego Zuccato
On 11/02/2016 09:17, Christopher Samuel wrote: > Do you go through sssd on the way to winbind? Nope. > If so you may want the following in /etc/sssd/sssd.conf: > [domain/default] > enumerate = True Can't do that: a reboot would take about half a day on our forest.

[slurm-dev] Re: AllowGroups and AD

2016-02-12 Thread Diego Zuccato
limit > cluster access to a particular group in AD means that the enumeration > will be limited to members of that group. > It'll be interesting to see if they're right.. Yep. But watch out for nested groups! I just noticed that "getent group " does *not* return users in nested groups.

[slurm-dev] Re: AllowGroups and AD

2016-02-11 Thread Diego Zuccato
scontrol reconf I don't really understand why scontrol reconf is needed at all, but if I don't use it, jobs won't get submitted.

[slurm-dev] Re: AllowGroups and AD

2016-02-12 Thread Diego Zuccato
On 12/02/2016 10:59, Diego Zuccato wrote: > Probably it could be better to add an > echo 'print Cancelled for insufficient privilege' > just before scancel, but I've not tested it. Tested and it does not work. So I simplified the script a bit more: --8<-- #!/bin/ba

[slurm-dev] AllowGroups and AD

2016-02-10 Thread Diego Zuccato
-cluster:~$ time getent group str957.tecnici str957.tecnici:x:242965:[...],diego.zuccato,[...] real 0m0.012s user 0m0.000s sys 0m0.000s Any hints? TIA

[slurm-dev] Re: Setting up SLURM for a single multi-core node

2016-02-11 Thread Diego Zuccato
econd job...

[slurm-dev] Re: GRES for both K80 GPU's

2016-02-11 Thread Diego Zuccato
dia[4-5] NodeName=node[001-008] Name=Gpu Type=k80 File=/dev/nvidia[6-7] --8<-- ? (I omitted CPUs since I don't know if in your case they're significant or not) IIUC, you should define each K80 as a different resource. But I started with SLURM about a week ago, so I could be way off target! HiH

[slurm-dev] Re: AllowGroups and AD

2016-02-13 Thread Diego Zuccato
On 12/02/2016 12:06, Diego Zuccato wrote: Another self-quote. Sorry. "Obviously" this line > /usr/bin/id $SLURM_JOBUSER | /bin/grep -qi $1 || ( should use the $SLURM_JOB_USER variable... Hope it helps.
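
A minimal sketch of the kind of group check the quoted line performs, written as a standalone helper. This is not the script from the thread (which is truncated here): the function name user_in_group is hypothetical, and it matches whole group names instead of grepping the raw id output. It relies on whatever groups id reports for the user.

```shell
#!/bin/sh
# user_in_group USER GROUP: succeeds when GROUP appears among USER's groups.
# Hypothetical helper, not the script from the thread.
user_in_group() {
    # id -Gn prints every group name reported for the user, space-separated.
    # Caveat: group names that contain spaces would be split by tr.
    # grep -F treats the name literally (dots are common in AD group names),
    # -x requires a whole-line match, -i ignores case.
    id -Gn "$1" 2>/dev/null | tr ' ' '\n' | grep -qixF -- "$2"
}

# Possible use inside a prolog (assumption: Slurm sets SLURM_JOB_USER and
# SLURM_JOB_ID in that context, and $1 carries the required group):
# user_in_group "$SLURM_JOB_USER" "$1" || scancel "$SLURM_JOB_ID"
```

Matching whole names avoids a false positive when the required group name is only a substring of another group or of the user name.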

[slurm-dev] Re: allocating entire nodes from within an allocation

2016-02-24 Thread Diego Zuccato
On 23/02/2016 20:28, Craig Yoshioka wrote: > $ salloc -N 2 --exclusive You're allocating 2 nodes for threads originated by your user. When you use srun, you have to tell it to use 'n' threads with 'n'==ncpus. Maybe just removing "-n 1" could be enough.

[slurm-dev] Re: select/cons_res, memory limitation and cgroups

2016-02-16 Thread Diego Zuccato
e limit be reset after the job ended? I'm quite sure PrologSlurmctld is the *wrong* place to set ulimit (unless these limits get propagated to the work nodes). IIUC http://slurm.schedmd.com/prolog_epilog.html, the right place should be the TaskProlog script, which gets executed on the work nodes.

[slurm-dev] RE: Ressouces allocation problem

2016-02-15 Thread Diego Zuccato
t or not ? Depends on what you expect and what you're obtaining :) HIH.

[slurm-dev] RE: Ressouces allocation problem

2016-02-15 Thread Diego Zuccato
in the same partition. NOTE: Gang scheduling is performed independently for each partition, so configuring partitions with overlapping nodes and gang scheduling is generally not recommended. But to use SUSPEND it seems you have to use GANG too... I don't know more, sorry. Maybe some expert can giv

[slurm-dev] Re: Slurm squeue working but not pam module

2016-04-06 Thread Diego Zuccato
e correct behaviour. If it allowed access, any user could request a short allocation, connect to the allocated node manually and run an untracked job. So the pam module accepts only logins generated by scheduled jobs.

[slurm-dev] Overview of jobs in the cluster

2016-03-24 Thread Diego Zuccato
, for every node/cpu the corresponding job. Tks.

[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato
ata available from htop: you can do it (more or less) but it requires a lot of time. So it's harder to "keep an eye" on the cluster as a whole. Add to the picture that we have overlapping partitions and you can see that it quickly gives a good headache...

[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato
slurmtop 5.00 (from schedtop 'sources'). I'll try 5.02 as soon as some new jobs get submitted.

[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato
On 25/03/2016 09:59, Diego Zuccato wrote: > I'm using SLURM 14.03.9 (the one packaged in Debian 8) and slurmtop 5.00 > (from schedtop 'sources'). I'll try 5.02 as soon as some new jobs get > submitted. Seems I found the problem. Searching schedtop, I found the announcement, whe

[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato
hence the policy to only use packages in stable. > We are also currently thinking about backporting 15.08 into jessie as > well but don't hold your breath :) If it happens before current testing becomes the new stable, good. Else I'll have to wait. It's not a showstopper. Tks!

[slurm-dev] Re: Overview of jobs in the cluster

2016-03-28 Thread Diego Zuccato
. As soon as Debian ships updated packages in stable, I'll upgrade. Tks anyway :)

[slurm-dev] Re: CPUSpecList and reservation problem

2016-07-26 Thread Diego Zuccato
see any difference. PS: which kind of machine is that, with 64 sockets?

[slurm-dev] Re: Fully utilizing nodes

2016-08-10 Thread Diego Zuccato
tilize the allocated core or is it restricted to a single thread?

[slurm-dev] Restricting number of jobs per user w/o accounting?

2017-01-27 Thread Diego Zuccato
A are completed. What I'd need is some way to launch B's first job before A's fifth. And possibly doing that w/o the accounting infrastructure. Any idea? TIA!

[slurm-dev] Re: slurmctld not pinging at regular interval

2017-02-20 Thread Diego Zuccato
me? Moreover, check that they're listed with names case-matching your config. I've been hit by this a couple of times: Str957 is different from str957 for Slurm! (Don't ask me why, since DNS should be case-insensitive.)

[slurm-dev] Re: Fully utilizing nodes

2016-08-22 Thread Diego Zuccato
ork well enough.

[slurm-dev] Re: jobs rejected for no reason

2016-08-24 Thread Diego Zuccato
ut one of the servers only offers 8 CPUs w/ 64GB and the other offers 32 CPUs with only 8GB: what do you tell the user?).
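
To make the two-server example concrete, here is a small sketch that checks whether a request fits on any single node. It assumes the truncated post is about a request that neither server can satisfy alone; the node names serverA and serverB, the NODES list and the function fits_any_node are all hypothetical, and this is not how Slurm itself evaluates requests.

```shell
#!/bin/sh
# Hypothetical inventory as "name:cpus:mem_gb", mirroring the two servers
# mentioned in the post.
NODES="serverA:8:64 serverB:32:8"

# fits_any_node CPUS MEM_GB: prints the first node that satisfies both
# limits and succeeds, or fails when no single node can hold the request.
fits_any_node() {
    want_cpus=$1
    want_mem=$2
    for entry in $NODES; do
        name=${entry%%:*}      # text before the first colon
        rest=${entry#*:}       # "cpus:mem_gb"
        cpus=${rest%%:*}
        mem=${rest#*:}
        if [ "$want_cpus" -le "$cpus" ] && [ "$want_mem" -le "$mem" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}
```

A request for 16 CPUs and 32 GB is below the largest CPU count and the largest memory size, yet fits_any_node 16 32 fails, because no single server offers both.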

[slurm-dev] Re: setup Slurm on Ubuntu 16.04 server

2016-08-25 Thread Diego Zuccato
tch (even if DNS itself is not case-sensitive)! So if you have x.y.z.k default default.mydomain.org in /etc/hosts and NodeName=Default in slurm.conf, it won't work!
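
The pitfall can be caught with a quick comparison before touching the configuration. The sketch below is only an illustration: check_node_name is a hypothetical helper that compares the name used in the Slurm configuration with the name in the hosts entry, and it does not read either file.

```shell
#!/bin/sh
# check_node_name CONF_NAME HOST_NAME
# Prints "ok" for an exact match, "case-mismatch" when the names differ
# only in letter case, and "different" otherwise.
check_node_name() {
    conf_name=$1
    host_name=$2
    conf_lower=$(printf '%s' "$conf_name" | tr '[:upper:]' '[:lower:]')
    host_lower=$(printf '%s' "$host_name" | tr '[:upper:]' '[:lower:]')
    if [ "$conf_name" = "$host_name" ]; then
        echo "ok"
    elif [ "$conf_lower" = "$host_lower" ]; then
        echo "case-mismatch"
    else
        echo "different"
    fi
}
```

With the names from the post, check_node_name Default default reports a case mismatch, which is the situation described as not working.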

[slurm-dev] Re: LDAP required?

2017-04-13 Thread Diego Zuccato
en but it's having problems with colliding UIDs and GIDs (multi-domain forest with quite a lot of [100k+] users and even more groups).

[slurm-dev] Re: LDAP required?

2017-04-13 Thread Diego Zuccato
colliding UID's/GID's. IIUC it's not its problem: seems it does not manage authentication or user/group mapping. :( I'll have to do some tests to see if ldap client can bind with machine credentials using Kerberos... Tks for the hint.

[slurm-dev] Re: slurm 17.2.06 min memory problem

2017-07-10 Thread Diego Zuccato
equest a specific amount of RAM and no default is given, how could SLURM know how much memory the job needs? How can it pack different jobs on the same node w/o knowing they won't interfere?

[slurm-dev] Re: Change in srun ?

2017-07-19 Thread Diego Zuccato
doing (since I have nearly no idea: I'm just starting to understand the difference between OMP and MPI...). That's the reason I stick with Debian stable, even if it ships older versions. Tks, I'll have to find the resources to try that way... If it doesn't get fixed in the meantime.

[slurm-dev] Change in srun ?

2017-07-19 Thread Diego Zuccato
it's the only one and all threads work on the same dataset), while mpirun launches one thread with mpi_world_size=N . Did I miss something? Tks.

[slurm-dev] Re: Memory bounds for jobs

2017-05-04 Thread Diego Zuccato
obs on the same server. What is possible is to not consider memory as a consumable resource...

[slurm-dev] Re: defaults, passwd and data

2017-09-25 Thread Diego Zuccato
"sometimes" (randomly, but usually after many months) some machines lose the join. I couldn't make sssd work with our AD (I'm not an AD admin, I can only join machines, and there's no special bind-account).