[slurm-dev] Re: defaults, passwd and data

2017-09-25 Thread Diego Zuccato

On 24/09/2017 12:10, Marcin Stolarek wrote:

> So do I, however, I'm using sssd with AD provider joined into AD domain.
> It's tricky and requires good sssd understanding, but it works... in
> general.  
We are using PBIS-Open to join the nodes. It's quite easy to set up;
just "sometimes" (randomly, but usually after many months) some machines
lose the join.
I couldn't make sssd work with our AD (I'm not an AD admin: I can only
join machines, and there's no special bind account).

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Change in srun ?

2017-07-19 Thread Diego Zuccato

On 19/07/2017 09:37, Gilles Gouaillardet wrote:

> this is probably due to a change in the PMI interface.
Glip!

> i suggest you rebuilt your MPI library first, and then try again
That's problematic... I can only install from deb files, hoping package
maintainers know what they're doing (since I have nearly no idea: I'm
just starting to understand the difference between OMP and MPI...).
That's the reason I stick with Debian stable, even if it ships older
versions.

Tks, I'll have to find the resources to try that way... if it doesn't
get fixed in the meantime.

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Change in srun ?

2017-07-19 Thread Diego Zuccato

Hello all.

I've just upgraded from Debian 8 to Debian 9, and that upgraded slurm
from 14.03 to 16.05.

But now some MPI jobs are no longer really parallel when run via srun,
though they work fine when run via mpirun.
The problem seems to be that srun launches N tasks, each with
mpi_world_size=1 (so every process thinks it's the only one and they all
work on the same dataset), while under mpirun the tasks see
mpi_world_size=N.
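
For reference, this is how I've been checking it (a rough sketch: mpi_hello is
just a placeholder for a trivial MPI program that prints its rank and world
size, and pmi2 availability depends on how Slurm and the MPI library were built):
--8<--
# list the PMI plugins this srun offers
srun --mpi=list
# launch 4 tasks: every rank should report a world size of 4, not 1
srun --mpi=pmi2 -n 4 ./mpi_hello
--8<--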

Did I miss something?

Tks.

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: slurm 17.2.06 min memory problem

2017-07-10 Thread Diego Zuccato

On 10/07/2017 10:53, Roe Zohar wrote:

> Adding DefMemPerCpu was the only solution but I don't understand this
> behavior.
I had to do the same (with a default of just 200MB, which effectively
forces users to request the RAM they need).
IIUC that's expected: if the user does not request a specific amount of
RAM and no default is given, how could Slurm know how much memory the
job needs? And how could it pack different jobs onto the same node
without knowing they won't interfere?
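
For reference, this is roughly what I added (the value is just the one we picked):
--8<--
# slurm.conf (DefMemPerCPU can be set cluster-wide or per partition)
DefMemPerCPU=200   # MB: deliberately tiny, so users must ask for the RAM they need
--8<--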


-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Memory bounds for jobs

2017-05-04 Thread Diego Zuccato

On 27/04/2017 03:30, Alexander Vorobiev wrote:

> The question is how to achieve that with job/partition configuration?
I'm no expert, but I think you can't do that with Slurm.
The maximum memory a job can use must be defined before the job starts,
so that the scheduler can place other jobs on the same server.
What is possible is not to treat memory as a consumable resource at all...
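
As a sketch (assuming select/cons_res), the difference is only in
SelectTypeParameters:
--8<--
SelectType=select/cons_res
SelectTypeParameters=CR_Core          # memory is NOT a consumable resource
#SelectTypeParameters=CR_Core_Memory  # memory IS consumable, so per-job limits apply
--8<--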

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: LDAP required?

2017-04-13 Thread Diego Zuccato

On 13/04/2017 14:26, Janne Blomqvist wrote:

> We use adcli (there's an rpm package called adcli in EL7, FWIW; upstream
> seems to be http://cgit.freedesktop.org/realmd/adcli ).
Uhm... I didn't know it. BTW I use Debian for the servers.

> Not sure how any of this would work with colliding UID's/GID's.
IIUC that's not its job: it seems it does not handle authentication or
user/group mapping. :(
I'll have to run some tests to see whether the LDAP client can bind with
the machine credentials via Kerberos...
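
Something along these lines, I guess (untested; node name, realm and base DN
are placeholders):
--8<--
# get a ticket using the machine account keytab, then try a GSSAPI bind
kinit -k 'NODENAME$@EXAMPLE.ORG'
ldapsearch -Y GSSAPI -H ldap://dc.example.org -b 'dc=example,dc=org' '(sAMAccountName=someuser)'
--8<--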

Tks for the hint.

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: LDAP required?

2017-04-13 Thread Diego Zuccato

On 12/04/2017 08:52, Janne Blomqvist wrote:

> BTW, do you have some kind of trust relationship between your FreeIPA
> domain and the AD domain, or how do you do it? I did play around with
> using FreeIPA for our cluster as well and somehow synchronizing it with
> the university AD domain, but in the end we managed to convince the
> university IT to allow us to join our nodes directly to AD, so we were
> able to skip FreeIPA entirely.
What are you using to join nodes to AD?

I've used samba-winbind in the past but it was very fragile, and I'm
currently using PBIS-Open, but it's having problems with colliding UIDs
and GIDs (multi-domain forest with quite a lot of [100k+] users and even
more groups).

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: slurmctld not pinging at regular interval

2017-02-20 Thread Diego Zuccato

On 16/02/2017 23:23, Lachlan Musicman wrote:

> I can't remember where I saw it and I can't see it in the docs any more,
> but for whatever reason, I was under the impression that the machines in
> the cluster required all hosts to be in the hostfile both as FQDN and
> shortname?
Moreover, check that they're listed with names whose case matches your
config. I've been hit by this a couple of times: Str957 is different
from str957 for Slurm! Don't ask me why (DNS should be case-insensitive).
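
A quick way to spot such a mismatch (sketch; the config path is the Debian one):
--8<--
hostname -s                                      # what the node calls itself
grep -i '^NodeName' /etc/slurm-llnl/slurm.conf   # what slurm.conf calls it
--8<--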

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Restricting number of jobs per user w/o accounting?

2017-01-27 Thread Diego Zuccato

Hello all.

I'm wondering if it's possible to define a rule that lowers the
priority of a user's queued jobs when other users have jobs waiting.

Say user A submits 100 jobs. The cluster is idle and 4 of them start.
Then user B submits a couple of jobs. Currently B's jobs are scheduled
only after A's 100 jobs have completed. What I'd need is some way to
launch B's first job before A's fifth, and possibly to do that without
the accounting infrastructure.
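
For concreteness, the pattern is roughly this (script names are placeholders):
--8<--
# user A fills the queue
for i in $(seq 100); do sbatch jobA.sh; done
# user B submits later and currently waits behind all of A's jobs
sbatch jobB.sh
--8<--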

Any idea?

TIA!

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: setup Slurm on Ubuntu 16.04 server

2016-08-25 Thread Diego Zuccato

On 24/08/2016 10:05, Raymond Wan wrote:

> One thing I remember is that the node names you have in the COMPUTE
> NODES section should match the names in your /etc/hosts file.  When I
> had the error above, I think this was the problem that I had...
And it must be a case-sensitive match (even if DNS itself is not
case-sensitive)!
So if you have
x.y.z.k default default.mydomain.org
in /etc/hosts and
NodeName=Default
in slurm.conf, it won't work!
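
Whereas the matching-case pair works (same sketch, corrected):
--8<--
# /etc/hosts
x.y.z.k default default.mydomain.org
# slurm.conf
NodeName=default
--8<--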

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: jobs rejected for no reason

2016-08-24 Thread Diego Zuccato

On 18/08/2016 18:27, Ade Fewings wrote:

> If I request an impossible memory configuration I get the message you do
> at job submission.
Happened to me yesterday (due to DefMemPerCPU GRRR).

> Wonder if passing the '-v -v -v' to sbatch provides
> useful debugging output on the what Slurm assesses the job requirements
> to be and whether specifying compliant memory requirements for the job
> helps.   
From a quick test, it does not help.
That info could be really useful! But it's probably too hard to produce
(say you're requesting 2 nodes with 16 CPUs and 32GB each, but one of
the servers only offers 8 CPUs with 64GB and the other offers 32 CPUs
with only 8GB: what do you tell the user?).

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Fully utilizing nodes

2016-08-22 Thread Diego Zuccato

On 12/08/2016 01:42, Christopher Samuel wrote:

> The CR_ONE_TASK_PER_CORE is a hold over from then, it means that if
> you've got HT/SMT enabled you'll get one MPI rank (Slurm task) per
> physical core and that can then use threads to utilise the hardware
> thread units.  Without it you'll get a rank per thread, which may not be
> useful to your code.
I had to remove CR_ONE_TASK_PER_CORE since it seems our users' jobs
(or the cpuset bindings, which treat every hardware thread as an
independent CPU) didn't cope well with it. Having one rank per thread,
on the other hand, seems to work well enough.
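
In slurm.conf terms the change amounted to this (sketch, based on the
parameters quoted above):
--8<--
#SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE   # before
SelectTypeParameters=CR_Core_Memory                         # after: one task (rank) per hardware thread
--8<--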

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Fully utilizing nodes

2016-08-10 Thread Diego Zuccato

On 10/08/2016 01:37, Christopher Samuel wrote:

> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
How does that work with cgroup enforcement on nodes with multithreaded
cores? Is a single task allowed to use two threads and fully utilize
the allocated core, or is it restricted to a single thread?

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: CPUSpecList and reservation problem

2016-07-26 Thread Diego Zuccato

On 25/07/2016 09:24, Danny Marc Rotscher wrote:

> Could you please tell me, what I’m doing wrong?
I'm really not an expert, but IIUC you're just wasting a lot of CPUs on
a process that shouldn't need so many. Pinning it to one CPU could
improve performance if the user jobs you're running actually saturate
the machine and are massively interconnected (every task needs the
results of all the others to proceed): in that case a system task that
temporarily suspends a user process to do other things could have a
cascade effect on the other tasks.
But under a normal workload you shouldn't see any difference.
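
For reference, reserving a CPU for the system/daemons is usually expressed on
the node line (sketch; the node name is a placeholder and the rest of the
definition stays as it is):
--8<--
# slurm.conf: keep logical CPU 0 for slurmd and other system tasks
NodeName=bignode CpuSpecList=0 ...
--8<--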

PS: which kind of machine is that, with 64 sockets?

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: One CPU always reserved for one GPU

2016-06-03 Thread Diego Zuccato

On 02/06/2016 18:48, Felix Willenborg wrote:

> Maybe someone has a good idea to solve my problem anyway?
I think there's no solution, since (IIUC) once a node gets a job from a
partition it won't accept jobs from other partitions.
If I misunderstood, that would be great; if I didn't, it could be seen
as a feature request :)

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna V.le
Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Increase size of running job/correcting incorrect resource allocations?

2016-05-31 Thread Diego Zuccato

On 25/05/2016 01:15, Ryan Novosielski wrote:

> The reason I most often want to do something like this is that as a sysadmin, 
> I’ll notice someone who has requested 1 core but is really using 16, for 
> example. In many cases, I will not have noticed this for quite awhile, and 
> the job is running on a node by itself (because it is common for people to 
> request full nodes). I’d like to adjust the allocation for this job to 
> prevent other jobs from using the cores that are in use.
What I did is the other way around: I used cpusets to "pin" the job to
the allocated CPUs. This way the real state and the scheduler's view
stay the same even with misbehaving jobs (we've had 3rd-party
executables that automatically used every CPU in the system).
*Way* less need to watch the cluster closely :)
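
In Slurm terms the pinning boils down to this (sketch; paths are the Debian ones):
--8<--
# slurm.conf
TaskPlugin=task/cgroup
# /etc/slurm-llnl/cgroup.conf
ConstrainCores=yes
--8<--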

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Slurm squeue working but not pam module

2016-04-06 Thread Diego Zuccato

On 05/04/2016 01:00, Mehdi Acheli wrote:

> Everything in slurm is working fine. I can issue jobs and see the state
> of the eight nodes as Idle. However, when I try to connect to a compute
> node with a user, even if he has a job running on, I get rejected. The
IIUC, that's the correct behaviour. If it allowed access, any user could
request a short allocation, connect to the allocated node manually and
run an untracked job. So the pam module accepts only logins generated by
scheduled jobs.
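
For reference, the usual wiring is a single PAM line (sketch; module name and
location can differ per distro and Slurm version):
--8<--
# /etc/pam.d/sshd
account    required     pam_slurm.so
--8<--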

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-28 Thread Diego Zuccato

On 28/03/2016 14:45, Ade Fewings wrote:

> Have a look at slurmtop:
> https://bugs.schedmd.com/show_bug.cgi?id=1868
> We find it very handy, and it does the character per process part.
Too bad I'm stuck with an older SLURM version, where it seems slurmtop
does not work very well. As soon as Debian ships updated packages in
stable, I'll upgrade.

Tks anyway :)

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna V.le
Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato

On 25/03/2016 11:28, Rémi Palancher wrote:

> If you mean not available in jessie (stable), that's true so far but you
> should be able to rebuild stretch's packages on jessie w/o trouble.
Too risky: I only have a production cluster... but *many* other
(heterogeneous) servers, hence the policy of only using packages from
stable.

> We are also currently thinking about backporting 15.08 into jessie as
> well but don't hold your breath :)
If it happens before current testing becomes the new stable, good. Else
I'll have to wait. It's not a showstopper.

Tks!

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato

On 25/03/2016 09:59, Diego Zuccato wrote:

> I'm using SLURM 14.03.9 (the one packaged in Debian 8) and slurmtop 5.00
> (from schedtop 'sources'). I'll try 5.02 as soon as some new jobs get
> submitted.
Seems I found the problem. Searching for schedtop, I found the
announcement, where Dennis says:
> schedtop/slurmtop requires Slurm 15.08 (or a recent beta)

So I'm out of luck until Debian decides to upgrade to 15.08 :(
I have too many other machines to manage to even *think* about compiling
SLURM from source!

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato

On 24/03/2016 15:54, Bill Wichser wrote:

> Hmmm.  This sounds like a problem with version?  We are running slurm
> 15.08.8 with slurmtop 5.02 and this is working for us, including being
> able to view all jobs in color.
I'm using SLURM 14.03.9 (the one packaged in Debian 8) and slurmtop 5.00
(from schedtop 'sources'). I'll try 5.02 as soon as some new jobs get
submitted.

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Diego Zuccato

On 24/03/2016 15:39, Benjamin Redling wrote:

> What are you missing from "squeue"?
What am I missing from 'top' that makes me use htop? :)
It lets me see the status of the whole machine at a glance, and pbstop
did the same. Using qnodes is like using ps to get the same data htop
shows: you can do it (more or less), but it takes a lot of time, so it's
harder to "keep an eye" on the cluster as a whole.
Add overlapping partitions to the picture and you can see how it quickly
becomes a real headache...

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Overview of jobs in the cluster

2016-03-24 Thread Diego Zuccato

Hello all.

Is there an equivalent of torque's pbstop for SLURM?

I already tried slurmtop, but it seems something is not right (nodes are
shown as fully allocated with '@' even if only one CPU is really
allocated, there is no color mapping, etc.).
What I'm looking for is a tool that shows me, for every node/CPU, the
corresponding job.

Tks.

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: allocating entire nodes from within an allocation

2016-02-24 Thread Diego Zuccato

On 23/02/2016 20:28, Craig Yoshioka wrote:

> $ salloc -N 2 --exclusive
You're allocating 2 nodes for tasks started by your user. When you use
srun, you have to tell it to launch 'n' tasks, with 'n' == the number of
allocated CPUs. Maybe just removing "-n 1" could be enough.
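
Something like this (untested; 32 is just an example total for the two nodes):
--8<--
salloc -N 2 --exclusive
srun -n 32 hostname   # one task per allocated CPU; adjust 32 to your node sizes
--8<--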

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna V.le
Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: select/cons_res, memory limitation and cgroups

2016-02-16 Thread Diego Zuccato

On 15/02/2016 12:39, Uwe Sauter wrote:

> I am unsure how this can be implemented. If I call "ulimit -d 
> $((SLURM_MEM_PER_CPU * SLURM_NTASKS_PER_NODE * 1024))" in the
> PrologSlurmctld script, will that limit still be active when the user's job 
> executes? Will the limit be reset after the job ended?
I'm quite sure PrologSlurmctld is the *wrong* place to set ulimits
(unless those limits get propagated to the compute nodes).
IIUC from http://slurm.schedmd.com/prolog_epilog.html, the right place
should be the TaskProlog script, which gets executed on the compute nodes.
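
As an aside, if the end goal is just a per-job memory cap, the cgroup route
avoids ulimit entirely (sketch; Debian path):
--8<--
# slurm.conf
TaskPlugin=task/cgroup
# /etc/slurm-llnl/cgroup.conf
ConstrainRAMSpace=yes
--8<--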

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] RE: Ressouces allocation problem

2016-02-15 Thread Diego Zuccato

On 15/02/2016 12:55, David Roman wrote:

> What I want :
> OPA have a biger priority than other partitions. OPA cant preempt jobs 
> submitted in partitions LDEV, LOP, LALL.
> LDEV, LOP, LALL have the same priority.
> LDEV can't suspend jobs submitted in LOP or LALL.
> LOP and LALL can't suspend jobs submitted in LDEV.
Ok.

> In practice:
> 1- I submit a job A in LOP : A running (it is ok for me)
> 2- I submit a job B in LALL : A and B running (it is ok for me)
> 3- I submit a job C in LDEV : A and B and C running (it is not ok for me)
> For me, the job C must be in PENDING state !!!
Yes, it should be since there are no consumables left.

Another thing that could help you pinpoint the error is enabling
scheduler verbose logging.

IMVHO it's due to your use of PreemptMode, which includes GANG; the
documentation says:
> GANG: enables gang scheduling (time slicing) of jobs in the same
> partition. NOTE: Gang scheduling is performed independently for each
> partition, so configuring partitions with overlapping nodes and gang
> scheduling is generally not recommended.

But to use SUSPEND it seems you have to use GANG too... I don't know
more, sorry. Maybe some expert can give a definitive answer.
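
For reference, the global knobs involved look like this (a sketch, not a
recommendation):
--8<--
# slurm.conf
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
--8<--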

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] RE: Ressouces allocation problem

2016-02-15 Thread Diego Zuccato

On 15/02/2016 11:00, David Roman wrote:
> Hello,
> 
> I'm coming back on this problem. Because after I read again the documentation 
> and do other tests I have the same problem with the preemption..
> 
> My partition :
> PartitionName=OPA   Nodes=hpc-node[1-2] Priority=100 Default=No  
> PreemptMode=OFF
> PartitionName=LDEV  Nodes=hpc-node[1-2] Priority=50  Default=Yes 
> PreemptMode=SUSPEND
> PartitionName=LOP   Nodes=hpc-node[1]   Priority=50  Default=Yes 
> PreemptMode=SUSPEND
> PartitionName=LALL  Nodes=hpc-node[2]   Priority=50  Default=Yes 
> PreemptMode=SUSPEND
> 
> 
> In my mind, if I submit a job on each partitions LOP and LALL with 16 cpus 
> requested. For me I can't see a job submit on LDEV running !
Uh? That's really unclear to me... The sequence of operations as I
understand it:
1) You submit a job to LOP -> it runs on hpc-node1, prio=50
2) You submit a job to LALL -> it runs on hpc-node2, prio=50
3) You submit a job to LDEV

IIUC, the third job gets *queued*, since LDEV has the same priority as
LOP and LALL. If you want the third job to preempt the other two, you
have to change the partition priorities.
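
For example, if LDEV is the one that should win, something like this (a sketch
based on the partition lines you posted):
--8<--
PartitionName=LDEV  Nodes=hpc-node[1-2] Priority=75  Default=Yes PreemptMode=SUSPEND
--8<--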

> Am I right or not ?
Depends on what you expect and what you're obtaining :)

HIH.

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: AllowGroups and AD

2016-02-13 Thread Diego Zuccato

On 12/02/2016 12:06, Diego Zuccato wrote:

Another self-quote. Sorry.
"Obviously" this line
> /usr/bin/id $SLURM_JOBUSER | /bin/grep -qi $1 || (
should use $SLURM_JOB_USER var...

Hope it helps.

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna V.le
Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: AllowGroups and AD

2016-02-12 Thread Diego Zuccato

On 12/02/2016 06:26, Christopher Samuel wrote:

> It's interesting timing as I'm helping a group bring up a cluster
> themselves that they want to auth against the university AD and
> so we could be in the same boat.
Auth is not a big issue. If you need help with joining the machine, I
can help you off-list.

Authorizing only some groups to submit jobs to a queue is a completely
different story. It was easy in torque+moab, since it leveraged the
standard Unix lookups.
I resorted to a script, called by
PrologSlurmctld=/etc/slurm-llnl/SlurmCtldProlog.sh

The script is:
--8<--
#!/bin/bash
case "$SLURM_JOB_PARTITION" in
hpc_inf|hpc_large) ALLOWED=Str957-bl0-abilitati-baldi
;;
*) ALLOWED=Str957-bl0-abilitati
esac

RV=0
# "getent groups" does not recurse :(
ID=$(/usr/bin/id $SLURM_JOBUSER)
echo $ID | /bin/grep -qi $ALLOWED || (RV=1; /usr/bin/scancel $SLURM_JOBID)

#/usr/bin/env >> /tmp/env.txt

return $RV
--8<--

BTW I think the doc is not accurate, since it says (from
http://slurm.schedmd.com/prolog_epilog.html ) that
>> Note that for security reasons, these programs do not have a
>> search path set.
Actually, at least /bin and /usr/bin are searched anyway, but I
preferred to use the full paths.

Probably it could be better to add an
echo 'print Cancelled for insufficient privilege'
just before scancel, but I've not tested it.

> They reckon, though, that the fact that the intention is to limit
> cluster access to a particular group in AD means that the enumeration
> will be limited to members of that group.
> It'll be interesting to see if they're right..
Yep. But watch out for nested groups!
I just noticed that "getent group <groupname>" does *not* return users
in nested groups.
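
A quick way to see the difference, at least with winbind in our setup (names
are placeholders):
--8<--
getent group somegroup                             # direct members only, nested groups NOT expanded
id someuser | grep -qi somegroup && echo member    # id walks the full (nested) membership
--8<--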

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: AllowGroups and AD

2016-02-12 Thread Diego Zuccato

On 12/02/2016 10:59, Diego Zuccato wrote:

> Probably it could be better to add an
> echo 'print Cancelled for insufficient privilege'
> just before scancel, but I've not tested it.
Tested and it does not work. So I simplified the script a bit more:
--8<--
#!/bin/bash
allowonly()
{
# "getent groups" does not recurse :(
/usr/bin/id $SLURM_JOBUSER | /bin/grep -qi $1 || (
/usr/bin/scancel $SLURM_JOBID
)
}

case "$SLURM_JOB_PARTITION" in
hpc_inf|hpc_large)
allowonly Str957-bl0-abilitati-baldi
;;
esac
--8<--

This way it can easily be extended to different groups for different
partitions. Moreover, when a "public" partition is requested there's no
'id' overhead (as I said, on my frontend it can take up to about 10
seconds!).

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: AllowGroups and AD

2016-02-11 Thread Diego Zuccato

On 11/02/2016 09:17, Christopher Samuel wrote:

> Do you go through sssd on the way to winbind?
Nope.

> If so you may want the following in /etc/sssd/sssd.conf:
> [domain/default]
> enumerate = True
Can't do that: a reboot would take about half a day on our forest.

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: AllowGroups and AD

2016-02-11 Thread Diego Zuccato

On 11/02/2016 08:23, Janne Blomqvist wrote:

> do you have user and group enumeration enabled in winbind?
No. I couldn't do it: when I tried it (some years ago, while
experimenting with winbind join), it took more than 8 *hours* to fetch
the groups list. IIRC we have more than 200k users and more than 800k
groups. Yes' we've quite a large forest :)

> I.e. does
> $ getent passwd
> and
> $ getent group
> return nothing, or the entire user and group lists?
They just return LOCAL users/groups.

> FWIW, slurm 16.05 will have some changes to work better in environments
> with enumeration disabled, see http://bugs.schedmd.com/show_bug.cgi?id=1629
Yep. Seems quite applicable.

For the time being I worked around it with a script:
--8<--
#!/bin/bash
# Syntax: sspp partition group
AU=$(getent group $2 | sed 's/.*://')
scontrol update PartitionName=$1 AllowAccounts=$AU
scontrol reconf
--8<--

I don't really understand why scontrol reconf is needed at all, but if I
don't use it, jobs won't get submitted.
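
Invocation, with the partition/group names from my original post:
--8<--
./sspp pp_base str957.tecnici
--8<--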

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: Setting up SLURM for a single multi-core node

2016-02-11 Thread Diego Zuccato

On 11/02/2016 13:00, Benjamin Redling wrote:

> If SelectType is configured to select/cons_res, it must have a
> parameter of CR_Core, CR_Core_Memory, CR_Socket, or CR_Socket_Memory for
> this option to be honored.
What it doesn't say (and what made me bang my head against the wall for
a bit) is that if you use CR_Core_Memory, users have to specify how much
memory their job needs, or (with the defaults) each node gets allocated
as a unit. Quite rightly so: the default lets a job use the whole memory
of the node, hence there's no memory left for a second job...
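
So in practice either the job states what it needs, or you set a (small)
default (sketch):
--8<--
# per job
sbatch --mem-per-cpu=2048 job.sh   # MB
# or cluster-wide in slurm.conf
DefMemPerCPU=200
--8<--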

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] Re: GRES for both K80 GPU's

2016-02-11 Thread Diego Zuccato

On 11/02/2016 12:25, Michael Senizaiz wrote:

> This doesn't enforce keeping the jobs on a K80.  There are only 4 K80's
> in the system.  If I submit a 1 gpu job and a 2 gpu job after the first
> will get GPU0 (0 and 1 are a K80, 2 and 3 are a K80, etc).  The 2gpu job
> will then get GPU 1 and GPU 2.  Then the user will complain that their
> peer-to-peer code isn't working and the job performance is bad because
> they are running across two discreet K80's and not the 2 GPU's on a
> single K80.
That's like allocating a multithreaded job across different hosts.

> gres.conf
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19
Shouldn't you have
--8<--
NodeName=node[001-008] Name=Gpu Type=k80 File=/dev/nvidia[0-1]
NodeName=node[001-008] Name=Gpu Type=k80 File=/dev/nvidia[2-3]
NodeName=node[001-008] Name=Gpu Type=k80 File=/dev/nvidia[4-5]
NodeName=node[001-008] Name=Gpu Type=k80 File=/dev/nvidia[6-7]
--8<--
?
(I omitted CPUs since I don't know if in your case they're significant
or not)
IIUC, you should define each K80 as a different resource. But I started
with SLURM about a week ago, so I could be way off target!
HiH
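
If the per-board split works, the request side would then be something like
(untested):
--8<--
sbatch --gres=gpu:k80:2 job.sh   # two GPUs of type k80
--8<--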

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it


[slurm-dev] AllowGroups and AD

2016-02-10 Thread Diego Zuccato

Hello all.

I think I'm doing something wrong, but I don't understand what.

I'm trying to limit which users are allowed to use a partition (which,
coming from Torque, I think is the equivalent of a queue), but obviously
I'm failing. :(

Frontend and compute nodes are all Debian machines joined to AD via
Winbind (which ensures consistent UID/GID mapping, at the expense of
having many groups and a bit of slowness when looking them up).
On every node I can run 'id' and it says (redacted):
uid=108036(diego.zuccato) gid=100013(domain_users)
gruppi=100013(domain_users),[...],242965(str957.tecnici),[...]

(it takes about 10s to get the complete list of groups).

Linux ACLs work as expected (if I set a file to be readable only by
Str957.tecnici, I can read it), but after I run
scontrol update PartitionName=pp_base AllowGroups=str957.tecnici
or even
scontrol update PartitionName=pp_base AllowGroups=242965

and then try to sbatch a job, I get:
diego.zuccato@Str957-cluster:~$ sbatch aaa.sh
sbatch: error: Batch job submission failed: User's group not permitted
to use this partition
diego.zuccato@Str957-cluster:~$ newgrp Str957.tecnici
diego.zuccato@Str957-cluster:~$ sbatch aaa.sh
sbatch: error: Batch job submission failed: User's group not permitted
to use this partition

So I won't get recognized even if I change my primary GID :(

I've been in that group since way before installing the cluster, and I
already tried rebooting everything to refresh the cache.

Another detail that might be useful:
diego.zuccato@Str957-cluster:~$ time getent group str957.tecnici
str957.tecnici:x:242965:[...],diego.zuccato,[...]

real    0m0.012s
user    0m0.000s
sys     0m0.000s

Any hints?

TIA

-- 
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it