n't, it could be seen
as a feature request :)
--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it
5x1000 AI GIOVANI RICERCATORI
DELL'UNIVERSITÀ DI BOLOGNA
Cod
he job to
the allocated CPUs. This way the real state and the scheduler's view are
the same even with misbehaving jobs (we've had 3rd party executables
that automatically used every CPU in the system).
*Way* less need to watch the cluster closely :)
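In case it's useful, a minimal sketch of the kind of configuration that enables this confinement (parameter names are from the slurm.conf and cgroup.conf man pages; treating these two lines as sufficient is my assumption, not a tested recipe):

```
# slurm.conf (sketch)
TaskPlugin=task/cgroup

# cgroup.conf (sketch)
ConstrainCores=yes        # confine each job to its allocated cores
```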
--
Diego Zuccato
Il 11/02/2016 09:17, Christopher Samuel ha scritto:
> Do you go through sssd on the way to winbind?
Nope.
> If so you may want the following in /etc/sssd/sssd.conf:
> [domain/default]
> enumerate = True
Can't do that: a reboot would take about half a day on our forest.
--
Diego Zuccato
limit
> cluster access to a particular group in AD means that the enumeration
> will be limited to members of that group.
> It'll be interesting to see if they're right..
Yep. But watch out for nested groups!
I just noticed that "getent group <name>" does *not* return users
in nested groups.
--
Diego Zuccato
scontrol reconf
I don't really understand why scontrol reconf is needed at all, but if I
don't use it, jobs won't get submitted.
--
Diego Zuccato
Il 12/02/2016 10:59, Diego Zuccato ha scritto:
> Probably it could be better to add an
> echo 'print Cancelled for insufficient privilege'
> just before scancel, but I've not tested it.
Tested and it does not work. So I simplified the script a bit more:
--8<--
#!/bin/ba
-cluster:~$ time getent group str957.tecnici
str957.tecnici:x:242965:[...],diego.zuccato,[...]
real	0m0.012s
user	0m0.000s
sys	0m0.000s
Any hints?
TIA
--
Diego Zuccato
econd job...
--
Diego Zuccato
dia[4-5]
NodeName=node[001-008] Name=Gpu Type=k80 File=/dev/nvidia[6-7]
--8<--
?
(I omitted CPUs since I don't know whether they're significant in your
case or not.)
IIUC, you should define each K80 as a different resource. But I started
with SLURM about a week ago, so I could be way off target!
HIH
--
Il 12/02/2016 12:06, Diego Zuccato ha scritto:
Another self-quote. Sorry.
"Obviously" this line
> /usr/bin/id $SLURM_JOBUSER | /bin/grep -qi $1 || (
should use the $SLURM_JOB_USER variable...
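To make the fix concrete, a minimal sketch of the corrected check (the fallbacks are mine, just so the snippet runs outside a real prolog; the group argument $1 comes from the original script):

```shell
#!/bin/bash
# Corrected membership test from the thread, with the fixed variable name.
# Outside a real prolog, fall back to the current user and primary group
# so the sketch is runnable (these fallbacks are illustrative only).
SLURM_JOB_USER="${SLURM_JOB_USER:-$(whoami)}"
required_group="${1:-$(id -gn)}"

if /usr/bin/id "$SLURM_JOB_USER" | /bin/grep -qi "$required_group"; then
    echo "user is in group"
else
    echo "user is NOT in group"
fi
```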
Hope it helps.
--
Diego Zuccato
Il 23/02/2016 20:28, Craig Yoshioka ha scritto:
> $ salloc -N 2 --exclusive
You're allocating 2 nodes for your user's processes. When you use srun,
you have to tell it to launch 'n' tasks, with 'n' equal to the number of
allocated CPUs.
Maybe just removing "-n 1" could be enough.
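For what it's worth, a hedged sketch of the idea (my_program and the task count are placeholders of mine; option semantics are from the salloc/srun man pages):

```
$ salloc -N 2 --exclusive
$ srun -n 32 my_program    # -n = total tasks; pick it equal to the allocated CPUs
```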
--
Diego Zuccato
e limit be reset after the job ended?
I'm quite sure PrologSlurmctld is the *wrong* place to set ulimit
(unless those limits get propagated to the worker nodes).
IIUC http://slurm.schedmd.com/prolog_epilog.html, the right place should
be a TaskProlog script that gets executed on the worker nodes.
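A minimal sketch of the wiring (the script path is an assumption of mine; TaskProlog is the slurm.conf parameter the cited page describes as running on the compute nodes for each task):

```
# slurm.conf (sketch)
TaskProlog=/etc/slurm/taskprolog.sh   # runs on the compute node, as the job user
```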
--
t or not ?
Depends on what you expect and what you're obtaining :)
HIH.
--
Diego Zuccato
in the same
partition. NOTE: Gang scheduling is performed independently for each
partition, so configuring partitions with overlapping nodes and gang
scheduling is generally not recommended.
But to use SUSPEND it seems you have to use GANG too... I don't know
more, sorry. Maybe some expert can give a better answer.
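For reference, the combination the documentation describes would look roughly like this in slurm.conf (a sketch; the preemption type is an assumption about the setup being discussed):

```
# slurm.conf (sketch)
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG    # per the docs, SUSPEND has to be paired with GANG
```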
e correct behaviour. If it allowed access, any user could
request a short allocation, connect to the allocated node manually and
run an untracked job. So the pam module accepts only logins generated by
scheduled jobs.
--
Diego Zuccato
, for every node/cpu the
corresponding job.
Tks.
--
Diego Zuccato
ata available from htop: you can do it (more or less) but it requires a
lot of time. So it's harder to "keep an eye" on the cluster as a whole.
Add to the picture that we have overlapping partitions and you can see
that it quickly gives a good headache...
--
Diego Zuccato
slurmtop 5.00
(from schedtop 'sources'). I'll try 5.02 as soon as some new jobs get
submitted.
--
Diego Zuccato
Il 25/03/2016 09:59, Diego Zuccato ha scritto:
> I'm using SLURM 14.03.9 (the one packaged in Debian 8) and slurmtop 5.00
> (from schedtop 'sources'). I'll try 5.02 as soon as some new jobs gets
> submitted.
Seems I found the problem. Searching schedtop, I found the announcement,
whe
hence the policy to only use packages in stable.
> We are also currently thinking about backporting 15.08 into jessie as
> well but don't hold your breath :)
If it happens before current testing becomes the new stable, good. Else
I'll have to wait. It's not a showstopper.
Tks!
--
Diego Zuccato
. As soon as Debian ships updated packages in
stable, I'll upgrade.
Tks anyway :)
--
Diego Zuccato
see any difference.
PS: which kind of machine is that, with 64 sockets?
--
Diego Zuccato
tilize the allocated core or is it restricted to a single thread?
--
Diego Zuccato
A are completed. What I'd need is some way to
launch B's first job before A's fifth, possibly without the
accounting infrastructure.
Any idea?
TIA!
--
Diego Zuccato
me?
Moreover, check that they're listed with names whose case matches your
config. I've been hit by this a couple of times: Str957 is different
from str957 for Slurm! Don't ask me why, since DNS should be
case-insensitive.
--
Diego Zuccato
ork well enough.
--
Diego Zuccato
ut one of the
servers only offers 8 CPUs w/ 64GB and the other offers 32 CPUs with
only 8GB: what do you tell the user?).
--
Diego Zuccato
tch (even if DNS itself is not
case-sensitive)!
So if you have
x.y.z.k default default.mydomain.org
in /etc/hosts and
NodeName=Default
in slurm.conf, it won't work!
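The mismatch boils down to a plain case-sensitive string comparison, which can be sketched like this (the names are the hypothetical ones from the example above):

```shell
#!/bin/bash
# Hostname as the node reports it vs. NodeName as written in slurm.conf:
hostname_seen="default"     # what `hostname -s` would return (assumed)
nodename_conf="Default"     # NodeName from slurm.conf (assumed)

# Slurm matches these as plain strings, so case matters:
if [ "$hostname_seen" = "$nodename_conf" ]; then
    echo "match"
else
    echo "no match"
fi
```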
--
Diego Zuccato
en but it's having problems with colliding UIDs
and GIDs (multi-domain forest with quite a lot of [100k+] users and even
more groups).
--
Diego Zuccato
colliding UIDs/GIDs.
IIUC it's not its problem: it seems it does not manage authentication or
user/group mapping. :(
I'll have to do some tests to see if the LDAP client can bind with
machine credentials using Kerberos...
Tks for the hint.
--
Diego Zuccato
equest a specific amount of
RAM and no default is given, how could SLURM know how much memory the
job needs? How can it pack different jobs on the same node w/o knowing
they won't interfere?
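One common way to handle that, assuming the cluster should supply a default when the user gives none (DefMemPerCPU is a real slurm.conf parameter; the value here is an arbitrary assumption):

```
# slurm.conf (sketch)
DefMemPerCPU=2048    # MB granted per allocated CPU when a job requests no memory
```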
--
Diego Zuccato
doing (since I have nearly no idea: I'm
just starting to understand the difference between OMP and MPI...).
That's the reason I stick with Debian stable, even if it ships older
versions.
Tks, I'll have to find the resources to try that way... if it doesn't
get fixed in the meantime.
--
Diego Zuccato
it's the only one and all threads work on the same
dataset), while mpirun launches one thread with mpi_world_size=N.
Did I miss something?
Tks.
--
Diego Zuccato
obs on the same server.
What you can do is avoid treating memory as a consumable resource...
--
Diego Zuccato
"sometimes" (randomly, but usually after many months) some machines lose
the join.
I couldn't make sssd work with our AD (I'm not an AD admin, I can only
join machines, and there's no special bind-account).
--
Diego Zuccato