[slurm-dev] Re: defaults, passwd and data
On 24/09/2017 12:10, Marcin Stolarek wrote:
> So do I, however, I'm using sssd with AD provider joined into AD domain. It's tricky and requires good sssd understanding, but it works... in general.

We are using PBIS-open to join the nodes. Quite easy to set up, just "sometimes" (randomly, but usually after many months) some machines lose the join. I couldn't make sssd work with our AD (I'm not an AD admin, I can only join machines, and there's no special bind account).

--
Diego Zuccato
Servizi Informatici
Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
mail: diego.zucc...@unibo.it
[slurm-dev] Re: Change in srun ?
On 19/07/2017 09:37, Gilles Gouaillardet wrote:
> this is probably due to a change in the PMI interface.

Glip!

> i suggest you rebuild your MPI library first, and then try again

That's problematic... I can only install from deb files, hoping the package maintainers know what they're doing (since I have nearly no idea: I'm just starting to understand the difference between OMP and MPI...). That's the reason I stick with Debian stable, even if it ships older versions. Tks, I'll have to find the resources to try that way... if it doesn't get fixed in the meantime.
[slurm-dev] Change in srun ?
Hello all. I've just upgraded from Debian 8 to Debian 9, which upgraded Slurm from 14.03 to 16.05. But now some MPI jobs are no longer really parallel when run via srun, though they work OK when run via mpirun. The problem seems to be that srun launches N processes with mpi_world_size=1 (so every process thinks it's the only one and they all work on the same dataset), while mpirun launches one process with mpi_world_size=N. Did I miss something? Tks.
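In case anyone else hits this: a quick way to check whether srun and the MPI library agree on a PMI flavor (the commands are standard Slurm; which --mpi value actually works depends on how the MPI library was built, so treat the pmi2 choice below as a guess, and the program name is hypothetical):

--8<--
# Show which MPI/PMI plugins this srun build supports
srun --mpi=list

# Try forcing a specific PMI flavor on a small test run
srun --mpi=pmi2 -n 4 ./mpi_hello
--8<--

If every rank still reports world size 1 for all supported plugins, the MPI library and Slurm's PMI really are out of sync and a rebuild (or matching packages) is needed.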
[slurm-dev] Re: slurm 17.2.06 min memory problem
On 10/07/2017 10:53, Roe Zohar wrote:
> Adding DefMemPerCpu was the only solution but I don't understand this behavior.

I've had to do the same (and gave a default of just 200MB, effectively forcing users to request the RAM they need). IIUC, that's because if the user does not request a specific amount of RAM and no default is given, how could Slurm know how much memory the job needs? How can it pack different jobs onto the same node w/o knowing they won't interfere?
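For reference, a minimal slurm.conf fragment along these lines (the values are illustrative, not taken from either poster's actual config):

--8<--
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# Jobs that don't specify memory get 200 MB per allocated CPU,
# so anyone with real memory needs must request them explicitly.
DefMemPerCPU=200
--8<--

With memory as a consumable resource and a deliberately small default, the scheduler can safely pack jobs while nudging users toward honest requests.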
[slurm-dev] Re: Memory bounds for jobs
On 27/04/2017 03:30, Alexander Vorobiev wrote:
> The question is how to achieve that with job/partition configuration?

I'm no expert, but I think you can't do that with Slurm. The maximum memory a job can use must be defined before the job starts, to allow the scheduler to place other jobs on the same server. What is possible is to not treat memory as a consumable resource...
[slurm-dev] Re: LDAP required?
On 13/04/2017 14:26, Janne Blomqvist wrote:
> We use adcli (there's an rpm package called adcli in EL7, FWIW; upstream seems to be http://cgit.freedesktop.org/realmd/adcli ).

Uhm... I didn't know it. BTW I use Debian for the servers.

> Not sure how any of this would work with colliding UID's/GID's.

IIUC that's not its problem: it seems it does not manage authentication or user/group mapping. :( I'll have to run some tests to see if the LDAP client can bind with machine credentials using Kerberos... Tks for the hint.
[slurm-dev] Re: LDAP required?
On 12/04/2017 08:52, Janne Blomqvist wrote:
> BTW, do you have some kind of trust relationship between your FreeIPA domain and the AD domain, or how do you do it? I did play around with using FreeIPA for our cluster as well and somehow synchronizing it with the university AD domain, but in the end we managed to convince the university IT to allow us to join our nodes directly to AD, so we were able to skip FreeIPA entirely.

What are you using to join nodes to AD? I've used samba-winbind in the past but it was very fragile, and am currently using PBIS-Open, but it's having problems with colliding UIDs and GIDs (multi-domain forest with quite a lot of [100k+] users and even more groups).
[slurm-dev] Re: slurmctld not pinging at regular interval
On 16/02/2017 23:23, Lachlan Musicman wrote:
> I can't remember where I saw it and I can't see it in the docs any more, but for whatever reason, I was under the impression that the machines in the cluster required all hosts to be in the hostfile both as FQDN and shortname?

Moreover, check that they're listed with names case-matching your config. I've been bitten by this a couple of times: Str957 is different from str957 for Slurm! Don't ask me why, since DNS should be case-insensitive.
[slurm-dev] Restricting number of jobs per user w/o accounting?
Hello all. I'm wondering if it's possible to define a rule that lowers the priority of a user's queued jobs when other users' jobs are waiting. Say user A submits 100 jobs. The cluster is idle and 4 jobs start. Then user B submits a couple of jobs. Currently B's jobs are scheduled only after A's 100 jobs are completed. What I'd need is some way to launch B's first job before A's fifth. And possibly do that w/o the accounting infrastructure. Any idea? TIA!
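One lever that works without slurmdbd is limiting how many jobs per user the backfill scheduler will even try to start in each pass, so a single user's 100-job backlog can't monopolize every scheduling cycle. A sketch (the parameter exists in slurm.conf's SchedulerParameters; the value is illustrative, and note this shapes scheduling attempts rather than giving true fair ordering, which ultimately wants fair-share and therefore accounting):

--8<--
SchedulerType=sched/backfill
SchedulerParameters=bf_max_job_user=4
--8<--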
[slurm-dev] Re: setup Slurm on Ubuntu 16.04 server
On 24/08/2016 10:05, Raymond Wan wrote:
> One thing I remember is that the node names you have in the COMPUTE NODES section should match the names in your /etc/hosts file. When I had the error above, I think this was the problem that I had...

And it must be a case-sensitive match (even though DNS itself is not case-sensitive)! So if you have

x.y.z.k default default.mydomain.org

in /etc/hosts and NodeName=Default in slurm.conf, it won't work!
[slurm-dev] Re: jobs rejected for no reason
On 18/08/2016 18:27, Ade Fewings wrote:
> If I request an impossible memory configuration I get the message you do at job submission.

Happened to me yesterday (due to DefMemPerCPU, GRRR).

> Wonder if passing the '-v -v -v' to sbatch provides useful debugging output on what Slurm assesses the job requirements to be and whether specifying compliant memory requirements for the job helps.

From a quick test it does not help. That info could be really useful! But it's probably too hard to produce (say you're requesting 2 nodes, 16 CPUs each and 32GB each, but one of the servers only offers 8 CPUs w/ 64GB and the other offers 32 CPUs with only 8GB: what do you tell the user?).
[slurm-dev] Re: Fully utilizing nodes
On 12/08/2016 01:42, Christopher Samuel wrote:
> The CR_ONE_TASK_PER_CORE is a holdover from then, it means that if you've got HT/SMT enabled you'll get one MPI rank (Slurm task) per physical core and that can then use threads to utilise the hardware thread units. Without it you'll get a rank per thread, which may not be useful to your code.

I've had to remove CR_ONE_TASK_PER_CORE since it seems our users' jobs (or the cpuset bindings, which consider every thread an independent CPU) didn't cope well with it. Having one rank per thread, on the other hand, seems to work well enough.
[slurm-dev] Re: Fully utilizing nodes
On 10/08/2016 01:37, Christopher Samuel wrote:
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE

How does that work with cgroup enforcement on nodes with multithreaded cores? Is a single task allowed to use two threads and fully utilize the allocated core, or is it restricted to a single thread?
[slurm-dev] Re: CPUSpecList and reservation problem
On 25/07/2016 09:24, Danny Marc Rotscher wrote:
> Could you please tell me, what I'm doing wrong?

I'm really not an expert, but IIUC you're just wasting a lot of CPUs for a process that shouldn't use so many. Pinning it to one CPU could improve performance if the user jobs you're running actually saturate the machine and are massively interconnected (every task needs the results of all the others to proceed), so that a task that temporarily suspends the user job to do other things could have a cascade effect on the other tasks. But under a normal workload you shouldn't see any difference.

PS: which kind of machine is that, with 64 sockets?
[slurm-dev] Re: One CPU always reserved for one GPU
On 02/06/2016 18:48, Felix Willenborg wrote:
> Maybe someone has a good idea to solve my problem anyway?

I think there's no solution, since (IIUC) once a node gets a job from a partition it won't accept jobs from other partitions. If I misunderstood, that would be great. If I didn't, it could be seen as a feature request :)
[slurm-dev] Re: Increase size of running job/correcting incorrect resource allocations?
On 25/05/2016 01:15, Ryan Novosielski wrote:
> The reason I most often want to do something like this is that as a sysadmin, I'll notice someone who has requested 1 core but is really using 16, for example. In many cases, I will not have noticed this for quite awhile, and the job is running on a node by itself (because it is common for people to request full nodes). I'd like to adjust the allocation for this job to prevent other jobs from using the cores that are in use.

What I did is the other way around: I used cpusets to "pin" the job to the allocated CPUs. This way the real state and the scheduler's view stay the same even with misbehaving jobs (we've had 3rd-party executables that automatically used every CPU in the system). *Way* less need to watch the cluster closely :)
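The same pinning can be had from stock Slurm via the task/cgroup plugin instead of hand-rolled cpusets — a sketch, assuming a Slurm version with cgroup support:

--8<--
# slurm.conf
TaskPlugin=task/cgroup

# cgroup.conf
ConstrainCores=yes       # confine each job to its allocated CPUs
ConstrainRAMSpace=yes    # optionally enforce the memory request too
--8<--

With ConstrainCores, a job that spawns threads on every CPU still only runs on the cores the scheduler actually gave it.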
[slurm-dev] Re: Slurm squeue working but not pam module
On 05/04/2016 01:00, Mehdi Acheli wrote:
> Everything in slurm is working fine. I can issue jobs and see the state of the eight nodes as Idle. However, when I try to connect to a compute node with a user, even if he has a job running on, I get rejected.

IIUC, that's the correct behaviour. If it allowed access, any user could request a short allocation, connect to the allocated node manually and run an untracked job. So the pam module accepts only logins generated by scheduled jobs.
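For reference, the usual wiring is a single line in each compute node's SSH PAM stack (a sketch; pam_slurm ships in Slurm's contribs, and newer setups use pam_slurm_adopt instead):

--8<--
# /etc/pam.d/sshd on each compute node:
# deny the login unless the user has a job running on this node
account    required     pam_slurm.so
--8<--

If even users with a running job on the node are rejected, the thing to check is that this module (and not a stricter one) is what's actually doing the rejecting.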
[slurm-dev] Re: Overview of jobs in the cluster
On 28/03/2016 14:45, Ade Fewings wrote:
> Have a look at slurmtop: https://bugs.schedmd.com/show_bug.cgi?id=1868 We find it very handy, and it does the character-per-process part.

Too bad I'm stuck with an older Slurm version, where it seems slurmtop does not work very well. As soon as Debian ships updated packages in stable, I'll upgrade. Tks anyway :)
[slurm-dev] Re: Overview of jobs in the cluster
On 25/03/2016 11:28, Rémi Palancher wrote:
> If you mean not available in jessie (stable), that's true so far but you should be able to rebuild stretch's packages on jessie w/o trouble.

Too risky: I only have one production cluster... but *many* other (heterogeneous) servers, hence the policy of only using packages from stable.

> We are also currently thinking about backporting 15.08 into jessie as well but don't hold your breath :)

If it happens before current testing becomes the new stable, good. Else I'll have to wait. It's not a showstopper. Tks!
[slurm-dev] Re: Overview of jobs in the cluster
On 25/03/2016 09:59, Diego Zuccato wrote:
> I'm using SLURM 14.03.9 (the one packaged in Debian 8) and slurmtop 5.00 (from schedtop 'sources'). I'll try 5.02 as soon as some new jobs get submitted.

Seems I found the problem. Searching for schedtop, I found the announcement, where Dennis says:

> schedtop/slurmtop requires Slurm 15.08 (or a recent beta)

So I'm out of luck till Debian decides to upgrade to 15.08 :( Too many other machines to manage to even *think* of compiling Slurm from sources!
[slurm-dev] Re: Overview of jobs in the cluster
On 24/03/2016 15:54, Bill Wichser wrote:
> Hmmm. This sounds like a problem with version? We are running slurm 15.08.8 with slurmtop 5.02 and this is working for us, including being able to view all jobs in color.

I'm using SLURM 14.03.9 (the one packaged in Debian 8) and slurmtop 5.00 (from schedtop 'sources'). I'll try 5.02 as soon as some new jobs get submitted.
[slurm-dev] Re: Overview of jobs in the cluster
On 24/03/2016 15:39, Benjamin Redling wrote:
> What are you missing from "squeue"?

What am I missing from 'top' that makes me use htop? :) It lets me get the status of the whole machine at a glance. pbstop did the same. But using qnodes is like using ps to get the same data available from htop: you can do it (more or less) but it takes a lot of time. So it's harder to "keep an eye" on the cluster as a whole. Add to the picture that we have overlapping partitions and you can see that it quickly gives a good headache...
[slurm-dev] Overview of jobs in the cluster
Hello all. Is there an equivalent of Torque's pbstop for Slurm? I already tried slurmtop, but it seems something is not right (nodes are shown as fully allocated with '@' even if only one CPU is really allocated, there is no color mapping, etc.). What I'm looking for is a tool that gives me, for every node/CPU, the corresponding job. Tks.
[slurm-dev] Re: allocating entire nodes from within an allocation
On 23/02/2016 20:28, Craig Yoshioka wrote:
> $ salloc -N 2 --exclusive

You're allocating 2 nodes for tasks originated by your user. When you use srun, you have to tell it to launch 'n' tasks with 'n' == ncpus. Maybe just removing "-n 1" could be enough.
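A sketch of what that might look like (the program name is hypothetical; SLURM_CPUS_ON_NODE is set by Slurm inside an allocation):

--8<--
$ salloc -N 2 --exclusive
# inside the allocation: one task per allocated CPU on each node,
# instead of the single task that "-n 1" would give
$ srun --ntasks-per-node="$SLURM_CPUS_ON_NODE" ./my_program
--8<--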
[slurm-dev] Re: select/cons_res, memory limitation and cgroups
On 15/02/2016 12:39, Uwe Sauter wrote:
> I am unsure how this can be implemented. If I call "ulimit -d $((SLURM_MEM_PER_CPU * SLURM_NTASKS_PER_NODE * 1024))" in the PrologSlurmctld script, will that limit still be active when the user's job executes? Will the limit be reset after the job ended?

I'm quite sure PrologSlurmctld is the *wrong* place to set ulimit (unless those limits get propagated to the work nodes). IIUC http://slurm.schedmd.com/prolog_epilog.html, the right place should be the TaskProlog script, which is executed on the work nodes.
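The arithmetic itself is easy to factor out and sanity-check; a sketch (the variable names in the comment mirror the Slurm-exported ones, and whether a ulimit set in a prolog actually sticks to the task is exactly the open question above):

```shell
#!/bin/bash
# mem_limit_kb MB_PER_CPU TASKS_PER_NODE -> data-segment limit in KB
mem_limit_kb() {
  echo $(( $1 * $2 * 1024 ))
}

# In a TaskProlog one might then run:
#   ulimit -d "$(mem_limit_kb "$SLURM_MEM_PER_CPU" "$SLURM_NTASKS_PER_NODE")"
mem_limit_kb 200 2    # -> 409600
```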
[slurm-dev] RE: Ressouces allocation problem
On 15/02/2016 12:55, David Roman wrote:
> What I want:
> OPA has a bigger priority than the other partitions. OPA can preempt jobs submitted in partitions LDEV, LOP, LALL.
> LDEV, LOP, LALL have the same priority.
> LDEV can't suspend jobs submitted in LOP or LALL.
> LOP and LALL can't suspend jobs submitted in LDEV.

Ok.

> In practice:
> 1- I submit a job A in LOP : A running (it is ok for me)
> 2- I submit a job B in LALL : A and B running (it is ok for me)
> 3- I submit a job C in LDEV : A and B and C running (it is not ok for me)
> For me, the job C must be in PENDING state !!!

Yes, it should be, since there are no consumables left. Another thing that could help you pinpoint the error is enabling verbose scheduler logging. IMVHO it's due to your use of PreemptMode, which includes GANG:

> GANG enables gang scheduling (time slicing) of jobs in the same partition. NOTE: Gang scheduling is performed independently for each partition, so configuring partitions with overlapping nodes and gang scheduling is generally not recommended.

But to use SUSPEND it seems you have to use GANG too... I don't know more, sorry. Maybe some expert can give a definitive answer.
[slurm-dev] RE: Ressouces allocation problem
On 15/02/2016 11:00, David Roman wrote:
> Hello,
>
> I'm coming back to this problem, because after reading the documentation again and doing other tests I still have the same problem with preemption.
>
> My partitions:
> PartitionName=OPA Nodes=hpc-node[1-2] Priority=100 Default=No PreemptMode=OFF
> PartitionName=LDEV Nodes=hpc-node[1-2] Priority=50 Default=Yes PreemptMode=SUSPEND
> PartitionName=LOP Nodes=hpc-node[1] Priority=50 Default=Yes PreemptMode=SUSPEND
> PartitionName=LALL Nodes=hpc-node[2] Priority=50 Default=Yes PreemptMode=SUSPEND
>
> In my mind, if I submit a job on each of the partitions LOP and LALL with 16 CPUs requested, I can't see a job submitted to LDEV running!

Uh? That's really unclear to me... Sequence of ops as I understand it:
1) You submit a job to LOP -> runs on hpc-node1, prio=50
2) You submit a job to LALL -> runs on hpc-node2, prio=50
3) You submit a job to LDEV

IIUC, the third job gets *queued*, since LDEV has the same priority as LOP and LALL. If you want the third job to preempt the other two, you have to change the partitions' priorities.

> Am I right or not ?

Depends on what you expect and what you're obtaining :) HIH.
[slurm-dev] Re: AllowGroups and AD
On 12/02/2016 12:06, Diego Zuccato wrote:

Another self-quote, sorry. "Obviously" this line

/usr/bin/id $SLURM_JOBUSER | /bin/grep -qi $1 || (

should use the $SLURM_JOB_USER var... Hope it helps.
[slurm-dev] Re: AllowGroups and AD
On 12/02/2016 06:26, Christopher Samuel wrote:
> It's interesting timing as I'm helping a group bring up a cluster themselves that they want to auth against the university AD and so we could be in the same boat.

Auth is not a big issue. If you need help with joining the machine, I can help you off-list. Authorizing only some groups to submit jobs to a queue is a completely different story. It was easy in Torque+Moab, since it leveraged the standard Unix lookups. I resorted to a script, called by

PrologSlurmctld=/etc/slurm-llnl/SlurmCtldProlog.sh

The script is:
--8<--
#!/bin/bash
case "$SLURM_JOB_PARTITION" in
  hpc_inf|hpc_large)
    ALLOWED=Str957-bl0-abilitati-baldi
    ;;
  *)
    ALLOWED=Str957-bl0-abilitati
esac
RV=0
# "getent group" does not recurse :(
ID=$(/usr/bin/id $SLURM_JOBUSER)
echo $ID | /bin/grep -qi $ALLOWED || (RV=1; /usr/bin/scancel $SLURM_JOBID)
#/usr/bin/env >> /tmp/env.txt
exit $RV
--8<--

BTW I think the doc is not accurate, since it says (from http://slurm.schedmd.com/prolog_epilog.html):

> Note that for security reasons, these programs do not have a search path set.

Actually, at least /bin and /usr/bin are searched anyway, but I preferred to use the full paths. Probably it would be better to add an

echo 'print Cancelled for insufficient privilege'

just before scancel, but I've not tested it.

> They reckon, though, that the fact that the intention is to limit cluster access to a particular group in AD means that the enumeration will be limited to members of that group. It'll be interesting to see if they're right..

Yep. But watch out for nested groups! I just noticed that "getent group" does *not* return users in nested groups.
[slurm-dev] Re: AllowGroups and AD
On 12/02/2016 10:59, Diego Zuccato wrote:
> Probably it would be better to add an echo 'print Cancelled for insufficient privilege' just before scancel, but I've not tested it.

Tested, and it does not work. So I simplified the script a bit more:
--8<--
#!/bin/bash
allowonly() {
  # "getent group" does not recurse :(
  /usr/bin/id $SLURM_JOBUSER | /bin/grep -qi $1 || (
    /usr/bin/scancel $SLURM_JOBID
  )
}

case "$SLURM_JOB_PARTITION" in
  hpc_inf|hpc_large)
    allowonly Str957-bl0-abilitati-baldi
    ;;
esac
--8<--
This way it can easily be extended to different groups for different partitions. Moreover, when a "public" partition is requested there's no 'id' overhead (as I said, on my frontend it can take up to about 10 seconds!).
[slurm-dev] Re: AllowGroups and AD
On 11/02/2016 09:17, Christopher Samuel wrote:
> Do you go through sssd on the way to winbind?

Nope.

> If so you may want the following in /etc/sssd/sssd.conf:
> [domain/default]
> enumerate = True

Can't do that: a reboot would take about half a day on our forest.
[slurm-dev] Re: AllowGroups and AD
On 11/02/2016 08:23, Janne Blomqvist wrote:
> do you have user and group enumeration enabled in winbind?

No. I couldn't do it: when I tried (some years ago, while experimenting with the winbind join), it took more than 8 *hours* to fetch the groups list. IIRC we have more than 200k users and more than 800k groups. Yes, we have quite a large forest :)

> I.e. does
> $ getent passwd
> and
> $ getent group
> return nothing, or the entire user and group lists?

They just return LOCAL users/groups.

> FWIW, slurm 16.05 will have some changes to work better in environments with enumeration disabled, see http://bugs.schedmd.com/show_bug.cgi?id=1629

Yep. Seems quite applicable. For the time being I worked around it with a script:

#!/bin/bash
# Syntax: sspp partition group
AU=$(getent group $2 | sed 's/.*://')
scontrol update PartitionName=$1 AllowAccounts=$AU
scontrol reconf

I don't really understand why scontrol reconf is needed at all, but if I don't use it, jobs won't get submitted.
[slurm-dev] Re: Setting up SLURM for a single multi-core node
On 11/02/2016 13:00, Benjamin Redling wrote:
> If SelectType is configured to select/cons_res, it must have a parameter of CR_Core, CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option to be honored.

What it doesn't say (and made me bang my head against the wall for a while) is that if you use CR_Core_Memory then users have to specify how much memory their jobs need, or (with the defaults) each node gets allocated as a unit. Quite rightly, since the default says a job can use the whole memory of the node, hence there's no memory available for a second job...
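So with CR_Core_Memory, job scripts should carry an explicit memory request; for example (the values are illustrative):

--8<--
#SBATCH --mem-per-cpu=500   # MB per allocated core
# or, for the whole job:
#SBATCH --mem=2G
--8<--

Either line keeps a job from implicitly claiming the node's entire memory, so a second job can still be packed onto the same node.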
[slurm-dev] Re: GRES for both K80 GPU's
On 11/02/2016 12:25, Michael Senizaiz wrote:
> This doesn't enforce keeping the jobs on a K80. There are only 4 K80's in the system. If I submit a 1-gpu job and then a 2-gpu job, the first will get GPU0 (0 and 1 are a K80, 2 and 3 are a K80, etc). The 2-gpu job will then get GPU1 and GPU2. Then the user will complain that their peer-to-peer code isn't working and the job performance is bad because they are running across two discrete K80's and not the 2 GPUs on a single K80.

Like allocating multithreaded jobs across different hosts.

> gres.conf
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19

Shouldn't you have
--8<--
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-1]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[2-3]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[4-5]
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[6-7]
--8<--
? (I omitted CPUs since I don't know if they're significant in your case.) IIUC, you should define each K80 as a different resource. But I started with Slurm about a week ago, so I could be way off target! HiH
[slurm-dev] AllowGroups and AD
Hello all. I think I'm doing something wrong, but I don't understand what. I'm trying to limit the users allowed to use a partition (which, coming from Torque, I think is the equivalent of a queue), but obviously I'm failing. :(

Frontend and work nodes are all Debian machines joined to AD via Winbind (which ensures a consistent UID/GID mapping, at the expense of having many groups and a bit of slowness while looking 'em up). On every node I can run 'id' and it says (redacted):

uid=108036(diego.zuccato) gid=100013(domain_users) groups=100013(domain_users),[...],242965(str957.tecnici),[...]

(it takes about 10s to get the complete list of groups). Linux ACLs work as expected (if I set a file to be readable only by Str957.tecnici, I can read it), but when I do

scontrol update PartitionName=pp_base AllowGroups=str957.tecnici

or even

scontrol update PartitionName=pp_base AllowGroups=242965

and then try to sbatch a job, I get:

diego.zuccato@Str957-cluster:~$ sbatch aaa.sh
sbatch: error: Batch job submission failed: User's group not permitted to use this partition
diego.zuccato@Str957-cluster:~$ newgrp Str957.tecnici
diego.zuccato@Str957-cluster:~$ sbatch aaa.sh
sbatch: error: Batch job submission failed: User's group not permitted to use this partition

So I don't get recognized even if I change my primary GID :( I've been in that group since way before installing the cluster, and I already tried rebooting everything to refresh the cache. Another detail that may be useful:

diego.zuccato@Str957-cluster:~$ time getent group str957.tecnici
str957.tecnici:x:242965:[...],diego.zuccato,[...]

real    0m0.012s
user    0m0.000s
sys     0m0.000s

Any hints? TIA