You can also have a job submit plugin that puts a job into multiple partitions if none is specified. This should reduce the drawback of having multiple partitions.
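For example, with JobSubmitPlugins=lua in slurm.conf and a job_submit.lua placed next to slurm.conf, something along these lines should do it. This is only a minimal sketch; the partition names are placeholders for your four hardware generations:

    -- job_submit.lua (enable with JobSubmitPlugins=lua in slurm.conf)
    -- Partition names below are placeholders, adapt to your site.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.partition == nil then
            -- No partition requested: submit to all of them and let the
            -- scheduler start the job wherever it fits first.
            job_desc.partition = "gen1,gen2,gen3,gen4"
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end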
However, I think that with features and the topology plugin you should be able to avoid a multiple-partition setup.

Cheers,
Marcin

2017-01-17 9:49 GMT+01:00 David WALTER <david.wal...@ens.fr>:
>
> Thanks Paul for your response and your advice.
>
> That's actually the reason why they asked me to set up 3 and now 4 partitions. As we now have 4 different generations of nodes with significant differences in hardware (not the same CPUs, not the same amount of RAM), we thought that it was a good solution.
>
> I will run tests with people to adjust the solution to the needs of the many.
>
> Thanks again
>
> ------------------------------
> David WALTER
> The computer guy
> david.wal...@ens.fr
> 01/44/32/27/94
>
> INSERM U960
> Laboratoire de Neurosciences Cognitives
> Ecole Normale Supérieure
> 29, rue d'Ulm
> 75005 Paris
>
> -----Original message-----
> From: Paul Edmon [mailto:ped...@cfa.harvard.edu]
> Sent: Monday, 16 January 2017 16:37
> To: slurm-dev
> Subject: [slurm-dev] RE: A little bit help from my slurm-friends
>
> I agree that having multiple partitions will decrease the efficiency of the scheduler. That said, if you have to do it, you have to do it. Using features is a good way to go if people need specific hardware. I could see having multiple partitions so that you can charge differently for each generation of hardware, as run times will invariably be different. Still, if that isn't a concern, just have a single queue.
>
> For multifactor I would turn on fairshare and age. JobSize really isn't useful unless you have people running multicore jobs and you want to prioritize, or deprioritize, those.
>
> If you end up in a multi-partition scenario, then I recommend having a backfill queue that underlies all the partitions and setting up REQUEUE on that partition. That way people can farm idle cycles. This is especially good for people who are hardware agnostic and don't really care when their jobs get done, but rather just have a ton of work that can be interrupted at any moment. That's what we do here, and we have 110 partitions. Our backfill queue does a pretty good job of picking up the idle cores, but there are still structural inefficiencies with that many partitions, so we never get above about 70% usage of our hardware.
>
> So just keep that in mind when you are setting things up. More partitions means more structural inefficiency, but it does give you other benefits, such as isolating hardware for specific uses. It really depends on what you need. I highly recommend experimenting to figure out what fits you and your users best.
>
> -Paul Edmon-
>
> On 1/16/2017 10:16 AM, Loris Bennett wrote:
> > David WALTER <david.wal...@ens.fr> writes:
> >
> >> Dear Loris,
> >>
> >> Thanks for your response!
> >>
> >> I'm going to look at these features in slurm.conf. I only configured the CPUs, sockets, etc. per node. Do you have any example or link explaining how it works and what I can use?
> >
> > It's not very complicated. A feature is just a label, so if you had some nodes with Intel processors and some with AMD, you could attach the features, e.g.
> >
> > NodeName=node[001,002] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=intel
> > NodeName=node[003,004] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=amd
> >
> > Users then just request the required CPU type in their batch scripts as a constraint, e.g.:
> >
> > #SBATCH --constraint="intel"
> >
> >> My goal is to respond to people's needs and launch their jobs as fast as possible, without losing time when one partition is idle while the others are fully loaded.
> >
> > The easiest way to avoid the problem you describe is to just have one partition. If you have multiple partitions, the users have to understand what the differences are so that they can choose sensibly.
> >
> >> That's why I thought the fair-share factor was the best solution.
> >
> > Fairshare won't really help you with the problem that one partition might be full while another is empty. It will just affect the ordering of jobs in the full partition, although the weight of the partition term in the priority expression can affect the relative attractiveness of the partitions.
> >
> > In general, however, I would suggest you start with a simple set-up. You can always add to it later to address specific issues as they arise. For instance, you could start with one partition and two QOS: one for normal jobs and one for test jobs. The latter could have a higher priority, but only a short maximum run-time and possibly a low maximum number of jobs per user.
> >
> > Cheers,
> >
> > Loris
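For reference, the advice above maps roughly onto configuration along the following lines. This is only a sketch; the weights, node lists, partition and QOS names, and limits are placeholders to adapt, not recommendations:

    # slurm.conf -- multifactor priority with fairshare and age, as Paul suggests
    PriorityType=priority/multifactor
    PriorityWeightFairshare=10000
    PriorityWeightAge=1000
    PriorityWeightJobSize=0
    PriorityWeightQOS=1000
    PriorityMaxAge=7-0

    # A low-priority backfill partition underlying the per-generation partitions,
    # with REQUEUE so its jobs give way to jobs in the regular partitions
    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE
    PartitionName=gen1 Nodes=node[001-020] PriorityTier=10
    PartitionName=backfill Nodes=ALL PriorityTier=1 PreemptMode=REQUEUE

    # Two QOS as Loris describes: normal jobs, plus a short, limited "test" QOS
    sacctmgr add qos normal
    sacctmgr add qos test Priority=1000 MaxWall=00:30:00 MaxJobsPerUser=2

The exact numbers matter less than the ratios between the weights; sprio will show how the factors combine for pending jobs once it is running.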