[slurm-dev] RE: A little bit help from my slurm-friends

Marcin Stolarek Thu, 16 Feb 2017 01:45:13 -0800

You can also have a job submit plugin that puts a job in multiple partitions
if none is specified. This should reduce the drawback of multiple partitions.
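
As a rough sketch, and assuming the stock all_partitions job submit plugin
that ships with Slurm fits your case, the slurm.conf side could look like
this (not tested against your setup):

```conf
# slurm.conf (sketch): jobs submitted without --partition are made
# eligible for every partition on the cluster.
JobSubmitPlugins=all_partitions
```

A job that names a partition explicitly keeps its own choice. For anything
more selective you would probably need a custom job_submit/lua script.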


However, I think that with features and the topology plugin you should be able
to avoid a multiple-partition setup.
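
For illustration, a single-partition layout with one feature per hardware
generation might look like the following. The node names, CPU counts, memory
sizes and feature names are all made up here, so adjust them to the real
hardware:

```conf
# slurm.conf (sketch, hypothetical node names and sizes)
NodeName=node[001-004] CPUs=12 RealMemory=42000 Feature=gen1 Weight=10
NodeName=node[005-008] CPUs=16 RealMemory=64000 Feature=gen2 Weight=20
NodeName=node[009-012] CPUs=24 RealMemory=128000 Feature=gen3 Weight=30
NodeName=node[013-016] CPUs=32 RealMemory=256000 Feature=gen4 Weight=40

# One partition covering every generation.
PartitionName=main Nodes=node[001-016] Default=YES State=UP
```

Users who care about the hardware can then ask for it with
--constraint="gen4" or --constraint="gen3|gen4", and everybody else runs
wherever there is room. Weight makes Slurm prefer the lowest-weight nodes
(the older ones in this sketch) when several nodes would satisfy a job.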

cheers
Marcin

2017-01-17 9:49 GMT+01:00 David WALTER <david.wal...@ens.fr>:

>
> Thanks Paul for your response and your advice.
>
> That's actually the reason why they asked me to set up 3 and now 4
> partitions. As we now have 4 different generations of nodes with significant
> differences in hardware (not the same CPU, not the same amount of RAM), we
> thought that it was a good solution.
>
> I will test with users to adjust the solution to the needs of the many.
>
> Thanks again
>
> ------------------------------
> David WALTER
> The computer guy
> david.wal...@ens.fr
> 01/44/32/27/94
>
> INSERM U960
> Laboratoire de Neurosciences Cognitives
> Ecole Normale Supérieure
> 29, rue d'Ulm
> 75005 Paris
>
> -----Original Message-----
> From: Paul Edmon [mailto:ped...@cfa.harvard.edu]
> Sent: Monday, 16 January 2017 16:37
> To: slurm-dev
> Subject: [slurm-dev] RE: A little bit help from my slurm-friends
>
>
> I agree having multiple partitions will decrease efficiency of the
> scheduler.  That said if you have to do it, you have to do it.  Using the
> features is a good way to go if people need specific ones.  I could see
> having multiple partitions so you can charge differently for each
> generation of hardware, as run times will invariably be different.
> Still if that isn't a concern just have a single queue.
>
> For multifactor I would turn on fairshare and age.  JobSize really isn't
> useful unless you have people running multicore jobs and you want to
> prioritize or deprioritize those.
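>
> As a sketch only, with placeholder weights that would need tuning for a
> real site, the slurm.conf side of that could look like:
>
> ```conf
> # slurm.conf (sketch): multifactor priority driven by fairshare and age.
> # The numbers are illustrative assumptions, not recommendations.
> PriorityType=priority/multifactor
> PriorityWeightFairshare=10000
> PriorityWeightAge=1000
> PriorityWeightJobSize=0
> PriorityMaxAge=7-0
> PriorityDecayHalfLife=7-0
> ```
>
> Fairshare needs the accounting database (slurmdbd) to be set up, since
> shares and usage are read from the accounting records.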
>
> If you end up in a multipartition scenario then I recommend having a
> backfill queue that underlies all the partitions and setting up REQUEUE on
> that partition.  That way people can farm idle cycles.  This is especially
> good for people who are hardware agnostic and don't really care when their
> jobs get done but rather just have a ton to do that can be interrupted at
> any moment. That's what we do here and we have 110 partitions.  Our
> backfill queue does a pretty good job of picking up the idle cores, but
> there are still structural inefficiencies with that many partitions, so we
> never get above about 70% usage of our hardware.
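>
> A minimal sketch of such a layout, assuming partition-priority preemption
> and made-up partition and node names:
>
> ```conf
> # slurm.conf (sketch, hypothetical names)
> PreemptType=preempt/partition_prio
> PreemptMode=REQUEUE
>
> # Per-generation partitions: higher tier, never preempted.
> PartitionName=gen1 Nodes=node[001-004] PriorityTier=10 PreemptMode=OFF
> PartitionName=gen2 Nodes=node[005-008] PriorityTier=10 PreemptMode=OFF
>
> # Backfill partition under all nodes: lower tier, jobs get requeued
> # when a higher-tier partition needs the nodes.
> PartitionName=backfill Nodes=ALL PriorityTier=1 PreemptMode=REQUEUE
> ```
>
> Jobs sent to the backfill partition have to tolerate being restarted.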
>
> So just keep that in mind when you are setting things up.  More partitions
> means more structural inefficiency but it does give you other benefits such
> as isolating hardware for specific use.  It really depends on what you
> need.  I highly recommend experimenting to figure out what fits you and
> your users best.
>
> -Paul Edmon-
>
> On 1/16/2017 10:16 AM, Loris Bennett wrote:
> > David WALTER <david.wal...@ens.fr> writes:
> >
> >> Dear Loris,
> >>
> >> Thanks for your response!
> >>
> >> I'm going to look into these features in slurm.conf.  I only configured
> >> the CPUs, Sockets, etc. per node. Do you have an example or a link that
> >> explains how it works and what I can use?
> > It's not very complicated.  A feature is just a label, so if you had
> > some nodes with Intel processors and some with AMD, you could attach
> > the features, e.g.
> >
> > NodeName=node[001,002] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=intel
> > NodeName=node[003,004] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=amd
> >
> > Users then just request the required CPU type in their batch scripts
> > as a constraint, e.g.:
> >
> > #SBATCH --constraint="intel"
> >
> >> My goal is to respond to people's needs and launch their jobs as fast
> >> as possible, without losing time when one partition is idle while
> >> the others are fully loaded.
> > The easiest way to avoid the problem you describe is to just have one
> > partition.  If you have multiple partitions, the users have to
> > understand what the differences are so that they can choose sensibly.
> >
> >> That's why I thought the fair share factor was the best solution
> > Fairshare won't really help you with the problem that one partition
> > might be full while another is empty.  It will just affect the
> > ordering of jobs in the full partition, although the weight of the
> > partition term in the priority expression can affect the relative
> > attractiveness of the partitions.
> >
> > In general, however, I would suggest you start with a simple set-up.
> > You can always add to it later to address specific issues as they arise.
> > For instance, you could start with one partition and two QOS: one for
> > normal jobs and one for test jobs.  The latter could have a higher
> > priority, but only a short maximum run-time and possibly a low maximum
> > number of jobs per user.
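> >
> > As a rough sketch of that starting point, with hypothetical names and
> > limits, the slurm.conf side could look like this:
> >
> > ```conf
> > # slurm.conf (sketch, hypothetical values): one partition, QOS enforced.
> > PartitionName=main Nodes=ALL Default=YES State=UP
> > AccountingStorageEnforce=limits,qos
> > PriorityWeightQOS=10000
> >
> > # The QOS themselves live in the accounting database, not in
> > # slurm.conf. They could be created with something like:
> > #   sacctmgr add qos normal
> > #   sacctmgr add qos test
> > #   sacctmgr modify qos test set Priority=100 MaxWall=01:00:00 MaxJobsPerUser=2
> > ```
> >
> > Users would then pick the test QOS with --qos=test, provided it has
> > been added to their association.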
> >
> > Cheers,
> >
> > Loris
> >
>
