So, say you've got some large number of jobs that are more or less
"perfectly" parallel, that is, they don't really have anything to do with
one another: they can be executed in any order and have no interlocking
dependencies. The only communication is between each client node and the
gateway node, to fetch more data or to output results; there is no
communication between client nodes. In that case, would it be preferable
to run with LLN (or CR_LLN? which one, and why?) versus running with
CR_Core_Memory?
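
For concreteness, and purely as my own guess from skimming the slurm.conf
man page (so please correct me if I have the parameter names or placement
wrong), I think I'd be comparing configurations along these lines:

```shell
# Current setup: consumable resources, cores packed onto as few
# nodes as possible before spilling to the next node.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Alternative I'm asking about: add CR_LLN so jobs are placed on the
# least-loaded nodes first, spreading independent jobs across the
# cluster instead of packing them.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_LLN

# Or, if the spread-out behavior is only wanted for some workloads,
# LLN can (I believe) be set per-partition instead of cluster-wide:
PartitionName=parallel Nodes=node[01-64] LLN=YES
```

Since these jobs never talk to each other, my understanding is that the
inter-node communication penalty Lloyd describes wouldn't apply, which is
why spreading them out seems attractive here.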

Best,

Sean



On Thu, May 8, 2014 at 4:21 PM, Lloyd Brown <[email protected]> wrote:

>
> Don't forget the communication implications of the task distribution
> either.  In general, if you start with fewer nodes, and change to more
> nodes (for the same total number of processes), your communication is
> more likely to be going between nodes, which will be slower than working
> within a node.
>
> Also, if the communication pattern is highly adjacent (lots of
> communication with near neighbors, but not much to farther neighbors),
> using a cyclic allocation may also hurt you, even in the same
> node/processes-per-node allocation, since the neighbors to a process are
> more likely to be on different nodes.
>
> How severe this is will depend on the specific algorithm, software,
> communication pattern, etc.  But as you work with your users, it's worth
> considering.
>
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> http://marylou.byu.edu
>
> On 05/08/2014 12:01 PM, Ryan Cox wrote:
> > Rather than maximize fragmentation, you probably want to do it on a
> > per-job basis.  If you want one core per node:  sbatch:  -N $numnodes -n
> > $numnodes.  Anything else would require the -m flag.  I haven't played
> > with it recently but I think you would want -m cyclic.
> >
> > Ryan
> >
> > On 05/08/2014 11:49 AM, Atom Powers wrote:
> >> How to spread jobs among nodes?
> >>
> >> It appears that my Slurm cluster is scheduling jobs to load up nodes
> >> as much as possible before putting jobs on other nodes. I understand
> >> the reasons for doing this, however I foresee my users wanting to
> >> spread jobs out among as many nodes as possible for various reasons,
> >> some of which are even valid.
> >>
> >> How would I configure the scheduler to distribute jobs in something
> >> like a round-robin fashion to many nodes instead of loading jobs onto
> >> just a few nodes?
> >>
> >> I currently have:
> >>     'SchedulerType'         => 'sched/builtin',
> >>     'SelectTypeParameters'  => 'CR_Core_Memory',
> >>     'SelectType'            => 'select/cons_res',
> >>
> >> --
> >> Perfection is just a word I use occasionally with mustard.
> >> --Atom Powers--
> >
> > --
> > Ryan Cox
> > Operations Director
> > Fulton Supercomputing Lab
> > Brigham Young University
> >
>