Thank you, Morris,

LLN looks like it may be what I'm looking for.
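
If I'm reading the slurm.conf man page correctly, that would be a change
along these lines (untested sketch; I'm assuming CR_LLN can be combined
with the CR_Core_Memory option we already have):

    'SchedulerType'         => 'sched/builtin',
    'SelectType'            => 'select/cons_res',
    'SelectTypeParameters'  => 'CR_Core_Memory,CR_LLN',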

If it allocates jobs based on CPU load, what happens when a node is running
a high-memory, low-CPU job and somebody schedules a high-CPU, low-memory
job that won't fit on that node because of memory requirements?
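
To make the scenario concrete, here's roughly what I mean (the job scripts
and numbers are made up for illustration; --mem is per node, in MB):

    # Already on the node: a high-memory, low-CPU job (assume it was
    # allocated roughly 120 of the node's 128 GB, but only 2 cores).
    sbatch --mem=120000 --cpus-per-task=2 bigmem_job.sh

    # New submission: a high-CPU, low-memory job. Its 16 GB request is
    # more than the memory left unallocated on that node, so it can't
    # land there even though plenty of CPUs are idle.
    sbatch --mem=16384 --cpus-per-task=16 cpu_job.sh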



On Thu, May 8, 2014 at 1:34 PM, Sean Caron <[email protected]> wrote:

>  So, say you've got some (large) number of jobs that are more or less
> "perfectly" parallel; that is, they don't really have anything to do with
> one another, they can be executed in any sequence, and they have no
> interlocking dependencies. The only communication is between a client node
> and the gateway node, to fetch more data or to output results; there is no
> communication between client nodes. In this case, would it be preferable to
> run LLN (or CR_LLN? which? why?) rather than CR_Core_Memory?
>
> Best,
>
> Sean
>
>
>
> On Thu, May 8, 2014 at 4:21 PM, Lloyd Brown <[email protected]> wrote:
>
>>
>> Don't forget the communication implications of the task distribution
>> either.  In general, if you start with fewer nodes and change to more
>> nodes (for the same total number of processes), more of your
>> communication is likely to go between nodes, which will be slower than
>> staying within a node.
>>
>> Also, if the communication pattern is highly adjacent (lots of
>> communication with near neighbors, but not much with farther neighbors),
>> using a cyclic allocation may also hurt you, even with the same
>> nodes/processes-per-node allocation, since a process's neighbors are
>> more likely to end up on different nodes.
>>
>> How severe this is will depend on the specific algorithm, software,
>> communication pattern, etc.  But as you work with your users, it's worth
>> considering.
>>
>> Lloyd Brown
>> Systems Administrator
>> Fulton Supercomputing Lab
>> Brigham Young University
>> http://marylou.byu.edu
>>
>> On 05/08/2014 12:01 PM, Ryan Cox wrote:
>> > Rather than maximize fragmentation cluster-wide, you probably want to do
>> > it on a per-job basis.  If you want one core per node, use sbatch -N
>> > $numnodes -n $numnodes.  Anything else would require the -m flag.  I
>> > haven't played with it recently, but I think you would want -m cyclic.
>> >
>> > Ryan
>> >
>> > On 05/08/2014 11:49 AM, Atom Powers wrote:
>> >> How to spread jobs among nodes?
>> >>
>> >> It appears that my Slurm cluster is scheduling jobs to load up nodes
>> >> as much as possible before putting jobs on other nodes. I understand
>> >> the reasons for doing this, however I foresee my users wanting to
>> >> spread jobs out among as many nodes as possible for various reasons,
>> >> some of which are even valid.
>> >>
>> >> How would I configure the scheduler to distribute jobs in something
>> >> like a round-robin fashion to many nodes instead of loading jobs onto
>> >> just a few nodes?
>> >>
>> >> I currently have:
>> >>     'SchedulerType'         => 'sched/builtin',
>> >>     'SelectTypeParameters'  => 'CR_Core_Memory',
>> >>     'SelectType'            => 'select/cons_res',
>> >>
>> >> --
>> >> Perfection is just a word I use occasionally with mustard.
>> >> --Atom Powers--
>> >
>> > --
>> > Ryan Cox
>> > Operations Director
>> > Fulton Supercomputing Lab
>> > Brigham Young University
>> >
>>
>
>


-- 
Perfection is just a word I use occasionally with mustard.
--Atom Powers--
