Then it goes to another node...

On May 8, 2014 3:28:46 PM PDT, Atom Powers <[email protected]> wrote:
> Thank you Morris,
>
> LLN looks like it may be what I'm looking for.
>
> If it allocates jobs based on CPU load, what happens when a node is
> running a high-memory, low-CPU job and somebody schedules a high-CPU,
> low-memory job that won't fit on that node because of its memory
> requirements?
>
> On Thu, May 8, 2014 at 1:34 PM, Sean Caron <[email protected]> wrote:
>
>> So, say you've got some (large) number of jobs that are more or less
>> "perfectly" parallel; that is, they don't really have anything to do
>> with one another. They can be executed in any sequence and have no
>> interlocking dependencies; the only communication is between the
>> client node and the gateway node, to fetch more data or to output
>> results, and there is no communication between client nodes. In this
>> case, would it be preferable to run LLN (or CR_LLN? Which? Why?)
>> rather than CR_Core_Memory?
>>
>> Best,
>>
>> Sean
>>
>> On Thu, May 8, 2014 at 4:21 PM, Lloyd Brown <[email protected]> wrote:
>>
>>> Don't forget the communication implications of the task distribution
>>> either. In general, if you start with fewer nodes and change to more
>>> nodes (for the same total number of processes), your communication
>>> is more likely to go between nodes, which will be slower than
>>> staying within a node.
>>>
>>> Also, if the communication pattern is highly adjacent (lots of
>>> communication with near neighbors, but not much with farther
>>> neighbors), a cyclic allocation may also hurt you, even with the
>>> same node/processes-per-node allocation, since a process's
>>> neighbors are more likely to be on different nodes.
>>>
>>> How severe this is will depend on the specific algorithm, software,
>>> communication pattern, etc., but as you work with your users, it's
>>> worth considering.
>>>
>>> Lloyd Brown
>>> Systems Administrator
>>> Fulton Supercomputing Lab
>>> Brigham Young University
>>> http://marylou.byu.edu
>>>
>>> On 05/08/2014 12:01 PM, Ryan Cox wrote:
>>>> Rather than maximizing fragmentation, you probably want to do it
>>>> on a per-job basis. If you want one core per node: sbatch -N
>>>> $numnodes -n $numnodes. Anything else would require the -m flag. I
>>>> haven't played with it recently, but I think you would want -m
>>>> cyclic.
>>>>
>>>> Ryan
>>>>
>>>> On 05/08/2014 11:49 AM, Atom Powers wrote:
>>>>> How to spread jobs among nodes?
>>>>>
>>>>> It appears that my Slurm cluster is scheduling jobs to load up
>>>>> nodes as much as possible before putting jobs on other nodes. I
>>>>> understand the reasons for doing this; however, I foresee my
>>>>> users wanting to spread jobs out among as many nodes as possible
>>>>> for various reasons, some of which are even valid.
>>>>>
>>>>> How would I configure the scheduler to distribute jobs in
>>>>> something like a round-robin fashion to many nodes instead of
>>>>> loading jobs onto just a few nodes?
>>>>>
>>>>> I currently have:
>>>>> 'SchedulerType' => 'sched/builtin',
>>>>> 'SelectTypeParameters' => 'CR_Core_Memory',
>>>>> 'SelectType' => 'select/cons_res',
>>>>>
>>>>> --
>>>>> Perfection is just a word I use occasionally with mustard.
>>>>> --Atom Powers--
>>>>
>>>> --
>>>> Ryan Cox
>>>> Operations Director
>>>> Fulton Supercomputing Lab
>>>> Brigham Young University
>
> --
> Perfection is just a word I use occasionally with mustard.
> --Atom Powers--
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
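
To make the LLN suggestion concrete, a minimal slurm.conf sketch along
these lines may help. It assumes the CR_LLN select parameter (and the
per-partition LLN flag) are available in your Slurm release, and the
partition and node names are placeholders, so treat it as a starting
point rather than a drop-in config:

    # slurm.conf (sketch only)
    SchedulerType=sched/builtin
    SelectType=select/cons_res
    # CR_LLN asks the scheduler to place jobs on the least-loaded nodes;
    # here it is combined with the existing core+memory tracking.
    SelectTypeParameters=CR_Core_Memory,CR_LLN

    # Or enable least-loaded placement for one partition only
    # ("spread" and node[01-16] are hypothetical names):
    PartitionName=spread Nodes=node[01-16] LLN=YES State=UP

As Lloyd points out above, spreading work this way can hurt tightly
coupled jobs, so it fits best with the independent, embarrassingly
parallel workloads Sean describes.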

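For the per-job route Ryan describes, the submission would look roughly
like this (node/task counts and the script name are placeholders, and
it is worth confirming -m/--distribution behaviour in the sbatch man
page for your release):

    # One task per node: request as many tasks as nodes.
    sbatch -N 8 -n 8 myjob.sh

    # More tasks than nodes, handed out round-robin across the nodes
    # rather than filling one node before moving to the next.
    sbatch -N 8 -n 32 -m cyclic myjob.sh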