Then it goes to another node...

On May 8, 2014 3:28:46 PM PDT, Atom Powers <[email protected]> wrote:
> Thank you Morris,
>
> LLN looks like it may be what I'm looking for.
>
> If it allocates jobs based on CPU load, what happens when a node is
> running a high-memory, low-CPU job and somebody schedules a high-CPU,
> low-memory job that won't fit on that node because of its memory
> requirements?
>
> On Thu, May 8, 2014 at 1:34 PM, Sean Caron <[email protected]> wrote:
>
>> So, say you've got some (large) number of jobs that are more or less
>> "perfectly" parallel; that is, they don't really have anything to do
>> with one another. They can be executed in any sequence and have no
>> interlocking dependencies; the only communication is between the
>> client node and the gateway node, to fetch more data or to output
>> results, and there is no communication between client nodes. In this
>> case, would it be preferable to run LLN (or CR_LLN? Which? Why?)
>> rather than CR_Core_Memory?
>>
>> Best,
>>
>> Sean
>>
>> On Thu, May 8, 2014 at 4:21 PM, Lloyd Brown <[email protected]> wrote:
>>
>>> Don't forget the communication implications of the task distribution
>>> either. In general, if you start with fewer nodes and change to more
>>> nodes (for the same total number of processes), your communication
>>> is more likely to go between nodes, which will be slower than
>>> staying within a node.
>>>
>>> Also, if the communication pattern is highly adjacent (lots of
>>> communication with near neighbors, but not much with farther
>>> neighbors), a cyclic allocation may also hurt you, even with the
>>> same node/processes-per-node allocation, since a process's
>>> neighbors are more likely to be on different nodes.
>>>
>>> How severe this is will depend on the specific algorithm, software,
>>> communication pattern, etc., but as you work with your users, it's
>>> worth considering.
>>>
>>> Lloyd Brown
>>> Systems Administrator
>>> Fulton Supercomputing Lab
>>> Brigham Young University
>>> http://marylou.byu.edu
>>>
>>> On 05/08/2014 12:01 PM, Ryan Cox wrote:
>>>> Rather than maximizing fragmentation, you probably want to do it
>>>> on a per-job basis. If you want one core per node: sbatch -N
>>>> $numnodes -n $numnodes. Anything else would require the -m flag. I
>>>> haven't played with it recently, but I think you would want -m
>>>> cyclic.
>>>>
>>>> Ryan
>>>>
>>>> On 05/08/2014 11:49 AM, Atom Powers wrote:
>>>>> How to spread jobs among nodes?
>>>>>
>>>>> It appears that my Slurm cluster is scheduling jobs to load up
>>>>> nodes as much as possible before putting jobs on other nodes. I
>>>>> understand the reasons for doing this; however, I foresee my
>>>>> users wanting to spread jobs out among as many nodes as possible
>>>>> for various reasons, some of which are even valid.
>>>>>
>>>>> How would I configure the scheduler to distribute jobs in
>>>>> something like a round-robin fashion to many nodes instead of
>>>>> loading jobs onto just a few nodes?
>>>>>
>>>>> I currently have:
>>>>> 'SchedulerType' => 'sched/builtin',
>>>>> 'SelectTypeParameters' => 'CR_Core_Memory',
>>>>> 'SelectType' => 'select/cons_res',
>>>>>
>>>>> --
>>>>> Perfection is just a word I use occasionally with mustard.
>>>>> --Atom Powers--
>>>>
>>>> --
>>>> Ryan Cox
>>>> Operations Director
>>>> Fulton Supercomputing Lab
>>>> Brigham Young University
>
> --
> Perfection is just a word I use occasionally with mustard.
> --Atom Powers--
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
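
To make the LLN suggestion concrete, a minimal slurm.conf sketch along
these lines may help. It assumes the CR_LLN select parameter (and the
per-partition LLN flag) are available in your Slurm release, and the
partition and node names are placeholders, so treat it as a starting
point rather than a drop-in config:

    # slurm.conf (sketch only)
    SchedulerType=sched/builtin
    SelectType=select/cons_res
    # CR_LLN asks the scheduler to place jobs on the least-loaded nodes;
    # here it is combined with the existing core+memory tracking.
    SelectTypeParameters=CR_Core_Memory,CR_LLN

    # Or enable least-loaded placement for one partition only
    # ("spread" and node[01-16] are hypothetical names):
    PartitionName=spread Nodes=node[01-16] LLN=YES State=UP

As Lloyd points out above, spreading work this way can hurt tightly
coupled jobs, so it fits best with the independent, embarrassingly
parallel workloads Sean describes.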

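For the per-job route Ryan describes, the submission would look roughly
like this (node/task counts and the script name are placeholders, and
it is worth confirming -m/--distribution behaviour in the sbatch man
page for your release):

    # One task per node: request as many tasks as nodes.
    sbatch -N 8 -n 8 myjob.sh

    # More tasks than nodes, handed out round-robin across the nodes
    # rather than filling one node before moving to the next.
    sbatch -N 8 -n 32 -m cyclic myjob.sh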