So, say you've got some (large) number of jobs that are more or less "perfectly" parallel: they have nothing to do with one another, can be executed in any order, and have no interlocking dependencies. The only communication is between each client node and the gateway node, to fetch more data or to output results; there is no communication between client nodes. In that case, would it be preferable to run LLN (or CR_LLN? which one, and why?) rather than CR_Core_Memory?
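For concreteness, here's roughly the change I have in mind, straight from my reading of the slurm.conf man page (untested on my end; the partition name and node list are made up):

    # Cluster-wide: schedule onto the least-loaded node first.
    # The man page suggests CR_LLN can be combined with the
    # existing memory tracking:
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory,CR_LLN

    # Or per partition, leaving the cluster-wide default alone:
    PartitionName=serial Nodes=node[001-064] LLN=YES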
Best,
Sean

On Thu, May 8, 2014 at 4:21 PM, Lloyd Brown <[email protected]> wrote:
>
> Don't forget the communication implications of the task distribution
> either. In general, if you start with fewer nodes and change to more
> nodes (for the same total number of processes), your communication is
> more likely to be going between nodes, which will be slower than working
> within a node.
>
> Also, if the communication pattern is highly adjacent (lots of
> communication with near neighbors, but not much with farther neighbors),
> using a cyclic allocation may also hurt you, even in the same
> node/processes-per-node allocation, since the neighbors of a process are
> more likely to be on different nodes.
>
> How severe this is will depend on the specific algorithm, software,
> communication pattern, etc. But as you work with your users, it's worth
> considering.
>
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> http://marylou.byu.edu
>
> On 05/08/2014 12:01 PM, Ryan Cox wrote:
> > Rather than maximize fragmentation, you probably want to do it on a
> > per-job basis. If you want one core per node: sbatch -N $numnodes
> > -n $numnodes. Anything else would require the -m flag. I haven't
> > played with it recently, but I think you would want -m cyclic.
> >
> > Ryan
> >
> > On 05/08/2014 11:49 AM, Atom Powers wrote:
> >> How to spread jobs among nodes?
> >>
> >> It appears that my Slurm cluster is scheduling jobs to load up nodes
> >> as much as possible before putting jobs on other nodes. I understand
> >> the reasons for doing this; however, I foresee my users wanting to
> >> spread jobs out among as many nodes as possible for various reasons,
> >> some of which are even valid.
> >>
> >> How would I configure the scheduler to distribute jobs in something
> >> like a round-robin fashion to many nodes instead of loading jobs onto
> >> just a few nodes?
> >>
> >> I currently have:
> >> 'SchedulerType' => 'sched/builtin',
> >> 'SelectTypeParameters' => 'CR_Core_Memory',
> >> 'SelectType' => 'select/cons_res',
> >>
> >> --
> >> Perfection is just a word I use occasionally with mustard.
> >> --Atom Powers--
> >
> > --
> > Ryan Cox
> > Operations Director
> > Fulton Supercomputing Lab
> > Brigham Young University
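P.S. For the archives, here's how I read Ryan's per-job suggestion, spelled out (the node/task counts and the script name are made up):

    # One task per node: ask for equal node and task counts.
    sbatch -N 8 -n 8 job.sh

    # More tasks than nodes, handed out round-robin across the
    # nodes (-m cyclic) instead of filling each node first:
    sbatch -N 8 -n 32 -m cyclic job.sh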
