We have sched/wiki tied into a custom allocator that does all of the actual scheduling. When it was designed, the driving factors were priority-based preemption, user-specified priorities, and job affinity. We use select/cons_res and typically run one job per core. We prefer that certain jobs be placed on the same node as other jobs while still having their own allocation. I'd be interested in exploring alternatives that bring the scheduling back into slurm's arena, but for now we are using sched/wiki.

-JE

On Thu, 2011-04-14 at 11:01 -0700, Danny Auble wrote:
> I don't know what you are using the sched/wiki for, but the perlapi
> should work just fine. You might consider using the
> priority/multifactor plugin for your priority calculation along with
> the sched/backfill if you are using something else today.
>
> In any case the multi cluster stuff should work fine for most cases. I
> am interested what you are using sched/wiki for though.
>
> Danny
>
> > I hadn't read into the multi-cluster functionality yet. That might be
> > just the way to go but we're making heavy use of the sched/wiki
> > interface and perlapi bindings. Is the multi-cluster functionality
> > exposed to those layers?
> >
> > -JE
> >
> > On Thu, 2011-04-14 at 09:56 -0700, Auble, Danny wrote:
> > > I am guessing you have each one of these clusters in a separate
> > > partition.
> > >
> > > How big are these clusters? You can turn off the communication by
> > > just setting the treewidth to the number of nodes in your system.
> > >
> > > Is there any reason you don't want to/can't use the multi cluster
> > > functionality, and operate in traditional SLURM fashion with 1
> > > slurmctld per cluster?
> > >
> > > Danny
> > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:owner-slurm-[email protected]] On Behalf Of Josh England
> > > > Sent: Thursday, April 14, 2011 9:50 AM
> > > > To: [email protected]
> > > > Subject: [slurm-dev] two clusters / one scheduler
> > > >
> > > > I'd like to have a single slurm instance schedule jobs onto two
> > > > physically disjoint clusters. The compute nodes of one cluster cannot
> > > > reach the compute nodes of the other cluster, but they can all see the
> > > > scheduler nodes. With slurm's hierarchical communication, when some
> > > > nodes can't reach others slurm thinks the nodes are not responding and
> > > > would eventually mark them offline. Is there any way to logically group
> > > > nodes into separate communication groups to avoid this problem?
> > > >
> > > > -JE
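
For reference, the treewidth suggestion quoted above would correspond to a slurm.conf fragment roughly like the one below. This is only a sketch under stated assumptions: the combined node count of 512 is hypothetical and does not come from the thread, and only the TreeWidth parameter itself is taken from the discussion.

```
# slurm.conf fragment (illustrative only; the value is an assumption)
# Hypothetical total node count across both clusters: 512.
# With TreeWidth at or above the node count, the communication tree
# should collapse to a single level, so messages are not relayed
# through compute nodes that cannot reach each other.
TreeWidth=512
```

The value would need to be raised whenever nodes are added, since a TreeWidth below the node count would bring back forwarding between compute nodes.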
