I'm running Slurm version 2.6.9 and am wondering whether the preemption algorithm takes into account the topology defined in topology.conf when it selects which jobs to preempt to make room for a new, higher-priority MPI job.

Based on what I have seen, it does not appear to.

The reason I ask is that we define our InfiniBand topology as 8 individual fabrics. We have 8 BladeCenters, each with its own fabric, and the fabrics are not interconnected. A single partition includes all 8 BladeCenters, with 32 nodes per BladeCenter.
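For reference, a topology.conf describing this layout would look roughly like the sketch below. The switch and node names are hypothetical placeholders, not our real ones. The key point is that there are 8 leaf switches of 32 nodes each and no parent switch linking them, so topology/tree sees 8 separate fabrics.

```
# topology.conf (sketch; switch and node names are hypothetical)
# One leaf switch per BladeCenter, 32 nodes each.
# No parent switch is defined, so the 8 fabrics are not connected.
SwitchName=bc1 Nodes=node[001-032]
SwitchName=bc2 Nodes=node[033-064]
SwitchName=bc3 Nodes=node[065-096]
SwitchName=bc4 Nodes=node[097-128]
SwitchName=bc5 Nodes=node[129-160]
SwitchName=bc6 Nodes=node[161-192]
SwitchName=bc7 Nodes=node[193-224]
SwitchName=bc8 Nodes=node[225-256]
```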

Eventually enough jobs are preempted and the MPI job is scheduled into a BladeCenter, but at the cost of many preempted jobs. The main problem is that it preempts jobs on BladeCenters where the MPI job does not ultimately land.

If it took our defined topology into account and focused on preempting jobs within a single BladeCenter, it could make room for the MPI job while preempting far fewer jobs.
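To illustrate the behavior I have in mind, here is a hypothetical sketch. It is not how Slurm's preemption code works, and it ignores job priorities and partitions. For each BladeCenter it counts how many running jobs would have to be preempted to free the requested number of nodes, then picks the BladeCenter needing the fewest preemptions. The `Job` type, function names and example data are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    nodes: int  # number of nodes this job occupies


def jobs_to_preempt(free_nodes, running, needed):
    """Preempt the largest jobs first until `needed` nodes are free.

    Taking the largest jobs first minimizes the number of jobs preempted
    within one BladeCenter. Returns None if the BladeCenter cannot fit
    the request even with every job preempted.
    """
    victims = []
    available = free_nodes
    for job in sorted(running, key=lambda j: j.nodes, reverse=True):
        if available >= needed:
            break
        victims.append(job)
        available += job.nodes
    return victims if available >= needed else None


def pick_bladecenter(bladecenters, needed):
    """Return (name, victims) for the BladeCenter needing the fewest preemptions."""
    best = None
    for name, (free_nodes, running) in bladecenters.items():
        victims = jobs_to_preempt(free_nodes, running, needed)
        if victims is None:
            continue  # this BladeCenter can never hold the job
        if best is None or len(victims) < len(best[1]):
            best = (name, victims)
    return best


# Hypothetical example: two BladeCenters, an MPI job asking for 24 nodes.
bcs = {
    "bc1": (0, [Job(1, 16), Job(2, 8), Job(3, 8)]),        # no free nodes
    "bc2": (4, [Job(i, 4) for i in range(4, 11)]),          # 4 free nodes
}
name, victims = pick_bladecenter(bcs, 24)
print(name, [j.job_id for j in victims])
```

Here bc1 needs only 2 preemptions (jobs 1 and 2) while bc2 would need 5, so all preemptions land in bc1 and none are wasted on other BladeCenters.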

We have been scratching our heads on this one for a while.

Relevant slurm.conf settings:

SelectType=select/cons_res
PreemptType=preempt/partition_prio
TopologyPlugin=topology/tree

Thanks

--
Marcin Sliwowski | SysAdmin@RENCI | 919-445-0479
