Moe,

One last question when you have a chance.

If jobs are currently running in the existing partitions and we switch to a
single partition with a topology, would those running jobs be lost?

Thank you!
John DeSantis

2015-02-03 12:42 GMT-05:00 <[email protected]>:
>
> You can configure a single queue and use the topology/tree plugin to
> identify the nodes on separate fabrics.
>
>
> Quoting John Desantis <[email protected]>:
>>
>> Hello all,
>>
>> Unfortunately, I have some confusion regarding how to achieve a global
>> and single partition for our users with several separate host groups
>> after reading the man pages and various documentation.
>>
>> When I say host groups, I mean separate sets of hardware which utilize
>> different infiniband fabrics and/or are accessible in different data
>> centers, different CPU architectures, etc.
>>
>> During initial testing periods, I was able to have use of a default
>> partition with all of the nodes allocated via the "Nodes=" value.
>> All was well until a later set of nodes was added which had a
>> separate infiniband fabric. Testing proved that applications were
>> attempting to utilize the nodes within the separate fabrics, which
>> failed miserably, and as a result we're using separate partitions -
>> which most users don't mind.
>>
>> Now that we're getting more users converted to Slurm, we're realizing
>> that some users don't know how to check for free partitions and
>> available hardware (boo!) and have grown used to our previous
>> scheduler configuration of 1 global queue.
>>
>> I'm looking into how to emulate this and I'm not quite clear if this
>> can be done using multiple partition definitions with a DEFAULT clause
>> or not. I've also looked at the topology/tree plugin and, seeing
>> that you can specify either switches or nodes, I wonder if this would
>> be the preferred method to achieve 1 "global" partition which utilizes
>> all of the separate hardware pools and respects the separate host groups.
>>
>> Thank you,
>> John DeSantis
>
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
> Commercial Slurm Development and Support
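
For reference, the single-queue suggestion above might look roughly like the
sketch below. The node names (iba[01-16], ibb[01-16]), the switch names, and
the partition name "global" are hypothetical placeholders, not taken from this
thread. The sketch assumes each fabric is described as its own leaf switch
with no common parent switch; whether the scheduler then keeps every job
inside one fabric should be confirmed on a test system for the Slurm version
in use.

```
# slurm.conf (excerpt; NodeName definitions omitted)
TopologyPlugin=topology/tree
# Hypothetical single partition spanning both host groups
PartitionName=global Nodes=iba[01-16],ibb[01-16] Default=YES State=UP

# topology.conf
# One leaf switch per fabric; no parent switch joins them (assumption)
SwitchName=fabrica Nodes=iba[01-16]
SwitchName=fabricb Nodes=ibb[01-16]
```

The idea is that the topology file, rather than separate partitions, tells
the scheduler which nodes share a fabric.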
