Moe,

Thank you!

John DeSantis

2015-02-03 15:28 GMT-05:00 <[email protected]>:
>
> I believe that is OK, but I don't know off the top of my head. Either test
> for yourself or:
> 1. add the new partition
> 2. move pending jobs to the new partition
> 3. delete the other partitions once their jobs end
>
>
> Quoting John Desantis <[email protected]>:
>
>> Moe,
>>
>> One last question when you have a chance.
>>
>> If there are jobs running in the current partitions and we switch to
>> one partition with a topology, would those running jobs be lost?
>>
>> Thank you!
>> John DeSantis
>>
>> 2015-02-03 12:42 GMT-05:00 <[email protected]>:
>>>
>>> You can configure a single queue and use the topology/tree plugin to
>>> identify the nodes on separate fabrics.
>>>
>>>
>>> Quoting John Desantis <[email protected]>:
>>>>
>>>> Hello all,
>>>>
>>>> Unfortunately, after reading the man pages and various documentation,
>>>> I'm still confused about how to set up a single, global partition for
>>>> our users when we have several separate host groups.
>>>>
>>>> By host groups, I mean separate sets of hardware that use different
>>>> InfiniBand fabrics and/or sit in different data centers, have
>>>> different CPU architectures, etc.
>>>>
>>>> During initial testing, I was able to use a default partition with all
>>>> of the nodes listed in its "Nodes=" value. All was well until a later
>>>> set of nodes with a separate InfiniBand fabric was added. Testing
>>>> showed that applications were attempting to use nodes across the
>>>> separate fabrics, which failed miserably, so we're now using separate
>>>> partitions, which most users don't mind.
>>>>
>>>> Now that more users are being converted to Slurm, we're realizing
>>>> that some users don't know how to check for free partitions and
>>>> available hardware (boo!) and have grown used to our previous
>>>> scheduler configuration of one global queue.
>>>>
>>>> I'm looking into how to emulate this, and it isn't clear to me whether
>>>> this can be done using multiple partition definitions with a DEFAULT
>>>> clause. I've also looked at the topology/tree plugin and, since it lets
>>>> you specify either switches or nodes, I'm wondering whether it would be
>>>> the preferred way to get one "global" partition that uses all of the
>>>> separate hardware pools while respecting the separate host groups.
>>>>
>>>> Thank you,
>>>> John DeSantis
>>>
>>>
>>> --
>>> Morris "Moe" Jette
>>> CTO, SchedMD LLC
>>> Commercial Slurm Development and Support
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
> Commercial Slurm Development and Support
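A minimal sketch of the topology/tree approach Moe suggests, assuming slurm.conf sets `TopologyPlugin=topology/tree` and a single partition lists every node. The Python helper below (the function name `topology_conf`, the switch names and the node names are all hypothetical) writes one topology.conf leaf-switch line per fabric and deliberately emits no parent switch. As we read the topology.conf documentation, Slurm can still use nodes on switches that lack a common parent, but it will not place a single job across such leaf switches, which is the behavior the separate InfiniBand fabrics need. Check this against the man page for your Slurm version before relying on it.

```python
from typing import Dict


def topology_conf(fabrics: Dict[str, str]) -> str:
    """Render topology.conf text with one leaf switch per fabric.

    `fabrics` maps a hypothetical switch name to a Slurm hostlist
    expression for the nodes on that fabric. No parent switch line is
    emitted, so the fabrics share no common ancestor in the tree.
    """
    lines = []
    for switch, nodes in fabrics.items():
        # Whitespace separates key=value pairs in the file, so reject
        # empty names and names or node lists that contain spaces.
        if not switch or not nodes or " " in switch or " " in nodes:
            raise ValueError(f"bad switch or node list: {switch!r} {nodes!r}")
        lines.append(f"SwitchName={switch} Nodes={nodes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical fabrics and node names, for illustration only.
    print(topology_conf({
        "fabric_a": "node[001-032]",
        "fabric_b": "gpu[01-08]",
    }), end="")
```

Running the script prints two `SwitchName=... Nodes=...` lines that could go into topology.conf. The node expressions are passed through unchanged, so they must match the NodeName entries already defined in slurm.conf.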

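Moe's three-step migration can be sketched as a dry-run plan. The hypothetical helper `migration_commands` below only builds `scontrol` command strings for an administrator to review and run; it does not touch the cluster. It assumes the caller has already collected each old partition's pending job IDs and running-job count (for example with `squeue -h -t PD -p <partition> -o %i` and `squeue -h -t R -p <partition>`). All partition names, node lists and job IDs in the example are made up.

```python
from typing import Dict, List


def migration_commands(
    new_partition: str,
    nodes: str,
    old_partitions: Dict[str, List[int]],
    running_jobs: Dict[str, int],
    create: bool = True,
) -> List[str]:
    """Build, but do not execute, scontrol commands for the three steps.

    old_partitions maps each old partition name to its pending job IDs.
    running_jobs maps an old partition name to its count of running jobs;
    partitions missing from it are treated as having none.
    """
    cmds: List[str] = []
    # Step 1: add the new partition covering all nodes (skip on re-runs).
    if create:
        cmds.append(f"scontrol create PartitionName={new_partition} Nodes={nodes}")
    # Step 2: move each pending job from the old partitions to the new one.
    for part in sorted(old_partitions):
        for job_id in old_partitions[part]:
            cmds.append(f"scontrol update JobId={job_id} Partition={new_partition}")
    # Step 3: delete an old partition only once it has no running jobs.
    for part in sorted(old_partitions):
        if running_jobs.get(part, 0) == 0:
            cmds.append(f"scontrol delete PartitionName={part}")
    return cmds


if __name__ == "__main__":
    # Hypothetical partition names, node lists and job IDs.
    plan = migration_commands(
        "all",
        "node[001-032],gpu[01-08]",
        {"ib_a": [101, 102], "ib_b": []},
        {"ib_a": 3},
    )
    print("\n".join(plan))
```

Because step 3 waits for running jobs to finish, the helper can be re-run later with fresh counts and `create=False`; only partitions with no running jobs get a `delete` line. `scontrol` does not edit slurm.conf, so it is worth updating slurm.conf to match the new single partition as well.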