Re: [slurm-users] A Slurm topological scheduling question

2021-12-07 Thread Ole Holm Nielsen
Hi David, The topology.conf file groups nodes into sets such that parallel jobs will not be scheduled by Slurm across disjoint sets. Even though the topology.conf man-page refers to network switches, it's really about topology rather than network. You may use fake (non-existing) switch
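
For illustration, a minimal topology.conf sketch of the idea described above: two leaf "switches" with no common parent, so the two node sets are disjoint. The switch names and node ranges here are hypothetical and not taken from the thread, and the sketch assumes slurm.conf already sets TopologyPlugin=topology/tree.

```
# topology.conf (sketch only; switch and node names are hypothetical)
# Each "switch" is just a label for a set of nodes; it need not be real hardware.
# No higher-level SwitchName ... Switches=... line joins the two sets,
# so they remain disjoint.
SwitchName=rack1sw Nodes=amd[001-032]
SwitchName=rack2sw Nodes=amd[033-064]
```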

Re: [slurm-users] A Slurm topological scheduling question

2021-12-07 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
You can schedule jobs across the two racks, with any given job only using one rack, by specifying #SBATCH --partition rack1,rack2. It'll only use one partition, in order of priority (not liti I never found a way for topology to do that - all I could get it to do is to prefer to keep things within a
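
A minimal batch-script sketch of the multi-partition approach described above. The partition names rack1 and rack2 come from the post; the job name, node count, time limit, and the report_partition helper are hypothetical and only there for illustration. Slurm sets SLURM_JOB_PARTITION inside a running job, so printing it shows which of the listed partitions the job actually landed in.

```shell
#!/bin/bash
#SBATCH --job-name=rack-local     # hypothetical job name
#SBATCH --partition=rack1,rack2   # job runs in exactly one of the listed partitions
#SBATCH --nodes=2                 # hypothetical node count
#SBATCH --time=00:10:00           # hypothetical time limit

# Hypothetical helper: print the partition the job was started in.
# Falls back to "unknown" when run outside of Slurm.
report_partition() {
    echo "running in partition: ${SLURM_JOB_PARTITION:-unknown}"
}

report_partition
```

Submitting this with sbatch and checking the job output should show either rack1 or rack2, assuming both partitions exist on the cluster and each one covers a single rack.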

Re: [slurm-users] A Slurm topological scheduling question

2021-12-07 Thread Paul Edmon
This should be fine assuming you don't mind the mismatch in CPU speeds. Unless the codes are super sensitive to topology, things should be okay, as modern IB is wicked fast. In our environment here we have a variety of different hardware types all networked together on the same IB fabric.

[slurm-users] A Slurm topological scheduling question

2021-12-07 Thread David Baker
Hello, We have now enabled topology-aware scheduling on our Slurm cluster. One part of the cluster consists of two racks of AMD compute nodes. These racks are now treated as separate entities by Slurm. Soon, we may add another set of AMD nodes with slightly different CPU specs to