This doesn't enforce keeping the jobs on a single K80. There are only 4 K80s in the system (devices 0 and 1 are one K80, 2 and 3 are another, and so on). If I submit a 1-GPU job followed by a 2-GPU job, the first job gets GPU 0 and the 2-GPU job then gets GPU 1 and GPU 2. The user then complains that their peer-to-peer code isn't working and that job performance is bad, because they are running across two discrete K80s rather than on the two GPUs of a single K80.
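For what it's worth, the failure mode is easy to confirm from inside a job. Here's a minimal sketch (my own, untested on these nodes; it assumes the job sees exactly two devices, renumbered 0 and 1 by CUDA_VISIBLE_DEVICES) that checks whether the pair is actually peer-capable before committing to the P2P path:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    if (n < 2) { printf("need two visible devices\n"); return 1; }

    // Ask the runtime whether each device can address the other's memory.
    int fwd = 0, rev = 0;
    cudaDeviceCanAccessPeer(&fwd, 0, 1);
    cudaDeviceCanAccessPeer(&rev, 1, 0);

    if (fwd && rev) {
        // Both directions OK -- enable direct access from each side.
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        printf("P2P available: likely the two GPUs of one K80 board\n");
    } else {
        // Allocation probably spans two discrete boards (e.g. GPU 1 +
        // GPU 2 in the scenario above); fall back to staging through
        // host memory instead of direct peer copies.
        printf("no P2P between visible devices 0 and 1\n");
    }
    return 0;
}

On this topology I'd expect the check to pass only when both devices sit on the same K80 board.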
If these were K40 or K20 cards there wouldn't be an issue, but the K80 (and any other dual-GPU card) is a different matter.

On Feb 11, 2016 5:14 AM, "Barbara Krasovec" <[email protected]> wrote:

> Doesn't it work if you just specify that there are 8 GPUs on the machine?
>
> For example:
>
> slurm.conf
> #for allocation
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> #for generic resources
> GresTypes=gpu
> NodeName=node[001-008] ... Features=gpu Gres=gpu:8
>
> gres.conf
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19
>
> Cheers,
> Barbara
>
> On 02/10/2016 06:41 PM, Michael Senizaiz wrote:
>
> I have a couple nodes with 4xK80 GPUs in them (nvidia0-7).
>
> Is there a way to either request peer-to-peer GPUs, or force allocation
> to 2 GPUs at a time? We'd prefer the former (run when peer-to-peer is
> available, unless you don't care) so we can fit more users onto the
> machine. However, ensuring the peer-to-peer codes get the proper
> allocation is more important.
>
> User 1 - needs a full K80 with peer-to-peer
> User 2 - needs a single GPU
> User 3 - needs a single GPU
> User 4 - needs 2 full K80s
>
> I.e.
> 0,1 - User 1
> 2 - User 2
> 3 - User 3
> 4,5,6,7 - User 4
>
> Or
>
> 0,1 - User 1
> 2,3 - User 2
> 4,5 - User 3
> QUEUED - User 4
>
> I tried this gres configuration, but it didn't do what I expected:
>
> Name=gpu File=/dev/nvidia[0-1] Count=2 CPUs=0-9
> Name=gpu File=/dev/nvidia[2-3] Count=2 CPUs=0-9
> Name=gpu File=/dev/nvidia[4-5] Count=2 CPUs=10-19
> Name=gpu File=/dev/nvidia[6-7] Count=2 CPUs=10-19
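One more diagnostic that may help when users report this: running

    nvidia-smi topo -m

on the node prints the GPU interconnect matrix. If I'm reading the legend correctly, the two GPUs of one K80 board show up as PIX (they hang off the board's own PLX switch), while GPUs on different boards show PHB or SOC, which lines up with the cases where peer-to-peer fails or performs badly.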
