Sorry, I didn't realize this was a problem, since the system sees 4 GPUs
and doesn't care whether they come from two single cards or from one dual
card. Is it possible that you have the compute mode set to exclusive (GPU
locked to a single process)? I mean, is it for sure a SLURM problem? Does
it work if you run the programs locally?
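If you want to rule the compute mode out, something like this on the node
should show and reset it (a rough sketch; the reset needs root and a
reasonably recent driver):
nvidia-smi -q -d COMPUTE        # shows "Compute Mode" for every GPU
nvidia-smi -i 0 -c DEFAULT      # put GPU 0 back into the default (shared) mode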
Cheers,
Barbara
On 02/11/2016 12:25 PM, Michael Senizaiz wrote:
Re: [slurm-dev] Re: GRES for both K80 GPU's
This doesn't enforce keeping the jobs on a single K80. There are only 4
K80s in the system. If I submit a 1-GPU job and then a 2-GPU job, the
first will get GPU 0 (0 and 1 are one K80, 2 and 3 are another K80, etc.).
The 2-GPU job will then get GPU 1 and GPU 2. The user will then complain
that their peer-to-peer code isn't working and that the job performance is
bad, because they are running across two discrete K80s and not on the 2
GPUs of a single K80.
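You can see this from inside the allocation; something along these lines
(exact output depends on the driver, but the idea holds):
srun --gres=gpu:2 bash -c 'echo $CUDA_VISIBLE_DEVICES; nvidia-smi topo -m'
When the two GPUs sit on the same K80 board, topo -m shows them behind the
same PCIe switch (PIX); when they come from different boards the path goes
through the host bridge or across sockets, and peer-to-peer suffers.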
If these were K40 or K20 cards there wouldn't be an issue, but the K80
(and any other dual-GPU card) is a different matter.
On Feb 11, 2016 5:14 AM, "Barbara Krasovec" <[email protected]> wrote:
Doesn't it work if you just specify that there are 8 GPUs on the
machine?
For example:
slurm.conf
#for allocation
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
#for generic resources
GresTypes=gpu
NodeName=node[001-008] ... Features=gpu Gres=gpu:8
gres.conf
NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19
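A job would then just ask for the number of GPUs it needs, e.g. (a minimal
sketch; the typed form assumes the Type=k80 entry above):
srun --gres=gpu:2 ...
srun --gres=gpu:k80:2 ...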
Cheers,
Barbara
On 02/10/2016 06:41 PM, Michael Senizaiz wrote:
I have a couple of nodes with 4 x K80 GPUs in them (nvidia0-7).
Is there a way to either request peer-to-peer GPUs, or to force
allocation of 2 GPUs at a time? We'd prefer the former (run when
peer-to-peer is available, unless you don't care) so we can fit more
users onto the machine. However, ensuring that the peer-to-peer codes
get the proper allocation is more important.
User 1 - needs a full K80 with peer-to-peer
User 2 - needs a single GPU
User 3 - needs a single GPU
User 4 - needs 2 full K80s
I.e.:
0,1 - User 1
2 - User 2
3 - User 3
4,5,6,7 - User 4
Or
0,1 - User 1
2,3 - User 2
4,5 - User 3
QUEUED - User 4
I tried this gres configuration, but it didn't do what I expected.
Name=gpu File=/dev/nvidia[0-1] Count=2 CPUs=0-9
Name=gpu File=/dev/nvidia[2-3] Count=2 CPUs=0-9
Name=gpu File=/dev/nvidia[4-5] Count=2 CPUs=10-19
Name=gpu File=/dev/nvidia[6-7] Count=2 CPUs=10-19
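To see what this config actually hands out, one can check the node's
registered gres and what a 2-GPU job is given (node name below is just an
example):
scontrol show node node001 | grep -i gres          # what slurmctld thinks the node offers
srun --gres=gpu:2 env | grep CUDA_VISIBLE_DEVICES  # which devices a 2-GPU job gets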