This doesn't enforce keeping the jobs on a single K80. There are only 4 K80s in the system (devices 0 and 1 are one K80, 2 and 3 are another, and so on). If I submit a 1-GPU job followed by a 2-GPU job, the first job gets GPU 0 and the 2-GPU job then gets GPU 1 and GPU 2. The user then complains that their peer-to-peer code isn't working and that job performance is bad, because they are running across two discrete K80s rather than on the two GPUs of a single K80.
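For what it's worth, the failure mode is easy to confirm from inside a job. Here's a minimal sketch (my own, untested on these nodes; it assumes the job sees exactly two devices, renumbered 0 and 1 by CUDA_VISIBLE_DEVICES) that checks whether the pair is actually peer-capable before committing to the P2P path:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    if (n < 2) { printf("need two visible devices\n"); return 1; }

    // Ask the runtime whether each device can address the other's memory.
    int fwd = 0, rev = 0;
    cudaDeviceCanAccessPeer(&fwd, 0, 1);
    cudaDeviceCanAccessPeer(&rev, 1, 0);

    if (fwd && rev) {
        // Both directions OK -- enable direct access from each side.
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        printf("P2P available: likely the two GPUs of one K80 board\n");
    } else {
        // Allocation probably spans two discrete boards (e.g. GPU 1 +
        // GPU 2 in the scenario above); fall back to staging through
        // host memory instead of direct peer copies.
        printf("no P2P between visible devices 0 and 1\n");
    }
    return 0;
}

On this topology I'd expect the check to pass only when both devices sit on the same K80 board.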
If these were K40 or K20 cards there wouldn't be an issue, but the K80 (and any other dual-GPU card) is a different matter.

On Feb 11, 2016 5:14 AM, "Barbara Krasovec" <[email protected]> wrote:

> Doesn't it work if you just specify that there are 8 GPUs on the machine?
>
> For example:
>
> slurm.conf
> #for allocation
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> #for generic resources
> GresTypes=gpu
> NodeName=node[001-008] ... Features=gpu Gres=gpu:8
>
> gres.conf
> NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19
>
> Cheers,
> Barbara
>
> On 02/10/2016 06:41 PM, Michael Senizaiz wrote:
>
> I have a couple nodes with 4xK80 GPUs in them (nvidia0-7).
>
> Is there a way to either request peer-to-peer GPUs, or force allocation
> to 2 GPUs at a time? We'd prefer the former (run when peer-to-peer is
> available, unless you don't care) so we can fit more users onto the
> machine. However, ensuring the peer-to-peer codes get the proper
> allocation is more important.
>
> User 1 - needs a full K80 with peer-to-peer
> User 2 - needs a single GPU
> User 3 - needs a single GPU
> User 4 - needs 2 full K80s
>
> I.e.
> 0,1 - User 1
> 2 - User 2
> 3 - User 3
> 4,5,6,7 - User 4
>
> Or
>
> 0,1 - User 1
> 2,3 - User 2
> 4,5 - User 3
> QUEUED - User 4
>
> I tried this gres configuration, but it didn't do what I expected:
>
> Name=gpu File=/dev/nvidia[0-1] Count=2 CPUs=0-9
> Name=gpu File=/dev/nvidia[2-3] Count=2 CPUs=0-9
> Name=gpu File=/dev/nvidia[4-5] Count=2 CPUs=10-19
> Name=gpu File=/dev/nvidia[6-7] Count=2 CPUs=10-19
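One more diagnostic that may help when users report this: running

    nvidia-smi topo -m

on the node prints the GPU interconnect matrix. If I'm reading the legend correctly, the two GPUs of one K80 board show up as PIX (they hang off the board's own PLX switch), while GPUs on different boards show PHB or SOC, which lines up with the cases where peer-to-peer fails or performs badly.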
