Sorry, I didn't realize this was a problem, since the system sees 4 GPUs and doesn't care whether they come from single-GPU or dual-GPU cards. Is it possible that you have the compute mode set to exclusive (GPU locked to a single process)? I mean, is it for sure a SLURM problem? Does it work if you run the programs locally?
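
You can check (and, if needed, reset) the compute mode with nvidia-smi, for example:

nvidia-smi -q -d COMPUTE | grep 'Compute Mode'
# if a GPU reports Exclusive_Process, reset it to the default shared mode:
nvidia-smi -i 0 -c DEFAULT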

Cheers,
Barbara

On 02/11/2016 12:25 PM, Michael Senizaiz wrote:
Re: [slurm-dev] Re: GRES for both K80 GPU's

This doesn't enforce keeping the jobs on one K80. There are only 4 K80s in the system. If I submit a 1-GPU job and then a 2-GPU job, the first will get GPU 0 (0 and 1 are one K80, 2 and 3 are another K80, etc.). The 2-GPU job will then get GPU 1 and GPU 2. Then the user will complain that their peer-to-peer code isn't working and that job performance is bad, because they are running across two discrete K80s rather than on the 2 GPUs of a single K80.

If these were K40 or K20 cards there wouldn't be an issue, but the K80, or any other dual-GPU card, is a different matter.
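
For what it's worth, the board pairing is visible in the topology matrix:

nvidia-smi topo -m
# the two GPUs of one K80 board should show up as PIX (same on-board
# PCIe switch), while GPUs on different boards show a more distant link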

On Feb 11, 2016 5:14 AM, "Barbara Krasovec" <[email protected]> wrote:


    Doesn't it work if you just specify that there are 8 GPUs on the
    machine?

    For example:

    slurm.conf
    #for allocation
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    #for generic resources
    GresTypes=gpu
    NodeName=node[001-008] ... Features=gpu Gres=gpu:8


    gres.conf
    NodeName=node[001-008] Name=gpu Type=k80 File=/dev/nvidia[0-7] CPUs=0-19
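
    With that in place, a job would request GPUs the usual way, e.g.
    (my_p2p_app is just a placeholder for the user's program):

    srun --gres=gpu:2 ./my_p2p_app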

    Cheers,
    Barbara




    On 02/10/2016 06:41 PM, Michael Senizaiz wrote:
    I have a couple of nodes with 4x K80 GPUs in them (nvidia0-7).

    Is there a way to either request peer-to-peer GPUs, or to force
    allocation of 2 GPUs at a time?  We'd prefer the former (run
    when peer-to-peer is available, unless you don't care) so we can
    fit more users onto the machine.  However, ensuring that the
    peer-to-peer codes get a proper allocation is more important (the
    matching requests are sketched after the examples below).


    User 1 - needs a full K80 with peer-to-peer
    User 2 - needs a single GPU
    User 3 - needs a single GPU
    User 4 - needs 2 full K80s

    I.e.:
    0,1     - User 1
    2       - User 2
    3       - User 3
    4,5,6,7 - User 4

    Or:

    0,1    - User 1
    2,3    - User 2
    4,5    - User 3
    QUEUED - User 4
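
    The matching requests would presumably be something like this (a
    sketch; the exact flags depend on how gres ends up being configured):

    srun --gres=gpu:2 ...   # User 1: the 2 GPUs of one board, p2p
    srun --gres=gpu:1 ...   # User 2
    srun --gres=gpu:1 ...   # User 3
    srun --gres=gpu:4 ...   # User 4: 2 full boards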

    I tried this gres configuration, but it didn't do what I expected.

    # one gres.conf line per K80 board, hoping allocation would then
    # happen board by board
    Name=gpu File=/dev/nvidia[0-1] Count=2 CPUs=0-9
    Name=gpu File=/dev/nvidia[2-3] Count=2 CPUs=0-9
    Name=gpu File=/dev/nvidia[4-5] Count=2 CPUs=10-19
    Name=gpu File=/dev/nvidia[6-7] Count=2 CPUs=10-19
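
    A quick way to see which devices a job was actually handed is to
    print the list Slurm exports, e.g.:

    srun --gres=gpu:2 bash -c 'echo $CUDA_VISIBLE_DEVICES'
    # in the problem case above, the two devices come from two
    # different boards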

