
On 02/16/2012 08:25 AM, Sergio Iserte Agut wrote:
Thank you for your answer,
I think I made a mistake in my explanation:
I don't want to have a single "gres.conf" file for the whole cluster.
I want to link resources (in my case GPUs) on other nodes.
For instance, if I have 2 nodes with 2 GPUs each, I would like to have these files:
NODE 1 (gres.conf):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=node2:/dev/nvidia0
Name=gpu File=node2:/dev/nvidia1

NODE2 (gres.conf):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=node1:/dev/nvidia0
Name=gpu File=node1:/dev/nvidia1


But on node1, node2's GPUs are not resources for node1, right?
Can node2 physically address node1's GPU devices? I'm not sure.
Just like the CPUs on node2 cannot address CPUs on node1, no?

They are for the "app" of course.  And vice-versa.

I think it's the other way around: gres.conf is uniform across the cluster, and
the relevant node definitions in slurm.conf have a specific "Gres" attribute
attached to them. From the above, it sounds like you would have:

   Nodename=node[1-2] .... Gres=gpu:2
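
For reference, the per-node gres.conf that goes with that definition would then just
list the local devices, and would be identical on both nodes (a rough sketch, assuming
the usual Name=/File= lines, with no hostnames involved):

   Name=gpu File=/dev/nvidia0
   Name=gpu File=/dev/nvidia1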

It sounds to me like a single cluster-wide gres.conf has the disadvantage of not being
flexible when a given node needs an exceptional, customized gres.conf of its own. That
would seem odd, though, since it implies the cluster configuration is heterogeneous.

At the srun level, you would run "srun --gres=gpu:4 ..." I believe.
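
For example, with the Gres=gpu:2 definition above, a two-node job using both GPUs on
each node might look like this (just a sketch; if I recall correctly --gres is counted
per node, and ./my_gpu_app is only a placeholder):

   srun -N2 --gres=gpu:2 ./my_gpu_app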

It sounds from the above like you're trying to establish a relationship between GPUs
across nodes. I don't think slurm has any idea about that. The app gets the resources
reserved by slurm, but slurm knows nothing about any inter-relationship between GPUs
on different nodes; it just builds up the environment for the job step, i.e.
CUDA_VISIBLE_DEVICES.
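
A quick way to check what slurm actually hands to a job step (a hypothetical one-liner,
assuming gres/gpu is configured on the node):

   srun --gres=gpu:1 bash -c 'echo $CUDA_VISIBLE_DEVICES'
   # typically prints something like "0", the index of the device assigned to this step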

I remember a recent slurm-dev thread where gres had issues with device file assignment
when the device numbering had a gap, e.g.:

   Name=gpu File=/dev/nvidia0
   Name=gpu File=/dev/nvidia2

Not sure if it's fixed in 2.3.3 or later.

PS: Hey, I'm not a gres expert, but I could not resist. Feel free to correct me :) I am trying to figure out whether I could use gres (or topology) to implement selectable rack(s) and/or IRU(s) and/or node(s), but for our large UV SSI machines. To me, I'd call it "virtual nodes", i.e. "nodes within 1 node".
     Of course, it would be a new plugin.

Regards!

On 16 February 2012 at 14:11, Carles Fenoy <mini...@gmail.com> wrote:


    Hi Sergio,

    I don't think you can do that. The file has to be on every node, and
    is read by slurmd, I think.

    Regards,
    Carles Fenoy

    On Thu, Feb 16, 2012 at 1:15 PM, Sergio Iserte Agut
    <sise...@uji.es> wrote:
    > I'm wondering if I can do this in the "gres.conf" with my GPUs, for
    > instance:
    >
    > Name=gpu File=/dev/nvidia0
    > Name=gpu File=/dev/nvidia1
    > Name=gpu File=host1:/dev/nvidia0
    > Name=gpu File=host2:/dev/nvidia0
    >
    > Regards!



    --
    --
    Carles Fenoy




--

-----------------------------------------------------------
     Michel Bourget - SGI - Linux Software Engineering
    "Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------
