Michel is correct that each node requires its own gres.conf file and
cannot reference other nodes' generic resources. Nodes 1 and 2 would
both have a gres.conf file that looks something like this:
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
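
As a rough illustration of the per-node rule, the Python sketch below parses a gres.conf fragment like the one above and flags any File= value that is not a local absolute path (for example the node2:/dev/nvidia0 form from the original question). GresEntry, parse_gres_conf and non_local_files are hypothetical names. The parser only understands the Name and File keys and is not how slurmd itself reads the file.

```python
from dataclasses import dataclass


@dataclass
class GresEntry:
    """One 'Name=... File=...' line from a gres.conf (hypothetical type)."""

    name: str
    file: str


def parse_gres_conf(text: str) -> list[GresEntry]:
    """Parse simple gres.conf lines into GresEntry objects.

    Hypothetical helper: only the Name and File keys are kept; blank
    lines and '#' comments are skipped. The real file accepts more keys.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = dict(token.split("=", 1) for token in line.split())
        entries.append(GresEntry(name=fields["Name"], file=fields["File"]))
    return entries


def non_local_files(entries: list[GresEntry]) -> list[str]:
    """Return File= values that are not absolute local paths.

    A value such as 'node2:/dev/nvidia0' names a device on another host,
    which a node's own gres.conf cannot refer to.
    """
    return [entry.file for entry in entries if not entry.file.startswith("/")]
```

Run on node 1's file above, non_local_files returns an empty list; adding a line with File=node2:/dev/nvidia0 makes it return ['node2:/dev/nvidia0'].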

Regarding the change to skip nvidia files: that required changes to
some SLURM RPCs and new functions in the plugin, so we did not want
to add it to version 2.3. If you want to apply the patch to version
2.3 yourself for your own use, it is here:
https://github.com/SchedMD/slurm/commit/bccf0f8542ad98f7df8de3750df149563bd37eb6.patch

----- Forwarded message from [email protected] -----
     Date: Thu, 16 Feb 2012 10:22:35 -0500
     From: Michel Bourget <[email protected]>
Reply-To: [email protected]
  Subject: Re: [slurm-dev] Can I specify a file parameter for a resource on other node?
       To: [email protected]
       Cc: Sergio Iserte Agut <[email protected]>

The [email protected] list will soon be retired.  The  
schedmd.com domain will host its replacement.

http://www.schedmd.com/slurmdocs/mail.html

The new list is now operational.  Please resubmit this message to  
[email protected]

The archive of the slurm-dev list will remain here:   
http://groups.google.com/group/slurm-devel.  Postings to the new list  
will be archived to the same place.

On 02/16/2012 08:25 AM, Sergio Iserte Agut wrote:
> Thank you for your answer,
> I think I made a mistake in my explanation:
> I don't want to have a single "gres.conf" file for the whole cluster.
> I want to /link/ resources (in my case GPUs) on other nodes.
> For instance, if I have 2 nodes with 2 GPUs each, I would like
> to have these files:
> NODE 1 (gres.conf):
> Name=gpu File=/dev/nvidia0
> Name=gpu File=/dev/nvidia1
> Name=gpu File=*node2*:/dev/nvidia0
> Name=gpu File=*node2*:/dev/nvidia1
>
> NODE2 (gres.conf):
> Name=gpu File=/dev/nvidia0
> Name=gpu File=/dev/nvidia1
> Name=gpu File=*node1*:/dev/nvidia0
> Name=gpu File=*node1*:/dev/nvidia1
>

But on node1, node2's GPUs are not resources for node1, right?
Can node2 physically address node1's GPU devices? Not sure.
Just as the CPUs on node2 cannot address the CPUs on node1, no?

They are for the "app", of course. And vice versa.

I think it's the other way around: gres.conf is uniform across the cluster, and
the relevant node definitions in slurm.conf have a specific "gres" attribute
attached to them. From the above, it sounds like you would have:

    Nodename=node[1-2] .... Gres=gpu:2
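
Whichever way the files are distributed, the Gres=gpu:2 count on the node line should match the number of gpu entries in the gres.conf that node reads. A minimal Python sketch of that consistency check, assuming only the simple name:count syntax shown above (slurm_conf_gres_count, gres_conf_count and counts_match are hypothetical helpers):

```python
import re


def slurm_conf_gres_count(node_line: str, name: str) -> int:
    """Extract the count for one GRES from a slurm.conf node line.

    Hypothetical helper: it only understands comma-separated
    'name:count' pairs such as 'Gres=gpu:2', not typed GRES syntax.
    """
    match = re.search(r"\bGres=(\S+)", node_line, re.IGNORECASE)
    if match is None:
        return 0
    for item in match.group(1).split(","):
        gres_name, _, count = item.partition(":")
        if gres_name == name:
            return int(count)
    return 0


def gres_conf_count(gres_conf: str, name: str) -> int:
    """Count gres.conf lines with Name=<name> (one device file per line)."""
    return sum(1 for line in gres_conf.splitlines() if f"Name={name}" in line.split())


def counts_match(node_line: str, gres_conf: str, name: str = "gpu") -> bool:
    """True when slurm.conf advertises as many devices as gres.conf lists."""
    return slurm_conf_gres_count(node_line, name) == gres_conf_count(gres_conf, name)
```

For the two-GPU nodes discussed here, counts_match("NodeName=node[1-2] Gres=gpu:2", conf) is true when conf holds the two /dev/nvidia lines, and false if the node line says gpu:4.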

It seems to me that a single gres.conf has the disadvantage of not being
flexible when a given node needs its own customized, node-specific gres.conf.
That seems odd, though, since it means the cluster configuration is likely
heterogeneous.

At the srun level, you would use "srun --gres=gpu:4 ...", I believe.
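
One caveat: srun's --gres count applies per node, so with two nodes of two GPUs each, --gres=gpu:4 would ask for four GPUs on every node and could not be satisfied. A small Python sketch that turns a total GPU count into a node count plus a per-node request (srun_gpu_options is a hypothetical helper; --nodes and --gres are real srun options):

```python
import math


def srun_gpu_options(total_gpus: int, gpus_per_node: int) -> list[str]:
    """Build srun options asking for `total_gpus` GPUs in total.

    Hypothetical helper. srun's --gres count is per node, so the total
    is spread over enough nodes; if it does not divide evenly, the
    per-node count is rounded up and a few extra GPUs are requested.
    """
    if total_gpus < 1 or gpus_per_node < 1:
        raise ValueError("GPU counts must be positive")
    nodes = math.ceil(total_gpus / gpus_per_node)
    per_node = math.ceil(total_gpus / nodes)
    return [f"--nodes={nodes}", f"--gres=gpu:{per_node}"]
```

For the two-node example, srun_gpu_options(4, 2) gives ["--nodes=2", "--gres=gpu:2"], i.e. "srun --nodes=2 --gres=gpu:2 ...".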

From the above, it sounds like you're trying to establish a relationship
between GPUs on different nodes. I don't think slurm has any notion of that.
The app gets the resources slurm reserved, but slurm knows nothing about the
interrelationship between GPUs on different nodes. It just builds up the
environment for the job step, i.e. CUDA_VISIBLE_DEVICES.
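
To make that concrete: a process in the job step can only see which local GPUs it was given. The Python sketch below (visible_gpu_indices is a hypothetical helper) reads CUDA_VISIBLE_DEVICES and returns the device indices; any coordination between GPUs on different nodes is left to the application.

```python
import os
from collections.abc import Mapping
from typing import Optional


def visible_gpu_indices(environ: Optional[Mapping[str, str]] = None) -> list[int]:
    """Return the GPU indices this job step may use on the local node.

    Reads CUDA_VISIBLE_DEVICES, which slurm sets for the job step. It
    only lists devices on this node; it says nothing about GPUs on the
    job's other nodes. Entries that are not plain integers are skipped
    in this sketch.
    """
    env = os.environ if environ is None else environ
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return [int(part) for part in value.split(",") if part.strip().isdigit()]
```

Passing a plain dict instead of the real environment makes the helper easy to try outside a job, e.g. visible_gpu_indices({"CUDA_VISIBLE_DEVICES": "0,1"}).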

I remember a recent slurm-dev thread where gres had issues with device file
assignment when:

    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia2

Not sure if it's fixed in 2.3.3 or later.
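
The cause of that issue isn't given here. Purely as a hypothetical illustration, not the actual slurm code, the Python sketch below shows how non-contiguous device files can trip up code that confuses an entry's position in gres.conf with the number in its device name: with /dev/nvidia0 and /dev/nvidia2, the second entry is at position 1 but is device 2. device_minor and index_mismatches are hypothetical names.

```python
import re


def device_minor(path: str) -> int:
    """Number at the end of a device path, e.g. /dev/nvidia2 -> 2."""
    match = re.search(r"(\d+)$", path)
    if match is None:
        raise ValueError(f"no device number in {path!r}")
    return int(match.group(1))


def index_mismatches(files: list[str]) -> list[tuple[int, str]]:
    """List (position, file) pairs whose gres.conf position differs
    from the number in the device name. Code that used the position
    as the device number would hand out the wrong device for these."""
    return [(pos, f) for pos, f in enumerate(files) if device_minor(f) != pos]
```

With contiguous files (/dev/nvidia0, /dev/nvidia1) the list is empty; with /dev/nvidia0 and /dev/nvidia2 it reports (1, "/dev/nvidia2").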

PS: Hey, I'm not a gres expert, but I could not resist. Feel free to correct me :)
      I am trying to figure out whether I could use gres (or topology) to implement
      selectable rack(s) and/or IRU(s) and/or node(s), but for our large UV SSI
      machines. I'd call it "virtual nodes", i.e. "nodes within 1 node".
      Of course, it would be a new plugin.

> Regards!
>
> On 16 February 2012 at 14:11, Carles Fenoy <[email protected]> wrote:
>
>    Hi Sergio,
>
>    I don't think you can do that. The file has to be on every node, and
>    I think it is read by slurmd.
>
>    Regards,
>    Carles Fenoy
>
>    On Thu, Feb 16, 2012 at 1:15 PM, Sergio Iserte Agut
>    <[email protected]> wrote:
>    > I'm wondering if I can do this in "gres.conf" with my GPUs, for
>    > instance:
>    >
>    > Name=gpu File=/dev/nvidia0
>    > Name=gpu File=/dev/nvidia1
>    > Name=gpu File=host1:/dev/nvidia0
>    > Name=gpu File=host2:/dev/nvidia0
>    >
>    > Regards!
>
>
>
>    --
>    Carles Fenoy
>
>


-- 

-----------------------------------------------------------
      Michel Bourget - SGI - Linux Software Engineering
     "Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------



----- End forwarded message -----
