The slurm-dev@lists.llnl.gov list will soon be retired. The schedmd.com domain
will host its replacement.
http://www.schedmd.com/slurmdocs/mail.html
The new list is now operational. Please resubmit this message to
slurm-...@schedmd.com
The archive of the slurm-dev list will remain here:
http://groups.google.com/group/slurm-devel. Postings to the new list will be
archived to the same place.
On 02/16/2012 08:25 AM, Sergio Iserte Agut wrote:
Thank you for your answer,
I think I made a mistake in my explanation:
I don't want a single "gres.conf" file for the whole cluster.
I want to link resources (in my case GPUs) that sit on other nodes.
For instance, if I have 2 nodes with 2 GPUs each, I would like to
have these files:
NODE1 (gres.conf):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=node2:/dev/nvidia0
Name=gpu File=node2:/dev/nvidia1
NODE2 (gres.conf):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=node1:/dev/nvidia0
Name=gpu File=node1:/dev/nvidia1
But on node1, node2's GPUs are not resources for node1, right?
Can node2 physically address a GPU device on node1? Not sure.
Just as the CPUs on node2 cannot address the CPUs on node1, no?
They are resources for the "app", of course. And vice versa.
I think it's the other way around: gres.conf is uniform across the
cluster, and the relevant node definition(s) in slurm.conf have a
specific "Gres" attribute attached to them. From the above, it sounds
like you would have:
NodeName=node[1-2] .... Gres=gpu:2
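As a rough sketch of what I mean (assuming two nodes named node[1-2]
with two GPUs each; the GresTypes line is what I believe slurm.conf
needs in order to enable gres, so double-check it for your version):
slurm.conf (same on every node):
GresTypes=gpu
NodeName=node[1-2] .... Gres=gpu:2
gres.conf (same on every node):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1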
It sounds to me like a single gres.conf has the disadvantage of not
being flexible when a given node needs an exceptional, customized
gres.conf. That seems odd, though, since it would mean the cluster
configuration is heterogeneous.
At the srun level, you would use "srun --gres=gpu:4 ...", I believe.
From the above, it sounds like you are trying to establish a
relationship between GPUs across nodes. I don't think Slurm has any
notion of that. The app gets the resources reserved by Slurm, but
Slurm knows nothing about any inter-relationship between GPUs on
different nodes; it just builds up the environment for the job step,
i.e. CUDA_VISIBLE_DEVICES.
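For example, a quick way to see what a job step actually gets (just a
sketch; the exact indices depend on the allocation):
srun --gres=gpu:2 bash -c 'echo $CUDA_VISIBLE_DEVICES'
# would typically print something like 0,1 on the node running the step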
I remember a recent slurm-dev thread where gres had issues with
device file assignment when the device files are not consecutive, e.g.:
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia2
Not sure whether that is fixed in 2.3.3 or later.
PS: Hey, I'm not a gres expert, but I could not resist. Feel free to
correct me :)
I am trying to figure out whether I could use gres (or topology) to
implement selectable rack(s) and/or IRU(s) and/or node(s), but for our
large UV SSI machines. I'd call it "virtual nodes", i.e. "nodes within
1 node". Of course, it would be a new plugin.
Regards!
On 16 February 2012, at 14:11, Carles Fenoy <mini...@gmail.com> wrote:
Hi Sergio,
I don't think you can do that. The file has to be on every node, and
it is read by slurmd, I think.
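A quick way to check that slurmd on a given node picked up its gres
(just a sketch, assuming the node names from this thread):
scontrol show node node1 | grep -i gres
# should report something like Gres=gpu:2 once gres.conf and slurm.conf agree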
Regards,
Carles Fenoy
On Thu, Feb 16, 2012 at 1:15 PM, Sergio Iserte Agut
<sise...@uji.es> wrote:
> I'm wondering if I can do this in the "gres.conf" with my GPUs, for
> instance:
>
> Name=gpu File=/dev/nvidia0
> Name=gpu File=/dev/nvidia1
> Name=gpu File=host1:/dev/nvidia0
> Name=gpu File=host2:/dev/nvidia0
>
> Regards!
--
Carles Fenoy
--
-----------------------------------------------------------
Michel Bourget - SGI - Linux Software Engineering
"Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------