The slurm-dev@lists.llnl.gov list will soon be retired. The schedmd.com domain
will host its replacement.
http://www.schedmd.com/slurmdocs/mail.html
The new list is now operational. Please resubmit this message to
slurm-...@schedmd.com
The archive of the slurm-dev list will remain here:
http://groups.google.com/group/slurm-devel. Postings to the new list will be
archived to the same place.
On 02/16/2012 08:25 AM, Sergio Iserte Agut wrote:
Thank you for your answer,
I think I made a mistake in my explanation:
I don't want a single "gres.conf" file for the whole cluster.
I want to link resources (in my case GPUs) that sit on other nodes.
For instance, if I have 2 nodes with 2 GPUs each, I would like to
have these files:
NODE1 (gres.conf):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=node2:/dev/nvidia0
Name=gpu File=node2:/dev/nvidia1
NODE2 (gres.conf):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=node1:/dev/nvidia0
Name=gpu File=node1:/dev/nvidia1
But on node1, node2's GPUs are not resources for node1, right?
Can node2 physically address a GPU device on node1? Not sure.
Just as the CPUs on node2 cannot address the CPUs on node1, no?
They are resources for the "app", of course. And vice versa.
I think it's the other way around: gres.conf is uniform across the
cluster, and the relevant node definition(s) in slurm.conf have a
specific "Gres" attribute attached to them. From the above, it sounds
like you would have:
NodeName=node[1-2] .... Gres=gpu:2
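As a rough sketch of what I mean (assuming two nodes named node[1-2]
with two GPUs each; the GresTypes line is what I believe slurm.conf
needs in order to enable gres, so double-check it for your version):
slurm.conf (same on every node):
GresTypes=gpu
NodeName=node[1-2] .... Gres=gpu:2
gres.conf (same on every node):
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1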
It sounds to me like a single gres.conf has the disadvantage of not
being flexible when a given node needs an exceptional, customized
gres.conf. That seems odd, though, since it would mean the cluster
configuration is heterogeneous.
At the srun level, you would use "srun --gres=gpu:4 ...", I believe.
From the above, it sounds like you are trying to establish a
relationship between GPUs across nodes. I don't think Slurm has any
notion of that. The app gets the resources reserved by Slurm, but
Slurm knows nothing about any inter-relationship between GPUs on
different nodes; it just builds up the environment for the job step,
i.e. CUDA_VISIBLE_DEVICES.
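For example, a quick way to see what a job step actually gets (just a
sketch; the exact indices depend on the allocation):
srun --gres=gpu:2 bash -c 'echo $CUDA_VISIBLE_DEVICES'
# would typically print something like 0,1 on the node running the step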
I remember a recent slurm-dev thread where gres had issues with
device file assignment when the device files are not consecutive, e.g.:
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia2
Not sure whether that is fixed in 2.3.3 or later.
PS: Hey, I'm not a gres expert, but I could not resist. Feel free to
correct me :)
I am trying to figure out whether I could use gres (or topology) to
implement selectable rack(s) and/or IRU(s) and/or node(s), but for our
large UV SSI machines. I'd call it "virtual nodes", i.e. "nodes within
1 node". Of course, it would be a new plugin.
Regards!
On 16 February 2012, at 14:11, Carles Fenoy <mini...@gmail.com> wrote:
Hi Sergio,
I don't think you can do that. The file has to be on every node, and
it is read by slurmd, I think.
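A quick way to check that slurmd on a given node picked up its gres
(just a sketch, assuming the node names from this thread):
scontrol show node node1 | grep -i gres
# should report something like Gres=gpu:2 once gres.conf and slurm.conf agree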
Regards,
Carles Fenoy
On Thu, Feb 16, 2012 at 1:15 PM, Sergio Iserte Agut
<sise...@uji.es> wrote:
> I'm wondering if I can do this in the "gres.conf" with my GPUs, for
> instance:
>
> Name=gpu File=/dev/nvidia0
> Name=gpu File=/dev/nvidia1
> Name=gpu File=host1:/dev/nvidia0
> Name=gpu File=host2:/dev/nvidia0
>
> Regards!
--
Carles Fenoy
--
-----------------------------------------------------------
Michel Bourget - SGI - Linux Software Engineering
"Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------