[slurm-dev] I can't send a job to several nodes with GPUs

Fany Pagés Díaz Tue, 20 Oct 2015 07:07:50 -0700

I configured the cluster to run GPU jobs, but it does not work properly. When
I send a job to one node it runs, but I get an error (and I can only send it
to node compute-0-0, not to the others). This is the output:

[root@cluster bin]# srun -n 2 -N 1 --gres=gpu:2 mpirun cudampi 
  We have 2 processors
  Spawning from compute-0-0.local 
  CUDA MPI

  Probing nodes...
     Node        Psid  CUDA Cards (devID)
     ----------- ----- ---- ----------
  We have 2 processors
  Spawning from compute-0-0.local 
  CUDA MPI

  Probing nodes...
     Node        Psid  CUDA Cards (devID)
     ----------- ----- ---- ----------
+ compute-0-0.local     1    2 GeForce GTX 260 (0)  GeForce GTX 260 (1) 

+ compute-0-0.local     1    2 GeForce GTX 260 (0)  GeForce GTX 260 (1) 

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
srun: error: compute-0-0: tasks 0-1: Exited with exit code 1
[root@cluster bin]# 

But when I send the job to several nodes, I get the following error:

[root@cluster bin]# srun -n 2 -N 2 --gres=gpu:2 mpirun cudampi 
srun: Force Terminated job 408
srun: error: Unable to allocate resources: Requested node configuration is
not available
[root@cluster bin]# 

 

I don't know what I missed, because I have the same configuration on all
nodes.

 

This is the file /etc/slurm/slurm.conf

 

NodeName=cluster NodeAddr=10.8.52.254 gres=gpu:2 

GresTypes=gpu

SelectType=select/cons_res
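
As a sanity check on the excerpt above, a small sketch like the following can
list which NodeName lines actually carry a gres value. This is only an
illustration: gres_by_node is a hypothetical helper, not a Slurm tool, it
assumes one whitespace-separated key=value list per line, and it only sees the
three lines quoted here, not the rest of my slurm.conf.

```python
def gres_by_node(conf_text):
    """Map each NodeName value to its Gres value (None if the line has none).

    Hypothetical helper for illustration; keys are compared case-insensitively.
    """
    result = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.lower().startswith("nodename="):
            continue  # only node definition lines are of interest
        fields = {}
        for token in line.split():
            key, _, value = token.partition("=")
            fields[key.lower()] = value
        result[fields["nodename"]] = fields.get("gres")
    return result


# The three lines quoted above (the real file may contain more node lines)
excerpt = """
NodeName=cluster NodeAddr=10.8.52.254 gres=gpu:2
GresTypes=gpu
SelectType=select/cons_res
"""

print(gres_by_node(excerpt))
```

On this excerpt the only node reported with a gres value is "cluster"; the
compute nodes do not appear in the lines quoted here.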

 

This is the file /etc/slurm/gres.conf (the same file is on each node)

 

#Configuration of gres on the nodes
NodeName=compute-0-[0,3-4] Name=gpu File=/dev/nvidia[0-1]

#Configuration of two GPUs
Name=gpu File=/dev/nvidia0 
Name=gpu File=/dev/nvidia1 
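
To make clear what the bracket expressions in the first gres.conf line stand
for, here is a minimal sketch of how such a range expands. expand_brackets is
a hypothetical helper written for this post, not part of Slurm, and it only
handles a single bracket group with comma-separated numbers and ranges.

```python
import re


def expand_brackets(expr):
    """Expand one bracket range such as 'compute-0-[0,3-4]' into a list.

    Hypothetical helper for illustration; handles a single [...] group only.
    """
    match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", expr)
    if match is None:
        return [expr]  # no bracket expression, nothing to expand
    prefix, body, suffix = match.groups()
    names = []
    for part in body.split(","):
        if "-" in part:
            low, high = part.split("-")
            width = len(low)  # keep zero padding of the lower bound
            for i in range(int(low), int(high) + 1):
                names.append(f"{prefix}{str(i).zfill(width)}{suffix}")
        else:
            names.append(f"{prefix}{part}{suffix}")
    return names


print(expand_brackets("compute-0-[0,3-4]"))
print(expand_brackets("/dev/nvidia[0-1]"))
```

If I read the syntax correctly, the first line therefore already names
/dev/nvidia0 and /dev/nvidia1 on compute-0-0, compute-0-3 and compute-0-4, and
the two lines without NodeName list the same device files a second time.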

 

Any ideas? Can anyone please help me? Thanks.
