Yes, I have defined the gres.conf with:

##gres.conf
Name=gpu File=/dev/nvidia[0-1]


I have two Nvidia devices per node.
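
As a quick sanity check on the File entry above, the bracketed range can be expanded and each device path tested for existence on the node. The following is a minimal Python sketch, not part of Slurm: the helper names expand_file_spec and missing_devices are hypothetical, and only the simple [start-end] range form used in this gres.conf is handled.

```python
import os
import re


def expand_file_spec(spec):
    """Expand a simple bracketed range such as /dev/nvidia[0-1] into paths.

    Assumption: only a single [start-end] range is supported; anything
    else is returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if match is None:
        return [spec]
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{i}{suffix}" for i in range(int(start), int(end) + 1)]


def missing_devices(spec, exists=os.path.exists):
    """Return the expanded paths for which the exists() check fails."""
    return [path for path in expand_file_spec(spec) if not exists(path)]


if __name__ == "__main__":
    spec = "/dev/nvidia[0-1]"
    print("expected devices:", expand_file_spec(spec))
    print("missing on this host:", missing_devices(spec))
```

Run on a compute node, an empty "missing" list would suggest that the device files named in gres.conf are at least present when the script runs.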


On 05/10/12 11:55, [email protected] wrote:
See the error. Read "man gres.conf". Is "File" defined?

Alfonso Pardo <[email protected]> wrote:
After enabling DebugFlags=gres, I got the following error:

[2012-10-05T08:22:44] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node bc-p10-01
[2012-10-05T08:22:44] gres/gpu: state for bc-p10-01
[2012-10-05T08:22:44] error: Setting node bc-p10-01 state to DOWN
[2012-10-05T08:22:44] debug2: inserting bc-p10-01(cluster) with 8 cpus
[2012-10-05T08:22:44] error: _slurm_rpc_node_registration node=bc-p10-01: Invalid argument
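
To pick the gres-related errors out of a longer log, a small filter like the following may help. It is a sketch that assumes the line layout shown above ("[timestamp] error: message"); the function name gres_errors is hypothetical.

```python
import re

# Matches lines of the form "[timestamp] error: message".
LOG_LINE = re.compile(r"^\[(?P<time>[^\]]+)\] error: (?P<message>.*)$")


def gres_errors(lines):
    """Return (timestamp, message) pairs for error lines that mention gres."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "gres" in match.group("message"):
            found.append((match.group("time"), match.group("message")))
    return found


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "[2012-10-05T08:22:44] error: gres_plugin_node_config_unpack: "
    "gres/gpu lacks File parameter for node bc-p10-01",
    "[2012-10-05T08:22:44] gres/gpu: state for bc-p10-01",
    "[2012-10-05T08:22:44] error: Setting node bc-p10-01 state to DOWN",
]

if __name__ == "__main__":
    for timestamp, message in gres_errors(SAMPLE):
        print(timestamp, message)
```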




On 05/10/12 08:20, Alfonso Pardo wrote:
Thanks!

I will set DebugFlags to "gres" and watch the logs.



On 04/10/12 18:00, Moe Jette wrote:
All that I can think of is that the slurmd daemon was unable to read the
gres.conf file when starting. You could add "DebugFlags=gres" to
slurm.conf for more information about gres.

Quoting Alfonso Pardo <[email protected]>:

Hello,

I have a cluster with GPU resources. The cluster works correctly,
but nodes sometimes go down with the following error: "gres/gpu
count too low"


NodeName=bc-p10-01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)
   Gres=gpu:2
   NodeAddr=bc-p10-01 NodeHostName=bc-p10-01
   OS=Linux RealMemory=1 Sockets=2
   State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2012-07-30T12:25:31 SlurmdStartTime=2012-07-31T08:16:03
   Reason=gres/gpu count too low
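
One way to cross-check the report above is to read the configured count from the Gres= field and compare it with the number of GPU device files actually present on the node. The following is a minimal Python sketch, assuming the plain name:count form shown here (gpu:2); the function name configured_gres is hypothetical. The reading that "count too low" means the node reported fewer devices than configured is my understanding, not something confirmed in this thread.

```python
import re


def configured_gres(scontrol_text):
    """Parse the Gres= field of "scontrol show node" output into a dict.

    Assumption: entries look like name:count (for example gpu:2) and are
    separated by commas; an entry without a numeric count is counted as 1.
    """
    match = re.search(r"\bGres=(\S+)", scontrol_text)
    if match is None or match.group(1) == "(null)":
        return {}
    counts = {}
    for item in match.group(1).split(","):
        name, _, count = item.partition(":")
        counts[name] = int(count) if count.isdigit() else 1
    return counts


# Excerpt copied from the node report above.
SAMPLE = """NodeName=bc-p10-01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)
   Gres=gpu:2
   State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
"""

if __name__ == "__main__":
    print(configured_gres(SAMPLE))
```

The resulting count for "gpu" can then be compared by hand with the number of /dev/nvidia* device files found on the node.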


Any suggestions?



-- 

Alfonso Pardo Díaz
Researcher / System Administrator at CETA-Ciemat
c/ Sola nº 1; 10200 Trujillo, ESPAÑA
Tel: +34 927 65 93 17 Fax: +34 927 32 32 37
http://www.ceta-ciemat.es/





