Thanks!

I will enable DebugFlags with the "gres" value and watch the logs.



On 04/10/12 18:00, Moe Jette wrote:
All that I can think of is that the slurmd daemon was unable to read the
gres.conf file when starting. You could add "DebugFlags=gres" to
slurm.conf for more information about gres.

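As a rough illustration of the suggestion above, the following Python sketch adds the flag to the text of a slurm.conf file. The helper name add_debug_flag is hypothetical and is not part of Slurm; the sketch assumes DebugFlags is written as a comma-separated list on a single uncommented line.

```python
import re


def add_debug_flag(conf_text: str, flag: str = "Gres") -> str:
    """Return slurm.conf text with `flag` present in DebugFlags.

    Hypothetical helper for illustration only; it edits text and
    does not touch any running daemon.
    """
    # Match an uncommented "DebugFlags=..." line (keyword is case insensitive).
    pattern = re.compile(r"^(\s*DebugFlags\s*=\s*)(\S*)(.*)$", re.IGNORECASE)
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        m = pattern.match(line)
        if m is None:
            continue
        flags = [f for f in m.group(2).split(",") if f]
        # Append the flag only if it is not already listed.
        if flag.lower() not in (f.lower() for f in flags):
            flags.append(flag)
        lines[i] = m.group(1) + ",".join(flags) + m.group(3)
        return "\n".join(lines) + "\n"
    # No DebugFlags line found: add one at the end.
    lines.append(f"DebugFlags={flag}")
    return "\n".join(lines) + "\n"
```

After changing slurm.conf, the daemons have to re-read it (for example with "scontrol reconfigure") before the extra gres messages show up in the logs.
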
Quoting Alfonso Pardo <[email protected]>:

Hello,

I have a cluster with GPU resources. The cluster works correctly,
but nodes sometimes go down with the following error: "gres/gpu
count too low"


NodeName=bc-p10-01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)
   Gres=gpu:2
   NodeAddr=bc-p10-01 NodeHostName=bc-p10-01
   OS=Linux RealMemory=1 Sockets=2
   State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2012-07-30T12:25:31 SlurmdStartTime=2012-07-31T08:16:03
   Reason=gres/gpu count too low
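
To spot affected nodes without reading each record by hand, output like the above can be parsed with a small script. This is only a sketch: the function names parse_node_record and is_gres_failure are hypothetical, and it assumes the "Key=Value" layout shown in the record above, with Reason occupying the rest of its line.

```python
import re


def parse_node_record(text: str) -> dict:
    """Parse one `scontrol show node` record into a dict of fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Reason="):
            # Reason may contain spaces, so keep the rest of the line.
            fields["Reason"] = line[len("Reason="):]
            continue
        # All other fields are whitespace-separated Key=Value pairs.
        for key, value in re.findall(r"(\w+)=(\S*)", line):
            fields[key] = value
    return fields


def is_gres_failure(fields: dict) -> bool:
    """True if the node is DOWN and the reason mentions a gres plugin."""
    state = fields.get("State", "")
    reason = fields.get("Reason", "")
    return state.startswith("DOWN") and "gres/" in reason


record = """NodeName=bc-p10-01 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)
   Gres=gpu:2
   State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
   Reason=gres/gpu count too low
"""
node = parse_node_record(record)
print(node["NodeName"], node["Gres"], is_gres_failure(node))
```

Once the cause is fixed, a node flagged this way can be returned to service with "scontrol update NodeName=<name> State=RESUME".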


Any suggestions?



-- 

Alfonso Pardo Díaz
Researcher / System Administrator at CETA-Ciemat
c/ Sola nº 1; 10200 Trujillo, ESPAÑA
Tel: +34 927 65 93 17 Fax: +34 927 32 32 37
http://www.ceta-ciemat.es/





