The only cause I can think of is that the slurmd daemon was unable to
read the gres.conf file when it started. You could add
"DebugFlags=gres" to slurm.conf to get more detailed logging about gres.
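
As a rough sketch of what to check (not taken from the original poster's
files): the Gres count in slurm.conf should match what slurmd finds in
gres.conf on the node. The device paths /dev/nvidia0 and /dev/nvidia1 are
assumptions for a two-GPU node, and the other NodeName parameters are
left out for brevity.

```
# slurm.conf (same file on the controller and the compute nodes)
GresTypes=gpu
DebugFlags=gres
# Only the gres-related part of the node definition is shown here.
NodeName=bc-p10-01 Gres=gpu:2

# gres.conf on bc-p10-01 (must be readable by slurmd at startup)
# Device paths are assumed; adjust to the devices present on the node.
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

If slurmd cannot read gres.conf, or a listed device file is missing when
it starts, the node will likely report fewer GPUs than slurm.conf expects.
Once the cause is fixed and slurmd has been restarted, the node can be
returned to service with "scontrol update NodeName=bc-p10-01 State=RESUME".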

Quoting Alfonso Pardo <[email protected]>:

>
> Hello,
>
> I have a cluster with GPU resources. The cluster works correctly,
> but sometimes nodes go down showing the following error: "gres/gpu
> count too low"
>
>
> NodeName=bc-p10-01 Arch=x86_64 CoresPerSocket=4
>    CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)
>    Gres=gpu:2
>    NodeAddr=bc-p10-01 NodeHostName=bc-p10-01
>    OS=Linux RealMemory=1 Sockets=2
>    State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2012-07-30T12:25:31 SlurmdStartTime=2012-07-31T08:16:03
>    Reason=gres/gpu count too low
>
>
> Any suggestions?
>
>
>
> -- 
>
> /Alfonso Pardo Díaz
> *Researcher / System Administrator at CETA-Ciemat*
> c/ Sola nº 1; 10200 Trujillo, ESPAÑA
> Tel: +34 927 65 93 17 Fax: +34 927 32 32 37
> CETA-Ciemat logo <http://www.ceta-ciemat.es/>/
>
>
