The only cause I can think of is that the slurmd daemon was unable to read the gres.conf file when it started, so it reported fewer GPUs than slurm.conf lists for the node. You could add "DebugFlags=gres" to slurm.conf to get more information about gres in the logs.
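
As a rough sketch of what to check, the two files should agree on the GPU count for the node. The node name and Gres=gpu:2 come from your scontrol output; the /dev/nvidia* device paths are an assumption on my part (typical for NVIDIA cards), and the NodeName line is abbreviated, so adjust both to match your site:

```
# slurm.conf (same file on the controller and the compute nodes)
GresTypes=gpu
DebugFlags=gres
NodeName=bc-p10-01 Gres=gpu:2

# gres.conf on bc-p10-01 (device paths are assumed, check your node)
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

If gres.conf is missing or unreadable on the node, or it lists fewer devices than the Gres= count in slurm.conf, that would be consistent with the error you see. Once the files agree and slurmd has been restarted, "scontrol update NodeName=bc-p10-01 State=RESUME" should return the node to service.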

Quoting Alfonso Pardo <[email protected]>:

> Hello,
>
> I have a cluster with GPU resources. The cluster works correctly,
> but sometimes nodes go down showing the following error: "gres/gpu
> count too low"
>
> NodeName=bc-p10-01 Arch=x86_64 CoresPerSocket=4
>    CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)
>    Gres=gpu:2
>    NodeAddr=bc-p10-01 NodeHostName=bc-p10-01
>    OS=Linux RealMemory=1 Sockets=2
>    State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
>    BootTime=2012-07-30T12:25:31 SlurmdStartTime=2012-07-31T08:16:03
>    Reason=gres/gpu count too low
>
> Any suggestions?
>
> --
> Alfonso Pardo Díaz
> Researcher / System Administrator at CETA-Ciemat
