Yes, I have defined gres.conf with:
##gres.conf
Name=gpu File=/dev/nvidia[0-1]
I have two Nvidia devices per node.
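One way to double-check this on a node is a small script that expands the File= range and reports any device file that is missing. This is only a sketch and not part of Slurm: the helper names expand_file_spec and missing_devices are made up for illustration, and only the simple single-bracket form used above is handled.

```python
import os
import re


def expand_file_spec(spec):
    """Expand a simple 'prefix[a-b]' range such as /dev/nvidia[0-1].

    Only the single-range form used in this thread is handled; other
    gres.conf syntax is out of scope for this sketch.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if match is None:
        return [spec]  # no bracket range: treat the value as a single path
    prefix, first, last, suffix = match.groups()
    return [f"{prefix}{i}{suffix}" for i in range(int(first), int(last) + 1)]


def missing_devices(spec, exists=os.path.exists):
    """Return the expanded paths that do not exist on this machine."""
    return [path for path in expand_file_spec(spec) if not exists(path)]


if __name__ == "__main__":
    spec = "/dev/nvidia[0-1]"  # value of File= from the gres.conf above
    paths = expand_file_spec(spec)
    print(f"{len(paths)} device files expected: {paths}")
    for path in missing_devices(spec):
        print(f"missing: {path}")
```

If fewer device files exist than the Gres=gpu:2 count configured for the node, that would be one plausible reason for the node reporting too few GPUs, though the script itself cannot confirm what slurmd actually read.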
On 05/10/12 11:55, [email protected] wrote:
See the error. Read "man gres.conf". Is "File" defined?
Alfonso Pardo <[email protected]> wrote:
After enabling DebugFlags=gres, I got the following error:
[2012-10-05T08:22:44] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node bc-p10-01
[2012-10-05T08:22:44] gres/gpu: state for bc-p10-01
[2012-10-05T08:22:44] error: Setting node bc-p10-01 state to DOWN
[2012-10-05T08:22:44] debug2: inserting bc-p10-01(cluster) with 8 cpus
[2012-10-05T08:22:44] error: _slurm_rpc_node_registration node=bc-p10-01: Invalid argument
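When the log is long, the gres-related errors can be picked out with a few lines of Python. The following is a rough sketch, assuming the "[timestamp] error: message" layout seen in the excerpt above; the function name gres_errors and the SAMPLE list (the same log lines with the wrapped lines joined) are illustrative only.

```python
import re

# Timestamp prefix followed by an "error:" message, as in the excerpt above.
ERROR_LINE = re.compile(r"^\[(?P<time>[^\]]+)\] error: (?P<message>.*)$")


def gres_errors(lines):
    """Yield (timestamp, message) for error lines whose message mentions gres."""
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match and "gres" in match.group("message"):
            yield match.group("time"), match.group("message")


# The log lines quoted in this thread, with wrapped lines joined.
SAMPLE = [
    "[2012-10-05T08:22:44] error: gres_plugin_node_config_unpack: "
    "gres/gpu lacks File parameter for node bc-p10-01",
    "[2012-10-05T08:22:44] gres/gpu: state for bc-p10-01",
    "[2012-10-05T08:22:44] error: Setting node bc-p10-01 state to DOWN",
    "[2012-10-05T08:22:44] debug2: inserting bc-p10-01(cluster) with 8 cpus",
    "[2012-10-05T08:22:44] error: _slurm_rpc_node_registration "
    "node=bc-p10-01: Invalid argument",
]

if __name__ == "__main__":
    for time, message in gres_errors(SAMPLE):
        print(time, message)
```

On the sample this selects only the gres_plugin_node_config_unpack line, which is the one that names the missing File parameter.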
On 05/10/12 08:20, Alfonso Pardo wrote:
Thanks!
I will set DebugFlags to "gres" and watch the logs.
On 04/10/12 18:00, Moe Jette wrote:
All that I can think of is that the slurmd daemon was unable to read the
gres.conf file when starting. You could add "DebugFlags=gres" to
slurm.conf for more information about gres.
Quoting Alfonso Pardo <[email protected]>:
Hello,
I have a cluster with GPU resources. The cluster works correctly,
but sometimes nodes go down with the following error: "gres/gpu
count too low"
NodeName=bc-p10-01 Arch=x86_64 CoresPerSocket=4
CPUAlloc=0 CPUErr=0 CPUTot=8 Features=(null)
Gres=gpu:2
NodeAddr=bc-p10-01 NodeHostName=bc-p10-01
OS=Linux RealMemory=1 Sockets=2
State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2012-07-30T12:25:31 SlurmdStartTime=2012-07-31T08:16:03
Reason=gres/gpu count too low
Any suggestions?
--
Alfonso Pardo Díaz
Researcher / System Administrator at CETA-Ciemat
c/ Sola nº 1; 10200 Trujillo, ESPAÑA
Tel: +34 927 65 93 17 Fax: +34 927 32 32 37
<http://www.ceta-ciemat.es/>