SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

Can you change the debug level to 7 for both the controller (slurmctld) and the node (slurmd), and provide the contents of both log files?
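
For reference, a minimal sketch of how those lines might look afterwards, assuming the log paths above stay unchanged and that this version accepts the numeric level 7 the same way it accepts 3:

SlurmctldDebug=7
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=7
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

After editing, restart slurmctld and slurmd so that they pick up the new levels. The logs tend to grow quickly at this setting, so it is probably worth going back to 3 once the messages have been captured.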


On 04/10/2011 14:09, Ramiro Alba wrote:
Andy,

On Tue, 2011-10-04 at 08:00 -0400, Andy Riebs wrote:
Hi Ramiro,

You might check to ensure that all of your clocks are in sync (varying 
by no more than a minute or two).
No, they are not:

root@jff:~# date; rsh jffmds date
Tue Oct  4 14:03:52 CEST 2011
Tue Oct  4 14:03:42 CEST 2011

I am using Lustre and this is a MUST.

Furthermore, the issue is also present in the simplest case (one node
acting as both controller and compute node).




Andy

On 10/04/2011 07:56 AM, Ramiro Alba wrote:
Sten,


On Tue, 2011-10-04 at 13:45 +0200, Sten Wolf wrote:
did you create munge key?

Yes, I did. See local test:

# munge -n | unmunge

STATUS:           Success (0)
ENCODE_HOST:      jff.cttc-jffeth.org (10.2.254.1)
ENCODE_TIME:      2011-10-04 13:54:19 (1317729259)
DECODE_TIME:      2011-10-04 13:54:19 (1317729259)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              root (0)
GID:              root (0)
LENGTH:           0




On 04/10/2011 13:28, Ramiro Alba wrote:
Hi all,

I am trying to set up a Slurm controller (2.2.7) on Ubuntu 10.04 on a
cluster server, and even with a simple slurm.conf (see attached file) the
'slurmctld' daemon continuously writes the following to the log file:

debug:  _slurm_recv_timeout at 0 of 4, recv zero bytes
error: slurm_receive_msg: Zero Bytes were transmitted or received
error: slurm_receive_msg: Zero Bytes were transmitted or received

You can see in 'slurm.conf' that the same node acts as a controller and
as a compute node. Jobs can be submitted.


Any other cluster node/server (apparently having the same/similar
hardware and the same operating system) works smoothly without any error,
acting either as a controller or as a backup controller.

Can anyone give me some idea what to look at, so as to suppress those
error messages?
I've searched the mailing list for similar messages, but none of them
was of help.

-- 
Andy Riebs
Hewlett-Packard Company
High Performance Computing
+1-786-263-9743
My opinions are not necessarily those of HP



    
