Andy,

On Tue, 2011-10-04 at 08:00 -0400, Andy Riebs wrote:
> Hi Ramiro,
> 
> You might check to ensure that all of your clocks are in sync (varying 
> by no more than a minute or two).

The clocks are in sync; they differ by only about 10 seconds:

root@jff:~# date; rsh jffmds date
Tue Oct  4 14:03:52 CEST 2011
Tue Oct  4 14:03:42 CEST 2011

I am using Lustre, for which synchronized clocks are a MUST.
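
To put a number on the skew rather than eyeballing two `date` lines, a minimal sketch along these lines could be used. The helper name `skew_seconds` is hypothetical, and the commented-out live check assumes that `date +%s` (epoch seconds) is supported on both hosts, which is not confirmed anywhere in this thread.

```shell
# Hypothetical helper: absolute difference between two timestamps,
# both given as integer seconds.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# The two readings above (14:03:52 and 14:03:42 on the same day),
# expressed as seconds since midnight.
skew_seconds $(( 14*3600 + 3*60 + 52 )) $(( 14*3600 + 3*60 + 42 ))   # prints 10

# On a live system (assumption: `date +%s` works on both hosts):
# skew_seconds "$(date +%s)" "$(rsh jffmds date +%s)"
```

Note that the remote reading includes the rsh round trip, so a difference of a few seconds measured this way does not necessarily mean the clocks themselves disagree.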

Furthermore, the issue is also present in the simplest case (one node
acting as both controller and compute node).




> 
> Andy
> 
> On 10/04/2011 07:56 AM, Ramiro Alba wrote:
> > Sten,
> >
> >
> > On Tue, 2011-10-04 at 13:45 +0200, Sten Wolf wrote:
> >> did you create munge key?
> >>
> > Yes, I did. See local test:
> >
> > # munge -n | unmunge
> >
> > STATUS:           Success (0)
> > ENCODE_HOST:      jff.cttc-jffeth.org (10.2.254.1)
> > ENCODE_TIME:      2011-10-04 13:54:19 (1317729259)
> > DECODE_TIME:      2011-10-04 13:54:19 (1317729259)
> > TTL:              300
> > CIPHER:           aes128 (4)
> > MAC:              sha1 (3)
> > ZIP:              none (0)
> > UID:              root (0)
> > GID:              root (0)
> > LENGTH:           0
> >
> >
> >
> >
> >>
> >> On 04/10/2011 13:28, Ramiro Alba wrote:
> >>> Hi all,
> >>>
> >>> I am trying to set up a slurm controller (2.2.7) on Ubuntu 10.04 on a
> >>> cluster server, and even with a simple slurm.conf (see attached file) the
> >>> 'slurmctld' daemon continuously writes the following to the log file:
> >>>
> >>> debug:  _slurm_recv_timeout at 0 of 4, recv zero bytes
> >>> error: slurm_receive_msg: Zero Bytes were transmitted or received
> >>> error: slurm_receive_msg: Zero Bytes were transmitted or received
> >>>
> >>> You can see in 'slurm.conf' that the same node acts as both controller and
> >>> compute node. Jobs can be submitted.
> >>>
> >>>
> >>> Any other cluster node/server (apparently having the same/similar
> >>> hardware and the same operating system) works smoothly without any error,
> >>> acting either as a controller or as a backup controller.
> >>>
> >>> Can anyone give me some idea what to look at, so as to suppress those
> >>> error messages?
> >>> I've looked at the mailing list for similar messages but none was of
> >>> help.
> >>>
> >> -- 
> >> This message has been scanned by MailScanner
> >> for viruses and other dangerous content,
> >> and is believed to be clean.
> 
> -- 
> Andy Riebs
> Hewlett-Packard Company
> High Performance Computing
> +1-786-263-9743
> My opinions are not necessarily those of HP
> 
> 

-- 
Ramiro Alba

Centre Tecnològic de Transferència de Calor
http://www.cttc.upc.edu


Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46


