Andy,

On Tue, 2011-10-04 at 08:00 -0400, Andy Riebs wrote:
> Hi Ramiro,
>
> You might check to ensure that all of your clocks are in sync (varying
> by no more than a minute or two).

No, they are not out of sync:

root@jff:~# date; rsh jffmds date
Tue Oct 4 14:03:52 CEST 2011
Tue Oct 4 14:03:42 CEST 2011

I am using Lustre, so synchronized clocks are a MUST. Furthermore, the
issue is also present in the simplest case (one node acting as both
controller and compute node).

>
> Andy
>
> On 10/04/2011 07:56 AM, Ramiro Alba wrote:
> > Sten,
> >
> > On Tue, 2011-10-04 at 13:45 +0200, Sten Wolf wrote:
> >> Did you create a munge key?
> >>
> > Yes, I did. See local test:
> >
> > # munge -n | unmunge
> >
> > STATUS: Success (0)
> > ENCODE_HOST: jff.cttc-jffeth.org (10.2.254.1)
> > ENCODE_TIME: 2011-10-04 13:54:19 (1317729259)
> > DECODE_TIME: 2011-10-04 13:54:19 (1317729259)
> > TTL: 300
> > CIPHER: aes128 (4)
> > MAC: sha1 (3)
> > ZIP: none (0)
> > UID: root (0)
> > GID: root (0)
> > LENGTH: 0
> >
> >>
> >> On 04/10/2011 13:28, Ramiro Alba wrote:
> >>> Hi all,
> >>>
> >>> I am trying to set up a slurm controller (2.2.7) on Ubuntu 10.04 on a
> >>> cluster server, and even with a simple slurm.conf (see attached file) the
> >>> 'slurmctld' daemon continuously writes to the log file:
> >>>
> >>> debug: _slurm_recv_timeout at 0 of 4, recv zero bytes
> >>> error: slurm_receive_msg: Zero Bytes were transmitted or received
> >>> error: slurm_receive_msg: Zero Bytes were transmitted or received
> >>>
> >>> You can see in 'slurm.conf' that the same node acts as a controller and
> >>> as a compute node. Jobs can be submitted.
> >>>
> >>> Any other cluster node/server (apparently having the same/similar
> >>> hardware and the same operating system) works smoothly without any error,
> >>> acting either as a controller or as a backup controller.
> >>>
> >>> Can anyone give me some idea of what to look at, so as to suppress those
> >>> error messages?
> >>> I've searched the mailing list for similar messages, but none was of
> >>> help.
> >>>
> >> --
> >> This message has been scanned by MailScanner
> >> for viruses and other dangerous content,
> >> and is believed to be clean.
> >
> --
> Andy Riebs
> Hewlett-Packard Company
> High Performance Computing
> +1-786-263-9743
> My opinions are not necessarily those of HP
>

--
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu

Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46
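
For anyone repeating the clock check from this thread, here is a minimal Python sketch that turns the two `date` lines into a skew in seconds and compares it with the TTL of 300 that unmunge reported. The helper names (`parse_date_output`, `clock_skew_seconds`) are made up for illustration, and treating the munge TTL as the tolerance is an assumption, not something confirmed in the thread. The zone abbreviation is ignored, so both hosts are assumed to be in the same time zone, and the `rsh` round trip means the result is only approximate.

```python
from datetime import datetime

# TTL in seconds, taken from the "TTL: 300" line of the unmunge output above.
# Using it as the acceptable skew is an assumption made for this sketch.
MUNGE_TTL_SECONDS = 300


def parse_date_output(line: str) -> datetime:
    """Parse default `date` output, e.g. 'Tue Oct 4 14:03:52 CEST 2011'.

    The zone abbreviation is discarded, so both hosts are assumed to
    report the same time zone. English month names are assumed.
    """
    _weekday, month, day, clock, _zone, year = line.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def clock_skew_seconds(local_line: str, remote_line: str) -> float:
    """Return the absolute difference between two `date` lines, in seconds."""
    delta = parse_date_output(local_line) - parse_date_output(remote_line)
    return abs(delta.total_seconds())


# The two lines printed by `date; rsh jffmds date` in this message.
skew = clock_skew_seconds(
    "Tue Oct 4 14:03:52 CEST 2011",
    "Tue Oct 4 14:03:42 CEST 2011",
)
within_ttl = skew < MUNGE_TTL_SECONDS
print(f"skew: {skew:.0f} s, within TTL: {within_ttl}")
```

With the values above the skew comes out at 10 seconds, well inside both the "minute or two" Andy suggested and the 300-second TTL.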
