SlurmctldDebug=3

Can you change the debug level to 7 for both ctld and node, and provide the contents of both log files?

On 04/10/2011 14:09, Ramiro Alba wrote:
> Andy,
>
> On Tue, 2011-10-04 at 08:00 -0400, Andy Riebs wrote:
>> Hi Ramiro,
>>
>> You might check to ensure that all of your clocks are in sync
>> (varying by no more than a minute or two).
>
> No, they are not:
>
> root@jff:~# date; rsh jffmds date
> Tue Oct 4 14:03:52 CEST 2011
> Tue Oct 4 14:03:42 CEST 2011
>
> I am using Lustre and this is a MUST. Furthermore, the issue is also
> present in the simplest case (one node acting as both controller and
> compute node).
>
>> Andy
>>
>> On 10/04/2011 07:56 AM, Ramiro Alba wrote:
>>> Sten,
>>>
>>> On Tue, 2011-10-04 at 13:45 +0200, Sten Wolf wrote:
>>>> Did you create a munge key?
>>>
>>> Yes, I did. See local test:
>>>
>>> # munge -n | unmunge
>>> STATUS: Success (0)
>>> ENCODE_HOST: jff.cttc-jffeth.org (10.2.254.1)
>>> ENCODE_TIME: 2011-10-04 13:54:19 (1317729259)
>>> DECODE_TIME: 2011-10-04 13:54:19 (1317729259)
>>> TTL: 300
>>> CIPHER: aes128 (4)
>>> MAC: sha1 (3)
>>> ZIP: none (0)
>>> UID: root (0)
>>> GID: root (0)
>>> LENGTH: 0
>>>
>>>> On 04/10/2011 13:28, Ramiro Alba wrote:
>>>>> Hi all,
>>>>>
>>>>> I am trying to set up a slurm controller (2.2.7) on Ubuntu 10.04
>>>>> on a cluster server, and even with a simple slurm.conf (see
>>>>> attached file) the 'slurmctld' daemon continuously writes the
>>>>> following to the log file:
>>>>>
>>>>> debug: _slurm_recv_timeout at 0 of 4, recv zero bytes
>>>>> error: slurm_receive_msg: Zero Bytes were transmitted or received
>>>>> error: slurm_receive_msg: Zero Bytes were transmitted or received
>>>>>
>>>>> You can see in 'slurm.conf' that the same node acts as a
>>>>> controller and as a compute node. Jobs can be submitted.
>>>>>
>>>>> Any other cluster node/server (apparently having the same or
>>>>> similar hardware and the same operating system) works smoothly
>>>>> without any error, acting either as a controller or as a backup
>>>>> controller.
>>>>>
>>>>> Can anyone give me some idea of what to look at, so as to
>>>>> suppress those error messages? I've looked through the mailing
>>>>> list for similar messages, but none was of help.
>>
>> --
>> Andy Riebs
>> Hewlett-Packard Company
>> High Performance Computing
>> +1-786-263-9743
>> My opinions are not necessarily those of HP
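
As a quick check on the clock question raised in the quoted exchange, the two timestamps printed by `date; rsh jffmds date` can be compared directly. The following is a minimal Python sketch and not part of the original thread: the function names are hypothetical, and it assumes both hosts report the same timezone (CEST here), so the timezone token is simply dropped before parsing.

```python
from datetime import datetime

# The two lines printed by `date; rsh jffmds date` in the quoted message.
local_out = "Tue Oct 4 14:03:52 CEST 2011"
remote_out = "Tue Oct 4 14:03:42 CEST 2011"


def parse_date_output(line: str) -> datetime:
    """Parse default `date` output, ignoring the timezone token.

    Assumption: both hosts are in the same timezone, so dropping the
    token does not change the difference between the two timestamps.
    """
    parts = line.split()
    # parts = [weekday, month, day, time, timezone, year]
    without_tz = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_tz, "%a %b %d %H:%M:%S %Y")


def clock_skew_seconds(a: str, b: str) -> float:
    """Absolute difference between two `date` output lines, in seconds."""
    delta = parse_date_output(a) - parse_date_output(b)
    return abs(delta.total_seconds())


skew = clock_skew_seconds(local_out, remote_out)
print(f"clock skew: {skew:.0f} s")
```

For the timestamps quoted in the thread this reports a difference of 10 seconds, which is inside the "minute or two" tolerance Andy mentions.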
