Ramiro,

Another option might be iptables blocking port 6817. Just for the sake of testing, change auth/munge to auth/none (if you have ruled out iptables) and see if anything changes in the controller log.

BTW, according to the docs, debug level 7 is the highest.

On 04/10/2011 14:44, Ramiro Alba wrote:

Sten,

Yes, I did. Look at:

    root@jff:/var/log/slurm-llnl# scontrol show conf | grep -i debug
    DebugFlags = (null)
    SlurmctldDebug = 7
    SlurmdDebug = 7

I suppose you say that because you can only see the debug3 tag in the logs, but it is the same as using debug level 9. Does that make sense?

Cheers

On Tue, 2011-10-04 at 14:28 +0200, Sten Wolf wrote:

Hi,

Did you remember to restart the slurm service? You might need to run "service slurm stop ; service slurm start", as "service slurm restart" doesn't actually re-read the slurm.conf file. The debug level seems to be 3 currently.

On 04/10/2011 14:24, Ramiro Alba wrote:

Sten,

There you are (as attachments).

Cheers

On Tue, 2011-10-04 at 14:13 +0200, Sten Wolf wrote:

> SlurmctldDebug=3
> SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
> SlurmdDebug=3
> SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

Can you change the debug level to 7 for both ctld and node, and provide the contents of both log files?

On 04/10/2011 14:09, Ramiro Alba wrote:

Andy,

On Tue, 2011-10-04 at 08:00 -0400, Andy Riebs wrote:
> Hi Ramiro,
> You might check to ensure that all of your clocks are in sync (varying by no more than a minute or two).

No, they are not:

    root@jff:~# date; rsh jffmds date
    Tue Oct 4 14:03:52 CEST 2011
    Tue Oct 4 14:03:42 CEST 2011

I am using Lustre and this is a MUST. Furthermore, the issue is also present in the simplest case (one node acting as both controller and compute node).

> Andy

On 10/04/2011 07:56 AM, Ramiro Alba wrote:

Sten,

On Tue, 2011-10-04 at 13:45 +0200, Sten Wolf wrote:
> did you create munge key?

Yes, I did. See local test:

    # munge -n | unmunge
    STATUS: Success (0)
    ENCODE_HOST: jff.cttc-jffeth.org (10.2.254.1)
    ENCODE_TIME: 2011-10-04 13:54:19 (1317729259)
    DECODE_TIME: 2011-10-04 13:54:19 (1317729259)
    TTL: 300
    CIPHER: aes128 (4)
    MAC: sha1 (3)
    ZIP: none (0)
    UID: root (0)
    GID: root (0)
    LENGTH: 0

On 04/10/2011 13:28, Ramiro Alba wrote:

Hi all,

I am trying to set up a slurm controller (2.2.7) on Ubuntu 10.04 on a cluster server, and even with a simple slurm.conf (see attached file) the 'slurmctld' daemon continuously writes to the log file:

    debug: _slurm_recv_timeout at 0 of 4, recv zero bytes
    error: slurm_receive_msg: Zero Bytes were transmitted or received
    error: slurm_receive_msg: Zero Bytes were transmitted or received

You can see in 'slurm.conf' that the same node acts as a controller and as a compute node. Jobs can be submitted. Any other cluster node/server (apparently having the same/similar hardware and the same operating system) works smoothly without any error, acting either as a controller or as a backup controller.

Can anyone give me some idea of what to look at, so as to suppress those error messages? I've looked through the mailing list for similar messages, but none was of help.

--
This message has been scanned by MailScanner for viruses and other dangerous content, and is believed to be clean.

--
Andy Riebs
Hewlett-Packard Company
High Performance Computing
+1-786-263-9743
My opinions are not necessarily those of HP
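The clock and debug-level checks discussed in the thread can be scripted. The following is only a minimal sketch: the function names `clock_skew` and `debug_level` are hypothetical helpers, not part of SLURM or MUNGE, and the second one assumes the `Key = value` layout that `scontrol show conf` prints in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helpers for the checks suggested in this thread.
# Neither function ships with SLURM or MUNGE.

# clock_skew LOCAL_EPOCH REMOTE_EPOCH
# Prints the absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# debug_level KEY
# Reads "Key = value" lines on stdin (the layout shown by
# "scontrol show conf" in this thread) and prints the value of KEY.
debug_level() {
    awk -v key="$1" -F'=' '
        {
            k = $1; gsub(/[ \t]/, "", k)      # strip padding around the key
            if (k == key) {
                v = $2; gsub(/[ \t]/, "", v)  # strip padding around the value
                print v
            }
        }'
}
```

One possible use, assuming `date +%s` is available on both hosts and `rsh` works as in the thread: `clock_skew "$(date +%s)" "$(rsh jffmds date +%s)"` prints the skew in seconds, and `scontrol show conf | debug_level SlurmctldDebug` prints the debug level the running controller is actually using, which helps confirm whether an edited slurm.conf was re-read.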
- [slurm-dev] Zero Bytes were transmitted or received... Ramiro Alba
