Hi,

Just a 'me-too': we are also running 14.03.6 on the compute nodes, with the master nodes on RHEL5 and Slurm built with -O0, and we see the same error in the logs, so it's not just you.
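
In case it helps anyone rule out the version mismatch suggested further down the thread, here is a rough sketch of how the version strings reported by each daemon could be compared. It assumes you have already collected the text each daemon prints for its version (for example from `slurmd -V` or `slurmctld -V`) and that it looks like "slurm 14.03.6". The labels and function names are made up for illustration, and it only compares version numbers, so it says nothing about build differences such as -O0.

```python
import re
from collections import defaultdict

# Matches a dotted version such as "14.03.6" inside text like "slurm 14.03.6".
# The exact output format is an assumption; adjust the pattern if yours differs.
VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)")


def parse_version(output):
    """Return the first dotted version found in `output`, or None if absent."""
    match = VERSION_RE.search(output)
    return match.group(1) if match else None


def group_by_version(reports):
    """Map each version string to the sorted list of daemons reporting it.

    `reports` maps a hypothetical label (e.g. "master/slurmctld") to the raw
    text that daemon printed when asked for its version.
    """
    groups = defaultdict(list)
    for label, output in reports.items():
        groups[parse_version(output)].append(label)
    return {version: sorted(labels) for version, labels in groups.items()}


def versions_consistent(reports):
    """True only if every daemon reported the same, parseable version."""
    groups = group_by_version(reports)
    return len(groups) == 1 and None not in groups


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real cluster.
    reports = {
        "master/slurmctld": "slurm 14.03.6",
        "master/slurmdbd": "slurm 14.03.6",
        "node1/slurmd": "slurm 14.03.6",
    }
    for version, labels in group_by_version(reports).items():
        print(version, labels)
    print("consistent:", versions_consistent(reports))
```

If more than one version shows up, the grouping shows which daemons are the odd ones out and would need upgrading or restarting.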

-- 
Nathan Harper // IT Systems Architect
e: [email protected] // t: 0117 906 1104 // m: 07875 510891 // w: www.cfms.org.uk
<http://uk.linkedin.com/pub/nathan-harper/21/696/b81>
CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR
------------------------------
CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd
CFMS Services Ltd registered office // Victoria House // 51 Victoria Street // Bristol // BS1 6AD

On 18 August 2014 11:12, Gerben Roest <[email protected]> wrote:

> Hi Paddy,
>
> On Sun, Aug 17, 2014 at 01:26:12PM -0700, Gerben Roest wrote:
>>
>>> I run a slurmctld and slurmdbd on a Scientific Linux (SL) 5 server and
>>> have three SL6 nodes, all running Slurm 14.03.6, with one node behind
>>> another slurmctld on another cluster. The whole slurm setup seems to run
>>> fine with tests, even submitting from one cluster to the other.
>>> The slurmctld daemon on the machine where slurmdbd is also running, shows
>>>
>>> error: slurm_receive_msg: Zero Bytes were transmitted or received
>>
>> For me, that's usually a version mis-match somewhere. One of the daemons
>> is a version behind and so there's a protocol mis-match when trying to
>> communicate. I'd double-check that all versions are the same (and have
>> been restarted since any upgrades) first.
>
> I have checked the versions of the main slurmctld and the slurmd's on the
> nodes, and the slurmctld on the other cluster and slurmd's on that nodes,
> and all use 14.03.6. I didn't upgrade, started straight from 14.03.6.
> The only thing might be that the main master runs 14.03.6 compiled for SL5
> with "-O0" and the others run it from another dir (NFS) compiled from the
> same source but without "-O0" and "make installed" to that other dir,
> created for SL6 machines (because of GLIBC deps). But I guess you should be
> able to run slurm on different builds provided it is the same version? It
> all seems to work but I only get the strange logs.
> If you would need more verbose information, please let me know what I have
> to check.
>
> Gerben
