Hi,

Just a 'me too': we are also running 14.03.6 on the compute nodes, with the
master nodes on RHEL5 built with -O0, and we get the same error in the logs,
so it's not just you.
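
In case it helps anyone rule out the version mismatch suggested further down
the thread, here is a minimal sketch for comparing the version strings
reported by each daemon. It assumes you have already collected the output of
something like `slurmd -V` or `slurmctld -V` from each host by hand, and that
the output looks like "slurm 14.03.6". The host names and the helper names
(`parse_version`, `find_mismatches`) are made up for illustration.

```python
import re
from collections import Counter


def parse_version(output):
    """Extract a dotted version such as '14.03.6' from text like 'slurm 14.03.6'."""
    match = re.search(r"\d+(?:\.\d+)+", output)
    return match.group(0) if match else None


def find_mismatches(reported):
    """Return the hosts whose version differs from the most common one."""
    versions = {host: parse_version(text) for host, text in reported.items()}
    if not versions:
        return {}
    majority, _count = Counter(versions.values()).most_common(1)[0]
    return {host: ver for host, ver in versions.items() if ver != majority}


if __name__ == "__main__":
    # Hypothetical hosts and outputs, pasted in by hand for illustration.
    sample = {
        "master-sl5": "slurm 14.03.6",
        "node01-sl6": "slurm 14.03.6",
        "node02-sl6": "slurm 14.03.6",
    }
    print(find_mismatches(sample) or "all reported versions match")
```

An empty result only rules out a version difference between the binaries. It
says nothing about build differences such as -O0, or about daemons that were
never restarted after an upgrade.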


-- 
*Nathan Harper* // IT Systems Architect

CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons
Green // Bristol // BS16 7FR


On 18 August 2014 11:12, Gerben Roest <[email protected]> wrote:

>
> Hi Paddy,
>
>
>  On Sun, Aug 17, 2014 at 01:26:12PM -0700, Gerben Roest wrote:
>>
>>
>>> I run a slurmctld and slurmdbd on a Scientific Linux (SL) 5 server and
>>> have three SL6 nodes, all running Slurm 14.03.6, with one node behind
>>> another slurmctld on another cluster. The whole slurm setup seems to run
>>> fine with tests, even submitting from one cluster to the other.
>>> The slurmctld daemon on the machine where slurmdbd is also running shows:
>>>
>>> error: slurm_receive_msg: Zero Bytes were transmitted or received
>>>
>>
>> For me, that's usually a version mismatch somewhere. One of the daemons is
>> a version behind, so there's a protocol mismatch when they try to
>> communicate. I'd first double-check that all versions are the same (and
>> that the daemons have been restarted since any upgrades).
>>
>
> I have checked the versions of the main slurmctld and the slurmds on its
> nodes, and of the slurmctld on the other cluster and the slurmds on those
> nodes, and all are 14.03.6. I didn't upgrade; I started straight from
> 14.03.6.
> The only difference might be that the main master runs 14.03.6 compiled for
> SL5 with "-O0", while the others run it from another directory (on NFS),
> compiled from the same source but without "-O0" and installed there with
> "make install" for the SL6 machines (because of GLIBC dependencies). But I
> guess you should be able to run Slurm from different builds provided they
> are the same version? It all seems to work; I only get the strange log
> messages.
> If you need more verbose information, please let me know what I should
> check.
>
> Gerben
>
