I think the IP of the misbehaving node is listed in the munge log on your 
slurmctld host.

Trevor

On May 26, 2016, at 5:44 AM, Per Lönnborg 
<[email protected]<mailto:[email protected]>> wrote:

We do have some monitoring; we even check that the time is correct in a Prolog script.

We also keep a tail -f slurmctld.log running, and since slurmctld (or munge) 
reports this anyway (every second!), we thought it would be nice (and 
simple?) to add the hostname, or at least the IP address, to the message.

/Per


<-----Original Message----->
From: Benjamin Redling 
[[email protected]<mailto:[email protected]>]
Sent: 26/5/2016 1:27:54 PM
To: [email protected]<mailto:[email protected]>
Subject: [slurm-dev] Re: Munge time error, but from WHICH node?

On 05/26/2016 12:16, Per Lönnborg wrote:
> Example from logfile below. LOTS of info saying that one or several
> nodes have incorrect time. I want to see which node(s)!
> Of course I can ask all nodes about the time, but it's a bit dull. Even
> if we do it in parallel.
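
For readers who do want to ask every node for its time in parallel, a
minimal sketch follows. It is not from the thread and makes several
assumptions: passwordless ssh to each node, a `date` command on the remote
side that supports `+%s`, and an arbitrary 5-second tolerance. The function
names (`clock_offset`, `check_nodes`, `skewed`) are made up for
illustration.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def clock_offset(host, timeout=10):
    """Return the remote clock minus the local clock, in seconds.

    Assumes passwordless ssh to `host` (an assumption, not from the thread).
    """
    start = time.time()
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "date", "+%s"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    end = time.time()
    # Compare against the midpoint of the round trip to reduce ssh latency bias.
    return int(result.stdout.strip()) - (start + end) / 2


def check_nodes(hosts, probe=clock_offset, workers=32):
    """Probe all hosts in parallel; unreachable hosts map to None."""
    hosts = list(hosts)

    def safe(host):
        try:
            return probe(host)
        except (subprocess.SubprocessError, ValueError, OSError):
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(hosts, pool.map(safe, hosts)))


def skewed(offsets, tolerance=5.0):
    """Keep hosts whose offset is unknown or larger than `tolerance` seconds."""
    return {h: o for h, o in offsets.items()
            if o is None or abs(o) > tolerance}
```

Calling `skewed(check_nodes(["node01", "node02"]))` with your own node
names (the ones here are placeholders) returns only the nodes that are off
or unreachable. One-second resolution from `date +%s` is coarse, so treat
small offsets as noise.
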

A monitoring application is a worthwhile investment.

A former co-worker introduced check_mk and it automatically discovers
the NTP service and any problems with it.
Whether installed as part of OMD (Open Monitoring Distribution) or as the
raw edition, its setup is a no-brainer.

Regards,
Benjamin
--
FSU Jena | 
JULIELab.de/Staff/Benjamin+Redling.html<http://julielab.de/Staff/Benjamin+Redling.html>
vox: +49 3641 9 44323 | fax: +49 3641 9 44321

