We do have some monitoring; we even check for correct time in a
Prolog script.
We also keep an ongoing tail -f slurmctld.log running, and since slurmctld
(or munge) reports this anyway (every second!), we thought it would be
nice (and simple?) to add the hostname, or at least the IP address.
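
Until the log names the host, the offending node has to be found by comparing each node's clock against a reference. A minimal sketch of that comparison in Python, assuming the per-node epoch timestamps have already been collected somehow (for example by running date +%s on each node). The function name, the node names, and the 5-second tolerance are all hypothetical and not taken from Slurm or Munge:

```python
def find_skewed_nodes(node_times, reference_time, tolerance=5.0):
    """Return {node: offset_seconds} for nodes whose clock is off.

    node_times     -- dict mapping node name to its epoch time in seconds
    reference_time -- epoch time of the trusted clock (e.g. the controller)
    tolerance      -- hypothetical threshold in seconds; adjust to taste
    """
    skewed = {}
    for node, timestamp in node_times.items():
        offset = timestamp - reference_time
        if abs(offset) > tolerance:
            skewed[node] = offset
    return skewed


if __name__ == "__main__":
    # Made-up sample data: node03 is ten minutes behind the reference.
    sample = {
        "node01": 1464262074.0,
        "node02": 1464262075.5,
        "node03": 1464261474.0,
    }
    for node, offset in sorted(find_skewed_nodes(sample, 1464262074.0).items()):
        print(f"{node}: {offset:+.1f} s")
```

This only does the comparison step; collecting the timestamps in parallel is left to whatever tool is already in use on the cluster.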
/Per
<-----Original Message----->
From: Benjamin Redling [[email protected]]
Sent: 26/5/2016 1:27:54 PM
To: [email protected]
Subject: [slurm-dev] Re: Munge time error, but from WHICH node?
On 05/26/2016 12:16, Per Lönnborg wrote:
> Example from logfile below. LOTS of info saying that one or several
> nodes have incorrect time. I want to see which node(s)!
> Of course I can ask all nodes about the time, but it's a bit dull, even
> if we do it in parallel.
A monitoring application is a worthwhile investment.
A former co-worker introduced check_mk and it automatically discovers
the NTP service and any problems with it.
Either as part of OMD (open monitoring distribution) or raw edition: its
setup is a no-brainer.
Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321