I think the IP of the misbehaving node is listed in the munge log on your slurmctld host.
Trevor

On May 26, 2016, at 5:44 AM, Per Lönnborg <[email protected]> wrote:

We do have some monitoring; we even check for correct time in a Prolog script. We also keep a tail -f slurmctld.log running, and since slurmctld (or munge) reports this anyway (every second!), we thought it would be nice (and simple?) to add the hostname, or at least the IP address.

/Per

-----Original Message-----
From: Benjamin Redling [[email protected]]
Sent: 26/5/2016 1:27:54 PM
To: [email protected]
Subject: [slurm-dev] Re: Munge time error, but from WHICH node?

On 05/26/2016 12:16, Per Lönnborg wrote:
> Example from logfile below. LOTS of info saying that one or several
> nodes have incorrect time. I want to see which node(s)!
> Of course I can ask all nodes about the time, but it's a bit dull. Even
> if we do it in parallel.

A monitoring application is a worthwhile investment. A former co-worker introduced check_mk, and it automatically discovers the NTP service and any problems with it. Whether as part of OMD (Open Monitoring Distribution) or as the raw edition, its setup is a no-brainer.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
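
For the brute-force approach mentioned in the thread (asking every node for its time), the comparison step could be sketched roughly as below. This is only an illustration, not something from Slurm or munge: the function name `find_skewed_nodes`, the sample hostnames, and the 300-second tolerance are all assumptions (the tolerance should be checked against the credential lifetime actually configured on your cluster). It expects epoch timestamps that were already collected some other way, for example by running `date +%s` on each node, and collection latency will add some error to the measured offsets.

```python
from typing import Dict, List, Tuple


def find_skewed_nodes(
    reference_epoch: float,
    node_epochs: Dict[str, float],
    tolerance_s: float = 300.0,  # assumed tolerance; verify for your site
) -> List[Tuple[str, float]]:
    """Return (hostname, offset_seconds) for nodes whose clock differs
    from the reference by more than tolerance_s, largest skew first.

    A positive offset means the node's clock is ahead of the reference.
    """
    skewed = []
    for host, epoch in node_epochs.items():
        offset = epoch - reference_epoch
        if abs(offset) > tolerance_s:
            skewed.append((host, offset))
    # Worst offenders first, so they are easy to spot in the output.
    skewed.sort(key=lambda item: abs(item[1]), reverse=True)
    return skewed


if __name__ == "__main__":
    # Hypothetical sample data: the controller's time and three nodes.
    controller_time = 1464260000.0
    samples = {
        "node01": 1464260001.0,  # 1 s ahead, fine
        "node02": 1464260400.0,  # 400 s ahead
        "node03": 1464259300.0,  # 700 s behind
    }
    for host, offset in find_skewed_nodes(controller_time, samples):
        print(f"{host}: {offset:+.0f} s")
```

Keeping the comparison separate from the collection means the same check works regardless of how the timestamps are gathered (a parallel shell, a Prolog script, or a monitoring agent).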
