Hi, as a sysadmin I know the importance of keeping correct time on "things". We use (of course) NTP for that.
But what is the preferred way to check that the compute nodes in our cluster have correct time, and if they don't, to make sure that Slurm doesn't allocate those nodes to jobs?

About a year ago we started to use Munge for authentication. The default allowed time drift between the nodes and the Slurm server with Munge is +/- 5 minutes (300 s). If a node drifts beyond that, its slurmd can no longer talk to the controller and the node gets marked "down*" in Slurm. Perfect, we thought: that check alone should keep Slurm from allocating nodes with broken clocks to users.

But... I've read the Slurm documentation, and it says: "While Slurm itself does not rely upon synchronized clocks on all nodes of a cluster for proper operation, its underlying authentication mechanism does have this requirement." True. Tests we have performed with a drifting clock on a compute node show that if the node's clock is more than 5 minutes ahead of the correct time, we get "Job credential expired". Fine (the same as the Munge TTL). But if the node's clock is only about 2.5 minutes behind the correct time, we also get "Job credential expired". Not fine; that is about half of Munge's TTL. So if we just let Slurm "rely" on Munge, we will have users complaining about "Job credential expired" whenever a node's clock is between roughly 2.5 and 5 minutes off.

We have also tried to alter the TTL for Munge, but that doesn't seem to be implemented... yet?

Earlier (before we used Munge) we had some rather crappy code in a prolog script that checked NTP, but it was buggy...
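What we have in mind as a replacement is roughly the sketch below: a check that runs periodically on each node and drains the node in Slurm when the NTP offset gets too large. It is untested and makes some assumptions: ntpd is running with ntpq in the PATH, scontrol is usable from the node, the Slurm NodeName matches the short hostname, and the 100 ms threshold is just a number picked for illustration.

    #!/bin/bash
    # Sketch of a node clock check for Slurm (untested). Assumes:
    #  - ntpd is running and ntpq is in the PATH
    #  - the Slurm NodeName matches "hostname -s"
    #  - 100 ms is an arbitrary threshold chosen for illustration
    MAX_OFFSET_MS=100

    # "ntpq -c rv" prints the system variables, including the current
    # clock offset in milliseconds, e.g. "... offset=1.234, ..."
    offset=$(ntpq -c rv 2>/dev/null | tr ',' '\n' \
             | awk -F= '/offset/ { gsub(/[[:space:]]/, "", $2); print $2 }')

    if [ -z "$offset" ]; then
        # ntpd did not answer at all: treat the clock as unreliable
        scontrol update NodeName="$(hostname -s)" State=DRAIN \
                 Reason="clock check: ntpd not responding"
        exit 1
    fi

    # Drain the node if |offset| exceeds the threshold (the offset can
    # be negative, i.e. the clock may be behind as well as ahead)
    if awk -v o="$offset" -v m="$MAX_OFFSET_MS" \
           'BEGIN { if (o < 0) o = -o; exit !(o > m) }'; then
        scontrol update NodeName="$(hostname -s)" State=DRAIN \
                 Reason="clock check: offset ${offset}ms"
        exit 1
    fi
    exit 0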
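Rather than a prolog, something like this could presumably be wired in through the health-check hooks in slurm.conf, along these lines (again just a sketch, the path is made up):

    # slurm.conf: run the clock check on the nodes every 5 minutes
    HealthCheckProgram=/usr/local/sbin/clockcheck.sh
    HealthCheckInterval=300

Does anyone run their clock checks this way, or is the prolog still the better place for it?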
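By the way, the easiest way we have found to reproduce the authentication failure by hand is the usual Munge round-trip test from its installation guide: encode a credential on one machine and decode it on the suspect node ("node42" below is just a placeholder, and it assumes ssh access to the node):

    # Encode an empty credential locally, decode it on the remote node.
    # With the clock outside Munge's tolerance, unmunge fails with
    # "Expired credential" or "Rewound credential" instead of Success.
    munge -n | ssh node42 unmunge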
I would appreciate input from other admins on how they check and maintain synchronized clocks in a Slurm-managed cluster!

Thanks, /Per Lönnborg