David,

On closer examination of the code, the scenario I suggested in my previous message is not possible. In other words, there is no possibility that the reach register is zero while the tally code is anything other than space. The reason is that the select algorithm, which determines the system peer and lights the tally codes, is called not only when a new update is received from any server, but also after four poll intervals in which no samples have been received from a server. This means not only that the indicated dispersion increases rapidly, which would greatly reduce that server's chances of becoming the system peer if other sources were present, but also that the race condition between the time a poll is sent and the time the next update is received is prevented.
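The argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the ntpd code: the names Source, select, on_poll and on_update are hypothetical, the reach register is assumed to be an 8-bit shift register shifted left at each poll, and the stand-in select() simply marks the first reachable source. It does not model the dispersion increase at all.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for one upstream association."""
    name: str
    reach: int = 0    # assumed 8-bit reachability shift register
    tally: str = " "  # tally code as shown by the pe command


def select(sources):
    """Toy select step: mark the first reachable source, blank the rest."""
    chosen = False
    for s in sources:
        if s.reach != 0 and not chosen:
            s.tally = "*"
            chosen = True
        else:
            s.tally = " "


def on_poll(source, sources):
    """A poll is sent: shift the register; rerun select if the last
    four polls (low four bits) produced no sample."""
    source.reach = (source.reach << 1) & 0xFF
    if source.reach & 0x0F == 0:
        select(sources)


def on_update(source, sources):
    """A sample arrives: record it in the low bit and rerun select."""
    source.reach |= 1
    select(sources)
```

In this model a source that stops answering has select() rerun on every poll from the fourth missed poll onward, so by the time the register shifts down to zero on the eighth missed poll the tally code has been recomputed and is blank.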

The sysadmins of the world have had almost thirty years to develop uses for the monitoring facilities first designed by Dennis Fergusson circa 1983, which have seen only minor changes since then. When I implemented the tally codes circa 1992, the intent was that the sysadmin would need only the pe command and the tally codes to assess general health, with the rv command serving only as a diagnostic aid.

Dave

David Woolley wrote:

David L. Mills wrote:

Miroslav,

You might be confusing the server role with the client role. The server has one or more upstream sources and downstream clients. The tally code for each source is displayed by the pe command separately at the server and the client. Each time an update is received from a source at either the server or the client, the tally codes for all sources are redetermined. If a source is considered invalid or unreachable, or its maximum error statistic exceeds the select threshold, the tally indicator surely will be blank. If a source is marked as the system peer, it surely is valid and reachable.


This is not the behaviour that the person who started the thread is complaining about. He is complaining that the system peer and selected markers are not cleared on the server when it loses reachability to the respective upstream servers. My previous article was on the basis that you were not challenging that aspect of his report.

In the real world, most administrators judge whether a server is synchronized by doing ntpq peers and looking for these flags, not by doing a client request and looking at the error statistics. In fact, relatively few people realise that you need to use rv on the associations to properly diagnose a failure to select.


In the case you present the server has lost all sources, but remains a viable choice even beyond that, as long as the maximum error does not exceed the select threshold. The user can set this to whatever value is appropriate, with default 1.5 s. The point I emphasize is that the server, even if it has lost all sources, remains conformant to the formal specification. Thus, the time provider does not judge the quality which the receiver requires; this is specified by the receiver.
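As a rough illustration of the threshold test described above, the following sketch assumes the maximum error is the root distance (half the root delay plus the root dispersion) and that, once all sources are lost, it grows only at the 15 ppm frequency tolerance PHI given in RFC 5905. The function names are hypothetical and the growth model is a simplification.

```python
PHI = 15e-6             # assumed frequency tolerance in s/s (RFC 5905)
SELECT_THRESHOLD = 1.5  # default select threshold in seconds, per the text


def root_distance(root_delay, root_dispersion):
    """Maximum error, taken here as half the delay plus the dispersion."""
    return root_delay / 2 + root_dispersion


def is_selectable(root_delay, root_dispersion, threshold=SELECT_THRESHOLD):
    """True while the maximum error does not exceed the select threshold."""
    return root_distance(root_delay, root_dispersion) <= threshold


def seconds_until_unselectable(root_delay, root_dispersion,
                               threshold=SELECT_THRESHOLD):
    """Time for the error to grow to the threshold at rate PHI,
    assuming no further updates arrive."""
    margin = threshold - root_distance(root_delay, root_dispersion)
    return max(0.0, margin / PHI)
```

Under these assumptions, a server with 20 ms root delay and 40 ms root dispersion at the moment it loses its sources would remain selectable for roughly 27 hours with the default threshold; a receiver that needs tighter bounds would set a smaller threshold.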


_______________________________________________
questions mailing list
[email protected]
http://lists.ntp.org/listinfo/questions

