On Thu, Oct 6, 2011 at 19:11, Conner, Matthew <[email protected]> wrote:
> We are experiencing an issue using Orphan mode and peering in our ntpd
> 4.2.6p4 set-up. With the loss of our stratum 1 time hosts, the stratum 2
> are not properly choosing a primary time provider.
> [...]
> The stratum 2 (timehost[1-4]) attempt to peer with the loss of the
> stratum 1 (tfds[1-3]). However, instead of them all staying at stratum 4
> as was seen when using ntpd 4.2.4p7 (have other issues with 4.2.4p7 and
> need to update), the peers are dropping down 1 stratum from the peer
> they are locking to. Since they are peering to one another, this results
> in the timehosts slowly dropping in stratum as they attempt to stay 1
> stratum below the locked to host. They continue to drop in stratum until
> reaching a stratum 16. Once they hit stratum 16, all other hosts
> disconnect and the peers previously locking to the now stratum 16 host
> will unlock and jump back to a stratum 4. Once at least 1 peer jumps
> back to 4, the others will begin jumping to stratum 4-5. This process
> will repeat itself until the stratum 1 hosts are reconnected or the
> timehosts choose a primary. We have only once seen it stabilize with all
> 4 hosts and it took almost a full 24 hours to do so. With only 3
> timehosts running, they will stabilize within minutes.
>
> From what we are able to tell, a primary peer is chosen when 3 of the 4
> timehosts lock to the same peer. When the 4th peer sees that the others
> are all connected to it, it syncs to its internal clock and remains a
> stratum 4. Is this correct, or is something else going on here?

The vicious cycle of dropping stratum suggests a bug to me.  Orphan
mode is supposed to result in the orphan peers agreeing on a single
"orphan parent" (via luck of the random number generator).  The parent
operates at the "tos orphan" stratum, and all the others use that
single orphan parent for as long as it remains available, so each of
them runs at a stratum one higher (5, in your example).
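
To make the intended end state concrete, here is a rough Python sketch.
It is a toy model only, not ntpd's actual selection code: the function
name orphan_strata is made up for illustration, the parent is simply
picked with random.choice, and the orphan stratum of 4 is taken from
your description.

```python
import random

def orphan_strata(hosts, orphan_stratum, rng=random):
    """Toy model (not ntpd code): one host is picked at random as the
    orphan parent and runs at the orphan stratum; every other host
    follows it and so runs one stratum higher."""
    parent = rng.choice(hosts)
    return {
        host: orphan_stratum if host == parent else orphan_stratum + 1
        for host in hosts
    }

# Host names are from the original report; 4 is the assumed "tos orphan" value.
strata = orphan_strata(
    ["timehost1", "timehost2", "timehost3", "timehost4"], orphan_stratum=4
)
```

Whichever host wins, the result should be exactly one host at stratum 4
and the remaining three at stratum 5, and it should stay that way.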

> Further questions:
> Are the peers intentionally dropping below the orphan mode set stratum,
> or is that a bug?

Whether the source is a peer or server association, the stratum of
ntpd is by design one higher than its upstream source.  The problem
you are experiencing implies the clients are failing to agree on the
orphan parent once the WAN sources are unusable.
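
The "one higher than upstream" rule is enough to explain the climb you
describe once the hosts follow each other in a loop.  The sketch below is
a simplified model under my own assumptions, not ntpd internals: all
hosts update in lockstep, the helper names (step,
rounds_until_unsynchronized) are hypothetical, and a host whose upstream
is None stands in for an agreed orphan parent holding its stratum.

```python
MAX_STRATUM = 16  # NTP treats stratum 16 as unsynchronized

def step(strata, upstream):
    """One update round: each host becomes one stratum higher than the
    host it follows. A host with upstream None keeps its stratum
    (assumption: this models a stable orphan parent)."""
    new = {}
    for host, src in upstream.items():
        if src is None:
            new[host] = strata[host]
        else:
            new[host] = min(strata[src] + 1, MAX_STRATUM)
    return new

def rounds_until_unsynchronized(strata, upstream, limit=100):
    """Return the round in which some host reaches stratum 16,
    or None if that does not happen within `limit` rounds."""
    for n in range(1, limit + 1):
        strata = step(strata, upstream)
        if max(strata.values()) >= MAX_STRATUM:
            return n
    return None

start = {"timehost1": 4, "timehost2": 4, "timehost3": 4}

# No agreed parent: the hosts chase each other around a loop.
loop = {"timehost1": "timehost2", "timehost2": "timehost3",
        "timehost3": "timehost1"}
loop_rounds = rounds_until_unsynchronized(start, loop)

# Agreed parent: timehost1 holds stratum 4 and the others follow it.
star = {"timehost1": None, "timehost2": "timehost1",
        "timehost3": "timehost1"}
star_rounds = rounds_until_unsynchronized(start, star)
```

In this model the loop case reaches stratum 16 after a fixed number of
rounds, while the case with a single parent settles at 4 and 5 and never
gets there, which is the behavior orphan mode is meant to give you.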

I am very curious if the same problem exists with the latest 4.2.7
(ntp-dev) snapshot.  We are getting close to starting the RC cycle of
refining 4.2.7 to be the next ntp-stable, likely 4.2.8.  If this
problem with 4.2.6 has since been solved in 4.2.7, that's great, but
if it's not, it would be nice to get it resolved before releasing a
new ntp-stable with all the 4.2.7 changes.

Cheers,
Dave Hart
_______________________________________________
questions mailing list
[email protected]
http://lists.ntp.org/listinfo/questions
