On Thu, Oct 6, 2011 at 19:11, Conner, Matthew <[email protected]> wrote:
> We are experiencing an issue using Orphan mode and peering in our ntpd
> 4.2.6p4 set-up. With the loss of our stratum 1 time hosts, the stratum 2
> are not properly choosing a primary time provider.
> [...]
> The stratum 2 (timehost[1-4]) attempt to peer with the loss of the
> stratum 1 (tfds[1-3]). However, instead of them all staying at stratum 4
> as was seen when using ntpd 4.2.4p7 (have other issues with 4.2.4p7 and
> need to update), the peers are dropping down 1 stratum from the peer they
> are locking to. Since they are peering to one another, this results in
> the timehosts slowly dropping in stratum as they attempt to stay 1
> stratum below the locked to host. They continue to drop in stratum until
> reaching a stratum 16. Once they hit stratum 16, all other hosts
> disconnect and the peers previously locking to the now stratum 16 host
> will unlock and jump back to a stratum 4. Once at least 1 peer jumps back
> to 4, the others will begin jumping to stratum 4-5. This process will
> repeat itself until the stratum 1 hosts are reconnected or the timehosts
> choose a primary. We have only once seen it stabilize with all 4 hosts
> and it took almost a full 24 hours to do so. With only 3 timehosts
> running, they will stabilize within minutes.
>
> From what we are able to tell, a primary peer is chosen when 3 of the 4
> timehosts lock to the same peer. When the 4th peer sees that the others
> are all connected to it, it syncs to its internal clock and remains a
> stratum 4. Is this correct, or is something else going on here?

The vicious cycle of dropping stratum suggests a bug to me. Orphan mode
is supposed to result in the orphan peers agreeing on a single "orphan
parent" (via luck of the random number generator). The parent operates at
the "tos orphan" stratum, and all the others use that single orphan parent
for as long as it remains available, so each of them runs at a stratum one
higher (5, in your example).

> Further questions:
> Are the peers intentionally dropping below the orphan mode set stratum,
> or is that a bug?

Whether the source is a peer or server association, the stratum of ntpd
is by design one higher than its upstream source. The problem you are
experiencing implies the clients are failing to agree on the orphan parent
once the WAN sources are unusable.

I am very curious whether the same problem exists with the latest 4.2.7
(ntp-dev) snapshot. We are getting close to starting the RC cycle of
refining 4.2.7 to be the next ntp-stable, likely 4.2.8. If this problem
with 4.2.6 has since been solved in 4.2.7, that's great, but if not, it
would be nice to get it resolved before releasing a new ntp-stable with
all the 4.2.7 changes.

Cheers,
Dave Hart
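
To illustrate the stratum arithmetic discussed above, here is a toy Python
sketch. It is not ntpd's selection logic; the function names are
hypothetical, and the only rules it assumes are the ones stated in this
thread: a host runs one stratum above its source, the orphan parent runs at
the "tos orphan" stratum (4 here), and 16 means unsynchronized. Note that
"dropping" in stratum means the number increases.

```python
ORPHAN_STRATUM = 4    # "tos orphan" value from the poster's setup
UNSYNC_STRATUM = 16   # stratum at which a host counts as unsynchronized


def child_stratum(upstream):
    """A host runs one stratum above its source, capped at 16."""
    return min(upstream + 1, UNSYNC_STRATUM)


def expected_strata(hosts, parent):
    """Intended outcome: one orphan parent, every other host follows it."""
    return {
        h: ORPHAN_STRATUM if h == parent else child_stratum(ORPHAN_STRATUM)
        for h in hosts
    }


def loop_strata(start, rounds):
    """Reported failure (simplified to two hosts that follow each other):
    each update puts a host one above the other, so both climb to 16."""
    a, b = start, child_stratum(start)
    history = [(a, b)]
    for _ in range(rounds):
        a = child_stratum(b)
        b = child_stratum(a)
        history.append((a, b))
    return history


if __name__ == "__main__":
    hosts = ["timehost1", "timehost2", "timehost3", "timehost4"]
    print(expected_strata(hosts, parent="timehost1"))
    print(loop_strata(ORPHAN_STRATUM, rounds=6))
```

With a single agreed parent the strata stay fixed at 4 and 5. Without one,
the mutual-follow loop climbs steadily until both hosts reach 16, which
matches the cycle described in the original report.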
