Re: [chrony-dev] SW/HW timestamping on Linux
Yep. Seeing lots of that. ~70% of the samples for the locally attached unit.

Denny

> On Nov 16, 2016, at 23:39, Miroslav Lichvar wrote:
>
> On Wed, Nov 16, 2016 at 11:24:14PM -0800, Denny Page wrote:
>> To be clear, would I still expect to see ‘D H’ in the measurements log with
>> this change?
>
> Yes, but the second bit in the column with four test bits should be
> always zero.
Re: [chrony-dev] SW/HW timestamping on Linux
On Wed, Nov 16, 2016 at 11:24:14PM -0800, Denny Page wrote:
> To be clear, would I still expect to see ‘D H’ in the measurements log with
> this change?

Yes, but the second bit in the column with four test bits should always be
zero.

> Thanks,
> Denny
>
> > On Nov 16, 2016, at 00:53, Miroslav Lichvar wrote:
> >
> > Hm, the fix helped with the spikes I was seeing. Did we rule out the
> > possibility that in your case the spikes are due to the other issue with
> > out-of-order HW timestamps? Could you try it with this patch to make
> > sure only measurements with HW timestamps are used?
> >
> > --- a/ntp_core.c
> > +++ b/ntp_core.c
> > @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
> >         prevent a synchronisation loop */
> >      testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
> >              pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> > +
> > +    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
> > +      testB = 0;
> >    } else {
> >      offset = delay = dispersion = 0.0;
> >      sample_time = rx_ts->ts;

--
Miroslav Lichvar
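For anyone checking the log by hand or with a script, here is a minimal sketch of the check described above. It assumes the four test bits appear in the measurements log as one field of '0'/'1' characters in the order A, B, C, D, so that the "second bit" is test B. The helper name test_b_passed is hypothetical and not part of chrony.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not a chrony function.
   Takes the four-character test field (assumed order: A, B, C, D;
   '1' = test passed, '0' = test failed) and reports whether test B passed.
   A field that is not exactly four characters long is treated as failed. */
static bool test_b_passed(const char *abcd)
{
  assert(abcd != NULL);

  if (strlen(abcd) != 4)
    return false;

  return abcd[1] == '1';
}
```

With the patch applied, a line whose timestamp columns read ‘D H’ would be expected to give false here, for example test_b_passed("1011").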
Re: [chrony-dev] SW/HW timestamping on Linux
To be clear, would I still expect to see ‘D H’ in the measurements log with this change?

Thanks,
Denny

> On Nov 16, 2016, at 00:53, Miroslav Lichvar wrote:
>
> Hm, the fix helped with the spikes I was seeing. Did we rule out the
> possibility that in your case the spikes are due to the other issue with
> out-of-order HW timestamps? Could you try it with this patch to make
> sure only measurements with HW timestamps are used?
>
> --- a/ntp_core.c
> +++ b/ntp_core.c
> @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
>         prevent a synchronisation loop */
>      testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
>              pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> +
> +    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
> +      testB = 0;
>    } else {
>      offset = delay = dispersion = 0.0;
>      sample_time = rx_ts->ts;
Re: [chrony-dev] SW/HW timestamping on Linux
Miroslav,

I’m sorry, I didn’t mean to imply that the spikes were unrelated to the ordering issue. I do believe that the spikes are related to the hardware timestamp ordering issue.

I’ll try to set up some of the tests you’ve requested later this evening, but it may take a day for me to have results.

Denny

> On Nov 16, 2016, at 00:53, Miroslav Lichvar wrote:
>
> On Tue, Nov 15, 2016 at 08:24:37PM -0800, Denny Page wrote:
>> With the latest drop in the repo, I’m still seeing the wild spikes in the
>> standard deviation with hardware time stamping against the fast responding
>> hardware units. I'm also still seeing a better base deviation using software
>> timestamps against them as well.
>>
>> I do see better results with hardware time stamping when doing chrony to
>> chrony, but I believe that this is a result of the general purpose computers
>> being a bit slower to respond than the dedicated hardware units.
>
> Hm, the fix helped with the spikes I was seeing. Did we rule out the
> possibility that in your case the spikes are due to the other issue with
> out-of-order HW timestamps? Could you try it with this patch to make
> sure only measurements with HW timestamps are used?
>
> --- a/ntp_core.c
> +++ b/ntp_core.c
> @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
>         prevent a synchronisation loop */
>      testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
>              pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> +
> +    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
> +      testB = 0;
>    } else {
>      offset = delay = dispersion = 0.0;
>      sample_time = rx_ts->ts;
>
> --
> Miroslav Lichvar
Re: [chrony-dev] SW/HW timestamping on Linux
Yes, all ports on the monitoring system are identical. The I354 is a 4 port chip, and all the ethernet ports on the monitoring unit are connected through that same chip.

The general server has both I354 and I211 chips. I shut everything down on the server to conduct a test. It didn’t matter whether the direct connection or the switch was in use.

Denny

> On Nov 16, 2016, at 01:53, Miroslav Lichvar wrote:
>
> Is the port to the switch identical to the one connected to the third
> server? It would be interesting to see if the offset changes when the
> ports are swapped.
Re: [chrony-dev] SW/HW timestamping on Linux
On Tue, Nov 15, 2016 at 09:39:40PM -0800, Denny Page wrote:
>
> While I can get my head around a differential of 250ns resulting from the
> switch, I’m finding it very difficult to believe the almost 2500ns
> differential that appears when hardware time stamping is enabled.
>
> Any thoughts on this?

Is the port to the switch identical to the one connected to the third
server? It would be interesting to see if the offset changes when the
ports are swapped.

I'd trust HW timestamping. The 2.2us offset doesn't seem unrealistic.
There is a reason why there are switches with support for PTP. You have
exceptionally stable measurements with SW timestamping, but that doesn't
mean the asymmetry in delay and processing has to be the same between
the two ports.

To get a better perspective, I'd suggest comparing offsets between SW
and HW timestamping at the same time. It is possible to run two chronyd
instances, one using SW timestamping, the other HW timestamping. They
need to be configured to use a different cmdport (e.g. 323 and 324) and
pidfile, and they both need all their sources to have the noselect
option, so they don't try to control the clock and interfere with each
other. You can then run "chronyc -p 323 -h ::1" and "chronyc -p 324 -h
::1" to compare the offsets. You may need to synchronize the clock
first, so the offsets are small enough to see the difference between
them.

--
Miroslav Lichvar
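The two-instance setup described above might look roughly like the following pair of configuration files. This is only a sketch under assumptions: the file names, pidfile paths, server name and interface name are placeholders, and the hwtimestamp directive is written as in released chrony 3.x, which may not match the development version discussed in this thread.

```
# /etc/chrony-sw.conf (placeholder name): instance using SW timestamping
server ntp.example.net noselect      # placeholder source; noselect keeps it from steering the clock
cmdport 323
pidfile /var/run/chronyd-sw.pid      # placeholder path

# /etc/chrony-hw.conf (placeholder name): instance using HW timestamping
server ntp.example.net noselect      # same placeholder source, again noselect
hwtimestamp eth0                     # placeholder interface name
cmdport 324
pidfile /var/run/chronyd-hw.pid      # placeholder path
```

Each instance would be started with its own file, e.g. "chronyd -f /etc/chrony-sw.conf", and then queried on its own port with "chronyc -p 323 -h ::1 sources" and "chronyc -p 324 -h ::1 sources".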
Re: [chrony-dev] SW/HW timestamping on Linux
On Tue, Nov 15, 2016 at 08:24:37PM -0800, Denny Page wrote:
> With the latest drop in the repo, I’m still seeing the wild spikes in the
> standard deviation with hardware time stamping against the fast responding
> hardware units. I'm also still seeing a better base deviation using software
> timestamps against them as well.
>
> I do see better results with hardware time stamping when doing chrony to
> chrony, but I believe that this is a result of the general purpose computers
> being a bit slower to respond than the dedicated hardware units.

Hm, the fix helped with the spikes I was seeing. Did we rule out the
possibility that in your case the spikes are due to the other issue with
out-of-order HW timestamps? Could you try it with this patch to make
sure only measurements with HW timestamps are used?

--- a/ntp_core.c
+++ b/ntp_core.c
@@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
        prevent a synchronisation loop */
     testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
             pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
+
+    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
+      testB = 0;
   } else {
     offset = delay = dispersion = 0.0;
     sample_time = rx_ts->ts;

--
Miroslav Lichvar
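The effect of the patch can be shown in isolation with a small standalone sketch. Everything here is illustrative: the enum ts_source and the function filter_test_b are hypothetical stand-ins, and only the idea (force test B to fail unless both the transmit and receive timestamps came from hardware) is taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for chrony's timestamp source type. Only the
   "hardware" case is named in the patch; the other two are assumptions. */
typedef enum { TS_DAEMON, TS_KERNEL, TS_HARDWARE } ts_source;

/* Returns the value test B should have after the filter is applied.
   test_b is the result computed so far (0 = failed, 1 = passed);
   tx and rx are the sources of the local transmit and receive timestamps. */
static int filter_test_b(int test_b, ts_source tx, ts_source rx)
{
  assert(test_b == 0 || test_b == 1);

  /* Same condition as in the patch: reject unless both are HW timestamps */
  if (tx != TS_HARDWARE || rx != TS_HARDWARE)
    return 0;

  return test_b;
}
```

A sample with a daemon transmit timestamp and a hardware receive timestamp, i.e. filter_test_b(1, TS_DAEMON, TS_HARDWARE), comes out as 0 and would therefore be rejected, while a sample with both timestamps from hardware keeps whatever result test B already had.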
Re: [chrony-dev] SW/HW timestamping on Linux
On Tue, Nov 15, 2016 at 08:14:28PM -0800, Denny Page wrote:
> The chip is actually a I354, which is slightly different than the I350, but I
> don’t think it matters much. I also have interfaces with I211 chips, and the
> ordering issue appears to happen there as well. I don’t think the sleep after
> the send is going to affect the order of the timestamp and response messages
> since both are requested at the point of the outbound send.

I think what matters is the order in which they are received from the
socket. I don't know much about kernel internals. I assume the error
queue is separate and if chronyd waits until they are both queued, the
one in the error queue will be picked first, even if the response from
the server was processed by the kernel first.

If the response does need to be received from the socket before the tx
timestamp and this is expected behavior, then I'm not sure how we will
work around that.

Can you please try it again with the following patch and post the debug
output if the problem persists?

--- a/ntp_io.c
+++ b/ntp_io.c
@@ -29,6 +29,7 @@
 #include "config.h"
 
 #include "sysincl.h"
+#include <poll.h>
 
 #include "array.h"
 #include "ntp_io.h"
@@ -829,5 +830,17 @@ NIO_SendPacket(NTP_Packet *packet, NTP_Remote_Address *remote_addr,
             UTI_IPToString(&remote_addr->ip_addr), remote_addr->port,
             UTI_IPToString(&local_addr->ip_addr), local_addr->sock_fd);
 
+  struct timespec ts1, ts2;
+  struct pollfd pfd;
+  pfd.fd = local_addr->sock_fd;
+  pfd.events = POLLPRI;
+  clock_gettime(CLOCK_REALTIME, &ts1);
+  if (poll(&pfd, 1, 100)) {
+    clock_gettime(CLOCK_REALTIME, &ts2);
+    DEBUG_LOG(LOGF_NtpIO, "poll fd ready after %.9f seconds", UTI_DiffTimespecsToDouble(&ts2, &ts1));
+  } else {
+    DEBUG_LOG(LOGF_NtpIO, "poll timeout");
+  }
+
   return 1;
 }

--
Miroslav Lichvar
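To try the measurement idea from the debug patch outside chronyd, a standalone sketch could look like the one below. It is not chrony code and rests on some assumptions: the function names timed_poll and pipe_ready_latency are hypothetical, a pipe with POLLIN stands in for the NTP socket polled with POLLPRI, and CLOCK_MONOTONIC is used instead of the patch's CLOCK_REALTIME so that clock steps cannot distort the interval.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for `events` on fd.
   Returns the result of poll() (1 = ready, 0 = timeout, -1 = error)
   and stores the time spent waiting, in seconds, in *elapsed. */
static int timed_poll(int fd, short events, int timeout_ms, double *elapsed)
{
  struct timespec t1, t2;
  struct pollfd pfd;
  int r;

  assert(elapsed != NULL);

  pfd.fd = fd;
  pfd.events = events;
  pfd.revents = 0;

  clock_gettime(CLOCK_MONOTONIC, &t1);
  r = poll(&pfd, 1, timeout_ms);
  clock_gettime(CLOCK_MONOTONIC, &t2);

  *elapsed = (double)(t2.tv_sec - t1.tv_sec) +
             (double)(t2.tv_nsec - t1.tv_nsec) / 1e9;
  return r;
}

/* Demonstration on a pipe: writes one byte and measures how long poll()
   takes to report the read end as readable. Returns the latency in
   seconds, or -1.0 if anything failed. */
static double pipe_ready_latency(void)
{
  int fds[2];
  double dt = -1.0;

  if (pipe(fds) != 0)
    return -1.0;

  if (write(fds[1], "x", 1) != 1 ||
      timed_poll(fds[0], POLLIN, 100, &dt) != 1)
    dt = -1.0;

  close(fds[0]);
  close(fds[1]);
  return dt;
}
```

Unlike the "if (poll(...))" test in the patch, which also takes the "ready" branch when poll() returns -1, this version hands the return value back to the caller so that an error can be told apart from readiness.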