Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
Yep. Seeing lots of that. ~70% of the samples for the locally attached unit.

Denny


> On Nov 16, 2016, at 23:39, Miroslav Lichvar wrote:
> 
> On Wed, Nov 16, 2016 at 11:24:14PM -0800, Denny Page wrote:
>> To be clear, would I still expect to see ‘D H’ in the measurements log with 
>> this change?
> 
> Yes, but the second bit in the column with four test bits should
> always be zero.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Miroslav Lichvar
On Wed, Nov 16, 2016 at 11:24:14PM -0800, Denny Page wrote:
> To be clear, would I still expect to see ‘D H’ in the measurements log with 
> this change?

Yes, but the second bit in the column with four test bits should
always be zero.
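
As an aside, a check like this can be scripted against the log. Below is a
minimal sketch in C. It assumes the four A-D test bits are the eighth
whitespace-separated field of each measurements log line; that index, the
function names and the sample line in the note below are assumptions made for
illustration, not something stated in this thread.

```c
#include <assert.h>
#include <string.h>

/* ASSUMPTION: 0-based index of the field holding the four A-D test bits */
#define ABCD_FIELD 7

/* Copy the n-th (0-based) whitespace-separated field of line into out.
   Returns 1 on success, 0 if the field is missing or does not fit. */
int get_field(const char *line, int n, char *out, size_t out_len)
{
  assert(line && out && out_len > 0);

  for (;;) {
    size_t len;

    line += strspn(line, " \t");        /* skip leading blanks */
    len = strcspn(line, " \t\r\n");     /* length of this field */
    if (len == 0)
      return 0;                         /* ran out of fields */
    if (n-- == 0) {
      if (len >= out_len)
        return 0;
      memcpy(out, line, len);
      out[len] = '\0';
      return 1;
    }
    line += len;
  }
}

/* Returns 1 if the second test bit (test B) is '0', 0 if it is '1',
   and -1 if the line does not contain a well-formed four-bit field. */
int test_b_is_zero(const char *line)
{
  char abcd[8];

  if (!get_field(line, ABCD_FIELD, abcd, sizeof (abcd)) ||
      strlen(abcd) != 4 || strspn(abcd, "01") != 4)
    return -1;

  return abcd[1] == '0';
}
```

For a made-up line such as
"2016-11-16 08:00:00 192.0.2.1 N 1 111 111 1011 6 6 ..." the function returns
1, because the second of the four bits is zero.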

> Thanks,
> Denny
> 
> 
> > > On Nov 16, 2016, at 00:53, Miroslav Lichvar wrote:
> > > 
> > > Hm, the fix helped with the spikes I was seeing. Did we rule out the
> > possibility that in your case the spikes are due to the other issue with
> > out-of-order HW timestamps? Could you try it with this patch to make
> > sure only measurements with HW timestamps are used?
> > 
> > --- a/ntp_core.c
> > +++ b/ntp_core.c
> > @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
> >         prevent a synchronisation loop */
> >      testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
> >              pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> > +
> > +    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
> > +      testB = 0;
> >    } else {
> >      offset = delay = dispersion = 0.0;
> >      sample_time = rx_ts->ts;
> > 
> 

-- 
Miroslav Lichvar

-- 
To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with "unsubscribe" 
in the subject.
For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
subject.
Trouble?  Email listmas...@chrony.tuxfamily.org.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
To be clear, would I still expect to see ‘D H’ in the measurements log with 
this change?

Thanks,
Denny


> On Nov 16, 2016, at 00:53, Miroslav Lichvar wrote:
> 
> Hm, the fix helped with the spikes I was seeing. Did we rule out the
> possibility that in your case the spikes are due to the other issue with
> out-of-order HW timestamps? Could you try it with this patch to make
> sure only measurements with HW timestamps are used?
> 
> --- a/ntp_core.c
> +++ b/ntp_core.c
> @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
>         prevent a synchronisation loop */
>      testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
>              pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> +
> +    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
> +      testB = 0;
>    } else {
>      offset = delay = dispersion = 0.0;
>      sample_time = rx_ts->ts;
> 



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
Miroslav,

I’m sorry, I didn’t mean to imply that the spikes were unrelated to the 
ordering issue. I do believe that the spikes are related to the hardware 
timestamp ordering issue.

I’ll try to set up some of the tests you’ve requested later this evening, but 
it may take a day for me to have results.

Denny



> On Nov 16, 2016, at 00:53, Miroslav Lichvar wrote:
> 
> On Tue, Nov 15, 2016 at 08:24:37PM -0800, Denny Page wrote:
>> With the latest drop in the repo, I’m still seeing the wild spikes in the 
>> standard deviation with hardware time stamping against the fast responding 
>> hardware units. I'm also still seeing a better base deviation using software 
>> timestamps against them as well.
>> 
>> I do see better results with hardware time stamping when doing chrony to 
>> chrony, but I believe that this is a result of the general purpose computers 
>> being a bit slower to respond than the dedicated hardware units.
> 
> Hm, the fix helped with the spikes I was seeing. Did we rule out the
> possibility that in your case the spikes are due to the other issue with
> out-of-order HW timestamps? Could you try it with this patch to make
> sure only measurements with HW timestamps are used?
> 
> --- a/ntp_core.c
> +++ b/ntp_core.c
> @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
>         prevent a synchronisation loop */
>      testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
>              pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> +
> +    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
> +      testB = 0;
>    } else {
>      offset = delay = dispersion = 0.0;
>      sample_time = rx_ts->ts;
> 
> -- 
> Miroslav Lichvar



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
Yes, all ports on the monitoring system are identical. The I354 is a 4 port 
chip, and all the ethernet ports on the monitoring unit are connected through 
that same chip.

The general server has both I354 and I211 chips. I shut everything down on the 
server to conduct a test. It didn’t matter which chip was used for the direct 
connection and which for the switch.

Denny



> On Nov 16, 2016, at 01:53, Miroslav Lichvar wrote:
> 
> Is the port to the switch identical to the one connected to the third
> server? It would be interesting to see if the offset changes when the
> ports are swapped.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Miroslav Lichvar
On Tue, Nov 15, 2016 at 09:39:40PM -0800, Denny Page wrote:
> 
> While I can get my head around a differential of 250ns resulting from the 
> switch, I’m finding it very difficult to believe the almost 2500ns 
> differential that appears when hardware time stamping is enabled.
> 
> Any thoughts on this?

Is the port to the switch identical to the one connected to the third
server? It would be interesting to see if the offset changes when the
ports are swapped.

I'd trust HW timestamping. The 2.2us offset doesn't seem unrealistic.
There is a reason why there are switches with support for PTP. You
have exceptionally stable measurements with SW timestamping, but that
doesn't mean the asymmetry in delay and processing has to be the same
between the two ports.
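
To put a number on the asymmetry argument: with the standard NTP on-wire
formulas, a difference between the forward and return path delays shows up as
an offset error of half that difference. Here is a small sketch in C. The
function names and the delay values are made up for illustration, and both
clocks are assumed to be perfectly synchronised, so any non-zero offset is
purely the effect of the asymmetry.

```c
#include <assert.h>

struct ntp_sample {
  double offset;   /* estimated offset of the server clock, seconds */
  double delay;    /* round-trip network delay, seconds */
};

/* Standard NTP on-wire calculation:
   t1 = client transmit, t2 = server receive,
   t3 = server transmit, t4 = client receive */
struct ntp_sample ntp_calc(double t1, double t2, double t3, double t4)
{
  struct ntp_sample s;

  s.offset = ((t2 - t1) + (t3 - t4)) / 2.0;
  s.delay = (t4 - t1) - (t3 - t2);
  return s;
}

/* Simulate one exchange between two perfectly synchronised clocks with the
   given one-way delays and server processing time (all in seconds). */
struct ntp_sample simulate_exchange(double fwd_delay, double rev_delay,
                                    double server_proc)
{
  double t1, t2, t3, t4;

  assert(fwd_delay >= 0.0 && rev_delay >= 0.0 && server_proc >= 0.0);

  t1 = 0.0;
  t2 = t1 + fwd_delay;
  t3 = t2 + server_proc;
  t4 = t3 + rev_delay;
  return ntp_calc(t1, t2, t3, t4);
}
```

With a forward delay of 10.0us and a return delay of 5.6us (a 4.4us
asymmetry), simulate_exchange() reports an offset of about 2.2us and a
round-trip delay of about 15.6us, even though the two clocks agree.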

To get a better perspective, I'd suggest comparing offsets between SW
and HW timestamping at the same time. It is possible to run two chronyd
instances, one using SW timestamping, the other HW timestamping. They
need to be configured to use a different cmdport (e.g. 323 and 324)
and pidfile, and they both need all their sources to have the noselect
option, so they don't try to control the clock and interfere with each
other. You can then run "chronyc -p 323 -h ::1" and "chronyc -p 324 -h
::1" to compare the offsets. You may need to synchronize the clock
first, so the offsets are small enough to see the difference between
them.
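
The two-instance setup described above might look roughly like the following
pair of configuration files. This is only a sketch: the file names, the server
address 192.0.2.1, the interface name eth0 and the pidfile paths are
placeholders, and the hwtimestamp directive assumes a chronyd build that
includes the HW timestamping support discussed in this thread. Other
directives may need to differ between the instances as well.

```
# /etc/chrony-sw.conf (hypothetical path): SW timestamping instance
server 192.0.2.1 noselect
cmdport 323
pidfile /var/run/chronyd-sw.pid

# /etc/chrony-hw.conf (hypothetical path): HW timestamping instance
server 192.0.2.1 noselect
hwtimestamp eth0
cmdport 324
pidfile /var/run/chronyd-hw.pid
```

Each instance would be started with its own file, e.g. "chronyd -f
/etc/chrony-sw.conf", and queried on its own command port.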

-- 
Miroslav Lichvar




Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Miroslav Lichvar
On Tue, Nov 15, 2016 at 08:24:37PM -0800, Denny Page wrote:
> With the latest drop in the repo, I’m still seeing the wild spikes in the 
> standard deviation with hardware time stamping against the fast responding 
> hardware units. I'm also still seeing a better base deviation using software 
> timestamps against them as well.
> 
> I do see better results with hardware time stamping when doing chrony to 
> chrony, but I believe that this is a result of the general purpose computers 
> being a bit slower to respond than the dedicated hardware units.

Hm, the fix helped with the spikes I was seeing. Did we rule out the
possibility that in your case the spikes are due to the other issue with
out-of-order HW timestamps? Could you try it with this patch to make
sure only measurements with HW timestamps are used?

--- a/ntp_core.c
+++ b/ntp_core.c
@@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address *local_addr,
        prevent a synchronisation loop */
     testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
             pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
+
+    if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != NTP_TS_HARDWARE)
+      testB = 0;
   } else {
     offset = delay = dispersion = 0.0;
     sample_time = rx_ts->ts;

-- 
Miroslav Lichvar




Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Miroslav Lichvar
On Tue, Nov 15, 2016 at 08:14:28PM -0800, Denny Page wrote:
> The chip is actually a I354, which is slightly different than the I350, but I 
> don’t think it matters much. I also have interfaces with I211 chips, and the 
> ordering issue appears to happen there as well. I don’t think the sleep after 
> the send is going to affect the order of the timestamp and response messages 
> since both are requested at the point of the outbound send.

I think what matters is the order in which they are received from the
socket. I don't know much about kernel internals. I assume the error
queue is separate and if chronyd waits until they are both queued, the
one in the error queue will be picked first, even if the response from
the server was processed by the kernel first. If the response does
need to be received from the socket before the tx timestamp and this
is expected behavior, then I'm not sure how we will work around that.

Can you please try it again with the following patch and post the
debug output if the problem persists?

--- a/ntp_io.c
+++ b/ntp_io.c
@@ -29,6 +29,7 @@
 #include "config.h"
 
 #include "sysincl.h"
+#include <poll.h>
 
 #include "array.h"
 #include "ntp_io.h"
@@ -829,5 +830,17 @@ NIO_SendPacket(NTP_Packet *packet, NTP_Remote_Address *remote_addr,
   UTI_IPToString(&remote_addr->ip_addr), remote_addr->port,
   UTI_IPToString(&local_addr->ip_addr), local_addr->sock_fd);
 
+  struct timespec ts1, ts2;
+  struct pollfd pfd;
+  pfd.fd = local_addr->sock_fd;
+  pfd.events = POLLPRI;
+  clock_gettime(CLOCK_REALTIME, &ts1);
+  if (poll(&pfd, 1, 100)) {
+    clock_gettime(CLOCK_REALTIME, &ts2);
+    DEBUG_LOG(LOGF_NtpIO, "poll fd ready after %.9f seconds", UTI_DiffTimespecsToDouble(&ts2, &ts1));
+  } else {
+    DEBUG_LOG(LOGF_NtpIO, "poll timeout");
+  }
+
   return 1;
 }


-- 
Miroslav Lichvar
