Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
I tested with a usleep(100) following the sendmsg() call. This didn’t appear to 
have any impact. Was the usleep() intended to influence the order of timestamp 
vs. server response messages?

Denny


> On Nov 14, 2016, at 11:04, Miroslav Lichvar  wrote:
> 
> usleep() should be called after sendmsg(), e.g. in
> NIO_SendPacket().



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
Setting noselect for all sources seems to have no impact on standard deviation.

Disabling all forms of dynamic tic (CONFIG_HZ_PERIODIC=y) seems to have some 
effect in reducing the number of “D H”, but does not appear to have much of an 
impact on the standard deviation. I am returning CONFIG_NO_HZ_IDLE=y.

Denny



> On Nov 14, 2016, at 05:35, Miroslav Lichvar  wrote:
> 
> Another experiment would be to try
> configuring all sources with the noselect option and see how much
> stddev improves. If it does improve significantly, it would suggest a
> problem with the synchronization of the clock. Maybe the kernel is too
> slow (the nohz=off option might help) or there is a problem in the
> chrony's loop.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Miroslav Lichvar
On Mon, Nov 14, 2016 at 10:47:52AM -0800, Denny Page wrote:
> I’m not sure I understand what effect a delay in NIO_Linux_RequestTxTimestamp 
> would have. That I see, NIO_Linux_RequestTxTimestamp builds a control message 
> structure but does not make any level 2 calls. Introducing a delay here 
> should be the same as introducing a delay one level up in NIO_SendPacket. 
> Either way, this is before the call to sendmsg(), so the only effect I see is 
> that the original message to the server is delayed by 100us. I don’t 
> understand how this would affect the timing of the server response message, 
> or the timestamp message. Is there something that I’m missing?

No, you are right. usleep() should be called after sendmsg(), e.g. in
NIO_SendPacket(). I was thinking it would be a Linux-specific hack,
but this is a wrong place for it.

-- 
Miroslav Lichvar

-- 
To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with "unsubscribe" 
in the subject.
For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
subject.
Trouble?  Email listmas...@chrony.tuxfamily.org.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
I’m not sure I understand what effect a delay in NIO_Linux_RequestTxTimestamp 
would have. That I see, NIO_Linux_RequestTxTimestamp builds a control message 
structure but does not make any level 2 calls. Introducing a delay here should 
be the same as introducing a delay one level up in NIO_SendPacket. Either way, 
this is before the call to sendmsg(), so the only effect I see is that the 
original message to the server is delayed by 100us. I don’t understand how this 
would affect the timing of the server response message, or the timestamp 
message. Is there something that I’m missing?

Denny


> On Nov 14, 2016, at 05:17, Miroslav Lichvar  wrote:
> 
> As a quick workaround I'd suggest to try it with this change:
> 
> --- a/ntp_io_linux.c
> +++ b/ntp_io_linux.c
> @@ -465,6 +465,8 @@ NIO_Linux_RequestTxTimestamp(struct msghdr *msg, int 
> cmsglen, int sock_fd)
> {
>   struct cmsghdr *cmsg;
> 
> +  usleep(100);
> +
>   /* Check if TX timestamping is disabled on this socket */
>   if (permanent_ts_options || !NIO_IsServerSocket(sock_fd))
> return cmsglen;



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
Yes, I these on the monitoring system. For all servers/peers.

Denny


> On Nov 14, 2016, at 05:35, Miroslav Lichvar  wrote:
> 
> Do you see in measurements.log any entries with 'D K' and
> '111 111 ' in the columns with tests results (i.e. were any 'D K'
> measurements used for synchronization)? 



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Miroslav Lichvar
On Sat, Nov 12, 2016 at 10:36:55AM -0800, Denny Page wrote:
> Here is a reasonable visual representation of what I am seeing. The section 
> on the left (before 8:00) is with hardware timestamping, while the section on 
> the right is with software timestamping. maxdelaydevratio of 4 in both cases.
> 
> I think I have a kernel/driver problem.

I think I see something similar. Occasional spikes that are causing
resets of the HW clock instance. It looks like a chrony bug.

-- 
Miroslav Lichvar

-- 
To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with "unsubscribe" 
in the subject.
For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
subject.
Trouble?  Email listmas...@chrony.tuxfamily.org.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Miroslav Lichvar
On Sat, Nov 12, 2016 at 10:36:55AM -0800, Denny Page wrote:
> Here is a reasonable visual representation of what I am seeing. The section 
> on the left (before 8:00) is with hardware timestamping, while the section on 
> the right is with software timestamping. maxdelaydevratio of 4 in both cases.

Do you see in measurements.log any entries with 'D K' and
'111 111 ' in the columns with tests results (i.e. were any 'D K'
measurements used for synchronization)? 

It would be interesting to see if this improves with the workaround I
posted in the other mail. Another experiment would be to try
configuring all sources with the noselect option and see how much
stddev improves. If it does improve significantly, it would suggest a
problem with the synchronization of the clock. Maybe the kernel is too
slow (the nohz=off option might help) or there is a problem in the
chrony's loop.

-- 
Miroslav Lichvar

-- 
To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with "unsubscribe" 
in the subject.
For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
subject.
Trouble?  Email listmas...@chrony.tuxfamily.org.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Miroslav Lichvar
On Fri, Nov 11, 2016 at 10:06:25AM -0800, Denny Page wrote:
> Following is output from test 4. Please let me know if this gives you the 
> information that you are looking for.

It does. Thanks.

> 2016-11-11T17:45:06Z ntp_io.c:657:(process_message) Received 48 bytes from 
> 192.168.230.244:123 to 192.168.230.3 fd=8 if=2 tss=2 delay=0.22516
> 2016-11-11T17:45:06Z sourcestats.c:808:(SST_IsGoodSample) Bad sample: 
> offset=-0.02 delay=0.42 incr_delay=0.03 allowed=0.14
> 2016-11-11T17:45:06Z ntp_core.c:1484:(receive_packet) NTP packet lvm=44 
> stratum=1 poll=4 prec=-25 root_delay=0.00 root_disp=0.00 
> refid=47505300 []
> 2016-11-11T17:45:06Z ntp_core.c:1489:(receive_packet) 
> reference=1478886306.0 origin=882400267.756283207 
> receive=1478886306.082181777 transmit=1478886306.082186496
> 2016-11-11T17:45:06Z ntp_core.c:1491:(receive_packet) offset=0.08163 
> delay=0.42087 dispersion=0.00 root_delay=0.42 
> root_dispersion=0.00
> 2016-11-11T17:45:06Z ntp_core.c:1495:(receive_packet) test123=111 test567=111 
> testABCD=1101 kod_rate=0 interleaved=0 valid=1 good=0 updated=1
> 2016-11-11T17:45:06Z ntp_io_linux.c:444:(NIO_Linux_ProcessMessage) Received 
> 48 bytes from error queue for 192.168.230.244:123 fd=8 if=2 tss=2
> 2016-11-11T17:45:06Z ntp_core.c:1888:(update_tx_timestamp) Updated TX 
> timestamp delay=0.16447

This shows the problem is indeed in late reception of the transmit timestamp.
It comes after the response from the server is received, so the sample
is calculated with HW RX timestamp but only daemon TX timestamp.

I'm not sure if this is a driver or HW bug, or it's working as expected and
chrony just needs to be smarter and wait for the timestamp. This would be
tricky as we don't save packets and we generally don't know if there will
actually be a timestamp. I'll ask the kernel developers.

As a quick workaround I'd suggest to try it with this change:

--- a/ntp_io_linux.c
+++ b/ntp_io_linux.c
@@ -465,6 +465,8 @@ NIO_Linux_RequestTxTimestamp(struct msghdr *msg, int 
cmsglen, int sock_fd)
 {
   struct cmsghdr *cmsg;
 
+  usleep(100);
+
   /* Check if TX timestamping is disabled on this socket */
   if (permanent_ts_options || !NIO_IsServerSocket(sock_fd))
 return cmsglen;


-- 
Miroslav Lichvar

-- 
To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with "unsubscribe" 
in the subject.
For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
subject.
Trouble?  Email listmas...@chrony.tuxfamily.org.