Re: [chrony-dev] Moving chrony website

2023-07-26 Thread Denny Page


> On Jul 26, 2023, at 07:05, Miroslav Lichvar  wrote:
> 
> On Mon, Jul 24, 2023 at 05:07:13PM -0700, Denny Page wrote:
>> FWIW, it looks like chrony.net <http://chrony.net/> and chrony.org 
>> <http://chrony.org/> are just being camped on hoping for a quick sale. Both 
>> are held by the same individual (Tampa FL), so it is likely either one or 
>> both domains could be purchased for a small fee.
> 
> These two and few others seem to be owned by a Canadian company which
> used to sell shooting chronographs. Last update on their websites is
> from 2007. The contact form doesn't work. The phone number goes
> straight to voicemail. I left a message, but got no calls back.

I think the Canadian company is long gone. The domains have privacy, but the 
underlying IP information does not. Reading the tea leaves… Email for both 
chrony.net <http://chrony.net/> and chrony.com <http://chrony.com/> route to a 
company called “The Producers, Inc.” Primary DNS for both domains is handled by 
“Directnic, LLC” (directnic.com <http://directnic.com/>), which is a 
domain/hosting reseller. Both of these companies are owned by the same two 
people, and both are based in the same residence in Tampa Bay, FL.

The contact name for both is Sigmund Solares, email ssola...@theproducers.com 
<mailto:ssola...@theproducers.com>, phone +1-813-642-4377.



https://bisprofiles.com/fl/the-producers-p05000164851 

https://bisprofiles.com/fl/directnic-l2194400



Re: [chrony-dev] Moving chrony website

2023-07-24 Thread Denny Page
I think this is a really good idea.

FWIW, it looks like chrony.net and chrony.org are just being camped on in 
the hope of a quick sale. Both are held by the same individual (Tampa FL), so 
it is likely that either or both domains could be purchased for a small fee.

Failing that, I personally like chrony-project.org.

Best,
Denny


> On Jul 24, 2023, at 06:22, Miroslav Lichvar  wrote:
> 
> To make it easy to move again if/when gitlab turns out to be
> problematic, I'm considering registering a domain for chrony. All the
> nice ones like chrony.net, chrony.org,
> chrony.cd seem to be taken.
> The following domains are currently available, which I thought might
> be acceptable:
> 
> - chronyd.net 
> - chronyd.org 
> - chrony-project.net 
> - chrony-project.org 
> - chrony.network
> - chrony.dev



Re: [chrony-dev] Frequency transfer in NTP

2021-01-28 Thread Denny Page
Thank you for sending that out Miroslav. Very interesting read.

Denny


> On Jan 28, 2021, at 06:40, Miroslav Lichvar  wrote:
> 
> I guess most people here don't follow the NTP WG list. There is one
> feature proposed for NTPv5 that I think would make a big difference
> for chrony:
> 
> https://fedorapeople.org/~mlichvar/ntp-freq-transfer/
> 
> -- 
> Miroslav Lichvar
> 
> 
> -- 
> To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with 
> "unsubscribe" in the subject.
> For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
> subject.
> Trouble?  Email listmas...@chrony.tuxfamily.org.
> 





Re: [chrony-dev] new feature request: add "fast" and "slow" to "clock wrong" and "clock stepped" log messages

2017-11-14 Thread Denny Page
I think the core issue is that the log messages are intended to communicate 
expressions of multiple complex states. I think of it in this simplified way: 
This is the state from 3 minutes ago; This is the correction action that I 
computed for that state which was scheduled to be completed 2 minutes from now; 
This is the state that I computed that I should currently be in based on that 
correction process; This is the state that I am actually in, and it doesn’t 
match the computed state. All of this is very difficult to reduce to a single 
layman-oriented log message.

2017-11-14 11:39:17 chronyd: System clock appears to be unreliable

Denny





Re: [chrony-dev] new feature request: add "fast" and "slow" to "clock wrong" and "clock stepped" log messages

2017-10-30 Thread Denny Page
FWIW, I believe “ahead” and “behind” are the clearest way to describe the 
condition before initiating a correction. “Fast” and “slow” somewhat imply an 
ongoing condition that may or may not be true. The offset could, for instance, 
come from the accuracy with which the RTC was set before a reboot.

System clock ahead by ?.
System clock behind by ?.
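A minimal Python sketch of how the “ahead”/“behind” wording could be produced from a signed offset. The sign convention (positive means the system clock is ahead of true time) is an assumption for illustration only, since chronyd's signs are not consistent between messages; `describe_offset` is a hypothetical helper, not chronyd code.

```python
def describe_offset(offset_s: float) -> str:
    """Render a signed clock offset using "ahead"/"behind" wording.

    Assumption (hypothetical convention): offset_s > 0 means the system
    clock is ahead of true time, offset_s < 0 means it is behind.
    """
    direction = "ahead" if offset_s >= 0 else "behind"
    return f"System clock {direction} by {abs(offset_s):.6f} seconds"


print(describe_offset(1.693005))   # System clock ahead by 1.693005 seconds
print(describe_offset(-1.693005))  # System clock behind by 1.693005 seconds
```

Dropping the sign from the number is what makes this form harder for scripts to parse than the current +/- form.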

Denny


> On Oct 30, 2017, at 04:07, Miroslav Lichvar  wrote:
> 
> On Fri, Oct 27, 2017 at 10:31:16AM -0600, James Feeney wrote:
>> How about, rather than using the term "wrong", instead use the terms "fast" 
>> and "slow" to describe this quantity "1.693005 seconds"?  Then the log 
>> message might read:
>> 
>> chronyd[622]: System clock fast by 1.693005 seconds, adjustment started
>> 
>> or
>> 
>> chronyd[622]: System clock slow by 1.693005 seconds, adjustment started
> 
> I agree this would be much clearer for the user. I never remember
> which sign is for fast and slow in what context (it's not consistent
> unfortunately). The trouble is that it would break existing scripts
> that parse the log and the parsing itself would be more difficult if
> it had to look for the word "slow" or "fast" instead of the sign. I'm
> not sure how important that really is.
> 
> Do you think it would make sense to keep the sign and indicate whether
> it's fast or slow in parentheses?
> 
> System clock wrong by +/-?.??? (slow/fast) ?
> 
> System clock stepped by +/-?.??? (forward/backward) ?
> 
> There are other messages that print an offset, so maybe they could be
> all changed at once to keep it consistent.
> 
> What do others think?
> 
> -- 
> Miroslav Lichvar
> 





Re: [chrony-dev] [contrib] unidled: a linux idle management daemon for chronyd/gpsd

2017-10-06 Thread Denny Page
Andreas,

Okay, that’s damn creative. A very interesting alternative to disabling power 
management. :)

Kinda cool.

One thing however...

Comments from the code:

 * When using unidled, assert that all of gpsd, chronyd and unidled are set
 * to run on the same core and with realtime privilege (unidled highest,
 * followed by gpsd and then chronyd whith lowest privilege). Make sure that
 * the pps serial interrupt is served by the same core, use a script to
 * irqbalance to assert this, if required.

Setting real-time priority for gpsd and chronyd will be counterproductive, 
particularly when you have them bound to the processor that is handling the IRQ 
for the serial port. When using kernel PPS (KPPS), the PPS timestamp is set by 
the kernel IRQ handler relative to the system clock at that point. Gpsd is not 
involved in this process. When gpsd subsequently runs, it uses the kernel’s PPS 
timestamp as the basis for what it delivers to chronyd. Yes, without KPPS, gpsd 
would be the generator of the timestamp and process priority could certainly 
have an impact, but with KPPS it should not offer benefit. Likewise, while 
chronyd previously benefited from real-time priority before support for 
kernel/hardware packet timestamping was introduced, it doesn’t really offer 
benefit anymore.

Running processes at real-time priority unnecessarily, particularly if you are 
not using a preempt-able kernel, will have an adverse effect on the jitter of 
the IRQ timestamp. It makes sense to keep a high process priority for unidled, 
but you should see comparable or better results if you run gpsd and chronyd at 
normal priority. Note that IRQ jitter is heavily masked by the larger polling 
interval and median filter. If you want to get a good view of the actual PPS 
jitter you can pull the raw clock data out of refclocks.log and do calculations 
based on that. Or use ppswatch.
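A rough Python sketch of the refclocks.log calculation. It assumes the whitespace-separated layout written by chronyd's `log refclocks` option (date, time, refid, DP, L, P, raw offset, cooked offset, dispersion), with the raw offset in seconds as the seventh field; the column position is an assumption, so check the header of your own log. The sample lines are made up for illustration, not real measurements.

```python
import statistics


def pps_offset_stats(lines, refid=None, column=6):
    """Mean and population standard deviation of raw refclock offsets.

    Assumes data lines like:
      date time refid DP L P raw-offset cooked-offset dispersion
    so the raw offset (in seconds) is field index 6.
    """
    offsets = []
    for line in lines:
        fields = line.split()
        # Data lines start with a date; skip headers and "====" separators.
        if len(fields) <= column or not fields[0][:1].isdigit():
            continue
        if refid is not None and fields[2] != refid:
            continue
        try:
            offsets.append(float(fields[column]))
        except ValueError:
            # Filtered-sample lines carry "-" instead of a raw offset.
            continue
    if len(offsets) < 2:
        raise ValueError("need at least two raw samples")
    return statistics.mean(offsets), statistics.pstdev(offsets)


# Made-up example lines, not real measurements.
SAMPLE = [
    "   Date (UTC) Time         Refid  DP L P  Raw offset   Cooked offset      Disp.",
    "===============================================================================",
    "2017-10-06 12:00:00.000000 PPS0    0 N 1 -7.100e-06 -7.100e-06  2.000e-07",
    "2017-10-06 12:00:01.000000 PPS0    1 N 1 -6.900e-06 -6.900e-06  2.000e-07",
    "2017-10-06 12:00:02.000000 PPS0    2 N 1 -7.000e-06 -7.000e-06  2.000e-07",
    "2017-10-06 12:00:02.000000 PPS0    - N -       -     -7.000e-06  8.000e-08",
]

mean, stddev = pps_offset_stats(SAMPLE)
print(f"mean {mean * 1e9:.0f} ns, stddev {stddev * 1e9:.0f} ns")
```

In practice the lines would come from the refclocks.log file in chronyd's configured `logdir`. The mean roughly reflects IRQ response time, while the standard deviation is the unfiltered jitter that the median filter otherwise hides.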

FWIW, the best result I see is with a pre-emptable kernel @ 1000HZ, power 
management disabled, and a single core servicing the serial IRQ (and no other 
IRQs). I test on a C2558-based system with a serial driver that doesn’t 
support MSI. With a poll interval of 3 and a median filter length of 8, I see 
an average offset in chrony of ~7us (IRQ response time), with an average 
standard deviation of ~200ns. However if I look at the underlying PPS signal, 
the actual standard deviation is ~700ns.

Your mileage may vary.

Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-19 Thread Denny Page

> On Jan 18, 2017, at 23:28, Miroslav Lichvar  wrote:
> 
> On Wed, Jan 18, 2017 at 08:41:03PM -0800, Denny Page wrote:
>> Tx -> Rx timestamp:
>>   Uncompensated         3175ns  (stddev 110ns)
>>   Compensated           3175ns
>> 
>> Connection types:     Expected    Error
>>   Regular switch        6721ns  -3546ns
>>   Cut-through switch    2081ns   1094ns
>>   loopback cable           1ns   3174ns
>> 
>> Intel’s predicted result is 3177ns.
> 
> Interesting. So 3174ns is what txcomp+rxcomp should be for the i211?

Yes, although I’ve seen numbers from 3172 to 3182. Intel's 3177 number is right 
in the middle of the range.

In the data sheet for the i210 & i211, Intel says the 100Mb min/max range for 
tx is 984/1024 ns, and 2148/2228 ns for rx. However, they also say that when 
measured against an external link partner, there is “a shift of approximately 
40ns”, and they give the following average values as seen externally:

  Parameter                              Average
  Tx timestamp to start of SFD on MDI    1044 ns
  Start of SFD on MDI to Rx timestamp    2133 ns
  Tx + Rx latency                        3177 ns

  Comment: The range (max minus min) values measured for the Tx and Rx
  latency parameters are similar to the measured parameters in a
  stand-alone setup.


I’m trusting these values as they correspond quite closely to my test results.

The kernel driver for the i210 uses 1024/2213. Based on both Intel’s data and 
my own tests, these appear to be wrong.

Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-18 Thread Denny Page
I’ve pushed a new version of ethtscal that greatly improves the 
accuracy/stability of results. With runs of 5 packets, I’m seeing 
consistency of +-5ns on loopback. If you pulled an older version, I recommend 
pulling a new one. 

Here is a sample run with i211 chips with loopback cable:

Iterating over 10 packets (minimum runtime 200 seconds)

Tx interface igb0, speed 100Mb, adddress 0:8:a2:9:5b:8b
Rx interface igb1, speed 100Mb, adddress 0:8:a2:9:5b:8a

Tx avg ptp offset -200414826ns, 15 of 25 samples used
Rx avg ptp offset -96810099ns, 15 of 25 samples used

Misc times:
  Tx before send  ->  after send      8108ns
  Tx before send  ->  tx timestamp   15481ns
  Tx after send   ->  before poll   224578ns
  Tx after send   ->  after poll    231414ns
  Rx timestamp    ->  poll return    27296ns

Compensation values:
  Tx timestamp             0ns
  Rx timestamp             0ns
  Switch port to port      0ns
  Cable length             1ns  (0.3m)
  Total                    1ns

Tx -> Rx timestamp:
  Uncompensated         3175ns  (stddev 110ns)
  Compensated           3175ns

Connection types:     Expected    Error
  Regular switch        6721ns  -3546ns
  Cut-through switch    2081ns   1094ns
  loopback cable           1ns   3174ns

Intel’s predicted result is 3177ns.
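Judging from the numbers, the Error column is the compensated Tx -> Rx time minus each connection type's expected value. A short Python check of that reading; the expected values are copied from the output above, and `error_table` is a hypothetical helper, not ethtscal's code.

```python
def error_table(measured_ns, expected_ns):
    """Error = measured Tx -> Rx time minus the expected time per connection type."""
    return {name: measured_ns - exp for name, exp in expected_ns.items()}


# Expected values copied from the ethtscal output above.
EXPECTED_NS = {
    "Regular switch": 6721,
    "Cut-through switch": 2081,
    "loopback cable": 1,
}

for name, err in error_table(3175, EXPECTED_NS).items():
    print(f"  {name:20s} {err:6d}ns")
```

On a loopback cable the error is essentially the uncompensated txcomp + rxcomp, which is why it lines up with Intel's 3177 ns figure.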

Denny




Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-18 Thread Denny Page

> On Jan 18, 2017, at 09:56, Miroslav Lichvar  wrote:
> 
> Avoiding the asymmetry on the PCIe bus is what I'm most interested in.

I’m not sure how much asymmetry there is on the PCIe bus, or how it affects 
PTP_SYS_OFFSET.

PCIe asymmetry/delay won't affect the actual timestamps. To my knowledge, the 
only asymmetry affecting those is in the PHY connection.

Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-18 Thread Denny Page

> On Jan 18, 2017, at 09:11, Denny Page  wrote:
> 
> 
>> On Jan 18, 2017, at 05:43, Miroslav Lichvar wrote:
>> Denny,
>> 
>> there is apparently a new ioctl for measuring the offset between the
>> NIC clock and system clock, which is supported on some onboard NICs
>> (that share clock with the CPU?). It's called PTP_SYS_OFFSET_PRECISE
>> and if I understand how it works correctly, it should solve all these
>> problems with delay and asymmetry. Is it supported on the i211 or
>> i350?
> 
> 
> Miroslav,
> 
> Unfortunately, no. PTP_SYS_OFFSET_PRECISE uses PCIE_PTM (Precise Time 
> Measurement) which requires newer PCIe chipsets. I don’t have a system that 
> supports PTM. Best that I can tell, the only Ethernet driver in the kernel 
> that currently offers PTM support is the Intel PCI-Express pro/1000 (E1000E), 
> and I don’t have one of those either. Otherwise I would have added support 
> for it in ethtscal. :)
> 
> https://pcisig.com/sites/default/files/specification_documents/ECN_PTM_Revision1a_31_Mar_2013.pdf
> 
> Denny
> 


And to be clear, this would not address phy/mac delay/asymmetry issues. It 
would just replace the call to PTP_SYS_OFFSET. Instead of an array that you 
have to interpolate, you just get a single (accurate) value. This would be very 
helpful however, because PTP_SYS_OFFSET is the source of most of the variance 
in ethtscal.
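For comparison, a Python sketch of the interpolation PTP_SYS_OFFSET requires. The ioctl returns interleaved readings (system, PHC, system, PHC, ..., system), so each PHC reading is bracketed by two system-clock readings and the offset is estimated against their midpoint. Picking the reading with the shortest bracket is one common choice, not necessarily what chrony or ethtscal do; the timestamps below are made up. Any asymmetry in the PCIe read would show up as a bias in that midpoint assumption.

```python
def offset_from_sys_offset(ts_ns):
    """Estimate PHC - system clock offset from PTP_SYS_OFFSET-style readings.

    ts_ns is the interleaved array [sys0, phc0, sys1, phc1, ..., sysN] in
    nanoseconds. Each PHC reading is bracketed by two system readings and is
    assumed to have been latched at their midpoint.
    Returns (offset_ns, delay_ns) of the reading with the shortest bracket.
    """
    if len(ts_ns) < 3 or len(ts_ns) % 2 == 0:
        raise ValueError("expected 2 * n_samples + 1 timestamps")
    best = None
    for i in range(0, len(ts_ns) - 2, 2):
        before, phc, after = ts_ns[i], ts_ns[i + 1], ts_ns[i + 2]
        delay = after - before
        offset = phc - (before + after) / 2
        if best is None or delay < best[1]:
            best = (offset, delay)
    return best


# Two made-up readings: the second has the tighter bracket and wins.
print(offset_from_sys_offset([1000, 501500, 3000, 503200, 3800]))
```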

Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-18 Thread Denny Page

> On Jan 18, 2017, at 05:43, Miroslav Lichvar  wrote:
> Denny,
> 
> there is apparently a new ioctl for measuring the offset between the
> NIC clock and system clock, which is supported on some onboard NICs
> (that share clock with the CPU?). It's called PTP_SYS_OFFSET_PRECISE
> and if I understand how it works correctly, it should solve all these
> problems with delay and asymmetry. Is it supported on the i211 or
> i350?


Miroslav,

Unfortunately, no. PTP_SYS_OFFSET_PRECISE uses PCIE_PTM (Precise Time 
Measurement) which requires newer PCIe chipsets. I don’t have a system that 
supports PTM. Best that I can tell, the only Ethernet driver in the kernel that 
currently offers PTM support is the Intel PCI-Express pro/1000 (E1000E), and I 
don’t have one of those either. Otherwise I would have added support for it in 
ethtscal. :)

https://pcisig.com/sites/default/files/specification_documents/ECN_PTM_Revision1a_31_Mar_2013.pdf
 


Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-11 Thread Denny Page
The compensation values appear to be functioning as expected.

Thanks,
Denny


> On Jan 10, 2017, at 03:21, Miroslav Lichvar  wrote:
> 
> Anyway, could you please confirm the new txcomp and rxcomp options in
> the hwtimestamp directive work for you as expected?



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-10 Thread Denny Page

> On Jan 10, 2017, at 03:21, Miroslav Lichvar  wrote:
> 
> On Mon, Jan 09, 2017 at 09:02:14AM -0800, Denny Page wrote:
>> The program is up if you want to have a look. I am still working on the 
>> guidelines for use.
>> 
>>  https://github.com/dennypage/ethtscal
>> 
>> My testing has been principally focused on the i354. I will also be testing 
>> the i211, which I believe should be very close to the i210, but I need to 
>> finish the i354 first. Unfortunately it’s a long iterative process to 
>> balance the tx and rx values. 
> 
> This is very interesting and I'm curious to see how you balance the
> tx and rx values. If I know the expected delay is x and I measure
> delay y, how do I distribute the (x-y) difference to the tx and rx
> comp values? Is there some symmetry in the RX and TX timestamping
> error?

Well, that’s the bad part—without a scope or other interval hardware device the 
only way to determine the values is by trial and error. The short version is 
that you establish a baseline for different speeds using loopback cables, and 
then work the values with different speeds through a switch until all the 
values fit. A long and painful process. And it doesn’t help that switches 
change latency between 10/100Mb and 1Gb.

The tx and rx compensation values are asymmetric on the Intel chips I’ve 
tested. Between 1:2.1 and 1:2.6 depending upon speed.
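If the rx:tx ratio for a chip family is roughly known, one hypothetical way to get starting values for that iteration is to split the loopback total (measured Tx -> Rx time minus cable propagation, i.e. txcomp + rxcomp) by that ratio. This is a sketch of a first guess only, not the trial-and-error method described above; the 3175 ns, 1 ns and 2.1 inputs are taken from the i211 loopback run and ratio range mentioned in this thread purely as an example.

```python
def initial_comp_guess(loopback_ns, cable_ns, rx_to_tx_ratio):
    """Split a loopback Tx -> Rx measurement into starting txcomp/rxcomp values.

    Hypothetical heuristic: assumes rxcomp = ratio * txcomp.
    """
    total = loopback_ns - cable_ns           # should equal txcomp + rxcomp
    txcomp = total / (1 + rx_to_tx_ratio)
    rxcomp = total - txcomp
    return txcomp, rxcomp


tx, rx = initial_comp_guess(3175, 1, 2.1)
print(round(tx), round(rx))
```

The values would still need to be checked through switches at each link speed, since a loopback only constrains the sum.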

Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-09 Thread Denny Page
The program is up if you want to have a look. I am still working on the guidelines 
for use.

  https://github.com/dennypage/ethtscal

My testing has been principally focused on the i354. I will also be testing the 
i211, which I believe should be very close to the i210, but I need to finish 
the i354 first. Unfortunately it’s a long iterative process to balance the tx 
and rx values. 

Denny


> On Jan 06, 2017, at 10:22, Denny Page  wrote:
> 
> I have a program that will test this for you. Give me a few days to finish 
> cleaning it up.
> 
> Denny
> 
> 
>> On Jan 06, 2017, at 05:04, Miroslav Lichvar wrote:
>> 
>> Could the measured peer delay be used to check if the combined TX/RX
>> compensation is correct? If I have two identical NICs connected
>> directly with a short cable on 1gbit, what is the expected peer delay?
>> With no compensation I now see a delay of about 1450 ns between two i210
>> NICs.
> 



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-06 Thread Denny Page
I have a program that will test this for you. Give me a few days to finish 
cleaning it up.

Denny


> On Jan 06, 2017, at 05:04, Miroslav Lichvar  wrote:
> 
> Could the measured peer delay be used to check if the combined TX/RX
> compensation is correct? If I have two identical NICs connected
> directly with a short cable on 1gbit, what is the expected peer delay?
> With no compensation I now see a delay of about 1450 ns between two i210
> NICs.



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-06 Thread Denny Page

> On Jan 06, 2017, at 05:04, Miroslav Lichvar  wrote:
> 
> On Thu, Jan 05, 2017 at 10:15:08AM -0800, Denny Page wrote:
>> Yes, comp is an abbreviation for compensation. One will always be an add and
>> the other always a subtract. This is because the timestamps are defined
>> as being at the phy layer (last bit of the SFD), but the timestamps are
>> being taken at the mii layer. For inbound, the phy layer precedes the mii
>> layer which means that the timestamp has to be adjusted backward in time,
>> hence the subtract. For outbound, the phy layer succeeds the mii layer which
>> means that the timestamp has to be adjusted forward in time, hence the add.
>> Have a look at the igb kernel driver or linuxptp for examples.
> 
> Ok. I thought the drivers were supposed to adjust the timestamp for
> the (minimum?) MAC-PHY error.

That would be nice, but that doesn’t look like it’s going to happen. The only 
Intel driver that has compensation added in the kernel distribution is the 
i210. There is no compensation done for the i211, i350, etc. Notably, the 
official Intel driver, which is module only, does not have compensation for any 
device including the i210. The compensation that was introduced for the i210 in 
the kernel distribution was based on latency estimates from Intel in the i210 
data sheet. While the information for 100Mb is still present, Intel has 
withdrawn the estimates for 10Mb and 1Gb, and they are no longer in the 
data sheet. Lastly, if one trusts the remaining 100Mb information, the values 
used in the kernel are incorrect. They took the tx compensation from table 
7-62, which is a stand-alone test, and the rx compensation from table 7-63, 
which is a partner test. And even there, the rx number appears to have been 
mistyped: the data sheet says “2133” while the kernel has “2213”. The correct 
numbers for tx/rx compensation according to the data sheet are 1024/2228 for 
stand-alone and 1044/2133 for partner observed. I raised this twice on the 
Intel wired kernel list with no response, so I doubt it’s going to get fixed.
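A quick Python check of the arithmetic. The sum of each tx/rx pair is what a loopback measurement should see (plus cable propagation), so it can be compared with the roughly 3174 ns txcomp + rxcomp estimate for the i211 at 100Mb discussed in this thread. This assumes the i211 behaves like the i210 and is illustrative, not a calibration.

```python
# (tx, rx) 100Mb compensation pairs in ns, as quoted in this message.
PAIRS_NS = {
    "data sheet, stand-alone": (1024, 2228),
    "data sheet, partner observed": (1044, 2133),
    "igb kernel driver": (1024, 2213),
}

# i211 loopback estimate of txcomp + rxcomp discussed in this thread (100Mb).
LOOPBACK_TOTAL_NS = 3174

totals = {name: tx + rx for name, (tx, rx) in PAIRS_NS.items()}
for name, total in totals.items():
    print(f"{name:30s} {total} ns ({total - LOOPBACK_TOTAL_NS:+d} ns vs loopback)")
```

Only the partner-observed pair lands within a few nanoseconds of the loopback estimate.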

If you want to have a look, see the following links.

Kernel source:
  https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/igb/igb_ptp.c
  https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/igb/igb.h

Intel data sheet (page 341):
  http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/i210-ethernet-controller-datasheet.pdf

Official Intel driver:
  https://sourceforge.net/projects/e1000/files/igb%20stable/5.3.5.4

In short, I think it’s much better to go the way of linuxptp and offer 
compensation in user space.

Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-05 Thread Denny Page
Yes, comp is an abbreviation for compensation. One will always be an add and the 
other always a subtract. This is because the timestamps are defined as being 
at the phy layer (last bit of the SFD), but the timestamps are being taken at 
the mii layer. For inbound, the phy layer precedes the mii layer which means 
that the timestamp has to be adjusted backward in time, hence the subtract. For 
outbound, the phy layer succeeds the mii layer which means that the timestamp 
has to be adjusted forward in time, hence the add. Have a look at the igb 
kernel driver or linuxptp for examples.
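A minimal Python sketch of that add/subtract convention, using the txcomp/rxcomp semantics proposed in this thread. The 1024/2150 ns compensation values and the timestamps are hypothetical, chosen so that a 3175 ns loopback reduces to about 1 ns of cable propagation; `compensate` is an illustrative helper, not chrony or driver code.

```python
def compensate(tx_ts_ns, rx_ts_ns, txcomp_ns, rxcomp_ns):
    """Move MII-layer hardware timestamps to the PHY (wire) reference point.

    TX: the frame reaches the wire after the MAC stamps it, so add txcomp.
    RX: the frame was on the wire before the MAC stamped it, so subtract rxcomp.
    """
    return tx_ts_ns + txcomp_ns, rx_ts_ns - rxcomp_ns


# Hypothetical loopback: raw stamps 3175 ns apart.
tx, rx = compensate(0, 3175, 1024, 2150)
print(rx - tx)  # what remains is roughly the cable propagation time
```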

I understand the desire to keep everything consistent in terms of seconds. I am 
concerned that this will lead to a lot of errors on people’s part because the 
values are always going to be two digit microseconds or under. For 100Mb on an 
i354 (more compensation than an i210/i211) the values would be:

  hwtimestamp igb0 txcomp 0.01188 rxcomp 0.02566

And for the more common 1Gb:

  hwtimestamp igb0 txcomp 0.00300 rxcomp 0.00645

As networks get faster, these compensations will get smaller.
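To make the unit concern concrete, a small Python sketch converting the nanosecond figures from the earlier txcomp/rxcomp proposal (178 and 448 ns) into the seconds form a hwtimestamp line would need if the options took seconds. Using `Decimal` avoids binary floating-point noise in the printed value; the helper names are hypothetical.

```python
from decimal import Decimal


def ns_to_seconds(ns: int) -> str:
    """Render a nanosecond value as an exact decimal number of seconds."""
    return format(Decimal(ns) / Decimal(10**9), "f")


def seconds_to_ns(text: str) -> int:
    """Parse a seconds value back into whole nanoseconds."""
    return int(Decimal(text) * 10**9)


line = f"hwtimestamp igb0 txcomp {ns_to_seconds(178)} rxcomp {ns_to_seconds(448)}"
print(line)
```

Six zeros after the decimal point for sub-microsecond values is exactly the kind of thing that is easy to miscount by hand.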

This is derived from the description used by linuxptp:

txcomp
Specifies the difference in nanoseconds between the actual transmission
time at the physical layer and the reported transmit time stamp. This
value will be added to transmit time stamps obtained from the network
card. The default is 0.
rxcomp
Specifies the difference in nanoseconds between the reported receive
time stamp and the actual reception time at the physical layer. This value
will be subtracted from receive time stamps obtained from the network
card. The default is 0.

Does this work for you?

Denny



> On Jan 5, 2017, at 03:53, Miroslav Lichvar  wrote:
> 
> On Mon, Jan 02, 2017 at 10:54:16PM -0800, Denny Page wrote:
>> That would be great. For option names, I would suggest something like:
>> 
>>  hwtimestamp igb0 txcomp 178 rxcomp 448
>> 
>> txcomp is the number of nanoseconds to add to tx timestamps
>> rxcomp is the number of nanoseconds to subtract from rx timestamps
> 
> Ok, this looks doable.
> 
> comp as short for compensation? Why is one adding and the other
> subtracting? Would it be ok with you if the values were specified in
> seconds to be consistent with other options?
> 
> If you could prepare a short description (or patch) for these options
> I could put in the man page, that would be great too :).
> 
> -- 
> Miroslav Lichvar
> 





Re: [chrony-dev] SW/HW timestamping on Linux

2017-01-02 Thread Denny Page
That would be great. For option names, I would suggest something like:

  hwtimestamp igb0 txcomp 178 rxcomp 448

txcomp is the number of nanoseconds to add to tx timestamps
rxcomp is the number of nanoseconds to subtract from rx timestamps

Thanks,
Denny


> On Jan 02, 2017, at 05:47, Miroslav Lichvar  wrote:
> 
> On Mon, Dec 19, 2016 at 10:36:15PM -0800, Denny Page wrote:
>> Yes, I think offset covers the delayAsymmetry feature. The other two, 
>> latency for tx and rx would be very, very helpful. Any chance of that 
>> happening prior to release?
> 
> Maybe. I'll see how invasive that change would be. I have now only few
> minor changes in the queue with some additions to the testsuite. I'd
> like to do another round of testing and make a final release the next
> week.
> 
> Can you suggest names for these options? I assume they would be an
> additional correction to the one derived from the length of the packet
> and link speed.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-19 Thread Denny Page
Yes, I think offset covers the delayAsymmetry feature. The other two, latency 
for tx and rx would be very, very helpful. Any chance of that happening prior 
to release?

Thanks,
Denny


> On Dec 15, 2016, at 00:30, Miroslav Lichvar  wrote:
> 
> On Wed, Dec 14, 2016 at 10:34:19PM -0800, Denny Page wrote:
>> Btw, it might be good for Chrony to support configuration for corrections 
>> similar to what is discussed in that thread:
>> 
>>  delayAsymmetry - corrects for any asymmetry between the Rx and Tx paths
>>  egressLatency  - corrects the transmit latency at the MAC/PHY
>>  ingressLatency - corrects the receive latency at the MAC/PHY
> 
> I think the offset option in chrony is basically the same as
> delayAsymmetry, but there is nothing like the other two options. They
> could be specified in the hwtimestamp directive and they would be
> useful on servers.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-14 Thread Denny Page
Btw, it might be good for Chrony to support configuration for corrections 
similar to what is discussed in that thread:

  delayAsymmetry - corrects for any asymmetry between the Rx and Tx paths
  egressLatency  - corrects the transmit latency at the MAC/PHY
  ingressLatency - corrects the receive latency at the MAC/PHY
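To see why the egress/ingress corrections matter separately from delayAsymmetry, here is a Python sketch using the standard NTP offset/delay formulas. Uncompensated latencies on one side inflate the measured delay by their sum and bias the offset by half their difference. The server is assumed ideal, and the 178/448 ns latencies are borrowed from the hwtimestamp example in this thread purely for illustration.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP calculation; all timestamps in nanoseconds.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Wire times for a hypothetical exchange: clocks agree, 500 ns each way.
T1, T2, T3, T4 = 0, 500, 10_500, 11_000

# Hypothetical client MAC/PHY latencies (server assumed ideal).
EGRESS, INGRESS = 178, 448

t1_raw = T1 - EGRESS   # MAC stamps TX before the frame reaches the wire
t4_raw = T4 + INGRESS  # MAC stamps RX after the frame has left the wire

raw = ntp_offset_delay(t1_raw, T2, T3, t4_raw)
fixed = ntp_offset_delay(t1_raw + EGRESS, T2, T3, t4_raw - INGRESS)
print(raw)    # offset biased by (EGRESS - INGRESS) / 2, delay by EGRESS + INGRESS
print(fixed)  # true offset and delay recovered
```

A constant offset correction can cancel the bias, but only per-direction latency corrections bring the measured delay back to the true path delay.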

Denny


> On Dec 13, 2016, at 23:21, Miroslav Lichvar  wrote:
> 
> [1] https://sourceforge.net/p/linuxptp/mailman/message/34782351/





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-14 Thread Denny Page
Interesting! I would not have thought of this. Both the i211 and i354 have the 
SDP pins. Not exactly a pre-made PPS input, but possible with some soldering 
and some code.

Denny


> On Dec 13, 2016, at 23:21, Miroslav Lichvar  wrote:
> 
> On Tue, Dec 13, 2016 at 05:03:15PM -0800, Denny Page wrote:
>> Unfortunately, I’ve not yet received any response on the Intel list. While 
>> it’s quite clear that both the i211 and i354 require some compensation, I 
>> don’t have a good way to determine it other than by shots in the dark. The 
>> i211 seems to correspond with the i210 correction parameters, so I am 
>> currently using those for the i211, but the i354 is completely different. 
>> Still at it.
> 
> Do you know if the i211 or i354 has a PPS input? i210 does have one
> [1]. I have not had a chance to try it yet. If you could connect it to
> the LeoNTP unit, maybe you could measure the difference between errors
> in TX and RX timestamping, and with direct connection to the server
> maybe even estimate the absolute value of the TX/RX corrections.
> 
> [1] https://sourceforge.net/p/linuxptp/mailman/message/34782351/





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-13 Thread Denny Page
Also, would it be possible to allow hardware timestamps to be used on one 
interface, and software timestamps on another (without falling back to daemon 
timestamps)?

Thanks,
Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-13 Thread Denny Page
Unfortunately, I’ve not yet received any response on the Intel list. While it’s 
quite clear that both the i211 and i354 require some compensation, I don’t have 
a good way to determine it other than by shots in the dark. The i211 seems to 
correspond with the i210 correction parameters, so I am currently using those 
for the i211, but the i354 is completely different. Still at it.

Denny


> On Dec 7, 2016, at 23:04, Denny Page  wrote:
> 
> On the offset issue… one of the benefits of having so many ports, and several 
> hardware-based NTP units (thanks Leo/Anthony!), is that I have been able to 
> replicate the issue by connecting several ports directly from host to host 
> with and without an intervening switch. I’ve also been able to test different 
> chipsets. The short version of all the tests is that there is clear asymmetry 
> with hardware timestamps. Based on research and the fact that the asymmetry 
> varies with chipset, my best guess is that the hardware tx/rx timestamps are 
> incorrect and need compensation in the driver. The 4.8.X kernel introduced 
> compensation for the i210, but not for the i211 or the i354 (which I am 
> testing with). I have a query into the Intel folk about this and will let you 
> know what they say.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-07 Thread Denny Page
Thank you


> On Apr 11, 2017, at 12:14, Miroslav Lichvar  wrote:
> 
> On Wed, Dec 07, 2016 at 11:26:19PM -0800, Denny Page wrote:
>> Are you running linuxptp to manage the clock or letting it free-run? I have 
>> some ptp servers, but have been focused on testing ntp with chrony managing 
>> the clock.
> 
> The PHC is free running. As long as the system clock can stay close to
> it, I think it shouldn't matter.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-07 Thread Denny Page
Are you running linuxptp to manage the clock or letting it free-run? I have 
some ptp servers, but have been focused on testing ntp with chrony managing the 
clock.

Thanks,
Denny


> On Apr 11, 2017, at 11:56, Miroslav Lichvar  wrote:
> 
> I've synchronised the system clock to the PHC with
> "phc2sys -s eth0 -m -q -O 0 -N 25 -R 20", configured chronyd to not
> adjust the system clock (noselect), and printed in chronyd the raw
> PHC-sys offset and delay for each PHC reading.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-07 Thread Denny Page
On the offset issue… one of the benefits of having so many ports, and several 
hardware-based NTP units (thanks Leo/Anthony!), is that I have been able to 
replicate the issue by connecting several ports directly from host to host with 
and without an intervening switch. I’ve also been able to test different 
chipsets. The short version of all the tests is that there is clear asymmetry 
with hardware timestamps. Based on research and the fact that the asymmetry 
varies with chipset, my best guess is that the hardware tx/rx timestamps are 
incorrect and need compensation in the driver. The 4.8.X kernel introduced 
compensation for the i210, but not for the i211 or the i354 (which I am testing 
with). I have a query into the Intel folk about this and will let you know what 
they say.

Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-07 Thread Denny Page

> On Dec 07, 2016, at 09:20, Miroslav Lichvar  wrote:
> 
> Yes, I think that might be helpful. I spent some time today comparing
> your method with the current code and at least on my system with i210
> I see a shift in the distribution of the offset to one side when the
> network is (heavily) loaded. Compare these two histograms
> 
> http://i.imgur.com/GsRwhyX.png (min delay * 1.1)
> http://i.imgur.com/MHYRCCx.png (min delay + sys_prec)

Can you say a bit about how these graphs were done? I would like to compare to 
the systems I have.

Have you tested any NICs other than the I210?

Thanks,
Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-06 Thread Denny Page
Not sure what you meant here. The current code bases all its calculations of 
intervals on the start time of slot zero. Even if slot zero is bad, its total 
window is used in each time interval calculation. In the new code, a bad slot 
is never used in any way, even if it is slot zero.

Denny



> On Dec 6, 2016, at 01:43, Miroslav Lichvar  wrote:
> 
>> The change does a couple of things. First, it ignores bad slots. On 
>> all the systems I’ve tested, slot one is always the worst slot, with long 
>> delays, resulting in a baseline skew.
> 
> The current code does that too.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-06 Thread Denny Page
Yes, the general server that I refer to, and which the graphs are from, is 
fairly active on the primary nic. It’s the network log server, does some DB, 
etc.

On the quiet system, the change has much less impact. But it was the general 
server that I was frequently seeing small spikes on.

I can certainly load the network a lot more if you would find it beneficial for 
me to test that.

Denny


> On Dec 6, 2016, at 01:43, Miroslav Lichvar  wrote:
> 
> Have you tried it on a system where the network card is busy?



Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-05 Thread Denny Page
And a visual representation. This is from the loaded system.




Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-05 Thread Denny Page
Here is an example of where I am at after the change:

210 Number of sources = 6
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240               1    0   377      0    +24ns[  +22ns] +/-   13us
^+ 192.168.230.244               1    0   377      0    +35ns[  +35ns] +/-   13us
^? 192.168.230.245               1    0   377      0  +1323ns[+1323ns] +/-   10us
=? 192.168.230.2                 2    1   377      3     +5ns[   +3ns] +/-   36us
=? 192.168.231.2                 2    1   377      1   +983ns[ +983ns] +/-   32us
=? 192.168.232.2                 2    1   377      3   +884ns[ +882ns] +/-   53us
210 Number of sources = 6
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            43  26    57     -0.000      0.001    +9ns     37ns
192.168.230.244            40  23    52     +0.000      0.001    -7ns     33ns
192.168.230.245            40  24    53     +0.000      0.001 +1281ns     34ns
192.168.230.2              22  13    45     -0.001      0.002   -29ns     38ns
192.168.231.2              21  12    45     +0.001      0.002  +971ns     42ns
192.168.232.2              33  17    73     -0.001      0.002  +854ns     75ns

The .240, .244 and .245 sources are the hardware units. All the .2 hosts run chrony.

Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-05 Thread Denny Page

> On Dec 05, 2016, at 01:05, Miroslav Lichvar  wrote:
> 
> If I understand your change correctly, it increases the minimum
> acceptable delay, which increases the number of combined readings and
> makes the offset more stable. I'm worried on a busy system bad
> readings could be included and disrupt the mean value.

On busier systems, I was actually seeing more error than on quiet systems. 
This brings the noise and spike level on the busy system down to the same level 
as the quiet system.

The change does a couple of things. First, it ignores bad slots. On all 
the systems I’ve tested, slot one is always the worst slot, with long delays, 
resulting in a baseline skew. The second is avoiding the SysPrecision variable 
as a gate. Using this results in a tendency to select the wider slots for 
averaging, while ignoring the consistent slots. In my testing on busy systems, 
it discarded many slots, sometimes to the point of all but two slots. This is 
obviously bad for averaging. My first attempt was simply to remove the 
averaging and use the best slot. This provided an improvement on the noise over 
the prior approach, but using the slots within 10% is smoother.

FWIW, I haven’t dug into how SysPrecision is calculated, but in looking at 
several systems it appears to be inconsistent on identical hardware.


> Note that if you save the offset between PHC and system clock to a
> double, you will lose precision when the offset is large as double has
> a 53-bit precision.

I agree. I think all of this should be done using 64 bit integers, but the code 
that was there was using floating point so I stayed with that to fit in. Use of 
doubles seems rather pervasive throughout chrony. I was very surprised by this. 
I think it makes sense to use integer math for as much as possible. At least 
for anything to do with time intervals. But it seemed a bit much to rewrite 
everything to address this one issue. :)

Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2016-12-04 Thread Denny Page
I’m still chasing the offset issue. However on the spikes/noise issue, the 
patch below addresses the noise and low level spikes that I was seeing. I was 
seeing errors of 2-60ns in the average offset for the NIC clock, which this 
corrects. I’m still seeing the occasional larger spike, but this does not 
appear to be related to issues with PTP_SYS_OFFSET. I’m speculating that the 
large spikes are associated with the kernel preemption, but haven’t had time to 
test this theory.

Denny


*** ntp_io_linux.c.org  Wed Nov 23 08:41:54 2016
--- ntp_io_linux.c  Sun Dec  4 22:32:45 2016
***************
*** 288,295 ****
  {
    struct ptp_sys_offset sys_off;
    struct timespec ts1, ts2, ts3, phc_tss[PHC_READINGS], sys_tss[PHC_READINGS];
!   double min_delay = 0.0, delays[PHC_READINGS], phc_sum, local_sum, local_prec;
!   int i, n;
  
    /* Silence valgrind */
    memset(&sys_off, 0, sizeof (sys_off));
--- 288,295 ----
  {
    struct ptp_sys_offset sys_off;
    struct timespec ts1, ts2, ts3, phc_tss[PHC_READINGS], sys_tss[PHC_READINGS];
!   double delays[PHC_READINGS], delay_limit, delay_sum, offset_sum;
!   int min_delay = 0, i, n;
  
    /* Silence valgrind */
    memset(&sys_off, 0, sizeof (sys_off));
***************
*** 317,344 ****
        /* Step in the middle of a PHC reading? */
        return 0;
  
!     if (!i || delays[i] < min_delay)
!       min_delay = delays[i];
    }
  
-   local_prec = LCL_GetSysPrecisionAsQuantum();
- 
    /* Combine best readings */
!   for (i = n = 0, phc_sum = local_sum = 0.0; i < PHC_READINGS; i++) {
!     if (delays[i] > min_delay + local_prec)
        continue;
  
!     phc_sum += UTI_DiffTimespecsToDouble(&phc_tss[i], &phc_tss[0]);
!     local_sum += UTI_DiffTimespecsToDouble(&sys_tss[i], &sys_tss[0]) + delays[i] / 2.0;
      n++;
    }
  
    assert(n);
  
!   UTI_AddDoubleToTimespec(&phc_tss[0], phc_sum / n, phc_ts);
!   UTI_AddDoubleToTimespec(&sys_tss[0], local_sum / n, &ts1);
    LCL_CookTime(&ts1, local_ts, NULL);
!   *p_delay = min_delay;
  
    return 1;
  }
--- 317,345 ----
        /* Step in the middle of a PHC reading? */
        return 0;
  
!     if (delays[i] < delays[min_delay])
!       min_delay = i;
    }
  
    /* Combine best readings */
!   delay_limit = delays[min_delay] * 1.1;
!   for (i = n = 0, delay_sum = offset_sum = 0.0; i < PHC_READINGS; i++) {
!     if (delays[i] > delay_limit)
        continue;
  
!     delay_sum += delays[i];
!     UTI_AddDoubleToTimespec(&sys_tss[i], delays[i] / 2.0, &ts1);
!     offset_sum += UTI_DiffTimespecsToDouble(&phc_tss[i], &ts1);
      n++;
    }
  
    assert(n);
  
!   ts1 = sys_tss[min_delay];
!   UTI_AddDoubleToTimespec(&ts1, offset_sum / n, phc_ts);
! 
    LCL_CookTime(&ts1, local_ts, NULL);
!   *p_delay = delays[min_delay];
  
    return 1;
  }





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-28 Thread Denny Page
> On Nov 28, 2016, at 01:01, Miroslav Lichvar  wrote:
> 
> If you are sure the error doesn't come from the switch, I suspect it's
> a HW or driver issue. It seems the drivers need to have some
> timestamping-specific magic. Look at the following commit for
> instance.
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0066c8b6f4050d7c57f6379d6fd4535e2f267f17


This change is why I asked what kernel version you were testing with on your 
I210. Intel documents these time offsets in the spec sheet for the I210. There 
are no offsets listed for the I211 or I354.


> I'd suggest to send a detailed report to the intel-wired-lan list and
> see if anyone has any suggestions on what could be wrong.



Before I try to make a case to the driver and hardware folks, I think I need to 
be able to explain how timestamps on both Linux systems can sometimes agree 
with timestamps on the second interface and sometimes not. Given just the 
following two tests:

 igb0 @ 1Gb; igb3 @ 100Mb, direct connect: 192.168.230.245 shows offset of +1230ns
 igb0 @ 100Mb; igb3 @ 100Mb, direct connect: 192.168.230.245 shows no offset

I cannot explain why the two linux systems do not disagree on stamps in the 
second test. Can you think of something that the driver or hardware could be 
doing that would explain that?

Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-27 Thread Denny Page
I still have one significant offset mystery with the latest version.

In the tests described below, the following IP addresses are involved:

  192.168.230.2 Linux box. igb0 connected to main switch at 1Gb.
  192.168.230.3 Linux box. igb0 connected to main switch, speed as described in tests. igb3 connected as described in tests.
  192.168.230.240 hardware NTP server connected to main switch at 100Mb.
  192.168.230.244 hardware NTP server connected to main switch at 100Mb.
  192.168.230.245 hardware NTP server connected to igb3 on 192.168.230.3 as described in tests.

Relevant config from 192.168.230.2:
  server 192.168.230.240 iburst minpoll 0 maxpoll 0
  server 192.168.230.244 iburst minpoll 0 maxpoll 0
  peer 192.168.230.3 iburst minpoll 0 maxpoll 0 xleave noselect
  hwtimestamp igb0

Relevant config from 192.168.230.3:
  server 192.168.230.240 iburst minpoll 0 maxpoll 0
  server 192.168.230.244 iburst minpoll 0 maxpoll 0 
  server 192.168.230.245 iburst minpoll 0 maxpoll 0 noselect
  peer 192.168.230.2 iburst minpoll 0 maxpoll 0 xleave noselect
  hwtimestamp igb0
  hwtimestamp igb3

On 192.168.230.2, igb0 is an Intel I211. On 192.168.230.3, igb0 and igb3 are 
both ports of an I354. Both Linux boxes show ‘H H’ in the measurements log for 
all servers/peers past the first few samples. Linux kernel version is 4.4.26, 
although I’ve done spot checks with 4.8.10 and seen similar results. 

Here are the various test results:

  igb0 @ 1Gb; igb3 @ 100Mb, direct connect: 192.168.230.245 shows offset of +1230ns
  igb0 @ 100Mb; igb3 @ 100Mb, direct connect: 192.168.230.245 shows no offset
  igb0 @ 1Gb; igb3 @ 1Gb via secondary switch: 192.168.230.245 shows small offset of +75ns
  igb0 @ 1Gb; igb3 @ 100Mb via secondary switch: 192.168.230.245 shows offset of +1300ns
  igb0 @ 100Mb; igb3 @ 100Mb via secondary switch: 192.168.230.245 shows no offset
  igb0 @ 100Mb; igb3 @ 1Gb via secondary switch: 192.168.230.245 shows offset of -1100ns

What is very strange is that in all tests, the two Linux boxes were closely 
in line with each other, with no offset indicated by either side. If there were 
link speed issues, I would have expected to see an offset with igb0 @ 100Mb.

Is there any possibility of cross-pollination of hardware timestamps between 
the two interfaces?

Thanks,
Denny






Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-25 Thread Denny Page
I believe the stddev issue with multiple interfaces may have been leftover 
sampling from prior runs. I cleared all the point data, and things looked 
better. Sorry about that.

The priority issue remains, but I don’t think it’s a big deal per se. When 
running with scheduling priority (-P 9), I see about 30% of ‘D H’ for the fast 
units. Running without priority I see virtually none (all ‘H H’). I don’t 
believe that this is much of an issue, because scheduling priority is really 
only needed when daemon timestamps are in use. With hardware or driver 
timestamps, it isn't necessary.

Denny


> On Nov 24, 2016, at 05:11, Miroslav Lichvar  wrote:
> 
> On Wed, Nov 23, 2016 at 03:24:56PM -0800, Denny Page wrote:
>> I am now seeing better standard deviations with hardware timestamping than 
>> software timestamping. Thank you.
>> 
>> Couple of caveats:
>>  - I need to disable priority scheduling (-P). With priority scheduling, 
>> software stamps still have lower stddev.
>>  - I can only use a single ethernet interface. With multiple interfaces, 
>> software stamps still have lower stddev.
> 
> That's interesting. Were you testing this with the patch that ignores
> non-HW measurements and did any of these two things change probability
> of getting a 'D H' measurement?
> 
> FWIW, I tried running chronyd with -P 50 for a bit and I didn't notice
> any changes.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-25 Thread Denny Page
That cured my issue with offset between the hosts.


> On Nov 24, 2016, at 06:57, Miroslav Lichvar  wrote:
> 
> On Thu, Nov 24, 2016 at 06:46:36AM -0800, Denny Page wrote:
>> I do not have xleave enabled. Can you explain more as to why you would 
>> expect to see such an offset without xleave?
> 
> The offset is calculated from four timestamps. Two local and two
> remote. Without xleave the peer can't deliver the TX HW timestamp, so
> the offset will be calculated from three (two local + one remote) HW
> timestamps and one (remote) daemon timestamp, which will ruin the
> offset.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-24 Thread Denny Page
This makes sense. Thank you. I will enable xleave and retest shortly.

Denny


> On Nov 24, 2016, at 06:57, Miroslav Lichvar  wrote:
> 
> On Thu, Nov 24, 2016 at 06:46:36AM -0800, Denny Page wrote:
>> I do not have xleave enabled. Can you explain more as to why you would 
>> expect to see such an offset without xleave?
> 
> The offset is calculated from four timestamps. Two local and two
> remote. Without xleave the peer can't deliver the TX HW timestamp, so
> the offset will be calculated from three (two local + one remote) HW
> timestamps and one (remote) daemon timestamp, which will ruin the
> offset.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-24 Thread Denny Page
I do not have xleave enabled. Can you explain more as to why you would expect 
to see such an offset without xleave?

Thanks,
Denny

> On Nov 24, 2016, at 05:11, Miroslav Lichvar  wrote:
> 
> That suggests the interleaved mode is not working. Are the peers specified 
> with the xleave option?




Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-23 Thread Denny Page
I am now seeing better standard deviations with hardware timestamping than 
software timestamping. Thank you.

Couple of caveats:
  - I need to disable priority scheduling (-P). With priority scheduling, software stamps still have lower stddev.
  - I can only use a single Ethernet interface. With multiple interfaces, software stamps still have lower stddev.

I am still seeing strange offset issues.

The view from 192.168.230.2:

210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240               1    0   377      0    -20ns[  -21ns] +/-   13us
^+ 192.168.230.244               1    0   377      0    +71ns[  +71ns] +/-   13us
=? 192.168.230.3                 2    0   377      0  +9159ns[+9159ns] +/-   45us
210 Number of sources = 3
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            64  38    65     +0.000      0.001   +22ns     41ns
192.168.230.244            64  32    69     +0.000      0.001   -18ns     47ns
192.168.230.3              64  33    65     +0.001      0.001 +8851ns     58ns


The view from 192.168.230.3:

210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240               1    0   377      0     +5ns[   +6ns] +/-   13us
^+ 192.168.230.244               1    0   377      0    -32ns[  -32ns] +/-   13us
=? 192.168.230.2                 2    0   377      0    +15us[  +15us] +/-   51us
210 Number of sources = 3
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            16   8    15     +0.002      0.009   +24ns     45ns
192.168.230.244            22   9    21     -0.002      0.007   -25ns     48ns
192.168.230.2              16   7    53     -0.004      0.003   +12us     45ns


Both chrony instances think the other is off by a large amount. This 
disagreement is very stable.

Denny



> On Nov 23, 2016, at 01:40, Miroslav Lichvar  wrote:
> 
> On Fri, Nov 18, 2016 at 03:37:07PM +0100, Miroslav Lichvar wrote:
>> On Thu, Nov 17, 2016 at 05:44:23PM -0800, Denny Page wrote:
>>> Although reduced, I’m still seeing spikes with the patch below.
>> 
>> I'm not sure what could be wrong at this point. Maybe it really is a
>> kernel or HW issue. I'm wondering what would be the best way to
>> confirm or reject that idea.
> 
> I think I finally found what was causing the spikes for you. There was
> a typo in the code that was processing raw PHC readings, which
> effectively disabled filtering of delayed readings and added a
> significant error to the PHC sample time. It explains why the results
> I was seeing were not quite as good as I expected. Maybe because I'm
> testing on a machine with a faster CPU, it looked more like noise than
> spikes. 
> 
> It should be now fixed in git. Please test. If you were doing any
> experiments comparing offsets between SW and HW timestamping, you will
> probably need to start again as this bug effectively added an offset
> of maybe 1-3 microseconds. I'm sorry it took so long to figure it out. 





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-23 Thread Denny Page
Miroslav, what version of the kernel are you using for testing? And what are 
you using for scheduling priority?

Denny



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-22 Thread Denny Page
I tested with that earlier. Almost all the timestamps were rejected. But I was 
using poll 0. I’ll have to test again with a longer polling interval.

Denny


> On Nov 22, 2016, at 07:43, Miroslav Lichvar  wrote:
> 
> Can you confirm that with the patch I sent you earlier
> which waits with poll()? 



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-22 Thread Denny Page
Perfect. Thank you.

Denny


> On Nov 22, 2016, at 07:34, Miroslav Lichvar  wrote:
> 
> I've pushed to the git an implementation of the idea I proposed
> earlier, which assumes symmetric position of UDP data in received and
> transmitted packets. You can modify the calculation or hardcode the RX
> correction in process_hw_timestamp() in ntp_io_linux.c. It's the
> rx_correction variable.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page
Miroslav,

Just to make sure I test correctly, would you mind creating a diff for a test 
version for me? One where I can plug in a hard-coded number of nanoseconds of 
correction for packet receive only?

Thanks,
Denny


> On Nov 21, 2016, at 17:26, Denny Page  wrote:
> 
>  I may hard code a correction for the hardware timestamps in chrony for 
> testing.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page
I would not have expected this, and I haven’t dug into it, but looking at the 
time to generate timestamps for pcap, I am seeing times of more than a 
second(!). I am wondering what happens if we get to the point of sending the 
next request before we have received the transmit timestamp for the prior one. 
Given the speed of the units I’m testing with, it appears to be possible to 
have received the second response before the first transmit timestamp.

Have you tested against a fast server with “minpoll 0 maxpoll 0”? 

Just a thought.

Denny


> On Nov 18, 2016, at 06:37, Miroslav Lichvar  wrote:
> 
> On Thu, Nov 17, 2016 at 05:44:23PM -0800, Denny Page wrote:
>> Although reduced, I’m still seeing spikes with the patch below.
> 
> I'm not sure what could be wrong at this point. Maybe it really is a
> kernel or HW issue. I'm wondering what would be the best way to
> confirm or reject that idea.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page
I checked this. It appears to be completely symmetrical at the switch level. 
For both the 1Gb and 100Mb ports, the delta I see for NTP packets is a very 
consistent 912ns on a 1Gb mirror port. This is exactly as expected:

  preamble: 56 bits
  SFD: 8 bits
  MAC dest: 48 bits
  MAC src: 48 bits
  Length: 16 bits
  IP/UDP/NTP: 608 bits
  FCS: 32 bits
  IPG: 96 bits

Total of 912 bits, which equates to 912ns on a 1Gb connection.

NB: The packet cannot be forwarded until it is received, so the preamble and 
IPG count when looking at packet to packet deltas when mirroring.

I’ll have to look at the driver/kernel next. It may just be an inherent offset 
due to software timestamping. I may hard code a correction for the hardware 
timestamps in chrony for testing.

Denny


> On Nov 21, 2016, at 10:50, Denny Page  wrote:
> 
> I am going to set up a span port to capture the packets on both interfaces 
> which may give an indication.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page

> On Nov 21, 2016, at 10:59, Miroslav Lichvar  wrote:
> 
> On Mon, Nov 21, 2016 at 10:28:26AM -0800, Denny Page wrote:
>> The only layer we know the size of is the UDP layer. The Ethernet layer can 
>> have VLAN tag. Yes, it’s only 32 bits, but error is error. The IP layer may 
>> have option headers. This is rare with IPv4, but not with IPv6.
> 
> With TX timestamping the kernel gives us ethernet frames and we have to
> painfully extract the UDP data from them.

I didn’t know this. Thanks. If we already have the raw Ethernet frame in hand, 
then raw sockets are not necessary. I’ll have to look at the code.


> So, in order to make this as simple as possible, we could make an
> assumption that received packets have the same format as transmitted
> packets. At least with VLANs this might work more often than not.

It really doesn’t have anything to do with the packet as we sent it. VLANs and 
IP options can be added along the way. We only need to look at the packet as 
received in order to correct the receive timestamp. The transmit timestamp is 
already correct.


>> The formula would be: timestamp += (ether_bytes + 4) * 8 / link_bps.
> 
> Ok, so if we store offset of UDP data from the beginning of a
> transmitted frame at layer 2, it would be this?
> 
>   (udp_data_offset + NTP packet length + 4)

For the correction, we don’t care where the UDP data is. All we want is the 
length of the Ethernet frame. If this isn’t handed to us when we are given the 
raw frame, we can pull the length out of the frame. I’ll have to look at the 
code.

Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page
Miroslav,

It was 300ns. I’ve tried several different switches, each with differing 
results, although I didn’t run long enough to be absolutely sure they would 
not settle to the same number. With my main Cisco switch I’m seeing the 8 hour 
average at 298.858ns. Such a precise number. I don’t have an explanation for 
this yet, but I am confident it’s not bit transmission time. I am going to set 
up a span port to capture the packets on both interfaces which may give an 
indication.

Denny


> On Nov 21, 2016, at 09:52, Miroslav Lichvar  wrote:
> 
> I think your own experiments showed there is a 350 ns offset :). That
> would be 700 extra nanoseconds in the B->A direction.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page
Miroslav,

We actually don’t know the length of the data at layer 2. We can’t even 
guarantee the length at layer 3. Dispensing with layer numbers from now on.

We have three levels we have to concern ourselves with:

  Ethernet
  IP
  UDP

The only layer we know the size of is the UDP layer. The Ethernet layer can 
have VLAN tag. Yes, it’s only 32 bits, but error is error. The IP layer may 
have option headers. This is rare with IPv4, but not with IPv6.

The only way to correct the hardware timestamp is to know the entire length of 
the frame at the Ethernet level. I don’t see how this happens without using raw 
sockets.

The formula would be: timestamp += (ether_bytes + 4) * 8 / link_bps.

Denny


> On Nov 21, 2016, at 00:30, Miroslav Lichvar  wrote:
> 
> We know the length of the
> transmitted data at layer 2. Maybe we could use the same length of the
> headers for received packets on the same interface? The link speed
> should not be too difficult to get. Can you suggest a formula?



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page
Miroslav,

This is called cut-through switching, and it is not the default. The reason it 
isn’t the default is that it propagates invalid frames (runts, run-ons, CRC 
errors, etc.) beyond the source port. Store and forward is the default. 
Cut-through generally doesn’t work with port speed mismatch and I’ve never seen 
it used with 100Mb. This really isn’t something you have to worry about. 

FWIW, assuming two 1Gb ports on the same switch, the one-way transmission time 
with cut-through would go from 1504ns to 800ns. However the round trip would 
still be symmetrical.

Denny


> On Nov 21, 2016, at 09:52, Miroslav Lichvar  wrote:
> 
> As I understand it, modern switches don't wait until they have whole
> packet before forwading it to another port. They just need to wait for
> the part that contains the destination MAC address. So the total time
> should be closer to 7540 ns.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-21 Thread Denny Page
Miroslav,

There is no correction necessary. The speed change itself does not have any 
impact because the transmission time is symmetrical.

Given host A on a 1Gb link and host B on a 100Mb link:

  A -> Switch transmission time 754ns
  Switch -> B transmission time 7540ns
  Total time 8294ns

  B -> Switch transmission time 7540ns
  Switch -> A transmission time 754ns
  Total time 8294ns

The propagation delay is symmetrical.

Unless there is a backlog of packets on one of the involved ports, the switch 
has zero impact on accuracy.

Denny



> On Nov 21, 2016, at 00:30, Miroslav Lichvar  wrote:
> 
> In your case you will still need to have some correction for the extra
> delay due the switch in the 100mbit->1gbit direction and I suspect
> switches in general will be the biggest trouble when trying to get the
> best accuracy.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-20 Thread Denny Page
Miroslav,

I found the problem. The hardware receive timestamps are incorrect. It doesn’t 
matter if there is a switch involved or not.

NTP depends upon round trip latency being symmetrical for its accuracy, both 
in local network and in remote networks. To help ensure this, NTP uses preamble 
timestamps for transmit and trailer timestamps for receive. The following 
article makes for good background reading:

  https://www.eecis.udel.edu/~mills/stamp.html

From that article:

• A preamble timestamp is struck as near to the start of the packet as 
possible. The preferred point follows the last bit of the preamble and 
start-of-frame (STF) octet and before the first octet of the data.
• A trailer timestamp is struck as near to the end of the packet as 
possible. On transmit this follows the last octet of the data and before the 
frame check sequence (FCS); on receive this follows the last octet of the FCS.

In the current Chrony implementation, we have the following timestamp types 
available:

• Daemon timestamps. Time stamps generated in application space.
• Software timestamps. Timestamps created by the kernel / driver using 
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_RX_SOFTWARE. This is called 
“driver timestamping” in the article.
• Hardware timestamps. Timestamps created by the network interface 
using SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_RX_HARDWARE.
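For reference, a minimal sketch of requesting the software timestamps described above on a Linux UDP socket through the SO_TIMESTAMPING socket option. The helper names are hypothetical and this is not chrony's code. Hardware timestamps use the corresponding *_HARDWARE flags together with SOF_TIMESTAMPING_RAW_HARDWARE, and generally also require timestamping to be enabled on the interface itself (the SIOCSHWTSTAMP ioctl), which is not shown here.

```c
#include <sys/socket.h>        /* socket(), setsockopt(), SO_TIMESTAMPING */
#include <linux/net_tstamp.h>  /* SOF_TIMESTAMPING_* flags */
#include <unistd.h>            /* close() */

/* Ask the kernel for software TX and RX timestamps on `fd`.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int enable_sw_timestamping(int fd)
{
    int flags = SOF_TIMESTAMPING_TX_SOFTWARE |  /* TX stamp taken by the driver */
                SOF_TIMESTAMPING_RX_SOFTWARE |  /* RX stamp taken by the kernel */
                SOF_TIMESTAMPING_SOFTWARE;      /* report software stamps */

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}

/* Open an IPv4 UDP socket with software timestamping enabled.
 * Returns the descriptor, or -1 on failure. */
static int open_sw_timestamped_udp_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (enable_sw_timestamping(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```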

With the current implementation, software timestamps are the most accurate. 
Depending on network speed, processor speed and load, daemon timestamps are 
second, and hardware timestamps third.


Looking at the transmit side, these are the steps involved:

send() invoked
… variance from time in kernel ...
kernel invokes driver
… variance from time in driver ...
driver writes packet to network controller
… variance from time in network controller (packet 
scheduling/buffering, inter-packet gap, preamble collisions, etc.) ...
network controller sends SFD (start frame delimiter, called STF in the 
article)
network controller sends data bytes
network controller sends FCS (frame check sequence)

With daemon timestamps, Chrony generates the timestamp right before invoking 
send(). We suffer variance from time in kernel, network driver, and network 
controller. Software timestamping improves this by moving timestamp generation 
to the point immediately prior to the network driver writing the packet to the 
network controller, which removes the variance from the kernel and driver. 
Hardware timestamps improve this even further by moving timestamp generation to 
the point immediately following the SFD and prior to the data being sent, 
removing the last of the variance. The perfect preamble timestamp.


Looking at the receive side, these are the steps involved:

network controller receives SFD
network controller receives data bytes
network controller receives FCS
network controller generates interrupt
… variance from time waiting to process interrupt ...
driver invoked
… variance from time in driver ...
driver hands packet to kernel
… variance from time in kernel ...
select() returns
recv() invoked

With daemon timestamps, Chrony generates the timestamp immediately after select() 
returns. We suffer variance from time in the network driver and kernel. 
Software timestamping improves this by moving timestamp generation to the point 
immediately after the driver hands the packet to the kernel, which removes the 
vast majority of the variance from time spent in the kernel. Now we come to 
hardware timestamps. What we would desire from hardware timestamping is that it 
be generated immediately following the FCS, which would remove the rest of the 
variance and give us the perfect trailer timestamp. However that’s not how the 
network controller generates hardware receive timestamps. With network 
controllers, the hardware receive timestamps are generated immediately 
following the SFD, just as they are for transmit. See Figure 7-27 (Time Stamp 
Point) on page 454 here: 
http://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-controller-i350-datasheet.html.

What this means for NTP is that hardware timestamps are off by a minimum of 752 
transmission bits with IPv4. This assumes no VLAN, and no IP option headers. In 
a 100Mb network, this means a guaranteed minimum timestamp error of 7.52 
microseconds.
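The 752-bit figure can be checked by adding up what follows the SFD in a minimal IPv4/UDP NTP frame (no VLAN tag, no IP options, as stated above). The same number gives the shift from an SFD-referenced receive timestamp to a trailer timestamp. This is a sketch with hypothetical names, assuming the link speed is known and constant while the frame is received: at 100Mb the shift is 7520ns, at 1Gb 752ns.

```c
#include <time.h>   /* struct timespec */

/* Bytes following the SFD for one NTP packet over IPv4/UDP (no VLAN tag,
 * no IP options). Names are illustrative only. */
enum {
    FRAME_ETH_HEADER = 14,  /* destination MAC, source MAC, EtherType */
    FRAME_IPV4_HEADER = 20,
    FRAME_UDP_HEADER = 8,
    FRAME_NTP_PACKET = 48,
    FRAME_ETH_FCS = 4
};

/* Bits between the SFD (where the NIC strikes its RX timestamp) and the end
 * of the FCS (where a trailer timestamp would be struck). */
static long ntp_frame_bits_after_sfd(void)
{
    return 8L * (FRAME_ETH_HEADER + FRAME_IPV4_HEADER + FRAME_UDP_HEADER +
                 FRAME_NTP_PACKET + FRAME_ETH_FCS);
}

/* Convert an SFD-referenced RX timestamp to an approximate trailer
 * timestamp by adding the serialization time of `frame_bits` at `mbps`. */
static struct timespec sfd_to_trailer(struct timespec sfd, long frame_bits,
                                      long mbps)
{
    long add_ns = frame_bits * 1000L / mbps;  /* bits / (mbps * 1e6) s, in ns */

    sfd.tv_nsec += add_ns;
    while (sfd.tv_nsec >= 1000000000L) {      /* keep tv_nsec in [0, 1e9) */
        sfd.tv_nsec -= 1000000000L;
        sfd.tv_sec += 1;
    }
    return sfd;
}
```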

In order to generate a correct receive timestamp from the Ethernet hardware 
timestamp, one needs to have the FCS timestamp, the current interface 
speed, and the length of the packet at the Ethernet level. This is doable, but 
requires use of raw sockets and is quite a bit of work. A simpler (and safer) 
approach would be to use a combination of hardware timestamping for send 
(SOF_TIMESTAMPING_TX_HARDWARE), and softw

Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-18 Thread Denny Page
I’ll have to come back to this after the offset issue is resolved.

Denny



> On Nov 18, 2016, at 06:37, Miroslav Lichvar  wrote:
> 
> On Thu, Nov 17, 2016 at 05:44:23PM -0800, Denny Page wrote:
>> Although reduced, I’m still seeing spikes with the patch below.
> 
> I'm not sure what could be wrong at this point. Maybe it really is a
> kernel or HW issue. I'm wondering what would be the best way to
> confirm or reject that idea.


--
To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with "unsubscribe" 
in the subject.
For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
subject.
Trouble?  Email listmas...@chrony.tuxfamily.org.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-18 Thread Denny Page
Miroslav,

I believe that the hardware NTP device, chrony, or both, are 
striking/calculating timestamps incorrectly. I have a test in mind that will 
allow me to determine if this is correct, and if so which. Back to you soon.

Denny



> On Nov 18, 2016, at 00:00, Miroslav Lichvar  wrote:
> 
> On Thu, Nov 17, 2016 at 05:49:44PM -0800, Denny Page wrote:
>> This port speed differential appears to result in an asymmetry in 
>> transmit/receive time which significantly affects the calculations. If I 
>> lock the monitor host port at 100Mb, all three units show precise 
>> synchronization, both with hardware and software time stamping. As noted 
>> previously, with the monitor host port at 1Gb, I see ~300ns (positive) with 
>> software and ~2200ns (negative) with hardware.
> 
> Very interesting!
> 
>> I’ve spent many years on latency in networks, but have never come across 
>> this specific issue. I would like to get my head around how the asymmetry 
>> comes about, and how much it is. I am continuing to research this. I believe 
>> I generally understand how asymmetry affects the calculations, but would 
>> appreciate any guidance you can offer in terms of quantifying how much 
>> asymmetry is required to produce the offsets seen. Also any reason that you 
>> can think of for the offset to be positive with software timestamps, but 
>> negative with hardware timestamps.
> 
> The general rule is that in order to see a positive increase in offset
> of d, the delay of packets from the server to the client needs to
> increase by 2 * d. So, in your case if we take the offset of the local
> unit as a reference, we see an increase of 600ns in the client->server
> delay with SW timestamping and an increase of 4400ns in the
> server->client direction with HW timestamping.
> 
> I don't know much about networking HW and I can only speculate. I
> suspect that if the link speeds don't match, the switch is forced to
> buffer the data and this buffering takes longer when going from 100mb
> to 1gb than when going from 1gb to 100mb. This might explain the
> offset with HW timestamping.
> 
> In the case with SW timestamping, maybe the lower speed of the link to
> the local unit increases the delay of the RX interrupt for some
> reason? Maybe coalescing is not completely disabled and the delay
> takes into account the link speed? I've no idea. It would be great to
> hear from someone who is familiar with the HW and network driver.
> 
> -- 
> Miroslav Lichvar
> 





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-17 Thread Denny Page
Miroslav,

I’ve found the issue with the 2.2us offset between the locally attached unit and 
the units across the switch.

It’s not the switch. The act of crossing the switch itself seems to be 
negligible. The problem appears to be one of asymmetry arising from port speed 
mismatch. The hardware NTP device uses 10/100Mb Ethernet, and the monitor host 
uses 10/100/1000Mb. When the hardware NTP device is plugged directly into the 
monitor host, the connection negotiates at 100Mb. However when connected via 
the switch, the port for the monitor host auto negotiates at 1000Mb (1Gb), 
while the port for the hardware NTP device runs at 100Mb. This port speed 
differential appears to result in an asymmetry in transmit/receive time which 
significantly affects the calculations. If I lock the monitor host port at 
100Mb, all three units show precise synchronization, both with hardware and 
software time stamping. As noted previously, with the monitor host port at 1Gb, 
I see ~300ns (positive) with software and ~2200ns (negative) with hardware.

I’ve also tested by introducing a separate switch (different manufacturer, 
dumb vs smart) between the locally attached unit and the monitor, and I see the 
same behavior. If I let the monitor host port run at 1Gb, the unit lines up with 
the other two that are across the main switch. If I lock the monitor host port 
at 100Mb, the offset is again clearly visible, both with software and hardware 
time stamping.

I’ve spent many years on latency in networks, but have never come across this 
specific issue. I would like to get my head around how the asymmetry comes 
about, and how much it is. I am continuing to research this. I believe I 
generally understand how asymmetry affects the calculations, but would 
appreciate any guidance you can offer in terms of quantifying how much 
asymmetry is required to produce the offsets seen. Also any reason that you can 
think of for the offset to be positive with software timestamps, but negative 
with hardware timestamps.
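One way to quantify this is the standard NTP on-wire calculation (RFC 5905). The sketch below assumes perfectly synchronized clocks, so any computed offset is pure path asymmetry: with one-way delays fwd and ret, the offset comes out as (fwd - ret) / 2, which is the 2 * d relationship Miroslav describes in his reply. Roughly 4400ns of extra delay in one direction would account for the ~2200ns seen with hardware timestamps, and about 600ns for the ~300ns seen with software timestamps. Here theta is the server clock minus the client clock; which sign a given tool displays depends on its convention. The function names are hypothetical.

```c
/* NTP on-wire calculation: t1 = client transmit, t2 = server receive,
 * t3 = server transmit, t4 = client receive (all in nanoseconds). */
struct ntp_sample {
    double theta;   /* estimated offset, server clock minus client clock */
    double delay;   /* round-trip delay excluding server processing */
};

static struct ntp_sample ntp_compute(double t1, double t2, double t3, double t4)
{
    struct ntp_sample s;

    s.theta = ((t2 - t1) + (t3 - t4)) / 2.0;
    s.delay = (t4 - t1) - (t3 - t2);
    return s;
}

/* Simulate one exchange between perfectly synchronized clocks with one-way
 * delays `fwd` (client->server) and `ret` (server->client) and server
 * processing time `proc`. Any non-zero theta is therefore asymmetry error,
 * equal to (fwd - ret) / 2. */
static struct ntp_sample simulate_exchange(double fwd, double ret, double proc)
{
    double t1 = 0.0;
    double t2 = t1 + fwd;
    double t3 = t2 + proc;
    double t4 = t3 + ret;

    return ntp_compute(t1, t2, t3, t4);
}
```

For example, simulate_exchange(1000, 5400, 50) yields theta = -2200ns and delay = 6400ns: the measured delay grows by the full 4400ns, but only half of it shows up as offset.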

Thanks,  
Denny


> On Nov 16, 2016, at 01:53, Miroslav Lichvar  wrote:
> 
> Is the port to the switch identical to the one connected to the third
> server? It would be interesting to see if the offset changes when the
> ports are swapped.
> 
> I'd trust HW timestamping. The 2.2us offset doesn't seem unrealistic.
> There is a reason why there are switches with support for PTP. You
> have exceptionally stable measurements with SW timestamping, but that
> doesn't mean the asymmetry in delay and processing has to be the same
> between the two ports.





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-17 Thread Denny Page
Although reduced, I’m still seeing spikes with the patch below.

Denny



> On Nov 16, 2016, at 00:53, Miroslav Lichvar  wrote:
> 
> On Tue, Nov 15, 2016 at 08:24:37PM -0800, Denny Page wrote:
>> With the latest drop in the repo, I’m still seeing the wild spikes in the 
>> standard deviation with hardware time stamping against the fast responding 
>> hardware units. I'm also still seeing a better base deviation using software 
>> timestamps against them as well.
>> 
>> I do see better results with hardware time stamping when doing chrony to 
>> chrony, but I believe that this is a result of the general purpose computers 
>> being a bit slower to respond than the dedicated hardware units.
> 
> Hm, the fix helped with the spikes I was seeing. Did we rule out the
> possibility that in your case the spikes are due to the other issue with
> out-of-order HW timestamps? Could you try it with this patch to make
> sure only measurements with HW timestamps are used?
> 
> --- a/ntp_core.c
> +++ b/ntp_core.c
> @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address 
> *local_addr,
>prevent a synchronisation loop */
> testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
> pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> +
> +if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != 
> NTP_TS_HARDWARE)
> +  testB = 0;
>   } else {
> offset = delay = dispersion = 0.0;
> sample_time = rx_ts->ts;





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
Yep. Seeing lots of that. ~70% of the samples for the locally attached unit.

Denny


> On Nov 16, 2016, at 23:39, Miroslav Lichvar  wrote:
> 
> On Wed, Nov 16, 2016 at 11:24:14PM -0800, Denny Page wrote:
>> To be clear, would I still expect to see ‘D H’ in the measurements log with 
>> this change?
> 
> Yes, but the second bit in the column with four test bits should be
> always zero.
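A small checker for the condition Miroslav describes, sketched under the assumption that measurements.log uses the layout seen in the excerpts in this thread: the eighth whitespace-separated field holds the four test bits A to D, and the last two fields give the source of the local transmit and receive timestamps (D for daemon, K for kernel, H for hardware). The layout may differ between chrony versions, and all names are hypothetical. With the patch applied, any line that is not "H H" should have test B (the second bit) equal to zero.

```c
#include <string.h>   /* strspn(), strcspn(), memcpy() */

#define MAX_FIELDS 32

struct measurement {
    char test_abcd[5];  /* e.g. "1101", NUL-terminated */
    char tx_src;        /* source of local TX timestamp: 'D', 'K' or 'H' */
    char rx_src;        /* source of local RX timestamp */
};

/* Parse one log line. Returns 1 on success, 0 if the line does not have the
 * expected shape (too few fields or no four-character test field). */
static int parse_measurement(const char *line, struct measurement *m)
{
    const char *start[MAX_FIELDS];
    size_t len[MAX_FIELDS];
    const char *p = line;
    int n = 0;

    while (n < MAX_FIELDS) {                 /* split on runs of whitespace */
        p += strspn(p, " \t\r\n");
        if (*p == '\0')
            break;
        start[n] = p;
        len[n] = strcspn(p, " \t\r\n");
        p += len[n];
        n++;
    }

    if (n < 10 || len[7] != 4 || len[n - 2] != 1 || len[n - 1] != 1)
        return 0;

    memcpy(m->test_abcd, start[7], 4);
    m->test_abcd[4] = '\0';
    m->tx_src = start[n - 2][0];
    m->rx_src = start[n - 1][0];
    return 1;
}

/* Test B is the second of the four test bits. */
static int test_b_passed(const struct measurement *m)
{
    return m->test_abcd[1] == '1';
}

/* With the patch, only H/H measurements may pass test B. */
static int consistent_with_hw_only_patch(const struct measurement *m)
{
    return (m->tx_src == 'H' && m->rx_src == 'H') || !test_b_passed(m);
}
```

The "D H" lines in the test 4 log from Nov 11 predate the patch and carry 1101, so this check would flag them.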



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
To be clear, would I still expect to see ‘D H’ in the measurements log with 
this change?

Thanks,
Denny


> On Nov 16, 2016, at 00:53, Miroslav Lichvar  wrote:
> 
> Hm, the fix helped with the spikes I was seeing. Did we rule out the
> possibility that in your case the spikes are due to the other issue with
> out-of-order HW timestamps? Could you try it with this patch to make
> sure only measurements with HW timestamps are used?
> 
> --- a/ntp_core.c
> +++ b/ntp_core.c
> @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address 
> *local_addr,
>prevent a synchronisation loop */
> testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
> pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> +
> +if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != 
> NTP_TS_HARDWARE)
> +  testB = 0;
>   } else {
> offset = delay = dispersion = 0.0;
> sample_time = rx_ts->ts;
> 



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
Miroslav,

I’m sorry, I didn’t mean to imply that the spikes were unrelated to the 
ordering issue. I do believe that the spikes are related to the hardware 
timestamp ordering issue.

I’ll try to set up some of the tests you’ve requested later this evening, but 
it may take a day for me to have results.

Denny



> On Nov 16, 2016, at 00:53, Miroslav Lichvar  wrote:
> 
> On Tue, Nov 15, 2016 at 08:24:37PM -0800, Denny Page wrote:
>> With the latest drop in the repo, I’m still seeing the wild spikes in the 
>> standard deviation with hardware time stamping against the fast responding 
>> hardware units. I'm also still seeing a better base deviation using software 
>> timestamps against them as well.
>> 
>> I do see better results with hardware time stamping when doing chrony to 
>> chrony, but I believe that this is a result of the general purpose computers 
>> being a bit slower to respond than the dedicated hardware units.
> 
> Hm, the fix helped with the spikes I was seeing. Did we rule out the
> possibility that in your case the spikes are due to the other issue with
> out-of-order HW timestamps? Could you try it with this patch to make
> sure only measurements with HW timestamps are used?
> 
> --- a/ntp_core.c
> +++ b/ntp_core.c
> @@ -1434,6 +1434,9 @@ receive_packet(NCR_Instance inst, NTP_Local_Address 
> *local_addr,
>prevent a synchronisation loop */
> testD = message->stratum <= 1 || REF_GetMode() != REF_ModeNormal ||
> pkt_refid != UTI_IPToRefid(&local_addr->ip_addr);
> +
> +if (inst->local_tx.source != NTP_TS_HARDWARE || rx_ts->source != 
> NTP_TS_HARDWARE)
> +  testB = 0;
>   } else {
> offset = delay = dispersion = 0.0;
> sample_time = rx_ts->ts;
> 
> -- 
> Miroslav Lichvar
> 





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-16 Thread Denny Page
Yes, all ports on the monitoring system are identical. The I354 is a 4-port 
chip, and all the Ethernet ports on the monitoring unit are connected through 
that same chip.

The general server has both I354 and I211 chips. I shut everything down on the 
server to conduct a test. It didn’t matter which chip was used for the direct 
connection vs. the switch.

Denny



> On Nov 16, 2016, at 01:53, Miroslav Lichvar  wrote:
> 
> Is the port to the switch identical to the one connected to the third
> server? It would be interesting to see if the offset changes when the
> ports are swapped.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-15 Thread Denny Page
Miroslav,

There is an additional issue, which is perhaps unrelated to the other issues 
discussed.

Background: Each target IP is a hardware based NTP server. IPs 240 and 244 are 
across the switch. Chrony is synchronizing with these two IPs. Incremental 
latency from the switch is approximately 1.8 microseconds each way. IP 245 is 
directly attached, and is marked as noselect.

With software time stamping, the directly connected unit (IP 245) shows an 
average offset of approximately +250ns from the reference units.

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 192.168.230.240               1    0   377      1   +32ns[  +32ns] +/-   19us
^* 192.168.230.244               1    0   377      1   -47ns[  -76ns] +/-   19us
^? 192.168.230.245               1    0   377      1  +310ns[ +310ns] +/-   16us

Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            58  36    78     -0.000      0.002   -33ns     80ns
192.168.230.244            30  18    41     +0.000      0.005   +25ns    111ns
192.168.230.245            20  16    24     -0.005      0.011  +188ns     87ns


With hardware time stamping enabled, this offset jumps to an average of 
approximately -2200ns from the reference units.

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240               1    0   377      0  -515ns[ -547ns] +/-   13us
^+ 192.168.230.244               1    0   377      0  -454ns[ -486ns] +/-   13us
^? 192.168.230.245               1    0   377      0 -1800ns[-1832ns] +/- 6546ns

Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            64  30    76     -0.002      0.005   -66ns    264ns
192.168.230.244            64  32    76     +0.001      0.005   +66ns    231ns
192.168.230.245            57  33    68     +0.003      0.004 -2036ns    183ns


While I can get my head around a differential of 250ns resulting from the 
switch, I’m finding it very difficult to believe the almost 2500ns 
differential that appears when hardware time stamping is enabled.

Any thoughts on this?

Thanks,
Denny







Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-15 Thread Denny Page
With the latest drop in the repo, I’m still seeing the wild spikes in the 
standard deviation with hardware time stamping against the fast responding 
hardware units. I'm also still seeing a better base deviation using software 
timestamps against them as well.

I do see better results with hardware time stamping when doing chrony to 
chrony, but I believe that this is a result of the general purpose computers 
being a bit slower to respond than the dedicated hardware units.

Denny



> On Nov 15, 2016, at 05:58, Miroslav Lichvar  wrote:
> 
> On Mon, Nov 14, 2016 at 03:40:32PM +0100, Miroslav Lichvar wrote:
>> On Sat, Nov 12, 2016 at 10:36:55AM -0800, Denny Page wrote:
>>> Here is a reasonable visual representation of what I am seeing. The section 
>>> on the left (before 8:00) is with hardware timestamping, while the section 
>>> on the right is with software timestamping. maxdelaydevratio of 4 in both 
>>> cases.
>>> 
>>> I think I have a kernel/driver problem.
>> 
>> I think I see something similar. Occasional spikes that are causing
>> resets of the HW clock instance. It looks like a chrony bug.
> 
> This should be now fixed in git.
> 
> -- 
> Miroslav Lichvar
> 





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-15 Thread Denny Page
The chip is actually an I354, which is slightly different than the I350, but I 
don’t think it matters much. I also have interfaces with I211 chips, and the 
ordering issue appears to happen there as well. I don’t think the sleep after 
the send is going to affect the order of the timestamp and response messages 
since both are requested at the point of the outbound send.

Denny


> On Nov 15, 2016, at 02:07, Miroslav Lichvar  wrote:
> 
> On Mon, Nov 14, 2016 at 08:59:17PM -0800, Denny Page wrote:
>> I tested with a usleep(100) following the sendmsg() call. This didn’t appear 
>> to have any impact. Was the usleep() intended to influence the order of 
>> timestamp vs. server response messages?
> 
> Yes, that was the idea. Could you try increasing the sleep interval to
> 1000 or maybe 1? Anyway, I asked about this on the Intel
> development list:
> 
> http://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20161114/007226.html





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
I tested with a usleep(100) following the sendmsg() call. This didn’t appear to 
have any impact. Was the usleep() intended to influence the order of timestamp 
vs. server response messages?

Denny


> On Nov 14, 2016, at 11:04, Miroslav Lichvar  wrote:
> 
> usleep() should be called after sendmsg(), e.g. in
> NIO_SendPacket().



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
Setting noselect for all sources seems to have no impact on standard deviation.

Disabling all forms of dynamic tick (CONFIG_HZ_PERIODIC=y) seems to have some 
effect in reducing the number of “D H”, but does not appear to have much of an 
impact on the standard deviation. I am returning to CONFIG_NO_HZ_IDLE=y.

Denny



> On Nov 14, 2016, at 05:35, Miroslav Lichvar  wrote:
> 
> Another experiment would be to try
> configuring all sources with the noselect option and see how much
> stddev improves. If it does improve significantly, it would suggest a
> problem with the synchronization of the clock. Maybe the kernel is too
> slow (the nohz=off option might help) or there is a problem in the
> chrony's loop.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
I’m not sure I understand what effect a delay in NIO_Linux_RequestTxTimestamp 
would have. As far as I can see, NIO_Linux_RequestTxTimestamp builds a control message 
structure but does not make any level 2 calls. Introducing a delay here should 
be the same as introducing a delay one level up in NIO_SendPacket. Either way, 
this is before the call to sendmsg(), so the only effect I see is that the 
original message to the server is delayed by 100us. I don’t understand how this 
would affect the timing of the server response message, or the timestamp 
message. Is there something that I’m missing?

Denny


> On Nov 14, 2016, at 05:17, Miroslav Lichvar  wrote:
> 
> As a quick workaround I'd suggest to try it with this change:
> 
> --- a/ntp_io_linux.c
> +++ b/ntp_io_linux.c
> @@ -465,6 +465,8 @@ NIO_Linux_RequestTxTimestamp(struct msghdr *msg, int 
> cmsglen, int sock_fd)
> {
>   struct cmsghdr *cmsg;
> 
> +  usleep(100);
> +
>   /* Check if TX timestamping is disabled on this socket */
>   if (permanent_ts_options || !NIO_IsServerSocket(sock_fd))
> return cmsglen;



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-14 Thread Denny Page
Yes, I see these on the monitoring system, for all servers/peers.

Denny


> On Nov 14, 2016, at 05:35, Miroslav Lichvar  wrote:
> 
> Do you see in measurements.log any entries with 'D K' and
> '111 111 ' in the columns with tests results (i.e. were any 'D K'
> measurements used for synchronization)? 



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-12 Thread Denny Page
Here is a reasonable visual representation of what I am seeing. The section on 
the left (before 8:00) is with hardware timestamping, while the section on the 
right is with software timestamping. maxdelaydevratio of 4 in both cases.

I think I have a kernel/driver problem.

Denny




Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-11 Thread Denny Page
Miroslav,

These are representative samples of what I am seeing. These are all taken from 
my dedicated monitoring system.

192.168.230.240 and 192.168.230.244 are hardware units on the switch.
192.168.230.245 is a locally attached hardware unit.
192.168.230.1 is a FreeBSD system running ntpd.
192.168.230.2 is a Linux server on the switch running the same version of 
chrony and timestamp config as the monitor.
192.168.225.10 is a Linux server two switches and a firewall away running the 
same version of chrony and timestamp config as the monitor.

Other than the baseline with the prior repo version, the chrony to chrony peers 
have xleave enabled.

Please let me know if you have any questions.

Denny

—

Non timestamp version (prior repo version):

210 Number of sources = 6
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240               1    0   377      0  -132ns[ -142ns] +/-   24us
^? 192.168.230.244               1    0   377      0  +190ns[ +180ns] +/-   24us
^? 192.168.230.245               1    0   377      0  +267ns[ +257ns] +/-   21us
=? 192.168.230.1                 2    3   377     25 +2817ns[+2896ns] +/- 1480us
=? 192.168.230.2                 2    1   377      0 +9384ns[+9384ns] +/-   80us
=? 192.168.225.10                2    1   377     11  -454ns[ -421ns] +/-  167us
210 Number of sources = 6
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            57  27    68     -0.000      0.008    -0ns    311ns
192.168.230.244            64  37    77     -0.003      0.006   -22ns    327ns
192.168.230.245            51  31    65     +0.001      0.007  +356ns    264ns
192.168.230.1              16   6   209     -0.027      0.092   +44us   4799ns
192.168.230.2              36  16    78     -0.005      0.065 +8794ns   2393ns
192.168.225.10             20  11   263     -0.003      0.049 -4115ns   4324ns


With software timestamps:

210 Number of sources = 6
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240               1    0   377      1  +884ns[ +895ns] +/-   19us
^? 192.168.230.244               1    0   377      1  +513ns[ +513ns] +/-   18us
^? 192.168.230.245               1    0   377      0  +789ns[ +789ns] +/-   17us
=? 192.168.230.1                 2    3   377      7  +496ns[ +419ns] +/- 1702us
=? 192.168.230.2                 2    1   377      1   +10us[  +10us] +/-   70us
=? 192.168.225.10                2    1   377      1 -6797ns[-6797ns] +/-  160us
210 Number of sources = 6
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            42  18    53     +0.000      0.003    +0ns     83ns
192.168.230.244            64  33    89     -0.000      0.001   +19ns     92ns
192.168.230.245            64  35    90     -0.001      0.002  +314ns    126ns
192.168.230.1              26  10   202     +0.005      0.017 +4447ns   1418ns
192.168.230.2              21   7    45     -0.069      0.134   +11us   2489ns
192.168.225.10             16   4    39     -0.023      0.325 +1543ns   3726ns


With hardware timestamps:

210 Number of sources = 6
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240               1    0   377      0   +68ns[  +71ns] +/-   13us
^? 192.168.230.244               1    0   377      0   -15ns[  -12ns] +/-   13us
^? 192.168.230.245               1    0   377      0 -2127ns[-2127ns] +/- 6592ns
=? 192.168.230.1                 2    3   377      5 +4730ns[+4709ns] +/- 1113us
=? 192.168.230.2                 2    1   377      2 +7042ns[+7031ns] +/-   65us
=? 192.168.225.10                2    1   377      0 +8856ns[+8859ns] +/-  150us
210 Number of sources = 6
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
192.168.230.240            64  33    74     -0.000      0.003    -0ns    153ns
192.168.230.244            64  35    75     +0.000      0.003   -19ns    145ns
192.168.230.245            64  29    75     +0.001      0.002 -2115ns    106ns
192.168.230.1              16   6   121     +0.073      0.041 +5859ns   1343ns
192.168.230.2              63  30   149     -0.016      0.020 +4040ns   2099ns
192.168.225.10             31  15    66     +0.011      0.109 +5302ns   3002ns

210 Number of sources = 6
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.230.240 

Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-11 Thread Denny Page
The average offset is ~300ns. I would not expect there to be so much asymmetry 
with the switch, but I suppose anything is possible. If I enable hardware time 
stamps the offset jumps to ~2100ns.

Denny


> On Nov 11, 2016, at 02:22, Miroslav Lichvar  wrote:
> 
> This looks very good and I'm curious to see how much further it will
> improve with HW timestamping. The offset of ~400ns between the 1st/2nd
> and 3rd server could be an asymmetry in delay due to the switch, right?





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-11 Thread Denny Page

> On Nov 11, 2016, at 02:17, Miroslav Lichvar  wrote:
> 
> The mixed results in tests 3, 4 and 8 are not as expected, however.
> Can you please try running one of those tests with 'acquisitionport
> 1' in chrony.conf and post the debug output (after ./configure
> --enable-debug) showing few exchanges?
> 
> 
>> test 4
>> - hw stamps on for both igb0 and igb3
>> - all ntp servers enabled
>> - result: H H for one server attached via igb0, mixed H H and D H for other 
>> server attached via igb0, mixed H H and D H for server attached via igb3


Miroslav,

Following is output from test 4. Please let me know if this gives you the 
information that you are looking for.

Thanks,
Denny

---

chrony.conf (partial):
server 192.168.230.240 iburst minpoll 0 maxpoll 0 maxdelaydevratio 4
server 192.168.230.244 iburst minpoll 0 maxpoll 0 maxdelaydevratio 4 noselect
server 192.168.230.245 iburst minpoll 0 maxpoll 0 maxdelaydevratio 4 noselect
hwtimestamp igb0
hwtimestamp igb3
acquisitionport 1


measurements log:
2016-11-11 17:45:05 192.168.230.244 N  1 111 111 1101   0  4 1.00  1.000e-08  2.560e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:05 192.168.230.240 N  1 111 111    0  4 1.00 -9.500e-08  2.566e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:05 192.168.230.245 N  1 111 111    0  4 1.00  2.142e-06  1.276e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:06 192.168.230.244 N  1 111 111 1101   0  4 1.00  8.163e-06  4.209e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B D H
2016-11-11 17:45:06 192.168.230.240 N  1 111 111    0  4 1.00 -6.000e-08  2.565e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:06 192.168.230.245 N  1 111 111    0  4 1.00  2.262e-06  1.277e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:07 192.168.230.244 N  1 111 111 1101   0  4 1.00 -1.700e-08  2.556e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:07 192.168.230.240 N  1 111 111    0  4 1.00  5.200e-08  2.562e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:07 192.168.230.245 N  1 111 111 1101   0  4 1.00  1.103e-05  3.105e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B D H
2016-11-11 17:45:08 192.168.230.244 N  1 111 111 1101   0  4 1.00  1.400e-07  2.572e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:08 192.168.230.240 N  1 111 111    0  4 1.00 -1.280e-07  2.562e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:08 192.168.230.245 N  1 111 111    0  4 1.00  2.123e-06  1.273e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:09 192.168.230.244 N  1 111 111 1101   0  4 1.00 -7.800e-08  2.564e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:09 192.168.230.240 N  1 111 111    0  4 1.00 -6.000e-08  2.564e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:09 192.168.230.245 N  1 111 111 1101   0  4 1.00  1.188e-05  3.229e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B D H
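As Miroslav explains further down this thread, the last two columns of each record are the TX and RX timestamp sources. The following short Python sketch tallies how often each server gets H H versus D H. It assumes one record per line (unwrapped) with the timestamp sources as the final two whitespace-separated fields, as in the excerpt above. It is an illustrative helper, not part of chrony.

```python
from collections import Counter, defaultdict

def tally_timestamp_sources(lines):
    """Count (TX, RX) timestamp-source pairs per server in measurements-log records."""
    counts = defaultdict(Counter)
    for line in lines:
        parts = line.split()
        # Skip blank lines, header lines and anything too short to be a record.
        if len(parts) < 6 or not parts[0][:4].isdigit():
            continue
        server = parts[2]
        tx_src, rx_src = parts[-2], parts[-1]  # last two columns: TX, RX source
        counts[server][tx_src + " " + rx_src] += 1
    return counts

# A few records from the log excerpt above, one record per line.
SAMPLE = """\
2016-11-11 17:45:05 192.168.230.244 N  1 111 111 1101   0  4 1.00  1.000e-08  2.560e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
2016-11-11 17:45:06 192.168.230.244 N  1 111 111 1101   0  4 1.00  8.163e-06  4.209e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B D H
2016-11-11 17:45:07 192.168.230.245 N  1 111 111 1101   0  4 1.00  1.103e-05  3.105e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B D H
2016-11-11 17:45:08 192.168.230.245 N  1 111 111    0  4 1.00  2.123e-06  1.273e-05  1.618e-07  0.000e+00  0.000e+00 47505300 4B H H
"""

for server, pairs in sorted(tally_timestamp_sources(SAMPLE.splitlines()).items()):
    print(server, dict(pairs))
```

Running this against a longer log should show whether the D H entries cluster on particular servers or interfaces.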


debug output (command line options -f /etc/chrony/chrony.conf -P 9 -s -r -d -d):
2016-11-11T17:45:05Z ntp_core.c:1032:(transmit_timeout) Transmit timeout for [192.168.230.244:123]
2016-11-11T17:45:05Z ntp_io.c:819:(NIO_SendPacket) Sent 48 bytes to 192.168.230.244:123 from [UNSPEC] fd 8
2016-11-11T17:45:05Z ntp_io_linux.c:444:(NIO_Linux_ProcessMessage) Received 48 bytes from error queue for 192.168.230.244:123 fd=8 if=2 tss=2
2016-11-11T17:45:05Z ntp_core.c:1888:(update_tx_timestamp) Updated TX timestamp delay=0.16623
2016-11-11T17:45:05Z ntp_io.c:657:(process_message) Received 48 bytes from 192.168.230.244:123 to 192.168.230.3 fd=8 if=2 tss=2 delay=0.70806
2016-11-11T17:45:05Z sourcestats.c:808:(SST_IsGoodSample) Bad sample: offset=0.06 delay=0.26 incr_delay=0.02 allowed=0.06
2016-11-11T17:45:05Z ntp_core.c:1484:(receive_packet) NTP packet lvm=44 stratum=1 poll=4 prec=-25 root_delay=0.00 root_disp=0.00 refid=47505300 []
2016-11-11T17:45:05Z ntp_core.c:1489:(receive_packet) reference=1478886305.0 origin=4099013356.361615043 receive=1478886305.070201480 transmit=1478886305.070206240
2016-11-11T17:45:05Z ntp_core.c:1491:(receive_packet) offset=0.00010 delay=0.25600 dispersion=0.00 root_delay=0.26 root_dispersion=0.00
2016-11-11T17:45:05Z ntp_core.c:1495:(receive_packet) test123=111 test567=111 testABCD=1101 kod_rate=0 interleaved=0 valid=1 good=0 updated=1
2016-11-11T17:45:05Z ntp_io.c:657:(process_message) Received 48 bytes from 192.168.225.10:123 to 192.168.230.3 fd=10 if=2 tss=2 delay=0.29434
2016-11-11T17:45:05Z clientlog.c:447:(CLG_LogNTPAccess) NTP hits 24 rate -4 trate -128 tokens 0
2016-11-11T17:45:05Z ntp_io.c:819:(NIO_SendPacket) Sent 48 bytes to 192.168.225.10:123 from 192.168.230.3 fd 10
2016-11-11T17:45:05Z ntp_io_linux.c:444:(NIO_Linux_ProcessMessage) Received 48 bytes from error que
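To work through output like this, it can help to pull the key=value fields out programmatically, for example to list packets that are flagged valid=1 but good=0. The following is a minimal Python sketch. It assumes every line follows the `<timestamp> <file>:<line>:(<function>) <message>` layout seen above and treats field values as plain strings. The helper names are hypothetical.

```python
import re

# "key=value" tokens inside the message part of a debug line.
KV_RE = re.compile(r"(\w+)=(\S+)")

def parse_debug_line(line):
    """Split a chronyd debug line into timestamp, source location and key=value fields."""
    parts = line.split(" ", 2)
    if len(parts) < 3:
        return None  # not in the expected layout
    timestamp, location, message = parts
    return {"timestamp": timestamp, "location": location,
            "fields": dict(KV_RE.findall(message))}

def rejected_samples(lines):
    """Yield parsed lines whose fields report valid=1 together with good=0."""
    for line in lines:
        rec = parse_debug_line(line)
        if rec and rec["fields"].get("valid") == "1" and rec["fields"].get("good") == "0":
            yield rec

LOG = [
    "2016-11-11T17:45:05Z sourcestats.c:808:(SST_IsGoodSample) Bad sample: "
    "offset=0.06 delay=0.26 incr_delay=0.02 allowed=0.06",
    "2016-11-11T17:45:05Z ntp_core.c:1495:(receive_packet) test123=111 test567=111 "
    "testABCD=1101 kod_rate=0 interleaved=0 valid=1 good=0 updated=1",
]

for rec in rejected_samples(LOG):
    print(rec["timestamp"], rec["location"], "testABCD=" + rec["fields"]["testABCD"])
```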

Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-11 Thread Denny Page
It turns out that after running it for a longer period, I see mixed D H and H H 
for test 5 as well.


> On Nov 10, 2016, at 13:35, Denny Page  wrote:
> 
> test 5
> - hw stamps on for igb0
> - ntp servers attached via igb0 enabled, server attached via igb3 disabled
> - result: H H for both servers (as expected)


--
To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with "unsubscribe" 
in the subject.
For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
subject.
Trouble?  Email listmas...@chrony.tuxfamily.org.



Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-10 Thread Denny Page
Things are working quite well with software timestamping.

This is from my monitoring system:

210 Number of sources = 6
MS Name/IP address Stratum Poll Reach LastRx Last sample   
===
^? 192.168.230.240   1   0   377 0   +295ns[ +295ns] +/-   19us
^? 192.168.230.244   1   0   377 1   +214ns[ +227ns] +/-   19us
^* 192.168.230.245   1   0   377 1   +577ns[ +590ns] +/-   16us
=? 192.168.230.1 2   3   377 5  -5142ns[-5127ns] +/- 1380us
=? 192.168.230.2 2   1   377 2   +549ns[ +549ns] +/-   58us
=? 192.168.225.102   1   37728-25us[  -25us] +/-  149us
210 Number of sources = 6
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
192.168.230.24040  2153 +0.000  0.004   -370ns   105ns
192.168.230.24462  3589 -0.001  0.002   -430ns   108ns
192.168.230.24546  2360 +0.000  0.004 +0ns   126ns
192.168.230.1  42  20   330 -0.009  0.010  -1152ns  1662ns
192.168.230.2  38  2098 +0.014  0.023   +742ns  1163ns
192.168.225.10 29  15  1083 -0.007  0.009+33us  4906ns

[NB: 192.168.230.245 is the directly attached unit]


And this is from my general (busy) server:

210 Number of sources = 5
MS Name/IP address Stratum Poll Reach LastRx Last sample   
===
^* 192.168.230.240   1   0   377 0  +2164ns[+2167ns] +/-   23us
^? 192.168.230.244   1   0   377 0   +601ns[ +604ns] +/-   22us
=? 192.168.230.1 2   3   377 8  -3633ns[-3586ns] +/- 1151us
=? 192.168.230.3 2   1   377 2   +295ns[ +298ns] +/-   62us
=? 192.168.225.102   1   377 2-19us[  -19us] +/-  145us
210 Number of sources = 5
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
192.168.230.24064  3090 +0.000  0.006 +0ns   355ns
192.168.230.24464  2679 -0.003  0.008   -297ns   401ns
192.168.230.1  16   6   121 -0.084  0.097  -8027ns  3564ns
192.168.230.3  64  32   142 -0.002  0.014   +154ns  1266ns
192.168.225.10 16   533 -0.004  0.204-20us  1884ns

Denny





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-10 Thread Denny Page
Miroslav,

The kernel is 4.4.26. The problem appears to be the combination of hardware 
timestamping and multiple interfaces.

Here is the test setup: 3 identical hardware NTP units. Two are accessible via a 
switch on igb0. One is accessible via igb3 as a directly connected 
point-to-point IP link. Here are the test results:

test 1
- hw stamps off for all interfaces
- all ntp servers enabled
- result: K K for all servers (as expected)

test 2
- hw stamps on for igb0
- all ntp servers enabled
- result: H H for servers attached via igb0, D K for server attached via igb3

test 3
- hw stamps on for igb3
- all ntp servers enabled
- result: D K for servers attached via igb0, mixed D H and H H for server 
attached via igb3

test 4
- hw stamps on for both igb0 and igb3
- all ntp servers enabled
- result: H H for one server attached via igb0, mixed H H and D H for other 
server attached via igb0, mixed H H and D H for server attached via igb3


test 5
- hw stamps on for igb0
- ntp servers attached via igb0 enabled, server attached via igb3 disabled
- result: H H for both servers (as expected)

test 6
- hw stamps on for igb3
- ntp servers attached via igb0 enabled, server attached via igb3 disabled
- result: D K for both servers


test 7
- hw stamps on for igb0
- ntp servers attached via igb0 disabled, server attached via igb3 enabled
- result: D K

test 8
- hw stamps on for igb3
- ntp servers attached via igb0 disabled, server attached via igb3 enabled
- result: mixed D H and H H


Denny



> On Nov 10, 2016, at 11:45, Miroslav Lichvar  wrote:
> 
> On Thu, Nov 10, 2016 at 11:25:31AM -0800, Denny Page wrote:
>> Miroslav,
>> 
>> And D H?
> 
> D means daemon, i.e. no SW or HW timestamp. The first one is for
> transmit timestamp, the other one for receive timestamp. If only the
> first entry in the log is like that, then it's ok. If not, something
> is wrong. What kernel version do you have? It might help if we could
> see the debug output from chronyd -d -d when compiled with the
> --enable-debug option.
> 
> -- 
> Miroslav Lichvar
> 
> -- 
> To unsubscribe email chrony-dev-requ...@chrony.tuxfamily.org with 
> "unsubscribe" in the subject.
> For help email chrony-dev-requ...@chrony.tuxfamily.org with "help" in the 
> subject.
> Trouble?  Email listmas...@chrony.tuxfamily.org.
> 





Re: [chrony-dev] SW/HW timestamping on Linux

2016-11-10 Thread Denny Page
Miroslav,

And D H?

Denny


> On Nov 10, 2016, at 06:45, Miroslav Lichvar  wrote:
> 
> You can see if it's actually working in the measurements log. With SW
> timestamping the last two fields should be K K, with HW timestamping
> they should be H H.
> 
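Miroslav's two explanations in this thread can be combined into a small helper. The first letter is the TX timestamp source and the second is the RX source. D means daemon, K means kernel (software), and H means hardware. A D is acceptable only in the first entry. This Python sketch is illustrative. Reading K as "kernel" follows the chrony measurements-log documentation rather than anything stated in this thread.

```python
# Letter meanings: D = daemon (no SW/HW timestamp), K = kernel (SW), H = hardware.
SOURCES = {"D": "daemon", "K": "kernel (SW)", "H": "hardware"}

def describe(tx_src, rx_src):
    """Summarise the (TX, RX) timestamp-source pair from a measurements-log record."""
    if tx_src == rx_src:
        return "TX and RX both from " + SOURCES[tx_src]
    return "mixed: TX from {}, RX from {}".format(SOURCES[tx_src], SOURCES[rx_src])

def looks_wrong(pairs):
    """True if a daemon (D) timestamp appears after the first entry.

    `pairs` is the list of (TX, RX) tuples for one source, in log order.
    """
    return any("D" in pair for pair in pairs[1:])

print(describe("H", "H"))
print(describe("D", "H"))
print(looks_wrong([("D", "H"), ("H", "H"), ("H", "H")]))
print(looks_wrong([("H", "H"), ("D", "H"), ("H", "H")]))
```

Under this reading, the "mixed D H and H H" results in tests 3, 4 and 8 would all be flagged.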



Re: [chrony-dev] Nanosecond timestamps

2016-11-08 Thread Denny Page
> On Nov 08, 2016, at 03:01, Miroslav Lichvar  wrote:
> 
>> In addition to the kernel, I disable eee and interrupt coalescing on the 
>> network interfaces.
> 
> Oh, I knew ethernet has some power saving features, but I didn't
> realize they could increase latency/jitter. I'll need to experiment
> with this :).


The wake time value is negotiated between the Ethernet link partners. I believe 
the maximum is 30us for 100Mb and 16us for 1000Mb. In other words, for 100Mb, 
which is what hardware NTP solutions commonly use, the wake delay can 
theoretically be larger than the NTP packet transmission time. Usually it ends 
up being much smaller, but it is still a significant source of jitter in a 
local network.

Denny





Re: [chrony-dev] Nanosecond timestamps

2016-11-08 Thread Denny Page
> On Nov 7, 2016, at 15:04, Denny Page  wrote:
> 
> For comparison, this is the view from the monitoring system (192.168.230.3):
> 
> 210 Number of sources = 5
> MS Name/IP address Stratum Poll Reach LastRx Last sample
> ===
> ^* 192.168.230.240   1   0   377 0  -1202ns[-1231ns] +/-   25us
> ^- 192.168.230.244   1   0   377 0   +569ns[ +540ns] +/-   25us
> =x 192.168.230.1 2   3   377 1  -3050ns[-3079ns] +/- 1385us
> =x 192.168.230.2 2   1   377 1  +4778ns[+4778ns] +/-   87us
> =x 192.168.225.102   1   377 0-25us[  -25us] +/-  183us
> 210 Number of sources = 5
> Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
> ==
> 192.168.230.24064  4170 -0.000  0.006 -0ns   299ns
> 192.168.230.24464  3672 -0.000  0.008-43ns   386ns
> 192.168.230.1  44  22   347 -0.021  0.018  -5406ns  3169ns
> 192.168.230.2  31  1762 -0.047  0.075  +1258ns  2220ns
> 192.168.225.10 16   855 -0.026  0.300-29us  3670ns
> 



FWIW, I temporarily have a third LeoNTP unit from the vendor. It shows as 
192.168.230.245 below, and is directly connected to the monitoring system (no 
switch). The direct connection gives slightly better precision, but otherwise 
nothing too dramatic. Following are a few samples. [All except 
192.168.230.240 are marked as “noselect”]

210 Number of sources = 6
MS Name/IP address Stratum Poll Reach LastRx Last sample   
===
^* 192.168.230.240   1   0   377 1   +736ns[ +770ns] +/-   24us
^? 192.168.230.244   1   0   377 0   -385ns[ -385ns] +/-   24us
^? 192.168.230.245   1   0   377 1   +237ns[ +271ns] +/-   21us
=? 192.168.230.1 2   3   377 6  -4466ns[-4433ns] +/- 1151us
=? 192.168.230.2 2   1   377 1  +3873ns[+3873ns] +/-   91us
=? 192.168.225.102   1   377 1-17us[  -17us] +/-  176us
210 Number of sources = 6
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
192.168.230.24064  3281 +0.000  0.007 +0ns   378ns
192.168.230.24464  3977 +0.001  0.007+90ns   362ns
192.168.230.24564  3579 -0.002  0.007   +193ns   364ns
192.168.230.1  35  17   275 -0.018  0.018  -1160ns  2487ns
192.168.230.2  64  31   150 +0.007  0.025  +2138ns  2479ns
192.168.225.10 55  32   135 +0.017  0.042  -6887ns  3876ns

210 Number of sources = 6
MS Name/IP address Stratum Poll Reach LastRx Last sample   
===
^* 192.168.230.240   1   0   377 1+56ns[  +64ns] +/-   24us
^? 192.168.230.244   1   0   377 0-14ns[  -14ns] +/-   24us
^? 192.168.230.245   1   0   377 0-31ns[  -31ns] +/-   22us
=? 192.168.230.1 2   3   377 3  +3246ns[+3215ns] +/- 1093us
=? 192.168.230.2 2   1   377 2  +3253ns[+3261ns] +/-   85us
=? 192.168.225.102   1   377 1-17us[  -17us] +/-  190us
210 Number of sources = 6
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
192.168.230.24064  3783 +0.000  0.009 +0ns   486ns
192.168.230.24464  3787 -0.000  0.007 +8ns   381ns
192.168.230.24564  3983 -0.000  0.005   +334ns   303ns
192.168.230.1  35  24   274 +0.021  0.021  +3344ns  2930ns
192.168.230.2  64  34   147 -0.002  0.032  +1621ns  3411ns
192.168.225.10 64  31   144 +0.002  0.058-13us  5873ns

210 Number of sources = 6
MS Name/IP address Stratum Poll Reach LastRx Last sample   
===
^* 192.168.230.240   1   0   377 0-77ns[  -85ns] +/-   24us
^? 192.168.230.244   1   0   377 0-47ns[  -47ns] +/-   24us
^? 192.168.230.245   1   0   377 1   +663ns[ +655ns] +/-   21us
=? 192.168.230.1 2   3   377 4   +669ns[ +635ns] +/- 1596us
=? 192.168.230.2 2   1   377 2  +3811ns[+3803n

Re: [chrony-dev] Nanosecond timestamps

2016-11-08 Thread Denny Page

> On Nov 7, 2016, at 15:04, Denny Page  wrote:
> 
> The kernels are Linux 4.4.26, configured for low latency. I’ll follow up with 
> the list of kernel configuration parameters I use. Both the kernels and 
> chrony are built with mtune=native. 


For the kernel configuration, my general guideline is to build a static kernel 
with no modules and only the drivers/features I need enabled. This can take a 
bit of effort the first time around for a given piece of hardware, but 
thereafter it’s relatively easy. Below are some of the general, 
hardware-independent parameters that I consider important for low latency. 
Note that the configuration parameters that are not set are just as important 
as the ones that are.

CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_MODULES is not set
CONFIG_DEFAULT_CFQ=y
CONFIG_NR_CPUS=
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
CONFIG_PREEMPT=y
CONFIG_HZ_1000=y
# CONFIG_PM is not set
CONFIG_ACPI=y
# CONFIG_CPU_FREQ is not set
# CONFIG_CPU_IDLE is not set
CONFIG_HPET=y
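Unset options matter as much as set ones, so it can help to check a kernel `.config` against this list mechanically. The following is a minimal Python sketch. It assumes the standard Kconfig `.config` syntax (`CONFIG_X=y` and `# CONFIG_X is not set`) and treats an absent option as not set. `CONFIG_NR_CPUS` is left out because its value is not given above. The helper names are hypothetical.

```python
# Settings from the list above; None means the option must be "not set".
WANTED = {
    "CONFIG_NO_HZ_IDLE": "y",
    "CONFIG_HIGH_RES_TIMERS": "y",
    "CONFIG_MODULES": None,
    "CONFIG_DEFAULT_CFQ": "y",
    "CONFIG_SCHED_SMT": None,
    "CONFIG_SCHED_MC": "y",
    "CONFIG_PREEMPT": "y",
    "CONFIG_HZ_1000": "y",
    "CONFIG_PM": None,
    "CONFIG_ACPI": "y",
    "CONFIG_CPU_FREQ": None,
    "CONFIG_CPU_IDLE": None,
    "CONFIG_HPET": "y",
}

NOT_SET = " is not set"

def parse_kconfig(text):
    """Map option name -> value; '# CONFIG_X is not set' lines map to None."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            values[line[2:-len(NOT_SET)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            values[name] = value
    return values

def mismatches(text):
    """Return {option: actual value} for options that differ from WANTED."""
    actual = parse_kconfig(text)
    return {name: actual.get(name) for name, want in WANTED.items()
            if actual.get(name) != want}

example = "CONFIG_PREEMPT=y\n# CONFIG_MODULES is not set\nCONFIG_PM=y\n"
print(mismatches(example))
```

In practice, you would pass the contents of the build tree's `.config` to `mismatches()`.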

There is always room for improvement, so feedback is welcome.

In addition to the kernel configuration, I disable EEE (Energy-Efficient 
Ethernet) and interrupt coalescing on the network interfaces.

Denny



Re: [chrony-dev] Nanosecond timestamps

2016-11-04 Thread Denny Page
The offset is sampled at 15-second intervals by a script that calls 
“chronyc -c sourcestats” and feeds the data to rrd. My polling was not quite as 
aggressive: poll 2 for the device shown in the graphs I sent.
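A script along those lines might look like the Python sketch below. It is hypothetical and is not the script described above. It assumes `chronyc -c sourcestats` prints one comma-separated row per source, using the same column order as the human-readable table (name, NP, NR, span, frequency, freq skew, offset, std dev), with the offset in seconds. It only builds the `N:<value>` argument that would be passed to `rrdtool update` and does not call rrdtool itself. The sample row is based on the 192.168.230.244 entry in the sourcestats table in this message.

```python
import csv
import io

# Assumed column order of "chronyc -c sourcestats", mirroring the table header:
# name, NP, NR, span (s), frequency (ppm), freq skew (ppm), offset (s), std dev (s)
FIELDS = ["name", "np", "nr", "span", "freq_ppm", "skew_ppm", "offset_s", "stddev_s"]

def parse_sourcestats_csv(text):
    """Return {source name: {field: float}} from chronyc -c sourcestats output."""
    stats = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < len(FIELDS):
            continue  # blank or unexpected line
        rec = dict(zip(FIELDS, row))
        name = rec.pop("name")
        stats[name] = {key: float(value) for key, value in rec.items()}
    return stats

def rrd_update_arg(offset_s):
    """Format an rrdtool update argument, using N (now) as the timestamp."""
    return "N:{:.9e}".format(offset_s)

# Sample row based on the 192.168.230.244 entry in the sourcestats table below.
SAMPLE = "192.168.230.244,26,13,105,0.002,0.010,-0.000000052,0.000000410\n"

for name, s in parse_sourcestats_csv(SAMPLE).items():
    print(name, rrd_update_arg(s["offset_s"]))
```

Running this every 15 seconds from cron or a loop would reproduce the sampling interval described above.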

I wouldn’t think I have much in the way of asymmetry, but anything’s possible. 
Both the monitoring system and the NTP servers under test are connected 
to the same switch. Here’s what my sources/sourcestats looked like:

210 Number of sources = 7
MS Name/IP address Stratum Poll Reach LastRx Last sample   
===
^+ 192.168.230.240   1   2   377 3   -218ns[ -240ns] +/-   23us
^x 192.168.230.241   1   2   377 0-40us[  -40us] +/- 2930us
^x 192.168.230.242   1   2   377 0-40us[  -40us] +/- 2930us
^* 192.168.230.244   1   2   377 0   -146ns[ -168ns] +/-   23us
=x 192.168.230.1 2   3   377 1  -7027ns[-7049ns] +/- 1530us
=x 192.168.230.2 2   2   377 4  +4821ns[+4799ns] +/-   86us
=x 192.168.225.102   2   377 5-24us[  -24us] +/-  206us
210 Number of sources = 7
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
192.168.230.24047  28   202 -0.001  0.005+53ns   558ns
192.168.230.24125  1497 -0.015  0.208-41us  7733ns
192.168.230.24216   660 -0.162  0.355-48us  5623ns
192.168.230.24426  13   105 +0.002  0.010-52ns   410ns
192.168.230.1  39  17   355 -0.002  0.023   +361ns  3795ns
192.168.230.2  48  24   251 -0.008  0.020   +382ns  2505ns
192.168.225.10 64  38   606 +0.003  0.013-17us  5355ns

23/24us for the error estimate isn’t bad considering that the minimum round 
trip transmission time through the switch is 21.44us. If I connect the server 
directly (bypassing the switch), this drops to 21/22us (the switch accounts 
for 3.44us of the round trip).

Back on the switch, if I bring the polling down to 1, things tighten up a bit 
more, but they are still nowhere near the 151ns standard deviation that you are 
getting.

210 Number of sources = 7
MS Name/IP address Stratum Poll Reach LastRx Last sample   
===
^+ 192.168.230.240   1   1   377 0   -212ns[ -212ns] +/-   24us
^x 192.168.230.241   1   2   377 0-27us[  -27us] +/- 2930us
^x 192.168.230.242   1   2   377 1-43us[  -43us] +/- 2930us
^* 192.168.230.244   1   1   377 1   +174ns[ +157ns] +/-   24us
=x 192.168.230.1 2   3   377 7  +2809ns[+2835ns] +/- 1552us
=x 192.168.230.2 2   1   377 3  -1394ns[-1411ns] +/-   82us
=x 192.168.225.102   1   335 0-31us[  -31us] +/-  191us
210 Number of sources = 7
Name/IP AddressNP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==
192.168.230.24016   630 +0.005  0.029 +4ns   264ns
192.168.230.24164  28   255 +0.029  0.038-25us  6948ns
192.168.230.24239  23   154 +0.005  0.102-46us  7070ns
192.168.230.24445  2792 -0.001  0.005-23ns   237ns
192.168.230.1  23  13   210 +0.015  0.033  +1262ns  2521ns
192.168.230.2  64  29   215 +0.007  0.012  -2301ns  1733ns
192.168.225.10 63  33   474 +0.003  0.019-19us  5579ns

Denny



> On Nov 03, 2016, at 23:51, Miroslav Lichvar  wrote:
> 
> On Thu, Nov 03, 2016 at 01:38:25PM -0700, Denny Page wrote:
>> I’m a little late to the party, but I thought it would be worth visually 
>> noting the positive impact that the nanosecond timestamp changes have had. 
>> The first graph is from a dedicated chrony monitoring system. The transition 
>> from 2.4 to the current version occurs midway through the graph. Nothing too 
>> dramatic, but still a noticeable improvement. The second graph is from a 
>> more active general server. The improvement here is much more significant.
> 
> Which offset do the graphs show? The one from statistics.log? The
> improvement might be related also to the correction for asymmetric
> jitter that was included recently.
> 
>> Thank you for all the hard work. I’m really looking forward to hardware 
>> timestamps. :)
> 
> I'm busy with other things right now, but I hope I'll have something
> ready for testing soon. It's a lot of new code and I need to put it in
> a better shape before I can push it to git. An

[chrony-dev] Nanosecond timestamps

2016-11-03 Thread Denny Page
I’m a little late to the party, but I thought it would be worth visually noting 
the positive impact that the nanosecond timestamp changes have had. The first 
graph is from a dedicated chrony monitoring system. The transition from 2.4 to 
the current version occurs midway through the graph. Nothing too dramatic, but 
still a noticeable improvement. The second graph is from a more active general 
server. The improvement here is much more significant.

Thank you for all the hard work. I’m really looking forward to hardware 
timestamps. :)

Denny