Re: [ntp:questions] NTP 4.2.8p10 released
Hi, I asked my original question to see if the process could be improved. For instance, it would be good if there were direct pointers to the various distributions' pages, where the packaging state and availability of each release could be seen and followed. This is what you can find for several other security bug scenarios. For security updates especially, it is important to coordinate the efforts and show how the hand-off happens. While each party has its own responsibility, the way things are coordinated matters for how smooth the upgrade becomes for the users. I wanted to ask about it to see if there are improvements to be made. At the moment I don't have 4.2.8p10 available in my version of the distro (Debian testing). Packaging is important to many, so this is to illustrate possible improvements. MVH Magnus On 03/24/2017 09:30 AM, Harlan Stenn wrote: Hal, Thanks - we're going over the lists and processes to improve things. H -- Hal Murray writes: Harlan said: We're open to doing an even better job of telling folks about things like this. I think a message should go out to (almost?) all lists when security fixes are available to the general public. The first mail on questions was David Taylor's announcement of the availability of Windows binaries. There was no mention of a security release. I just checked the archives for announce. Nothing since April 2015. The hackers list has only 3 messages in March. None was an announcement. If the mailing list traffic has moved to other lists and/or venues, then please make an official announcement and disable the old lists (but please save the archives). -- These are my opinions. I hate spam. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] NTP 4.2.8p10 released
Hi, On 03/23/2017 05:12 PM, Martin Burnicki wrote: Magnus Danielson schrieb: Hi Martin, On 03/23/2017 03:25 PM, Martin Burnicki wrote: Hi Magnus, Magnus Danielson wrote: Hi, On 03/22/2017 02:27 PM, David Taylor wrote: NTP 4.2.8p10 released Windows binaries working on Windows-XP SP3 & later - download: http://www.satsignal.eu/ntp/x86/index.html Source: http://archive.ntp.org/ntp4/ntp-4.2/ntp-4.2.8p10.tar.gz Are bugs filed in the distros? Debian, for instance. What do you mean? I don't understand your question. In particular for security updates, it would be good to do coordinated filing of a tracking bug for the various Linux distributions, such that they upgrade their packaging quickly. Exactly who does what may vary, but with some form of orchestrated handling the upgrade may become quicker. The folks who are in the security group knew about the upcoming security release, but I agree an email should have been sent on the normal NTP announce mailing list as soon as the new version became available. It took some time to find: https://packages.qa.debian.org/n/ntp.html 4.2.8p10 is now in unstable. Hope it ripples out to the others soon. My point here is that we'd like to make sure the progress is visible and easy to evaluate, so one can tell when an update is possible. Cheers, Magnus
Re: [ntp:questions] NTP 4.2.8p10 released
Hi Martin, On 03/23/2017 03:25 PM, Martin Burnicki wrote: Hi Magnus, Magnus Danielson wrote: Hi, On 03/22/2017 02:27 PM, David Taylor wrote: NTP 4.2.8p10 released Windows binaries working on Windows-XP SP3 & later - download: http://www.satsignal.eu/ntp/x86/index.html Source: http://archive.ntp.org/ntp4/ntp-4.2/ntp-4.2.8p10.tar.gz Are bugs filed in the distros? Debian, for instance. What do you mean? I don't understand your question. In particular for security updates, it would be good to do coordinated filing of a tracking bug for the various Linux distributions, such that they upgrade their packaging quickly. Exactly who does what may vary, but with some form of orchestrated handling the upgrade may become quicker. Cheers, Magnus
Re: [ntp:questions] NTP 4.2.8p10 released
Hi, On 03/22/2017 02:27 PM, David Taylor wrote: NTP 4.2.8p10 released Windows binaries working on Windows-XP SP3 & later - download: http://www.satsignal.eu/ntp/x86/index.html Source: http://archive.ntp.org/ntp4/ntp-4.2/ntp-4.2.8p10.tar.gz Are bugs filed in the distros? Debian, for instance. Cheers, Magnus
Re: [ntp:questions] Writing the drift file
Harlan, On 03/06/2015 10:35 AM, Harlan Stenn wrote: Folks, A while ago we got a request from the embedded folks asking for a way to limit the writing of the drift file unless there was a big enough change in the value to warrant this. Somebody came up with an interesting way to do this that involves looking to see how much the drift value has changed and only writing the value if the change was big enough. By my read of the code and the comments: 1) it looks like the code is implementing something other than what the comments want, and 2) what's described *or* implemented seems way more complicated than what we need. I'm wondering if we should just let folks specify a drift/wander threshold, and if the current value is more than that amount we write the file, and if the current value is less than that amount we don't bother updating the file. If folks are on a filesystem where the number of writes doesn't matter, no value would be set (or we could use 0.0) and it's not an issue. Thoughts? For embedded systems, reducing the rate of writing makes sense, so in this regard the question is valid IMHO. There can't be a universal limit that will once and for all satisfy all needs, so some user configurability would make sense. Another aspect is that if a drift file exists, NTP believes it and skips the frequency track-in phase. If the difference between the stored frequency and the actual frequency is too large, it cannot be tracked in, and the result is a constant resetting of the time (this addresses Jochen's 3rd point about the worst-case scenario). I discovered this the hard way some years back, but I think it was never resolved. Anyway, as long as this misfeature remains, lowering the drift-write rate could lead to subtle misconfiguration issues worse than the existing drift-file issues. 
This would limit the range of values users could configure, and it would also end up exposing the existing misfeature, since we need to make sure there is a very high likelihood that the drift value written is within the capture range of the PLL mode, even when written more seldom. Trying to modify the way NTP writes (Jochen's point 1) does not help for flash; it is the write itself which is destructive in the long term, so reducing the rate of writes should be the key. It might be better to do the write as part of shutdown. In general, the drift-file handling should be made more fool-proof first, before attempting to improve the write-rate issue. It has already proven to cause a number of problems, but rather than addressing it directly, the main effort seems to go into addressing the triggers. Until this is done, the drift file can be a beneficial accelerator for systems where it works, but may be discouraged for other uses. Its main motivation is to overcome the FLL phase time of NTP start-up. Cheers, Magnus
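The threshold rule Harlan describes can be sketched in a few lines. This is an illustrative mock-up, not ntpd's actual implementation; the function names and the 0.1 PPM default are made up:

```python
# Hypothetical sketch of a drift/wander threshold (names and the 0.1 PPM
# default are assumptions, not from ntpd): only rewrite the drift file
# when the frequency estimate has moved by more than the configured amount.

def maybe_write_drift(path, current_ppm, last_written_ppm, threshold_ppm=0.1):
    """Return the value now on disk; write only on a big enough change."""
    if abs(current_ppm - last_written_ppm) < threshold_ppm:
        return last_written_ppm          # change too small: spare the flash
    with open(path, "w") as f:
        f.write("%.3f\n" % current_ppm)  # single-value format, like ntp.drift
    return current_ppm
```

With threshold_ppm set to 0.0 every update is written, matching the "number of writes doesn't matter" case in Harlan's mail.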
Re: [ntp:questions] Writing the drift file
Harlan, On 03/07/2015 12:12 PM, Harlan Stenn wrote: OK, a fair amount of good stuff is being discussed. Do we mostly all agree that the purpose of the drift file is to give ntpd a hint as to the frequency drift at startup? If so... The current mechanism is designed to handle the case where ntpd is restarted fairly quickly, so there's a good chance the same drift value will work. Remember that for embedded devices, the operational conditions may be such that it's not a quick restart at all. You cannot assume or know when it will come up the next time after being powered off. It can be minutes, hours, weeks, years. Just like the leap-second file, the age of the drift file is relevant, and if it is too old, that is one (of several) reasons to disqualify it. Relying on the time-stamp of the file can be troublesome, as it may not be respected by file management. Writing it into the file is more robust. Cheers, Magnus
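One way to make the drift value's age self-describing, as suggested above, is to store a timestamp inside the file. This is a hypothetical sketch; ntpd's real drift file contains only the frequency value, and the two-field format and 30-day limit here are assumptions:

```python
# Illustrative only: ntpd's real drift file holds just the frequency.
# Here we also store the write time, so the value's age survives file
# management that does not preserve the mtime. The 30-day limit is an
# arbitrary assumption.

import time

MAX_AGE = 30 * 86400  # distrust a drift value older than ~30 days

def write_drift(path, ppm, now=time.time):
    with open(path, "w") as f:
        f.write("%.3f %d\n" % (ppm, int(now())))

def read_drift(path, max_age=MAX_AGE, now=time.time):
    ppm, stamp = open(path).read().split()
    if now() - int(stamp) > max_age:
        return None  # too old: fall back to the normal training phase
    return float(ppm)
```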
Re: [ntp:questions] NTP Autokey - who is actively using it?
Hi, On 01/15/2015 03:06 AM, Harlan Stenn wrote: I'm trying to figure out if anybody is actively using autokey, in a production deployment. If you are, please let me know - I have some questions for you. We use it to pull leap-second info off the NTP servers. It took some effort to get it running, and well, it hasn't been painless, but we have the process debugged now anyway. Did the autokey-less distribution of leap-second info ever get implemented? I do know there was an I-D essentially interpreting the existing RFCs in such a way. What kind of questions do you have? Cheers, Magnus
Re: [ntp:questions] Thoughts on KOD
On 07/08/2014 12:11 PM, Jason Rabel wrote: There are two obvious ways to go for an embedded client. One way would be to use the sntp code as the base. The other would be to either use the current NTP codebase and use the configure options to disable all the refclocks and anything else you didn't want, or wait until we're done with the post-4.2.8 rewrite. For post-4.2.8, we're looking at having a client core with any refclock code being handled in a separate process. I do not know if this is the case with NTP, but quite often it takes considerable hacking of the sources to get code to compile on non-x86 embedded hardware (i.e. ARM, MIPS)... It would probably help boost usage if someone ensured the NTP sources compile on those platforms without the need for modification. You need to gift-wrap it considerably, such that the proliferation of badly hacked NTP-ish code gets replaced. Putting a price tag on it means it will prohibit the shift of code, which is in itself a cost. Hobby hackers already do many first ports, so why not make sure their contributions make it into the code, such that support for a large range of embedded platforms exists either directly in NTPD or as an easily accessible port. Another thought is to have people review the NTP/SNTP-ish code that is out there to see how compliant it is, what KOD it would react to, and how much effort it would take to fix the basics. Then again, the basic problem is that people don't upgrade their FW as they should. A listing of implementations, their target environments, and versions to use and versions to avoid might be of assistance. Cheers, Magnus
Re: [ntp:questions] Thoughts on KOD
Danny, On 07/07/2014 04:00 PM, Danny Mayer wrote: On 7/6/2014 2:42 AM, Rob wrote: Harlan Stenn st...@ntp.org wrote: Discussion appreciated. I think it is best to remove KOD from ntpd. It does not serve a useful purpose, because precisely the kind of clients that you want to say goodbye to do not support it. In real life it has either no effect at all, or it even has a negative effect, because the client does not understand it and re-tries the request sooner than it would if no reply had been sent at all. You haven't read the code. Any client that ignores the KOD flag will find (if they ever looked) that their clock drifts further and further away from the proper time. When KOD is set, the receive and transmit timestamps in the reply are the same as the client's original transmit timestamp; it doesn't use the system time for the returned packet. Calculate what this does to the resulting clock. Please also note that there is more than one type of KOD packet. See RFC 5905 Section 7.4; see also Figure 13. You need to clearly distinguish the different ones when talking about them. Most of this discussion seems to be about action a. As discussed above, this is an extremely useful feature, because any client ignoring the KOD flag and using the packet anyway will get pushed way off the actual time it would normally expect, regardless of the client software used. Which would make sense if the client has multiple sources and is a relatively decent NTP client. The issues we have seen are outside the NTP-client realm. Cheers, Magnus
Re: [ntp:questions] Thoughts on KOD
On 07/07/2014 04:10 PM, Danny Mayer wrote: The experience with blocking has actually been negative, and we have seen traffic actually INCREASE after it is blocked, because the client, not having received a response, tries more often. This has been observed in the wild. This might be true for proper NTP clients, but I wonder if it is true for faked NTP requests from DDoSers. KOD serves no purpose for DDoSers, so massive attacks are best handled by dropping that traffic, and possibly pushing the dropping away from the node and subnet running the server. For more modest overload scenarios, such as misconfigured or otherwise erroneous NTP clients, I believe what you describe is correct. Let's not confuse these different scenarios, as they most probably have different solutions. My point was that DDoS amplification/relaying should be considered, as we need that solved, while KOD refinements are maybe nice but address another problem. I don't think you will be able to handle the DDoS issues without doing blocking, and you want that blocking to move away from your server in order to reduce the impact on the service. Cheers, Magnus
Re: [ntp:questions] Thoughts on KOD
On 07/07/2014 04:50 PM, Majdi S. Abbas wrote: On Mon, Jul 07, 2014 at 08:35:25AM +0100, David Taylor wrote: Seconded. Why remove KOD when it has to be expressly enabled (via restrict kod and limited)? I'd rather see a two-tier system, where you can enable the use of KOD beyond the initial rate limit, and a second limit beyond which requests are simply ignored. But I don't understand why anyone would remove functionality that the server administrator has to expressly configure to enable. I think KOD is fine for its intended purpose, but it does not solve this other problem we are having. Thus, a two-tier solution is what I advocate. Cheers, Magnus
Re: [ntp:questions] Thoughts on KOD
On 07/06/2014 12:38 PM, Terje Mathisen wrote: Rob wrote: Harlan Stenn st...@ntp.org wrote: Discussion appreciated. I think it is best to remove KOD from ntpd. It does not serve a useful purpose, because precisely the kind of clients that you want to say goodbye to do not support it. In real life it has either no effect at all, or it even has a negative effect, because the client does not understand it and re-tries the request sooner than it would if no reply had been sent at all. I'm afraid this is exactly right: KOD is a way to keep honest guys honest, i.e. it only helps against programmers/users who actually try (hard) to do the right thing. Currently it will cause a badly configured ntpd installation (burst + minpoll 4 + maxpoll 4) to possibly stop using any server which sends back KOD, but only if it also uses the pool directive to actively search out the best servers. Maybe it's time to figure out how to auto-tune configurations, as a better alternative than people continuing to follow aged advice. In the meantime, make sure that good concrete advice, with a "don't do this anymore" section, is on ntp.org. I don't want to think about users actively trying to generate as much traffic as possible. :-( Unfortunately we need to. The use of NTP features as an accelerator in DDoS attacks happened this spring. We had to turn off nice features, which in itself becomes a form of DoS. It would be better if we had ways to protect a server (remember that clients also act as servers) so that proper use does not cause loss of service, but aggressive use causes a block-out. Soft state that remembers signaling peers for some time and then forgets them could keep statistics of packets per time period; a signaling peer that acts reasonably well stays, while overtransmitting packets causes blacklisting. KOD is the least of it; inserting drop rules into the local host should follow, and possibly pushing the block rule into the network to clear the machine and part of the network of the offending traffic. 
For cases like that, KOD won't help at all. Cheers, Magnus
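The soft-state bookkeeping sketched above could look something like this. A hypothetical illustration with made-up window and limit values, not anything from the ntpd sources:

```python
# Hypothetical soft-state rate tracker: remember each signaling peer for
# a while, count packets per window, and flag overtransmitters for
# blacklisting. WINDOW and LIMIT are arbitrary illustrative values.

import time

WINDOW = 60.0      # seconds of history kept per peer
LIMIT = 30         # packets allowed per window before blacklisting

class RateTracker:
    def __init__(self, now=time.time):
        self.now = now
        self.history = {}            # addr -> list of arrival times

    def packet_from(self, addr):
        """Record one packet; return True if the peer should be blocked."""
        t = self.now()
        times = [x for x in self.history.get(addr, []) if t - x < WINDOW]
        times.append(t)
        self.history[addr] = times   # old entries age out: soft state
        return len(times) > LIMIT
```

A well-behaved peer polling at sane intervals never trips the limit, and an offender's state is forgotten once it goes quiet for a window.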
Re: [ntp:questions] Thoughts on KOD
Detha, On 07/06/2014 03:23 PM, detha wrote: On Sun, 06 Jul 2014 12:28:09 +, Magnus Danielson wrote: For cases like that, KOD won't help at all. All the state table/KOD/filter rules mitigation approaches I have seen so far are limited to one server. Maybe it is time to take a look at a DNSBL-type approach for abusive clients; that way, once a client is labeled 'abusive', it will stop working with any of the pool servers that use the blacklist. Policies (for how long to auto-blacklist, how to prevent DoS by blacklisting the competition, how to 'promise to behave and express-delist', etc.) to be defined. Maybe. For the moment I think it is sufficient if we provide a mechanism by which offenders get reported to *some* system. We *could* also provide a method by which white/black-lists can be dynamically set from an external source, so that enough hooks exist, but I do not think that NTPD should be burdened with the rest of that system. Once NTPD can report that it feels offended by a source, and beyond KODing it also report to some external mechanism that could block it by any external means, NTPD does not have to do much more. My point with this line of thinking is that KOD in itself assumes the offending source actually respects it; while the KOD rules can probably be improved, KOD does not provide very effective protection against sources that do not respect it, and thus we also need to think in broader terms. 
In my mind, the defenses follow these lines:
0) NTPD tolerates a source; packet approval checks.
1) NTPD does not tolerate a source and fires off a KOD; the source is expected to shut up.
2) The NTPD admin does not tolerate a source and blacklists it; NTPD will drop the traffic.
3) The NTPD admin does not tolerate a source and filters it in the box firewall; the box firewall drops the traffic.
4) The NTPD admin does not tolerate a source and filters it in the network firewall; the network firewall drops the traffic.
Notice how steps 2-4 move the traffic load further away from the NTPD process, the interface, and eventually the subnetwork. What I proposed would allow for automation of these steps. It is reasonable to escalate when a source does not respect KOD and keeps transmitting requests. It is also reasonable for blocking to time out, such that it is removed after some reasonable time, as offenders can be on dynamic addresses, and intentional abuse usually lasts a limited time. How to automate steps 2-4 is, however, not a core concern for NTPD, but feeding the data out of NTPD in a way that is handy for such a mechanism is. A separate block-log file, as I proposed, is probably better than only syslog, as it removes the need to parse syslog for matching blocks; a tool can instead focus on changes in a dedicated file. Cheers, Magnus
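Automating steps 2-4 from a dedicated block-log could be done by a small external watcher along these lines. The one-address-per-line log format and the iptables invocation are assumptions for illustration; nothing here is part of ntpd:

```python
# Sketch of an external watcher for the proposed dedicated block-log.
# Assumed log format: one offender address per line. Escalation here is
# a host-firewall drop rule (step 3 in the list above); the same hook
# could instead push a rule into a network firewall (step 4).

import subprocess

def escalate(addr):
    """Insert a host-firewall drop rule for one offender."""
    subprocess.run(["iptables", "-A", "INPUT", "-s", addr, "-j", "DROP"],
                   check=True)

def watch_block_log(lines, act=escalate):
    """Feed lines read from the block-log; act once per new offender."""
    seen = set()
    for line in lines:
        addr = line.strip()
        if addr and addr not in seen:
            seen.add(addr)
            act(addr)
    return seen
```

In a real deployment the `act` hook would also schedule rule removal after a timeout, per the point about dynamic addresses.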
Re: [ntp:questions] Thoughts on KOD
Harlan, On 07/05/2014 11:40 PM, Harlan Stenn wrote: Folks, I was chatting with PHK about: http://support.ntp.org/bin/view/Dev/NtpProtocolResponseMatrix http://bugs.ntp.org/show_bug.cgi?id=2367 and how we probably want to extend KOD coverage to more than just the limited case. I was assuming folks would want finer-grained control over this behavior, and thought about being able to choose any of kod-limited, kod-noserve, and kod-query. PHK suggested that we consider going the other way - KOD would mean Send KODs whenever appropriate. I wonder what the costs/benefits will be when weighing the extra complexity of multiple choices against when the defaults change and we get new behavior that we can't tune, that costs us in X and Y. This gets a bit more complicated when taking into consideration: - we'll get more traffic from a NAT gateway - - do we need to be able to configure a threshold for this case? - we should pay attention to how a client, whom we find to be abusive, reacts to: - - getting no response - - getting a KOD response and adapt accordingly. Discussion appreciated. There is also the aspect of when KOD does not bite. We have seen that. Like other forms of defense, inserting drop rules for the offending node into the firewall is an alternative to consider. KOD only bites for nodes which follow the protocol but are somehow offensive in their configuration. More offensive configurations or packet generation render KOD relatively useless. Thus, there might be a limit to how much effort should go into perfecting KOD generation when raising the bar even further may be needed. Then, we should also consider how KOD and drop-rule triggering could be used to trigger denial of service, and how to potentially protect against that. Sorry for muddling your water even more. Cheers, Magnus
Re: [ntp:questions] Thoughts on KOD
Harlan, On 07/06/2014 02:18 AM, Harlan Stenn wrote: Magnus, Yes, we know that if we decide to track finely-grained behavior we'll need to watch how {IP,port} responds when getting {no,KOD} responses. Just want to gently remind you. We might just want a syslog entry for KOD, because it's clear that there can come a time when we don't want to rely on the remote side doing anything. Unless there is a better solution. I like the syslog idea because we can tag it and let other mechanisms decide what to do with that raw information. For that purpose it may be good to allow a separate log of sent KOD messages, in addition to logging properly to syslog. A script or program can then monitor it for updates and insert rules, without having to filter the syslog. Cheers, Magnus
Re: [ntp:questions] Asymmetric Delay and NTP
On 24/03/14 14:38, Joe Gwinn wrote: Magnus, In article 532fa47b.7060...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: Joe, On 23/03/14 23:20, Joe Gwinn wrote: Magnus, In article 532e45db.5000...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: Joe, On 21/03/14 17:04, Joe Gwinn wrote: [snip] It is interesting. I've now read it reasonably closely. The basic approach is to express each packet flight as a one-line equation (a row) in a linear-system matrix equation, where the system matrix (the A in the traditional y=Ax+b formulation, where b is zero in the absence of noise) is 4 columns wide by a variable number of rows long (one row per packet flight), and to show that one column of A can always be computed from the two other columns that describe who is added to and subtracted from whom. In other words, these three columns are linearly dependent on one another. The fourth column contains the measured data. This dependency means that A is always rank-deficient no matter how many packets (including infinity) and no matter the order, so the linear system cannot be solved. It is just another formulation of the same equations I provided. For each added link, one unknown and one measurement are added. For each added node, one unknown is added. True, but there is more. Let's come back to that. As you do more measurements, you will add information about variations in the delays and the time differences between the nodes, but you will not disclose the basic offsets. Also true. The advantage of the matrix formulation is that one can then appeal to the vast body of knowledge about matrices and linear systems. It's not that one cannot prove it without the matrices; it's that the proof is immediate with them - less work. And the issue was to prove that no such system could work. As much as I like the matrix formulation, it ain't giving you much more in this case than a handy notation. 
The trouble is that beyond the properties of the noise, there is no information leakage about the static time errors and asymmetries. You end up having free variables. Yes. You correctly noted the mathematical equivalence of the two approaches, and I agree. My point was that the matrix approach is less work to get to the desired proof, because by formulating it as a linear system with matrices, one immediately inherits lots of properties and proofs. The problem is that the unknowns and the relationships build up at an uneven rate, and each observation only relates two unknowns. The only trustworthy fact we get is the sum of the delays, but no real hint about its distribution. If you do more observations along the same paths, you can do some statistics, but you won't get an unbiased result without adding a priori knowledge one way or another. Formulate it as you wish, but as you add more observations, they will be reduced, by their linear properties, to the existing equations plus noise. You need to add observations which do not fully reduce in order for your equation system to grow to a size where you can solve it. Yes, this is a good statement of the consequences of the proof. Thanks. Show me how you achieve it, and I'll listen. I don't understand the challenge. There is no dispute. It's not a personal challenge, it's a wide-spread challenge. If someone has worked something out, I'm really keen to learn about it. I've spent quite some time figuring these things out, as I need to understand them. The *one* thing you can figure out with more measurements is how non-zero-mean noise such as network traffic contributes to asymmetry. You can do pretty good approximations of that contribution. However, if there is an underlying asymmetry in the static delay sources, it won't disclose itself with more of the same set of measurements. 
What you *can* do is bring a precise time to the first slave, measure the time error, compensate for it, and then step by step calibrate a path. The trouble is that you have now added the a priori assumption of stable asymmetry, which may not be true. I've experienced it not being true. It is only by cheating that you can overcome the limits of the system. Is GPS cheating? That's our usual answer, but GPS isn't always available or possible. If you are trying to solve it within a network, it is. You can convert your additional GPS observations into a priori knowledge, and once you have done enough of those, you can solve it completely. The estimated variables had better stay static, though, or you have to start over again. GPS is the usual answer, but isn't always available or useful. I know, I know. Recall that the original question was random asymmetry due to asymmetric background traffic in a PTP network. If the network is controllable, a lab experiment is to simply turn the background traffic off and see how much the clocks change with respect to one another. But this tells one how much trouble one
Re: [ntp:questions] Asymmetric Delay and NTP
Joe, On 23/03/14 23:20, Joe Gwinn wrote: Magnus, In article 532e45db.5000...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: Joe, On 21/03/14 17:04, Joe Gwinn wrote: [snip] It is interesting. I've now read it reasonably closely. The basic approach is to express each packet flight as a one-line equation (a row) in a linear-system matrix equation, where the system matrix (the A in the traditional y=Ax+b formulation, where b is zero in the absence of noise) is 4 columns wide by a variable number of rows long (one row per packet flight), and to show that one column of A can always be computed from the two other columns that describe who is added to and subtracted from whom. In other words, these three columns are linearly dependent on one another. The fourth column contains the measured data. This dependency means that A is always rank-deficient no matter how many packets (including infinity) and no matter the order, so the linear system cannot be solved. It is just another formulation of the same equations I provided. For each added link, one unknown and one measurement are added. For each added node, one unknown is added. True, but there is more. Let's come back to that. As you do more measurements, you will add information about variations in the delays and the time differences between the nodes, but you will not disclose the basic offsets. Also true. The advantage of the matrix formulation is that one can then appeal to the vast body of knowledge about matrices and linear systems. It's not that one cannot prove it without the matrices; it's that the proof is immediate with them - less work. And the issue was to prove that no such system could work. As much as I like the matrix formulation, it ain't giving you much more in this case than a handy notation. The trouble is that beyond the properties of the noise, there is no information leakage about the static time errors and asymmetries. You end up having free variables. 
The problem is that the unknowns and the relationships build up at an uneven rate, and each observation only relates two unknowns. The only trustworthy fact we get is the sum of the delays, but no real hint about its distribution. If you do more observations along the same paths, you can do some statistics, but you won't get an unbiased result without adding a priori knowledge one way or another. Formulate it as you wish, but as you add more observations, they will be reduced, by their linear properties, to the existing equations plus noise. You need to add observations which do not fully reduce in order for your equation system to grow to a size where you can solve it. Show me how you achieve it, and I'll listen. The "no matter the order" part comes from the property of linear systems that permuting the rows and/or columns has no effect, so long as one is self-consistent. So far, I have not come up with a refutation of this approach. Nor have the automatic control folk - this proof was first published in 2004 into a community that knows their linear systems, and one would think that someone would have published a critique by now. The key mathematical issue is whether there are message exchange patterns that cannot be described by a matrix of the assumed pattern. If not, the proof is complete. If yes, more work is required. So far, I have not come up with a counter-example. It takes only one to refute the proof. It is only by cheating that you can overcome the limits of the system. Is GPS cheating? That's our usual answer, but GPS isn't always available or possible. If you are trying to solve it within a network, it is. You can convert your additional GPS observations into a priori knowledge, and once you have done enough of those, you can solve it completely. The estimated variables had better stay static, though, or you have to start over again. 
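The rank-deficiency argument is easy to check numerically for the simplest case. This is a sketch under stated assumptions (the usual symmetric two-way exchange model, not the exact matrix from the paper under discussion): with unknowns (offset theta, one-way delays d_ab, d_ba), forward flights measure theta + d_ab and reverse flights measure -theta + d_ba, and stacking any number of flights never yields full column rank:

```python
# Numerical check of the rank-deficiency argument for a single two-way
# link. Unknowns: offset theta and one-way delays d_ab, d_ba. Forward
# flights measure theta + d_ab, reverse flights -theta + d_ba. However
# many flights are stacked, the system matrix keeps rank 2 against 3
# unknowns, so the delay asymmetry cannot be separated from the offset.

import numpy as np

forward = [1.0, 1.0, 0.0]   # coefficients of (theta, d_ab, d_ba)
reverse = [-1.0, 0.0, 1.0]

for n_flights in (1, 10, 1000):
    A = np.array([forward, reverse] * n_flights)
    print(n_flights, np.linalg.matrix_rank(A))  # rank stays 2
```

More rows only repeat (up to noise) equations already present, which is exactly the "observations reduce to existing equations plus noise" point above.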
Although, if one goes to the trouble to make a NIC PTP-capable, it wouldn't be so hard to have it recognize and timestamp passing NTP packets. The hard part would be figuring out how to transfer this timestamp data from collection in the NIC to the point of use in the NTP daemon, and standardizing the answer. The Linux kernel has such support. NTPD already includes some support for such NICs. All true. But I'm reluctant to recommend a solution that lacks a common standard and/or has fewer than three credible vendors supporting that standard. I have no doubt that these things will come to pass, but we are not there just yet. Indeed. Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
Joe, On 21/03/14 16:17, Joe Gwinn wrote: Magnus, Thus, another fairly severe environment. I have a personal war story from 1992: At an Air Traffic Control center in Canada, one 19-inch cabinet had the green (safety ground) and white (power neutral) cables transposed. This caused 2.3 Vrms at 180 Hz to appear between the VMEbus ground and the cabinet shell, with enough oomph to cause a small spark when the oscilloscope probe's grounding clip was connected to that VMEbus ground, thus causing the system (and my heart) to crash. If left connected, the ground clip became warm. And how can ground generate a spark, even a small one? Fixing the grounds dropped the offset to around ten millivolts. The 180 Hz arose because the power supplies were single-phase capacitor-input, driven from the legs of three-phase prime power. Power neutral isn't really neutral when it takes a lot of beating. Similarly, a grounding wire isn't doing much grounding as frequency goes up. That fails economically - might as well stick to IRIG. Indeed. Doing the 1 us level might be possible; going lower than that will cause you more and more grey hairs one way or another. Well, now, this could be an advantage -- my hair is already gray, and more could be better. Well, you may have younger colleagues who lack this advantage. I knew you would make that comment. :) There is a truism in the standards world that it takes three major releases (versions) of a standard for it to achieve maturity. PTP is at version 2, so one more to go. I'd say it depends on the application. The trouble is when the assumed applications increase at a quicker rate than the standard adapts to handle them. It does, but having the market grow faster than the standards cycle can be the mark of success. To some degree. Being perceived to be a solution isn't the same as it being a solution. By the way, development of the third revision of 1588 started in 2013. 
I joined what purported to be their reflector, but now that you mention it I haven't gotten any traffic -- Something must be wrong. I will need to enquire. They formally had their first session at the ISPCS in Lemgo. Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 19/03/14 10:43, Martin Burnicki wrote: Magnus Danielson wrote: On 18/03/14 10:17, Martin Burnicki wrote: We have made some tests and found that NTP can yield the same accuracy as PTP if hardware timestamping of NTP packets is supported on all nodes, similar to PTP. In fact this isn't surprising, is it? No, it's not. NTP is perceived to be software timestamping, but nothing prohibits you from doing it in hardware. Similarly, you can implement PTP with software time-stamping (with poor performance). As I mentioned in a different posting, even if you use hardware timestamping with NTP you are out of luck for highest accuracy, since there are (AFAIK) no switches which have been designed to timestamp NTP packets. And even if there were, the next question is how to get the measured latency compensation parameters to the client without breaking the existing protocol? Maybe this would be an interesting approach for NTP v5. Indeed. When I look at NTP and PTP, I see two protocols that could learn a lot from each other. NTP got some things (more or less) right that PTP is bad at, and vice versa. Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 19/03/14 10:50, Miroslav Lichvar wrote: On Tue, Mar 18, 2014 at 10:20:08PM +0100, Magnus Danielson wrote: No, it's not. NTP is perceived to be software timestamping, but nothing prohibits you from doing it in hardware. Similarly, you can implement PTP with software time-stamping (with poor performance). Doing HNTP (hardware-timestamped NTP) makes NTP match up against PTPv1 to some degree, but PTP then pulls out the explicit means to make PTP-aware transparent clocks correct for delays, cancelling some of the asymmetry. You could do NTP with PTP 2-step processing, but what we would call such a bastard would be an interesting thing, NPTP? There is already a two-step mode implemented in ntpd that works with NTP peers or broadcast; it's activated by the xleave option. An NTP transparent clock could be implemented too. One problem is that with the current protocol it would have to track the connections. For stateless operation a new NTP extension field would probably be needed. Similarly to PTP, all NTP-aware routers and switches between NTP server and client would increment a path delay correction. Interesting! NTP-aware routers and switches are probably less common than their PTP-aware counterparts. Cheers, Magnus
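For reference, the interleaved ("two-step") mode mentioned above is enabled per association in ntpd's configuration. A minimal ntp.conf sketch (the hostname is a placeholder):

```conf
# Symmetric peer using interleaved mode; the xleave flag is also
# valid on broadcast lines in ntpd 4.2.8.
peer ntp1.example.com xleave iburst
```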
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 19/03/14 18:51, E-Mail Sent to this address will be added to the BlackLists wrote: Martin Burnicki wrote: Magnus Danielson wrote: Indeed. If you read the right article from 1990 you also know you can do it on L1 C/A only by monitoring both code and carrier phase, as their ionospheric effects have opposite signs. That's interesting, and I didn't know about this. Do you have a pointer to this article? http://www.navipedia.net/index.php/Code-Carrier_Divergence_Effect http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA497270 There ya' go! It was the Robert Giffard article I was referring to, but I was too tired to dig it up at the time. http://www.academia.edu/457163/Ionosphere_Effect_Mitigation_for_Single-frequency_Precise_Point_Positioning http://www.ngs.noaa.gov/PUBS_LIB/GPSCarrierPhase.pdf Haven't seen those two, so I will see if they add something new. Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
Hi Joe, On 20/03/14 01:53, Joe Gwinn wrote: In article 5328ad2...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: On 18/03/14 01:24, Joe Gwinn wrote: I've used IRIG-B004 DCLS before, for cables two meters long within a cabinet. Worked well. How well do they handle 100 meter cables, in areas where the concept of ground can be elusive? The rising edge of the 100 Hz is your time reference; the falling edges are your information. Proper signal conditioning and cabling should not be a problem given proper drivers and receivers. IRIG-B004 DCLS also travels nicely over optical connections, where grounding issues will be less of a problem. It is known to work well in power substations, so there can be off-the-shelf products if you look for them. That's a pretty severe environment. I thought it would get your attention. I should give more context: On ships at full steam, there can be a steady seven volts rms or so at power frequency (and harmonics) between bow and stern, which will cause large currents to flow in the shield. This is well below the frequency at which inside and outside shield currents become decoupled due to skin effect, so the full voltage drop in the shield may be seen on the center conductor. We use optical links a lot, and triax some. One can also make RF boxes largely immune with a DC-block capacitor in series with the center conductor. Thus, another fairly severe environment. Maybe, depends on your needs. Consider doing a separate network for PTP. That approach has been used in systems where you want to make sure it works. That fails economically - might as well stick to IRIG. Indeed. Doing the 1 us level might be possible; going lower than that will cause you more and more grey hairs one way or another. This is my fear and instinct. But people read the adverts and will continue to ask. And some customers will demand. So, I'm digging deeper. Are there any good places to start? You asked here, it's not the worst place to start. 
:) To be sure. There is a truism in the standards world that it takes three major releases (versions) of a standard for it to achieve maturity. PTP is at version 2, so one more to go. I'd say it depends on the application. The trouble is when the assumed applications increase at a quicker rate than the standard adapts to handle them. Cheers, Magnus
Re: [ntp:questions] Asymmetric Delay and NTP
Joe, On 19/03/14 11:55, Joe Gwinn wrote: In article 5328aaa6.70...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: On 18/03/14 01:36, Joe Gwinn wrote: In article 5327757e.5040...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: Is that formal enough for you? It may be. This I did know, and it would seem to suffice, but I recall a triumphant comment from Dr. Mills in one of his documentation pieces. Which I cannot recall well enough to find. It may be the above analysis that was being referred to, or something else. I can't recall. The above I came up with myself some 10 years ago or so. When I awoke the day after writing the above, I saw two problems with the analysis. First is that with added message-exchange volleys, one does not get added variables and equations; one instead gets repeats of the equations one already has. If there is no noise, the added volleys convey no new information. If there is noise, multiple volleys allow one to average random noise out. True. What does happen over time is: 1) Clocks drift away from each other due to systematics and noise. 2) The path delay shifts, sometimes because of physical distance shifts, but also due to the shift of day and season. These require continuous tracking to handle. Second is that what is proven is that a specific message-exchange protocol cannot work, not that there is no possible protocol that can work. The above analysis only assumes a way to measure some form of signal. The same equations are valid for TWTFTT (two-way time and frequency transfer) as for NTP, PTP or whatever else uses two-way time transfer. What will differ is the way they convey the information and the noise sources they see. Will see if I can find Dave's reference. I hit pay dirt yesterday, while searching for data on outliers in 1588 systems. Dave's reference may well be in the references of the following article. Fundamental Limits on Synchronizing Clocks Over Networks, Nikolaos M. Freris, Scott R. Graham, and P. R. 
Kumar, IEEE Trans. on Automatic Control, v.56, n.6, June 2011, pages 1352-1364. Sounds like an interesting article. It is always interesting to see different people's views of fundamental limits. I also took the next step, which is to treat d_AB and d_BA as random variables with differing means and variances (due to interference from asymmetrical background traffic), and trace this to the effect on clock sync. It isn't pretty on anything like a nanosecond scale. The required level of isolation between PTP traffic and background traffic is quite stringent. It's even worse when you get into packet networks, as the delays contain noise sources of variable mean and variable deviation, besides being asymmetrical. NTP combats some of that, but doesn't get deep enough due to too low a packet rate. PTP may do it, but it's not in the standard, so it will be proprietary algorithms. The PTP standard is a protocol framework. The ITU has spent time filling in more of the empty spots. Yes. In closed networks, the biggest cause of asymmetry I've found is interference between NTP traffic and heavy background traffic in the operating system kernels of the hosts running application code. Another big hitter was background backups via NFS (Network File System). The network switches were not the problem. What greatly helps is to have a LAN for the heavy applications traffic, and a different LAN for NTP and the like, forcing different paths in the OS kernel to be taken. If you can get your NIC to hardware-timestamp your NTP, you will clean things up a lot. Cheers, Magnus
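The effect of treating d_AB and d_BA as random variables with differing means can be illustrated with a small simulation (a sketch, not from the thread; the delay magnitudes and the exponential noise model are invented for illustration):

```python
import random

# Hypothetical true state: client clock offset (s) and asymmetric
# one-way mean delays; the noise model is a crude stand-in for
# queueing jitter from background traffic.
TRUE_OFFSET = 5e-6                       # T_B - T_A = 5 microseconds
MEAN_D_AB, MEAN_D_BA = 200e-6, 150e-6    # asymmetric mean path delays
JITTER = 20e-6                           # mean of exponential delay noise

random.seed(1)

def volley():
    """One two-way exchange; returns the pseudo-ranges t_AB and t_BA."""
    d_ab = MEAN_D_AB + random.expovariate(1.0 / JITTER)
    d_ba = MEAN_D_BA + random.expovariate(1.0 / JITTER)
    t_ab = TRUE_OFFSET + d_ab        # t_AB = (T_B - T_A) + d_AB
    t_ba = -TRUE_OFFSET + d_ba       # t_BA = (T_A - T_B) + d_BA
    return t_ab, t_ba

# Averaging many volleys removes the random part of the noise ...
estimates = [(t_ab - t_ba) / 2 for t_ab, t_ba in (volley() for _ in range(10000))]
mean_est = sum(estimates) / len(estimates)

# ... but the systematic half-asymmetry bias remains in the offset estimate:
bias = mean_est - TRUE_OFFSET
expected_bias = (MEAN_D_AB - MEAN_D_BA) / 2   # 25 microseconds
```

However many exchanges are averaged, the estimate converges to the true offset plus (d_AB - d_BA)/2, which is the point about isolation being stringent: the *mean* asymmetry, not the jitter, sets the floor.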
Re: [ntp:questions] Asymmetric Delay and NTP
On 18/03/14 01:36, Joe Gwinn wrote: In article 5327757e.5040...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: Joe, On 16/03/14 23:16, Joe Gwinn wrote: I recall seeing something from Dr. Mills saying that a formal proof had been found showing that no packet-exchange protocol (like NTP) could tell delay asymmetry from clock offset. Can anyone provide a reference to this proof? It's relatively simple. You have two nodes (A and B) and a link in each direction (A-B and B-A). You have three unknowns: the time difference between the nodes (T_B - T_A), the delay from node A to B (d_AB) and the delay from node B to A (d_BA). You make two observations of the pseudo-range from node A to node B (t_AB) and from node B to node A (t_BA). These are made by the source announcing its time and the receiver time-stamping, in its own time, when it occurs. t_AB = T_B - T_A + d_AB t_BA = T_A - T_B + d_BA We thus have three unknowns and two equations. You can't solve that. For each link you add, you add one observation and one unknown. For each node and two links you add, you add three unknowns and two observations. You can't win this game. There are things you can do. Let's take our observations and add them, then we get RTT = t_AB + t_BA = (T_B - T_A) + d_AB + (T_A - T_B) + d_BA = d_AB + d_BA Now, that is useful. If we diff them we get ΔT = t_AB - t_BA = (T_B - T_A) + d_AB - (T_A - T_B) - d_BA = 2(T_B - T_A) + d_AB - d_BA TE = ΔT / 2 = T_B - T_A + (d_AB - d_BA)/2 So, diffing them gives the time difference, plus half the asymmetric delay. If we assume that the delay is symmetric, then we can use these measures to compute the time difference between the clocks, and if there is an asymmetry, it *will* show up as a bias in the slave clock. The way to get around this is to either avoid asymmetries like the plague, find a means to estimate them (PTPv2) or calibrate them away. Is that formal enough for you? It may be. 
This I did know, and it would seem to suffice, but I recall a triumphant comment from Dr. Mills in one of his documentation pieces. Which I cannot recall well enough to find. It may be the above analysis that was being referred to, or something else. I can't recall. The above I came up with myself some 10 years ago or so. Will see if I can find Dave's reference. I also took the next step, which is to treat d_AB and d_BA as random variables with differing means and variances (due to interference from asymmetrical background traffic), and trace this to the effect on clock sync. It isn't pretty on anything like a nanosecond scale. The required level of isolation between PTP traffic and background traffic is quite stringent. It's even worse when you get into packet networks, as the delays contain noise sources of variable mean and variable deviation, besides being asymmetrical. NTP combats some of that, but doesn't get deep enough due to too low a packet rate. PTP may do it, but it's not in the standard, so it will be proprietary algorithms. The PTP standard is a protocol framework. The ITU has spent time filling in more of the empty spots. Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 18/03/14 01:24, Joe Gwinn wrote: In article 532778bf.50...@rubidium.dyndns.org, Magnus Danielson mag...@rubidium.dyndns.org wrote: On 17/03/14 13:50, Joe Gwinn wrote: In article lg61s4$ong$3...@dont-email.me, William Unruh un...@invalid.ca wrote: On 2014-03-16, Joe Gwinn joegw...@comcast.net wrote: I keep seeing claims that Precision Time Protocol (IEEE 1588-2008) can achieve sub-microsecond to nanosecond-level synchronization over ethernet (with the right hardware to be sure). I've been reading IEEE 1588-2008, and they do talk of one nanosecond, but that's the standard, and an aspirational paper standard is not practical hardware running in a realistic system. 1 ns is silly. However, 10s of ns are possible. It is achieved by radio astronomy networks with special hardware (but usually post facto). IEEE 1588-2008 does say one nanosecond, in section 1.1 Scope. I interpret it as aspirational - one generally makes a hardware standard somewhat bigger and better than current practice, so the standard won't be too soon outgrown. IEEE standards time out in five years, unless revised or reaffirmed. I've seen some papers reporting tens to hundreds of nanoseconds average sync error, but for datasets that might have 100 points, and even then there are many outliers. I'm getting PTP questions on this from hopeful system designers. These systems already run NTP, and achieve millisecond-level sync errors. Uh, perhaps show them the achievement of microsecond-level sync errors? That is already a factor of 1000 better than they achieve. I forgot to mention a key point. We also have IRIG hardware, which does provide microsecond-level sync errors. The hope is to eliminate the IRIG hardware by using the ethernet network that we must have anyway. IRIG-B004 DCLS can provide really good performance if you let it. To get *good* PTP performance, comparable to your IRIG-B, prepare to do a lot of testing to find the right Ethernet switches, and then replace them all. 
Redoing the IRIG properly starts to look cheap and straightforward. I've used IRIG-B004 DCLS before, for cables two meters long within a cabinet. Worked well. How well do they handle 100 meter cables, in areas where the concept of ground can be elusive? The rising edge of the 100 Hz is your time reference; the falling edges are your information. Proper signal conditioning and cabling should not be a problem given proper drivers and receivers. IRIG-B004 DCLS also travels nicely over optical connections, where grounding issues will be less of a problem. It is known to work well in power substations, so there can be off-the-shelf products if you look for them. This is for proposed new systems, so there are no switches to replace. In response to questions from hopeful engineers, I had already made the point about the need for serious testing, with asymmetrical loads a factor larger than the real system will sustain. I'm not sure they are convinced of the need. Anyway, the hope is that PTP will be simpler and cheaper than having multiple IRIG systems, assuming that one starts from scratch. Maybe, depends on your needs. Consider doing a separate network for PTP. That approach has been used in systems where you want to make sure it works. One of the key problems is getting the packets onto the network (delays within the ethernet card); special hardware on the cards which timestamps the sending and receiving of packets on both ends could do better. But it also depends on the routers and switches between the two systems. Yes. My question is basically a query about the current state of the art [in PTP]. The state of the art is not yet a standard and not yet off-the-shelf products, if you want to call it PTP. This is my fear and instinct. But people read the adverts and will continue to ask. And some customers will demand. So, I'm digging deeper. Are there any good places to start? You asked here, it's not the worst place to start. 
:) Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 18/03/14 02:45, Paul wrote: On Mon, Mar 17, 2014 at 9:33 PM, Joseph Gwinn joegw...@comcast.net wrote: Will it do 100 meters or more, in bad neighborhoods? I'm not the right person to ask, but since it is expected to maintain between 2.5 and 100 nanosecond sync with CPE nodes (cable modems), I assume it requires RF techniques not readily available (or cost effective) outside a cable plant. The DOCSIS time interface is fun in that it uses two different frequencies to provide the transfer, so you get an interpolation function from relatively benign frequencies. That will make your crystal supplier happy. Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 18/03/14 09:59, Martin Burnicki wrote: Magnus Danielson wrote: On 17/03/14 09:48, Martin Burnicki wrote: You'd need hardware (FPGA?) which can be clocked at 1 GHz, and even in the hardware signal processing you'd need to account for a number of signal propagation delays which you can eventually ignore at lower clock rates. So of course the effort becomes much higher if you want more accuracy, but this is always the case, even if you compare NTP to the time protocol, or PTP to NTP. You don't need to count at 1 GHz; you can achieve the resolution with *much* lower frequencies. One pair of counters I have achieves 2.7 ps single-shot resolution using a 90 MHz clock. Interpolators do the trick. There are many ways to interpolate. Agreed. I just thought the way of using a higher counter clock is more obvious. All depends on how accurate and precise you can get your timestamps, and this is probably easier with network packet timestampers at both ends of a cable than with a wireless time transfer method like GPS, which usually suffers from delays which can't easily be measured, like ionospheric delays. And yes, I know that this can be improved if you receive 2 GPS frequencies instead of only the L1. ;-) Indeed. If you read the right article from 1990 you also know you can do it on L1 C/A only by monitoring both code and carrier phase, as their ionospheric effects have opposite signs. Carrier phase, by the way, is a good illustration that the frequency you have (1.57542 GHz) is not the limiting factor, as you can make observations to within 1/100 of the cycle, or about 2 mm. The precision and accuracy need *tons* of processing to get in that neighborhood, especially since the tropospheric delay is hard to estimate and compensate for in a single receiver. Anyway, resolution and counter frequency are and remain two different things. Precision measurements can be made using two lower frequency signals. 
Achieving the necessary resolution then turns into the troublesome issue of precision, which requires calibrations and systematic studies. I've seen some papers reporting tens to hundreds of nanoseconds average sync error, but for datasets that might have 100 points, and even then there are many outliers. I'm getting PTP questions on this from hopeful system designers. These systems already run NTP, and achieve millisecond-level sync errors. Uh, perhaps show them the achievement of microsecond-level sync errors? That is already a factor of 1000 better than they achieve. One of the key problems is getting the packets onto the network (delays within the ethernet card); special hardware on the cards which timestamps the sending and receiving of packets on both ends could do better. But it also depends on the routers and switches between the two systems. Of course all involved network nodes would need to be able to timestamp at this high resolution, otherwise the overall accuracy would be degraded. And it would probably be easier to achieve this accuracy with an embedded device with dedicated hardware than with a standard PC and a NIC connected via the PCI bus. There is a whole myriad of issues you end up with when you try to get down that low. Yep. If there were a 1 GHz oscillator on the NIC used for timestamping, then you still have to provide a way to relate the timestamps from the NIC to your local system time. If the only way to do this is via the (PCI?) bus, then the accuracy could suffer from bus latency, arbitration, etc. No go. On dedicated hardware the same oscillator/high-resolution counter chain could be used for system timekeeping and to timestamp network packets, which makes things much easier. You end up with quite dedicated hardware if you want to go there, yes. Regardless of how you do it. 
Standard PC hardware hasn't been designed for timekeeping at the highest accuracy: neither the cheap crystal oscillator usually assembled on the mainboard, nor the missing hard link between hardware on a PCI card and the CPU, nor the TSCs often used for timekeeping, which may suffer from changes of the CPU clock frequency (with older CPU types) or changes of the front-side bus clock frequency (with newer CPU types). Indeed. In the old days, you could accurately count your clock cycles in your assembler code, with a single common clock for the full machine. No such luck anymore. Cheers, Magnus
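As a rough back-of-the-envelope illustration of why timestamp resolution is not bound by the counter clock (a sketch; the numbers echo the 90 MHz / 2.7 ps figures quoted above):

```python
# Coarse counter: 90 MHz clock, so one tick is about 11.11 ns.
clock_hz = 90e6
tick_s = 1.0 / clock_hz

# An interpolator subdivides the interval between the event and the
# next clock edge (e.g. with a time-to-digital converter or an analog
# ramp). To reach the quoted 2.7 ps single-shot resolution, one tick
# must be resolved into roughly 4100 parts:
target_res_s = 2.7e-12
subdivisions = tick_s / target_res_s

print(f"tick = {tick_s * 1e9:.2f} ns, interpolation factor = {subdivisions:.0f}")
```

The counter clock sets the coarse grid; the interpolator sets the resolution within a tick, which is why resolution and counter frequency remain two different things.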
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 18/03/14 10:17, Martin Burnicki wrote: Paul wrote: On Mon, Mar 17, 2014 at 8:07 PM, Joe Gwinn joegw...@comcast.net wrote: People are also lusting after sub-microsecond sync. Sure, but not optimally in comp.protocols.ntp/questions@lists.ntp.org. With some help NTP can be quite good, but the intent really isn't nanosecond accuracy. We have made some tests and found that NTP can yield the same accuracy as PTP if hardware timestamping of NTP packets is supported on all nodes, similar to PTP. In fact this isn't surprising, is it? No, it's not. NTP is perceived to be software timestamping, but nothing prohibits you from doing it in hardware. Similarly, you can implement PTP with software time-stamping (with poor performance). Doing HNTP (hardware-timestamped NTP) makes NTP match up against PTPv1 to some degree, but PTP then pulls out the explicit means to make PTP-aware transparent clocks correct for delays, cancelling some of the asymmetry. You could do NTP with PTP 2-step processing, but what we would call such a bastard would be an interesting thing, NPTP? Cheers, Magnus
Re: [ntp:questions] Asymmetric Delay and NTP
Joe, On 16/03/14 23:16, Joe Gwinn wrote: I recall seeing something from Dr. Mills saying that a formal proof had been found showing that no packet-exchange protocol (like NTP) could tell delay asymmetry from clock offset. Can anyone provide a reference to this proof? It's relatively simple. You have two nodes (A and B) and a link in each direction (A-B and B-A). You have three unknowns: the time difference between the nodes (T_B - T_A), the delay from node A to B (d_AB) and the delay from node B to A (d_BA). You make two observations of the pseudo-range from node A to node B (t_AB) and from node B to node A (t_BA). These are made by the source announcing its time and the receiver time-stamping, in its own time, when it occurs. t_AB = T_B - T_A + d_AB t_BA = T_A - T_B + d_BA We thus have three unknowns and two equations. You can't solve that. For each link you add, you add one observation and one unknown. For each node and two links you add, you add three unknowns and two observations. You can't win this game. There are things you can do. Let's take our observations and add them, then we get RTT = t_AB + t_BA = (T_B - T_A) + d_AB + (T_A - T_B) + d_BA = d_AB + d_BA Now, that is useful. If we diff them we get ΔT = t_AB - t_BA = (T_B - T_A) + d_AB - (T_A - T_B) - d_BA = 2(T_B - T_A) + d_AB - d_BA TE = ΔT / 2 = T_B - T_A + (d_AB - d_BA)/2 So, diffing them gives the time difference, plus half the asymmetric delay. If we assume that the delay is symmetric, then we can use these measures to compute the time difference between the clocks, and if there is an asymmetry, it *will* show up as a bias in the slave clock. The way to get around this is to either avoid asymmetries like the plague, find a means to estimate them (PTPv2) or calibrate them away. Is that formal enough for you? Cheers, Magnus
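The arithmetic above can be sketched directly (a minimal illustration with made-up numbers; T_A is taken as the reference):

```python
# Hypothetical true values (seconds): clock offset and one-way delays.
offset = 3e-6                  # T_B - T_A
d_ab, d_ba = 100e-6, 140e-6    # asymmetric link delays

# The two observable pseudo-ranges from one exchange:
t_ab = offset + d_ab           # t_AB = (T_B - T_A) + d_AB
t_ba = -offset + d_ba          # t_BA = (T_A - T_B) + d_BA

# Sum: the round-trip time, with the offset cancelled.
rtt = t_ab + t_ba              # = d_AB + d_BA = 240 us

# Half-difference: the offset estimate, biased by half the asymmetry.
te = (t_ab - t_ba) / 2         # = offset + (d_AB - d_BA)/2
bias = te - offset             # = (100 - 140)/2 us = -20 us
```

Nothing in the two observables separates the -20 us bias from a real clock offset, which is exactly the three-unknowns/two-equations point.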
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 17/03/14 09:48, Martin Burnicki wrote: William Unruh wrote: On 2014-03-16, Joe Gwinn joegw...@comcast.net wrote: I keep seeing claims that Precision Time Protocol (IEEE 1588-2008) can achieve sub-microsecond to nanosecond-level synchronization over ethernet (with the right hardware to be sure). I've been reading IEEE 1588-2008, and they do talk of one nanosecond, but that's the standard, and an aspirational paper standard is not practical hardware running in a realistic system. 1 ns is silly. However, 10s of ns are possible. It is achieved by radio astronomy networks with special hardware (but usually post facto). Why should 1 ns be silly? If you have a counter chain clocked by 20 MHz then the timestamps captured when PTP packets are going out or coming in have a resolution of 50 ns. If your hardware can be clocked at 1 GHz then the resolution could be increased to 1 ns. Of course I know high resolution is not the only thing you need for high accuracy, but it's a precondition. You'd need hardware (FPGA?) which can be clocked at 1 GHz, and even in the hardware signal processing you'd need to account for a number of signal propagation delays which you can eventually ignore at lower clock rates. So of course the effort becomes much higher if you want more accuracy, but this is always the case, even if you compare NTP to the time protocol, or PTP to NTP. You don't need to count at 1 GHz; you can achieve the resolution with *much* lower frequencies. One pair of counters I have achieves 2.7 ps single-shot resolution using a 90 MHz clock. Interpolators do the trick. There are many ways to interpolate. Achieving the necessary resolution then turns into the troublesome issue of precision, which requires calibrations and systematic studies. I've seen some papers reporting tens to hundreds of nanoseconds average sync error, but for datasets that might have 100 points, and even then there are many outliers. I'm getting PTP questions on this from hopeful system designers. 
These systems already run NTP, and achieve millisecond-level sync errors. Uh, perhaps show them the achievement of microsecond-level sync errors? That is already a factor of 1000 better than they achieve. One of the key problems is getting the packets onto the network (delays within the ethernet card); special hardware on the cards which timestamps the sending and receiving of packets on both ends could do better. But it also depends on the routers and switches between the two systems. Of course all involved network nodes would need to be able to timestamp at this high resolution, otherwise the overall accuracy would be degraded. And it would probably be easier to achieve this accuracy with an embedded device with dedicated hardware than with a standard PC and a NIC connected via the PCI bus. There is a whole myriad of issues you end up with when you try to get down that low. If there were a 1 GHz oscillator on the NIC used for timestamping, then you still have to provide a way to relate the timestamps from the NIC to your local system time. If the only way to do this is via the (PCI?) bus, then the accuracy could suffer from bus latency, arbitration, etc. No go. On dedicated hardware the same oscillator/high-resolution counter chain could be used for system timekeeping and to timestamp network packets, which makes things much easier. You end up with quite dedicated hardware if you want to go there, yes. Regardless of how you do it. Cheers, Magnus
Re: [ntp:questions] IEEE 1588 (PTP) at the nanosecond level?
On 17/03/14 13:50, Joe Gwinn wrote: In article lg61s4$ong$3...@dont-email.me, William Unruh un...@invalid.ca wrote: On 2014-03-16, Joe Gwinn joegw...@comcast.net wrote: I keep seeing claims that Precision Time Protocol (IEEE 1588-2008) can achieve sub-microsecond to nanosecond-level synchronization over ethernet (with the right hardware to be sure). I've been reading IEEE 1588-2008, and they do talk of one nanosecond, but that's the standard, and an aspirational paper standard is not practical hardware running in a realistic system. 1 ns is silly. However, 10s of ns are possible. It is achieved by radio astronomy networks with special hardware (but usually post facto). IEEE 1588-2008 does say one nanosecond, in section 1.1 Scope. I interpret it as aspirational - one generally makes a hardware standard somewhat bigger and better than current practice, so the standard won't be too soon outgrown. IEEE standards time out in five years, unless revised or reaffirmed. I've seen some papers reporting tens to hundreds of nanoseconds average sync error, but for datasets that might have 100 points, and even then there are many outliers. I'm getting PTP questions on this from hopeful system designers. These systems already run NTP, and achieve millisecond-level sync errors. Uh, perhaps show them the achievement of microsecond-level sync errors? That is already a factor of 1000 better than they achieve. I forgot to mention a key point. We also have IRIG hardware, which does provide microsecond-level sync errors. The hope is to eliminate the IRIG hardware by using the ethernet network that we must have anyway. IRIG-B004 DCLS can provide really good performance if you let it. To get *good* PTP performance, comparable to your IRIG-B, prepare to do a lot of testing to find the right Ethernet switches, and then replace them all. Redoing the IRIG properly starts to look cheap and straightforward. 
One of the key problems is getting the packets onto the network (delays within the ethernet card); special hardware on the cards which timestamps the sending and receiving of packets on both ends could do better. But it also depends on the routers and switches between the two systems. Yes. My question is basically a query about the current state of the art. The state of the art is not yet in the standard and not yet in off-the-shelf products, if you want to call it PTP. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Meinberg Configuration Help
On 02/03/14 19:31, William Unruh wrote: On 2014-03-02, Brian Inglis brian.ing...@systematicsw.ab.ca wrote: On 2014-03-01 15:43, boostinbad...@gmail.com wrote: My NTP server is part of the pool project and appears to be running fine. Comcast contacted me about a month ago to let me know that my NTP server was infected with a bot. I checked and everything seems to be ok. I re-enabled my server about a week ago and I received another phone call last week concerning security on my network. I contacted Ask and he said that it was not a bot but an issue with my server allowing management requests. I asked Ask how to properly configure my Meinberg client to not allow management requests because I understand that they can be problematic. I know the config for ntpd but I am not sure of the proper syntax for Meinberg. Can someone provide me with that info? Banner on http://support.ntp.org links to http://support.ntp.org/bin/view/Main/SecurityNotice#DRDoS_Amplification_Attack_using and recommends restrict default noquery [and possibly other no... options] or you could use restrict default ignore; also add disable monitor. And why those are not the default I will never know. They should never have been on by default -- the problem was obvious 15 years ago, if nothing else in giving an attacker knowledge about your system. Things which go out to the broad internet should be off by default, and be switched on by the user who needs them. Just as ntpd does not have a list of servers it uses by default, but I guess people running ntp servers got burned by that one 20 years ago. There is a complete new generation of sys-admins since then. Well known among those skilled in the art does not mean active knowledge amongst users. This might be a lesson to remember. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
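For reference, an ntp.conf fragment along the lines of the SecurityNotice recommendation quoted above. This is a sketch to adapt, not Meinberg-specific syntax; the restrict flags shown are standard ntpd options:

```
# Refuse mode 6/7 management/monitoring queries from the world while
# still serving time; allow full access from localhost only.
disable monitor
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
```

The Meinberg Windows installer ships a regular ntp.conf, so the same directives apply there.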
Re: [ntp:questions] NTP not syncing
On 12/07/2013 11:39 PM, Harlan Stenn wrote: Magnus Danielson writes: The drift-file-accelerated lock-in isn't robust. The current response behavior isn't very useful for most people experiencing it. I'm not sure I'd agree with the word most. It's certainly worked very well on hundreds of machines where I've run it, and the feedback I've had from people when I've told them about iburst and drift files has been positive, except when they've had Linux kernels that calculate a different clock frequency on a reboot. Experiencing the problem, that is. When it works, it's a lovely tool. Sorry if the wording was unclear in that aspect. There are at least 2 other issues here. One goes to robust, and yes, we can do better with that. It's not yet clear to me that in the wider perspective this effort will be worthwhile. Well, you can either choose a rather simple back-out method or, if you think it is worthwhile, a more elaborate method. Getting cyclic re-sets of time is a little too coarse a method. I think it is better to back out and one way or another recover phase and frequency. The other goes to the amount of time it takes to adequately determine the offset and drift. With a good driftfile and iburst, ntpd will sync to a handful of PPM in about 11 seconds' time. We've been working on a project to produce sufficiently accurate offset and drift measurements at startup time, and the main problem here is that it can take minutes to figure this out well, and there is a significant need to get the time in the right ballpark at startup in less than a minute. These goals are mutually incompatible. The intent is to find a way to get there as well as possible, as quickly as possible. Getting the time in the right ball-park is by itself not all that hard. However, frequency takes time to learn and getting phase errors down quickly becomes an issue.
NTP has, as far as I have seen, reduced the loop bandwidth and at the same time reduced the capture range, and whenever you reduce the capture range you need to have heuristics to make sure you back out if things get upset. Recovery of old state is good, but one needs to make sure that you don't lose that robustness. As for the method of locking in quickly, that can be debated at length. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
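The phase/frequency recovery being discussed can be illustrated with a toy simulation. This is not ntpd's actual loop and the gains are invented for illustration; it just shows the two things a servo has to do: learn the frequency (the drift-file value) and slew out the phase error:

```python
# Toy PI clock servo (illustrative gains, one update per dt seconds).
# The oscillator really runs true_drift_ppm fast; the servo starts
# from whatever the drift file claimed.
def simulate(stored_drift_ppm, true_drift_ppm, steps=2000, dt=1.0):
    offset = 0.0                       # phase error, seconds
    freq = stored_drift_ppm * 1e-6     # correction the servo currently applies
    true_freq = true_drift_ppm * 1e-6  # the oscillator's real frequency error
    kp, ki = 0.1, 0.005                # illustrative PI gains
    for _ in range(steps):
        offset += (true_freq - freq) * dt  # uncorrected frequency error accumulates
        freq += ki * offset                # integral term learns the frequency
        offset -= kp * offset              # proportional term slews the phase
    return offset, freq * 1e6

# Worst case for the drift-file shortcut: stored value is 40 ppm off.
final_offset, learned_ppm = simulate(stored_drift_ppm=0.0, true_drift_ppm=40.0)
```

With a representative drift file (stored value near the true one) the offset transient is far smaller from the very first step, which is exactly the acceleration the drift file buys; with a stale value the servo must slowly relearn, which is the lack of robustness complained about above.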
Re: [ntp:questions] NTP not syncing
On 12/06/2013 10:53 AM, Harlan Stenn wrote: mike cook writes: If you know the drift file is unreliable, you should delete it. ntpd will then perform a frequency calibration before entering the main loop. ... This is what has been recommended for ages but it doesn't completely fix the issue. It still takes a long time to settle. Here are the results of a test I did using the same system and ntp config as in my previous reply with the unrepresentative drift file data. An unrepresentative drift file is not a deleted drift file. I filed a bug to address this. If the drift file is obviously nuts, ignore it for speed-up and just work as if it were not there, that is, do normal frequency lock-in. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
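The ignore-if-nuts idea can be sketched as follows. This is a sketch of the suggested behavior, not ntpd's actual code; the 500 ppm bound comes from ntpd's documented frequency clamp, so any stored value beyond it cannot be valid:

```python
import tempfile

def load_drift_ppm(path, limit_ppm=500.0):
    """Return the stored frequency in ppm, or None to force normal lock-in."""
    try:
        with open(path) as f:
            value = float(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None          # missing or unparsable: behave as if absent
    if abs(value) > limit_ppm:
        return None          # obviously nuts: ignore and relearn
    return value

# Demo: a plausible value is accepted, a wild one is rejected.
with tempfile.NamedTemporaryFile("w", suffix=".drift", delete=False) as f:
    f.write("12.345\n")
good = load_drift_ppm(f.name)
with tempfile.NamedTemporaryFile("w", suffix=".drift", delete=False) as f:
    f.write("70000.0\n")
bad = load_drift_ppm(f.name)
```

A value inside the clamp can still be unrepresentative, of course; detecting that case needs the run-time bail-out discussed elsewhere in the thread.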
Re: [ntp:questions] NTP not syncing
On 12/07/2013 01:17 AM, unruh wrote: On 2013-12-06, Magnus Danielson mag...@rubidium.dyndns.org wrote: On 12/06/2013 10:53 AM, Harlan Stenn wrote: mike cook writes: If you know the drift file is unreliable, you should delete it. ntpd will then perform a frequency calibration before entering the main loop. ... This is what has been recommended for ages but it doesn't completely fix the issue. It still takes a long time to settle. Here are the results of a test I did using the same system and ntp config as in my previous reply with the unrepresentative drift file data. An unrepresentative drift file is not a deleted drift file. I filed a bug to address this. If the drift file is obviously nuts, ignore it for speed-up and just work as if it were not there, that is, do normal frequency lock-in. How does it know that the drift file is obviously nuts? If it knew that, it could fix it. It does not know that. ntpd ONLY knows the current offset. Now on bootup, if there is no drift file, then it tries to remember the past few offsets and use those to estimate a drift, but if there is a drift file, it trusts the value in that drift file. If you are always going to do a drift estimate for the first few polls anyway, why have a drift file at all? Well, we can discuss which is the best way to detect it, but when you fail to lock and are forced to re-set the time, then you surely know you didn't end up where you expected to be. The drift-file-accelerated lock-in isn't robust. The current response behavior isn't very useful for most people experiencing it. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] NTP not syncing
Antonio, On 11/04/2013 06:40 PM, Antonio Marcheselli wrote: If the oscillator drifts from the last drift-file write to outside of +/- 15 ppm, if I recall correctly, it fails to lock in again. It would be good if it could bail out and do normal frequency acquisition if that occurs. That particular feature has bitten hard, and was a side-consequence of other faults, but none-the-less. Thanks Magnus, I saw the bug report you filed. Would it be wiser to delete the drift file at boot - by script - and let ntpd resync and recreate a new drift file? As mentioned, I don't really need my system to be synced down to the millisecond; if ntpd takes a few hours to settle and the time is off by up to a few seconds during that time, it's perfectly fine with me. If you don't want the bootstrap feature of the drift-file, not specifying it in the configuration is much wiser than deleting the file at boot. It's good to have this acceleration, if we can make it fool-proof. This is an attempt to get in that direction. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] NTP not syncing
On 11/03/2013 12:43 AM, David Woolley wrote: On 02/11/13 21:48, David Lord wrote: Ntpd writes to its drift file and also ntp.log. The drift file is critical and is used and updated at intervals by ntpd. The drift file is an optimisation. ntpd should work without it, but will take longer to acquire lock after a restart. What would cause more problems would be a drift file that was present, but read-only, as ntpd would skip its frequency calibration and trust the frozen value in that file, then suffer wild swings as it begins to discover the value was wildly wrong. If the oscillator drifts from the last drift-file write to outside of +/- 15 ppm, if I recall correctly, it fails to lock in again. It would be good if it could bail out and do normal frequency acquisition if that occurs. That particular feature has bitten hard, and was a side-consequence of other faults, but none-the-less. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
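The proposed bail-out could look something like this. A sketch of the suggestion only, not existing ntpd code; the 15 ppm figure is the approximate capture range recalled above, and the frequency estimate is deliberately crude:

```python
# If the frequency implied by a few recent offset samples differs from
# the stored drift value by more than the capture range, fall back to
# normal (cold-start) frequency acquisition instead of failing to lock.
CAPTURE_RANGE_PPM = 15.0   # approximate, per the discussion above

def should_bail_out(stored_ppm, offsets, interval_s):
    """offsets: consecutive offset samples (seconds), interval_s apart."""
    span = (len(offsets) - 1) * interval_s
    measured_ppm = (offsets[-1] - offsets[0]) / span * 1e6
    return abs(measured_ppm - stored_ppm) > CAPTURE_RANGE_PPM

# Oscillator actually running 60 ppm fast, polled every 64 s:
samples = [i * 64 * 60e-6 for i in range(5)]   # offset grows 60 us per second
stale = should_bail_out(10.0, samples, 64)     # drift file says 10 ppm: bail out
fresh = should_bail_out(55.0, samples, 64)     # drift file says 55 ppm: usable
```

The point is only that the check is cheap: a handful of polls already contain enough information to notice the stored value is hopeless.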
Re: [ntp:questions] NTP not syncing
On 11/03/2013 06:26 PM, Magnus Danielson wrote: On 11/03/2013 12:43 AM, David Woolley wrote: On 02/11/13 21:48, David Lord wrote: Ntpd writes to its drift file and also ntp.log. The drift file is critical and is used and updated at intervals by ntpd. The drift file is an optimisation. ntpd should work without it, but will take longer to acquire lock after a restart. What would cause more problems would be a drift file that was present, but read-only, as ntpd would skip its frequency calibration and trust the frozen value in that file, then suffer wild swings as it begins to discover the value was wildly wrong. If the oscillator drifts from the last drift-file write to outside of +/- 15 ppm, if I recall correctly, it fails to lock in again. It would be good if it could bail out and do normal frequency acquisition if that occurs. That particular feature has bitten hard, and was a side-consequence of other faults, but none-the-less. By request from Harlan, I put this into a bug-report: http://bugs.ntp.org/show_bug.cgi?id=2500 Hope it was clear enough. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] NTPD silently not tracking
Hi, On 09/02/2013 12:39 AM, Harlan Stenn wrote: unruh writes: In this case ntpd wandered off by hours with no complaint. That is not proper behaviour for a professional piece of software. And I don't remember the config file. If the target platform has a config file with settings that change the performance so the described behavior is possible, that's not a bug in NTP. Now it could be that they have the local clock enabled, and for some reason ntpd chased that rather than all of the other server sources. Pointing out that they should never actually use the local clock as a source is certainly useful, since the clock is never wrong with respect to the local source. Again, if this is what happened and the config file directives make this the stated behavior, that's not a bug in the code. That's a configuration file problem. Giving reasonable warnings when the config will cause a system to be isolated is however expected. You can't expect all the users to deeply understand all the twists of the configuration. A particular problem with software like NTPD is that so many recommendations from so many different ages are floating around. Comprehending the full documentation can be daunting. So it's not as easy as saying it is a configuration problem rather than a software problem; sometimes the art of configuring the software IS the problem. But if the computer has 5 outside sources available and still chases after the local source, that is a bug that should be fixed. If you know some attempt was made to fix a bug like that in a more recent version than the one used by the user, then advising an upgrade is appropriate (as is telling him never to use local). Sure, and we will need to see the bug duplicated on the latest version of whatever branch has the problem, and with 4.2.8 nearly ready for release and with almost no chance of another 4.2.6 release, it makes sense for folks to focus on the latest -dev code.
If some volunteer feels like working on this for older code that's great, and if somebody wants active support for older code that is available too. The affected machine was a server for other things and was a client for NTP time. There was very limited time to fool around on that machine with the latest and greatest code or whatever comes recommended hours/days after we encountered the problem, and the misbehavior was so major that not reporting it when we saw it again would have been bad. We had to focus on getting our system into an operational state again, as that is our primary task. I have not had the time to set up another machine to replicate the problem with that or any other version of the code. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] NTPD silently not tracking
On 09/02/2013 02:33 PM, David Lord wrote: Harlan Stenn wrote: David Lord writes: Magnus Danielson wrote: server ntp1.kth.se iburst maxpoll 7 server ntp2.kth.se iburst maxpoll 7 server ntp3.kth.se iburst maxpoll 7 server ntp1.sp.se iburst maxpoll 7 server ntp2.sp.se iburst maxpoll 7 that seems too restrictive and possibly abusive if you do not yourself have control over those servers. iburst is not abusive. Perhaps you are thinking of burst? I was thinking about maxpoll 7 and the few stats that were given indicating the very poor reach for the configured servers. There is good network connectivity to all 5 servers. If you advise us not to use maxpoll 7, then we naturally will learn from it. I don't use it personally, but I didn't set this machine up. Would be nice to hear your explanation though. However, when doing the ntpdc peers command (in interactive mode), it had all 5 servers available, and was tracking one (as indicated with = and * at the beginning of the lines; I was told this over the phone, so I don't have visual memory of it all). So, I don't think bad connectivity was the cause. It looked to a non-NTP expert like it had peers, was happy with offsets (albeit it looked unexpectedly good at 0) but was just plain way off in time. It took multiple queries with ntpdc peers before it reacted to the time-offset, started to display big offsets and eventually cleaned up itself. ntpdate -q did expose the time error of 6 days. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
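Since reach came up: the reach column that ntpq/ntpdc print is an octal rendering of an 8-bit shift register covering the last eight polls, with the most recent poll in the low bit. A short sketch of how to read it:

```python
# Decode the octal "reach" value into the last eight poll results,
# oldest first (1 = a reply was received for that poll).
def decode_reach(reach_octal):
    bits = int(reach_octal, 8)
    return [(bits >> i) & 1 for i in range(7, -1, -1)]

perfect = decode_reach("377")  # all eight recent polls answered
spotty = decode_reach("201")   # only the oldest and newest answered
```

A reach stuck well below 377 is what "very poor reach" above refers to: the server is configured but its replies are frequently not making it back.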
Re: [ntp:questions] NTPD silently not tracking
On 09/02/2013 03:49 AM, unruh wrote: On 2013-09-01, Magnus Danielson mag...@rubidium.dyndns.org wrote: server ntp1.kth.se iburst maxpoll 7 server ntp2.kth.se iburst maxpoll 7 server ntp3.kth.se iburst maxpoll 7 server ntp1.sp.se iburst maxpoll 7 server ntp2.sp.se iburst maxpoll 7 # Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for # details. The web page http://support.ntp.org/bin/view/Support/AccessRestrictions I do hope that was really all on the same line, or there was a # at the start of that second line. Otherwise ntpd will be confused. No worries. That mishap came in the copy-and-paste between less /etc/ntp.conf in one window and my email client. This is the default Debian config file which has been changed to point out 5 servers, which I was referring to in my follow-up message: 8--- It has 2 stratum 1 and 3 stratum 2 unicast servers configured. NTP wise this machine is a client with 5 configured servers. The problem was that it was way off time with no apparent indication, which is wrong. Agreed. No one is arguing it is right. The question is why. You do not seem to be using the local refclock, so that is one explanation gone. Seemed strange to see those comments, as I had already said otherwise. None of those servers happens to be the machine itself, do they? Or progeny of that server? No. This is a server, but not of NTP. NTP-wise it is a client. The three stratum 2 servers are local, and the stratum 1 servers are national well-known servers. And looking at those log files around the time things go bad might be suggestive. Exactly which version of ntpd, and are you sure that someone has not made improvements to it? If you had read my initial message, you would have seen this: ii ntp 1:4.2.6.p5+d i386 Network Time Protocol daemon and which is the result of running dpkg -l ntp on that Debian system. We don't have time to improve things with local patches. We might be accused of misconfiguration.
I made a report here, in the hope that you could make more sense of the behavior than the normal Debian package maintainer. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] NTPD silently not tracking
On 09/01/2013 10:42 PM, unruh wrote: On 2013-09-01, Steve Kostecke koste...@ntp.org wrote: On 2013-09-01, Rob nom...@example.com wrote: The NTP Reference Implementation is free software. The copyright holder (The University of Delaware) makes no representations about the suitability of this software for any purpose. It is provided as is without express or implied warranty. Please visit http://www.ntp.org/copyright for the complete copyright notice and license statement. Yes, the usual legal ass protection. Fortunately the ntpd developers usually neither believe that nor act as though they believe it. They tend not to say Oh -- it does not work, tough shit. And you do them, and yourself, a disservice by saying that that is what they do. It is not what they or you do. In this case ntpd wandered off by hours with no complaint. That is not proper behaviour for a professional piece of software. Now it could be that they have the local clock enabled, and for some reason ntpd chased that rather than all of the other server sources. Pointing out that they should never actually use the local clock as a source is certainly useful, since the clock is never wrong with respect to the local source. But if the computer has 5 outside sources available and still chases after the local source, that is a bug that should be fixed. If you know some attempt was made to fix a bug like that in a more recent version than the one used by the user, then advising an upgrade is appropriate (as is telling him never to use local). As we are coming back to topic...

8---
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help

driftfile /var/lib/ntp/ntp.drift

# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

# You do need to talk to an NTP server or two (or three).
#server ntp.your-provider.example

# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: http://www.pool.ntp.org/join.html
server ntp1.kth.se iburst maxpoll 7
server ntp2.kth.se iburst maxpoll 7
server ntp3.kth.se iburst maxpoll 7
server ntp1.sp.se iburst maxpoll 7
server ntp2.sp.se iburst maxpoll 7

# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details. The web page http://support.ntp.org/bin/view/Support/AccessRestrictions
# might also be helpful.
#
# Note that restrict applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.

# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1

# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust

# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255

# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
---8

This is the default Debian config file which has been changed to point out 5 servers, which I was referring to in my follow-up message: 8--- It has 2 stratum 1 and 3 stratum 2 unicast servers configured. NTP wise this machine is a client with 5 configured servers. The problem was that it was way off time with no apparent indication, which is wrong. ---8 The debugger (another system admin) of this system did strace, and saw updates to the kernel. Nothing anywhere to indicate problems other than what I mentioned, that there was a zero offset. I'll try to see if I can re-create this behavior on another machine, as the machine we did see it on needs to be on time since it's a server for other things than time. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] NTPD silently not tracking
On 08/30/2013 04:17 AM, E-Mail Sent to this address will be added to the BlackLists wrote: Magnus Danielson wrote: We had another incident where a node configured with multiple NTP sources had an NTPD which, when asked with ntpdc to list peers, looked like things were all OK, but with offsets less than a second, while the node in fact was 6 days off the mark. Only after a number of ntpdc queries did some of the peers expose a gigantic offset. Everything looked OK, but time was off such that normal remote login did not work. The error was way too non-obvious and felt like a Heisenbug in that only when we looked more carefully at it, it started to see itself that it was out of touch with reality. ii ntp 1:4.2.6.p5+d i386 Network Time Protocol daemon and What ntpdc commands did you issue, and what results did you get? Did you also try ntpq commands, did you see differing results? ntpq -n -c rv 0 leap ntpq -n -c rv 0 stratum ntpq -n -c rv 0 refid ntpq -n -c rv 0 offset ntpq -n -c rv 0 rootdisp Unfortunately no. I got the call after the fact, but the lack of remote login due to the time error would prohibit me from doing anything anyway. The server needed to be operational rather than optimized for NTP debugging. Have you tried a newer version of NTP ? http://www.ntp.org/downloads.html http://www.eecis.udel.edu/~ntp/ntp_spool/ntp4/ntp-dev/ http://www.eecis.udel.edu/~ntp/ntp_spool/ntp4/ntp-dev/ntp-dev-4.2.7p385.tar.gz No, I listed the affected version as packaged by Debian. Don't use the Undisciplined Local Clock 127.127.1.0. Try Orphan instead if you need LAN NTP clients to stick together while LAN and/or Internet NTP servers become unavailable. ... keys /etc/ntp.keys # e.g. contains: 123 M LAN_MD5_KEY , 321 M Corp_MD5_KEY , ... trustedkey 123 321 tos cohort 1 orphan 10 restrict source nomodify manycastserver 224.0.1.1 manycastclient 224.0.1.1 key 123 preempt ... It has 2 stratum 1 and 3 stratum 2 unicast servers configured. NTP wise this machine is a client with 5 configured servers.
The problem was that it was way off time with no apparent indication, which is wrong. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
[ntp:questions] NTPD silently not tracking
Hi, We had another incident where a node configured with multiple NTP sources had an NTPD which, when asked with ntpdc to list peers, looked like things were all OK, but with offsets less than a second, while the node in fact was 6 days off the mark. Only after a number of ntpdc queries did some of the peers expose a gigantic offset. Everything looked OK, but time was off such that normal remote login did not work. The error was way too non-obvious and felt like a Heisenbug in that only when we looked more carefully at it, it started to see itself that it was out of touch with reality. We have now designed a script that warns of an error:

cat /etc/cron.hourly/timechecker
#!/bin/bash
awk 'BEGIN {printf "ntpdate -q "}; $1 == "server" {printf "%s ", $2}; END {print ""}' /etc/ntp.conf | bash | awk '$5 == "adjust" && ($10 > 1.0 || $10 < -1.0) {print "WARNING: timechecker says that time of host is off by " $10 " seconds"}'

However, this should be addressed in a much more direct manner by NTPD. Have you seen this before? Do you have a remedy? ii ntp 1:4.2.6.p5+d i386 Network Time Protocol daemon and Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/18/2013 07:51 AM, David Taylor wrote: On 17/08/2013 18:31, Magnus Danielson wrote: [] What might be useful is to store the corrected 1024 weeks offsets, since if the NTPD is restarted, those corrections can be applied up-front, and then these corrected values can be used to provide a good basis for majority decisions about correct time. When a particular receiver flips, then it is the only one (possibly a few of them changing at the same time) which shifts by 1024 weeks, and then it is easy to use the 1024-week assumption as a priori knowledge to correct them. When you wake up, the flipped receivers may form a majority, which would be unfortunate, as we already knew they had flipped, but we forgot it in the re-start process. Doing this, the system integrity can be maintained throughout. Cheers, Magnus It will be interesting to see what folks come up with for the patch. I must admit to feeling that a fault in GPS receivers should /not/ have to be fixed in NTP, but I accept that's likely to be the best solution. It should certainly be an option that has to be specifically enabled (command-line switch or fudge command), and one which has no impact otherwise on the reliability and maintainability of NTP. But this isn't an actual receiver bug in that context. The receivers can't always do the right thing. If a receiver has backup power, then it can remember the latest year across power-ups, and that's enough to guess right. Trouble is that most receivers don't have that installed, and most of those that have it don't use that knowledge to resolve it anyway. The problem is that it is a system bug (misfeature really), for which some receivers have more or less smart ways of dealing, and we need to handle the case when their ways of fixing it do not work anymore. Thus, we are really talking about patching on a patch, which is ugly, but if you want continuous operation that is what it takes.
If you want this feature to be disabled by default, you end up causing the disruption that the fix is there to avoid. Few will know that they need to fiddle with that bit, and it becomes a continuous support thing, rather than letting the default be that it fixes the problem and then letting the really cautious people turn it off. Default disabled is a bad idea. Yes, you change the default behaviour of NTP this way, but it's done because it has been analyzed and it's more likely to fix a problem than cause a problem for the majority of the users. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/18/2013 12:16 PM, Rob wrote: David Taylor david-tay...@blueyonder.co.uk.invalid wrote: On 18/08/2013 09:19, Magnus Danielson wrote: [] If you want this feature to be disabled by default, you end up with causing the disruption that the fix is there to avoid. Few will know that they need to fiddle with that bit, and it becomes a continuous support thing, rather than letting the default being that it fixes the problem and then let the really cautious people turn it off. Default disabled is a bad idea. Yes, you change the default behaviour of NTP this way, but it's done because it has been analyzed and it's more likely to fix a problem than cause a problem for the majority of the users. Cheers, Magnus I'm simply saying that I'm happy with NTP as it is now, and that if any /new/ feature is added, it should be optional and disabled by default. The new feature should /only/ apply to GPS sources. Perhaps the code should be restructured so that the network time protocol remains part of ntpd, and local reference clocks are moved out into processes that are more loosely coupled than drivers are now. A fix like this belongs in a driver for GPS, not in the main code that supports networking and synchronization of the local clock. Only the shared memory interface currently has functionality like this, and it has some limitations in the information it can convey. If this interface is improved, all the local clock drivers can be moved out into separate processes and everyone can tinker his driver to fix problems like this one. It will also be easier to release a fixed driver once a problem like this suddenly appears. This is relevant for any driver interfacing a GPS. It's the correction of time as it comes into NTPD. Cheers, Magnus ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/17/2013 06:02 PM, David Taylor wrote: On 17/08/2013 09:30, Terje Mathisen wrote: David Taylor wrote: [] Thanks for the pointers to the documents. A pity that they haven't been able to find two or three spare bits to reduce the 1024 week ambiguity to nearer a half-century or even 100 years. Oh, well! That would be even worse: Exceptional events like the 10-bit week rollover need to happen often enough that every programmer is forced to write code to handle them correctly, or they should not happen at all! I.e. a fault-tolerant server setup is only fault-tolerant if you are comfortable doing monthly fire drills where you pull the power cord (or network cable(s)) from either half. For GPS 19+ years was probably intended to be long enough that every given receiver would only live to see a single epoch, meaning that a simple test against the firmware generation time would suffice, right? Well, now we've seen a lot of timing receivers that just keep on working, and that 19+ year range turned out to be not quite long enough. If this had happened every year or so (i.e. a 64-week rollover), then every GPS would have had some method to enter the current epoch, and a way to remember it across reboots. Personally I think the (Trimble?) hack to use the TAI-UTC offset field as an epoch guess table index is pretty nice: As long as the offset keeps increasing this will suffice to handle at least one or two epoch rollovers. OTOH, the firmware timestamp method I outlined above will work perfectly as long as (a) somebody is still willing to generate new firmware versions and (b) you still have some machine with compatible hardware/software to allow you to load it onto the GPS. Combined remote antenna/GPS receivers with an RS422 or similar connection to an NTP server require that firmware update capability to be included in the NTP box. :-( Terje Thanks for your thoughts, Terje.
Using a 12-bit (or even 16-bit) field to send the current year would be a preferable solution - at least until they start messing with leap-seconds and change the whole time scale. But I take your point - once every 19 years it will be remembered a lot more easily than once every 76 years. Having a multiplicity of different GPS sources from different manufacturers may at least improve the chance of the problem being spotted. True. You can make better decisions by looking at more sources, so when a particular model flips, the other sources and the known likelihood of flipping make it reasonable to correct for the systematic effect and then continue. Most GPS receivers will continue to operate correctly after a flip, so as long as we correct for the 1024 week flip period, we can continue to operate. What might be useful is to store the corrected 1024-week offsets, since if the NTPD is restarted, those corrections can be applied up-front, and these corrected values can then be used to provide a good basis for majority decisions about correct time. When a particular receiver flips, it is the only one (possibly a few of them changing at the same time) which shifts by 1024 weeks, and then it is easy to use the 1024-week assumption as a priori knowledge to correct it. When you wake up, the flipped receivers may form a majority, which would be unfortunate, as we already know they have flipped, but we forgot it in the restart process. Doing this, the system integrity can be maintained throughout. Cheers, Magnus
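The per-receiver correction Magnus describes amounts to resolving a truncated 10-bit week number against an approximate notion of the current date. A minimal Python sketch of that resolution step (the function name and the idea of picking the candidate closest to the approximate date are illustrative assumptions, not NTP code):

```python
from datetime import datetime, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)  # start of GPS week 0
WEEKS_PER_ROLLOVER = 1024  # period of the 10-bit week counter

def resolve_gps_week(week_10bit: int, approx_now: datetime) -> int:
    """Map a truncated 10-bit week number onto the full GPS week count,
    assuming the true date lies near approx_now (well within +/-512 weeks)."""
    elapsed_weeks = (approx_now - GPS_EPOCH).days // 7
    # Candidate full weeks all share the same low 10 bits; pick the closest.
    base = elapsed_weeks - (elapsed_weeks % WEEKS_PER_ROLLOVER) + week_10bit
    candidates = (base - WEEKS_PER_ROLLOVER, base, base + WEEKS_PER_ROLLOVER)
    return min(candidates, key=lambda w: abs(w - elapsed_weeks))
```

For example, a receiver reporting week 729 in August 2013 resolves to full week 729 + 1024 = 1753, which matches the rollover week mentioned in this thread.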
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/16/2013 05:44 AM, David Taylor wrote: On 15/08/2013 21:33, Magnus Danielson wrote: [] They completely avoid it by not numbering it that way. They have their own numbering scheme that fits the system, and the conversion over to UTC is an added feature. It's all in ICD-GPS-200 for the current set of details, and in the ION red book series for the early stages. GPS and GPS problems are best understood if you realize that everything is counted in the GPS clock machinery with its own set of gears. Conversion isn't that hard and it is done every second in the GPS receiver. Cheers, Magnus Thanks, Magnus. I've not heard of ICD-GPS-200 or the ION red book before. Perhaps one day I will look them up. If you go here: http://www.gps.gov/technical/icwg/ you will find IS-GPS-200G (which is the new name since 2006, I have failed to adapt) on this link here: http://www.gps.gov/technical/icwg/IS-GPS-200G.pdf Using ICD-GPS-200D gives a fair idea of what the older GPS receivers were designed to meet. In these documents, the gears of GPS are explained such that you should be able to implement a correctly working receiver (in principle). There are a handful of technical details outside of this spec you need to figure out too, but there are good books for that. GPS continues to impress me - I counted and on holiday recently we took (at least) 7 GPS receivers - his and hers smart-phones, 2 iPads, Garmin GPS 60 CSx, Ventus 750, and one built into my Sony HX200V camera! The Garmin spent much of its time with a puck antenna stuck on the cabin porthole plotting our course. They have gone small now, but you still have L1 C/A only receivers. Many of them probably do not use carrier phase in any way. Cheers, Magnus
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/15/2013 11:02 PM, unruh wrote: On 2013-08-15, Magnus Danielson mag...@rubidium.dyndns.org wrote: On 08/15/2013 10:22 AM, David Taylor wrote: On 15/08/2013 08:34, Rob wrote: David Taylor david-tay...@blueyonder.co.uk.invalid wrote: On 14/08/2013 17:44, Rob wrote: [] How does a good receiver know the correct time? Does it rely on local (backed-up) storage, or is there some way of receiving it via the almanac? Or are good receivers hardwired as well, only with a different valid span? I would not be surprised when good receivers turn out to have just a different moment or mode of failure. [] Some receivers have battery backup, in fact all but one of the receiver types I use have this. Ok but what happens when the battery is replaced? [] Hope and pray? Wish for a large capacitor or flash-rom? I had thought that either ephemeris or almanac data might contain the real UTC time, but apparently it does not. Obviously a system designed too far in advance of the Year2000 fuss and bother! They completely avoid it by not numbering it that way. They have their own numbering scheme that fits the system, and the conversion over to UTC is an added feature. It's all in ICD-GPS-200 for the current set of details, and in the ION red book series for the early stages. GPS and GPS problems are best understood if you realize that everything is counted in the GPS clock machinery with its own set of gears. Conversion isn't that hard and it is done every second in the GPS receiver. That is fine, but I think that the question is what are those internal gears and do those internal gears have a rollover time? I.e., for how long a time period is there a unique mapping from the internals of GPS to the time (UTC or whatever)? Obviously the oscillations of the H atoms in the H maser clocks have a rollover of picoseconds. Somewhere in those satellites is some counter with a lot longer period before it rolls over. 
As I just answered to David Taylor, it's all described in this document: http://www.gps.gov/technical/icwg/IS-GPS-200G.pdf You might enjoy reading the earlier revisions as things have been modified over time, and to understand older receivers you need to look at the older spec, available here: http://www.gps.gov/technical/icwg/ The 1024-week period I have been speaking of comes from interpreting this document. Cheers, Magnus
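The "gears" in question are, at the top level, just a week counter and a time-of-week counter, plus a broadcast GPS-UTC leap-second offset. A hedged Python sketch of the conversion every receiver performs each second, assuming the full week number has already been disambiguated (the function name and signature are illustrative, not from any spec):

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)  # start of GPS week 0

def gps_to_utc(week: int, tow_seconds: float, gps_minus_utc: int) -> datetime:
    """Convert a full GPS week number and time-of-week to UTC.
    gps_minus_utc is the broadcast leap-second offset (16 s in 2013)."""
    gps_time = GPS_EPOCH + timedelta(weeks=week, seconds=tow_seconds)
    # GPS time does not insert leap seconds, so it runs ahead of UTC.
    return gps_time - timedelta(seconds=gps_minus_utc)
```

With week 1753 (the week that started on Sunday 2013-08-11) and the then-current 16 s offset, the start of week falls 16 seconds before midnight UTC, illustrating why the leap-second field matters even at whole-week granularity.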
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/16/2013 10:36 AM, David Taylor wrote: Yes, all my receivers are very simple, consumer-level ones. Sometimes I see as low as 2m location accuracy on the GPS 60 CSx, more likely 3m when walking. Thanks for the pointers to the documents. A pity that they haven't been able to find two or three spare bits to reduce the 1024 week ambiguity to nearer a half-century or even 100 years. Oh, well! If you look at the new signals (L1C, L2C, L5), they have a 13-bit Week Number (WN) compared to the old 10-bit number. Adding the bits to the traditional signal structure would be possible, but would not help if you have not upgraded to include them. The 8192-week cycle would also loop eventually, and you would still be a multiple of 1024 weeks off in that case. However, that's almost 157 years, with the shift coming in 2136/2137. We know that even that is too soon for software folks to fix their code. Cheers, Magnus
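The arithmetic behind those figures is easy to check. A small Python sketch comparing the first wrap of the 10-bit and 13-bit week counters, measured from the GPS epoch of 1980-01-06:

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)  # start of GPS week 0

# First rollover of each week counter, measured from the GPS epoch.
rollover_10 = GPS_EPOCH + timedelta(weeks=2**10)  # 10-bit counter wraps
rollover_13 = GPS_EPOCH + timedelta(weeks=2**13)  # 13-bit counter wraps

print(rollover_10)  # 1999-08-22, the first 1024-week wrap
print(rollover_13)  # 2137-01-06, almost 157 years after the epoch
```

8192 weeks is 57344 days, just over 157 Gregorian years, which is where the 2136/2137 figure comes from.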
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/16/2013 03:34 PM, David Taylor wrote: On 16/08/2013 13:02, John Hasler wrote: David Taylor writes: A pity that they haven't been able to find two or three spare bits to reduce the 1024 week ambiguity to nearer a half-century or even 100 years. From the Wikipedia article: To determine the current Gregorian date, a GPS receiver must be provided with the approximate date (to within 3,584 days) to correctly translate the GPS date signal. To address this concern the modernized GPS navigation message uses a 13-bit field that only repeats every 8,192 weeks (157 years), thus lasting until the year 2137 (157 years after GPS week zero). Oh, that /is/ good news, John! Many thanks. I couldn't see that from a quick scan of the referenced documents, so that's most helpful to know. I wonder whether there is any way to determine which satellites are sending this modernised message, perhaps they all do, or whether a particular receiver is using the full 13-bit field? It's something I've not seen listed in various specifications I've read, but perhaps it's taken for granted after a certain date? None will do that on the L1 C/A signal. It occurs on the new signals such as L2C and L1C, which are code-wise separate from the L1 C/A signal. None of the traditional receivers will benefit from this shift. So far I have only seen advanced receivers that receive those signals. Hopefully things will change. As I said, even if you add the bits in the signal, their mere presence does not help: if you haven't upgraded to firmware that includes their interpretation, the GPS receiver will not be able to use them and the problem remains. Cheers, Magnus
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/15/2013 07:55 AM, David Taylor wrote: On 14/08/2013 22:07, Harlan Stenn wrote: David Malone writes: Indeed - you need to have a timestamp within about ten years of correct before you start up, otherwise the problem will be worse. Ntp has the same problem in figuring out the ntp epoch, though we've yet to see an ntp timestamp wrap around. ntp-dev has a fix for this problem - while the original solution was to make sure the clock is correct to within ~65 years, the new code uses a compile-date value, and needs the system time to be either no more than 10 years before that date or up to 128 years after that date. See http://bugs.ntp.org/show_bug.cgi?id=1995 for more information (thanks, Juergen!). H If you make that 9.5 years rather than 10 it might then cover the 500-week period mentioned by Magnus. I do not mention a 500-week period. I mention a 1024-week period with various phases, 500, 512 and obviously 729 (wrapped this Sunday as we went into week 1753). Judging by some reports here, people may be using NTP more than 10 years old. Does this fix cause a problem in that case? Not really. This problem is common mode to recent and 10-year-old NTPs. Cheers, Magnus
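The ntp-dev approach can be sketched in a few lines: treat the 32-bit seconds field as ambiguous modulo one NTP era (about 136 years) and pin it down with the build date. This is an illustrative Python reconstruction of the idea in bug 1995, not the actual ntpd code:

```python
NTP_ERA_SECONDS = 2**32  # one NTP era, roughly 136 years of seconds

def full_seconds(low32: int, build_seconds: int) -> int:
    """Pick the unique full second count whose low 32 bits equal low32 and
    which falls in the window [build - 10 years, build - 10 years + one era)."""
    floor = build_seconds - 10 * 365 * 86400  # ~10 years before the build
    # Smallest value >= floor congruent to low32 modulo one era.
    return floor + ((low32 - floor) % NTP_ERA_SECONDS)
```

Note that with a 10-year back-window the forward window is about 126 years past the build date; the post quotes 128 years, so treat the constants here as illustrative rather than the exact values in the fix.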
Re: [ntp:questions] Start of new GPS 1024 week epoch
On 08/15/2013 10:22 AM, David Taylor wrote: On 15/08/2013 08:34, Rob wrote: David Taylor david-tay...@blueyonder.co.uk.invalid wrote: On 14/08/2013 17:44, Rob wrote: [] How does a good receiver know the correct time? Does it rely on local (backed-up) storage, or is there some way of receiving it via the almanac? Or are good receivers hardwired as well, only with a different valid span? I would not be surprised when good receivers turn out to have just a different moment or mode of failure. [] Some receivers have battery backup, in fact all but one of the receiver types I use have this. Ok but what happens when the battery is replaced? [] Hope and pray? Wish for a large capacitor or flash-rom? I had thought that either ephemeris or almanac data might contain the real UTC time, but apparently it does not. Obviously a system designed too far in advance of the Year2000 fuss and bother! They completely avoid it by not numbering it that way. They have their own numbering scheme that fits the system, and the conversion over to UTC is an added feature. It's all in ICD-GPS-200 for the current set of details, and in the ION red book series for the early stages. GPS and GPS problems are best understood if you realize that everything is counted in the GPS clock machinery with its own set of gears. Conversion isn't that hard and it is done every second in the GPS receiver. Cheers, Magnus
Re: [ntp:questions] Start of new GPS 1024 week epoch
Hi, On 08/14/2013 03:54 PM, unruh wrote: On 2013-08-14, Mark C. Stephens ma...@non-stop.com.au wrote: Um Let's see, Datum was bought by Austron, who was bought by ... etc. For collectors such as myself, having this 'mature' equipment still working is great. Looking at Mr Malone's code, he added 2 lines which enabled NTPD compatibility with GPS receivers that would long ago have been sent to the tip as waste. It is however fragile code. I.e., all kinds of situations could arise in which it would give the wrong time. Now, you may say that there are situations in which it will give the right time when, without the kludge, it would give the wrong time. This addresses a known feature of the GPS system, common over a large range of receivers. The differences between them lie in which GPS week they flip over (GPS week 500, 512 and 729 from the top of my head). The failure they have is not in their operation, but in their production of a human-readable date. This is what I have proposed elsewhere (on time-nuts) and it is a sound solution considering the situation we have, where ICD-GPS-200 through its many revisions has not provided additional bits for the L1 C/A code signal. For the L2C signal (and I assume also L1C, but I haven't checked yet) additional bits exist, but very few receivers have that support. I recommend reading the time-nuts backlog on this issue. Among the alternatives you have, it's ditching an otherwise perfectly operating GPS receiver or using the fact that the 1024-week wrap-around is bound to happen, is predictable as a systematic effect from how the GPS C/A data is structured, and recurs over the fleet of GPS receivers. Do note that the GPS receivers do compute leap-second info correctly regardless of this 1024-week offset hiccup, as that information is structured modulo 1024 weeks. Cheers, Magnus
Re: [ntp:questions] Start of new GPS 1024 week epoch
Hi again, On 08/11/2013 03:36 PM, Magnus Danielson wrote: Hi David, On 08/11/2013 08:44 AM, David Taylor wrote: Today is the start of a new GPS 1024 week epoch - see: http://adn.agi.com/GNSSWeb/ Folks with really old GPS units are reporting problems, those of us with current millennium GPS receivers should be OK, though. I would word it differently: it's the epoch of a particular line of GPS receivers, but not of GPS itself. Remember that on any Sunday, it is likely that some GPS receiver has slipped a multiple of 1024 weeks. NTP drivers should be able to recognize it and compensate for it, as it is a recurring bug in many receivers. This issue has been discussed over and over again at time-nuts. I forgot to mention that we have already seen HP/Agilent receivers of the Z3805A and Z3815A generation affected, and that Furuno (maker of the GPS module in them) has issued a statement relating to this wrap-around. See the time-nuts list for details. Any week is a potential trigger for older receivers, but this one seems like a real one. Cheers, Magnus
Re: [ntp:questions] NIST vs. pool.ntp.org ?
On 03/27/2013 10:45 PM, David Woolley wrote: Robert Scott wrote: I am confused about the proper usage of pool.ntp.org and NIST. pool.ntp.org seems to be a collection of private sector time servers offered for all to use, but with registration expected for regular The pool system has no provision for enforcing registration. It wouldn't make sense to hand out a random server address if most of them then refused to serve you because you hadn't registered. users. And NIST has a government run set of time servers. Neither group (NIST or pool.ntp.org) seems to include or reference the other. I would hope all the pool servers ultimately reference their national equivalent of NIST and therefore what becomes, after the fact, UTC. I think you will find that Navstar (GPS) and WWV times are traceable to NIST. Yes and no. GPS is traceable to USNO. USNO and NIST have traceability between each other within the BIPM framework. MSF times are traceable to NPL. NPL is traceable to both USNO and NIST within the BIPM framework. Are they in competition? Who normally uses the NIST servers and who uses pool.ntp.org? The open NIST servers are heavily overloaded, so probably don't serve the highest quality time, but they are likely to be around for a long time. I would set up a local server under your control. It will help from both a debugging and a noise perspective. Cheers, Magnus
Re: [ntp:questions] NIST vs. pool.ntp.org ?
Hi again Robert, On 03/28/2013 04:22 AM, Robert Scott wrote: On Thu, 28 Mar 2013 02:50:17 GMT, unruhun...@invalid.ca wrote: You really should read my posts before responding. No, I do not intend to hard-code NIST or any other server. I never said I wanted to. No, the app is not intended for all musicians. It is intended for professional piano tuners only. I sell about one per day. And I never said the pool would not be good enough for my needs. I only asked about the relative benefits of the pool vs. NIST, which was answered very nicely. There is no real benefit in using either; rather, you should use a mix of servers which gives you good confidence in removing false-tickers as well as good precision due to short network distances. Look at the NTP code and book, as many of the filtering steps aim at removing noise which pollutes the time and frequency errors. Do the two-way time-transfer. Cheers, Magnus
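That advice translates into an ntp.conf along these lines - a hypothetical sketch, not a recommended production config, using pool server names merely as placeholders for "several nearby, independent sources":

```
# /etc/ntp.conf - illustrative fragment only
# Four or more sources let ntpd's clock-selection algorithm
# outvote a single false-ticker.
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
server 3.pool.ntp.org iburst
driftfile /var/lib/ntp/ntp.drift
```

With only one or two sources, ntpd cannot tell which server is lying; three is the minimum for a majority decision, and four tolerates one unreachable server.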
Re: [ntp:questions] New 60 KHz WWVB Time Format
Hi Tom, On 01/16/2013 03:36 PM, Thomas Laus wrote: I have not seen this information posted to this newsgroup. The US NIST radio station WWVB will be changing its transmission format. The information can be found at: http://www.nist.gov/pml/div688/grp40/wwvb.cfm The old format is still being sent twice a day until the end of January 2013, but the station will only transmit the new phase-modulated time code after this month. It is supposed to be compatible with the existing 'Atomic' clocks, but I have some of the original ones that were made in China that are no longer syncing. The new WWVB format has been covered in several lengthy threads on the time-nuts email list during the last half-year or so. Look in the archives. Some of the high-precision time and frequency receivers will require modifications to handle the new format. Cheap receivers will keep working. Cheers, Magnus