From: joel jaeggli <[email protected]>
To: Nalini Elkins <[email protected]>; "MORTON JR., ALFRED C (AL)" <[email protected]>; IPv6 Ops WG <[email protected]>; "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
Sent: Friday, August 30, 2013 11:45 AM
Subject: Re: draft-elkins-v6ops-ipv6-pdm-recommended-usage-00
On 8/29/13 6:36 PM, Nalini Elkins wrote:
> > On 8/29/13 5:47 AM, Nalini Elkins wrote:
> > Please take a look at:
> >
> > http://www.ripe.net/data-tools/stats/ttm/test-traffic-measurement-service
> >
> > Note that what they provide is:
> >
> > * NTP-server, the test-boxes can act as a stratum 1 server for the machines on your network, providing time-stamps with an accuracy of 10 microseconds.
>
>With respect to the IETF 87 presentation/discussion, I think this is a discussion of the quality of time and its relationship to the general utility of the PDM optional header, not the availability of specific point solutions.
>
> Let me back up and see if I can restate your basic concern in simple words so that I am sure that I understand you.
>
> I believe that you are concerned that the timestamps provided by NTP will not be accurate enough if the timestamps are taken from more than one administrative domain.

The clock in the computer is providing the timestamp; NTP is synchronizing the clock in the computer to another clock.

> Our PDM proposal depends on fairly reliable timestamps.

It depends on the accuracy of the clocks generating the timestamps with respect to each other.

> So, if the timestamps are not reliable, then the PDM is not very useful.

>If a diagnostic result is dependent on low-millisecond or microsecond-level timestamp accuracy and you don't have that, then yes. A related question is: how do you know when you do or don't? If you have long time-series data for two devices, you may have evidence of the magnitude of the error. If packets arrive from the future with respect to your clock, that's a nice and gross indicator. Looking at a packet trace after the fact, the only information available to reconstruct the state of the clocks is the timestamps themselves, barring external sources of that information.

The basic issue here goes far beyond our PDM header. Accurate time synchronization is necessary for much data processing today.
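The point about reconstructing clock state from timestamps can be sketched in a few lines of Python. This is just an illustration with hypothetical numbers (the function name and values are mine, not from the draft): a one-way delay computed from timestamps taken on two different clocks silently absorbs whatever offset exists between those clocks.

```python
# Sketch, hypothetical numbers: one-way delay measured across two
# unsynchronized clocks includes the clock offset as hidden error.

def measured_one_way_delay(send_ts: float, recv_ts: float) -> float:
    """Delay as it appears in a packet trace: receiver clock minus sender clock."""
    return recv_ts - send_ts

true_delay = 0.100        # actual network transit time: 100 ms
clock_offset = -0.050     # receiver clock runs 50 ms behind the sender

send_ts = 10.000                                # sender's clock
recv_ts = send_ts + true_delay + clock_offset   # receiver's clock

skewed = measured_one_way_delay(send_ts, recv_ts)   # ~0.050 s, not the true 0.100 s

# With a large enough offset, the packet appears to arrive "from the
# future" (a negative delay) -- the gross indicator mentioned above.
from_future = measured_one_way_delay(send_ts, send_ts + true_delay - 0.150)
assert from_future < 0
```

Nothing in the trace alone distinguishes the 50 ms of clock offset from 50 ms of real network delay, which is exactly the concern being discussed.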
Let me quote from an IBM Redbook: "In the information technology world, time synchronization has become a critical component for managing the correct order of the events in distributed applications (transaction processing, message logging), especially for audit and legal purposes." http://www.redbooks.ibm.com/redbooks/pdfs/sg247280.pdf (section 1.1)

If you are arguing that time synchronization cannot be done to the millisecond or microsecond level, then a great deal of data processing would flat out not work. Let me point you to something called 'Parallel Sysplex', which is a way of putting together very large computer systems: http://www-03.ibm.com/systems/z/advantages/pso/bsvsps.html These kinds of extremely expensive systems are in most large data centers today. They require synchronization to the microsecond level, and they do it using NTP.

One of the members of my team has implemented NTP with a consortium of 120 separate organizations (all DIFFERENT administrative domains), and they have synchronized to within 20-50 milliseconds. If anyone would like, they can contact us offline and we can share the implementation documentation. If you would like, I can get from them the sources for timing that they use. Multiple MASTER clocks, geographically dispersed, may be needed for accuracy and/or backup-contingency purposes. By the way, this was done several years ago.

As far as our PDM header is concerned, what we are looking for MOST is to do triage. That is to say: which of the three (inbound network time, server processing time, and outbound network time) is consistently quite large? What happens in real life is that problems happen with response time to a major region or bank branch, and we need to quickly (hopefully VERY quickly) get the right team working on solving the problem. So, what is required is that the differences in those times be consistently greater than any error in time synchronization.
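For readers unfamiliar with how NTP achieves this kind of cross-domain synchronization, the core of it is the standard four-timestamp exchange (RFC 5905). A minimal sketch with made-up timestamps (the function name and values are mine):

```python
# Standard NTP offset/delay calculation (per RFC 5905), with hypothetical
# timestamps. t1 and t4 are read from the client's clock; t2 and t3 from
# the server's clock.

def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # estimated client clock error
    delay = (t4 - t1) - (t3 - t2)            # round-trip network delay
    return offset, delay

# Client sends at t1; server receives at t2 and replies at t3; client
# receives at t4. Here the server's clock is ~100 ms ahead, with ~10 ms
# transit each way and 2 ms of server processing.
t1, t2, t3, t4 = 100.000, 100.110, 100.112, 100.022

offset, delay = ntp_offset_delay(t1, t2, t3, t4)
# offset ~ 0.100 s, delay ~ 0.020 s
```

The client then steers its clock by the estimated offset; with symmetric network paths, repeated exchanges and filtering bring the residual error down to the levels cited above.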
To give an example, assume:

Time synchronization varies by 50 milliseconds.
Inbound network time = 100 milliseconds
Server time = 500 milliseconds
Outbound network time = 200 milliseconds

The point of this game is to say which is the failing component. If you add or subtract 50 milliseconds from the above numbers, it won't matter: it will still be the server time.

If you think that we will have problems when all the times are close together (ex. all times = 40 milliseconds), then yes. And it doesn't matter. What we are looking for in doing diagnostics and triage are, basically, the outliers (or those at the 95th percentile, to be a bit more precise). I am looking for which response times are among the worst; that is where the problems are likely to be. In those transactions, in my experience, I have NEVER found transactions where all the numbers are close together. Definitely, for trending and capacity planning purposes, we want to look at all response times.

But I believe that time synchronization to the level that is required can be achieved today. I have cited numerous examples of organizations doing so. If our PDM proposal is accepted, it will still be 5-6 years before it is in use, because operating systems need to be changed to support it. I expect that time synchronization will become even better over time. Notice how much more prevalent GPS is than even a few years ago.
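The triage argument above can be checked mechanically. A small sketch (function name mine, numbers taken from the example): even when the two network legs, which are the clock-dependent measurements, are perturbed by the full 50 ms synchronization error in the worst direction, the verdict does not change.

```python
# Toy triage over the three PDM components from the example above.
# Numbers are in seconds: 100 ms inbound, 500 ms server, 200 ms outbound.

def slowest_component(inbound: float, server: float, outbound: float) -> str:
    """Name the dominant (worst) component of the response time."""
    times = {"inbound": inbound, "server": server, "outbound": outbound}
    return max(times, key=times.get)

# Unperturbed measurements: the server clearly dominates.
assert slowest_component(0.100, 0.500, 0.200) == "server"

# Worst case for the argument: inflate both network legs by the assumed
# 50 ms clock-sync error (server time is measured on one host, so it is
# unaffected by cross-host skew). The verdict is still "server".
assert slowest_component(0.100 + 0.050, 0.500, 0.200 + 0.050) == "server"
```

The triage only breaks down when the gap between components shrinks to the same order as the synchronization error, which matches the "all times close together" caveat above.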
