From: joel jaeggli <[email protected]>
To: Nalini Elkins <[email protected]>; "MORTON JR., ALFRED C (AL)" <[email protected]>; IPv6 Ops WG <[email protected]>; "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
Sent: Friday, August 30, 2013 11:45 AM
Subject: Re: draft-elkins-v6ops-ipv6-pdm-recommended-usage-00
On 8/29/13 6:36 PM, Nalini Elkins wrote:
> > On 8/29/13 5:47 AM, Nalini Elkins wrote:
> > Please take a look at:
> >
> > http://www.ripe.net/data-tools/stats/ttm/test-traffic-measurement-service
> >
> > Note that what they provide is:
> >
> > * NTP-server, the test-boxes can act as a stratum 1 server for the machines on your network, providing time-stamps with an accuracy of 10 microseconds.
>
>With respect to the IETF 87 presentation/discussion, I think this is a discussion of the quality of time and its relationship to the general utility of the PDM optional header, not the availability of specific point solutions.
>
> Let me back up and see if I can restate your basic concern in simple words so that I am sure that I understand you.
>
> I believe that you are concerned that the timestamps provided by NTP will not be accurate enough if the timestamps are taken from more than one administrative domain.

The clock in the computer is providing the timestamp; NTP is synchronizing the clock in the computer to another clock.

> Our PDM proposal depends on fairly reliable timestamps.

It depends on the accuracy of the clocks generating the timestamps with respect to each other.

> So, if the timestamps are not reliable, then the PDM is not very useful.

>If a diagnostic result is dependent on low-millisecond or microsecond-level timestamp accuracy and you don't have that, then yes. A related question is: how do you know when you do or don't? If you have long time-series data for two devices, you may have evidence of the magnitude of the error. If packets arrive from the future with respect to your clock, that's a nice and gross indicator. Looking at a packet trace after the fact, the only information available to reconstruct the state of the clocks is the timestamps themselves, barring external sources of that information.

The basic issue here goes far beyond our PDM header. Accurate time synchronization is necessary for much data processing today.
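The point about reconstructing clock state from timestamps can be sketched in a few lines of Python. This is just an illustration with hypothetical numbers (the function name and values are mine, not from the draft): a one-way delay computed from timestamps taken on two different clocks silently absorbs whatever offset exists between those clocks.

```python
# Sketch, hypothetical numbers: one-way delay measured across two
# unsynchronized clocks includes the clock offset as hidden error.

def measured_one_way_delay(send_ts: float, recv_ts: float) -> float:
    """Delay as it appears in a packet trace: receiver clock minus sender clock."""
    return recv_ts - send_ts

true_delay = 0.100        # actual network transit time: 100 ms
clock_offset = -0.050     # receiver clock runs 50 ms behind the sender

send_ts = 10.000                                # sender's clock
recv_ts = send_ts + true_delay + clock_offset   # receiver's clock

skewed = measured_one_way_delay(send_ts, recv_ts)   # ~0.050 s, not the true 0.100 s

# With a large enough offset, the packet appears to arrive "from the
# future" (a negative delay) -- the gross indicator mentioned above.
from_future = measured_one_way_delay(send_ts, send_ts + true_delay - 0.150)
assert from_future < 0
```

Nothing in the trace alone distinguishes the 50 ms of clock offset from 50 ms of real network delay, which is exactly the concern being discussed.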
Let me quote from an IBM Redbook: "In the information technology world, time synchronization has become a critical component for managing the correct order of the events in distributed applications (transaction processing, message logging), especially for audit and legal purposes." http://www.redbooks.ibm.com/redbooks/pdfs/sg247280.pdf (section 1.1)

If you are arguing that time synchronization cannot be done to the millisecond or microsecond level, then a great deal of data processing would flat out not work. Let me point you to something called 'Parallel Sysplex', which is a way of putting together very large computer systems: http://www-03.ibm.com/systems/z/advantages/pso/bsvsps.html These kinds of extremely expensive systems are in most large data centers today. They require synchronization to the microsecond level, and they do it using NTP.

One of the members of my team has implemented NTP with a consortium of 120 separate organizations (all DIFFERENT administrative domains), and they have synchronized to within 20-50 milliseconds. If anyone would like, they can contact us offline and we can share the implementation documentation. If you would like, I can get from them the sources for timing that they use. Multiple MASTER clocks, geographically dispersed, may be needed for accuracy and/or backup-contingency purposes. By the way, this was done several years ago.

As far as our PDM header is concerned, what we are looking for MOST is to do triage. That is to say: which of the three (inbound network time, server processing time, and outbound network time) is consistently quite large? What happens in real life is that problems happen with response time to a major region or bank branch, and we need to quickly (hopefully VERY quickly) get the right team working on solving the problem. So, what is required is that the differences in those times be consistently greater than any error in time synchronization.
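For readers unfamiliar with how NTP achieves this kind of cross-domain synchronization, the core of it is the standard four-timestamp exchange (RFC 5905). A minimal sketch with made-up timestamps (the function name and values are mine):

```python
# Standard NTP offset/delay calculation (per RFC 5905), with hypothetical
# timestamps. t1 and t4 are read from the client's clock; t2 and t3 from
# the server's clock.

def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # estimated client clock error
    delay = (t4 - t1) - (t3 - t2)            # round-trip network delay
    return offset, delay

# Client sends at t1; server receives at t2 and replies at t3; client
# receives at t4. Here the server's clock is ~100 ms ahead, with ~10 ms
# transit each way and 2 ms of server processing.
t1, t2, t3, t4 = 100.000, 100.110, 100.112, 100.022

offset, delay = ntp_offset_delay(t1, t2, t3, t4)
# offset ~ 0.100 s, delay ~ 0.020 s
```

The client then steers its clock by the estimated offset; with symmetric network paths, repeated exchanges and filtering bring the residual error down to the levels cited above.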
To give an example, assume:

Time synchronization varies by 50 milliseconds.
Inbound network time = 100 milliseconds
Server time = 500 milliseconds
Outbound network time = 200 milliseconds

The point of this game is to say which is the failing component. If you add or subtract 50 milliseconds from the above numbers, it won't matter: it will still be the server time.

If you think that we will have problems when all the times are close together (ex. all times = 40 milliseconds), then yes. And it doesn't matter. What we are looking for in doing diagnostics and triage are, basically, the outliers (or those at the 95th percentile, to be a bit more precise). I am looking for which response times are among the worst; that is where the problems are likely to be. In those transactions, in my experience, I have NEVER found transactions where all the numbers are close together. Definitely, for trending and capacity planning purposes, we want to look at all response times.

But I believe that time synchronization to the level that is required can be achieved today. I have cited numerous examples of organizations doing so. If our PDM proposal is accepted, it will still be 5-6 years before it is in use, because operating systems need to be changed to support it. I expect that time synchronization will become even better over time. Notice how much more prevalent GPS is than even a few years ago.
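The triage argument above can be checked mechanically. A small sketch (function name mine, numbers taken from the example): even when the two network legs, which are the clock-dependent measurements, are perturbed by the full 50 ms synchronization error in the worst direction, the verdict does not change.

```python
# Toy triage over the three PDM components from the example above.
# Numbers are in seconds: 100 ms inbound, 500 ms server, 200 ms outbound.

def slowest_component(inbound: float, server: float, outbound: float) -> str:
    """Name the dominant (worst) component of the response time."""
    times = {"inbound": inbound, "server": server, "outbound": outbound}
    return max(times, key=times.get)

# Unperturbed measurements: the server clearly dominates.
assert slowest_component(0.100, 0.500, 0.200) == "server"

# Worst case for the argument: inflate both network legs by the assumed
# 50 ms clock-sync error (server time is measured on one host, so it is
# unaffected by cross-host skew). The verdict is still "server".
assert slowest_component(0.100 + 0.050, 0.500, 0.200 + 0.050) == "server"
```

The triage only breaks down when the gap between components shrinks to the same order as the synchronization error, which matches the "all times close together" caveat above.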
