Re: packet loss question

2016-07-10 Thread Mark Andrews

In message <25577fe1-6366-4d6d-b82e-a779193cb...@beckman.org>, Mel Beckman writes:
> Philip,
>
> Quite often slow Web page loading and email transport -- termed an
> application-layer problem because basic transport seems unaffected -- is
> due to DNS problems, particularly reverse DNS for the IP addresses
> originating your Web queries. If you have non-existent or intermittent
> IN-ADDR entries for those IPs, the remote Web servers can be timing out
> if they have older configurations that, for example, do DNS lookups in
> order to log HTTP requests and block on completion, resulting in
> timeouts. Use "nslookup x.x.x.x" command line queries (nslookup is on
> Windows, Mac and UNIX/Linux) to see if you can resolve the public IP
> addresses your users originate queries from. You can find those addresses
> by visiting http://whatismyip.com from a problem desktop.
>
> A second common cause of app-specific throughput problems, particularly
> where email is involved, is failed MTU discovery. The standard Internet
> MTU is 1500 bytes, but sometimes a router misconfiguration or change in
> encapsulation type along the path through your ISP lowers that to, say,
> 1492 or 1486 bytes (MTU is in increments of 8). The result is that
> whenever your web or email client sends a maximum MTU packet, the packet
> is dropped, resulting in connection impairment. Most HTTP and Email
> packets are not max-MTU in size, so you get very uneven performance
> simulating network congestion.

The Internet Standard minimum MTUs are 68 octets for IPv4 (RFC 791) and
1280 octets for IPv6 (RFC 2460).

Every size greater than those is subject to negotiation.  Now most
paths pass packets greater than those values.  Ethernet is very
common and passes 1500.

Encapsulated / translated traffic is also very common and has MTUs
< 1500 and affects BOTH IPv4 and IPv6 data streams and will become
more so as we move from dual stack to IPv6 only where IPv4 is a
service running on top of IPv6.
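
As a rough sketch of the arithmetic behind those smaller MTUs (an
illustration, not something from the original message): each
encapsulation layer takes its header out of the link MTU, and the TCP
payload per packet shrinks with it. The overheads below are assumptions
chosen for the example, namely 8 octets for PPPoE and one 40-octet IPv6
header for IPv4 carried over IPv6. The function and constant names are
made up, and IP and TCP options are ignored.

```c
/* Header sizes in octets: the minimums, with no IP or TCP options. */
enum {
    IPV4_HEADER    = 20,
    IPV6_HEADER    = 40,
    TCP_HEADER     = 20,
    PPPOE_OVERHEAD = 8     /* assumed: PPPoE header plus PPP protocol field */
};

/* MTU left for the inner packet once an encapsulation layer has taken
   `overhead` octets out of the outer link MTU. */
int inner_mtu(int link_mtu, int overhead)
{
    return link_mtu - overhead;
}

/* Largest TCP payload that fits in one unfragmented packet of size `mtu`
   when the IP header is `ip_header` octets long. */
int tcp_mss(int mtu, int ip_header)
{
    return mtu - ip_header - TCP_HEADER;
}
```

With a 1500-octet Ethernet link, inner_mtu(1500, PPPOE_OVERHEAD) gives
1492 and tcp_mss(1492, IPV4_HEADER) gives 1452. IPv4 tunnelled over IPv6
on the same link gets inner_mtu(1500, IPV6_HEADER) = 1460.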

> You can force the MTU to a lower number at your border to test this. You
> typically do this at your firewall; it's a setting on the WAN interface
> config. Temporarily lower that value dramatically to something like 1440
> and see if your problem goes away. If it does, you may need to
> permanently reduce MTU, so you should try other divisible-by-8 values --
> 1492, 1486, 1478, etc -- until you find the largest one that works. I
> commonly see this when a customer switches ISPs from DSL to Cable. Cable
> providers are fond of stealing 8 or 16 bytes for their CMT headers in a
> way that breaks MTU discovery.
>
> A third frequent application-layer throughput debilitator is IPv6
> misconfiguration. If you support IPv6 for your end users, they may be
> getting directed to IPv6 web or mail servers (which are generally
> preferred via DNS) but thwarted by IPv6 transport issues, which could be
> as simple as routing or MTU, or as complex as an invisible 6-over-4 NAT
> somewhere (such as at your upstream ISP). These problems generally require
> an IPv6-competent network engineer to resolve, but you can test by
> disabling IPv6 on your network (which also requires an IPv6-competent
> network engineer :)
>
> I'm always amazed at how often these three causes are at the root of
> performance problems. So it's worth investigating each.
>
>  -mel beckman
>
> > On Jul 8, 2016, at 6:02 AM, Phillip Lynn wrote:
> >
> >> On 07/07/2016 03:52 PM, Ken Chase wrote:
> >> No offence, but i swear that mtr should come with a license to use it. I get more
> >> questions from people accusing us of network issues with mtr in hand...
> >>
> >> You shouldn't care that there's 80% packet loss in the middle of your route, unless
> >> you have actual traffic to lag-101.ear3.miami2.level3.net. I suspect you dont.
> >> (If you did, you'd have mtr'd to it directly of course.)
> >>
> >> As for your second trace, the sudden jump from 0% on 2nd last hop to 100% last
> >> hop packetloss seems like firewalling to me. (long discussion about the
> >> probabilities of getting 5 0%pl hops in a row and 100% on an unfirewalled
> >> endpoint elided. TL;DR: use more packets in your test -i 0.1 -c 100 thanks).
> >>
> >> If you have 0% packetloss to your target endpoint, is there an issue here?
> >> What caused you to mtr?  0% pl is pretty good. You could play quake 1.0
> >> through that pl and ping time. The +20ms ATL<>CHI jump in the route you'd have
> >> to take up with einstein/bill nye/$deity.
> >>
> >> For the 2nd trace, the 1st hop is your latency issue (plus the big jump from
> >> miami<>ashburn, again the limit is c.)
> >>
> >> ICMP is allowed to be dropped by intervening routers. Someone will quote an RFC
> >> at us shortly.
> >>
> >> Mtr without a return route is not that useful in figuring out packetloss
> >> because pl requires the packet make it there and back. Pl could be anywhere on
> >> the return route, which is probably not symmetrical. The internet stopped
> >> being sym

Re: packet loss question

2016-07-10 Thread Mel Beckman
James,

You may be thinking of this presentation: 

https://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf
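
A common rule of thumb when reading mtr reports, sketched here as an
illustration rather than taken from that presentation: loss at a middle
hop only matters if it carries through to the hops after it, because a
router that merely rate-limits its own ICMP replies still forwards
traffic normally. One way to express that is to take, for each hop, the
smallest loss seen at that hop or any later one. The function name is
made up, and the rule is a heuristic, not a proof.

```c
#include <stddef.h>

/* For each hop i, write to out[i] the smallest loss percentage observed
   at hop i or at any later hop. Loss that really happens on the forward
   path should show up at every hop beyond it, so a spike that vanishes
   further along is more likely ICMP rate limiting at that one router. */
void effective_loss(const double *loss, double *out, size_t n)
{
    double floor_loss = 100.0;

    /* Walk from the last hop back to the first, tracking the minimum. */
    for (size_t i = n; i-- > 0; ) {
        if (loss[i] < floor_loss)
            floor_loss = loss[i];
        out[i] = floor_loss;
    }
}
```

Fed the Loss% column of the first trace quoted below, the 80% at hop 6
drops to 0 because every later hop shows 0%. In the second trace only
the 100% at the final hop survives, which fits the firewalling guess
but does not confirm it.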

 -mel beckman

> On Jul 10, 2016, at 4:49 PM, James Greig  wrote:
> 
> There was a useful nanog presentation somewhere that explained this really 
> well in particular reading traceroutes correctly
> 
> Kind regards
> 
> James Greig 
> 
>> On 7 Jul 2016, at 20:17, Phillip Lynn  wrote:
>> 
>> Hi all,
>> 
>> I am writing because I do not understand what is happening.  I ran mtr 
>> against our email server and www.teco.com and below are the results.  I am 
>> not a network engineer so I am at a loss.  I think what I am seeing is maybe 
>> a hand off issue, between Frontier and Level3Miami2. If I am correct then 
>> what can I do?
>> 
>> My system is running Centos 6.5 Linux.
>> 
>> Thanks,
>> 
>> Phillip
>> 
>> 
>> 
>> (! 1011)-> sudo mtr -r netwolves.securence.com
>> HOST: x@netwolves.com             Loss%   Snt   Last    Avg   Best   Wrst  StDev
>>  1. 172.24.109.1                   0.0%    10    0.6    0.6    0.6    0.7    0.0
>>  2. lo0-100.TAMPFL-VFTTP-322.gni   0.0%    10    3.2    2.0    1.0    4.3    1.2
>>  3. 172.99.44.214                  0.0%    10    4.0    4.9    2.3    6.9    1.5
>>  4. ae8---0.scr02.mias.fl.fronti   0.0%    10    9.3    9.1    7.5    9.8    1.0
>>  5. ae1---0.cbr01.mias.fl.fronti   0.0%    10    8.9    9.1    7.6    9.7    0.7
>>  6. lag-101.ear3.Miami2.Level3.n  80.0%    10    9.0    8.9    8.8    9.0    0.1
>>  7. 10ge9-14.core1.mia1.he.net     0.0%    10   14.3   13.0    7.6   18.1    4.3
>>  8. 10ge1-1.core1.atl1.he.net      0.0%    10   25.6   33.2   22.4   99.7   23.6
>>  9. 10ge10-4.core1.chi1.he.net     0.0%    10   45.6   51.8   45.5   82.7   12.5
>> 10. 100ge14-2.core1.msp1.he.net    0.0%    10   53.6   63.9   53.6  125.2   21.8
>> 11. t4-2-usi-cr02-mpls-usinterne   0.0%    10   53.2   73.1   53.2  225.6   54.0
>> 12. v102.usi-cr04-mtka.usinterne   0.0%    10   53.2   53.9   53.2   55.3    0.6
>> 13. netwolves.securence.com        0.0%    10   53.4   53.9   53.4   55.4    0.7
>> 
>> (! 1014)-> sudo mtr -r www.teco.com
>> HOST: x@netwolves.com             Loss%   Snt   Last    Avg   Best   Wrst  StDev
>>  1. 172.24.109.1                   0.0%    10    0.6    0.6    0.6    0.7    0.0
>>  2. lo0-100.TAMPFL-VFTTP-322.gni   0.0%    10  104.8   81.4    1.1  113.2   43.2
>>  3. 172.99.47.198                  0.0%    10  115.0   77.8    2.9  115.0   40.2
>>  4. ae7---0.scr01.mias.fl.fronti   0.0%    10  111.1   80.2    8.5  113.5   41.3
>>  5. ae0---0.cbr01.mias.fl.fronti   0.0%    10  105.9   82.2    7.6  115.4   33.8
>>  6. lag-101.ear3.Miami2.Level3.n  70.0%    10  116.1   80.2    8.5  116.1   62.0
>>  7. NTT-level3-80G.Miami.Level3.   0.0%    10  110.0   81.5    9.0  120.3   41.9
>>  8. ae-3.r20.miamfl02.us.bb.gin.   0.0%    10  119.8   84.0   10.0  119.8   38.5
>>  9. ae-4.r23.asbnva02.us.bb.gin.  10.0%    10  137.4  107.6   30.1  142.7   45.7
>> 10. ae-2.r05.asbnva02.us.bb.gin.   0.0%    10  135.0  109.9   36.6  140.0   39.1
>> 11. xe-0-9-0-8.r05.asbnva02.us.c   0.0%    10  147.5  125.6   49.4  165.5   41.1
>> 12. 24.52.112.21                   0.0%    10  158.6  124.0   49.6  161.3   41.5
>> 13. 24.52.112.42                   0.0%    10  151.0  127.7   52.2  159.0   41.2
>> 14. ???                           100.0    10    0.0    0.0    0.0    0.0    0.0
>> 
>> -- 
>> Phillip Lynn
>> Software Engineer III
>> NetWolves
>> Phone:813-579-3214
>> Fax:813-882-0209
>> Email: phillip.l...@netwolves.com
>> www.netwolves.com
>> 
> 


Re: packet loss question

2016-07-10 Thread James Greig
There was a useful nanog presentation somewhere that explained this really well 
in particular reading traceroutes correctly

Kind regards

James Greig 




Re: Leap Second planned for 2016

2016-07-10 Thread Jimmy Hess
On Sun, Jul 10, 2016 at 3:27 AM, Saku Ytti  wrote:

[snip]
> a) use UTC or unix time, and accept that code is broken
[snip]

The Unix time format might be an unsuitable time representation for
applications which require clock precision or time precision within a
few seconds for the purposes of timestamping or synchronizing events
down to a per-second or subsecond resolution.

Suggest revising the Unix/POSIX time implementation to use a 3-tuple
representation of calendar time, instead of a single integer.

#include <stdint.h>

typedef int64_t time_t[3];

 [ Delta from Epoch in Seconds, Delta in Microseconds,
   Cumulative Leap Adjustment from the Epoch in Microseconds ]

Thus to compare two timestamps A and B

long long difference_in_seconds(time_t A, time_t B) {
    /* Both microsecond fields are converted to whole seconds. */
    return (B[0] - A[0]) + (B[1] - A[1] + B[2] - A[2]) / 1000000;
}
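
A worked sketch of that comparison across the leap second planned for
the end of 2016, under some assumptions that are not in the proposal
itself: the type is renamed leap_time_t so it does not clash with the
real time_t from <time.h>, the microsecond fields are divided by
1,000,000 to get seconds, and the cumulative leap adjustment is counted
as 26 leap seconds before the event and 27 after it.

```c
#include <stdint.h>

/* Hypothetical name for the proposed 3-tuple; the real time_t is left
   untouched. */
typedef int64_t leap_time_t[3];

enum {
    LT_SECONDS     = 0,  /* delta from the Epoch, in seconds */
    LT_MICROS      = 1,  /* sub-second delta, in microseconds */
    LT_LEAP_MICROS = 2   /* cumulative leap adjustment, in microseconds */
};

#define MICROS_PER_SECOND 1000000

/* Elapsed whole seconds from a to b, counting any leap seconds inserted
   between them. Integer division truncates toward zero. */
int64_t leap_diff_seconds(const int64_t a[3], const int64_t b[3])
{
    int64_t micros = (b[LT_MICROS] - a[LT_MICROS])
                   + (b[LT_LEAP_MICROS] - a[LT_LEAP_MICROS]);

    return (b[LT_SECONDS] - a[LT_SECONDS]) + micros / MICROS_PER_SECOND;
}
```

Taking 2016-12-31 23:59:59 UTC as { 1483228799, 0, 26000000 } and
2017-01-01 00:00:00 UTC as { 1483228800, 0, 27000000 }, the function
returns 2, where plain Unix time subtraction would give 1.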

-- 
-JH


Falsehoods programmers believe about time, etc (was Re: Leap Second planned for 2016)

2016-07-10 Thread Jay R. Ashworth
- Original Message -
> From: "Chris Adams" 

> Once upon a time, Patrick W. Gilmore  said:
>> But time _DOES_ flow. The seconds count
>>  58, 59, 60, 00, 01, …
>> If you can’t keep up, that’s not UTC’s fault.

[ ... ]

> Leap second handling code is not well-tested and is an ultimate corner
> case.  There's been debate about abolishing leap seconds; with all the
> every-day bugs people have to deal with, few people set up a special
> test environment to handle something that may never happen again (until
> you get less than six months warning that it'll happen at least once
> more), and even then, tests tend to focus on what broke before, because
> it is really hard to test EVERYTHING.

If this particular issue is your beat -- or your avocation -- you really should
read both these blog postings, and all their comments; they are nearly
comprehensive:

  http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time

and

  http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time

They are also both funny as hell.



To be comprehensive myself, I should point out a companion piece about names:

  https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

and there are similar lists for phone numbers, geography, civil addresses and
gender, linked from this thread:

  https://news.ycombinator.com/item?id=11321236

If you write any code that has to interface with the outside world, these are
pieces I think you should read at least annually.

Cheers,
-- jra


-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates   http://www.bcp38.info  2000 Land Rover DII
St Petersburg FL USA  BCP38: Ask For It By Name!   +1 727 647 1274


Re: Leap Second planned for 2016

2016-07-10 Thread Jay R. Ashworth
- Original Message -
> From: "Andrew Gallo" 

> Looks like we'll have another second in 2016:
> http://www.space.com/33361-leap-second-2016-atomic-clocks.html

"5... 4... 3... 2... 1... Zero!... Happy New Year!"

But only if you're in London.  7pm EST.

Cheers,
-- jr 'not on my birthday, damn'
-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates   http://www.bcp38.info  2000 Land Rover DII
St Petersburg FL USA  BCP38: Ask For It By Name!   +1 727 647 1274


Re: Leap Second planned for 2016

2016-07-10 Thread Mikael Abrahamsson

On Sun, 10 Jul 2016, Saku Ytti wrote:

> On 10 July 2016 at 00:12,   wrote:
> > It doesn't help that the POSIX standard doesn't represent leap seconds
> > anyplace, so any elapsed time calculation that crosses a leap second
> > is guaranteed to be wrong
>
> So how can we solve the problem? Immediately and long term?


Since one problem is that the leap second code isn't exercised regularly,
I propose that each month there is a leap second either forward or
backward. These forward/backward motions should be fudged to over time
make sure that we stay pretty much correct.

If POSIX needs to be changed, then change it. By making leap second not a
rare event, this would hopefully mean it'll get taken more seriously and
the code would receive wider testing than today.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: Leap Second planned for 2016

2016-07-10 Thread Steve Allen
On Sun 2016-07-10T11:27:33 +0300, Saku Ytti hath writ:
> So how can we solve the problem? Immediately and long term?

The ITU-R had the question of leap seconds on their agenda for 14
years and did not come up with an answer.  Their 2015 decision was to
drop the question and ask an alphabet soup of international acronym
agencies to come up with something better by 2023.

The problem remains that simply abandoning leap seconds has the effect
of redefining the calendar, and Pope Gregory's last attempt to do that
took 300 years to consolidate.  For time scales there are three
desirable goals, but it is only possible to pick two:

http://www.ucolick.org/~sla/leapsecs/picktwo.html

--
Steve Allen  WGS-84 (GPS)
UCO/Lick Observatory--ISB 260  Natural Sciences II, Room 165  Lat  +36.99855
1156 High Street   Voice: +1 831 459 3046 Lng -122.06015
Santa Cruz, CA 95064   http://www.ucolick.org/~sla/   Hgt +250 m


Re: Leap Second planned for 2016

2016-07-10 Thread Saku Ytti
On 10 July 2016 at 00:12,   wrote:
> It doesn't help that the POSIX standard doesn't represent leap seconds
> anyplace, so any elapsed time calculation that crosses a leap second
> is guaranteed to be wrong
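
To make that concrete, here is a small sketch of the "Seconds Since the
Epoch" formula from the POSIX Base Definitions, written out as a plain
function. The function name and parameter list are made up for the
example; the arguments follow the struct tm conventions (tm_year is
years since 1900, tm_yday is the zero-based day of the year).

```c
#include <stdint.h>

/* POSIX "Seconds Since the Epoch" for a broken-down UTC time. Every day
   is counted as exactly 86400 seconds, so there is no value left over
   for a leap second. */
int64_t posix_seconds(int tm_year, int tm_yday,
                      int tm_hour, int tm_min, int tm_sec)
{
    return tm_sec + tm_min * 60 + tm_hour * 3600
         + (int64_t)tm_yday * 86400
         + (int64_t)(tm_year - 70) * 31536000
         + (int64_t)((tm_year - 69) / 4) * 86400
         - (int64_t)((tm_year - 1) / 100) * 86400
         + (int64_t)((tm_year + 299) / 400) * 86400;
}
```

The leap second 2016-12-31 23:59:60, posix_seconds(116, 365, 23, 59, 60),
and the following midnight, posix_seconds(117, 0, 0, 0, 0), both come out
as 1483228800, so the two instants cannot be told apart and an interval
spanning them is short by one second.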

So how can we solve the problem? Immediately and long term?

a) use UTC or unix time, and accept that code is broken
b) migrate to CLOCK_MONOTONIC, and accept that epoch is unknown (you
cannot serialise clock and consume it in another system)
c) use NTP smear to make clocks run incorrectly to hide the problem
d) use GPSTIME or TAI and implement leaps at last possible moment (at
the presentation layer)
e) wait for 2023 and hope the problem goes away
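
For option b), a minimal sketch of measuring an interval with
CLOCK_MONOTONIC through the POSIX clock_gettime() call. The helper names
are made up for the example. As noted in b), the readings have no known
epoch, so only the difference between two readings taken on the same
running system means anything.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() on strict builds */
#endif
#include <stdint.h>
#include <time.h>

/* Nanoseconds from `start` to `end`. */
int64_t timespec_diff_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000
         + (end.tv_nsec - start.tv_nsec);
}

/* Read the monotonic clock; returns 0 on success and -1 on failure,
   exactly as clock_gettime() does. */
int monotonic_now(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}
```

Taking one reading before and one after the work being timed and passing
both to timespec_diff_ns() gives an elapsed time that is not disturbed
when the wall clock is stepped for a leap second.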

-- 
  ++ytti