Headscratcher of the week

2013-05-31 Thread Mike

Gang,

	In the interest of sharing 'the weird stuff' that makes the job of 
being an operator... uh, fun? (is that the right word?), I would like 
to present the following two SmokePing latency/packet-loss plots, which 
are by far the weirdest I have ever seen.


	These plots are from our SmokePing host out to a customer location. The 
customer is connected via DSL and they run PPPoE over it to connect with 
our access concentrator. There are about 5 physical infrastructure hops 
between the host and the customer: the switch, the BRAS, the switch again, 
then directly to the DSLAM, and then the customer on the end.



The 10 day plot:
http://picpaste.com/10_Day_graph-YV3IdvRV.png

The 30 hour plot:
http://picpaste.com/30_hour_graph-DrwzfhYJ.png


	How can you possibly have a consistent increase in latency like that? I'd 
love to hear theories (or offers of beer, your choice!).


Happy Friday, all!


Mike-



Re: Headscratcher of the week

2013-05-31 Thread Jonathan Lassoff
Those are some truly perplexing graphs. Quite strange that it appears
linear, as if something is slightly changing over time or
growing/shrinking at a constant-ish rate.

Do you have throughput or PPS graphs for the intermediate links as
well? Do their slopes show any similar correlation?

My only hunch would be some intermediate buffer becoming increasingly
full over time, as some other application riding the path grows
linearly in packets/second or bits/second.
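To put rough numbers on the buffer hunch, here is a minimal fluid-queue sketch in Python. It assumes a single FIFO bottleneck that drains at a fixed link rate and a constant offered load starting at t = 0; the function name and every rate and buffer size in it are hypothetical illustration values, not measurements from Mike's network. In this model, whenever the offered load exceeds the link rate, even slightly, the backlog (and so the queueing delay a probe sees) grows linearly until the buffer is full.

```python
def fifo_delay_s(t_s: float, offered_bps: float, link_bps: float,
                 buffer_bits: float) -> float:
    """Queueing delay (in seconds) seen by a probe arriving at time t_s.

    Fluid approximation (an assumption, not a model of any real device):
    the queue is empty at t = 0, traffic arrives at a constant offered_bps,
    and the link drains it at link_bps. Any excess accumulates as backlog
    until the buffer is full; after that the excess is dropped.
    """
    excess_bps = max(offered_bps - link_bps, 0.0)      # load the link cannot carry
    backlog_bits = min(excess_bps * t_s, buffer_bits)  # grows linearly, then capped
    return backlog_bits / link_bps                     # time needed to drain it


if __name__ == "__main__":
    # Hypothetical numbers: 1 Mb/s link, 10 b/s too much traffic, 1 MB buffer.
    for hours in range(4):
        delay = fifo_delay_s(hours * 3600, 1_000_010, 1_000_000, 8_000_000)
        print(f"{hours} h: {delay * 1000:.0f} ms")
```

With a 1 Mb/s link offered just 10 b/s more than it can carry, the delay in this model grows by 36 ms per hour; a larger excess gives a steeper ramp. Once the buffer fills, the delay stops growing and the excess is dropped instead, which would show up as loss on a SmokePing plot.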

Cheers,
jof

On Fri, May 31, 2013 at 3:25 PM, Mike mike-na...@tiedyenetworks.com wrote:
 How can you possibly have consistent increase in latency like that?
 I'd love to hear theories (or offers of beer, your choice!).




Re: Headscratcher of the week

2013-05-31 Thread Jeff Kell
OK, here's a wild guess from left field.  Well, at least from left field
where I made at least one game-saving catch :)

We had a similar case some years back, but it was a ramp-up in overall
traffic we were looking at.  If you're looking at latency, it could be
related to traffic (do you have traffic graphs?).

One particular user that was accustomed to Windows and trying to get
started with Linux was playing games with our NAT firewall.  Rather
than file a request with us for a static NAT and firewall openings for
their new Linux server, they discovered that as long as they generated
some internet traffic periodically, they could defeat the NAT
translation timeout, and essentially keep a static outside IP.

Problem was, they set up a cron job to ping an outside server once
a minute.  Just a plain ping x.x.x.x.

Windows, as we know, defaults to sending only 4 pings and then quitting.

Linux does not :)  So every minute another never-ending ping joined the
ones already running.

So you might look for some recurring scheduled event on the customer's
end that might be cumulative rather than simply recurring.
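The cron story can be sketched in a few lines of Python to show why it produces a ramp rather than a steady load. The model is an assumption: cron starts one ping on the stroke of every minute, and each ping sends roughly one Echo Request per second (approximately the default pacing of both the Windows and Linux tools). A process that stops after 4 requests adds the same small burst every minute; a bare ping x.x.x.x on Linux never stops, so every process ever started is still running. The function name is hypothetical.

```python
from typing import Optional


def echo_requests_in_minute(minute: int, count_per_run: Optional[int]) -> int:
    """Echo Requests sent during `minute` (0-based) when cron launches one
    ping per minute and each ping sends one request per second.

    count_per_run: requests each ping sends before exiting (4 mimics the
    Windows default); None means the process never exits, like a bare
    `ping x.x.x.x` on Linux.
    """
    if minute < 0:
        raise ValueError("minute must be >= 0")
    if count_per_run is None:
        # Every process launched in minutes 0..minute is still running,
        # and each one contributes 60 requests per minute.
        return (minute + 1) * 60
    if not 0 < count_per_run <= 60:
        # Keep the sketch simple: each run must finish inside its own minute.
        raise ValueError("count_per_run must be between 1 and 60")
    return count_per_run


if __name__ == "__main__":
    for minute in (0, 1, 59, 1439):
        print(minute, echo_requests_in_minute(minute, 4),
              echo_requests_in_minute(minute, None))
```

After one day (minute 1439) the never-ending version is sending 86,400 requests a minute, i.e. 1,440 per second, and it adds another 60 requests a minute every minute. On Linux, ping -c 4 x.x.x.x gives the Windows-style behaviour.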

Jeff

On 5/31/2013 6:25 PM, Mike wrote:
 How can you possibly have consistent increase in latency like that?
 I'd love to hear theories (or offers of beer, your choice!).







Re: Headscratcher of the week

2013-05-31 Thread Brett Frankenberger
On Fri, May 31, 2013 at 03:25:22PM -0700, Mike wrote:
   How can you possibly have consistent increase in latency like that?
 I'd love to hear theories (or offers of beer, your choice!).

Theory:

There's a stateful device (firewall, NAT, something else) in the path
that is creating state for every ICMP Echo Request it forwards and
(possibly) searching that state when forwarding the ICMP Echo Reply
responses, and never destroying that state, and either the create
operation or the search operation (or both) takes an amount of time
that is a linear function of the number of state entries.
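This theory can be made concrete with a toy Python model of such a device. Everything here is hypothetical: the class and method names, the use of a plain list as the state table, the choice of (source, destination, identifier, sequence) as the key, and the documentation-range addresses. It illustrates that if state is created for every Echo Request, never expired, and searched linearly, the work done per reply grows in step with the number of pings ever sent.

```python
class LinearStateTable:
    """Hypothetical stateful middlebox: one entry per forwarded ICMP Echo
    Request, entries never expire, and replies are matched by linear scan."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, int, int]] = []

    def forward_request(self, src: str, dst: str, ident: int, seq: int) -> None:
        # State is created for every request and never destroyed.
        self.entries.append((src, dst, ident, seq))

    def match_reply(self, src: str, dst: str, ident: int, seq: int) -> int:
        """Return how many entries were compared before the reply matched,
        as a stand-in for per-packet processing time."""
        key = (dst, src, ident, seq)  # a reply flows in the reverse direction
        for comparisons, entry in enumerate(self.entries, start=1):
            if entry == key:
                return comparisons
        raise KeyError(key)


if __name__ == "__main__":
    table = LinearStateTable()
    probe_host, customer = "192.0.2.10", "203.0.113.7"  # hypothetical addresses
    costs = []
    for seq in range(1, 6):
        table.forward_request(probe_host, customer, ident=42, seq=seq)
        costs.append(table.match_reply(customer, probe_host, ident=42, seq=seq))
    print(costs)  # [1, 2, 3, 4, 5]
```

Each new reply costs one more comparison than the last, which would give a straight-line rise in latency until something flushes the table. A device that indexed its state with a hash table would not show this growth in lookup cost, although it would still leak memory.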

 -- Brett




Re: Headscratcher of the week

2013-05-31 Thread Jake Khuon

On 31/05/13 17:30, Brett Frankenberger wrote:


 How can you possibly have consistent increase in latency like that?
 I'd love to hear theories (or offers of beer, your choice!).


A variation of the buffer-filling theory: there's some QoS/traffic 
shaping going on that is causing your ping packets to get classed and 
policed into an ever-depleting buffer pool.
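One way shaping could turn into a steady ramp is a token-bucket shaper, sketched here in Python. This is a swapped-in example of traffic shaping, not necessarily what is meant by a depleting buffer pool, and the function name, rates and packet sizes are hypothetical. Packets wait in FIFO order until the bucket holds enough tokens; if they arrive faster on average than the token rate, each packet waits a little longer than the one before it.

```python
def shaper_departures(arrivals_s: list[float], packet_bits: float,
                      rate_bps: float, bucket_bits: float) -> list[float]:
    """Departure times for equal-size packets passing through a token-bucket
    shaper. Packets leave in FIFO order, each once enough tokens exist.
    Arrival times are assumed sorted and >= 0."""
    if packet_bits > bucket_bits:
        raise ValueError("a packet larger than the bucket could never be sent")
    tokens = bucket_bits  # the bucket starts full
    last_t = 0.0          # time of the previous departure (last token update)
    departures = []
    for t in arrivals_s:
        t = max(t, last_t)  # FIFO: cannot leave before the packet ahead of it
        # Refill tokens for the time elapsed, up to the bucket size.
        tokens = min(bucket_bits, tokens + (t - last_t) * rate_bps)
        if tokens < packet_bits:
            # Hold the packet until the bucket has refilled enough.
            t += (packet_bits - tokens) / rate_bps
            tokens = packet_bits
        tokens -= packet_bits
        last_t = t
        departures.append(t)
    return departures


if __name__ == "__main__":
    # Hypothetical: 1000-bit packets, 1000 b/s (one packet per second),
    # a one-packet bucket, and a packet arriving every 0.5 s.
    arrivals = [i * 0.5 for i in range(6)]
    departures = shaper_departures(arrivals, 1000, 1000, 1000)
    print([d - a for d, a in zip(departures, arrivals)])
```

Here each packet waits half a second longer than the one before it. A real shaper's queue is finite, so in practice the ramp would eventually turn into drops.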


I wonder what would happen to the pattern if you reset the interface. |8^)


--
/*=[ Jake Khuon kh...@neebu.net ]=+
 | Packet Plumber, Network Engineers /| / [~ [~ |) | |  |
 | for Effective Bandwidth Utilisation  / |/  [_ [_ |) |_| NETWORKS |
 +==*/



Re: Headscratcher of the week

2013-05-31 Thread Blake Dunlap
I agree with the previous poster: a growing table and a corresponding
increase in search delay, probably related directly to the monitoring
itself, or at least to connection state of some kind.


On Fri, May 31, 2013 at 7:40 PM, Jake Khuon kh...@neebu.net wrote:

 Variation of the buffer filling theory is that there's some
 QoS/traffic-shaping going on which is causing your ping packets to get
 classed and policed into an ever depleting buffer pool.