Re: MTU to CDN's

2018-01-19 Thread Mikael Abrahamsson

On Sat, 20 Jan 2018, Mark Andrews wrote:


Which doesn’t work with IPv6 as UDP doesn’t have the field to clamp.


Well, not with UDP/IPv4 either. Actually, the only protocol I know of out 
there that has this kind of clamping (and where clamping is in wide use) is 
TCP.


Thus my earlier comment about strongly advising that protocols use PLPMTUD.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: MTU to CDN's

2018-01-19 Thread Mark Andrews
Which doesn’t work with IPv6 as UDP doesn’t have the field to clamp. 

-- 
Mark Andrews

> On 20 Jan 2018, at 03:35, Radu-Adrian Feurdean 
>  wrote:
> [...] But it usually works with MSS clamping to the correct value.



Re: MTU to CDN's

2018-01-19 Thread Radu-Adrian Feurdean
On Fri, Jan 19, 2018, at 01:14, Jared Mauch wrote:
> If you’re then doing DSL + PPPoE and your customers really see a MTU
> of 1492 or less, then another device has to fragment 5x again.

In this part of the world we have even worse stuff around: PPP over L2TP 
over IP with a 1500 MTU interconnection. Remove another 40 bytes. Add some 
more headers for various tunneling scenarios and you may get into a situation 
where even 1400 is too much.  But it usually works with MSS clamping to the 
correct value. Some small ISPs don't even make the effort to check whether the 
transport supports more than 1500 bytes in order to give the full 1500 to the 
customer - they just clamp down the MSS.


Re: MTU to CDN's

2018-01-19 Thread William Herrin
On Fri, Jan 19, 2018 at 9:07 AM, Mike Hammett  wrote:
> Wouldn't those situations be causing issues now, given the likelihood that 
> someone with a less than 1,500 byte MTU is communicating with you now?

Hi Mike,

They do. These are the people calling your support line with the
complaint that they can't get to your web site from home, but can from
work (or vice versa). Your web site is "obviously" working and the
calls are infrequent, so support advises there's a problem with the
customer's ISP.

Regards,
Bill Herrin


-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Dirtside Systems . Web: 


Re: MTU to CDN's

2018-01-19 Thread Vincent Bernat
 ❦ 19 January 2018 08:07 -0600, Mike Hammett  :

> Wouldn't those situations be causing issues now, given the likelihood
> that someone with a less than 1,500 byte MTU is communicating with you
> now?

Those situations are causing issues now. If you have an MTU of less than
1500 bytes, it is likely that some destinations are unreachable to you if
you rely only on PMTUD. People usually rely on TCP MSS clamping for those cases.
-- 
I'll burn my books.
-- Christopher Marlowe


Re: MTU to CDN's

2018-01-19 Thread William Herrin
On Fri, Jan 19, 2018 at 8:58 AM, Jared Mauch  wrote:
>> On Jan 18, 2018, at 8:44 PM, William Herrin  wrote:
>>> Which packet?  Is there a specific CDN that does this?  I’d be curious to 
>>> see
>>> data vs speculation.
>>
>> Path MTU discovery (which sets the DF bit on TCP packets) is enabled
>> by default on -every- operating system that's shipped for decades now.
>
> I’m not seeing this in a PCAP capture to at least one CDN, either from my
> host or from the CDN endpoint.
> PCAP: https://puck.nether.net/~jared/akamai.pcap

Hi Jared,

tcpdump -v -n -nn -r akamai.pcap |more
reading from file akamai.pcap, link-type EN10MB (Ethernet)

08:54:48.611321 IP (tos 0x0, ttl 64, id 12596, offset 0, flags [DF],
proto TCP (6), length 60)
204.42.254.5.60262 > 23.0.51.165.80: Flags [S], cksum 0x1504
(incorrect -> 0x5a14), seq 3315894416, win 29200, options [mss
1460,sackOK,TS val 3822930236 ecr 0,nop,wscale 7], length 0

08:54:48.633286 IP (tos 0x0, ttl 58, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
23.0.51.165.80 > 204.42.254.5.60262: Flags [S.], cksum 0x0972
(correct), seq 3383397658, ack 3315894417, win 28960, options [mss
1460,sackOK,TS val 2906475904 ecr 3822930236,nop,wscale 5], length 0


Note: "flags [DF]"

That means the don't fragment bit is set.
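
(Incidentally, to pull only the DF-marked IPv4 packets out of a capture like
this, a pcap filter on the IP flags byte works; a sketch:

# DF is bit 0x40 of byte 6 (flags/fragment-offset) of the IPv4 header:
tcpdump -n -r akamai.pcap 'ip[6] & 0x40 != 0'
)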

Regards,
Bill Herrin


-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Dirtside Systems . Web: 


Re: MTU to CDN's

2018-01-19 Thread Olivier Benghozi
And also:

When the router generates the ICMP by punting the packet to its CPU, and such 
traffic is - legitimately - rate-limited to avoid crashing the router.

When the ICMP is sourced from a private IP on the router for various legitimate 
reasons (not enough public IPv4 addresses, from within a VRF, or whatever), 
while packets from private IPs are legitimately filtered when entering the 
target network.


> On 19 Jan 2018, at 15:05, Mikael Abrahamsson  wrote:
> 
> On Fri, 19 Jan 2018, Mike Hammett wrote:
> 
>> Other than people improperly blocking ICMP, when does PMTUD not work? Honest 
>> question, not troll.
> 
> Mismatch of MTU interface settings between interfaces, mismatch of MTU 
> between L3 devices and intermediate L2 devices, anycast services, ECMP based 
> services where the ICMP error is delivered to the wrong node.
> 
> So yes, there are plenty of reasons that PMTUD doesn't work without anyone 
> breaking it out of ill will or incompetence.



Re: MTU to CDN's

2018-01-19 Thread William Herrin
On Fri, Jan 19, 2018 at 8:48 AM, Mike Hammett  wrote:

> Other than people improperly blocking ICMP, when does PMTUD not work?
> Honest question, not troll.
>

Hi Mike,

One common scenario: the router's interface is numbered with an RFC 1918
private IP address. The ICMP packet is dropped because it enters an adjacent
network with a source address that isn't valid for the transit.

Another common scenario: the packet is encapsulated in MPLS when it reaches
the segment which can't handle the large packet. That particular router is
not set up to decapsulate the MPLS packet and act on the IPv4 packet inside.

A third scenario: asymmetric routing. A particular router is capable of
moving packets to your destination but either intentionally or due to a
configuration error is unable to route packets back to the source.

A fourth scenario: for security reasons (part of defense in depth), a host
is only permitted to communicate with whitelisted IP addresses. Random
Internet routers are not on the whitelist.


PMTUD's routine failure demonstrates the wisdom of the end-to-end
principle. It's the one critical place in base IPv4 that doesn't follow it.
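
When PMTUD fails this silently, you end up probing the path yourself. A
quick sketch with Linux tracepath (hostname hypothetical), which does its
own probing and reports the hop where the path MTU drops:

# Watch the reported "pmtu" value shrink at the offending hop:
tracepath -n www.example.com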

Regards,
Bill Herrin


-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Dirtside Systems . Web: 


Re: MTU to CDN's

2018-01-19 Thread Mikael Abrahamsson

On Fri, 19 Jan 2018, Mike Hammett wrote:

Wouldn't those situations be causing issues now, given the likelihood 
that someone with a less than 1,500 byte MTU is communicating with you 
now?


If the issue is that you're letting 8996 byte packets through but not 9000 
byte packets, then no.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: MTU to CDN's

2018-01-19 Thread Mike Hammett
"Many folks these days just fail away from a seemingly problematic link quickly 
and don’t always identify the root cause." Agreed. 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "Jared Mauch" <ja...@puck.nether.net> 
To: "Mike Hammett" <na...@ics-il.net> 
Cc: "NANOG list" <nanog@nanog.org> 
Sent: Friday, January 19, 2018 8:13:02 AM 
Subject: Re: MTU to CDN's 





Re: MTU to CDN's

2018-01-19 Thread Jared Mauch


> On Jan 19, 2018, at 9:07 AM, Mike Hammett  wrote:
> 
> Wouldn't those situations be causing issues now, given the likelihood that 
> someone with a less than 1,500 byte MTU is communicating with you now? 
> 


Tends to be more localized and less visible in many cases.

I’m aware of at least one regional network that has duplicate packet issues 
going on and they’ve yet to understand the root cause.  This can have 
performance impacts that are not always understood.

Things get harder to diagnose when there are multiple paths etc. involved.  
Many folks these days just fail away from a seemingly problematic link quickly 
and don't always identify the root cause.

- jared

Re: MTU to CDN's

2018-01-19 Thread Ruairi Carroll
On 19 January 2018 at 13:48, Mike Hammett <na...@ics-il.net> wrote:

> Other than people improperly blocking ICMP, when does PMTUD not work?
> Honest question, not troll.
>
>
It can break under _certain_ scenarios with Anycast.

It can break under _certain_ scenarios in v6 with ECMP.

It can break across an LB in L4 mode, when a real server behind the LB has an
unexpected MSS.

None of these scenarios is the norm, obviously, but PMTUD does have
some edge cases.

/Ruairi





Re: MTU to CDN's

2018-01-19 Thread Mike Hammett
Wouldn't those situations be causing issues now, given the likelihood that 
someone with a less than 1,500 byte MTU is communicating with you now? 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "Mikael Abrahamsson" <swm...@swm.pp.se> 
To: "Mike Hammett" <na...@ics-il.net> 
Cc: "NANOG list" <nanog@nanog.org> 
Sent: Friday, January 19, 2018 8:05:17 AM 
Subject: Re: MTU to CDN's 




Re: MTU to CDN's

2018-01-19 Thread Mikael Abrahamsson

On Fri, 19 Jan 2018, Mike Hammett wrote:

Other than people improperly blocking ICMP, when does PMTUD not work? 
Honest question, not troll.


Mismatch of MTU interface settings between interfaces, mismatch of MTU 
between L3 devices and intermediate L2 devices, anycast services, ECMP 
based services where the ICMP error is delivered to the wrong node.


So yes, there are plenty of reasons that PMTUD doesn't work without anyone 
breaking it out of ill will or incompetence.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: MTU to CDN's

2018-01-19 Thread Jared Mauch
Bah, never mind.. reading my PCAP wrong :-(





Re: MTU to CDN's

2018-01-19 Thread Jared Mauch


> On Jan 18, 2018, at 8:44 PM, William Herrin  wrote:
> 
>> Which packet?  Is there a specific CDN that does this?  I’d be curious to see
>> data vs speculation.
> 
> Howdy,
> 
> Path MTU discovery (which sets the DF bit on TCP packets) is enabled
> by default on -every- operating system that's shipped for decades now.
> If you don't want it, you have to explicitly disable it. Disabling it
> for any significant quantity of traffic is considered antisocial since
> routers generally can't fragment in the hardware fast path.

I’m not seeing this in a PCAP capture to at least one CDN, either from my
host or from the CDN endpoint.

I suspect you’re mistaken.

- Jared

PCAP: https://puck.nether.net/~jared/akamai.pcap



Re: MTU to CDN's

2018-01-19 Thread Mike Hammett
Other than people improperly blocking ICMP, when does PMTUD not work? Honest 
question, not troll. 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "Mikael Abrahamsson" <swm...@swm.pp.se> 
To: "Michael Crapse" <mich...@wi-fiber.io> 
Cc: "NANOG list" <nanog@nanog.org> 
Sent: Friday, January 19, 2018 1:22:02 AM 
Subject: Re: MTU to CDN's 




Re: MTU to CDN's

2018-01-18 Thread Mikael Abrahamsson

On Thu, 18 Jan 2018, Michael Crapse wrote:


I don't mind letting the client premises routers break down 9000 byte
packets. [...] why hasn't the entire internet just moved to a 9000 (or 9600
L2) byte MTU?


As usual, there are 5-10 (or more) factors playing into this. Some, in 
random order:


1. IEEE hasn't standardised > 1500 byte ethernet packets.
2. DSL/WIFI chips typically don't support > ~2300 bytes, because reasons.
3. Because of 2, most SoC ethernet chips don't either.
4. There is no standardised way to understand/probe the L2 MTU to your 
next hop (ARP/ND plus probing whether the value actually works).

5. PMTUD doesn't always work.
6. PLPMTUD generally hasn't been implemented, in either protocols or 
hosts.
7. Some implementations have been optimized for packets < 2000 bytes and 
actually perform worse if they have to support larger packets (they will 
allocate 2k of buffer memory per packet); 9k is ill-fitting across 2^X 
values.
8. Because of all the above, a mixed-MTU LAN doesn't work, and it's 
going to be mixed-MTU unless you control all devices (which is typically 
not the case outside of the datacenter).
9. The PPS problem in hosts and routers was solved by hardware offloading 
to NICs and by forwarding NPUs/ASICs with lookup speeds high enough that 
PPS is no longer a big problem.


On the value to choose for "large MTU", 9000 for edge and 9180 for core is 
what I advocate, after a non-trivial amount of looking into this. All major 
core routing platforms work with 9180 (with JunOS only supporting this 
after 2015 or so). So if we want to standardise on an MTU that all 
devices should support, it's 9180, but we'd typically use 9000 in RAs 
sent to devices.
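
(The RA piece is a one-line knob in common implementations; a sketch for 
radvd, assuming an all-9000 LAN on eth0 and a documentation prefix:

interface eth0 {
    AdvSendAdvert on;
    # RA MTU option (RFC 4861): tell hosts to use 9000 on this link.
    AdvLinkMTU 9000;
    prefix 2001:db8::/64 { };
};
)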


If we want a higher MTU to be deployable across the Internet, we need to 
make it incrementally deployable. Some key things to achieve that:


1. Get something like 
https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented.
2. Go to the IETF and get a document published that advises all protocols 
to support PLPMTUD (RFC 4821).


1 to enable mixed-MTU LANs.
2 to enable large-MTU hosts to actually be able to communicate when PMTUD 
doesn't work.


With this in place (wait ~10 years), a larger MTU becomes incrementally 
deployable, which means it'll be deployable on the Internet, and IEEE might 
actually agree to standardise > 1500 byte packets for ethernet.


--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: MTU to CDN's

2018-01-18 Thread Michael Crapse
I don't mind letting the client premises routers break down 9000 byte
packets. My ISP controls end to end connectivity. 80% of people even let
our techs change settings on their computer, so this would allow me to give
a ~5% increase in speeds, and less network congestion for end users, for a
one-time $60 service many people would want. It's also where the internet
should be heading... Not to beat a dead horse (re: ipv6) but why hasn't the
entire internet just moved to a 9000 (or 9600 L2) byte MTU? It was created
for the jump to gigabit... That's 4 orders of magnitude ago. The internet
backbone shouldn't be shuffling around 1500-byte packets at 1 Tbps. That
means if you want to layer-3 that data, you need a router capable of more
than half a billion packets/s of forwarding capacity. On the other hand,
with even just a 9000 byte MTU, TCP/IP overhead is reduced 6-fold, and
forwarding capacity needs just 100 or so Mpps. Routers that forward at
that rate are found for less than $2k.
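
(For reference, the back-of-envelope arithmetic, ignoring L2 framing: with
40 bytes of IPv4+TCP headers per packet, a 1500-byte MTU carries 1460 bytes
of payload (~2.7% header overhead) while 9000 carries 8960 (~0.45%), so the
six-fold overhead reduction holds. The pps figures come out lower than
stated, though: 1 Tbps of 1500-byte packets is 10^12 / (1500 x 8) = ~83
Mpps, and of 9000-byte packets ~14 Mpps; the 6x ratio is the durable part
of the argument.)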

On 18 January 2018 at 23:31, Vincent Bernat  wrote:

>  ❦ 18 January 2018 22:06 -0700, Michael Crapse  :
>
> > Why though? If I could get the major CDNs all inside my network willing to
> > run 9000 byte packets, my routers just got that much cheaper and less
> > loaded. The routing capacity of x86 is hindered only by forwarding
> > capacity (PPS), not data line rate.
>
> Unless your clients use a 9000-byte MTU, you won't see a difference but
> you'll have to deal with broken PMTUD (or have your routers fragment).
> --
> Many a writer seems to think he is never profound except when he can't
> understand his own meaning.
> -- George D. Prentice
>


Re: MTU to CDN's

2018-01-18 Thread Vincent Bernat
 ❦ 19 January 2018 08:53 +1000, George Michaelson  :

> if I was an ISP (I'm not) and a CDN came and said "we want to be inside
> you" (ewww), why wouldn't I say "sure: let's jumbo"?

Most traffic would be with clients limited to at most 1500 bytes.
-- 
Its name is Public Opinion.  It is held in reverence.  It settles everything.
Some think it is the voice of God.
-- Mark Twain


Re: MTU to CDN's

2018-01-18 Thread William Herrin
On Thu, Jan 18, 2018 at 7:41 PM, Jared Mauch  wrote:
>> On Jan 18, 2018, at 7:32 PM, William Herrin  wrote:
>>
>> On Thu, Jan 18, 2018 at 7:14 PM, Jared Mauch  wrote:
>>> lets say i can
>>> send you a 9K packet.  If you receive that frame, and realize you need
>>> to fragment, then it’s your routers job to slice 9000 into 5 x 1500.
>>
>> In practice, no, because the packet you sent had the "don't fragment"
>> bit set.
>
> Which packet?  Is there a specific CDN that does this?  I’d be curious to see
> data vs speculation.

Howdy,

Path MTU discovery (which sets the DF bit on TCP packets) is enabled
by default on -every- operating system that's shipped for decades now.
If you don't want it, you have to explicitly disable it. Disabling it
for any significant quantity of traffic is considered antisocial since
routers generally can't fragment in the hardware fast path.

Regards,
Bill


-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Dirtside Systems . Web: 


Re: MTU to CDN's

2018-01-18 Thread Jared Mauch


> On Jan 18, 2018, at 7:32 PM, William Herrin  wrote:
> 
> On Thu, Jan 18, 2018 at 7:14 PM, Jared Mauch  wrote:
>> lets say i can
>> send you a 9K packet.  If you receive that frame, and realize you need
>> to fragment, then it’s your routers job to slice 9000 into 5 x 1500.
> 
> In practice, no, because the packet you sent had the "don't fragment"
> bit set.

Which packet?  Is there a specific CDN that does this?  I’d be curious to see
data vs speculation.


> That means my router is not allowed to fragment the packet.
> Instead, I must send the originating host an ICMP destination
> unreachable packet stating that the largest packet I can send further
> is 1500 bytes.
> 
> You might receive my ICMP message. You might not. After all, I am not
> the host you were looking for.

:-)

Nor is it likely the reply.

- Jared

Re: MTU to CDN's

2018-01-18 Thread Owen DeLong

> On Jan 18, 2018, at 4:32 PM, William Herrin  wrote:
> 
> On Thu, Jan 18, 2018 at 7:14 PM, Jared Mauch  wrote:
>> lets say i can
>> send you a 9K packet.  If you receive that frame, and realize you need
>> to fragment, then it’s your routers job to slice 9000 into 5 x 1500.
> 
> In practice, no, because the packet you sent had the "don't fragment"
> bit set. That means my router is not allowed to fragment the packet.
> Instead, I must send the originating host an ICMP destination
> unreachable packet stating that the largest packet I can send further
> is 1500 bytes.
> 
> You might receive my ICMP message. You might not. After all, I am not
> the host you were looking for.

This gets especially bad in cases such as anycast where the return path may be 
asymmetrical and could result in delivery of the ICMP PTB message to a 
different anycast instance or to a stateless load balancer that is incapable of 
determining which machine originated the packet being referenced.

One of the many reasons I continue to question the wisdom of using anycast for 
multi-packet transactions.

Owen




Re: MTU to CDN's

2018-01-18 Thread William Herrin
On Thu, Jan 18, 2018 at 7:14 PM, Jared Mauch  wrote:
> lets say i can
> send you a 9K packet.  If you receive that frame, and realize you need
> to fragment, then it’s your routers job to slice 9000 into 5 x 1500.

In practice, no, because the packet you sent had the "don't fragment"
bit set. That means my router is not allowed to fragment the packet.
Instead, I must send the originating host an ICMP destination
unreachable packet stating that the largest packet I can send further
is 1500 bytes.

You might receive my ICMP message. You might not. After all, I am not
the host you were looking for.

Good luck.

Regards,
Bill Herrin


P.S. This makes Linux servers happy:

iptables -t mangle --insert POSTROUTING --proto tcp \
    --tcp-flags SYN,RST,FIN SYN --match tcpmss --mss 1241:65535 \
    --jump TCPMSS --set-mss 1240
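
(The rule rewrites the MSS option on outbound SYNs advertising anything
larger than 1240; with 40 bytes of IPv4+TCP headers the resulting packets
are at most 1280 bytes, the IPv6 minimum MTU, so they fit more or less any
sane path.)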



-- 
William Herrin  her...@dirtside.com  b...@herrin.us
Dirtside Systems . Web: 


Re: MTU to CDN's

2018-01-18 Thread Jared Mauch


> On Jan 18, 2018, at 5:53 PM, George Michaelson  wrote:
> 
> if I was an ISP (I'm not) and a CDN came and said "we want to be inside
> you" (ewww), why wouldn't I say "sure: let's jumbo"?
> 
> not even "asking for a friend" - I genuinely don't understand why a CDN
> who colocates, and is not using a public exchange but is inside your
> transit boundary (which I am told is actually a big thing now), would
> not drive to the packet size which works in your switching gear.
> 
> I understand that CDN/DC praxis now drives to cheap dumb switches, but
> even dumb switches like bigger packets, don't they? Less forwarding
> decision cost, for more throughput?

The reason is that most customers are at a lower MTU size.  Let's say I can
send you a 9K packet.  If you receive that frame and realize you need
to fragment, then it's your router's job to slice 9000 into 5 x 1500.
I may have caused you to hit your exception path (which could be expensive)
as well as made your PPS load 5x larger downstream.
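
(Strictly, the arithmetic is slightly worse than 5x: a 9000-byte IPv4
packet carries 8980 bytes past its 20-byte header, each 1500-byte fragment
carries at most 1480 of those, and ceil(8980/1480) = 7 fragments; but the
order of magnitude stands.)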

This doesn't even account for the fact that there may be a speed
mismatch, whereby I am sending at 100Gb+ and your outputs may be only 10G.

If you’re then doing DSL + PPPoE and your customers really see a MTU
of 1492 or less, then another device has to fragment 5x again.

For server-to-server traffic, 9K makes a lot of sense: it reduces the packet
processing and increases the throughput.  But if your consumer-electronics
wifi gear or switch can't handle > 1500, and doesn't even have a setting for
layer-2 > 1500, the cost is just too high.  Much easier for me to send 5x
packets in the first place and be more compatible.

Like many things, I'd love for this to be as simple and purist as you
purport.  I might even be willing to figure out if at $DayJob we could see
a benefit from doing this, but from the servers to switches to routers and
then a partner interface... it's a lot of things to make sure are just right.

Plus.. can your phone do > 1500 MTU on the Wifi?  Where’s that setting?

(mumbling person about CSLIP and MRUs from back in the day)

- Jared



Re: MTU to CDN's

2018-01-18 Thread George Michaelson
thanks. good answer. low-risk answer. "it will work" answer.

If it's a variant of "the last mile is your problem" problem, I'm ok
with that. If it's a consequence of the middleware deployment I feel
like it's more tangibly bad decision logic, but it's real.

-G

>


Re: MTU to CDN's

2018-01-18 Thread Mark Andrews
Because the CDN delivers to your customers, not to you.  It's your customers' 
link requirements that you need to worry about.  If you support jumbo frames 
to all of your customers, and their gear also supports jumbo frames, then 
sure, go ahead and use jumbo frames; otherwise use the lowest-common-
denominator MTU when transmitting.  This is less than 1500 on today's 
Internet, and encapsulated traffic is reasonably common.

embedded CDN <--> NAT64 <--> CLAT <--> client
         1500         14XX         1500
embedded CDN <--> B4 <--> 6RD <--> client
         1500         14XX         1500

Now you can increase the first 1500 easily.  The rest of the path not so
easily.
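
Approximate per-packet costs for the encapsulations in play (exact figures
depend on options):

PPPoE                     8 bytes  (1500 -> 1492)
6in4 / 6RD               20 bytes  (1500 -> 1480)
GRE, no options          24 bytes  (1500 -> 1476)
PPP over L2TP over IP   ~40 bytes  (1500 -> ~1460)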

> On 19 Jan 2018, at 9:53 am, George Michaelson  wrote:
> 
> if I was an ISP (I'm not) and a CDN came and said "we want to be inside
> you" (ewww), why wouldn't I say "sure: let's jumbo"?

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org



Re: MTU to CDN's

2018-01-18 Thread George Michaelson
if I was an ISP (I'm not) and a CDN came and said "we want to be inside
you" (ewww), why wouldn't I say "sure: let's jumbo"?

not even "asking for a friend" - I genuinely don't understand why a CDN
who colocates, and is not using a public exchange but is inside your
transit boundary (which I am told is actually a big thing now), would
not drive to the packet size which works in your switching gear.

I understand that CDN/DC praxis now drives to cheap dumb switches, but
even dumb switches like bigger packets, don't they? Less forwarding
decision cost, for more throughput?



Re: MTU to CDN's

2018-01-18 Thread Dovid Bender
Vincent,

Thanks. That URL explained a lot.



Re: MTU to CDN's

2018-01-09 Thread Vincent Bernat
 ❦ 8 January 2018 15:08 -0800, joel jaeggli  :

>> N00b here trying to understand why certain CDNs such as Cloudflare have
>> issues where my MTU is low. For instance if I am using pptp and the MTU is
>> at 1300 it won't work. If I increase to 1478 it may or may not work.
> PMTUD has a lot of trouble working reliably when the destination of
> the PTB is a stateless load-balancer.

More explanations are available here:
 https://blog.cloudflare.com/path-mtu-discovery-in-practice/
-- 
Don't comment bad code - rewrite it.
- The Elements of Programming Style (Kernighan & Plauger)


Re: MTU to CDN's

2018-01-08 Thread Mikael Abrahamsson

On Mon, 8 Jan 2018, joel jaeggli wrote:


PMTUD has a lot of trouble working reliably when the destination of
the PTB is a stateless load-balancer.

If your tunnel or host clamps the MSS to the appropriate value it can
support, it is highly likely that connection attempts to the same
destination will work fine.


This is understandable, but if this is an operational practice that we as 
the operational community want to condone (people using solutions where 
PMTUD doesn't work), then we also need to make sure that all applications 
do PLPMTUD (RFC 4821, Packetization Layer Path MTU Discovery). This is 
currently NOT the case, and from what I can tell, there isn't even an IETF 
document saying this is the best current practice.
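
(On Linux at least, the TCP side of this already exists behind a sysctl
that defaults to off; a sketch:

# RFC 4821 packetization-layer probing for TCP:
# 0 = off, 1 = probe after an ICMP black hole is detected, 2 = always probe.
sysctl -w net.ipv4.tcp_mtu_probing=1
# MSS the probing search starts from:
sysctl -w net.ipv4.tcp_base_mss=1024
)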


So, is this something we want to say? We should talk about that.

--
Mikael Abrahamsson    email: swm...@swm.pp.se


Re: MTU to CDN's

2018-01-08 Thread valdis . kletnieks
On Mon, 08 Jan 2018 17:55:55 -0500, Dovid Bender said:
> Hi,
>
> N00b here trying to understand why certain CDNs such as Cloudflare have
> issues where my MTU is low. For instance if I am using pptp and the MTU is
> at 1300 it won't work. If I increase to 1478 it may or may not work.

Wait, what?  MTU 1300 fails but 1478 sometimes works?  Or was 1300 a typo
and you meant 1500?




Re: MTU to CDN's

2018-01-08 Thread Mark Andrews
CDNs (or anyone using a load balancer in front of multiple server instances) 
need to assume that traffic may be encapsulated (4in6, 6in4, 464XLAT) and 
lower the interface MTUs so that all traffic generated can be encapsulated 
without fragmentation or PTBs being generated.

This is only going to get worse as more and more eyeballs are being forced 
into IPv4-as-a-service scenarios.
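
(On a Linux origin, either of these does the job; a sketch, with eth0, the
1400/1360 values, and the 192.0.2.1 gateway all illustrative:

# Shrink the interface MTU so nothing larger is ever generated:
ip link set dev eth0 mtu 1400

# Or leave the MTU alone and cap the advertised TCP MSS per route:
ip route change default via 192.0.2.1 dev eth0 advmss 1360
)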

> On 9 Jan 2018, at 11:54 am, Jared Mauch  wrote:

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org



Re: MTU to CDN's

2018-01-08 Thread Jared Mauch
On Mon, Jan 08, 2018 at 05:55:55PM -0500, Dovid Bender wrote:
> Hi,
> 
> N00b here trying to understand why certain CDNs such as Cloudflare have
> issues where my MTU is low. For instance if I am using pptp and the MTU is
> at 1300 it won't work. If I increase to 1478 it may or may not work.

I've done some measurements over the internet in the past year or
so, and 1400-byte packets with the DF bit set seem to make it just fine.
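
(Easy to reproduce; a sketch with Linux ping, where -M do sets DF and
1372 bytes of payload + 8 ICMP + 20 IP = 1400 on the wire; hostname
hypothetical:

ping -c 3 -M do -s 1372 www.example.com

No replies, or a local "Frag needed and DF set" error, mean the path
won't pass 1400.)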

- Jared

-- 
Jared Mauch  | pgp key available via finger from ja...@puck.nether.net
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.


Re: MTU to CDN's

2018-01-08 Thread joel jaeggli
On 1/8/18 2:55 PM, Dovid Bender wrote:

> Hi,
>
> N00b here trying to understand why certain CDNs such as Cloudflare have
> issues where my MTU is low. For instance if I am using pptp and the MTU is
> at 1300 it won't work. If I increase to 1478 it may or may not work.
PMTUD has a lot of trouble working reliably when the destination of
the PTB is a stateless load-balancer.

If your tunnel or host clamps the MSS to the appropriate value it can
support, it is highly likely that connection attempts to the same
destination will work fine.
> TIA.
>




MTU to CDN's

2018-01-08 Thread Dovid Bender
Hi,

N00b here trying to understand why certain CDNs such as Cloudflare have
issues where my MTU is low. For instance if I am using pptp and the MTU is
at 1300 it won't work. If I increase to 1478 it may or may not work.

TIA.